[DCOS_OSS-1591] Long running Connections to VIPs timeout and cause service failures. Created: 25/Aug/17  Updated: 10/Feb/21  Resolved: 10/Feb/21

Status: Resolved
Project: DC/OS
Component/s: networking
Affects Version/s: DC/OS 1.10.2
Fix Version/s: None

Type: Task Priority: Medium
Reporter: Jeffrey Zampieron Assignee: Vinod Kone (Inactive)
Resolution: Won't Do  
Labels: add_team:20200603, documentation, issuetype:improvement
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Team: DELETE DKP Workloads Team

 Description   

In my setup, we use a VIP to connect to pgpool from a set of java services.
The VIPs time out idle connections by default, and this caused the services to fail.

Anyone running microservices on DC/OS with database connections going through VIPs will most likely hit this failure.

This is documented in the FAQ at the bottom of https://dcos.io/docs/1.9/networking/load-balancing-vips/virtual-ip-addresses/, but that entry does not include a detailed discussion of the configuration needed to control or mitigate the situation.

I found the following adjustments to be necessary to avoid the problem (a minimal socket-level sketch follows the list):
a. Enable the JDBC driver's TCP keepalive setting.
b. Lower the VM default `net.ipv4.tcp_keepalive_time` from `7200` seconds to `3600` seconds.
c. Lower the HikariCP maximum connection lifetime from 30 minutes to 4 minutes.
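
For reference, here is a minimal sketch of what (a) and (b) amount to at the socket level, written in Python purely for illustration (the VIP address and the 600/60/5 values are examples, not the values from this setup): keepalive has to be switched on per socket, and the per-socket options can override the kernel defaults so the first probe goes out before the load balancer's idle timeout.

import socket

# Connect to the VIP (address and port are illustrative, not from this setup).
sock = socket.create_connection(("pgpool.marathon.l4lb.thisdcos.directory", 5432))

# This is the switch the JDBC driver's keepalive setting flips under the hood.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Per-socket overrides of the kernel defaults (Linux only), so probes start
# well before the idle timeout on the VIP path expires.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)   # idle seconds before the first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)   # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)      # failed probes before the connection is dropped

Item (c) sidesteps the problem from the other direction: HikariCP retires connections once they reach the maximum lifetime, so they are recycled before they can sit idle long enough to be dropped.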

An FAQ documentation update is probably sufficient, but I'm not familiar enough with the details of minuteman to write it completely.

This issue may also require a change to the DC/OS installer so that it sets the proper TCP keepalive settings.



 Comments   
Comment by deric (Inactive) [ 24/Mar/18 ]

I'm having the same issue with TCP connections. Is there any progress, or any debugging output I can provide to help investigate this issue?

$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
Comment by deric (Inactive) [ 24/Mar/18 ]

To demonstrate the issue I've created a simple publisher/subscriber service over ZeroMQ:

{
  "id": "/zmq/pub",
  "backoffFactor": 1.15,
  "backoffSeconds": 1,
  "cmd": "python3 pub.py -v",
  "container": {
    "portMappings": [
      {
        "containerPort": 6500,
        "hostPort": 0,
        "labels": {
          "VIP_0": "/zmq/pub:6500"
        },
        "protocol": "tcp",
        "servicePort": 10123,
        "name": "pub"
      }
    ],
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "deric/pub-sub-sleep:latest",
      "forcePullImage": false,
      "privileged": false,
      "parameters": []
    }
  },
  "cpus": 0.1,
  "disk": 0,
  "instances": 1,
  "maxLaunchDelaySeconds": 3600,
  "mem": 128,
  "gpus": 0,
  "networks": [
    {
      "mode": "container/bridge"
    }
  ],
  "requirePorts": false,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 0,
    "expungeAfterSeconds": 0
  },
  "healthChecks": [],
  "fetch": [],
  "constraints": []
}

and the subscriber:

{
  "id": "/zmq/sub",
  "backoffFactor": 1.15,
  "backoffSeconds": 1,
  "cmd": "python3 sub.py --host tcp://zmqpub.marathon.l4lb.thisdcos.directory:6500",
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "deric/pub-sub-sleep:latest",
      "forcePullImage": true,
      "privileged": false,
      "parameters": []
    }
  },
  "cpus": 0.1,
  "disk": 0,
  "env": {  },
  "instances": 1,
  "maxLaunchDelaySeconds": 3600,
  "mem": 128,
  "gpus": 0,
  "networks": [
    {
      "mode": "host"
    }
  ],
  "portDefinitions": [],
  "requirePorts": false,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 0,
    "expungeAfterSeconds": 0
  },
  "healthChecks": [],
  "fetch": [],
  "constraints": []
}

Between messages, the publisher exponentially increases its sleep interval:

I0324 15:02:29.154968 28033 executor.cpp:160] Starting task zmq_pub.5ee5560e-2f74-11e8-bd7f-fe0d65b1eb4a
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[INFO] Current libzmq version is 4.1.6
[INFO] Current pyzmq version is 17.0.0
[INFO] Pushing messages to: tcp://*:6500
[DEBUG] msg #0, now sleeping for 2 s
[DEBUG] msg #1, now sleeping for 4 s
[DEBUG] msg #2, now sleeping for 12 s
[DEBUG] msg #3, now sleeping for 32 s
[DEBUG] msg #4, now sleeping for 86 s
[DEBUG] msg #5, now sleeping for 235 s
[DEBUG] msg #6, now sleeping for 638 s
[DEBUG] msg #7, now sleeping for 1735 s
[DEBUG] msg #8, now sleeping for 4716 s

The subscriber joined the show late, thus missing the first few messages:

I0324 15:02:11.469764 26516 exec.cpp:162] Version: 1.4.0
I0324 15:02:11.472103 26524 exec.cpp:237] Executor registered on agent b2156e36-4853-4fb1-ad88-948ddfd39ff8-S7
I0324 15:02:11.472849 26531 executor.cpp:120] Registered docker executor on 195.201.82.178
I0324 15:02:11.472980 26526 executor.cpp:160] Starting task zmq_sub.50bd118d-2f74-11e8-bd7f-fe0d65b1eb4a
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[INFO] Current libzmq version is 4.1.6
[INFO] Current  pyzmq version is 17.0.0
[INFO] Subscribing to messages from: tcp://zmqpub.marathon.l4lb.thisdcos.directory:6500
[INFO] HERE
[INFO] [1] b'msg #4, now sleeping for 86 s'
[INFO] [2] b'msg #5, now sleeping for 235 s'
[INFO] [3] b'msg #6, now sleeping for 638 s'
[INFO] [4] b'msg #7, now sleeping for 1735 s'

Nonetheless, the last message ([DEBUG] msg #8, now sleeping for 4716 s) never arrived; by that point the connection had sat idle for the full 1735-second sleep that preceded it.

Comment by Deepak Goel [ 24/Mar/18 ]

deric, TCP keepalive has to be enabled by the application. Can you take a tcpdump and check whether your application is using keepalive? Most probably it is not. Usually there are two ways to handle long-lived connections: 1. enable keepalive at the application level, or 2. increase the idle connection timeout in IPVS (the default is 900 seconds).
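
For example, a freshly created client socket reports keepalive as disabled unless the application turns it on explicitly. A small Python illustration (the VIP address here is just an example):

import socket

# Open a plain client connection to the VIP.
sock = socket.create_connection(("zmqpub.marathon.l4lb.thisdcos.directory", 6500))

# getsockopt returns 0 unless the application enabled keepalive itself;
# it is off by default for most client libraries.
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print("SO_KEEPALIVE enabled:", bool(enabled))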

Comment by deric (Inactive) [ 27/Mar/18 ]

@dgoel The application wasn't using keepalive; in the case of ZeroMQ the relevant socket option is called TCP_KEEPALIVE_IDLE. After setting TCP_KEEPALIVE_IDLE=1000 I've managed to keep a TCP connection open for at least 26 hours without sending a single message on the channel. Here's the source code I used for testing the keepalive settings.
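
Roughly, the keepalive part of the subscriber with pyzmq looks like this (a sketch, not the exact test code; the 1000 s idle matches the value above, the interval is illustrative):

import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)

# Enable SO_KEEPALIVE on the underlying TCP socket and lower the idle time
# so probes flow while the subscription sits quiet between messages.
sub.setsockopt(zmq.TCP_KEEPALIVE, 1)
sub.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 1000)   # seconds of idle time before the first probe
sub.setsockopt(zmq.TCP_KEEPALIVE_INTVL, 60)    # seconds between probes (illustrative)

sub.setsockopt_string(zmq.SUBSCRIBE, "")
sub.connect("tcp://zmqpub.marathon.l4lb.thisdcos.directory:6500")

while True:
    print(sub.recv())

The options have to be set before connect(), since they are applied when the underlying TCP connection is established.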

Where is the "idle connection timeout value in IPVS" setting? I wasn't able to find it anywhere. I assume it should be part of the dcos-navstar service configuration.

Comment by deric (Inactive) [ 27/Mar/18 ]

Oh, you probably mean:

ipvsadm -l --timeout
Timeout (tcp tcpfin udp): 900 120 300

Funny that I've managed to keep the connection alive with the keepalive idle set to 1000. It appears that the connection isn't timed out immediately.

According to the Docker knowledge base, the timeout can be modified with:

sysctl -w net.ipv4.tcp_keepalive_time=600

IMHO these settings are independent; overriding the IPVS timeout should be done via:

ipvsadm --set 3600 120 300

Would it be possible to pass the --persistent flag from Marathon config?

Comment by Deepak Goel [ 27/Mar/18 ]

We currently don't have an automated way of changing IPVS settings.
