[DCOS_OSS-3747] test_vip[Container.DOCKER-Network.HOST-Network.USER] failed with `assert 0 == 1` in `wait_for_tasks_healthy` Created: 09/Jul/18  Updated: 09/Nov/18  Resolved: 09/Jul/18

Status: Resolved
Project: DC/OS
Component/s: networking
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Medium
Reporter: Jan-Philip Gehrcke (Inactive)
Assignee: Sergey Urbanovich (Inactive)
Resolution: Duplicate
Labels: networking
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates DCOS_OSS-2115 test_vip failed with RetryError on Ma... Resolved
Relates
relates to DCOS_OSS-2115 test_vip failed with RetryError on Ma... Resolved
Team: DELETE Networking Team
Sprint: Networking Team 1.12 Sprint 8
Story Points: 5

 Description   

Seen here: https://github.com/dcos/dcos/pull/3007

Failed with `assert 0 == 1` in `wait_for_tasks_healthy`:

[04:25:07]	[Step 8/9] _____________ test_vip[Container.DOCKER-Network.HOST-Network.USER] _____________
[04:25:07]	[Step 8/9] 
[04:25:07]	[Step 8/9] dcos_api_session = <dcos_test_utils.dcos_api.DcosApiSession object at 0x7fb3a6658e80>
[04:25:07]	[Step 8/9] container = <Container.DOCKER: 'DOCKER'>, vip_net = <Network.HOST: 'HOST'>
[04:25:07]	[Step 8/9] proxy_net = <Network.USER: 'USER'>
[04:25:07]	[Step 8/9] 
[04:25:07]	[Step 8/9]     @pytest.mark.slow
[04:25:07]	[Step 8/9]     @pytest.mark.skipif(
[04:25:07]	[Step 8/9]         not lb_enabled(),
[04:25:07]	[Step 8/9]         reason='Load Balancer disabled')
[04:25:07]	[Step 8/9]     @pytest.mark.parametrize(
[04:25:07]	[Step 8/9]         'container,vip_net,proxy_net',
[04:25:07]	[Step 8/9]         generate_vip_app_permutations())
[04:25:07]	[Step 8/9]     def test_vip(dcos_api_session,
[04:25:07]	[Step 8/9]                  container: marathon.Container,
[04:25:07]	[Step 8/9]                  vip_net: marathon.Network,
[04:25:07]	[Step 8/9]                  proxy_net: marathon.Network):
[04:25:07]	[Step 8/9]         '''Test VIPs between the following source and destination configurations:
[04:25:07]	[Step 8/9]             * containers: DOCKER, UCR and NONE
[04:25:07]	[Step 8/9]             * networks: USER, BRIDGE (docker only), HOST
[04:25:07]	[Step 8/9]             * agents: source and destnations on same agent or different agents
[04:25:07]	[Step 8/9]             * vips: named and unnamed vip
[04:25:07]	[Step 8/9]     
[04:25:07]	[Step 8/9]         Origin app will be deployed to the cluster with a VIP. Proxy app will be
[04:25:07]	[Step 8/9]         deployed either to the same host or elsewhere. Finally, a thread will be
[04:25:07]	[Step 8/9]         started on localhost (which should be a master) to submit a command to the
[04:25:07]	[Step 8/9]         proxy container that will ping the origin container VIP and then assert
[04:25:07]	[Step 8/9]         that the expected origin app UUID was returned
[04:25:07]	[Step 8/9]         '''
[04:25:07]	[Step 8/9]         errors = 0
[04:25:07]	[Step 8/9] >       tests = setup_vip_workload_tests(dcos_api_session, container, vip_net, proxy_net)
[04:25:07]	[Step 8/9] 
[04:25:07]	[Step 8/9] test_networking.py:101: 
[04:25:07]	[Step 8/9] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[04:25:07]	[Step 8/9] test_networking.py:144: in setup_vip_workload_tests
[04:25:07]	[Step 8/9]     wait_for_tasks_healthy(dcos_api_session, origin_app)
[04:25:07]	[Step 8/9] ../../lib/python3.5/site-packages/retrying.py:49: in wrapped_f
[04:25:07]	[Step 8/9]     return Retrying(*dargs, **dkw).call(f, *args, **kw)
[04:25:07]	[Step 8/9] ../../lib/python3.5/site-packages/retrying.py:212: in call
[04:25:07]	[Step 8/9]     raise attempt.get()
[04:25:07]	[Step 8/9] ../../lib/python3.5/site-packages/retrying.py:247: in get
[04:25:07]	[Step 8/9]     six.reraise(self.value[0], self.value[1], self.value[2])
[04:25:07]	[Step 8/9] ../../lib/python3.5/site-packages/six.py:686: in reraise
[04:25:07]	[Step 8/9]     raise value
[04:25:07]	[Step 8/9] ../../lib/python3.5/site-packages/retrying.py:200: in call
[04:25:07]	[Step 8/9]     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
[04:25:07]	[Step 8/9] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[04:25:07]	[Step 8/9] 
[04:25:07]	[Step 8/9] dcos_api_session = <dcos_test_utils.dcos_api.DcosApiSession object at 0x7fb3a6658e80>
[04:25:07]	[Step 8/9] app_definition = {'acceptedResourceRoles': ['*', 'slave_public'], 'cmd': '/opt/mesosphere/bin/dcos-shell python /opt/mesosphere/active/...'type': 'DOCKER', 'volumes': [{'containerPath': '/opt/mesosphere', 'hostPath': '/opt/mesosphere', 'mode': 'RO'}]}, ...}
[04:25:07]	[Step 8/9] 
[04:25:07]	[Step 8/9]     @retrying.retry(wait_fixed=5000, stop_max_delay=20 * 60 * 1000)
[04:25:07]	[Step 8/9]     def wait_for_tasks_healthy(dcos_api_session, app_definition):
[04:25:07]	[Step 8/9]         info = dcos_api_session.marathon.get('v2/apps/{}'.format(app_definition['id'])).json()
[04:25:07]	[Step 8/9] >       assert info['app']['tasksHealthy'] == app_definition['instances']
[04:25:07]	[Step 8/9] E       assert 0 == 1
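
For readers unfamiliar with the retry pattern in the log above, here is a minimal sketch of why the failure surfaces as a bare `assert 0 == 1` rather than a wrapped retry error. It assumes, as the traceback suggests (`raise attempt.get()` followed by `six.reraise`), that once the 20-minute budget is exhausted the exception from the last attempt is re-raised unchanged. `retry_fixed`, `FakeMarathon`, `FakeClock` and the `/origin-app` id are hypothetical stand-ins written for this illustration; they are not the `retrying` library, the DC/OS test helpers, or the app used in the failing run.

```python
import functools
import time


def retry_fixed(wait_fixed_ms, stop_max_delay_ms, sleep=time.sleep, clock=time.monotonic):
    """Hypothetical stand-in for a fixed-wait retry decorator.

    Retries on any exception. Once the time budget is used up, the
    exception raised by the last attempt propagates unchanged.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = clock()
            while True:
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    elapsed_ms = (clock() - start) * 1000
                    if elapsed_ms >= stop_max_delay_ms:
                        raise  # the last AssertionError surfaces as-is
                    sleep(wait_fixed_ms / 1000)
        return wrapper
    return decorator


class FakeMarathon:
    """Hypothetical stub that always reports zero healthy tasks."""
    def __init__(self):
        self.calls = 0

    def get_app(self, app_id):
        self.calls += 1
        return {'app': {'id': app_id, 'tasksHealthy': 0}}


class FakeClock:
    """Simulated clock so the example finishes instantly."""
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def demo():
    clock = FakeClock()
    marathon = FakeMarathon()

    # Same wait and budget values as the decorator shown in the log.
    @retry_fixed(5000, 20 * 60 * 1000, sleep=clock.sleep, clock=clock)
    def wait_for_tasks_healthy(app_definition):
        info = marathon.get_app(app_definition['id'])
        assert info['app']['tasksHealthy'] == app_definition['instances']

    try:
        wait_for_tasks_healthy({'id': '/origin-app', 'instances': 1})
    except AssertionError:
        # Number of polls made and simulated seconds elapsed before giving up.
        return marathon.calls, clock.now


if __name__ == '__main__':
    print(demo())
```

In this stand-in the app is polled every 5 simulated seconds until the 20-minute budget runs out, and only then does the assertion from the final poll propagate. The exact attempt count of the real decorator may differ; the point is that the reported error describes the last observed state (0 healthy tasks, 1 expected), not the cause of the unhealthy task.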


 Comments   
Comment by Jan-Philip Gehrcke (Inactive) [ 09/Jul/18 ]

The `team` field value is of course controversial. I have created this because DCOS_OSS-2115 does not quite seem to cover the same failure mode, albeit a similar one.

Comment by Mergebot [ 09/Jul/18 ]

@jp overrode teamcity/dcos/test/aws/onprem/static status of dcos/dcos/pull/3007 (Title: [1.10] Mergebot Automated Train PR - 2018-Jun-26-10-00, Branch: 1.10) with the failure noted in this JIRA. Here are the TeamCity failure Logs for reference.

Comment by Sergey Urbanovich (Inactive) [ 09/Jul/18 ]

The test failed in setup_vip_workload_tests. This is a well-known flakiness of test_vip.

>       tests = setup_vip_workload_tests(dcos_api_session, container, vip_net, proxy_net)

test_networking.py:101: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_networking.py:146: in setup_vip_workload_tests
    wait_for_tasks_healthy(dcos_api_session, proxy_app)
../../lib/python3.5/site-packages/retrying.py:49: in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
../../lib/python3.5/site-packages/retrying.py:212: in call
    raise attempt.get()
../../lib/python3.5/site-packages/retrying.py:247: in get
    six.reraise(self.value[0], self.value[1], self.value[2])
../../lib/python3.5/site-packages/six.py:686: in reraise
    raise value
../../lib/python3.5/site-packages/retrying.py:200: in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)