• Type: Bug
    • Status: Resolved
    • Priority: Medium
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: marathon-api
    • Labels:


      Hey there,

      I'm having trouble setting up Docker container failover: even when a Docker host goes down, its containers are not rebalanced onto another host.


      I'm trying a simple topology with 3 masters (only one active) and a quorum of 1. My master args are:

       /usr/sbin/mesos-master --zk=zk://,, --cluster=mesos_cluster --log_dir=/var/log/mesos --work_dir=/var/mesos --hostname=foobar --quorum=1

      and my slaves all follow the same pattern:
      /usr/sbin/mesos-slave --master=zk://,, --containerizers=mesos,docker --log_dir=/var/log/mesos --work_dir=/var/mesos --docker_config=/root/.dockercfg --hostname=foobar --resources=file:///etc/mesos-resources.txt

      I'm rolling out 10 instances of the attached JSON, changing the mock-producer-x and PARTITION values to assign 10 different numbers.
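      A minimal Python sketch of generating the 10 variants described above. The attached JSON is not reproduced in this issue, so BASE_APP is a hypothetical stand-in: the image name, resource values and the 0..9 numbering are placeholders, and only the Marathon field names (id, instances, cpus, mem, container, env) are standard.

```python
import copy
import json

# Hypothetical base app definition; the real attached JSON is not shown,
# so every value here is a placeholder.
BASE_APP = {
    "id": "mock-producer-x",
    "instances": 1,
    "cpus": 0.1,
    "mem": 128,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/mock-producer"},  # placeholder image
    },
    "env": {"PARTITION": "x"},
}


def make_apps(count):
    """Return `count` copies of BASE_APP, each with a distinct id and PARTITION."""
    apps = []
    for n in range(count):
        # Deep copy so the nested "env" dict is not shared between apps.
        app = copy.deepcopy(BASE_APP)
        app["id"] = f"mock-producer-{n}"
        app["env"]["PARTITION"] = str(n)
        apps.append(app)
    return apps


if __name__ == "__main__":
    # Print one JSON document per app, e.g. to POST each one to Marathon.
    for app in make_apps(10):
        print(json.dumps(app))
```

      Deep-copying the base definition keeps the nested env dict independent per app, so changing one PARTITION value does not leak into the others.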

      In this context, my application is properly deployed and runs smoothly on my 3 nodes.

      My problem occurs when one of my nodes goes down (here, I reboot it or run ifconfig eth1 down): its containers are no longer shown as "running" in Marathon, BUT if I restart them, they are properly restarted elsewhere. When the missing node rejoins the cluster, its containers are still up (if it was just a simple network failure), and my issue is that they are never killed or rebalanced in any way.

      [EDIT]: I also tried the --recover=cleanup option, which failed all my agents.




            • Assignee:
              theonlydoo
              ( DO NOT USE ) Orchestration Team
              Karsten Jeschkies (Inactive), theonlydoo
            • Watchers:
              2


              • Created: