Marathon / MARATHON-7155

Cannot run new tasks when the leading Mesos master and the leading Marathon are on different machines



      Hi, I have a test environment with 3 masters and 2 slaves: m1, m2, m3 and s1, s2 respectively.

      I want to achieve HA for Mesos and Marathon with the help of ZooKeeper.

      Installation:

      OS: RHEL 7.2

      Type: VirtualBox VMs

      I installed Mesos, Marathon and ZooKeeper in offline mode, i.e.:

      Mesos: downloaded the Mesos binaries and extracted the RPM packages.

      Marathon and ZooKeeper: downloaded the tar.gz files and extracted the binaries.

      3 masters: m1, m2, m3

      2 slaves: s1, s2

      Starting ZooKeeper:

      Started ZooKeeper first on the masters (m1, m2, m3); one node was elected leader, e.g. m1 -> leader, m2 -> follower, m3 -> follower.
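      For reference, a minimal sketch of what a three-node ensemble config could look like, assuming the hostnames m1/m2/m3 and default ZooKeeper ports (the report's actual config is not shown):

      ```
      # zoo.cfg on each of m1, m2, m3 (hypothetical values; default ports assumed)
      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/opt/zookeeper/data
      clientPort=2181
      server.1=m1:2888:3888
      server.2=m2:2888:3888
      server.3=m3:2888:3888
      ```

      Each node's dataDir additionally needs a myid file containing just its server number (1, 2 or 3).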





      Starting Mesos masters and slaves:

      Executed the mesos-master binary with options on the leading machine, i.e. m1, and then started Mesos on the other masters, m2 and m3.


      m1: mesos-master --ip= --hostname= --quorum=2 --cluster=testcluster --zk=zk://,, --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs
      m2: mesos-master --ip= --hostname= --quorum=2 --cluster=testcluster --zk=zk://,, --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs
      m3: mesos-master --ip= --hostname= --quorum=2 --cluster=testcluster --zk=zk://,, --work_dir=/opt/mesosWorkDir --log_dir=/opt/mesosWorkDir/logs
      s1: mesos-slave --master=zk://,, --ip= --containerizers=docker,mesos --hostname= --executor_registration_timeout=10mins
      s2: mesos-slave --master=zk://,, --ip= --containerizers=docker,mesos --hostname= --executor_registration_timeout=10mins

      (The --ip, --hostname and --zk values appear redacted in the report.)
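      The redacted --zk and --master URLs above normally take the form zk://host:port,host:port,.../path. A sketch with placeholder addresses (not the report's real IPs), showing that Mesos and Marathon conventionally use separate znode paths:

      ```shell
      # Placeholder ensemble addresses; substitute the real IPs of m1, m2, m3.
      ZK_HOSTS="192.168.56.101:2181,192.168.56.102:2181,192.168.56.103:2181"

      # Mesos masters and slaves share one znode path; Marathon uses its own.
      MESOS_ZK="zk://${ZK_HOSTS}/mesos"
      MARATHON_ZK="zk://${ZK_HOSTS}/marathon"

      echo "$MESOS_ZK"
      ```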


      Starting Marathon:

      Started Marathon on the leading machine, i.e. m1, and then started Marathon on the remaining machines, m2 and m3.


      ./start --master=zk://,, --zk zk://,,


      Now the cluster state is as below:

      m1 -> leading Mesos master, leading Marathon

      m2 -> non-leading Mesos master, non-leading Marathon

      m3 -> non-leading Mesos master, non-leading Marathon

      plus slaves s1 and s2.


      I created some sample applications (t1, t2) via Marathon from m1 and they ran successfully.

      When I powered off the m1 VM, m2 took over as leading Mesos master and m3 took over as leading Marathon. The cluster state is then as below:

      m1 -> powered off (unavailable)

      m2 -> leading Mesos master, non-leading Marathon

      m3 -> non-leading Mesos master, leading Marathon

      I tried to create and run a sample app (t3) via Marathon, and the task went to "Waiting" status forever, i.e. the new task never ran, although the previous tasks t1 and t2 kept running.


      Is it expected behaviour for Mesos to lead from one machine and Marathon from another? If yes, why can't new tasks be run from Marathon, and how can we run them?

      Can leader election pick different machines for the leading Mesos master and the leading Marathon?

      Is my configuration correct?
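      For what it's worth, Mesos and Marathon run independent leader elections against ZooKeeper, so their leaders landing on different machines is normal. A hedged sketch of how one could check who is currently leading, assuming the default ports 5050 (Mesos) and 8080 (Marathon) and the m1..m3 hostnames:

      ```shell
      # Ask any running Mesos master for the current leader (default port 5050 assumed):
      #   curl -s http://m2:5050/master/state
      # Ask Marathon for its current leader (default port 8080 assumed):
      #   curl -s http://m2:8080/v2/leader
      #
      # The Mesos state JSON names the leader in its "leader" field; extracting it
      # from a placeholder sample of that output:
      state='{"leader":"master@192.168.56.102:5050"}'
      echo "$state" | grep -o '"leader":"[^"]*"'
      ```

      Comparing those two answers after failing over m1 would confirm which node each component considers leader.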


      I repeated this five times; it happened like this 2 to 3 times. In the other cases, the Mesos and Marathon leaders were elected on the same machine.




        Attachments: image-2017-03-26-15-48-54-786.png (168 kB); marathon.log (539 kB, 59 kB, 17 kB); mesos.log (46 kB, 215 kB, 624 kB); mesos-master.ERROR (4 kB, 29 kB); mesos-master.INFO (30 kB, 207 kB, 608 kB); mesos-master.WARNING (1 kB, 43 kB, 319 kB)



            • Assignee: Matthias Veit (Inactive)
            • Reporter: naren970
            • Watchers: 2