Details

    • Type: Bug
    • Status: Resolved
    • Priority: Medium
    • Resolution: Won't Do
    • Affects Version/s: Marathon 1.1.7
    • Fix Version/s: None
    • Labels:
      None

      Description

      Hi,

      I use a Mesos environment on the Rancher orchestration platform (Mesos 0.28 + Marathon 1.1.1) and want to understand how Marathon behaves when suspending or killing tasks. I have a simple application running in a Docker container (the official Nginx image from Docker Hub). When I try to restart or suspend the application, the Mesos UI shows that the task was not finished but killed. This is strange, because the Nginx container can stop immediately without any problems. Examples below:

      1. The Nginx container is managed by Marathon, and I send SIGTERM to it directly from the server CLI:
      [root@mesos-slave ~]# docker ps | grep nginx
      4400576b492b nginx:latest "nginx -g 'daemon ..." 33 seconds ago Up 32 seconds 0.0.0.0:31492->80/tcp mesos-06e2b969-c12f-4012-bad8-e342f5e0e5c5-S0.9626c3c0-6f67-4679-b5d1-e85ec06b347b
      [root@mesos-slave ~]# docker kill -s 15 4400576b492b
      4400576b492b
      

      This is what I see when tracing the process:

      [root@mesos-slave ~]# strace -p 31519
      strace: Process 31519 attached
      rt_sigsuspend([], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
      --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=0, si_uid=0} ---
      gettimeofday({1507799197, 273822}, NULL) = 0
      rt_sigreturn({mask=[HUP INT QUIT USR1 USR2 ALRM TERM CHLD WINCH IO]}) = -1 EINTR (Interrupted system call)
      gettimeofday({1507799197, 274015}, NULL) = 0
      sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\0\0\0\0", 32}], msg_controllen=0, msg_flags=0}, 0) = 32
      setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 50000}}, NULL) = 0
      rt_sigsuspend([], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
      --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=7, si_uid=101, si_status=0, si_utime=0, si_stime=0} ---
      gettimeofday({1507799197, 279843}, NULL) = 0
      wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 7
      wait4(-1, 0x7ffe22dece34, WNOHANG, NULL) = -1 ECHILD (No child processes)
      rt_sigreturn({mask=[HUP INT QUIT USR1 USR2 ALRM TERM CHLD WINCH IO]}) = -1 EINTR (Interrupted system call)
      gettimeofday({1507799197, 280185}, NULL) = 0
      close(3) = 0
      close(7) = 0
      unlink("/var/run/nginx.pid") = 0
      close(6) = 0
      futex(0x7f5f839878ec, FUTEX_WAKE_PRIVATE, 2147483647) = 0
      exit_group(0) = ?
      +++ exited with 0 +++
      

      Everything stops gracefully, in less than a second.

      2. The Nginx container is managed by Marathon, and I press Suspend/Restart in the Marathon UI, expecting the task to shut down gracefully, but:

      [root@mesos-slave ~]# strace -p 31799
      strace: Process 31799 attached
      rt_sigsuspend([], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
      --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=0, si_uid=0} ---
      gettimeofday({1507799294, 679195}, NULL) = 0
      +++ killed by SIGKILL +++
      

      The process is killed by Marathon, also in less than a second: there is no 3-second timeout between SIGTERM and SIGKILL.
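      The escalation expected here (SIGTERM first, a grace period, then SIGKILL only if the process has not exited) is the same sequence that `docker stop -t <seconds>` performs with the default stop signal. The sketch below illustrates it in shell. `stop_gracefully` and its arguments are hypothetical names chosen for illustration; nothing here is taken from the Marathon or Mesos source, and which grace period the Mesos 0.28 Docker executor actually uses is not established here. With a grace period of 0, SIGKILL follows SIGTERM almost at once, which would produce a trace like the one in example 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: not Marathon or Mesos code.
# Sends SIGTERM, waits up to <grace_seconds> for the process to exit,
# and sends SIGKILL only if it is still running after that.
# The process must be a background child of this shell so that `wait`
# can report its exit status (so do not call this inside $(...)).
stop_gracefully() {
  local pid=$1
  local limit=$(( $2 * 10 ))   # grace period expressed in 0.1 s polling steps
  local waited=0 status

  kill -TERM "$pid" 2>/dev/null

  # kill -0 sends no signal; it only checks whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      kill -KILL "$pid" 2>/dev/null
      break
    fi
    sleep 0.1
    waited=$(( waited + 1 ))
  done

  # Reap the child and collect its exit status.
  wait "$pid"
  status=$?
  # The shell reports death by signal N as exit status 128 + N (SIGKILL = 9).
  if [ "$status" -eq 137 ]; then
    echo "killed by SIGKILL"
  else
    echo "exited with status $status"
  fi
}
```

      A process that handles SIGTERM and exits cleanly, as Nginx does in example 1, is reported with its own exit status well inside the grace period; only a process that ignores or outlives SIGTERM for the whole grace period is escalated to SIGKILL.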

      Can you tell me why this happens? I need Marathon to shut down and restart tasks gracefully, but it seems to just kill them. Is this a misconfiguration on my side? Or is the software too old, i.e. was graceful shutdown not yet implemented in Mesos v0.28 + Marathon v1.1.1?

      Thanks

            People

            • Assignee:
              Unassigned
              Reporter:
              aleksejlopasov
              Team:
              ( DO NOT USE ) Orchestration Team
              Watchers:
              aleksejlopasov, Gilbert Song
