  DC/OS / DCOS_OSS-621

Navstar crash after installing marathon-lb (DCOS 1.8.7)

    Details

      Description

      Hi,

      We are using DC/OS 1.8.7 on CentOS 7.3.1611, with 3 masters and 1 slave.

      Everything works fine, but after adding marathon-lb we started receiving crash reports from navstar-env (attached as journalctl-navstar-marathonlb.txt). The log was obtained with journalctl -flu dcos-navstar.service.

      The errors recur every 30 seconds for a variable period (between 30 seconds and 5 minutes), then stop. If we start or stop another service while marathon-lb is running, the crashes also appear for the new instance (log attached as journalctl-navstar-api.txt).

      We have also seen this issue on a configuration with 1 master and 2 slaves.

      If we stop marathon-lb and wait for the corresponding crashes to end, we can start and stop other services without errors.

      It seems that the DNS configuration update is delayed until the errors stop. Continuing the previous example: if we stop the api service and run dig api.marathon.autoip.dcos.thisdcos.directory while the errors are still appearing, the "old" IP address is still returned in the ANSWER SECTION.

      I've also attached the marathon.json file for marathon-lb.

      What could be the issue?
      We're available to provide further information if needed.

      Thanks in advance,
      Marco

      EDIT: I've also attached the relevant crash.log from /opt/mesosphere/packages/navstar--...../navstar/log/

        Attachments

        1. agent-relevant-logs.txt (7 kB)
        2. crash.log (1.15 MB)
        3. journalctl-navstar-api.txt (2 kB)
        4. journalctl-navstar-marathonlb.txt (13 kB)
        5. leader-relevant-logs.txt (5 kB)
        6. marathon-lb.marathon.json (10 kB)
        7. marathon-lb-stdout-logs.txt (13 kB)


            People

            • Assignee: Nicholas Sun (nsun, Inactive)
              Reporter: Marco Reni (marco.reni)
              Watchers: Adam Bordelon (Inactive), Albert Strasheim (Inactive), Marco Reni, Nicholas Sun (Inactive)
