DC/OS / DCOS_OSS-309

Navstar Service Repeatedly Fails on DC/OS 1.8.1

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: networking
    • Environment:

      Provider: AWS
      OS: CoreOS
      DC/OS Version: 1.8.1
      CloudFormation Image Commit: c1915a9f9f02caf7e34022eaea04f15ff853bd0e
      CloudFormation Template Generation Date: 2016-08-11 01:33:47.857178

      Description

      When bringing up a fresh installation of DC/OS, the master nodes come up and then report the "Navstar" service as unhealthy. A few seconds later, it is reported as healthy again. The status then keeps flapping between unhealthy and healthy on all of the masters.

      The same issue occurs on the agents, both public and private, as they come up. The following log excerpt is from a public agent, but the same error appears on all nodes:

      Aug 17 15:27:43 ip-10-74-131-5.ec2.internal navstar-env[4691]: Exec: /opt/mesosphere/packages/navstar--b5ea729312aceb95e68ff5748a01ca09f7841adc/navstar/erts-8.0/bin/erlexec -noshell -noinput +Bd -boot /opt/mesosphere/packages/navstar--b5ea729312aceb95e68ff5748a01ca09f7841adc/navstar/releases/0.1.0/navstar -mode embedded -config /opt/mesosphere/packages/navstar--b5ea729312aceb95e68ff5748a01ca09f7841adc/navstar/releases/0.1.0/sys.config -boot_var ERTS_LIB_DIR /opt/mesosphere/packages/navstar--b5ea729312aceb95e68ff5748a01ca09f7841adc/navstar/erts-8.0/../lib -args_file /opt/mesosphere/packages/navstar--b5ea729312aceb95e68ff5748a01ca09f7841adc/navstar/releases/0.1.0/vm.args -- foreground
      Aug 17 15:27:43 ip-10-74-131-5.ec2.internal navstar-env[4691]: Root: /opt/mesosphere/packages/navstar--b5ea729312aceb95e68ff5748a01ca09f7841adc/navstar
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]: 2016-08-17 15:27:45 crash_report
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     initial_call: {supervisor,kernel,['Argument__1']}
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     pid: <0.708.0>
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     registered_name: []
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     error_info: {exit,{on_load_function_failed,enacl_nif},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,352}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     ancestors: [kernel_sup,<0.687.0>]
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     messages: []
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     links: [<0.688.0>]
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     dictionary: []
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     trap_exit: true
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     status: running
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     heap_size: 376
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     stack_size: 27
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     reductions: 117
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]: 2016-08-17 15:27:45 supervisor_report
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     supervisor: {local,kernel_sup}
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     errorContext: start_error
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     reason: {on_load_function_failed,enacl_nif}
      Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:     offender: [{pid,undefined},{id,kernel_safe_sup},{mfargs,{supervisor,start_link,[{local,kernel_safe_sup},kernel,safe]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]: 2016-08-17 15:27:46 crash_report
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     initial_call: {application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     pid: <0.686.0>
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     registered_name: []
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     error_info: {exit,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,enacl_nif}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,134}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     ancestors: [<0.685.0>]
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     messages: [{'EXIT',<0.687.0>,normal}]
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     links: [<0.685.0>,<0.684.0>]
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     dictionary: []
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     trap_exit: true
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     status: running
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     heap_size: 376
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     stack_size: 27
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     reductions: 152
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal mesos-slave[9786]: I0817 15:27:46.940361  9796 http.cpp:270] HTTP GET for /slave(1)/state from 10.74.131.5:46819 with User-Agent='Mesos-State / Host: ip-10-74-131-5, Pid: 9548'
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]: 2016-08-17 15:27:46 std_info
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     application: kernel
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     exited: {{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,enacl_nif}}},{kernel,start,[normal,[]]}}
      Aug 17 15:27:46 ip-10-74-131-5.ec2.internal navstar-env[4691]:     type: permanent
      Aug 17 15:27:47 ip-10-74-131-5.ec2.internal navstar-env[4691]: {"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,enacl_nif}}},{kernel,start,[normal,[]]}}}"}
      Aug 17 15:27:47 ip-10-74-131-5.ec2.internal navstar-env[4691]: [1B blob data]
      Aug 17 15:27:48 ip-10-74-131-5.ec2.internal mesos-slave[9786]: I0817 15:27:48.954396  9797 http.cpp:270] HTTP GET for /slave(1)/state from 10.74.131.5:46819 with User-Agent='Mesos-State / Host: ip-10-74-131-5, Pid: 9548'
      Aug 17 15:27:49 ip-10-74-131-5.ec2.internal navstar-env[4691]: Crash dump is being written to: erl_crash.dump...done
      Aug 17 15:27:49 ip-10-74-131-5.ec2.internal navstar-env[4691]: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,enacl_nif}}},{kernel,start,[normal,[]]}}})
      Aug 17 15:27:49 ip-10-74-131-5.ec2.internal systemd[1]: dcos-navstar.service: Main process exited, code=exited, status=1/FAILURE
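
The repeated `{on_load_function_failed,enacl_nif}` tuple in the log above is the Erlang runtime reporting that the `on_load` hook of the `enacl_nif` module failed. To check whether other nodes show the same signature, a small script along the following lines could scan captured journal output. This is only a sketch: the function name `count_on_load_failures` and the sample lines are illustrative assumptions, not part of DC/OS or Navstar.

```python
import re
from collections import Counter

# Matches the Erlang error tuple seen in the log above, for example
# {on_load_function_failed,enacl_nif}, and captures the module name.
ON_LOAD_FAILURE = re.compile(r"\{on_load_function_failed,(\w+)\}")


def count_on_load_failures(lines):
    """Count, per module, how many log lines mention a failed on_load hook.

    Hypothetical helper for illustration; a line that repeats the same
    tuple several times is counted once for that module.
    """
    counts = Counter()
    for line in lines:
        for module in set(ON_LOAD_FAILURE.findall(line)):
            counts[module] += 1
    return counts


# Two lines copied from the log excerpt in this report.
sample = [
    "Aug 17 15:27:45 ip-10-74-131-5.ec2.internal navstar-env[4691]:"
    "     reason: {on_load_function_failed,enacl_nif}",
    "Aug 17 15:27:49 ip-10-74-131-5.ec2.internal systemd[1]:"
    " dcos-navstar.service: Main process exited, code=exited, status=1/FAILURE",
]

if __name__ == "__main__":
    print(count_on_load_failures(sample))
```

In practice the input would presumably be the saved output of something like `journalctl -u dcos-navstar` on each node, read line by line instead of the `sample` list.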
      


            People

            • Assignee: Drew Kerrigan (drewkerrigan) (Inactive)
            • Reporter: Matt Fuller (mattf) (Inactive)
            • Team: DELETE Networking Team
            • Watchers: Brad B (Inactive), Deepak Goel, Drew Kerrigan (Inactive), Gustavo Brian (Inactive), Martin Vojtek (Inactive), Matt Fuller (Inactive), Miroslav Kos (Inactive), niclash, Sargun Dhillon (Inactive), vitalii migov (Inactive)

