  1. DC/OS
  2. DCOS_OSS-1428

Navstar failing to start after kernel upgrade

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Medium
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: navstar
    • Labels: None

      Description

      Hi,

      On our system (CentOS, DC/OS 1.9) we upgraded the kernel from 3.10 to 4.12.2-1 and Docker to 17.06. Since then, navstar shows up as unhealthy on all nodes. The logs show repeated crashes and restarts:

       

      Jul 18 16:47:47 xxxx bootstrap[7528]: [DEBUG] bootstrapping dcos-navstar
       Jul 18 16:47:48 xxxx navstar-env[7700]: Exec: /opt/mesosphere/packages/navstar--0b141af667446bbe42069fbdba130276f872061c/navstar/erts-8.2.2/bin/erlexec -noshell -noinput +Bd -boot /opt/mesosphere/packages/navstar--0b141af667446bbe42069fbdba130276f872061c/navstar/releases/0.1.0/navstar -mode embedded -boot_var ERTS_LIB_DIR /opt/mesosphere/packages/navstar--0b141af667446bbe42069fbdba130276f872061c/navstar/lib -config /tmp/sys.navstar@x.x.x.68.config -args_file /tmp/vm.navstar@x.x.x.68.args -pa -- foreground
       Jul 18 16:47:48 xxxx navstar-env[7700]: Root: /opt/mesosphere/packages/navstar--0b141af667446bbe42069fbdba130276f872061c/navstar
       Jul 18 16:47:48 xxxx navstar-env[7700]: /opt/mesosphere/packages/navstar--0b141af667446bbe42069fbdba130276f872061c/navstar
       Jul 18 16:47:49 xxxx navstar-env[7700]: Setup running ...
       Jul 18 16:47:49 xxxx navstar-env[7700]: Directories verified. Res = {[ok],[]}
       Jul 18 16:47:49 xxxx navstar-env[7700]: Setup finished processing hooks ...
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1125.0>@netlink_codec:nl_dec_nla:551 nl_dec_nla: unable to decode pay load of <<28,30,0,0>>
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1125.0>@gen_netlink_client:terminate:206 Terminating, due to: function_clause, in state: wait_for_responses_rtnl, with state data: {state,#Port<0.989>,21,7708,2,{<0.1162.0>,#Ref<0.0.16.38>},[],1,0}
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1166.0>@netlink_codec:nl_dec_nla:551 nl_dec_nla: unable to decode pay load of <<28,30,0,0>>
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1166.0>@gen_netlink_client:terminate:206 Terminating, due to: function_clause, in state: wait_for_responses_rtnl, with state data: {state,#Port<0.1007>,21,7708,2,{<0.1224.0>,#Ref<0.0.17.163>},[],1,0}
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1228.0>@netlink_codec:nl_dec_nla:551 nl_dec_nla: unable to decode pay load of <<28,30,0,0>>
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1228.0>@gen_netlink_client:terminate:206 Terminating, due to: function_clause, in state: wait_for_responses_rtnl, with state data: {state,#Port<0.1018>,21,7708,2,{<0.1251.0>,#Ref<0.0.17.181>},[],1,0}
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1253.0>@netlink_codec:nl_dec_nla:551 nl_dec_nla: unable to decode pay load of <<28,30,0,0>>
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1253.0>@gen_netlink_client:terminate:206 Terminating, due to: function_clause, in state: wait_for_responses_rtnl, with state data: {state,#Port<0.1025>,21,7708,2,{<0.1260.0>,#Ref<0.0.16.145>},[],1,0}
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1262.0>@netlink_codec:nl_dec_nla:551 nl_dec_nla: unable to decode pay load of <<28,30,0,0>>
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1262.0>@gen_netlink_client:terminate:206 Terminating, due to: function_clause, in state: wait_for_responses_rtnl, with state data: {state,#Port<0.1032>,21,7708,2,{<0.1269.0>,#Ref<0.0.1.770>},[],1,0}
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1271.0>@netlink_codec:nl_dec_nla:551 nl_dec_nla: unable to decode pay load of <<28,30,0,0>>
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.1271.0>@gen_netlink_client:terminate:206 Terminating, due to: function_clause, in state: wait_for_responses_rtnl, with state data: {state,#Port<0.1039>,21,7708,2,{<0.1278.0>,#Ref<0.0.1.794>},[],1,0}
       Jul 18 16:47:49 xxxx navstar-env[7700]: [warning] <0.929.0>@lashup_hyparview_events:handle_info:156 Received unknown info: {'EXIT',<0.997.0>,shutdown}, in state: {state,<0.938.0>,#Ref<0.0.1.297>,[],[]}
       Jul 18 16:47:51 xxxx navstar-env[7700]: {"Kernel pid terminated",application_controller,"{application_terminated,navstar_overlay,shutdown}"}
       Jul 18 16:47:51 xxxx navstar-env[7700]: Kernel pid terminated (application_controller) ({application_terminated,navstar_overlay,shutdown})
       Jul 18 16:47:52 xxxx navstar-env[7700]: Crash dump is being written to: erl_crash.dump...done
       Jul 18 16:47:52 xxxx systemd[1]: dcos-navstar.service: main process exited, code=exited, status=1/FAILURE
       Jul 18 16:47:52 xxxx systemd[1]: Unit dcos-navstar.service entered failed state.
       Jul 18 16:47:52 xxxx systemd[1]: dcos-navstar.service failed.
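
A note on the payload that netlink_codec cannot decode: read as an unsigned 32-bit integer in little-endian byte order, the four bytes <<28,30,0,0>> come out as 7708, which is the same number that appears in the state data of every gen_netlink_client warning above. The sketch below only reproduces that arithmetic. The byte order is an assumption (the log does not state the architecture), and it does not identify which netlink attribute carries the value.

```python
import struct

# Payload bytes copied from the nl_dec_nla warning: <<28,30,0,0>>
payload = bytes([28, 30, 0, 0])

# Assumption: little-endian host byte order (typical for x86_64).
(value,) = struct.unpack("<I", payload)

# Number taken from the logged state data: {state,#Port<...>,21,7708,2,...}
logged_state_field = 7708

print(value, value == logged_state_field)
```

If the assumption holds, the failing attribute carries a value the client already knows, which suggests the kernel is sending an attribute type the decoder has no clause for rather than corrupt data.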
       
      

       

              People

              • Assignee: Sergey Urbanovich (sergeyurbanovich, inactive)
              • Reporter: Marco Schuster (marcoschuster)
              • Team: DELETE Networking Team
              • Watchers: Deepak Goel, frankiebagodonuts, Marco Schuster, Marian Zange, Mike Panchenko
