Details

    • Type: Task
    • Status: Resolved
    • Priority: Medium
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      I've been fighting a bunch of weird, inconsistent behavior in Marathon over the past couple of weeks. After a couple of days of investigation, I finally determined that the root cause was some overly strict iptables rules that were preventing Mesos from communicating properly with Marathon. We were blocking basically everything other than ports 8080 and 5050, which (I'm assuming) prevented the mesos-master from reaching the Marathon leader on the other (non-8080) port that it listens on.
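
      A quick way to spot this kind of firewall problem is to test TCP reachability from one box to another. The following is only a minimal sketch: the function names are made up for illustration, and the list of ports to probe is an assumption. The report names only 8080 and 5050; the other port Marathon listens on is not identified here, so it has to be supplied by whoever runs the check.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. packets silently
        # dropped by a firewall rule), and name resolution failures.
        return False


def unreachable_ports(host: str, ports, timeout: float = 3.0):
    """Return the subset of ports on host that could not be reached."""
    return [port for port in ports if not can_connect(host, port, timeout)]
```

      Run from the Mesos leader against each Marathon host (and the other way around), something like unreachable_ports("marathon-host", [8080, other_port]) would list the blocked ports, where "marathon-host" and other_port are placeholders. An empty list only shows that a TCP connection can be opened; it does not prove that registration itself works.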

      The net effect is that unless the Mesos leader and the Marathon leader were on the same box, Marathon would act in bizarre and unpredictable ways. Marathon would reference old, stopped tasks, deployments would stall without any feedback, and rolled-back deployments would reappear. Additionally, the Marathon instance on the same box as the Mesos leader would get spammed with re-registration requests (which it presumably rejected because it was not the leader). While in this state, the frameworks UI in Mesos indicated no problems, and the Marathon framework was marked as successfully registered.

      It appears that, in addition to overly strict firewall rules, this type of issue can also be caused by improperly configured hostnames that resolve to localhost instead of the externally reachable IP address (see http://frankhinek.com/build-mesos-multi-node-ha-cluster/#note2).
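
      The hostname case can be checked on each box with a few lines of Python. This is a rough sketch using only the standard library; the function names are hypothetical, and it only inspects what the local resolver returns, not what Mesos or Marathon actually advertise.

```python
import ipaddress
import socket


def loopback_addresses(addresses):
    """Return the addresses from the input that are loopback (127.0.0.0/8 or ::1)."""
    result = []
    for address in addresses:
        # Drop any IPv6 scope id (e.g. "fe80::1%eth0") before parsing.
        bare = address.split("%")[0]
        if ipaddress.ip_address(bare).is_loopback:
            result.append(address)
    return result


def resolve(hostname):
    """Return the sorted set of IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def hostname_resolves_to_loopback(hostname=None):
    """Return True if hostname (default: this machine's name) resolves to loopback."""
    hostname = hostname or socket.gethostname()
    return bool(loopback_addresses(resolve(hostname)))
```

      If hostname_resolves_to_loopback() returns True on a Mesos or Marathon box, other nodes may be handed an address they cannot reach, which would be consistent with the symptoms described above.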

      Ideally, the Marathon leader would be able to tell that it is not properly registered with Mesos and would show some kind of visual warning. Alternatively (or additionally), it would be nice if the Mesos frameworks page indicated that the framework was not properly registered. (I'm not sure whether the cause is Mesos believing that Marathon accepted the re-registration when it didn't, or Marathon failing to respond to the re-registration request to tell Mesos that it is not accepting it.)

            People

            • Assignee:
              Unassigned
              Reporter:
              GitHub_dkesler dkesler (Inactive)
              Team:
              ( DO NOT USE ) Orchestration Team
              Watchers:
              Chmielewski, Jason Gilanfarr (Inactive)