We noticed today that the "latest-soak-cluster: root-marathon appears to be flapping" and "latest-soak-cluster: root-marathon leader appears to be flapping" monitors were muted and were not receiving any data. According to the monitor histories, the last data point arrived on June 15.

      Looking into the logs of the soak-monitors systemd unit, we find the following:

      python3[1038]: [INFO 2017-07-25 23:23:17,934] 202 POST (803.8192ms)
      python3[1038]: [INFO 2017-07-25 23:23:47,990] Running 20 monitors: ['Dse1Monitor', 'MetronomeMonitor', 'ChronosMonitor', 'PosixAgentMonitor', 'MesosAgentMonitor', 'Hell
      python3[1038]: [INFO 2017-07-25 23:23:48,004] Monitor "EdgeLBMonitor" failed with JSONDecodeError: Expecting value: line 1 column 1 (char 0)
      python3[1038]: [INFO 2017-07-25 23:23:48,170] Monitor "MarathonMonitor" failed with KeyError: 'value'
      python3[1038]: [INFO 2017-07-25 23:23:48,242] Sending 352 metric(s) to Datadog

      Looking at the soak-cluster-monitor code, we find the following likely source of the MarathonMonitor error:

      Indeed, comparing the output of /marathon/metrics from a couple of different clusters, we see that the 1.9 soak cluster provides a "gauges" field with entries like this:

          "": {
            "value": 34

      while the latest soak cluster returns a "gauges" field with entries like this:

          "": {
            "count": 925268,
            "min": 40,
            "max": 41,
            "p50": 41,
            "p75": 41,
            "p98": 41,
            "p99": 41,
            "p999": 41,
            "mean": 40.99508285522461,
            "tags": {
            "unit": {
              "name": "unknown",
              "label": "unknown"

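      Marathon appears to have recently switched to a new metrics library, which likely caused this change in output; see this Slack discussion. Because the new gauge entries no longer carry a "value" key, any lookup of that key fails with the KeyError: 'value' seen in the log above.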
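      One possible way to make the monitor tolerate both output shapes is sketched below. This is not the actual soak-cluster-monitor code; the function name read_gauge and the fallback_keys parameter are hypothetical, and the choice of "mean" as the first substitute for the old "value" field is an assumption that would need to be confirmed against the new metrics library's semantics.

```python
def read_gauge(entry, fallback_keys=("mean", "p50", "max")):
    """Return a numeric reading from one entry of the "gauges" field.

    Old-style entries look like {"value": 34}. New-style entries carry
    summary statistics ("count", "min", "max", "mean", ...) and have no
    "value" key. The fallback order is an assumption, not something the
    Marathon documentation is known to prescribe.
    """
    # Old format: a single "value" key.
    if "value" in entry:
        return entry["value"]
    # New format: pick the first summary statistic that is present.
    for key in fallback_keys:
        if key in entry:
            return entry[key]
    raise KeyError(
        "gauge entry has neither 'value' nor any of %r" % (fallback_keys,)
    )


if __name__ == "__main__":
    old_entry = {"value": 34}
    new_entry = {"count": 925268, "min": 40, "max": 41, "mean": 40.99508285522461}
    print(read_gauge(old_entry))
    print(read_gauge(new_entry))
```

      With a helper along these lines, the monitor would keep working against the 1.9 soak cluster while also accepting the newer output, and would still fail loudly if an entry matched neither shape.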


              • Assignee:
                icharalampidis Ioannis Charalampidis
                greg Greg Mann
                ( DO NOT USE ) Orchestration Team
                Chris Lambert (Inactive), Chun-Hung Hsiao, Greg Mann, Ioannis Charalampidis, Ivan Chernetsky (Inactive), Jie Yu, kamaradclimber, Ken Sipe
              • Watchers:
                8


                • Created: