  1. DC/OS
  2. DCOS_OSS-1514

Question about retrieving scheduler metrics

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Medium
    • Resolution: Cannot Reproduce
    • Affects Version/s: DC/OS 1.9.2
    • Fix Version/s: DC/OS 1.9.2
    • Component/s: dcos-metrics
    • Labels:
      None
    • Story Points:
      2

      Description

      When querying the DC/OS API/CLI for dcos-cassandra-service node metrics, which are reported to the Metrics component via statsd, we see Cassandra domain metrics only on DC/OS 1.9.1 environments. On DC/OS 1.9.0 and the recently released 1.9.2, the query returns what appear to be only container-related metrics.

      To find the root cause, we monitored the mesos-agent log in each of these environments, diagnosing one Cassandra node at a time. The logs show that the environments use different versions of the dcos-metrics module:

      (1.9.2):

      Aug 07 12:47:01 **** mesos-agent[2514]: I0807 12:47:01.950639 2626 metrics_tcp_sender.cpp:252] TCP Throughput (bytes): sent=735, dropped=0, failed=0, pending=0 (state CONNECTED_DATA_READY)
      Aug 07 12:47:09 **** mesos-agent[2514]: I0807 12:47:09.864226 2533 http.cpp:307] HTTP GET for /slave(1)/state from ****:51128 with User-Agent='dcos-metrics/1.1.0-76-g1b4e013'
      

      (1.9.0):

      Aug 07 13:10:26 *** mesos-agent[1927]: I0807 13:10:26.439126 2007 metrics_tcp_sender.cpp:252] TCP Throughput (bytes): sent=371, dropped=0, failed=0, pending=0 (state CONNECTED_DATA_READY)
      Aug 07 13:10:26 **** mesos-agent[1927]: I0807 13:10:26.942142 1933 http.cpp:307] HTTP GET for /slave(1)/state from ****:39236 with User-Agent='dcos-metrics/1.1.0'
      

      (1.9.1):

      Aug 07 12:42:12 ma-10-21-3-**** mesos-agent[3524]: I0807 12:42:12.085670 3625 metrics_tcp_sender.cpp:252] TCP Throughput (bytes): sent=11655222, dropped=0, failed=0, pending=0 (state CONNECTED_DATA_READY)
      Aug 07 12:43:18 ma-10-21-3-**** mesos-agent[3524]: I0807 12:43:18.295280 3542 http.cpp:307] HTTP GET for /slave(1)/state from ****:54464 with User-Agent='dcos-metrics/1.9.1-2-gcca85f3'
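
      The comparison above can be repeated across agents with a small script. The following is a minimal sketch, not part of the original report: the function name summarize_agent_log and both regular expressions are illustrative assumptions modelled only on the log lines quoted in this ticket, and other agent versions may format these lines differently.

```python
import re

# Patterns modelled on the log excerpts quoted in this ticket (assumption:
# other mesos-agent builds may word these lines differently).
USER_AGENT_RE = re.compile(r"User-Agent='dcos-metrics/([^']+)'")
THROUGHPUT_RE = re.compile(
    r"TCP Throughput \(bytes\): sent=(\d+), dropped=(\d+), failed=(\d+), pending=(\d+)"
)


def summarize_agent_log(lines):
    """Collect dcos-metrics versions and the last TCP throughput counters seen."""
    versions = set()
    throughput = None
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if match:
            versions.add(match.group(1))
        match = THROUGHPUT_RE.search(line)
        if match:
            keys = ("sent", "dropped", "failed", "pending")
            throughput = dict(zip(keys, map(int, match.groups())))
    return {"versions": sorted(versions), "throughput": throughput}


if __name__ == "__main__":
    # Two lines copied from the 1.9.2 excerpt above.
    sample = [
        "Aug 07 12:47:01 **** mesos-agent[2514]: I0807 12:47:01.950639 2626 "
        "metrics_tcp_sender.cpp:252] TCP Throughput (bytes): sent=735, dropped=0, "
        "failed=0, pending=0 (state CONNECTED_DATA_READY)",
        "Aug 07 12:47:09 **** mesos-agent[2514]: I0807 12:47:09.864226 2533 "
        "http.cpp:307] HTTP GET for /slave(1)/state from ****:51128 with "
        "User-Agent='dcos-metrics/1.1.0-76-g1b4e013'",
    ]
    print(summarize_agent_log(sample))
```

      Running it over each agent's saved log puts the dcos-metrics version next to the bytes sent, which makes the gap between the 1.9.1 agent (sent=11655222) and the other two (sent=735 and sent=371) easy to spot.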
      

      What can be done to overcome this?
      What is the latest stable release of dcos-metrics, and does it actually resolve the ticket below, as stated in the 1.9.2 release notes?

      • DCOS-16350 - dcos-metrics drops nearly all app data.

      Thanks in advance for your support, 
      Avi Kalvo

            People

            • Assignee:
              philip Philip Norman (Inactive)
              Reporter:
              avikalvo Avi Kalvo (Inactive)
              Team:
              DELETE Cluster Ops Team
            • Watchers:
              4
