  1. DC/OS
  2. DCOS_OSS-4622

v0 metrics API returns empty response for container's app metrics

    Details

    • Sprint:
      Observability Team Sprint 37
    • Story Points:
      5

      Description

      The following is copied from COPS-4333:

      The DC/OS CLI returns incorrect metrics data from a 1.12.0 cluster when running the

      dcos task metrics summary <task-id>
      

      command.
      The issue does not occur in previous DC/OS versions (such as 1.11.7),
      so it looks like we have a regression here.

      Steps to reproduce the issue:
      1) Deploy a 1.12.0 cluster with CCM (or your preferred method).
      2) Deploy a simple Marathon app. For example, I used this app:

      {
        "id": "/bashcpuload",
        "backoffFactor": 1.15,
        "backoffSeconds": 1,
        "cmd": "curl https://speed.hetzner.de/100MB.bin > /tmp/100MB.bin; while true; do  tar -czf /tmp/100.tgz /tmp/100MB.bin; rm -f /tmp/100.tgz; echo \"-\"; done",
        "container": {
          "type": "DOCKER",
          "volumes": [],
          "docker": {
            "image": "bash",
            "forcePullImage": false,
            "privileged": false,
            "parameters": []
          }
        },
        "cpus": 1,
        "disk": 0,
        "instances": 2,
        "maxLaunchDelaySeconds": 3600,
        "mem": 512,
        "gpus": 0,
        "networks": [
          {
            "mode": "host"
          }
        ],
        "portDefinitions": [],
        "requirePorts": false,
        "upgradeStrategy": {
          "maximumOverCapacity": 1,
          "minimumHealthCapacity": 1
        },
        "killSelection": "YOUNGEST_FIRST",
        "unreachableStrategy": {
          "inactiveAfterSeconds": 0,
          "expungeAfterSeconds": 0
        },
        "healthChecks": [],
        "fetch": [],
        "constraints": []
      }
      

      3) Set up/attach to the DC/OS cluster: https://docs.mesosphere.com/1.12/cli/command-reference/dcos-cluster/dcos-cluster-setup/
      4) Run "dcos task metrics summary <task-id>"
      (The task ID can be found via "dcos task".)

      In DC/OS 1.11.x clusters I get the expected result:

      dcos task metrics summary <task-id>
      CPU MEM DISK
      204.88 (99.71%) 0.01GiB (0.93%) 0.00GiB (0.00%)
      

      In any DC/OS 1.12.0 cluster, however, null/0 values are returned no matter which task is checked:

      dcos task metrics summary <task-id>
      CPU MEM DISK
      0.00 (0.00%) 0.00GiB (0.00%) 0.00GiB (0.00%)
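
A minimal sketch of how the two outputs above could be told apart automatically, for example in a regression check. It assumes the summary is always one header row followed by one value row in the shape shown in this report; the names `parse_summary` and `looks_empty` are hypothetical helpers and not part of the DC/OS CLI.

```python
import re

# One cell of the value row, e.g. "204.88 (99.71%)" or "0.01GiB (0.93%)".
# Assumption: the format matches the sample output quoted in this report.
CELL = re.compile(r"(\d+(?:\.\d+)?)(?:GiB)?\s+\((\d+(?:\.\d+)?)%\)")


def parse_summary(output):
    """Map each column name to a (value, percent) pair of floats."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header row and a value row")
    header = lines[0].split()
    cells = CELL.findall(lines[1])
    if len(cells) != len(header):
        raise ValueError("value row does not match header columns")
    return {name: (float(v), float(p)) for name, (v, p) in zip(header, cells)}


def looks_empty(summary):
    """True when every column reports zero, as seen on the 1.12.0 cluster."""
    return all(v == 0.0 and p == 0.0 for v, p in summary.values())
```

Feeding the captured output of the CLI command to `parse_summary` and then to `looks_empty` flags the all-zero row reported for 1.12.0 while accepting the 1.11.x row.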
      

      Deeper investigation shows that when I run the "dcos task metrics summary..." command with higher verbosity,
      the CLI performs two requests to the agent's API:

      http://<cluster-url>/system/v1/agent/<agent-id>/metrics/v0/containers/<container-id>
      http://<cluster-url>/system/v1/agent/<agent-id>/metrics/v0/containers/<container-id>/app

      I sent GET requests manually with curl to both of these URLs on clusters of both versions and found that:
      on the 1.11 cluster, both URLs return relevant data;
      on the 1.12 cluster, the URL ending in "/app" returns 0 bytes and HTTP status 204 (No Content).
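
The manual check above could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the `Authorization: token=...` header is assumed to be how the cluster authenticates API requests, and the cluster URL, agent ID, and container ID must be supplied by the caller.

```python
import urllib.error
import urllib.request


def build_metrics_urls(cluster_url, agent_id, container_id):
    """Return the container URL and its "/app" variant, as seen in the CLI's verbose output."""
    base = "{}/system/v1/agent/{}/metrics/v0/containers/{}".format(
        cluster_url.rstrip("/"), agent_id, container_id
    )
    return [base, base + "/app"]


def describe_response(status, body):
    """Summarize a response so an empty 204 stands out from real data."""
    if status == 204 or (status == 200 and not body):
        return "HTTP {}: empty response (no metrics)".format(status)
    if status == 200:
        return "HTTP 200: {} bytes of data".format(len(body))
    return "HTTP {}: unexpected status".format(status)


def probe(url, auth_token=None, timeout=10):
    """Send a GET request and return (status, body). Needs a reachable cluster."""
    request = urllib.request.Request(url)
    if auth_token:
        # Assumption: the cluster accepts this token header format.
        request.add_header("Authorization", "token=" + auth_token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report them as data instead.
        return err.code, err.read()
```

Calling `probe` on each URL from `build_metrics_urls` and passing the result to `describe_response` should reproduce the observation: data for both URLs on 1.11, and an empty 204 for the "/app" URL on 1.12.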

      One more observation:
      the URL

      http://<cluster-url>/system/v1/agent/<agent-id>/metrics/v0/containers/<container-id>
      

      returns much more detailed strings from a 1.11 cluster than from a 1.12 cluster.

              People

              • Assignee:
                gracedo Grace Do
                Reporter:
                branden Branden Rolston
                Team:
                DELETE Observability Team
                Watchers:
                Branden Rolston, Daniel Baker, Grace Do