[DCOS_OSS-3847] Add CI for Telegraf Created: 20/Jul/18  Updated: 09/Nov/18  Resolved: 02/Aug/18

Status: Resolved
Project: DC/OS
Component/s: dcos-metrics
Affects Version/s: None
Fix Version/s: DC/OS 1.12.0, RI-3

Type: Task
Priority: Medium
Reporter: Branden Rolston
Assignee: Branden Rolston
Resolution: Done  
Labels: cs_updated_fixVersion, metrics-2.0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks DCOS_OSS-3846 Add Telegraf fork to dcos GitHub org Resolved
Epic Link: DC/OS Telegraf
Team: DELETE Cluster Ops Team
Story Points: 3

 Description   

Our fork of Telegraf needs Jenkins to run its tests on pull requests and on the customized branch(es) that get pulled into DC/OS.



 Comments   
Comment by Branden Rolston [ 26/Jul/18 ]

I currently have Jenkins set up to run tests against the 1.7.2-dcos branch on mesosphere/telegraf, but the build is failing because the tests for one of the upstream plugins (inputs/tail) hang indefinitely and get killed by the 10-minute test timeout. I haven't been able to reproduce this locally, only on our Jenkins.

Example here: https://jenkins.mesosphere.com/service/jenkins/blue/organizations/jenkins/public-dcos-cluster-ops%2Ftelegraf%2Ftelegraf-dcos/detail/telegraf-dcos/18/pipeline

Here's the relevant portion of build output:

SIGQUIT: quit
PC=0x46adca m=0 sigcode=0

goroutine 9 [syscall]:
syscall.Syscall6(0xe8, 0x8, 0xc42014fccc, 0x7, 0xffffffffffffffff, 0x0, 0x0, 0xc420001e00, 0x300000002, 0xc420001e00)
	/usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5 fp=0xc42014fbf8 sp=0xc42014fbf0 pc=0x46ada5
syscall.EpollWait(0x8, 0xc42014fccc, 0x7, 0x7, 0xffffffffffffffff, 0x7, 0xc42014fdc0, 0x10000)
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:349 +0x7a fp=0xc42014fc70 sp=0xc42014fbf8 pc=0x468bea
github.com/influxdata/tail/vendor/gopkg.in/fsnotify%2ev1.(*fdPoller).wait(0xc4200143c0, 0xc400008000, 0x1, 0xc4200752f8)
	/go/src/github.com/influxdata/tail/vendor/gopkg.in/fsnotify.v1/inotify_poller.go:85 +0x91 fp=0xc42014fd38 sp=0xc42014fc70 pc=0x518981
github.com/influxdata/tail/vendor/gopkg.in/fsnotify%2ev1.(*Watcher).readEvents(0xc42007ab90)
	/go/src/github.com/influxdata/tail/vendor/gopkg.in/fsnotify.v1/inotify.go:179 +0x194 fp=0xc42015ffd8 sp=0xc42014fd38 pc=0x517d64
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc42015ffe0 sp=0xc42015ffd8 pc=0x45a8f1
created by github.com/influxdata/tail/vendor/gopkg.in/fsnotify%2ev1.NewWatcher
	/go/src/github.com/influxdata/tail/vendor/gopkg.in/fsnotify.v1/inotify.go:58 +0x1b4

goroutine 1 [chan receive]:
testing.(*T).Run(0xc420136000, 0x75b0ba, 0xf, 0x76d648, 0x476201)
	/usr/local/go/src/testing/testing.go:825 +0x301
testing.runTests.func1(0xc42011a000)
	/usr/local/go/src/testing/testing.go:1063 +0x64
testing.tRunner(0xc42011a000, 0xc420053df8)
	/usr/local/go/src/testing/testing.go:777 +0xd0
testing.runTests(0xc4200f84a0, 0x90e040, 0x4, 0x4, 0x411649)
	/usr/local/go/src/testing/testing.go:1061 +0x2c4
testing.(*M).Run(0xc42010e280, 0x0)
	/usr/local/go/src/testing/testing.go:978 +0x171
main.main()
	_testmain.go:48 +0x151

goroutine 18 [semacquire]:
sync.runtime_notifyListWait(0xc420128110, 0x0)
	/usr/local/go/src/runtime/sema.go:510 +0x10b
sync.(*Cond).Wait(0xc420128100)
	/usr/local/go/src/sync/cond.go:56 +0x80
github.com/influxdata/telegraf/testutil.(*Accumulator).Wait(0xc42012c120, 0x1)
	/go/src/github.com/influxdata/telegraf/testutil/accumulator.go:254 +0x52
github.com/influxdata/telegraf/plugins/inputs/tail.TestTailFromEnd(0xc420136000)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/tail/tail_test.go:80 +0x59d
testing.tRunner(0xc420136000, 0x76d648)
	/usr/local/go/src/testing/testing.go:777 +0xd0
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:824 +0x2e0

goroutine 19 [select]:
github.com/influxdata/tail.(*Tail).waitForChanges(0xc420168000, 0xc420044040, 0xc420044040)
	/go/src/github.com/influxdata/tail/tail.go:364 +0x16b
github.com/influxdata/tail.(*Tail).tailFileSync(0xc420168000)
	/go/src/github.com/influxdata/tail/tail.go:315 +0x5f1
created by github.com/influxdata/tail.TailFile
	/go/src/github.com/influxdata/tail/tail.go:133 +0x16f

goroutine 8 [select]:
github.com/influxdata/tail/watch.(*InotifyTracker).run(0xc42005ec80)
	/go/src/github.com/influxdata/tail/watch/inotify_tracker.go:224 +0x1ec
created by github.com/influxdata/tail/watch.glob..func1
	/go/src/github.com/influxdata/tail/watch/inotify_tracker.go:54 +0x145

goroutine 20 [chan receive]:
github.com/influxdata/telegraf/plugins/inputs/tail.(*Tail).receiver(0xc42013e000, 0xc420168000)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/tail/tail.go:138 +0xbd
created by github.com/influxdata/telegraf/plugins/inputs/tail.(*Tail).Start
	/go/src/github.com/influxdata/telegraf/plugins/inputs/tail/tail.go:122 +0x288

goroutine 21 [select]:
github.com/influxdata/tail/watch.(*InotifyFileWatcher).ChangeEvents.func1(0xc42012a060, 0xc420168078, 0xc42012a0c0)
	/go/src/github.com/influxdata/tail/watch/inotify.go:87 +0x13b
created by github.com/influxdata/tail/watch.(*InotifyFileWatcher).ChangeEvents
	/go/src/github.com/influxdata/tail/watch/inotify.go:77 +0x137

rax    0xfffffffffffffffc
rbx    0x0
rcx    0xffffffffffffffff
rdx    0x7
rdi    0x8
rsi    0xc42014fccc
rbp    0xc42014fc60
rsp    0xc42014fbf0
r8     0x0
r9     0x0
r10    0xffffffffffffffff
r11    0x206
r12    0x7e9000
r13    0x1
r14    0x0
r15    0x0
rip    0x46adca
rflags 0x206
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (10m0s).
FAIL	github.com/influxdata/telegraf/plugins/inputs/tail	600.008s
Comment by Branden Rolston [ 02/Aug/18 ]

Telegraf skips some tests in CI that are known to be flaky, and this is one of them. I configured the Jenkins job to set the env var that Telegraf's tests check to detect whether they're running in CI, and the runs now succeed.
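
For readers unfamiliar with the pattern, a rough sketch of an env-var-based CI check follows. The variable name EXAMPLE_CI and the helper runningInCI are placeholders invented for this example; the ticket does not say which variable Telegraf's tests probe.

```go
package main

import (
	"fmt"
	"os"
)

// ciEnvVar is a hypothetical name used only for this sketch.
const ciEnvVar = "EXAMPLE_CI"

// runningInCI reports whether the CI marker variable is set to a
// non-empty value. getenv is injected so the check is easy to test.
func runningInCI(getenv func(string) string) bool {
	return getenv(ciEnvVar) != ""
}

func main() {
	// In a real _test.go file the guard would sit at the top of the test:
	//   if runningInCI(os.Getenv) { t.Skip("known to be flaky in CI") }
	if runningInCI(os.Getenv) {
		fmt.Println("CI detected: flaky test would be skipped")
		return
	}
	fmt.Println("not in CI: flaky test would run")
}
```

With a check like this, setting the variable in the Jenkins job configuration is enough to make the flaky test skip itself without changing the fork's test code.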

I have a job here that builds 1.7.2-dcos: https://jenkins.mesosphere.com/service/jenkins/job/public-dcos-cluster-ops/job/telegraf/job/telegraf-dcos/

And another job that builds pull requests: https://jenkins.mesosphere.com/service/jenkins/job/public-dcos-cluster-ops/job/telegraf/job/telegraf-dcos-pulls/

Comment by Catherine Southard [ 04/Sep/18 ]

Updating the fixVersion from 1.12 to 1.12.0 since the ticket has been marked as Resolved - Done.

Generated at Sun May 22 09:20:41 CDT 2022 using JIRA 7.8.4#78004-sha1:5704c55c9196a87d91490cbb295eb482fa3e65cf.