  1. DC/OS
  2. DCOS_OSS-724

Mesos DNS: poll for Mesos leader changes instead of just relying on ZK watches?



      A Mesos DNS instance may miss a Mesos leader change, because it appears to rely solely on the ZK watch mechanism to detect state transitions.

      As we understand it, the watch mechanism does not provide the delivery guarantees it appears to offer (quotes and references below).

      To make Mesos DNS pick up leader changes more reliably, we could, and probably should, make it periodically poll the relevant ZK node(s). This does not need to happen at high frequency; once every 5 or 10 seconds, for instance, could suffice. Relevant discussion from #dcos-networking: https://mesosphere.slack.com/archives/dcos-networking/p1486509188005541

      Excerpt from there:

      Jan-Philip Gehrcke: Does mesos-go/detector poll periodically, in addition to just watch? This seems to be a hint: https://github.com/mesosphere/mesos-dns/blob/master/vendor/github.com/mesos/mesos-go/detector/zoo/detect.go#L42
      Nicholas Sun (@nsun): nope, that's just to rate-limit master changes
      a rather poor design decision imo
      the client should determine that
      the actual implementation is hidden deep deep in this file: https://github.com/mesosphere/mesos-dns/blob/master/vendor/github.com/mesos/mesos-go/detector/zoo/client2.go#L59
      conclusion: it does not poll, it only watches
      Jan-Philip Gehrcke: Haha, glad I asked then.
      About the first non-obvious thing I learned about ZooKeeper is that watches are a toy, because the state change is attempted to be delivered once -- there are no delivery guarantees. So, polling must be done for getting confidence, and watches can be used as an optimization on top of that. Is that something we can easily add to Mesos DNS?
      James DeFelice: a long while back we talked about ditching the implementation of ZK detection in mesos-go and writing something better. I don't think that happened. that said, I don't think it would be very difficult to add a redundant polling mechanism to mesos-go ZK detection.

      Some relevant quotes and references on the topic of delivery guarantees associated with the ZK watch mechanism:

      Quoting Tyler N from a relevant discussion:

      It is at-most-once delivery semantics. You should expect TCP connections to drop at any time. If you are only using watches, and your client reconnects after an issue, you have no idea of knowing what happened before you reconnected. This is one of those things that people writing distributed systems mess up frequently when using ZK because it works great in test environments, and will usually only cause really annoying bugs at higher scales when this rare event pops up, but it does pop up. At my last job we noticed a sharded redis proxy system occasionally would fail to get a watch that notified it about replica master changes, which caused really annoying data inconsistency issues, so we ended up completely removing watches in favor of polling. Twitter also moved to a poll-heavy service discovery system after a number of problems with zk watches and general scalability problems they ran into.

      This is the only 'worrying' input that I am aware of, and it is rather convincing. Polling yields easy-to-understand guarantees.

      However, somewhat more official sources suggest that, with really careful client design, it might actually be possible to rely on watches, provided they are combined with proper handling of session state transitions, automatic re-registering of watches, and an additional safety read after reconnect:

      From the ZK docs:

      When you disconnect from a server (for example, when the server fails), you will not get any watches until the connection is reestablished. For this reason session events are sent to all outstanding watch handlers. Use session events to go into a safe mode: you will not be receiving events while disconnected, so your process should act conservatively in that mode.

      From a ZK book:

      Say that a ZooKeeper client disconnects from a ZooKeeper server and connects to a different server in the ensemble. The client will send a list of outstanding watches. When reregistering the watch, the server will check to see if the watched znode has changed since the watch was registered. If the znode has changed, a watch event will be sent to the client; otherwise, the watch will be reregistered at the new server.
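The watch-only design described by these two quotes can be sketched as a small state machine, again in Go. All names here (`SessionEvent`, `Tracker`, `NewTracker`) are hypothetical and do not correspond to any ZooKeeper client library; the `read` callback is assumed to read the leader znode and re-arm the watch. The sketch enters a safe mode on disconnect or session expiry and performs a safety read on every (re)connect and on every watch notification.

```go
package main

import "fmt"

// SessionEvent is a hypothetical, simplified set of events a ZK client
// might surface to the application.
type SessionEvent int

const (
	Connected    SessionEvent = iota // session (re)established
	Disconnected                     // connection to the server lost
	Expired                          // session expired; watches are gone
	NodeChanged                      // watch fired for the leader znode
)

// Tracker keeps the last known leader and whether it can be trusted.
type Tracker struct {
	read   func() (string, error) // assumed to read the znode and re-arm the watch
	leader string
	safe   bool // true while the cached leader must not be trusted
}

// NewTracker starts in safe mode: nothing has been read yet.
func NewTracker(read func() (string, error)) *Tracker {
	return &Tracker{read: read, safe: true}
}

// Handle applies one session or watch event.
func (t *Tracker) Handle(ev SessionEvent) {
	switch ev {
	case Disconnected, Expired:
		// No events are delivered while disconnected: act conservatively.
		t.safe = true
	case Connected, NodeChanged:
		// Safety read: never assume the znode is unchanged after a reconnect.
		leader, err := t.read()
		if err != nil {
			t.safe = true
			return
		}
		t.leader, t.safe = leader, false
	}
}

// Leader returns the last known leader and whether it is currently trusted.
func (t *Tracker) Leader() (string, bool) {
	return t.leader, !t.safe
}

func main() {
	current := "master-1"
	t := NewTracker(func() (string, error) { return current, nil })

	t.Handle(Connected)
	fmt.Println(t.Leader())

	t.Handle(Disconnected)
	current = "master-2" // change happens while we are disconnected
	fmt.Println(t.Leader())

	t.Handle(Connected) // safety read picks up the missed change
	fmt.Println(t.Leader())
}
```

Even with this design, a periodic poll remains a cheap safeguard against client bugs in exactly these transitions.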


      CC [~james], Pawel Rozlach, Albert Strasheim


              • Assignee:
                jp Jan-Philip Gehrcke (Inactive)
                ( DO NOT USE ) Networking Team
                Adam Bordelon (Inactive), Deepak Goel, Gustav Paul, James DeFelice, Jan-Philip Gehrcke (Inactive), Pawel Rozlach
              • Watchers:
                6


                • Created: