Our current ZooKeeper restore method does not guarantee a consistent state after restoring from a backup. The problem is that the backup contains ephemeral nodes, and the restore currently writes back everything, INCLUDING the ephemeral nodes.
Quick explanation of how ZooKeeper sessions and ephemeral nodes work:
Restoring ephemeral nodes can cause weird behavior, which is nicely outlined in the following article:
The restore was implemented this way because Mesosphere support engineers did ZooKeeper backups like this in the past. The article also describes what is necessary to reach a consistent state after restoration.
Deleting all ephemeral nodes from the backup before restoring (or even when taking the backup) would suffice to reach a consistent backup/restore procedure.
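To illustrate the idea, here is a minimal sketch of such a filter. It assumes the backup has already been parsed into records that carry each node's ephemeralOwner value from its ZooKeeper Stat (0 for persistent nodes, the owning session id for ephemeral ones). The `BackupNode` type, the `strip_ephemeral_nodes` helper, and the sample paths are hypothetical and do not reflect our actual backup format.

```python
from typing import Iterable, List, NamedTuple


class BackupNode(NamedTuple):
    """Hypothetical backup record; the field names are assumptions."""
    path: str
    data: bytes
    ephemeral_owner: int  # owning session id, or 0 for a persistent node


def strip_ephemeral_nodes(nodes: Iterable[BackupNode]) -> List[BackupNode]:
    """Return only the persistent nodes of a parsed backup."""
    # ZooKeeper reports ephemeralOwner == 0 for non-ephemeral znodes.
    # Ephemeral znodes cannot have children, so dropping them never
    # leaves a child entry without its parent.
    return [node for node in nodes if node.ephemeral_owner == 0]


# Made-up sample data: two persistent nodes and one ephemeral member node.
backup = [
    BackupNode("/app", b"", 0),
    BackupNode("/app/leader", b"", 0),
    BackupNode("/app/leader/member_0000000001", b"host-a:8080", 0x1000000000001),
]
cleaned = strip_ephemeral_nodes(backup)
print([node.path for node in cleaned])
```

The same check could run either when taking the backup or right before restoring it. Filtering at restore time has the advantage that existing backups would also be covered.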
There has been prior work in this regard by a former Mesosphere engineer:
This script parses transaction logs. We would need to utilize/extend it to delete all the ephemeral nodes.