Hi zookeepers,
While digging into ZooKeeper's internals, I found the following flaw in how znode versions work: a znode's version is reset when the znode is deleted and re-created. This is a trap for operations that make conditional updates based on the znode version.
Here is an example: a client reads the data of a znode (e.g., /test) along with its version (e.g., 1), changes the data, and writes it back on the condition that the version has not changed (is still 1). If another client deletes and re-creates the znode while the first client is preparing its update, and the re-created znode ends up with the same version number, the version check passes even though this is no longer the znode the first client read, so the write is applied on top of the wrong data.
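To make the race concrete, here is a minimal in-memory sketch. It is a toy model, not the ZooKeeper client API: `ToyStore`, `BadVersion`, and the method names are made up for illustration, and it assumes a freshly created node starts at version 0 and gains 1 on every update.

```python
class BadVersion(Exception):
    """Raised when a conditional write sees an unexpected version."""


class ToyStore:
    """In-memory stand-in for a znode tree (hypothetical, not ZooKeeper)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = (data, 0)  # version restarts at 0 on every create

    def delete(self, path):
        del self._nodes[path]

    def get(self, path):
        return self._nodes[path]

    def set(self, path, data, expected_version):
        _, version = self._nodes[path]
        if version != expected_version:
            raise BadVersion(f"expected {expected_version}, found {version}")
        self._nodes[path] = (data, version + 1)


store = ToyStore()
store.create("/test", "a")
store.set("/test", "b", 0)           # /test is now at version 1

# Client A reads the node and remembers its version.
data, version = store.get("/test")   # ("b", 1)

# Meanwhile client B deletes, re-creates, and updates the node once,
# which brings the new node back to version 1.
store.delete("/test")
store.create("/test", "x")
store.set("/test", "y", 0)

# Client A's conditional write passes the check, although it was
# computed from the old node's data.
store.set("/test", data + "-updated", version)
result = store.get("/test")
```

After the last write, `/test` holds data derived from the deleted node, and client B's update is silently overwritten.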
The problem, as I see it, is that the znode version is a plain integer that increases monotonically only within the lifetime of a single znode. If the version also included the znode's birth date (creation timestamp) or the zxid of the transaction that created it, we could avoid the problem: only the integer part would increase on each update, while the birth-date or zxid part would stay fixed for the life of the znode.
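A rough sketch of the proposed composite version, again as a toy in-memory model rather than real ZooKeeper code. The names `ToyStore` and `BadVersion` are hypothetical, and the store-wide counter `_zxid` only approximates a real zxid by never being reset.

```python
class BadVersion(Exception):
    """Raised when a conditional write sees an unexpected version."""


class ToyStore:
    """Toy store whose version is a (creation_zxid, counter) pair."""

    def __init__(self):
        self._nodes = {}  # path -> (data, (creation_zxid, counter))
        self._zxid = 0    # store-wide transaction id, never reset

    def _next_zxid(self):
        self._zxid += 1
        return self._zxid

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        # The creation zxid identifies this incarnation of the node.
        self._nodes[path] = (data, (self._next_zxid(), 0))

    def delete(self, path):
        self._next_zxid()
        del self._nodes[path]

    def get(self, path):
        return self._nodes[path]

    def set(self, path, data, expected_version):
        _, version = self._nodes[path]
        if version != expected_version:
            raise BadVersion(f"expected {expected_version}, found {version}")
        self._next_zxid()
        created, counter = version
        # Only the counter part changes; the creation zxid stays fixed.
        self._nodes[path] = (data, (created, counter + 1))


store = ToyStore()
store.create("/test", "a")
store.set("/test", "b", (1, 0))

data, version = store.get("/test")   # client A reads ("b", (1, 1))

# Client B deletes and re-creates the node, then updates it once.
store.delete("/test")
store.create("/test", "x")
_, fresh = store.get("/test")
store.set("/test", "y", fresh)

# Client A's write is now rejected: the counter matches, but the
# creation zxid does not.
try:
    store.set("/test", data + "-updated", version)
    rejected = False
except BadVersion:
    rejected = True
```

The comparison is still a single equality check; the extra cost is the wider version value that clients have to carry around.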
Of course, the new design has a cost: the version field needs to be larger.
Thanks,
- Robin