Hm, just saw something a little fishy.
(In the following logs, analytics1021 (id 21) and analytics1022 (id 22) are Brokers and analytics1023,analytics1024,analytics1025 are Zookeepers.)
At 2014-03-20 21:12:26, analytics1021 lost its connection to zookeeper. It reconnected to analytics1023, but as it was doing so, it logged a 'zookeeper state changed (Expired)’ message, which apparently caused the newly opened socket to close again. Am I reading that right?
In the time that it took analytics1021 to finally re-establish a zookeeper connection, analytics1022 had become the leader for all partitions. I can rebalance the leaders, but I’d prefer if this didn’t happen in the first place.
1. What does 'zookeeper state changed (Expired)’ mean?
2. Has anyone seen issues like this before? Where zookeeper connections are flaky enough to cause leader elections?
Thanks!
-Andrew Otto
[2014-03-20 21:12:26,427] 685698013 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 10067ms for sessionid 0x1425d903a3f1f89, closing socket connection and attempting reconnect
[2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
[2014-03-20 21:12:27,252] 685698838 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
[2014-03-20 21:12:27,271] 685698857 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
[2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
[2014-03-20 21:12:27,273] 685698859 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1425d903a3f1f89 has expired, closing socket connection
[2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient [ at ] 63203b59
[2014-03-20 21:12:27,290] 685698876 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK for broker 21
[2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
[2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
[2014-03-20 21:12:27,292] 685698878 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
[2014-03-20 21:12:27,313] 685698899 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid = 0xff429b4201dd2352, negotiated timeout = 6000
[2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)
[2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - done re-registering broker
[2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics path to watch for new topics
[2014-03-20 21:12:27,370] 685698956 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New leader is 22
(In the following logs, analytics1021 (id 21) and analytics1022 (id 22) are Brokers and analytics1023,analytics1024,analytics1025 are Zookeepers.)
At 2014-03-20 21:12:26, analytics1021 lost its connection to zookeeper. It reconnected to analytics1023, but as it was doing so, it logged a 'zookeeper state changed (Expired)’ message, which apparently caused the newly opened socket to close again. Am I reading that right?
In the time that it took analytics1021 to finally re-establish a zookeeper connection, analytics1022 had become the leader for all partitions. I can rebalance the leaders, but I’d prefer if this didn’t happen in the first place.
1. What does 'zookeeper state changed (Expired)’ mean?
2. Has anyone seen issues like this before? Where zookeeper connections are flaky enough to cause leader elections?
Thanks!
-Andrew Otto
[2014-03-20 21:12:26,427] 685698013 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 10067ms for sessionid 0x1425d903a3f1f89, closing socket connection and attempting reconnect
[2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
[2014-03-20 21:12:27,252] 685698838 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
[2014-03-20 21:12:27,271] 685698857 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
[2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
[2014-03-20 21:12:27,273] 685698859 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1425d903a3f1f89 has expired, closing socket connection
[2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient [ at ] 63203b59
[2014-03-20 21:12:27,290] 685698876 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK for broker 21
[2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
[2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
[2014-03-20 21:12:27,292] 685698878 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
[2014-03-20 21:12:27,313] 685698899 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid = 0xff429b4201dd2352, negotiated timeout = 6000
[2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)
[2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - done re-registering broker
[2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics path to watch for new topics
[2014-03-20 21:12:27,370] 685698956 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New leader is 22