
Network connection between Kafka nodes

Hello, Kafka experts

I have a production cluster with three nodes (.100, .101, .102). I am using a
C# producer to publish data to the Kafka brokers; it works for a while, but
then starts losing its connection to two of the three nodes. A console
consumer against the same cluster shows errors like this:

[2015-01-13 01:49:49,786] ERROR [ConsumerFetcherThread-console-consumer-52088_vagrant-ubuntu-trusty-64-1421113533029-20c40ebf-0-101], Error for partition [PofApiTest77,5] to broker 101:class kafka.common.NotLeaderForPartitionException (kafka.consumer.ConsumerFetcherThread)
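
From what I can tell, NotLeaderForPartitionException is a transient error: a
fetch or produce request landed on a broker that had just given up leadership
for that partition, and the client is expected to refresh its metadata and
retry. On the producer side I can make those errors visible per record with a
callback; here is a minimal sketch with the new Java client (the topic,
retries value, and payloads are placeholders, not my real app):

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendErrorProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers",
                  "10.100.50.100:9092,10.100.50.101:9092,10.100.50.102:9092");
        props.put("acks", "1");
        props.put("retries", "3");  // retry transient errors such as a leader change
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 10000; i++) {
            producer.send(new ProducerRecord<String, String>("PofApiTest77", "msg-" + i),
                new Callback() {
                    public void onCompletion(RecordMetadata md, Exception e) {
                        if (e != null) {
                            // logs e.g. NotLeaderForPartitionException during an election
                            System.err.println(e.getClass().getName() + ": " + e.getMessage());
                        }
                    }
                });
        }
        producer.close();
    }
}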

To reproduce the issue, I ran a producer performance test from a Vagrant box:
50 billion records of 100 bytes each against the topic test-rep-three, with no
throughput throttle (-1) and acks=1. This is what I get:
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
  test-rep-three 50000000000 100 -1 \
  acks=1 \
  bootstrap.servers=10.100.50.100:9092,10.100.50.101:9092,10.100.50.102:9092 \
  buffer.memory=67108864 batch.size=8196

536403 records sent, 107259.1 records/sec (10.23 MB/sec), 3993.0 ms avg
latency, 11306.0 max latency.
[2015-01-13 17:49:44,055] WARN Error in I/O with harmful-jar.master/10.100.50.102 (org.apache.kafka.common.network.Selector)
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:242)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
    at java.lang.Thread.run(Thread.java:745)
[2015-01-13 17:49:44,059] WARN Error in I/O with harmful-jar.master/10.100.50.102 (org.apache.kafka.common.network.Selector)
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:242)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
    at java.lang.Thread.run(Thread.java:745)

[2015-01-13 17:52:38,384] WARN Error in I/O with voluminous-mass.master/10.100.50.101 (org.apache.kafka.common.network.Selector)
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:242)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
    at java.lang.Thread.run(Thread.java:745)
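
To rule out plain network flakiness between the hosts, I can at least confirm
the broker ports stay reachable from the client box while the perf test runs.
A quick probe using nothing but the JDK (host list copied from my cluster; the
timeouts are arbitrary):

import java.net.InetSocketAddress;
import java.net.Socket;

public class BrokerPortProbe {
    public static void main(String[] args) throws Exception {
        String[] brokers = {"10.100.50.100", "10.100.50.101", "10.100.50.102"};
        while (true) {
            for (String host : brokers) {
                try (Socket s = new Socket()) {
                    // connect() with a 2s timeout; refused/unreachable throws
                    s.connect(new InetSocketAddress(host, 9092), 2000);
                    System.out.println(host + ":9092 reachable");
                } catch (Exception e) {
                    System.out.println(host + ":9092 " + e);
                }
            }
            Thread.sleep(1000);
        }
    }
}

If this keeps connecting while the producer is getting EOFs, the TCP path is
fine and the brokers themselves are closing the connections.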

It looks like the brokers are closing the connections: the EOFException is
raised when the socket hits end-of-stream mid-read, i.e. the other side closed
it. Tailing kafka/logs/state-change.log around the same time shows leader
transitions in progress:

[2015-01-13 17:49:49,028] TRACE Broker 102 received LeaderAndIsr request
(LeaderAndIsrInfo:(Leader:102,ISR:101,100,102,LeaderEpoch:68,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,101,100)
correlation id 7 from controller 101 epoch 1781 for partition
[PofApiTest77,5] (state.change.logger)
[2015-01-13 17:49:49,030] TRACE Broker 102 handling LeaderAndIsr request
correlationId 7 from controller 101 epoch 1781 starting the become-leader
transition for partition [PofApiTest77,5] (state.change.logger)
[2015-01-13 17:49:49,032] TRACE Broker 102 stopped fetchers as part of
become-leader request from controller 101 epoch 1781 with correlation id 7
for partition [PofApiTest77,5] (state.change.logger)
[2015-01-13 17:49:49,040] TRACE Broker 102 completed LeaderAndIsr request
correlationId 7 from controller 101 epoch 1781 for the become-leader
transition for partition [PofApiTest77,5] (state.change.logger)
[2015-01-13 17:49:49,042] TRACE Broker 102 cached leader info
(LeaderAndIsrInfo:(Leader:102,ISR:101,100,102,LeaderEpoch:68,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,101,100)
for partition [PofApiTest77,5] in response to UpdateMetadata request sent
by controller 101 epoch 1781 with correlation id 7 (state.change.logger)
[2015-01-13 17:49:49,045] TRACE Broker 102 received LeaderAndIsr request
(LeaderAndIsrInfo:(Leader:102,ISR:101,100,102,LeaderEpoch:529,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,100,101)
correlation id 8 from controller 101 epoch 1781 for partition
[test-rep-three,5] (state.change.logger)
[2015-01-13 17:49:49,045] TRACE Broker 102 handling LeaderAndIsr request
correlationId 8 from controller 101 epoch 1781 starting the become-leader
transition for partition [test-rep-three,5] (state.change.logger)
[2015-01-13 17:49:49,048] TRACE Broker 102 stopped fetchers as part of
become-leader request from controller 101 epoch 1781 with correlation id 8
for partition [test-rep-three,5] (state.change.logger)
[2015-01-13 17:49:49,049] TRACE Broker 102 completed LeaderAndIsr request
correlationId 8 from controller 101 epoch 1781 for the become-leader
transition for partition [test-rep-three,5] (state.change.logger)
[2015-01-13 17:49:49,051] TRACE Broker 102 cached leader info
(LeaderAndIsrInfo:(Leader:102,ISR:101,100,102,LeaderEpoch:529,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,100,101)
for partition [test-rep-three,5] in response to UpdateMetadata request sent
by controller 101 epoch 1781 with correlation id 8 (state.change.logger)
[2015-01-13 17:49:49,053] TRACE Broker 102 received LeaderAndIsr request
(LeaderAndIsrInfo:(Leader:102,ISR:101,100,102,LeaderEpoch:528,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,101,100)
correlation id 9 from controller 101 epoch 1781 for partition
[test-rep-three,2] (state.change.logger)
[2015-01-13 17:49:49,053] TRACE Broker 102 handling LeaderAndIsr request
correlationId 9 from controller 101 epoch 1781 starting the become-leader
transition for partition [test-rep-three,2] (state.change.logger)
[2015-01-13 17:49:49,054] TRACE Broker 102 stopped fetchers as part of
become-leader request from controller 101 epoch 1781 with correlation id 9
for partition [test-rep-three,2] (state.change.logger)
[2015-01-13 17:49:49,055] TRACE Broker 102 completed LeaderAndIsr request
correlationId 9 from controller 101 epoch 1781 for the become-leader
transition for partition [test-rep-three,2] (state.change.logger)
[2015-01-13 17:49:49,057] TRACE Broker 102 cached leader info
(LeaderAndIsrInfo:(Leader:102,ISR:101,100,102,LeaderEpoch:528,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,101,100)
for partition [test-rep-three,2] in response to UpdateMetadata request sent
by controller 101 epoch 1781 with correlation id 9 (state.change.logger)
[2015-01-13 17:49:49,058] TRACE Broker 102 received LeaderAndIsr request
(LeaderAndIsrInfo:(Leader:102,ISR:100,101,102,LeaderEpoch:68,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,100,101)
correlation id 10 from controller 101 epoch 1781 for partition
[PofApiTest77,2] (state.change.logger)
[2015-01-13 17:49:49,058] TRACE Broker 102 handling LeaderAndIsr request
correlationId 10 from controller 101 epoch 1781 starting the become-leader
transition for partition [PofApiTest77,2] (state.change.logger)
[2015-01-13 17:49:49,058] TRACE Broker 102 stopped fetchers as part of
become-leader request from controller 101 epoch 1781 with correlation id 10
for partition [PofApiTest77,2] (state.change.logger)
[2015-01-13 17:49:49,059] TRACE Broker 102 completed LeaderAndIsr request
correlationId 10 from controller 101 epoch 1781 for the become-leader
transition for partition [PofApiTest77,2] (state.change.logger)
[2015-01-13 17:49:49,060] TRACE Broker 102 cached leader info
(LeaderAndIsrInfo:(Leader:102,ISR:100,101,102,LeaderEpoch:68,ControllerEpoch:1781),ReplicationFactor:3),AllReplicas:102,100,101)
for partition [PofApiTest77,2] in response to UpdateMetadata request sent
by controller 101 epoch 1781 with correlation id 10 (state.change.logger)
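
The state-change log itself reads like a normal leader move: controller 101
(epoch 1781) hands leadership for these partitions to broker 102, and broker
102 completes each become-leader transition within a few milliseconds, right
around when the producer connections dropped. To watch those moves from the
client side, the new Java producer can dump its metadata view per partition;
a rough sketch (the topic, poll interval, and output format are my own
choices):

import java.util.Date;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.PartitionInfo;

public class LeaderWatcher {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers",
                  "10.100.50.100:9092,10.100.50.101:9092,10.100.50.102:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        while (true) {
            // the client's current view of leadership and ISR size per partition
            List<PartitionInfo> parts = producer.partitionsFor("test-rep-three");
            for (PartitionInfo p : parts) {
                System.out.printf("%tT partition %d leader %s isr %d%n",
                    new Date(), p.partition(), p.leader(),
                    p.inSyncReplicas().length);
            }
            Thread.sleep(5000);
        }
    }
}

I assume that if the leader reported here keeps flapping between brokers while
the port probe above still connects fine, that would point at the brokers
bouncing in and out of ZooKeeper rather than at the network itself.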

Has anyone seen a similar issue with losing network connections between nodes?

Thanks,

Alec Li
