Hi,
We need a reliable, low-latency message queue that can scale. Kafka looks like the right system for this role.
I am running performance tests on multiple platforms: Linux and Windows. For test purposes I create topics with 2 replicas and multiple partitions. In all deployments, running test producers that wait for acks from both replicas practically kills Kafka throughput. For example, on the following Linux deployment (2 Kafka brokers, 1 ZooKeeper node, 4 client hosts to create load, 4 topics with 10 partitions and 2 replicas each):
- running tests with "--request-num-acks 1" produces ~3,600 msgs/sec
- running tests with "--request-num-acks -1" produces ~348 msgs/sec
Here is the output of one of the four concurrent processes:
[User@Client2 kafka_2.8.0-0.8.0]$ bin/kafka-producer-perf-test.sh --broker-list 10.0.0.8:9092,10.0.0.10:9092 --compression-codec 0 --message-size 1024 --request-num-acks -1 --sync --messages 100000 -threads 10 --show-detailed-stats --reporting-interval 1000 --topics c12 | grep -v "at "
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
[2014-01-29 23:21:16,720] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,825] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,830] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,831] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,839] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,841] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,847] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,858] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,862] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:21:16,867] WARN Property reconnect.interval is not valid (kafka.utils.VerifiableProperties)
[2014-01-29 23:32:03,830] WARN Produce request with correlation id 11467 failed due to [c12,2]: kafka.common.RequestTimedOutException (kafka.producer.async.DefaultEventHandler)
[2014-01-29 23:32:03,831] WARN Produce request with correlation id 11859 failed due to [c12,8]: kafka.common.RequestTimedOutException (kafka.producer.async.DefaultEventHandler)
[2014-01-29 23:32:03,831] WARN Failed to send producer request with correlation id 11819 to broker 0 with data for partitions [c12,8] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
[2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11315 to broker 0 with data for partitions [c12,6] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
[2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11191 to broker 0 with data for partitions [c12,4] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
[2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11791 to broker 0 with data for partitions [c12,4] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
[2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11395 to broker 0 with data for partitions [c12,6] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
[2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 11631 to broker 0 with data for partitions [c12,4] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
[2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 10563 to broker 0 with data for partitions [c12,0] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
[2014-01-29 23:32:03,834] WARN Failed to send producer request with correlation id 10907 to broker 0 with data for partitions [c12,2] (kafka.producer.async.DefaultEventHandler) java.net.SocketTimeoutException
2014-01-29 23:21:16:562, 2014-01-29 23:40:15:886, 0, 1024, 200, 97.66, 0.0857, 100000, 87.7713
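As a sanity check, the rates in the summary row above can be recomputed from its own start and end timestamps. This is only a minimal Python sketch: it assumes the column order given in the header line of the output, and the names `SUMMARY_ROW`, `TIME_FORMAT` and `recompute_rates` are made up for illustration and are not part of any Kafka tool.

```python
from datetime import datetime

# Summary row copied verbatim from the perf-test output above.
SUMMARY_ROW = ("2014-01-29 23:21:16:562, 2014-01-29 23:40:15:886, "
               "0, 1024, 200, 97.66, 0.0857, 100000, 87.7713")

# The tool prints milliseconds after a colon, so %f parses the last field.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S:%f"


def recompute_rates(row):
    """Return (elapsed seconds, MB/sec, msgs/sec) derived from one summary row.

    Assumed column order (taken from the header line): start.time, end.time,
    compression, message.size, batch.size, total.data.sent.in.MB, MB.sec,
    total.data.sent.in.nMsg, nMsg.sec
    """
    fields = [field.strip() for field in row.split(",")]
    start = datetime.strptime(fields[0], TIME_FORMAT)
    end = datetime.strptime(fields[1], TIME_FORMAT)
    elapsed = (end - start).total_seconds()
    total_mb = float(fields[5])
    total_msgs = int(fields[7])
    return elapsed, total_mb / elapsed, total_msgs / elapsed


elapsed_s, mb_per_s, msgs_per_s = recompute_rates(SUMMARY_ROW)
print(f"elapsed: {elapsed_s:.3f} s")
print(f"throughput: {mb_per_s:.4f} MB/sec, {msgs_per_s:.4f} msgs/sec")
```

The recomputed values agree with the MB.sec and nMsg.sec columns reported by the tool. If the other three processes ran at a similar rate (an assumption, since only one output is shown), four of them together would come to roughly 350 msgs/sec, which is in line with the ~348 msgs/sec figure quoted for "--request-num-acks -1".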
The result is consistent and reproducible in all deployments: the numbers vary, but changing the acks setting from 1 to -1 consistently reduces Kafka throughput by a factor of 4 to 10.
Is this expected behavior? Are there any tuning options that would resolve the problem?
Thank you,
Michael Popov