I am trying to tune MirrorMaker configurations based on this doc
<https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring+(MirrorMaker)#Kafkamirroring%28MirrorMaker%29-Consumerandsourceclustersocketbuffersizes>
and would like to know your recommendations.

Our configuration: we are doing inter-datacenter replication with 5 brokers
in each of the source and destination DCs, and 2 MirrorMaker instances doing
the replication. We have about 4 topics with 4 partitions each.

I have been using ConsumerOffsetChecker to analyze the lag after each tuning change.
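For reference, this is roughly the command I run against the destination cluster
to get the lag numbers below (the ZooKeeper host is a placeholder, not our real one):

    # check MirrorMaker consumer lag; destination-cluster ZooKeeper host is a placeholder
    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
        --group mirrormakerProd \
        --zkconnect zk-dest-1:2181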
1. num.streams: We set num.streams=2 so that the 4 partitions of each topic
are shared between the 2 MirrorMaker instances. Increasing num.streams beyond
this did not improve performance at all; is that expected?
2. num.producers: We initially set num.producers=4 (assuming one producer
thread per topic), then bumped it to num.producers=16, but did not see any
improvement in performance. Is that expected? How do we determine the optimum
value for num.producers?
3. socket.buffersize: We initially had the default values for these. I then
changed socket.send.buffer.bytes on the source brokers;
socket.receive.buffer.bytes and fetch.message.max.bytes in the MirrorMaker
consumer properties; and socket.receive.buffer.bytes and
socket.request.max.bytes on the destination brokers, all to
1024*1024*1024 (1073741824). This did improve performance, but I still could
not get the lag below 100. (A sketch of how we launch MirrorMaker and of the
changed properties follows this list.)
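To make the setup concrete, this is roughly how each MirrorMaker instance is
launched and what the changed properties look like. The file names, ZooKeeper
host and whitelist pattern below are placeholders rather than our real values:

    # MirrorMaker launch (0.8 tooling); num.streams / num.producers as discussed above
    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
        --consumer.config sourceCluster.consumer.properties \
        --producer.config targetCluster.producer.properties \
        --num.streams 2 \
        --num.producers 16 \
        --whitelist=".*"

    # sourceCluster.consumer.properties (relevant lines only; ZK host is a placeholder)
    zookeeper.connect=zk-source-1:2181
    group.id=mirrormakerProd
    socket.receive.buffer.bytes=1073741824
    fetch.message.max.bytes=1073741824

    # broker server.properties (relevant lines only)
    # on the source brokers:
    socket.send.buffer.bytes=1073741824
    # on the destination brokers:
    socket.receive.buffer.bytes=1073741824
    socket.request.max.bytes=1073741824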
Here is what the lag looks like after the above changes:
Group            Topic             Pid  Offset      logSize     Lag    Owner
mirrormakerProd  FunnelProto       0    554704539   554717088   12549  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd  FunnelProto       1    547370573   547383136   12563  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd  FunnelProto       2    553124930   553125742   812    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd  FunnelProto       3    552990834   552991650   816    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
mirrormakerProd  agent             0    35438       35440       2      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd  agent             1    35447       35448       1      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd  agent             2    35375       35375       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd  agent             3    35336       35336       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
mirrormakerProd  internal_metrics  0    1930852823  1930917418  64595  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd  internal_metrics  1    1937237324  1937301841  64517  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd  internal_metrics  2    1945894901  1945904067  9166   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd  internal_metrics  3    1946906932  1946915928  8996   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
mirrormakerProd  jmx               0    485270038   485280882   10844  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
mirrormakerProd  jmx               1    486363914   486374759   10845  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
mirrormakerProd  jmx               2    491783842   491784826   984    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
mirrormakerProd  jmx               3    485675629   485676643   1014   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
In the MirrorMaker logs, I see that topic metadata is fetched every 10 minutes
and the producer connections to the destination brokers are re-established. Is
this normal? If it is continuously producing, why does it need to reconnect to
the destination brokers?
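My guess (please correct me if I am wrong) is that this 10-minute cycle comes
from the producer's topic.metadata.refresh.interval.ms, whose default I believe
is 600000 ms:

    # targetCluster.producer.properties
    # we have not set this explicitly, so I assume the 600000 ms (10 minute) default applies
    topic.metadata.refresh.interval.ms=600000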
What else can we tune to bring the lag below 100? This is only a small set of
data we are currently testing with; the real production traffic will be much
larger. How can we compute the optimum configuration as the data traffic increases?
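For the socket buffers specifically, the wiki page above suggests sizing them
around the bandwidth-delay product of the link. Here is my back-of-envelope
version (the 1 Gbps link speed and 50 ms RTT are assumptions for illustration,
not measurements of our link):

    socket buffer  ~  bandwidth * round-trip time
                   ~  125 MB/s * 0.05 s          (assuming a 1 Gbps link and 50 ms RTT)
                   ~  6.25 MB, i.e. roughly socket.receive.buffer.bytes=8388608 (8 MB)

That is far below the 1 GB we currently set, so I suspect the huge buffers are
not the right lever. Is there a similar rule of thumb for scaling num.streams
and num.producers as throughput grows?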
Thanks for your help,
Raja.