Hi,
We have an ActiveMQ multibroker network with 4 nodes. There is 1 master
node and 3 slave nodes,
every message is broadcasted between all nodes using persistent topics and
durable subscriptions.
ActiveMQ (version 5.9.1) is embedded into a Spring based Java Web
Application running on Tomcat 7.
It works well mostly, but today we had to face a serious issue for the
second time.
The connection between the master node and a slave node is lost:
WARN 2015.01.10 14:00:29,281
org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an
exception processing local command
java.net.SocketException: Connection reset
at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at
org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at
org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:176)
at
org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:304)
at
org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:286)
at
org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
at
org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
at
org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
at
org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
at
org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:968)
at
org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:177)
at
org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
at
org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
at
org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:247)
at
org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133)
at
org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO 2015.01.10 14:00:29,335
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.net.SocketException: Connection reset
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,347
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
WARN 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an
exception processing local command
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at
org.apache.activemq.openwire.v10.BaseDataStreamMarshaller.tightUnmarshalByteSequence(BaseDataStreamMarshaller.java:440)
at
org.apache.activemq.openwire.v10.MessageMarshaller.tightUnmarshal(MessageMarshaller.java:72)
at
org.apache.activemq.openwire.v10.ActiveMQMessageMarshaller.tightUnmarshal(ActiveMQMessageMarshaller.java:66)
at
org.apache.activemq.openwire.v10.ActiveMQTextMessageMarshaller.tightUnmarshal(ActiveMQTextMessageMarshaller.java:66)
at
org.apache.activemq.openwire.OpenWireFormat.doUnmarshal(OpenWireFormat.java:371)
at
org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:285)
at
org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
at
org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
at
org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
at java.lang.Thread.run(Thread.java:745)
INFO 2015.01.10 14:00:29,347
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,383
org.apache.activemq.network.DemandForwardingBridgeSupport -
PERCMASTERBroker bridge to BALASSIBroker stopped
After a few seconds the connection has been reestablished automatically:
INFO 2015.01.10 14:00:45,563
org.apache.activemq.broker.TransportConnection - Started responder end of
duplex bridge clientConnector [ at ] ID:balaserver1-45698-1418680167169-1:1
INFO 2015.01.10 14:00:45,570
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#42 and tcp:///10.10.0.58:54625 [ at ] 62000
(BALASSIBroker) has been established.
But after then the cpu usage goes up to 100% (effectively killed the
application) and the log full of duplicate message errors:
WARN 2015.01.10 14:00:46,599
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=0,cacheEnabled=true,maxBatchSize:1,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36636, 4
WARN 2015.01.10 14:00:46,639 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:2:1:36636, redirecting for dlq
processing
WARN 2015.01.10 14:00:48,526
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36637, 4
WARN 2015.01.10 14:00:48,528 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:2:1:36637, redirecting for dlq
processing
WARN 2015.01.10 14:00:48,546
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:3,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:3:1:37026, 4
WARN 2015.01.10 14:00:48,547 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:3:1:37026, redirecting for dlq
processing
WARN 2015.01.10 14:00:48,560
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36638, 4
The only thing we could do is stop the application (on all nodes), clear
the activemq-data folder and restart the application.
When we simply tried to restart it, the startup process has stucked with no
error message, just using 100% of the CPU.
Do you have any idea what could have happened here?
Thanks,
Zoltan Nyari
We have an ActiveMQ multibroker network with 4 nodes. There is 1 master
node and 3 slave nodes,
every message is broadcasted between all nodes using persistent topics and
durable subscriptions.
ActiveMQ (version 5.9.1) is embedded into a Spring based Java Web
Application running on Tomcat 7.
It works well mostly, but today we had to face a serious issue for the
second time.
The connection between the master node and a slave node is lost:
WARN 2015.01.10 14:00:29,281
org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an
exception processing local command
java.net.SocketException: Connection reset
at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at
org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at
org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:176)
at
org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:304)
at
org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:286)
at
org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
at
org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
at
org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
at
org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
at
org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:968)
at
org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:177)
at
org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
at
org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
at
org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:247)
at
org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133)
at
org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO 2015.01.10 14:00:29,335
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.net.SocketException: Connection reset
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,347
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
WARN 2015.01.10 14:00:29,346
org.apache.activemq.network.DemandForwardingBridgeSupport - Caught an
exception processing local command
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at
org.apache.activemq.openwire.v10.BaseDataStreamMarshaller.tightUnmarshalByteSequence(BaseDataStreamMarshaller.java:440)
at
org.apache.activemq.openwire.v10.MessageMarshaller.tightUnmarshal(MessageMarshaller.java:72)
at
org.apache.activemq.openwire.v10.ActiveMQMessageMarshaller.tightUnmarshal(ActiveMQMessageMarshaller.java:66)
at
org.apache.activemq.openwire.v10.ActiveMQTextMessageMarshaller.tightUnmarshal(ActiveMQTextMessageMarshaller.java:66)
at
org.apache.activemq.openwire.OpenWireFormat.doUnmarshal(OpenWireFormat.java:371)
at
org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:285)
at
org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
at
org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
at
org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
at java.lang.Thread.run(Thread.java:745)
INFO 2015.01.10 14:00:29,347
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#30 and tcp:///10.10.0.58:40819 [ at ] 62000
shutdown due to a local error: java.io.EOFException
INFO 2015.01.10 14:00:29,383
org.apache.activemq.network.DemandForwardingBridgeSupport -
PERCMASTERBroker bridge to BALASSIBroker stopped
After a few seconds the connection has been reestablished automatically:
INFO 2015.01.10 14:00:45,563
org.apache.activemq.broker.TransportConnection - Started responder end of
duplex bridge clientConnector [ at ] ID:balaserver1-45698-1418680167169-1:1
INFO 2015.01.10 14:00:45,570
org.apache.activemq.network.DemandForwardingBridgeSupport - Network
connection between vm://PERCMASTERBroker#42 and tcp:///10.10.0.58:54625 [ at ] 62000
(BALASSIBroker) has been established.
But after then the cpu usage goes up to 100% (effectively killed the
application) and the log full of duplicate message errors:
WARN 2015.01.10 14:00:46,599
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=0,cacheEnabled=true,maxBatchSize:1,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36636, 4
WARN 2015.01.10 14:00:46,639 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:2:1:36636, redirecting for dlq
processing
WARN 2015.01.10 14:00:48,526
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36637, 4
WARN 2015.01.10 14:00:48,528 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:2:1:36637, redirecting for dlq
processing
WARN 2015.01.10 14:00:48,546
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:3,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:3:1:37026, 4
WARN 2015.01.10 14:00:48,547 org.apache.activemq.broker.region.Queue -
duplicate message from store
ID:balaserver1-45698-1418680167169-6:1:3:1:37026, redirecting for dlq
processing
WARN 2015.01.10 14:00:48,560
org.apache.activemq.broker.region.cursors.AbstractStoreCursor -
org.apache.activemq.broker.region.cursors.QueueStorePrefetch [ at ] 4ed7430a:Consumer.PERCMASTER.VirtualTopic.SFOperationShare,batchResetNeeded=false,storeHasMessages=true,size=1,cacheEnabled=true,maxBatchSize:2,hasSpace:true
- cursor got duplicate: ID:balaserver1-45698-1418680167169-6:1:2:1:36638, 4
The only thing we could do is stop the application (on all nodes), clear
the activemq-data folder and restart the application.
When we simply tried to restart it, the startup process has stucked with no
error message, just using 100% of the CPU.
Do you have any idea what could have happened here?
Thanks,
Zoltan Nyari