Channel: Apache Timeline

Broker leaks FDs - Too many open files

Our production broker (v5.6.0) keeps dying under heavy load. The broker
log is filled with:

2013-07-28 00:04:08,264 [teTimeout=45000] ERROR TransportConnector - Could not accept connection : java.net.SocketException: Too many open files

2013-07-28 00:04:08,264 [teTimeout=45000] ERROR TransportConnector - Could not accept connection : java.net.SocketException: Too many open files

2013-07-28 00:04:08,264 [teTimeout=45000] ERROR TransportConnector - Could not accept connection : java.net.SocketException: Too many open files

This is logged so rapidly that the logs roll over and hide the initial
error/warning. We record the open fd count of the broker's process and see
that when the broker starts to fail, the count explodes. Here is part of the
open fd log. The first column shows the broker's open fds, and one line is
logged every 60 seconds.

1284 12569 160194 -- normal count
1294 12669 161438
1305 12779 162812
1318 12909 164426
1328 13009 165658 --------- FD explosion
1393 13659 173816
1528 15009 190748
1611 15839 201152
1701 16739 212419
1951 19239 243520
2310 22830 290374
2667 26399 332362
3013 29859 375262
3369 33422 422638
3729 37019 464017
4111 40841 515342
4484 44570 561933
4870 48432 609992
5249 52219 652157
5634 56071 705356
6019 59919 747457
6484 64571 811476
6892 68652 862375
7307 72802 914122
7727 77002 966555
8129 81022 1016717
8336 83090 1042601
8336 83090 1042584
8336 83090 1042583
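
For anyone who wants to reproduce this kind of monitoring, here is a minimal sketch, not the script we actually run. It assumes a Linux host where /proc/<pid>/fd is readable by the monitoring user. The function names and the threshold of 50 fds per sample are illustrative assumptions. It counts a process's open fds and finds the first sample where growth between two readings exceeds the threshold.

```python
import os


def count_open_fds(pid):
    """Count open file descriptors of a process (Linux only, reads /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def first_surge(samples, max_delta=50):
    """Return the index of the first sample that grew by more than
    max_delta over the previous one, or None if growth stays in bounds.

    max_delta=50 is an assumed threshold, not a recommended value.
    """
    for i in range(1, len(samples)):
        if samples[i] - samples[i - 1] > max_delta:
            return i
    return None


# First column of the log above, up to the start of the surge.
broker_fds = [1284, 1294, 1305, 1318, 1328, 1393, 1528, 1611]
print(first_surge(broker_fds))  # prints 5 (the jump from 1328 to 1393)
```

Calling count_open_fds once a minute and feeding the readings to first_surge would give an alert close to the point marked "FD explosion" above, before the log has rolled.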

It normally shows ~1300 fds, and this is more or less constant over time, but
eventually it rapidly increases to 8336 and the broker becomes unusable. The
ulimit is set to 4094. netstat shows a large number of sockets in CLOSE_WAIT,
suggesting that the broker is not closing its side of the socket.
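
A socket sits in CLOSE_WAIT when the remote peer has closed the connection but the local process has not yet closed its own end. To track how many there are, the netstat output can be tallied by state. The sketch below assumes the usual Linux `netstat -tan` column layout (state in the sixth column). The sample rows, addresses and function name are made up for illustration and are not taken from our broker.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally TCP connection states from `netstat -tan` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with "tcp" or "tcp6"; the sixth column is the state.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:61616           0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:61616          10.0.0.21:50312         CLOSE_WAIT
tcp        1      0 10.0.0.5:61616          10.0.0.22:41877         CLOSE_WAIT
tcp        0      0 10.0.0.5:61616          10.0.0.23:39001         ESTABLISHED
"""

print(count_tcp_states(SAMPLE)["CLOSE_WAIT"])  # prints 2
```

Logging this count next to the fd count each minute would show whether the CLOSE_WAIT sockets grow in step with the fd explosion.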

I found a related open issue:
https://issues.apache.org/jira/browse/AMQ-4531?page=com.atlassian.jirafisheyeplugin:fisheye-issuepanel

This Jira states that the problem surfaces in 5.8.0 and when
maximumConnections is set. We don't use this setting, and we run an older
version of AMQ. Any ideas how to deal with this? Would closeAsync=false have
any effect?

JC
