Hello,
I'm evaluating Apollo for our current project, and so far it has worked well
for our use case. However, today I ran into a problem: a corrupted LevelDB
store. What could possibly corrupt it? I have Apollo 1.7 installed on my
laptop and everything was fine until today, when I discovered the problem
(see some log messages below). Nothing unusual happened, so I cannot link
this to any incident.
So I was wondering:
What's the proper fix for this? This time I deleted all files in the data
dir and restarted Apollo, but I'm not sure that's the way to go in a
production environment.
What can be done to avoid disasters? A message queue is a critical component
of our architecture, so if it stops, most of our functionality stops too.
Is there an easy way to replicate persisted messages?
Is there more verbose logging that I could use to monitor for or diagnose
the problem in the future?
Should I use a different storage engine?
Is there a best practice/pattern for recovering from this kind of situation?
Re-publishing messages is one option, but it increases the complexity of
our app.
Thank you in advance.
[Log messages]
2014-06-15 11:57:16,866 | INFO | OS : Linux 3.14.5-200.fc20.x86_64 ("Fedora release 20 (Heisenbug)") |
2014-06-15 11:57:16,870 | INFO | JVM : Java HotSpot(TM) 64-Bit Server VM 1.7.0_51 (Oracle Corporation) |
2014-06-15 11:57:16,870 | INFO | Apollo : 1.7 (at: /opt/apollo/home) |
2014-06-15 11:57:16,871 | INFO | OS is restricting the open file limit to: 100000 |
2014-06-15 11:57:17,077 | INFO | Starting store: leveldb store at /opt/apollo/brokers/local/data |
2014-06-15 11:57:17,139 | INFO | Accepting connections at: tcp://0.0.0.0:60013 |
2014-06-15 11:57:17,144 | INFO | Opening the log file took: 27.97 ms |
2014-06-15 11:57:17,196 | WARN | DB operation failed. (entering recovery mode): org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: CURRENT file does not end with newline | 146a03f386f
2014-06-15 11:57:18,081 | INFO | virtual host startup is waiting on store startup |
2014-06-15 11:57:18,199 | INFO | DB recovered from failure. |
2014-06-15 11:57:18,200 | ERROR | Store startup failure: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: CURRENT file does not end with newline | 146a03f3870
2014-06-15 11:57:18,201 | INFO | virtual host startup is no longer waiting. It waited a total of 1 seconds. |
2014-06-15 11:57:18,305 | INFO | Administration interface available at: http://127.0.0.1:60080/
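
[Possible pre-start check]
The error above refers to LevelDB's CURRENT file, which normally holds the
name of the live MANIFEST file followed by a single newline. As a minimal
sketch of how I could detect this symptom before starting the broker, the
Python below walks a data directory and reports every CURRENT file that is
missing its trailing newline or that names a MANIFEST that does not exist.
The function name find_bad_current_files is hypothetical, and I'm assuming
the LevelDB directories live somewhere under the store's data directory. It
only detects the problem; it does not repair anything.

```python
import os


def find_bad_current_files(data_dir):
    """Return paths of LevelDB CURRENT files under data_dir that look damaged."""
    bad = []
    for root, _dirs, files in os.walk(data_dir):
        if "CURRENT" not in files:
            continue
        path = os.path.join(root, "CURRENT")
        with open(path, "rb") as fh:
            content = fh.read()
        # LevelDB expects the MANIFEST file name followed by "\n".
        if not content.endswith(b"\n"):
            bad.append(path)
            continue
        # The named MANIFEST must exist next to the CURRENT file.
        manifest = content[:-1].decode("ascii", errors="replace")
        if not os.path.isfile(os.path.join(root, manifest)):
            bad.append(path)
    return sorted(bad)
```

With the broker stopped, this could be run as
find_bad_current_files("/opt/apollo/brokers/local/data") (the path from the
log above) so that a non-empty result triggers an alert or a restore from
backup instead of a failed store startup.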