RabbitMQ is a nice message queue server used in a lot of network monitoring appliances and other services. It has a very small footprint and can be integrated with lots of languages and resources.
All works well until you get the famous "recovery log corrupted" error:
jobs_1 | =INFO REPORT==== 11-Apr-2022::09:07:57 ===
jobs_1 | application: rabbit
jobs_1 | exited: {bad_return,
jobs_1 | {{rabbit,start,[normal,[]]},
jobs_1 | {'EXIT',
jobs_1 | {{badmatch,
jobs_1 | {error,
jobs_1 | {{{badmatch,
jobs_1 | {error,
jobs_1 | {not_a_dets_file,
jobs_1 | "/var/lib/rabbitmq/mnesia/rabbit@rabbitmq/recovery.dets"}}},
jobs_1 | [{rabbit_recovery_terms,open_table,0,
jobs_1 | [{file,"src/rabbit_recovery_terms.erl"},{line,126}]},
jobs_1 | {rabbit_recovery_terms,init,1,
The usual cause is a full disk (or, when Docker is used, a full volume), which makes the server shut down automatically.
Yes, there is a warning, but in some extreme cases you can run out of space so suddenly that the warning is of no use, for example when your volume simply disappears because the physical resource behind it gets disconnected.
The recovery log gets corrupted because the shutdown operations cannot complete (no space on disk, remember?), leaving you with a partially written recovery log.
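Since the root cause is usually a full disk or volume, it is worth checking the free space before anything else. The snippet below is only a sketch: the function names `disk_used_percent` and `disk_is_nearly_full` are made up for this post, and the 90% threshold in the usage line is an arbitrary assumption, not a RabbitMQ setting.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this example).

# Print the percentage of used space on the filesystem holding the given path.
# Relies on POSIX `df -P`, whose fifth column is the capacity (e.g. "42%").
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 when usage is at or above the threshold (second argument),
# 1 when it is below, and 2 when the usage could not be determined.
disk_is_nearly_full() {
    used=$(disk_used_percent "$1")
    [ -n "$used" ] || return 2
    [ "$used" -ge "$2" ]
}
```

For example, `disk_is_nearly_full /var/lib/rabbitmq 90 && echo "volume almost full"` could be run from cron or from inside the container to get an early hint.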
To restart RabbitMQ, you just have to delete the recovery file (after freeing some disk space, otherwise the problem will come back):
rm -f /var/lib/rabbitmq/mnesia/rabbit@rabbitmq/recovery.dets
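If you would rather keep the broken file around for later inspection, a more cautious variant is to move it aside instead of deleting it. This is a minimal sketch, assuming RabbitMQ is stopped while you run it; the function name `quarantine_recovery_file` and the `.corrupt.<timestamp>` suffix are made up for this post.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example).
# Move a corrupted recovery file aside instead of deleting it outright.
# Prints the backup path on success; returns 1 if the file does not exist.
quarantine_recovery_file() {
    file=$1
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    # The timestamp suffix keeps repeated runs from overwriting older backups.
    backup="$file.corrupt.$(date +%Y%m%d%H%M%S)"
    mv -- "$file" "$backup" && echo "$backup"
}
```

Usage: `quarantine_recovery_file /var/lib/rabbitmq/mnesia/rabbit@rabbitmq/recovery.dets`. Because the backup stays in the same directory, the move is a plain rename and does not need extra disk space. RabbitMQ should then start as if the file had been deleted.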