Erlang/OTP Forums

RabbitMQ mailing list: persistent messages can't survive restart with new persister

Guest
Posted: Fri Dec 11, 2009 1:05 pm
Hello.

I'm seeing a strange error with the new persister, revision b6324e288cfd
(bug21673). If I send several persistent messages (delivery mode 2) to a
queue without consumers (about 10-100 messages, 10 KB each) and restart
RabbitMQ, the queued messages disappear from the queue as if they were
not persistent.
It reproduces only if the number of messages is low. When I tried
restarting Rabbit with 1000 messages in the queue, all messages stayed in
the queue.

Regards,
Anton Lebedevich.

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist
Guest
Posted: Fri Dec 11, 2009 2:21 pm
Hello Anton,

This is entirely correct. When you publish a message with delivery
mode 2 you are *not* _guaranteed_ that it hits disk. Publishing is an
async operation and you get no confirmation that it goes to disk. The
new persister does very aggressive caching in order to avoid doing lots
of tiny and expensive writes. As such, there will frequently be times
where if you restart the broker, you will lose several (maybe hundreds)
of messages.

If you really want a guarantee that the message has hit disk then you
must, as with the old persister, use a transaction. When you receive the
tx.commit-ok back from the broker, you have a guarantee that the messages
have been flushed to disk and the appropriate fsyncs called.
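
The transactional publish described above might look roughly like the following. This is a minimal sketch, not taken from the thread: it assumes a channel object exposing `tx_select`, `basic_publish`, `tx_commit` and `tx_rollback` (the method names used by the blocking channel in the Python `pika` client). The helper name `publish_in_transaction` and the `make_properties` parameter are made up for illustration.

```python
def publish_in_transaction(channel, queue, bodies, make_properties):
    """Publish persistent messages and return only after the commit.

    `channel` is assumed to behave like a pika BlockingChannel, and
    `make_properties` like pika.BasicProperties (both are assumptions).
    """
    bodies = list(bodies)
    # Put the channel into transactional mode (AMQP tx.select).
    channel.tx_select()
    try:
        for body in bodies:
            channel.basic_publish(
                exchange="",            # default exchange routes by queue name
                routing_key=queue,
                body=body,
                properties=make_properties(delivery_mode=2),  # 2 = persistent
            )
    except Exception:
        # Abandon the partially published batch.
        channel.tx_rollback()
        raise
    # Returns once the broker has answered with tx.commit-ok.
    channel.tx_commit()
    return len(bodies)
```

With pika, a call could look like `publish_in_transaction(channel, "my_queue", [b"hello"], pika.BasicProperties)`, where `my_queue` is a placeholder name. Committing once per batch rather than once per message keeps the number of forced disk syncs down.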

Whilst this behaviour may seem different from the old persister, it is
in fact not different; the window of time in which a message now waits
before being sent to disk is simply likely to be bigger. There is
certainly no concerted effort in either the new or old persister to
offer any sort of guarantee that messages published outside of a
transaction, with delivery mode 2, are "promptly" written to disk.

Matthew

gar1t
Posted: Fri Dec 11, 2009 3:31 pm
User Joined: 11 Aug 2009 Posts: 55
Hi Matthew,

On Fri, Dec 11, 2009 at 8:20 AM, Matthew Sackman <matthew@lshift.net> wrote:
> Hello Anton,
>
> This is entirely correct. When you publish a message with delivery
> mode 2 you are *not* _guaranteed_ that it hits disk. Publishing is an
> async operation and you get no confirmation that it goes to disk. The
> new persister does very aggressive caching in order to avoid doing lots
> of tiny and expensive writes. As such, there will frequently be times
> where if you restart the broker, you will lose several (maybe hundreds)
> of messages.

If you shut down gracefully the caches aren't flushed to disk?

Garrett

Guest
Posted: Fri Dec 11, 2009 3:38 pm
Hi Garrett,

On Fri, Dec 11, 2009 at 09:30:29AM -0600, Garrett Smith wrote:
> On Fri, Dec 11, 2009 at 8:20 AM, Matthew Sackman <matthew@lshift.net> wrote:
> > This is entirely correct. When you publish a message with delivery
> > mode 2 you are *not* _guaranteed_ that it hits disk. Publishing is an
> > async operation and you get no confirmation that it goes to disk. The
> > new persister does very aggressive caching in order to avoid doing lots
> > of tiny and expensive writes. As such, there will frequently be times
> > where if you restart the broker, you will lose several (maybe hundreds)
> > of messages.
>
> If you shut down gracefully the caches aren't flushed to disk?

No no, if it's a safe shutdown then yes, everything gets flushed out and
sync'd correctly. I should have made that clear above. You should only
lose messages in the event of a hard kill.
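
As a purely illustrative model of the behaviour described in this thread (not RabbitMQ's actual persister code), the sketch below buffers published messages in memory and writes them out only on a commit or a safe shutdown. The class `ToyPersister` and all of its methods are invented for this example.

```python
class ToyPersister:
    """Toy write-behind store; a plain list stands in for the disk."""

    def __init__(self):
        self.disk = []    # messages that would survive a restart
        self.cache = []   # messages accepted but not yet written out

    def publish(self, msg):
        # Asynchronous publish: accepted into the cache, no disk write yet.
        self.cache.append(msg)

    def commit(self):
        # Stand-in for tx.commit: flush the cache (a real store would fsync).
        self.disk.extend(self.cache)
        self.cache.clear()

    def safe_shutdown(self):
        # A graceful stop flushes everything before exiting.
        self.commit()
        return list(self.disk)

    def hard_kill(self):
        # kill -9: whatever was only in memory is gone.
        self.cache.clear()
        return list(self.disk)
```

In this model, a message published and then committed survives `hard_kill()`, while one that was only published survives `safe_shutdown()` but not `hard_kill()`.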

Matthew

Guest
Posted: Fri Dec 11, 2009 4:05 pm
On Fri, Dec 11, 2009 at 6:38 PM, Matthew Sackman <matthew@lshift.net> wrote:
> Hi Garrett,
>
> On Fri, Dec 11, 2009 at 09:30:29AM -0600, Garrett Smith wrote:
>> On Fri, Dec 11, 2009 at 8:20 AM, Matthew Sackman <matthew@lshift.net> wrote:
>> > This is entirely correct. When you publish a message with delivery
>> > mode 2 you are *not* _guaranteed_ that it hits disk. Publishing is an
>> > async operation and you get no confirmation that it goes to disk. The
>> > new persister does very aggressive caching in order to avoid doing lots
>> > of tiny and expensive writes. As such, there will frequently be times
>> > where if you restart the broker, you will lose several (maybe hundreds)
>> > of messages.
>>
>> If you shut down gracefully the caches aren't flushed to disk?
>
> No no, if it's a safe shutdown then yes, everything gets flushed out and
> sync'd correctly. I should have made that clear above. You should only
> lose messages in the event of a hard kill.
>

Yes, it's logical to lose messages after kill -9.
But in my case messages got lost after a safe shutdown
(/etc/init.d/rabbitmq restart or rabbitmqctl stop).

Regards,
Anton Lebedevich.


PS Matthew, sorry for the duplicate message, I forgot to reply to the list.

Guest
Posted: Fri Dec 11, 2009 5:00 pm
Hi Anton,

On Fri, Dec 11, 2009 at 07:05:20PM +0300, mabrek wrote:
> On Fri, Dec 11, 2009 at 6:38 PM, Matthew Sackman <matthew@lshift.net> wrote:
> > On Fri, Dec 11, 2009 at 09:30:29AM -0600, Garrett Smith wrote:
> >> If you shut down gracefully the caches aren't flushed to disk?
> >
> > No no, if it's a safe shutdown then yes, everything gets flushed out and
> > sync'd correctly. I should have made that clear above. You should only
> > lose messages in the event of a hard kill.
>
> Yes, it's logical to lose messages after kill -9.
> But in my case messages got lost after safe shutdown
> (/etc/init.d/rabbitmq restart or
> rabbitmqctl stop)

Thank you very much for this bug report. I'm afraid that's the risk with
testing code before it's gone through QA. I was able to reproduce this
and have fixed it, in revision 166365bd96ef. The problem was actually
nothing to do with flushing data out - that was happening correctly, and
the queue was also recovering correctly. However, it was then ignoring
some counts and thus erroneously reporting a length of 0.

Please update to the latest on bug21673 and let me know if you see it
fixed.

Btw, I also tested hard killing Rabbit and that did lose messages.
However, with just 10 messages in the queue, a safe shutdown and restart
now correctly picks up the 10 messages in the queue, whereas before it
didn't.

Thanks once again,

Matthew

Guest
Posted: Sat Dec 12, 2009 10:35 am
Matthew Sackman wrote:
> On Fri, Dec 11, 2009 at 07:05:20PM +0300, mabrek wrote:
>> On Fri, Dec 11, 2009 at 6:38 PM, Matthew Sackman <matthew@lshift.net> wrote:
>>> On Fri, Dec 11, 2009 at 09:30:29AM -0600, Garrett Smith wrote:
>>>> If you shut down gracefully the caches aren't flushed to disk?
>>> No no, if it's a safe shutdown then yes, everything gets flushed out and
>>> sync'd correctly. I should have made that clear above. You should only
>>> lose messages in the event of a hard kill.
>> Yes, it's logical to lose messages after kill -9.
>> But in my case messages got lost after safe shutdown
>> (/etc/init.d/rabbitmq restart or
>> rabbitmqctl stop)
>
> Thank you very much for this bug report. I'm afraid that's the risk with
> testing code before it's gone through QA. I was able to reproduce this
> and have fixed it, in revision 166365bd96ef. The problem was actually
> nothing to do with flushing data out - that was happening correctly, and
> the queue was also recovering correctly. However, it was then ignoring
> some counts and thus erroneously reporting a length of 0.
>
> Please update to the latest on bug21673 and let me know if you see it
> fixed.

I updated to rev c3f5c66513b6 and it fixed the problem.
rabbitmqctl list_queues now correctly reports the number of messages in
the queue after a safe shutdown/restart.
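
The same check can also be made from client code. The sketch below is only an illustration: it assumes a pika-style channel whose `queue_declare(..., passive=True)` returns a frame carrying `method.message_count`, and the helper name `queued_message_count` is hypothetical.

```python
def queued_message_count(channel, queue):
    """Return the broker-reported number of ready messages in `queue`.

    `channel` is assumed to behave like a pika BlockingChannel.
    """
    # passive=True only looks the queue up; it does not create or change it.
    result = channel.queue_declare(queue=queue, passive=True)
    return result.method.message_count
```

Comparing the value before a safe shutdown and after the restart would show the kind of discrepancy reported in this thread, where the count came back as 0.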

Many thanks for your help.

Regards,
Anton Lebedevich.


All times are GMT