Erlang/OTP Forums

<  RabbitMQ mailing list  ~  rabbitmq dying

Guest
Posted: Thu Jun 19, 2008 9:28 am
"Ben Hood" <0x6e6562@gmail.com> wrote on 19/06/2008 10:22:04:
>
> You're confusing me :) Is it accepting connections or not?
>

Sorry, I'm confused myself. What happened is that one of the other
developers here said that all his clients had disconnected from rabbitmq.
The error on his screen was about no heartbeat for 3 seconds. I tried to
connect with a lightweight java client that just creates and deletes a
queue and I couldn't. It tried for a few seconds then died with
INTERNAL_ERROR in the exception reason. So I started writing an email to
the list. By the time I had finished the email people could connect to
RabbitMQ again. I haven't restarted it.

Does that sound normal? :)


*********************************************************************
This communication contains confidential information, some or all of which may be privileged. It is for the intended recipient only and others must not disclose, distribute, copy, print or rely on this communication. If an addressing or transmission error has misdirected this communication, please notify the sender by replying to this e-mail and then delete the e-mail. E-mail sent to EDF Trading may be monitored by the company. Thank you.
EDF Trading Limited
80 Victoria Street, 3rd Floor, Cardinal Place, London, SW1E 5JL
A Company registered in England No. 4255974.
Switchboard: 020 7061 4000
EDF Trading Markets Limited is a member of the EDF Trading Limited Group and is authorised and regulated by the Financial Services Authority.
VAT number: GB 735 5479 07
*********************************************************************

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist
0x6e6562
Posted: Thu Jun 19, 2008 9:58 am
David,

On Thu, Jun 19, 2008 at 10:27 AM, <David.Corcoran@edftrading.com> wrote:
> Sorry, I'm confused myself. What happened is that one of the other
> developers here said that all his clients had disconnected from rabbitmq.
> The error on his screen was about no heartbeat for 3 seconds. I tried to
> connect with a lightweight java client that just creates and deletes a
> queue and I couldn't. It tried for a few seconds then died with
> INTERNAL_ERROR in the exception reason. So I started writing an email to
> the list. By the time I had finished the email people could connect to
> RabbitMQ again. I haven't restarted it.

Can I assume now that you can use the lightweight client to perform
the queue creation/delete?

Can I also ask some questions about how the Java client is running:

- What is the network connection to the broker like?
- Have you tried setting a higher heartbeat in the client (the default
is 3 seconds)?
- What OS are the clients running (this is a long shot because it
shouldn't matter, but we do know of a *bug* in the .NET Socket API in
this regard)?

> Does that sound normal? :)

Depends what *normal* is :)

Ben

Guest
Posted: Thu Jun 19, 2008 10:07 am
"Ben Hood" <0x6e6562@gmail.com> wrote on 19/06/2008 10:58:02:

> David,
>
>
> Can I assume now that you can use the lightweight client to perform
> the queue creation/delete?

Yes. And actually everyone was able to reconnect after waiting a minute or
two.

However, the error is happening again. The exception is
ShutdownSignalException. And now my lightweight client can't connect
again. It may start working again in a few minutes like last time.

>
> Can I also ask some questions about how the Java client is running:
>
> - What is the network connection to the broker like?
Gigabit LAN.

> - Have you tried setting a higher heartbeat in the client (the default
> is 3 seconds)?
No, but I can't connect at all at the moment.

> - What OS are the clients running (this is a long shot because it
> shouldn't matter, but we do know of a *bug* in the .NET Socket API in
> this regard)?
Ubuntu 64-bit and RHEL 5, running Java on OpenJDK 6.

> Depends what *normal* is :)
:)


Guest
Posted: Thu Jun 19, 2008 10:24 am
RabbitMQ eventually crashed.

The following line is the last log entry (from startup.err):
eheap_alloc: Cannot allocate 1271244 bytes of memory (of type "heap").

/usr/sbin/rabbitmq-server: line 74: 32733 Aborted erl -pa
$(dirname $0)/../ebin ${START_RABBIT} -sname ${NODENAME} -boot start_sasl
+W w ${ERL_ARGS} -rabbit tcp_listeners '[{"'${NODE_IP_ADDRESS}'",
'${NODE_PORT}'}]' -sasl errlog_type error -kernel error_logger
'{file,"'${LOGS}'"}' -sasl sasl_error_logger '{file,"'${SASL_LOGS}'"}'
-os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup
false -os_mon start_os_sup false -mnesia dir "\"${MNESIA_DIR}\""
${CLUSTER_CONFIG} ${RABBIT_ARGS} "$@"

So I guess it finally ran out of memory.

Guest
Posted: Thu Jun 19, 2008 12:13 pm
David.Corcoran@edftrading.com wrote:
>> Can I assume now that you can use the lightweight client to perform
>> the queue creation/delete?
>
> Yes. And actually everyone was able to reconnect after waiting a minute or
> two.

Maybe something related to SO_LINGER (which should be off by default but
*is* platform-specific) or too high a value of net.ipv4.tcp_fin_timeout?
These (or a combination, coupled with delayed finalization by a GC) could
lead to all sorts of other pile-up/timeout effects. Just guessing.
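
One way to check both settings from a client machine is the small Java
sketch below. It is only an illustration: the class and method names are
made up for this example, the /proc path exists only on Linux, and it says
nothing about the sockets the broker itself opens.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

public class LingerCheck {
    /** SO_LINGER of a freshly created socket, in seconds; -1 means lingering is off. */
    static int defaultSoLinger() {
        try (Socket socket = new Socket()) {
            return socket.getSoLinger();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Linux only: net.ipv4.tcp_fin_timeout in seconds, or empty if it cannot be read. */
    static OptionalInt tcpFinTimeout() {
        Path path = Paths.get("/proc/sys/net/ipv4/tcp_fin_timeout");
        if (!Files.isReadable(path)) {
            return OptionalInt.empty();
        }
        try {
            String text = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII).trim();
            return OptionalInt.of(Integer.parseInt(text));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_LINGER on a new socket: " + defaultSoLinger());
        OptionalInt fin = tcpFinTimeout();
        System.out.println("tcp_fin_timeout: "
                + (fin.isPresent() ? fin.getAsInt() + " s" : "not available"));
    }
}
```

Running it on each client platform (Ubuntu and RHEL here) would show
whether the defaults actually differ between them.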

Holger

0x6e6562
Posted: Mon Jun 23, 2008 11:02 pm
Dave,

On Thu, Jun 19, 2008 at 11:23 AM, <David.Corcoran@edftrading.com> wrote:
> RabbitMQ eventually crashed.
>
> The following line is the last log entry (from startup.err):
> eheap_alloc: Cannot allocate 1271244 bytes of memory (of type "heap").
>
> /usr/sbin/rabbitmq-server: line 74: 32733 Aborted erl -pa
> $(dirname $0)/../ebin ${START_RABBIT} -sname ${NODENAME} -boot start_sasl
> +W w ${ERL_ARGS} -rabbit tcp_listeners '[{"'${NODE_IP_ADDRESS}'",
> '${NODE_PORT}'}]' -sasl errlog_type error -kernel error_logger
> '{file,"'${LOGS}'"}' -sasl sasl_error_logger '{file,"'${SASL_LOGS}'"}'
> -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup
> false -os_mon start_os_sup false -mnesia dir "\"${MNESIA_DIR}\""
> ${CLUSTER_CONFIG} ${RABBIT_ARGS} "$@"
>

Subsequent to the suggestions I made via IM, have you managed to
progress this at all?

Thx,

Ben

Guest
Posted: Tue Jun 24, 2008 9:59 am
"Ben Hood" <0x6e6562@gmail.com> wrote on 24/06/2008 00:02:03:

> Subsequent to the suggestions I made via IM, have you managed to
> progress this at all?
>

Hey Ben,

Unfortunately we're still able to crash it quite regularly. I changed the
message handlers to use only a single reply queue and now it's more stable
but not perfect. The only thing that still looks suspicious is that the
producers are sometimes run in threads. We only have one connection, which
is apparently thread safe (from the javadocs), and we create a channel for
each thread but maybe this is causing problems? We're going to make them
single threaded and keep testing.

I'm also going to upgrade everywhere to erlang 12b-3 to see if that helps.
I'll also apply the connection disconnect patch whenever it's available.
We're disconnecting nicely most of the time now but if you're debugging
then stopping the debugger causes a forced disconnect which causes an error
in the rabbitmq logs.

Thanks,

Dave


0x6e6562
Posted: Tue Jun 24, 2008 11:19 am
Dave,

On Tue, Jun 24, 2008 at 10:58 AM, <David.Corcoran@edftrading.com> wrote:
> Unfortunately we're still able to crash it quite regularly.

When you say you are able to crash it regularly, are you saying the
rabbit process dies?

> I changed the
> message handlers to use only a single reply queue and now it's more stable
> but not perfect.

Have you noticed any difference in performance by doing that?

> The only thing that still looks suspicious is that the
> producers are sometimes run in threads. We only have one connection, which
> is apparently thread safe (from the javadocs), and we create a channel for
> each thread but maybe this is causing problems? We're going to make them
> single threaded and keep testing.

You need to make sure you are correctly differentiating between
connections and channels.

The connection and channel manager are threadsafe, but each actual
channel should not be shared between threads.

In the AMQP model, the channel is the smallest unit of parallelism.

Hence on the client side, you should use one channel per application thread.
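
A minimal Java sketch of that rule follows. The `Connection` and `Channel`
types in it are hypothetical stand-ins written for this example, not the
RabbitMQ client classes, so that it runs without a broker. It shows one
shared connection, one channel opened by each producer thread, and a
channel that refuses to be used from any thread other than the one that
opened it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ChannelPerThread {
    /** Hypothetical stand-in for a client channel; not the RabbitMQ API. */
    interface Channel {
        void publish(String message);
    }

    /** Hypothetical stand-in for a connection that is safe to share between threads. */
    static final class Connection {
        final AtomicInteger channelsOpened = new AtomicInteger();
        final AtomicInteger messagesPublished = new AtomicInteger();

        Channel createChannel() {
            channelsOpened.incrementAndGet();
            final Thread owner = Thread.currentThread();
            // The channel remembers its owning thread and rejects any other caller.
            return message -> {
                if (Thread.currentThread() != owner) {
                    throw new IllegalStateException("channel used from a second thread");
                }
                messagesPublished.incrementAndGet();
            };
        }
    }

    /** Starts the producers on one shared connection and waits for them to finish. */
    static Connection run(int producers, int messagesEach) {
        Connection connection = new Connection();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < producers; i++) {
            Thread t = new Thread(() -> {
                // Each producer thread opens its own channel and never hands it on.
                Channel channel = connection.createChannel();
                for (int m = 0; m < messagesEach; m++) {
                    channel.publish("message " + m);
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for producers", e);
        }
        return connection;
    }

    public static void main(String[] args) {
        Connection c = run(4, 100);
        // prints: 4 channels, 400 messages
        System.out.println(c.channelsOpened.get() + " channels, "
                + c.messagesPublished.get() + " messages");
    }
}
```

With the real client the shape would be the same: share the connection,
and have each thread call createChannel() for itself.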

> I'm also going to upgrade everywhere to erlang 12b-3 to see if that helps.

I think you'll definitely get mileage out of this, quite apart from the
many other bug fixes that have been made since 11b-5 was released.

> I'll also apply the connection disconnect patch whenever it's available.

I don't think that's going to change much for you, it'll just make the
log files look nicer :)

> We're disconnecting nicely most of the time now but if you're debugging
> then stopping the debugger causes a forced disconnect which causes an error
> in the rabbitmq logs.

I wouldn't worry too much about that, although it is unsightly.
I am more concerned about the symptoms you were describing last week
about Rabbit becoming unavailable for some periods.


HTH,

Ben

0x6e6562
Posted: Fri Jun 27, 2008 1:37 pm
Dave,

On Tue, Jun 24, 2008 at 10:58 AM, <David.Corcoran@edftrading.com> wrote:
> I'm also going to upgrade everywhere to erlang 12b-3 to see if that helps.
> I'll also apply the connection disconnect patch whenever it's available.
> We're disconnecting nicely most of the time now but if you're debugging
> then stopping the debugger causes a forced disconnect which causes an error
> in the rabbitmq logs.

Just FYI:

The patch is currently available on a branch (we do all of our dev
work on a per-branch basis before we QA a piece of work and merge it
back into the trunk), awaiting QA for merging.

I can export this if and when you need it. You may, however, wish to
defer this to a later point if it is not critical.

HTH,

Ben
