| Author |
Message |
|
| Guest |
Posted: Wed Jun 18, 2008 2:24 pm |
|
|
|
Guest
|
Hi,
Recently rabbitmq has been dying on us and I think I've found the problem.
What usually happens is that the clients time out and disconnect (they have
a 3 second heartbeat) and reconnecting doesn't work. We get a
"java.net.ConnectException: Connection refused" exception. The 'beam' task
is also currently using 6% CPU and about 2GB of RAM.
The errors look like:
Mnesia(rabbit@vsdlbblue01): ** WARNING ** Mnesia is overloaded: {dump_log, time_threshold}
Mnesia(rabbit@vsdlbblue01): ** WARNING ** Mnesia is overloaded: {mnesia_tm, message_queue_len, [705,850]}
error on TCP connection from 10.80.12.26:47327
{timeout,{frame_payload,3,1,29421}}
etc.
It looks like we might be leaving messages lying around? If I'm correct, is
there a way of seeing the queues and which have lots of messages? I've
attached the last few hours of the log file in case that helps.
Thanks,
Dave
(See attached file: rabbit.zip)
*********************************************************************
This communication contains confidential information, some or all of which may be privileged. It is for the intended recipient only and others must not disclose, distribute, copy, print or rely on this communication. If an addressing or transmission error has misdirected this communication, please notify the sender by replying to this e-mail and then delete the e-mail. E-mail sent to EDF Trading may be monitored by the company. Thank you.
EDF Trading Limited
80 Victoria Street, 3rd Floor, Cardinal Place, London, SW1E 5JL
A Company registered in England No. 4255974.
Switchboard: 020 7061 4000
EDF Trading Markets Limited is a member of the EDF Trading Limited Group and is authorised and regulated by the Financial Services Authority.
VAT number: GB 735 5479 07
*********************************************************************
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Wed Jun 18, 2008 2:49 pm |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Dave,
I've had a quick look at the log file, and it seems that your clients
are dying, which in turn is badly handled in the broker.
This error
Error in process <0.7321.2> on node 'rabbit@vsdlbblue01' with exit value: {badarg,[{erlang,port_command,[#Port<0.27825>,[<<7 bytes>>,<<36 bytes>>,<<1 byte>>]]},{rabbit_writer,internal_send_command_async,3},{rabbit_writer,handle_message,2},{rabbit_writer,mainloop,1}]}
suggests that the peer at the other end is no longer there, and it causes a
follow-on error message:
Error in process <0.30205.1> on node 'rabbit@vsdlbblue01' with exit value: {{badmatch,{error,[{exit,{timeout,{gen_server,call,[<0.30206.1>,{notify_down,<0.30204.1>}]}}}]}},[{rabbit_channel,terminate,2},{buffering_proxy,mainloop,4}]}
which is a symptom of the first error not being handled correctly.
There is a bug for this already, and it will be fixed very soon. Please
let us know how urgent this is for you, because we could get a patch
out quicker if necessary.
This is not the complete answer though, which we'll look into, but I
just wanted to give some feedback as soon as possible.
A few questions to help us diagnose this:
- What version of Rabbit are you using?
- Does the Rabbit process actually die or just the TCP listener?
Thanks,
Ben
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Wed Jun 18, 2008 3:00 pm |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Dave,
On Wed, Jun 18, 2008 at 3:49 PM, Ben Hood <0x6e6562@gmail.com> wrote:
> I've had a quick look at the log file, and it seems that your clients
> are dying, which in turn is badly handled in the broker.
> A few questions to help us diagnose this:
I almost forgot to ask if you can reproduce the problem?
Do you have a test case that recreates this issue that you could send us?
If not, can you describe what you were doing, how many clients you had
sending and receiving to and from how many (and what type of)
exchanges and queues, please?
Thx,
Ben
Post received from mailinglist |
|
|
|
| Guest |
Posted: Wed Jun 18, 2008 3:24 pm |
|
|
|
Guest
|
Hi Ben,
Thanks for the quick response. If you have a patch it would be great
because we're going live with rabbitmq in a few weeks. Unfortunately this
problem never showed up during my early tests and is only showing up now
that we're hitting it quite heavily in our dev environment.
Some of our code can produce 50,000 messages in a few minutes and it can
take half an hour to process them. During that time I guess other processes
could also be producing large amounts of messages. Each message can be up
to a few KB in size so it can be quite a bit of data.
I can't give you a simple test case now but I might be able to put one
together. RabbitMQ only dies about once a week, so it's hard to reproduce.
The way our code works is that a server sends up to 50,000 jobs (messages)
to a job queue. There are 40 consumers that read the jobs, process them,
and send the results back to a temporary reply queue. So, no exchanges and
just 1 queue. If several people are using the same instance of RabbitMQ,
which might happen, there might be a few queues but no more than 8 or so.
You're right about the clients: when we restart them we do it through a
kill -9 so they don't disconnect gracefully. It may seem strange but the
clients are stateless and lightweight and we've always killed them through
a kill -9.
The RabbitMQ process doesn't actually die. I'm nearly positive about that.
I checked before I did a restart and 'beam' was using lots of RAM and a
little CPU but nothing could connect.
Versions:
RabbitMQ 1.3.0-1
Ubuntu 8.04 amd64 with 4GB RAM
Erlang 1:11.b.5dfsg-11
OpenJDK 1.6.0_06-b02 64-bit
Thanks,
Dave
In reply to "Ben Hood" <0x6e6562@gmail.com> (sent by rabbitmq-discuss-bounces@lists.rabbitmq.com to rabbitmq-discuss@lists.rabbitmq.com, 18/06/2008 15:49), subject: Re: [rabbitmq-discuss] rabbitmq dying
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Wed Jun 18, 2008 3:38 pm |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Dave,
On Wed, Jun 18, 2008 at 4:23 PM, <David.Corcoran@edftrading.com> wrote:
> The RabbitMQ process doesn't actually die. I'm nearly positive about that.
> I checked before I did a restart and 'beam' was using lots of RAM and a
> little CPU but nothing could connect.
Can you tell me whether you can connect to the TCP socket on 5672 or
not when this happens? I'm trying to work out whether the TCP listener has
died or the AMQP protocol handler.
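One way to run this check from the client side is a plain TCP connect to the broker port, independent of the AMQP client library. The sketch below is illustrative only: the class name `PortProbe`, the default host and the 3 second timeout are assumptions chosen for the example. If the connect succeeds while the AMQP client still fails, that would suggest the listener is alive and the problem is further up, in the protocol handling.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the RabbitMQ Java client.
class PortProbe {
    /** Returns true if a TCP connection to host:port is accepted within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused" (ConnectException) as well as
            // connect timeouts (SocketTimeoutException).
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions: localhost and the standard AMQP port 5672.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5672;
        System.out.println(host + ":" + port + " accepts TCP connections: "
                + canConnect(host, port, 3000));
    }
}
```

Running it with the broker host and 5672 as arguments while the clients are being refused gives the same information as a manual telnet, but it can be scripted into a monitoring loop.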
Thx,
Ben
Post received from mailinglist |
|
|
|
| Guest |
Posted: Wed Jun 18, 2008 3:58 pm |
|
|
|
Guest
|
> Can you tell me whether you can connect to the TCP socket on 5672 or
> not when this happens? I'm trying to work out whether the TCP listener has
> died or the AMQP protocol handler.
>
Ben,
Sorry, unfortunately I had to restart RabbitMQ so I can't test this. I'll
do it next time it happens.
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Wed Jun 18, 2008 11:05 pm |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Dave,
On Wed, Jun 18, 2008 at 3:23 PM, <David.Corcoran@edftrading.com> wrote:
> It looks like we might be leaving messages lying around? If I'm correct is
> there a way of seeing the queues and which have lots of messages?
In rabbit_amqqueue there are the functions stat/1 and stat_all/0 which
will tell you how many messages are in the queue.
HTH,
Ben
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Wed Jun 18, 2008 11:11 pm |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Dave,
On Wed, Jun 18, 2008 at 4:23 PM, <David.Corcoran@edftrading.com> wrote:
> I can't give you a simple test case now but I might be able to put one
> together. RabbitMQ only dies about every week so it's hard to reproduce.
> The way our code works is that a server sends up to 50,000 jobs (messages)
> to a job queue. There are 40 consumers that read the jobs and process them
> and send the results back to a temporary reply queue. So, no exchanges and
> just 1 queue. If several people are using the same instance of RabbitMQ,
> which might happen, there might be a few queues but no more than 8 or so.
From what you describe, this shouldn't really be an issue for Rabbit.
50K messages should not really be a problem, so it would be good to be
able to recreate this. I will try to write a test scenario to try to
provoke this, but the more input I get the better in helping me
reproduce this.
>
> You're right about the clients, when we restart them we do it through a
> kill -9 so they don't disconnect gracefully. It may seem strange but the
> clients are stateless and lightweight and we've always killed them through
> a kill -9.
There is a *nicer* way to shut a client down, but at the end of the
day, the server *should* be able to handle any type of behaviour from
a client.
HTH,
Ben
Post received from mailinglist |
|
|
|
| Guest |
Posted: Thu Jun 19, 2008 8:06 am |
|
|
|
Guest
|
"Ben Hood" <0x6e6562@gmail.com> wrote on 19/06/2008 00:10:36:
> Dave,
>
> From what you describe, this shouldn't really be an issue for Rabbit.
> 50K messages should not really be a problem, so it would be good to be
> able to recreate this. I will try to write a test scenario to try to
> provoke this, but the more input I get the better in helping me
> reproduce this.
Morning Ben,
I'll try to reproduce it with a small test myself. The message "Mnesia is
overloaded" seems to hit sometimes when the message queue length is only
200,000 or so. Is this normal? Or perhaps the better question is: when does
Mnesia become overloaded? Does it need more disk or maybe more memory? Btw,
our queues are 'non-durable'.
> There is a *nicer* way to shut a client down, but at the end of the
> day, the server *should* be able to handle any type of behaviour from
> a client.
Yeah, we could be polite to our clients and let them shut down nicely but
because they're stateless we've always done it this way. If the patch isn't
available in time I can have a look into it.
Thanks,
Dave
Post received from mailinglist |
|
|
|
| Guest |
Posted: Thu Jun 19, 2008 8:13 am |
|
|
|
Guest
|
rabbitmq-discuss-bounces@lists.rabbitmq.com wrote on 19/06/2008 00:04:27:
> In rabbit_amqqueue there are the functions stat/1 and stat_all/0 which
> will tell you how many messages are in the queue.
>
Hey Ben,
That looks perfect. Unfortunately I'm not sure how to connect to Rabbit to
run those commands. I had a look at the mailing lists and found some
commands that aren't working for me. Perhaps you can help?
dave@vsdlbblue01:~$ erl -sname temp -remsh rabbit@localhost
Erlang (BEAM) emulator version 5.5.2 [source] [async-threads:0] [hipe] [kernel-poll:false]
{error_logger,{{2008,6,19},{8,49,47}},"~s~n",["Error in process <0.29.0> on node 'temp@vsdlbblue01' with exit value: {badarg,[{erlang,list_to_existing_atom,[\"rabbit@vsdlbblue01\"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}\n"]}
*** ERROR: Shell process terminated! (^G to start new job) ***
=ERROR REPORT==== 19-Jun-2008::08:49:47 ===
Error in process <0.29.0> on node 'temp@vsdlbblue01' with exit value: {badarg,[{erlang,list_to_existing_atom,["rabbit@vsdlbblue01"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
And:
dave@vsdlbblue01:~$ erl -sname temp -remsh rabbit@vsdlbblue01
Erlang (BEAM) emulator version 5.5.2 [source] [async-threads:0] [hipe] [kernel-poll:false]
*** ERROR: Shell process terminated! (^G to start new job) ***
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
RabbitMQ is running on the machine (rhel5 i686) with the following args:
/usr/lib/erlang/erts-5.5.2/bin/beam -W w -K true -A30 -- -root
/usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -pa
/usr/sbin/../ebin -noshell -noinput -s rabbit -sname rabbit -boot
start_sasl -kernel inet_default_listen_options
[{sndbuf,16384},{recbuf,4096}] -rabbit tcp_listeners [{"0.0.0.0", 5672}]
-sasl errlog_type error -kernel error_logger
{file,"/var/log/rabbitmq/rabbit.log"} -sasl sasl_error_logger
{file,"/var/log/rabbitmq/rabbit-sasl.log"} -os_mon start_cpu_sup true
-os_mon start_disksup false -os_mon start_memsup false -os_mon start_os_sup
false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit" -noshell -noinput
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Thu Jun 19, 2008 8:42 am |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
On Thu, Jun 19, 2008 at 9:12 AM, <David.Corcoran@edftrading.com> wrote:
>
> dave@vsdlbblue01:~$ erl -sname temp -remsh rabbit@localhost
I wouldn't have expected this to work because I would have thought that the
name of the rabbit node is rabbit@vsdlbblue01.
> RabbitMQ is running on the machine (rhel5 i686) with the following args:
> /usr/lib/erlang/erts-5.5.2/bin/beam -W w -K true -A30 -- -root
> /usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -pa
> /usr/sbin/../ebin -noshell -noinput -s rabbit -sname rabbit -boot
> start_sasl -kernel inet_default_listen_options
> [{sndbuf,16384},{recbuf,4096}] -rabbit tcp_listeners [{"0.0.0.0", 5672}]
> -sasl errlog_type error -kernel error_logger
> {file,"/var/log/rabbitmq/rabbit.log"} -sasl sasl_error_logger
> {file,"/var/log/rabbitmq/rabbit-sasl.log"} -os_mon start_cpu_sup true
> -os_mon start_disksup false -os_mon start_memsup false -os_mon start_os_sup
> false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit" -noshell -noinput
Can you ping rabbit@vsdlbblue01?
E.g. $ erl -sname temp
(temp@lsh-226)9> net_adm:ping('rabbit@vsdlbblue01').
If the node is pingable, this will return pong, otherwise pang.
HTH,
Ben
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Thu Jun 19, 2008 8:50 am |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
On Thu, Jun 19, 2008 at 9:06 AM, <David.Corcoran@edftrading.com> wrote:
> I'll try to reproduce it with a small test myself. The message "Mnesia is
> overloaded" seems to hit sometimes when the message queue length is only
> 200,000 or so. Is this normal? Or perhaps the better question is: when does
> Mnesia become overloaded? Does it need more disk or maybe more memory? Btw,
> our queues are 'non-durable'.
The messages themselves are not actually stored in mnesia. Mnesia just
maintains the existence of a queue with a unique name in a cluster.
The mnesia overloaded warning may indicate that there is a lot of
t-log activity caused by pending activities.
A good diagnostic for this is mnesia:info(). If we can get the remote
shell going, then the output of this would be good.
Furthermore having a script so that we can reproduce it here would be
good as well.
I'm also wondering why so many messages have been backed up. Can they
not get consumed at a fast rate?
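For a rough feel of the numbers, the figures given earlier in the thread (up to 50,000 jobs produced in "a few minutes", about half an hour to process them, 40 consumers) can be turned into rates. The sketch below is back-of-envelope arithmetic only. It assumes "a few minutes" means 3, that producers and consumers start together, and that both run at constant rates; none of that is confirmed in the thread, and the class name `BacklogEstimate` is made up for the example.

```java
// Hypothetical back-of-envelope calculator; all inputs are assumptions.
class BacklogEstimate {
    /** Messages per second for a message count spread over a duration in minutes. */
    static double ratePerSecond(long messages, double minutes) {
        return messages / (minutes * 60.0);
    }

    /**
     * Peak queue depth if production takes produceMinutes and draining the
     * whole batch takes consumeMinutes, both starting at the same time.
     */
    static long peakBacklog(long messages, double produceMinutes, double consumeMinutes) {
        double consumedWhileProducing =
                ratePerSecond(messages, consumeMinutes) * produceMinutes * 60.0;
        return Math.round(messages - consumedWhileProducing);
    }

    public static void main(String[] args) {
        long jobs = 50_000;          // from the thread
        double produceMinutes = 3;   // assumption for "a few minutes"
        double consumeMinutes = 30;  // "half an hour" from the thread
        int consumers = 40;          // from the thread

        System.out.printf("produce rate: %.1f msg/s%n", ratePerSecond(jobs, produceMinutes));
        System.out.printf("consume rate: %.1f msg/s (%.2f per consumer)%n",
                ratePerSecond(jobs, consumeMinutes),
                ratePerSecond(jobs, consumeMinutes) / consumers);
        System.out.println("peak backlog: " + peakBacklog(jobs, produceMinutes, consumeMinutes));
    }
}
```

Under these assumptions production outpaces consumption by about ten to one, so most of a batch sits in the queue until the consumers catch up. Several users submitting batches at once would stack on top of that.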
>
>> There is a *nicer* way to shut a client down, but at the end of the
>> day, the server *should* be able to handle any type of behaviour from
>> a client.
>
> Yeah, we could be polite to our clients and let them shut down nicely but
> because they're stateless we've always done it this way. If the patch isn't
> available in time I can have a look into it.
The patch I was talking about neatens out the server side handling of
clients just going AWOL, but I don't think that is going to solve your
problem, which is what we need to get to the bottom of.
Ben
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Thu Jun 19, 2008 9:00 am |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Dave,
On Thu, Jun 19, 2008 at 9:06 AM, <David.Corcoran@edftrading.com> wrote:
> Yeah, we could be polite to our clients and let them shut down nicely but
> because they're stateless we've always done it this way. If the patch isn't
> available in time I can have a look into it.
That's a fair point. Based on this there may be a few things to consider:
1. Us adding a JVM shutdown hook to the Java client to catch this kind
of thing and observe the protocol shutdown procedure. This would mean
you sending the JVM a friendlier signal than -9.
2. Even though your clients are stateless, from a performance
perspective you may want to consider reusing the same AMQP channel
across message sends. Setting up each channel and the associated AMQP
handshake *may* be unnecessary overhead. It is, after all, a
connection-oriented protocol.
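To illustrate point 1, here is a minimal sketch of a JVM shutdown hook using only the standard `Runtime.addShutdownHook` call. The class name `GracefulShutdown`, the helper `closeOnShutdown` and the stand-in resource are all hypothetical. In a real client the resource would be whatever object owns the AMQP connection. Note that a hook like this runs on normal exit and on SIGTERM or SIGINT, but never on `kill -9`, because the JVM gets no chance to react to SIGKILL.

```java
// Hypothetical helper; not part of the RabbitMQ Java client.
class GracefulShutdown {
    /**
     * Registers a JVM shutdown hook that closes the given resource.
     * Returns the hook thread so the caller can deregister it later.
     */
    static Thread closeOnShutdown(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                // Nothing useful can be done this late; just report it.
                System.err.println("Error while closing on shutdown: " + e);
            }
        }, "close-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // Stand-in for an AMQP connection (an assumption for the example).
        AutoCloseable fakeConnection =
                () -> System.out.println("connection closed cleanly");
        closeOnShutdown(fakeConnection);
        System.out.println("working...");
        // When main returns, the JVM exits normally and runs the hook.
    }
}
```

With something like this in place, restarting a consumer with a plain `kill` (SIGTERM) instead of `kill -9` would give the client a chance to close its connection before the process exits.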
HTH,
Ben
Post received from mailinglist |
|
|
|
| Guest |
Posted: Thu Jun 19, 2008 9:13 am |
|
|
|
Guest
|
rabbitmq-discuss-bounces@lists.rabbitmq.com wrote on 19/06/2008 09:41:14:
>
> Can you ping rabbit@vsdlbblue01?
>
> E.g. $erl -sname temp
> (temp@lsh-226)9> net_adm:ping('rabbit@vsdlbblue01').
>
I get a 'pang' back. However RabbitMQ is refusing connections again so that
might not mean much. 'Beam' is still running and I can telnet into the
server on port 5672.
I can keep it down for a few more minutes if there's anything else I can
run that might be useful?
.....
Bizarre as this may seem, RabbitMQ is now accepting connections again. When
it wasn't, I got the error:
java.io.IOException
Caused by: com.rabbitmq.client.ShutdownSignalException (connection error; reason: {#method<connection.close>(reply code=541, reply text=INTERNAL_ERROR, class id=0, method id=0),null,""})
    at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:577)
    at com.rabbitmq.client.impl.AMQConnection.handleConnectionClose(AMQConnection.java:563)
    at com.rabbitmq.client.impl.AMQConnection.processControlCommand(AMQConnection.java:540)
    at com.rabbitmq.client.impl.AMQConnection$1.processAsync(AMQConnection.java:7
    at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:143)
    at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:9
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:443)
And everyone else using it was disconnected.
Logs again:
(See attached file: rabbit.zip)
Post received from mailinglist |
|
|
|
| 0x6e6562 |
Posted: Thu Jun 19, 2008 9:22 am |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
On Thu, Jun 19, 2008 at 10:13 AM, <David.Corcoran@edftrading.com> wrote:
>
> I get a 'pang' back. However RabbitMQ is refusing connections again so that
> might not mean much. 'Beam' is still running and I can telnet into the
> server on port 5672.
>
> I can keep it down for a few more minutes if there's anything else I can
> run that might be useful?
>
> .....
>
> Bizarre as this may seem RabbitMQ is now accepting connections again.
You're confusing me. Is it accepting connections or not?
Ben
Post received from mailinglist |
|
|
|
|
|
All times are GMT