Posted: Wed May 07, 2008 11:32 am

Hi,
I've run into a problem testing RabbitMQ. My Java clients keep
disconnecting when they use a lot of CPU, and the error in the RabbitMQ
logs is:

    =ERROR REPORT==== 7-May-2008::11:41:44 ===
    error on TCP connection from 127.0.0.1:37299
    {timeout,frame_header}

I'm using channel.basicGet, and I think I've narrowed down the problem.
When I get a message, I process it and then call basicAck once it's done.
So the code looks like:
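
    GetResponse basicGet = channel.basicGet(ticket, queueName, false); // false: ack explicitly
    if (basicGet != null) {                    // null means no message was available
        process(basicGet);                     // CPU-heavy work
        channel.basicAck(basicGet.getEnvelope().getDeliveryTag(), false); // ack just this message
    }
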
If process() looks like this:

    long start = System.currentTimeMillis();
    while (true) {
        long now = System.currentTimeMillis();
        if (now - start > 10000) { // spin for 10 seconds
            break;
        }
    }

then the client disconnects. If I add a Thread.sleep(0) inside the loop, it
works fine; the sleep(0) just yields. My real code doesn't run that loop, but
it does a lot of maths that can take up to about a minute, so it has the same
effect of saturating the CPU for a while.
I guess what's happening is that the connection thread isn't getting any
time to send heartbeats, so the server disconnects it. Is there a
workaround for this? Can I change the heartbeat interval?
A little more information, in case it helps:
- Quad-core machine, only 1 CPU in use during this test
- 4 GB RAM
- Erlang 5.5.5 (64-bit)
- Ubuntu 64-bit
- RabbitMQ 1.3.0-1
Thanks,
Dave
*********************************************************************
This communication contains confidential information, some or all of which may be privileged. It is for the intended recipient only and others must not disclose, distribute, copy, print or rely on this communication. If an addressing or transmission error has misdirected this communication, please notify the sender by replying to this e-mail and then delete the e-mail. E-mail sent to EDF Trading may be monitored by the company. Thank you.
EDF Trading Limited
80 Victoria Street, 3rd Floor, Cardinal Place, London, SW1E 5JL
A Company registered in England No. 4255974.
Switchboard: 020 7061 4000
EDF Trading Markets Limited is a member of the EDF Trading Limited Group and is authorised and regulated by the Financial Services Authority.
VAT number: GB 735 5479 07
*********************************************************************
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist

Posted: Wed May 07, 2008 5:18 pm

David,
David.Corcoran@edftrading.com wrote:
> I guess what's happening is that the connection thread isn't getting any
> time to send heartbeats and the server is disconnecting it. Is there a work
> around for this? Can I change the heartbeat?
That is weird. The application thread that is calling process(...), and
the connection thread are separate threads. The workload on the former
should not prevent the latter from sending heartbeats, certainly not on
a quad-core machine.
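
Matthias's point can be checked without a broker. The sketch below is hypothetical (the class and method names are made up, and a "heartbeat" here is only a timestamp recorded by a scheduled task, not a RabbitMQ heartbeat frame): the calling thread spins like the process() loop above while a ScheduledExecutorService ticks on its own thread, and the run reports the longest gap between ticks. On a JVM that schedules threads fairly, the gap should stay close to the tick interval; a gap close to the whole busy-wait duration would indicate the kind of starvation described in this thread. It uses Java 8 syntax, so reproducing it on a Java 5 JVM would need anonymous classes instead of lambdas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatStarvationDemo {

    // Result of one run: how many ticks fired and the longest gap between consecutive ticks
    static final class Result {
        final int beats;
        final long maxGapMs;
        Result(int beats, long maxGapMs) { this.beats = beats; this.maxGapMs = maxGapMs; }
    }

    // Busy-waits like the process() loop in the original post (no sleep, no yield)
    static void busyWait(long millis) {
        long start = System.currentTimeMillis();
        while (System.currentTimeMillis() - start <= millis) {
            // spin
        }
    }

    static Result run(long busyMs, long heartbeatIntervalMs) throws InterruptedException {
        List<Long> beats = new ArrayList<>(); // guarded by synchronized (beats)
        ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor();
        heartbeat.scheduleAtFixedRate(() -> {
            synchronized (beats) { beats.add(System.nanoTime()); }
        }, 0, heartbeatIntervalMs, TimeUnit.MILLISECONDS);

        busyWait(busyMs); // the "application thread" hogs its CPU here

        // By default, shutdown() cancels periodic tasks, so this terminates promptly
        heartbeat.shutdown();
        heartbeat.awaitTermination(1, TimeUnit.SECONDS);

        synchronized (beats) {
            long maxGapNs = 0;
            for (int i = 1; i < beats.size(); i++) {
                maxGapNs = Math.max(maxGapNs, beats.get(i) - beats.get(i - 1));
            }
            return new Result(beats.size(), TimeUnit.NANOSECONDS.toMillis(maxGapNs));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = run(2000, 100);
        System.out.println("heartbeats=" + r.beats + " maxGapMs=" + r.maxGapMs);
    }
}
```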
Can you post the complete example? Also, what version of the JVM are you
running?
Matthias.

Posted: Thu May 08, 2008 8:29 am

Hi Matthias,
It looks like you were right about the Java version. Running on Java
1.5.0.15 doesn't work; running on java-6-openjdk or Java 6 (1.6.0.06) works
fine. I think it's a JVM bug, perhaps related to this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6383015
I've attached the code that displays the problem. Putting back the
Thread.yield() on Java 5 makes it work, but that obviously isn't a viable
solution.
I guess it's time we upgraded to Java 6.
Thanks again for your help,
Dave
(See attached file: TimeoutTest.java)

Posted: Fri May 09, 2008 6:05 am

David,
David.Corcoran@edftrading.com wrote:
> It looks like you were right about the java version. Running in java
> 1.5.0.15 doesn't work. Running in java-6-openjdk or java-6 (1.6.0.06) works
> fine. I think it's a bug, perhaps related to this:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6383015
Interesting. Yes, that looks like it would explain the behaviour you saw.
> I've attached the code that displays the problem. Putting back the
> Thread.yield() in java 5 makes it work, but obviously isn't a viable
> solution.
In the absence of even the most basic fairness guarantees from the JVM -
such as freedom from starvation - asking programmers to insert Thread.yield()
in strategic places is about the best you can do.
Thanks for the analysis. It will save users a lot of time when they run
across the same problem.
Matthias.

Posted: Fri May 09, 2008 12:07 pm

Non-Erlang-related-Java-is-funny-offtopic warning!
> David.Corcoran@edftrading.com wrote:
>> It looks like you were right about the java version. Running in java
>> 1.5.0.15 doesn't work. Running in java-6-openjdk or java-6 (1.6.0.06) works
>> fine. I think it's a bug, perhaps related to this:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6383015
>
> Interesting. Yes, that looks like it would explain the behaviour you saw.
A very interesting anomaly indeed, especially considering that it only
seems to affect 1.5, not 1.4 or 1.6. However, as far as I can tell, this
behaviour can only occur with one CPU. I bet that David's test will run
fine when he enables more cores and/or starts running "real" code in the
process() method, meaning code that creates garbage and doesn't just
repeatedly busy-wait on a native method, which, incidentally, must also
obey special synchronization rules. David, could you please try this if it
is not too inconvenient?
Ideally, process() should run in a separate thread anyway, so that the
controlling thread can properly wait for a result or, more importantly,
cancel the computation if necessary.
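
A minimal sketch of that structure using java.util.concurrent, assuming a hypothetical heavyComputation() standing in for process(): the controlling thread waits with Future.get(timeout) and calls Future.cancel(true) when the timeout expires. Cancellation only works if the computation cooperates by checking its interrupt flag, which tight numeric loops don't do on their own. Note that this addresses waiting and cancelling, not thread starvation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CancellableProcess {

    // Hypothetical stand-in for process(): CPU-bound work that polls its interrupt flag
    static double heavyComputation(long iterations) throws InterruptedException {
        double acc = 0.0;
        for (long i = 0; i < iterations; i++) {
            acc += Math.sqrt(i);
            // Polling every 65536 iterations keeps the check cheap but lets cancel(true) take effect
            if ((i & 0xFFFF) == 0 && Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("computation cancelled");
            }
        }
        return acc;
    }

    // Runs the computation on the pool; returns null if it did not finish within timeoutMs
    static Double runWithTimeout(ExecutorService pool, long iterations, long timeoutMs)
            throws InterruptedException, ExecutionException {
        Callable<Double> task = () -> heavyComputation(iterations);
        Future<Double> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println("short run: " + runWithTimeout(pool, 1_000L, 5_000));
            System.out.println("long run:  " + runWithTimeout(pool, Long.MAX_VALUE, 200));
        } finally {
            pool.shutdownNow();
        }
    }
}
```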
> In the absence of even the most basic fairness guarantees from the jvm -
> such as no starvation - asking programmers to insert Thread.yield() in
> strategic places is about the best you can do.
Unfortunately this might not have the effect that one expects, as yield() is:
- free to be implemented as a no-op; I think that at one point the server VM
  explicitly optimized it away
- implemented differently in various versions of the JVM, both client and
  server
- dependent on the OS and the GC algorithm: for a fun time, search for "yield" in
  http://openjdk.neojava.org/hotspot/xref/src/share/vm/runtime/globals.hpp
- not even guaranteed to match the general expectations of either apps or the
  JVM when yielding native threads: until recently (or since then, depending on
  your point of view) Linux didn't really yield() "correctly" (according to
  POSIX), resulting in... yet another magic kernel flag!
  Hilarious discussion in: http://kerneltrap.org/Linux/CFS_and_sched_yield
IMHO, sprinkling yield() or sleep() calls across a codebase has a definite
"magic pixie dust" smell and should be avoided at all costs. The JCIP book
(Java Concurrency in Practice) explains all this in section 10.3.1
("Starvation") and pretty much only recommends yield/sleep for creating
artificial delays when debugging (to provoke races).
Besides, LockSupport.parkNanos() sounds much more advanced anyway.
-h

Posted: Fri May 09, 2008 3:04 pm

Hey Holger,
Thanks for the response. I've answered some of your queries below.
>
> A very interesting anomaly indeed, especially considering that it only
> seems to affect 1.5, not 1.4 or 1.6. However, as far as I can tell this
> behaviour can only occur with one CPU. I bet that David's test will run
> fine when he enables more cores and/or starts running "real" code in the
> process() method - meaning: code that creates garbage and doesn't just
> repeatedly busy-waits on a native method, which accidentally must also
> obey special synchronization rules. David, could you please try this if it
> is not too inconvenient?
Unfortunately, this happens in the real world too. The code with the while
loop was just a simple example that exhibited the same problem I was
having. Most of our CPU-intensive code manipulates matrices of doubles, so
it uses a full CPU and creates little or no garbage to collect. We use Colt
(http://acs.lbl.gov/~hoschek/colt/) for most of it, and it's so optimised
I'd be surprised if it had slow bits where other threads might get a look in.
Also, my machine is a quad-core 3 GHz Xeon, and this happens when running just
one CPU-intensive task.
> Ideally process() should run in a separate thread anyway so that the
> controlling thread can properly wait for a result or - more importantly -
> cancel the computation if necessary.
>
Having process() do the calculations inline makes things conceptually a
little simpler, and each computation takes less than 30 (usually), so there's
no need to cancel them. Also, even if it ran in a separate thread, the
heartbeat thread wouldn't be scheduled, because the computation thread would
still be too busy.
>
> Unfortunately this might not have the effect that one expects, as yield() is:
>
I agree; yield() is a complicated beast, and I would never rely on it as a
solution. It was more just a way to illustrate the problem.
If RabbitMQ has configurable heartbeat timeouts, this problem might
disappear. For example, if you could raise the disconnect timeout to 5
seconds or more, the heartbeat thread would probably have had a time slice
by then. Still, the only real solution is to use Java 6.
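
A back-of-the-envelope model of why a longer timeout only helps if it outlasts the stall. This sketch assumes, for illustration only, that the server drops a connection once it has seen no frame for longer than a fixed window; it is not RabbitMQ's actual implementation, and the frame times are made up. If the heartbeat thread is starved for the whole computation, a 10 second busy loop still trips a 5 second window, and computations of up to about a minute would need an even longer one. (Later versions of the RabbitMQ Java client let the client request a heartbeat interval, in seconds, via ConnectionFactory.setRequestedHeartbeat(int).)

```java
public class SilenceWindowModel {

    /**
     * Simplified model (an assumption, not RabbitMQ's code): the connection is dropped as soon
     * as no frame has arrived for longer than windowMs. Returns the drop time in ms, or -1 if
     * the connection survives until endMs. frameTimesMs must be sorted and non-empty.
     */
    static long firstDropTime(long[] frameTimesMs, long windowMs, long endMs) {
        long last = frameTimesMs[0];
        for (int i = 1; i < frameTimesMs.length; i++) {
            if (frameTimesMs[i] - last > windowMs) {
                return last + windowMs; // the silence exceeded the window before this frame arrived
            }
            last = frameTimesMs[i];
        }
        return (endMs - last > windowMs) ? last + windowMs : -1;
    }

    public static void main(String[] args) {
        // Heartbeats every second, then a 10 s stall (like the busy loop), then traffic resumes
        long[] frames = {0, 1000, 2000, 12000, 13000};
        System.out.println(firstDropTime(frames, 5000, 14000));  // 7000: a 5 s window still trips
        System.out.println(firstDropTime(frames, 15000, 14000)); // -1: a window longer than the stall survives
    }
}
```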
Regards,
Dave

Posted: Fri May 09, 2008 3:10 pm

All times are GMT