Erlang/OTP Forums

RabbitMQ mailing list ~ RabbitMQ running at 100% CPU.

Guest
Posted: Tue Apr 15, 2008 7:45 pm
Michael,

Michael Arnoldus wrote:

> Thank you for your suggestion. There is no strace on Mac OS X, but I did
> find a way to see what the C-program was doing (see below).

I can now reliably reproduce the problem on a Leopard machine at our
offices, running on a ppc platform with Erlang R12B-0 from macports and
the latest development snapshot of rabbit.

The results indicate that this is probably a bug in Leopard, or a bug in
the Erlang runtime that only manifests itself on Leopard.

The trigger appears to be a tcp client closing the socket when the
server is in the middle of writing to its peer.

The next step is to construct a test case that doesn't involve rabbit.


Matthias.

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailing list
Guest
Posted: Tue Apr 15, 2008 8:08 pm
Matthias,

Thanks - wonderful - and it fits with the traces I observed.

Since we are running on Intel HW, let me know if I can help by running
stuff or reproducing the problem.

Michael

On Apr 15, 2008, at 21:44 , Matthias Radestock wrote:

> Michael,
>
> Michael Arnoldus wrote:
>
>> Thank you for your suggestion. There is no strace on Mac OS X, but
>> I did find a way to see what the C-program was doing (see below).
>
> I can now reliably reproduce the problem on a Leopard machine at our
> offices, running on a ppc platform with Erlang R12B-0 from macports
> and the latest development snapshot of rabbit.
>
> The results indicate that this is probably a bug in Leopard, or a
> bug in the Erlang runtime that only manifests itself on Leopard.
>
> The trigger appears to be a tcp client closing the socket when the
> server is in the middle of writing to its peer.
>
> The next step is to construct a test case that doesn't involve rabbit.
>
>
> Matthias.



Guest
Posted: Tue Apr 15, 2008 8:16 pm
Michael,

Michael Arnoldus wrote:
> Since we are running on Intel HW, let me know if I can help by running
> stuff or reproducing the problem.

Thanks for the offer. I might take you up on it once I have a simple
test case ready.

Also, a minor correction to what I wrote ...

> On Apr 15, 2008, at 21:44 , Matthias Radestock wrote:
>> I can now reliably reproduce the problem on a Leopard machine at our
>> offices, running on a ppc platform with Erlang R12B-0 from macports
>> and the latest development snapshot of rabbit.

It's R12B-2 actually.


Matthias.

Guest
Posted: Tue Apr 15, 2008 8:16 pm
> The results indicate that this is probably a bug in Leopard, or a bug in
> the Erlang runtime that only manifests itself on Leopard.
>
> The trigger appears to be a tcp client closing the socket when the
> server is in the middle of writing to its peer.
>
> The next step is to construct a test case that doesn't involve rabbit.
>
Matthias,

I'm not sure whether this will help, but anyway: Leopard (and presumably
OS X in general) handles connection closed by the other peer by
returning POLLHUP from the poll rather than POLLIN as is common on most
platforms. Can that confuse Erlang/Rabbit?

Martin

Guest
Posted: Tue Apr 15, 2008 9:07 pm
Martin,

Martin Sustrik wrote:
> I'm not sure whether this will help, but anyway: Leopard (and presumably
> OS X in general) handles connection closed by the other peer by
> returning POLLHUP from the poll rather than POLLIN as is common on most
> platforms. Can that confuse Erlang/Rabbit?

That is certainly worth looking into, since a misdiagnosis of the event
might cause the Erlang runtime to think the socket is ready for
reading/writing when in fact it has been closed by the peer. Thanks for
the pointer!


Matthias.

Guest
Posted: Wed Apr 16, 2008 6:51 am
Michael, and anybody else who can spare a couple of minutes,

Matthias Radestock wrote:
> Michael Arnoldus wrote:
>> Since we are running on Intel HW, let me know if I can help by running
>> stuff or reproducing the problem.
>
> Thanks for the offer. I might take you up on it once I have a simple
> test case ready.

...which I now have. See attached.


To run this,

1) save the attached file in some directory

2) cd to that directory

3) run the erlang shell, i.e. 'erl'

4) monitor the CPU consumption of the erlang process (usually called
'beam' or 'beam.smp') with a program like 'top'

5) at the Erlang prompt, compile the program with
c(sock_spin).
which should return
{ok,sock_spin}

6) still at the Erlang prompt, pick a port (e.g. 5678) and run
sock_spin:working(5678).

7) connect to the chosen port with, say, netcat (telnet should work too,
but seems to be harder to kill; see next step), e.g.
nc localhost 5678 > /dev/null

8) terminate the connection, e.g. by ^C-ing netcat or killing the process.

9) At this point (it may take a few seconds) the Erlang shell should
return something like {error, closed} or {error, einval}. Check the CPU
usage of the Erlang process.

Now repeat steps 6-9 but call
sock_spin:broken(5678).
instead.

Finally, to quit the Erlang shell just type
q().
at the prompt.
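
If netcat is not to hand, the client side of steps 7 and 8 could also be driven from a second Erlang shell. The following is only a rough sketch and is not part of the attached test case: the module name sock_hangup and its hangup functions are made up for illustration, and a socket closed this way may not behave exactly like a killed netcat.

```erlang
-module(sock_hangup).
-export([hangup/1, hangup/2]).

%% Connect to localhost:Port, read one chunk to confirm that the
%% server has started writing, wait DelayMs milliseconds, then close
%% the socket while the server is still sending.
hangup(Port) ->
    hangup(Port, 1000).

hangup(Port, DelayMs) ->
    %% Passive mode: data is only read when we ask for it.
    {ok, Sock} = gen_tcp:connect("localhost", Port,
                                 [binary, {active, false}]),
    %% Length 0 means "whatever is available"; give up after 5 seconds.
    {ok, _Data} = gen_tcp:recv(Sock, 0, 5000),
    timer:sleep(DelayMs),
    %% Close without draining the rest of what the server has sent.
    gen_tcp:close(Sock).
```

Since sock_spin:working/1 and sock_spin:broken/1 block the shell until a send fails, this would be called from a second 'erl' shell, e.g.
sock_hangup:hangup(5678).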


The CPU consumption of the Erlang process reported in step 9 should be
near 0% at the end of both tests. However, on some systems the second
test leaves the Erlang process consuming 100% CPU, though the Erlang
shell remains responsive. I am interested in finding out which systems
exhibit this behaviour and which don't.
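
As an alternative to watching 'top', the CPU usage can probably also be estimated from inside the Erlang shell once the test call has returned. This is a minimal sketch under stated assumptions: the module name cpu_check and the function busy_percent/1 are hypothetical, and it relies on erlang:statistics(runtime) reporting the CPU time of the whole runtime system, so on an SMP emulator the figure may exceed 100.

```erlang
-module(cpu_check).
-export([busy_percent/1]).

%% Estimate how busy the Erlang runtime is over the next WindowMs
%% milliseconds: CPU time consumed divided by wall-clock time elapsed,
%% expressed as a percentage.
busy_percent(WindowMs) ->
    %% The first tuple element is the total since the runtime started,
    %% in milliseconds; using totals avoids relying on the shared
    %% "since last call" counters.
    {Cpu0, _}  = erlang:statistics(runtime),
    {Wall0, _} = erlang:statistics(wall_clock),
    timer:sleep(WindowMs),
    {Cpu1, _}  = erlang:statistics(runtime),
    {Wall1, _} = erlang:statistics(wall_clock),
    %% Guard against a zero-length window.
    100 * (Cpu1 - Cpu0) / max(Wall1 - Wall0, 1).
```

After step 9, a call such as
cpu_check:busy_percent(2000).
should then return a value close to 0 on an unaffected system and a value close to 100 (or more) if the runtime is spinning.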

When reporting your results please include information about your system
(if you are on Unix just run 'uname -a') and Erlang version (the version
number displayed when starting the Erlang shell will do just fine).


Regards,

Matthias.


Guest
Posted: Wed Apr 16, 2008 9:36 am
Matthias,

Nice work!!!

As expected I get 100% CPU with sock_spin:broken().

uname -a:
Darwin Hobbes.local 9.2.2 Darwin Kernel Version 9.2.2: Tue Mar 4
21:17:34 PST 2008; root:xnu-1228.4.31~1/RELEASE_I386 i386

erlang version:
Erlang (BEAM) emulator version 5.6.1 [source] [smp:2] [async-threads:0]
[kernel-poll:false]

I'll be happy to try it on other HW and/or other versions if you think
you need this.

Regards,

Michael

On Apr 16, 2008, at 8:50 , Matthias Radestock wrote:

> Michael, and anybody else who can spare a couple of minutes,
>
> Matthias Radestock wrote:
>> Michael Arnoldus wrote:
>>> Since we are running on Intel HW, let me know if I can help by
>>> running stuff or reproducing the problem.
>> Thanks for the offer. I might take you up on it once I have a
>> simple test case ready.
>
> ...which I now have. See attached.
>
>
> To run this,
>
> 1) save the attached file in some directory
>
> 2) cd to that directory
>
> 3) run the erlang shell, i.e. 'erl'
>
> 4) monitor the CPU consumption of the erlang process (usually called
> 'beam' or 'beam.smp') with a program like 'top'
>
> 5) at the Erlang prompt, compile the program with
> c(sock_spin).
> which should return
> {ok,sock_spin}
>
> 6) still at the Erlang prompt, pick a port (e.g. 5678) and run
> sock_spin:working(5678).
>
> 7) connect to the chosen port with, say, netcat (telnet should work
> too, but seems to be harder to kill; see next step), e.g.
> nc localhost 5678 > /dev/null
>
> 8) terminate the connection, e.g. by ^C-ing netcat or killing the
> process.
>
> 9) At this point (it may take a few seconds) the Erlang shell should
> return something like {error, closed} or {error, einval}. Check the
> CPU usage of the Erlang process.
>
> Now repeat steps 6-9 but call
> sock_spin:broken(5678).
> instead.
>
> Finally, to quit the Erlang shell just type
> q().
> at the prompt.
>
>
> The CPU consumption of the Erlang process reported in step 9 should
> be near 0% at the end of both tests. However, on some systems the
> second test leaves the Erlang process consuming 100% CPU, though the
> Erlang shell remains responsive. I am interested in finding out
> which systems exhibit this behaviour and which don't.
>
> When reporting your results please include information about your
> system (if you are on Unix just run 'uname -a') and Erlang version
> (the version number displayed when starting the Erlang shell will do
> just fine).
>
>
> Regards,
>
> Matthias.
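>
> -module(sock_spin).
>
> -compile(export_all).
>
> %% Listening socket with default options (active mode).
> working(Port) ->
>     spin(Port, []).
>
> %% Listening socket in passive mode ({active, false}).
> broken(Port) ->
>     spin(Port, [{active, false}]).
>
> %% Accept one connection and write to it until a send fails.
> spin(Port, Opts) ->
>     {ok, LSock} = gen_tcp:listen(Port, Opts),
>     {ok, Sock} = gen_tcp:accept(LSock),
>     Res = send(Sock, list_to_binary(lists:duplicate(10000, $A))),
>     ok = gen_tcp:close(LSock),
>     Res.
>
> %% Send B repeatedly; return the first non-ok result.
> send(Sock, B) ->
>     case gen_tcp:send(Sock, B) of
>         ok    -> send(Sock, B);
>         Other -> Other
>     end.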
>



Guest
Posted: Wed Apr 16, 2008 9:55 am
Michael,

Michael Arnoldus wrote:
> As expected I get 100% CPU with sock_spin:broken().

Excellent (well, in a way ;)

> uname -a:
> Darwin Hobbes.local 9.2.2 Darwin Kernel Version 9.2.2: Tue Mar 4
> 21:17:34 PST 2008; root:xnu-1228.4.31~1/RELEASE_I386 i386
>
> erlang version:
> Erlang (BEAM) emulator version 5.6.1 [source] [smp:2] [async-threads:0]
> [kernel-poll:false]

That's a useful data point since it's a slightly different version of
the O/S (9.2.2 on i386 vs 9.1.0 on ppc for me) and Erlang (R12B-1 vs
R12B-2 for me).

> I'll be happy to try it on other HW and/or other versions if you think
> you need this.

That would be great. I am particularly interested in the following
combinations:
- R11B-x on Mac OS X Leopard
- R12B-x on Mac OS X Tiger


Matthias.

Guest
Posted: Wed Apr 16, 2008 10:21 am
On Apr 16, 2008, at 11:54 , Matthias Radestock wrote:

> Michael,
>
> That would be great. I am particularly interested in the following
> combinations:
> - R11B-x on Mac OS X Leopard

100% CPU with sock_spin:broken().

uname -a:
Darwin AHP.local 9.2.2 Darwin Kernel Version 9.2.2: Tue Mar 4
21:17:34 PST 2008; root:xnu-1228.4.31~1/RELEASE_I386 i386

erlang version:
Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0]
[kernel-poll:false]

>
> - R12B-x on Mac OS X Tiger

No Tiger at work. I can try this at a friend's house, but it'll take a
day or two - let me know if that's interesting.

Michael



Guest
Posted: Wed Apr 16, 2008 4:46 pm
Michael,

Michael Arnoldus wrote:
>> - R11B-x on Mac OS X Leopard
>
> 100% CPU with sock_spin:broken().

cheers.

>> - R12B-x on Mac OS X Tiger
>
> No Tiger at work. I can try this at a friend's house, but it'll take a
> day or two - let me know if that's interesting.

Alexis has Tiger on his laptop and tried it there. It worked, i.e. no
spinning, which is consistent with the tests we conducted some weeks ago.

So the problem does indeed appear to be confined to Leopard.


Matthias.

Guest
Posted: Fri Apr 18, 2008 5:12 pm
All,

Matthias Radestock wrote:
> So the problem does indeed appear to be confined to Leopard.

The Erlang/OTP team have produced a patch and I have verified that it
works. The official fix will be available in the upcoming R12B-3 release
of Erlang/OTP.

See http://www.erlang.org/pipermail/erlang-bugs/2008-April/000745.html
for the details.


Matthias.

Guest
Posted: Fri Apr 18, 2008 7:59 pm
Matthias,

This is great news. Thank you!

Michael

On Apr 18, 2008, at 19:18 , Matthias Radestock wrote:

> All,
>
> Matthias Radestock wrote:
>> So the problem does indeed appear to be confined to Leopard.
>
> The Erlang/OTP team have produced a patch and I have verified that it
> works. The official fix will be available in the upcoming R12B-3
> release
> of Erlang/OTP.
>
> See http://www.erlang.org/pipermail/erlang-bugs/2008-April/000745.html
> for the details.
>
>
> Matthias.
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss@lists.rabbitmq.com
> http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss



All times are GMT