| Author |
Message |
|
| Guest |
Posted: Tue Apr 15, 2008 7:45 pm |
|
|
|
Guest
|
Michael,
Michael Arnoldus wrote:
> Thank you for your suggestion. There is no strace on Mac OS X, but I did
> find a way to see what the C-program was doing (see below).
I can now reliably reproduce the problem on a Leopard machine at our
offices, running on a ppc platform with Erlang R12B-0 from macports and
the latest development snapshot of rabbit.
The results indicate that this is probably a bug in Leopard, or a bug in
the Erlang runtime that only manifests itself on Leopard.
The trigger appears to be a tcp client closing the socket when the
server is in the middle of writing to its peer.
The next step is to construct a test case that doesn't involve rabbit.
Matthias.
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailing list |
| Guest |
Posted: Tue Apr 15, 2008 8:08 pm |
|
|
|
Guest
|
Matthias,
Thanks - wonderful - and it fits with the traces I observed.
Since we are running on Intel HW, let me know if I can help by running
stuff or reproduce the problem.
Michael
On Apr 15, 2008, at 21:44 , Matthias Radestock wrote:
> Michael,
>
> Michael Arnoldus wrote:
>
>> Thank you for your suggestion. There is no strace on Mac OS X, but
>> I did find a way to see what the C-program was doing (see below).
>
> I can now reliably reproduce the problem on a Leopard machine at our
> offices, running on a ppc platform with Erlang R12B-0 from macports
> and the latest development snapshot of rabbit.
>
> The results indicate that this is probably a bug in Leopard, or a
> bug in the Erlang runtime that only manifests itself on Leopard.
>
> The trigger appears to be a tcp client closing the socket when the
> server is in the middle of writing to its peer.
>
> The next step is to construct a test case that doesn't involve rabbit.
>
>
> Matthias.
| Guest |
Posted: Tue Apr 15, 2008 8:16 pm |
|
|
|
Guest
|
Michael,
Michael Arnoldus wrote:
> Since we are running on Intel HW, let me know if I can help by running
> stuff or reproduce the problem.
Thanks for the offer. I might take you up on it once I have a simple
test case ready.
Also, a minor correction to what I wrote ...
> On Apr 15, 2008, at 21:44 , Matthias Radestock wrote:
>> I can now reliably reproduce the problem on a Leopard machine at our
>> offices, running on a ppc platform with Erlang R12B-0 from macports
>> and the latest development snapshot of rabbit.
It's R12B-2 actually.
Matthias.
| Guest |
Posted: Tue Apr 15, 2008 8:16 pm |
|
|
|
Guest
|
> The results indicate that this is probably a bug in Leopard, or a bug in
> the Erlang runtime that only manifests itself on Leopard.
>
> The trigger appears to be a tcp client closing the socket when the
> server is in the middle of writing to its peer.
>
> The next step is to construct a test case that doesn't involve rabbit.
>
Matthias,
I'm not sure whether this will help, but anyway: Leopard (and presumably
OS X in general) handles connection closed by the other peer by
returning POLLHUP from the poll rather than POLLIN as is common on most
platforms. Can that confuse Erlang/Rabbit?
Martin
| Guest |
Posted: Tue Apr 15, 2008 9:07 pm |
|
|
|
Guest
|
Martin,
Martin Sustrik wrote:
> I'm not sure whether this will help, but anyway: Leopard (and presumably
> OS X in general) handles connection closed by the other peer by
> returning POLLHUP from the poll rather than POLLIN as is common on most
> platforms. Can that confuse Erlang/Rabbit?
That is certainly worth looking into since a misdiagnosis of the event
might cause the Erlang runtime to think the socket is ready for
reading/writing when in fact it has been closed by the peer. Thanks for
the pointer!
Matthias.
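
For comparison, here is a minimal sketch of how a peer close is expected to surface at the Erlang level when the runtime diagnoses the event correctly: as a {tcp_closed, Sock} message in active mode, and as {error, closed} from gen_tcp:recv in passive mode. The module name peer_close and its functions are hypothetical and not part of this thread's test case. The sketch only shows the behaviour visible to Erlang code and says nothing about how the runtime treats POLLHUP internally.

```erlang
-module(peer_close).
-export([active/0, passive/0]).

%% Hypothetical demo module; not part of the sock_spin test case.

%% Active mode: the runtime reads from the socket on our behalf and
%% reports a peer close as a {tcp_closed, Sock} message.
active() ->
    with_closed_peer([binary, {active, true}],
                     fun(Sock) ->
                             receive
                                 {tcp_closed, Sock} -> tcp_closed
                             after 5000 -> timeout
                             end
                     end).

%% Passive mode: nothing is delivered until we ask, so the peer close
%% is only seen as the result of an explicit recv.
passive() ->
    with_closed_peer([binary, {active, false}],
                     fun(Sock) ->
                             case gen_tcp:recv(Sock, 0, 5000) of
                                 {error, closed} -> closed;
                                 Other -> Other
                             end
                     end).

%% Open a loopback connection, close the client end, then let
%% Observe report what the server end sees.
with_closed_peer(Opts, Observe) ->
    {ok, LSock} = gen_tcp:listen(0, Opts),
    {ok, Port} = inet:port(LSock),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [{active, false}]),
    {ok, Sock} = gen_tcp:accept(LSock),
    ok = gen_tcp:close(Client),
    Res = Observe(Sock),
    ok = gen_tcp:close(Sock),
    ok = gen_tcp:close(LSock),
    Res.
```

On a system where the close is reported properly, peer_close:active() and peer_close:passive() should both return promptly rather than hitting the 5 second timeout.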
| Guest |
Posted: Wed Apr 16, 2008 6:51 am |
|
|
|
Guest
|
Michael, and anybody else who can spare a couple of minutes,
Matthias Radestock wrote:
> Michael Arnoldus wrote:
>> Since we are running on Intel HW, let me know if I can help by running
>> stuff or reproduce the problem.
>
> Thanks for the offer. I might take you up on it once I have a simple
> test case ready.
...which I now have. See attached.
To run this,
1) save the attached file in some directory
2) cd to that directory
3) run the Erlang shell, i.e. 'erl'
4) monitor the CPU consumption of the Erlang process (usually called
'beam' or 'beam.smp') with a program like 'top'
5) at the Erlang prompt, compile the program with
    c(sock_spin).
which should return
    {ok,sock_spin}
6) still at the Erlang prompt, pick a port (e.g. 5678) and run
    sock_spin:working(5678).
7) connect to the chosen port with, say, netcat (telnet should work too,
but seems to be harder to kill; see next step), e.g.
    nc localhost 5678 > /dev/null
8) terminate the connection, e.g. by ^C-ing netcat or killing the process.
9) At this point (it may take a few seconds) the Erlang shell should
return something like {error, closed} or {error, einval}. Check the CPU
usage of the Erlang process.
Now repeat steps 6-9 but call
    sock_spin:broken(5678).
instead.
Finally, to quit the Erlang shell just type
    q().
at the prompt.
The CPU consumption of the Erlang process reported in step 9 should be
near 0% at the end of both tests. However, on some systems the second
test leaves the Erlang process consuming 100% CPU, though the Erlang
shell remains responsive. I am interested in finding out which systems
exhibit this behaviour and which don't.
When reporting your results please include information about your system
(if you are on Unix just run 'uname -a') and Erlang version (the version
number displayed when starting the Erlang shell will do just fine).
Regards,
Matthias.
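
Steps 7 and 8 can also be driven from a second Erlang shell instead of netcat. The following is a minimal sketch under that assumption; the module name sock_spin_client and the function drop/2 are hypothetical and not part of the attached test case. It connects to the chosen port, discards a fixed number of received chunks, and then closes the socket while the server is still sending.

```erlang
-module(sock_spin_client).
-export([drop/2]).

%% Hypothetical helper; not part of the original attachment.
%% Connects to localhost:Port, reads and discards up to Chunks
%% pieces of data, then closes the socket from the client side.
drop(Port, Chunks) ->
    {ok, Sock} = gen_tcp:connect("localhost", Port,
                                 [binary, {active, false}]),
    drain(Sock, Chunks),
    gen_tcp:close(Sock).

%% Read and throw away data until the chunk budget is used up
%% or the connection fails.
drain(_Sock, 0) ->
    ok;
drain(Sock, N) ->
    case gen_tcp:recv(Sock, 0) of
        {ok, _Data} -> drain(Sock, N - 1);
        {error, _Reason} -> ok
    end.
```

With sock_spin:working(5678) or sock_spin:broken(5678) waiting in the first shell, running c(sock_spin_client). followed by sock_spin_client:drop(5678, 1000). in a second shell should play the role of starting and then killing netcat.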
| Guest |
Posted: Wed Apr 16, 2008 9:36 am |
|
|
|
Guest
|
Matthias,
Nice work!!!
As expected I get 100% CPU with sock_spin:broken().
uname -a:
Darwin Hobbes.local 9.2.2 Darwin Kernel Version 9.2.2: Tue Mar 4
21:17:34 PST 2008; root:xnu-1228.4.31~1/RELEASE_I386 i386
erlang version:
Erlang (BEAM) emulator version 5.6.1 [source] [smp:2] [async-threads:
0] [kernel-poll:false]
I'll be happy to try it on other HW and/or other versions if you think
you need this.
Regards,
Michael
On Apr 16, 2008, at 8:50 , Matthias Radestock wrote:
> Michael, and anybody else who can spare a couple of minutes,
>
> Matthias Radestock wrote:
>> Michael Arnoldus wrote:
>>> Since we are running on Intel HW, let me know if I can help by
>>> running stuff or reproduce the problem.
>> Thanks for the offer. I might take you up on it once I have a
>> simple test case ready.
>
> ...which I now have. See attached.
>
>
> To run this,
>
> 1) save the attached file in some directory
>
> 2) cd to that directory
>
> 3) run the erlang shell, i.e. 'erl'
>
> 4) monitor the CPU consumption of the erlang process (usually called
> 'beam' or 'beam.smp') with a program like 'top'
>
> 5) at the Erlang prompt, compile the program with
> c(sock_spin).
> which should return
> {ok,sock_spin}
>
> 6) still at the Erlang prompt, pick a port (e.g. 5678) and run
> sock_spin:working(5678).
>
> 7) connect to the chosen port with, say, netcat (telnet should work
> too, but seems to be harder to kill; see next step), e.g.
> nc localhost 5678 > /dev/null
>
> 8) terminate the connection, e.g. by ^C-ing netcat or killing the
> process.
>
> 9) At this point (it may take a few seconds) the Erlang shell should
> return something like {error, closed} or {error, einval}. Check the
> CPU usage of the Erlang process.
>
> Now repeat steps 6-9 but call
> sock_spin:broken(5678).
> instead.
>
> Finally, to quit the Erlang shell just type
> q().
> at the prompt.
>
>
> The CPU consumption of the Erlang process reported in step 9 should
> be near 0% at the end of both tests. However, on some systems the
> second test leaves the Erlang process consuming 100% CPU, though the
> Erlang shell remains responsive. I am interested in finding out
> which systems exhibit this behaviour and which don't.
>
> When reporting your results please include information about your
> system (if you are on Unix just run 'uname -a') and Erlang version
> (the version number displayed when starting the Erlang shell will do
> just fine).
>
>
> Regards,
>
> Matthias.
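> -module(sock_spin).
>
> -compile(export_all).
>
> %% Run the test with the default socket options.
> working(Port) ->
>     spin(Port, []).
>
> %% Run the same test with the socket in passive mode.
> broken(Port) ->
>     spin(Port, [{active, false}]).
>
> %% Accept one connection and send to it until a send fails.
> spin(Port, Opts) ->
>     {ok, LSock} = gen_tcp:listen(Port, Opts),
>     {ok, Sock} = gen_tcp:accept(LSock),
>     Res = send(Sock, list_to_binary(lists:duplicate(10000, $A))),
>     ok = gen_tcp:close(LSock),
>     Res.
>
> %% Keep sending B; return the first result that is not ok.
> send(Sock, B) ->
>     case gen_tcp:send(Sock, B) of
>         ok -> send(Sock, B);
>         Other -> Other
>     end.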
>
| Guest |
Posted: Wed Apr 16, 2008 9:55 am |
|
|
|
Guest
|
Michael,
Michael Arnoldus wrote:
> As expected I get 100% CPU with sock_spin:broken().
Excellent (well, in a way).
> uname -a:
> Darwin Hobbes.local 9.2.2 Darwin Kernel Version 9.2.2: Tue Mar 4
> 21:17:34 PST 2008; root:xnu-1228.4.31~1/RELEASE_I386 i386
>
> erlang version:
> Erlang (BEAM) emulator version 5.6.1 [source] [smp:2] [async-threads:0]
> [kernel-poll:false]
That's a useful data point since it's a slightly different version of
the O/S (9.2.2 on i386 vs 9.1.0 on ppc for me) and Erlang (R12B-1 vs
R12B-2 for me).
> I'll be happy to try it on other HW and/or other versions if you think
> you need this.
That would be great. I am particularly interested in the following
combinations:
- R11B-x on Mac OS X Leopard
- R12B-x on Mac OS X Tiger
Matthias.
| Guest |
Posted: Wed Apr 16, 2008 10:21 am |
|
|
|
Guest
|
On Apr 16, 2008, at 11:54 , Matthias Radestock wrote:
> Michael,
>
> That would be great. I am particularly interested in the following
> combinations:
> - R11B-x on Mac OS X Leopard
100% CPU with sock_spin:broken().
uname -a:
Darwin AHP.local 9.2.2 Darwin Kernel Version 9.2.2: Tue Mar 4
21:17:34 PST 2008; root:xnu-1228.4.31~1/RELEASE_I386 i386
erlang version:
Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0]
[kernel-poll:false]
>
> - R12B-x on Mac OS X Tiger
No Tiger at work. I can try this at a friend's house, but it'll take a
day or two - let me know if that's interesting.
Michael
| Guest |
Posted: Wed Apr 16, 2008 4:46 pm |
|
|
|
Guest
|
Michael,
Michael Arnoldus wrote:
>> - R11B-x on Mac OS X Leopard
>
> 100% CPU with sock_spin:broken().
cheers.
>> - R12B-x on Mac OS X Tiger
>
> No Tiger at work. I can try this at a friend's house, but it'll take a
> day or two - let me know if that's interesting.
Alexis has Tiger on his laptop and tried it there. It worked, i.e. no
spinning, which is consistent with the tests we conducted some weeks ago.
So the problem does indeed appear to be confined to Leopard.
Matthias.
| Guest |
Posted: Fri Apr 18, 2008 5:12 pm |
|
|
|
Guest
|
| Guest |
Posted: Fri Apr 18, 2008 7:59 pm |
|
|
|
Guest
|
All times are GMT