| Author |
Message |
|
| 0x6e6562 |
Posted: Fri May 09, 2008 9:57 am |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Ed,
Thanks for taking time to look into this so deeply.
You have indeed found a bug in the network client with the fact that it is not shutting the per-channel writer processes down. See below.
On 8 May 2008, at 22:12, Edwin Fine wrote:
Quote:
Sorry for the long email. Trying to get to the bottom of the problems. I have done a LOT more investigation into the Erlang network client, and I believe there are two related bugs, one in the client, and the other I don't know where, or perhaps I am abusing how AMQP is supposed to work.
Bug #1: rabbit_writer not shut down
The first (and easier) bug, which I have fixed eventually, is that every time you start a channel, the network client creates a rabbit_writer, but never shuts it down. Therefore, each channel that is created leaves behind a rabbit_writer process.
What happens is that in the handshake, the network driver starts a rabbit_writer on channel 0. I believe this is used for the connection itself. Its pid is stored in writer_pid in the connection_state. This writer gets cleaned up properly when the connection is shut down. There is no problem with this.
Thereafter, when a channel is opened and amqp_network_driver:open_channel is called, another writer is started (correctly - need one per connection, right?). There is a singleton reader. Anyway, this writer is never terminated. The writer is registered as a peer to the channel, using amqp_channel:register_direct_peer. This causes a bug, because the registered peer is never shut down, probably because the direct peer never should be shut down... but this is the NETWORK peer.
So what I did (and you may have a better way) is to add another function, amqp_channel:register_network_peer. This sets a "direct" flag in the channel_state (which I had to add) to false. Calling register_direct_peer sets the flag to true. When amqp_channel:channel_cleanup() is eventually called (and I had to do something for this, too), it checks to see if direct is false. If so, it stops the writer in writer_pid, otherwise it leaves it alone.
I can see what the intention is. The main thing is that you have understood how all of the processes hang together which is a little tricky in the network case because the client is using a common codebase with the server to manage socket IO and AMQP channels. This is why it may not be |
|
|
| Back to top |
|
| Guest |
Posted: Fri May 09, 2008 10:57 am |
|
|
|
Guest
|
Ben,
Ben Hood wrote:
> To achieve this I think the cleanup should be contained within the
> network driver either as a callback to allow the network driver to stop
> the channel writer process or even linking the channel process with the
> writer process so that when the channel exits, the writer process
> receives a signal as well. I think I prefer the latter, but this would
> mean changing the writer module, which is core module of the server and
> we'd need to look at the implications of this.
The server is already linking the writer and the channel, though the way
that is done may not be immediately obvious.
Matthias.
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist |
|
|
| Back to top |
|
| 0x6e6562 |
Posted: Fri May 09, 2008 11:11 am |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Matthias,
On 9 May 2008, at 11:57, Matthias Radestock wrote:
> Ben Hood wrote:
>> To achieve this I think the cleanup should be contained within the
>> network driver either as a callback to allow the network driver to
>> stop the channel writer process or even linking the channel process
>> with the writer process so that when the channel exits, the writer
>> process receives a signal as well. I think I prefer the latter, but
>> this would mean changing the writer module, which is core module of
>> the server and we'd need to look at the implications of this.
>
> The server is already linking the writer and the channel, though the
> way that is done may not be immediately obvious.
Are you talking about start_link/4 in rabbit_channel? If I'm not
mistaken, this is linking the writing and the channel in the direct
case, but not in the network case.
In the network driver, the writer is started in the start_writer/2
function on a per channel basis:
start_writer(Sock, Channel) ->
rabbit_writer:start(Sock, Channel, ?FRAME_MIN_SIZE).
This then starts a new process to translate Erlang messages into AMQP
frames.
But this loop process is not linked to anything as far as I can see,
which is the behaviour that Edwin is reporting.
Ben
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Fri May 09, 2008 11:24 am |
|
|
|
Guest
|
Ben Hood wrote:
> On 9 May 2008, at 11:57, Matthias Radestock wrote:
>> The server is already linking the writer and the channel, though the
>> way that is done may not be immediately obvious.
>
> Are you talking about start_link/4 in rabbit_channel? If I'm not
> mistaken, this is linking the writing and the channel in the direct
> case, but not in the network case.
I was talking about what the server does. Nothing to do with the Erlang
client, other than they share a code base. You appeared to be saying
that linking the channel and the writer requires changes to that code
base. I want to understand why that is the case, given that the server
is establish precisely these kinds of links using that code base. Or, to
put it differently, if the server can do it, why can't the Erlang client?
Matthias.
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist |
|
|
| Back to top |
|
| 0x6e6562 |
Posted: Fri May 09, 2008 11:41 am |
|
|
|
User
Joined: 12 Jul 2007
Posts: 250
|
Matthias,
On 9 May 2008, at 12:24, Matthias Radestock wrote:
> Ben Hood wrote:
>> On 9 May 2008, at 11:57, Matthias Radestock wrote:
>>> The server is already linking the writer and the channel, though
>>> the way that is done may not be immediately obvious.
>> Are you talking about start_link/4 in rabbit_channel? If I'm not
>> mistaken, this is linking the writing and the channel in the
>> direct case, but not in the network case.
>
> I was talking about what the server does. Nothing to do with the
> Erlang client, other than they share a code base. You appeared to be
> saying that linking the channel and the writer requires changes to
> that code base. I want to understand why that is the case, given
> that the server is establish precisely these kinds of links using
> that code base. Or, to put it differently, if the server can do it,
> why can't the Erlang client?
That is right, I did actually say it may need a change to the server
code. But I was thinking aloud rather than proposing something
definite because I don't want to change anything in the server without
good reason.
Actually on closer inspection it may reasonably straight forward to
do. The Pid of the writer process already being registered with the
channel process, so it would be a case of either linking the two
processes together or having the channel send the writer a shutdown
message when the channel exits.
Ben
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Post received from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Fri May 09, 2008 1:37 pm |
|
|
|
Guest
|
Ben,
Quote:
In your patch, you also added an extra method to handle a spurious consume_ok message;
%% Edwin Fine bugfix: This is actually being called wrong from somewhere,
%% but this will fix it.
handle_method({'basic.consume_ok', ConsumerTag}, State) ->
|
|
|
| Back to top |
|
| Guest |
Posted: Fri May 09, 2008 3:05 pm |
|
|
|
Guest
|
I wrote:
"Please understand that all the code I have submitted (other than amqp_channel) is a pure hack to try to expose and duplicate the errors!".
That should have read, "other than rbmq_admin.erl" - the module that I wrote to do admin tasks, which is only a semi-hack
Ed
On Fri, May 9, 2008 at 9:30 AM, Edwin Fine <rabbitmq-discuss_efine@usa.net (rabbitmq-discuss_efine@usa.net)> wrote:
Quote: Ben,
Quote:
In your patch, you also added an extra method to handle a spurious consume_ok message;
%% Edwin Fine bugfix: This is actually being called wrong from somewhere,
%% but this will fix it.
handle_method({'basic.consume_ok', ConsumerTag}, State) ->
|
|
|
| Back to top |
|
|
|
All times are GMT
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum You cannot attach files in this forum You cannot download files in this forum
|
|
|