Erlang questions mailing list: Mnesia does not detect netsplit

Posted: Thu Sep 29, 2011 8:55 am
Hi,

We found a case where mnesia does not detect a netsplit.

Let's say we are running two mnesia nodes, A and B.
At startup, node A can't connect to node B (specified in the mnesia
config parameter extra_db_nodes). In this case node B is actually
running, but because of a temporary network issue, or because node B
is heavily loaded, net_kernel:connect fails. When nodes A and B
eventually do connect (for example because a non-mnesia process sends
a message between the nodes), mnesia does not detect the split, and
the two islands continue to run separately.

Note that when we say that mnesia does not detect the netsplit, we
mean that mnesia does not generate any 'inconsistent_database' event.
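
To make concrete what "detecting" means here, the following is a minimal sketch of a subscriber process that waits for that event. The module name split_events and the helper classify/1 are made up for illustration; only mnesia:subscribe/1 and the event tuple shape (as seen in the flush() output in this post) come from mnesia itself. It assumes mnesia is already started on the node.

```erlang
-module(split_events).
-export([start/0, classify/1]).

%% Spawn a process that subscribes to mnesia system events.
%% Assumes mnesia is already running; otherwise subscribe/1
%% returns {error, _} and the match below crashes the process.
start() ->
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        {mnesia_system_event, Event} ->
            case classify(Event) of
                split ->
                    error_logger:warning_msg("mnesia reports a split: ~p~n",
                                             [Event]);
                other ->
                    ok
            end,
            loop()
    end.

%% Hypothetical pure helper: is this the event we are waiting for?
classify({inconsistent_database, _Context, _Node}) -> split;
classify(_Other) -> other.
```

In the problematic scenario described above, this process would simply never log anything, because no such event is delivered.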

How to reproduce.
* In this example we simulate a network problem (net_kernel:connect
failure) by having the two nodes use different cookies.
------------------
%% Node A, started with cookie a
$ erl -name test1@127.0.0.1 -mnesia schema_location ram -mnesia extra_db_nodes "['test2@127.0.0.1']" -setcookie a
(test1@127.0.0.1)1> application:start(mnesia), mnesia:subscribe(system), mnesia:create_table(my_table, []).
%% Node B, started with cookie b, so the nodes cannot connect yet
$ erl -name test2@127.0.0.1 -mnesia schema_location ram -mnesia extra_db_nodes "['test1@127.0.0.1']" -setcookie b
(test2@127.0.0.1)1> application:start(mnesia), mnesia:subscribe(system), mnesia:create_table(my_other_table, []).
%% Connect nodes
(test1@127.0.0.1)2> erlang:set_cookie(node(), b), net_kernel:connect('test2@127.0.0.1').
(test1@127.0.0.1)3> nodes().
['test2@127.0.0.1']
(test1@127.0.0.1)4> mnesia:info().
...
running db nodes = ['test1@127.0.0.1']
stopped db nodes = ['test2@127.0.0.1']
...

------------------
Expected behaviour: the subscriber gets an 'inconsistent_database' event.
Actual behaviour: the subscriber does not get any event.

Compare this to the following case, where mnesia correctly detects an inconsistent database:
------------------
%% Node A
$ erl -name test1@127.0.0.1 -mnesia schema_location ram -mnesia extra_db_nodes "['test2@127.0.0.1']" -setcookie a
(test1@127.0.0.1)1> application:start(mnesia), mnesia:subscribe(system), mnesia:create_table(my_table, []).
%% Node B, same cookie, so the nodes connect at startup
$ erl -name test2@127.0.0.1 -mnesia schema_location ram -mnesia extra_db_nodes "['test1@127.0.0.1']" -setcookie a
(test2@127.0.0.1)1> application:start(mnesia), mnesia:subscribe(system), mnesia:create_table(my_other_table, []).
%% Force a disconnect and reconnect from node B
(test2@127.0.0.1)2> net_kernel:disconnect('test1@127.0.0.1').
(test2@127.0.0.1)3> net_kernel:connect('test1@127.0.0.1').
(test2@127.0.0.1)4> flush().
Shell got {mnesia_system_event,{mnesia_down,'test1@127.0.0.1'}}
Shell got {mnesia_system_event,
              {inconsistent_database,running_partitioned_network,
                  'test1@127.0.0.1'}}
------------------

We found that the mnesia code that detects netsplits is in
mnesia_monitor. It uses net_kernel:monitor_nodes(true) to monitor
nodes going up and down. In the problematic scenario, when
mnesia_monitor gets the 'nodeup', it seems to ignore it, since no
node down has been seen for that node.
Trace:
(<0.53.0>) call mnesia_monitor:handle_info({nodeup,'test1@127.0.0.1'},{state,<0.52.0>,[],[],true,[],undefined,[]})
(<0.53.0>) call mnesia_recover:has_mnesia_down('test1@127.0.0.1')
(<0.53.0>) returned from mnesia_recover:has_mnesia_down/1 -> false
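
Detecting the split ourselves could be sketched roughly as follows: monitor node connections the same way mnesia_monitor does, and after a 'nodeup' check whether any node we expect to share a schema with is connected at the distribution level but missing from mnesia's running db nodes. This is only a sketch under assumptions: the module name split_detector, the Expected list and the GraceMs delay are hypothetical, and the delay is there because in the normal case mnesia may need a moment to merge after the connection comes up.

```erlang
-module(split_detector).
-export([start/2, suspects/3]).

%% Expected: db nodes we expect to be merged with (for example the
%% extra_db_nodes value). GraceMs: how long to wait after a nodeup
%% before checking. Both are assumptions of this sketch.
start(Expected, GraceMs) ->
    spawn(fun() ->
                  ok = net_kernel:monitor_nodes(true),
                  loop(Expected, GraceMs)
          end).

loop(Expected, GraceMs) ->
    receive
        {nodeup, _Node} ->
            %% Give mnesia time to merge before we look.
            erlang:send_after(GraceMs, self(), check),
            loop(Expected, GraceMs);
        check ->
            Running = mnesia:system_info(running_db_nodes),
            case suspects(Expected, nodes(), Running) of
                [] ->
                    ok;
                Nodes ->
                    error_logger:warning_msg(
                      "connected but not merged in mnesia: ~p~n", [Nodes])
            end,
            loop(Expected, GraceMs);
        _Other ->
            %% nodedown and anything else is ignored here.
            loop(Expected, GraceMs)
    end.

%% Expected nodes that are connected but not running db nodes.
suspects(Expected, Connected, RunningDb) ->
    [N || N <- Expected,
          lists:member(N, Connected),
          not lists:member(N, RunningDb)].
```

With the reproduction above, calling split_detector:start(['test2@127.0.0.1'], 5000) on test1 before connecting the nodes should report test2 as a suspect, since it shows up in nodes() but not in the running db nodes.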

Does anyone have an idea about how we could work around this issue? If
we detected the split ourselves, is there any way we could get
mnesia to reconnect the nodes?
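
One thing that might be tried is mnesia:change_config(extra_db_nodes, Nodes), which asks a running mnesia to connect to the given nodes and returns {ok, ConnectedNodes}. The sketch below only wraps that call and interprets the result; the module name split_heal and the return atoms are hypothetical. Whether mnesia actually merges the two islands in this scenario is an open question, and an {ok, []} result would mean it did not.

```erlang
-module(split_heal).
-export([try_reconnect/1, interpret/2]).

%% Ask the local mnesia to connect to Nodes and report the outcome.
try_reconnect(Nodes) ->
    interpret(Nodes, mnesia:change_config(extra_db_nodes, Nodes)).

%% Pure helper: compare the nodes we wanted with the nodes mnesia
%% says it connected to.
interpret(Wanted, {ok, Connected}) ->
    case Wanted -- Connected of
        [] -> ok;
        Missing -> {still_split, Missing}
    end;
interpret(_Wanted, {error, Reason}) ->
    {error, Reason}.
```

If the result is {still_split, _}, a fallback such as restarting mnesia on one of the nodes would probably still be needed.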

Regards
Jonas
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Posted: Thu Sep 29, 2011 9:03 am
FYI. I posted a suggestion on the mailing list for a network partition detector application.

http://erlang.org/pipermail/erlang-questions/2011-August/060702.html

If you have any questions, please send them to me off list.

thanks,

Joseph Norton
norton@alum.mit.edu



On Sep 29, 2011, at 5:55 PM, Jonas Boberg wrote:

> [...]
