Erlang/OTP Forums


Advanced Erlang/OTP ~ Efficiency of monitor?

jwatte
Posted: Tue Nov 09, 2010 6:08 am
User Joined: 10 Feb 2010 Posts: 34
Suppose I have a crossbar of five front-end nodes and five back-end nodes.

Each node has on the order of 100,000 processes. Each front-end process may have located about a dozen back-end processes and keeps a list of them, and each back-end process keeps a list of the "locator" front-end processes that have located it. Each node is a dual quad-core 64-bit Xeon with more RAM than you could touch in a full second of CPU time, and the nodes are closely co-located on a non-blocking gigabit Ethernet switch.

On front-end node A, I may have processes A1, A2, ..., and on B, B1, B2, ...
On back-end node x, I have processes x1, x2, ... and on y, I have y1, y2, ...
If A1 locates y2, then the list of located processes in A1 will include y2, and the list of locators for y2 will include A1.
Periodically, a message will arrive at a back-end process and be sent to all of its locator front-end processes, and/or a message will arrive at a front-end process and be sent to a particular located back-end process. The rate varies between 1 message per 10 seconds and 10 messages per second, and the system should meet a "soft real-time" target of 99.999% of messages taking less than 100 milliseconds to pass through the entire system.

Now, the sad fact of life is that nodes may die. I wish it weren't so, but cheap 1U hardware just doesn't have the durability it used to. Plus, I'm a klutz, and often trip over power cords.

What I want is for each front-end process that has located a back-end process on a dying back-end node to be told when that back-end node dies, and for each back-end process that has a locator on a front-end node to be told when the front-end node dies.

I can do this with monitor or link pretty easily. However, given the sheer number of processes, is this going to be a performance problem? Processes come and go all the time, and if the implementation of any of those lists is like a linear list, performance death will ensue. I'd like to know about this before actually spinning up the full-size experiment if possible, so any suggestions or learnings you can impart would be much appreciated.
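For concreteness, the monitor-based approach might look roughly like the sketch below. It is only an illustration: the module name `locator_monitor`, the message shapes `{locate, Pid}`, `{monitoring, Pid}` and `{lost, Pid, Reason}`, and the use of a map keyed by monitor reference (which needs OTP 17 or later) are all assumptions, not anything from the actual system. The demo runs on a single node, so the back-end process exits normally; when a remote node dies, the same 'DOWN' message arrives with the reason noconnection.

```erlang
-module(locator_monitor).
-export([demo/0]).

%% Hypothetical front-end process loop.
%% Located maps each monitor reference to the back-end pid it watches.
front_end(Located, Report) ->
    receive
        {locate, BackEnd} ->
            %% One monitor per located back-end process.
            Ref = erlang:monitor(process, BackEnd),
            Report ! {monitoring, BackEnd},
            front_end(Located#{Ref => BackEnd}, Report);
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Reason is noconnection if the back-end's node went away.
            Report ! {lost, Pid, Reason},
            front_end(maps:remove(Ref, Located), Report)
    end.

%% Single-node demonstration: the "back end" is just a local process.
demo() ->
    Self = self(),
    BackEnd = spawn(fun() -> receive stop -> ok end end),
    FrontEnd = spawn(fun() -> front_end(#{}, Self) end),
    FrontEnd ! {locate, BackEnd},
    %% Wait until the monitor is in place before stopping the back end.
    receive {monitoring, BackEnd} -> ok end,
    BackEnd ! stop,
    receive
        {lost, BackEnd, Reason} -> {ok, Reason}
    after 1000 -> timeout
    end.
```

The back-end side would be symmetric, with each back-end process monitoring its locators. Whether the runtime's own bookkeeping for these monitors scales to this many processes is exactly the open question here; the sketch says nothing about that.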

The alternative is for me to keep a process that monitors nodes on each node, and then broadcast to all node-local processes when a downed node is detected elsewhere. However, that's pretty likely to just be a slight variation on what the runtime is already doing for me, so if I could avoid that, that would be swell!
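A minimal sketch of that alternative, one watcher process per node that subscribes to node status messages and rebroadcasts them locally, could look like this. The module name `node_watcher`, the `{subscribe, Pid}` and `{node_lost, Node}` messages, and the node name in the demo are made up for illustration. `net_kernel:monitor_nodes/1` is the standard call that makes the runtime deliver `{nodeup, Node}` and `{nodedown, Node}` messages to the caller. The demo fakes the `nodedown` message so it can run on a single node.

```erlang
-module(node_watcher).
-export([start/0, subscribe/1, demo/0]).

%% Start the per-node watcher and ask the runtime for node status messages.
start() ->
    spawn(fun() ->
        net_kernel:monitor_nodes(true),
        loop([])
    end).

%% Hypothetical API: a local process registers interest in node failures.
subscribe(Watcher) ->
    Watcher ! {subscribe, self()},
    ok.

loop(Subscribers) ->
    receive
        {subscribe, Pid} ->
            %% Sketch only: dead subscribers are never removed.
            loop([Pid | Subscribers]);
        {nodedown, Node} ->
            %% Broadcast is linear in the number of local subscribers.
            lists:foreach(fun(Pid) -> Pid ! {node_lost, Node} end,
                          Subscribers),
            loop(Subscribers);
        {nodeup, _Node} ->
            loop(Subscribers)
    end.

%% Single-node demonstration with a simulated nodedown message.
demo() ->
    Watcher = start(),
    ok = subscribe(Watcher),
    %% Simulated: in a real cluster the runtime sends this itself.
    Watcher ! {nodedown, 'backend_x@host'},
    receive
        {node_lost, Node} -> {ok, Node}
    after 1000 -> timeout
    end.
```

Note that this version tells every subscriber about every lost node; each process would still have to check its own list to see whether it cares, which is part of why it feels like a reimplementation of what monitor already does.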

I guess this largely boils down to the question: Do monitor() relations live in a lightweight hash table, or in some more expensive data structure?

All times are GMT