Erlang/OTP Forums

Author Message

<  Erlang questions mailing list  ~  Garbage collection

bjorn at erix.ericsson.se
Posted: Thu Sep 02, 1999 11:10 am Reply with quote
Guest
Klacke <klacke_at_bluetail.com> writes:

>
>
> David Gould writes:
> > I might be interested in working on this. I am curious to know what has
> > prevented this in the past,
>
>
> It's a bit hard, to make the gc non blocking ggc.c needs to rewritten
> into a reentrant collector. This is a bit hard and it also makes
> the collector a bit slower unless some real clever tricks are
> applied.

Yes, I think keeping the performance is the real problem.

>
> Furthermore, when a process has garbed and the garb was
> not finished (since it took too long time) and execution
> was resumed instead, well then we have a fairly complex
> situation. Imagine that we need to send a message to such
> a "half-garbed" process, we need to allocate som memory there
> for the message. Where do we allocate that memory, hardly
> on any of the heaps that are "half-garbed" ??

Messages sent to a process are never put directly onto the heap
of the receiving process. (They used to be in older version of Erlang.)
Instead, each message is put into a separately allocated message
buffer outside the heap.

Therefore, this should not be a problem. But you make must take good
care to keep track of the state (size of data in messages and which
messages that are new and so on).

>
> (Or do we suspend the sender !!!, deadlocks ???)
>
> Thus the situation which lead to this "half-garb", was probably
> that the live data was large. Interupting the gc of this
> large data set leaves us with a whole lot of memory which
> is not released. The original two heaps of the process as
> well as the new heap we're garbing to. Three heaps.

That is a real problem.

/Bjorn Gustavsson




Post generated using Mail2Forum (http://m2f.sourceforge.net)
klacke at bluetail.com
Posted: Thu Sep 02, 1999 11:23 am Reply with quote
Guest
Bjorn Gustavsson writes:
>
> Messages sent to a process are never put directly onto the heap
> of the receiving process. (They used to be in older version of Erlang.)
> Instead, each message is put into a separately allocated message
> buffer outside the heap.
>

I know, this the situation in open source erlang as well.
However the message bugger queue is part of the root set and
can thus not be easily fiddled with if the process
is in an interupted gc state.

> Therefore, this should not be a problem. But you make must take good
> care to keep track of the state (size of data in messages and which
> messages that are new and so on).
>

Either that, or suspend the sender which is a whole lot
easier. I wrote "deadlocks ???" and I've thought a bit about
that now and I don't think it's an issue. ?? Think hard.

> >
> > Thus the situation which lead to this "half-garb", was probably
> > that the live data was large. Interupting the gc of this
> > large data set leaves us with a whole lot of memory which
> > is not released. The original two heaps of the process as
> > well as the new heap we're garbing to. Three heaps.
>
> That is a real problem.
>

Yes, on the other hand one of the more common situations
for a process to get a really huge heap is that somebody
is bombarding it with messages and the process can't keep
up with the processing of all async messages.
If the sender is suspended, it changes that situation a bit.

An other question, what about distributed messages if the sender is
to be suspended. Set the dist port to blocked, thereby
inhibiting *all* dist messages on that port just because one
process is garbing ??


/klacke




Post generated using Mail2Forum (http://m2f.sourceforge.net)
etxuwig at etxb.ericsson.
Posted: Thu Sep 02, 1999 1:41 pm Reply with quote
Guest
On Wed, 1 Sep 1999, David Gould wrote:

dg>I might be interested in working on this. I am curious to know what has
dg>prevented this in the past, the GC itself seems fairly compact and looks
dg>(ok, I only spent 10 minutes peeking at it) fairly clean.

I believe that it was considered a good idea, but it wasn't acted on
because other things were given higher priority.


dg>Also, any thoughts on how to proceed or things I need to be aware of
dg>outside gcc.c are welcome.

Sorry, I can't help you there. I don't understand the GC -- I just use it.

/Uffe

Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etx.ericsson.se>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuv
ug-erlang at davidb.org
Posted: Thu Sep 02, 1999 2:20 pm Reply with quote
Guest
>>>>> On Thu, 2 Sep 1999 11:22:52 +0200 (CEST), Klacke <klacke_at_bluetail.com> said:

> David Gould writes:
>> I might be interested in working on this. I am curious to know what has
>> prevented this in the past,

> It's a bit hard, to make the gc non blocking ggc.c needs to rewritten
> into a reentrant collector. This is a bit hard and it also makes
> the collector a bit slower unless some real clever tricks are
> applied.

> ...

I found a bunch of good papers on garbage collection (realtime as
well) on <ftp://ftp.cs.utexas.edu/pub/garbage/>. One of the papers
discusses a real-time garbage collector that only needs to coordinate
with the "mutators" when the mutate memory. We have a nice advantage
here that there isn't much mutation going on outside of making new
objects.

Their collector is single threaded, and single heaped. Moving to this
model might help with efficiency, since we could avoid the copy when
sending local messages. The collector just needs to be run
periodically (easy with an interpreter) and only needs to be informed
about mutations. It also eliminates issues with allocation in a
process whose GC would be running.

The difficulty here is can we guarantee that a process can run that
needs to run.

I'll look around and figure out which article it actually is that
talks about it.

David Brown



Post generated using Mail2Forum (http://m2f.sourceforge.net)
ug-erlang at davidb.org
Posted: Thu Sep 02, 1999 5:00 pm Reply with quote
Guest
>>>>> On Thu, 2 Sep 1999 11:25:35 -0500 (CDT), James Hague <jhague_at_dadgum.com> said:

> But the point of Erlang is to not think about low-level efficiency unless
> you really need to, eh? :)

Yeah, it's supposed to be the job of the implementors of the
language. When we discuss how the garbage collector works, we make
ourselves to be implementors.

I guess some good questions to ask would be, for typical Erlang
programs (whatever typical):

- Is it frequent that processes are created and destroyed, while
making a bunch of data while alive.
- Are there many longer lived processes that serve up data.

I would also suspect that the whole memory footprint could be reduced
with a single collection heap. This might help with the desired
implementation that runs on a PDA.

Dave Brown



Post generated using Mail2Forum (http://m2f.sourceforge.net)
etxuwig at etxb.ericsson.
Posted: Thu Sep 02, 1999 5:10 pm Reply with quote
Guest
On Thu, 2 Sep 1999, David Brown wrote:

ug-erl>I guess some good questions to ask would be, for typical Erlang
ug-erl>programs (whatever typical):
ug-erl>
ug-erl>- Is it frequent that processes are created and destroyed, while
ug-erl> making a bunch of data while alive.

Yes. One example is httpd, which spawns one process per connection.

ug-erl>- Are there many longer lived processes that serve up data.

Yes. Examples: the code server, the I/O server (user),
application_controller, ...


Both patterns are very important and must be handled efficiently by the
garbage collector.

/Uffe

Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etx.ericsson.se>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuv
dg at informix.com
Posted: Thu Sep 02, 1999 5:59 pm Reply with quote
Guest
On Thu, Sep 02, 1999 at 07:20:41AM -0700, David Brown wrote:
> >>>>> On Thu, 2 Sep 1999 11:22:52 +0200 (CEST), Klacke <klacke_at_bluetail.com> said:
>
> > David Gould writes:
> >> I might be interested in working on this. I am curious to know what has
> >> prevented this in the past,
>
> > It's a bit hard, to make the gc non blocking ggc.c needs to rewritten
> > into a reentrant collector. This is a bit hard and it also makes
> > the collector a bit slower unless some real clever tricks are
> > applied.
>
> > ...
>
> I found a bunch of good papers on garbage collection (realtime as
> well) on <ftp://ftp.cs.utexas.edu/pub/garbage/>. One of the papers

This is Paul Wilsons group at U Texas, a great resource for memory management
information. I have read a number of their papers but not specifically the
realtime GC one. But I will.

> discusses a real-time garbage collector that only needs to coordinate
> with the "mutators" when the mutate memory. We have a nice advantage
> here that there isn't much mutation going on outside of making new
> objects.
>
> Their collector is single threaded, and single heaped. Moving to this
> model might help with efficiency, since we could avoid the copy when
> sending local messages. The collector just needs to be run
> periodically (easy with an interpreter) and only needs to be informed
> about mutations. It also eliminates issues with allocation in a
> process whose GC would be running.

Interesting.

> I'll look around and figure out which article it actually is that
> talks about it.

Please.

-dg
--
David Gould dg_at_informix.com 510.628.3783 or 510.305.9468
Informix Software 300 Lakeside Drive Oakland, CA 94612
You will cooperate with Microsoft, for the good of Microsoft
and for your own survival. -- Navindra Umanee



Post generated using Mail2Forum (http://m2f.sourceforge.net)
dg at informix.com
Posted: Fri Sep 03, 1999 3:30 am Reply with quote
Guest
On Thu, Sep 02, 1999 at 11:25:35AM -0500, James Hague wrote:
> On Thu, 2 Sep 1999, David Brown wrote:
> >
> > Their collector is single threaded, and single heaped. Moving to this
> > model might help with efficiency, since we could avoid the copy when
> > sending local messages. The collector just needs to be run
> > periodically (easy with an interpreter) and only needs to be informed
> > about mutations. It also eliminates issues with allocation in a
> > process whose GC would be running.
>
> A nice benefit of a separate heap per process is that you can just throw
> out all the data for that process when it's killed. Sharing a single heap
> would potentially result in more frequent collections.

This should not be a problem for a global heap garbage collector. Real
programs do not allocate memory in a steady way or even in any kind of random
distribution. They tend to have strong "phase" behaviour, allocating large
numbers of similar objects, working for bit, and then freeing most of the
previous allocations and allocating some new kind of object.

So any realistic memory manager has to cope with this kind of thing. Process
deallocation is just one kind of bursty allocator/collector workload.

-dg

--
David Gould dg_at_informix.com 510.628.3783 or 510.305.9468
Informix Software 300 Lakeside Drive Oakland, CA 94612
You will cooperate with Microsoft, for the good of Microsoft
and for your own survival. -- Navindra Umanee



Post generated using Mail2Forum (http://m2f.sourceforge.net)
patrickdlogan at home.com
Posted: Fri Sep 03, 1999 6:14 am Reply with quote
Guest
David Gould writes:
> >
> > A nice benefit of a separate heap per process is that you can
> > just throw out all the data for that process when it's killed.
> > Sharing a single heap would potentially result in more frequent
> > collections.
>
> This should not be a problem for a global heap garbage
> collector. Real programs do not allocate memory in a steady way or
> even in any kind of random distribution. They tend to have strong
> "phase" behaviour, allocating large numbers of similar objects,
> working for bit, and then freeing most of the previous allocations
> and allocating some new kind of object.
>
> So any realistic memory manager has to cope with this kind of
> thing. Process deallocation is just one kind of bursty
> allocator/collector workload.

Sorry for jumping into this conversation without having followed it
all too closely. But I have a few thoughts.

A problem with a heap per process seems to be how much heap to
allocate to each process, and handling the processes that outgrow
their heap.

Would there be a way to implement a "Big Bag of Pages" (BIBOP) where a
"page" is dedicated to a process but any process' pages would be
scattered around the heap. When a process goes away you'd want to give
those pages back to the system somehow without too much effort.

There was a paper out of Indiana University a few years ago called
something like "Don't Stop the BIBOP" which described using BIBOP
pages to indicate information about the kinds of things allocated on
any given page. Process information may fit into this scheme.

I don't know. Just a thought. I am not a GC implementor.

--
Patrick Logan patrickdlogan_at_home.com


Post generated using Mail2Forum (http://m2f.sourceforge.net)
ug-erlang at davidb.org
Posted: Fri Sep 03, 1999 2:44 pm Reply with quote
Guest
>>>>> On Thu, 2 Sep 1999 23:38:42 -0700 (PDT), Patrick Logan <patrickdlogan_at_home.com> said:

> A problem with a heap per process seems to be how much heap to
> allocate to each process, and handling the processes that outgrow
> their heap.

Another approach would be to do something like Caml Light. Each
process has a small heap. Objects that are smaller than a certain
size are allocated from this heap. When a process does a minor
collection, its live data is moved to the shared heap. This
per-process heap is usually something small, like 32k. (The small
heap is also only for non-mutable objects).

The large heap is the collected in a more coordinated way. This ends
up as a kind of hybrid copying and mark-sweep collector. The first
generation is copying (but only for small objects), and the subsequent
generation is collected normally.

The large heap will need to use a real time collection algorithm,
whereas the small heaps can just be copied through in their entirery.
They can be chosen to be small enough that this collection runs fast
enough.

When a message is passed, it is checked to see which heap it is in,
and only if it is in a small heap is it copied. That way, especially
large messages will avoid the copy.

I'm still looking for the article. Hopefully I will find it when I
return from my trip (Monday is a holiday in the US).

Dave Brown



Post generated using Mail2Forum (http://m2f.sourceforge.net)

Display posts from previous:  

All times are GMT
Page 2 of 2
Goto page Previous  1, 2
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum