Erlang/OTP Forums

Author Message

<  Erlang questions mailing list  ~  Garbage collection

vlad.dumitrescu at flashm
Posted: Fri Aug 27, 1999 7:26 am Reply with quote
Guest
Hi!

On the "to do" list at erlang.org lies also an item about improving the garbage
collector.

The question is whether the Boehm-Weiser could collector be used. It is a good
one, tested and free. On the right platforms, it offers incremental collection,
thus reducing the danger of a big process freezing the others while collecting.


There must be a reason for not using it, I don't imagine no one else thought
about that.


/Vlad
______________________________________________________
Get Your FREE FlashMail Address now at http://www.flashmail.com
It's Free, Easy, & Fun !!!


Post generated using Mail2Forum (http://m2f.sourceforge.net)
klacke at bluetail.com
Posted: Fri Aug 27, 1999 8:49 am Reply with quote
Guest
>
> The question is whether the Boehm-Weiser could collector be used. It is a good
> one, tested and free. On the right platforms, it offers incremental collection,
> thus reducing the danger of a big process freezing the others while collecting.
>
>
> There must be a reason for not using it, I don't imagine no one else thought
> about that.
>


Well we have discussed using that sort of conservative collector
as well. We only talked about it a couple of years ago but as far as
I know nobody ever did any experiments with the Boehm collector.

It does however require a pretty extensive rewrite of the runtime
since it's a malloc/free plugin replacement. I also think that it would
be pretty slow for erlang since the absolute majority of allocations
in erlang is the allocation of a cons cell.

Allocating a cons is done as:

uint32 *mem = p->htop;
p->htop += 2;

This is fast, if the Boehm collector should be used, we
need to replace this with:

uint32 *mem = (uint32*) malloc(2);

which is slow, extremely slow.

However I'm not discarding the idea entirely, maybe the
Boehm colect together with some hacks to do fast allocate
cons cells is good. I really don't know and it would take
a couple of months of hard work to find out.


/klacke






Post generated using Mail2Forum (http://m2f.sourceforge.net)
vlad.dumitrescu at flashm
Posted: Fri Aug 27, 1999 9:01 am Reply with quote
Guest
Hi!

I will try to look closer at the existing collector... it is an interesting
area. Difficult, but worth trying, I think. Only I have the time! :-)

>Allocating a cons is done as:
>
>uint32 *mem = p->htop;
>p->htop += 2;
>
>This is fast,

The allocation is fast, but the collection will be slower!

> if the Boehm collector should be used, we
>need to replace this with:
>
>uint32 *mem = (uint32*) malloc(2);
>
>which is slow, extremely slow.

true, but the extra time is distributed, instead of a long collection phase.
Besides, as you say, hacks can be used - line allocating a bunch of cells at
once, instead of one by one.

In the end, since there is no "one size fits all" solution to GC, it depends
a lot on the way typical applications/processes use memory...

/Vlad
______________________________________________________
Get Your FREE FlashMail Address now at http://www.flashmail.com
It's Free, Easy, & Fun !!!


Post generated using Mail2Forum (http://m2f.sourceforge.net)
etxuwig at etxb.ericsson.
Posted: Fri Aug 27, 1999 9:57 am Reply with quote
Guest
On Fri, 27 Aug 1999, Vlad Dumitrescu wrote:

vlad.d>Hi!
vlad.d>
vlad.d>I will try to look closer at the existing collector... it is
vlad.d>an interesting area. Difficult, but worth trying, I think.
vlad.d>Only I have the time! :-)

One thing you could consider is making the per-process GC reentrant. This
would be a significant improvement of the system real-time characteristics.

If you could divide the collection in a couple of (perhaps 2-4) discrete
phases, between which the process could be suspended and rescheduled, this
would be a significant improvement.

Another tiny detail would be to check the reduction counter before starting
a GC -- if it's > 400, yield first, then GC. As it is implemented now, a
process could run its entire timeslice and then start a costly GC on the
500th reduction.

/Uffe

Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etx.ericsson.se>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuv
vlad.dumitrescu at flashm
Posted: Fri Aug 27, 1999 11:55 am Reply with quote
Guest
Thanks Ulf!

I am not yet commited into seriously undertaking this project, but any help
is welcomed!

What I had in mind was using the kind of GC that ISE Eiffel is using. Of course,
their details are not published, but the GC is a more or less a background process
that tries to collect incrementally whenever it has the opportunity. If the
user can adjust the sensibility of the GC, in order to tune it with the needs
of the particular application, then it might be a good solution even for Erlang.


I think that there are many things to be done to adapt a GC algorithm to the
specifics of Erlang. Some might need some help from the compiler (for example,
tagging pure functional calls for optimizing local variable collection).

/Vlad
______________________________________________________
Get Your FREE FlashMail Address now at http://www.flashmail.com
It's Free, Easy, & Fun !!!


Post generated using Mail2Forum (http://m2f.sourceforge.net)
dg at informix.com
Posted: Fri Aug 27, 1999 4:00 pm Reply with quote
Guest
On Fri, Aug 27, 1999 at 02:01:47AM +0000, Vlad Dumitrescu wrote:
> Hi!
>
> I will try to look closer at the existing collector... it is an interesting
> area. Difficult, but worth trying, I think. Only I have the time! Smile
>
> >Allocating a cons is done as:
> >
> >uint32 *mem = p->htop;
> >p->htop += 2;

And also:

if (p->htop >= p->maxhtop)
gc(p);
Or however it gets coded.


> >This is fast,
>
> The allocation is fast, but the collection will be slower!
>
> > if the Boehm collector should be used, we
> >need to replace this with:
> >
> >uint32 *mem = (uint32*) malloc(2);
> >
> >which is slow, extremely slow.

Uh no. The Boehm collector allocation is quite fast. Allocation gets inlined
as approximately:

if (freelist[size]) {
/* fast path allocation */
mem = freelist[size];
freelist[size] = *mem;
} else
/* possibly need to gc */
gen_malloc(size);

Which is not that slow.

-dg

--
David Gould dg_at_informix.com 510.628.3783 or 510.305.9468
Informix Software 300 Lakeside Drive Oakland, CA 94612
You will cooperate with Microsoft, for the good of Microsoft
and for your own survival. -- Navindra Umanee



Post generated using Mail2Forum (http://m2f.sourceforge.net)
dg at informix.com
Posted: Fri Aug 27, 1999 6:56 pm Reply with quote
Guest
On Fri, Aug 27, 1999 at 01:15:39PM -0500, James Hague wrote:
> Isn't the goal of the Boehm collector to work with systems that otherwise
> don't "understand" garbage collection on a fundamental level? What's the
> purpose of such a collector in a language where all datatypes are already
> tagged?

Yes, the Boehm collector is a conservative collector intended to work
without type information. However, it can use type information if it is
available, and the lack of type information is not so much a hindrance as
one might suppose. The advantage of the Boehm collector is maturity of
development and wide use.

There is also an open question as to whether a copying collector as apparently
Erlang uses is better or worse than a mark-sweep collector on modern cache
intensive hardware.

For a good discussion of this check out http://reality.sgi.com/boehm/gc.html.

-dg


--
David Gould dg_at_informix.com 510.628.3783 or 510.305.9468
Informix Software 300 Lakeside Drive Oakland, CA 94612
You will cooperate with Microsoft, for the good of Microsoft
and for your own survival. -- Navindra Umanee



Post generated using Mail2Forum (http://m2f.sourceforge.net)
ug-erlang at davidb.org
Posted: Fri Aug 27, 1999 7:33 pm Reply with quote
Guest
>>>>> On Fri, 27 Aug 1999 13:15:39 -0500 (CDT), James Hague <jhague_at_dadgum.com> said:

> Isn't the goal of the Boehm collector to work with systems that otherwise
> don't "understand" garbage collection on a fundamental level? What's the
> purpose of such a collector in a language where all datatypes are already
> tagged?

Correct. Boehm also has the (possibly serious) drawback that it is
conservative. Since pointers and integers are not tagged, integers
that look like pointers will keep items from being collected. I would
guess that Boehm would probably not perform as well as the existing GC
in the Erlang runtime, since it always has to scan blocks for more
pointers. We know out types and only have to descend objects that
might contain pointers.

Try a web search for "real time garbage". There are a lot of papers
about algorithms and optimizations for real time. ACM also has some
nice articles in their digital library.

Dave



Post generated using Mail2Forum (http://m2f.sourceforge.net)
klacke at bluetail.com
Posted: Mon Aug 30, 1999 8:15 am Reply with quote
Guest
David Brown writes:
> I would
> guess that Boehm would probably not perform as well as the existing GC
> in the Erlang runtime, since it always has to scan blocks for more
> pointers. We know out types and only have to descend objects that
> might contain pointers.
>

So would I, but as usual, speculating gets us nowhere. :-(

Another major difference in the runtime would also be
that all the copying of data that takes place in the
current runtime could (possibly) be replaced by copying
pointers instead !!

Today message passing, ets.erl lookups/insertions are done
by *copying* the data between the process heaps.

(The structure copying code is highly optimized
(erts/system/emulator/runtime/copy.c) and as
long as messages are reasonably small, the copying
overhead vanishes among all the other things that's
happening.)


Robert Virding had a couple of different erlang implementations
a couple of years ago (called VEE) that were using a global
heap for all the erlang processes instead, none of his implementations
reached production status though.

All in all, doing a little Boehm hack might be pretty
interesting.

We also need to keep in mind that the typical sort of
applications that Erlang is intended for are supposed to
run without stopping for a very long time[*]. This implies that
GC is good way to go, but it also implies that *all* data
must be eventually collected. A conservative collector may
forget 10 bytes/hour which leads to a very very hard to find bug.
So performance is important, but complete correctness is even more so.

/klacke


[*] I once read that the third most frequent bug in
the Ericsson AXE switch was 32/16 bit counters that
overflowed. There's a lot of counters in a telephone
switch :-)




Post generated using Mail2Forum (http://m2f.sourceforge.net)
dg at informix.com
Posted: Mon Aug 30, 1999 5:07 pm Reply with quote
Guest
On Mon, Aug 30, 1999 at 10:14:53AM +0200, Klacke wrote:
>
> All in all, doing a little Boehm hack might be pretty
> interesting.
>
> We also need to keep in mind that the typical sort of
> applications that Erlang is intended for are supposed to
> run without stopping for a very long time[*]. This implies that
> GC is good way to go, but it also implies that *all* data
> must be eventually collected. A conservative collector may
> forget 10 bytes/hour which leads to a very very hard to find bug.
> So performance is important, but complete correctness is even more so.

The Boehm collector can support type information if it is available. So there
is no need to use the "conservative" feature so this should be a non-issue
for Erlang.

-dg

--
David Gould dg_at_informix.com 510.628.3783 or 510.305.9468
Informix Software 300 Lakeside Drive Oakland, CA 94612
You will cooperate with Microsoft, for the good of Microsoft
and for your own survival. -- Navindra Umanee



Post generated using Mail2Forum (http://m2f.sourceforge.net)
vlad.dumitrescu at flashm
Posted: Tue Aug 31, 1999 6:00 am Reply with quote
Guest
>Did this discussion start because someone was unhappy with the performance
>of the Erlang garbage collector or is it pure (though interesting)
>conjecture about how to speed up the system?
>

I started it, and it is on the "to do" list at erlang.org, so I suppose someone
has experienced slow collections...

After being buried in the code for the weekend Wink I realized the present collector
is rather tightly integrated with the rest of the runtime... which makes it
difficult to change.

The same applies to the specific VM information: right now the data structures
for JAM and BEAM are hard-coded into the other structures... I think the couplings
should be more loose...

/Vlad
______________________________________________________
Get Your FREE FlashMail Address now at http://www.flashmail.com
It's Free, Easy, & Fun !!!


Post generated using Mail2Forum (http://m2f.sourceforge.net)
etxuwig at etxb.ericsson.
Posted: Wed Sep 01, 1999 7:11 am Reply with quote
Guest
On Mon, 30 Aug 1999, Vlad Dumitrescu wrote:

vlad.d>>Did this discussion start because someone was unhappy with
vlad.d>the performance >of the Erlang garbage collector or is it
vlad.d>pure (though interesting) >conjecture about how to speed up
vlad.d>the system? >
vlad.d>
vlad.d>I started it, and it is on the "to do" list at erlang.org, so
vlad.d>I suppose someone has experienced slow collections...

The main problem, as I see it, with the current collector is that it blocks
the entire runtime system when it starts collecting for one process. Thus,
the GC may be per-process, but all other processes have to wait.

Otherwise, the current collector seems pretty fast. Most GC times are very
short, and we have determined that the AXD 301 spends something like 5% of
its CPU resources on GC when going full blast with connection handling.
(This after we have used a few tricks like custom heap size.)

I have seen collections as long as 1 second on a 300 MHz UltraSPARC, but
those are extremely rare and have appeared under very special circumstances
-- with rapidly growing heap and no data that could be released. As I
recall, it also involved a first attempt at generational GC, followed by a
fullsweep GC with a resize of the heap in the middle.

Even such GC times are reasonable as long as they don't make the whole
system unresponsive. This is why I would like to see an effort to make the
collector reentrant.

/Uffe

Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etx.ericsson.se>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuv
dg at informix.com
Posted: Wed Sep 01, 1999 6:48 pm Reply with quote
Guest
On Wed, Sep 01, 1999 at 09:10:51AM +0200, Ulf Wiger wrote:
> On Mon, 30 Aug 1999, Vlad Dumitrescu wrote:
>
> vlad.d>>Did this discussion start because someone was unhappy with
> vlad.d>the performance >of the Erlang garbage collector or is it
> vlad.d>pure (though interesting) >conjecture about how to speed up
> vlad.d>the system? >
> vlad.d>
> vlad.d>I started it, and it is on the "to do" list at erlang.org, so
> vlad.d>I suppose someone has experienced slow collections...
>
> The main problem, as I see it, with the current collector is that it blocks
> the entire runtime system when it starts collecting for one process. Thus,
> the GC may be per-process, but all other processes have to wait.
>
> Otherwise, the current collector seems pretty fast. Most GC times are very
> short, and we have determined that the AXD 301 spends something like 5% of
> its CPU resources on GC when going full blast with connection handling.
> (This after we have used a few tricks like custom heap size.)
>
> I have seen collections as long as 1 second on a 300 MHz UltraSPARC, but
> those are extremely rare and have appeared under very special circumstances
> -- with rapidly growing heap and no data that could be released. As I
> recall, it also involved a first attempt at generational GC, followed by a
> fullsweep GC with a resize of the heap in the middle.
>
> Even such GC times are reasonable as long as they don't make the whole
> system unresponsive. This is why I would like to see an effort to make the
> collector reentrant.
>
> /Uffe
>
> Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etx.ericsson.se>
> Ericsson Telecom AB tfn: +46 8 719 81 95
> Varuv
klacke at bluetail.com
Posted: Thu Sep 02, 1999 9:23 am Reply with quote
Guest
David Gould writes:
> I might be interested in working on this. I am curious to know what has
> prevented this in the past,


It's a bit hard, to make the gc non blocking ggc.c needs to rewritten
into a reentrant collector. This is a bit hard and it also makes
the collector a bit slower unless some real clever tricks are
applied.

Furthermore, when a process has garbed and the garb was
not finished (since it took too long time) and execution
was resumed instead, well then we have a fairly complex
situation. Imagine that we need to send a message to such
a "half-garbed" process, we need to allocate som memory there
for the message. Where do we allocate that memory, hardly
on any of the heaps that are "half-garbed" ??

(Or do we suspend the sender !!!, deadlocks ???)

Thus the situation which lead to this "half-garb", was probably
that the live data was large. Interupting the gc of this
large data set leaves us with a whole lot of memory which
is not released. The original two heaps of the process as
well as the new heap we're garbing to. Three heaps.
If a message arrives at the process, we need to have yet
another memory area associated with the process where we
can put the message, four heaps !!.

Now then, at a later stage when we're garbing
this 4-heap process, we need to make this gc reentrant as well
otherwise there's no point in the exercise at all.


Well, the above sounds just horrible, so some sound strategy must
be applied here, I don't know which though.

And last but not least, the process must be marked (and queued)
for gc as well so that we can finnish off any lingering half-garbed
processes when the system is idling, this is the easy part because
here we can simply put the process on the run queue and mark it
for GC.

/klacke


Claes Wikstr
etxuwig at etxb.ericsson.
Posted: Thu Sep 02, 1999 9:56 am Reply with quote
Guest
On Thu, 2 Sep 1999, Klacke wrote:

klacke>Furthermore, when a process has garbed and the garb was
klacke>not finished (since it took too long time) and execution
klacke>was resumed instead, well then we have a fairly complex
klacke>situation. Imagine that we need to send a message to such
klacke>a "half-garbed" process, we need to allocate som memory there
klacke>for the message. Where do we allocate that memory, hardly
klacke>on any of the heaps that are "half-garbed" ??
klacke>
klacke>(Or do we suspend the sender !!!, deadlocks ???)

Why would this cause deadlocks? The runtime system already uses flow
control when a process sends a message to a port, right?
If a process tries to send a message to another process which is
half-garbed, it is suspended; if no other processes can run, the
half-garbed process is allowed to continue. Since it doesn't depend on the
process waiting to send a message, there won't be any deadlock.
Or am I missing something?

klacke>Thus the situation which lead to this "half-garb", was probably
klacke>that the live data was large. Interupting the gc of this
klacke>large data set leaves us with a whole lot of memory which
klacke>is not released. The original two heaps of the process as
klacke>well as the new heap we're garbing to. Three heaps.
klacke>If a message arrives at the process, we need to have yet
klacke>another memory area associated with the process where we
klacke>can put the message, four heaps !!.

Yes.
If we suspend senders -- three heaps.
This is a tradeoff between memory useage and realtime behaviour.

The main point of the exercise is that designers will know fairly well
which parts of a program may cause a costly gc, and can structure their
processes accordingly (processes which have to be responsive can offload
heavy tasks to other processes.) There is no point in doing that today,
because the heavy gc will destroy real-time characteristics even if it
happens in a background process! What you can do today is rewrite your
Erlang code so that it performs the same job differently -- perhaps by
forcing small garbage collections along the way. This is "unnecessary" work
(since it's a workaround for shortcomings in the runtime system), and makes
for slower code at the Erlang level.

klacke>Now then, at a later stage when we're garbing
klacke>this 4-heap process, we need to make this gc reentrant as well
klacke>otherwise there's no point in the exercise at all.

I don't think I follow you.
It probably isn't a good idea to accept more data as we're garbing, because
we can't be sure that the gc will ever terminate. Therefore, we should
suspend senders during the gc, and if we do, the gc will behave as today,
except that it may yield at certain stages.

One aspect to consider is that the vast majority of collections which are
cheap should not become significantly more expensive because we want to
make the heavy ones reentrant. I don't know how important it is, or how to
go about it. Perhaps a few clever checks (e.g. heap size) could reveal
whether the gc will be a candidate for reentrant gc...

/Uffe

Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etx.ericsson.se>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuv

Display posts from previous:  

All times are GMT
Page 1 of 2
Goto page 1, 2  Next
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum