Erlang/OTP Forums

Author Message

<  Erlang questions mailing list  ~  Cost of Copy

Guest
Posted: Thu May 27, 2010 2:49 pm Reply with quote
Guest
I am still honestly marveling about this. In case anyone finds time to
answer.

Especially on the theory that more deeply nested structures will be
faster to mutate. As fewer internal 'sibling'-pointers will have to be
copied.

http://www.trapexit.org/forum/viewtopic.php?p=55001

Thanks,
Henning


Post received from mailinglist
Guest
Posted: Thu May 27, 2010 6:28 pm Reply with quote
Guest
If you ignore message passing, then the copying/sharing issues in Erlang are
just about the same as in Lisp and Scheme, so you may find some information
that way.

The short version is that some values are represented directly in a memory
"cell," like atoms and (most) integers. Others are pointers to data stored
elsewhere (list elements, or conses, which are two consecutive cells in
memory) or tuples (N consecutive cells plus one for the size). Just by
looking at code you can see where conses and tuples are created. If you use
an existing "pointer" to data, then that data is not copied.

Examples:

B = {A, A, A}
This creates a three element tuple (four memory cells). It doesn't matter if
A is a list of a million elements or a tuple or the number 2. B just
contains three cell-sized values (plus the size of the tuple).

X = [57|List]
This creates a single cons (2 cells). The first element is 57. The second is
a pointer to List. So even though you've got two lists (X and List), all of
the data is shared except for the first cons of X.


Post received from mailinglist
Guest
Posted: Thu May 27, 2010 8:11 pm Reply with quote
Guest
Thanks a lot James!

am I right then to conclude that mutating something in a list will be
the faster the more the data is spread out deeper over several levels?

A = { B, C, D, E, F, G, H }

will be slower to change than, say

A = { { B, C }, { D, E }, {F, G, H } }

because if I change e.g. B, only { B', C } will be created as new
internal list of pointers, as opposed to the whole of { B', C, D, E, F,
G, H }. Which should for larger lists and tuples, and frequent changes,
start to make a difference?

So this would be the fastes option?

A = { { B }, { C }, { D }, { E }, { F }, { G }, { H } }

And if the tuple may be expected to change in size (elements not only
changeing but being added/deleted), this:

A = { { { { B }, { C } }, { { D }, { E } } }, { { { F }, { G } }, { { H
} } } }

Maybe I am wrong that there may be a difference between tuples and
lists, but is it the same for

A = [ B, C, D, E, F, G, H ]

slower to change than

A = [ { B, C }, { D, E }, { F, G, H } ]

? Or are list-elements changed faster, in general, virtually 'in-place'
as only one pointer is replaced?

Hardly I guess, because the list itself must not be changed? So the
whole new list must be copied anyway, just like a tuple.

Thanks for your take,
Henning


James Hague wrote:
> If you ignore message passing, then the copying/sharing issues in Erlang are
> just about the same as in Lisp and Scheme, so you may find some information
> that way.
>
> The short version is that some values are represented directly in a memory
> "cell," like atoms and (most) integers. Others are pointers to data stored
> elsewhere (list elements, or conses, which are two consecutive cells in
> memory) or tuples (N consecutive cells plus one for the size). Just by
> looking at code you can see where conses and tuples are created. If you use
> an existing "pointer" to data, then that data is not copied.
>
> Examples:
>
> B = {A, A, A}
> This creates a three element tuple (four memory cells). It doesn't matter if
> A is a list of a million elements or a tuple or the number 2. B just
> contains three cell-sized values (plus the size of the tuple).
>
> X = [57|List]
> This creates a single cons (2 cells). The first element is 57. The second is
> a pointer to List. So even though you've got two lists (X and List), all of
> the data is shared except for the first cons of X.
>
>



Post received from mailinglist
Guest
Posted: Fri May 28, 2010 7:56 am Reply with quote
Guest
On Thu, May 27, 2010 at 9:09 PM, Henning Diedrich <hd2010@eonblast.com>wrote:

> Thanks a lot James!
>
> am I right then to conclude that mutating something in a list will be the
> faster the more the data is spread out deeper over several levels?
>
>
>
You can't mutate! So your examples don't really make sense.

You can only build new structures. You cannot change them once their built.
For mutation you need to look at ets, dets, mnesia, processes and messages.

Robby


Post received from mailinglist
jmontelius
Posted: Fri May 28, 2010 8:18 am Reply with quote
User Joined: 07 Mar 2009 Posts: 29
On Thu, 27 May 2010 22:09:21 +0200, Henning Diedrich <hd2010@eonblast.com>
wrote:

> A = { { B, C }, { D, E }, {F, G, H } }
> because if I change e.g. B, only { B', C } will be created as new
> internal list of pointers, as opposed to the whole of { B', C, D, E, F,
> G, H }. Which should for larger lists and tuples, and frequent changes,
> start to make a difference?

Not exactly. Lets say we have A as above and want to construct a new tuple

X = {{B', C}, {D, E}, {F, G, H }}

What we then have to do is construct a new tuple with three elements

X = { _ , _ , _}

We then have to construct a new tuple for {B', C}

X = {{_, _}, _, _}

and then we of course have to construct B'

X = {{B',_}, _,_}

so we have constructed two tuples, one of size tree and one of size two
(plus B'), all other structurs C, {D,E} and {F,G,H} are shared.

If we instead have

A = {B, C, D, E, F, G, H }

and want to construct

X = {B', C, D, E, F, G, H}

we construct one new tuple

X = {_, _, _ , _, _, _, _}

and then B'

X = {B', _, _ , _, _, _, _}

so now we have only constructed one tuple of size seven. All other
structures C,D,E,F,G,H are shared.

The difference is in this case marginal (and I don't know which one
executes faster). If we look at larger structures the difference become
apparent. Assume you have a N elements in a set and want to update these
element one by one. One way is to represent this a tuple of size N. Each
update operation would then construct a new tuple of size N. All elements
in the tuple would be shared but the new tuple has one element that
differs.

If we instead represent this as a list of length N we would have to
construct a new list to the point where we have located the element that
we want to change. Assume we want to change E:

A = [B | [C | [D | [E | [F | [G | [H| []]]]]]]]

X = [_ | [_ | [_ | [E' | _]]]]

Depending in how far down E is found we have to construct new list cells
for all element before E. In the example above this is four list cells
[_|_].

In average thins means N/2 new list cells for each update. This is for
large N more efficient than having to construct a new tuple of size N.

A even better way, if N is large, is to construct a tree.

A = {tree,
{tree,
{tree, {leaf, B}, {leaf, C}},
{tree, {leaf, D}, {leaf, E}}}
{tree,
{tree, {leaf, F}, {leaf, G}},
{leaf, H}}}

Now if we want to change E then we have


A = {tree,
{tree,
_,
{tree, _, {leaf, E'}}}
_}

So we construct new tree tuples all the way down to the location of E. All
other structures are shared. In general this means that we will construct
lg(N) new tree tuples.

So a tuple of size N, N/2 new list cells or lg(N) new tree tuples.

The data structures of course have other implications when it comes to
reading and adding elements.


Johan







--
Associate Professor Johan Montelius
Royal Institute of Technology - KTH
School of Information and Communication Technology - ICT

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
View user's profile Send private message
Guest
Posted: Fri May 28, 2010 10:48 am Reply with quote
Guest
On Fri, May 28, 2010 at 6:15 PM, Johan Montelius <johanmon@kth.se> wrote:
> If we instead have
>
> A = {B, C, D, E, F, G, H }
>
> and want to construct
>
> X = {B', C, D, E, F, G, H}
>
> we construct one new tuple
>
> X = {_, _, _ , _, _, _, _}
>
> and then B'
>
> X = {B', _, _ , _, _, _, _}
>
> so now we have only constructed one tuple of size seven. All other
> structures C,D,E,F,G,H are shared.

but...

IIRC, if A is not used in the code after X is bound, the erlang
compiler can optimise this to an in-place operation where the storage
for A is reused for X. So only the B element pointer is changed to
point to B' and no allocation is required. Is this not one of the
reasons for record syntax?

eg.

A = #myRecord{B,C,D,E,F,G,H},
X = A#myRecord{B2},

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
jmontelius
Posted: Fri May 28, 2010 1:11 pm Reply with quote
User Joined: 07 Mar 2009 Posts: 29
On Fri, 28 May 2010 12:46:50 +0200, Richard Andrews
<bflatmaj7th@gmail.com> wrote:

> IIRC, if A is not used in the code after X is bound, the erlang
> compiler can optimise this to an in-place operation where the storage
> for A is reused for X. So only the B element pointer is changed to
> point to B' and no allocation is required. Is this not one of the
> reasons for record syntax?

I would be happily surprised if that is the case. Determining if A is the
last reference to the structure involves either a lot of program analysis
or run-time reference counting. The record syntax solves a problem of
readability, maintainability etc of src code.

To see why the compiler will have a hard time identifying when it is safe
to do destructive updates look at the following:


foo(X) ->
case X of
{A, B, C} ->
B1 = mod(B),
Y = {A, mod(B), C}
zot(Y)
end.


If X is the last reference to the structure {A, B, C} we can replace B
with B1 and just hand it to zot/1. The problem is of course that we don't
know if X is used somewhere else. If this was the call to foo/1


:
foo(X),
bar(X),
:

then we would make a mistake in destroying the structure. To figure out
at compile time when it is safe to reclaim structures is tricky.

Johan





--
Associate Professor Johan Montelius
Royal Institute of Technology - KTH
School of Information and Communication Technology - ICT

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
View user's profile Send private message
rvirding
Posted: Fri May 28, 2010 4:03 pm Reply with quote
User Joined: 30 Aug 2006 Posts: 452 Location: Stockholm, Sweden
As others have pointed out data in Erlang is defined to immutable.
This implies that if you wish to do destructive operations then you
have to be sure they are not seen anywhere. In general this can be
quite difficult as it is often not trivial to see if a data structure,
or a portion of it, is shared. The compiler is very conservative about
this and IIRC only does it when there is a sequence of setelement/3
where the temporaries are not exported and nothing is done between
them. You typically get this when you do a single multiple record
update, for example

A1 = A0#foo{bar=57,baz="Robert",zip=now()}

It is better if you can combine record updates like this instead of
doing them one-by-one.

I think it is easiest to view a complex data structure as a tree with
generally different sized nodes. If you modify one of the nodes then
the only other nodes you will have to rewrite are those nodes between
the root of the tree and the modified node. Unless of course you are
doing something special along the way. Or you have botched your code.
Smile

What sized nodes depends entirely upon what you are doing, the only
general hint I can give is use that which fits best into your
application. I mean it, really. Don't go complicating matters by
picking weirdo data structures. So if you want a dynamic sequence then
a list is usually the best, though not always. If you have 7 things
which go together then put them in one 7 element tuple or in a record.
I would only really start worrying about splitting my tuples if they
get really big. Or if you feel that grouping them is what feels best.

The only time I would seriously start thinking about what size thing
to use is if you are writing more general library type packages.
Generally having smaller nodes but a deeper resulting tree will lead
to less rewriting, but as the tree is deeper it will cost more to
traverse it and though the rewrites will be smaller there will be more
of them. The opposite for wider flatter trees. It also depends on how
you select which subnode to go to, this is, of course, very algorithm
dependent. So in the array module each node contains 10 slots, IIRC,
but here it is easy to select the right one using its index. Dicts
also use indices and are wider. For more general search trees it
becomes more complex and binary trees are popular, but I would say
that it is more the algorithms which decide this.

A list is really a degenerate binary tree.

It all depends. This is clearly optimization and I would read
http://en.wikipedia.org/wiki/Program_optimization "When to optimize"
before getting to deep into this.

Robert

On 27 May 2010 22:09, Henning Diedrich <hd2010@eonblast.com> wrote:
> Thanks a lot James!
>
> am I right then to conclude that mutating something in a list will be the
> faster the more the data is spread out deeper over several levels?
>
> A = { B, C, D, E, F, G, H }
>
> will be slower to change than, say
>
> A = { { B, C }, { D, E },
View user's profile Send private message Visit poster's website MSN Messenger
wuji
Posted: Fri Sep 07, 2012 7:58 am Reply with quote
User Joined: 10 Aug 2012 Posts: 654
time with."Guller has no plans to relocate closer to Bikinis Bikinis [h3]cheap polo shirts[/h3] Bikinis from his home in Austin, but said that he
the town being open to tourists from March to August.He August.He cheap authentic air jordans August.He is planning the first Bikinis, Texas event in March
Muslims to Welcome Ramadan Tonight, ProbablyCrescent Moon Sighting Will Mark Mark cheap polo ralph lauren Mark Beginning of Holy MonthBy ALON HARISHJuly 19, 2012— At
tonight, Muslims the world over will welcome Ramadan, the holiest holiest [h3]cheap Ralph Lauren[/h3] holiest month in the Islamic calendar.The ninth month of Islam's
calendar, Ramadan is believed to be the month in which which cheap designer *beep* which the Quran was revealed. During Ramadan, which begins with
View user's profile Send private message

Display posts from previous:  

All times are GMT
Page 1 of 1
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum