Erlang/OTP Forums

Author Message

<  Erlang questions mailing list  ~  Performance of length and spawn

crd at inversenet.com
Posted: Fri Jan 22, 1999 2:15 pm Reply with quote
Guest
A couple of questions:

Is the performance of length/1 O(1) or O(n)? (I.e. is a count of the number
of items in the list stored, or is the length determined by walking the list
every time?)

When spawning a new process, is it necessary to duplicate in memory every
argument passed to it?

The specific case is that I'm writing a server that spawns new processes to
handle most of the requests sent to it, in the hopes of maximizing the
server's availability to accept requests. If a request basically amounts to
"How long is a certain list?", then I can see a few possibilities:

(1) If length/1 is O(n) and lists don't have to be copied for spawn/3: the
list might be quite long, so perhaps it's worthwhile to spawn a process to
get the length and send it back to the caller.

(2) If length/1 is O(1): it probably is fast enough that it isn't worth the
trouble to use spawn/3 even if copying would not be necessary.

(3) If arguments have to be copied for spawn/3: since copying a list
probably takes longer than counting its elements, spawn is probably taking
more of the server's time than just calling length/1 directly.

Another question: Is there any intention to add features to Erlang's module
system, such as the ability to make certain modules private to others? In
practice, I suppose just not documenting certain modules (or documenting
them as not for public use) is enough to keep people out of them, but I'd
like for the language itself to understand that some modules are for
internal use only within a definable group of modules. I seem to recall that
Java's "package" concept supplies such a capability (then again, I haven't
done any Java in a couple of years).

Thanks!

Craig




Post generated using Mail2Forum (http://m2f.sourceforge.net)
ulf.wiger at etxb.ericsso
Posted: Fri Jan 22, 1999 2:49 pm Reply with quote
Guest
Craig Dickson wrote:
>
> A couple of questions:
>
> Is the performance of length/1 O(1) or O(n)? (I.e. is a count of the number
> of items in the list stored, or is the length determined by walking the list
> every time?)

O(n).

A colleague of mine performed some measurements a while back.
This is what he found on a particular version of the runtime system:

CPUtime length(List)
22 1
25 3
33 5
36 7

>
> When spawning a new process, is it necessary to duplicate in memory every
> argument passed to it?

I would think yes, but the operation is fast since it performs some kind
of block copy.



> The specific case is that I'm writing a server that spawns new processes to
> handle most of the requests sent to it, in the hopes of maximizing the
> server's availability to accept requests. If a request basically amounts to
> "How long is a certain list?", then I can see a few possibilities:
>
> (1) If length/1 is O(n) and lists don't have to be copied for spawn/3: the
> list might be quite long, so perhaps it's worthwhile to spawn a process to
> get the length and send it back to the caller.
>
> (2) If length/1 is O(1): it probably is fast enough that it isn't worth the
> trouble to use spawn/3 even if copying would not be necessary.

A spawn operation without copying is fast, but spawn does copy the
arguments. As it turns out, it's cheaper to find out the length of the
list, than to spawn a process to do it.

-module(test).
% to find out how much it costs to spawn a thread

-export([incr/1]).
-export([incr1/2]).

incr(A) ->
Pid = spawn(?MODULE,incr1,[self(),A]),
receive
{Pid,B} -> B
end.

incr1(Pid,A) ->
Pid ! {self(),A+1}.


Eshell V4.7.3.3 (abort with ^G)
1> c(test).
{ok,test}
2> timer:tc(test,incr,[17]).
{60,18}
3> timer:tc(test,incr,[17]).
{42,18}
4> timer:tc(test,incr,[17]).
{44,18}
5> timer:tc(test,incr,[17]).
{42,18}

The time is given in microseconds. Thus, it takes about 45 us on my
machine to spawn a process and wait for its reply (I know, it's not
exactly what you were talking about.)


> (3) If arguments have to be copied for spawn/3: since copying a list
> probably takes longer than counting its elements, spawn is probably taking
> more of the server's time than just calling length/1 directly.


8> L = lists:seq(1,10000).
[...]
9> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2720,<0.47.0>}
10> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2532,<0.49.0>}
11> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2600,<0.51.0>}
12> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2471,<0.53.0>}
13> timer:tc(erlang,spawn,[erlang,length,[L]]).
{2634,<0.55.0>}
14> timer:tc(erlang,length,[L]).
{677,10000}
15> timer:tc(erlang,length,[L]).
{571,10000}
16> timer:tc(erlang,length,[L]).
{639,10000}

As you can see, it's much cheaper to call length(L) (ca 600 us) than to
spawn a process which handles length(L) (2.5 ms), when operating on a
list of 10,000 integers.

Here's a wild thought:

If your input comes from outside Erlang, you may receive it as a binary
(if it's text data). It is very cheap (O(1)) to find out the size of a
binary. It is also cheap to pass it to another process, since the data
is not copied.

19> L = lists:duplicate(10000,79).
[...]
20> timer:tc(erlang,list_to_binary,[L]).
{1714,#Bin}
21> timer:tc(erlang,list_to_binary,[L]).
{1663,#Bin}
22> timer:tc(erlang,list_to_binary,[L]).
{1718,#Bin}
23> timer:tc(erlang,list_to_binary,[L]).
{1671,#Bin}

24> B = element(2,v(23)).
#Bin
25> timer:tc(erlang,size,[B]).
{24,10000}
26> timer:tc(erlang,size,[B]).
{21,10000}
27> timer:tc(erlang,size,[B]).
{21,10000}
28> timer:tc(erlang,size,[B]).
{21,10000}

29> timer:tc(erlang,spawn,[erlang,size,[B]]).
{43,<0.72.0>}
30> timer:tc(erlang,spawn,[erlang,size,[B]]).
{40,<0.74.0>}
31> timer:tc(erlang,spawn,[erlang,size,[B]]).
{38,<0.76.0>}
32> timer:tc(erlang,spawn,[erlang,size,[B]]).
{40,<0.78.0>}

That is, if you don't have to convert it to a binary, finding out the
number of bytes is fast.

>
> Another question: Is there any intention to add features to Erlang's module
> system, such as the ability to make certain modules private to others? In
> practice, I suppose just not documenting certain modules (or documenting
> them as not for public use) is enough to keep people out of them, but I'd
> like for the language itself to understand that some modules are for
> internal use only within a definable group of modules. I seem to recall that
> Java's "package" concept supplies such a capability (then again, I haven't
> done any Java in a couple of years).

There is work being done. "Safe Erlang" comes to mind.
http://www.erlang.se/onlinenews/archive/Erlang-OTP-news/sep98/a2.shtml

/Uffe
--
Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etxb.ericsson.se>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuv
joe at erix.ericsson.se
Posted: Tue Jan 26, 1999 7:53 am Reply with quote
Guest
On Fri, 22 Jan 1999, Craig Dickson wrote:

> A couple of questions:
>
> Is the performance of length/1 O(1) or O(n)? (I.e. is a count of the number
> of items in the list stored, or is the length determined by walking the list
> every time?)
>
O(n)

> When spawning a new process, is it necessary to duplicate in memory every
> argument passed to it?

All arguments are copied.
>
> The specific case is that I'm writing a server that spawns new processes to
> handle most of the requests sent to it, in the hopes of maximizing the
> server's availability to accept requests. If a request basically amounts to
> "How long is a certain list?", then I can see a few possibilities:
>
> (1) If length/1 is O(n) and lists don't have to be copied for spawn/3: the
> list might be quite long, so perhaps it's worthwhile to spawn a process to
> get the length and send it back to the caller.
>
> (2) If length/1 is O(1): it probably is fast enough that it isn't worth the
> trouble to use spawn/3 even if copying would not be necessary.
>
> (3) If arguments have to be copied for spawn/3: since copying a list
> probably takes longer than counting its elements, spawn is probably taking
> more of the server's time than just calling length/1 directly.
>

If the lists are short then I wouldn't bother with any optimisations.
I can't define short - but I wouldn't be looking at any optimisations if
I was thinking in terms of say less than 5,000 elements.

If the lists get big then I'd use a data structure called a
binary. Binaries are untyped chunks of memory. They are not copied
between processes on the same node (they are reference counted). For
messages:

The bif size(Binary) is O(1)

If my data structure were say 100,000 element then I'd certainly
use binaries. Somewhere between 5,000 and 100,000 elements I'd change
representation from lists to binaries.

(Hopefully I'd have written the program in such a way that the
choice of representation would now impact the code much).

I don't know what your problem is but usually a major source of
inefficiency is getting data from the outside world into Erlang, here
you should almost always use binaries - the break even point comes
with very short lists (i.e. even with a few hundred elements you get a
significant gain using binaries instead of lists).

<aside>
There is only one way to write code. Make it as beautiful as
possible, then measure the performance. If not ok optimise. I very
rarely optimise code [It has happened].
</aside>


> Another question: Is there any intention to add
> features to Erlang's module system, such as the ability to make
> certain modules private to others? In practice, I suppose just not
> documenting certain modules (or documenting them as not for public
> use) is enough to keep people out of them, but I'd like for the
> language itself to understand that some modules are for internal use
> only within a definable group of modules.

Not yet planned. Sorry.

I usually make one main module with all the public exports and a
lot of sub modules. This is the convention that is used in the system.

Example: The main module snmp.erl offers the public interface to
the snmp package. All internal modules are given names like snmp_XXX.erl

Hope that helps

/Joe

--
Joe Armstrong Computer Science Laboratory +46 8 719 9452
AT2/ETX/DN/SU Ericsson Telecom SE-126 25 Stockholm Sweden
joe_at_cslab.ericsson.se http://www.ericsson.se/cslab/~joe



Post generated using Mail2Forum (http://m2f.sourceforge.net)

Display posts from previous:  

All times are GMT
Page 1 of 1
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum