Erlang/OTP Forums

Author Message

<  Erlang questions mailing list  ~  Erlang shows its slow face!

Guest
Posted: Sat Nov 13, 2010 7:43 am Reply with quote
Guest
Hi all!
I have been doing tests to Erlang I found this funny stuff that makes
Pythagorean Triplets

pythag(N) ->
[ {A,B,C} ||
A <- lists:seq(1,N),
B <- lists:seq(1,N),
C <- lists:seq(1,N),
A+B+C =< N,
A*A+B*B =:= C*C].

I tested it agains an implementation I made in C# so, and takes 14
secounds in my pc to do with 300 numbers in Erlang however in c# is just
a secound, even when C# runs under VM too.
So I did all possible ways for me to implement differents manners in
Erlang looking for speed and all is the same, listed as follows:

So my question is, there are any way to do that even more fast, why 3
nestes fors structs in C# are more effients that lists:foldr or
lists:foreach in Erlang.

%% FORMA 1
py1(Max)->
L = lists:seq(1, Max),
lists:foldr(
fun(A, Acc3)->
lists:foldr(
fun(B, Acc2)->
lists:foldr(
fun(C, Acc)->
case ((A*A + B*B =:= C*C) andalso (A+B+C =<
Max)) of
true->
[{A,B,C}|Acc];
false->
Acc
end
end
, Acc2, L)
end
, Acc3, L)
end
, [], L).



%% FORMA 2
py2(Max)->
fora(1, [], Max).

fora(A, Acc, Max)->
Acc1 = forb(A,1, Acc, Max),
case A < Max of
true->
fora(A+1, Acc1, Max);
false->
Acc1
end.

forb(A,B, Acc, Max)->
Acc1 = forc(A,B,1, Acc, Max),
case B < Max of
true->
forb(A,B+1, Acc1, Max);
false->
Acc1
end.

forc(A,B,C, Acc, Max)->
Acc1 = case (A*A + B*B =:= C*C) andalso (A+B+C =< Max) of
true->
[{A,B,C}|Acc];
_->
Acc
end,
case C < Max of
true->
forc(A,B,C+1, Acc1, Max);
false->
Acc1
end.

%% FORMA 3.
py3(Max)->
[{A,B,C} ||
A <-lists:seq(1, Max),
B <-lists:seq(1, Max),
C <-lists:seq(1, Max),
A*A + B*B =:= C*C,
A+B+C =< Max].


=======================================================================
Este mensaje ha sido enviado mediante el servicio de correo electronico que ofrece la Federacion de Radioaficionados de Cuba a sus miembros para respaldar el cumplimiento de los objetivos de la organizacion y su politica informativa. La persona que envia este correo asume el compromiso de usar el servicio a tales fines y cumplir con las regulaciones establecidas.



________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
als
Posted: Sat Nov 13, 2010 8:52 am Reply with quote
User Joined: 29 Mar 2007 Posts: 51
On Sat, 13 Nov 2010 06:37:08 pm Gilberto Carmenate Garc
View user's profile Send private message
Spectra
Posted: Sat Nov 13, 2010 9:05 am Reply with quote
User Joined: 08 Dec 2008 Posts: 56 Location: Australia
On 13/11/10 18:37, Gilberto Carmenate Garc
View user's profile Send private message
Guest
Posted: Sat Nov 13, 2010 2:38 pm Reply with quote
Guest
Hello Gilberto,

Just an observation... When one approaches a language, programming =

environment or development tool, it's important to establish it's =

promises. Then judge it on how well it delivers on those promises. One =

shouldn't be surprised when it fails to deliver on something it has not =
=

promised. One should be suspicious if it promises everything.

For Erlang's case, I think the official FAQ sums it up pretty well...

"...The most common class of 'less suitable' problems is characterised b=
y =

performance being a prime requirement and constant-factors having a larg=
e =

effect on performance..."

http://www.erlang.org/faq/introduction.html#id50247
http://www.erlang.org/faq/introduction.html#id49850

It would be odd if you found Erlang was slow at creating processes, or =

sending messages between processes, or the emulator kept crashing. That =
=

would indicate that it's failing to deliver on what it has promised.

Numerical algorithms? I don't think even C# makes any promises there. Yo=
u =

should probably be looking for a math library like AMD's ACML with =

gcc/gfortran...

http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx

I'm no mathematician, but I experimented with ACML once when I was toyin=
g =

around with programming audio processor plugins for music production =

software. It made a world of difference (even though my plugins sucked).=


If for some reason want to do high performance math from Erlang, you cou=
ld =

look into writing a port, a linked-in driver or a NIF with a more =

appropriate set of tools then call that from Erlang.

- Edmond -


On Sat, 13 Nov 2010 18:37:08 +1100, Gilberto Carmenate Garc=EDa =

<co7eb@frcuba.co.cu> wrote:

> Hi all!
> I have been doing tests to Erlang I found this funny stuff that makes
> Pythagorean Triplets
>
> pythag(N) ->
> [ {A,B,C} ||
> A <- lists:seq(1,N),
> B <- lists:seq(1,N),
> C <- lists:seq(1,N),
> A+B+C =3D< N,
> A*A+B*B =3D:=3D C*C].
>
> I tested it agains an implementation I made in C# so, and takes 14
> secounds in my pc to do with 300 numbers in Erlang however in c# is ju=
st
> a secound, even when C# runs under VM too.
> So I did all possible ways for me to implement differents manners in
> Erlang looking for speed and all is the same, listed as follows:
>
> So my question is, there are any way to do that even more fast, why 3
> nestes fors structs in C# are more effients that lists:foldr or
> lists:foreach in Erlang.
>
> %% FORMA 1
> py1(Max)->
> L =3D lists:seq(1, Max),
> lists:foldr(
> fun(A, Acc3)->
> lists:foldr(
> fun(B, Acc2)->
> lists:foldr(
> fun(C, Acc)->
> case ((A*A + B*B =3D:=3D C*C) andalso (A+B=
+C =3D<
> Max)) of
> true->
> [{A,B,C}|Acc];
> false->
> Acc
> end
> end
> , Acc2, L)
> end
> , Acc3, L)
> end
> , [], L).
>
>
>
> %% FORMA 2
> py2(Max)->
> fora(1, [], Max).
>
> fora(A, Acc, Max)->
> Acc1 =3D forb(A,1, Acc, Max),
> case A < Max of
> true->
> fora(A+1, Acc1, Max);
> false->
> Acc1
> end.
>
> forb(A,B, Acc, Max)->
> Acc1 =3D forc(A,B,1, Acc, Max),
> case B < Max of
> true->
> forb(A,B+1, Acc1, Max);
> false->
> Acc1
> end.
>
> forc(A,B,C, Acc, Max)->
> Acc1 =3D case (A*A + B*B =3D:=3D C*C) andalso (A+B+C =3D< Max) of
> true->
> [{A,B,C}|Acc];
> _->
> Acc
> end,
> case C < Max of
> true->
> forc(A,B,C+1, Acc1, Max);
> false->
> Acc1
> end.
>
> %% FORMA 3.
> py3(Max)->
> [{A,B,C} ||
> A <-lists:seq(1, Max),
> B <-lists:seq(1, Max),
> C <-lists:seq(1, Max),
> A*A + B*B =3D:=3D C*C,
> A+B+C =3D< Max].
>
>
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> Este mensaje ha sido enviado mediante el servicio de correo electronic=
o =

> que ofrece la Federacion de Radioaficionados de Cuba a sus miembros pa=
ra =

> respaldar el cumplimiento de los objetivos de la organizacion y su =

> politica informativa. La persona que envia este correo asume el =

> compromiso de usar el servicio a tales fines y cumplir con las =

> regulaciones establecidas.
>
>
>
> ________________________________________________________________
> erlang-questions (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org
>


-- =

Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
tonyrog
Posted: Sat Nov 13, 2010 5:02 pm Reply with quote
User Joined: 12 Sep 2009 Posts: 43
Hi Gilberto!

I did some rewrite of the FORM 2, that I think is more compatible with the condition that A+B+C =< N,
and the result is of course much better. How did you implement the C# version ?

My results for N = 300, times are measured with timer:tc.

pythag0: 4837528 (4.8s)
pythag1: 4289769 (4.3s)
pythag2: 703632 (0.7s)

And the result with +native

pythag0: 1006865 (1.0s)
pythag1: 441254 (0.4s)
pythag2: 107387 (0.1s)

/Tony

My versions (original + the version that only generates the list once)

-module(pythag).

-compile(export_all).

pythag0(N) ->
[ {A,B,C} ||
A <- lists:seq(1,N),
B <- lists:seq(1,N),
C <- lists:seq(1,N),
A+B+C =< N,
A*A+B*B =:= C*C].

pythag1(N) ->
L = lists:seq(1,N),
[ {A,B,C} ||
A <- L,
B <- L,
C <- L,
A+B+C =< N,
A*A+B*B =:= C*C].


pythag2(N) ->
lists:reverse(pythan2_A(1, N, [])).

pythan2_A(A, N, Acc) when A > N -> Acc;
pythan2_A(A, N, Acc) -> pythan2_A(A+1,N,pythan2_B(A, 1, N, Acc)).

pythan2_B(A, B, N, Acc) when A+B > N -> Acc;
pythan2_B(A, B, N, Acc) -> pythan2_B(A,B+1,N,pythan2_C(A, B, 1, N, Acc)).

pythan2_C(A, B, C, N, Acc) when A+B+C > N -> Acc;
pythan2_C(A, B, C, N, Acc) ->
if A*A+B*B =:= C*C ->
pythan2_C(A, B, C+1, N, [{A,B,C}|Acc]);
true ->
pythan2_C(A, B, C+1, N, Acc)
end.





On 13 nov 2010, at 08.37, Gilberto Carmenate Garc
View user's profile Send private message
Guest
Posted: Sat Nov 13, 2010 7:32 pm Reply with quote
Guest
On Sat, Nov 13, 2010 at 4:59 PM, Tony Rogvall <tony@rogvall.se> wrote:
> Hi Gilberto!
>
> I did some rewrite of the FORM 2, that I think is more compatible with the condition that A+B+C =< N,
> and the result is of course much better. How did you implement the C# version ?
>
> My results for N = 300, times are measured with timer:tc.
>
> pythag0:
Guest
Posted: Sun Nov 14, 2010 12:35 am Reply with quote
Guest
PARALLELISE, PARALLELISE, PARALLELISE!!!

I took your second algo and Tony's improved version (both pretty much =

unchanged) and parallelised them below with little effort. Permutations =
=

for B and C are calculated on a separate concurrent* process for every A=
=

then the result is combined (e.g when N =3D 300, 300 concurrent worker =

processes are spawned.)

I consistently obtained a 40-45% improvement in execution time regardles=
s =

of algo. I also noticed that non-parallel versions were using only ONE o=
f =

my cores whereas parallel versions were using BOTH my cores. Here are th=
e =

results for N =3D 300 (debug_info BEAM complied)...

Sequential Concurrent
Your original 8.44s N/A
Your py2 9.86s 5.51s
Tony's pythag2 1.95s 1.17s**

* This is just to illustrate that with any language, one should focus on=
=

it's promises/strengths. Sometimes this can even de-emphasise it's =

weaknesses. Though I still think it's best to use a tool that makes =

pledges towards the problem area in the first place.

** This is probably roughly as fast the C# implementation but that's =

besides the point!

I agree with Bernard. IMO, you're barking up the wrong tree for doing th=
is =

sort of thing quickly using either Erlang or C#. When you look closer, t=
he =

bottleneck with all the solutions so far isn't the calculation itself (I=
=

was wrong about that earlier) -- it's actually the permutation (the part=
=

done with accumulators/generators). Possibly a language with pointers =

would be better at doing this fast since you'd be able to allocate a =

fixed-size destructive memory structure to walk through and modify (that=
's =

just an educated guess). I'd put my money on the horse named C NIF.

Anyways, here's the parallelised code...

NB: lpmap is a library function best stuck in a separate utility module.=
=

It's my modified + edocumented version of Joe Armstrong's pmap from his =
=

book "Programming Erlang: Software for a Concurrent World" Section 20.2 =
=

"Parallelizing Sequential Code" (go buy it!) The original source is at =

http://www.pragprog.com/titles/jaerlang/source_code/

---- START CODE ----

py2(Max)->
lists:flatten(lpmap(fun(A) ->
forb(A, 1, [], Max)
end, lists:seq(1, Max), ordered)).

forb(A, B, Acc, Max) ->
Acc1 =3D forc(A, B, 1, Acc, Max),
case B < Max of
true -> forb(A, B+1, Acc1, Max);
false -> Acc1
end.

forc(A, B, C, Acc, Max) ->
Acc1 =3D case (A*A + B*B =3D:=3D C*C) andalso (A+B+C =3D< Max) of
true -> [{A,B,C}|Acc];
_ -> Acc
end,
case C < Max of
true-> forc(A, B, C+1, Acc1, Max);
false-> Acc1
end.


pythag2(N)->
lists:flatten(lpmap(fun(A) ->
pythan2_B(A, 1, N, [])
end, lists:seq(1, N), ordered)).

pythan2_A(A, N, Acc) when A > N -> Acc;
pythan2_A(A, N, Acc) -> pythan2_A(A+1,N,pythan2_B(A, 1, N, Acc)).

pythan2_B(A, B, N, Acc) when A+B > N -> Acc;
pythan2_B(A, B, N, Acc) -> pythan2_B(A,B+1,N,pythan2_C(A, B, 1, N, Acc))=
.

pythan2_C(A, B, C, N, Acc) when A+B+C > N -> Acc;
pythan2_C(A, B, C, N, Acc) ->
if A*A+B*B =3D:=3D C*C ->
pythan2_C(A, B, C+1, N, [{A,B,C}|Acc]);
true ->
pythan2_C(A, B, C+1, N, Acc)
end.

%% @spec lpmap(fun(), list(), (atom() =3D ordered|unordered)) -> list=
()
%% @doc Spawns a process for each element in list L, performs specif=
ied
%% function F against each in parallel and then returns results=
=

either
%% same order as L (ordered) or in any order (unordered).
%% NB: See also lpmap/4.

lpmap(F, L, ordered) ->
Ref =3D erlang:make_ref(),
Pids =3D [lpmap_spawn_link(self(), Ref, F, I) || I <- L],
lpmap_gather_ordered(Pids, Ref, [], 0, void);
lpmap(F, L, unordered) ->
Ref =3D erlang:make_ref(),
lists:foreach(fun(I) ->
lpmap_spawn_link(self(), Ref, F, I)
end, L),
lpmap_gather_unordered(length(L), Ref, [], 0, void).

%% @spec lpmap(fun(), integer(), list(), (atom() =3D ordered|unordere=
d)) =

-> list()
%% @doc Same as lpmap/3 except ensures only a maximum of MaxPs paral=
lel
%% processes execute function F at any one time (i.e. first tak=
es =

MaxPs
%% items from list, executes F in parallel against each, then a=
s =

each
%% process returns, spawns another process on next item in L as=
=

long as
%% active processes are less than MaxPs).
%% NB: See also lpmap/4.

lpmap(F, L, MaxPs, ordered) when MaxPs>0 ->
Ref =3D erlang:make_ref(),
{HPids, TPids} =3D if
length(L) > MaxPs -> lists:split(MaxPs, L);
true -> {L, []}
end,
Pids =3D [lpmap_spawn_link(self(), Ref, F, I) || I <- HPids],
lpmap_gather_ordered(Pids, Ref, TPids, MaxPs, F);
lpmap(F, L, MaxPs, unordered) when MaxPs>0 ->
Ref =3D erlang:make_ref(),
{HPids, TPids} =3D if
length(L) > MaxPs -> lists:split(MaxPs, L);
true -> {L, []}
end,
lists:foreach(fun(I) ->
lpmap_spawn_link(self(), Ref, F, I)
end, HPids),
lpmap_gather_unordered(length(HPids), Ref, TPids, MaxPs, F).

%% lpmap internal functions

lpmap_spawn_link(Parent, Ref, F, I) ->
spawn_link(fun() ->
Parent ! {self(), Ref, F(I)}
end).

lpmap_gather_ordered([], _Ref, [], _MaxPs, _F) ->
[];
lpmap_gather_ordered([HPid|TPids], Ref, L, MaxPs, F) ->
receive
{HPid, Ref, Ret} when length(TPids)<MaxPs, L=3D/=3D[] ->
[H | T] =3D L,
[Ret | lpmap_gather_ordered(
lists:append(TPids, [lpmap_spawn_link(self(), Ref, F, H=
)]),
Ref, T, MaxPs, F)];
{HPid, Ref, Ret} ->
[Ret | lpmap_gather_ordered(TPids, Ref, L, MaxPs, F)]
end.

lpmap_gather_unordered(0, _Ref, [], _MaxPs, _F) ->
[];
lpmap_gather_unordered(NPs, Ref, L, MaxPs, F) ->
receive
{_Pid, Ref, Ret} when NPs-1<MaxPs, L=3D/=3D[] ->
[H | T] =3D L,
lpmap_spawn_link(self(), Ref, F, H),
[Ret | lpmap_gather_unordered(NPs, Ref, T, MaxPs, F)];
{_Pid, Ref, Ret} ->
[Ret | lpmap_gather_unordered(NPs-1, Ref, L, MaxPs, F)]
end.


---- END CODE -----


- Edmond -


On Sun, 14 Nov 2010 02:59:18 +1100, Tony Rogvall <tony@rogvall.se> wrote=
:

> Hi Gilberto!
>
> I did some rewrite of the FORM 2, that I think is more compatible with=
=

> the condition that A+B+C =3D< N,
> and the result is of course much better. How did you implement the C# =
=

> version ?
>
> My results for N =3D 300, times are measured with timer:tc.
>
> pythag0: 4837528 (4.8s)
> pythag1: 4289769 (4.3s)
> pythag2: 703632 (0.7s)
>
> And the result with +native
>
> pythag0: 1006865 (1.0s)
> pythag1: 441254 (0.4s)
> pythag2: 107387 (0.1s)
>
> /Tony
>
> My versions (original + the version that only generates the list once)=

>
> -module(pythag).
>
> -compile(export_all).
>
> pythag0(N) ->
> [ {A,B,C} ||
> A <- lists:seq(1,N),
> B <- lists:seq(1,N),
> C <- lists:seq(1,N),
> A+B+C =3D< N,
> A*A+B*B =3D:=3D C*C].
>
> pythag1(N) ->
> L =3D lists:seq(1,N),
> [ {A,B,C} ||
> A <- L,
> B <- L,
> C <- L,
> A+B+C =3D< N,
> A*A+B*B =3D:=3D C*C].
>
>
> pythag2(N) ->
> lists:reverse(pythan2_A(1, N, [])).
>
> pythan2_A(A, N, Acc) when A > N -> Acc;
> pythan2_A(A, N, Acc) -> pythan2_A(A+1,N,pythan2_B(A, 1, N, Acc)).
>
> pythan2_B(A, B, N, Acc) when A+B > N -> Acc;
> pythan2_B(A, B, N, Acc) -> pythan2_B(A,B+1,N,pythan2_C(A, B, 1, N, Acc=
)).
>
> pythan2_C(A, B, C, N, Acc) when A+B+C > N -> Acc;
> pythan2_C(A, B, C, N, Acc) ->
> if A*A+B*B =3D:=3D C*C ->
> pythan2_C(A, B, C+1, N, [{A,B,C}|Acc]);
> true ->
> pythan2_C(A, B, C+1, N, Acc)
> end.
>
>
>
>
>
> On 13 nov 2010, at 08.37, Gilberto Carmenate Garc=EDa wrote:
>
>> Hi all!
>> I have been doing tests to Erlang I found this funny stuff that makes=

>> Pythagorean Triplets
>>
>> pythag(N) ->
>> [ {A,B,C} ||
>> A <- lists:seq(1,N),
>> B <- lists:seq(1,N),
>> C <- lists:seq(1,N),
>> A+B+C =3D< N,
>> A*A+B*B =3D:=3D C*C].
>>
>> I tested it agains an implementation I made in C# so, and takes 14
>> secounds in my pc to do with 300 numbers in Erlang however in c# is j=
ust
>> a secound, even when C# runs under VM too.
>> So I did all possible ways for me to implement differents manners in
>> Erlang looking for speed and all is the same, listed as follows:
>>
>> So my question is, there are any way to do that even more fast, why 3=

>> nestes fors structs in C# are more effients that lists:foldr or
>> lists:foreach in Erlang.
>>
>> %% FORMA 1
>> py1(Max)->
>> L =3D lists:seq(1, Max),
>> lists:foldr(
>> fun(A, Acc3)->
>> lists:foldr(
>> fun(B, Acc2)->
>> lists:foldr(
>> fun(C, Acc)->
>> case ((A*A + B*B =3D:=3D C*C) andalso (A+B=
+C =3D<
>> Max)) of
>> true->
>> [{A,B,C}|Acc];
>> false->
>> Acc
>> end
>> end
>> , Acc2, L)
>> end
>> , Acc3, L)
>> end
>> , [], L).
>>
>>
>>
>> %% FORMA 2
>> py2(Max)->
>> fora(1, [], Max).
>>
>> fora(A, Acc, Max)->
>> Acc1 =3D forb(A,1, Acc, Max),
>> case A < Max of
>> true->
>> fora(A+1, Acc1, Max);
>> false->
>> Acc1
>> end.
>>
>> forb(A,B, Acc, Max)->
>> Acc1 =3D forc(A,B,1, Acc, Max),
>> case B < Max of
>> true->
>> forb(A,B+1, Acc1, Max);
>> false->
>> Acc1
>> end.
>>
>> forc(A,B,C, Acc, Max)->
>> Acc1 =3D case (A*A + B*B =3D:=3D C*C) andalso (A+B+C =3D< Max) of
>> true->
>> [{A,B,C}|Acc];
>> _->
>> Acc
>> end,
>> case C < Max of
>> true->
>> forc(A,B,C+1, Acc1, Max);
>> false->
>> Acc1
>> end.
>>
>> %% FORMA 3.
>> py3(Max)->
>> [{A,B,C} ||
>> A <-lists:seq(1, Max),
>> B <-lists:seq(1, Max),
>> C <-lists:seq(1, Max),
>> A*A + B*B =3D:=3D C*C,
>> A+B+C =3D< Max].
>>
>>
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=

>> Este mensaje ha sido enviado mediante el servicio de correo electroni=
co =

>> que ofrece la Federacion de Radioaficionados de Cuba a sus miembros =

>> para respaldar el cumplimiento de los objetivos de la organizacion y =
su =

>> politica informativa. La persona que envia este correo asume el =

>> compromiso de usar el servicio a tales fines y cumplir con las =

>> regulaciones establecidas.
>>
>>
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org
>>
>
> "Have run Make so many times I dunno what's installed anymore"
>


-- =

Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
Willem
Posted: Sun Nov 14, 2010 8:55 am Reply with quote
User Joined: 21 Jul 2006 Posts: 59
Hi,

I think the key is to minimize the number of calculations and comparisons.
In your version you do a lot of multiplications that can easily be avoided.

The version below is about 10 x as fast on my PC:

pythag2(N) ->
L = [{A, A*A} || A <- lists:seq(1,N)],
lists:flatten([forAllBs(A, A2, L, N) || {A, A2} <- L]).

forAllBs(A, A2, L, N) ->
[forAllCs(A, B, A + B, A2 + B2, L, N) || {B, B2} <- L, A + B < N].

forAllCs(A, B, AB, A2B2, L, N) ->
[{A, B, C} || {C, C2} <- L, A2B2 =:= C2, AB + C =< N].

Regards,
Willem

On Sat, Nov 13, 2010 at 8:37 AM, Gilberto Carmenate Garc
View user's profile Send private message
Guest
Posted: Sun Nov 14, 2010 9:19 am Reply with quote
Guest
On Sun, Nov 14, 2010 at 9:52 AM, Willem de Jong <w.a.de.jong@gmail.com> wrote:
> Hi,
>
> I think the key is to minimize the number of calculations and comparisons.
> In your version you do a lot of multiplications that can easily be avoided.
>
> The version below is about 10 x as fast on my PC:
>
> pythag2(N) ->
>
tonyrog
Posted: Sun Nov 14, 2010 10:57 am Reply with quote
User Joined: 12 Sep 2009 Posts: 43
On 14 nov 2010, at 10.17, Hynek Vychodil wrote:

>>
>>
>
> Good idea but it seems that native (HiPE) compiler does this
> optimization for you when you keep staying in one module. (I also
> keeps exporting only main function. It can possibly take effect here
> also.) In BEAM it gives you 50% performance gain. Anyway Erlang is not
> right tool for this job. You should use NIF if performance matter.
>
> pythag4(N) when is_integer(N) -> pythag4(N,1).


I have implemented several small Erlang programs that beat "native" C code.
Most of the time the C programs where badly/hastily written, but that is the point Wink

You must select the algorithm and the implementation wisely, and of course
use the golden rule "Make it work, then make it fast". I would add, if it is needed.
This does not imply that you should write your program unnecessary slow to start with!

Any how. A couple of months ago I implemented the AKS algorithm in Erlang.
The AKS algorithm is a deterministic primality test algorithm (http://en.wikipedia.org/wiki/AKS_primality_test)
I benchmarked this implementation with an implementation in C++.
I was shocked: The Erlang version was about 5 times faster, NOT using obvious parallelism.

In this case I would suspect that garbage collection is the major contributor! The implementation use
a lot of temporary polynomials intermediate results. A copy collector does the trick.

/Tony




"Have run Make so many times I dunno what's installed anymore"



Post received from mailinglist
View user's profile Send private message
Guest
Posted: Sun Nov 14, 2010 11:34 am Reply with quote
Guest
On Sun, Nov 14, 2010 at 11:55 AM, Tony Rogvall <tony@rogvall.se> wrote:
>
> On 14 nov 2010, at 10.17, Hynek Vychodil wrote:
>
>
>
> Good idea but it seems that native (HiPE) compiler does this
> optimization for you when you keep staying in one module. (I also
> keeps exporting only main function. It can possibly take effect here
> also.) In BEAM it gives you 50% performance gain. Anyway Erlang is not
> right tool for this job. You should use NIF if performance matter.
>
> pythag4(N) when is_integer(N) -> pythag4(N,1).
>
> I have implemented several small Erlang programs that beat "native" C code.
> Most of the time the C programs where badly/hastily written, but that is the
> point Wink
> You must select the algorithm and the implementation wisely, and of course
> use the golden rule "Make it work, then make it fast". I would add, if it is
> needed.
> This does not imply that you should write your program unnecessary slow to
> start with!
> Any how. A couple of months ago I implemented the AKS algorithm in Erlang.
> The AKS algorithm is a deterministic primality test algorithm
> (http://en.wikipedia.org/wiki/AKS_primality_test)
> I benchmarked this implementation with an implementation in C++.
> I was shocked: The Erlang version was about 5 times faster, NOT using
> obvious parallelism.
> In this case I would suspect that garbage collection is the major
> contributor! The implementation use
> a lot of temporary polynomials intermediate results. A copy collector does
> the trick.
> /Tony
>
>
>
> "Have run Make so many times I dunno what's installed anymore"
>

When Erlang implementation is faster than C implementation *is* badly
written even it can be for maintainability or haste reason. C++ is
another beast. Pun intended. I have similar experience to you but when
I found that Erlang implementation is faster then I would look how is
it possible and you are right, usually it is memory issue. Anyway
every time you can do same tricks as Erlang does but you will end up
with "half and error prone implementation of Erlang". You can also do
tricks in C which you can't in Erlang. Can you write k/v store which
is able do 200 millions of look up operations per second on one 2.4GHz
i5 core? Anyway HiPE compiler does very good work here. If I count it
correctly pythag3 and pythag4 does about hundred millions checks per
seconds. Very good result I think.


--
--Hynek (Pichi) Vychodil

Analyze your data in minutes. Share your insights instantly. Thrill
your boss.
Guest
Posted: Sun Nov 14, 2010 4:05 pm Reply with quote
Guest
Willem's version in parallel below using a maximum of 2000 concurrent =

processes (find lpmap function is in previous e-mail).

This is approx. 16 x faster on my dual-core. Probably surpasses C# alrea=
dy =

(but again, not really a good point to try to be making.)

pythag2(N) ->
L =3D [{A, A*A} || A <- lists:seq(1,N)], % For all A's
lists:flatten(lpmap(fun({A, A2}) -> % For all B's in parallel
[forAllCs(A, B, A + B, A2 + B2, L, N)
|| {B, B2} <- L, A + B < N]
end, L, 2000, ordered)).

forAllCs(A, B, AB, A2B2, L, N) ->
[{A, B, C} || {C, C2} <- L, A2B2 =3D:=3D C2, AB + C =3D< N].

- Edmond -



On Sun, 14 Nov 2010 19:52:47 +1100, Willem de Jong <w.a.de.jong@gmail.co=
m> =

wrote:

> Hi,
>
> I think the key is to minimize the number of calculations and =

> comparisons.
> In your version you do a lot of multiplications that can easily be =

> avoided.
>
> The version below is about 10 x as fast on my PC:
>
> pythag2(N) ->
> L =3D [{A, A*A} || A <- lists:seq(1,N)],
> lists:flatten([forAllBs(A, A2, L, N) || {A, A2} <- L]).
>
> forAllBs(A, A2, L, N) ->
> [forAllCs(A, B, A + B, A2 + B2, L, N) || {B, B2} <- L, A + B < N].
>
> forAllCs(A, B, AB, A2B2, L, N) ->
> [{A, B, C} || {C, C2} <- L, A2B2 =3D:=3D C2, AB + C =3D< N].
>
> Regards,
> Willem
>
> On Sat, Nov 13, 2010 at 8:37 AM, Gilberto Carmenate Garc=EDa <
> co7eb@frcuba.co.cu> wrote:
>
>> Hi all!
>> I have been doing tests to Erlang I found this funny stuff that makes=

>> Pythagorean Triplets
>>
>> pythag(N) ->
>> [ {A,B,C} ||
>> A <- lists:seq(1,N),
>> B <- lists:seq(1,N),
>> C <- lists:seq(1,N),
>> A+B+C =3D< N,
>> A*A+B*B =3D:=3D C*C].
>>
>> I tested it agains an implementation I made in C# so, and takes 14
>> secounds in my pc to do with 300 numbers in Erlang however in c# is j=
ust
>> a secound, even when C# runs under VM too.
>> So I did all possible ways for me to implement differents manners in
>> Erlang looking for speed and all is the same, listed as follows:
>>
>> So my question is, there are any way to do that even more fast, why 3=

>> nestes fors structs in C# are more effients that lists:foldr or
>> lists:foreach in Erlang.
>>
>> %% FORMA 1
>> py1(Max)->
>> L =3D lists:seq(1, Max),
>> lists:foldr(
>> fun(A, Acc3)->
>> lists:foldr(
>> fun(B, Acc2)->
>> lists:foldr(
>> fun(C, Acc)->
>> case ((A*A + B*B =3D:=3D C*C) andalso (A+B=
+C =3D<
>> Max)) of
>> true->=

>> [{A,B,C}|Acc];
>> false->
>> Acc
>> end
>> end
>> , Acc2, L)
>> end
>> , Acc3, L)
>> end
>> , [], L).
>>
>>
>>
>> %% FORMA 2
>> py2(Max)->
>> fora(1, [], Max).
>>
>> fora(A, Acc, Max)->
>> Acc1 =3D forb(A,1, Acc, Max),
>> case A < Max of
>> true->
>> fora(A+1, Acc1, Max);
>> false->
>> Acc1
>> end.
>>
>> forb(A,B, Acc, Max)->
>> Acc1 =3D forc(A,B,1, Acc, Max),
>> case B < Max of
>> true->
>> forb(A,B+1, Acc1, Max);
>> false->
>> Acc1
>> end.
>>
>> forc(A,B,C, Acc, Max)->
>> Acc1 =3D case (A*A + B*B =3D:=3D C*C) andalso (A+B+C =3D< Max) of
>> true->
>> [{A,B,C}|Acc];
>> _->
>> Acc
>> end,
>> case C < Max of
>> true->
>> forc(A,B,C+1, Acc1, Max);
>> false->
>> Acc1
>> end.
>>
>> %% FORMA 3.
>> py3(Max)->
>> [{A,B,C} ||
>> A <-lists:seq(1, Max),
>> B <-lists:seq(1, Max),
>> C <-lists:seq(1, Max),
>> A*A + B*B =3D:=3D C*C,
>> A+B+C =3D< Max].
>>
>>
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=

>> Este mensaje ha sido enviado mediante el servicio de correo electroni=
co =

>> que
>> ofrece la Federacion de Radioaficionados de Cuba a sus miembros para
>> respaldar el cumplimiento de los objetivos de la organizacion y su =

>> politica
>> informativa. La persona que envia este correo asume el compromiso de =
=

>> usar
>> el servicio a tales fines y cumplir con las regulaciones establecidas=
.
>>
>>
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org
>>
>>


-- =

Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
Guest
Posted: Sun Nov 14, 2010 4:12 pm Reply with quote
Guest
On Sun, 14 Nov 2010 21:55:28 +1100, Tony Rogvall <tony@rogvall.se> wrote:

> In this case I would suspect that garbage collection is the major
> contributor! The implementation use
> a lot of temporary polynomials intermediate results.

Hmmmm,

I wonder if you could use a private ETS table to somehow pre-allocate
space for the permutations? ETS is more-or-less mutable memory for Erlang
is it not?

A more experienced Erlang programmer than myself would know if this is
worth pursuing.

- Edmond -

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
Guest
Posted: Sun Nov 14, 2010 4:28 pm Reply with quote
Guest
Hi

Here is another version of the same program using a different approach.

if A^2 + B^2 = C^2, then A and B cannot both be odd, so at least one of the must be even, say, B.

Write B^2 = (C+A)*(C-A)

Now A and C are both even or both odd, so we can write, (B/2)^2 = M1 * M2, where M1 = (C+A)/2, M2 = (C-A)/2
where M1 and M2 are integers. Let p be a prime number and let alpha1 be the highest number such that p^alpha1 divides M1, and similarly for alpha2,
then alpha1 and alpha2 must both be even or both be odd, since the left hand side (B/2)^2 only has even powers of any prime factor.
On the other hand any M1, M2 built in this way will lead to a pythagorean triple, as can be seen by just plugging in.

If A+B+C <= N, then no prime number greater than N/2 can occur.

This lead to the following algorithm.

1. Generate all prime numbers up to N/2. I used Eratosthenes sieve for that.
2. Generate all M1 and M2, by distributing even or odd powers as explained above. Only keep numbers that do not already break A+B+C <= N, and M1 > M2.
3. Put them all in a list, permute A and B, e.g. {3,4,5} and {4,3,5}.
4. unique sort it. Unique because of the A and B permutation, and sort to make the outpout the same as all the other implementations.

The code is pasted below.

The runtime for N=300 is 5 ms on a computer where pythag3 is 80 ms.
But it is for larger values of N that this algorithm will really pull away.
For N=3000, it uses 680 ms compared to 76 s for pythag3.

The earlier programs were all N^3, this one seems, by testing, to be N^2. I haven't thought theoretically about the complexity of this algotithm.

I haven't optimised this program.

Cheers,

Morten.



pythag5(N) when is_integer(N) ->
Primes = sieve(N div 2),
M1M2s = incorporate_primes([{1,1}], N, Primes),
lists:usort(lists:flatten([ [{A,B,C}, {B,A,C}] || {M1, M2} <- M1M2s, M1 > M2, A <- [M1-M2], B <- [2*round(math:sqrt(M1*M2))], C <- [M1+M2], A+B+C =< N])).

sieve(N) when is_integer(N) ->
erase(),
sieve(N,2).

sieve(N, K) when K >= N ->
[X || X <- lists:seq(2, N), erase(X) == undefined];
sieve(N, K) ->
cross_off(K, K, N div K - 1),
sieve(N, find_next_in_sieve(K + 1)).

cross_off(_K, _Current, 0) ->
ok;
cross_off(K, Current, Left) ->
Next = Current + K,
put(Next, out),
cross_off(K, Next, Left - 1).

find_next_in_sieve(K) ->
case get(K) of
undefined ->
K;
_ ->
find_next_in_sieve(K+1)
end.

incorporate_prime(M1M2s, N, P) ->
lists:flatten([incorporate_prime_single({M1,M2}, N, P)|| {M1, M2} <- M1M2s]).

incorporate_prime_single({M1,M2}, N, P) ->
Evens = [{X, Y} || X <- incorporate_prime_even(M1, N, P), Y <- incorporate_prime_even(M2, N, P)],
Odds = [{X, Y} || X <- incorporate_prime_odd(M1, N, P), Y <- incorporate_prime_odd(M2, N, P)],
Evens ++ Odds.

incorporate_prime_even(M, N, P) ->
incorporate_prime(M, N, P, []).

incorporate_prime_odd(M, N, P) ->
incorporate_prime(M * P, N, P, []).

incorporate_prime(M, N, _P, Acc) when M > N/2 ->
Acc;
incorporate_prime(M, N, P, Acc) ->
incorporate_prime(M * P * P, N, P, [M|Acc]).

incorporate_primes(M1M2s, _N, []) ->
M1M2s;
incorporate_primes(M1M2s, N, [P|Rest]) ->
M1M2s_new = incorporate_prime(M1M2s, N, P),
incorporate_primes(M1M2s_new, N, Rest).








________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist
Guest
Posted: Sun Nov 14, 2010 4:52 pm Reply with quote
Guest
Hynek,

A few comments...

> When Erlang implementation is faster than C implementation *is* badly
> written even it can be for maintainability or haste reason. C++ is
> another beast

I disagree. Quick example...

1. Write a "well-written" web-server in Erlang, the "Erlang way"
(lightweight-weight concurrency, message passing).
2. Write a "well-written" web-server in C/C++, the "C/C++ way"
(mulit-threading, shared-memory).

See which is faster. Or, you could save yourself some time and look at
Apache + Yaws. I wouldn't describe either of these as badly written. But
the performance difference is undeniable.

> You can also do tricks in C which you can't in Erlang.

And you can do tricks in Erlang that you can't do in typical C (e.g easy
parallelism with little change to code as evidenced on this thread.)

> Can you write k/v store which
> is able do 200 millions of look up operations per second on one 2.4GHz
> i5 core?

Something is getting lost here. I go back to the promises a language
makes. In delivering on those promises, priorities have to be set and
compromises made. The language user has to weight these.

C/C++ gives us pointers (amongst other things). The language makes great
sacrifices to give us pointers (like garbage collection and debugability)
so we can create insanely efficient data structures. When the benefits of
pointers far outweigh the trade-offs, the choice is clear.

Erlang gives us high concurrency (amongst other things). The Erlang team
made great sacrifices to give us insanely efficient and scalable
concurrency. When the benefits of this far outweigh the trade-offs the
choice is clear.

I see few areas where these benefits/trade-offs overlap. The choice is
usually very clear. In cases where overlapping does occur, there are
various ways of using both.

- Edmond -



On Sun, 14 Nov 2010 22:32:42 +1100, Hynek Vychodil <hynek@gooddata.com>
wrote:

> On Sun, Nov 14, 2010 at 11:55 AM, Tony Rogvall <tony@rogvall.se> wrote:
>>
>> On 14 nov 2010, at 10.17, Hynek Vychodil wrote:
>>
>>
>>
>> Good idea but it seems that native (HiPE) compiler does this
>> optimization for you when you keep staying in one module. (I also
>> keeps exporting only main function. It can possibly take effect here
>> also.) In BEAM it gives you 50% performance gain. Anyway Erlang is not
>> right tool for this job. You should use NIF if performance matter.
>>
>> pythag4(N) when is_integer(N) -> pythag4(N,1).
>>
>> I have implemented several small Erlang programs that beat "native" C
>> code.
>> Most of the time the C programs where badly/hastily written, but that
>> is the
>> point Wink
>> You must select the algorithm and the implementation wisely, and of
>> course
>> use the golden rule "Make it work, then make it fast". I would add, if
>> it is
>> needed.
>> This does not imply that you should write your program unnecessary slow
>> to
>> start with!
>> Any how. A couple of months ago I implemented the AKS algorithm in
>> Erlang.
>> The AKS algorithm is a deterministic primality test algorithm
>> (http://en.wikipedia.org/wiki/AKS_primality_test)
>> I benchmarked this implementation with an implementation in C++.
>> I was shocked: The Erlang version was about 5 times faster, NOT using
>> obvious parallelism.
>> In this case I would suspect that garbage collection is the major
>> contributor! The implementation use
>> a lot of temporary polynomials intermediate results. A copy collector
>> does
>> the trick.
>> /Tony
>>
>>
>>
>> "Have run Make so many times I dunno what's installed anymore"
>>
>
> When Erlang implementation is faster than C implementation *is* badly
> written even it can be for maintainability or haste reason. C++ is
> another beast. Pun intended. I have similar experience to you but when
> I found that Erlang implementation is faster then I would look how is
> it possible and you are right, usually it is memory issue. Anyway
> every time you can do same tricks as Erlang does but you will end up
> with "half and error prone implementation of Erlang". You can also do
> tricks in C which you can't in Erlang. Can you write k/v store which
> is able do 200 millions of look up operations per second on one 2.4GHz
> i5 core? Anyway HiPE compiler does very good work here. If I count it
> correctly pythag3 and pythag4 does about hundred millions checks per
> seconds. Very good result I think.
>
>


--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questions-unsubscribe@erlang.org

Post received from mailinglist

Display posts from previous:  

All times are GMT
Page 1 of 4
Goto page 1, 2, 3, 4  Next
This forum is locked: you cannot post, reply to, or edit topics.

Jump to:  

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum