| Author |
Message |
|
| anders_n |
Posted: Sat Oct 20, 2007 2:18 pm |
|
|
|
User
Joined: 28 Feb 2005
Posts: 155
Location: Saltillo, Mexico
|
Hi
I am playing around with Tim Bray's widefinder problem.
Actually I am playing with Steve V's latest version
see http://steve.vinoski.net/blog/2007/10/18/ok-just-one-more-wf/
In it he is using a dict for Boyer-Moore searching, the dict is just
a map from char code to shift amount i.e. the keys are 1 - 255
and the values are ints.
I figured that it would be faster to just have a tuple with the shift values.
So I changed the code to generate a tuple instead of a dict. And the
lookups changed from
dict:fetch(C1, Tbl)
to
element(C1,Tbl)
To my big surprise the program now runs slower.
On a file with 1M rows approx. 192 M.
Steve's original code, using dics, as reported by time
real 0m20.444s
user 0m37.718s
sys 0m0.612s
My modified, using a tuple
real 0m25.849s
user 0m44.203s
sys 0m3.332s
Interesting to note is the high value for sys in my version.
Also the tuple version does not manage to max out the CPU use
it peaks at ~85%, while the dict version gets 100% CPU load.
The program starts 64 erlang processes on my dual core laptop.
My only idea is that this is caused by some pathological CPU cache
behaviour. Does anyone have a better explanation?
/Anders
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Sun Oct 21, 2007 5:43 am |
|
|
|
Guest
|
On 10/20/07, Anders Nygren <anders.nygren@gmail.com (anders.nygren@gmail.com)> wrote:Quote: Hi
I am playing around with Tim Bray's widefinder problem.
Actually I am playing with Steve V's latest version
see http://steve.vinoski.net/blog/2007/10/18/ok-just-one-more -wf/
In it he is using a dict for Boyer-Moore searching, the dict is just
a map from char code to shift amount i.e. the keys are 1 - 255
and the values are ints.
I figured that it would be faster to just have a tuple with the shift values.
So I changed the code to generate a tuple instead of a dict. And the
lookups changed from
dict:fetch(C1, Tbl)
to
element(C1,Tbl)
To my big surprise the program now runs slower.
On a file with 1M rows approx. 192 M.
Steve's original code, using dics, as reported by time
real |
|
|
| Back to top |
|
| Guest |
Posted: Sun Oct 21, 2007 9:22 am |
|
|
|
Guest
|
In my experience using tuples for this purpose, it seems like in many
or most cases tuples are instantiated from scratch every function call
even though they're constant in the bytecode. You might just be
assuming some optimization that Erlang doesn't actually do. This is
the sort of thing that the hybrid model might fix, but as far as I
know the only data structures that are on a global heap is large
binaries, not tuples of any kind or size. I'm mostly just guessing
though, because I have not read any of the Erlang VM's source (and
it's great that I haven't had to!).
You may have better luck restructuring your code in such a way that
this can't happen, either by generating a function that is the
equivalent of element(N, Tuple), e.g. a function with 256 clauses, or
by keeping the tuple in the state of a process (maybe several
processes) so that it doesn't get continuously allocated and
deallocated.
-bob
On 10/20/07, Steve Vinoski <vinoski@ieee.org> wrote:
>
> On 10/20/07, Anders Nygren <anders.nygren@gmail.com> wrote:
> > Hi
> > I am playing around with Tim Bray's widefinder problem.
> > Actually I am playing with Steve V's latest version
> > see http://steve.vinoski.net/blog/2007/10/18/ok-just-one-more -wf/
> >
> > In it he is using a dict for Boyer-Moore searching, the dict is just
> > a map from char code to shift amount i.e. the keys are 1 - 255
> > and the values are ints.
> > I figured that it would be faster to just have a tuple with the shift
> values.
> > So I changed the code to generate a tuple instead of a dict. And the
> > lookups changed from
> > dict:fetch(C1, Tbl)
> > to
> > element(C1,Tbl)
> >
> > To my big surprise the program now runs slower.
> > On a file with 1M rows approx. 192 M.
> >
> > Steve's original code, using dics, as reported by time
> > real 0m20.444s
> > user 0m37.718s
> > sys 0m0.612s
> >
> > My modified, using a tuple
> > real 0m25.849s
> > user 0m44.203s
> > sys 0m3.332s
> >
> > Interesting to note is the high value for sys in my version.
> > Also the tuple version does not manage to max out the CPU use
> > it peaks at ~85%, while the dict version gets 100% CPU load.
> > The program starts 64 erlang processes on my dual core laptop.
> >
> > My only idea is that this is caused by some pathological CPU cache
> > behaviour. Does anyone have a better explanation?
> >
>
> Hi Anders, I don't know the answer, but I agree that it seems strange. I
> wrote some tests that did simple lookups against dicts and tuples in a loop
> and timed them, and tuples were faster, but when I used them in my Wide
> Finder code, they were slower, just as you saw.
>
> --steve
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@erlang.org
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Sun Oct 21, 2007 9:44 am |
|
|
|
Guest
|
"Anders Nygren" <anders.nygren@gmail.com> writes:
>
> My only idea is that this is caused by some pathological CPU cache
> behaviour. Does anyone have a better explanation?
Could be. I have another GUESS: using a tuple instead of dict will
result in a smaller heap size for the spawned process, which will lead
to more frequent garbage collections. You could use the spawn_opt()
BIF with the min_heap_size option.
To get a similar heap size as when using a dict, use
erts_debug:size(wfbm3:init()).
which is 1120.
/Bjorn
--
Bj |
|
|
| Back to top |
|
| Guest |
Posted: Sun Oct 21, 2007 10:36 am |
|
|
|
Guest
|
"Bob Ippolito" <bob@redivi.com> writes:
> In my experience using tuples for this purpose, it seems like in many
> or most cases tuples are instantiated from scratch every function call
> even though they're constant in the bytecode. You might just be
> assuming some optimization that Erlang doesn't actually do. This is
> the sort of thing that the hybrid model might fix, but as far as I
> know the only data structures that are on a global heap is large
> binaries, not tuples of any kind or size. I'm mostly just guessing
> though, because I have not read any of the Erlang VM's source (and
> it's great that I haven't had to!).
Correct for all releases up to and including R11B-5.
> You may have better luck restructuring your code in such a way that
> this can't happen, either by generating a function that is the
> equivalent of element(N, Tuple), e.g. a function with 256 clauses, or
> by keeping the tuple in the state of a process (maybe several
> processes) so that it doesn't get continuously allocated and
> deallocated.
That is good advice in general (up to and including all R11B releases),
but in this case I don't think that a constant tuple was used (i.e. the
tuple is already part of the state of a process).
/Bjorn
--
Bj |
|
|
| Back to top |
|
| Guest |
Posted: Sun Oct 21, 2007 3:14 pm |
|
|
|
Guest
|
On 21 Oct 2007 11:41:12 +0200, Bjorn Gustavsson <bjorn@erix.ericsson.se (bjorn@erix.ericsson.se)> wrote:Quote: "Anders Nygren" <anders.nygren@gmail.com (anders.nygren@gmail.com)> writes:
>
> My only idea is that this is caused by some pathological CPU cache
> behaviour. Does anyone have a better explanation?
Could be. I have another GUESS: using a tuple instead of dict will
result in a smaller heap size for the spawned process, which will lead
to more frequent garbage collections. You could use the spawn_opt()
BIF with the min_heap_size option.
To get a similar heap size as when using a dict, use
|
|
|
| Back to top |
|
| anders_n |
Posted: Sun Oct 21, 2007 4:54 pm |
|
|
|
User
Joined: 28 Feb 2005
Posts: 155
Location: Saltillo, Mexico
|
On 10/21/07, Steve Vinoski <vinoski@ieee.org> wrote:
> On 21 Oct 2007 11:41:12 +0200, Bjorn Gustavsson <bjorn@erix.ericsson.se>
> wrote:
> > "Anders Nygren" <anders.nygren@gmail.com> writes:
> > >
> > > My only idea is that this is caused by some pathological CPU cache
> > > behaviour. Does anyone have a better explanation?
> >
> > Could be. I have another GUESS: using a tuple instead of dict will
> > result in a smaller heap size for the spawned process, which will lead
> > to more frequent garbage collections. You could use the spawn_opt()
> > BIF with the min_heap_size option.
> >
> > To get a similar heap size as when using a dict, use
> >
> > erts_debug:size(wfbm3:init()).
> >
> > which is 1120.
> >
>
> I didn't try this or Bob's suggestion of simulating a tuple, but I did
> eliminate a dict lookup in the code that finds shift values, and also
> changed the macros to use hard-coded string lengths (since the strings are
> fixed) rather than recalculating them with length/1, and got another 25%
> speedup:
>
> <http://steve.vinoski.net/blog/2007/10/21/faster-wf-still/>
>
>
So I then took Steve's latest and
- changed to using a tuple instead of dict and VICTORY
Steve's tbray16 on my laptop
real 0m10.929s
user 0m18.477s
sys 0m0.560s
With tuple instead of dict
real 0m14.126s
user 0m22.373s
sys 0m1.924s
- added [{min_heap_size, 3000}] to the spawn, as suggested by Bjorn
With tuple and min_heap_size, 1000
real 0m7.358s
user 0m11.725s
sys 0m0.480s
With tuple and min_heap_size, 2000
real 0m6.760s
user 0m10.957s
sys 0m0.388s
With tuple and min_heap_size, 3000
real 0m6.305s
user 0m10.149s
sys 0m0.380s
After that increasing the heap size does not make any difference.
So my next step will be to play with garbage collection and see what I can get.
/Anders
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| Guest |
Posted: Sun Oct 21, 2007 8:17 pm |
|
|
|
Guest
|
On 10/21/07, Anders Nygren <anders.nygren@gmail.com (anders.nygren@gmail.com)> wrote:Quote: On 10/21/07, Steve Vinoski <vinoski@ieee.org (vinoski@ieee.org)> wrote:
> On 21 Oct 2007 11:41:12 +0200, Bjorn Gustavsson <bjorn@erix.ericsson.se (bjorn@erix.ericsson.se) >
> wrote:
> > "Anders Nygren" <anders.nygren@gmail.com (anders.nygren@gmail.com)> writes:
> > >
> > > My only idea is that this is caused by some pathological CPU cache
> > > behaviour. Does anyone have a better explanation?
> >
> > Could be. I have another GUESS: using a tuple instead of dict will
> > result in a smaller heap size for the spawned process, which will lead
> > to more frequent garbage collections. You could use the spawn_opt()
> > BIF with the min_heap_size option.
> >
> > To get a similar heap size as when using a dict, use
> >
> > |
|
|
| Back to top |
|
| Thomas Lindgren |
Posted: Mon Oct 22, 2007 7:52 am |
|
|
|
User
Joined: 09 Mar 2005
Posts: 284
|
--- Steve Vinoski <vinoski@ieee.org> wrote:
> Anders sent me his code and I ran it on my 8-core
> Linux box, with the
> following performance results. VICTORY is right!
>
> real 0m1.904s
> user 0m7.917s
> sys 0m1.185s
>
> Like I mentioned to Anders in private email, it's
> nice to have someone more
> experienced with Erlang finally taking a look at
> this; I'm still a relative
> newbie.
> One thing I've liked about this entire exercise is
> that the early attempts
> at solving Tim Bray's Wide Finder in Erlang were
> taking minutes to execute
> and were providing only partial answers. Several of
> us then started
> whittling away at it, and because of the richness of
> the language, we had a
> variety of different avenues to explore. Over time,
> we've vastly increased
> the performance of our solutions. Anders's solution
> now beats Ruby on the
> same machine by about 0.3s, and because of the way
> it uses multiple cores,
> it will likely execute extremely quickly and
> efficiently when Tim gets a
> chance to try it on his T5120.
>
> Yes, fast solutions in other languages were quickly
> found, but those had
> almost nowhere to go beyond their initial forms in
> terms of improvement, not
> because they were already so fast, but because the
> languages ran out of
> alternatives. This is especially true when it comes
> to taking advantage of
> the T5120's many cores. I'm a fan of many languages,
> including Ruby, Python,
> Perl, and C++, all of which have figured prominently
> in the collection of
> various Wide Finder solutions. But for my money,
> Erlang has fulfilled Tim's
> original wishes the best, which is to take the best
> possible advantage of
> all those cores.
Well done, everyone. You've chewed this over pretty
well, I'd say, and it has been interesting to see how
things have improved over time. Here are a couple of
thoughts on further improvements:
1. Native code compilation? It's a bit hit-and-miss,
but this could be the sort of problem that gains from
it.
2. The speedup is 4.15 on 8 cores (if I'm reading
things right: user/real). What is the bottleneck? Too
small input, too much I/O, or is there something that
could be improved or tuned further?
And for language fans, the language itself could be a
bit more helpful. While we haven't emphasized regexp
crunching, it still seems like things could be easier.
Here are some quick thoughts.
0. Not having to write the Boyer-Moore stuff by hand
1. Working with binaries like strings should be easier
2. Reading lines from files
3. Appropriate data chunking for file processing
(people tried all from a few KB to several MB per
chunk -- could the system figure out an appropriate
size on its own?)
4. Perhaps a streaming interface would be even better?
Jay Nelson suggested one a few years ago.
5. When looking at the use of dictionaries, it struck
me that ets:update_counter/3 could have avoided the
use of dictionaries and merging altogether. But, alas,
there is the well-known snag that if the key does not
exist in the table, you need to insert it yourself.
Which means you get a race condition that, as far as I
can see, ets can't handle safely. (Or could I be more
creative?)
6. More intuitive APIs -- it's been instructive, and a
bit alarming, to see how people outside the "core"
community have had to struggle with false starts on
this. More and better documentation and tutorials.
Best,
Thomas
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| dcaoyuan |
Posted: Mon Oct 22, 2007 10:13 am |
|
|
|
User
Joined: 28 Mar 2007
Posts: 34
|
It seems GC is also a key for performance now per my tbray4.erl
code, which uses plain Dict
After I added [{min_heap_size, 3000}] to spawn_opt, the elapsed time
dropped from 7.7 sec to 5.7s immediately.
The code is at:
http://blogtrader.net/page/dcaoyuan/entry/learning_coding_parallelization_was_tim
On 10/22/07, Thomas Lindgren <thomasl_erlang@yahoo.com> wrote:
>
> --- Steve Vinoski <vinoski@ieee.org> wrote:
>
> > Anders sent me his code and I ran it on my 8-core
> > Linux box, with the
> > following performance results. VICTORY is right!
> >
> > real 0m1.904s
> > user 0m7.917s
> > sys 0m1.185s
> >
> > Like I mentioned to Anders in private email, it's
> > nice to have someone more
> > experienced with Erlang finally taking a look at
> > this; I'm still a relative
> > newbie.
> > One thing I've liked about this entire exercise is
> > that the early attempts
> > at solving Tim Bray's Wide Finder in Erlang were
> > taking minutes to execute
> > and were providing only partial answers. Several of
> > us then started
> > whittling away at it, and because of the richness of
> > the language, we had a
> > variety of different avenues to explore. Over time,
> > we've vastly increased
> > the performance of our solutions. Anders's solution
> > now beats Ruby on the
> > same machine by about 0.3s, and because of the way
> > it uses multiple cores,
> > it will likely execute extremely quickly and
> > efficiently when Tim gets a
> > chance to try it on his T5120.
> >
> > Yes, fast solutions in other languages were quickly
> > found, but those had
> > almost nowhere to go beyond their initial forms in
> > terms of improvement, not
> > because they were already so fast, but because the
> > languages ran out of
> > alternatives. This is especially true when it comes
> > to taking advantage of
> > the T5120's many cores. I'm a fan of many languages,
> > including Ruby, Python,
> > Perl, and C++, all of which have figured prominently
> > in the collection of
> > various Wide Finder solutions. But for my money,
> > Erlang has fulfilled Tim's
> > original wishes the best, which is to take the best
> > possible advantage of
> > all those cores.
>
> Well done, everyone. You've chewed this over pretty
> well, I'd say, and it has been interesting to see how
> things have improved over time. Here are a couple of
> thoughts on further improvements:
>
> 1. Native code compilation? It's a bit hit-and-miss,
> but this could be the sort of problem that gains from
> it.
>
> 2. The speedup is 4.15 on 8 cores (if I'm reading
> things right: user/real). What is the bottleneck? Too
> small input, too much I/O, or is there something that
> could be improved or tuned further?
>
> And for language fans, the language itself could be a
> bit more helpful. While we haven't emphasized regexp
> crunching, it still seems like things could be easier.
> Here are some quick thoughts.
>
> 0. Not having to write the Boyer-Moore stuff by hand
>
> 1. Working with binaries like strings should be easier
>
> 2. Reading lines from files
>
> 3. Appropriate data chunking for file processing
> (people tried all from a few KB to several MB per
> chunk -- could the system figure out an appropriate
> size on its own?)
>
> 4. Perhaps a streaming interface would be even better?
> Jay Nelson suggested one a few years ago.
>
> 5. When looking at the use of dictionaries, it struck
> me that ets:update_counter/3 could have avoided the
> use of dictionaries and merging altogether. But, alas,
> there is the well-known snag that if the key does not
> exist in the table, you need to insert it yourself.
> Which means you get a race condition that, as far as I
> can see, ets can't handle safely. (Or could I be more
> creative?)
>
> 6. More intuitive APIs -- it's been instructive, and a
> bit alarming, to see how people outside the "core"
> community have had to struggle with false starts on
> this. More and better documentation and tutorials.
>
> Best,
> Thomas
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@erlang.org
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
--
- Caoyuan
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| Thomas Lindgren |
Posted: Mon Oct 22, 2007 12:13 pm |
|
|
|
User
Joined: 09 Mar 2005
Posts: 284
|
|
| Back to top |
|
| Guest |
Posted: Mon Oct 22, 2007 1:04 pm |
|
|
|
Guest
|
Thomas Lindgren wrote:
>
> That's a 35% speedup just from sizing the heap right.
> So it seems the internal VM heuristics for this could
> be tuned or improved.
Yes, of course. The question is exactly how to satisfy
all needs, especially since they are often contradictory.
The `process-local heap' architecture of the Erlang VM
has made certain choices: stating with a very small heap
(233 words) just so that many thousands (millions?) of
processes can be serviced without running the risk of
crashing the system due to lack of memory.
Of course, if only say 10 processes are ever going to run in
the VM (which I suspect is the case for most micro-benchmarks,
or for the average user out there) this number is ridiculously
low.
In most of today's machines, I strongly suspect that this
233 number can be increased to at least by a factor 10 -- and
perhaps it should -- but the OTP folks should first carefully
examine possible consequences before making this decision. (*)
Ideally, these consequences should be measured on real
applications, not on (toy) micro-benchmarks. Having such a
repository of real applications that can be used as benchmarks
is something that is currently lacking in the Erlang community.
Contributions welcome.
Kostis
(*) The HiPE group can help in this -- this is a good MS Thesis.
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| anders_n |
Posted: Mon Oct 22, 2007 1:08 pm |
|
|
|
User
Joined: 28 Feb 2005
Posts: 155
Location: Saltillo, Mexico
|
On 10/22/07, Thomas Lindgren <thomasl_erlang@yahoo.com> wrote:
>
> --- Steve Vinoski <vinoski@ieee.org> wrote:
>
> > Anders sent me his code and I ran it on my 8-core
> > Linux box, with the
> > following performance results. VICTORY is right!
> >
> > real 0m1.904s
> > user 0m7.917s
> > sys 0m1.185s
> >
> > Like I mentioned to Anders in private email, it's
> > nice to have someone more
> > experienced with Erlang finally taking a look at
> > this; I'm still a relative
> > newbie.
> > One thing I've liked about this entire exercise is
> > that the early attempts
> > at solving Tim Bray's Wide Finder in Erlang were
> > taking minutes to execute
> > and were providing only partial answers. Several of
> > us then started
> > whittling away at it, and because of the richness of
> > the language, we had a
> > variety of different avenues to explore. Over time,
> > we've vastly increased
> > the performance of our solutions. Anders's solution
> > now beats Ruby on the
> > same machine by about 0.3s, and because of the way
> > it uses multiple cores,
> > it will likely execute extremely quickly and
> > efficiently when Tim gets a
> > chance to try it on his T5120.
> >
> > Yes, fast solutions in other languages were quickly
> > found, but those had
> > almost nowhere to go beyond their initial forms in
> > terms of improvement, not
> > because they were already so fast, but because the
> > languages ran out of
> > alternatives. This is especially true when it comes
> > to taking advantage of
> > the T5120's many cores. I'm a fan of many languages,
> > including Ruby, Python,
> > Perl, and C++, all of which have figured prominently
> > in the collection of
> > various Wide Finder solutions. But for my money,
> > Erlang has fulfilled Tim's
> > original wishes the best, which is to take the best
> > possible advantage of
> > all those cores.
>
> Well done, everyone. You've chewed this over pretty
> well, I'd say, and it has been interesting to see how
> things have improved over time. Here are a couple of
> thoughts on further improvements:
>
> 1. Native code compilation? It's a bit hit-and-miss,
> but this could be the sort of problem that gains from
> it.
>
> 2. The speedup is 4.15 on 8 cores (if I'm reading
> things right: user/real). What is the bottleneck? Too
> small input, too much I/O, or is there something that
> could be improved or tuned further?
>
> And for language fans, the language itself could be a
> bit more helpful. While we haven't emphasized regexp
> crunching, it still seems like things could be easier.
> Here are some quick thoughts.
>
> 0. Not having to write the Boyer-Moore stuff by hand
>
> 1. Working with binaries like strings should be easier
>
> 2. Reading lines from files
>
> 3. Appropriate data chunking for file processing
> (people tried all from a few KB to several MB per
> chunk -- could the system figure out an appropriate
> size on its own?)
>
> 4. Perhaps a streaming interface would be even better?
> Jay Nelson suggested one a few years ago.
>
> 5. When looking at the use of dictionaries, it struck
> me that ets:update_counter/3 could have avoided the
> use of dictionaries and merging altogether. But, alas,
> there is the well-known snag that if the key does not
> exist in the table, you need to insert it yourself.
> Which means you get a race condition that, as far as I
> can see, ets can't handle safely. (Or could I be more
> creative?)
>
> 6. More intuitive APIs -- it's been instructive, and a
> bit alarming, to see how people outside the "core"
> community have had to struggle with false starts on
> this. More and better documentation and tutorials.
>
> Best,
> Thomas
>
I did the change to ets:update_counter last night, on my dual
core laptop that made an improvement from my previously
reported
real 0m6.305s
user 0m10.149s
sys 0m0.380s
to
real 0m5.094s
user 0m8.781s
sys 0m0.352s
To avoid the update_counter race condition I have only one process
that all workers report their matches to.
I had some problems with native compilation but finally made it work
and
real 0m2.192s
user 0m3.260s
sys 0m0.336s
/Anders
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| jay |
Posted: Mon Oct 22, 2007 3:05 pm |
|
|
|
User
Joined: 06 Sep 2006
Posts: 139
Location: Los Angeles, CA USA
|
Thomas Lindgren wrote:
> 4. Perhaps a streaming interface would be even better?
> Jay Nelson suggested one a few years ago.
I've been following this thread here and in the blogosphere, but I am
very busy with some code I am working on and am trying valiantly to
wait until I am free before giving an analysis. It will take some
time to discuss all the issues.
For those who haven't read the streaming paper it is on my website at
http://www.duomark.com/erlang/index.html and describes how to build
BIFs that can speed up binary handling. That work was an initial
study of whether binary handling could be accelerated by including a
low-level library of binary functions. It gives performance
comparisons on an old VM for binary matching versus BIFs. This
problem is very amenable to some of the techniques presented in the
paper -- in fact, similar types of problems were what I had in mind
when I started the investigation.
I did further work related to the parsing of binaries over the last
year, but it is very incomplete. It will be 6 months or so (based on
my past track record rather than my optimism) before I will have a
more complete package and analysis of performance. I expect to use
Tim's problem as a motivation for the work.
The main issues I see are as follows:
1) erlang has lots of options, so the developers were able to try
several approaches
2) The performance issues are non-obvious without experience
3) There has been a lot of code slinging and micro-tweaking with good
progress, but not the real analysis of the problem that would
identify the architectural underpinnings for a good solution
(although the participants now have some good experience with
different erlang tools)
4) Doing a proper account of #3 as Thomas suggested would give an
indication of some potential for improving the erlang out-of-the-box
performance on data analysis applications
5) The solutions are large and complicated pieces of code to replace
a 5-liner; erlang should be able to do something useful in 10-20
lines of code
6) While the participants have tried to wring every ounce of
performance out of their solutions, the prospect of a general
solution that could easily be adapted to other problems is strong and
should be faster than a ruby implementation on multi-core, although
probably a little slower than the special purpose solutions being
offered
jay
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
| Thomas Lindgren |
Posted: Mon Oct 22, 2007 3:19 pm |
|
|
|
User
Joined: 09 Mar 2005
Posts: 284
|
--- Anders Nygren <anders.nygren@gmail.com> wrote:
> I did the change to ets:update_counter last night,
> on my dual
> core laptop that made an improvement from my
> previously
> reported
> real 0m6.305s
> user 0m10.149s
> sys 0m0.380s
>
> to
> real 0m5.094s
> user 0m8.781s
> sys 0m0.352s
>
> To avoid the update_counter race condition I have
> only one process
> that all workers report their matches to.
>
> I had some problems with native compilation but
> finally made it work
> and
> real 0m2.192s
> user 0m3.260s
> sys 0m0.336s
Nice -- so native compilation yields a 2.3x speedup
versus the ets-based version (and 2.87x versus the
original). Have you tried setting that process to
priority high? (Can be dangerous in general, but in
the interest of science ...
Parallelization yields 1.78x speedup on the first
version and 1.49x on the native code version. That's
pretty good too on two cores.
(I think the ultimate limit in most any language would
be a time of about 0.33 seconds, the sys time. But at
this point, we are actually within a factor 10 of
that.)
Okay, I just thought of something marginally more
clever when using ets. Maybe you did it this way,
even.
All the matching processes do this when they find a
match 'M':
case catch ets:update_counter(Tab, M, 1) of
{'EXIT', Rsn} -> match_table_owner ! {init_key, M};
_ -> ok
end
and the match_table_owner (which will be the sole
process inserting new matches into the table) does
something like
receive
{init_key, M} ->
case catch ets:update_counter(Tab, M, 1) of
{'EXIT', _} -> ets:insert(Tab, {M, 1});
_ -> ok
end, ...;
...
end
This detects and handles the race condition by
funneling all initializations through the
match_table_owner and making the init there
idempotent. Furthermore, post-init updates will be
done by each matching process without sending
anything. Sounds potentially promising.
Best,
Thomas
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions
Post recived from mailinglist |
|
|
| Back to top |
|
|
|
All times are GMT
Page 1 of 2
Goto page 1, 2 Next
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum You cannot attach files in this forum You cannot download files in this forum
|
|
|