Erlang/OTP Forums

Erlang questions mailing list ~ string performance

tmb-erlang at lumo.com
Posted: Mon Sep 20, 1999 2:07 pm
In many ways, Erlang looks very good for building distributed
web applications, but its string performance is very poor in my
benchmarks: string append is orders of magnitude slower than
in Perl, and characters take many bytes to store (compared to one
byte per character in other languages).

I'm curious whether there are any plans to address this.
One approach would be to transparently switch representations
between lists and strings, like Tcl does. That would be
completely backwards compatible. An alternative would be to
define a separate string type, define new pattern matching
syntax for true strings or prohibit pattern matching on true
strings altogether; this would be backwards compatible, but less
interoperable between old and new code (I still prefer this latter
choice).

So, what are the plans? I know I can use byte arrays, but that
doesn't seem like it's quite the same as having a real string type.

Thanks,
Thomas.


Post generated using Mail2Forum (http://m2f.sourceforge.net)
etxuwig at etxb.ericsson.
Posted: Mon Sep 20, 1999 2:27 pm
One thing you can do today is to use binaries and deep lists to speed up
processing in web applications.

This has to do with the fact that an Erlang port accepts anything that is a
mix of binaries and byte lists.

Basically, anything that works with erlang:list_to_binary([...]) will work
with a port, and with web applications, everything will eventually go
through a port.

Thus, you can append two strings by writing [String1,String2], and you can
also use erlang:list_to_binary/1 on strings which do not need further
manipulation.

Also, binary_to_list(list_to_binary(DeepList)) can be significantly faster
than lists:flatten(DeepList).
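
As a minimal sketch of the points above (the module and function names are
illustrative assumptions, not part of any existing library), the deep-list
approach could look like this:

```erlang
-module(iolist_demo).
-export([append/2, render/1, flatten_fast/1]).

%% "Appending" just wraps both parts in a new list; nothing is copied.
append(A, B) -> [A, B].

%% Collapse a deep list of bytes and binaries into a single binary,
%% for data that needs no further manipulation.
render(DeepList) -> erlang:list_to_binary(DeepList).

%% Flatten by way of a binary round trip.
%% Only valid when every element is a byte (0..255) or a binary.
flatten_fast(DeepList) -> binary_to_list(list_to_binary(DeepList)).
```

For instance, iolist_demo:render(iolist_demo:append("<h1>", [<<"Hello">>, "</h1>"]))
yields <<"<h1>Hello</h1>">>. Unlike lists:flatten/1, flatten_fast/1 fails on
elements that are not bytes or binaries, such as atoms or integers above 255.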

There is talk of a string syntax, but more important (I think) is the
upcoming bit syntax, which will allow you to manipulate binaries directly,
including pattern matching on binary data.

The syntax is not set yet, but an early proposal suggested something like:

parse(<"GET ", What/binary, <"\r\n"> | Tail>) ->
    {Fields, Contents} = parse_tail(Tail, []),
    {What, Fields, Contents}.

parse_tail(<B/binary, <"\r\n\r\n"> | Cont>, A) ->
    {lists:reverse([B|A]), Cont};
parse_tail(<B/binary, <"\r\n"> | Tail>, A) ->
    parse_tail(Tail, [B|A]).

(Example of parsing the binary data of an HTTP request)

Take the syntax with a grain of salt. It has changed since then, and I
don't know the latest details.
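
For comparison, here is a rough sketch of the same parse using the bit syntax
and the binary module found in later Erlang/OTP releases. The module and
function names are illustrative assumptions, and this is not the proposal
discussed above:

```erlang
-module(http_parse_demo).
-export([parse/1]).

%% Split "GET <request line>\r\n<header lines>\r\n\r\n<body>" into parts.
%% Malformed input makes a match fail and crashes the caller.
parse(<<"GET ", Rest/binary>>) ->
    [What, Tail] = binary:split(Rest, <<"\r\n">>),
    {Fields, Contents} = parse_tail(Tail, []),
    {What, Fields, Contents}.

%% An empty line ends the header section; the rest is the body.
parse_tail(<<"\r\n", Contents/binary>>, Acc) ->
    {lists:reverse(Acc), Contents};
%% Otherwise peel off one header line and continue.
parse_tail(Bin, Acc) ->
    [Line, Tail] = binary:split(Bin, <<"\r\n">>),
    parse_tail(Tail, [Line | Acc]).
```

Here What is everything after "GET " on the request line, so it still
includes the protocol version; splitting it further is left out to keep the
sketch short.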

/Uffe


Ulf Wiger, Chief Designer AXD 301 <ulf.wiger_at_etx.ericsson.se>
Ericsson Telecom AB tfn: +46 8 719 81 95
Varuv
klacke at bluetail.com
Posted: Mon Sep 20, 1999 2:27 pm

The last thing I did while I was still at Ericsson was an ambitious
attempt, together with Tony Rogvall, to address not only the string issue
but also a number of related buffer management issues.

We worked on it for almost a year and the end result was
a working implementation and a report.

The implementation was never released, either as open source or
internally at Ericsson, other than to our colleagues.
The results from the prototype were quite promising though,
both in execution efficiency and in clarity of the
code.


A report describing the proposed language extensions is available at:

http://www.bluetail.com/klacke/binaries.ps

I'll be presenting this at the Erlang User Conference, which is due real
soon now.

Now, this work was a prototype, and it would need quite a
lot of work to implement it properly. I have no idea what
plans the OTP group has. Furthermore, the implementation
was JAM-specific, and I know the OTP group is going for the BEAM
machine instead.

/klacke


Claes Wikstr
tmb at lumo.com
Posted: Wed Sep 22, 1999 6:39 am
Hi,

thanks for the response. I'm glad to see that binaries and deep lists
are available for speeding things up. I like the lightweight, dynamic,
and distributed character of Erlang; the string processing seems like
the weakest point right now to me, in particular given the profusion
of text-based Internet protocols and standards (HTTP, XML, POP, SMTP,
etc.).

Maybe it's just because I don't have much experience with it, but the
problem I see with relying on deep lists of binaries is that it may
make code harder to maintain and debug. It certainly makes it harder
to explain to a new Erlang programmer how to do string processing.

> There is talk of a string syntax, but more importantly (I think) is the
> upcoming bit syntax, which will allow you to manipulate binaries directly,
> including pattern matching of binary data.

I think that kind of syntax would be great. I don't think, however,
that using the same data type for strings is such a good idea: if
the distinction between binary and text isn't made early on in the
evolution of a language, it's very difficult to retrofit code later, in
particular when issues like Unicode support come up. Also, binary and
text should behave (print) differently in an interactive development
environment. It seems to me that the syntax, implementation, and
functionality for "binary" and "text" could be almost the same; but
carrying around one extra bit of type information to distinguish
the two cases would seem very useful to me.

In addition to the syntax, there is also the question of what a
good underlying representation of strings should be in a mostly
functional language. Trees of chunks, lists of blocks, and
pointer pairs (start/end) into a buffer that's extensible at both
ends but otherwise unmodifiable are all possibilities.

Before the language changes, I'm wondering: for getting things done
right now, are there any efficient and powerful string packages around
that can deal with large amounts of variable text and handle things
like string substitutions? Packages that avoid converting back and
forth between various list types and binaries?

Cheers,
Thomas.

