Erlang/OTP Forums

Quirks in Erlang R14B binaries

LRP
Posted: Mon May 09, 2011 3:53 am
Joined: 08 Oct 2008 Posts: 8 Location: Boston, MA.
Or maybe I just don't know what I'm doing.

I've spent days trying to use regular expressions to filter binaries with scant success. Here are two experiments that illustrate some of the frustrations I've encountered.

Can some kind soul either show me how to correct my mistakes or, if it so be, confirm that there are quirks in Erlang R14B binaries?

Latex produces typeset text output. Among other things, it outputs both opening and closing quotation marks, consistent with professional typesetting convention. Quotation marks in Latex input files (*.tex) are represented as shown below:

To `quote' in Latex
To ``quote'' in Latex
To ``quote" in Latex

Our goal is to filter and convert a binary derived from a *.txt file to produce proper *.tex markup.

Weirdness 1

Here's a quote represented as a binary:

A = <<"\"The quick brown fox.\"">>.
<<"\"The quick brown fox.\"">>

Which displays as we'd expect:

io:format(A,[]).
"The quick brown fox."ok

We can use a regular expression to convert the opening quote to *.tex markup:

B = binary:replace(A,<<"\"">>,<<"``">>).
<<"``The quick brown fox.\"">>

Which also displays as we'd expect:

io:format(B,[]).
``The quick brown fox."ok

But we need to convert opening quotes only which, presumably, can be identified as a double quote character (") followed by a Perl "word" character (\w).

D = binary:replace(A,<<"\"\w">>,<<"``\w">>, [global]).
<<"\"The quick brown fox.\"">>

Display confirms our fears:

io:format(D,[]).
"The quick brown fox."ok

Maybe re will stand us in better stead.

E = re:replace(A,<<"\"">>,<<"``">>, [global,{return, binary}]).
<<"``The quick brown fox.``">>

So far so good. Let's try for opening quotes only:

F = re:replace(A,<<"\"\w">>,<<"``\w">>, [global,{return, binary}]).
<<"\"The quick brown fox.\"">>

io:format(F,[]).
"The quick brown fox."ok

Grrrr.

Weirdness 2

We would like to delete all newlines at the beginning of our binary.

If we know how many, it's easy:

A = <<"\n\n\n\nThe quick brown fox">>.
<<"\n\n\n\nThe quick brown fox">>

B = binary:replace(A,<<"\n\n\n\n">>,<<>>).
<<"The quick brown fox">>

But, following Perl-style regular expressions documented here:

http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

We should be able to delete an arbitrary number of newlines:

C = binary:replace(A,<<"\n{0,}">>,<<>>).
<<"\n\n\n\nThe quick brown fox">>

NOT.

Maybe re can help us out:

E = re:replace(A,<<"\n{0, }">>,<<>>, [{return, binary}]).
<<"\n\n\n\nThe quick brown fox">>

NOT. Grrrr.

So what's going on here? Is there a better way?
rvirding
Posted: Mon May 09, 2011 3:56 pm
User Joined: 30 Aug 2006 Posts: 452 Location: Stockholm, Sweden
First to make some points:

- In Erlang strings and binary strings '\' is the escape character, so to get a '\' into the actual string you need to escape it as well. So <<"\"\w">> is actually <<"\"w">>; to get the '\' into the string you need to write <<"\"\\w">>.

- In the binary module, patterns are NOT regular expressions but literal binaries. So the pattern <<"\"\\w">> will only match the actual character sequence "\w. To use regular expressions you need to use the re module.

- In re:replace the replacement string/binary is inserted as is, except for the character '&', which is replaced by the whole matching string (escape it with '\' to insert it literally), and \1-\9, which are replaced by the matching sub-expressions in the pattern.

- You can use '*' instead of '{0,}'; it means the same thing and is the standard way of expressing "zero or more". '+' is the standard way of expressing "one or more".

- If you want to make sure that you only work on the beginning of the string then you need to anchor your regular expression with '^'. Check in the 're' docs how the option 'multiline' changes the meaning of '^' and '$'. I think the default value is probably what you need.
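
The first three points can be checked in a few lines. This is a minimal sketch, not part of the shell session in this thread; the module name escape_demo and its function names are made up for illustration.

```erlang
-module(escape_demo).
-export([same_literal/0, literal_match/0, regex_match/0, whole_match/0]).

%% "\w" is not a recognised escape in an Erlang string, so the
%% backslash is dropped and both literals are the two bytes " and w.
same_literal() ->
    <<"\"\w">> =:= <<"\"w">>.

%% binary:replace/3 treats the pattern as literal bytes, so it only
%% matches where the subject really contains a backslash followed by w.
literal_match() ->
    binary:replace(<<"a\\wb">>, <<"\\w">>, <<"X">>).

%% re:replace/4 compiles the same bytes as a regular expression, where
%% \w means "one word character". Without the global option only the
%% first match (the leading a) is replaced.
regex_match() ->
    re:replace(<<"a\\wb">>, <<"\\w">>, <<"X">>, [{return, binary}]).

%% '&' in the replacement stands for the whole matched text.
whole_match() ->
    re:replace(<<"abc">>, <<"b">>, <<"[&]">>, [{return, binary}]).
```

The same pattern bytes give different results in literal_match/0 and regex_match/0, which is the difference between the binary and re modules described above.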

So putting all this together this is what I found running R14B02:

Code:
A = <<"\"The quick\"">>.

40> binary:replace(A, <<"\"">>, <<"``">>).               
<<"``The quick\"">>

43> re:replace(A, <<"\"\w">>, <<"``\w">>, [{return,binary}]).
<<"\"The quick\"">>
44> re:replace(A, <<"\"\\w">>, <<"``\w">>, [{return,binary}]).
<<"``whe quick\"">>
45> re:replace(A, <<"\"(\\w)">>, <<"``\\1">>, [{return,binary}]).
<<"``The quick\"">>
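
To convert every opening quote in a longer text, the same pattern can be combined with the global option. A minimal sketch, assuming that "opening quote" means a double quote directly followed by a word character; the module name tex_quotes and the function open_quotes/1 are hypothetical.

```erlang
-module(tex_quotes).
-export([open_quotes/1]).

%% Replace every double quote that is immediately followed by a word
%% character with the LaTeX opening quote ``, keeping that character
%% through the \1 back-reference. Closing quotes are left untouched.
open_quotes(Bin) when is_binary(Bin) ->
    re:replace(Bin, <<"\"(\\w)">>, <<"``\\1">>, [global, {return, binary}]).
```

Note that a quote followed by punctuation or whitespace is not treated as an opening quote under this assumption.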

I did not have the same problems you did when trying to remove newlines at the beginning.
Code:
B = <<"\n\n\n\nThe quick">>.
C = <<"\n\n\n\nThe quick\n\n">>.

55> binary:replace(B, <<"\n">>, <<>>, [global]).
<<"The quick">>
56> binary:replace(C, <<"\n">>, <<>>, [global]).
<<"The quick">>
57> re:replace(B, <<"\n*">>, <<>>, [{return,binary},global]).   
<<"The quick">>
58> re:replace(C, <<"\n*">>, <<>>, [{return,binary},global]).
<<"The quick">>
59> re:replace(B, <<"^\n*">>, <<>>, [{return,binary},global]).
<<"The quick">>
60> re:replace(C, <<"^\n*">>, <<>>, [{return,binary},global]).
<<"The quick\n\n">>
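
Wrapped up as a function, stripping only the leading newlines could look like the sketch below. The module name strip_demo and the function name are made up for illustration, and '+' is used rather than '*' so that the pattern never matches an empty string.

```erlang
-module(strip_demo).
-export([strip_leading_newlines/1]).

%% "^\n+" is anchored at the start of the subject, so only the leading
%% run of newlines is removed. The global option is not needed because
%% there can be at most one such run.
strip_leading_newlines(Bin) when is_binary(Bin) ->
    re:replace(Bin, <<"^\n+">>, <<>>, [{return, binary}]).
```

The multiline option is deliberately left out here; with it, '^' would also match after newlines inside the text.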

I hope this helps a little.

_________________
Robert Virding, Erlang Solutions Ltd.