| Author |
Message |
|
| rvirding |
Posted: Sun Jun 03, 2007 10:03 pm |
|
|
|
User
Joined: 30 Aug 2006
Posts: 452
Location: Stockholm, Sweden
|
This is a new implementation of regular expressions that is mostly compatible with regexp.erl, with two major improvements:
1. It now works directly on binaries: all the functions accept a binary as input, though not as the regexp itself.
2. There are two new functions that extract and return sub-expressions, smatch/2 and first_smatch/2. These are similar to match/2 and first_match/2, but they also return the sub-expressions. For example:
2> re:smatch("-axxxb--", "a((x+)|(y+))b").
{match,2,5,"axxxb",{{3,3,"xxx"},{3,3,"xxx"},undefined}}
A sub-expression is 'undefined' if it did not take part in the match.
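Judging from the example above, each position is a 1-based start followed by a length, so the returned strings can be cross-checked against the input with lists:sublist/3. A minimal sketch, assuming that reading of the result; the module name submatch_demo and the helper slice/3 are hypothetical and not part of re.erl:

```erlang
-module(submatch_demo).
-export([slice/3, check/0]).

%% Extract Len characters from Str, starting at 1-based position Start.
slice(Str, Start, Len) ->
    lists:sublist(Str, Start, Len).

%% Cross-check the positions from the smatch/2 example against the input.
check() ->
    Str = "-axxxb--",
    "axxxb" = slice(Str, 2, 5),   % whole match: {match,2,5,"axxxb",...}
    "xxx" = slice(Str, 3, 3),     % sub-expressions: {3,3,"xxx"}
    ok.
```

Calling submatch_demo:check() returns ok when the slices agree with the strings shown in the example.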
It supports POSIX regexps, as the old one did, and it now also has POSIX character classes, though only for Latin-1. So we can write "[[:digit:]]" or "[[:alnum:]]". The functions are the same as before.
The regexp engine should never explode, irrespective of the regexp, which many engines do, and it is about as fast as the old one, depending on the regexp.
I would like some feedback on the speed and the interface.
N.B. It is not really possible to have both POSIX and Perl regexps in the same module because, apart from the difference in features, they have different semantics. If all goes well, a Perl module might follow. |
| Description: |
| A new regular expression module. (3) |
|
 Download |
| Filename: |
re.erl |
| Filesize: |
44.19 KB |
| Downloaded: |
1619 Time(s) |
| Description: |
| A new regular expression module. (2) |
|
 Download |
| Filename: |
re.erl |
| Filesize: |
43.97 KB |
| Downloaded: |
1658 Time(s) |
|
|
|
| Mazen |
Posted: Wed Jun 06, 2007 8:01 am |
|
|
|
User
Joined: 20 Jul 2006
Posts: 164
Location: London
|
Seems to work
Tested it with a few expressions, nothing too fancy. Maybe someday I will have time to write a good test, but that's when I have time.
Good job!  |
|
|
|
|
| nem |
Posted: Mon Jan 14, 2008 5:00 am |
|
|
|
User
Joined: 29 Nov 2007
Posts: 25
|
I've found a small bug where the start position on a match group from first_smatch is negative.
This patch adds the fix_subs call present in smatch/2 but not first_smatch/2.
Code:
--- ../racer/src/re.erl 2007-09-25 10:53:53.000000000 +1200
+++ src/re.erl 2008-01-14 17:49:31.000000000 +1300
@@ -658,13 +658,13 @@
 first_smatch_str(Cs, P, Nfa) ->
     case next_smatch_str(Cs, P, Nfa) of
-        {match,St,Len,_,Subs,_} -> {match,St,Len,Subs};
+        {match,St,Len,_,Subs,_} -> {match,St,Len,fix_subs_str(Subs, St, Cs)};
         nomatch -> nomatch
     end.
 first_smatch_bin(Bin, P, Nfa) ->
     case next_smatch_bin(Bin, P, Nfa) of
-        {match,St,Len,Subs} -> {match,St,Len,Subs};
+        {match,St,Len,Subs} -> {match,St,Len,fix_subs_bin(Subs, St)};
         nomatch -> nomatch
     end.
|
| Description: |
|
 Download |
| Filename: |
re.erl.diff.txt |
| Filesize: |
598 Bytes |
| Downloaded: |
1317 Time(s) |
|
|
|
| daniello |
Posted: Wed Feb 13, 2008 11:59 am |
|
|
|
User
Joined: 03 Apr 2007
Posts: 15
|
Eshell V5.6 (abort with ^G)
1> re:match("user@host.com","[a-zA-Z_0-9]{1,}[@][a-zA-Z_0-9-]{1,}([.]([a-zA-Z_0-9-]{1,}))$").
{match,1,13}
2> re:match("user@host.com.pl","[a-zA-Z_0-9]{1,}[@][a-zA-Z_0-9-]{1,}([.]([a-zA-Z_0-9-]{1,}))$").
nomatch
3> re:match("user@host.com.pl","[a-zA-Z_0-9]{1,}[@][a-zA-Z_0-9-]{1,}([.]([a-zA-Z_0-9-]{1,})){1,3}$").
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
The third call did not return, and the CPU stayed at 100% until I interrupted it.
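A general precaution when trying patterns that may not terminate is to run the call in a separate process and kill it after a timeout. A minimal sketch follows; the module name guarded_call and the function run/2 are hypothetical and not part of re.erl, and it assumes the code being run is plain Erlang so the process can be killed at any time:

```erlang
-module(guarded_call).
-export([run/2]).

%% Run Fun in a separate, monitored process.
%% Returns {ok, Result}, {error, Reason} if Fun crashes,
%% or timeout if Fun has not finished after TimeoutMs milliseconds.
run(Fun, TimeoutMs) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% Drop the monitor and any pending 'DOWN' message.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    after TimeoutMs ->
        %% The worker is still busy: kill it and clean up the monitor.
        exit(Pid, kill),
        erlang:demonitor(MRef, [flush]),
        timeout
    end.
```

With this in place, something like guarded_call:run(fun() -> re:match(Str, RegExp) end, 2000) should come back with timeout instead of leaving the shell stuck.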
--
Regards,
Daniel |
|
|
|
|
| nem |
Posted: Mon May 19, 2008 3:40 am |
|
|
|
User
Joined: 29 Nov 2007
Posts: 25
|
Hi all, just found a little bug in first_smatch for binaries. New version of re.erl attached.
I should really put this up on github and add eunit tests. (And bug the OTP people about accepting it into the standard distribution). |
| Description: |
| first_smatch patch for binaries too |
|
 Download |
| Filename: |
re.erl |
| Filesize: |
44.23 KB |
| Downloaded: |
1215 Time(s) |
|
|
|
| Mazen |
Posted: Thu Jun 12, 2008 9:14 am |
|
|
|
User
Joined: 20 Jul 2006
Posts: 164
Location: London
|
Please note that as of R12B-3 there is a module named "re" that ships with the distribution, i.e. beware of name clashes.
http://www.erlang.org/doc/man/re.html
http://www.erlang.org/download/otp_src_R12B-3.readme
Quote:
OTP-7181 An experimental module "re" is added to the emulator which
interfaces a publicly available regular expression library
for Perl-like regular expressions (PCRE). The interface is
purely experimental and *will* be subject to change.
The implementation is for reference and testing in connection
to the relevant EEP.
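One way to see which "re" the runtime would pick up is to ask the code server with code:which/1. A minimal sketch; the module name re_clash_check and the function which_re/0 are hypothetical:

```erlang
-module(re_clash_check).
-export([which_re/0]).

%% Report where the module named re would be loaded from.
%% code:which/1 returns the atom non_existing when no such module is found.
which_re() ->
    case code:which(re) of
        non_existing -> {error, not_found};
        Where -> {ok, Where}
    end.
```

code:clash/0 can also be called from the shell; it searches the code path for modules with identical names and prints a report.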
|
|
|
|
|
| rvirding |
Posted: Thu Jun 12, 2008 7:55 pm |
|
|
|
User
Joined: 30 Aug 2006
Posts: 452
Location: Stockholm, Sweden
|
Yes, I know! I will have to start a name war, or change the name of my package.  |
|
|
|
|
|
|
All times are GMT
|
|
|
|
|