Matching Words
From Erlang Community
Edit history: Bfulgham (01:42, 4 September 2006); Ayrnieu (18:52, 24 September 2006: answer a bit differently, also removing ref to PCRE. The Cook Book really isn't the place to ask for language enhancements.)
Revision as of 18:52, 24 September 2006
Problem
You want to select words from a string.
Solution
Determine the defining features of a word for your specific application, then write a regular expression that models this idea.
%% Turn the {match, [{Start, Length}]} result of regexp:matches/2 into
%% the matched substrings, in order of appearance.
matches(H,{match,M}) -> matches(H,M,[]).
matches(_,[],Acc) -> lists:reverse(Acc);  % Acc was built back to front
matches(H,[{I,L}|T],Acc) ->
    matches(H,T,[lists:sublist(H,I,L)|Acc]).

words(String, Regexp) -> matches(String,regexp:matches(String, Regexp)).

1> Words_1 = "[^ ]+".          % as many non-space bytes as possible
"[^ ]+"
2> Words_2 = "[A-Za-z'-]+".    % as many letters, apostrophes, and hyphens
"[A-Za-z'-]+"
3> words("'alpha-beta gamma theta", Words_1).
["'alpha-beta","gamma","theta"]
4> words("'alpha-beta&or gamma theta", Words_2).
["'alpha-beta","or","gamma","theta"]
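The regexp module used in this recipe was removed from later Erlang/OTP releases in favour of the re module. As a rough sketch of the same idea on a newer system, the helper below relies on re:run/3 with the global option to collect every match. The module name words_re is a placeholder chosen for this example and is not part of the original recipe.

```erlang
-module(words_re).
-export([words/2]).

%% Return every non-overlapping match of Regexp in String, left to right.
%% With the global option and {capture, first, list}, re:run/3 yields
%% {match, [[Match1], [Match2], ...]}, so each inner list is unwrapped.
words(String, Regexp) ->
    case re:run(String, Regexp, [global, {capture, first, list}]) of
        {match, Matches} -> [M || [M] <- Matches];
        nomatch -> []
    end.
```

Called with the two patterns from the recipe, words_re:words("'alpha-beta gamma theta", "[^ ]+") and words_re:words("'alpha-beta&or gamma theta", "[A-Za-z'-]+") should give the same word lists as the shell session. Unlike the regexp-based version, an input with no matches returns an empty list here.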
Discussion
Erlang has no built-in definition of a word in a string. This is inconvenient, because you must define what "word" means yourself, but it is also the right behavior: the concept of a word differs between applications, locales, encodings, and input sources.
Natural languages pluralize nouns, attach possessive modifiers, join words with hyphens, and so forth, so a pattern that suits one application may split or merge words incorrectly in another. The regular expression you use must reflect the range of words you expect to encounter.
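To make the point concrete, the sketch below runs three candidate definitions of "word" over the same input. It assumes a system where the re module is available. The module name word_defs, the pattern labels, and the third pattern (letters joined by single apostrophes or hyphens) are illustrative choices for this example, not part of the original recipe.

```erlang
-module(word_defs).
-export([words/2, compare/1]).

%% All non-overlapping matches of Regexp in String, left to right.
words(String, Regexp) ->
    case re:run(String, Regexp, [global, {capture, first, list}]) of
        {match, Matches} -> [W || [W] <- Matches];
        nomatch -> []
    end.

%% Apply three candidate definitions of "word" to the same input and
%% return one {Label, Words} pair per definition.
compare(String) ->
    Patterns = [{non_space,    "[^ ]+"},
                {letters_only, "[A-Za-z]+"},
                {joined,       "[A-Za-z]+(?:['-][A-Za-z]+)*"}],
    [{Name, words(String, P)} || {Name, P} <- Patterns].
```

For the input "The dog's well-known tricks, again." the non_space pattern keeps the trailing punctuation ("tricks," and "again."), letters_only breaks "dog's" into "dog" and "s" and "well-known" into "well" and "known", and joined keeps "dog's" and "well-known" whole while dropping the comma and full stop. None of the three is correct in general; which one fits depends on the text the application expects.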


