JavaScript: an overview of the regular expression API
Regular expression syntax
Listed below are constructs that are hard to remember (not listed are things like * for repetition, capturing groups, etc.).
- Escaping: the backslash escapes special characters, including the slash in regular expression literals (see below) and the backslash itself.
  - If you specify a regular expression in a string you must escape twice: once for the string literal, once for the regular expression. For example, to just match a backslash, the string literal becomes "\\\\".
  - The backslash is also used for some special matching operators (see below).
- Non-capturing group: (?:x) works like a capturing group for delineating the subexpression x, but does not return matches and thus does not have a group number.
- Positive look-ahead: x(?=y) means that x matches only if it is followed by y. y itself is not consumed and is not part of the match.
- Negative look-ahead: x(?!y) is the negated version of the previous construct: x must not be followed by y.
- Repetitions: {n} matches exactly n times, {n,} matches at least n times, {n,m} matches at least n, at most m times.
- Control characters: \cX matches Ctrl-X (for any control character X), \n matches a linefeed, \r matches a carriage return.
- Back reference: \n (where n is a group number, as in \1) refers back to group n and matches its contents again.
    > /(a+)b\1/.test("aaba")
    true
    > /^(a+)b\1/.test("aaba")
    false
    > var tagName = /<([^>]+)>[^<]*<\/\1>/;
    > tagName.exec("<b>bold</b>")[1]
    'b'
    > tagName.exec("<strong>text</strong>")[1]
    'strong'
    > tagName.exec("<strong>text</stron>")
    null
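A few of the other constructs from the list can be tried out in the same way. The following is only a minimal sketch; the sample strings and variable names are made up for illustration and are not part of the original list.

```javascript
// Escaping twice: the string literal "\\\\" contains two backslash characters,
// which the regular expression reads as one escaped backslash.
var backslash = new RegExp("\\\\");
var hasBackslash = backslash.test("C:\\temp"); // the input holds a single backslash

// Non-capturing group: (?:ab)+ groups without getting a group number,
// so group 1 is the run of two to three digits.
var nonCapturing = /(?:ab)+(\d{2,3})/.exec("ababab1234");

// Positive look-ahead: digits only if "px" follows; "px" is not part of the match.
var lookAhead = /\d+(?=px)/.exec("width: 100px");

// Negative look-ahead: "foo" only if "bar" does not follow.
var negativeLookAhead = /foo(?!bar)/.exec("foobar foobaz");

console.log(hasBackslash, nonCapturing, lookAhead[0], negativeLookAhead.index);
```

Note that the greedy quantifier {2,3} takes three digits here, so the fourth digit of the input is left out of the match.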
Creating a regular expression
There are two ways to create a regular expression.
- Regular expression literal: var regex = /xyz/; (compiled at load time)
- Regular expression object: var regex = new RegExp("xyz"); (compiled at runtime)
Flags modify matching behavior.
- g (global): the given regular expression is matched multiple times.
- i (ignoreCase): case is ignored when trying to match the given regular expression.
- m (multiline): in multiline mode, the begin and end operators ^ and $ work for each line, instead of for the complete input string.
Examples:
    > /abc/.test("ABC")
    false
    > /abc/i.test("ABC")
    true
Regular expressions have the following properties.
- Flags: boolean values indicating what flags are set.
  - global: is flag g set?
  - ignoreCase: is flag i set?
  - multiline: is flag m set?
- If flag g is set:
  - lastIndex: the index where to continue matching next time.
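The two ways of creating a regular expression, the flags, and the properties can be checked against each other. This is a small sketch under the assumption of an ES5-compatible engine; the pattern and the variable names are chosen only for illustration.

```javascript
// Both forms describe the same pattern. Flags go after the closing slash
// of a literal, or into the second argument of the constructor.
var literal = /^abc$/gim;
var fromString = new RegExp("^abc$", "gim");

var flagsAgree = literal.global === fromString.global &&
    literal.ignoreCase === fromString.ignoreCase &&
    literal.multiline === fromString.multiline;

// Multiline mode: ^ and $ work for each line, so the second line is found.
var multilineHit = /^abc$/m.test("xyz\nabc");
var singleLineHit = /^abc$/.test("xyz\nabc");

// lastIndex moves behind the match when flag g is set.
var globalRegex = /b/g;
globalRegex.test("abc");
var resumeAt = globalRegex.lastIndex;

console.log(flagsAgree, multilineHit, singleLineHit, resumeAt);
```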
RegExp.prototype.test(): determining whether there is a match
The following method returns a boolean indicating whether the match succeeded.
    regex.test(str)
Examples:
    > var regex = /^(a+)b\1$/;
    > regex.test("aabaa")
    true
    > regex.test("aaba")
    false
If the flag g is set then test() returns true as often as there are matches in the string: each call continues at lastIndex, and once no match is left it returns false and resets lastIndex to 0.
    > var regex = /b/g;
    > var str = 'abba';
    > regex.test(str)
    true
    > regex.test(str)
    true
    > regex.test(str)
    false
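Because a regular expression with flag g remembers lastIndex between calls, reusing it for a different string can give a surprising result. The sketch below illustrates this; testFromStart is a hypothetical helper introduced here, not a built-in method.

```javascript
var digits = /\d/g;

var first = digits.test("a1");  // true, lastIndex is now 2
var second = digits.test("1a"); // false: matching starts at index 2, the end of the string

// Hypothetical helper: always test from the beginning of the string.
function testFromStart(regex, str) {
    regex.lastIndex = 0;
    return regex.test(str);
}

var third = testFromStart(digits, "1a");

console.log(first, second, third);
```

An alternative is to leave out flag g whenever only a yes/no answer is needed, since lastIndex is then not used.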
String.prototype.search(): finding the index of a match
The following method returns the index where a match was found and -1 otherwise.
    str.search(regex)
search() completely ignores the flag g. Examples:
    > 'abba'.search(/b/)
    1
    > 'abba'.search(/x/)
    -1
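That search() ignores flag g also means that it ignores lastIndex. A short sketch, with variable names chosen for illustration:

```javascript
var globalB = /b/g;
globalB.lastIndex = 3;

// The search starts at index 0, no matter what lastIndex says.
var position = "abba".search(globalB);

// lastIndex is the same after the call as before.
var lastIndexAfter = globalB.lastIndex;

console.log(position, lastIndexAfter);
```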
RegExp.prototype.exec(): capture groups, optionally repeatedly
    var matchData = regex.exec(str);
matchData is null if there wasn’t a match. Otherwise, it is an array with two additional properties.
- Properties:
  - input: The complete input string.
  - index: The index where the match was found.
- Array: whose length is the number of capturing groups plus one.
  - 0: The match for the complete regular expression (group 0, if you will).
  - n ≥ 1: The capture of group n.
    > var regex = /a(b+)a/;
    > regex.exec("_abbba_aba_")
    [ 'abbba', 'bbb', index: 1, input: '_abbba_aba_' ]
    > regex.lastIndex
    0
Invoke repeatedly: if flag g is set, each call to exec() continues at lastIndex and returns the next match.
    > var regex = /a(b+)a/g;
    > regex.exec("_abbba_aba_")
    [ 'abbba', 'bbb', index: 1, input: '_abbba_aba_' ]
    > regex.lastIndex
    6
    > regex.exec("_abbba_aba_")
    [ 'aba', 'b', index: 7, input: '_abbba_aba_' ]
    > regex.exec("_abbba_aba_")
    null
Loop over matches.
    var regex = /a(b+)a/g;
    var str = "_abbba_aba_";
    while (true) {
        var match = regex.exec(str);
        if (!match) break;
        console.log(match[1]);
    }
Output:
    bbb
    b
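The loop can be wrapped in a reusable function. The following is one possible sketch; collectMatches is a hypothetical helper, and the two guards (for a missing flag g and for empty matches) are additions that the loop above does not need for its particular pattern.

```javascript
// Hypothetical helper: collect group 1 and the index of every match.
function collectMatches(regex, str) {
    if (!regex.global) {
        // Without flag g, lastIndex never advances and the loop would not end.
        throw new Error("collectMatches needs a regular expression with flag g");
    }
    regex.lastIndex = 0; // start at the beginning, whatever happened before
    var results = [];
    var match;
    while ((match = regex.exec(str)) !== null) {
        results.push({ group: match[1], index: match.index });
        // An empty match does not advance lastIndex, so move on by hand.
        if (match[0] === "") regex.lastIndex++;
    }
    return results;
}

var found = collectMatches(/a(b+)a/g, "_abbba_aba_");
console.log(found);
```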
String.prototype.match(): capture groups or all matches
    var matchData = str.match(regex);
If the flag g of regex is not set, this method works like RegExp.prototype.exec(). If the flag is set then it returns an array with all matching substrings in str (i.e., group 0 of every match) or null if there is no match.
    > 'abba'.match(/a/)
    [ 'a', index: 0, input: 'abba' ]
    > 'abba'.match(/a/g)
    [ 'a', 'a' ]
    > 'abba'.match(/x/g)
    null
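The difference between the two modes is easiest to see with a pattern that has groups. A small sketch; the date string is made up for illustration.

```javascript
var dateStr = "2013-08-13 and 2013-08-12";

// Without flag g: like exec(), so groups and index are available.
var single = dateStr.match(/(\d{4})-(\d{2})-(\d{2})/);

// With flag g: only the complete matches, the groups are dropped.
var all = dateStr.match(/(\d{4})-(\d{2})-(\d{2})/g);

// match() returns null instead of an empty array, so a fallback can help
// when the result is used in a loop.
var none = dateStr.match(/x/g) || [];

console.log(single[1], single.index, all, none.length);
```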
String.prototype.replace(): search and replace
Invocation:
    str.replace(search, replacement)
Parameters:
- search:
  - either a string (to be found literally, has no groups)
  - or a regular expression.
- replacement:
  - either a string describing how to replace what has been found
  - or a function that computes a replacement, given matching information.
Replacement is a string. In it, the dollar sign $ introduces special replacement directives:
- $$ inserts a dollar sign $.
- $& inserts the complete match.
- $` inserts the text before the match.
- $' inserts the text after the match.
- $n inserts group n from the match. n must be at least 1, $0 has no special meaning.
    > "a1b_c1d".replace("1", "[$`-$&-$']")
    'a[a-1-b_c1d]b_c1d'
    > "a1b_c1d".replace(/1/, "[$`-$&-$']")
    'a[a-1-b_c1d]b_c1d'
    > "a1b_c1d".replace(/1/g, "[$`-$&-$']")
    'a[a-1-b_c1d]b_c[a1b_c-1-d]d'
Replacement is a function. The replacement function has the following signature.
    function(completeMatch, group_1, ..., group_n, offset, inputStr) { ... }
completeMatch is the same as $& above, offset indicates where the match was found, and inputStr is what is being matched against. Thus, the special variable arguments inside the function starts with the same data as the array returned by the exec() method: the complete match, followed by the groups.
Example:
    > "I bought 3 apples and 5 oranges".replace(
          /[0-9]+/g,
          function(match) { return 2 * match; })
    'I bought 6 apples and 10 oranges'
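Both kinds of replacement can also work with groups. The following sketch shows $n in a replacement string and the group parameters of a replacement function; the input strings and parameter names are chosen for illustration only.

```javascript
// Replacement as a string: $1 and $2 refer to the capturing groups, here swapped.
var swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1");

// "$$" yields one literal dollar sign, followed by group 1.
var price = "cost: 5".replace(/(\d+)/, "$$$1");

// Replacement as a function: the groups arrive as separate parameters,
// followed by the offset of the match and the complete input string.
var offsets = [];
var doubled = "x=1, y=22".replace(/(\w)=(\d+)/g,
    function (completeMatch, name, value, offset, inputStr) {
        offsets.push(offset);
        // Convert explicitly instead of relying on coercion in the multiplication.
        return name + "=" + (Number(value) * 2);
    });

console.log(swapped, price, doubled, offsets);
```

The return value of the function is converted to a string, which is why returning a number works in the example further above.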
String.prototype.split(): splitting strings
In a string, find the substrings between separators and return them in an array. Signature:
    str.split(separator, limit?)
Parameters:
- separator can be
  - a string: separators are matched verbatim
  - a regular expression: for more flexible separator matching. If the regular expression contains capturing groups, ECMAScript 5 specifies that their captures are included in the result array; older implementations may omit them.
- limit optionally specifies a maximum length for the returned array. A value less than 0 allows arbitrary lengths.
    > "aaa*a*".split("a*")
    [ 'aa', '', '' ]
    > "aaa*a*".split(/a*/)
    [ '', '*', '*' ]
    > "aaa*a*".split(/(a*)/)
    [ '', 'aaa', '*', 'a', '*' ]
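A more typical use of a regular expression separator is to tolerate whitespace around a delimiter. The sketch below also shows limit and a capturing group; it assumes an ES5-compliant engine, and the input strings are invented for illustration.

```javascript
// Separator: a comma with optional whitespace on either side.
var fields = "a , b,c ,d".split(/\s*,\s*/);

// limit cuts the result off after two elements.
var firstTwo = "a , b,c ,d".split(/\s*,\s*/, 2);

// A capturing group keeps the separators in the result.
var withSeparators = "1+2-3".split(/([+-])/);

console.log(fields, firstTwo, withSeparators);
```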
Sources
- ECMAScript Language Specification, 5th edition.
- Regular Expressions at the Mozilla Developer Network Doc Center