Quantifiers allow you to specify the number of occurrences to match against.
Greedy | Reluctant | Possessive | Meaning |
---|---|---|---|
X? | X?? | X?+ | X, once or not at all |
X* | X*? | X*+ | X, zero or more times |
X+ | X+? | X++ | X, one or more times |
X{n} | X{n}? | X{n}+ | X, exactly n times |
X{n,} | X{n,}? | X{n,}+ | X, at least n times |
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times |
Here X can be a single character, a character class, or a group; groups are covered in the next chapter.
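To make the three kinds of X concrete, here is a small sketch using `java.util.regex.Pattern.matches`, which succeeds only if the whole input matches. The class name `QuantifiedUnits` and the sample strings are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical example class: X as a character, a character class, and a group.
class QuantifiedUnits {
    public static void main(String[] args) {
        // X is a single character: "a" repeated 2 to 3 times
        System.out.println(Pattern.matches("a{2,3}", "aaa"));   // true
        System.out.println(Pattern.matches("a{2,3}", "aaaa"));  // false: at most 3
        // X is a character class: one or more digits
        System.out.println(Pattern.matches("[0-9]+", "2024"));  // true
        // X is a group: "ab" exactly twice
        System.out.println(Pattern.matches("(ab){2}", "abab")); // true
        // Without the group, {2} applies only to "b"
        System.out.println(Pattern.matches("ab{2}", "abab"));   // false
    }
}
```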
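Quantifiers fall into three categories: greedy, reluctant, and possessive. Note that X?, X??, and X?+ all mean that X occurs once or not at all, yet they differ subtly in how the matcher applies them. Consider the following example: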
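The explanation below refers to the input string `xfooxxxxxxfoo`. A minimal Java sketch that runs `.*foo`, `.*?foo` and `.*+foo` against it with `Matcher.find()` might look like this (the class `QuantifierDemo` and the helper `matches` are hypothetical names chosen for this example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three quantifier flavors.
class QuantifierDemo {
    // Returns every match find() reports, as "text@start-end" (end exclusive).
    static List<String> matches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        for (String regex : new String[] {".*foo", ".*?foo", ".*+foo"}) {
            System.out.println(regex + " -> " + matches(regex, input));
        }
    }
}
```

Running it should print `.*foo -> [xfooxxxxxxfoo@0-13]`, `.*?foo -> [xfoo@0-4, xxxxxxfoo@4-13]` and `.*+foo -> []`: one long greedy match, two short reluctant matches, and no possessive match at all.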
Although the three regular expressions appear to say the same thing, they produce different matches against the same string. Here is an explanation I found on Stack Overflow:
A greedy quantifier first matches as much as possible. So the .* matches the entire string. Then the matcher tries to match the f following, but there are no characters left. So it "backtracks", making the greedy quantifier match one less thing (leaving the "o" at the end of the string unmatched). That still doesn't match the f in the regex, so it "backtracks" one more step, making the greedy quantifier match one less thing again (leaving the "oo" at the end of the string unmatched). That still doesn't match the f in the regex, so it backtracks one more step (leaving the "foo" at the end of the string unmatched). Now, the matcher finally matches the f in the regex, and the o and the next o are matched too. Success!
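The greedy backtracking steps described above can be mimicked by hand. The sketch below is not how `java.util.regex` is implemented; it is a hypothetical helper, `greedySplit`, that assumes `.` matches every character and only considers a match starting at index 0, recording how many characters `.*` holds at each attempt:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of greedy ".*" followed by a literal suffix.
class GreedyTrace {
    // Simplification: "." matches every character; only start index 0 is tried.
    // Records each length given to ".*" in `tried`; returns the winning length, or -1.
    static int greedySplit(String input, String suffix, List<Integer> tried) {
        // .* first eats the whole input, then gives back one character per step.
        for (int k = input.length(); k >= 0; k--) {
            tried.add(k);
            if (input.startsWith(suffix, k)) {
                return k; // the literal matches right after the k characters held by .*
            }
        }
        return -1; // backed off to 0 characters without finding the suffix
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        String suffix = "foo";
        List<Integer> tried = new ArrayList<>();
        int k = greedySplit(input, suffix, tried);
        System.out.println("lengths tried for .*: " + tried);
        if (k >= 0) {
            System.out.println(".* = \"" + input.substring(0, k) + "\", "
                    + suffix + " = \"" + input.substring(k, k + suffix.length()) + "\"");
        }
    }
}
```

For `xfooxxxxxxfoo` the recorded lengths are 13, 12, 11 and 10: the whole string, then giving back `o`, `oo` and finally `foo`, which are the backtracking steps described in the answer.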
A reluctant or "non-greedy" quantifier first matches as little as possible. So the .* matches nothing at first, leaving the entire string unmatched. Then the matcher tries to match the f following, but the unmatched portion of the string starts with "x" so that doesn't work. So the matcher backtracks, making the non-greedy quantifier match one more thing (now it matches the "x", leaving "fooxxxxxxfoo" unmatched). Then it tries to match the f, which succeeds, and the o and the next o in the regex match too. Success!
In your example, it then starts the process over with the remaining unmatched portion of the string, following the same process.
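The reluctant process, including starting over on the remaining input, can be sketched the same way. `reluctantMatches` is again a hypothetical simplification (it assumes `.` matches every character and that the suffix is non-empty) meant to reproduce what successive `find()` calls return for `.*?foo`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of repeated find() calls with ".*?" + a literal suffix.
class ReluctantTrace {
    // Simplification: "." matches every character; suffix must be non-empty.
    static List<String> reluctantMatches(String input, String suffix) {
        if (suffix.isEmpty()) {
            throw new IllegalArgumentException("suffix must be non-empty");
        }
        List<String> matches = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            // .*? first takes 0 characters, then one more per backtracking step.
            int k = start;
            while (k <= input.length() && !input.startsWith(suffix, k)) {
                k++;
            }
            if (k > input.length()) {
                break; // .*? grew over the whole remaining input without success
            }
            int end = k + suffix.length();
            matches.add(input.substring(start, end));
            start = end; // start over on the remaining unmatched portion
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(reluctantMatches("xfooxxxxxxfoo", "foo"));
    }
}
```

It yields `xfoo` and then `xxxxxxfoo`, the same two matches `Matcher.find()` reports for `.*?foo`.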
A possessive quantifier is just like the greedy quantifier, but it doesn't backtrack. So it starts out with .* matching the entire string, leaving nothing unmatched. Then there is nothing left for it to match with the f in the regex. Since the possessive quantifier doesn't backtrack, the match fails there.
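The failure comes from `.*+` swallowing the trailing `foo` as well. When the possessive part cannot consume what the rest of the pattern needs, it matches without trouble. A small sketch comparing `.*+foo` with `x*+foo` on the same input (the class `PossessiveDemo` and the helper `findAll` are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for possessive quantifiers.
class PossessiveDemo {
    // Collects every match that Matcher.find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // .*+ also eats "foo" and never gives it back, so the literal always fails.
        System.out.println(findAll(".*+foo", input));
        // x*+ can only eat runs of 'x', so it never needs to give anything back.
        System.out.println(findAll("x*+foo", input));
    }
}
```

`.*+foo` finds nothing, while `x*+foo` yields `[xfoo, xxxxxxfoo]`.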
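Below is how the official documentation describes greedy, reluctant, and possessive quantifiers. Read alongside the answer above, it makes the differences between them clearer: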
Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.
(Greedy: it always tries the longest match first.)
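The remark that the last attempt is against 1 or 0 characters can be checked directly: `.*` may back off to zero characters, while `.+` must keep at least one. A minimal sketch (the class `BackOffLimit` and the helper `found` are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo: how far a greedy quantifier can back off depends on its minimum.
class BackOffLimit {
    // Reports whether regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // .* can back off all the way to 0 characters, leaving "foo" for the literal.
        System.out.println(found(".*foo", "foo")); // true
        // .+ must keep at least 1 character, so "foo" is never fully left over.
        System.out.println(found(".+foo", "foo")); // false
    }
}
```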
The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.
(Reluctant: it always tries the shortest match first.)
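A common practical case where reluctant matching matters is delimited text such as tags. This sketch (the class `TagMatching`, the helper `findAll`, and the input `<a><b>` are illustrative assumptions) contrasts `<.+>` with `<.+?>`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: greedy vs. reluctant matching of tag-like text.
class TagMatching {
    // Collects every match that Matcher.find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<a><b>";
        System.out.println(findAll("<.+>", html));  // greedy: [<a><b>]
        System.out.println(findAll("<.+?>", html)); // reluctant: [<a>, <b>]
    }
}
```

The greedy version spans from the first `<` to the last `>`, whereas the reluctant one stops at the first `>` it reaches, giving `<a>` and `<b>`.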
Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.
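(Possessive: as greedy as the greedy form in wanting the longest match, but it also never gives back the characters it has already matched.)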