[ANN] Google's RE2 regular expression library
От: remark Россия http://www.1024cores.net/
Дата: 12.03.10 18:19
Оценка: 36 (8) +1

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Backtracking engines are typically full of features and convenient syntactic sugar but can be forced into taking exponential amounts of time on even small inputs. RE2 uses automata theory to guarantee that regular expression searches run in time linear in the size of the input. RE2 implements memory limits, so that searches can be constrained to a fixed amount of memory. RE2 is engineered to use a small fixed C++ stack footprint no matter what inputs or regular expressions it must process; thus RE2 is useful in multithreaded environments where thread stacks cannot grow arbitrarily large.

On large inputs, RE2 is often much faster than backtracking engines; its use of automata theory lets it apply optimizations that the others cannot.

Unlike most automata-based engines, RE2 implements almost all the common Perl and PCRE features and syntactic sugars. It also finds the leftmost-first match, the same match that Perl would, and can return submatch information. The one significant exception is that RE2 drops support for backreferences and generalized zero-width assertions, because they cannot be implemented efficiently. The syntax page gives full details.

For those who want a simpler syntax, RE2 has a POSIX mode that accepts only the POSIX egrep operators and implements leftmost-longest overall matching.


http://code.google.com/p/re2/
http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html

If you absolutely need backreferences and generalized assertions, then RE2 is not for you, but you might be interested in irregexp, Google Chrome's regular expression engine.



1024cores — all about multithreading, multicore, concurrency, parallelism, lock-free algorithms
 
Подождите ...
Wait...
Пока на собственное сообщение не было ответов, его можно удалить.