regex-performance | Performance comparison of regular expression engines | Regex library
kandi X-RAY | regex-performance Summary
kandi X-RAY | regex-performance Summary
Regular expressions are commonly used in pattern search algorithms. This tool is based on the work of John Maddock (See his own regex comparison here) and the sljit project (See their regex comparison here).
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of regex-performance
regex-performance Key Features
regex-performance Examples and Code Snippets
Community Discussions
Trending Discussions on regex-performance
QUESTION
This Github repository added std::regex to the list of regular expression engines and got decimated by the others.
Why is that std::regex - as implemented in libstdc++ - so much slower than others? Is that because of the C++ standard requirements or it is just that that particular implementation is not very well optimized?
Also in the shootout std::regex was unable to compile several regular expressions that all the others accepted, even after adding the flag std::regex::extended. They were (?i)Twain
, \b\w+nn\b
, (?i)Tom|Sawyer|Huckleberry|Finn
, \s[a-zA-Z]{0,12}ing\s
, ([A-Za-z]awyer|[A-Za-z]inn)\s
and \p{Sm}
.
UPDATE: Added comparison with boost::regex.
UPDATE2: added ctre
...ANSWER
Answered 2022-Jan-05 at 10:38Is that because of the C++ standard requirements or it is just that that particular implementation is not very well optimized?
The answer is yes. Kinda.
There is no question that libstdc++'s implementation of is not well optimized. But there is more to it than that. It's not that the standard requirements inhibit optimizations so much as the standard requirements inhibit changes.
The regex library is defined through a bunch of templates. This allows people to choose between char
and wchar_t
, which is in theory good. But there's a catch.
Template libraries are used by copy-and-pasting the code directly into the code compiling against those libraries. Because of how templates get included, even types that nobody outside of the template library knows about are effectively part of the library's ABI. If you change them, two libraries compiled against different versions of the standard library cannot work with each other. And because the template parameter for regex is its character type, those implementation details touch basically everything about the implementation.
The minute libstdc++ (and other standard library implementations) started shipping an implementation of C++ regular expressions, they bound themselves to a specific implementation that could not be changed in a way that impacted the ABI of the library. And while they could cause another ABI break to fix it, standard library implementers don't like breaking ABI because people don't upgrade to standard libraries that break their code.
When C++11 forbade basic_string
copy-on-write implementations, libstdc++ had an ABI problem. Their COW string was widely used, and changing it would make code that compiled against the new one break when used with code compiled against the old one. It took years before libstdc++ bit the bullet and actually implemented C++11 strings.
If Regex had been defined without templates, implementations could use traditional mechanisms to hide implementation details. The ABI for the interface to external code could be fixed and unchanging, with only the implementation of the functions behind that ABI changing from version to version.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install regex-performance
Support
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page