
glangdale can probably say more, but I think the high-level answer is pretty easy: re2c is intensely focused on building state machines, in their entirety, ahead of time. There's also some cool stuff that lets you integrate the state machine with the rest of your code. AFAIK, re2c sticks fairly strictly to state machines and doesn't do the kind of sophisticated optimizations that Hyperscan does, particularly with respect to literals. To me, they are fundamentally solving different problems. re2c is only viable if you can actually deal with the full size of the DFAs you build.

re2c is also notable for, AIUI, having a very principled solution to the problem of submatch extraction using tagged DFAs. They wrote a paper about it: http://re2c.org/2017_trofimovich_tagged_deterministic_finite...



For what it's worth at this late stage:

Yes, this is an accurate summary. re2c works when it works, and there's clearly a good niche for "pattern matching stuff that determinizes cleanly". However, in the general case, DFAs catch fire (combinatorial explosion in # of states), especially when handling multiple regular expressions.
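To make the "catch fire" point concrete, here is a rough sketch of the textbook blow-up case, (a|b)*a(a|b){n}, run through a plain subset construction. This is not how re2c or Hyperscan are implemented; the NFA is hand-built and the names nfa_step and count_dfa_states are made up for illustration. It only counts the DFA states reachable from the start state, and it uses a single pattern rather than the multi-pattern case mentioned above.

```python
from collections import deque


def nfa_step(state, ch, n):
    """Transitions of a hand-built NFA for (a|b)*a(a|b){n} (illustrative only).

    State 0 loops on 'a' and 'b' and may also guess that an 'a' is the
    marked one. States 1..n each consume one more symbol. State n + 1 is
    accepting and has no outgoing edges.
    """
    if state == 0:
        return {0, 1} if ch == "a" else {0}
    if state <= n:
        return {state + 1}
    return set()


def count_dfa_states(n):
    """Count DFA states reachable via subset construction from NFA state {0}."""
    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ch in "ab":
            # The DFA successor is the union of the NFA successors.
            nxt = frozenset(t for s in current for t in nfa_step(s, ch, n))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in (0, 1, 3, 9):
        print(n, count_dfa_states(n))
```

The NFA has n + 2 states, but the DFA has to remember which of the last n + 1 symbols were 'a', so the count doubles every time n goes up by one.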

IIRC SpamAssassin has lots of quite hard patterns and re2c can only handle a subset. I forget what the fallback position is (libpcre?).



