I'm torn between being happy that Lucene is getting a giant speed boost and my c...

rogerbraun · on April 27, 2011

It seems they did some comprehensive testing to make sure there are no bugs. What else should they do?

lscharen · on April 27, 2011

Understand their code.

rogerbraun · on April 27, 2011

They understood it enough to:

  - implement it, although it is extremely complicated
  - test it
  - fix a bug in the algorithm
  - use it to make part of their software 100x faster

lscharen · on April 28, 2011

I think I came away with almost the opposite impression from reading the article.

From my perspective, they did not implement it. Mark Miller and Robert Muir were able to implement the algorithm for the N=1 case, but were stuck until they found the existing Moman code. They did not implement their own code using Moman as a reference implementation, but just used Moman's code to generate the required tables.

From the article, "Not really understanding the Python code, and also neither the paper, we desperately tried to write our own Python code to tap into the various functions embedded in Moman's code". This sounds to me like they did not have a good understanding of the algorithms they were trying to implement.

They did do a fair bit of testing and did uncover a bug in the Moman code base but, again, they did not fix this bug themselves. They appealed to Jean-Philippe, who then quickly fixed his code -- in effect, they were relying on a third party.

And, yes, they did apply the end result to make fuzzy searching a lot faster, which is a good and practical end. It took a lot of effort on the Lucene team's part to get this feature implemented, but that does not mean that anyone has a good understanding of the end result.

In short, I don't get the impression that anyone on the Lucene team could give a one-hour talk on implementing the Klaus Schulz and Stoyan Mihov paper to a formal language and automata audience.

_r5wf · on April 27, 2011

The way they did it is exactly what you feared. AFAIK, they took some complex Python code and converted it to Java using a converter tool.

andrewcooke · on April 27, 2011

i'm amazed by the text in that post. in general i appreciate people admitting when they don't understand something, but the tone there goes beyond relaxed to, well, a celebration of ignorance (greek letters! oh noes!). is it a joke?

vmind · on April 27, 2011

It's also confusing that they don't mention whether they even contacted the authors of the paper to see if an implementation (even a partial one) existed, or whether the authors could clarify implementation details.

bluelu · on April 27, 2011

"Realize, now, what a crazy position we were in."