I once spoke to someone in the know about this, and the main reason is that it's quite expensive for them to do any reasonable job with stemming, etc.
I know that sounds weird (but it's Google, they're omnipotent!), but it makes sense: it's worth their while to stem content they crawl and index off the web, because everybody could in theory access any given page. However, with email, the only person who'll ever benefit is the recipient.
It would at least make sense to deal with plurals. I can't tell you the number of times I've searched for something using an s (or not) at the end, and failed to find what I was looking for, only to remember later that I should try to add or remove an s from my search term.
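For what it's worth, the plural case can be handled at query time without touching the index at all. Here's a rough sketch of the idea in Python. To be clear, `plural_variants` and `search` are names I made up, the "add or drop a trailing s" rule is a crude English-only heuristic (it gets words like "glass" wrong), and none of this says anything about how Gmail actually searches:

```python
def plural_variants(term):
    """Return the term plus a naive singular/plural variant (English-only heuristic)."""
    t = term.lower()
    variants = {t}
    if t.endswith("s") and len(t) > 3:
        variants.add(t[:-1])   # "invoices" -> also try "invoice"
    else:
        variants.add(t + "s")  # "invoice" -> also try "invoices"
    return variants


def search(messages, term):
    """Return indices of messages containing the term or its plural variant."""
    wanted = plural_variants(term)
    hits = []
    for i, text in enumerate(messages):
        # Split on whitespace and strip surrounding punctuation from each word.
        words = {w.strip(".,!?;:()\"'").lower() for w in text.split()}
        if words & wanted:
            hits.append(i)
    return hits


# Hypothetical mailbox, just for illustration.
messages = [
    "Your invoices are attached.",
    "Invoice #12 is overdue.",
    "Lunch on Friday?",
]

print(search(messages, "invoice"))
print(search(messages, "invoices"))
```

Both queries match the first two messages, which is exactly the "add or remove an s" dance done automatically. The cost is one extra lookup per query term rather than any extra work per stored message.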
Exactly. I have always wondered why there is no Google premium. I would gladly pay $100/year to have a collection of certain domains easily removed from my Google searches, or to pick from a list of my favorite sites for a "site:" search, etc.
It seems like they've worked on mitigating this on the web with suggestions for alternate spellings and by displaying related searches. While you can see stemming at work in many Google searches, I'm pretty sure they don't build extensive substring indices on the web end either. For example, I've had searches where a substring returns 0 results and the exact phrase returns a handful.
> However, with email, the only person who'll ever benefit is the recipient.
I'm not sure that makes sense - if they add stemming, all users of Gmail benefit. Going by your explanation, it wouldn't make sense to add any expensive features to Gmail, because the only person who would ever benefit from them is the single user.
I think you misunderstand the nature of stemming; the point is that each and every user's inbox would have to be processed individually, and apparently Google doesn't think the overhead is worth the results.
I understand stemming. I just think the webpage contrast is not a good explanation for why they're not doing it. Building an unstemmed search index per user is also expensive, and helps only the recipient, but they do it because they think the expense is necessary. They stem webpages because they think the expense is necessary. They don't stem mails because they think the expense is not justified, not because the only person who benefits is the recipient.
It is also expensive, but it is less expensive than doing that AND stemming. They just decided that stemming was a line where the benefit (additional users, more use of Gmail, more AdSense revenue, whatever metric) wasn't worth the (development and ongoing processing) costs.
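To make the "that AND stemming" point concrete, here's a toy per-mailbox inverted index in Python where stemming is just an extra step applied to every token at index time and again at query time. Everything here is an assumption for illustration: `naive_stem` is a throwaway suffix stripper (not Porter or any real stemmer), `build_index` and `lookup` are made-up names, and the inbox is invented. It's a sketch of the general technique, not of what Google does:

```python
from collections import defaultdict


def naive_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(messages, stem=False):
    """Build a word -> set(message ids) index for a single mailbox."""
    index = defaultdict(set)
    for msg_id, text in enumerate(messages):
        for raw in text.lower().split():
            word = raw.strip(".,!?;:()\"'")
            if not word:
                continue
            # The stemming pass is the extra per-token work being debated.
            key = naive_stem(word) if stem else word
            index[key].add(msg_id)
    return index


def lookup(index, term, stem=False):
    """Look up a term, stemming the query the same way the index was built."""
    key = term.lower()
    if stem:
        key = naive_stem(key)
    return sorted(index.get(key, set()))


# Hypothetical inbox for one user.
inbox = [
    "Shipping updates for your order",
    "Your order shipped today",
    "Update: meeting moved",
]

plain = build_index(inbox)
stemmed = build_index(inbox, stem=True)

print(lookup(plain, "update"))                # exact word only
print(lookup(stemmed, "update", stem=True))   # also matches "updates"
print(lookup(stemmed, "shipped", stem=True))  # also matches "shipping"
```

Either way the index gets built once per mailbox; the stemmed version does strictly more work per token, and a real stemmer would cost more than this three-suffix loop. Whether that extra work pays for itself is the judgment call being described.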
This is a drawback to putting everything in the cloud: features will be weighed by the CPU cycles and storage they cost the providers. Can't wait until we come full circle and get back to client/server computing ;). I'm only half joking. I actually can't wait until things mature enough that we have a hybrid of both models. Then I can decide just how much stuff I want indexed and also not worry about what happens when my cable modem flakes out.