
Michael, I suggest this:

1. Don't spend any serious money on this project until it picks up significantly.

2. Treat this as a hobby, which means don't lose sleep over it.

3. Focus on the work that makes money, if money is a factor.

4. Take it easy. Leave the company as it is.

5. Try to make this not be written in LISP. That is a lot of lines of code; convert it.

6. Keep everything going at the same time. Don't quit your job.


Wow, this is horrible advice.

1. You have to take risks sometimes. He's done good work and it's worth putting some cash behind it (if necessary).

2. It's not a hobby. It's a business or a research project. In either case, it's a serious thing.

3. This project could be a money-maker in two ways: corporate sponsors for a research project, or corporate clients for a business product.

4. I agree with the take it easy part, but not the other part.

5. That's a joke, right?

6. That's also a joke, right?


Great article. Thanks.


I wanted to get a general discussion going about creating a search engine, since the Y Combinator crowd is a good crowd. How do you see Google scaling as the web keeps expanding so much? Do you think they will keep indexing the web and adding new content? I personally don't see that as very scalable. What do you think Google is planning for the future in terms of incorporating so many webpages into its index?

Obviously Google uses a "pull" mechanism, where it crawls and indexes. Do you see a "push" mechanism, like RSS, working better, and how would that affect the search results? Is there any way besides an index to achieve a different kind of search engine? Another kind of content retrieval system is Digg, where the user "pushes" the content to Digg, so it is supposedly more relevant and interesting. The disadvantage of that becoming a search engine is that users submit far less content than Google crawls. For instance, Digg cannot support a query like "c# string replace", while Google handles it easily because it has already crawled and indexed the MSDN API pages; Digg users might never submit that page.

My main concern is supporting so much content across so many different queries, and being a comprehensive search engine like Google, without the huge index and the crawling burden. Any clues? I'm not just dreaming about this; I actually want to make it a reality somehow.
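
For concreteness, here's a minimal sketch of the "pull" model in question: a breadth-first crawler that discovers new content only by following links from pages it has already fetched. The seed URL and the page limit are arbitrary illustrations, not anyone's real setup.

    import urllib.request
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkExtractor(HTMLParser):
        # Collects href targets from anchor tags.
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, max_pages=10):
        # Breadth-first "pull": fetch a page, harvest its links, repeat.
        seen, queue, fetched = {seed}, deque([seed]), 0
        while queue and fetched < max_pages:
            url = queue.popleft()
            try:
                html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue  # skip pages that fail to fetch or decode
            fetched += 1
            parser = LinkExtractor()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)  # discovered; queued for a later fetch
                    queue.append(absolute)
            yield url

    for url in crawl("http://example.com"):
        print("fetched:", url)

The point being: the engine only learns about a page when some already-known page links to it, which is exactly the freshness problem a "push" mechanism like RSS tries to fix.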


In my opinion, yes, Google will keep crawling the web and indexing in a similar (but continually optimized) way for the foreseeable future. Here's why: let's say you work at Google and come up with an entirely new way to do search. Here are your options:

1 - Go to Larry/Sergey with a proposal to completely rip out and replace millions/billions of dollars of infrastructure and knowledge in favor of a completely unproven new idea. The cost and risk are so huge they'd never go for it.

2 - Go to an outside investor with a proposal to build it from scratch. If it's truly an amazing, innovative new approach, they might fund it, and you'd be able to build it from scratch without the political nightmare of trying to rewrite the core product of a multi-billion-dollar public company. This is exactly what both the Cuil and FriendFeed guys did (and yes, FriendFeed is very, very, very much a Google competitor).

One thing I always think about: in 100 years (or 1,000 years), will we still be opening a web browser, typing a few words into the same plain Google text box, and hitting the "Search" button to get a page with 20 blue links and a bunch of ads? I doubt it. So there's definitely a better way to do search; we just haven't discovered it yet. My personal opinion is that some combination of social search (i.e. FriendFeed), human-powered search (i.e. Wikia/Mahalo), and semantic search (PowerSet) will be involved in the next evolution. Of course, if I knew exactly what that looked like, I'd be on a beach right now instead of hanging out here on Hacker News. :)


Good reply, thanks. I personally think it ultimately boils down to the content the search engine has, which in Google's case is the index. I think semantics and the semantic web will make a huge difference. The way I look at it: Google is good because its result for "c# string replace" http://www.google.com/search?hl=en&q=c%23+string+replace is much better than Mahalo's http://www.mahalo.com/C#_string_replace.

I was thinking about what search will look like in the next 2-3 years. The main problem I see is the discovery of new content by the search engine. Google basically brute-forced the whole thing by trying to index everything and hoping the results are in there, which works all right for the query above. It boils down to a hundred thousand crawlers and a huge index. If a social search engine cannot discover this page http://msdn.microsoft.com/en-us/library/fk49wtc1.aspx for the query "c# string replace", it is game over. Google is alive because of these kinds of results.

My main concern is how to discover new content without the burden of updating the index and brute-forcing the whole thing by indexing every word on a webpage. I know Google indexes pieces of webpages, but it is still a ton of words to index. I just see a huge problem with creating and maintaining a Google-sized index, which is also a lot of work. Also, I don't have the resources (money) to create a huge index, which is another main reason.
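
To make the cost concrete, here is a toy inverted index, the data structure behind "indexing every word on a webpage". Every (word, document) pair gets a posting, which is exactly why the index grows so large. The document IDs and texts here are hypothetical stand-ins.

    from collections import defaultdict

    # Hypothetical stand-in documents; a real index holds billions.
    docs = {
        "msdn/fk49wtc1": "call Replace on a string to replace characters",
        "blog/123": "social search surfaces links your friends shared",
    }

    # Build the index: every word of every document gets a posting.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)

    def search(query):
        # Intersect the posting lists of the query terms.
        postings = [index.get(w, set()) for w in query.lower().split()]
        return sorted(set.intersection(*postings)) if postings else []

    print(search("string replace"))  # -> ['msdn/fk49wtc1']

Multiply the posting count by every word on every page on the web and you get the index size, and the maintenance burden, being described.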

Good discussion by the way.


Yeah, I had some thoughts a while back on one way this could work using social search: http://tchblg.wordpress.com/2008/05/31/why-friendfeed-deserv...

Basically, you'd get back results from conversations posted by your friends, side by side with traditional results from a search engine (using something like Yahoo's BOSS). Then, if you still don't have the answer you're looking for, you can broadcast the original query out to your online contact list to see if anyone in your social network knows the answer. You wouldn't get instant results, but you would likely get a very good, trusted result. This probably works better for opinions ("what's the best Chinese restaurant in Seattle?") than for a fact-based search like "C# string replace".
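
A rough sketch of that blending, under obvious assumptions: web_search is a placeholder for a traditional search API such as Yahoo's BOSS, and social_search is a placeholder for matching the query against friends' shared posts. Both are hypothetical, not any real service's interface.

    # Hypothetical placeholders, not a real API.
    def web_search(query):
        # Would wrap a traditional search API such as Yahoo BOSS.
        return ["http://example.com/web-result-1", "http://example.com/web-result-2"]

    def social_search(query, friends):
        # Would match the query against conversations friends have posted.
        return [post["url"] for friend in friends for post in friend["posts"]
                if query.lower() in post["text"].lower()]

    def blended_search(query, friends):
        # Social results shown side by side with traditional web results.
        return {"social": social_search(query, friends),
                "web": web_search(query)}

    friends = [{"name": "alice",
                "posts": [{"text": "Best Chinese restaurant in Seattle?",
                           "url": "http://example.com/alice-thread"}]}]
    print(blended_search("chinese restaurant", friends))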

I would seriously avoid trying to take on Google or MS or Yahoo by trying to out-index them. Cuil has millions in the bank and some of the world's foremost experts in search at the helm, and things aren't looking all that bright so far. There are lots of problems to solve in the world and lots of approaches to solving them; solving search by out-indexing Google should probably be pretty darn low on your "problems to solve today" list.


I will definitely check out your links. Out-indexing Google or Microsoft Live Search is definitely out of the question for me, if only because of the amount of data. Social search with something like Yahoo BOSS might work. I will look into fact-based search using social search plus conventional search, or some kind of combo.


This idea that UI design is critical to a search engine is just nonsense. All that matters in a search engine is the results. You could have the crappiest interface, and with the best results you would still become a billionaire. Simple as that.


Design matters because I need to be able to interpret the search engine's results as quickly as possible. Google's lightweight design really shines because it loads instantly, and I can usually look through the results page very quickly, figure out which link is the most relevant to my query, and click on it.

However, that's not always true, and I sometimes have to click on 3 or 4 links to figure out whether I need to refine my search. If someone found a better way to organize the search results, so that I could determine which ones are relevant faster, that would be a big win.

I do agree with you that the quality of the search results is a lot more important, but presentation definitely does matter.


On "new news:"

I'm not concerned about how newspapers and producers like MSNBC and the Washington Post create the story. The main problem is how a search engine (or whatever) can search that news. The problem with Google News is that Google crawls and indexes news like any other webpage, so it is not very up to date. If there were a fire right now, you wouldn't see it on Google right now.

RSS is a different topic; for one thing, not all news producers have RSS. My main question is: if there is a flood happening right now, what is the best way to find that information? You can't use Google, because they haven't crawled and indexed it yet. I'm thinking of a real-time news search/aggregator. Another nice thing would be that if I search "flooding", it should display many news sources so that I can get some perspective. What is the best way to approach this problem? Crawling and indexing is not the way to go, I think, if it needs to be instant.
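
One cheap way to approximate "instant", at least for publishers that do have feeds, is to poll their RSS and match entries against a standing query as they arrive. A minimal sketch, assuming the third-party feedparser library and illustrative feed URLs:

    import time
    import feedparser  # third-party: pip install feedparser

    # Illustrative feed URLs, not a curated list.
    FEEDS = [
        "http://rss.cnn.com/rss/cnn_topstories.rss",
        "http://example.com/washingtonpost/national.rss",
    ]

    def watch(query, interval=60):
        seen = set()
        while True:
            for feed_url in FEEDS:
                for entry in feedparser.parse(feed_url).entries:
                    if entry.link not in seen and query.lower() in entry.title.lower():
                        seen.add(entry.link)
                        print(entry.title, "->", entry.link)
            time.sleep(interval)  # minutes-fresh, versus a multi-day crawl cycle

    watch("flooding")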


This is dead on. But it looks like Twitter Search is, to some degree, solving this problem. Twitter is able to do this because, luckily, it has a base of users willing to supply the news instantaneously.

As a real-life example, a couple of weeks ago there was a minor earthquake. It was my first time experiencing one so I searched on Google to see what the deal was. I couldn't find anything. Then I turned to Twitter and saw tons of posts coming in about the earthquake.

