Tony's ramblings on Open Source Software, Life and Photography

search

Enterprise Scalable Search

When your database applications start to reach multiple terabytes (that's really big for you novices, really small for the Fortune 500), it becomes harder to get good performance out of a traditional database without laying out some serious cash.

We do exceptionally well with the distributed appliance model we built on top of MySQL, but as we continue to grow I'm always keeping an ear to the ground listening for the next big train.

Enter NoSQL

NoSQL is a concept more than a database. In fact, it has nothing to do with the database that is actually named "nosql", which really is a relational database. The concept with NoSQL databases is that they are not your traditional relational database; instead they store things as key/value pairs. Some of the search-oriented ones are built around Apache Lucene, a text search library.


Roll Your Own Search With SOLR

I was trying to figure out how to write a search engine that could support GamerzCrib forums.

The biggest challenge is that I could potentially be looking at over 10 million posts on the server, and something like 5 million users a month. I've seen vBulletin forums where the search became painfully slow, and they had fewer than a million posts.

After poking around the net, I found Solr, run by the Apache Software Foundation, the same guys who do the Apache web server. Solr is a search server built on Lucene and written in Java. It runs as its own service, accepts updates to the search index, and typically provides XML output as search results, all using HTTP as its interface.
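
To make the "HTTP in, XML out" idea concrete, here is a rough Python sketch of what talking to Solr might look like for forum posts. Treat it as an illustration only: the base URL, the `/select` path (newer Solr versions put a core name in the path), and the field names `id`, `title` and `body` are assumptions of mine, not anything from a real GamerzCrib schema. It only builds the request pieces and parses a response, so it runs without a Solr server.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumption: a local Solr instance at its usual default port.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query, rows=10, base=SOLR_BASE):
    """Build the URL for a search request that asks for XML results."""
    params = urlencode({"q": query, "rows": rows, "wt": "xml"})
    return f"{base}/select?{params}"


def build_add_xml(post_id, title, body):
    """Build the XML message that would be POSTed to Solr to index one post.

    The field names are hypothetical; they must match whatever the
    Solr schema actually defines.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", post_id), ("title", title), ("body", body)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def parse_result_ids(xml_text):
    """Pull the 'id' field out of every <doc> in an XML search response."""
    root = ET.fromstring(xml_text)
    ids = []
    for doc in root.iter("doc"):
        for field in doc:
            if field.get("name") == "id":
                ids.append(field.text)
    return ids
```

Sending the add message would be a plain HTTP POST of that XML to Solr's update handler, followed by a commit, and a search is just a GET on the URL from `build_select_url`. The forum software never touches the index files directly, which is what lets the search service live on its own box.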