Tony's ramblings on Open Source Software, Life and Photography

Roll Your Own Search With SOLR

I was trying to figure out how to write a search engine that could support GamerzCrib forums.

The biggest challenge is scale: I could potentially be looking at over 10 million posts on the server and something like 5 million users a month. I've seen vBulletin forums where search became painfully slow with fewer than a million posts.

After poking around the net, I found Solr, run by the same folks behind the Apache web server (the Apache Software Foundation). Solr is a search server built on Lucene and written in Java. It runs as its own service, accepts updates to the search index, and typically returns search results as XML, all using HTTP as its interface.
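To make the "HTTP in, XML out" idea concrete, here is a minimal sketch in Python (not the PHP I was actually using) that builds a query URL for Solr's select handler and parses the XML that comes back. The base URL is an assumption (it matches the port used by Solr's bundled example setup), the helper names are made up for illustration, and the sample response is hand-written in Solr's XML response shape with hypothetical field names.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: Solr is reachable at this address; adjust for your install.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query, rows=10, start=0, base=SOLR_BASE):
    """Build a URL for Solr's select handler with paging parameters."""
    params = urlencode({"q": query, "start": start, "rows": rows})
    return f"{base}/select?{params}"


def parse_results(xml_text):
    """Return (total_hits, docs) from a Solr XML response.

    Only simple single-valued fields are handled; multi-valued
    fields (<arr> elements) would need extra work.
    """
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None:
        return 0, []
    docs = [{field.get("name"): field.text for field in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound", "0")), docs


# Hand-written sample in the shape of a Solr XML response.
# The field names "id" and "title" are hypothetical.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="id">42</str><str name="title">Best co-op maps</str></doc>
  </result>
</response>"""

if __name__ == "__main__":
    print(build_select_url("co-op maps", rows=5))
    print(parse_results(SAMPLE_RESPONSE))
```

Fetching the URL with any HTTP client (cURL, in my case) and feeding the body to the parser is all a basic search page needs.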

The documentation is pretty cryptic for someone not familiar with Java, but that's about the only bad thing I can say about it. The version 1.3 'nightly builds' even have a PHP-specific interface you can use.

If you're running Solr on an Internet-connected machine, make sure you lock it down with a firewall, because no authentication is needed to update or query the index. You don't want some script kiddie screwing with your search backend.

So far I've got my PHP scripts talking to it using cURL, so the next step is to write the code to actually add forum posts to the index, delete those entries if the post is deleted, and provide a search interface.
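For the add and delete steps, Solr's update handler takes small XML messages over HTTP POST. The following is a rough Python sketch of what those messages look like, not the PHP code from this project. The field names (id, title, body) are hypothetical and would have to match whatever is defined in your Solr schema, and the update URL and helper names are assumptions too.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: default-style update endpoint; adjust for your install.
UPDATE_URL = "http://localhost:8983/solr/update"

# Changes are not visible to searches until a commit is sent.
COMMIT_XML = "<commit/>"


def add_post_xml(post_id, title, body):
    """Build an <add> message for one forum post.

    Field names here are hypothetical; they must exist in your schema.
    ElementTree escapes characters like < and & in the post text.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", post_id), ("title", title), ("body", body)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def delete_post_xml(post_id):
    """Build a <delete> message that removes a post by its unique id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(post_id)
    return ET.tostring(delete, encoding="unicode")


def post_update(xml_payload, url=UPDATE_URL):
    """POST an update message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Print the payloads only; nothing is sent to a server here.
    print(add_post_xml(42, "Best co-op maps", "Anything with 2 < players?"))
    print(delete_post_xml(42))
```

In a forum, the add message would be sent when a post is created or edited and the delete message when it is removed, each followed by a commit (possibly batched, since committing after every single post gets expensive on a busy site).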


Toby's picture

How did it smoke?

Did you ever get this to work? I'm currently looking at the same thing.

Cheers

tony's picture

I never actually finished the project. It was working, but I didn't finish the integration with the site. Real work took precedence and I had to move on...
