Tony's ramblings on Open Source Software, Life and Photography

Why I need Linux and not Windows

On a daily basis, I do things on our office network that I just can't do in Windows. Don't get me wrong, there are a couple of things I do in Windows that I just can't do yet in Linux, but more of the real work on our data gets done in Linux, and doing it in Windows would be painful.

We aren't your typical business. Our work is driven by such a custom workflow that off-the-shelf products don't fit closely enough unless we pay an outside consulting company to write custom middleware linking the different tasks together. Our processes, while not unusual individually, are extremely different from the average company's because of the volume of paper we scan in a given month.

Enter Linux. All but one server in our facility runs Linux on an AMD Opteron 64-bit processor. The one exception runs a proprietary imaging package that we use for indexing and publishing all of our documents. Initially our entire workflow went through that one Windows application, but we soon found that when you're scanning 2 million pages a month, MsSQL quickly buckled under its own weight. Not that MsSQL by itself can't handle that kind of load; the problem really seems to have more to do with the software we were using.

Initially those 2 million pages were scanned as batches and uploaded into the database. The indexing software would then copy that data to a workstation, split it up into documents, let a user index them, and send the individual documents back to the database. By the time we were done, I had 4 million pages on the server, 2 million of which needed to be deleted. On top of that, all 2 million pages traveled across the network not once, not twice, but a minimum of FOUR times by the end, through some sort of inefficient ActiveX component.

Our first foray into using Linux in the process came about because of this need to reduce the load on that proprietary software. Now we scan the documents and use NFS to transfer them automatically to a holding and processing server that runs Linux. When it comes time to index those files, they are again transferred over NFS to a Windows workstation, where they are split up and indexed. Only after indexing are they sent to the Windows server for the first time, for QC and publishing.
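The post doesn't say how the hand-off on the Linux holding server is automated. As a rough sketch, assuming each scan batch lands as its own directory on the NFS share and the scanning station drops a hypothetical `batch.done` marker file once the copy is finished, a small Python job could stage completed batches for indexing like this (the directory layout and marker name are assumptions for illustration):

```python
import shutil
from pathlib import Path

# Hypothetical marker file the scanning station writes once a batch is fully copied.
DONE_MARKER = "batch.done"


def stage_completed_batches(incoming: Path, ready: Path) -> list:
    """Move every fully written batch directory from `incoming` to `ready`.

    A batch counts as complete only when it contains DONE_MARKER, so
    half-transferred batches on the NFS share are left where they are.
    Returns the names of the batches that were moved.
    """
    ready.mkdir(parents=True, exist_ok=True)
    moved = []
    for batch in sorted(incoming.iterdir()):
        if batch.is_dir() and (batch / DONE_MARKER).exists():
            target = ready / batch.name
            if target.exists():
                continue  # never overwrite a batch that is already staged
            shutil.move(str(batch), str(target))
            moved.append(batch.name)
    return moved
```

Run from cron every minute or so with hypothetical paths such as `stage_completed_batches(Path("/srv/scans/incoming"), Path("/srv/scans/ready"))`, it would pick up only finished batches; checking for the marker avoids moving a batch while the scanner is still writing to it.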

Previously, sending a scan batch to the server took 5 minutes. Now it takes about 30 seconds. Pulling a batch down to be indexed used to take 20 minutes; now it takes about 8. I was also able to set up some automatic backup processes while the data waits on the Linux server, which let me reference an old batch up to 30 days later if necessary.
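The backup details aren't spelled out above. One way a 30-day window like that could work, sketched in Python under the assumption that each batch is a directory and backups live under a separate, hypothetical backup root, is to copy each batch as it arrives and prune anything older than 30 days on a schedule:

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Retention window from the post: old batches stay available for 30 days.
RETENTION_DAYS = 30
SECONDS_PER_DAY = 86400


def back_up_batch(batch: Path, backup_root: Path) -> Path:
    """Copy a batch directory under backup_root and stamp it with the current time."""
    backup_root.mkdir(parents=True, exist_ok=True)
    target = backup_root / batch.name
    shutil.copytree(batch, target, dirs_exist_ok=True)
    # copytree copies the source directory's timestamps; reset the copy's mtime
    # to "now" so the retention clock starts when the backup is made.
    os.utime(target, None)
    return target


def prune_old_backups(backup_root: Path, now: Optional[float] = None,
                      retention_days: int = RETENTION_DAYS) -> List[str]:
    """Delete backup directories last stamped more than retention_days ago."""
    if now is None:
        now = time.time()
    cutoff = now - retention_days * SECONDS_PER_DAY
    removed = []
    for entry in sorted(backup_root.iterdir()):
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Keying the prune on each backup directory's modification time (reset at copy time) keeps the logic simple; a dated directory name would work just as well, and `prune_old_backups` could run nightly from cron.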

Our goal is to replace the Windows server entirely for at least 3/4 of our production line by EOY 2007. I think we'll hit it closer to July.

