Vertical Search Engine in .NET

Posted Saturday, April 9, 2011 in Old JamesCMS Posts

I'm currently working on a project to set up a vertical search engine, in other words a search engine that crawls only a specific portion of the web. I'm working with a team, and we haven't yet decided what criteria to narrow our search field to, but we were able to find a great open source search engine built in C#. Arachnode.NET has support for keyword filtering and even Bayesian classification. The search index is built with Lucene.NET, which provides modern indexing functions, and there's even support for MS SQL full-text indexing. It's nice to see an open source .NET project on par with those from the Apache Software Foundation.
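To give a rough idea of what keyword filtering means for a vertical crawler, here is a minimal sketch in Python. It is not Arachnode.NET's API; the function names, the `min_hits` threshold and the example keyword are all assumptions made for illustration.

```python
import re


def keyword_hits(text, keywords):
    """Count tokens in `text` that appear in `keywords` (case-insensitive)."""
    wanted = {k.lower() for k in keywords}
    # Split the page text into lowercase alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token in wanted)


def should_store(text, keywords, min_hits=2):
    """Keep a page only if it mentions the topic keywords often enough.

    `min_hits` is a hypothetical threshold, not a value from the project.
    """
    return keyword_hits(text, keywords) >= min_hits


if __name__ == "__main__":
    page = "Steganography hides data; steganography tools are common."
    print(should_store(page, ["steganography"]))
```

A crawler would run a check like `should_store` on each downloaded page and discard the ones that fall below the threshold. A Bayesian classifier replaces the fixed keyword count with probabilities learned from example pages.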

Summary

The project involved setting up a vertical search engine that crawls the web for pages related to steganography. The crawler and query page come from the open source C# project Arachnode.NET.

Query the results at the search page (removed)

The search page is a “Google-like” search engine that can search the crawl results.

Download the Lucene.NET full-text index

The full-text index can be downloaded and viewed using Luke.

See the development screenshots

Screenshots were taken during the configuration, web crawling, and setup of the project.

See the code tab for the Google custom search (removed)

Google custom search has been set up here to compare results to our own vertical search engine.

Download the list of domains crawled

There were a total of 793 domains crawled.

Download the list of webpages crawled

There were a total of 15,570 webpages crawled, all of which were from the 793 domains listed above. These pages were stored because they were found to be related to steganography.
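With 15,570 pages across 793 domains, that works out to roughly 20 pages per domain on average. A domain list like the one above can be derived from the page list with a few lines of Python. This is only a sketch: it assumes the page list is a plain sequence of URLs, and it treats each hostname as its own domain (so `www.example.com` and `example.com` would be counted separately). The function name is made up for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def pages_per_domain(urls):
    """Count crawled pages per hostname; entries without a hostname are skipped."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # None when the string is not a full URL
        if host:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    # Placeholder URLs, not taken from the actual crawl.
    sample = [
        "http://example.com/a",
        "http://example.com/b",
        "https://example.org/",
    ]
    for host, n in pages_per_domain(sample).most_common():
        print(host, n)
```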