Laurent Kempé - Indexing and searching business entities using Lucene.Net Framework, part 2

Design of a search engine, built with generics and reflection, that indexes and searches the content of your business entities without being intrusive.

Part 1 is available here: Indexing and searching business entities using Lucene.Net Framework, part 1.

Lucene.Net presentation

Lucene.Net is an open source project that comes from the Java world and is currently incubating at the Apache Software Foundation (ASF). It is a source code port of the Java Lucene indexing and search engine algorithms to the .NET platform, written in C# and done class by class and API by API.

Apache Lucene is an efficient indexing and search engine for text data. However, it does not provide built-in support for document formats such as Microsoft Office Word or PDF: you need extensions that extract the text content of a document before it can be indexed. The same applies to markup documents such as HTML.
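To illustrate what "extracting the text content" means for HTML, here is a deliberately naive sketch in C#. It is not one of the extensions mentioned above, and regular expressions are not a robust HTML parser; the HtmlTextExtractor class and its ExtractText method are hypothetical names used only for this example. It drops script and style blocks, removes the remaining tags, decodes entities such as &amp; with WebUtility.HtmlDecode (System.Net, .NET Framework 4 or later) and collapses whitespace.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a naive HTML-to-text extractor, for illustration only.
public static class HtmlTextExtractor
{
    public static string ExtractText(string html)
    {
        // Remove <script> and <style> blocks entirely, including their content.
        string text = Regex.Replace(html, @"<(script|style)[^>]*>.*?</\1>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words do not merge.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace and trim both ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(ExtractText(
            "<html><body><h1>Lucene.Net</h1><p>Search &amp; index</p></body></html>"));
    }
}
```

The resulting plain string is what would then be handed to Lucene.Net as the value of a tokenized field.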

Lucene.Net scrupulously follows the APIs defined in the classes of the original Java version of Lucene. Class and API names are preserved, adapted only to follow the C# naming guidelines. For example, the method Hits.length() of the Java implementation is written Hits.Length() in the C# version.

Beyond the APIs and classes, the algorithms of the Java version of Lucene are ported to C# as well. This means that an index created with the Java version of Lucene is 100% compatible with its C# version, for reading, writing and updating. Therefore two processes, one written in Java and the other in C#, can run concurrent searches against the same index.

You can consult the documentation of the latest stable version, version 2.0, on the Lucene.Net project pages, which also offer it for download. For more information about Lucene, I recommend the pages dedicated to the Java version of Lucene, which are much more comprehensive.

Lucene.Net Architecture

The lowest layer is the storage layer (Storage). Above it sits the layer that accesses the index files (data access), which is shared by the indexing system and the searching system. On top of those, the searching part of Lucene.Net uses a search layer and a query parser layer. Likewise, the indexing part of Lucene.Net uses a parser layer and a document layer.
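To make these layers more concrete, here is a minimal sketch that goes through each of them with the Lucene.Net 2.0 API, the version mentioned above (later versions deprecated or renamed some of these types, Hits among them). It assumes a project reference to the Lucene.Net 2.0 assembly. The LuceneLayersDemo class, the "id" and "title" fields and the indexed strings are purely illustrative and are not the design used later in this series. RAMDirectory plays the storage layer, Document and Field the document layer, IndexWriter the indexing side, and QueryParser with IndexSearcher the searching side.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class: indexes two documents, then searches them.
public static class LuceneLayersDemo
{
    // Returns the stored "id" field of every document matching queryText.
    public static List<string> IndexAndSearch(string queryText)
    {
        // Storage layer: keep the index in memory instead of on disk.
        Directory directory = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Indexing side: documents are analyzed and written through the IndexWriter.
        IndexWriter writer = new IndexWriter(directory, analyzer, true);
        writer.AddDocument(CreateDocument("1", "Tech Head Brothers community portal"));
        writer.AddDocument(CreateDocument("2", "innoveo corporate blog"));
        writer.Optimize();
        writer.Close();

        // Searching side: the query parser builds a Query, the searcher runs it.
        IndexSearcher searcher = new IndexSearcher(directory);
        QueryParser parser = new QueryParser("title", analyzer);
        Query query = parser.Parse(queryText);
        Hits hits = searcher.Search(query);

        List<string> ids = new List<string>();
        for (int i = 0; i < hits.Length(); i++) // Hits.length() in Java Lucene
        {
            ids.Add(hits.Doc(i).Get("id"));
        }
        searcher.Close();
        return ids;
    }

    // Document layer: one Lucene Document per indexed item.
    private static Document CreateDocument(string id, string title)
    {
        Document doc = new Document();
        // Stored but not tokenized: used to find the original item after a search.
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Stored and tokenized by the analyzer so individual words are searchable.
        doc.Add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }

    public static void Main()
    {
        foreach (string id in IndexAndSearch("brothers"))
        {
            Console.WriteLine(id);
        }
    }
}
```

Keeping an untokenized "id" field next to the searchable text is a common way to map a hit back to the object it came from.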

For more information about Lucene, I recommend reading the presentation on the Lucene website.

Now that we have a better view of what Lucene.Net is about, we will see in the next part how to use it to index the properties of our business entities.

This post is cross-posted on innoveo blog and in French on my .NET community portal Tech Head Brothers.