
Solr: a service for full-text search

Solr is a full-text search platform built on the Apache Lucene library. Simply put, it is a standalone service for searching text across many documents. Why use Solr if Lucene itself already has impressive features and suits a wide variety of purposes? Solr provides not only a convenient mechanism for retrieving data but also many additional components. Because it runs as a separate service, it can be connected to any application. Adopting Solr does not require serious development, so you can introduce it even at late stages of a project. Finally, all interaction goes through an HTTP API, which makes connecting to it easy and convenient.
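As a rough illustration of the HTTP API, the Python sketch below builds a GET request to a core's /select handler and decodes the JSON response. The host, Solr's default port 8983 and the core name techproducts are assumptions; run_query needs a running Solr instance, while build_select_request only prepares the request.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local Solr on the default port and a core named
# "techproducts"; adjust both for your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_request(core: str, params: dict) -> Request:
    """Build a GET request against the core's /select request handler."""
    query_string = urlencode(params, doseq=True)
    return Request(f"{SOLR_BASE}/{core}/select?{query_string}", method="GET")


def run_query(core: str, params: dict) -> dict:
    """Send the request and decode the JSON body (requires a running Solr)."""
    with urlopen(build_select_request(core, params)) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_select_request("techproducts", {"q": "*:*", "wt": "json"})
    print(request.full_url)
```

Any HTTP client in any language can do the same, which is what makes Solr easy to attach to an existing application.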

As mentioned above, the service can be flexibly configured for specific tasks. Its data processing pipeline consists of many modules plugged into the core. The number of modules keeps growing, bringing both new specialized and general-purpose components, and if necessary you can write your own component. Solr also has built-in query caching, which keeps data retrieval fast (even when the query logic is not optimized). Many modules are available out of the box, and together they cover almost any typical search task.
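To show the idea behind query caching, here is a toy least-recently-used cache in plain Python: results are stored under a key derived from the query parameters, so a repeated query is answered without re-running the search. This is only a conceptual sketch, not Solr's implementation; in Solr the caches (for example queryResultCache and filterCache) are configured in solrconfig.xml. The class name QueryCache is hypothetical, and the sketch assumes string-valued parameters.

```python
from collections import OrderedDict
from typing import Callable


class QueryCache:
    """Toy LRU cache keyed by normalized query parameters (illustration only)."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(params: dict) -> tuple:
        # Sort the parameters so that the same query written in a
        # different parameter order maps to the same cache entry.
        return tuple(sorted(params.items()))

    def get_or_compute(self, params: dict, compute: Callable[[dict], list]) -> list:
        key = self._key(params)
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        result = compute(params)  # the expensive search runs only on a miss
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result


if __name__ == "__main__":
    cache = QueryCache(capacity=2)
    search = lambda p: [f"results for {p['q']}"]  # stand-in for a slow search
    cache.get_or_compute({"q": "drive", "fq": "inStock:true"}, search)
    cache.get_or_compute({"fq": "inStock:true", "q": "drive"}, search)  # same key
    print(cache.hits, cache.misses)
```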

Fast search is achieved through a sophisticated indexing mechanism, which is automated but can be tuned to specific needs. The service is also easy to use. A query is simply a set of parameters for selecting, filtering and sorting the necessary data. You can specify the format of sent and received data (the main ones are XML and JSON), so any convenient mechanism can be used to interact with the service. Several ready-made implementations (both server-side and client-side) already exist to simplify data retrieval.

However, to get started you do not need to implement any code for requesting or receiving data: Solr has a web interface for processing, testing and analyzing data. In the query builder, you can generate the required URL, run the query and see the result. For a more detailed view, you can enable debug mode, which adds metadata to the response explaining why each document was selected. This way you can debug the query.
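The parameters for selecting, filtering, sorting, choosing the output format and enabling debug mode can be sketched as follows. q, fq, sort, fl, rows, wt and debugQuery are standard Solr query parameters; the field names come from the sample response in this article, while the host and core name are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a local Solr with a core named "techproducts".
BASE_URL = "http://localhost:8983/solr/techproducts/select"

params = [
    ("q", "name:drive"),            # main query: what to select
    ("fq", "cat:electronics"),      # filter query, cached separately by Solr
    ("fq", "inStock:true"),         # several fq parameters must all match
    ("sort", "price asc"),          # sort order
    ("fl", "id,name,manu,price"),   # fields to return
    ("rows", "10"),                 # page size
    ("wt", "json"),                 # response format (json or xml)
    ("debugQuery", "true"),         # add a "debug" section explaining the matches
]

url = f"{BASE_URL}?{urlencode(params)}"
print(url)
# Decoding the query string shows both filters were kept as separate values.
print(parse_qs(urlsplit(url).query)["fq"])
```

The same URL can be pasted into a browser or reproduced in the web interface's query builder.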


Fig. 1. The query builder

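Below is an example of the response to such a request: the query *:* (match all documents) with JSON output. The response is abbreviated; only three of the returned documents are shown.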

{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "q": "*:*",
      "indent": "true",
      "wt": "json",
      "_": "1417524247869"
    }
  },
  "response": {
    "numFound": 17,
    "start": 0,
    "docs": [
      {
        "id": "SP2514N",
        "name": "Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133",
        "manu": "Samsung Electronics Co. Ltd.",
        "cat": [
          "electronics",
          "hard drive"
        ],
        "features": [
          "7200RPM, 8MB cache, IDE Ultra ATA-133",
          "NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor"
        ],
        "price": 92,
        "price_c": "92,USD",
        "popularity": 6,
        "inStock": true,
        "_version_": 1471069944284184600
      },
      {
        "id": "6H500F0",
        "name": "Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300",
        "manu": "Maxtor Corp.",
        "cat": [
          "electronics",
          "hard drive"
        ],
        "features": [
          "SATA 3.0Gb/s, NCQ",
          "8.5ms seek",
          "16MB cache"
        ],
        "price": 350,
        "price_c": "350,USD",
        "popularity": 6,
        "inStock": true,
        "_version_": 1471069944287330300
      },
      {
        "id": "F8V7067-APL-KIT",
        "name": "Belkin Mobile Power Cord for iPod w/ Dock",
        "manu": "Belkin",
        "cat": [
          "electronics",
          "connector"
        ],
        "features": [
          "car power adapter, white"
        ],
        "weight": 4,
        "price": 19.95,
        "price_c": "19.95,USD",
        "popularity": 1,
        "inStock": false,
        "_version_": 1471069944289427500
      }
    ]
  }
}
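A client reading such a response typically checks responseHeader.status (0 means the request succeeded), then works with response.numFound and response.docs. A minimal Python sketch over an excerpt of the sample above (some fields omitted):

```python
import json

# An excerpt of the sample response above, with most fields omitted.
raw = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 17,
    "start": 0,
    "docs": [
      {"id": "SP2514N", "price": 92, "inStock": true},
      {"id": "6H500F0", "price": 350, "inStock": true},
      {"id": "F8V7067-APL-KIT", "price": 19.95, "inStock": false}
    ]
  }
}
"""

data = json.loads(raw)
header = data["responseHeader"]
if header["status"] != 0:  # a non-zero status indicates an error
    raise RuntimeError(f"Solr returned status {header['status']}")

result = data["response"]
docs = result["docs"]
# numFound counts all matches; docs holds only the current page.
print(f"{result['numFound']} matches in total, {len(docs)} on this page")
in_stock = [d["id"] for d in docs if d.get("inStock")]
print("In stock:", in_stock)
```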

There are, however, several important points to consider when using Solr. To make search easy to configure, define a common format for the entities being processed. Decide in advance in what form the data will arrive for indexing, because the filtering capabilities depend on it. Also, indexing transforms the original data, so the index itself does not hold it in its original form; if you need the original values, you have to say so in the configuration (by marking the field as stored). Depending on a field's type, its purpose and the amount of information it holds, you can choose the optimal set of processing modules.
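For example, if the core's schema is managed (editable over HTTP), fields can be declared through Solr's Schema API with an add-field command. The sketch below prepares such a command: one full-text field that is also stored, and one exact-value field for filtering whose original value is not stored. The core name products and the field names description and brand are hypothetical; text_general and string are field types from Solr's default configuration.

```python
import json
from urllib.request import Request

# Assumptions: a local Solr, a hypothetical core "products" with a managed
# schema, and illustrative field names.
SCHEMA_URL = "http://localhost:8983/solr/products/schema"

commands = {
    "add-field": [
        # Full-text field: analyzed for search and stored,
        # so the original text can be returned in results.
        {"name": "description", "type": "text_general",
         "indexed": True, "stored": True},
        # Exact-value field for filtering; original value not stored.
        {"name": "brand", "type": "string",
         "indexed": True, "stored": False},
    ]
}

request = Request(
    SCHEMA_URL,
    data=json.dumps(commands).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would apply the change on a running server.
```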

The platform also has a number of useful features, one of which is automatic re-indexing of documents. Every document must have a unique identifier, and when a document with an existing identifier is sent again, its index entries are updated. This is useful after changing the set of processing modules. Still, it pays to settle on the set of modules early, because re-indexing a large number of documents can take considerable time.
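A sketch of how documents might be sent for (re)indexing: a JSON array of documents is POSTed to the core's /update handler, and a document whose unique key already exists replaces the previous version. The core name products is hypothetical, and the code assumes id is the schema's uniqueKey (as in Solr's default configuration). The last lines are only a toy in-memory model of the overwrite behavior, not Solr code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: a local Solr, a hypothetical core "products",
# and "id" as the schema's uniqueKey field.
UPDATE_URL = "http://localhost:8983/solr/products/update"


def build_update_request(docs: list, commit: bool = True) -> Request:
    """Prepare a POST of a JSON array of documents to the /update handler."""
    missing = [d for d in docs if "id" not in d]
    if missing:
        raise ValueError(f"{len(missing)} document(s) lack a unique 'id'")
    url = f"{UPDATE_URL}?{urlencode({'commit': str(commit).lower()})}"
    return Request(url, data=json.dumps(docs).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


# Toy model of the overwrite semantics: a document whose id already exists
# replaces the earlier version instead of creating a duplicate.
index = {}
for doc in [{"id": "SP2514N", "price": 92}, {"id": "SP2514N", "price": 89}]:
    index[doc["id"]] = doc
print(index)  # {'SP2514N': {'id': 'SP2514N', 'price': 89}}
```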

If you need to integrate Solr into a .NET application, the best option is the SolrNet library, a Solr client that generates query strings automatically. The project is open source, so you can add any functionality you need.

Conclusion

In short, Solr is well suited both to common search tasks and to more specialized ones. Below are the main features that are easy to implement with it:

  • Filters
  • Synonyms and abbreviations
  • Spell checking and auto-correction: if words are misspelled, the system suggests "Did you mean ..."
  • Autocompletion
  • Token replacement
  • Search within results (using cached intermediate query results)
  • Stop words
  • Phrase search and search with excluded words
  • Sorting of results
  • Automatic index updates
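To illustrate phrase search and exclusion, here are a few query strings in the standard Lucene query syntax accepted by Solr's default query parser, together with their URL-encoded form for the q parameter. The terms and field names are taken from the sample response; whether they match anything depends on your schema and data.

```python
from urllib.parse import urlencode

queries = {
    "phrase search": '"hard drive"',                       # words together, in this order
    "word search": "samsung maxtor",                       # either word (default operator OR)
    "exclusion": '"hard drive" -Maxtor',                   # the phrase, but not "Maxtor"
    "field exclusion": "cat:electronics -inStock:false",   # drop out-of-stock items
}

# URL-encode each query as the value of the q parameter.
encoded = {label: urlencode({"q": q}) for label, q in queries.items()}
for label, value in encoded.items():
    print(f"{label:16} {value}")
```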

If your requirements overlap with this list, consider using Solr, especially if your project is already at a late stage of development.
