OpenSource For You

Solr: The Platform that Makes Search Easy

This article gives a detailed description of Solr. It is aimed at geeks who have built a Web application and want to make its search faster.

- By: Neetesh Mehrotra. The author works at TCS as a systems engineer. His areas of interest are Java development and automation testing. He can be contacted at mehrotra.neetesh@gmail.com.

Solr (pronounced Solar) was created by Yonik Seeley in 2004. Apache Solr is an open source search platform based on the Apache Lucene search library. In a sense, Solr is a wrapper around Lucene: it is written in Java and uses Lucene’s classes to build an inverted index, which it then uses to perform searches. Many websites that offer search are built on Solr, e-commerce sites being the most prominent among them. Solr runs as a standalone full-text search server and exposes REST-like HTTP/XML and JSON APIs, on which search applications are built. In layman’s terms, Solr can be defined as follows: “Apache Solr is a search engine: you index a set of documents and then query Solr to return the set of documents that matches the user query.”
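To make the idea of an inverted index concrete, here is a minimal Python sketch. It is not how Lucene is implemented; it only illustrates the principle of mapping each term to the documents that contain it. The sample documents and the function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "Solr is a search platform",
    2: "Lucene is a search library",
    3: "Tomcat is a servlet container",
}
index = build_inverted_index(docs)
print(search(index, "search"))          # documents 1 and 2 contain "search"
print(search(index, "search library"))  # only document 2 contains both terms
```

Because the lookup goes from term to documents, a query never has to scan the text of every document, which is the main reason index-based search stays fast as the collection grows.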

Solr gained popularity because it is fast, and can index and search multiple sites. It returns relevant content based on the search query’s taxonomy. Other important features are faceted searching, dynamic clustering, database integration, NoSQL features, rich document handling, a comprehensive administration interface, high scalability and fault tolerance, as well as real-time indexing.

This article uses Solr 5.3.0. Before you begin the Solr installation, make sure you have a JDK installed and that the JAVA_HOME environment variable is set appropriately.

Solr makes search easy

To understand how Solr makes search easy, we first have to set it up. As I am using Windows 7, I will demonstrate Solr on Windows. Solr is open source, so it can be downloaded freely from the Solr website. But remember, to run Solr, you need to have a JRE installed on your system. For a Windows system, download the .zip file. To get started, all you need to do is extract the Solr distribution archive to a directory of your choosing. Once extracted, you are ready to run Solr on your system.

On Windows, you can start Solr by running bin\solr.cmd start from the directory you extracted it to. This starts Solr on port 8983. If everything is fine, Solr reports that it has started and signs off with ‘Happy searching!’. To use a different port, run solr start -p port_number from cmd. You can also run solr -help to list the available commands. Once Solr is running, the admin console is available in your browser; by default, the URL is http://localhost:8983/solr/

Before you can search, you first have to create a core by running solr create -c core_name in cmd. A core is a directory that holds an index along with its configuration files, which live in the core’s conf folder.

Solr makes it easy to add search to an application. You just have to define a schema for your dataset in schema.xml, the file that describes the documents indexed (ingested) in Solr, i.e., the set of data fields they contain. For example, a newspaper article may contain a title, author name, body text, date, etc. You have to specify the data type of each of these fields in schema.xml. Figure 2 shows how to define data fields in schema.xml.

After defining the schema, load your data into Solr, either by importing it from a database or by uploading data files. Figure 3 shows how to upload data in various formats, so that Solr has documents to return when users search.
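Besides the admin console, documents can also be sent to Solr over HTTP. The Python sketch below builds (but does not send) a JSON update request. It assumes a core named core_name on the default port, and documents with id, title and author fields; these names are illustrative and would have to match your own schema.

```python
import json
from urllib.request import Request


def build_update_request(docs, core="core_name",
                         base_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST request that adds docs and commits."""
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


# Hypothetical newspaper articles; the field names are assumptions.
articles = [
    {"id": "1", "title": "Solr basics", "author": "A. Writer"},
    {"id": "2", "title": "Search made easy", "author": "B. Writer"},
]
request = build_update_request(articles)
print(request.get_method(), request.full_url)

# To actually send it to a running Solr instance:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.status)
```

The sending part is left commented out so that the sketch can be run without a Solr server; uncomment it once your core exists.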

As Solr is based upon open standards, it is highly extensible.

Solr queries are RESTful in nature, which means that a query is as simple as an HTTP request URL and the response is a structured document, typically in XML, CSV, JSON or some other format.
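Since a query is just a URL, it can be assembled with a few lines of Python. This is a minimal sketch that assumes a core named core_name on the default port; q, wt and rows are standard Solr query parameters, while the helper name build_select_url is made up for this example.

```python
from urllib.parse import urlencode


def build_select_url(query, core="core_name", fmt="json", rows=10,
                     base_url="http://localhost:8983/solr"):
    """Return the URL of a Solr /select request for the given query string."""
    params = {"q": query, "wt": fmt, "rows": rows}
    return f"{base_url}/{core}/select?{urlencode(params)}"


print(build_select_url("title:solr"))
print(build_select_url("*:*", fmt="xml", rows=5))
```

Pasting one of the printed URLs into a browser while Solr is running should return the matching documents in the requested format.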

Basic syntax, schema.xml, with an example

Fields: You can define a field in schema.xml as follows (here, for the title field of the newspaper example):

<field name="title" type="string" indexed="true" stored="true" multiValued="true" required="true"/>

Here, ‘name’ is the name of the field, and ‘type’ is the name of a field type defined elsewhere in schema.xml. The default value of ‘indexed’ is true; it means the value of the field can be used in queries to retrieve matching documents. The default value of ‘stored’ is true; it means the actual value of the field can be retrieved by queries. The default value of ‘multiValued’ is false; if true, it indicates that a single document might contain multiple values for this field. The default value of ‘required’ is false; if true, it instructs Solr to reject any attempt to add a document that does not have a value for this field. Figure 2 shows how to define fields and field types.
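The effect of ‘required’ and ‘multiValued’ can be illustrated with a small Python sketch. This is not Solr’s code; it only mimics the kind of checks described above. The field table and the validate function are hypothetical.

```python
# Hypothetical field definitions mirroring the schema.xml attributes above.
FIELDS = {
    "title":  {"required": True,  "multiValued": False},
    "author": {"required": False, "multiValued": True},
}


def validate(doc, fields=FIELDS):
    """Return a list of problems found in doc; an empty list means it passes."""
    problems = []
    for name, attrs in fields.items():
        if attrs["required"] and name not in doc:
            problems.append(f"missing required field: {name}")
    for name, value in doc.items():
        if name not in fields:
            problems.append(f"unknown field: {name}")
        elif isinstance(value, list) and not fields[name]["multiValued"]:
            problems.append(f"multiple values for single-valued field: {name}")
    return problems


print(validate({"title": "Solr basics", "author": ["A. Writer", "B. Writer"]}))
print(validate({"author": "A. Writer"}))
print(validate({"title": ["One", "Two"]}))
```

The first document passes, the second is rejected because the required title is missing, and the third is rejected because title is single-valued.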

Dynamic fields: Solr has strong data typing for fields, yet it provides flexibility through dynamic fields.

Using a <dynamicField> declaration, you can create rules that tell Solr which data type to use whenever it is given a field name that is not explicitly defined but matches the prefix or suffix pattern of a dynamic field.

For example, the following dynamicField declaration tells Solr that whenever it sees a field name ending in ‘_i’ that is not an explicitly defined field, it should dynamically create an integer field:

<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
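The lookup order described above (explicit fields first, dynamic patterns second) can be sketched in Python as follows. This is a simplified illustration, not Solr’s implementation: the field tables are hypothetical, and the sketch simply takes the first pattern that matches.

```python
from fnmatch import fnmatchcase

# Hypothetical schema: explicit fields plus dynamic-field patterns.
EXPLICIT_FIELDS = {"title": "string", "rollno": "integer"}
DYNAMIC_FIELDS = [("*_i", "integer"), ("*_s", "string")]


def resolve_type(field_name):
    """Return the type for field_name, or None if nothing matches."""
    # Explicitly defined fields always win over dynamic patterns.
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    # Otherwise use the first dynamic pattern that matches the name.
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


print(resolve_type("price_i"))   # matched by the "*_i" pattern
print(resolve_type("title"))     # explicitly defined
print(resolve_type("unknown"))   # no explicit field and no pattern matches
```

With rules like these, a new field such as price_i can be indexed without touching the schema, as long as its name follows the agreed suffix.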

Copy fields: The <copyField> declaration instructs Solr to duplicate any data it sees in the ‘source’ field of documents added to the index into the ‘dest’ field of the same document. Remember that the data types of the two fields must be compatible. The copy is made before any analysers are invoked. With a copy field, Solr can match a value without the query naming the field it came from. For example, if there is a roll number field named rollno, then without copyField we have to search with the query ‘rollno:10’.

With copyField, we can copy rollno into a general-purpose field named text. The declaration is:

<copyField source="rollno" dest="text"/>

Now we can search just by typing ‘10’ and Solr will return the same result, provided text is the default search field.
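What copyField achieves can be imitated in plain Python. The sketch below is only an analogy: it copies values into a catch-all text field at indexing time and then searches that one field. The name field and the sample students are made up; only rollno and text come from the example above.

```python
# Hypothetical copyField rules: (source field, destination field).
COPY_FIELDS = [("rollno", "text"), ("name", "text")]


def apply_copy_fields(doc, rules=COPY_FIELDS):
    """Return a copy of doc with source values added to the destination fields."""
    result = dict(doc)
    for source, dest in rules:
        if source in doc:
            # The destination is treated as a multi-valued (list) field.
            result[dest] = list(result.get(dest, [])) + [str(doc[source])]
    return result


def search(docs, term, field="text"):
    """Return the documents whose field contains term among its values."""
    return [d for d in docs if term in d.get(field, [])]


students = [
    apply_copy_fields({"rollno": 10, "name": "Asha"}),
    apply_copy_fields({"rollno": 11, "name": "Ravi"}),
]
print(search(students, "10"))    # found without naming the rollno field
print(search(students, "Ravi"))  # the same catch-all field covers names too
```

The original fields stay untouched, so a precise query such as ‘rollno:10’ keeps working alongside the simpler one.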

You can run queries from the Query tab of the admin console, which is shown in Figure 6.

Solr vs Elasticsearch (ES)

Solr and Elasticsearch are competing search servers. Both are built on top of Lucene, so their core features are largely the same. In fact, Solr and ES are so similar that an ES plug-in allows you to use Solr clients and tools with ES. For most use cases, there is no compelling reason to choose Solr over ES, or vice versa. There are, however, some minor differences, which are listed in the table shown in Figure 7.

Deploying Solr with Tomcat

Please note that this deployment was done with Solr 5.1. The prerequisites are listed below.

1. Install the Tomcat servlet container on the system (here, at C:\Solr with Tomcat\apache-tomcat-7.0.57). We will refer to this directory as TOMCAT_HOME.

2. Download and extract the Solr package (here, to C:\solr-5.1.0\solr-5.1.0).

Now follow the steps given below.

1) Copy the solr directory from the extracted package (C:\solr-5.1.0\solr-5.1.0\server\solr) into TOMCAT_HOME\bin, or into TOMCAT_HOME if Tomcat is to be started as a service.

2) Copy the Solr war file (C:\solr-5.1.0\solr-5.1.0\server\webapps\solr.war) into the Tomcat webapps directory, TOMCAT_HOME\webapps. Tomcat will deploy it automatically.

3) Copy all five jar files from the extracted Solr package (C:\solr-5.1.0\solr-5.1.0\server\lib\ext) to the TOMCAT_HOME\lib directory:

• jul-to-slf4j-1.7.7

• jcl-over-slf4j-1.7.7

• slf4j-log4j12-1.7.7

• slf4j-api-1.7.7

• log4j-1.2.17

4) Copy the log4j.properties file from the C:\solr-5.1.0\solr-5.1.0\server\resources directory to the TOMCAT_HOME\lib directory.

5) After solr.war has been extracted by Tomcat:

a) Create the core (if not already done).

b) Copy any further listener-specific jars that are needed to the C:\Solr with Tomcat\apache-tomcat-7.0.57\webapps\solr\WEB-INF\lib directory.

c) Configure the entry for the scheduler’s listener-related jars in the C:\Solr with Tomcat\apache-tomcat-7.0.57\webapps\solr\WEB-INF\web.xml file.

d) Modify the server port in the properties file of the scheduler (for example, C:\Solr with Tomcat\apache-tomcat-7.0.57\bin\solr\conf\dataimportScheduler.properties).

e) In order to run the data import handler, copy the jars specific to the data import handler, and those for the custom transformer, if any, into the folder C:\Solr with Tomcat\apache-tomcat-7.0.57\bin\solr\lib:

• solr-dataimporthandler-5.1.0.jar

• solr-dataimporthandler-extras-5.1.0.jar

• ojdbc14-10.2.0.5.jar

• SqltoSolrDateTransformer.jar

Then change the entries in solrconfig.xml for the same, as follows:

<!-- for data import handler -->

<lib dir="../lib/" regex="solr-dataimporthandler-\d.*\.jar" />

<lib dir="../lib/" regex="ojdbc14-\d.*\.jar" />

<!-- for custom date transformer -->

<lib dir="../lib/" regex="SqltoSolrDateTransformer.jar" />
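Before restarting Tomcat, it can be worth checking which jar files such patterns would actually pick up. The Python sketch below does this under two assumptions: that the jar names use the usual hyphenated form (for example, solr-dataimporthandler-5.1.0.jar), and that a pattern has to match the whole file name. The helper matching_jars is made up for this check.

```python
import re

# Patterns as they would appear in the regex attribute of each <lib> entry,
# assuming the hyphenated jar names.
LIB_PATTERNS = [
    r"solr-dataimporthandler-\d.*\.jar",
    r"ojdbc14-\d.*\.jar",
    r"SqltoSolrDateTransformer.jar",
]


def matching_jars(file_names, patterns=LIB_PATTERNS):
    """Return the file names that fully match at least one pattern."""
    return [name for name in file_names
            if any(re.fullmatch(p, name) for p in patterns)]


jars = [
    "solr-dataimporthandler-5.1.0.jar",
    "solr-dataimporthandler-extras-5.1.0.jar",
    "ojdbc14-10.2.0.5.jar",
    "SqltoSolrDateTransformer.jar",
    "log4j-1.2.17.jar",
]
print(matching_jars(jars))
```

Under these assumptions, the extras jar is not matched, because the first pattern expects a digit right after ‘solr-dataimporthandler-’. If you need that jar on the classpath, it would need its own entry or a broader pattern.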

Start the Tomcat server with TOMCAT_HOME\bin\startup.bat and point the Web browser to http://localhost:8080/solr (change the port if necessary). The admin page of Solr will be displayed.

What next?

Congratulations! You have understood the basics of Solr. You have learned about its syntax and different fields, how it makes search easy, the differences between Elasticsearch and Solr, and how to deploy Solr with Tomcat. The next step is discovering how to connect Java (or any other language) with Solr, or how to integrate Solr with your application.

Figure 1: Admin homepage of Solr
Figure 2: Schema.xml
Figure 3: Document import
Figure 4: Dynamic fields
Figure 5: Copy fields
Figure 6: Query
Figure 7: Differences between Solr and ES
