Solr: The Platform that Makes Search Easy

This article gives a detailed description of Solr. It is aimed at geeks who have built a web application and want to make its search faster.

OpenSource For You | By: Neetesh Mehrotra

The author works at TCS as a systems engineer. His areas of interest are Java development and automation testing. He can be contacted at mehro[email protected] gmail.com.

Solr (pronounced ‘solar’) was created by Yonik Seeley in 2004. Apache Solr is an open source search platform written in Java and built on the Apache Lucene search library. In a sense, Solr is a wrapper around Lucene: it uses Lucene's classes to build an inverted index, which maps each term to the documents that contain it, and then uses that index to perform searches. Many search websites, e-commerce sites in particular, are built on Solr. Solr runs as a standalone full-text search server and exposes REST-like HTTP/XML and JSON APIs on which search applications are built. In layman's terms, Solr can be defined as follows: “Apache Solr is a search engine: you index a set of documents and then query Solr to return the set of documents that matches the user query.”
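The inverted index at Lucene's core can be illustrated with a toy model. The Python sketch below is not how Lucene actually stores its index (Lucene runs configurable analysers and keeps compact on-disk postings); it only shows the idea that every term points to the documents containing it, so a query looks terms up instead of scanning every document. The tokenisation (lowercase plus whitespace split) and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it.

    docs: dict of doc_id -> text. Tokenisation is a naive lowercase +
    whitespace split, standing in for Lucene's analysers (an assumption).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Solr is built on Lucene",
    2: "Lucene builds an inverted index",
    3: "Solr makes search easy",
}
index = build_inverted_index(docs)
print(index["lucene"])               # [1, 2]
print(search(index, "solr lucene"))  # [1]
```

Each lookup touches only the posting lists of the query terms, which is why an inverted index keeps search fast as the number of documents grows.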

Solr gained popularity because it is fast and can index and search multiple sites. It returns relevant content based on the search query's taxonomy. Its other important features include faceted search, dynamic clustering, database integration, NoSQL features, rich document handling, a comprehensive administration interface, high scalability and fault tolerance, and near real-time indexing.

The environment used in this article is Solr 5.3.0. Before you begin the Solr installation, make sure a JDK is installed and JAVA_HOME is set appropriately.

Solr makes search easy

To understand how Solr makes search easy, we first have to configure it. As I am using Windows 7, I will demonstrate Solr on Windows. Solr is open source, so it can be downloaded freely from the Solr website. Remember that to run Solr, you need a JRE installed on your system. For a Windows system, download the .zip file. To get started, all you need to do is extract the Solr distribution archive to a directory of your choosing. Once it is extracted, you are ready to run Solr on your system.

On Windows, you can start Solr by running bin\solr.cmd start from the extracted directory. This starts Solr on port 8983; if everything is fine, Solr confirms that it has started with the message ‘Happy searching!’. To use a different port, run bin\solr.cmd start -p <port> from the command prompt. You can also run bin\solr.cmd -help to see the various commands Solr supports. Once Solr is running, its admin console (Figure 1) is available at http://localhost:8983/solr/

Now, to start searching with Solr, you first have to create a core by running bin\solr.cmd create -c core_name from the command prompt. Each core has a conf folder that holds its configuration files.
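The start and core-creation commands above can be wrapped in a small Python helper, which is handy if you set up Solr repeatedly. This is a sketch under assumptions: the function names, the `solr_dir` argument and the default core name are hypothetical, and only the `start -p` and `create -c` subcommands mentioned in the text are used. `solr_commands` only builds the commands; pass them to `run` to execute them.

```python
import os
import subprocess


def solr_commands(solr_dir, port=8983, core="core_name", windows=True):
    """Return (start_cmd, create_cmd, admin_url) for a Solr 5 install.

    solr_dir: directory the Solr archive was extracted to (assumption).
    core: name of the core to create (hypothetical default).
    """
    # bin\solr.cmd on Windows, bin/solr on Linux/macOS.
    script = os.path.join(solr_dir, "bin", "solr.cmd" if windows else "solr")
    start = [script, "start", "-p", str(port)]
    create = [script, "create", "-c", core]
    admin_url = f"http://localhost:{port}/solr/"
    return start, create, admin_url


def run(commands):
    """Run each command in order; raise CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(solr_commands(r"C:\solr-5.3.0", port=8984)[:2])` would start Solr on port 8984 and then create the core; the third element is the admin console URL to open in a browser.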

Solr makes it easy to add search to an application. You just have to define a schema for your dataset in schema.xml, the file that defines how the documents indexed (ingested) into Solr are represented, i.e., the set of data fields they contain. For example, a newspaper article may contain a title, author name, body text, date, etc. You have to specify the data type of each of those fields in schema.xml. Figure 2 shows how to define data fields in schema.xml.

After defining the schema, load your data into Solr, either from a database or by uploading data files. Figure 3 shows how to upload data to Solr in various formats, so that Solr has documents to return for each user's search.
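Besides the admin screen in Figure 3, documents can be sent to Solr's update handler over plain HTTP. The Python sketch below builds (but does not send) a POST of a JSON array of documents to /solr/<core>/update, adding commit=true so the documents become searchable at once. The core name, the field names and the sample document are hypothetical; they must match the schema of your own core.

```python
import json
import urllib.request


def build_update_request(core, docs, host="http://localhost:8983", commit=True):
    """Build (not send) a POST of `docs` to Solr's JSON update handler.

    core and the document fields are hypothetical; they must exist in your schema.
    """
    url = f"{host}/solr/{core}/update"
    if commit:
        url += "?commit=true"  # commit so the new documents are searchable at once
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [{"id": "1", "title": "Solr makes search easy", "author": "A. Writer"}]
req = build_update_request("core_name", docs)
# With Solr running, urllib.request.urlopen(req) would send the documents.
```

Sending one JSON array per request keeps the round trips down when loading many documents; for large imports you would typically commit once at the end rather than on every request.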

As Solr is based upon open standards, it is highly extensible.

Solr queries are RESTful in nature: a query is as simple as an HTTP request URL, and the response is a structured document, usually XML, CSV or JSON, listing the matching documents.
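Because a query is just an HTTP request, any HTTP client can talk to Solr. The sketch below builds a /select URL with the standard q, wt and rows parameters and reads the hit count and documents out of a JSON response. The core name is hypothetical, and the sample response is a hand-written, abbreviated example of the shape Solr returns, not real output.

```python
import json
from urllib.parse import urlencode


def build_select_url(core, query, rows=10, host="http://localhost:8983"):
    """Build a /select URL; wt=json asks Solr for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{host}/solr/{core}/select?{urlencode(params)}"


def parse_response(text):
    """Return (numFound, docs) from a Solr JSON response body."""
    data = json.loads(text)
    return data["response"]["numFound"], data["response"]["docs"]


url = build_select_url("core_name", "rollno:10")
print(url)  # http://localhost:8983/solr/core_name/select?q=rollno%3A10&wt=json&rows=10

# Hand-written, abbreviated example of the response shape (not real output).
sample = ('{"responseHeader": {"status": 0}, "response": {"numFound": 1, '
          '"start": 0, "docs": [{"id": "1", "rollno": "10"}]}}')
print(parse_response(sample))  # (1, [{'id': '1', 'rollno': '10'}])
```

urlencode takes care of escaping characters such as the colon in rollno:10, which is easy to get wrong when gluing query strings together by hand.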

Basic syntax of schema.xml, with examples

Fields: A field is declared in schema.xml with attributes such as these:

<field name="string" indexed="true" stored="true" multiValued="true" required="true"/>

Here, ‘name’ identifies the field. (A field type, declared with <fieldType>, also has a ‘name’, and that value is what a field definition refers to in its ‘type’ attribute.) ‘indexed’ defaults to true: the value of the field can be used in queries to retrieve matching documents. ‘stored’ defaults to true: the actual value of the field can be retrieved by queries. ‘multiValued’ defaults to false; if true, it indicates that a single document might contain multiple values for the field. ‘required’ defaults to false; if true, it instructs Solr to reject any attempt to add a document that does not have a value for this field. Figure 2 shows how to define a field type.
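The effect of the ‘required’ and ‘multiValued’ attributes can be modelled in a few lines. This Python sketch checks a document against a set of field definitions and reports the two kinds of error for which Solr would reject it. It is a simplification that ignores types, analysis and unknown fields, and the field names and type names in the example schema are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    """Attributes of a schema.xml <field>, with the defaults described above."""
    name: str
    type: str  # name of a <fieldType>; not checked by this sketch
    indexed: bool = True
    stored: bool = True
    multi_valued: bool = False
    required: bool = False


def validate(doc, fields):
    """Return the reasons Solr would reject `doc` (empty list if none).

    fields maps field name -> FieldDef. Only 'required' and 'multiValued'
    are checked; types, analysis and unknown fields are ignored.
    """
    problems = []
    for f in fields.values():
        if f.required and f.name not in doc:
            problems.append(f"missing required field: {f.name}")
    for name, value in doc.items():
        f = fields.get(name)
        if f and isinstance(value, list) and len(value) > 1 and not f.multi_valued:
            problems.append(f"multiple values for single-valued field: {name}")
    return problems


# Hypothetical schema for the newspaper-article example.
fields = {f.name: f for f in [
    FieldDef("id", "string", required=True),
    FieldDef("title", "text_general"),
    FieldDef("author", "string", multi_valued=True),
]}
print(validate({"id": "1", "title": "Solr", "author": ["A", "B"]}, fields))  # []
print(validate({"title": ["x", "y"]}, fields))
```

The second call reports both a missing ‘id’ and two values for the single-valued ‘title’ field.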

Dynamic fields: Solr has strong data typing for fields, yet it provides flexibility through dynamic fields.

Using a <dynamicField> declaration, you can create field rules that tell Solr which data type to use whenever it is given a field name that is not explicitly defined but matches the prefix or suffix pattern of a dynamic field.

For example, the following dynamic field declaration tells Solr that whenever it sees a field name ending in ‘_i’ that is not an explicitly defined field, it should dynamically create an integer field:

<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
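The matching rule for dynamic fields can be sketched as follows. The Python below assumes Solr's documented behaviour: an explicitly defined field always wins, a pattern has a single * at its start or end, and when several patterns match, the longest one is used (ties go to the one that appears first in the schema). Apart from *_i/integer, the patterns and type names are hypothetical.

```python
def resolve_field_type(name, explicit, dynamic):
    """Return the type Solr would use for `name`, or None if nothing matches.

    explicit: {field name: type} from <field> declarations.
    dynamic:  [(pattern, type)] from <dynamicField> declarations, in schema order.
    """
    if name in explicit:  # explicitly defined fields always take precedence
        return explicit[name]
    best = None
    for pattern, ftype in dynamic:
        if pattern.startswith("*"):
            hit = name.endswith(pattern[1:])
        elif pattern.endswith("*"):
            hit = name.startswith(pattern[:-1])
        else:
            hit = False
        # Longer patterns win; '>' keeps the earlier pattern on a tie.
        if hit and (best is None or len(pattern) > len(best[0])):
            best = (pattern, ftype)
    return best[1] if best else None  # None: Solr would report an unknown field


explicit = {"id": "string"}
dynamic = [("*_i", "integer"), ("*_s", "string"), ("attr_*", "text_general")]
print(resolve_field_type("price_i", explicit, dynamic))      # integer
print(resolve_field_type("attr_size_i", explicit, dynamic))  # text_general
```

The second call matches both *_i and attr_*; the longer pattern wins, so a naming convention that mixes prefixes and suffixes is worth checking against this rule.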

Copy fields: The <copyField> declaration instructs Solr to duplicate any data it sees in the ‘source’ field of documents added to the index into the ‘dest’ (destination) field of the same document. Make sure the data types of the two fields are compatible. Fields are copied before any analysers are invoked. With copyField, Solr can find a document by value alone, without the query naming the field. For example, if there is a field for roll number, then without copyField we have to search with the query ‘rollno:10’.

To copy the roll number into the text field, the copy field declaration is:

<copyField source="rollno" dest="text"/>

Now we can search just by typing ‘10’, and Solr will give the same result, provided text is the default search field (the df parameter) for queries.
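Here is a toy model of the copyField example. The Python sketch below copies raw source values into the destination field (before any analysis, as described above) and then answers either ‘field:value’ queries or bare values against a default field. It does exact string matching only, an assumption standing in for Solr's analysers, and the default field name text mirrors the example.

```python
def apply_copy_fields(doc, copy_fields):
    """Return the document as it would be indexed after copyField rules.

    copy_fields: (source, dest) pairs, like <copyField source=... dest=.../>.
    Raw values are copied before analysis; copied values are not copied again.
    """
    def as_list(v):
        return list(v) if isinstance(v, list) else [v]

    indexed = {name: as_list(value) for name, value in doc.items()}
    for source, dest in copy_fields:
        if source in doc:
            indexed.setdefault(dest, []).extend(as_list(doc[source]))
    return indexed


def matches(indexed_doc, query, default_field="text"):
    """Exact-match a 'field:value' query, or a bare value against default_field."""
    field, sep, value = query.partition(":")
    if not sep:  # no field named: fall back to the default search field
        field, value = default_field, query
    return value in [str(v) for v in indexed_doc.get(field, [])]


doc = {"id": "1", "rollno": "10"}
with_copy = apply_copy_fields(doc, [("rollno", "text")])
without_copy = apply_copy_fields(doc, [])
print(matches(without_copy, "rollno:10"), matches(without_copy, "10"))  # True False
print(matches(with_copy, "10"))  # True
```

Copying from the original document rather than the partly built result keeps one copyField rule from feeding another, which mirrors Solr's behaviour of not chaining copies.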

You can also run queries from the Query tab of the admin console, shown in Figure 6.

Solr vs Elasticsearch (ES)

Solr and Elasticsearch are competing search servers. Both are built on top of Lucene, so their core search features are largely the same. In fact, Solr and ES are so similar that an ES plug-in allows you to use Solr clients and tools with ES. For most purposes, there is no compelling reason to choose Solr over ES, or vice versa. There are, however, some minor differences, listed in the table in Figure 7.

Deploying Solr with Tomcat

Please note that this deployment uses Solr 5.1. The prerequisites are listed below.

1. Install the Tomcat servlet container on the system, for example in C:\Solr with Tomcat\apache-tomcat-7.0.57 (we will refer to this directory as TOMCAT_HOME).

2. Download and extract the Solr package, for example to C:\Solr-5.1.0\Solr-5.1.0.

Now follow the steps given below.

1) Copy the solr directory from the extracted package (C:\Solr-5.1.0\Solr-5.1.0\server\solr) into the home directory of Tomcat: into TOMCAT_HOME\bin, or into TOMCAT_HOME if Tomcat is to be started as a service.

2) Copy the Solr WAR file (C:\Solr-5.1.0\Solr-5.1.0\server\webapps\solr.war) into the Tomcat webapps directory, TOMCAT_HOME\webapps. Tomcat will deploy it automatically.

3) Copy the five jar files in the extracted Solr package's server\lib\ext directory (C:\Solr-5.1.0\Solr-5.1.0\server\lib\ext) to the TOMCAT_HOME\lib directory:

• jul-to-slf4j-1.7.7.jar

• jcl-over-slf4j-1.7.7.jar

• slf4j-log4j12-1.7.7.jar

• slf4j-api-1.7.7.jar

• log4j-1.2.17.jar

4) Copy the log4j.properties file from the C:\Solr-5.1.0\Solr-5.1.0\server\resources directory to the TOMCAT_HOME\lib directory.

5) After solr.war has been extracted:

a) Create the core (if not already done).

b) Copy any further listener-specific jars that are needed to the C:\Solr with Tomcat\apache-tomcat-7.0.57\webapps\solr\WEB-INF\lib directory.

c) Configure the entry for the listener-related jars used by the scheduler in the C:\Solr with Tomcat\apache-tomcat-7.0.57\webapps\solr\WEB-INF\web.xml file.

d) Modify the server port in the scheduler's properties file (for example, C:\Solr with Tomcat\apache-tomcat-7.0.57\bin\solr\conf\dataimportScheduler.properties).

e) To run the DataImportHandler, copy the jars specific to the DataImportHandler, and those for the custom transformer, if any, to the folder C:\Solr with Tomcat\apache-tomcat-7.0.57\bin\solr\lib:

• solr-dataimporthandler-5.1.0.jar

• solr-dataimporthandler-extras-5.1.0.jar

• ojdbc14-10.2.0.5.jar

• SqltoSolrDateTransformer.jar

Then add the corresponding entries to solrconfig.xml:

<!-- for data import handler -->
<lib dir="../lib/" regex="solr-dataimporthandler-\d.*\.jar" />

<lib dir="../lib/" regex="ojdbc14-\d.*\.jar" />

<!-- For Custom Date Transformer -->

<lib dir="../lib/" regex="SqltoSolrDateTransformer.jar" />
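One way to double-check the <lib> entries is to test their regexes against the jar file names before starting Tomcat. The Python sketch below assumes that Solr requires a regex to match the whole file name, and writes the patterns with the hyphens that the jar names contain; the helper name is hypothetical.

```python
import re

# Regexes from the solrconfig.xml <lib> entries, written with the
# hyphens that the jar file names contain.
LIB_PATTERNS = [
    r"solr-dataimporthandler-\d.*\.jar",
    r"ojdbc14-\d.*\.jar",
    r"SqltoSolrDateTransformer.jar",
]


def jars_loaded(filenames, patterns=LIB_PATTERNS):
    """Return the names matched in full by at least one pattern (assumed Solr behaviour)."""
    compiled = [re.compile(p) for p in patterns]
    return [name for name in filenames if any(c.fullmatch(name) for c in compiled)]


jars = [
    "solr-dataimporthandler-5.1.0.jar",
    "solr-dataimporthandler-extras-5.1.0.jar",
    "ojdbc14-10.2.0.5.jar",
    "SqltoSolrDateTransformer.jar",
]
print(jars_loaded(jars))
```

Under that assumption, solr-dataimporthandler-extras-5.1.0.jar is not matched by the first pattern, because a letter rather than a digit follows the last hyphen; a broader pattern such as solr-dataimporthandler-.*\.jar would pick up both DataImportHandler jars.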

Start the Tomcat server with TOMCAT_HOME\bin\startup.bat and point your web browser to http://localhost:8080/solr (change the port if necessary). The Solr admin page will be displayed.
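Steps 1) to 4) of the Tomcat deployment are plain file copies, so they can be scripted. The Python sketch below is one way to do it under stated assumptions: the function name and arguments are hypothetical, it copies every jar in server\lib\ext rather than naming the five files, and it expects the layout described above (an extracted Solr 5.1 package and an existing TOMCAT_HOME with webapps and lib folders). Core creation and the scheduler configuration in step 5) are left manual.

```python
import shutil
from pathlib import Path


def deploy_to_tomcat(solr_pkg, tomcat_home, as_service=False):
    r"""Script steps 1) to 4): copy Solr 5.1 files into a Tomcat install.

    solr_pkg: extracted Solr package, e.g. C:\Solr-5.1.0\Solr-5.1.0
    tomcat_home: TOMCAT_HOME, e.g. C:\Solr with Tomcat\apache-tomcat-7.0.57
    as_service: True if Tomcat runs as a service (Solr home then goes
    into TOMCAT_HOME instead of TOMCAT_HOME\bin).
    """
    server = Path(solr_pkg) / "server"
    tomcat = Path(tomcat_home)

    # 1) Solr home directory (copytree fails if the target already exists).
    home_parent = tomcat if as_service else tomcat / "bin"
    shutil.copytree(server / "solr", home_parent / "solr")

    # 2) The WAR file; Tomcat deploys it automatically from webapps.
    shutil.copy2(server / "webapps" / "solr.war", tomcat / "webapps")

    # 3) Logging jars from server\lib\ext into TOMCAT_HOME\lib.
    for jar in sorted((server / "lib" / "ext").glob("*.jar")):
        shutil.copy2(jar, tomcat / "lib")

    # 4) log4j.properties into TOMCAT_HOME\lib as well.
    shutil.copy2(server / "resources" / "log4j.properties", tomcat / "lib")
```

Copying by glob instead of by name means the script keeps working if a later 5.x package ships different logging jar versions, at the cost of also copying any extra jar that happens to be in that folder.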

What next?

Congratulations! You now understand the basics of Solr. You have learned about its syntax and different kinds of fields, how it makes search easy, the differences between Elasticsearch and Solr, and how to deploy Solr with Tomcat. The next step is discovering how to connect Java (or any other language) with Solr, that is, how to integrate Solr with your application.

Figure 1: Admin homepage of Solr

Figure 2: schema.xml

Figure 3: Document import

Figure 4: Dynamic fields

Figure 5: Copy fields

Figure 6: Query

Figure 7: Differences between Solr and ES
