Using MongoDB to Improve the IT Performance of an Enterprise

This article targets developers and architects who are looking to adopt open source in their IT ecosystems. The authors describe an actual enterprise situation, in which they adopted MongoDB in their workflow to speed up processes.

By: Raj Thilak and Gautham D.N. Raj Thilak is a lead developer at Dell Technologies. Gautham D.N. is an IT manager at Dell Technologies.

In the last decade or so, the amount of data generated has grown exponentially, and the ways to store, manage and visualise data have shifted from the old legacy methods to new ones. There has been an explosion in the number and variety of open source databases. Many are designed to provide high scalability and fault tolerance while retaining core ACID database features. Each open source database has some special features and, hence, it is very important for a developer or any enterprise to choose with care and analyse each specific problem statement or use case independently. In this article, let us look at one of the open source databases that we evaluated and adopted in our enterprise ecosystem to suit our use cases.

MongoDB, as defined in its documentation, is an open source, cross-platform, document-oriented database that provides high performance, high availability and easy scalability.

MongoDB works with the concept of collections, which you can associate with a table in an RDBMS such as MySQL or Oracle. Each collection is made up of documents (stored in a JSON-like format), which are the core entity in MongoDB and can be compared to a logical row in an Oracle database.
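As a quick illustration, here is a minimal sketch using the pymongo driver; the database, collection and field names are invented examples for this article, not part of any real deployment:

from pymongo import MongoClient

# Connect to a local MongoDB instance (adjust the URI for your environment).
client = MongoClient("mongodb://localhost:27017")
db = client["orders"]        # a database, roughly analogous to a schema
events = db["events"]        # a collection, roughly analogous to a table

# Insert one document -- the analogue of inserting a single row.
events.insert_one({
    "orderId": "ORD-1001",
    "status": "CREATED",
    "source": "web",
})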

MongoDB has a flexible schema as compared to the normal Oracle DB. In the latter, we need to have a definite table with well-defined columns, and all the data needs to fit the table's row type. However, MongoDB lets you store data in the form of documents, in JSON format and in a non-relational way. Each document can have its own format and structure, and be independent of the others. The trade-off is the inability to perform joins on the data. One of the major shifts that we as developers or architects had to go through while adopting MongoDB was the mindset shift: letting go of normalised storage and of eliminating redundancy, in a world where we need to store all the possible data together in the form of documents, and handling the resulting problems of concurrency.
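To make the schema flexibility concrete, here is a small sketch (field names invented for illustration): both documents land in the same collection even though their structures differ, something a fixed relational row type would not allow.

from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["orders"]["events"]

# Two differently shaped documents in the same collection.
events.insert_many([
    {"orderId": "ORD-1002", "status": "SHIPPED", "carrier": "FedEx"},
    {"orderId": "ORD-1003", "status": "ERROR",
     "error": {"code": 500, "message": "inventory service timeout"}},
])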

The horizontal scalability factor is fulfilled by the 'sharding' concept, where the data is split across different machines into partitions called shards, which helps further scaling. The fault tolerance capabilities are enabled by replicating data on different machines or data centres, thus keeping the data available in case of server failures. Also, an automatic leader election process provides high availability across the cluster of servers.
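As an illustration of how a collection might be sharded, here is a hedged sketch using MongoDB's standard admin commands from Python. It assumes a running sharded cluster reached through a mongos router; the database, collection and key names are our own examples.

from pymongo import MongoClient

# Connect through a mongos router of an existing sharded cluster.
client = MongoClient("mongodb://mongos-host:27017")

# Enable sharding for the database, then shard the collection
# on a hashed key so writes spread evenly across the shards.
client.admin.command("enableSharding", "orders")
client.admin.command("shardCollection", "orders.events",
                     key={"orderId": "hashed"})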

Traditionally, databases have supported a single data model, like key-value pairs, graphs, relational, hierarchical, text search, etc; however, many of the databases coming out today can support more than one model. MongoDB is one such database with multi-model capabilities. Even though MongoDB provides geospatial and text search capabilities, it is not up to the mark of Solr or Elasticsearch, which make better search engines.
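For instance, a basic text search in MongoDB looks like the following sketch (collection and field names are invented); it works, but it lacks the relevance tuning and analysis features of a dedicated search engine.

from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["orders"]["events"]

# A text index on one field enables $text queries against it.
events.create_index([("message", "text")])
for doc in events.find({"$text": {"$search": "timeout"}}):
    print(doc["orderId"], doc.get("message"))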

An analysis of our use case

We currently work in the order management space, where the order data is communicated to more than 120 applications through over 270 integrations via our middleware.

One of the main components we have implemented in-house is our custom logger, a service that logs transaction events, enabling message tracking and error tracking for our system. Most of the messaging is asynchronous. The integrations are heterogeneous: we connect to Oracle and Microsoft SQL Server relational databases, IBM MQ, Service Bus, Web services and some file based integrations.

Our middleware processes generate a large number of events along the path an order travels within the IT system. These events usually contain order metadata; a few order attributes required for searching; a status indicating success, errors, warnings, etc; and, in some cases, the whole payload for debugging.

Our custom logger framework traditionally stored these events in plain text log files on each server's local file system, and a background Python job read these log files and shredded them into relational database tables. The logging is fast; however, tracking a message across multiple servers and getting a real-time view of the order is still not possible. Then there are problems around the scheduler and the background jobs, which need to be monitored, etc. In a cluster with both Prod and DR running in active mode on 16 physical servers, we have to run 16 scheduler jobs and then monitor them to ensure that they run all the time. We can increase the speed of data fetching using multiple threads or by scheduling at smaller intervals; however, managing them across multiple domains when we scale our clusters is a maintenance headache.
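The shape of that legacy background job was roughly the following: a simplified sketch with an invented log format, using SQLite only to keep the example self-contained, not the actual database or code from our system.

import sqlite3

conn = sqlite3.connect("events.db")
conn.execute("CREATE TABLE IF NOT EXISTS events"
             " (order_id TEXT, status TEXT, detail TEXT)")

# Read a local log file and shred each line into relational columns.
with open("/var/log/middleware/events.log") as f:
    for line in f:
        # Invented line format: orderId|status|detail
        order_id, status, detail = line.rstrip("\n").split("|", 2)
        conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                     (order_id, status, detail))
conn.commit()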

To get a real-time view, we rewrote our logging framework as a lightweight Web service, which could write directly to RDBMS database tables. This brought down the performance of the system: initially, when we were writing to a file on the local file system, the processing speed was around 90-100k messages per minute; with the new design of writing to a database table, it was only 4-5k messages per minute. This was a big trade-off in performance, which we couldn't afford.

We then rewrote the framework with Oracle AQs in between: the Web service wrote data into Oracle AQs, and a scheduler job on the database dequeued messages from the AQ and inserted the data into the tables. This improved the performance to 10k messages per minute, but there we hit a dead end with the Oracle database and the systems. So, to get a real-time view of the order without losing much of the performance, we started looking at the open source ecosystem, and we hit upon MongoDB.

It fitted our use case appropriately. Our need was a database that could take high-performance writes, with multiple processes logging events in parallel, while the query rate on this logging data was substantially lower. We quickly modelled the document based on our previous experience, and were able to swiftly roll out the custom logger with a MongoDB backend. The performance improved dramatically, to around 70k messages per minute.
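The write path of such a logger could look like the following sketch. The names are our invented examples, not the authors' actual code; relaxing the write concern, as shown here, is one common way to trade durability guarantees for logging throughput.

from datetime import datetime, timezone
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")

# w=0 sends fire-and-forget writes: maximum throughput,
# at the cost of not waiting for acknowledgement.
log = client["orders"].get_collection(
    "events", write_concern=WriteConcern(w=0))

def log_event(order_id, status, payload=None):
    # One call per middleware event; each event becomes one document.
    log.insert_one({
        "orderId": order_id,
        "status": status,
        "payload": payload,
        "timestamp": datetime.now(timezone.utc),
    })

log_event("ORD-1001", "PICKED", {"warehouse": "BLR-01"})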

This enabled us to have a near real-time view of the order across multiple processes and systems, on a need basis, without compromising on performance. It eliminated the need for multiple scheduler processes across a cluster of servers, and the need to manage each of them. Also, irrespective of how many processes or servers our host application scales to, our logger framework, hosted on separate infrastructure, is able to cater to all the needs in a service-oriented fashion.
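Tracing an order then reduces to a single indexed query; a sketch, again with our invented field names:

import pymongo
from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["orders"]["events"]
events.create_index("orderId")   # keep the tracking query fast

# All events for one order, in the order they happened.
for e in events.find({"orderId": "ORD-1001"}).sort(
        "timestamp", pymongo.ASCENDING):
    print(e["timestamp"], e["status"])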

Currently, we are learning through experience. Some of the challenges we faced while adopting MongoDB involve managing data growth and the need for a purge mechanism for the data. This is something that is not explicitly available, and needs to be planned and managed when we create the shards. Shard management needs to be improved to provide optimal usage of storage. Also, the replicas and their locations define how good our disaster recovery will be. We have been able to maintain the infrastructure without much hassle, and are looking at the opportunity to roll out this logging framework into other areas, like the product master or customer master integration space in our IT. This should be possible without much rework or changes, because of MongoDB's flexible JSON document model.
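One standard building block for time-based purging is MongoDB's TTL index, sketched below. The 30-day retention window is an invented example, and on a sharded collection the expiry still has to be planned together with the shard key and storage reclamation.

from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["orders"]["events"]

# Documents whose 'timestamp' is older than 30 days are
# removed automatically by MongoDB's background TTL monitor.
events.create_index("timestamp", expireAfterSeconds=30 * 24 * 3600)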
