MLDB: The Open Source Ma­chine Learn­ing Data­base for the Cloud and for Docker

This ar­ti­cle talks about MLDB in re­la­tion to the cloud and Docker. It de­tails the prom­i­nent tools for ML, de­scribes the fea­tures of MLDB, its in­stal­la­tion on dif­fer­ent plat­forms, and the var­i­ous al­go­rithms sup­ported by it.

OpenSource For You
By: Dr Gaurav Kumar

The author is the MD of Magma Research and Consultancy Pvt Ltd, Ambala. He delivers expert lectures and conducts workshops on the latest technologies and tools.

Machine learning and predictive analytics are key areas of research in multiple domains including bioinformatics, computational anatomy, natural language processing, speech recognition, etc. Machine learning is used to train software or hardware applications on specific models for rule mining, prediction and knowledge discovery. Data scientists and analysts apply different supervised or unsupervised approaches to raw datasets, aiming for both accuracy and performance.

Machine learning (ML) and artificial intelligence (AI) are very closely related, but the two terms are not interchangeable. AI is the broader concept of machines performing tasks in a way similar to human beings; such machines are commonly called ‘smart’.

Machine learning is a more recent application area of AI in which machines learn by themselves from dynamic inputs rather than from fixed rules. Implementations based on machine learning can be more accurate and better optimised for the task.

The prominent tools for machine learning and deep learning include MLDB, Keras, Edward, LIME, Apache SINGA and Shogun. Details of each can be found on their respective websites.

Machine Learning Database (MLDB) is a high performance database system developed specifically for machine learning, knowledge discovery and predictive analytics. MLDB is free and open source software (FOSS) and runs on assorted platforms. It exposes a RESTful API for storing data, exploring that data using Structured Query Language (SQL) and, finally, training machine learning models.
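The store, explore and train flow over REST can be sketched in Python as follows. This is only an illustration that builds the HTTP requests without sending them. The endpoint paths (/v1/datasets, /v1/query), the ‘sparse.mutable’ dataset type and the row format are recalled from MLDB's REST documentation and should be treated as assumptions to verify; the base URL, the dataset name ‘example’ and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: MLDB answers on this host and port (hypothetical value).
BASE_URL = "http://localhost:8080"


def create_dataset(name):
    """Build the request that would create an empty, writable dataset."""
    body = {"type": "sparse.mutable"}  # assumed dataset type
    return "PUT", f"{BASE_URL}/v1/datasets/{name}", json.dumps(body)


def add_row(name, row_name, values, timestamp=0):
    """Build the request that would record one row.

    Each cell is sent as [column, value, timestamp] (assumed format).
    """
    columns = [[col, val, timestamp] for col, val in values.items()]
    body = {"rowName": row_name, "columns": columns}
    return "POST", f"{BASE_URL}/v1/datasets/{name}/rows", json.dumps(body)


def commit(name):
    """Build the request that would commit the rows written so far."""
    return "POST", f"{BASE_URL}/v1/datasets/{name}/commit", None


def sql_query(sql):
    """Build the request that would run a SQL query over stored data."""
    params = urlencode({"q": sql, "format": "table"})
    return "GET", f"{BASE_URL}/v1/query?{params}", None


if __name__ == "__main__":
    steps = [
        create_dataset("example"),
        add_row("example", "row1", {"x": 1, "label": "a"}),
        commit("example"),
        sql_query("SELECT * FROM example LIMIT 5"),
    ]
    for method, url, body in steps:
        print(method, url, body or "")
```

Each tuple can be handed to any HTTP client. Keeping the request construction separate from the sending makes it easy to inspect exactly what would go over the wire before pointing it at a live instance.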

The fol­low­ing are the key fea­tures of MLDB that suit a range of ap­pli­ca­tions.

Speed: Training, modelling and discovery in MLDB are built for performance. In the benchmark shown in Figure 3, it trains faster than H2O, Scikit-Learn and Spark MLlib, which are prominent machine learning libraries.

Scalability: MLDB supports vertical scaling efficiently, so all the memory and all the cores of a machine can be used simultaneously without a loss of performance.

Free and open source: The community edition of MLDB is available on GitHub (…bai/mldb).

SQL support: SQL makes MLDB very user-friendly, and it comes with support for Big Data processing. MLDB can process, train and make predictions using database tables that have millions of columns, with concurrent processing and no compromise on integrity.

Machine learning: MLDB is developed for high performance machine learning applications and models. It supports deep learning through TensorFlow graphs, which strengthens it for knowledge discovery.

Ease of im­ple­men­ta­tion: There are in­stal­la­tion pack­ages for mul­ti­ple plat­forms and pro­gram­ming en­vi­ron­ments in­clud­ing Jupyter, Docker, JSON, Cloud, Hadoop, and many oth­ers.

Compatibility and integration: MLDB offers a high degree of compatibility with different application programming interfaces (APIs) and modules including JSON, REST and Python based wrappers.

Deployment: Models in MLDB can be deployed easily behind an HTTP endpoint, which provides a simple interface and fast deployment.

MongoDB and NoSQL support: A bridge between MongoDB and MLDB can be created so that MLDB SQL queries can be executed on MongoDB collections. This lets MLDB interact with NoSQL databases for unstructured and heterogeneous datasets.

Figure 3 depicts the performance of MLDB when compared to other libraries: XGBoost, H2O, Scikit-Learn and Spark MLlib. A 100-tree random forest is trained on 1 million rows on a single node using MLDB and each of the other libraries. From the graphical results, it is evident that MLDB takes less time, while its accuracy compares well with that of the other machine learning libraries.

Sup­port for al­go­rithms in MLDB

Procedures in MLDB are used for the training of machine learning models, and the trained models are applied using functions. Given below is the list of functions and procedures that can be used in MLDB with high performance.

Su­per­vised ma­chine learn­ing

Clas­si­fi­ca­tion in­clud­ing multi-la­bel clas­si­fi­ca­tion: clas­si­fier.train

• Lo­gis­tic re­gres­sions

• Gen­er­alised lin­ear mod­els

• Neu­ral net­works

• De­ci­sion trees

• Ran­dom forests

• Naive Bayes models using boosting and bagging

Support vector machines (SVM): svm.train

Clas­si­fiers cal­i­bra­tion: prob­a­bi­lizer.train

Deep learn­ing

Ten­sorFlow mod­els: ten­sorflow.graph


Clustering

K-Means models: kmeans.train

Di­men­sion­al­ity re­duc­tion, man­i­fold learn­ing and visu­al­i­sa­tion

Truncated Singular Value Decompositions (SVD): svd.train

t-distributed Stochastic Neighbour Embedding (t-SNE): tsne.train

Fea­ture en­gi­neer­ing

Sen­tiWordNet mod­els: im­port.sen­tiwordnet

Word2Vec: import.word2vec

Term-Frequency/Inverse-Document-Frequency (TF-IDF) models: tfidf.train

Count-based fea­tures: stat­sTable.train

Fea­ture Hash­ing/Vec­tor­ize fea­tures: fea­ture_hasher
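How one of the training procedures listed above might be configured can be sketched in Python. The procedure type names come from the list itself; everything else is an assumption to check against the MLDB documentation: the /v1/procedures endpoint, the ‘type’ and ‘params’ envelope, and the parameter names in the sample (trainingData, mode, modelFileUrl, functionName, runOnCreation), which are given as recalled. The dataset, file and function names are hypothetical. The sketch only builds the request and does not contact a server.

```python
import json

# Training procedure types exactly as listed in the article.
TRAINING_PROCEDURES = {
    "classifier.train", "svm.train", "probabilizer.train", "kmeans.train",
    "svd.train", "tsne.train", "tfidf.train", "statsTable.train",
}


def training_request(proc_id, proc_type, params,
                     base_url="http://localhost:8080"):
    """Build the request that would create a training procedure.

    Raises ValueError if proc_type is not one of the listed procedures.
    """
    if proc_type not in TRAINING_PROCEDURES:
        raise ValueError(f"unknown procedure type: {proc_type}")
    body = {"type": proc_type, "params": params}
    url = f"{base_url}/v1/procedures/{proc_id}"
    return "PUT", url, json.dumps(body, sort_keys=True)


# Hypothetical parameters for a binary classifier (names are assumptions).
classifier_params = {
    "trainingData": "SELECT {* EXCLUDING (label)} AS features, label "
                    "FROM example",
    "mode": "boolean",
    "modelFileUrl": "file://models/example.cls",
    "functionName": "example_scorer",
    "runOnCreation": True,
}

if __name__ == "__main__":
    method, url, body = training_request(
        "train_example", "classifier.train", classifier_params)
    print(method, url)
    print(body)
```

Validating the type against the known list catches a common slip: entries such as tensorflow.graph and feature_hasher appear in the same list but do not end in ‘.train’, so they would be rejected here rather than submitted as training procedures.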

In­stalling MLDB on dif­fer­ent plat­forms

MLDB provides a Web based interface for the easiest implementation and hands-on experience. After signing up (registration), a free session of MLDB can be used for 90 minutes through a Web based panel. There are many demos and a lot of documentation available, so a cloud based MLDB can be tried out without installation on the local system. Self-created datasets can also be uploaded to this hosted session.

Edi­tions for lo­cal in­stal­la­tion

MLDB is distributed in two editions, community and enterprise, and both are free. To run the MLDB enterprise edition, you need to enter a licence key to activate the software. First-time users can create a licence key by signing up at …cense_management and filling in the required details in the registration form.

Docker im­age

The standard MLDB distribution is the Docker image. While other distributions are packaged as virtual machines, the Docker image runs as a container. This method is used for Linux flavours or private cloud deployments.

After the installation of Docker, create a directory on the host that will be mapped into the MLDB container.

$ mkdir </my/system/path/myMLDBdata>

The following command launches the container. The mldbport parameter sets the host port that is mapped to port 80 of the container.

$ docker run --rm=true \
    -v </my/system/path/myMLDBdata>:/mldb_data \
    -e MLDB_IDS="`id`" \
    -p <IP-Address>:<mldbport>:80 \
    …est

For security, MLDB on a remote server should be reached through an SSH tunnel, established as follows:

$ ssh -f -o ExitOnForwardFailure=yes <user>@<remotehost> -L <localport>:<IP-Address>:<mldbport> -N

Once the message ‘MLDB Ready’ is displayed, MLDB can be opened and activated in a Web browser at the URL http://localhost:<localport>.
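To see how the placeholders in the Docker and SSH commands fit together, the two command lines can be assembled programmatically. The Python sketch below only builds and prints the commands; it runs nothing. The function names and all sample values (paths, ports, user, host and the 127.0.0.1 bind address) are hypothetical, and the image name is left as a placeholder to be supplied by the reader.

```python
import shlex


def docker_run_args(data_dir, bind_address, mldb_port, image, ids):
    """Assemble the docker run command described in the article.

    ids stands for the output of the `id` command on the host.
    """
    return [
        "docker", "run", "--rm=true",
        "-v", f"{data_dir}:/mldb_data",          # host dir -> container dir
        "-e", f"MLDB_IDS={ids}",
        "-p", f"{bind_address}:{mldb_port}:80",  # host port -> container 80
        image,
    ]


def ssh_tunnel_args(user, remote_host, local_port, bind_address, mldb_port):
    """Assemble the SSH tunnel command described in the article."""
    return [
        "ssh", "-f", "-o", "ExitOnForwardFailure=yes",
        f"{user}@{remote_host}",
        "-L", f"{local_port}:{bind_address}:{mldb_port}",
        "-N",
    ]


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration.
    docker_cmd = docker_run_args(
        "/my/system/path/myMLDBdata", "127.0.0.1", 8080,
        "<mldb-image>", "<output of id>")
    tunnel_cmd = ssh_tunnel_args(
        "user", "remotehost", 8888, "127.0.0.1", 8080)
    print(shlex.join(docker_cmd))
    print(shlex.join(tunnel_cmd))
```

The point to notice is that the same address and mldbport pair appears in both commands: the tunnel forwards the local port to exactly the address and port that Docker published on the remote machine.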

Vir­tual ap­pli­ance

Installing MLDB in a virtualised environment is very easy. A virtual appliance (OVA file) is available, which can be imported using VirtualBox or any other virtualisation software.

Af­ter down­load­ing Vir­tu­alBox from https://www. vir­tu­­loads, the OVA file of MLDB Ap­pli­ance at http://pub­ can be im­ported.

Simply double-click the OVA file, or select ‘Import Appliance’ in the File menu of VirtualBox and point it to the downloaded MLDB OVA distribution.

The de­fault user name to log in on OVA is ‘ubuntu’ and the pass­word for suc­cess­ful au­then­ti­ca­tion is ‘mldb’.

After these steps, the MLDB instance can be accessed at the URL http://localhost:8080/ in any Web browser.
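Before opening the browser, a short script can confirm that something is answering at that URL. The following Python sketch uses only the standard library. It assumes that a running MLDB instance replies to a plain GET on the root URL with HTTP status 200, which has not been verified here; the function name is hypothetical.

```python
import urllib.error
import urllib.request


def mldb_is_up(url, timeout=5, opener=urllib.request.urlopen):
    """Return True if an HTTP GET on url completes with status 200.

    Connection failures, timeouts and HTTP error statuses give False.
    The opener argument exists so the check can be exercised offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    # URL taken from the article; prints False if nothing is listening.
    print(mldb_is_up("http://localhost:8080/"))
```

The same check works for the Docker route by substituting http://localhost:<localport> with the port chosen for the tunnel.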

Cloud servers

MLDB can be installed and deployed on the cloud environment of Amazon Web Services (AWS). An Amazon Machine Image (AMI) is available for deployment on AWS, and can be easily selected from the AWS dashboard.

The steps needed for at­tach­ing and con­fig­ur­ing AMI in AWS are as fol­lows:

1. Cre­ate an Ama­zon Web Ser­vices (AWS) ac­count on http:// aws.ama­

2. Select the ‘Create Instance Wizard’ in the N. Virginia region on the AWS dashboard.

3. Se­lect Ama­zon Ma­chine Im­age (AMI).

4. From Com­mu­nity AMIs, search ‘Dat­a­cratic MLDB’.

5. Launch the lat­est AMI for MLDB.

6. Any in­stance can be se­lected from AWS depend­ing upon the us­age and load of the ap­pli­ca­tion.

7. Con­fig­ure AWS in­stance de­tails.

8. Set the storage parameters for the instance.

9. Name the re­mote ma­chine.

10. Open SSH port 22 for the machine.

11. Launch the in­stance.

12. Establish a secure tunnel.

13. Ac­ti­vate MLDB.

Fig­ure 1: The of­fi­cial por­tal of Ma­chine Learn­ing Data­base (MLDB)

Fig­ure 2: Fea­tures of MLDB

Fig­ure 3: Per­for­mance of MLDB com­pared with other ma­chine learn­ing li­braries
