OpenSource For You

MLDB: The Open Source Machine Learning Database for the Cloud and for Docker

This article looks at MLDB in relation to the cloud and Docker. It lists the prominent tools for ML, describes the features of MLDB and the algorithms it supports, and explains how to install it on different platforms.

- By: Dr Gaurav Kumar. The author is the MD of Magma Research and Consultancy Pvt Ltd, Ambala. He delivers expert lectures and conducts workshops on the latest technologies and tools. He can be contacted at kumargaurav.in@gmail.com. His personal website is a

Machine learning and predictive analytics are key areas of research in multiple domains, including bioinformatics, computational anatomy, natural language processing and speech recognition. Machine learning is used to train software or hardware applications on specific models for rule mining, prediction and knowledge discovery. Data scientists and analysts apply different supervised or unsupervised approaches to achieve accuracy and performance on raw datasets.

Machine learning (ML) and artificial intelligence (AI) are closely related, but the terms are not interchangeable. Artificial intelligence is the concept of machines performing tasks in the way human beings do, and such machines are called ‘smart’.

Machine learning is a recent, advanced application area of AI in which machines learn by themselves from dynamic inputs instead of following only explicitly programmed rules. As a result, machine learning based implementations can achieve high accuracy and a high degree of optimisation.

Prominent tools for machine learning and deep learning include MLDB, Keras, Edward, LIME, Apache SINGA and Shogun. Details of each can be found on their respective websites.

Machine Learning Database (MLDB) is a powerful, high-performance database system developed specifically for machine learning, knowledge discovery and predictive analytics. MLDB is FOSS and runs on a variety of platforms. Through its RESTful API, data is stored, explored using Structured Query Language (SQL) and, finally, used to train machine learning models.
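To make the REST-plus-SQL workflow concrete, here is a minimal Python sketch that only builds requests for a running MLDB instance: one URL that runs an SQL query and one request that creates and runs a training procedure. The `/v1/query` endpoint with its `q` and `format` parameters, `PUT /v1/procedures/<name>` and the `runOnCreation` parameter are recalled from the MLDB documentation and should be treated as assumptions; the host, port and the `iris`/`train_iris` names are hypothetical.

```python
import json
import urllib.parse
from typing import Tuple

# Assumed address of a local MLDB instance; adjust host and port to your setup.
BASE_URL = "http://localhost:8080/v1"


def query_url(sql: str, fmt: str = "table") -> str:
    """GET URL that runs an SQL query.

    Assumes MLDB's /v1/query endpoint with 'q' (SQL text) and 'format' parameters.
    """
    return f"{BASE_URL}/query?" + urllib.parse.urlencode({"q": sql, "format": fmt})


def procedure_request(name: str, proc_type: str, params: dict) -> Tuple[str, str, str]:
    """(method, URL, JSON body) that creates a procedure and runs it straight away.

    Assumes PUT /v1/procedures/<name> and a 'runOnCreation' parameter.
    """
    body = {"type": proc_type, "params": dict(params, runOnCreation=True)}
    return "PUT", f"{BASE_URL}/procedures/{name}", json.dumps(body)


if __name__ == "__main__":
    # 'iris' and 'train_iris' are hypothetical names used only for illustration.
    print(query_url("SELECT * FROM iris LIMIT 5"))
    print(procedure_request("train_iris", "kmeans.train",
                            {"trainingData": "SELECT * FROM iris", "numClusters": 3}))
```

The helpers only build requests; sending them with `urllib.request` or any HTTP client against a live instance is what actually stores, queries or trains.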

The following key features of MLDB suit a range of applications.

Speed: Training, modelling and discovery in MLDB are highly performance-aware. MLDB processes data considerably faster than prominent machine learning libraries such as H2O, scikit-learn or Spark MLlib.

Scalability: MLDB scales vertically with high efficiency, so all memory modules and CPU cores can be used simultaneously without delays or loss of performance.

Free and open source: The community edition of MLDB is freely available and distributed through its repository on GitHub (https://github.com/mldbai/mldb).

SQL support: SQL makes MLDB very user-friendly, and MLDB also supports Big Data processing. It can process data, train models and make predictions on tables with millions of columns, using concurrent processing without compromising integrity.

Machine learning: MLDB is built for high-performance machine learning applications and models. It supports deep learning through TensorFlow graphs, which strengthens its knowledge discovery capabilities.

Ease of implementation: Installation packages are available for multiple platforms and programming environments, including Jupyter, Docker, JSON, Cloud, Hadoop and many others.

Compatibility and integration: MLDB is highly compatible with different application programming interfaces (APIs) and modules, including JSON, REST and Python based wrappers.

Deployment: MLDB can be deployed easily behind an HTTP endpoint, which offers a simple interface and fast deployment.

MongoDB and NoSQL support: A bridge between MongoDB and MLDB can be created so that MLDB SQL queries can be executed on MongoDB collections. This lets MLDB interact with NoSQL databases that hold unstructured and heterogeneous datasets.
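As a sketch of the MongoDB bridge, the Python function below builds the body of a `PUT /v1/procedures/<name>` request that imports a MongoDB collection into an MLDB dataset, which can then be queried with ordinary MLDB SQL. The procedure type `mongodb.import` and its `connectionScheme`, `collection`, `outputDataset` and `runOnCreation` parameters are recalled from the MLDB documentation and are assumptions to verify; the database, collection and dataset names are hypothetical.

```python
import json


def mongodb_import_request(connection_scheme: str, collection: str,
                           output_dataset: str) -> dict:
    """Body for a procedure that copies a MongoDB collection into an MLDB dataset.

    Assumed (check the MLDB docs): type 'mongodb.import' and the
    'connectionScheme', 'collection', 'outputDataset' and 'runOnCreation' params.
    """
    return {
        "type": "mongodb.import",
        "params": {
            "connectionScheme": connection_scheme,  # mongodb://host:port/database
            "collection": collection,               # source collection
            "outputDataset": output_dataset,        # MLDB dataset to create
            "runOnCreation": True,                  # import immediately
        },
    }


if __name__ == "__main__":
    # Hypothetical database, collection and dataset names.
    body = mongodb_import_request("mongodb://localhost:27017/shop", "orders", "orders_ds")
    print(json.dumps(body, indent=2))
```

Once imported, the data would be explored with a query such as `SELECT * FROM orders_ds LIMIT 10`.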

Figure 3 compares the performance of MLDB with that of other libraries. A 100-tree random forest was trained on 1 million rows on a single node with MLDB and with each of the other libraries. The results show that MLDB takes less time, while its accuracy is comparable with that of XGBoost, H2O, scikit-learn and Spark MLlib.

Support for algorithms in MLDB

Procedures in MLDB are used to train machine learning models, and the trained models are then applied through functions. The following functions and procedures can be used in MLDB with high performance.

Supervised machine learning

Classification including multi-label classification: classifier.train

• Logistic regressions

• Generalised linear models

• Neural networks

• Decision trees

• Random forests

• Naive Bayes models

• Boosting and bagging

Support vector machines (SVM): svm.train

Classifier calibration: probabilizer.train
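The procedure/function split can be sketched for `classifier.train`: a procedure is created with an SQL training query, and the function it produces is then applied to new rows. This Python sketch only builds the request body and URL. The parameter names (`trainingData`, `algorithm`, `mode`, `modelFileUrl`, `functionName`), the `glz` algorithm name and the `/application` endpoint are recalled from the MLDB documentation and are assumptions to verify; the dataset, column and function names are hypothetical.

```python
import json
import urllib.parse


def classifier_train_config(dataset: str, label: str, model_path: str,
                            function_name: str, algorithm: str = "glz") -> dict:
    """Body for a classifier.train procedure (binary classification).

    Assumed parameter names: trainingData, algorithm, mode, modelFileUrl,
    functionName, runOnCreation. 'glz' is assumed to select a generalised
    linear model.
    """
    # Every column except the label becomes a feature.
    training_sql = (f"SELECT {{* EXCLUDING({label})}} AS features, "
                    f"{label} AS label FROM {dataset}")
    return {
        "type": "classifier.train",
        "params": {
            "trainingData": training_sql,
            "algorithm": algorithm,
            "mode": "boolean",
            "modelFileUrl": f"file://{model_path}",
            "functionName": function_name,
            "runOnCreation": True,
        },
    }


def apply_url(base_url: str, function_name: str, features: dict) -> str:
    """GET URL that applies the trained function to one row.

    Assumes a /v1/functions/<name>/application endpoint with an 'input' parameter.
    """
    payload = json.dumps({"features": features})
    return (f"{base_url}/v1/functions/{function_name}/application?"
            + urllib.parse.urlencode({"input": payload}))


if __name__ == "__main__":
    # 'emails', 'is_spam' and 'spam_clf' are hypothetical names.
    cfg = classifier_train_config("emails", "is_spam", "models/spam.cls", "spam_clf")
    print(json.dumps(cfg, indent=2))
    print(apply_url("http://localhost:8080", "spam_clf", {"num_links": 7}))
```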

Deep learning

TensorFlow models: tensorflow.graph

Clustering

K-Means models: kmeans.train
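To show what a k-means procedure computes, here is a plain-Python sketch of Lloyd's algorithm, the classic k-means method. It is a stand-alone illustration of the technique, not MLDB's implementation; initialising the centroids from the first k points is a simplification chosen for reproducibility (practical implementations usually use random or k-means++ seeding).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int,
           iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # deterministic initialisation
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    cents, labs = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)
    print(cents, labs)
```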

Dimensionality reduction, manifold learning and visualisation

Truncated Singular Value Decompositions (SVD): svd.train

t-distributed Stochastic Neighbour Embedding (t-SNE): tsne.train
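A common way to combine these two procedures is to compress the data with truncated SVD first and then run t-SNE on the reduced rows to get a 2-D picture. The Python sketch below builds the two procedure bodies; the parameter names (`trainingData`, `numSingularValues`, `rowOutputDataset`, `numOutputDimensions`, `perplexity`) are recalled from the MLDB documentation and are assumptions, and the dataset names are hypothetical.

```python
import json
from typing import List


def svd_tsne_pipeline(dataset: str, num_singular_values: int = 50,
                      perplexity: float = 30.0) -> List[dict]:
    """Two procedure bodies: truncated SVD, then 2-D t-SNE on the SVD output.

    Parameter names are assumed from the MLDB docs; verify before use.
    """
    svd_output = f"{dataset}_svd"  # hypothetical intermediate dataset
    return [
        {
            "type": "svd.train",
            "params": {
                "trainingData": f"SELECT * FROM {dataset}",
                "numSingularValues": num_singular_values,
                "rowOutputDataset": svd_output,
                "runOnCreation": True,
            },
        },
        {
            "type": "tsne.train",
            "params": {
                "trainingData": f"SELECT * FROM {svd_output}",  # reduced rows
                "numOutputDimensions": 2,                       # for plotting
                "perplexity": perplexity,
                "rowOutputDataset": f"{dataset}_tsne",
                "runOnCreation": True,
            },
        },
    ]


if __name__ == "__main__":
    print(json.dumps(svd_tsne_pipeline("docs"), indent=2))
```

Running SVD first keeps t-SNE, whose cost grows quickly with input dimensionality, working on a compact representation.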

Feature engineering

SentiWordNet models: import.sentiwordnet

Word2Vec: import.word2vec

Term-Frequency/Inverse-Document-Frequency (TF-IDF) models: tfidf.train

Count-based features: statsTable.train

Feature Hashing/Vectorize features: feature_hasher
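Feature hashing maps an unbounded vocabulary onto a fixed-length vector by hashing each token to an index. The plain-Python sketch below illustrates the technique itself; it is not MLDB's `feature_hasher` function, whose hash function and options may differ. MD5 is used only because, unlike Python's built-in `hash()`, it gives the same index on every run.

```python
import hashlib
from typing import Iterable, List


def hash_features(tokens: Iterable[str], num_bits: int = 4) -> List[int]:
    """Count tokens into a vector of length 2**num_bits (the hashing trick)."""
    size = 1 << num_bits
    vector = [0] * size
    for token in tokens:
        # Stable hash so the same token always lands in the same bucket.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % size
        vector[index] += 1
    return vector


if __name__ == "__main__":
    print(hash_features("the cat sat on the mat".split(), num_bits=3))
```

Distinct tokens can collide in the same bucket; a larger `num_bits` reduces collisions at the cost of longer vectors.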

Installing MLDB on different platforms

MLDB provides a Web based interface for the quickest hands-on experience. After signing up at https://mldb.ai/#signup, you can use a free 90-minute hosted session through a Web based panel. Many demos and extensive documentation are available, so a cloud based MLDB can be explored without installing anything on the local system. You can even upload your own datasets to the hosted session.

Editions for local installation

MLDB is distributed free of charge in two editions: community and enterprise. To run the enterprise edition, you need to enter a licence key to activate the software. First-time users can obtain a licence key by signing up at https://mldb.ai/#license_management and filling in the required details in the registration form.

Docker image

The classic MLDB distribution is the Docker image, which runs as a container; other distributions are available for virtual machines. This method suits Linux distributions and private cloud deployments.

After installing Docker, create a directory on the host that will be mapped into the MLDB container:

$ mkdir </my/system/path/myMLDBdata>

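The following command launches the container. Replace <mldbport> with the host port on which MLDB should listen; the -p option publishes it on the loopback address 127.0.0.1 only and maps it to port 80 inside the container.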

$ docker run --rm=true \
    -v </my/system/path/myMLDBdata>:/mldb_data \
    -e MLDB_IDS="`id`" \
    -p 127.0.0.1:<mldbport>:80 \
    quay.io/mldb/mldb:latest
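When MLDB is started from a script, the docker run command can be assembled as an argument list. This Python sketch mirrors the flags of the command above; the helper name and the example directory, port and id values are hypothetical.

```python
import shlex
from typing import List


def mldb_docker_command(data_dir: str, mldb_port: int, ids: str) -> List[str]:
    """Return the docker run arguments for MLDB as a list."""
    return [
        "docker", "run", "--rm=true",
        "-v", f"{data_dir}:/mldb_data",     # host directory mounted at /mldb_data
        "-e", f"MLDB_IDS={ids}",            # user/group ids, i.e. the output of `id`
        "-p", f"127.0.0.1:{mldb_port}:80",  # loopback host port -> container port 80
        "quay.io/mldb/mldb:latest",
    ]


if __name__ == "__main__":
    # Hypothetical values; in practice pass your directory and the output of `id`.
    cmd = mldb_docker_command("/home/user/mldb_data", 8080,
                              "uid=1000(user) gid=1000(user)")
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` starts the container without a shell, so the `MLDB_IDS` value needs no extra quoting.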

Because the port is published on 127.0.0.1 only, an MLDB instance on a remote server is reached securely through an SSH tunnel:

$ ssh -f -o ExitOnForwardFailure=yes <user>@<remotehost> -L <localport>:127.0.0.1:<mldbport> -N
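For scripted use, the tunnel command can also be assembled as an argument list. In `-L <localport>:127.0.0.1:<mldbport>`, connections to localport on your machine are carried over SSH and delivered to 127.0.0.1:mldbport on the remote host, which is where the container port was published. The helper name and the example user, host and ports are hypothetical.

```python
import shlex
from typing import List


def ssh_tunnel_command(user: str, remote_host: str,
                       local_port: int, mldb_port: int) -> List[str]:
    """Build the ssh command that forwards localhost:local_port to remote MLDB."""
    return [
        "ssh",
        "-f",                              # go to the background once connected
        "-o", "ExitOnForwardFailure=yes",  # fail if the forward cannot be set up
        f"{user}@{remote_host}",
        "-L", f"{local_port}:127.0.0.1:{mldb_port}",  # local -> remote loopback
        "-N",                              # run no remote command, only forward
    ]


if __name__ == "__main__":
    print(shlex.join(ssh_tunnel_command("ubuntu", "mldb.example.com", 8080, 8081)))
```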

Once the message ‘MLDB Ready’ appears, MLDB can be opened in a Web browser at http://localhost:<localport>.
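Instead of watching for the ‘MLDB Ready’ message, a script can poll the URL until the server responds. This minimal Python sketch assumes that MLDB answers with HTTP 200 at its root URL once it is ready; that is an assumption, so adjust the URL if your version behaves differently. The function name and timings are illustrative.

```python
import http.client
import time
import urllib.request


def wait_for_mldb(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until it returns HTTP 200; give up after timeout seconds.

    Assumes a ready MLDB instance answers 200 at this URL.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # refused, reset, timed out or error status: not ready yet
        time.sleep(interval)
    return False
```

For example, `wait_for_mldb("http://localhost:8080/")` returns `True` as soon as the server answers, or `False` after 60 seconds.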

Virtual appliance

Installing MLDB for virtualisation is very easy. A virtual appliance (OVA file) is available that can be imported into VirtualBox or any other virtualisation software.

After downloading VirtualBox from https://www.virtualbox.org/wiki/Downloads, import the MLDB Appliance OVA file from http://public.mldb.ai/mldb.ova.

Simply double-click the OVA file, or select ‘Import Appliance’ in the File menu of VirtualBox and point it to the downloaded MLDB OVA file.

The default user name for logging in to the appliance is ‘ubuntu’ and the password is ‘mldb’.

After these steps, the MLDB instance can be accessed at http://localhost:8080/ in any Web browser.

Cloud servers

MLDB can be installed and deployed in the Amazon Web Services (AWS) cloud. An Amazon Machine Image (AMI) of MLDB is available and can be launched easily from the AWS dashboard.

The steps for launching and configuring the AMI in AWS are as follows:

1. Create an Amazon Web Services (AWS) account at http://aws.amazon.com/

2. From the AWS dashboard, select the N. Virginia region and open the ‘Create Instance Wizard’.

3. Select Amazon Machine Image (AMI).

4. From Community AMIs, search for ‘Datacratic MLDB’.

5. Launch the latest AMI for MLDB.

6. Choose an instance type that matches the usage and load of the application.

7. Configure AWS instance details.

8. Add storage or adjust the memory parameters as required.

9. Name the remote machine.

10. Allow SSH access to the machine on port 22.

11. Launch the instance.

12. Establish an SSH tunnel to the instance.

13. Activate MLDB.

Figure 1: The official portal of Machine Learning Database (MLDB)

Figure 2: Features of MLDB

Figure 3: Performance of MLDB compared with other machine learning libraries
