OpenSource For You

The Best Open Source Databases for IoT Applications

The Internet of Things (IoT), by its very nature, requires certain features in the databases associated with it. This article presents a small selection of open source database management systems that are well suited to IoT use.


The term ‘Internet of Things’ is used to refer to: (i) the global network of smart objects interconnected by means of Internet technologies, (ii) the set of supporting technologies necessary to realise this, such as RFIDs, sensors and inter-machine communicating devices, and (iii) the ensemble of applications and services leveraging such technologies to open new business and marketing opportunities.

According to a report by Gartner, 8.4 billion interconnected devices will be in use in the world in 2017. The Internet of Things presents novel challenges, especially to database management systems, such as integrating large volumes of data in real time, processing events as they stream in, and securing the data. An example would be IoT based environment temperature sensors fitted in smart cities, which produce huge amounts of data on atmospheric temperature and humidity in just a few minutes.

In order to handle IoT data effectively, it is important to find the right sort of database. But choosing an efficient database for IoT applications can be challenging, as no two IoT environments are the same. Many factors have to be kept in mind while choosing a database for IoT applications. The most important of these are scalability, the ability to handle huge amounts of data at adequate speeds, a flexible schema, portability across varied analytical tools, security and cost.

An IoT database should be fault-tolerant and highly available. If any node in the database cluster goes down, the cluster should still be capable of accepting read and write requests. Distributed databases make multiple copies or replicas of data, and write the data across multiple servers. If any server storing the data fails, the other servers take over the task of storing data and responding to queries until the failed server is back up. IoT databases should be highly available because they can face very high volumes of writes. If a database server is down, or the write rate is too high for the distributed database to absorb in real time, data can be held in a messaging system until the database has processed the backlog or additional servers have been added to the main database cluster.
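The buffering idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FlakyStore` class standing in for a database node, the `BufferedWriter` class and the in-memory `deque` playing the role of a messaging system are all hypothetical names, and none of them belongs to any of the databases discussed in this article.

```python
from collections import deque


class FlakyStore:
    """Hypothetical stand-in for a database node that can be switched off."""

    def __init__(self):
        self.available = True
        self.rows = []

    def write(self, row):
        if not self.available:
            raise ConnectionError("store is down")
        self.rows.append(row)


class BufferedWriter:
    """Queues writes while the store is down and replays them, in order, later."""

    def __init__(self, store):
        self.store = store
        self.backlog = deque()  # plays the role of the messaging system

    def write(self, row):
        self.backlog.append(row)
        self.flush()

    def flush(self):
        # Drain the backlog oldest-first; stop at the first failure.
        while self.backlog:
            try:
                self.store.write(self.backlog[0])
            except ConnectionError:
                return False
            self.backlog.popleft()
        return True


store = FlakyStore()
writer = BufferedWriter(store)

writer.write({"sensor": "t1", "temp": 21.5})  # stored immediately
store.available = False
writer.write({"sensor": "t1", "temp": 21.7})  # held in the backlog
writer.write({"sensor": "t1", "temp": 21.9})  # held in the backlog
store.available = True
writer.flush()                                # backlog is replayed in order
```

In a real deployment the backlog would live in a durable message queue rather than in process memory, so that buffered readings survive a crash of the writer itself.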

The following are some of the top open source databases available for IoT based applications.

InfluxDB

InfluxDB is an open source distributed time series database developed by InfluxData. It is written in the Go programming language, and its early versions were based on LevelDB, a key-value database. In addition to a front-end, it provides an HTTP interface and client libraries for interacting with the database. The main advantage of InfluxDB is its capacity to aggregate values into time buckets on the fly, without any manual intervention.
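To make the idea of time buckets concrete, here is a minimal plain-Python sketch of the kind of aggregation InfluxDB performs on the server through its query language. It is a conceptual illustration, not InfluxDB code; the function name `mean_per_bucket` and the sample readings are assumptions made for this example.

```python
from collections import defaultdict


def mean_per_bucket(points, bucket_seconds):
    """Group (timestamp, value) pairs into fixed-width buckets and average each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Every timestamp maps to the start of the bucket that contains it.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical temperature readings: (Unix time in seconds, degrees Celsius)
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 23.0), (150, 25.0)]
per_minute = mean_per_bucket(readings, 60)
# per_minute == {0: 21.0, 60: 22.0, 120: 25.0}
```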

InfluxDB can be accessed by software like Grafana, a powerful front-end tool that provides visualisation features for time series data. InfluxDB has no external dependencies, and SQL-like queries are used to query a data structure comprising measurements, series and points. Each point consists of a set of key-value pairs, called the fieldset, and a timestamp. Values can be 64-bit integers, 64-bit floating point numbers, strings and Booleans. Points are indexed by their time and tagset. InfluxDB accepts data via HTTP, TCP and UDP.
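The structure of a point (measurement, tagset, fieldset and timestamp) maps directly onto InfluxDB's text-based line protocol. The sketch below builds one such line in plain Python. It is a simplified illustration: the helper names `format_field` and `to_line` and the sample values are assumptions made for this example, and the escaping of spaces and commas that the real protocol requires is left out.

```python
def format_field(value):
    """Render one field value using the line protocol's type conventions."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"        # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build a single line: measurement,tagset fieldset timestamp."""
    tagset = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    fieldset = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    head = f"{measurement},{tagset}" if tagset else measurement
    return f"{head} {fieldset} {timestamp_ns}"


line = to_line(
    "weather",
    {"city": "jalandhar", "sensor": "t1"},
    {"temperature": 21.5, "humidity": 64, "ok": True},
    1488326400000000000,
)
# line == 'weather,city=jalandhar,sensor=t1 temperature=21.5,humidity=64i,ok=true 1488326400000000000'
```

Tags are indexed and fields are not, so values that queries filter on (such as the sensor ID) usually belong in the tagset, while the measured values go in the fieldset.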

Features

Written entirely in the Go programming language, and compiles into a single binary with no external dependencies.

High performance, customised data store written especially for time series data. The TSM engine of InfluxDB allows efficient and high speed data storage and compression.

Plugin support for other data ingestion protocols like Graphite, collectd and OpenTSDB.

In-built Web front-end tool for database and user administration.

Capable of merging multiple series together.

Official website: https://www.influxdata.com/

Latest version: 1.1.1

CrateDB

CrateDB is an open source distributed SQL database management system developed by Crate.io Inc., which fully integrates a searchable document-oriented data store. Christian Lutz, CEO of Crate.io, said, “When we founded Crate.io we set out to reinvent SQL for the machine data era. Today, 75 per cent of our customers use CrateDB for managing machine and IoT because of its easy usage, performance and versatility.”

CrateDB makes machine data applications accessible to SQL developers; previously, these were only possible using NoSQL solutions. CrateDB combines SQL with the versatility of search and the easy scalability of containers. It provides a good alternative to analytic data store tools like Splunk. The CrateDB platform includes a distributed SQL query engine for faster joins, aggregations and ad-hoc queries; SQL with integrated search for data and query versatility; and a container architecture with automatic data sharding for simple scaling.
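A rough idea of what automatic sharding involves is given below. This is a generic sketch and not CrateDB's actual algorithm: the fixed shard count, the CRC32 hash, the node names and the `rebalance` helper are all assumptions made for illustration. The point it shows is that a row's shard depends only on its key, so adding a machine means moving a few whole shards rather than rehashing every row.

```python
import zlib

NUM_SHARDS = 6  # assumed to be fixed when the table is created


def shard_for(row_key):
    """Pick a shard from a stable hash of the row key."""
    return zlib.crc32(row_key.encode("utf-8")) % NUM_SHARDS


def rebalance(placement, nodes):
    """Move shards from the busiest node to the idlest until loads differ by at most one."""
    placement = dict(placement)
    while True:
        load = {n: [s for s, owner in placement.items() if owner == n] for n in nodes}
        busiest = max(nodes, key=lambda n: len(load[n]))
        idlest = min(nodes, key=lambda n: len(load[n]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            return placement
        placement[load[busiest][-1]] = idlest


# Two nodes hold three shards each; then a third, empty node joins the cluster.
before = {0: "node-a", 1: "node-b", 2: "node-a", 3: "node-b", 4: "node-a", 5: "node-b"}
after = rebalance(before, ["node-a", "node-b", "node-c"])
moved = [shard for shard in before if before[shard] != after[shard]]
# moved == [4, 5]: only two shards change owner, and no row changes shard
```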

The main language used by CrateDB is SQL, but it also makes use of the document-oriented approach of NoSQL style databases. It uses the SQL parser from Facebook's Presto for its query and prediction analysis. It includes an in-built administration interface. The Crate Shell CLI allows users to run interactive SQL queries.

Features

Highly scalable: The database is scaled out simply by adding new machines to the cluster; there is no need to redistribute the data manually, as CrateDB does this automatically.

Highly available: CrateDB keeps the database available if anything goes wrong, as it provides automated replication of data across the cluster; even hardware and software updates don’t interrupt normal data operations. CrateDB is also capable of self-healing faulty nodes.

Real-time data ingestion: CrateDB delivers millisecond query performance even while writes are taking place, and removes locking overheads.

Supports varied data: CrateDB supports both relational data and JSON documents. It also provides blob storage to store and retrieve videos, pictures and other unstructured files.

It supports geospatial queries and dynamic schemas, which makes CrateDB flexible and well suited to Agile based development and to back-end IoT data storage.

Official website: https://crate.io

Latest version: 1.0.4

Riak time series database

The Riak time series (TS) database from Basho is an open source, distributed NoSQL key-value database optimised for the Internet of Things (IoT). With this database, the user can associate a large number of data points with a specific point in time. It is based on a masterless architecture, in which every node in the cluster is capable of serving read and write requests; the distributed database automatically co-locates, replicates and distributes the data across the cluster to achieve high performance and availability.
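One common way to build a masterless cluster of this kind is a hash ring, in which each key is stored on the next few nodes found clockwise from the key's position. The sketch below is a generic illustration and should not be read as Riak TS's actual implementation: the `Ring` class, the node names, the sample key and the choice of three replicas are all assumptions made for this example.

```python
import bisect
import hashlib


def ring_position(text):
    """Map a string to a position on the ring with a stable hash."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.points = sorted((ring_position(n), n) for n in nodes)

    def owners(self, key):
        """Return the nodes that hold copies of this key, walking clockwise."""
        positions = [p for p, _ in self.points]
        start = bisect.bisect(positions, ring_position(key))
        count = min(self.replicas, len(self.points))
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
copies = ring.owners("sensor-42/2017-03-01T10:00")
# 'copies' lists three distinct nodes; any client can compute the same list
# without asking a master, which is what makes the design masterless.
```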

The Riak TS database is highly optimised for fast data access. It supports integration with Apache Spark, including Spark Streaming, DataFrames and Spark SQL.

Riak TS can be installed directly in a data centre or on a public cloud. Amazon Machine Images (AMIs) are also available for this database, so that users can try out Riak TS on AWS.

Figure 2: Riak TS database—Web interface

Features

Supports addition of new nodes to the existing cluster architecture without sharding; data is automatically and uniformly distributed across the database cluster.

Supports DDL (Data Definition Language) for table and field definitions, and supports storage of both structured and semi-structured data.

Supports multi-cluster replication, which lets systems administrators replicate data between the in-house data centre and a data centre in any other location in the world.

Supports SQL-like queries for easy and flexible access to data.

Supports application integration with APIs and client libraries in various languages like Java, Ruby, Python, Erlang, Go, Node.js and .NET.

Riak Mesos framework provides efficient cluster resource management and ‘push button’ scale-up/down for Riak nodes.

Supports full integration with Apache Spark for operational analysis of time series data.

Official website: http://basho.com/products/riak-ts/

Latest version: 1.3

MongoDB

MongoDB is a powerful, flexible, free and open source, document-oriented, scalable and general-purpose database. It scales out while retaining features such as secondary indexes, range queries, sorting, aggregations and geospatial indexes. It is classified as a NoSQL database as it uses JSON-like documents with optional schemas.

MongoDB adds dynamic padding to documents and pre-allocates data files to trade extra space usage for consistent performance. It makes efficient use of RAM as a cache and tries to choose the right indexes for queries automatically. MongoDB offers a rich query language that supports read and write operations (CRUD) as well as data aggregation, text search and geospatial queries.

Features

Supports generic secondary indexes for a variety of fast queries, and provides unique, compound, geospatial and full-text indexing features to users.

Supports ‘aggregation pipelines’ to build complex aggregations from simple pieces, which the database can then optimise.

Supports TTL (Time-To-Live) collections for data that should expire after a certain period of time.

Supports an easy-to-use protocol for storing large files and file metadata.

Supports JSON to store and transmit information. JSON, being a standard format, is a great advantage for both the Web and the database.

Supports Map-Reduce on the server side for information processing using JavaScript functions.

Supports the MongoDB Management Service (MMS) tool, which allows users to track databases and back up data.

Supports automatic load balancing, because data is placed in shards.

Official website: https://www.mongodb.com/

Latest version: 3.4
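The ‘aggregation pipelines’ feature listed above is easiest to understand from an example. A pipeline is simply an ordered list of stages, written here as Python data using MongoDB's `$match`, `$group` and `$sort` stage operators. The field names, the sample documents and the `run_locally` helper are assumptions made for illustration; the helper is a plain-Python equivalent that only shows what the three stages compute, and is not how MongoDB executes them.

```python
from collections import defaultdict

# A pipeline is an ordered list of stages; each stage feeds the next.
pipeline = [
    {"$match": {"type": "temperature"}},
    {"$group": {"_id": "$sensor", "avg_value": {"$avg": "$value"}}},
    {"$sort": {"_id": 1}},
]

# Hypothetical documents, as they might sit in a 'readings' collection.
docs = [
    {"sensor": "t1", "type": "temperature", "value": 20.0},
    {"sensor": "t1", "type": "temperature", "value": 22.0},
    {"sensor": "t2", "type": "temperature", "value": 30.0},
    {"sensor": "h1", "type": "humidity", "value": 64.0},
]


def run_locally(documents):
    """Plain-Python equivalent of the three stages above, for illustration only."""
    matched = [d for d in documents if d["type"] == "temperature"]  # $match
    groups = defaultdict(list)
    for d in matched:                                               # $group
        groups[d["sensor"]].append(d["value"])
    rows = [{"_id": k, "avg_value": sum(v) / len(v)} for k, v in groups.items()]
    return sorted(rows, key=lambda r: r["_id"])                     # $sort


result = run_locally(docs)
# result == [{'_id': 't1', 'avg_value': 21.0}, {'_id': 't2', 'avg_value': 30.0}]
```

With the PyMongo driver, the same `pipeline` list would be passed to `collection.aggregate(pipeline)`, and the work would be done by the database server rather than by the application.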

RethinkDB

RethinkDB is an open source, distributed database primarily used to store JSON documents; it can scale out to multiple machines. RethinkDB is a popular choice among developers, especially IoT developers, for feeding real-time data. It departs from the traditional database architecture by introducing a new access model that pushes updated query results to applications in real time. RethinkDB offers a flexible query language and monitoring APIs, and is easy to set up and learn.
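The difference between this push-based access model and ordinary polling can be shown with a toy example. The `LiveTable` class below is hypothetical and runs entirely in memory; it is not the RethinkDB driver. The `old_val` and `new_val` keys mirror the shape of the change documents that RethinkDB's changefeeds deliver.

```python
class LiveTable:
    """Toy table that pushes every change to its subscribers."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def changes(self, callback):
        """Register a callback that receives {'old_val': ..., 'new_val': ...}."""
        self.subscribers.append(callback)

    def upsert(self, key, value):
        old = self.rows.get(key)
        self.rows[key] = value
        # The table notifies the application; the application never polls.
        for notify in self.subscribers:
            notify({"old_val": old, "new_val": value})


received = []
table = LiveTable()
table.changes(received.append)

table.upsert("t1", {"temp": 21.5})
table.upsert("t1", {"temp": 22.0})
# received[1] == {'old_val': {'temp': 21.5}, 'new_val': {'temp': 22.0}}
```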

RethinkDB offers a number of advantages over MongoDB. These are:

An advanced query language that supports table joins, subqueries, and massively parallelised distributed computation.

An elegant and powerful operations and monitoring API that integrates with the query language, and makes scaling RethinkDB dramatically easier.

A simple and beautiful administration UI that lets you shard and replicate in a few clicks, and offers online documentation and query language suggestions.

Features

Fault tolerance: It supports automatic failover to a new server if the primary server fails.

Easy addition of nodes: Plug-and-play of nodes in real time, without even a second of downtime.

Asynchronous application programming interfaces: Supports asynchronous queries via EventMachine in Ruby and Tornado in Python.

Supports SSL for secured access to RethinkDB over the public Internet.

More functions: Supports various mathematical operators like floor, ceil and round.

Official website: https://rethinkdb.com/

Latest version: 2.3.5

SQLite

SQLite is an open source, embedded relational database, which is designed to provide an easy way for applications to manage data without the overhead of a separate database server. It is highly portable, easy to use, compact, efficient and reliable.

SQLite is ACID-compliant; it implements most of the SQL standard, and uses a dynamically and weakly typed SQL syntax. Unlike other databases, the SQLite engine is not a standalone process; it can be linked statically or dynamically into applications.
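Because the engine is a library rather than a server, a complete example fits in a few lines. The sketch below uses the `sqlite3` module from Python's standard library; the table name, column names and sample readings are assumptions made for illustration. It also shows the dynamic typing mentioned above: a text value is accepted in a column declared as REAL.

```python
import sqlite3

# ":memory:" keeps the database in RAM; a file path would persist it to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, temperature REAL)")

rows = [("t1", 1, 21.5), ("t1", 2, 22.5), ("t2", 1, 30.0)]
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Dynamic typing: a TEXT value is accepted in a column declared REAL.
with conn:
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", ("t2", 2, "n/a"))

averages = conn.execute(
    "SELECT sensor, AVG(temperature) FROM readings "
    "WHERE typeof(temperature) = 'real' GROUP BY sensor ORDER BY sensor"
).fetchall()
# averages == [('t1', 22.0), ('t2', 30.0)]
conn.close()
```

No server was started and nothing was configured, which is what makes SQLite attractive for gateways and other small IoT devices.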

Features

Doesn’t require a separate server process or system to operate, and can operate in a serverless environment.

Needs no system administration, and runs on low-configuration machines.

Self-contained and has no external dependencies.

Written in ANSI C, and provides an easy and simple API.

Cross-platform: Compatible with UNIX, Linux, Windows, Mac OS X, etc.

Transactions are fully ACID-compliant, allowing safe access from multiple processes.

Supports most of the query language features found in the SQL92 standard.

The code is extensively tested and verified, and is actively maintained.

Official website: https://www.sqlite.org

Latest version: 3.17.0

Apache Cassandra

Apache Cassandra is regarded as a highly scalable, distributed open source database for managing very large amounts of structured data across many commodity servers. Compared to other open source databases, Cassandra offers strong capabilities in terms of availability, linear scale performance, simplicity and easy distribution of data across multiple database servers.

Cassandra was developed at Facebook, primarily to power its Inbox search feature, and was made open source in 2008. It implements the ‘Dynamo-style replication model’ with no single point of failure, and adds a more powerful ‘column family’ data model.

Features

Massively scalable architectu­re: Cassandra has a masterless design, where all nodes are at the same level, which provides operationa­l simplicity and easy scale out. Masterless architectu­re: Data can be written and read on any node.

Linear scale performanc­e: As more nodes are added, the performanc­e of Cassandra increases.

Fault detection and recovery: Failed nodes can easily be restored and recovered.

Flexible and dynamic data model: Supports datatypes with fast writes and reads.

Data protection: Data is protected with commit log design and built-in security like backup and restore mechanisms. Tunable data consistenc­y: Support for strong data consistenc­y across distribute­d architectu­re.

Multidata centre replicatio­n: Cassandra provides features to replicate data across multiple data centres.

Data compressio­n: Cassandra can compress up to 80 per cent data without any overhead.

Cassandra query language: Cassandra provides a query language that is similar to SQL language. This makes it very easy for developers moving from a relational database to Cassandra, to use it.

Official website: http://cassandra.apache.org

Latest version: 3.10
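The ‘tunable data consistency’ feature of Dynamo-style systems such as Cassandra rests on a simple piece of arithmetic: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below works through that rule; the function names are assumptions made for illustration and are not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_acks, read_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


n = 3
strong = reads_see_latest_write(n, quorum(n), quorum(n))  # 2 + 2 > 3, so True
fast = reads_see_latest_write(n, 1, 1)                    # 1 + 1 > 3 fails, so False
```

Choosing quorum reads and writes gives strong consistency at the cost of latency, while single-replica reads and writes are faster but only eventually consistent. For a stream of sensor readings, where an occasional stale read is acceptable, the faster setting is often a reasonable trade-off.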


By: Prof. Anand Nayyar

The author is an assistant professor in the department of computer applications and IT at KCL Institute of Management and Technology, Jalandhar, Punjab. He loves to work and research on open source technologies, cloud computing, sensor networks, hacking and network security. He can be reached at anand_nayyar@yahoo.co.in. Watch his YouTube videos at Youtube.com/anandnayyar.

Figure 1: WebUI for InfluxDB

Figure 3: RethinkDB interface
