Open source in Big Data
Big Data is now a reality, and it has become important for business continuity. The volume, variety, velocity and veracity with which organisations accumulate data have driven phenomenal growth, not just in understanding Big Data but also in delivering better and faster analytics. Owing to this popularity, many open source projects have grown up around Big Data. Some of the most popular ones, each serving a different purpose, are listed below.
Druid: An open source distributed data store, originally developed to analyse online events for advertising markets and to work on streaming data.
Apache Spark: A newer data processing engine that runs analyses on large datasets faster than Hadoop MapReduce, largely by keeping working data in memory. It supports applications written in Java, Scala, Python and R.
Apache Flume: A service that collects and aggregates data, typically logs, from distributed sources and moves it into HDFS for storage.
Apache Hive: A tool that lets people use an SQL-like language (HiveQL) to analyse petabytes of data.
Taiga: Originally developed by LinkedIn, this tool is