OpenSource For You

Open source in Big Data


Big Data has become a reality, and it is now important for business continuity. The volume, variety, velocity and veracity with which organisations accumulate data have led to phenomenal growth, not just in the understanding of Big Data but also in better and faster analytics. Because of this popularity, many open source projects have evolved around Big Data. Some of the most popular ones, listed below, serve different purposes.

Druid: This is an open source distributed data store originally developed to analyse online events for ad markets and to work on data streams.

Apache Spark: This is a newer data processing engine designed to run analysis on large datasets faster. It supports applications written in Java, Scala, Python and R.

Apache Flume: This is a service that gathers data from distributed sources, which is then stored in HDFS.

Apache Hive: This tool allows people to use an SQL-like language to analyse petabytes of data.

Taiga: Originally developed by LinkedIn, this tool is
