MakeMyTrip Travels Forward in Time Using the Power of Open Source
Gurugram-based MakeMyTrip currently dominates the travel market with a whopping 47 per cent share. The company has not only been a regular consumer of open source technologies but has, of late, also become a contributor.
MakeMyTrip has deployed a large number of open source innovations. From Apache Hadoop, Storm and Spark to solutions such as Jenkins, OpenTSDB, Grafana and the ELK (Elasticsearch, Logstash and Kibana) stack, the company relies on many community-based deployments.
CTO Sanjay Mohan believes that there is a growing need to focus more on open source because a vast part of the Web is now being open sourced. “Open source solutions have been tried and tested for scale and security by the top-notch companies globally and proven to work well without any vendor lock-in period,” he says.
The dataShark project for the community
While adopting open source to scale existing offerings and solve emerging problems has long been common practice at MakeMyTrip, Mohan and his team sketched out a plan last October to give back to the community with a solution of their own. The result was dataShark, an advanced framework for security and network event analytics. “We felt that the security and open source community would greatly benefit from this framework and hence released it as an open source project,” says Vikram Mehta, senior manager of information security, MakeMyTrip.
The dataShark framework is targeted at security researchers, Big Data analysts and operations teams looking to ingest data from sources such as the file system, Syslog and Kafka in a secure and easy way. The framework also lets experts write custom map and machine learning algorithms that operate on the ingested data.
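To give a flavour of what such a custom use case looks like, here is a minimal sketch in plain Python: a map-style step that parses SSH failure lines out of Syslog, followed by a simple detection rule over the extracted events. The function names (`parse_syslog`, `detect_bruteforce`) and the threshold logic are purely illustrative, not part of the actual dataShark API.

```python
# Illustrative only: a toy security use case in the spirit of the
# custom Python callables described above. Not dataShark's real API.
import re
from collections import Counter

# Matches SSH authentication failures in a typical Syslog stream.
SYSLOG_RE = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

def parse_syslog(line):
    """Map step: extract (source_ip, user) from an SSH auth failure line."""
    m = SYSLOG_RE.search(line)
    return (m.group(2), m.group(1)) if m else None

def detect_bruteforce(events, threshold=3):
    """Detection step: flag source IPs with more than `threshold` failures."""
    counts = Counter(ip for ip, _ in events)
    return sorted(ip for ip, n in counts.items() if n > threshold)

lines = [
    "sshd[1]: Failed password for root from 10.0.0.5 port 22",
    "sshd[2]: Failed password for invalid user admin from 10.0.0.5 port 22",
    "sshd[3]: Failed password for root from 10.0.0.5 port 22",
    "sshd[4]: Failed password for root from 10.0.0.5 port 22",
    "sshd[5]: Failed password for root from 10.0.0.9 port 22",
    "sshd[6]: Accepted password for alice from 10.0.0.7",
]
events = [e for e in map(parse_syslog, lines) if e]
print(detect_bruteforce(events))  # ['10.0.0.5']
```

In the framework itself, the map and detection steps would run distributed on Spark rather than over an in-memory list, but the shape of the user-supplied code is the same.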
Built on Apache Spark, dataShark lets users write custom use cases in Python, which also powers the framework model. It offers two modes of operation: standalone executable and production. The standalone executable mode provides a one-shot analysis of static data, whereas the production mode provides a full-fledged production deployment with components such as event acquisition, event queuing, the core data engine and a persistence layer, and can ingest data from the file system or HDFS.
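The difference between the two modes can be sketched as follows, again in plain Python rather than on Spark: the standalone style reads a static file once and processes it in one shot, while the production style continually drains events from a queue in small batches. All names here (`run_standalone`, `run_production`, `batch_size`) are hypothetical illustrations, not the framework's real interfaces.

```python
# Hedged sketch of the two operating styles: one-shot vs. queue-driven.
# dataShark runs its real pipelines on Apache Spark; this is plain Python.
import queue

def run_standalone(path, use_case):
    """Standalone style: read a static file once, analyse it in one shot."""
    with open(path) as f:
        return use_case(f.read().splitlines())

def run_production(event_queue, use_case, batch_size=2):
    """Production style: drain queued events in batches until the queue is empty."""
    results, batch = [], []
    while True:
        try:
            batch.append(event_queue.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            results.append(use_case(batch))
            batch = []
    if batch:  # flush any partial final batch
        results.append(use_case(batch))
    return results

count_events = len  # trivial use case: count events per batch

q = queue.Queue()
for e in ["evt1", "evt2", "evt3"]:
    q.put(e)
print(run_production(q, count_events))  # [2, 1]
```

In a real deployment the queue would be fed by the event acquisition layer (e.g. from Kafka) and results would flow into the persistence layer, rather than being returned in memory.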