Open Source For You

AIOps: The Key Enabler for DevOps


The recent pandemic has promoted remote work in a big way, giving a huge boost to DevOps and AIOps. This rapid, wide-scale change is raising real concerns in AIOps, DevOps and IT service management, as organisations seek the best monitoring and incident response solutions for their now distributed enterprises. This article discusses, among other things, the top AIOps tools on GitHub.

Today’s digital era of hybrid infrastructure, with heterogeneous commercial off-the-shelf and bespoke (service-based and monolithic) apps running on multiple middleware and integration platforms built on different technology stacks, requires IT Ops teams to provide uninterrupted IT service to their end users. IT teams must therefore be proactive enough to detect and resolve issues before they affect users. These teams must deal with constant change while ensuring zero downtime. Seamless operation is the key to growing, competing and thriving.

IT teams have evolved along with the technology, from standalone systems and vertical scaling to distributed computing. The emergence of virtual computing led to a world of microservices and ephemeral workloads with containerised scaling. Today, organisations simply generate too much data for humans to monitor and understand manually or with legacy tools.

Digital transformation is much more than the digitisation of business processes. It brings all the benefits of digital technologies to business processes, making them automated, agile and fast with fewer steps and activities, and ensures that the information business decision makers need is available at a click. This adds the responsibility of ensuring that IT operations run proactively, identifying and resolving issues.

The key building blocks (scalability, availability, fault tolerance, security, cost-effectiveness and operational excellence) lead us towards IT operations with a DevOps and AIOps strategy.

AIOps is the application of artificial intelligence in IT operations

Artificial intelligence for IT operations (AIOps) combines artificial intelligence (AI) with analytics and machine learning (ML) to automate the identification and resolution of IT operations issues.

AIOps combines Big Data and machine learning to automate, enhance and partially replace a broad range of IT operations processes, including availability and performance monitoring, event correlation and analysis, anomaly detection and causality determination, as well as IT service management and automation.
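One of the simplest forms of the anomaly detection mentioned above is a rolling z-score: flag a metric sample that sits far from the mean of the recent window. The sketch below is illustrative only; the window size, threshold and latency data are assumptions, and production AIOps platforms use far more robust models.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of samples whose z-score against the preceding
    `window` samples exceeds `threshold`."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # Flat baseline: any change at all counts as anomalous.
                is_anomaly = value != mu
            else:
                is_anomaly = abs(value - mu) / sigma > threshold
            if is_anomaly:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical response-time samples (ms) with one spike.
latency_ms = [120, 118, 122, 119, 121, 120, 480, 121, 119, 120]
print(detect_anomalies(latency_ms))  # → [6]
```

Once the spike enters the window it inflates the baseline's standard deviation for the next few samples; robust variants use the median and median absolute deviation instead of the mean and standard deviation.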

As per Gartner, AIOps refers to ‘technology platforms that use machine learning (ML) and data science to solve IT operation problems’. Gartner predicts that the use of AIOps and digital experience monitoring tools to monitor applications and infrastructure will rise from 5 per cent in 2018 to 30 per cent in 2023.

AIOps helps IT operations and DevOps teams work smarter and faster through proactive algorithmic analysis of IT data, identifying digital service issues and resolving them quickly, before business operations and customers are impacted. With AIOps, Ops teams can make sense of the immense complexity and volume of data generated by modern IT environments, prevent outages, maintain uptime and attain continuous service assurance.

With IT at the heart of digital transformation efforts, AIOps lets organisations operate at the speed they need to.

Union of DevOps with AIOps

DevOps is the set of practices that combines development (Dev) and IT operations (Ops), uniting people, processes and technology to continually provide value to customers. DevOps helps teams become high-performing, building better products faster for greater customer satisfaction.

DevOps places ownership of services, including their support and success, with the developers who write the code. Small teams of DevOps engineers, with site reliability engineering (SRE) as their primary focus, provide insights to improve the reliability and scalability of services. The challenge is getting the right people across geographically dispersed teams to make sense of vast amounts of monitoring data and resolve the issues it reveals.

AIOps addresses these operational challenges and covers every aspect of an organisation’s service strategy. It frees people from routine operations work to focus on mission-critical tasks, empowering them to build improved services for better customer experiences.

AIOps in open source

Most open source AIOps projects use Python, as it is the leading programming language for machine learning. Depending on how much emphasis an organisation places on operational efficiency, various AIOps and open source tools can be combined and used on AIOps platforms.

Top 5 open source AIOps tools on GitHub (based on stars)

1. SeldonIO/Seldon-core (stars: 2.2k)

This is an open source platform to deploy an organisation’s machine learning models on Kubernetes at massive scale; it has over 2 million installs.

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models, Seldon Core converts your ML models (TensorFlow, PyTorch, H2O, etc) or language wrappers (Python, Java, etc) into production REST/gRPC microservices. Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out-of-the-box, including advanced metrics, request logging, explainers, outlier detectors, A/B tests, canaries, and more.
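To give a feel for the Python language wrapper mentioned above, here is a minimal model class in the shape that, as we understand it, Seldon Core v1's Python wrapper expects: a plain class whose `predict(X, features_names)` method is called for each request. The class name and its toy logic are hypothetical, packaging it into a container image is not shown, and the interface has evolved between Seldon versions, so check the current Seldon documentation before relying on it.

```python
class ThresholdModel:
    """Hypothetical model class in the shape Seldon Core v1's Python wrapper
    expects: it is instantiated once, then predict() is called per request."""

    def __init__(self):
        # A real model would load trained weights here (e.g. from a file
        # packaged into the container image); this toy uses a fixed cut-off.
        self.threshold = 2.5

    def predict(self, X, features_names=None):
        # X is a batch of feature rows; return one prediction row per input row.
        return [[1.0 if row[2] > self.threshold else 0.0] for row in X]


model = ThresholdModel()
print(model.predict([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]))  # → [[0.0], [1.0]]
```

Keeping the model a plain class with no framework imports is what lets the wrapper turn it into a REST/gRPC microservice without changes to the model code itself.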

2. Logpai/Loglizer (stars: 781)

Loglizer is a machine learning based log analysis toolkit for automated anomaly detection. Logs are essential in the development and maintenance process, as they allow developers and support engineers to monitor systems and track abnormal behaviours and errors. Loglizer implements a number of machine learning based log analysis techniques, with multiple supervised and unsupervised models, covering:

■ Log collection

■ Log parsing

■ Feature extraction

■ Anomaly detection
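The four stages above can be illustrated with a deliberately small sketch. It is not Loglizer's API: the regex-based parser, the HDFS-style `blk_` session IDs and the equality-only check are simplifying assumptions, loosely modelled on invariant mining (the idea that normal sessions satisfy fixed relationships between event counts). All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Mask variable fields so that lines from the same print statement share a template.
VARIABLE = re.compile(r"\b(?:blk_-?\d+|\d+(?:\.\d+)*)\b")
SESSION = re.compile(r"blk_-?\d+")


def parse(line):
    """Log parsing: return (session_id, template) for one raw log line."""
    match = SESSION.search(line)
    session = match.group(0) if match else None
    return session, VARIABLE.sub("<*>", line)


def count_vectors(lines):
    """Feature extraction: an event-count vector (a Counter) per session."""
    sessions = defaultdict(Counter)
    for line in lines:
        session, template = parse(line)
        if session is not None:
            sessions[session][template] += 1
    return sessions


def mine_invariants(normal_sessions):
    """Keep template pairs whose counts are equal in every normal session."""
    templates = sorted({t for c in normal_sessions.values() for t in c})
    return [
        (a, b) for a, b in combinations(templates, 2)
        if all(c[a] == c[b] for c in normal_sessions.values())
    ]


def detect(sessions, invariants):
    """Anomaly detection: flag sessions that violate any invariant."""
    return sorted(
        s for s, c in sessions.items()
        if any(c[a] != c[b] for a, b in invariants)
    )


train = [
    "Receiving block blk_1 src 10.0.0.1",
    "Received block blk_1 of size 512",
    "Receiving block blk_2 src 10.0.0.2",
    "Received block blk_2 of size 1024",
]
live = [
    "Receiving block blk_3 src 10.0.0.3",
    "Received block blk_3 of size 512",
    "Receiving block blk_4 src 10.0.0.4",  # transfer never completed
]
invariants = mine_invariants(count_vectors(train))
print(detect(count_vectors(live), invariants))  # → ['blk_4']
```

Real toolkits also handle templates never seen in training and learn general linear relationships between counts, not just pairwise equality.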

3. Whylabs/Whylogs (stars: 326)

This tool profiles and monitors the ML data pipeline end-to-end, and is available in Python and Java.

Whylogs is an open source statistical logging library that allows data science and ML teams to effortlessly profile ML/AI pipelines and applications, producing log files that can be used for monitoring, alerts, analytics and error analysis. Whylogs is an excellent solution for profiling production ML/AI pipelines that operate on TB-scale data and with enterprise SLAs.

Key features:

■ Data insight

■ Scalability

■ Lightweight

■ Unified data instrumentation

■ Observability
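The core idea behind a statistical logging library like this is to record compact summaries of the data rather than the data itself, and to be able to merge summaries built on different batches or machines. The sketch below illustrates only that idea in plain Python; it is not the whylogs API, and `ColumnProfile` and `profile()` are hypothetical names. Real whylogs profiles track much more (distributions, cardinality, types) using approximate data sketches.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Hypothetical minimal column profile: summary statistics only,
    so its size stays constant however many values it has seen."""
    count: int = 0
    nulls: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        if value is None:
            self.nulls += 1
            return
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        """Combine profiles built on different batches or machines."""
        return ColumnProfile(
            count=self.count + other.count,
            nulls=self.nulls + other.nulls,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")


def profile(values):
    p = ColumnProfile()
    for v in values:
        p.track(v)
    return p


merged = profile([1.0, 2.0, None]).merge(profile([3.0, 6.0]))
print(merged.count, merged.nulls, merged.mean, merged.minimum, merged.maximum)  # → 4 1 3.0 1.0 6.0
```

Mergeability is what makes this approach scale: each worker profiles its own shard, and only the small profiles travel over the network.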

4. Jixinpu/Aiopstools (stars: 224)

This is a fundamental Python package that provides AIOps capabilities. Features include:

■ Anomaly detection

■ Alarm convergence

■ Time series forecasting

■ Association analysis for alarms
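Alarm convergence means collapsing a storm of related alarms into a few incidents. Here is a minimal sketch, assuming that alarms from the same host arriving within a fixed time window belong to one incident. This is a simplification for illustration; aiopstools' own method may differ, and `Alarm` and `converge()` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    timestamp: int  # seconds since epoch
    host: str
    message: str


def converge(alarms, window=60):
    """Group alarms from the same host that arrive within `window` seconds
    of the previous alarm in the group; return one list per incident."""
    incidents = []
    open_incident = {}  # host -> incident still accepting alarms
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        current = open_incident.get(alarm.host)
        if current and alarm.timestamp - current[-1].timestamp <= window:
            current.append(alarm)
        else:
            current = [alarm]
            incidents.append(current)
            open_incident[alarm.host] = current
    return incidents


alarms = [
    Alarm(0, "db1", "CPU high"),
    Alarm(30, "db1", "Disk latency high"),
    Alarm(45, "web1", "HTTP 5xx rate high"),
    Alarm(200, "db1", "CPU high"),
]
print([len(incident) for incident in converge(alarms)])  # → [2, 1, 1]
```

Grouping by host and time is only the crudest form of convergence; association analysis across hosts and services, the next feature listed, is what links alarms with a shared root cause.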

5. AICoE/Log-anomaly-detector (stars: 168)

This tool uses machine learning to detect abnormal events in logs. Log anomaly detection (LAD) can connect to streaming sources and predict abnormal log lines, using unsupervised machine learning models to achieve this. Its core component, LAD-Core, contains the ML code that infers whether a log line is an anomaly; it uses Word2Vec (W2V) and a self-organising map (SOM) for unsupervised learning.

Prometheus and Grafana are used to monitor and visualise the health of the machine learning system, and can help track and prevent false positives in ML jobs.
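The vectorise-then-SOM approach described for LAD can be illustrated with a toy sketch. It is not LAD's code: it swaps Word2Vec for plain bag-of-words vectors (with one extra slot counting unseen tokens), uses a tiny one-dimensional map, and treats the distance from a log line to its best-matching unit as the anomaly score. The class name `LogSOM`, its parameters and the sample log lines are hypothetical.

```python
import math
import re

TOKEN = re.compile(r"[a-z]+")


class LogSOM:
    """Toy 1-D self-organising map over bag-of-words log vectors."""

    def __init__(self, lines, nodes=3, epochs=20, lr=0.5):
        self.vocab = sorted({t for line in lines for t in TOKEN.findall(line.lower())})
        self.index = {t: i for i, t in enumerate(self.vocab)}
        data = [self.vectorize(line) for line in lines]
        # Initialise nodes from training vectors (deterministic).
        self.nodes = [list(data[i % len(data)]) for i in range(nodes)]
        for epoch in range(epochs):
            frac = epoch / epochs
            rate = lr * (1 - frac)                      # learning rate decays
            radius = max(nodes / 2 * (1 - frac), 0.5)  # neighbourhood shrinks
            for x in data:
                bmu = self._bmu(x)
                for j, w in enumerate(self.nodes):
                    # Nodes near the BMU on the map move towards x too.
                    h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                    for k in range(len(w)):
                        w[k] += rate * h * (x[k] - w[k])

    def vectorize(self, line):
        """Unit-length token counts; the last slot counts unseen tokens."""
        vec = [0.0] * (len(self.vocab) + 1)
        for tok in TOKEN.findall(line.lower()):
            vec[self.index.get(tok, len(self.vocab))] += 1
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def _bmu(self, x):
        """Index of the best-matching unit (closest node) for vector x."""
        return min(range(len(self.nodes)), key=lambda j: math.dist(x, self.nodes[j]))

    def score(self, line):
        """Anomaly score: distance from the line to its best-matching unit."""
        x = self.vectorize(line)
        return min(math.dist(x, w) for w in self.nodes)


train = ["user alice logged in", "user bob logged in",
         "user carol logged in", "user dave logged in"]
detector = LogSOM(train, nodes=2)
threshold = max(detector.score(line) for line in train)
print(detector.score("kernel panic detected") > threshold)  # → True
```

Because every update moves a node only part of the way towards a training vector, the nodes never gain weight in the unseen-token slot, so a line made entirely of unseen tokens always lands more than distance 1 from every node. Real systems would tune the threshold on held-out data rather than use the maximum training score.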

Open source AIOps learning platforms

1. Tencent/Metis (stars: 1.1k)

Metis is a learnware platform in the field of AIOps. The current version of this open source learnware solves the anomaly detection problem of time series data from the perspective of machine learning.
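A common way to frame time series anomaly detection for operational metrics is to compare the current value with the same point one day and one week earlier, which absorbs daily and weekly seasonality. The sketch below does only that, with a fixed relative tolerance. It is not Metis's algorithm (which, per its description, applies machine learning), and the function names and defaults are assumptions.

```python
import math


def _relative_change(current, reference):
    """|current - reference| as a fraction of the reference value."""
    if reference == 0:
        return math.inf if current != 0 else 0.0
    return abs(current - reference) / abs(reference)


def seasonal_anomalies(series, lags=(1440, 10080), tolerance=0.5):
    """Indices whose value differs by more than `tolerance` (relative) from
    the value `lag` samples earlier, for every lag in `lags`.
    Defaults assume per-minute samples: 1440 = one day, 10080 = one week."""
    anomalies = []
    for i in range(max(lags), len(series)):
        if all(_relative_change(series[i], series[i - lag]) > tolerance for lag in lags):
            anomalies.append(i)
    return anomalies


# Toy series with a period of 3 samples; shorter lags keep the example small.
series = [10, 20, 30, 10, 20, 30, 10, 20, 30, 10, 90, 30]
print(seasonal_anomalies(series, lags=(3, 6)))  # → [10]
```

Requiring a deviation from both references means a one-off incident yesterday does not, on its own, make today's normal value look anomalous.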

2. Linjinjin123/Awesome-AIOps (stars: 930)

This platform gathers AIOps learning materials in one place.

3. Chenryn/Aiops-handbook (stars: 506)

This is a collection of slides, repositories and papers about AIOps.

4. Logpai/Awesome-log-analysis (stars: 287)

This platform offers a curated list of awesome publications and researchers on log analysis, anomaly detection, fault localisation and AIOps.

Open source contributions to AIOps

Prometheus: This is an open source monitoring solution. It is a graduated Cloud Native Computing Foundation (CNCF) project, widely used for monitoring in site reliability engineering (SRE). It works by pulling (scraping) numerical metrics from HTTP metrics endpoints exposed by the monitored services.
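In Prometheus' pull model, the monitored service exposes its current metric values as plain text over HTTP, and the Prometheus server scrapes that endpoint periodically. Here is a minimal sketch using only the Python standard library, assuming the conventional `/metrics` path and two hypothetical metrics. In practice, the official `prometheus_client` library generates this output for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metrics: name -> (type, help text, current value).
METRICS = {
    "http_requests_total": ("counter", "Total HTTP requests handled.", 1027),
    "process_open_fds": ("gauge", "Number of open file descriptors.", 12),
}


def render_metrics(metrics):
    """Render metrics in Prometheus' plain-text exposition format."""
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; Prometheus would scrape http://<host>:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; adding the host and port as a scrape target in the Prometheus configuration completes the loop.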

Grafana: This is an open source metric analytics and visualisation suite, popular among Prometheus users for visualising metrics.

Elastic Stack: This is a suite of open source products from Elastic designed to help users search, analyse and visualise data from any type of source, in any format, in real time. Built around the Elasticsearch search and analytics engine, it provides monitoring and logging solutions.
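As an example of searching logs in Elasticsearch, the following builds a query DSL body for recent error-level logs from one service. The field names (`@timestamp`, `service.name`, `log.level`) assume logs indexed with Elastic Common Schema fields mapped as keywords, and `recent_errors_query` is a hypothetical helper; the body would be sent to an index's `_search` endpoint with any Elasticsearch client.

```python
import json


def recent_errors_query(service, minutes=15, size=50):
    """Build a query DSL body: newest error logs for `service` in the last
    `minutes` minutes (field names assume an ECS-style mapping)."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filter context: exact matches only, no relevance scoring.
                "filter": [
                    {"term": {"service.name": service}},
                    {"term": {"log.level": "error"}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }


print(json.dumps(recent_errors_query("checkout"), indent=2))
```

Putting all conditions in `filter` rather than `must` fits this use case: the results are sorted by time, so relevance scores are not needed, and Elasticsearch can cache filter clauses.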

AI is key to helping DevOps teams scale the technology they create today and in the future. AIOps makes the management of IT operations simpler and speeds up the resolution of IT Ops problems by automating it.

It frees up staff to focus on innovating for a better customer experience, leading to higher profitability for the business.

Figure 2: AIOps platform
