PCQuest

Security In A Grey-Out — Ready For The Next Cloud Outage?

If 2019 taught us anything, it is this: cloud outages can happen to anyone, anywhere. Sometimes the focus is so much on bouncing back that we forget not to fracture a ligament as we pop up. Keep an extra torch or foot handy, like AI or SLAs.

- Pratima Harigunani

When anyone utters the words ‘cloud outage’, the first alarms to go off in one’s mind are ‘Holy Moly’, ‘downtime’, ‘customer complaints’ and ‘how long’, maybe not in that particular order.

So when major cloud outages happened this year, we found out how network congestion, domain-level controllers, Domain Name System (DNS) connectivity, faulty database scripts, maintenance misconfigurations and data automation software can play the devil. Thanks to these very outages, we also got smarter about Service Level Agreement (SLA) credits, the dependencies of modern computing architectures, the details of service-level downtime that cloud providers publish (especially Google’s GCP) and the usefulness of granular reporting (like that shown by AWS).

When that eerie silence strikes

Sure, downtime costs, and hurts! In fact, the average hourly cost of enterprise server downtime (2017-18, as per Statista) was over US$5 million for 14 per cent of companies, between US$2m and US$5m for seven per cent, and between US$1m and US$2m for 12 per cent. According to Alinean, outage risk has on average accounted for as much as US$2.5m a month.

A glance at an Uptime Institute report on publicly reported outages in 2018-19 shows that major failures are not just common despite the advancement of critical systems; they also carry heavy consequences, thanks to the increased reliance on IT systems in all aspects of life. The report hints at evidence that availability does not match marketing claims (service level promises).
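To see what a service level promise means in minutes and money, here is a rough sketch. The availability targets and the 30-day month are illustrative assumptions, not figures from any provider's SLA; the hourly cost is simply the low end of the US$1m to US$2m bracket cited above.

```python
# Illustrative only: targets and the 30-day month are assumptions,
# not terms taken from any real provider's SLA.
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per month that still meet the availability target."""
    return (1 - availability_pct / 100) * MINUTES_PER_MONTH


def downtime_cost(minutes: float, hourly_cost_usd: float) -> float:
    """Cost of an outage of the given length at a flat hourly rate."""
    return minutes / 60 * hourly_cost_usd


if __name__ == "__main__":
    hourly_cost = 1_000_000  # US$1m an hour, low end of one bracket cited above
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        cost = downtime_cost(minutes, hourly_cost)
        print(f"{target}% uptime allows {minutes:.1f} min/month, about US${cost:,.0f}")
```

Even a target that sounds near-perfect leaves room for minutes of downtime every month, which is why the gap between promised and actual availability matters.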

Interestingly, it notes that ‘major publicly recorded outages are now more likely to be caused by IT and network problems than a year or two years ago, when power problems were a bigger cause.’ Public cloud-based services accounted for a significant number of reported service outages. The report says that ‘Although the reliability/availability of their services is generally good, their scale and complexity mean that their outages are likely to have a highly visible, clear and well-recorded impact.’

So what about the not-so-loud dent of this downtime, the one that leaves (and exploits) cracks in security? Most users rush for the elevators when an outage hits them or while they pad up for anticipated downtime. Isn’t that against the golden rule of ‘taking the stairs’? Does security really take a back seat in the messy ride of an outage escape? If yes, who is sitting at the wheel?

Smoke and Noise – The Perfect Panic Attack

In cloud deployments, security is a shared responsibility between the cloud provider and the organisation, argues Maninder Bharadwaj, Partner, Deloitte India. “Most cloud platforms come with robust data security processes in their areas of responsibility. But during outages all this data stored in the cloud is at risk, not only of being unavailable for business operations but also of loss. In addition, there is a risk of operations getting disrupted, leading to loss of revenue and a poor customer experience.”

It is possible that the scenario plays out a little differently, or more or less alarmingly, depending on which type of cloud sounds the fire-drill alarm. “In private clouds, organisations can set specific controls as per company policy or organisational requirements. Backup and recovery measures can also be employed promptly. Healthcare organisations are good examples of how to use private cloud in this regard. Public cloud with a hybrid on-premise or private cloud will also be a good alternative to protect data during an outage.” This is how Bharadwaj sees it.

Vaidyanathan Iyer, Security Software Leader, IBM India/South Asia, differs a tad, saying that any outage is similar in impact whether the deployment is on-prem, public or hybrid, and that, with cloud deployments being resilient by design, reputed cloud service providers would be expected to provide resilience to any outage. “The key factor is that best practices around resiliency cannot be ignored; having said that, it’s not uncommon to have outages, either due to denial-of-service attacks or because of a software glitch.”

He adds that the same applies to data security during an outage: provided best practices are in place, control of the ‘security keys’ remains with the data owner during any outage, and as such outages should not pose any security risk.

Bharadwaj is quick to point to the importance of a back-up and recovery plan. “All critical data has to be backed up and confidential information has to be encrypted to protect against any loss.”

Data is, after all, what everything hinges upon. As Manav Sehgal, Head of Solutions Architecture, Amazon Internet Services Private Limited, explains, in a highly scalable architecture like AWS’s, databases can be orchestrated well between hot and cold availability zones so as to keep latency between two availability zones to a minimum. “These zones are as far apart as possible and yet in proximity, so that helps a lot in fault-tolerance and resilience.” He also cites the significance of points-of-presence with dark fibre, and the speed and heterogeneity of database migration (open source as well as commercial systems). “AWS migration service can work with minimal downtime and while it is running.”

Migration of data and the speed at which the lights pop back on seem to be big answers when one worries about an outage. New ideas can complement these answers a lot, if they are added to the outage tool-kit well. Like Artificial Intelligence (AI).

Faster-Smarter Emergency Doors

It is important to create clean data-sets that can be used to train AI and Machine Learning systems to execute on recovery plans and protect critical data, contends Bharadwaj, as he highlights the issues that can pull back the power of AI in blunting an outage’s cut. “While the technology is available, the processes to create the data and to train these systems have to be created and maintained, and that’s the biggest challenge. AI without training and test data would not help, and we find ourselves in this situation across various industries, especially when it comes to risk processes.”

“AI could help in suggesting the right configuration. For example, while deploying policies on mobile devices, AI is used to suggest the right policies, but the decision to apply the policies remains with the administrator,” adds Iyer, but he also argues for hygiene maintenance when it comes to bolstering a cloud of any stripe. “The primary security controls that companies should consider are around Identity and Data Security, which help in establishing digital trust around usage of the cloud services. However, even more basic than those is hygiene maintenance, that is, configuring the cloud services as per best practices as required by the business. It is the responsibility of the companies to ensure that the Access Control Lists are properly configured, identities are monitored, and data is identified and used in a responsible manner.”

Perhaps that’s why the weight and fine print of Service Level Agreements (SLAs) come into a new spotlight when we think of cloud investments in an outage-scarred era.

Bharadwaj agrees strongly. “Absolutely. It is very critical to deploy the right controls and to ensure SLAs are in place to protect the organisation during an outage. It is important to keep detailed records of past incidents, as this helps us refine the SLAs. While SLAs and controls are important, their application to the cloud environment has to be monitored and modified.”

But Iyer brings in a different perspective, shifting the way we pick throats to choke in a cloud-outage incident. “While cloud security needs to handle a landscape that is not necessarily in the customer’s control, it is also true that cloud providers offer a good number of security controls that have to be configured and managed by the organisations. This accountability cannot be transferred to the cloud providers, and hence the responsibility of customers becomes greater while adopting the cloud.”

He explains how SLAs around the availability, scalability and operationalisation of these controls can be mandated or demanded by customers but will not be helpful unless the best practices are in place and being followed. “For example, Identity Access Management of all users accessing the cloud, Distributed Denial of Service (DDoS) protection, firewalls and so on can be mandated, but if the cloud user does a wrong configuration, neither AI nor SLA can help.”
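Iyer's point about wrong configuration can be made concrete with a minimal sketch of a hygiene check. The rule format below (`AclRule`, with `*` standing for "anyone") and the function `find_open_rules` are hypothetical and provider-neutral, not any cloud vendor's API; a real check would read rules from the provider's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    """Hypothetical, provider-neutral view of one access rule."""
    resource: str
    principal: str   # "*" stands for "anyone" in this sketch
    permission: str  # e.g. "read" or "write"


def find_open_rules(rules: list[AclRule]) -> list[AclRule]:
    """Return the rules that grant access to everyone."""
    return [rule for rule in rules if rule.principal == "*"]


if __name__ == "__main__":
    rules = [
        AclRule("backups/", "ops-team", "write"),
        AclRule("customer-data/", "*", "read"),  # the kind of slip no SLA covers
    ]
    for rule in find_open_rules(rules):
        print(f"Review: {rule.resource} grants {rule.permission} to anyone")
```

A check like this only flags candidates for review; deciding whether a rule is acceptable stays with the administrator, much as Iyer describes for AI-suggested policies.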

Wait!! Don’t Jump!

Not from the twentieth floor, and definitely not to any conclusion. Cloud still makes a lot of sense, and good economics. But it’s equally important to discuss security while buying into those amazing opex numbers and cost-control fairylands.

As Bharadwaj underlines, cloud is a great platform. “But it is important to have the right controls in place. Often these controls exist as part of global controls and governance teams. Controls need to be modified and applied to the cloud. Additionally, implementation of security and controls is a shared responsibility of the cloud provider and the organisation. The assumption that the cloud provider will take care of all the security requirements and controls is a myth.”

Iyer strongly reminds us that adhering to best practices and ensuring basic hygiene is sacrosanct, whether for security on the cloud or on-prem; cloud adoption also introduces some additional controls that have to be implemented.

Cloud, no cloud, public cloud, private cloud, hybrid cloud: in any environment, there is no room to ignore the gravity of the resilience quotient.

As Bharadwaj stresses, it is important to have robust plans in place for data security and business continuity. “Without these, it is very difficult to work with emerging technologies like cloud. Cloud has a huge potential, but the key is to identify the risks, put in adequate controls and keep scanning the environment to make it an evolving, robust and resilient system.”

It will never be an all-white, all-fluffy and all-dreamy wonder. Clouds will always have some blackouts and a lot of grey spots. It’s all about how well we can smell the smoke in time. And how soon we can find our shoes.

The next time someone says the word ‘outage’, hope you are able to say ‘I am prepared’.
