Security In A Grey-Out — Ready For The Next Cloud Outage?
If 2019 taught us anything, it is this: cloud outages can happen to anyone, anywhere. Sometimes the focus is so much on bouncing back that we forget not to tear a ligament as we pop up. Keep an extra torch or foot handy, like AI or SLAs.
When anyone sputters the words ‘cloud outage’, the first alarms to hit one’s mind are ‘Holy Moly’, ‘downtime’, ‘customer complaints’ and ‘how long’ – maybe not in that particular order.
So when major cloud outages happened this year, we found out how network congestion, domain-level controllers, Domain Name System (DNS) connectivity, faulty database scripts, maintenance misconfigurations and data automation software can play the devil. Thanks to these very outages, we also got smarter about Service Level Agreement (SLA) credits, the dependencies of modern computing architectures, the details of service-level downtime from cloud providers (especially Google’s GCP) and the usefulness of granular-level reporting (like that shown by AWS).
When that eerie silence strikes
Sure, downtime costs – and hurts! In fact, the average hourly cost of enterprise server downtime (2017-18, as per Statista) was recorded at over US$5 million for 14 per cent of companies, between US$2m and US$5m for seven per cent, and between US$1m and US$2m for 12 per cent. According to Alinean, average outage risk has even accounted for as much as US$2.5m a month. A glance at an Uptime Institute report shows that, among publicly reported outages in 2018-19, major failures are not just common despite the advancement of critical systems, but also carry heavy consequences, thanks to the increased reliance on IT systems in all aspects of life. The report also points to evidence that availability does not match marketing claims (service-level promises).
Interestingly, it notes that ‘major publicly recorded outages are now more likely to be caused by IT and network problems than a year or two years ago, when power problems were a bigger cause.’ Public cloud-based services accounted for a significant number of the reported service outages. The report says that ‘Although the reliability/availability of their services is generally good, their scale and complexity mean that their outages are likely to have a highly visible, clear and well-recorded impact.’
So what about the not-so-loud dent of this downtime – the one which leaves (and exploits) cracks in security? Most users rush for the elevators when an outage hits them or while they pad up for anticipated downtime. Isn’t that against the golden rule of ‘taking the stairs’? Does security really take a back seat in the messy ride of an outage escape? If yes, who is sitting at the wheel?
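To put the gap between availability promises and real outages into numbers, here is a minimal sketch in Python of how an availability percentage translates into permitted downtime. The function name, the 30-day month and the sample percentages are illustrative assumptions; they are not figures from the Uptime Institute report or from any provider's SLA.

```python
def allowed_downtime_minutes(availability_pct: float,
                             period_hours: float = 30 * 24) -> float:
    """Return the minutes of downtime permitted in a period at a given availability.

    availability_pct: promised availability, e.g. 99.9 for "three nines".
    period_hours: length of the measurement period (assumed: a 30-day month).
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_hours * 60


if __name__ == "__main__":
    # Sample promise levels, chosen only for illustration.
    for pct in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per 30-day month")
```

At 99.9 per cent, the allowance works out to roughly 43 minutes a month, so a single outage lasting a few hours overshoots it several times over.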
Smoke and Noise – The Perfect Panic Attack
In cloud deployments, security is a shared responsibility between the cloud provider and the organisation, argues Maninder Bharadwaj, Partner, Deloitte India. “Most cloud platforms come with robust data security processes in their areas of responsibility. But during outages all this data stored in the cloud is at risk, not only of being unavailable for business operations but also of loss. In addition, there is a risk of operations getting disrupted, leading to loss of revenue and a poor customer experience.”
It is possible that the scenario plays out a little differently, or more or less alarmingly, depending on which type of cloud sets off the fire-drill alarm. “In private clouds, organisations can set specific controls as per company policy or organisational requirements. Also, backup and recovery measures can be employed promptly. Healthcare organisations are good examples of how to use private cloud in this regard. Public cloud with a hybrid on-premise or private cloud will also be a good alternative to protect data during an outage.” This is how Bharadwaj sees it.
Vaidyanathan Iyer, Security Software Leader, IBM India/South Asia, differs a tad, saying that any outage is similar in impact whether the deployment is on-prem, public or hybrid. With cloud deployments being resilient by design, reputed cloud service providers would be expected to provide resilience to any outage. “The key factor is that best practices around resiliency cannot be ignored; but having said that, it’s not uncommon to have outages, either due to denial-of-service attacks or a software glitch.”
He adds that the same applies to data security during an outage: provided best practices are in place, control of the ‘security keys’ stays with the data owner during any outage, and as such outages should not pose any security risk.
Bharadwaj swiftly points to the importance of a back-up and recovery plan. “All critical data has to be backed up and confidential information has to be encrypted to protect against any loss.”
Data is, after all, what everything hinges upon. As Manav Sehgal, Head of Solutions Architecture, Amazon Internet Services Private Limited, explains, in a highly scalable architecture like AWS’s, databases can be orchestrated well between hot and cold availability zones so as to keep latency between two availability zones to a minimum. “These zones are as far apart as possible and yet in proximity so that helps a lot in fault-tolerance and resilience.” He also cites the significance of points-of-presence with dark fibre, and the speed and heterogeneity of database migration (open source as well as commercial systems) here. “AWS migration service can work on minimal downtime and while it is running.”
Migration of data and the speed at which the lights pop back seem to be big answers when one worries about an outage. New ideas can complement these answers a lot, if they are added to the outage tool-kit well. Like Artificial Intelligence (AI).
Faster-Smarter Emergency Doors
It is important to create clean data-sets that can be used to train AI and Machine Learning systems to execute on recovery plans and protect critical data, contends Bharadwaj as he highlights the issues that can pull back the power of AI in blunting an outage’s cut. “While the technology is available, the processes to create the data and processes to train these systems have to be created and maintained and that’s the biggest challenge. AI without training and test data would not help and we find ourselves in this situation across various industries, especially when it comes to risk processes.”
“AI could help in suggesting the right configuration - e.g. while deploying policies on mobile devices AI is used to suggest the right policies but the decision to apply the policies remains with the administrator,” adds Iyer, but he also argues for hygiene maintenance when it comes to bolstering a cloud of any stripe. “The primary security controls that companies should consider are around Identity and Data Security that help in establishing a digital trust around usage of the cloud services. However, even more basic than those is hygiene maintenance, i.e. configuring the cloud services as per best practices as required by the business. It is the responsibility of the companies to ensure that the Access Control Lists are properly configured, identities are monitored, and data is identified and used in a responsible manner.”
Perhaps that’s why the weight and fine print of Service Level Agreements (SLAs) come into a new spotlight when we think of cloud investments in a post-outage-scarred era.
Bharadwaj agrees strongly. “Absolutely. It is very critical to put the right controls in place and to ensure SLAs are in place to protect the organisation during an outage. It is important to keep detailed records of past incidents, as this helps us refine the SLAs. While SLAs and controls are important, their application to the cloud environment has to be monitored and modified.”
But Iyer brings in a different perspective, shifting the way we pick throats to choke in a cloud-outage incident. “While cloud security needs to handle a landscape that is not necessarily in the customer’s control, it is also true that cloud providers offer a good number of security controls that have to be configured and managed by the organisations - this accountability cannot be transferred to the cloud providers and hence the responsibility of customers becomes greater while adopting the cloud.”
He explains how SLAs around the availability, scalability and operationalisation of these controls can be mandated or demanded by customers but will not be helpful unless the best practices are in place and being followed. “For example, Identity and Access Management of all users accessing the cloud, Distributed Denial of Service (DDoS) protection, firewalls etc. can be mandated, but if the cloud user does a wrong configuration neither AI nor SLA can help.”
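The hygiene point can be made concrete with a small sketch in Python. It scans a list of access rules and flags any that open a sensitive port to the whole internet, the kind of wrong configuration that no SLA can undo. The rule format, the field names and the list of sensitive ports are hypothetical, invented here for illustration; they do not correspond to any particular cloud provider's API.

```python
from dataclasses import dataclass

# Assumption: ports treated as sensitive (SSH, RDP, MySQL, PostgreSQL).
SENSITIVE_PORTS = {22, 3389, 3306, 5432}
# CIDR ranges that mean "anyone on the internet" (IPv4 and IPv6).
OPEN_TO_WORLD = {"0.0.0.0/0", "::/0"}


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical, simplified access-control rule."""
    name: str
    source_cidr: str
    port: int
    allow: bool


def risky_rules(rules):
    """Return the rules that allow traffic from anywhere to a sensitive port."""
    return [
        rule for rule in rules
        if rule.allow
        and rule.source_cidr in OPEN_TO_WORLD
        and rule.port in SENSITIVE_PORTS
    ]


if __name__ == "__main__":
    sample_rules = [
        AccessRule("web", "0.0.0.0/0", 443, True),         # public HTTPS, expected
        AccessRule("admin-ssh", "0.0.0.0/0", 22, True),     # SSH open to the world
        AccessRule("db-internal", "10.0.0.0/8", 5432, True),  # database, internal only
    ]
    for rule in risky_rules(sample_rules):
        print(f"Review rule '{rule.name}': port {rule.port} is open to {rule.source_cidr}")
```

A check like this only suggests what to review; as Iyer says of AI-suggested policies, the decision to change a rule remains with the administrator.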
Wait!! Don’t Jump!
Not from a twentieth-storey floor, and definitely not to any conclusion. Cloud still makes a lot of sense and economics. But it’s equally important to discuss security while buying those amazing opex numbers and cost-control fairylands.
As Bharadwaj underlines, Cloud is a great platform. “But it is important to have the right controls in place. Often these controls exist as part of global controls and governance teams. Controls need to be modified and applied to the cloud. Additionally, implementation of security and controls is a shared responsibility of the cloud provider and the organisation. The assumption that the cloud provider will take care of all the security requirements and controls is a myth.”
Iyer strongly reminds us that adhering to best practices and ensuring basic hygiene is sacrosanct, whether for security on the cloud or on-prem; cloud adoption also introduces some additional controls that have to be implemented.
Cloud, no Cloud, Public Cloud, Private Cloud, Hybrid Cloud – in any environment, there is no room to ignore the gravity of the resilience quotient.
As Bharadwaj stresses, it is important to have robust plans in place for data security and business continuity. “Without these, it is very difficult to work with emerging technologies like cloud. Cloud has a huge potential but the key is to identify the risks, put in adequate controls and keep scanning the environment to make it an evolving, robust and resilient system.”
It will never be an all-white, all-fluffy and all-dreamy wonder. Clouds will always have some blackouts and a lot of grey spots. It’s all about how well we can smell the smoke in time. And how soon we can find our shoes.
The next time someone says the word ‘outage’, we hope you are able to say ‘I am prepared’.