Cloud Risk was one of the top-level Enterprise Risks for the Company, and we were not alone in identifying it as a top-level risk. Here is a true story that brings this closer to home.
Alex: I cannot believe they would do that!
Sasha: Do what?
Alex: Create a link from Dev to Prod.
Sasha: Oh! Why would they do that?
Alex: Something about a licensing issue with the commercial middleware. We didn’t buy an additional license for the dev environment, so they had to test in Production.
Everyone smiled; no other expression seemed appropriate, because everyone knew what was coming. I led that response, stepping in for my lead, who was on vacation in Hawaii; he deserved the time off. We worked through the problem with all hands on deck, locked in a tiny war room, the glass covered with sheets of paper, and a whiteboard holding timelines drawn in different colors, threat intelligence on attacker tactics, and the facts known at that point. You would understand if you have seen one: no one left the room. People walked by, curious about what was happening, but they were not supposed to look.
A disclosure list was on the whiteboard. You don't want to be on one if you can avoid it. It feels special at first, but the novelty evaporates quickly if things go wrong, and they do. I was unfortunate; I was on every disclosure list. Incident Response teaches you to remain calm; you have to keep a level head. After the initial short exchange, there was no blame; the focus shifted to the task at hand.
An intruder had used a zero-day to exploit an unprotected development environment. They had a shell beaconing home, and they were now sitting on the Production systems with critical data, though the intruders didn’t know that yet. The following 16 hours were challenging, a chess game with a timer. We eventually evicted the intruders and secured the system without any data exposure. It is quite a story, but that is a different conversation. It was also one of the most rewarding 16 hours of my life.
The incident postmortem revealed that the development team had created a temporary link between the development and Production Cloud environments. Unfortunately, only a handful of people knew about it, and it wasn’t documented. This is not an isolated incident or practice; it happens across most organizations for one reason or another. The focus is on value generation, and the faster we go, the quicker the business generates value. We didn't know about the link, and what we don't know can hurt us; in this case, it almost did.
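To make the postmortem finding concrete, here is a minimal Python sketch of how an inventory that records resource relationships could surface a link like this one. Everything in it is hypothetical and for illustration only: the resource names, the Connection record, the "documented" flag, and the find_risky_links helper are assumptions, not a description of any particular tool or of the environment in the story.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    source: str       # resource ID where the connection originates
    target: str       # resource ID the connection reaches
    documented: bool  # True if an approved change record exists (assumed field)


def find_risky_links(environment_of, connections):
    """Return undocumented connections that cross environment boundaries.

    A connection touching a resource with no known environment is treated
    as crossing a boundary, since we cannot prove that it does not.
    """
    risky = []
    for conn in connections:
        src_env = environment_of.get(conn.source, "unknown")
        dst_env = environment_of.get(conn.target, "unknown")
        crosses = src_env != dst_env or "unknown" in (src_env, dst_env)
        if crosses and not conn.documented:
            risky.append((conn, src_env, dst_env))
    return risky


# Hypothetical inventory: which environment each resource belongs to.
environment_of = {
    "dev-app-1": "dev",
    "dev-db-1": "dev",
    "prod-middleware-1": "prod",
    "prod-db-1": "prod",
}

# Hypothetical connections discovered between resources.
connections = [
    Connection("dev-app-1", "dev-db-1", documented=False),
    Connection("dev-app-1", "prod-middleware-1", documented=False),
    Connection("prod-middleware-1", "prod-db-1", documented=True),
]

for conn, src_env, dst_env in find_risky_links(environment_of, connections):
    print(f"{conn.source} ({src_env}) -> {conn.target} ({dst_env}): "
          "undocumented cross-environment link")
```

In this toy inventory, only the dev-to-prod connection is reported. The check is trivial once the relationships are recorded; the hard part in practice is collecting them across every environment.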
So what is the problem? A lack of holistic Lifecycle Visibility, the dynamic nature of the Cloud, the speed of development, and the proclivity towards creating fast-moving, highly autonomous teams more often than not lead to this situation. Over time, organizations of any size lose track of what is in the Cloud and of the relationships and interdependencies between resources. The saying "no one knows what is in the Cloud" holds quite true. Cloud Risk grows, because we simply cannot protect what we do not know. The end result is that remediation is broken and Cloud Risk goes unmanaged.
Most cloud environments have multitudes of these configurations, created just in time to accomplish a critical task and often forgotten. Unfortunately, every such configuration adds to the Cloud Risk as a hidden, undocumented, unapproved access path. Silos between functional teams, tools, and capabilities further add to the confusion. As a result, misconfigurations are missed, and vulnerabilities are not prioritized across the lifecycle of a capability. Gartner's report notes that Cloud misconfigurations and application vulnerabilities are the two biggest sources of incidents. Everyone feels the pain, is overworked, and cannot seem to handle all the data generated by a plethora of tools. The tendency is to add more tools, which generate more data for the teams to consume, and every tool adds workload to already oversubscribed teams. With no end in sight, we return to add yet another tool that might solve the problem and find ourselves with a YAP, Yet Another tool Problem.
As we add more tools, remediation is prioritized along functional silos, and teams end up not working on the highest-priority risk. This adds more work, makes teams inefficient, and often adds to the overall Cloud Risk. See Fig 1.
We propose an alternative that starts with full visibility of the Cloud: the lifecycle view. Get the basics right, get the Visibility right. Data is extracted for all environments: dev, stage, and prod. All functional areas (design, development, and production), code, applications, and Cloud Infrastructure are visible. Every aspect needed to secure the Cloud is tied together through relationships and connections: network access, policy-based serverless and APIs, running applications, code, and infrastructure, all in one lifecycle view. All other data is an overlay on this basic visibility. Remediation is prioritized horizontally across the full Cloud portfolio, with the highest risk getting priority, so teams are always working on the highest Risk Areas. We can even see a future in which we might not want to add more tools. We can get this right.
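As a rough illustration of horizontal prioritization, the Python sketch below merges findings from several tools into one ranked list, using context from the visibility layer as an overlay. The tool names, resource names, context fields, and especially the scoring weights are assumptions made up for this example; a real risk model would need to be agreed with the business.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    tool: str      # hypothetical tool (silo) that reported the finding
    resource: str  # resource ID the finding applies to
    severity: int  # 1 (low) to 10 (critical), as reported by the tool


@dataclass
class ResourceContext:
    environment: str           # "dev", "stage", or "prod"
    reaches_production: bool   # can this resource reach Production?
    holds_critical_data: bool  # does it store or process critical data?


def risk_score(finding, context):
    """Toy scoring rule (an assumption): double the tool's severity for
    each aggravating fact known from the lifecycle view."""
    score = finding.severity
    if context.reaches_production:
        score *= 2
    if context.holds_critical_data:
        score *= 2
    return score


def prioritize(findings, contexts):
    """Rank findings from all tools in one list, highest risk first."""
    return sorted(
        findings,
        key=lambda f: risk_score(f, contexts[f.resource]),
        reverse=True,
    )


# Hypothetical context gathered from the visibility layer.
contexts = {
    "dev-app-1": ResourceContext("dev", reaches_production=True, holds_critical_data=False),
    "prod-db-1": ResourceContext("prod", reaches_production=True, holds_critical_data=True),
    "stage-web-1": ResourceContext("stage", reaches_production=False, holds_critical_data=False),
}

# Hypothetical findings, each reported by a different tool.
findings = [
    Finding("vuln-scanner", "stage-web-1", severity=9),
    Finding("code-scanner", "dev-app-1", severity=6),
    Finding("config-scanner", "prod-db-1", severity=5),
]

for f in prioritize(findings, contexts):
    print(f.resource, f.tool, risk_score(f, contexts[f.resource]))
```

Ranked by tool-reported severity alone, the stage finding would come first. With the context overlay, the Production database and the dev resource that can reach Production move ahead of it, which is the point of prioritizing across the portfolio instead of within each silo.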
With that foundation, we can start building a true Risk picture and handle Cloud Risk. The Lifecycle view is powerful because it starts breaking down the functional silos, so teams can work more efficiently and more quickly. We believe this has the potential to make Risk not just a word we throw around but one we use to make appropriate business decisions, because now we have the basics right and we know what is in our environment, not just in Production.
Ahsan Mir is CEO of Rapticore, a cybersecurity startup. Ahsan has extensive experience in security operations, incident management, and leadership. He enjoys reading, trail running, climbing, playing guitar, and feeding birds. He can be reached on LinkedIn.