Azure Flaw Highlights Need to Secure Kubernetes Cloud Clusters

August 22, 2024 | September 19, 2024 | Jeff Burt | cybersecurity, Google Mandiant, kubernetes, Microsoft Azure Cloud

A recently fixed vulnerability in Microsoft’s Azure Kubernetes Service put a spotlight on the difficulty – and importance – of hardening Kubernetes clusters in the cloud against cyberattacks, according to researchers with Google’s Mandiant security unit.

Mandiant in a report this week outlined a security flaw in Azure Kubernetes Service (AKS) that would have allowed hackers who exploited it to obtain credentials for a range of services used in Kubernetes clusters, enabling them to steal sensitive information and escalate privileges to run other malicious operations.

“An attacker with command execution in a pod running within an affected Azure Kubernetes Services cluster could download the configuration used to provision the cluster node, extract the transport layer security (TLS) bootstrap tokens, and perform a TLS bootstrap attack to read all secrets within the cluster,” Mandiant researchers Nick McClendon, Daniel McNamara, and Jacob Paullus wrote. “This attack did not require the pod to be running with hostNetwork set to true and does not require the pod to be running as root.”

They added that “attackers that exploited this issue could gain access to sensitive information, resulting in data theft, financial loss, reputation harm, and other impacts.”

An Undocumented Component

The vulnerability affected AKS clusters that use Azure CNI and Azure Network Policy in the network configuration settings. It stemmed from an undocumented component in Azure called Azure WireServer, which is used internally by the platform for a number of reasons, the researchers wrote. It’s a component that cybersecurity firm CyberCX wrote about in May 2023.

The Mandiant researchers wrote that they followed an attack technique laid out by CyberCX and were able to recover TLS bootstrap tokens for the cluster from WireServer, adding that “given access to the WireServer and HostGAPlugin endpoint, an attacker could retrieve and decrypt the settings provided to several extensions, including the ‘Custom Script Extension,’ a service used to provide a virtual machine its initial configuration.”

Mandiant disclosed the security flaw to Microsoft, which has since fixed it.

Sitaram Iyer, vice president of emerging technology and global architects at Venafi, said Mandiant’s discovery of the flaw further proves that protecting TLS certificates is becoming more important.

“These kinds of vulnerabilities benefit hackers who have increasingly been targeting developer pipelines and underlying cloud infrastructures,” Iyer said. “Especially alarming is that attackers are now using malware to scour Kubernetes clusters for machine identities, like TLS certificates.”

The Risks in Complex Cloud Environments

Iyer said organizations need discovery, observability, and automation capabilities to protect Kubernetes clusters “in complex multicloud and multi-cluster cloud-native environments.”

Such complex cloud environments can create unexpected security risks, according to Guy Rosenthal, vice president of product at DoControl, who added that the AKS flaw is more than a simple configuration error.

“It’s a sophisticated attack that exploits undocumented Azure components to gain elevated privileges within a Kubernetes cluster,” Rosenthal said. “What makes this particularly concerning is that an attacker doesn’t need root access or special network privileges to exploit it. They just need to compromise a single pod in the cluster. From there, they can potentially access sensitive information across the entire cluster, including credentials for various services.”

He added that “it’s like giving someone the keys to the kingdom just because they managed to sneak into the courtyard.”

Hardening Kubernetes

Pointing to the AKS flaw and the need to harden Kubernetes, the Mandiant researchers wrote that “enforcing authentication for internal services, applying granular network policies, and restricting unsafe workloads with pod security are now table stakes for preventing post-exploitation activity that can compromise an entire cluster. These security configurations that limit [the] attack surface help prevent against known and unknown attacks alike.”

They wrote that Kubernetes clusters often are deployed without addressing the possibility that an attacker may have code execution privileges within a pod. This can happen in several ways, such as through existing vulnerabilities in workloads, continuous integration build jobs, or a compromised developer account. In such instances, network policies are key to preventing malicious activity once the flaw has been exploited.

“Without network policies in place, you should assume a compromised pod can access any network resource any other pod on the cluster can access,” the researchers wrote. “This could include the local Redis cache for another pod, managed databases running in your cloud provider, or even your on-premises network. When these services require authentication and are configured correctly, this is a relatively low-risk configuration — a vulnerability in one of these services would be necessary for an attacker to exploit.”
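As a rough illustration of the kind of granular network policy the researchers describe, the Python sketch below builds a Kubernetes NetworkPolicy manifest that lets pods in one namespace reach the network in general while excluding two well-known platform addresses: the cloud instance metadata service at 169.254.169.254 and Azure WireServer at 168.63.129.16. The helper name, policy name, and namespace are hypothetical, and the policy itself is an assumption rather than something taken from the Mandiant report. Whether an ipBlock rule actually stops traffic to host-level endpoints depends on the network plugin in use, so it should be tested in a given cluster rather than treated as a fix.

```python
import json

# Well-known addresses: the cloud instance metadata service and Azure WireServer.
BLOCKED_CIDRS = ["169.254.169.254/32", "168.63.129.16/32"]


def build_egress_policy(namespace: str, name: str = "deny-platform-endpoints") -> dict:
    """Return a NetworkPolicy manifest as a dict (hypothetical helper, illustrative only)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty selector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            "ipBlock": {
                                # Allow all IPv4 destinations except the blocked ones.
                                "cidr": "0.0.0.0/0",
                                "except": BLOCKED_CIDRS,
                            }
                        }
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # JSON is accepted wherever Kubernetes accepts YAML manifests.
    print(json.dumps(build_egress_policy("default"), indent=2))
```

A stricter variant would deny all egress by default and allow only the specific destinations each workload needs, which is closer to the researchers’ point that a compromised pod should not be able to reach everything other pods can.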

Bootstrapping Trust

Kubernetes security experts have known for years that bootstrapping trust in Kubernetes nodes is difficult, they wrote. Kubelets running on Kubernetes nodes need a TLS certificate signed by the control plane certificate authority to safely run, but “in a large distributed system where nodes (or virtual machines [VMs]) are constantly created and destroyed, how should that certificate be provisioned onto the VM?”

A common approach across cloud providers is to use the metadata server to deliver a static token to each provisioned VM. Presenting that token proves the VM should be part of the cluster, and the control plane issues it a kubelet certificate. However, such metadata services are reachable over the network, so an attacker with network access can steal the token. With these tokens, the bad actor can create a kubelet certificate for their own system and use the credentials to attack the control plane, steal secrets, and disrupt workloads scheduled on their malicious node.

“While protecting these tokens by denying applications access to the metadata server can help, the managed Kubernetes industry has evolved beyond simple token provisioning as a means for identifying VMs for critical security decisions,” the researchers wrote.

They pointed to Google Kubernetes Engine (GKE) as an example, noting that it uses a trust bootstrap process that is backed by a cryptographically verifiable virtual trusted platform module (vTPM) that is part of shielded nodes, which have been enabled by default for newly created GKE clusters since 2021.

“GKE shielded nodes remove the risk of bootstrap token theft instead of concealing it,” the researchers wrote. “Instead of relying on possession of a static token to authenticate and authorize a request for a new kubelet certificate, the VM requests an attestation from the VM’s vTPM, which is then verified by the control plane before issuing the kubelet certificate.”
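The difference between the two approaches can be shown with a toy model. The Python sketch below is not GKE’s or AKS’s actual protocol, and every class and method name in it is hypothetical. Real vTPM attestation relies on asymmetric keys and a hardware-backed chain of trust; here an HMAC key held inside an object stands in for that, purely so the example runs with the standard library. It shows that a static token authorizes anyone who holds a copy, while a fresh challenge answered by a per-VM key cannot be satisfied by a machine that lacks that key.

```python
import hashlib
import hmac
import secrets


class ToyVTPM:
    """Stand-in for a vTPM: answers challenges with a key it never hands out."""

    def __init__(self) -> None:
        self._secret = secrets.token_bytes(32)

    def attest(self, nonce: bytes) -> bytes:
        # Sign the control plane's fresh challenge.
        return hmac.new(self._secret, nonce, hashlib.sha256).digest()

    def verifier(self):
        # Stand-in for the public verification material a cloud platform
        # would hold for a VM it created (an assumption of this toy model).
        secret = self._secret

        def check(nonce: bytes, quote: bytes) -> bool:
            expected = hmac.new(secret, nonce, hashlib.sha256).digest()
            return hmac.compare_digest(expected, quote)

        return check


class ControlPlane:
    """Toy control plane that decides whether to issue a kubelet certificate."""

    def __init__(self, bootstrap_token: str) -> None:
        self.bootstrap_token = bootstrap_token
        self.trusted_vms = {}  # vm_id -> verifier function

    def issue_cert_with_token(self, token: str) -> bool:
        # Possession of the static token is the only check.
        return hmac.compare_digest(token, self.bootstrap_token)

    def register_vm(self, vm_id: str, verifier) -> None:
        self.trusted_vms[vm_id] = verifier

    def issue_cert_with_attestation(self, vm_id: str, attest) -> bool:
        check = self.trusted_vms.get(vm_id)
        if check is None:
            return False
        nonce = secrets.token_bytes(16)  # fresh per request, so replays fail
        return check(nonce, attest(nonce))


if __name__ == "__main__":
    token = secrets.token_hex(16)
    cp = ControlPlane(bootstrap_token=token)

    # Static token: a stolen copy works as well as the original.
    stolen = token
    print("attacker with stolen token:", cp.issue_cert_with_token(stolen))

    # Attestation: only the registered VM can answer the challenge.
    node = ToyVTPM()
    cp.register_vm("node-1", node.verifier())
    attacker = ToyVTPM()
    print("legitimate node:", cp.issue_cert_with_attestation("node-1", node.attest))
    print("attacker claiming node-1:", cp.issue_cert_with_attestation("node-1", attacker.attest))
```

In this model the attacker’s machine fails the attestation check because it does not hold the key bound to the registered VM, which is the property the researchers credit to shielded nodes.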