Overcoming Kubernetes Infrastructure Challenges

October 19, 2020 | Keith Basil | edge computing, heterogeneous architecture, Kubernetes, Kubernetes clusters

The widespread adoption of the Kubernetes standard for container orchestration has redefined how organizations manage computing deployments at the edge. In turn, more innovative use cases have emerged for lightweight Kubernetes distributions like K3s, designed for small footprint workloads.

These Kubernetes deployments range from 500 to 600 single-node clusters at the low end to 15,000 to 20,000 clusters at the high end, with an average of about 1,700 clusters. As an example, an oil and gas support services company, and Rancher Labs customer, intends to support a mobile fleet of more than 600 vans that act as mini data centers. The vans travel to remote oil wells and use Kubernetes to conduct sensor analysis, from measuring drill efficiency to flagging potential gas hot spots.

Another organization has plans to use Kubernetes across its 100 factories. Single-node clusters each manage purpose-built, programmable applications that run robotics and machine components in production lines.

Single-Node Kubernetes Clusters, Multiple Issues

But the proliferation of single-node clusters creates three primary infrastructure management challenges: security, heterogeneous architectures and limited connectivity. To manage clusters on a massive scale, organizations need to overcome these challenges.

Securing the Edge

Security is a fundamental concern as enterprises push compute and storage outside the data center into hostile environments where there are no physical security barriers.

For example, thousands of people might visit a retail environment every day. There is nothing stopping someone from reaching over the counter, unplugging a small footprint device and walking away with it.

Security starts at day zero when the machine is built using a bill of materials, shipped to a location and installed. It must remain secure in transit and not be tampered with before being switched on. Once activated, the machine must verify that no rogue software has been installed. Only then can it securely register with its management system.

The day-two problem centers on the continuous secure management of these devices. Imagine the logistics if an organization has 1,700 of these clusters deployed across a continent, many in remote locations with limited or intermittent connectivity. It is simply not practical to send people to manage and verify the security of each device.

There are several open source projects that help in this regard.

The Intel Secure Device Onboard (SDO) solution provides a similar plug-and-play experience to that of connecting a USB device to a laptop. With SDO, an edge device can be securely and automatically onboarded in a zero-touch manner into the cloud environment in less than a minute.

Another open source project is Keylime, a remote attestation system originally developed at MIT Lincoln Laboratory and built on the Trusted Platform Module (TPM). It establishes a hardware root of trust that extends from the TPM chip built into a device all the way through to the data center. Using cryptographic hashes of the boot and software state recorded by the TPM, Keylime verifies that the device has not been tampered with at the hardware or software level.
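To make the idea of hash-based verification concrete, here is a minimal sketch of the TPM "extend" operation that measured boot relies on: each new measurement is folded into a register as a hash of the old value plus the new digest, so the final value depends on every component and on their order. This is not the Keylime API. The component names and the `attest` helper are hypothetical, and a real deployment would also check a TPM-signed quote, which this sketch omits.

```python
import hashlib
import hmac
from typing import Iterable


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(measurements: Iterable[bytes]) -> bytes:
    """Replay a boot log from an all-zero register, as a verifier would."""
    pcr = bytes(32)  # assumption: the register starts as 32 zero bytes at reset
    for m in measurements:
        pcr = extend(pcr, m)
    return pcr


# Hypothetical "known good" boot chain recorded when the device was built.
golden = [b"firmware-v1", b"bootloader-v1", b"kernel-v1"]
expected = replay(golden)


def attest(reported_pcr: bytes) -> bool:
    """Accept the device only if its reported value matches the golden value."""
    return hmac.compare_digest(reported_pcr, expected)


untampered = replay(golden)
tampered = replay([b"firmware-v1", b"bootloader-ROGUE", b"kernel-v1"])
print(attest(untampered), attest(tampered))
```

Because each step hashes the previous register value, swapping a single component or reordering the chain produces a different final value, which is what lets a remote verifier detect tampering without inspecting the device in person.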

Managing Heterogeneous Architecture

An organization with a large number of edge devices might use a mix of Arm- and Intel-based (x86) architectures, making multi-architecture and multi-operating system support essential. In these cases, enterprises need to maintain continuous integration pipelines for each device type. Some use cases also require graphics processing units (GPUs) to support the workloads.

For instance, a fast-food chain might use GPUs for visual analysis to count the number of cars at a drive-thru. These GPUs can also be used for natural language processing, such as analyzing the conversation between the customer at the drive-thru and the person taking the order inside the restaurant. Using GPU-backed conversational AI, interactions can then be analyzed on a national level to determine whether employees understand customers and to identify areas where training can enhance the experience.

To optimize capital expense (CapEx), organizations must first define the use case for the Kubernetes clusters. Working backward from workload requirements, the appropriate hardware bill of materials is specified to support that use case. In the example of the fast-food restaurant, GPU support is critical. At scale, these Kubernetes clusters (many of which are single-node clusters) can then be engineered in a purpose-driven model so that overprovisioned hardware doesn’t go to waste.
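As a rough illustration of working backward from workload requirements to a bill of materials, the sketch below picks the cheapest hardware profile that still satisfies a workload. The catalog, the costs and the `cheapest_fit` helper are all hypothetical and exist only to show the shape of the decision.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Profile:
    name: str
    arch: str
    ram_gb: int
    has_gpu: bool
    unit_cost: int  # hypothetical cost per device


def cheapest_fit(
    profiles: Sequence[Profile], ram_gb: int, needs_gpu: bool
) -> Optional[Profile]:
    """Return the lowest-cost profile meeting the workload's needs, or None."""
    candidates = [
        p
        for p in profiles
        if p.ram_gb >= ram_gb and (p.has_gpu or not needs_gpu)
    ]
    return min(candidates, key=lambda p: p.unit_cost, default=None)


# Hypothetical catalog of edge hardware profiles.
catalog = [
    Profile("small-arm", "arm64", 2, False, 100),
    Profile("mid-x86", "amd64", 8, False, 400),
    Profile("gpu-x86", "amd64", 16, True, 1500),
]

drive_thru = cheapest_fit(catalog, ram_gb=4, needs_gpu=True)
sensor_van = cheapest_fit(catalog, ram_gb=2, needs_gpu=False)
print(drive_thru.name, sensor_van.name)
```

Choosing the cheapest profile that fits, rather than one standard box for every site, is one simple way to keep overprovisioned hardware from being multiplied across thousands of clusters.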

Kubernetes must therefore be architecture-agnostic. This is critical to meeting both the business demands of the customer and the requirements of the application use case, whatever that might be.

Connecting the Edge

Kubernetes deployments also face challenges in environments with limited or intermittent connectivity. Container images cannot be so large that they clog the pipeline and slow the delivery of updates to the edge.

Most of these clusters are purpose-built on small hardware profiles without much memory or storage space. For example, a machine might have only 2GB of RAM, with 1GB reserved for infrastructure and the remainder left to run workloads. Development teams must therefore build applications that target small image sizes. Size also determines how long an update takes to reach a device in limited connectivity conditions, so containers must be kept as small as possible.
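The arithmetic behind this constraint is simple enough to sketch. The example below works out the memory left for workloads on the 2GB machine described above and estimates how long an update takes to transfer over a slow link. The image sizes and the 1 Mbit/s link speed are assumptions chosen for illustration, not measurements.

```python
def workload_memory_gb(total_gb: float, reserved_gb: float) -> float:
    """Memory left for workloads after the infrastructure reservation."""
    if reserved_gb > total_gb:
        raise ValueError("reservation exceeds total memory")
    return total_gb - reserved_gb


def transfer_seconds(image_mb: float, link_mbps: float) -> float:
    """Ideal transfer time: megabytes * 8 bits per byte / megabits per second."""
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    return image_mb * 8 / link_mbps


# 2GB machine with 1GB reserved for infrastructure, as in the text.
print(workload_memory_gb(2, 1), "GB available for workloads")

# Assumed image sizes over an assumed 1 Mbit/s link.
for size_mb in (50, 500):
    minutes = transfer_seconds(size_mb, link_mbps=1) / 60
    print(f"{size_mb} MB image: about {minutes:.1f} minutes")
```

Under these assumptions a tenfold larger image takes ten times longer to arrive, and on an intermittent link a longer transfer is also more likely to be interrupted before it completes.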

This is where the GitOps operating model comes into play. It provides a set of best practices to join deployment, management and monitoring for containerized clusters and applications. Within that, Kubernetes can leverage a pull model that sees the edge “phoning home” for updates when there is connectivity in place. At scale, having tens of thousands of clusters do this is significantly more effective than trying to push down updates when there is no reliable connectivity.
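The pull model can be sketched as a small reconcile step that each edge cluster runs on its own schedule. The code below is a simulation, not the interface of any particular GitOps tool: `EdgeAgent`, `fetch_desired` and the revision and image names are hypothetical, and the "network" is a canned list of responses in which `None` stands for a failed connection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class EdgeAgent:
    # Returns the desired state, or None when the device is offline.
    fetch_desired: Callable[[], Optional[Dict[str, str]]]
    applied_revision: Optional[str] = None
    running: Dict[str, str] = field(default_factory=dict)

    def sync_once(self) -> str:
        """Phone home once and apply the desired state only if it changed."""
        desired = self.fetch_desired()
        if desired is None:
            return "offline"  # keep running the last applied state
        if desired["revision"] == self.applied_revision:
            return "up-to-date"
        self.running = {"image": desired["image"]}
        self.applied_revision = desired["revision"]
        return "applied"


# Simulated connectivity: offline first, then the same revision seen twice.
responses = iter(
    [
        None,
        {"revision": "abc123", "image": "sensor-app:1.1"},
        {"revision": "abc123", "image": "sensor-app:1.1"},
    ]
)
agent = EdgeAgent(fetch_desired=lambda: next(responses))
results = [agent.sync_once() for _ in range(3)]
print(results)
```

The central system never needs to know whether a given cluster is reachable. Each agent keeps running its last applied state while offline and converges on the desired state the next time a connection is available.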

A Standardized Approach for Kubernetes

These edge use cases scratch the surface of the potential for CNCF (Cloud Native Computing Foundation)-certified Kubernetes distributions to enable the internet of things.

The standardized Kubernetes API (application programming interface) is perhaps the most powerful aspect of Kubernetes. It means that every Kubernetes cluster can link to any cloud environment, regardless of the service provider. With one API guiding everything, the cloud ecosystem now has a standardized way of managing all its infrastructure.

The notion of a hybrid cloud is behind us. A single API that lets an edge device point at and connect to the cloud, wherever that cloud is, creates an environment where Kubernetes can be everywhere and run everything.

This article is part of a series from sponsors of KubeCon + CloudNativeCon 2020 North America.