How to Enable the Next Generation of Microservices
Microservices are widely acknowledged to enable high scalability, productivity and agility. With them, organizations can break monolithic applications into smaller, purposeful and reusable services, and DevOps teams can build, test and deploy applications faster.
However, using microservices to their full advantage today involves new ways of thinking about the microservices model — enter event-driven microservices. Businesses are turning to event-driven architectures to become more responsive and capitalize on short-lived opportunities. As the tolerance for latency shrinks, the event-driven shift is causing the industry to rethink microservice architectures.
New use cases that call for low-latency, event-driven architectures include:
• Operationalizing AI and ML models (aka ML inference)
• IoT data analytics and processing, from edge to cloud
• Fraud/anomaly detection
• Transaction monitoring and processing (e.g., online payments and e-commerce)
A central challenge for architects is preserving the agility and productivity provided by the traditional microservice model while enabling these new use cases to embrace real-time data.
Much of event-driven microservice architecture revolves around a simple fact: Responding to requests or processing messages requires additional data from multiple sources. For example, a recommendation service needs the customer’s preferences and history, and an order service might need customer and inventory information. The time required to access this contextual data typically determines application performance.
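To make this concrete, here is a minimal Java sketch of such a service; the store interfaces, method names and ranking stub are hypothetical, not taken from any particular product. The point is that the two contextual lookups sit on the critical path of every request:

```java
import java.util.List;

// Hypothetical backing stores for contextual data.
interface PreferenceStore {
    List<String> preferencesFor(String customerId);   // e.g., a profile database
}

interface HistoryStore {
    List<String> recentOrdersFor(String customerId);  // e.g., an order database
}

class RecommendationService {
    private final PreferenceStore preferences;
    private final HistoryStore history;

    RecommendationService(PreferenceStore preferences, HistoryStore history) {
        this.preferences = preferences;
        this.history = history;
    }

    List<String> recommend(String customerId) {
        // Two remote reads must complete before any business logic can run:
        List<String> prefs = preferences.preferencesFor(customerId);
        List<String> orders = history.recentOrdersFor(customerId);
        // Ranking logic elided; its cost is usually dwarfed by the reads above.
        return prefs.isEmpty() ? orders : prefs;
    }
}
```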
To address the contextual data issue, the current generation of microservices typically incorporates an in-memory “fast data store” into the stack. The fast data store sits between the original data sources and the serving layer, which is often a pool of stateless Spring Boot instances. It boosts performance in two ways: the data lives in memory, so access speed is not bound by disk performance, and the data is pre-processed to some extent.
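As a sketch of this read path, the serving layer below fetches a single pre-merged record instead of querying the original sources; the FastDataStore interface and CustomerContext record are illustrative assumptions, not a specific product’s API:

```java
import java.util.List;

// A pre-merged view of the contextual data, assembled in the background
// from the original sources (profile DB, order DB, etc.).
record CustomerContext(String customerId,
                       List<String> preferences,
                       List<String> recentOrders) {}

// Stand-in for an in-memory store such as a distributed cache or data grid.
interface FastDataStore {
    CustomerContext get(String customerId);
}

class ServingLayer {
    private final FastDataStore store;

    ServingLayer(FastDataStore store) {
        this.store = store;
    }

    CustomerContext contextFor(String customerId) {
        // One in-memory lookup replaces several disk-bound queries, which is
        // what brings access times into the single-digit millisecond range.
        return store.get(customerId);
    }
}
```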
The work required to combine data from separate sources and format it is done in the background, rather than when the data is requested by the application. These two mechanisms are typically sufficient to produce access times in the single-digit millisecond range. A local cache on the serving layer can further reduce latency to the microsecond range at the cost of some data staleness.
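A minimal near-cache wrapper might look like the following, reusing the hypothetical FastDataStore interface from the previous sketch; real near caches also need TTLs or invalidation to bound staleness, which are elided here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal near cache: a local, in-process map consulted before the remote
// fast data store. Entries may be slightly stale until refreshed or evicted.
class NearCachingStore implements FastDataStore {
    private final FastDataStore remote;
    private final Map<String, CustomerContext> local = new ConcurrentHashMap<>();

    NearCachingStore(FastDataStore remote) {
        this.remote = remote;
    }

    @Override
    public CustomerContext get(String customerId) {
        // Local hits avoid the network hop entirely (microsecond range);
        // misses fall through to the remote store and populate the cache.
        return local.computeIfAbsent(customerId, remote::get);
    }
}
```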
Enabling Event-Driven Microservices
As companies strive for quicker and smarter responses, the need for speed and throughput is increasing, primarily fueled by data-hungry ML models.
In-memory fast data stores accelerate access to contextual data; however, merging contextual and real-time data usually requires network hops, because the business logic and the contextual data are hosted in two different products on two different machines. Emerging event-driven microservice architectures combine compute capabilities and an in-memory data grid in the same runtime to deliver ultra-low latency. The industry is still evolving, but the trend toward unification is unmistakable, with some vendors offering what is known as a unified real-time data platform.
Another challenge with contextual data is that it is typically too large to keep in memory on a single server, which can make “near caching” impractical. To address this, fast data stores stripe their data across multiple servers. Going a step further, unified data platforms stripe both the processing and the data across a cluster of servers. Because it is a single platform, the data and service layers are aligned so that, for example, each credit card transaction is processed on the server where the contextual data for the account holder lives. This co-location of data and processing brings access times well into sub-millisecond territory while still allowing the business logic to be redeployed independently of the in-memory data.
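Conceptually, this resembles an entry-processor pattern: the function is shipped to the partition that owns the key rather than the data being pulled across the network. The toy Java model below illustrates the routing idea only; it is not any vendor’s API, and the fraud rule is a placeholder:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Toy model of a partitioned grid: each partition owns a shard of the data,
// and work for a key is routed to the partition holding that key.
class Partition {
    private final Map<String, Double> accountBalances = new HashMap<>();

    // The "entry processor": logic executes next to the data it needs,
    // so no contextual data crosses the network.
    synchronized boolean process(String accountId, double txnAmount,
                                 BiFunction<Double, Double, Boolean> rule) {
        double balance = accountBalances.getOrDefault(accountId, 0.0);
        return rule.apply(balance, txnAmount);
    }
}

class Grid {
    private final Partition[] partitions;

    Grid(int partitionCount) {
        partitions = new Partition[partitionCount];
        for (int i = 0; i < partitionCount; i++) partitions[i] = new Partition();
    }

    boolean checkTransaction(String accountId, double amount) {
        // Route by key hash: the same server that stores the account's
        // context also runs the rule against it.
        Partition owner = partitions[Math.floorMod(accountId.hashCode(), partitions.length)];
        return owner.process(accountId, amount,
                (balance, txn) -> txn <= balance); // placeholder fraud rule
    }
}
```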
Additionally, the industry is beginning to shift attention from event brokers, which are now reasonably standard technology, toward the components required to process events efficiently. Platforms designed from the ground up to process high volumes of event streams are emerging. Known collectively as “event stream processing” (ESP) engines, they are modeled not on the concept of a remote function call, but on a graph of stream-processing stages connected into a low-latency, event-processing pipeline. The pipeline is an independently deployable component that performs a coherent group of business functions by consuming and emitting events. In other words, it is the counterpart of the traditional microservice.
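As one familiar example of this pipeline style, here is a minimal Kafka Streams topology (chosen purely for illustration; the article does not name a specific ESP engine, and the topic names and scoring stub are hypothetical):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PaymentPipeline {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // One coherent business function: consume raw payment events,
        // enrich/score them, and emit the results as new events.
        KStream<String, String> payments = builder.stream("payments"); // hypothetical topic
        payments
            .filter((accountId, payload) -> payload != null)
            .mapValues(PaymentPipeline::score)   // each stage runs per event; no request/response
            .to("scored-payments");              // hypothetical output topic

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-pipeline");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }

    // Placeholder for model scoring or enrichment logic.
    static String score(String payload) {
        return payload + ",score=0.0";
    }
}
```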
ESP engines offer many advantages for modern event-driven applications. They are designed to handle stateful tasks such as aggregation or de-duplication, which are difficult to implement on a stateless application server. They are also inherently asynchronous, allowing them to scale and process millions or even billions of events per second.
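For instance, a stateful stage such as a windowed count, which would be awkward to build on a stateless application server, is a short declaration in an ESP-style API. The fragment below extends the previous Kafka Streams sketch (the “transactions” topic and velocity threshold are hypothetical):

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

class FraudTopology {
    // Plugs into the previous sketch in place of its payment topology.
    static void build(StreamsBuilder builder) {
        KStream<String, String> txns = builder.stream("transactions"); // keyed by card number

        txns.groupByKey()
            // One-minute tumbling windows (ofSizeWithNoGrace is the Kafka Streams 3.x name)
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
            .count()   // stateful: counts live in engine-managed, fault-tolerant state
            .toStream()
            .filter((windowedCard, count) -> count > 10)  // hypothetical velocity threshold
            .foreach((windowedCard, count) ->
                System.out.println(windowedCard.key() + " flagged: " + count + " txns/min"));
    }
}
```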
Conclusion
Real-time AI/ML applications are both data- and compute-intensive. Extending the microservice paradigm to these use cases requires both rapid access to contextual data and a way to perform complex event processing at scale. The need for speed virtually mandates the use of in-memory data. Forward-looking vendors are moving beyond this, combining a fast data store, rich compute capabilities and an event processor in a single runtime. These newer architectures “flatten the stack,” simplifying development and management and enabling businesses to build, deploy and manage high-performance, data-intensive applications quickly and at lower cost.