Lessons in Legacy to Microservices Modernization

Modernizing legacy applications to a microservices architecture is very different from starting a greenfield project from scratch. Large companies typically have working software that spans decades, and the challenges of modernizing it are numerous, ranging from collaboration and alignment to technical topics that require new paradigms and an upskilled workforce.

Part One – Organizational Lessons Learned

This collection of lessons learned includes pain points and suggestions on how to approach these modernization challenges. Some of the lessons learned may not be applicable to you, but, hopefully, some are relevant and can be leveraged.

The 15 lessons we identified are divided into the following two parts:

Organizational lessons, focused on people and processes, and technical lessons detailing technology-specific opportunities.

First, a note on the technical stack and some context. The legacy applications were Java J2EE code running on WebLogic. The new technology stack is a set of Spring Boot microservices running on OpenShift. There are many approaches to decomposing a monolith into microservices. Many articles discuss how to lower risk by carving out functionality of the monolithic application one module at a time, releasing small pieces iteratively. The downside to this approach is the need to refactor (reduce) the original monolith on an ongoing basis, a complicated and typically messy process. This article is different and, we believe, more interesting. Our lessons are drawn from a real-world example of a “big bang” migration handling more than a billion dollars’ worth of transactions a day: turning off the legacy system all at once and replacing it with a dozen microservices overnight.

Here are the organizational lessons covered in this article:

  1. Develop a new team structure (the one that worked for legacy applications won’t work for agile microservices).
  2. Prepare the field with a small group of trailblazers, then scale to more teams.
  3. Define an ownership model and RACI for microservices.
  4. Reorganize high-performing teams with care (they are the treasure of a company).
  5. Keep a close eye on processes that give the illusion of control (and actually slow down the delivery of value).
  6. Plan for production-only issues.
  7. Reserve capacity for innovation.

Lesson One: Develop a New Team Structure

You must create a new team structure, as the one that worked for legacy applications won’t work for agile microservices. It’s not unusual for legacy software to have been developed with legacy methodologies such as:

  • Long release cycles to production, even when using sprints (hardening sprint, UAT, etc.).
  • Separate teams running testing and quality assurance.
  • Security as an afterthought in the development process.

In this model, multiple teams have separate, distinct responsibilities. Features move from one team to the next, each of which has its own definition of ‘done.’ The lack of shared responsibility and alignment on goals increases cycle times and creates inefficiencies between teams. Most importantly, this structure does not foster innovation or enable modern practices where everything is treated as code. Nor does it make automated testing part of the definition of ‘done’ for a feature (where techniques such as behavior-driven development can be leveraged).
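
To make that last point concrete, here is a minimal sketch of what a behavior-driven test could look like, using Cucumber-JVM step definitions in Java. The funds-transfer scenario and all class, method and package names are illustrative assumptions, not taken from the system described in this article.

```java
// Hypothetical Cucumber-JVM step definitions illustrating behavior-driven development.
// The funds-transfer domain and every name here are examples, not the actual system.
package com.example.transfers;

import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class TransferStepDefinitions {

    private long balanceCents;

    @Given("an account with a balance of {int} dollars")
    public void anAccountWithABalanceOf(int dollars) {
        balanceCents = dollars * 100L;
    }

    @When("the customer transfers {int} dollars to another account")
    public void theCustomerTransfers(int dollars) {
        // In a real service this step would call the transfer API or service layer.
        balanceCents -= dollars * 100L;
    }

    @Then("the remaining balance is {int} dollars")
    public void theRemainingBalanceIs(int dollars) {
        assertEquals(dollars * 100L, balanceCents);
    }
}
```

The matching Gherkin scenario lives in a plain-text feature file (Given an account with a balance of 100 dollars, When the customer transfers 40 dollars to another account, Then the remaining balance is 60 dollars), which BAs, testers and developers can all read and review against the same definition of ‘done.’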

When considering breaking up a large monolith and embracing modern product team structures, it’s important to start by setting up these new teams differently and more efficiently. Spending the time to codify the RACI model between the teams will clarify responsibility, while also empowering those accountable. This is particularly relevant when the organization is still matrixed, with different silos providing resources to the product team (BAs, testers, software engineers). Everyone needs to clearly understand the definition of ‘done’ for every step in the process, including unit testing, functional testing, code reviews, security scans and quality gates.

Lesson Two: Prepare the Field, Then Scale

Start with a small group of trailblazers, then scale to more teams. Being agile doesn’t mean skipping planning or running with the first idea without thinking it through. While it’s tempting to replace the aged monolith with any modern architecture, don’t get in over your head; experimentation goes a long way toward reducing risk and ensuring sound decisions.

A dedicated pilot team, or a few pilot teams, can start exploring and building complementary template services, with explicit permission to backtrack as needed. Starting small does not mean starting trivial. Include the cross-cutting concerns needed by all services, such as security, logging and observability, with support for tracing requests as they move through the various microservices. Ownership can start with the team(s) creating the templates. One important aspect of a successful pilot program is a mechanism to share best practices and templates among teams (portal, wiki, touchpoints). This effort brings standardization and consistency to the cross-cutting concerns, which simplifies the tooling around the microservices. For example, log collection is immensely simpler if all services log in the same format.
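
As one concrete illustration of such a cross-cutting concern, here is a minimal sketch of a correlation-ID filter that a template service could ship with, so every service logs and propagates the same request identifier. It assumes Spring Boot 3 (Jakarta Servlet API) and SLF4J; the header, package and class names are illustrative assumptions, not the actual templates.

```java
// Illustrative cross-cutting concern for a template service: propagate a correlation ID
// so all microservices can log and trace a request consistently.
// Assumes Spring Boot 3 (jakarta.servlet) and SLF4J; all names are examples only.
package com.example.template;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;
import java.util.UUID;

@Component
public class CorrelationIdFilter extends OncePerRequestFilter {

    static final String HEADER = "X-Correlation-Id"; // example header name
    static final String MDC_KEY = "correlationId";

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain)
            throws ServletException, IOException {
        // Reuse the caller's ID if present, otherwise create one.
        String id = request.getHeader(HEADER);
        if (id == null || id.isBlank()) {
            id = UUID.randomUUID().toString();
        }
        MDC.put(MDC_KEY, id);           // visible to every log statement on this thread
        response.setHeader(HEADER, id); // echo it back so callers can correlate
        try {
            filterChain.doFilter(request, response);
        } finally {
            MDC.remove(MDC_KEY);        // avoid leaking IDs across pooled threads
        }
    }
}
```

Combined with a shared log layout, a pattern like this keeps log collection and request tracing consistent across every team’s services.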

Once these skeleton services are developed, it’s time to scale up to the number of teams needed.

Lesson Three: Define an Ownership Model and RACI for Microservices

Product team members have a shared responsibility to produce business value, no matter where they come from in a matrixed organization (BA, Dev, QA, support, etc.). A product should belong to one team, but one team can own multiple products. When developers from different teams update each other’s services, it blurs the lines of accountability and ownership. It’s challenging enough for a product team to share responsibility for quality in what used to be a siloed approach; if contributions start to originate from different teams, it becomes chaotic. You start having cross-team pull requests, different expectations on how to test, and, at worst, the added complexity of triaging newly found issues. Did the other team’s change cause them? Who is responsible for quality now?

Those decisions are typically made because a team has extra capacity and program management is trying to shorten delivery time. In our opinion, that extra capacity should be used differently: paying down the team’s own technical debt rather than adding features to microservices that another team owns and is actively modifying.

Lesson Four: Reorganize High-Performing Teams with Care

Without elaborating on the team development stages of forming, storming, norming and performing, new team formations need time to gel. It takes months for people to work well together, and some teams don’t reach a high-performing state by themselves. That’s when management needs to coach the team, reestablishing alignment toward shared goals, reducing work in progress, or changing personnel. Once a team has reached that high-performing state and social bonds exist between the individuals, the type of work they do (for example, which service they own) is less important than the trust and cohesion between the team members. A great team will find opportunities and push boundaries regardless of the product they own.

One concerning practice is dismantling existing teams in an agile train and reallocating members by having them pick new teams based only on the products those teams would own. The rationale behind this approach can be to create more (smaller) teams, to share knowledge, or both. It often produces less-than-optimal results, because team members choose based on products, without information about the product owners or technical leads. First, as mentioned previously, it probably matters more with whom you work than what you do. Second, teams need a few months to gel, and dismantling and rebuilding them starts the bonding process all over again. Third, if products later move to a different team, the initial rationale for picking a product goes away, unless team members have the opportunity to move around, too (if not, the rationale loses credibility).

As for alternate approaches to creating more teams, it’s often better to increase a team’s headcount and then split the larger team into two. For building subject matter expertise in various products in an organization, knowledge can be shared by scheduling team member rotations on a voluntary basis. If that voluntary movement is not enough, incentives should be created to encourage mobility, and term limits could be considered.

Lesson Five: Keep a Close Eye on Processes

When an audit or process has no correlation with the quality of the release, there is an opportunity for improvement. Left unaddressed, such a process reduces the teams’ ability to deliver, and people may develop a sense of fatalism, perpetuating the idea that following the process is more valuable than the result itself.

Processes that contribute nominal value to the final product often originate from a well-intentioned policy that has, over time, become a checkbox exercise. Examples include attaching documentation to issues with little review of its quality and usefulness, or writing troubleshooting articles that end up being low value-add, like a copy-paste from other services. The self-inflicted pain of check-the-box activities takes courage to address, yet addressing it pays outsized dividends. Proper agency and a relentless focus are needed to trim processes and prioritize value-add.

When a regulatory process is a bottleneck, revisit the original requirements to analyze whether the process can be changed to achieve compliance more efficiently. For instance, a requirement originally written to guarantee third-party penetration testing may now be satisfied with automation and AI-based tools. Change the process and improve the results.

Lesson Six: Plan for Production-Only Issues

Unfortunately, not all problems occur within the confines of test environments. Problems occur in production, too. When they do, what’s most important is how they are managed to minimize the impact on end users and the business.

Let’s talk about a situation where an issue not found in testing shows up in production and impacts end users. The first thing teams attempt is to reproduce the issue in lower environments. But what if that reproduction can’t be done in a timely manner (and timeliness depends on the impact)? Improving the test environments and processes is needed afterward, in the postmortem, but what about right now? Is there a plan that allows controlled changes, which pass all the existing tests and would tentatively improve the situation, to be deployed to production? Or does the organization slip into analysis paralysis, falling back on process as cover for avoiding calculated risks? Fear of making things worse is healthy, but stopping progress because of fear is not.

Employees often follow the process to mitigate the risk to themselves; those processes likely exist because similar issues have occurred previously. Nobody gets in trouble for following the process, even if the outcome is detrimental. Successful continuous delivery and continuous improvement must include the ability to roll back extremely quickly and efficiently. When the business model permits, canary releases can also be leveraged to test changes. There are times when deploying a change that tentatively addresses a production issue that cannot be reproduced in lower environments, and that still passes all the reviews and tests in those environments, is an appropriate level of risk to take in the short term.

Lesson Seven: Reserve Capacity for Innovation

Value-driven delivery is a core pillar of agile practices. When organizations put extreme pressure on feature delivery, they typically have fewer opportunities to improve the mode of operation, reducing innovation and future efficiencies. Even the most up-to-date team will fall behind on best practices and technologies if not given the capacity to experiment and adjust. This is even more critical when the team is not yet at the innovation stage and is still catching up from the legacy application.

One way to set aside capacity for improvements is to schedule regular innovation sprints dedicated to non-delivery outcomes. One caveat is that this dedicated time tends to be the first casualty when delivery commitments are at risk. That places the team in a dilemma: by internally scaling down the innovation efforts, its members can meet the goals the team committed to and be perceived as functioning well, but, in the long term, it decreases the team’s efficiency. When this happens infrequently, it is not an issue; if it happens regularly, it must be addressed.

Another recommended approach is not to treat innovation and technical-debt paydown as activities scheduled into dedicated sprints. Instead, slightly reduce the team’s committed delivery capacity so that innovation becomes an ongoing activity. That way, it will not be perceived as a buffer at the end of a few sprints.

We hope these organizational lessons learned are useful and help you in migrating your legacy applications to microservices. Don’t hesitate to add your comments below, ask questions, or share your best practices! Then, head on to part two to read about the technical lessons learned from such modernization efforts.

Brice Dardel

Brice is a cross-functional, pragmatic team leader with a strong background in development, databases, DevOps, and internal IT. He applies his in-depth experience overseeing and participating in agile delivery of software systems to help Oteemo's clients transform DevSecOps to accelerate innovation and move from legacy to cloud native applications. Prior to Oteemo, Brice led software engineering and infrastructure at Plexus Scientific for more than 11 years. Read more of Brice's articles at www.oteemo.com.
