Are You Giving DNS the Respect it Deserves?

October 17, 2023 | October 16, 2023 | Lee Atchison | cloud-native applications, cloud-native architecture, DNS, networking

by Lee Atchison

The backbone of all modern, cloud-native applications is DNS. It’s the silent orchestrator of the internet, making sure that when someone clicks a link, they get where they expected to go. Most of the time, DNS just works. In fact, it’s one of the most reliable protocols on the internet today, and it dates back to the early 1980s, the earliest days of the internet. Its impact is widespread, and it is a critical component of every business today.

And while DNS is, overall, a highly reliable service, minor human-generated glitches are very common. These minor glitches can wreak untold havoc on your business.

Just look at Meta, the company behind Facebook. On October 4, 2021, Facebook, Instagram and WhatsApp went down for more than six hours. The problem? A minor configuration change that ultimately took out their DNS system. The result? The company’s entire system, internal and external, was unreachable for the duration of the outage.

Consider this: Your company’s online operations, its applications, and its internal and external tools all rely on DNS. If DNS is running seamlessly, business proceeds as usual and revenue streams flow uninterrupted.

But if there’s a snag, even a very minor issue, the ramifications can be swift and severe. Suddenly, customers can’t access your services, and financial transactions are halted. Your business is closed for business.

And that isn’t even the worst part. DNS is critical to how your employees work inside your company as well. Tools, systems and online infrastructures all rely on DNS to function. So, not only is your application down and your business closed to customers, but your employees are stymied and can’t effectively work towards fixing the issue.

The most common DNS problem is the result of a minor configuration error created by an employee while they are making other changes. This minor configuration error can cause huge ripple effects.

For startups and small businesses, managing DNS is typically done by a knowledgeable team member who has many other responsibilities. When a DNS adjustment is needed, the team member stops what they are working on, makes the change quickly, and goes back to their ongoing project. It’s simple, fast, and isn’t given much thought. These are exactly the conditions in which minor mistakes work their way in.

For larger enterprises, DNS is often locked behind a plethora of restrictions and guardrails that make any change difficult. When a change is needed, the requestor has to jump through many hoops, and most of their attention goes to navigating those hoops rather than to the DNS change itself. This is another set of conditions perfect for making mistakes.

DNS is reliable and business-critical, yet highly unforgiving of minor bugs. And the processes we use to adjust DNS are breeding grounds for these minor bugs. The result? Problems like Facebook’s, on a smaller scale, happen to businesses every day. Every day, some business suffers a loss caused by DNS.

It’s a sobering reality.

So, how do you keep your business safe from DNS mistakes? Here are four key recommendations that all companies can implement to make DNS mistakes less likely and less impactful to your business.

Recommendation One: Automate your DNS configuration deployment process. Just as you do for the rest of your application, deploy DNS changes through a proper CI/CD pipeline. It is an absolutely critical first step in getting a handle on your DNS vulnerabilities.
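To make this concrete, here is a minimal sketch of the kind of check a pipeline could run before any DNS change is deployed. It is an illustration only, not a complete linter. The record format, a (name, ttl, type, value) tuple, and the function name validate_records are assumptions made for this example, and the addresses and host names are placeholders.

```python
import ipaddress
from collections import defaultdict


def validate_records(records):
    """Return a list of problems found; an empty list means the checks passed.

    Each record is a hypothetical (name, ttl, rtype, value) tuple.
    """
    errors = []
    types_by_name = defaultdict(set)

    for name, ttl, rtype, value in records:
        types_by_name[name].add(rtype)

        # A TTL must be a non-negative integer (zero means "do not cache").
        if not isinstance(ttl, int) or ttl < 0:
            errors.append(f"{name} {rtype}: TTL must be a non-negative integer, got {ttl!r}")

        # A records must hold an IPv4 address, AAAA records an IPv6 address.
        if rtype in ("A", "AAAA"):
            expected_version = 4 if rtype == "A" else 6
            try:
                address = ipaddress.ip_address(value)
            except ValueError:
                errors.append(f"{name} {rtype}: {value!r} is not a valid IP address")
                continue
            if address.version != expected_version:
                errors.append(f"{name} {rtype}: {value!r} is not an IPv{expected_version} address")

    # A name with a CNAME should not carry other record types.
    # (DNSSEC-related records are an exception; this sketch ignores them.)
    for name, rtypes in types_by_name.items():
        if "CNAME" in rtypes and len(rtypes) > 1:
            errors.append(f"{name}: CNAME cannot coexist with other record types")

    return errors


if __name__ == "__main__":
    # Placeholder data: the first record contains a typo in the address.
    proposed = [
        ("www.example.com.", 300, "A", "192.0.2.999"),
        ("api.example.com.", 300, "CNAME", "lb.example.net."),
    ]
    for problem in validate_records(proposed):
        print(problem)
```

A pipeline would typically refuse to deploy whenever the returned list is not empty, so a typo made in a hurry is stopped before it reaches production.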

Recommendation Two: Use a revision control system. Just like the source code for your application, DNS configuration should be managed using a revision control system. DNS can be managed exclusively using simple text files, and those text files can be checked into a revision control system such as Git, just like your source code. The text files in your revision control system can then be deployed using your automated DNS deployment pipeline. The result? Changes can be tracked, problems can be easily identified, and serious issues can be rolled back to working configurations quickly and easily.
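As a rough sketch of why plain text files work so well here, the example below compares two versions of a DNS file and reports what a deployment would add and remove. The one-record-per-line format (name, TTL, type, value) is a simplified, hypothetical format chosen for illustration, not full zone file syntax, and the function names are assumptions.

```python
def parse_records(text):
    """Parse a simplified, hypothetical one-record-per-line format:

        name ttl type value

    Text after ';' is treated as a comment. Quoted values that contain ';'
    are not handled in this sketch.
    """
    records = set()
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()
        if not line:
            continue
        name, ttl, rtype, value = line.split(None, 3)
        records.add((name, int(ttl), rtype, value))
    return records


def diff_records(old_text, new_text):
    """Return (added, removed) records when moving from old_text to new_text."""
    old = parse_records(old_text)
    new = parse_records(new_text)
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    # Placeholder contents standing in for two revisions of the same file.
    previous = "www.example.com. 300 A 192.0.2.10\n"
    proposed = "www.example.com. 300 A 192.0.2.20  ; moved to new host\n"

    added, removed = diff_records(previous, proposed)
    print("add:", added)
    print("remove:", removed)
```

Rolling back is the same comparison with the arguments swapped: whatever the change added is removed, and whatever it removed is restored.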

Recommendation Three: Implement a quality review process. No single person should be able to change your DNS system without that change being independently reviewed. This review process can catch minor glitches and other mistakes before they are deployed. It is also a lot simpler to implement if you have already implemented the first two recommendations above.

Recommendation Four: Use independent DNS providers. Many existing service providers offer DNS services, often for free, along with the rest of their services. Given how closely DNS is tied into everything in your business, instead of using these “easy to access” DNS providers, use a separate, third-party provider. In fact, use more than one. Put your production, customer-facing DNS needs into one provider, and put your internally focused development and operations tools into a different provider. One reason the Facebook outage lasted so long was that the DNS outage brought down not only the customer-facing websites but also all the internal tools the company uses to solve problems. Keeping these independent will reduce the likelihood of a problem in one part of your application ecosystem bringing down other parts of your ecosystem.
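One way to keep an eye on this separation is a small check that compares the name servers behind your customer-facing zone with those behind your internal zone. The sketch below assumes you already have both name server lists on hand. The function names are hypothetical, and treating the last two labels of a host name as the "provider" is a deliberately naive assumption that fails for suffixes such as co.uk.

```python
def provider_of(ns_host):
    """Naively treat the last two labels of a name server host as its provider."""
    labels = ns_host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def shared_providers(production_ns, internal_ns):
    """Return the set of providers that serve both zones (empty means independent)."""
    production = {provider_of(host) for host in production_ns}
    internal = {provider_of(host) for host in internal_ns}
    return production & internal


if __name__ == "__main__":
    # Placeholder name server lists for the two zones.
    production_ns = ["ns1.dns-a.example.", "ns2.dns-a.example."]
    internal_ns = ["ns1.dns-b.example.", "ns2.dns-b.example."]

    overlap = shared_providers(production_ns, internal_ns)
    if overlap:
        print("Both zones depend on:", sorted(overlap))
    else:
        print("Production and internal DNS use separate providers.")
```

An empty result means a failure at one provider should not take out both your customer-facing sites and the internal tools you would need to fix them.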

DNS is an easy service to forget about. The vast majority of the time, it “just works.” But when it fails, it can be devastating to your business. And DNS failures are almost universally driven by human error. Rather than treating it as a minor part of your application ecosystem, treat it as a core part of your overall infrastructure. Without proper oversight, DNS can kill your business.