8 Things You Don’t Know About Architecting Your Business Infrastructure for Scale

What does it mean to “architect for scale,” and why do you need to do so? Architecting for scale is about building and updating critical applications so they deliver what your increasingly demanding digital customers expect. Remember, your application’s performance will increasingly be compared with the likes of Amazon, Instagram, and Facebook. Architecting for scale is a way of thinking, designing, planning, and executing so your applications meet the needs of a growing customer base, no matter its size or expectations.

Architecting for scale is about applications and ensuring they and your business keep up to date with modern customer expectations.

Architecting for scale is critical for all modern applications and all modern digitally engaged organizations. It’s so important that I felt the need to outline eight things you probably don’t know about architecting for scale.

1. Availability and scalability go hand in hand

These two concepts, availability and scalability, go hand in hand. Therefore, you must address them both to adequately address either one.

When customer load increases and the application cannot respond to the increased traffic, the expected result is slowdowns, brownouts, and blackouts. This upsets your customers and ultimately turns them away from you. When your customers can’t use your application the way they need to use it (if they find it too slow, unresponsive, or unavailable), they will go to your competitors instead. You don’t need to worry about scaling your application if you don’t have any customers.

Scalability and availability are two parts of the same problem.

2. Scaling is not just about increasing traffic to your application

But scaling is more than handling increased traffic. As the number of customers grows, your business expands and the needs of your application increase. To keep customers coming to your business, you expand and add more capabilities.

Not only is your application handling more customers with more data, but your application needs to have more features and capabilities, which means more developers and other people working on improving your application.

Increasing the number of people working on your application can lead to other types of scaling issues. Testing and rolling out new features becomes harder as the number of people working on the application increases. Just as the demands on your application increase, your ability to expand the application, and to test that it keeps working as expected, decreases. This is another type of scaling problem.

This is why moving away from monoliths and toward service-based architectures is so popular. It improves your ability to scale your application development, which improves your speed of innovation, without sacrificing product quality.

3. Architecting for scale is as much about team culture and organization as it is about technology

This means adopting DevOps culture and organizational strategies. This means adopting DevOps processes and systems. This means using STOSA (Single Team Oriented Service Architecture) principles for building and operating your services. This means continuous delivery of new features, rather than deploying large product releases.

It means changing your mindset to adapt to the modern needs of your customers, your company, and your application.

4. SLAs are not just for customers

But when it comes to building a high-scalability, high-availability application, you also need to specify internal SLAs: performance promises between services. In a true service-oriented architecture, for a service to perform as needed for its customers, it must be able to depend on internal services performing at their promised levels.

In a STOSA organization, every service is owned by a single team, and each service needs to make performance guarantees to the other services that depend on it. Every service must make promises to each dependent service in order for the application as a whole to meet its external commitments. Monitoring these internal SLAs can help you determine the source of a problem during an outage.

If, for example, an application API call is running slower than normal, by checking the SLA commitments and performance of all the called services, you can narrow down what service might be the cause of the slow-running API call.
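The narrowing-down described above can be sketched in a few lines of Python. This is a minimal illustration only: the service names, SLA thresholds, and observed latencies are all hypothetical, not from any real system.

```python
# Hypothetical sketch: compare observed service latencies against
# internal SLA commitments to narrow down the cause of a slow API call.

INTERNAL_SLAS_MS = {        # promised p99 latency per service (invented numbers)
    "auth": 50,
    "catalog": 120,
    "pricing": 80,
    "recommendations": 200,
}

def find_sla_violations(observed_p99_ms):
    """Return {service: (observed, promised)} for services over their SLA."""
    return {
        service: (observed, INTERNAL_SLAS_MS[service])
        for service, observed in observed_p99_ms.items()
        if observed > INTERNAL_SLAS_MS.get(service, float("inf"))
    }

observed = {"auth": 45, "catalog": 310, "pricing": 75, "recommendations": 190}
print(find_sla_violations(observed))  # only "catalog" is over its promise
```

With data like this, the slow API call is immediately attributable to the one service breaking its internal promise, rather than to every service it touches.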

Internal SLAs are a critical component in operating a large, growing modern application.


5. Improving availability requires managing risk

Donald Rumsfeld, the former U.S. Secretary of Defense, famously said that the problems to be concerned about are the “unknown unknowns”: those problems we don’t even know that we don’t know about.

Risk management is about turning unknowns into knowns. In the case of modern applications, risk management is about identifying and managing areas of concern, then addressing the risks that have the highest impact on your business.

A risk matrix is a common tool to help manage application risk. Risk matrices give visibility and prioritization to technical debt and pending problems. They are a great communications tool between development teams and management.

Effective use of risk matrices will help reduce availability issues in your application.
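As a rough sketch of how a risk matrix turns into priorities, consider the following. The risk items and their 1–5 likelihood and severity scores are invented for illustration; a real matrix would be filled in by your teams.

```python
# Illustrative risk matrix: each risk gets a likelihood and a severity
# score (1-5); the product gives a simple priority for addressing it.

risks = [
    {"name": "primary DB failover untested",  "likelihood": 2, "severity": 5},
    {"name": "no rate limiting on public API", "likelihood": 4, "severity": 4},
    {"name": "log disk fills up",              "likelihood": 3, "severity": 2},
]

def prioritize(risks):
    """Sort risks by likelihood x severity, highest first."""
    return sorted(risks,
                  key=lambda r: r["likelihood"] * r["severity"],
                  reverse=True)

for r in prioritize(risks):
    print(f'{r["likelihood"] * r["severity"]:>2}  {r["name"]}')
```

Even a simple scoring scheme like this gives development teams and management a shared, visible ranking of technical debt and pending problems.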

6. The best way to make your application work is to break it

This process, called Game Day testing, involves deliberately breaking some portion of a running application and seeing how it behaves, and how the application, and the teams supporting it, respond to resolve the forced error.

The idea is that the best way for a team to learn how to resolve certain application failures is to see how they fail in real life. But rather than waiting for a failure at some random point, you force one at a more convenient time and evaluate how your systems and teams resolve the problem. Random failures tend to occur at inconvenient times, such as the middle of the night or during a critical operation. Forcing them at a more convenient time lets you choose a low-usage period, or a daytime hour when everyone is in the office and can work together easily on the problem.

Game Day tests can be planned events, such as testing a data center failure by disconnecting an important data center for a few hours. Or they can be randomly generated failures, using tools such as Netflix’s Chaos Monkey. Either way, you can see whether your application can self-recover from the problem, or how quickly a support team can resolve it.
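A real chaos tool integrates with your infrastructure to terminate actual instances, but the core idea can be shown with a toy simulation. Everything here is simulated: the fleet is just a set of names, and “terminating” an instance only removes it from that set.

```python
# Toy illustration of the Chaos Monkey idea: randomly pick one instance
# from a fleet, "terminate" it, and check whether enough healthy
# capacity remains. A real tool would call your cloud provider's API.
import random

def run_chaos_round(fleet, min_healthy, rng=random):
    """Kill one random instance; return (victim, survivors, still_healthy)."""
    victim = rng.choice(sorted(fleet))   # sorted() keeps choice reproducible
    survivors = fleet - {victim}
    return victim, survivors, len(survivors) >= min_healthy

fleet = {"web-1", "web-2", "web-3", "web-4"}
victim, survivors, healthy = run_chaos_round(fleet, min_healthy=3)
print(f"terminated {victim}; service still healthy: {healthy}")
```

The point of the exercise is the check at the end: if the fleet can no longer meet its minimum healthy capacity after one random failure, you have found a resilience gap at a time of your choosing rather than during a real outage.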

Constantly testing various failure scenarios is a great way to keep your application operating and improves your application and support team’s ability to quickly resolve problems in the future. Nothing improves your application availability better than breaking it regularly.

7. The cloud is essential to scaling

The cloud’s ability to quickly add new resources, such as servers, storage, and network capacity, to an application is invaluable in building a cost-effective application. These resources allow the application to scale up and down based on application usage. They can even add significant resources quickly (burst) to support sudden and unexpected usage spikes. In the modern world, where a celebrity’s or influencer’s mention of a business on social media can cause that company’s traffic to increase by many orders of magnitude almost instantly, the ability to handle usage spikes is critical to application scalability.
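The scale-up-and-down behavior described above is commonly driven by a target-utilization policy: keep each instance near a target load, and resize the fleet proportionally when load drifts. The sketch below is a simplified illustration with invented thresholds, not any cloud provider’s actual algorithm.

```python
# Simplified target-utilization autoscaling policy: resize the fleet so
# that per-instance utilization returns to the target. Percentages and
# instance counts are illustrative.
import math

def desired_instances(current, utilization_pct, target_pct=60, max_instances=100):
    """Return the fleet size that brings utilization back to the target."""
    needed = math.ceil(current * utilization_pct / target_pct)
    return max(1, min(needed, max_instances))

print(desired_instances(10, 90))  # traffic spike: scale out to 15
print(desired_instances(10, 30))  # quiet period: scale in to 5
```

The same proportional rule handles both directions, which is what lets cloud applications track usage up and down instead of provisioning for the worst-case peak.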

But it’s more than that: the ability to quickly create an entirely new data center by replicating an existing one allows you to create disaster recovery plans that don’t require investing in standby hardware. Instead, if a disaster takes an application’s data center offline, a new data center can quickly be brought up to run the operation with relatively little effort.

Such abilities are critical to modern digital applications, and can only be practically implemented using public cloud networks.

8. You should ignore the serverless hype

But while serverless computing has great value in many situations, the hype prevalent in the industry would suggest that everything should be serverless.

However, nothing could be further from the truth. In many situations, serverless computation may actually cause problems for your application, or at least make it far more expensive to run than it needs to be.

The pros and cons discussion of when and how to use serverless computing is beyond the scope of this article, but the point is that serverless is not appropriate for all situations.
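One concrete way the economics can go wrong is raw request volume. The back-of-the-envelope comparison below uses made-up prices (they are not any provider’s actual rates) to show the break-even idea: per-request billing is cheap at low volume, while a fixed server’s flat cost wins once volume is high enough.

```python
# Back-of-the-envelope cost comparison with made-up prices: per-request
# serverless billing vs. a flat-rate server. Above some request volume,
# the fixed server becomes cheaper.

def monthly_cost_serverless(requests, cost_per_million=2.00):
    return requests / 1_000_000 * cost_per_million

def monthly_cost_server(requests, fixed_cost=50.00):
    return fixed_cost  # flat, regardless of volume (up to its capacity)

for requests in (1_000_000, 10_000_000, 100_000_000):
    sls = monthly_cost_serverless(requests)
    srv = monthly_cost_server(requests)
    print(f"{requests:>11,} req/mo: serverless ${sls:,.2f} vs server ${srv:,.2f}")
```

With these invented numbers the crossover sits between 10 million and 100 million requests per month; the real crossover depends entirely on your workload and your provider’s pricing, which is exactly why serverless needs a case-by-case evaluation.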

Make sure to only use serverless capabilities where they will benefit your application-not where all the hype says you should use it.

Architecting for Scale: the book

Interested in learning even more? See my full list of books and reports, and check out my online courses as well as my column in

Originally published at https://leeatchison.com on August 23, 2021.


Lee Atchison is a recognized industry thought leader in cloud computing and application modernization. leeatchison.com
