Optimize your cloud resources as solutions mature and uncover 'hidden' savings.

One of our clients saved 58% on their costs by reviewing their set-up.

Thomas Lindegaard
Software Engineer
Jesper Nysteen
Software Engineer

Of course, running your software solutions costs money.

If you run them on-premises, you risk having bought too much hardware in order to keep up with possible business and development needs or to cope with peak loads. In the cloud you don't need to over-provision, so projects and organizations can avoid that situation altogether. The IT manager can then sit back and rely on getting optimal value for the budget, right? Or not?


There are two classic situations where you might provision too many or too large resources in the cloud. The positive scenario: the organization starts many new projects and is highly innovative. The negative scenario: the organization lacks sufficient knowledge about IT operations in the cloud. Provisioning in the cloud happens on different terms than in traditional on-premises operation, where you only get new or larger resources when the operations department installs them.

Regardless, it is possible to manage your costs in the cloud. You can set up monitoring, require that the operations department stays aware of newly provisioned resources, and clean up regularly. Taking on that responsibility in a new way requires new skills in the operations department.

You can also make costs an explicit requirement for the organization's DevOps projects. This blog post is rooted in a DevOps mindset, but the principles can also be used in an organization where development and operations are separated.

In the following, scaling in/out/up/down is mentioned. Think of buckets: if you scale up and down, you change the size of the buckets; if you scale out and in, you add or remove buckets.

Timing

When should you look at costs? These are some of the typical triggers:

You can choose to do it from the start - but in the beginning you lack knowledge about where to focus your efforts, as you do not yet have empirical data on how the resources are loaded over time.

You can also do it when Azure's built-in advisor (Azure Advisor) recommends it - but in that situation it may be Azure that lacks knowledge of how the resources are loaded over time.

You can also choose to act when you receive notifications or alarms from Azure that a budget has been exceeded - but this requires that you were able to set up a realistic budget in advance. That can be difficult, for the same reason: a lack of empirical data on how the resources are loaded over time.

We do it continuously, in step with the maturity of our projects and our customers' increasing focus on the costs of their 'new' cloud solutions.

A saving of 58%

On one of our projects in the financial sector where we operate a large part of the customer's infrastructure, we have identified a few obvious areas for savings.

A large part of the customer's systems is based on Azure App Services, which provides almost unlimited access to IT resources, so that we can smoothly deliver on new business requests as soon as they appear.

As the systems have matured and clear usage patterns can be identified and predicted, we continuously analyze the capacity used against the provisioned capacity. This allows us to identify areas where the capacity can be adjusted.

The development process for the customer follows CI/CD principles, so development and test environments are needed where all changes are stress-tested before they go into operation. In the start-up phase of the project it was difficult to foresee how available these environments needed to be, but as the resource requirements for both manual and automated tests became more predictable, opportunities naturally arose to minimize the cost of these resources.

The predictability allowed us to set up automatic adjustment of the number of instances in the App Service plan for the test environment based on scheduled working hours.
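What such a mechanism looks like will vary from project to project. Below is a minimal sketch in Python, assuming the azure-identity and azure-mgmt-web SDK packages and some scheduler (e.g. an hourly timer job) to run it. The subscription ID, resource group, plan name and working hours are made up for illustration and are not the customer's actual implementation.

```python
# Minimal sketch: adjust the instance count of an App Service plan
# based on local working hours. All names below are hypothetical.
from datetime import datetime

from azure.identity import DefaultAzureCredential
from azure.mgmt.web import WebSiteManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "rg-test-env"   # hypothetical resource group
PLAN_NAME = "asp-test-env"       # hypothetical App Service plan
WORK_HOURS = range(7, 18)        # 07:00-17:59, assuming server-local time


def desired_capacity(now: datetime) -> int:
    """Instance count we want for the test environment right now."""
    is_workday = now.weekday() < 5               # Monday=0 .. Friday=4
    return 3 if is_workday and now.hour in WORK_HOURS else 1


def main() -> None:
    client = WebSiteManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    plan = client.app_service_plans.get(RESOURCE_GROUP, PLAN_NAME)

    target = desired_capacity(datetime.now())
    if plan.sku.capacity != target:
        plan.sku.capacity = target               # scale out/in: more or fewer buckets
        client.app_service_plans.begin_create_or_update(
            RESOURCE_GROUP, PLAN_NAME, plan
        ).result()


if __name__ == "__main__":
    main()
```

Azure Monitor autoscale also supports schedule-based profiles out of the box; a custom job like this one mainly earns its keep when the schedule logic outgrows what the built-in profiles can express.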

A graph of the resulting savings can be seen below (the Y-axis scale has been converted to an index); it clearly shows the automatic adjustment taking effect around May 3. Compared to the configuration before May 3, a saving of 28.6% was achieved.

All our projects naturally leverage Azure's built-in methods to automatically scale resources, including the various scaling options for App Services. However, Azure does not offer any automatic scaling of the provisioned App Service Plan SKU, locking in a larger portion of the operational costs. SKU is Azure's term for a specific configuration of an Azure product - e.g. the App Service Plan SKU “P2v3” is the name for a certain “size” of server with 4 cores, 16 GB RAM and 250 GB storage.

For that reason, we have developed mechanisms that scale down App Service Plan SKUs for the development environment outside of scheduled working hours - of course with a design that lets them start up again quickly should a situation require overtime development.
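Since Azure does not scale the SKU for you, the mechanism boils down to rewriting the plan's SKU on a schedule. Here is a hedged sketch of the core step, reusing the hypothetical client and names from the sketch above; P1v3 and P2v3 are real SKU names, but the choice of SKUs and the surrounding scheduling are illustrative only.

```python
# Sketch: swap the App Service Plan SKU outside working hours.
# Assumes a `client` like the WebSiteManagementClient above.
DAYTIME_SKU = {"name": "P2v3", "tier": "PremiumV3", "size": "P2v3"}
NIGHTTIME_SKU = {"name": "P1v3", "tier": "PremiumV3", "size": "P1v3"}


def set_sku(client, resource_group: str, plan_name: str, sku: dict) -> None:
    """Set the plan to the given SKU if it is not already using it."""
    plan = client.app_service_plans.get(resource_group, plan_name)
    if plan.sku.name == sku["name"]:
        return                        # already on the desired SKU
    plan.sku.name = sku["name"]       # scale up/down: change the bucket size
    plan.sku.tier = sku["tier"]
    plan.sku.size = sku["size"]
    client.app_service_plans.begin_create_or_update(
        resource_group, plan_name, plan
    ).result()
```

Scaling up again is then a single API call, so the daytime SKU can typically be restored in a matter of minutes when overtime development is needed.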

A graph of the resulting savings can be seen below (the Y-axis scale has been converted to an index). Around April 16, automatic scaling of the SKU was implemented for the App Service plans in the development environment, so that a smaller SKU is used in the evenings and on weekends. Compared to the configuration before April 16, a saving of 58% was achieved.

Resource check - no stone left unturned

With another customer, we have created the back-end for an extensive online shop. When the project had reached a certain level of maturity, we agreed with the client's IT manager to review the costs of their cloud resources to see if they could be reduced. Since the load on the resources was well known, the conditions for achieving savings were good.

The largest items across the development, test and production environments were therefore scrutinized, with the most costly resources getting the greatest focus. Many resources were deliberately not prioritized, on the grounds that the development hours could never be financed by the savings on the resources in question.

Specifically, we worked on savings for Azure SQL, API Management, App Service Plans and Data Factory.

For Azure SQL, reserved capacity was adopted relatively early in the project: you pay a fixed, lower price for a number of instances and commit for a period in return. You can reserve for one or three years at a time, so it is a good example of a resource where you cannot optimize costs until you have some knowledge of the load.
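The trade-off is easy to put into numbers. Here is a back-of-the-envelope sketch with made-up prices; the actual discount depends on the service, region and term.

```python
# Reservation math with hypothetical prices (not Azure's actual rates).
PAYG_PER_HOUR = 1.00       # pay-as-you-go unit price
RESERVED_PER_HOUR = 0.65   # 1-year reserved price, i.e. a 35% discount
HOURS_PER_YEAR = 24 * 365

payg_cost = PAYG_PER_HOUR * HOURS_PER_YEAR          # 8,760
reserved_cost = RESERVED_PER_HOUR * HOURS_PER_YEAR  # 5,694

print(f"Saving: {1 - reserved_cost / payg_cost:.0%}")  # 35%

# The catch: you pay for the reservation whether you use it or not.
# If the capacity is only needed a fraction `u` of the time, reserving
# wins when RESERVED_PER_HOUR < u * PAYG_PER_HOUR - here u > 65%.
print(f"Break-even utilization: {RESERVED_PER_HOUR / PAYG_PER_HOUR:.0%}")
```

This is exactly why the reservation came 'relatively early' rather than on day one: you need enough history to trust that utilization stays above the break-even point for the full term.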

API Management is scaled down in the development and test environments during the hours when no development or testing takes place.

The App Service plan must be able to scale out very quickly when end users shop online, to keep up with the often large number of requests. Most customers shop Monday to Friday between 6 a.m. and 4 p.m. In addition, some requests are paged: a paged call returns only, say, 100 items at a time, and the client calls again if it wants more data. That yields more, but smaller, requests and thus reduces the load on each instance (a sketch of a paged call follows the graph below). A lightweight version of the responses was also created for the customer, for cases where the online shop only needs a subset of the data in the original responses. Finally, more rules for scaling out and in were added, so that scaling out happens less aggressively outside the hours when the most customers are usually online.

Here you see a rapid scale-out of the App Service plan at approximately 7 in the morning.
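To make the paging point concrete, here is a minimal sketch of a client consuming a paged endpoint. The URL and the page/pageSize parameters are hypothetical and not the shop's actual API.

```python
# Minimal paging sketch using the `requests` package.
# Endpoint and parameter names are made up for illustration.
import requests

BASE_URL = "https://api.example.com/products"
PAGE_SIZE = 100  # each request returns at most 100 items


def fetch_all_products() -> list:
    items, page = [], 0
    while True:
        response = requests.get(
            BASE_URL,
            params={"page": page, "pageSize": PAGE_SIZE},
            timeout=30,
        )
        response.raise_for_status()
        batch = response.json()      # assumed to be a JSON list of items
        items.extend(batch)
        if len(batch) < PAGE_SIZE:   # a short page means we got the last one
            break
        page += 1
    return items
```

Each individual request stays small and fast, and a client that only needs the first 100 items never pays for the rest.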

Data Factory could be part of the answer.

Azure Data Factory can use a self-hosted integration runtime. The customer in question already has one for other reasons, and it is cheaper than an Azure integration runtime. On top of that, several optimizations were made:

• All Copy activities (the sub-task in Data Factory that copies data) were configured with 2 Data Integration Units, so the more expensive default is not used.

• All Data Flow activities were configured to use 8 vCores, unless a critical Data Flow requires more.

• Timeouts were set on pipelines and Data Flows, so they do not keep trying to finish if, for example, data or services are unavailable.

• Pipelines are now scheduled so that data is only updated during the online shop's busiest hours.

• Automatic monitoring was established to ensure that the various activities in Azure Data Factory keep the cheapest configuration.
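In the pipeline definitions, these settings live on the individual activities. Below is a sketch of the relevant fragments, written as Python dictionaries mirroring the activity JSON; the activity names are made up, and the exact values should follow your own workloads.

```python
# Cost-relevant knobs on Data Factory activities, sketched as Python
# dicts mirroring the pipeline JSON. Activity names are hypothetical.
copy_activity = {
    "name": "CopyProductsToSql",
    "type": "Copy",
    "policy": {
        "timeout": "0.02:00:00",    # give up after 2 hours, not the much larger default
        "retry": 1,
    },
    "typeProperties": {
        "dataIntegrationUnits": 2,  # pin DIUs instead of the more expensive default
    },
}

data_flow_activity = {
    "name": "TransformProducts",
    "type": "ExecuteDataFlow",
    "policy": {"timeout": "0.02:00:00"},
    "typeProperties": {
        "compute": {
            "computeType": "General",
            "coreCount": 8,         # 8 vCores unless a critical flow needs more
        },
    },
}
```

The automatic monitoring mentioned above can then be as simple as a job that reads the pipeline definitions back through the Data Factory API and flags activities where these fields deviate from the agreed configuration.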

After the pipeline work with the customer, we found that the costs for the above resources could be reduced by between 10% and 50%, which is a significant saving on the IT budget.

How could you delve into your own solutions?

Does your project or your organization meet the right requirements to optimize your costs on your cloud solution? Here are a few of the questions you can start by asking yourself:

• Do you have the right skills to optimize costs?

• Are you coming from a setup with separate operations and development, and do you need to move to DevOps?

• How much money can you save on your cloud solutions without increasing your time-to-market or limiting your ability to experiment with new solutions that support the business?

Whether there is anything to improve can often be clarified in a relatively short time by combing through your set-up. If you need help with the process, we are of course ready to assist and only a phone call away.


Reach one of our cloud experts at +45 72 20 30 60 or use the contact form below to get in touch.

Do you need help getting started with the analysis and clarification?

Reach out and let us help you control your Azure costs.
