Integrating Performance Testing into your DevOps Pipeline

The Typical CI/CD Pipeline

What comes to mind when you think of a mature CI/CD pipeline? Many people think of an automated process consisting of build, test, and deploy stages. Yet, there is always room for improvement, and there are useful quality steps many organizations tend to overlook.

Let's focus on the testing stage of this process. What kinds of tests should run during this stage? Perhaps a mix of unit, API, and UI tests, and likely some regression suites as well. Those are all good, but what about performance testing? It is often overlooked and left out of CI/CD pipelines entirely.

Performance testing is usually run separately from automated processes, perhaps for good reason: it can be time-consuming and resource-intensive, which makes it too bulky for a typical CI/CD pipeline. However, that doesn’t mean every form of performance testing is unsuitable for CI/CD. By breaking your performance tests into smaller, more manageable pieces, you can integrate them into your CI/CD pipeline without greatly increasing its end-to-end run time.

But first, let’s cover some performance testing 101. Feel free to skip this part if you’re well-versed.

What is Performance Testing?

Performance testing is conducted to gain insight into how an application performs under a given workload. It can be broken into multiple subcategories, whose names are sometimes used interchangeably. In this post, we’ll define the two varieties of performance testing we believe are most important to the CI/CD pipeline.

Load Testing

Load testing (or Ramp-Up testing) is used to evaluate the application's performance as the number of concurrent users increases. This testing can determine how many concurrent users the application can handle, as well as the average response times users will experience under such a load. This testing can also determine if the load-bearing capabilities of the system meet any previously established performance benchmarks.

A typical load test is configured with several parameters: thread count, ramp-up period, and loop count. The thread count sets the total number of concurrent test threads, each simulating one concurrent user. The ramp-up period is the time it takes, starting from a single thread, for the number of concurrent threads to reach the specified thread count.

For example, with a ten-second ramp-up and a thread count of ten, one new thread is added every second until all ten are running. The loop count is the number of times each thread repeats the test scenario. For instance, in a test with a thread count of ten and a loop count of five, each of the ten threads runs the scenario five times, for fifty iterations in total.
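To make the three parameters concrete, here is a minimal Python sketch of a toy load runner. It is not any particular tool's implementation; the names `ramp_up_offsets`, `run_load_test`, and `scenario` are hypothetical, and `scenario` stands in for whatever request sequence a real test would send. It shows how the ten-thread, ten-second example staggers thread starts one second apart, and how a loop count of five yields fifty iterations.

```python
import threading
import time
from typing import Callable, List


def ramp_up_offsets(thread_count: int, ramp_up_seconds: float) -> List[float]:
    """Start offset (in seconds) for each thread: one new thread every
    ramp_up_seconds / thread_count seconds, the first starting at t=0."""
    interval = ramp_up_seconds / thread_count
    return [i * interval for i in range(thread_count)]


def run_load_test(scenario: Callable[[], None], thread_count: int,
                  ramp_up_seconds: float, loop_count: int) -> List[float]:
    """Run `scenario` loop_count times on each of thread_count threads,
    staggering thread starts across the ramp-up period.
    Returns the duration of every iteration in seconds."""
    latencies: List[float] = []
    lock = threading.Lock()
    t0 = time.perf_counter()

    def worker(offset: float) -> None:
        # Wait until this thread's slot in the ramp-up schedule.
        delay = offset - (time.perf_counter() - t0)
        if delay > 0:
            time.sleep(delay)
        # Each thread repeats the scenario loop_count times.
        for _ in range(loop_count):
            start = time.perf_counter()
            scenario()
            elapsed = time.perf_counter() - start
            with lock:
                latencies.append(elapsed)

    threads = [threading.Thread(target=worker, args=(offset,))
               for offset in ramp_up_offsets(thread_count, ramp_up_seconds)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies
```

For example, `ramp_up_offsets(10, 10)` gives start times of 0, 1, 2, ... 9 seconds, and `run_load_test(scenario, 10, 10, 5)` records fifty iteration times. Real tools add reporting, think time, and distributed load generation on top of this basic scheduling idea.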

Stress Testing

A stress test evaluates how an application behaves under extreme load. A Black Friday sale, where the application may see a sudden spike in traffic, is a good example of the kind of scenario a stress test aims to simulate. In particular, stress testing can determine which system components fail first and how well the system recovers after being overloaded; the test report shows whether the application recovered gracefully. Stress tests are useful because they can reveal issues such as memory leaks, data corruption, security vulnerabilities, and more.
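The two questions a stress test answers, where the system breaks and whether it recovers, can be sketched as a simple control loop. This Python example is a hypothetical outline: `stress_test` and `run_stage` are illustrative names, and `run_stage(users)` is assumed to drive that many concurrent users for a fixed period and return the observed error rate.

```python
from typing import Callable, Dict, List, Optional


def stress_test(run_stage: Callable[[int], float], stages: List[int],
                max_error_rate: float, baseline_users: int) -> Dict[str, object]:
    """Step the load up through `stages` (concurrent users) until the error
    rate exceeds max_error_rate, then drop back to baseline_users to check
    whether the system recovers.

    run_stage(users) is a hypothetical hook assumed to apply `users`
    concurrent users for a fixed period and return an error rate in [0, 1].
    """
    breaking_point: Optional[int] = None
    for users in stages:
        if run_stage(users) > max_error_rate:
            # First load level at which the system stops meeting the target.
            breaking_point = users
            break

    # Recovery check: after the overload, normal traffic should succeed again.
    recovered = run_stage(baseline_users) <= max_error_rate
    return {"breaking_point": breaking_point, "recovered": recovered}
```

A system that fails at 800 users but serves normal traffic again afterward would report `{"breaking_point": 800, "recovered": True}`; one that stays degraded (for example, because of a resource leak) would report `"recovered": False`.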

Why is it Important?

Performance testing is important because it simulates the conditions the end-user may experience when using the application in the real world. A real end-user will use your application along with numerous other end-users. As quality professionals, we want to ensure the app works as expected for each concurrent user.

On a more technical level, performance testing ensures the application meets our scalability, stability, and reliability standards. Without it, users may be turned off by the application’s poor performance during peak activity. It’s bad enough if your app slows to a crawl, but even worse if requests start to time out and fail. This needn’t happen: these issues can be caught and mitigated early in the development process when performance testing is factored into your CI/CD pipeline.

Creating Performance Tests for your Pipeline

We now have a good idea of what performance testing is and why it’s beneficial as a quality gate within an automated CI/CD pipeline. But how should you create performance tests specifically for the pipeline? Generally, we want to account for three things:

The type of performance test to create.
The benchmarks that should be met.
The parts of the overall system to target.

Pass/fail criteria can be based on existing performance baselines, captured through web analytics, or on new performance targets defined by the business. Specific criteria can cover response time, throughput, hit rate, error rate, CPU utilization, memory utilization, and similar metrics. Your organization can define these criteria on an app-by-app basis to suit its own needs. However, to use performance testing efficiently within a CI/CD pipeline, we do want to control which parts of the system are targeted.
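Criteria like these translate naturally into a small quality-gate check. The Python sketch below is one possible shape, assuming the test run yields per-request latencies, an error count, and a run duration; the `Criteria` fields, the 10% regression tolerance, and the `evaluate` function are hypothetical choices, not a standard. It covers both kinds of criteria mentioned above: absolute targets and comparison against an existing baseline.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Criteria:
    """Hypothetical pass/fail thresholds for one performance unit test."""
    max_p95_ms: float             # 95th-percentile response time budget
    max_error_rate: float         # fraction of failed requests, 0.0-1.0
    min_throughput_rps: float     # requests per second
    max_regression: float = 0.10  # allowed p95 slowdown vs. the baseline


def evaluate(latencies_ms: List[float], errors: int, duration_s: float,
             criteria: Criteria, baseline_p95_ms: Optional[float] = None) -> List[str]:
    """Return failure messages; an empty list means the gate passes.

    latencies_ms holds one entry per request (at least two requests),
    and `errors` counts how many of those requests failed.
    """
    failures: List[str] = []
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(latencies_ms, n=100)[94]
    error_rate = errors / len(latencies_ms)
    throughput = len(latencies_ms) / duration_s

    if p95 > criteria.max_p95_ms:
        failures.append(f"p95 {p95:.1f} ms exceeds {criteria.max_p95_ms} ms")
    if error_rate > criteria.max_error_rate:
        failures.append(f"error rate {error_rate:.2%} exceeds {criteria.max_error_rate:.2%}")
    if throughput < criteria.min_throughput_rps:
        failures.append(f"throughput {throughput:.1f} rps below {criteria.min_throughput_rps} rps")
    if baseline_p95_ms is not None and p95 > baseline_p95_ms * (1 + criteria.max_regression):
        failures.append(f"p95 regressed more than {criteria.max_regression:.0%} "
                        f"vs. baseline {baseline_p95_ms} ms")
    return failures
```

In a pipeline, a non-empty result would typically be turned into a non-zero exit code so the testing stage fails and the change is held back.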

Performance Unit Testing

Just as unit tests exercise the smallest units of application code, performance unit tests should target the smallest units of system functionality. In an ideal world, your application would be built on a microservices architecture, which consists of small, decoupled, independently functioning services. Because each service exposes its own endpoints, performance unit tests for a microservices application are easier to set up, organize, and maintain. For organizations not yet using microservices, there are alternative solutions that can be explored.
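As an illustration of a performance unit test, the Python sketch below sends a fixed number of concurrent GET requests to a single endpoint and checks its 95th-percentile latency against a budget, using only the standard library. The function names, the `/login` path, and the budget are hypothetical. The bundled stub service exists only so the sketch runs on its own; in a pipeline, the URL would point at the deployed service under test.

```python
import statistics
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Opener that ignores proxy environment variables, so local calls go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def time_request(url: str, timeout: float = 5.0) -> float:
    """Issue one GET request and return its latency in milliseconds."""
    start = time.perf_counter()
    with _OPENER.open(url, timeout=timeout) as response:
        response.read()
    return (time.perf_counter() - start) * 1000.0


def performance_unit_test(url: str, requests: int, concurrency: int,
                          p95_budget_ms: float) -> dict:
    """Hit a single endpoint concurrently and check its p95 latency budget.
    Needs at least two requests to compute a percentile."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: time_request(url), range(requests)))
    p95 = statistics.quantiles(latencies, n=100)[94]
    return {"requests": len(latencies), "p95_ms": p95, "passed": p95 <= p95_budget_ms}


class _StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for one microservice endpoint."""

    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep output quiet


def start_stub_service() -> ThreadingHTTPServer:
    """Start the stub on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Keeping each test scoped to one endpoint is what keeps it short enough to run on every qualifying change rather than in a separate, lengthy performance cycle.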

Integrating Performance Testing into your Pipeline

Now we can add a performance testing node to the testing portion of the pipeline definition, giving us a CI/CD pipeline with performance testing. Of course, it’s not quite that simple. You may wonder how the pipeline knows which performance unit tests to run. The answer is surprisingly simple: tagging!

Through clever use of tagging, we can fine-tune which pipeline steps run and which tests run during the testing stages. A tag can flag whether an incoming change is expected to impact performance, so the performance testing step can be bypassed when no performance change is expected. In a more advanced DevOps practice, tags can also determine which performance tests should run. For example, a change affecting login can be tagged with something like "performance_impact" and "login." The pipeline logic then determines that the performance testing step should run, and that only the login tests are needed.
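The selection logic described above can be sketched in a few lines of Python. The tag names "performance_impact" and "login" come from the example; the suite names, the `PERF_SUITES` mapping, and `select_perf_suites` are hypothetical, and how tags reach the pipeline (commit message, pull request labels, or similar) is left as an assumption.

```python
from typing import List, Mapping, Set

IMPACT_TAG = "performance_impact"

# Hypothetical mapping from performance unit test suites to the
# functional areas (tags) each one covers.
PERF_SUITES: Mapping[str, Set[str]] = {
    "perf/login_endpoint": {"login"},
    "perf/checkout_endpoint": {"checkout", "payments"},
    "perf/search_endpoint": {"search"},
}


def select_perf_suites(change_tags: Set[str],
                       suites: Mapping[str, Set[str]] = PERF_SUITES) -> List[str]:
    """Decide which performance suites a tagged change should trigger."""
    # No impact tag: bypass the performance testing step entirely.
    if IMPACT_TAG not in change_tags:
        return []
    areas = change_tags - {IMPACT_TAG}
    # Impact expected but no area named: fall back to running every suite.
    if not areas:
        return sorted(suites)
    # Otherwise run only the suites whose tags overlap the change's areas.
    return sorted(name for name, tags in suites.items() if tags & areas)
```

With this sketch, a change tagged `{"performance_impact", "login"}` selects only `perf/login_endpoint`, while an untagged change skips the step. Falling back to all suites when no area is named is a cautious design choice; a team could instead fail the build and ask for a more specific tag.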

Implementing these enhancements to the pipeline logic will save your organization precious time and resources. Making these changes to your DevOps process may be difficult and time-consuming, but they pay off by keeping your pipeline as efficient as possible while maintaining the highest quality standards, including strong performance.
