**How to forecast the performance of a database**

*Introduction*

*Data requirement and way of plotting the gathered data*

*Phases of forecasting*

*Proven Forecast Models*

*Scalability and forecasting*

*Conclusion*

*1. Introduction:*

In the area of database performance, the most frequently asked questions are:

“If the system workload doubles, can the system handle the load?” or maybe “If we upgrade our CPUs, add 20 more IO devices, and increase the workload by 20%, what will happen to response time, CPU utilization, and average IO utilization?”

To answer such questions we need a method. Here I try to formulate one, where the key forecast metrics are:

- utilization,
- queue length,
- and response time.

*2. Data requirement and way of plotting the gathered data:*
To get the values, you essentially need three things:

- a few simple formulas
- some basic operating system statistics
- some basic database statistics

Definitions and symbols:

**S**: Time to service one workload unit. This is known as the *service time* or *service demand*. It is how long it takes the CPU to service a single transaction, for example, 1.5 seconds per transaction (1.5 sec/trx). The best way to get this value for Oracle systems is simply to derive it (e.g., by rearranging formula (1) below: S = MU/λ).

**U**: Utilization, or CPU busyness. It is commonly displayed as a percentage, and that is how it works in these formulas. For example, use 75% or 0.75, but not 75. A simple way to gather CPU utilization is to run **sar -u 60 1**, which gives the average CPU utilization over a 60-second period.

**λ**: Workload arrival rate. This is how many transactions enter the system per unit of time, for example, 150 transactions each second (150 trx/sec). When working with Oracle, there are many possible statistics that can be used as the “transaction” arrival rate. Common statistics gathered from **v$sysstat** are *logical reads*, *block changes*, *physical writes*, *user calls*, *logons*, *executes*, *user commits*, and *user rollbacks*. You can also mix and match as your experience increases. For this paper, we will simply use *user calls*.

**R**: Response time per transaction. This is the service time plus any time spent waiting in the queue. We will derive this value.

**Q**: Queue length. This is the number of transactions waiting to be serviced, excluding the transactions currently being serviced. We will derive this value.

**M**: Number of CPUs. You can get this from the instance parameter *cpu_count*.

The CPU formulas for calculating averages are as follows:

U = ( S λ ) / M    (1)

R = S / ( 1 - U^M )    (2)

Q = ( MU / ( 1 - U^M ) ) - MU    (3)

Formula (3) follows from Little’s Law: Q is the average number in the system (λR = MU / (1 - U^M)) minus the average number in service (λS = MU).
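To make the arithmetic concrete, here is a small Python sketch of formulas (1) through (3). The workload numbers are made up for illustration:

```python
# Hedged sketch: applying the average CPU formulas to made-up numbers.
# S = service time per transaction (sec/trx), lam = arrival rate (trx/sec),
# M = number of CPUs.

def cpu_forecast(S, lam, M):
    U = (S * lam) / M                     # (1) average CPU utilization
    if U >= 1.0:
        raise ValueError("utilization >= 100%: the system cannot keep up")
    R = S / (1 - U ** M)                  # (2) average response time
    Q = (M * U) / (1 - U ** M) - M * U    # (3) avg queue length = lam*R - lam*S
    return U, R, Q

# Example: 0.5 sec/trx service time, 10 trx/sec arriving, 8 CPUs.
U, R, Q = cpu_forecast(S=0.5, lam=10, M=8)
print(f"U = {U:.3f}, R = {R:.3f} sec, Q = {Q:.3f}")
```

Note that R is only slightly above S here because utilization is moderate; as U approaches 1, the denominator collapses and response time explodes.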

**Based on the above formulae we must do the following things:**

*3. Phases of forecasting:*

- Workload Characterization
- Model Development
- Validation

and then go to the forecast phase.

Here is a brief explanation of each phase:

*Workload Characterization*
The most important part of workload characterization is data gathering. Most likely you will have to create your own data-gathering tools (“sar”, “vmstat”, “glance”, etc. could be used), and the data to be gathered is highly dependent upon which forecast model you use.

A detailed discussion of characterizing your workload is out of scope for this paper, but the thing to remember is to keep it as simple as possible.
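As a sketch of a minimal data-gathering step, the snippet below pulls CPU utilization out of sar-style output. The sample text is made up for illustration; real sar column layouts vary by platform and version:

```python
# Hedged sketch: deriving CPU utilization from sar-style output.
# The sample text below is illustrative only; check your platform's columns.
sample_sar_output = """\
12:00:01    %usr    %sys    %wio   %idle
12:01:01      45      15       5      35
"""

def parse_utilization(sar_text):
    lines = [ln.split() for ln in sar_text.strip().splitlines()]
    header, data = lines[0], lines[1]
    idle_col = header.index("%idle")          # locate the idle column
    return (100 - float(data[idle_col])) / 100  # busy fraction, e.g. 0.65

print(parse_utilization(sample_sar_output))
```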

*Model Development*
Every forecast uses some kind of model. Even a standard benchmark attempts to *model* a real-life system. We use models every day of our lives. We draw pictures of an Oracle instance, we work on an ER diagram, we play with model airplanes, cars, and trucks (some of us still do), and we pretend we are building bridges and buildings with wood blocks. These are all models, and they are fundamental to understanding reality.

*Validation*
Once a forecast model has been developed, we must *statistically* understand just how well the model actually forecasts. The validation process takes an optimized model, runs unseen historical data (i.e., data that the model was not trained/optimized with) through the model, and compares the forecast values against what we know to be true. It’s the classic predicted value minus the actual value, hence we have an opportunity to statistically measure error and randomness. This statistical information will be used when making our final forecast-related statements.

Very basic and simple statistics become very useful and extremely powerful. After we validate our forecast model, we can responsibly make forecasting statements.
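For instance, the predicted-minus-actual arithmetic can be sketched in a few lines of Python. The utilization values below are made up for illustration:

```python
# Hedged sketch: measuring forecast error on unseen historical data.
# The predicted/actual utilization values are made up for illustration.
import statistics

predicted = [0.61, 0.72, 0.55, 0.80, 0.67]
actual    = [0.63, 0.70, 0.58, 0.77, 0.69]

residuals = [p - a for p, a in zip(predicted, actual)]  # predicted - actual
mean_error  = statistics.mean(residuals)    # systematic bias
stdev_error = statistics.stdev(residuals)   # spread, i.e. randomness

print(f"mean error = {mean_error:+.4f}, stdev = {stdev_error:.4f}")
```

A mean error near zero with a small standard deviation is what lets us responsibly attach confidence to forecast statements.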

*4. Proven Forecast Models*

**Simple Math**. As the name states, it’s *simple*. The funny thing is, we use simple mathematical forecasting all the time without ever thinking about it. For example, if we know an Oracle client process consumes 10MB of non-shared resident memory and plans are to add another 50 users (resulting in 50 additional client processes), then the system will require 500MB of additional memory. Because there is no queuing involved with *Simple Math*, it’s only appropriate for forecasts like memory, basic IO predictions, and basic networking. I would resist using simple math for CPU, advanced IO, and advanced network predictions.
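The memory example above reduces to a single multiplication; a sketch, with the numbers taken from the example:

```python
# Hedged sketch of Simple Math forecasting, using the memory example above.
mem_per_process_mb = 10   # non-shared resident memory per Oracle client process
additional_users = 50     # each new user adds one client process

additional_memory_mb = mem_per_process_mb * additional_users
print(additional_memory_mb)  # 500
```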

**Ratio Modeling**. Ratio modeling is a fast way to make low-precision forecasts. Two colleagues and I developed the *Ratio Modeling Technique* back in the mid-1990s in response to finding ourselves in situations where we simply had to deliver a forecast, yet we did not have the time to use an alternative model. *Ratio Modeling* works very well when you are quickly budgeting hardware, assessing and exposing technical risk, validating alternative technical architecture designs, and especially when *sizing packaged applications*. The technique enables you to define the relationship between process categories (e.g., batch processes) and a specific system resource (e.g., CPU). *Ratio Modeling* produces no statistical data, so it is truly a *back of the envelope* forecast technique. It has been so useful that there is a separate paper devoted entirely to understanding and using *Ratio Modeling*.

**Linear Regression (LR)**. Linear regression is fantastic because it’s simple to use (assuming you have a good tool) and provides statistical measurements to add strength to your forecast statements. I typically use *LR* when I want to quickly predict CPU utilization for a given business activity like *orders shipped per hour* or *web hits per second*. Regression analysis requires data, and data requires a system to sample from. Therefore, LR requires a production environment.

The relationship between just about anything and CPU utilization is *not* linear. Forecast techniques and reality demonstrate that at around 55% utilization, queue time is already consuming around 5% of total response time, and at around 75% utilization, queue time is already consuming around 15% of total response time. So while I enthusiastically use linear regression, I will never make a forecast with a utilization over 75%, simply because even though the forecast model says it’s OK, I know it’s not: it expects the data to be linear when it’s clearly not. So be careful when using LR, but use it liberally and appropriately.
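A minimal sketch of this approach, with made-up sample data, fitting utilization against orders per hour by ordinary least squares and refusing any forecast above the 75% ceiling:

```python
# Hedged sketch: least-squares fit of CPU utilization vs. a business metric.
# The sample data points are made up for illustration.
orders_per_hour = [100, 200, 300, 400, 500]
cpu_utilization = [0.15, 0.28, 0.39, 0.52, 0.63]  # fractions, not percents

n = len(orders_per_hour)
mean_x = sum(orders_per_hour) / n
mean_y = sum(cpu_utilization) / n
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(orders_per_hour, cpu_utilization))
sxx = sum((x - mean_x) ** 2 for x in orders_per_hour)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def forecast_utilization(orders):
    u = intercept + slope * orders
    if u > 0.75:  # beyond this, queuing makes the linear assumption break down
        raise ValueError("forecast exceeds 75% utilization; "
                         "the linear model is no longer trustworthy")
    return u

print(f"forecast for 550 orders/hr: {forecast_utilization(550):.3f}")
```

The hard stop at 75% encodes the point made above: the model will happily extrapolate past it, but reality will not cooperate.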

**Simple Queuing**. Queuing theory is a wonderful and powerful forecast technique. It can even be implemented in an MS-Excel workbook. Understanding queuing is a fundamental performance management concept. The components are simple: a transaction, a queue (or *line* as we say in the USA), and a server. When a transaction enters a queue, response time starts and queue time starts. When the transaction finally begins getting served by the server (a CPU, perhaps), queue time stops and service time begins. When the transaction is finished being served, service time stops and response time stops. This can be shown symbolically:

Rt = St + Qt
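Combining Rt = St + Qt with response time formula (2) from Section 2, queue time falls out by subtraction. A sketch with made-up numbers:

```python
# Hedged sketch: splitting response time into service time and queue time,
# using R = S / (1 - U^M) from Section 2. Inputs are made up for illustration.
def queue_time(S, U, M):
    Rt = S / (1 - U ** M)   # response time
    St = S                  # service time
    Qt = Rt - St            # queue time, since Rt = St + Qt
    return Rt, St, Qt

Rt, St, Qt = queue_time(S=0.5, U=0.75, M=4)
print(f"Rt = {Rt:.3f}, St = {St:.3f}, Qt = {Qt:.3f}")
```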

**Simulation**. While common amongst the world’s research community, simulation is virtually ignored when forecasting Oracle system performance. Cost and education are the primary reasons, but with some instruction, a little experience, and some basic statistics, simulation can be a fantastic solution that yields surprisingly good precision.

Many commercial simulation packages are graphical and can produce some eye-popping “ah ha” moments when presenting the animated results to management. However, these commercial packages can be relatively expensive and have a substantial learning curve. But there are alternatives, as I will discuss below.

Basically, a simulation forecast model simulates transactions working their way through a system (in our case, an Oracle-based system), while watching and recording what is happening. For usability and precision reasons, the model is highly abstracted, with the entire computing system represented by only a few queues. To enable the simulation to mimic your production system, it must have the ability to be *tuned*. The tuning process can take a long time if done by hand. Suppose the simulation model had just four tunable parameters. The possible combinations are virtually unlimited, yet only a few combinations will yield an optimized/tuned forecast model that provides the precision we desire. Even when intelligently adjusting the parameters, it could take literally hundreds of individual simulation runs just to *begin* approaching acceptable precision. Obviously, an automated optimization solution is needed, and I think you can see that the computational power required could be tremendous. Fortunately, there are also solutions to this challenge that make low-cost simulation forecast models possible. Keep in mind that for basic demonstrations and when used as an educational tool, running a few simulations can work beautifully.

With the right tools, simulation models can be built rather quickly, optimized rather quickly (assuming the computing power is available), yield precise results, and allow for both component-level forecasts and forecasts of the system as a whole. This makes simulation an extremely powerful forecasting tool.
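As a toy illustration of the idea (this is not a commercial package, just a few lines of Python simulating one FIFO queue served by M CPUs, with made-up arrival and service rates):

```python
# Hedged sketch: a tiny simulation of transactions flowing through one queue
# served by M CPUs. Highly abstracted, as described above; rates are made up.
import heapq
import random

def simulate(arrival_rate, service_time_mean, M, n_transactions, seed=42):
    rng = random.Random(seed)
    server_free_at = [0.0] * M          # when each CPU next becomes free
    heapq.heapify(server_free_at)
    clock = 0.0
    total_response = 0.0
    for _ in range(n_transactions):
        clock += rng.expovariate(arrival_rate)              # next arrival
        start = max(clock, heapq.heappop(server_free_at))   # queue if all busy
        finish = start + rng.expovariate(1.0 / service_time_mean)
        heapq.heappush(server_free_at, finish)
        total_response += finish - clock                    # queue + service
    return total_response / n_transactions

avg_r = simulate(arrival_rate=10, service_time_mean=0.5, M=8,
                 n_transactions=20000)
print(f"average response time ~ {avg_r:.3f} sec")
```

Even this toy records an average response time you can compare against the queuing formulas in Section 2, which is exactly the kind of cross-check that makes simulation an educational tool.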

*5. Scalability and forecasting*

**The Relationship Between Physical CPUs and Effective CPUs**

Scalability is more than a function representing the relationship between workload and throughput. When more broadly applied, scalability helps us understand the relationship between physical CPUs and their *effective application processing power* (sometimes referred to as *effective CPU power*, *effective CPUs*, or simply *power*).

**How Scalability Is Used in Forecasting**

Scalability gives our forecast models an additional dose of reality. Queuing theory-based models assume that when given a CPU, the entire CPU’s power is available for user-related work. This is not true, in part because a portion of that CPU’s resources must be used to manage that CPU. Thus, we have the concept of effective CPU power in contrast to physical CPU power, as noted in the previous section.

To infuse that additional dose of reality into our forecast models, the physical CPUs are scaled down to effective CPUs before being placed into our forecast models. For example, if the scalability model shows that our database server’s 10 physical CPUs actually supply 7 effective CPUs of power, the key is to place 7 CPUs, not 10, into the forecast model. If a forecast predicts 12 effective CPUs are required to meet service levels, those 12 effective CPUs are placed back into the scalability model and transformed into perhaps 16 physical CPUs.
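The text does not prescribe a particular scalability model, so the sketch below uses an Amdahl-style contention factor purely as a placeholder, with the factor chosen so that 10 physical CPUs yield about 7 effective CPUs as in the example (the inverse mapping will therefore differ from the text’s illustrative 16):

```python
# Hedged sketch: converting physical CPUs to effective CPUs and back.
# The Amdahl-style model and SIGMA value are placeholder assumptions,
# not the paper's actual scalability model.
SIGMA = 1 / 21  # assumed per-CPU management overhead; chosen so 10 -> ~7

def effective_cpus(physical):
    return physical / (1 + SIGMA * (physical - 1))

def physical_cpus(effective_needed):
    # invert the model: smallest physical count supplying enough power
    m = 1
    while effective_cpus(m) < effective_needed:
        m += 1
    return m

print(f"10 physical -> {effective_cpus(10):.1f} effective")
print(f"12 effective requires {physical_cpus(12)} physical (this model)")
```

The two functions capture the workflow described above: scale physical down to effective before forecasting, and scale a forecasted effective requirement back up to physical when sizing hardware.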

**SCALABILITY VS. QUEUING**

Scalability is not the same as queuing. Given enough arrivals, queuing will occur even with perfect scalability. For example, suppose the operating system divides a process into 100 pieces of work, yet there are only 24 CPUs. As a result, there will be 76 pieces of work queued. In this situation, scalability (that is, the ability to parallelize) is not the issue; the issue is that the combination of the arrival rate, the service time, and the number of servers is causing the pieces of work to queue. However, if there were 128 CPUs, then 28 CPUs would be idle because the operating system was unable to divide the process into 128 pieces of work. In this situation, scalability is limiting throughput. So remember that while scalability and queuing are related, they are distinct.
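The arithmetic in that example is simple enough to sketch:

```python
# Hedged sketch of the example above: pieces of work vs. available CPUs.
def queued_and_idle(pieces_of_work, cpus):
    running = min(pieces_of_work, cpus)
    queued = pieces_of_work - running   # work waiting for a CPU (queuing)
    idle = cpus - running               # CPUs with nothing to run (scalability)
    return queued, idle

print(queued_and_idle(100, 24))   # (76, 0): queuing, not a scalability issue
print(queued_and_idle(100, 128))  # (0, 28): scalability limiting throughput
```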

*6. Conclusion:*
Apart from forecasting, it’s very interesting to observe how hardware vendors, operating system vendors, database vendors, application vendors, and even businesses continue to find ways to increase parallelism, thereby reducing the throughput-limiting scalability effects. In our hearts, even if we are not technically focused, we know that increasing parallelism can increase throughput, and increasing throughput can ultimately mean increased profits for the business IT was created to serve in the first place!
