Insights

Regression Models - Data Analysis Friend or Foe - Part 2

“To develop a complete mind: Study the science of art; study the art of science. Learn how to see. Realize that everything connects to everything else.” ― Leonardo Da Vinci

Part 1 of this two-part series described how data may not contain complete information. This part will explore how regression can be used to provide answers for what is missing.

A regression model can be designed to determine the amount of unapportioned CPU time and, from it, the actual total time used by the batch jobs. The model is based on the simple relationship: total CPU time = X x CPU time recorded for the job, where X is the factor by which the raw job CPU time must be multiplied to obtain the true time consumed.

The aim of the regression model is to calculate the value of X: the factor by which the CPU time recorded by the job must be multiplied to obtain the correct time it used.
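As a sketch of that single-factor model, a least-squares fit through the origin recovers X. All of the numbers below are synthetic, and the 30% overhead is an assumed illustration, not measured data:

```python
import numpy as np

# Hypothetical data: recorded batch-job CPU seconds per measurement
# interval, and the machine's true total CPU seconds for the same
# intervals (synthesized here with a 30% unapportioned overhead).
recorded = np.array([120.0, 95.0, 140.0, 110.0, 130.0])
total = recorded * 1.30

# Fit total = X * recorded (least squares through the origin).
coeffs, *_ = np.linalg.lstsq(recorded.reshape(-1, 1), total, rcond=None)
X = coeffs[0]
print(f"adjustment factor X = {X:.2f}")  # prints: adjustment factor X = 1.30
```

With real data the fit would not be exact, and the residuals returned by the solver give a quick check on how well a single factor explains the unapportioned time.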

This is slightly more complex. Different types of jobs will use more unapportioned CPU time than others because of the kinds of operations they perform. Typically, the various job types are set up to execute in a class structure or in particular initiators within the operating environment. This makes it possible to set performance and usage characteristics, and also to separate the values either for multiple regressions like the one just mentioned above or for a slightly more complex form of model.

The form for three types of jobs would be more like Total CPU time = a x (type A CPU time) + b x (type B CPU time) + c x (type C CPU time). Again, the regression model will provide, in this case, the factors a, b and c by which you must multiply each type to obtain the correct times they used.
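Extending the sketch to three job types, the per-type factors fall out of a multiple regression with no intercept. The data and the per-type factors of 1.1, 1.4 and 2.0 below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
# Hypothetical recorded CPU seconds per interval for three job classes.
type_a = rng.uniform(50.0, 150.0, n)
type_b = rng.uniform(20.0, 80.0, n)
type_c = rng.uniform(10.0, 40.0, n)
# Synthetic "true" totals: each class hides a different overhead factor.
total = 1.1 * type_a + 1.4 * type_b + 2.0 * type_c

# Fit Total CPU time = a*A + b*B + c*C by least squares (no intercept).
design = np.column_stack([type_a, type_b, type_c])
factors, *_ = np.linalg.lstsq(design, total, rcond=None)
# factors comes back approximately [1.1, 1.4, 2.0]
```

Each column of the design matrix is one job class, so the fitted coefficient vector is exactly the set of multipliers the article describes.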

Taking this just a little further, what if batch job processing were mixed with another type of workload, say a transaction processing system supporting an interactive customer inquiry system? For this new workload the operating system knows even less about precisely which transaction caused what usage. Here the regression may need to be performed in a step-wise, or phased, approach: the design becomes multiple levels of regression models.

The first level would address the workload types: Total CPU time = Batch CPU time + Transaction Processing CPU time.

After obtaining the factors from the first-level regression for the Batch and Transaction Processing CPU times, a further model is needed. The next level is a partial model of the form corrected Batch CPU time (the factor times the recorded Batch CPU time) = type A CPU time + type B CPU time + type C CPU time. This yields the factors for the individual types. A similar model would be set up for each workload type category.

So why go through this?

Let’s look at an example of how this can be used. Say batch recorded time is equal to 40% of the computer time and timesharing is 10%. If you expect batch to grow by 25% and timesharing by 50%, total usage could be estimated at 65%: (40% x 1.25) + (10% x 1.50). If we look at the values with unapportioned time allocated back, the picture could change significantly. If batch has unapportioned time at a rate of 10% and timesharing has unapportioned time at a rate of 300%, the results change immensely. The starting number for batch would be 44% (40% x 1.10) and for timesharing 40% (10% x 4.00). The real CPU being consumed is 84%. The new total value after growth would be 115%: (44% x 1.25) + (40% x 1.50). This means that instead of having a total CPU usage of 65%, the real usage would turn out to be 115%; in other words, more than is available.
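The arithmetic in the example above can be checked directly; the growth rates and unapportioned rates are the figures assumed in the example, not measurements:

```python
# Recorded utilization as fractions of total machine capacity.
batch, timesharing = 0.40, 0.10

# Naive forecast: apply growth to the recorded numbers only.
naive = batch * 1.25 + timesharing * 1.50          # 0.65

# Allocate unapportioned time back: batch +10%, timesharing +300%.
batch_real = batch * 1.10                          # 0.44
ts_real = timesharing * 4.00                       # 0.40
current_real = batch_real + ts_real                # 0.84

# Forecast again from the corrected starting points.
grown_real = batch_real * 1.25 + ts_real * 1.50    # 1.15
print(f"{naive:.0%} vs {grown_real:.0%}")          # prints: 65% vs 115%
```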

The process may seem intimidating, but many software products like SAS, Excel, SPSS and others have routines that perform these regressions with minimal input required.

The input that will be needed is the separation by workload and the CPU time or other metric being modeled. The factors the regression provides will then yield the correct values for each level and for each type within the respective levels.

The discussion here has focused on CPU time. Properly used, tools like regression and other quantitative techniques can prove to be very powerful and useful in exposing details that are hidden within virtually any kind of raw data. Start by keeping each model simple and think it through. Break the model down stepwise where appropriate and sooner than you might think, regression will be a powerful ally.

I asked an economist for her phone number... and she gave me an estimate.

Article written by Dr. Joseph Trevaskis