The Stepping Stones of Regression Modeling - Part 4

(Read Part 1, Part 2, Part 3, Part 4, Part 5 and Part 6 of this Stepping Stones series.)

"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." ― John Tukey

As mentioned in Part 3 of this series, there is most likely a mixture of ‘usage types’ on any system. For example, some users mostly access business functions such as editing, moving data, updating their calendar and performing simple calculations. This activity is not CPU-intensive, so their usage over the logon period will be fairly low.

On the other hand, some users execute programs and transactions that are heavily computational and consume higher levels of CPU. Of course, there will also be those who do varying amounts of each, resulting in a mixture of usage profiles.

Any mixture of user profiles would have no impact on CPU if it remained constant. But in reality, new users are added, work activity changes, and systems and programs are added, removed and modified, all of which cause system usage to fluctuate.

Here is an example of the magnitude of change to CPU due to varying uses of a computer:

Most mainframe or large-scale computer manufacturers have detailed information about how fast their computers execute different instructions. Essentially, a simple instruction such as an addition completes faster than a more complex one, such as an instruction that performs data manipulation. One particular vendor’s own instruction profiles for the execution speed of individual commands showed a 30-fold variation. In other words, if the most basic instruction took 1 unit of time to execute, the most complex instruction would take 30 units of time. If only the most basic instructions were used, the CPU would execute 30 times faster than if only the most complex instructions were used. Being able to identify the profiles will show how much CPU power is required and which CPU and configuration will be most effective for the planned workloads.
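To make the 30-fold spread concrete, here is a minimal sketch of how the average cost per instruction shifts with the instruction mix. Only the 1-unit and 30-unit costs come from the example above; the two-instruction simplification, the function name `mean_cost` and the sample mixes are assumptions made purely for illustration.

```python
# Relative execution times from the example above: the simplest instruction
# costs 1 unit of time, the most complex costs 30 units.
BASIC_COST = 1.0
COMPLEX_COST = 30.0


def mean_cost(complex_fraction: float) -> float:
    """Average time units per instruction for a mix of the two extremes."""
    if not 0.0 <= complex_fraction <= 1.0:
        raise ValueError("complex_fraction must be between 0 and 1")
    return (1.0 - complex_fraction) * BASIC_COST + complex_fraction * COMPLEX_COST


# Hypothetical mixes, chosen only for illustration.
for fraction in (0.0, 0.1, 0.5, 1.0):
    cost = mean_cost(fraction)
    print(f"{fraction:.0%} complex: {cost:.1f} units per instruction, "
          f"{BASIC_COST / cost:.3f}x the all-basic throughput")
```

Even a small share of complex instructions moves the average noticeably, which is why the usage profile matters as much as the head count.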

A stepping stone is a stone that helps you get across a river. A Step-wise Multiple Regression model can be such a stepping stone in this configuration. Let us say there are basically five levels of user profiles. They will be called L (Low), ML (Medium Low), M (Medium), MH (Medium High) and H (High).

The second-level step equation will be:

Total CPU (Timesharing) = a + bL + cML + dM + eMH + fH + u


40% (example above) = a + bL + cML + dM + eMH + fH + u


a = the intercept

b = slope of Low

c = slope of Medium low

d = slope of Medium

e = slope of Medium High

f = slope of High

u = regression residual

With the use of Multiple Regression, the mixture percentage by type of user can be determined.  Regression can also be used to determine the CPU usage for each user type.
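As a rough sketch of how such a fit could be carried out, the example below runs an ordinary multiple regression (plain least squares, not a full step-wise variable selection) on synthetic data. Everything in it is an assumption for illustration: the sample intervals are generated rather than measured, the slopes are the per-user CPU costs implied by the tables later in this article (for example 0.40 / 50 for Low), the intercept is set to 0, and the residual u is left out so the fit recovers the slopes almost exactly.

```python
import random

# Assumed "true" values used to generate the synthetic data.
TRUE_INTERCEPT = 0.0
TRUE_SLOPES = [0.008, 0.0144, 0.04, 0.06, 0.10]  # L, ML, M, MH, H
# Hypothetical range of logged-on users per profile in any one interval.
USER_RANGES = [(30, 70), (200, 300), (350, 450), (200, 300), (30, 70)]


def make_samples(n, seed=1):
    """Synthetic intervals: users per profile and the resulting total CPU %."""
    rng = random.Random(seed)
    rows, totals = [], []
    for _ in range(n):
        users = [rng.randint(lo, hi) for lo, hi in USER_RANGES]
        total = TRUE_INTERCEPT + sum(s * u for s, u in zip(TRUE_SLOPES, users))
        rows.append([1.0] + [float(u) for u in users])  # leading 1 = intercept
        totals.append(total)
    return rows, totals


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - tail) / aug[i][i]
    return beta


rows, totals = make_samples(200)
beta = least_squares(rows, totals)
labels = ["a", "b (L)", "c (ML)", "d (M)", "e (MH)", "f (H)"]
for name, value in zip(labels, beta):
    print(f"{name:7s} {value:.4f}")
```

With real measurements the residual would not be zero, and the estimated slopes would carry some uncertainty that should be checked before they are used in a forecast.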

Let us look at some variations of this model…

In the original example quoted above, the 40% real CPU time for timesharing is grown at the 50% rate as indicated. Let’s say that the 50% growth projection was based on a 50% increase in the number of users, from 1000 to 1500. If growth was uniform across profiles at the 50% rate, the new real CPU time would be 60%, as seen in the table below:

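| Usage Profile | % of Users at Level | # Users | % Real CPU | New # Users | New % Real CPU |
|---|---|---|---|---|---|
| Low | 5 | 50 | 0.40 | 75 | 0.60 |
| Medium Low | 25 | 250 | 3.60 | 375 | 5.40 |
| Medium | 40 | 400 | 16.00 | 600 | 24.00 |
| Medium High | 25 | 250 | 15.00 | 375 | 22.50 |
| High | 5 | 50 | 5.00 | 75 | 7.50 |
| Total | 100 | 1000 | 40.00 | 1500 | 60.00 |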
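The arithmetic behind this table can be sketched in a few lines. The user counts and CPU percentages are taken from the table; the variable names are illustrative only, and the sketch assumes that CPU scales linearly with the number of users in each profile, which is what the table itself implies.

```python
# Baseline from the table: profile -> (users, % real CPU).
baseline = {
    "Low": (50, 0.40),
    "Medium Low": (250, 3.60),
    "Medium": (400, 16.00),
    "Medium High": (250, 15.00),
    "High": (50, 5.00),
}
GROWTH = 0.50  # uniform 50% increase in users

# CPU cost of a single user in each profile (% real CPU per user).
cpu_per_user = {name: cpu / users for name, (users, cpu) in baseline.items()}

new_total = 0.0
for name, (users, cpu) in baseline.items():
    new_users = round(users * (1 + GROWTH))
    new_cpu = new_users * cpu_per_user[name]
    new_total += new_cpu
    print(f"{name:12s} {cpu_per_user[name]:.4f} per user, "
          f"{users:4d} -> {new_users:4d} users, {cpu:5.2f}% -> {new_cpu:5.2f}%")
print(f"Total real CPU: {new_total:.2f}%")
```

The per-user figures are the interesting part: a High user costs 0.10% of real CPU against 0.008% for a Low user, a 12.5-fold difference.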

What if the growth was not uniform across user types, but occurred only in the Low usage category?

The 40% real CPU time for timesharing would now grow to a new real CPU time of 44%, as demonstrated in the following table:

| Usage Profile | % of Users at Level | # Users | % Real CPU | Varied # Users 1 | Varied % Real CPU 1 |
|---|---|---|---|---|---|
| Low | 5 | 50 | 0.40 | 550 | 4.40 |
| Medium Low | 25 | 250 | 3.60 | 250 | 3.60 |
| Medium | 40 | 400 | 16.00 | 400 | 16.00 |
| Medium High | 25 | 250 | 15.00 | 250 | 15.00 |
| High | 5 | 50 | 5.00 | 50 | 5.00 |
| Total | 100 | 1000 | 40.00 | 1500 | 44.00 |

Now let’s assume that all the growth was in the High usage category. The 40% real CPU time for timesharing would now grow to a new real CPU time of 90%, as demonstrated in this table:

| Usage Profile | % of Users at Level | # Users | % Real CPU | Varied # Users 2 | Varied % Real CPU 2 |
|---|---|---|---|---|---|
| Low | 5 | 50 | 0.40 | 50 | 0.40 |
| Medium Low | 25 | 250 | 3.60 | 250 | 3.60 |
| Medium | 40 | 400 | 16.00 | 400 | 16.00 |
| Medium High | 25 | 250 | 15.00 | 250 | 15.00 |
| High | 5 | 50 | 5.00 | 550 | 55.00 |
| Total | 100 | 1000 | 40.00 | 1500 | 90.00 |
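The same calculation can be repeated for every profile to see how sensitive the forecast is to where the 500 new users land. This is only a sketch: the baseline figures come from the tables above, the function name `projected_cpu` is made up for the example, the Medium Low, Medium and Medium High scenarios are extrapolations that the tables do not show, and linear scaling of CPU with users within a profile is assumed throughout.

```python
# Baseline from the tables above: profile -> (users, % real CPU).
BASELINE = {
    "Low": (50, 0.40),
    "Medium Low": (250, 3.60),
    "Medium": (400, 16.00),
    "Medium High": (250, 15.00),
    "High": (50, 5.00),
}


def projected_cpu(extra_users: dict) -> float:
    """Total % real CPU after adding extra_users[profile] users per profile."""
    total = 0.0
    for name, (users, cpu) in BASELINE.items():
        per_user = cpu / users  # % real CPU consumed by one user
        total += (users + extra_users.get(name, 0)) * per_user
    return total


# Put all 500 new users into one profile at a time.
for name in BASELINE:
    total = projected_cpu({name: 500})
    print(f"All 500 new users in {name:12s}: {total:.2f}% real CPU")
```

Under these assumptions the same 500 users produce anything from 44% to 90% real CPU, depending solely on which profile they fall into.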

Similar variances could exist, and be identified by additional Step-wise Regressions, for Batch and any other processing. The regression for the Batch workload itself would be the third step, with further steps added as needed. A subsequent level of calculation might be application use within each user profile. The difference can be crucial in forecasting CPU requirements. Diving below the initial step and beyond can yield pearls of highly valuable information.

In the next part of this series, some insights into other uses of these techniques will be discussed.

Q: Did you hear about the statistician who invented a device to measure the weight of trees?

A: It's referred to as the log scale.

Article written by Dr. Joseph Trevaskis