Jun 18, 2015

(Read Part 1, Part 2, Part 3, Part 4, Part 5 and Part 6 of this Stepping Stones series.)

This article is continued from Part 1 where we discussed Regression Modeling and the fact that Regression is a statistical technique for predicting a value or behaviour based on *previous* values or behaviours.

Herein the discussion will focus on Multiple Regression, in particular Step-wise Multiple Regression. For clarity, a Regression is based on one or more causal variables generating a change in another dependent variable. Basically, a simple regression may be viewed as the dependent variable being some function of the causal variable. In other words, the value of one variable depends upon another variable.

**The variable which has its value determined is known as the dependent variable (predicted variable) and the variable that determines the value of the dependent variable is known as the causal variable (deterministic variable).**

To lay a foundation, it is worth giving a brief description of what regression is. One definition: “A statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). … Multiple Regression: Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u

Where:

Y= the variable that we are trying to predict

X= the variable that we are using to predict Y

a= the intercept

b= the slope

u= the regression residual. …” – Investopedia.
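The equation above can be sketched numerically. The following is a minimal illustration using hypothetical data (the values and variable names are assumptions, not figures from the article): two causal variables X1 and X2 are used to estimate Y by ordinary least squares, yielding the intercept a, the slopes b1 and b2, and the residual vector u.

```python
import numpy as np

# Hypothetical observations (illustrative only): one predicted variable Y
# and two causal variables X1, X2, all measured over the same intervals.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = np.array([6.1, 7.9, 14.2, 14.0, 20.1])

# Design matrix: a leading column of ones estimates the intercept 'a'.
A = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares produces a, b1, b2.
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coeffs

# The residual 'u' is the part of Y not accounted for by the fit.
u = Y - A @ coeffs
```

By construction, the fitted values plus the residuals reproduce the observations exactly: Y = a + b1·X1 + b2·X2 + u for every row.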

For those not familiar with these terms:

- The **intercept** is the point at which the line crosses (intersects with) the Y axis, that is, the value of Y where X=0.
- The **slope** is the rate at which the line rises or falls. If one point is X=5 and Y=10, then with a slope of 5, changing X to 6 would result in Y=15. This is an increase of 5 in Y for every increase of 1 in X.
- The **regression residual** is the amount of the equation that is not accounted for by the values determined. It is caused by random variation in the variables, where there is not an exact cause and effect in the equation results. These random or noise variances can be caused by measurement error or by factors external to those included in the equation.
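The slope example above can be checked with a few lines of arithmetic. This sketch assumes the simple line Y = a + b·X through the stated point:

```python
# Worked check of the slope example: with slope b = 5 and the point
# (X=5, Y=10), the line Y = a + b*X implies intercept a = 10 - 5*5 = -15.
b = 5
a = 10 - b * 5        # intercept: -15
y_at_6 = a + b * 6    # moving X from 5 to 6 raises Y by the slope, to 15
```

Each unit increase in X adds exactly the slope (5) to Y, matching the text.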

In this equation, the objective is to minimize ‘u’, the regression residual. Some of the components may not always be active and, if included in an extended model, will introduce more possibility of error.

In the first part of this article we started looking at a CPU capacity example. A very significant point is that the data being solved for must cover the same time period; that is, the total CPU time, the batch CPU time and the timesharing CPU time must all be measured over the same intervals.

If they are for differing time periods, the results will be non-representative and the correlation will be low. A benefit of Step-wise Multiple Regression is that the results calculated in one Regression Model are used as input to the next level; this provides the potential to use data from differing time periods when more data is available for a specific calculation.
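The step-wise idea can be sketched in code. This is a hypothetical two-level example (the CPU figures, units, and variable names are assumptions for illustration): a first regression models batch CPU from a workload driver, and its prediction is then fed as input into a second regression for total CPU.

```python
import numpy as np

# Hypothetical CPU measurements (arbitrary units), aligned to the same
# time intervals as the text requires.
workload = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
batch_cpu = np.array([5.2, 9.8, 15.1, 20.3, 24.9])
total_cpu = np.array([8.1, 15.9, 24.2, 31.8, 40.1])

def line_fit(x, y):
    """Least-squares line y = a + b*x; returns (intercept, slope)."""
    b, a = np.polyfit(x, y, 1)
    return a, b

# Level 1: model batch CPU from the workload driver.
a1, b1 = line_fit(workload, batch_cpu)
batch_pred = a1 + b1 * workload

# Level 2: feed the level-1 prediction into the next regression.
a2, b2 = line_fit(batch_pred, total_cpu)
total_pred = a2 + b2 * batch_pred
```

Because each level is fitted separately, the level-1 model could in principle be estimated from a richer data set than the level-2 model, which is the flexibility the text describes.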

Why not simply fold every possible variable into one ever-larger model? The answer is two-fold. First, the model equation becomes highly complex. Second, the more detailed and fragmented the model becomes, the more difficult it is to obtain a sample size big enough to achieve adequate validation of the results.

As the number of variables (X) used to predict the value of the equation (Y) increases, the calculation becomes more difficult and an accurate equation is harder to obtain. This is further complicated by the fact that obtaining an adequate number of observations becomes far more difficult.

In subsequent parts of this series, examples of application and use will be presented.

*Numbers are like people; torture them enough and they'll tell you anything.*

Article written by Dr. Joseph Trevaskis
