In earlier parts of this article series, basic definitions and descriptions were presented for Regression Modeling. This part will continue with the description and turn toward Step-wise Regression.
As part of the results, in addition to the multiplication coefficients, the model generates statistics such as the R-Squared. R-Squared, or the coefficient of determination, is a measure of how well the fitted coefficients explain the data. At the simplest level it can be read loosely as a confidence level in the model, although strictly it is the proportion of the variance in the predicted value that the model accounts for. The closer to 1 it is, the better, with 1 taken to be 100%. Conversely, the closer the R-Squared is to 0, the lower the confidence.
For example, an R-Squared value of .95 might be taken as a 95% confidence level in the model results. More precisely, the coefficients calculated by the model account for 95% of the variance in the value being predicted. A coefficient is the number a value is multiplied by; in 3X, say, the coefficient is 3 and the value is 3 times X. In other words, if you multiply each known variable by its respective coefficient, the sum of those products accounts for 95% of the variation in the projected value.
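The calculation behind R-Squared can be sketched in a few lines of Python. This is a minimal illustration, not output from any particular modeling package: the data rows, the coefficients 2.0 and 3.0, and the function names `predict` and `r_squared` are all hypothetical, and the coefficients are simply assumed to have come from an earlier model run.

```python
def predict(coefficients, intercept, row):
    """Multiply each variable by its coefficient and sum the products."""
    return intercept + sum(c * x for c, x in zip(coefficients, row))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual variation / total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations: two known variables per row and the measured value.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
actual = [8.1, 6.9, 17.2, 15.8, 26.1]

# Hypothetical coefficients, assumed to come from an earlier regression run.
coefficients = (2.0, 3.0)
intercept = 0.0

predicted = [predict(coefficients, intercept, row) for row in rows]
score = r_squared(actual, predicted)
print(f"R-Squared: {score:.3f}")
```

A score close to 1 means the products of the variables and their coefficients track the measured values closely; a score near 0 means they explain little more than the plain average would.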
Another example is 0.50. Here the model accounts for only half of the variance, which could be taken as little better than the toss of a coin, heads or tails. As in my article titled “Regression Models - Data Analysis Friend or Foe?” the question posed and answered is this:
Let’s look at an example of how this can be used. Say recorded batch time is equal to 40% of the computer time and timesharing is 10%. If you expect batch to grow by 25% and timesharing to grow by 50%, total usage could be estimated at 65%, that is, (40% x 1.25) + (10% x 1.50). If we look at the values with unapportioned time allocated back, the picture could change significantly.
If batch has unapportioned time at a rate of 10% and timesharing has unapportioned time at a rate of 300%, the results change immensely. The starting number for batch would be 44% (40% x 1.1) and for timesharing 40% (10% x 4.0). The real CPU being consumed is 84%. The new total value after growth would be 115%, that is, (44% x 1.25) + (40% x 1.5). This means that instead of having a total CPU usage of 65%, the real usage would turn out to be 115%; in other words, more than is available.
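The arithmetic in this example can be checked with a short Python sketch. The figures are the ones used in the text; the function name `projected_usage` and the dictionary layout are assumptions made only for illustration.

```python
def projected_usage(workloads):
    """Total projected CPU %, summed over workloads.

    real usage      = recorded % * (1 + unapportioned rate)
    projected usage = real usage * (1 + growth rate)
    """
    total = 0.0
    for w in workloads:
        real = w["recorded"] * (1 + w["unapportioned"])
        total += real * (1 + w["growth"])
    return total


# Recorded figures only: batch 40%, timesharing 10%.
recorded_only = [
    {"name": "batch", "recorded": 40.0, "unapportioned": 0.0, "growth": 0.25},
    {"name": "timesharing", "recorded": 10.0, "unapportioned": 0.0, "growth": 0.50},
]

# Same workloads with unapportioned time allocated back (10% and 300%).
with_unapportioned = [
    {"name": "batch", "recorded": 40.0, "unapportioned": 0.10, "growth": 0.25},
    {"name": "timesharing", "recorded": 10.0, "unapportioned": 3.00, "growth": 0.50},
]

naive_total = projected_usage(recorded_only)          # (40 x 1.25) + (10 x 1.50)
adjusted_total = projected_usage(with_unapportioned)  # (44 x 1.25) + (40 x 1.50)

print(f"Projected from recorded time only: {naive_total:.0f}%")
print(f"Projected with unapportioned time: {adjusted_total:.0f}%")
```

Keeping the unapportioned rate as a separate field makes it easy to see how sensitive the projection is to that single assumption.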
So far we have been looking at a basic Multiple Regression. Moving on, we will now look at Step-wise Multiple Regression. This starts with an initial Multiple Regression model and builds from there.
The step-wise approach breaks the total computation down into a number of parts or steps. The first step provides values for some elements or parts of the equation. These values are then used as input to a subsequent step, which refines them at a lower, more detailed level. In other words, the first step takes an equation based on high-level values and produces a high-level set of results. In turn, these results are fed into a second step, and so on, as many times as needed to reach the lowest level required.
In the timesharing example, the timesharing activity itself has a number of parts or segments. There are usually differing applications and differing types of users, some low usage and others high usage.
The example took timesharing as 10% recorded by the timesharing system, plus a calculated 300% rate of uncaptured time, giving 40% real CPU usage. It would be very important to know what the varying types of usage are and whether that distribution might change. Using the same figures, 40% usage with a growth of 50% would result in 60% utilisation at the end of the projection period.
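To illustrate why the mix of usage matters, here is a minimal Python sketch of a second, more detailed step applied to the timesharing figure. It only shows the idea of feeding a high-level value into a lower-level breakdown; it is not a statistical step-wise selection procedure. The two user segments, their shares and their growth rates are hypothetical numbers chosen so that the blended growth matches the 50% used in the text.

```python
def project_segments(real_usage, segments):
    """Split a high-level usage figure by segment share, then grow each segment."""
    return {
        name: real_usage * share * (1 + growth)
        for name, (share, growth) in segments.items()
    }


# Step 1 (high level): real timesharing usage, 10% recorded x 4.0.
timesharing_real = 10.0 * 4.0

# Step 2 (detail): hypothetical segments as (share of usage, growth rate).
current_mix = {"low usage": (0.25, 0.20), "high usage": (0.75, 0.60)}
shifted_mix = {"low usage": (0.50, 0.20), "high usage": (0.50, 0.60)}

current_total = sum(project_segments(timesharing_real, current_mix).values())
shifted_total = sum(project_segments(timesharing_real, shifted_mix).values())

print(f"Projected with current mix: {current_total:.0f}%")
print(f"Projected with shifted mix: {shifted_total:.0f}%")
```

With the assumed current mix the projection reproduces the 60% figure, while shifting half of the usage to the slower-growing segment lowers it, which is the kind of change a single blended growth rate would hide.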
Future parts of this article will describe Step-wise Regression further and provide some examples of the technique and uses of Step-wise Multiple Regression Models as seeds for fruitful thought.
Statistics play an important role in genetics. For instance, statistics prove that the number of offspring is an inherited trait. If your parents didn't have any kids, odds are you won't either.