(Read Part 1, Part 2, Part 3, Part 4, Part 5 and Part 6 of this Stepping Stones series.)
To many, the subject of Regression might be somewhat intimidating or almost mystical. Let’s discuss the basics of how regression works.
Regression is a statistical technique for predicting a value or behaviour based on previous values or behaviours.
This helps inform all kinds of decisions from whether more computing power is needed (or not) to what stocks to buy (or not!).
In practical terms, it is used to draw a line through the observed data points that gives a ‘Best Fit’.
What is a ‘Best Fit’? It is the line that keeps the observations as close as possible to the line calculated. The technique used is least squares. For each observation, take the vertical distance between the actual value and the value on the line and square it, then sum these squares over all the points. The Best Fit is the line for which this sum is smallest.
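As a minimal sketch of the least squares idea, the following Python fits a straight line to a handful of points and checks that nudging the line away from the fit makes the sum of squares larger. The observations are made up for illustration, and the function names `fit_line` and `sum_of_squares` are hypothetical, not taken from any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (not divided by n; the ratio is the same)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each observation to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up observations, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best = sum_of_squares(xs, ys, slope, intercept)

# A slightly different line should fit worse (larger sum of squares)
other = sum_of_squares(xs, ys, slope + 0.1, intercept)

print("slope:", round(slope, 4), "intercept:", round(intercept, 4))
print("sum of squares for fitted line:", round(best, 4))
print("sum of squares for nudged line:", round(other, 4))
```

The closed-form slope and intercept used here apply only to a single straight line through one input variable. With several inputs, the same principle holds but the arithmetic is done on all inputs at once.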
In practice, this looks something like the following:
[Image: a best-fit line drawn through the observation points]
For the moment, let’s say you have an amount of money in your pocket. If that amount is made up only of 5 pence or 5 cent coins, the amount you have depends on how many of those coins you have. If you have 30 of these coins, you will have 1.50 pounds or dollars (30 × 0.05 = 1.50). Now extend this to a mixture of coins such as 0.01, 0.05, 0.10 and 0.50. The total then depends on how many of each coin you have, with each count multiplied by that coin's value. A relationship with several inputs like this takes the form of Multiple Regression.
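The coin example can be sketched in code. Suppose we did not know what each coin was worth, but had counted the coins and the total in several pockets. A multiple regression can then estimate the value of each coin. The sketch below is an illustration under assumptions: the pocket counts and totals are made up, the helper names `solve_linear_system` and `fit_multiple` are hypothetical, and the fit has no intercept term because an empty pocket holds no money.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system has a unique solution (no zero pivots).
    """
    n = len(b)
    # Build an augmented matrix so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple(rows, totals):
    """Least-squares coefficients (no intercept) from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, totals)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data: number of 0.01, 0.05, 0.10 and 0.50 coins in six pockets
pockets = [
    [3, 2, 1, 0],
    [0, 5, 2, 1],
    [7, 0, 4, 2],
    [1, 1, 1, 1],
    [4, 6, 0, 3],
    [2, 3, 5, 0],
]
# Total amount found in each pocket
totals = [0.23, 0.95, 1.47, 0.66, 1.84, 0.67]

coin_values = fit_multiple(pockets, totals)
print([round(v, 4) for v in coin_values])
```

Because these totals contain no measurement error, the estimated coefficients come out very close to the true coin values. With real, noisy observations the regression would return the values that best fit the data in the least squares sense.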
In earlier articles, “Regression Models – Data Analysis Friend Or Foe?” Part 1 and Part 2, the concept was discussed that data in its raw state does not necessarily provide a complete or obvious explanation of what it contains. Regression models can be used as a key to unlock far more meaningful information.
We can take an example from computer resource capacity planning. The aim is to apportion the uncaptured processing time of the CPU (Central Processing Unit, the controlling unit of a computer) to the workloads that use it, such as batch and timesharing.
Knowing the rate of captured time for each is beneficial in numerous ways. These include more effective projection of growth in CPU usage for capacity reasons and being able to bill more accurately for the CPU time actually used. In the example provided in that article, the projections for CPU were used with three sub-workloads per batch. The sub-workloads were obtained using a second model.
The results calculated in one Regression Model are used as input for the next-level Regression, and so forth. In this series, that chaining is called Stepwise Regression (elsewhere the term usually means adding or removing predictor variables one step at a time). The regression is performed in multiple steps to obtain a more detailed level of information.
After that come more detailed discussions and examples of how regressions might best be used, and of how they can be the key to a wealth of information that, if utilised, can save immense effort, time, money and misdirection.
Checking some questionnaires that had just been filled in, a census clerk was amazed to note that one of them contained figures 121 and 125 in the spaces for “Age of Mother, If Living” and “Age of Father, if Living.” “Surely your parents can’t be as old as this?” asked the incredulous clerk. “Well no,” was the answer, “but they would be IF LIVING!” ― Gary Ramseyer
Article written by Dr. Joseph Trevaskis