This article is about predictive modeling. It explores the appropriateness of modeling in general and predictive modeling in particular, and examines some common pitfalls. Modeling is the process of formulating an abstract representation of a real problem, based on simplifying assumptions. Thus, no model is an exact representation of reality. Put another way, a model cannot fully represent a complex problem, but it can provide insight into the problem and help decision makers apply solutions. Inherent in this statement is that a predictive model is just one piece of a problem’s solution; in some instances it is merely a support tool, informing the business decision maker rather than driving the decision. Predictions are only as good as our ability to frame a problem, simplify it enough to apply analytic solution methods, acquire the appropriate data, and estimate the duration of its usefulness.
It is interesting to me that there is a modeling tool called Crystal Ball, a spreadsheet application for predictive modeling. I have used it on occasion. However, even with the very best tools and techniques, prediction can be a perilous endeavor, and in some cases much worse. Of course, I wish I had a dollar for every time one of my models was wrong. On the other hand, at least it is not the blind leading the blind: I do have one good eye.
During Operation Desert Storm (back when I was a young Army officer), we used several predictive models for things like battle outcomes, casualty rates and so on. Some of these yielded very bad predictions. The well-respected Concepts Evaluation Model (CEM) was a piston model that we used to determine how our forces would push Iraqi units out of Kuwait. It was not very accurate. Another much-respected model, the Extended Air Defense Simulation (EADSIM), was used to calculate the effectiveness of our daily airstrikes against targets. EADSIM was not calibrated for the kind of forces we would fight during Desert Storm, and the results had to be tweaked a little every day until the analysts could get the model to behave like reality.
I recently met with two different customers who wanted models. Both wanted propensity-to-purchase models. After discussing their requirements, it turned out they had data on the phenomenon of interest but had not used that data to determine whether their product appealed to their customers. They had data on 7MM customers that they could use for this purpose, but they were focused on 1.7MM customers for whom they did not have this data. They did not need a predictive model. Instead, they needed to experiment to understand their present situation (descriptive analytics) and collect additional data if the information value was low.
The term “model” seems to be on the tip of everyone’s tongue, from market analysts through business unit managers to CEOs. “I have a problem and need a model [to solve it]” is a common phrase I hear. The term seems to represent anything that is data analytics related. Yet, models are only one analytic approach to problem solving. Not only are there situations where you do not need a model, there are also instances when you should not use a model.
Here are some events you should not try to model: death, birth, divorce. Imagine you have predictive models for the death and birth events. Suppose a customer identified by one of these models is contacted by an agent or representative.
“Sir, I understand you are going to die next May. Are all of your insurance needs being met?”
“Sir, I understand that your wife is expecting a new baby in the next six months…”
These are events we should react to, but should not try to get ahead of. If you are selling life insurance, for example, focus on touches around the customer’s birth date rather than on life events such as divorce. You will probably catch most events by contacting customers during their birth month.
“Sir, we wanted to wish you a happy birthday and make sure that we are meeting all of your life insurance needs.”
You may have a model that tells you a customer’s propensity to take out a new mortgage. You might take the results and contact the ten people with the top propensity scores, only to find that a third of them are renting and have no intention of buying a new home, and hence no need for a mortgage. Does that mean your model is wrong? Yes and no! I like to use this quote from the late George Box:
“All models are wrong, but some are useful.”
The Corps Battle Simulation (CBS) was used during Desert Storm to estimate the attrition that US and Iraqi forces would suffer. It overestimated US casualties by several orders of magnitude, because the simplifying assumptions behind the model could not be applied to the kind of combat experienced in Desert Storm: those assumptions were based on fighting a Soviet-bloc force.
George spoke the truth: every predictive model I build has flaws! Models are abstractions of reality, based upon simplifying assumptions. We could not accurately predict a divorce event because there are too many factors we do not have data on (or at least should not have), such as whether a husband is having an affair or whether one spouse is abusive to the other. A customer may have scored high in a new-mortgage model because they are moving and fit the profile of customers who may be in the market for a mortgage. These factors might include high income, older age, career field and so on. Predictions are not 100 percent accurate. As Figure 1 shows, the model will give you false negatives and false positives. Hopefully, we have followed a modeling process that reduces these errors, but by the very nature of statistics, errors (Type I and Type II) will occur even if the rest of our modeling process is perfect.
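To make the Type I and Type II errors concrete, here is a minimal sketch with entirely made-up scores and outcomes (the threshold and all the numbers are illustrative assumptions, not data from any real model):

```python
# Hypothetical propensity model output (illustrative numbers only).
# actuals: 1 = customer really did purchase, 0 = did not.
actuals = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores  = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2, 0.3, 0.1, 0.5, 0.4]

threshold = 0.5  # customers scoring above this are flagged for contact
predictions = [1 if s > threshold else 0 for s in scores]

# Type I error (false positive): model says "will purchase," customer did not.
false_positives = sum(1 for p, a in zip(predictions, actuals) if p == 1 and a == 0)
# Type II error (false negative): model says "will not purchase," customer did.
false_negatives = sum(1 for p, a in zip(predictions, actuals) if p == 0 and a == 1)

print(f"False positives (Type I): {false_positives}")
print(f"False negatives (Type II): {false_negatives}")
```

Moving the threshold trades one error type for the other: a lower cutoff catches more real buyers but contacts more renters, and vice versa.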
A model can be useful when it predicts responses better than random selection would. We might be picking up an additional 2% over random selection; that 2% could be meaningful (statistically significant) when we have a very large customer (or prospect) base, and nearly worthless when we are talking about 2% of 100 customers (or prospects).
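The point that the same 2% lift can be significant or not depending on base size can be sketched with a standard two-proportion z-test (the response rates and group sizes below are illustrative assumptions):

```python
import math

def lift_z_score(p_base, p_model, n):
    """Two-proportion z-statistic: model response rate vs. random selection.

    Assumes equal-sized model and control groups of n customers each
    (a simplification for illustration).
    """
    p_pool = (p_base + p_model) / 2
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    return (p_model - p_base) / se

# The same 2% lift (5% baseline -> 7% with the model), two base sizes:
z_small = lift_z_score(0.05, 0.07, 100)      # well below 1.96: not significant
z_large = lift_z_score(0.05, 0.07, 100_000)  # far above 1.96: highly significant
print(round(z_small, 2), round(z_large, 2))
```

With 100 customers the lift is indistinguishable from noise; with 100,000 the same lift is overwhelming evidence that the model beats random selection.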
As modelers (or model project managers), we have to manage customer expectations up front, and as consumers of models we have to know what to expect. A propensity model, for example, takes a large audience and pours it through a funnel with a narrow opening, passing only the customers with a high propensity for taking action X (e.g., purchasing). For more about this, see Gold Prospecting and Propensity Modeling.
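The funnel idea amounts to scoring everyone and keeping only the narrow top slice. A toy sketch (the synthetic scores and cutoff are assumptions chosen purely for illustration):

```python
# Hypothetical audience with synthetic propensity scores in [0, 1).
audience = [{"id": i, "score": (i * 37 % 100) / 100} for i in range(10_000)]

cutoff = 0.95  # only the highest-propensity customers pass the funnel
shortlist = [c for c in audience if c["score"] >= cutoff]

print(len(audience), "->", len(shortlist))  # a large audience becomes a short call list
```

The business value lies in that reduction: the sales team works a few hundred likely buyers instead of cold-calling ten thousand.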
With the myriad of modeling tools and techniques available, people generally assume they need a model to solve their business problems. There is an epidemic of enthusiasm for machine learning algorithms, and buzzwords like supervised and unsupervised learning are bouncing around corporate boardrooms and marketing departments. We, the analysts, have to ensure that a model is the appropriate analytic approach, and recognize that the ability to predict the future, crystal ball or not, has limitations.