Jul 02, 2015

Predictive modeling does not lie solely in the domain of Big Data Analytics or Data Science. I am sure that there are a few “data scientists” who think they invented predictive modeling. However, predictive modeling has existed for a long time, at least since World War II. In simple terms, a predictive model is a model with some predictive power. I will elaborate on this later.

I have been building predictive models since 1990. Doing the math (2015 – 1990 = 25 years), I have been engaged in the predictive modeling business longer than data science has been around. My first book on the subject, “Fundamentals of Combat Modeling” (2007), predates the “Data Science” of 2009 (see below).

It is really a trick question. The term was first used in 1997 by C. F. Jeff Wu. In his inaugural lecture for the H. C. Carver Chair in Statistics at the University of Michigan, Professor Wu (currently at the Georgia Institute of Technology) calls for statistics to be renamed data science and for Statisticians to be renamed Data Scientists. That idea did not land on solid ground, but the topic reemerged in 2001 when William S. Cleveland published “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” But it was really not until 2009 that data science gained any significant following and that is also the year that Troy Sadkowsky created the data scientists group on LinkedIn as a companion to his website, datascientists.com (which later became datascientists.net). [1]

It is not a field of statistics! Yes, we do predictive modeling in statistics, but it is really a multidisciplinary field and is based more in mathematics than in other fields. Now, if you consult the most authoritative source of factual information available to the world, *Wikipedia*, you will find an incorrect view of predictive modeling (of course, I do not believe what I said about Wikipedia). It was formed by people with too much time on their hands and too little exposure to other disciplines, such as physics and mathematics.

Predictive modeling may have begun as early as World War II in the planning of Operation Overlord, the Normandy Invasion, but was certainly used in determining air defenses and bombing raid sizes (it may have appeared as early as 1840 [2]). Now, this is not an article about the history of operations research, so suffice it to say that the modern field of operational research arose during World War II. In the World War II era, operational research was defined as “a scientific method of providing executive departments with a quantitative basis for decisions regarding the operations under their control.”[3]

The answer is easy: a model with some predictive power. I say that with caution and use the word “some” because, more often than not, decision makers think that these models are absolute. Of course, they become very disappointed when events do not unfold as predicted. Rather than expand on my simplistic definition, I think some examples may help.

The taxonomy of predictive models represented here is neither exhaustive nor exclusive. In other words, there are other ways to classify predictive models, but here are some:

A time series model is a statistical model based on time-series data. It uses “smoothing” techniques to account for things like seasonality in predicting or forecasting what may happen in the near future.
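To make “smoothing” concrete, here is a minimal sketch of simple exponential smoothing, the most basic technique of this family. It is an illustration only: the function names and the demand figures are made up for this example, and simple exponential smoothing by itself does not model seasonality (extensions such as Holt-Winters add that).

```python
def simple_exponential_smoothing(series, alpha):
    """Return smoothed values s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [float(series[0])]  # start from the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def forecast_next(series, alpha):
    """One-step-ahead forecast: the most recent smoothed value."""
    return simple_exponential_smoothing(series, alpha)[-1]


# Hypothetical monthly demand figures, invented for illustration.
demand = [10.0, 12.0, 11.0, 13.0]
print(simple_exponential_smoothing(demand, alpha=0.5))
print(forecast_next(demand, alpha=0.5))
```

A larger `alpha` makes the forecast react faster to recent observations; a smaller one smooths more heavily.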

Time series models are technically regression models, although machine learning algorithms, such as artificial neural networks, have recently been employed in time series analysis. By regression models, I mean the logistic regression models used in propensity modeling, as well as linear regression models, robust regression models, and so on. These models are based on data.
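As a rough illustration of a propensity model, the sketch below fits a one-variable logistic regression by plain gradient descent. Everything in it is an assumption made for the example: the data are invented, the names `fit_logistic` and `propensity` are hypothetical, and in practice one would use a statistics package rather than hand-written optimization.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus outcome
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def propensity(x, w, b):
    """Estimated probability that a case with predictor value x responds."""
    return sigmoid(w * x + b)


# Invented data: number of prior purchases vs. whether the customer responded.
prior_purchases = [0, 1, 2, 3, 4, 5]
responded = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic(prior_purchases, responded)
print(propensity(0, w, b), propensity(5, w, b))
```

The output is a probability, not a certainty, which is exactly the “some predictive power” caveat above.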

These models are based on physical phenomena. They include 6-DoF (Degrees of Freedom) flight models, space flight models, missile models and combat attrition models (based on physical properties of munitions and equipment).

Machine learning models include artificial neural networks (ANNs), support vector machines, classification trees, random forests, etc. These are based on data, but unlike statistical models they “learn” from the data.

These are forecasting models based on data, but the amount of data, the short prediction windows, and the physical phenomena involved make them much different from statistical forecasting models.

Mathematical models are usually restricted to continuous-time models based on differential equations or approximated using difference equations. They are often used to model very precise processes, like the dynamics of solid-fuel rockets, or to approximate physical phenomena in the absence of actual data, like approximating attrition coefficients or direct-fire effects in combat models.
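As one hedged example of a differential-equation model approximated with difference equations, the sketch below steps Lanchester's square law, a classic aimed-fire attrition model, forward in time with Euler's method. Lanchester's law is my choice of illustration here, not a model described above, and the force sizes and attrition coefficients are invented.

```python
def lanchester_square(red, blue, red_kill_rate, blue_kill_rate,
                      dt=0.01, max_time=100.0):
    """Euler difference approximation of Lanchester's square law:

        dR/dt = -blue_kill_rate * B
        dB/dt = -red_kill_rate  * R

    Returns the surviving red strength, surviving blue strength, and elapsed time.
    """
    t = 0.0
    while red > 0.0 and blue > 0.0 and t < max_time:
        red_losses = blue_kill_rate * blue * dt   # losses inflicted by blue
        blue_losses = red_kill_rate * red * dt    # losses inflicted by red
        red = max(red - red_losses, 0.0)          # strength cannot go negative
        blue = max(blue - blue_losses, 0.0)
        t += dt
    return red, blue, t


# Hypothetical engagement: 100 red units vs. 80 blue units, equal coefficients.
red_left, blue_left, elapsed = lanchester_square(100.0, 80.0, 0.05, 0.05)
print(red_left, blue_left, elapsed)
```

With equal coefficients, the continuous model predicts about sqrt(100² − 80²) = 60 red survivors; the difference-equation version lands close to that, and a smaller `dt` brings it closer.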

The first two examples, time series and regression models, are statistical models. However, I list mathematical models separately because many do not realize that statistical models are themselves mathematical models, based on mathematical statistics. Things like means and standard deviations come from statistical moments, which can be derived from moment generating functions. Every statistic in Statistics is based on a mathematical function.
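For readers who want to see the connection between moments and moment generating functions, here is a short worked sketch. It assumes the moment generating function $M_X(t)$ exists in a neighborhood of $t = 0$, and it uses the normal distribution purely as an example.

```latex
\begin{align}
M_X(t) &= \mathbb{E}\left[e^{tX}\right], \\
\mathbb{E}\left[X^n\right] &= \left.\frac{d^n M_X}{dt^n}\right|_{t=0}, \\
\mu &= M_X'(0), \qquad \sigma^2 = M_X''(0) - \left(M_X'(0)\right)^2 .
\end{align}

% Example: X ~ Normal(\mu, \sigma^2)
\begin{align}
M_X(t) &= \exp\!\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right), \\
M_X'(t) &= \left(\mu + \sigma^2 t\right) M_X(t)
  \;\Rightarrow\; M_X'(0) = \mu, \\
M_X''(t) &= \left[\sigma^2 + \left(\mu + \sigma^2 t\right)^2\right] M_X(t)
  \;\Rightarrow\; M_X''(0) = \sigma^2 + \mu^2 .
\end{align}
```

The variance is then $(\sigma^2 + \mu^2) - \mu^2 = \sigma^2$, and the standard deviation is its square root.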

1. Press, G., “A Very Short History Of Data Science”, Forbes, May 28, 2013 @ 7:09 AM. Retrieved 05-29-2015.
2. Bridgman, P. W., The Logic of Modern Physics, The Macmillan Company, New York, 1927.
3. Operational Research in the British Army 1939–1945, October 1947, Report C67/3/4/48, UK National Archives file WO291/1301. Quoted on the dust jacket of: Morse, Philip M., and Kimball, George E., Methods of Operations Research, 1st edition revised, MIT Press & J. Wiley, 5th printing, 1954.

Article written by Jeffrey Strickland
