Big data has always been there, and we have a long history together. Data is all around us and always has been; what has changed is how well we capture and read it. In addition, technology now lets us store data in relatively standardized structures, which facilitates and accelerates the processing of the data captured every day.
The challenge is knowing how to "mix" the data that we have access to so we can draw conclusions. It may be similar to how a chef prepares a dish.
A chef has access to a myriad of ingredients but does not use them all at the same time. They choose only the ingredients that are useful and that influence the others, either because they enhance the flavor or intensity of another food or because they bring out certain foods over others. After this selection, they proceed to prepare and cook the ingredients into a final product. It is the quantity of each ingredient, rather than the mere pairing between them, that impacts the result.
The same is true of big data: we have access to almost everything, but we should think big and make small, calculated decisions.
The first step is done with intuition. Select the data sets that you believe influence your study's objective.
Second, analyze how the selected data sets influence each other and measure the correlation between them. Although correlation does not indicate causality, it tells us how the data sets behave in relation to one another and how strong that relationship is. Two metrics that show no related behavior are unlikely to depend on each other.
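As a minimal sketch of this second step, the Pearson correlation coefficient can be computed by hand in Python. The function name and the three metrics below are hypothetical, invented only for illustration; the coefficient runs from -1 to 1, and values near 0 suggest no linear relationship.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each observation from its own mean.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dev_x, dev_y))
    denominator = sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    if denominator == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return numerator / denominator


# Hypothetical metrics, invented for illustration only.
metric_a = [1, 2, 3, 4, 5]
metric_b = [2.1, 3.9, 6.2, 8.0, 9.8]  # rises together with metric_a
metric_c = [5, 1, 4, 1, 5]            # no linear relation to metric_a

r_related = pearson_correlation(metric_a, metric_b)
r_unrelated = pearson_correlation(metric_a, metric_c)
print(f"a vs b: {r_related:.3f}")
print(f"a vs c: {r_unrelated:.3f}")
```

A coefficient close to 1 or -1 marks a pair worth carrying into the next step, while a pair close to 0 is a candidate to drop from the study.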
Third, we need to evaluate the dependence of the data. A simple comparison is a store owner saying, "When there is bad weather, I have fewer visits to my store." The variables of weather and store visits have a certain linear relationship. But are visits and weather dependent, and to what extent? Answering this question leads us to another measure, the covariance. Covariance describes the dependence between two random variables: it shows how they change together.
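The store owner's observation can be sketched in Python as a sample covariance. The rainfall and visit figures below are hypothetical, as is the function name; this is an illustration under those assumptions, not real store data.

```python
def sample_covariance(xs, ys):
    """Return the sample covariance (n - 1 denominator) of two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations, then average over n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical daily observations, invented for illustration only.
rain_mm = [0, 2, 4, 6, 8]         # millimetres of rain per day
visits = [100, 90, 85, 70, 55]    # store visits on the same days

cov_rain_visits = sample_covariance(rain_mm, visits)
print(f"covariance of rain and visits: {cov_rain_visits}")
```

A negative value means visits tend to fall as rainfall rises. The magnitude depends on the units of both variables, which is one reason the covariance is usually read alongside the correlation.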
We now have a step-by-step process for building a model that can deliver a final result and establish whether a hypothesis, or intuition, is true or not. You've chosen your data and found that the data sets are related, but we can take it to another level. We want to know what influences the achievement of our goals and by how much. That is, with no change in the variable ‘x’, how will the other variables affect the targeted outcome? Is the result worth the effort?
To get this information, we turn to regression, which gives us the most valuable equation of all. On one side of the equation we have the goal to achieve. On the other side we have the set of related variables, each with a measure of its influence on the objective. We can then decide whether changing a certain variable is worth the effort to achieve the goal.
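As a simplified sketch, here is a least-squares regression with a single explanatory variable in Python. A real study would likely use several variables; the data, the function name, and the one-variable form are assumptions made to keep the example short.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs for one variable."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the explanatory variable must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # change in the goal per unit of x
    intercept = mean_y - slope * mean_x  # value of the goal when x is 0
    return intercept, slope


# Hypothetical daily observations, invented for illustration only.
rain_mm = [0, 2, 4, 6, 8]         # explanatory variable
visits = [100, 90, 85, 70, 55]    # the goal we want to explain

intercept, slope = fit_line(rain_mm, visits)
predicted_visits = intercept + slope * 10  # estimate for a 10 mm day
print(f"visits = {intercept:.1f} + {slope:.1f} * rain_mm")
print(f"predicted visits at 10 mm: {predicted_visits:.1f}")
```

The slope is the quantity of interest here: it estimates how much the goal moves for each unit of change in the variable, which is what lets us judge whether the effort is worth the result.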
We have access to big data, which empowers us to make the next right decision and to accomplish the targeted outcome, all while evaluating its impact and importance. Don't forget your coat, and bon appétit.