Data Visualization - The Rosetta Stone of Data Science

With the growing interest in Big Data, a closely related field - data visualization - is also attracting more attention. Visualization is often perceived as the interface that translates complex analytical models into insights that end-users understand. Feature-rich interactive dashboards are seeing wider adoption in the corporate sector because they make the outcomes of complex analyses accessible and palatable to business users, who rely on the outcomes of statistical analysis rather than on the analysis itself.

For example, a business user is interested in broad topics such as customer segmentation, demand prediction or micro-targeting. Sophisticated algorithms and analytical models, operating on the available data, produce insights that can influence business outcomes. Visualization abstracts away the complexity and delivers insights that the business user appreciates and understands. In that sense, visualization is evolving into the “Rosetta Stone” of Data Science.

Data visualization has always been a part of data science and analytics projects. Diverse visualization tools and graphical models have been employed at various stages of the data analysis activity chain.

During the initial, exploratory phase of an analytics project, a data scientist uses visualization models and graphs to gain a quick overview of the data and identify broader patterns or underlying characteristics. The visualizations employed at this stage allow the analyst to connect multiple data sources, correlate across datasets, quickly spot outliers and average values, and identify key influencers (variables) for further analysis. As the person who consumes these visualizations is usually the same data scientist who performs the analysis, much rigor in data presentation or formatting is usually not necessary.
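As a rough illustration of such an exploratory "quick look", the sketch below summarizes one numeric variable, flags outliers and prints an unpolished text histogram. It is a minimal example under stated assumptions, not a prescribed workflow: the function name `quick_overview`, the sample `orders` data and the 1.5 × IQR outlier rule are all choices made here for illustration and do not come from the article.

```python
import statistics


def quick_overview(values, bins=5, width=30):
    """Return mean, median, IQR-rule outliers, and a rough text histogram."""
    # Quartiles and the common 1.5 * IQR fences (an assumed outlier rule).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]

    # Equal-width bins between the minimum and maximum value.
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1  # fall back to 1 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)
        counts[idx] += 1

    # One line per bin: left bin edge, then a bar scaled to the fullest bin.
    peak = max(counts)
    lines = [
        f"{lo + i * step:8.1f} | " + "#" * round(width * c / peak)
        for i, c in enumerate(counts)
    ]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "outliers": outliers,
        "histogram": "\n".join(lines),
    }


# Hypothetical daily order counts; the last value is a deliberate anomaly.
orders = [12, 15, 14, 13, 16, 15, 14, 13, 15, 95]
summary = quick_overview(orders)
print(summary["histogram"])
print("mean:", summary["mean"], "median:", summary["median"])
print("outliers:", summary["outliers"])
```

Nothing here is formatted for presentation, which matches the point above: at this stage the output only has to be readable by the analyst who produced it. Even so, the gap between mean and median and the single flagged value already hint at where to look next.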

At the later stages, however, the data scientist may use other visualization techniques to present the data cogently in a form that business users appreciate; these users rely on the outcomes of the analyses and visuals for decision-making, but may not want to concern themselves with the analysis per se. At this stage, data visualizations are used as a means to communicate the insights and results gleaned from analysis in a format that maps them to actionable business outcomes.

It can, therefore, be inferred that visualization plays a central part in data science. Historically, visualization techniques were restricted to simple 2-dimensional graphs that presented data drawn from a limited number of sources and analyzed with simple statistical models.

However, with today's demands for extracting intelligence from disparate datasets and for near real-time data processing, the time window available for analysis is getting smaller. This requires that companies also give business users the tools to derive insights that are meaningful to them from the data at their disposal.

Today, it is necessary far more often than in the past to present data in forms such as geo-visualizations or heat maps. Often, the visualization tools are also expected to offer interactive functionality that allows the end-user to manipulate or filter the data, or to plug in additional data sources as required.

This means that analytical software vendors have to abstract complex statistical tools and workflows, while retaining the rigor of algorithms and mathematical models. Correspondingly, they have to support a variety of data formats as well as offer more tooling flexibility, functional capabilities and visualization repertoire than was historically sufficient. Therefore, many analytics software vendors match these expectations by embedding and abstracting complex data analysis and visualization workflows in their products.

Visualizations, especially in Big Data, are becoming increasingly complex. The growing amount of unstructured data is a major challenge for presentation, and with large datasets it is easy to lose track of what a visualization means.

The question that a data scientist should ask is not just "What do I want to say?" but also "Do I have to say anything at all?"

The wide range of visualization tools and forms of representation offers the data scientist many opportunities to derive insights from a data set. These insights, however, can easily be taken out of context and misunderstood, because people are quick to believe pictures. A data scientist should, therefore, be conscious of this responsibility and not abuse it.

As a consumer of insights, one should always explore the data sources, quality of available data, analysis performed and the relationships that led to the insights presented.

But one thing is clear: visualizations are becoming more and more important in the daily work of many companies. Innovation in data visualization will continue, and it makes sense to watch this exciting space closely.

Article written by Mithun Sridharan