The insatiable demand for data continues unabated. We want to gain deeper insights into market trends, customers, competitors and our business performance, but many companies are not making the progress they anticipated. And the promise of big data analytics remains largely out of their reach.
Why? Because most companies still don’t take a strategic approach to data integration. It’s laborious and time-consuming. It’s costly. And most cannot see the direct impact it has on driving business objectives while supporting risk management initiatives for governance, regulatory and compliance (GRC) requirements.
If anything, data integration has become more complex as the sources of data have exploded. Companies are not only collecting and retaining more data; multinationals also hold data in many countries that they struggle to integrate, manage and analyze. Moreover, companies are sharing more information with trading and supply chain partners than ever before.
Much of this data is beyond the structured transactional variety in conventional systems and databases. In fact, unstructured data – from spreadsheets and documents to Web pages and social shares – is growing much faster than structured data. More companies are recognizing that this data represents a trove of knowledge that has largely gone untapped because it has been hidden in user and departmental data silos across the enterprise.
In the software-driven economy, people expect unfettered access to data 24/7. And they are increasingly accessing this data with a mobile device. As the pace of business accelerates, companies are under increasing pressure to ensure that the right users have access to the right data in the right format – at the point of decision.
The “3 Vs” of big data are often given as volume, variety and velocity. However, I believe the true 3 Vs are validity, veracity and value. That’s because all of this data is of little use if it’s not integrated. Inadequate data governance has resulted in data sprawl, with incomplete or inaccurate data sets driving flawed assumptions and multiple versions of models that undermine data-driven decision-making. After all, bad data at the speed of light is still bad data.
As much of this data becomes localized, it is more difficult to manage. Equipping users with a desktop data visualization tool and calling it self-service BI/analytics often disappoints both IT and business managers. Users get bogged down trying to integrate data from different sources to prepare for analysis rather than gaining the hoped-for insights. Studies show that despite the panoply of newer technologies, enterprises typically spend up to 80% of their time in business intelligence projects preparing the data for analysis.
We also see these data silos exploited by cybercriminals. Sensitive information is exposed and/or stolen, leaving companies to face GRC violations, fines and reputational damage.
Part of what’s made data integration so cumbersome and costly is its reliance on data warehousing and extract, transform and load (ETL) processes. Moving data is always a challenge, and the old hand-coded cube methodologies that let IT determine the data sets users should be working with are outmoded.
Newer integration technologies that support data migration, app consolidation, data quality and profiling, and master and metadata management go beyond the traditional ETL functionality. These tools automate much of the cleansing, matching, error handling and performance monitoring – processes that IT teams often struggle with manually. They allow teams to implement a standardized approach to integrating diverse data sets, including those from SaaS applications and IaaS or PaaS clouds.
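To make the cleansing, matching and error-handling steps concrete, here is a minimal sketch in Python. It is not how any particular integration product works; the `Customer` type, the `cleanse` and `integrate` functions, the sample records and the choice of email as the match key are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    email: str


def cleanse(record: dict) -> Customer:
    """Normalize whitespace and case so equivalent values compare equal."""
    name = " ".join(record["name"].split()).title()
    email = record["email"].strip().lower()
    if "@" not in email:
        raise ValueError(f"invalid email: {record['email']!r}")
    return Customer(name=name, email=email)


def integrate(sources: list[list[dict]]) -> tuple[list[Customer], list[dict]]:
    """Merge records from several sources, matching on the cleansed email."""
    matched: dict[str, Customer] = {}
    rejected: list[dict] = []
    for source in sources:
        for record in source:
            try:
                customer = cleanse(record)
            except (KeyError, ValueError):
                # Error handling: set the bad record aside instead of aborting.
                rejected.append(record)
                continue
            # Matching: the first record seen for an email wins (an assumption).
            matched.setdefault(customer.email, customer)
    return list(matched.values()), rejected


# Hypothetical records from two silos describing the same person differently.
crm = [{"name": "  ada   lovelace ", "email": "Ada@Example.com "}]
billing = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Grace Hopper", "email": "not-an-email"},
]
customers, rejected = integrate([crm, billing])
```

In this sketch the two Ada Lovelace records collapse into one customer, and the record with the malformed email is routed to a reject list for review. Commercial tools apply far richer rules, but the shape of the work is the same.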
Data integration is not a one-size-fits-all approach. It’s important for IT teams to make sure they’re using the right tool for the job. For example, bulk processes may be effective for a modeler working with large data sets that lack update times. In contrast, data virtualization may be appropriate for high-availability latency-sensitive transactional systems such as high-frequency trading environments.
Modern data integration tools can handle batch projects or interoperate with real-time analytics applications. And newer tools allow this integration to occur in a data lake, eliminating the need to move the data. Some refer to this process as extract, load and transform (ELT).
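The difference in ordering between ETL and ELT can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the data lake or warehouse, and the table names (`raw_orders`, `orders`) and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract and load: raw rows land in a staging table exactly as received.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
raw_rows = [
    ("1001", " 19.99", "us"),
    ("1002", "5.00 ", "US"),
    ("1003", "12.50", "de"),
]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: trimming, typing and case normalization run as SQL inside
# the store, so the data is not moved out to a separate ETL server first.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER)  AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(TRIM(country))       AS country
    FROM raw_orders
    """
)

# Analysts query the transformed table; the raw table remains for audit.
totals = conn.execute(
    "SELECT country, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY country ORDER BY country"
).fetchall()
```

Keeping the untouched `raw_orders` table alongside the cleaned `orders` table is one reason teams choose this ordering: the transformation can be revised and rerun without extracting the source data again.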
For manageability, it’s important to keep the number of integration tools to a minimum. The right number will largely be a function of user profiles, project criteria and the types of data users are working with. At the same time, it’s critical that these tools interoperate seamlessly to achieve the desired data efficiency.
Modern data integration enables IT to be more responsive to business users and strategic initiatives. These tools help IT ensure that the data users access are complete, current, consistent and accurate.
Additionally, modern data integration allows IT teams to manage data more effectively at reduced costs. They become more productive by spending less time on writing specialized scripts and more time on getting people the information they need – where and when they need it. It also makes it easier for data teams to collaborate with compliance and security teams to ensure policy adherence and resilience in the event of a cyberattack.