Data is today’s gold, representing huge potential value for businesses. By analyzing large volumes of data, you can discover patterns that help you work more efficiently, make smarter decisions, innovate products, and reduce operational liabilities like cost and risk. By putting hidden or underused data to work on a painful problem, you can transform your business or disrupt an industry.
Organizations use their data to increase revenue and better serve their customers, according to Experian’s 2017 Global Data Management Benchmark Report. If you’re wondering how to use or improve the data you have to benefit your business, you’re not alone.
Yael Garten, LinkedIn’s director of data science, said, “People often ask me, ‘When is the right time to start thinking about data?’ It’s never too early.” Thinking about data early might inform the data you’re collecting, the decisions you’re making, and how you’re setting up your organization, she said.
Benefits of Data
When you gather data, structure it for meaningful use, and make it easily available, here are a few of the benefits you can realize:
- Get answers quickly so you can make decisions and solve problems faster and smarter.
- Break down silos among internal departments.
- Discover hidden opportunities for efficiency and cost savings.
- Identify trends that could impact your market or competitive advantage.
- Better understand your customers to sharpen your sales and marketing programs.
- Innovate new products.
- Accelerate research and product development.
- Recruit great-fit talent faster and more successfully.
But there’s a catch. To achieve these outcomes, data must be uncovered, clean, and structured. Even in today’s data-rich environment, getting clean data is a challenge because it requires sharing information accurately across devices and formats that are not readily integrated. Most systems also require manual data entry, which can introduce human error when the process isn’t managed closely.
The following are three kinds of data that do more harm than good.
1. Dirty Data
In IoT, dirty data is particularly costly. James Branigan, founder of Bright Wolf, developed a software platform that centralizes data collection and management for millions of devices, such as temperature sensors on refrigerated trucks, so enterprise and military IoT systems can dispatch just-in-time resources and optimize operations.
“Dirty data is well-formed data reported by your devices that is invalid in some way, so it doesn’t immediately get flagged as garbage,” said Branigan. Many heavy-machinery companies use IoT for predictive maintenance – dispatching a service technician automatically when a critical part must be replaced on a fleet vehicle based on total mileage, for example.
“With these automated systems,” he said, “dirty data can cause you real harm – where you’re starting to incur real economic cost because these automated actions are being kicked off by data that’s not valid.”
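To make the distinction concrete, here is a minimal sketch of how a system might screen well-formed readings for invalid values before they trigger an automated action. It is only an illustration, not Bright Wolf's method: the `Reading` fields, the temperature limits, and the `find_problems` helper are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed plausible range for a refrigerated-truck sensor, in Celsius.
# These limits are hypothetical and would come from the device spec.
MIN_TEMP_C = -40.0
MAX_TEMP_C = 60.0


@dataclass
class Reading:
    """A well-formed sensor report (hypothetical fields)."""
    device_id: str
    temperature_c: float
    odometer_km: float


def find_problems(reading: Reading, previous: Optional[Reading] = None) -> List[str]:
    """Return the reasons a reading looks invalid (empty list if none)."""
    problems = []
    # Written as "not (low <= x <= high)" so that NaN is also flagged.
    if not (MIN_TEMP_C <= reading.temperature_c <= MAX_TEMP_C):
        problems.append("temperature out of plausible range")
    if reading.odometer_km < 0:
        problems.append("negative odometer")
    # Total mileage should never decrease between consecutive reports.
    if previous is not None and reading.odometer_km < previous.odometer_km:
        problems.append("odometer went backwards")
    return problems


previous = Reading("truck-17", 3.5, 120_000.0)
current = Reading("truck-17", 850.0, 90_000.0)  # parses fine, but is not believable
problems = find_problems(current, previous)
print(problems)
```

A reading that returns a non-empty list could be held for review instead of kicking off a service dispatch.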
2. Dark Data
The stakes are just as high for companies that are looking to mine the gold within their own data, most of which is as simple as information about supply chains, sales, customers, and services. Just 18% of U.S. organizations have an advanced, or optimized, level of data quality, says the Experian study.
That lack of optimization in data collection and management is why most businesses have dark data: data they collect, process, and store as part of day-to-day business activities but do not use for any other purpose.
This disconnect gave rise to the chief data scientist role and led IT services firm Cognizant to name “data detective” one of the 21 occupations of the future. The person in this role, says Cognizant, would “generate meaningful business answers and recommendations from the investigation of data” generated from multiple endpoints.
3. Unstructured Data
Sometimes, data is available but isn’t prepared for use. It must be enriched in some way to make it compatible with the system that must consume the data.
For example, the hundreds of hours of video captured by autonomous vehicles must be broken down, often frame by frame, to prepare the data that machine learning algorithms will use to “teach” the autonomous system to perform functions such as recognizing objects like trucks or street signs. That process transforms unstructured data into structured data that can be used to build machine learning models.
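As a rough illustration of that transformation, the sketch below flattens per-frame labels into structured rows that a training pipeline could consume. The annotation layout, the `Detection` record, and the `to_rows` helper are assumptions made for this example; real labeling tools use their own formats.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class Detection:
    """One labeled object in one video frame (hypothetical schema)."""
    frame_index: int
    label: str
    box: Box


def to_rows(annotations: Dict[int, List[Tuple[str, Box]]]) -> List[Detection]:
    """Flatten {frame_index: [(label, box), ...]} into one row per object."""
    rows = []
    for frame_index, objects in sorted(annotations.items()):
        for label, box in objects:
            rows.append(Detection(frame_index, label, box))
    return rows


# Hand-written labels for two frames, standing in for a labeling team's output.
annotations = {
    0: [("truck", (40, 60, 200, 120))],
    1: [("truck", (44, 60, 200, 120)), ("street sign", (300, 20, 30, 30))],
}
rows = to_rows(annotations)
for row in rows:
    print(row)
```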
The Bottom Line
Businesses across a wide range of industries, from human resources and public safety to global health and economic development, are looking at ways data can improve their services, communication, and outcomes. Now is a great time to think about the data you have and how you might use it to improve your customer experience, generate revenue, work smarter and more efficiently, solve a painful problem for your business, or disrupt an entire industry.
The next article in this series will explore the extent to which we trust data, dirty data and some of its causes, and common data quality issues.
This article originally appeared in IoT for All on December 18, 2017. It is the first in a series of three articles about dirty data.