It may surprise you to know that the foundations of computing and artificial intelligence (AI) were being laid as early as the 1940s. By the early 1990s, scientific papers had already shown that neural networks are universal function approximators.
That's a fancy way of saying that neural networks have long had the potential to learn virtually any relationship between variables, such as the relationship between pixels and what's in a photograph (image recognition), between the words in a phrase and its meaning (natural language processing), or between past, present, and future values (time-series forecasting).
But for all that potential and theoretical knowledge, why is now the right time for AI to emerge as an essential force for innovation? There are three profound changes, long in the making, that have brought us to this point: advances in computing power, improvements in algorithms, and an abundance of data.
Indeed, the AI spring has arrived. And those who are most prepared will realize its benefits. Rob Thomas, IBM’s general manager of data and Watson AI, said in Forbes: “These days, organizations with no AI strategy are like businesses in 2000 that had no internet strategy, or those in 2010 that had no mobile strategy.”
Data Makes or Breaks Machine Learning
To have an AI strategy, you need a data strategy. That’s important because useful data is very hard to come by, and the difference between quality data and bad or insufficient data can be the difference between an astonishing result and a complete waste of time and resources.
This is why we see partnering with CloudFactory as such an important endeavor. Yes, there is a lot of data out there, but for all the talk about big data, most of it has serious problems. CloudFactory has a decade of experience with quality data labeling for machine learning and core data processes, and its team understands that scaling training data without sacrificing quality requires a strategic combination of people, technology, and processes.
Quality Data Increases Signal and Reduces Noise
People often ask me, “What accuracy can you get on this data project?” and I always say, “You cannot know before you look at the dataset.” Your accuracy depends on your model, your computational power, and your data. And often, data is the biggest challenge when you are beginning an AI project. You can keep trying different models and throwing more computational power at the problem from the coziness of your lab, but getting better data can be much harder.
The reason you can't predict accuracy is that every random variable is composed of signal and noise. If the signal is weak and the noise is strong, the variable is unpredictable by nature. Financial variables are a good example: no matter how much data you have or how good your model is, there is only so much you can learn about what drives them.
If the signal is strong and the noise is weak, you can eventually reach high accuracy because the relationship you're looking for can be picked up from the data. Image recognition is a good example: from most pictures you can eventually determine what's in them, even though recognizing something purely from pixel values is incredibly hard.
And this is why quality data matters. It increases the signal and reduces the noise, allowing for better performance as well as cheaper and faster training.
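To make the signal-versus-noise point concrete, here is a minimal sketch, not taken from any real project and with made-up numbers, showing the same model hitting very different accuracy ceilings depending only on how noisy the target variable is:

```python
# Hypothetical illustration of the signal-vs-noise ceiling: the same model,
# trained on the same amount of data, tops out at very different accuracies
# depending on how noisy the target is.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-3, 3, size=(n, 5))
signal = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2      # the "true" relationship

for noise_level in (0.1, 2.0):                      # weak vs. strong noise
    y = signal + rng.normal(0, noise_level, size=n)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    print(f"noise={noise_level}: R^2 = {r2_score(y_te, model.predict(X_te)):.2f}")
    # With weak noise the model recovers most of the signal; with strong
    # noise, no amount of modeling effort can push R^2 much higher.
```

Cleaner, better-labeled data shifts a problem toward the weak-noise case, which is exactly why it pays off in both accuracy and training cost.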
A Great Idea With Data Problems Is Lost
A few months ago we met an analytics startup that was developing a product for a Fortune 500 retail company. They wanted to classify products by type, using only the product descriptions, to determine the in-store placement that would maximize profits.
The issue was that they only had a few dozen products labeled, and they wanted to create an algorithm to label tens of thousands. The startup asked us for help because the models they were using were not giving the expected results. We had to explain that the problem was not the science; the problem was the data.
Unfortunately, their project was over budget and past its delivery date, so it was scrapped. It was another sad example of what happens when you don't invest up front in understanding your data requirements and how to meet them. In this case, the lack of labeled data meant we couldn't teach our AI algorithms how to classify the items. For a Fortune 500 retailer, such a missed opportunity could amount to tens or even hundreds of millions of dollars in opportunity cost.
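This is also the kind of risk a simple up-front check can expose. The sketch below is not the startup's actual pipeline; it uses the public 20 Newsgroups corpus as a stand-in for product descriptions and a basic TF-IDF text classifier, just to show how a learning curve reveals whether a few dozen labels can realistically support the task:

```python
# Hypothetical learning-curve check: how does a simple text classifier
# behave with a few dozen labels versus a few thousand?
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Stand-in data: three newsgroup topics play the role of product categories.
data = fetch_20newsgroups(subset="train",
                          categories=["rec.autos", "sci.med", "comp.graphics"])
texts, labels = np.array(data.data), data.target

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

for n_labeled in (30, 300, 1500):            # a few dozen vs. thousands of labels
    idx = np.random.default_rng(0).choice(len(texts), size=n_labeled, replace=False)
    scores = cross_val_score(clf, texts[idx], labels[idx], cv=3)
    print(f"{n_labeled:5d} labeled examples -> accuracy {scores.mean():.2f}")
# If accuracy is still climbing steeply as labels are added, that is a strong
# signal to invest in labeling before investing in fancier models.
```

Running a check like this before committing budget and deadlines makes the labeled-data requirement visible while it is still cheap to fix.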
The Potential and Benefits of Machine Learning
Today, many organizations see the incredible potential of machine learning. In a competitive global economic environment, not exploiting this potential may mean disaster, but embarking on the AI journey without doing all the homework can be just as bad. If the project is not structured correctly, there will be trouble.
Different tasks require different amounts of data and computing time, but the results can be well worth it. An Accenture analysis indicates that between 2018 and 2022, banks that invest in AI and human-machine collaboration at the same rate as top-performing businesses could boost their revenue by an average of 34% and their employment levels by 14%.
We have seen machine learning result in 80% improved accuracy for demand planning, 27% increases in sales, 36% reductions in payroll, 18% reductions in logistics costs, and the list goes on.
The opportunities are definitely there, but knowing how to reap them is the key to success. With improving machine learning models, increased computing resources, and higher-quality data, the opportunities this technology can unlock are enormous. Not taking advantage of them could be ruinous.
Scientia’s mission is to use the frontier of academic knowledge to solve business problems. Learn more about Scientia and our solutions.