Big Problems with Big Data and AI

August 13, 2019
Both big data and AI have done a lot of good. And yet, there are some fundamental problems you should keep in mind in your reporting. (Credit: Pixabay user Wynn Pointaux)

Big data and artificial intelligence are tremendously popular in business. Everybody is using “machine learning” or “deep learning” or “analytics” or “neural networks” or any of the buzzwords that pop up.

This is how companies will transform themselves for the future. At least, that’s what the tech industry and corporations in many other segments of the economy have said for years. In fact, although the term big data was supposedly coined in 2005, I did some checking and found mentions in 2001. At the turn of the decade, people were already beginning to discuss the possibilities.

Both big data and AI have done a lot of good, if you listen to proponents and those who have implemented it. And yet, there are some fundamental problems you should keep in mind in your reporting.

Irreproducible AI

The scientific community has come up against problems that also spell bad news for business. Researchers who use AI have trouble reproducing results, according to Science. Few studies share the code that made them possible, for a variety of reasons. Even if you have the code, it can run differently depending on the data you use to train the software.
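The sensitivity to training data is easy to demonstrate, even without any exotic AI. A minimal sketch (hypothetical data, pure Python, not taken from any of the studies above): the exact same least-squares fitting code, run on two different random samples drawn from the same dataset, produces two different models.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic data with a true relationship y = 2x plus noise."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + rng.gauss(0, 1.0))
            for x in (rng.uniform(0, 10) for _ in range(n))]

def fit_slope(points):
    """Ordinary least-squares slope: covariance over variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cov = sum((x - mx) * (y - my) for x, y in points)
    var = sum((x - mx) ** 2 for x, _ in points)
    return cov / var

data = make_data()

# Identical code, two different random training samples.
sample_a = random.Random(1).sample(data, 40)
sample_b = random.Random(2).sample(data, 40)

print("model A slope:", fit_slope(sample_a))
print("model B slope:", fit_slope(sample_b))
```

Both fits land near the true slope of 2, but they are not the same model, and a more complex system trained on higher-dimensional data can diverge far more. That is the reproducibility problem in miniature: same code, different training data, different answer.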

The results one organization, including a company, gets may not translate into another's findings. Or, if that company changes the AI tools it uses for whatever reason, what seemed to be a trend that would help revolutionize a business practice may suddenly no longer work. That introduces a lot of uncertainty. The results that companies trust may be quirks of a particular software package, or of the data used to train the system.

An example is the trouble that Amazon faced in using an AI recruiting tool. Because the software was trained with previous Amazon hiring decisions, it turned out to have a bias against resumes from women.

P-hacking

If you’ve studied statistics and its use in research, you may have come across the problem of p-hacking. The p-value measures how likely it is that results at least as extreme as the ones observed could have occurred by chance alone; the smaller the p-value, the more statistically significant the result is considered.

P-hacking refers to the practice of digging into a body of data to turn up patterns. You comb through the data and look for whatever seems statistically significant. The problem is that the larger the body of data, the more likely that patterns that are really the work of chance will appear. If you test enough data and enough possible hypotheses (in other words, potential patterns), you’re bound to find something.

Because companies naturally focus on their own data and keep looking for patterns that might help them improve operations, they could well stumble across something that isn’t indicative of anything real. But once the pattern is found, the lightbulb goes off and executives say: here’s a magic tool for improvement.

That isn’t to say the patterns aren’t real. They may be. The way you ultimately tell is by looking in other data for similar patterns. That requires the knowledge and discipline to insist on further examination.

AI and big data can certainly produce good results. They can also send companies down a blind alley. To report on how companies use these tools and techniques, you’ll have to keep all this in mind and press hard to see whether companies are keeping their eyes open.