Big problems with big data and AI

Erik Sherman

August 13, 2019


Big data and artificial intelligence are tremendously popular in business. Everybody is using “machine learning” or “deep learning” or “analytics” or “neural networks” or any of the buzzwords that pop up.

This is how companies will transform themselves for the future. At least, that's what the tech industry and corporations in many other segments of the economy have said for years. In fact, although the term big data was supposedly coined in 2005, I did some checking and found mentions in 2001. Around the turn of the millennium, people were already beginning to discuss the possibilities.

Both big data and AI have done a lot of good, if you listen to proponents and those who have implemented them. And yet, there are some fundamental problems you should keep in mind in your reporting.

Irreproducible AI

The scientific community has run up against problems that also spell bad news for business. Researchers who use AI have trouble reproducing results, according to the journal Science. Few studies share the code behind them, for a variety of reasons. Even with the code in hand, results can differ depending on the data used to train the software.

The results one organization gets, and that includes companies, may not carry over to another organization's findings. Or, if a company changes the AI tools it uses for whatever reason, what seemed to be a trend that would help revolutionize a business practice may suddenly no longer hold up. That introduces a lot of uncertainty. The results companies trust may be quirks of a particular software package, or of the data used to train the system.

An example is the trouble that Amazon faced in using an AI recruiting tool. Because the software was trained with previous Amazon hiring decisions, it turned out to have a bias against resumes from women.

P-hacking

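If you've studied statistics and its use in research, you may have come across the problem of p-hacking. The p-value measures how likely it would be to get results at least as extreme as the ones observed if chance alone were at work. The smaller the p-value, the harder it is to blame the results on chance, and researchers conventionally call a result statistically significant when its p-value falls below 0.05.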
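To make the definition concrete, here is a minimal sketch using a coin-flip example, which is a hypothetical scenario and not from any company's data. It asks: if a coin is fair, how likely is it to land heads at least 60 times in 100 flips? The function name is illustrative, and it computes a one-sided exact binomial p-value using only Python's standard library.

```python
from math import comb


def one_sided_binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Probability of seeing `successes` or more in `trials` attempts,
    assuming the 'nothing is going on' rate p_null is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: a supposedly fair coin lands heads 60 times in 100 flips.
p = one_sided_binomial_p_value(60, 100)
print(f"p-value: {p:.4f}")
```

The result comes out at roughly 0.03, below the conventional 0.05 cutoff, so by that rule of thumb 60 heads would be called "significant" even though a fair coin can produce it by luck.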

P-hacking refers to the practice of digging into a body of data to turn up patterns. You comb through the data and look for anything that seems to be statistically significant. The problem is that the larger the body of data, the more likely it is that some patterns that are really the work of chance will appear. If you test enough data and enough possible hypotheses (in other words, potential patterns), you're bound to find something.
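A short simulation can show why testing many hypotheses guarantees some false hits. This is a minimal sketch under stated assumptions: every "pattern" is a metric with no real effect, modeled as 100 fair coin flips, and it gets flagged as significant when its one-sided p-value falls below 0.05. The function names, the 100-flip sample size, and the fixed seed are hypothetical choices for illustration.

```python
import random
from math import comb


def one_sided_binomial_p_value(successes, trials, p_null=0.5):
    """Chance of `successes` or more in `trials` if only chance is at work."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def count_false_positives(num_hypotheses, trials=100, alpha=0.05, seed=0):
    """Test `num_hypotheses` purely random 'patterns' and count how many
    clear the significance bar anyway."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_hypotheses):
        # Each hypothetical metric is pure noise: 100 fair coin flips.
        heads = sum(rng.random() < 0.5 for _ in range(trials))
        if one_sided_binomial_p_value(heads, trials) < alpha:
            hits += 1
    return hits


for n in (10, 100, 1000):
    print(n, "patterns tested ->", count_false_positives(n), "look 'significant'")
```

None of the simulated patterns is real, yet the count of "significant" findings grows roughly in proportion to the number of patterns tested, at a rate of a few percent.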

Because companies may logically focus on their own data and keep looking for patterns that might help them improve operations, they could well stumble across something that isn't indicative of anything real. But once such a pattern turns up, the lightbulb goes off and executives say, here's a magic tool for improvement.

That isn't to say the patterns aren't real. They may be. The way to ultimately tell is to look in other data for similar patterns. That requires the knowledge and discipline to insist on further examination. AI and big data can certainly produce good results. They can also send companies down a blind alley. To report on how companies use these tools and techniques, you'll have to keep all this in mind and press hard to see whether companies are keeping their eyes open.
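Checking a pattern against other data can also be sketched in a few lines. This minimal simulation assumes purely random data: 200 hypothetical metrics are tested against an outcome on one sample, the strongest correlation is picked as the "discovery", and the same metric is then checked on a fresh sample. The helper names, the sample sizes, and the seed are illustrative assumptions, not a standard method from any particular tool.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def make_dataset(rng, rows, features):
    """Hypothetical data: an outcome plus metrics that have no real link to it."""
    outcome = [rng.gauss(0, 1) for _ in range(rows)]
    metrics = [[rng.gauss(0, 1) for _ in range(rows)] for _ in range(features)]
    return outcome, metrics


rng = random.Random(42)
ROWS, FEATURES = 50, 200

# Discovery: comb one dataset for the metric most correlated with the outcome.
outcome_a, metrics_a = make_dataset(rng, ROWS, FEATURES)
best = max(range(FEATURES), key=lambda j: abs(pearson(metrics_a[j], outcome_a)))
r_discovery = pearson(metrics_a[best], outcome_a)

# Replication: measure the same metric again on a fresh, independent sample.
outcome_b, metrics_b = make_dataset(rng, ROWS, FEATURES)
r_replication = pearson(metrics_b[best], outcome_b)

print(f"metric #{best}: r = {r_discovery:.2f} in discovery data")
print(f"metric #{best}: r = {r_replication:.2f} in fresh data")
```

Because no metric actually drives the outcome, the "best" one looks impressive in the data it was found in and typically fades toward zero in the new sample, which is exactly what a second look at other data is meant to catch.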


Author

Erik Sherman

Erik is an independent journalist and author who primarily covers business, economics, finance, technology, politics, and legal and regulatory affairs, expressing complex subjects elegantly and often incorporating data analysis.
