Business journalists rely on data, whether it is unemployment statistics or company revenue figures. Sometimes that data is difficult to find and even trickier to organize. Here are some tools that business journalists can use to save time while working with data.
Searching for data
Have you ever looked for certain data on a search engine but could not find it? Well, you are not alone. CMSWire reported that people have trouble finding datasets because the datasets are often not formatted in a way that search engines can easily recognize. Google may have a solution.
Google Dataset Search is a search engine from Google that was launched in 2018. It works similarly to Google Scholar and lets you find datasets on publishers' websites, in digital libraries or on personal web pages.
Business journalists can use this tool to find datasets more easily than with a regular search engine. For example, if you put “Apple revenue data” in Google Dataset Search, you will get links to “Apple’s revenue worldwide 2004-2019” or “Apple’s net income.” All the links come from the companies or government agencies that collect and analyze the data.
According to Google, the search engine filters out sources that do not provide data. For that purpose, Google has developed guidelines for dataset providers on how to describe their data so that a search engine can better understand the website content. It is important to indicate who created the data, when it was published, how it was collected and so on.
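To make that concrete, here is a minimal Python sketch of the kind of description a dataset provider might publish. It assumes the guidelines build on the schema.org Dataset vocabulary expressed as JSON-LD, which is our understanding rather than something stated above. The property names come from schema.org; every value is a made-up placeholder, and the helper name to_json_ld is hypothetical.

```python
import json

# Hypothetical dataset description. Property names follow the schema.org
# Dataset vocabulary (an assumption); all values are placeholders.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example quarterly revenue figures",
    "description": "Placeholder description of what the dataset covers.",
    # Who created the data
    "creator": {"@type": "Organization", "name": "Example Statistics Office"},
    # When it was published
    "datePublished": "2019-01-15",
    # How it was collected
    "measurementTechnique": "Compiled from quarterly filings",
}


def to_json_ld(metadata):
    """Serialize a metadata dict as the JSON-LD text a web page could embed."""
    return json.dumps(metadata, indent=2)


print(to_json_ld(dataset))
```

A reporter will rarely write this markup, but knowing it exists helps explain why some datasets show up in Dataset Search and others do not.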
Business journalists can also use a website called Sqoop, which offers a database of companies, SEC filings, federal court records and so on. Sqoop allows its users to set up alerts and be notified when new documents are uploaded. According to its website, journalists from media organizations such as Bloomberg, CNBC and IRE use Sqoop.
OpenCorporates is another website that offers company data drawn from primary public sources. OpenCorporates is not free, however. The monthly or yearly price depends on the amount of data you want to access.
What if you found the data you were looking for, but it is in a PDF with several tables full of numbers? There are a couple of tools that business journalists can use.
Tabula is an open-source tool “created by journalists, for journalists.” It can extract the data into an Excel spreadsheet or a CSV file, and it works on Windows as well as on Mac systems. Tabula’s website says that it is used “to power investigative reporting at news organizations of all sizes, including ProPublica, The Times of London, Foreign Policy, La Nacion, The New York Times and St. Paul (MN) Pioneer Press.”
Datawrapper offers a simple explanation of how to use Tabula. First you open the software, import the PDF and mark where the dataset or table is in the document. Then click “preview & export extracted data.” Datawrapper suggests previewing the data, because sometimes some of the characters are missing, which affects the entire dataset. Another thing to note is that Tabula won’t work with scanned images, only with PDFs that were created from electronic text. But it is a very useful tool for journalists who are working with sensitive data and do not want to upload it to online PDF converters.
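The preview step can be backed up with a quick check after export. The Python sketch below is not part of Tabula or Datawrapper; it is a hypothetical helper (find_suspect_rows) that flags rows in an exported CSV whose cell count differs from the header row or that contain an empty cell. The sample text is invented and stands in for a real export.

```python
import csv
import io


def find_suspect_rows(csv_text):
    """Return (row_number, reason) pairs for rows that look damaged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])  # assumption: the header row has the right width
    problems = []
    # Row numbers count the header as row 1, as a spreadsheet would.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            problems.append((number, f"expected {expected} cells, found {len(row)}"))
        elif any(cell.strip() == "" for cell in row):
            problems.append((number, "empty cell"))
    return problems


# Invented sample standing in for a CSV exported from a PDF table.
sample = "Year,Revenue,Net income\n2017,100,10\n2018,120\n2019,,15\n"

for number, reason in find_suspect_rows(sample):
    print(f"row {number}: {reason}")
```

For the invented sample, the script reports row 3 (a missing cell) and row 4 (an empty cell). A clean report does not prove the extraction is correct, so spot-checking a few numbers against the PDF is still worthwhile.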
If you do not want to extract data from a PDF but just want to find a date, a regular search won’t be useful unless you know which date you are looking for. DocumentCloud is an online platform that can analyze the document, find all the dates and put them on a timeline. It can also provide a list of entities and people mentioned.
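To illustrate the idea of a date timeline, here is a rough Python sketch. It is not how DocumentCloud works internally, which is not described here; it simply assumes dates are written in English like “March 3, 2018” and ignores every other format. The function name build_timeline and the sample text are made up.

```python
import re
from datetime import datetime

# Assumption: dates look like "March 3, 2018". Other formats are ignored.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def build_timeline(text):
    """Return (date, matched text) pairs sorted from earliest to latest."""
    found = []
    for match in DATE_PATTERN.finditer(text):
        try:
            # %B parses full month names; this relies on an English locale.
            parsed = datetime.strptime(match.group(0), "%B %d, %Y").date()
        except ValueError:
            continue  # skip impossible dates such as "February 31, 2018"
        found.append((parsed, match.group(0)))
    return sorted(found)


# Invented sample text standing in for a document.
sample_text = (
    "The company filed on March 3, 2018. An amended filing followed on "
    "January 15, 2019, after a hearing on November 20, 2018."
)

for date, original in build_timeline(sample_text):
    print(date.isoformat(), "-", original)
```

The dates come out in chronological order even though they appear out of order in the text, which is what a timeline view gives you.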
These are just a few of the tools that business journalists can use at every stage of reporting. There are many more websites and applications that can make a journalist’s life easier by saving time and energy.