“Data journalism is a trendy thing… But it is not a new thing”
– Alex Howard, fellow at the Tow Center for Digital Journalism at Columbia Journalism School
In a recent talk, Alex Howard summed up the history behind data journalism and the best practices that data journalists should follow in order to be ethical and successful.
Alex first offered a handy definition of what data journalism is: “Gathering, cleaning, organising, analysing, visualising and publishing data to support the creation of acts of journalism.”
The shorter version of this, according to Alex, is: “Treating data as a source, just like you would do a person.”
According to Alex, data journalism first came into existence through the work of John Snow – a physician born in 1813. John Snow mapped and compared cholera outbreaks with the location of wells:
Thus, “newspapers have been using data for centuries… It’s not a new idea to have data as part of the media or part of the sourcing.”
However, Alex pointed out that there are reasons why we might think of “data journalism” as something entirely new: the tools and the context have changed, with online spreadsheets and wikis, data visualisation tools, code sharing and open source frameworks. But in this field, what are the best practices that data journalists should follow?
The best practices for data journalists
1) Report it out
“If you see things in the data, pick up the phone”. An example that Alex Howard uses for this practice is WikiLeaks data: “You have a lot of great things that you can chart and map, but you also need to validate the data: what is the data telling you?”
This is a great reminder to data journalists that stories still need to be humanised.
2) Show something new
Journalists must ask why their story matters. Alex’s example of a news organisation creating a meaningful story is NPR’s map of fire forecasts, which readers can check against where they live, making it relevant to the audience.
3) Tell a story
“People respond to narratives, not just to a spreadsheet. A lot of the audience just want to know, what is the story here? Why does it matter to me?”
As part of the report, Alex Howard interviewed Anthony DeBarros from USA Today who emphasised that storytelling still matters: “We use these tools to find and tell stories. We use them like we use a telephone. The story is still a thing”.
Alex Howard also pointed out that at some point in the future we will drop the “data” from “data journalism” – “it will just become journalism” – just as we do not talk about “telephone journalism” or “email journalism”.
Another example that Alex uses is from The New York Times:
This was the most popular content on their website last year and was created by Josh Katz and Wilson Andrews. It is a quiz that uses data to locate where you were born in the US. This type of content is “sticky” and engaging, says Alex Howard.
4) Show your work, show your data
“Wherever possible, people want to see the data so that they can look at it and check your work. ProPublica has been a leader in this and it is a practice that other media organisations can and should follow,” according to Alex.
Alex also talked about data journalism start-ups not doing this. For instance, Alex looked at the RSS feed of approximately 290 stories on FiveThirtyEight and around 100 features on the site: only ten of those included the data.
“It enables the really critical audience to go and evaluate whether your methodology was accurate or not.”
5) Share your code
Although Alex admits that public media cannot always publicly share their code, he points to NPR Tech, who share a lot of their code on GitHub.
6) Consider ethics
As journalists, we must weigh ethical questions such as: Is the data clean? Is the data representative? What biases might be hidden in the data? Was the data legally obtained? Does the data contain personally identifiable information?
Looking at how the data was collected is also really important:
- Who gathered the data? How?
- Was it clear how the data was going to be used?
- Can people opt-out of collection or usage?
- “Notice and consent” is not enough
7) Think about data analysis and numeracy
One of the consistent themes that Alex found is that data analysis and numeracy must get better. People need to be able to understand the basics. Examples include:
- N = ?
- Average vs Median
- Statistical significance?
- Correlation != Causation
- Regression to the mean
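To see why the first two items on that list matter, here is a minimal Python sketch. The salary figures are made up purely for illustration and do not come from Alex's talk; the point is only that a single extreme value can drag the average (mean) far away from what a typical person earns, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical annual salaries for a ten-person office (invented numbers).
# Nine people earn under 40,000; one executive earns 900,000.
salaries = [28_000, 30_000, 31_000, 32_000, 33_000,
            34_000, 35_000, 36_000, 38_000, 900_000]

n = len(salaries)        # always report N alongside any summary figure
avg = mean(salaries)     # pulled upwards by the single outlier
mid = median(salaries)   # middle of the sorted values, robust to the outlier

print(f"N = {n}")               # N = 10
print(f"mean = {avg:,.0f}")     # mean = 119,700
print(f"median = {mid:,.0f}")   # median = 33,500
```

Reporting that "the average salary is about 120,000" would be technically true here and still misleading, since nine of the ten people earn less than 40,000. With only ten data points, it is also worth stating N so readers can judge how much weight the figure can bear.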
Alex picks out some examples where journalists are clearly not thinking about statistics:
This is a brilliant example of a visualisation gone wrong. Published by Reuters, it received a lot of negative coverage, and understandably so. “It’s misleading – it’s easy to lie with data,” Alex said about it. Ampp3d fixed it for Reuters in this article and commented: “A pretty universal law of information design is that positive numbers go up, and negative numbers go down. You should only break that rule in the name of artistic expression if you’re very careful not to mislead the reader. Frankly we’d expect a lot better from Reuters.”
Alex also points out that there is a whole website dedicated to misleading visualisations – WtfViz.net – I’ll definitely be taking a look.
8) Present data with context, in context
Journalists must think about presenting the data in context. One example is from The New York Times, which rushed to publish data about Medicare online. The data allowed you to look up a given doctor and see how much they were receiving from Medicare. However, without context, it was very hard to know why they were receiving that amount.
ProPublica followed this up with a more in-depth investigation – thus using more context.
Read the full report
If you have any interest in data journalism, I would fully recommend reading Alex’s full paper, “The Art and Science of Data-Driven Journalism”, which is available here.