James Tozer @J_CD_T Producing data journalism for @TheEconomist. Daft opinions my own. Jun. 07, 2019

THREAD: After months of data scraping and number crunching, this week we’ve published an analysis of whether @Google's news algorithm displays political bias. (1/25)  https://www.economist.com/graphic-detail/2019/06/08/google-rewards-reputable-reporting-not-left-wing-politics 

.@realDonaldTrump has often claimed that the search engine discriminates against right-leaning publications, because so many of the results for searches about "Trump" come from @nytimes and @CNN. (2/25)

To test whether he was right, we created a computer program that could collect the first page of results on Google's news tab (within its main search engine) for any keyword on any day. (3/25)  https://www.google.com/search?q=trump&tbm=nws 
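As a rough illustration of the first step, here is a minimal sketch that builds the news-tab URL for a given keyword, following the pattern of the link above. It only constructs URLs and fetches nothing; the function name and the keyword list are illustrative assumptions, not the program described in the thread.

```python
from urllib.parse import urlencode


def news_tab_url(keyword: str) -> str:
    """Build a Google news-tab search URL for one keyword (hypothetical helper)."""
    # tbm=nws selects the news tab, as in the URL quoted in the thread.
    query = urlencode({"q": keyword, "tbm": "nws"})
    return "https://www.google.com/search?" + query


# Illustrative subset only; the thread does not list all 31 keywords.
keywords = ["trump", "mueller", "liberal"]
urls = [news_tab_url(k) for k in keywords]
```

Restricting results to a particular day would need further query parameters, which are not covered here.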

To make sure that our results weren't being tailored to our personal internet profiles, our computer program used a browser with no history. We operated it from a server based in a swing district in Kansas. (4/25)

For every day in 2018, we scraped the first page of results on Google's news tab, for a selection of 31 keywords across a range of political, economic and newsy topics. That gave us a sample of 175,000 article links. (5/25)

We then picked a broad sample of 37 popular publications - stretching from @DailyKos and @thedailybeast on the far left to @BreitbartNews and InfoWars on the far right - and counted what percentage of these 175,000 links they got. (6/25)
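The counting step can be sketched as follows, assuming each scraped link is a plain URL string and each publication is identified by its domain. The sample links and the helper names are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def link_shares(links, publications):
    """Percentage of all scraped links that went to each tracked publication."""
    counts = Counter(domain_of(u) for u in links)
    total = len(links)
    return {pub: 100.0 * counts[pub] / total for pub in publications}


# Hypothetical sample standing in for the 175,000 scraped links.
sample_links = [
    "https://www.nytimes.com/a",
    "https://www.cnn.com/b",
    "https://www.nytimes.com/c",
    "https://www.foxnews.com/d",
    "https://example.org/e",  # a site outside the tracked publications
]
shares = link_shares(sample_links, ["nytimes.com", "cnn.com", "foxnews.com"])
```

Links to untracked sites stay in the denominator, so the shares of the tracked publications need not sum to 100%.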

To work out whether the results skewed left or right, we needed a measure of where each publication sits on the spectrum. We combined ratings from two amateur fact-checking websites, http://AdFontesMedia.com (@vlotero) and http://MediaBiasFactCheck.com. (7/25)

It was clear that publications which rated as left-leaning got a higher share of Google results than right-leaning ones. (In the chart below, we scaled the share of the 175,000 links gained by these 37 publications to 100%, and stacked them from left to right.) (8/25)
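The rescaling and stacking described in the parenthesis could look something like this. It is a sketch under assumptions: the three publications, their shares and their ideology scores (negative for left, positive for right) are invented, and the function names are hypothetical.

```python
def rescale_to_100(shares):
    """Rescale shares so that the tracked publications sum to 100%."""
    total = sum(shares.values())
    return {pub: 100.0 * s / total for pub, s in shares.items()}


def stack_left_to_right(shares, ideology):
    """Order publications by ideology score and accumulate their rescaled shares."""
    scaled = rescale_to_100(shares)
    ordered = sorted(scaled, key=lambda pub: ideology[pub])
    running = 0.0
    stacked = []
    for pub in ordered:
        running += scaled[pub]
        # (publication, its rescaled share, cumulative share so far)
        stacked.append((pub, scaled[pub], running))
    return stacked


# Invented numbers: raw shares of all links, and ideology scores.
raw_shares = {"LeftDaily": 6.0, "CentreTimes": 3.0, "RightPost": 1.0}
ideology = {"LeftDaily": -1.0, "CentreTimes": 0.0, "RightPost": 1.0}
stacked = stack_left_to_right(raw_shares, ideology)
```

The cumulative column is what a stacked bar would plot, with the final entry reaching 100%.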

The obvious explanation might seem to be that Google's algorithm is biased against right-leaning publications. Google denies this. And there might be some non-ideological reasons why @washingtonpost got three times as many links as @FoxNews. (9/25)

In fact, Google has a long list of criteria that it gives to its 10,000+ human "search quality evaluators", who rate websites according to their "expertise", "authoritativeness" and "trustworthiness", among other things. (10/25)  https://static.googleusercontent.com/media/www.google.com/en//insidesearch/howsearchworks/assets/searchqualityevaluatorguidelines.pdf 

Among the measures that it asks its raters to consider are Pulitzer prizes, "ratings from independent organisations", and public opinion. (11/25)

So we decided to see if we could predict what share of Google's results a publication ought to get, using the apolitical criteria that the company says feed into its algorithm. (12/25)

The first variables that we put into our model measured "trustworthiness". We combined ratings from those two fact-checking websites to give each publication an accuracy score. And we asked @YouGov to poll 1,500 Americans about how much they trust each publication. (13/25)

Then we added the number of @PulitzerPrizes each publication has won, whether it was print/broadcast/online and if it had a paywall. (Online-only publications did worse; having a paywall was associated with slightly fewer links, after controlling for everything else.) (14/25)

Next we included how much American web traffic (via @SimilarWeb) a publication gains from sources other than Google, as a proxy for its national audience, and also its tally of Facebook followers. Websites with little traffic but lots of FB fans got few search results. (15/25)

Finally, we accounted for how often our selected 37 publications wrote about each of these 31 keywords, using data from @Meltwater. In 2018, for example, @CNN published 3.3 times as many articles mentioning "Mueller" as @FoxNews did. (16/25)

Was our model any good? It did a reasonable job of predicting how many Google links each publication would get. (Specifically, it could account for nearly four-fifths of the differences between how often publications appear in Google's news tab.) (17/25)
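"Accounting for nearly four-fifths of the differences" is most naturally read as an R-squared of roughly 0.8, though the thread does not name the statistic. A minimal sketch of that measure, using invented actual and predicted values chosen only to show the calculation:

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Invented link shares for five hypothetical publications.
actual = [1, 2, 3, 4, 5]
predicted = [2, 1, 3, 4, 5]
r2 = r_squared(actual, predicted)
```

With these made-up numbers the residual sum of squares is 2 and the total is 10, so the result comes out at about 0.8.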

But crucially, when we compared our model's predictions to the actual share of the 175,000 Google links that each publication received, we found no evidence that right-leaning sites did worse than expected. (18/25)

In other words, once we accounted for how trustworthy, reputable and popular a news organisation is, knowing where it sat on the political spectrum made no difference to our predictions. (Adding the variable to our model did not improve its accuracy.) (19/25)
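One simple way to illustrate this kind of check (a stand-in, not necessarily the test used for the article) is to see whether the model's errors line up with ideology: if right-leaning sites systematically got fewer links than predicted, residuals would correlate negatively with a left-to-right score. All numbers below are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: ideology from left (-2) to right (+2), with link shares in %.
ideology = [-2, -1, 0, 1, 2]
predicted = [10, 8, 6, 4, 2]
actual = [11, 6, 8, 2, 3]

# Positive residual: more links than the model expected.
residuals = [a - p for a, p in zip(actual, predicted)]
r = pearson(ideology, residuals)
```

In this toy data right-leaning sites get fewer links, but the model already predicts that, so the residuals show no relationship with ideology.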

The likely reason, therefore, that right-leaning websites get less exposure on Google's news tab is not their ideology, but the fact that they score less well on measures of accuracy and authority than their left-leaning equivalents do. (20/25)

(For those of you wondering: the composite ideological measure from the fact-checking websites put @TheEconomist on the centre-right, near @FT and @Forbes. We slightly underperformed relative to our very small expected level of Google visibility.) (21/25)

Of course, the results varied substantially depending on which keyword we looked at. If you play around with the fantastic interactive built by @martgnz, you will see that left-leaning publications were indeed strongly overrepresented on searches for "Trump". (22/25)

But when it came to articles which included the word "liberal", right-leaning publications got far more links than you would expect. (23/25)

All of which suggests that one of the most important variables in Google's algorithm is how interesting an article is - or at least, how likely it is to attract clicks. Left-leaning publications write incendiary articles about Trump; right-leaning ones do so about liberals. (24/25)

This is not definitive proof that Google's algorithm is politically unbiased. If Pulitzer judges or fact-checkers are skewed, then our variables (and Google's human raters) could be too. Perhaps a different sample of keywords or publications would give different results. (25/25)
