
Publisher Buzzfeed has rebuilt its analytics processes around Google's BigQuery data warehouse, with Looker dashboards for reporters and editors to consume.
A big challenge for Buzzfeed was unifying its content analytics now that it publishes on around 20 platforms, from Buzzfeed.com to Facebook, Snapchat and Twitter.
Buzzfeed records nine billion 'content views' a month across its platforms, including social media, which adds up to a lot of data to aggregate.
Speaking at Google Cloud Next '18 in San Francisco this week, Nick Hardy, senior business analyst at Buzzfeed, said: "As a content company, content analysis is at the centre of what we do, so we wanted to make that available to everyone at the company and not just the analytics and data science teams."
Hardy admits that, among this growing variety of publishing platforms, some play nicer than others when it comes to opening up their data via APIs.
"Social media data varies. We like to hit the APIs when we can, but some platforms, without naming names, are not as nice, so we have done some creative things to get at that data," he said.
The challenge for Buzzfeed was to consolidate this data and make it consumable for a wide range of users with different data skills.
The first step was building a cloud data warehouse foundation, and the team eventually settled on Google Cloud's BigQuery. Buzzfeed had previously used Amazon Redshift, but decided to move to BigQuery because Redshift "required lots of upkeep and dev resources," according to Hardy.
He also noted that the team had already chosen Looker as its downstream business intelligence tool, primarily "because it gives us lots of functionality and is well integrated with BigQuery".
Before Looker, Buzzfeed relied on homegrown tools for downstream analytics, but Hardy said these struggled to scale as content became more distributed across channels.
"It became a burden to make dashboards using engineering capacity and we needed a better system to iterate more quickly," he said.
Looker gave Buzzfeed everything it wanted out of the box with minimal dev overhead, including central access controls that let editors see only the information they need.
Buzzfeed also liked that Looker came in two flavours: Explore for hardcore users and Dashboard for simple top-line analytics.
One example is the quizzes Buzzfeed creates for its site.
Hardy explained: "Take questions about Disney as an example: if that does well, how can we run that along its course? Social media editors were heavy users of this data, so we built dashboards for them to see what performs well in real time."
These content creators are particularly keen to see 'quiz completions', so this metric is built into their Looker dashboards.
Hardy said the data team learned two main lessons about building for non-technical users: "First is: people want answers quickly. Another thing we found is less is more.
"Looker allows us to open up the whole warehouse to people, but they didn't want that, so we simplified and stripped it back to what users want with the option to scale up from there, rather than opening up the floodgates."
Buzzfeed now has hundreds of terabytes of data across 300 tables in BigQuery, supporting 100,000 queries per day. Of those, 5,000 feed into Looker, where 1,500 employees access metrics every month.
Lastly, Hardy said Buzzfeed is an early-access partner for BigQuery ML and is testing the new machine learning tool to improve the performance of its recommendation systems. These systems power the Buzzfeed sites and Facebook pages, and also give social media editors insight to help them optimise content distribution.
(Reporting by Scott Carey, Computerworld UK)