Forbes is a global media company, focusing on business, investing, technology, entrepreneurship, leadership, and lifestyle. We rely on roughly 2,500 authors to publish original content on our site, at a pace of more than 400 articles each day. We recently set a goal to become 100% in the cloud. As part of that, we took the opportunity to modernize our system for site statistics processing, which provides information such as number of page views, article rankings, and traffic sources. When we migrated that outdated, on-premises system to the cloud, we reduced our technical debt, boosted site performance, and added new capabilities for our contributors. Today, our statistics processing relies on three services: Firestore, a fully managed, serverless document database; BigQuery, an enterprise data warehouse; and Google Analytics, an analytics service that tracks web traffic.
Custom-built, legacy system was too clunky for the cloud
Previously, our statistics processing system consisted of a series of incoming logs from collection servers, which were then routed into buckets on processing servers and sent to a MySQL database for short-term storage. From there, the information was routed to another database for long-term document storage. As time went on, more and more data was pushed to the long-term document storage database. At that point, simply lifting and shifting this system to the cloud would have been a financial strain because of the sheer volume of data it stored. In addition, our previous system lacked an access-control list, so contributors either saw everyone’s statistics, or couldn’t see anything at all. We had very little control over how the data was consumed in these systems.
Reduced technical debt with Firestore and Google Analytics
As we shifted away from on-prem to cloud solutions, we decided it was time to fix our statistics processing system and move from a bespoke self-collection architecture to something simpler. We landed on Firestore, a NoSQL document database, because it seamlessly integrates with Google Analytics, reduces our maintenance, and improves the user experience for authors who want to check on the performance of their content. Further, as a database, Firestore doesn’t require any configuration or management; it’s entirely cloud-native, cheap to store data in, and executes low-latency queries.
Using the enterprise data warehouse BigQuery to house our historical data, Firestore to process statistics, and Google Analytics to deliver site metrics, we significantly reduced our technical debt and replaced our 40-45 application servers with just three.
Accelerated site metrics from 1X per day to every 15 minutes
Once we implemented our new statistics processing system, we were able to update our contributors’ site metrics much faster. Before, for certain metrics like daily page views, we would run the calculation once a day. But with Firestore, we were able to increase that to every 15 minutes.
We also went from calculating only historical data to showing contributors how their posts performed over the last hour, and even the last minute. In terms of database writes, we went from 100,000 per day to millions of writes every 15 minutes.
By providing this granular level of data to our contributors, we are helping them better optimize their content and deliver the best possible pieces to their readers.
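To make the 15-minute refresh more concrete, here is a minimal sketch of how page views could be grouped into 15-minute windows per article before being written to a database. It is an illustration under stated assumptions, not a description of our production pipeline: the event shape, the `window_start` and `count_page_views` helpers, and the sample data are all hypothetical, and the sketch uses only the Python standard library rather than any Firestore or Google Analytics API.

```python
from collections import Counter
from datetime import datetime, timezone

# 15-minute refresh interval, matching the cadence described above.
WINDOW_SECONDS = 15 * 60


def window_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its 15-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def count_page_views(events):
    """Count page views per (article_id, window start).

    `events` is an iterable of (article_id, timestamp) pairs.
    This event shape is an assumption made for the example.
    """
    counts = Counter()
    for article_id, ts in events:
        counts[(article_id, window_start(ts))] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample events, for illustration only.
    events = [
        ("article-1", datetime(2022, 1, 1, 12, 3, tzinfo=timezone.utc)),
        ("article-1", datetime(2022, 1, 1, 12, 14, tzinfo=timezone.utc)),
        ("article-1", datetime(2022, 1, 1, 12, 16, tzinfo=timezone.utc)),
        ("article-2", datetime(2022, 1, 1, 12, 59, tzinfo=timezone.utc)),
    ]
    for (article_id, start), views in sorted(count_page_views(events).items()):
        print(article_id, start.isoformat(), views)
```

Each (article, window) pair maps naturally to one document or counter in a document database, which is one reason a windowed design like this would produce many small writes per refresh cycle.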
Enhanced user experience with SEO suggestions
As a media outlet that relies on page views to drive revenue, we wanted to provide new ways for our contributors to increase their page views. Using real-time data and artificial intelligence, Firestore delivers SEO-optimization suggestions to contributors. We show these recommendations to writers in our content management system, helping them draft higher performing headlines and increasing the page views on each of their pieces. This capability helps increase our return on content and allows writers to constantly optimize for better performance.
Improved collaboration with BI team
We’ve also seen benefits of adopting a cloud-native solution with internal stakeholders. Our business intelligence (BI) team, for instance, is tasked with presenting data in a cleaned-up format, which we call our data mart. Previously the BI team relied on an analytics tool that was completely decoupled from the statistics processing system. Now, with Google Analytics, both teams are using the same data, helping us better understand the queries our BI team runs.
Looking forward with Firestore
Forbes uses Google Cloud in a variety of different ways, including hosting our proprietary first-party data platform, ForbesOne. This full-featured platform includes data collection, data processing, data analysis, and the use of ML and AI to create segmentation and lookalike audiences, content targeting on- and off-the-platform, as well as reporting. Next, we’re looking to bring the power of ForbesOne into our publishing platform via Firestore, opening up even more insights and recommendations to our journalists.
Want to learn more about Firestore? Find more information to help you on your journey:
Learn how to easily develop rich applications using a fully managed, scalable, and serverless document database on the Firestore product page.
Watch this video to learn how Forbes used Firestore, BigQuery and Cloud Functions to rapidly develop new intelligence technologies that help their journalists make informed decisions.
New customers get $300 in free credits to spend on Firestore. Go to the console to get started.
By: Benjamin Harrigan (Software Architect, Forbes) and Alexander Shnayderman (Engineering Manager, Forbes)
Originally published at: Google Cloud Blog