When people get sick, they turn to the Web for information. Back in 2008, a team at Google dug into this behavior and found that certain search terms were good indicators of flu levels. We later launched Google Flu Trends to estimate flu activity in regions around the world in near real time, using aggregated Google search data.
At the end of every flu season, we evaluate the performance of our model. Are our estimates accurate? What worked well, or not so well? Do we need to make any updates? After the 2009 H1N1 season, for example, we updated the model to make sure we were providing accurate estimates. Since 2009, the model had performed well at the national and regional levels in the US, and no further update was needed.
Flu Trends can help estimate the start, peak, and duration of each flu season, all of which is important information for public health agencies. During the 2012-2013 season in the US, the model performed well in estimating the start and duration of the season. However, it overestimated the severity of the flu. In January 2013, after spotting the difference between our estimates and the percentage of healthcare visits for influenza-like illness (ILI) reported by the Centers for Disease Control and Prevention (CDC), we started to investigate the high estimates. We found that heightened media coverage of the severity of the flu season resulted in an extended period in which users were searching for terms we had identified as correlated with flu levels. In early 2013, we saw more flu-related searches in the US than ever before.
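As a rough illustration of the kind of comparison described above, the sketch below measures how far a series of weekly estimates sits above the reported ILI percentages. The numbers are placeholders invented for the example, not Flu Trends or CDC data, and the function name `overestimation_summary` is hypothetical. It is not drawn from the Flu Trends model itself.

```python
from statistics import mean


def overestimation_summary(estimated, reported):
    """Compare weekly estimates with reported ILI percentages.

    Both arguments are equal-length lists of percentages, one value per week.
    This is an illustrative helper, not part of any real Flu Trends code.
    """
    if len(estimated) != len(reported) or not estimated:
        raise ValueError("series must be non-empty and the same length")

    # Positive errors mean the estimate was above the reported value.
    errors = [e - r for e, r in zip(estimated, reported)]

    return {
        # Average gap between estimate and report, in percentage points.
        "mean_error": mean(errors),
        # How much higher the estimated peak is than the reported peak.
        "peak_ratio": max(estimated) / max(reported),
        # Weeks between the estimated peak and the reported peak
        # (0 means the timing of the peak matched).
        "peak_week_offset": estimated.index(max(estimated))
        - reported.index(max(reported)),
    }


# Placeholder values only: a season whose timing matches but whose
# estimated severity runs high.
estimated_ili = [2.0, 4.0, 8.0, 10.0, 6.0]
reported_ili = [2.0, 3.0, 4.0, 5.0, 4.0]

summary = overestimation_summary(estimated_ili, reported_ili)
print(summary)
```

In this made-up series the peak week lines up while the peak ratio is well above 1, which is the pattern of good timing and overestimated severity described for the 2012-2013 season.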
We evaluated several options to improve the model. Ultimately, we determined that an update using the peak from the 2012-2013 season provided a close approximation of flu activity for recent seasons. We will be applying this update to the US flu level estimates for the 2013-2014 flu season, starting from August 1st. A casual observer will see that the new model estimates a lower flu level than last year’s model did at a similar time in the season. We believe the new model more closely approximates CDC data.
This is an iterative process. We will keep exploring how we can build resilience to accommodate the effect of news media. In the meantime, stay healthy!
Posted by Christian Stefansen, Software Engineer