Thursday, November 13, 2008

Google Trends

I heard of a cool Google tool on NPR today, called Google Trends. Here's its description from Wikipedia:

Google Trends is a tool from Google Labs that shows the most popularly searched terms from the beginning of 2004 to now.

... Google Trends charts how often a particular search term is entered relative to the total search volume across various regions of the world...

Google Trends also allows the user to compare the volume of searches between two or more terms. An additional feature of Google Trends is in its ability to show news related to the search term overlaid on the chart showing how new events affect search popularity.

Interestingly, there are some search keywords that are quite seasonal, like summer camps, which strongly coincides with the end of the United States school year ...

... some search keywords that come up around a certain date each year. For example, searches for the Internal Revenue Service peak on April 15, the deadline for filing taxes in the United States ...

"Twilight zone" peaks every 6 months, corresponding with the 4th of July and New Year's marathons of the show played on the Sci-Fi Channel.

It's all terribly fascinating because you can really get a feel for how certain people and events affect other events. Or how certain items are bigger concerns in certain parts of the country. For example, more people search on the Internet using the terms "foreclosure" and "bankruptcy" in Phoenix than in any other major city ... which makes sense because the housing crisis has hit here especially hard.

Or obvious stuff like the fact that search trends for the term "toys" mirror trends for "Christmas" or that searches for "gas prices" generally happen at the same time as "hybrid". Or funny stuff like the fact that searches for "George Bush" and "stupid" seem to correspond pretty well, as do "Republican" and "scandal".

The topic of the NPR news story was a specific application of Google Trends called Google Flu Trends. It's in the news for its apparent ability to detect flu outbreaks a week or more before the CDC reports them.

From the NY Times article on the same subject:

... There is a new common symptom of the flu, in addition to the usual aches, coughs, fevers and sore throats. Turns out a lot of ailing Americans enter phrases like “flu symptoms” into Google and other search engines before they call their doctors.

That simple act, multiplied across millions of keyboards in homes around the country, has given rise to a new early warning system for fast-spreading flu outbreaks, called Google Flu Trends.

Tests of the new Web tool from Google.org, the company’s philanthropic unit, suggest that it may be able to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention.

In early February, for example, the C.D.C. reported that the flu cases had recently spiked in the mid-Atlantic states. But Google says its search data show a spike in queries about flu symptoms two weeks before that report was released. Its new service at google.org/flutrends analyzes those searches as they come in, creating graphs and maps of the country that, ideally, will show where the flu is spreading.

The C.D.C. reports are slower because they rely on data collected and compiled from thousands of health care providers, labs and other sources. Some public health experts say the Google data could help accelerate the response of doctors, hospitals and public health officials to a nasty flu season, reducing the spread of the disease and, potentially, saving lives.

... Researchers have long said that the material published on the Web amounts to a form of “collective intelligence” that can be used to spot trends and make predictions.

But the data collected by search engines is particularly powerful, because the keywords and phrases that people type into them represent their most immediate intentions. People may search for “Kauai hotel” when they are planning a vacation and for “foreclosure” when they have trouble with their mortgage. Those queries express the world’s collective desires and needs, its wants and likes.

...Google Flu Trends avoids privacy pitfalls by relying only on aggregated data that cannot be traced to individual searchers. To develop the service, Google’s engineers devised a basket of keywords and phrases related to the flu, including thermometer, flu symptoms, muscle aches, chest congestion and many others.

Google then dug into its database, extracted five years of data on those queries and mapped it onto the C.D.C.’s reports of influenzalike illness. Google found a strong correlation between its data and the reports from the agency, which advised it on the development of the new service ...
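For the curious, here's roughly what checking for a "strong correlation" between search data and the C.D.C.'s reports might look like in code. This is only a sketch: the weekly numbers are invented, the names are mine, and using a plain Pearson correlation is my assumption, not necessarily what Google's engineers actually did.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean, and each series' own spread
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly data (made up for illustration, not real figures):
# flu-related searches per 100,000 total searches
flu_query_share = [12, 15, 21, 30, 44, 52, 47, 33, 20, 14]
# percent of doctor visits reported as influenza-like illness, same weeks
ili_percent = [1.1, 1.3, 1.8, 2.6, 3.9, 4.8, 4.5, 3.0, 1.9, 1.2]

r = pearson(flu_query_share, ili_percent)
print(f"correlation: {r:.3f}")
```

A value near 1 means the two series rise and fall together, near 0 means no linear relationship, and near -1 means one goes up as the other goes down. Note that the search numbers are a share of all searches rather than a raw count, in the same spirit as the "relative to the total search volume" idea in the Wikipedia description.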

As with all technology, there are both useful and scary aspects. Obviously, many people would have concerns about privacy. Data used in aggregate, and anonymously, would not seem to violate that. And Google has been pretty good about privacy issues so far. But there is definitely the opportunity there for data to be used improperly. So, it's important to be vigilant. We don't want to be hermits and live off the grid. That's not the answer. But we also don't want everyone in the world to know every time we pick our nose. There's got to be a safe and practical balance.

"The personal life of every individual is based on secrecy, and perhaps it is partly for that reason that civilized man is so nervously anxious that personal privacy should be respected" -- Anton Chekhov quotes (Russian playwright and master of the modern short story, 1860-1904)



1 comment:

Laura said...

I deal with this kind of problem on a daily basis with student records data. There is a fine line sometimes between aggregate data and individual data and there are very strict usage laws in higher ed for student data related to privacy laws. It is scary how much information Google has about me.

There's a movement in the state of Illinois (and several other states) to create student record-level databases from K-12 all the way through to college and the workforce. The federal government tried to mandate this for all US schools and was told to fuck off. This is supposed to be used to do research on educational and employment trends - but think of the abuses that are possible. For now, private institutions in IL are participating in a voluntary manner (many refuse) but state schools are mandated by the state to submit records. It's kinda scary when you think about it.