Researchers mine tweets in search of health trends

Computer scientists feed 2 billion public tweets into computers, pull out the 1.5 million that mention health issues, and uncover patterns about the flu, allergies, cancer, insomnia, depression, and more.

Elizabeth Armstrong Moore
Elizabeth Armstrong Moore is based in Portland, Oregon, and has written for Wired, The Christian Science Monitor, and public radio. Her semi-obscure hobbies include climbing, billiards, board games that take up a lot of space, and piano.

The explosion of social media has given researchers a lot of data to mine and trends to identify, but two computer scientists at Johns Hopkins University say they've developed sophisticated filtering software that is attracting particular attention from public health officials.

Photo: Johns Hopkins computer scientists Mark Dredze, left, and Michael J. Paul say that Twitter posts can provide useful public health information. (Credit: Will Kirk)

Twitter, which launched five years ago, has already been used by computer scientists to try to track the flu.

But when Johns Hopkins University computer scientists Mark Dredze and Michael Paul devised a method to filter and categorize health-related tweets, they weren't sure what they might find. So they filtered 1.5 million health-related tweets from a sample of 2 billion and sorted them into electronic, ailment-specific "piles."
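As a rough illustration of what sorting tweets into ailment-specific piles could look like, here is a minimal Python sketch. It uses simple keyword matching, which is a stand-in chosen for this example and is not described in the article as the researchers' method. The keyword lists, function names, and sample tweets are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists, for illustration only; the article does not
# describe the vocabulary or the filtering technique the researchers used.
AILMENT_KEYWORDS = {
    "flu": {"flu", "influenza", "fever"},
    "allergies": {"allergies", "allergy", "benadryl", "pollen"},
    "insomnia": {"insomnia", "sleepless"},
}


def tokenize(text):
    """Lowercase the tweet and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def sort_into_piles(tweets):
    """Map each ailment to the tweets that mention one of its keywords.

    Tweets that match no keyword are dropped, which plays the role of
    the health-related filter in this sketch.
    """
    piles = defaultdict(list)
    for tweet in tweets:
        words = tokenize(tweet)
        for ailment, keywords in AILMENT_KEYWORDS.items():
            if words & keywords:
                piles[ailment].append(tweet)
    return dict(piles)


# Invented sample tweets (the first echoes the example quoted in the article).
sample = [
    "Had to pop a Benadryl...allergies are the worst.",
    "Home with the flu today",
    "Great picnic weather!",
]
piles = sort_into_piles(sample)
```

In this sketch the third tweet lands in no pile at all. A keyword matcher like this also has no way of telling a sincere report from a joke or an excuse, so it would take every matching tweet at face value.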

"There have been some narrow studies using Twitter posts, for example, to track the flu," Dredze said in a news release. "But to our knowledge, no one has ever used tweets to look at as many health issues as we did."

From cancer and allergies to insomnia and depression, the duo was able to glean information about where people were sick with what and how they were coping.

There are, of course, several inherent limitations, not least of which is that tweets are little acts of self-reporting. People can tweet that they are home with the flu for the benefit of the boss--right from the comfort of their picnic blanket--and the program would have no way of telling fact from fiction.

Furthermore, any trends identified via tweets are just that: Twitter trends. The researchers admit to this, cautioning that the data represent those who tweet, and therefore capture far less about, for instance, the elderly.

There's also the issue of discretion. The vast majority of health-related tweets were fairly innocuous, for example: "Had to pop a Benadryl...allergies are the worst." There are simply fewer tweets on health issues about which people feel shy or ashamed, such as STDs, abuse, and eating disorders.

Still, within the limitations of tracking Twitter trends, some useful information can be gleaned. For instance, in some 200,000 of the 1.5 million health-related tweets, the researchers were able to draw on user-provided public information to determine location and track trends by time and place, for example when allergy and flu seasons peaked (at least among tweeters) in various parts of the country.
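To make the idea of tracking trends by time and place concrete, the following Python sketch counts already-labeled tweets per region and month and reports the busiest month for one ailment. The record layout, the function name, and every data value are invented for illustration; the article does not say how the researchers computed seasonal peaks.

```python
from collections import Counter


def peak_month_by_region(records, ailment):
    """Return {region: month with the most tweets about `ailment`}.

    `records` is an iterable of (region, month, ailment_label) tuples,
    a hypothetical layout assumed for this sketch.
    """
    counts = {}
    for region, month, label in records:
        if label == ailment:
            counts.setdefault(region, Counter())[month] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}


# Invented example records, not data from the study.
records = [
    ("Northeast", "2010-05", "allergies"),
    ("Northeast", "2010-05", "allergies"),
    ("Northeast", "2010-04", "allergies"),
    ("South", "2010-03", "allergies"),
    ("South", "2010-01", "flu"),
]
peaks = peak_month_by_region(records, "allergies")
```

As the article cautions, any peak found this way describes people who tweet and share a location, not the population as a whole.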

Dredze and Paul will be presenting their findings on July 18 at the International Conference on Weblogs and Social Media in Barcelona. Expect any news and trends from the event to be tweeted.