
Twitter's strange Discovery engine

Commentary: The top story in my Discover tab recently was about bark beetles. I have to believe that Twitter's Discovery engine has mistaken me for another user.

Dan Farber

I am an admitted Twitter addict. I spend at least a few intermittent hours a day scanning the river of tweets in my feed and lists to see what is happening in the world, especially the tech world that I cover for CNET. The challenge is keeping up with the flow of tweets.

I would guess that I miss 90 percent of my main Twitter feed of over 1,000 sources across several topics. If I am busy in a meeting or writing a story, I come back to 1,000 to 2,500 unread tweets, and I am not going to spend much effort going backward in time, swimming against the current. If something specific comes to mind, I can search Twitter, or Google, for the latest info on a topic.

In my more curated Twitter list of 190 tech sources, where I spend more time, I miss at least 50 percent of the tweet flow during prime time. If a story is important and trending, however, it will likely surface downstream when I am paying attention.

In light of the high percentage of missed tweets, I was hoping that Twitter's Discover feature and its personalization algorithms would assist my news gathering by grabbing tweets from the flow that I missed and that would interest me, based on what Twitter knows about my use of the service.

At the Wired Business Conference earlier this month, Twitter CEO Dick Costolo explained that Discovery is based on who you follow. "The accounts you follow paint a picture of what we call your interest graph...That interest graph shapes a maybe even more compelling picture of who you are and what you're interested in than gender, age, location, etc.," he said.

I certainly send the Twitter engine enough signals -- tweets, retweets, follows, followers, and location data -- to help determine what is important to me. So far, though, I have to believe that Twitter's Discovery engine has mistaken me for another user.

For example, a recent top tweet in my Discover tab was about bark beetles, linked to a story from hcn.org. I can't figure out the correlation with my Twitter profile, unless the most blaring signal of the moment is "dbfarber = beetles + fires + Western forest trees." I have been posting pictures on Pinterest of Western forest trees, but I am not sure how that signal leads to bark beetles on Twitter Discovery, rather than tweets more relevant to my interests.

If the Discover engine is based on relevant tweets and stories shared by people I'm connected to on Twitter, and people connected to those I follow, I can confirm that I am not following hcn.org, though, who knows, a few of the people I follow might be. In any case, this is not an example of separating signal from noise.

"Britney and Demi at the Fox upfronts" was also a surprise in my Discovery tab and interest graph. I don't recall following Demi, Britney, X Factor, Fox, or any of the people tweeted about at the momentous event. A few of the tweets were a bit more on target, such as "Obama's Bain ad," but not compelling enough for a click. Further checks of my Discovery tab yielded similarly unappetizing results, such as "Ministers end contract with A4e," "Seeking mates for Mei Xiang and Tian Tian," and "Perian project comes to an end."

Twitter's engineering blog provides a more technical description of how Discovery works, but not an explanation for the odd range of selections:

To generate the stories that are based on your social graph and that we believe are most interesting to you, we first use Cassovary, our graph processing library, to identify your connections and rank them according to how strong and important those connections are to you.

Once we have that network, we use Twitter's flexible search engine to find URLs that have been shared by that circle of people. Those links are converted into stories that we'll display, alongside other stories, in the Discover tab. Before displaying them, a final ranking pass re-ranks stories according to how many people have tweeted about them and how important those people are in relation to you. All of this happens in near-real time, which means breaking and relevant stories appear in the new Discover tab almost as soon as people start talking about them.
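To make that description concrete, here is a minimal Python sketch of the re-ranking step as I read it: score each shared link by the combined strength of the connections who tweeted it. This is not Twitter's code. The function name, the account names, and the connection weights are all hypothetical, and the real system (Cassovary plus Twitter's search engine) is certainly more involved.

```python
from collections import defaultdict


def rank_stories(connection_weights, shares):
    """Rank shared URLs by the summed strength of the connections who shared them.

    connection_weights: dict of account -> connection strength (hypothetical values).
    shares: iterable of (account, url) pairs.
    """
    sharers = defaultdict(set)
    for account, url in shares:
        # Only accounts inside the user's ranked circle count toward a story.
        if account in connection_weights:
            sharers[url].add(account)

    # More sharers and stronger connections both push a story's score up.
    scores = {
        url: sum(connection_weights[account] for account in accounts)
        for url, accounts in sharers.items()
    }
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical circle: two strong tech connections and one weak one.
weights = {"tech_blog": 0.9, "reporter": 0.6, "distant_follow": 0.1}
shares = [
    ("tech_blog", "example.com/gadget-launch"),
    ("reporter", "example.com/gadget-launch"),
    ("distant_follow", "example.com/bark-beetles"),
    ("stranger", "example.com/unrelated"),
]

for url, score in rank_stories(weights, shares):
    print(f"{score:.1f}  {url}")
```

In this toy version, a link shared by a single weak connection still makes the list; it simply ranks low. If little else in the circle is being shared at that moment, a story like that could plausibly float to the top, though that is only a guess about what happened in my case.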

As Twitter is proving, personalizing results is still an elusive art. I could give Twitter more data to work with by continuing to use Discovery, but for now I'll give it a rest.