Twitter puts real humans into its search algorithm...and profits

In a dense engineering post, Twitter explains how it uses "crowdsourced" human evaluators to make sense of ephemeral hashtags and other search terms. And who benefits? Why, Twitter's advertisers, of course.

Twitter has made an old idea new again, unveiling a new system that lets actual human beings tell its data center how to make sense of trending hashtags and other topical searches.

But don't get too excited about this apparent triumph of man over machine. First, the actual work done by these people seems likely to be menial and poorly compensated, even if it does accomplish something that Twitter's mighty information systems appear unable to manage on their own.

Second, and more important, you shouldn't expect to see Twitter's service improve in any ways you might actually notice -- unless, that is, you happen to be a Twitter advertiser. Because the primary aim of the system appears to be improving Twitter's ability to serve up relevant ads against briefly popular hashtags whose meaning would be completely opaque to computers, though readily grasped by real people.

On the other hand, this could fill in an important part of Twitter's business model. While it's difficult to tell from the outside, Twitter apparently believes that there's big money to be made from serving up the right ads against sudden waves of public interest in various memes. Since you could argue that Twitter really isn't much more than a steady progression of such waves gently lapping against the beach of human consciousness, it's entirely possible the company is right.

Twitter revealed what it called its "real-time human computation" system in a dense and confusing blog post written by Twitter data scientist Edward Chen and Alpa Jain, a senior software engineer in the company's "Revenue @ Twitter" group. Chen and Jain start out reasonably enough, laying out the difficulty of interpreting the meaning of searches that suddenly spike in popularity, only to fade away just as quickly. Citing some notable examples from the recent presidential debates, they write:

1. The queries people perform have probably never before been seen, so it's impossible to know without very specific context what they mean. How would you know that #bindersfullofwomen refers to politics, and not office accessories, or that people searching for "horses and bayonets" are interested in the Presidential debates?

2. Since these spikes in search queries are so short-lived, there's only a small window of opportunity to learn what they mean.

Of course, this presents no problem for the actual human users of Twitter, who can generally follow the Zeitgeist quickly enough to figure out what's going on -- even if they have to Google the hashtag or search term to grasp its meaning. (I've had to do that myself on any number of occasions.)

But it does create an issue for automated interpretation systems, which rely heavily on context and historical usage to ascertain exactly what Twitter users are talking about. And neither is very helpful in deciphering a meme that pops up on Twitter and then fades away almost instantly. Of course, the only reason automated interpretation systems are involved at all here is that they're what Twitter relies on to serve up "relevant" ads -- promoted tweets, promoted feeds and what have you -- against these brief but often quite powerful search surges.

In other words, Twitter didn't have a functionality problem here -- it had a revenue problem. And that's what Chen and Jain have stepped in to solve with their merry band of crowdsourced volunteers.

Of course, the data scientists can't come right out and say that. Instead, they treat us to a discourse of how Twitter's data systems work -- one replete with topologies, bolts, spouts, tuple streams and Kafka queues. A representative sentence:

The Storm topology attaches a spout to this Kafka queue, and the spout emits a tuple containing the query and other metadata (e.g., the time the query was issued and its location) to a bolt for processing.

The gist of the technical description is that Twitter has a snazzy new way of determining when a new search term or hashtag is sufficiently popular to warrant interpretation. At that point, the term is dispatched to human workers at Amazon's Mechanical Turk service. Mechanical Turk, which I'd never heard of, is essentially an automated contracting service that farms out data jobs requiring human interpretation to an army of workers across the globe. Amazon dubs it, cleverly enough, "Artificial Artificial Intelligence."
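For readers who want something more concrete than spouts and bolts, here is a rough sketch of that flow: a stream of search queries comes in, a counting step flags any query that suddenly becomes popular, and the flagged query is handed off for human evaluation. This is not Twitter's code. The real system runs on Storm and Kafka, and the blog post doesn't spell out how popularity is measured, so the fixed threshold, the sliding time window and every name below (`SpikeDetector`, `run_pipeline`, `send_to_evaluators`) are assumptions made purely for illustration.

```python
from collections import Counter, deque


class SpikeDetector:
    """Hypothetical stand-in for a Storm bolt: counts queries seen in a
    sliding time window and flags a query the first time it gets popular."""

    def __init__(self, window_seconds=60.0, threshold=3):
        self.window_seconds = window_seconds  # assumed window length
        self.threshold = threshold            # assumed popularity cutoff
        self.events = deque()                 # (timestamp, query), oldest first
        self.counts = Counter()               # query -> count inside the window
        self.dispatched = set()               # queries already sent to humans

    def process(self, query, timestamp):
        """Handle one (query, timestamp) tuple.

        Returns True only when the query first reaches the threshold."""
        # Drop events that have fallen out of the time window.
        while self.events and timestamp - self.events[0][0] > self.window_seconds:
            _, old_query = self.events.popleft()
            self.counts[old_query] -= 1
            if self.counts[old_query] == 0:
                del self.counts[old_query]

        self.events.append((timestamp, query))
        self.counts[query] += 1

        if self.counts[query] >= self.threshold and query not in self.dispatched:
            self.dispatched.add(query)
            return True
        return False


def run_pipeline(stream, detector, send_to_evaluators):
    """Play the role of the spout: feed each tuple to the detector and
    forward newly spiking queries to the human-evaluation step."""
    for query, timestamp in stream:
        if detector.process(query, timestamp):
            send_to_evaluators(query)


# Toy stream of (query, seconds since start); the data is made up.
stream = [
    ("#bindersfullofwomen", 0.0),
    ("weather", 1.0),
    ("#bindersfullofwomen", 2.0),
    ("#bindersfullofwomen", 3.0),
    ("#bindersfullofwomen", 4.0),
    ("weather", 100.0),
]

sent = []  # stands in for the queue of jobs posted to human workers
run_pipeline(stream, SpikeDetector(window_seconds=60.0, threshold=3), sent.append)
print(sent)  # ['#bindersfullofwomen']
```

In this toy version the hashtag is forwarded once, the third time it appears within a minute, and the occasional "weather" search never qualifies. A production system would presumably compare current volume against each query's historical baseline rather than rely on a fixed count.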

Twitter's jobs, however, don't go to just any Mechanical Turk workers. Instead, Twitter has culled a select number of folks on MT to "evaluate" trending search terms -- essentially by discussing them on forums and chatrooms and then telling Twitter which categories, images and videos the terms relate to. Because Twitter trusts this elite team, it can rely on a single evaluation to begin categorizing the new search term.

That then primes the company's advertising engine to serve up the right kind of ads. For instance, it might display @barackobama or @mittromney ads against the hashtag #bindersfullofwomen instead of, say, promotions for Office Depot.

Twitter declined to answer my questions about these human workers, including how many of them it relies on and how they're compensated. So it seems fair to note that Mechanical Turk is not, in general, exactly what you'd consider a generous employer. Among the ten most lucrative "human intelligence tasks" it listed at the time of writing, pay rates ranged from a high of $135.65 for transcribing and tagging five hours and 36 minutes of video to a low of $11 for writing a 300-word review article.

Remember, those are the best-paying jobs of the 1,842 listings currently displayed on the MT site. I couldn't get to the lowest-paying jobs without logging in as a worker, but the lowest ones I could see paid up to $1.69 for transcribing a four-minute audio clip.

Now, maybe Twitter pays its select human evaluators more than that, and it may not list its jobs on the MT site itself -- at least not anywhere the unwashed masses can see them. And the company does seem eager to portray its human evaluators as one big happy family.

Chen and Jain, for instance, note they "crowdsourced a singing telegram" to Twitter's Mechanical Turk workers to celebrate the launch of their project, and cite it as an example of "the kind of top quality our workers provide." And it's certainly a well put-together tribute to Jain herself, though there's no word as to whether the MT workers were compensated for their time and effort here.

For the record, I asked Twitter if this real-time human computation system served any purpose beyond tailoring advertising more precisely. A spokeswoman's response by e-mail:

At this time, this particular project does not provide additional advantages beyond those discussed in the blog post. However, as mentioned above, we do use the overall human computation system for other purposes.

The "overall human computation system" she referred to, by the way, is this:

We have used human computation systems for some time, and for a variety of purposes -- for example: to measure performance of our search algorithms. Over time, our overall human computation systems have evolved quite a bit.

We only recently launched this particular system, which involves humans giving us additional information about search queries in real time.

What's unclear at this point is what else Twitter might do with this system. ReadWrite's Jon Mitchell offers the intriguing speculation that Twitter has just figured out how to effectively render news sites obsolete, because it can take a sudden burst of interest in a trending news subject and translate it directly into relevant, on-the-fly news pages much faster than human editors could.

That's an unnerving but all-too-plausible projection, at least from the perspective of this news editor. That said, breaking news based on announcements and other public events has become increasingly commoditized over the past few years, so full automation (or even semi-automation) may be the logical next step.

But those of us in the actual news business aren't the only ones who should be concerned. Google may also have its hands full trying to update Google News or newer real-time services -- which could presumably perform similar tricks using data out of Google+ -- to compete with Twitter's real-time fire hose.

About the author

David Hamilton is the assistant managing editor of CNET News. He has been writing and editing business and tech coverage for about two decades -- the majority of that at the Wall Street Journal in both Tokyo and San Francisco. He is a two-time winner of the Overseas Press Club award and has written for numerous magazines and blogs, including Slate, Science, VentureBeat, CBS Interactive's BNET, California Lawyer and the New Republic.

