
Twitter puts real humans into its search algorithm...and profits

In a dense engineering post, Twitter explains how it uses "crowdsourced" human evaluators to make sense of ephemeral hashtags and other search terms. And who benefits? Why, Twitter's advertisers, of course.

David Hamilton Assistant Managing Editor, CNET News

Twitter has made an old idea new again, unveiling a system that lets actual human beings tell its data centers how to make sense of trending hashtags and other topical searches.

But don't get too excited about this apparent triumph of man over machine. First, the actual work done by these people seems likely to be menial and poorly compensated, even if it does accomplish something that Twitter's mighty information systems appear unable to manage on their own.

Second, and more important, you shouldn't expect to see Twitter's service improve in any way you might actually notice -- unless, that is, you happen to be a Twitter advertiser. Because the primary aim of the system appears to be improving Twitter's ability to serve up relevant ads against briefly popular hashtags whose meaning would be completely opaque to computers, though readily grasped by real people.

On the other hand, this could fill in an important part of Twitter's business model. While it's difficult to tell from the outside, Twitter apparently believes that there's big money to be made from serving up the right ads against sudden waves of public interest in various memes. Since you could argue that Twitter really isn't much more than a steady progression of such waves gently lapping against the beach of human consciousness, it's entirely possible the company is right.

Twitter revealed what it called its "real-time human computation" system in a dense and confusing blog post written by Twitter data scientist Edwin Chen and Alpa Jain, a senior software engineer in the company's "Revenue @ Twitter" group. Chen and Jain start out reasonably enough, laying out the difficulty of interpreting the meaning of searches that suddenly spike in popularity, only to fade away just as quickly. Citing some notable examples from the recent presidential debates, they write:

1. The queries people perform have probably never before been seen, so it's impossible to know without very specific context what they mean. How would you know that #bindersfullofwomen refers to politics, and not office accessories, or that people searching for "horses and bayonets" are interested in the Presidential debates?

2. Since these spikes in search queries are so short-lived, there's only a small window of opportunity to learn what they mean.

Of course, this presents no problem for the actual human users of Twitter, who can generally follow the Zeitgeist quickly enough to figure out what's going on -- even if they have to Google the hashtag or search term to grasp its meaning. (I've had to do that myself on any number of occasions.)

But it does create an issue for automated interpretation systems, which rely heavily on context and historical usage to ascertain exactly what Twitter users are talking about. And neither is very helpful in deciphering a meme that pops up on Twitter and then fades away almost instantly. Of course, the only reason automated interpretation systems are involved at all here is because they're what Twitter relies on to serve up "relevant" ads -- promoted tweets, promoted feeds and what have you -- against these brief but often quite powerful search surges.

In other words, Twitter didn't have a functionality problem here -- it had a revenue problem. And that's what Chen and Jain have stepped in to solve with their merry band of crowdsourced volunteers.

Of course, the data scientists can't come right out and say that. Instead, they treat us to a discourse on how Twitter's data systems work -- one replete with topologies, bolts, spouts, tuple streams and Kafka queues. A representative sentence:

The Storm topology attaches a spout to this Kafka queue, and the spout emits a tuple containing the query and other metadata (e.g., the time the query was issued and its location) to a bolt for processing.

The gist of the technical description is that Twitter has a snazzy new way of determining when a new search term or hashtag is sufficiently popular to warrant interpretation. And at that point, it's dispatched to human workers at Amazon's Mechanical Turk service. Mechanical Turk, which I'd never heard of, is essentially an automated contracting service that farms out data jobs requiring human interpretation to an army of workers across the globe. Amazon dubs it, cleverly enough, "Artificial Artificial Intelligence."
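If you'd rather see the idea without the jargon, here's a loose sketch in Python of what such a pipeline does at its core: queries stream in, a counter watches for sudden spikes, and anything that crosses a threshold gets handed off to humans. The names, thresholds and structure below are my own guesses for illustration -- Twitter's actual system runs on Storm and Kafka, not on a little Python script.

# A simplified, hypothetical sketch of the "spout emits tuples to a bolt" idea.
# This is not Twitter's code; all names and thresholds are invented for illustration.
import time
from collections import defaultdict, deque

SPIKE_THRESHOLD = 50     # hypothetical: sightings per window before humans get involved
WINDOW_SECONDS = 300     # hypothetical: five-minute sliding window

recent_queries = defaultdict(deque)   # query text -> timestamps of recent sightings
already_dispatched = set()

def spout(query_queue):
    """Read raw (query, location) events and emit (query, timestamp, location) tuples."""
    for query, location in query_queue:
        yield (query, time.time(), location)

def bolt(tuple_stream):
    """Count each query in a sliding window and flag sudden spikes."""
    for query, ts, location in tuple_stream:
        window = recent_queries[query]
        window.append(ts)
        while window and ts - window[0] > WINDOW_SECONDS:
            window.popleft()                       # drop sightings older than the window
        if len(window) >= SPIKE_THRESHOLD and query not in already_dispatched:
            already_dispatched.add(query)
            dispatch_to_human_evaluators(query)

def dispatch_to_human_evaluators(query):
    """Stand-in for posting a task to a trusted pool of Mechanical Turk judges."""
    print(f"New spiking query needs a human: {query!r}")

# Example: a burst of an opaque hashtag crosses the threshold and triggers a dispatch.
fake_queue = [("#bindersfullofwomen", "US")] * 60
bolt(spout(fake_queue))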

Twitter's jobs, however, don't go to just any Mechanical Turk workers. Instead, Twitter has culled a select number of folks on MT to "evaluate" trending search terms -- essentially by discussing them on forums and chatrooms and then telling Twitter which categories, images and videos the terms relate to. Because Twitter trusts this elite team, it can rely on a single evaluation to begin categorizing the new search term.

That then primes the company's advertising engine to serve up the right kind of ads. For instance, it might display @barackobama or @mittromney ads against the hashtag #bindersfullofwomen instead of, say, promotions for Office Depot.
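Put the pieces together and the workflow amounts to something like the toy sketch below. Again, every name and mapping in it is hypothetical -- the point is simply that a single trusted human judgment ties an otherwise opaque term to a category, and the ad engine then works from that category rather than from the literal text of the hashtag.

# Hypothetical sketch of the judgment-to-ads path; not Twitter's actual code.
term_index = {}   # judged term -> {"category": ..., "related_accounts": [...]}

ads_by_category = {
    "politics": ["@barackobama", "@mittromney"],
    "office supplies": ["Office Depot"],
}

def record_judgment(term, category, related_accounts):
    """Accept a single judgment from a trusted evaluator as good enough to act on."""
    term_index.setdefault(term, {"category": category,
                                 "related_accounts": related_accounts})

def ads_for(term):
    """Serve ads by judged category; serve nothing if the term hasn't been judged yet."""
    judgment = term_index.get(term)
    return ads_by_category.get(judgment["category"], []) if judgment else []

record_judgment("#bindersfullofwomen", "politics", ["@barackobama", "@mittromney"])
print(ads_for("#bindersfullofwomen"))   # ['@barackobama', '@mittromney'], not Office Depot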

Twitter declined to answer my questions about these human workers, including how many of them it relies on and how they're compensated. So it seems fair to note that Mechanical Turk is not, in general, exactly what you'd consider a generous employer. Of the ten most lucrative "human intelligence tasks" it listed at the time of writing, pay rates ranged from a high of $135.65 for transcribing and tagging five hours and 36 minutes of video to a low of $11 for writing a 300-word review article.

Remember, those are the best-paying jobs of the 1,842 listings currently displayed on the MT site. I couldn't get to the lowest-paying jobs without logging in as a worker, but the lowest ones I could see paid up to $1.69 for transcribing a four-minute audio clip.

Now, maybe Twitter pays its select human evaluators more than that, and it may not list its jobs on the MT site itself -- at least not anywhere the unwashed masses can see them. And the company does seem eager to portray its human evaluators as one big happy family.

Chen and Jain, for instance, note they "crowdsourced a singing telegram" to Twitter's Mechanical Turk workers to celebrate the launch of their project, and cite it as an example of "the kind of top quality our workers provide." And it's certainly a well put-together tribute to Jain herself, though there's no word as to whether the MT workers were compensated for their time and effort here: