Algorithm spots sarcasm--suuuuure it does

After a close look at word, syntax, and punctuation patterns in user-generated content, Hebrew University researchers come up with software that can detect sarcasm in online communication.

Amazon review
The researchers looked at 66,000 Amazon reviews such as this one to make their determinations about what constitutes sarcasm. Oren Tsur, Dmitry Davidov, and Ari Rappoport/Hebrew University

I'm just sooo happy to be sitting here reading through an eight-page PDF on algorithms. Seriously. Nothing in this world makes me happier than poring over phrases like "detailed results of the 5-fold cross validation of various components of the algorithm are summarized in Table 2."

If a new sarcasm-detecting algorithm out of Jerusalem's Hebrew University really knows what it's doing, it should be able to tell that I was just kidding there. Yeah, right. No, actually I was.

After an exhaustive look at word, syntax, and punctuation patterns in written user-generated content, the researchers came up with SASI, or Semi-supervised Algorithm for Sarcasm Identification, which can recognize sarcasm in online sentences and assign each sentence to a sarcastic class (not all sarcasm is created equal, of course). Meager attempts at sarcasm aside, this is a pretty novel idea that could possibly aid those pure souls who lack a sarcasm and irony meter--and could even have commercial applications.

One idea here is that automated sarcasm recognition could help improve review summarization and opinion-mining systems, which can be thrown off by sarcasm's inherently subtle and ambiguous nature; it is sometimes hard even for humans to decide whether a comment is sarcastic. According to the researchers--Oren Tsur, Dmitry Davidov, and Ari Rappoport--studies of user preferences suggest some consumers find sarcastic reviews biased and less helpful.

Uh-huh. I just love trying to make sense of formulas like these. Oren Tsur, Dmitry Davidov, and Ari Rappoport/Hebrew University

The Hebrew University team--which will present its findings next week at the International Conference on Weblogs and Social Media in Washington, D.C.--closely examined some 66,000 Amazon reviews for 120 products including books, music players, digital cameras, camcorders, GPS devices, e-readers, game consoles, and mobile phones.

Identifying cues common to sarcasm in online communication (excessive use of capital letters, as in "Well you know what happened. ALMOST NOTHING HAPPENED!!!"; puns; and explicit contradictions), the researchers created a complex algorithm in which a small number of sarcastic sentences "teach" the software to recognize sarcasm. They say the software achieves a precision of 77 percent, meaning roughly three of every four sentences it flags as sarcastic really are--no small feat given the elusive nature of sarcasm, its intractable relationship to cultural context, and differences between the spoken and written varieties.
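For the curious, here is roughly what counting one of those surface cues might look like in Python. This is only a minimal sketch and not the researchers' SASI code: the function name `surface_cues` and the feature names are made up for illustration, and it handles just capital letters and punctuation, not puns or contradictions.

```python
import re


def surface_cues(sentence: str) -> dict:
    """Count a few shallow cues in a sentence.

    Hypothetical helper for illustration only; the feature names are
    assumptions and do not come from the SASI paper.
    """
    # Pull out alphabetic words (apostrophes allowed, as in "don't").
    words = re.findall(r"[A-Za-z']+", sentence)
    # A word counts as "shouted" if it is longer than one letter and all caps.
    all_caps = [w for w in words if len(w) > 1 and w.isupper()]
    return {
        "exclamations": sentence.count("!"),
        "questions": sentence.count("?"),
        "all_caps_words": len(all_caps),
        "caps_ratio": len(all_caps) / len(words) if words else 0.0,
    }


example = "Well you know what happened. ALMOST NOTHING HAPPENED!!!"
print(surface_cues(example))
```

Counts like these could serve as input features for a classifier trained on a small set of labeled sentences, though on their own they say nothing about whether a shouting reviewer is being sarcastic or simply thrilled.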

Of the products whose reviews they studied, Shure and Sony noise-cancellation earphones, Dan Brown's "Da Vinci Code," and Amazon's Kindle e-reader attracted the most sarcastic comments. Tsur, Davidov, and Rappoport identified three factors that motivated reviewers to bust out the sarcasm:

  1. Popularity: The more popular a product is, the more sarcastic comments it draws.
  2. Simplicity: The simpler a product is, the more sarcastic comments it gets if it fails to fulfill its single function (e.g., noise-blocking/canceling earphones that fail to block the noise).
  3. Price: The more a product costs, the more likely it is to attract sarcasm.

The researchers hypothesized that a main reason people use sarcasm in online communities and social networks is to "enlighten" the masses and compensate for products/people/ideas they believe to be overhyped. They say they may look into this topic further in the future. Great! More long PDFs for me to read.

(Via Slashdot)

 
