Posts in CorpusLinguistics
Who do you trust most: a robot, an alarm clock or your partner?

I’ll kick off this post with a definition of trust, but focus on an analysis of trust in everyday and not-so-everyday situations: about 12,000 conversations among friends, family members and strangers. About 10% of all these conversations make some mention of trust. Then I’ll turn to more extreme situations exemplified by characters in 135 different TV shows. Episodes are longer than conversations, and on average 53% of TV episodes make at least one mention of trust.

Cher is the queen of emoji even if she isn't

It is universally recognized by experts that Cher is the Queen of Emoji. (Hail, Cher.)

I’m pretty sure this is what Cher wears while she tweets emoji after emoji after emoji

But as far as I know, no one has (a) performed an actual analysis to prove this or (b) performed an adequate interpretive dance to Dark Lady. I once tried to tackle (b) at a retreat near Big Sur, but today my focus is (a).

Tournament of values!

Every few years, I go through an exercise where I collect a giant list of values, virtues, and intentions and rank them. The whole endeavor is a pseudo-quantitative approach to something deeply qualitative, but it articulates what I’m finding meaningful and helps me choose how I spend time, energy, and money. In the past, it’s been especially useful for helping me come up with responses to tricky situations where I don’t immediately know what to do.

Extreme language in presidential debates: Reagan, Trump and everyone in between

If you follow politics in America even a little bit, you know that Republicans talk a lot about taxes and that Donald Trump loves the word tremendous. But how do these rank relative to each other and to what Democrats (and Hillary Clinton, in particular) tend to talk about? Well, one finding is that over the years, Republican candidates have been even more preoccupied with Hillary Clinton than they have been with Ronald Reagan. Another finding is that the debates for the current election have been ~157% more negative than all previous debates.

U.S. presidential debates through the eyes of a computer

This post wraps up a series I’ve been doing on using machine learning models to understand recent American political debates (here and here). By taking all the transcripts of the debates since last year, I show which words and phrases most distinguish debaters’ styles and issues. Training a computer to identify speakers is usually thought of as a way of doing forensics or personalization. But here, I’m interested in something closer to summarization. If you can pick one section of talk for each candidate from the last debate, which moments are most consistent with everything they’ve said up to then?

Nattering Nabobs of Negativity: Bigrams, “Nots,” and Text Classification

You can get pretty far in text classification just by treating documents as bags of words where word order doesn’t matter. So you’d treat “It’s not reliable and it’s not cheap” the same as “It’s cheap and it’s not not reliable”, even though the first is a strong indictment and the second is a qualified recommendation. Surely it’s dangerous to ignore the ways words come together to make meaning, right?
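To make the point concrete, here is a minimal sketch (not the code behind the post) that counts single words and bigrams for the two example sentences. The tokenizer and the helper names `tokenize`, `bag_of_words`, and `bag_of_bigrams` are assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Count single words, ignoring order."""
    return Counter(tokenize(text))


def bag_of_bigrams(text):
    """Count adjacent word pairs, which keeps a little local order."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


indictment = "It's not reliable and it's not cheap"
recommendation = "It's cheap and it's not not reliable"

# The unigram bags are identical, so a pure bag-of-words model
# cannot tell the two sentences apart.
same_unigrams = bag_of_words(indictment) == bag_of_words(recommendation)

# The bigram bags differ: only the recommendation contains ("not", "not").
same_bigrams = bag_of_bigrams(indictment) == bag_of_bigrams(recommendation)
only_in_recommendation = (
    set(bag_of_bigrams(recommendation)) - set(bag_of_bigrams(indictment))
)

print(same_unigrams, same_bigrams, sorted(only_in_recommendation))
```

In this toy case, adding bigrams is enough to separate the sentences. Whether that helps a real classifier is an empirical question.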

Failed vs. fighting: the linguistic differences between speeches at the RNC and the DNC conventions

We know that Republicans and Democrats talk differently, but what’s the best way to describe these differences? Commentators note the relative darkness of the Republican National Convention and the focus on optimism and higher production quality for the Democratic National Convention. Looking at the words speakers use helps–but you can’t just use simple frequency (for details, check out the methodology section at the bottom).

Which new emoji will be the most popular?

June 21st is the release of Unicode 9, which will feature 72 new emoji–folks at Emojipedia have helpfully put them all together. The question in this blog post is: which ones will turn out to be the most popular? (Note that most people aren’t going to be able to use them immediately–you have to get an update of your phone/browser for them to show up and so will anyone you want to send them to.)

Poetry vs. non-poetry

I’ve been training an artificial intelligence system to write poetry, and this morning I got interested in which little parts of syntax and semantics preoccupy poets compared with writers of other forms of written language. So I took a heap of poetry and a heap of not-poetry, pulled out the bigrams (two-word phrases) and did some statistics to see what distinguishes poetic writing from non-poetic writing.
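The excerpt doesn't say which statistics were used, so the sketch below swaps in one common choice: a smoothed log ratio of bigram frequencies between the two heaps. The helper names, the add-one smoothing, and the four toy sentences are all assumptions made for illustration, not the post's data or method.

```python
from collections import Counter
import math
import re


def bigram_counts(texts):
    """Count adjacent word pairs across a list of texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts


def log_ratio(target, reference, alpha=1.0):
    """Smoothed log ratio of each bigram's relative frequency.

    Positive scores mark bigrams that are more typical of `target`,
    negative scores mark bigrams more typical of `reference`.
    `alpha` is an additive smoothing constant (an assumption here).
    """
    vocab = set(target) | set(reference)
    t_total = sum(target.values()) + alpha * len(vocab)
    r_total = sum(reference.values()) + alpha * len(vocab)
    return {
        bg: math.log((target[bg] + alpha) / t_total)
        - math.log((reference[bg] + alpha) / r_total)
        for bg in vocab
    }


# Toy stand-ins for the two heaps (made up for this example).
poetry = ["the moon of my heart", "the sea of my sorrow"]
prose = ["the report of the committee", "the budget of the agency"]

scores = log_ratio(bigram_counts(poetry), bigram_counts(prose))
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

On real corpora you would probably also want a minimum count threshold, since rare bigrams produce noisy ratios.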

The Emoji Is the Birth of a New Type of Language (👈 No Joke)

TYLER SCHNOEBELEN HAS discovered something curious about why people use the skull emoji. Schnoebelen is a linguist and the chief analyst for Idibon, a firm that interprets linguistic data. So recently he got interested in emoji. He analyzed a million social media posts containing those familiar little pictograms and found that when people talk about their phones they’re 11 times more likely to use the skull.
