How much can your tweets reveal about you? Judging by the last nine hundred and seventy-two words that I used on Twitter, I’m about average when it comes to feeling upbeat and being personable, and I’m less likely than most people to be depressed or angry. That, at least, is the snapshot provided by AnalyzeWords, one of the latest creations from James Pennebaker, a psychologist at the University of Texas who studies how language relates to well-being and personality. One of Pennebaker’s most famous projects is a computer program called Linguistic Inquiry and Word Count (L.I.W.C.), which looks at the words we use, how often we use them, and in what context, and uses this information to gauge our psychological states and various aspects of our personality.
Since the creation of the L.I.W.C., in 1993, studies utilizing the program have suggested a close connection between our language, our state of mind, and our behavior. They have shown, for instance, that words used while speed dating can predict mutual romantic interest and desired future contact; that a person’s word choices can reveal her place in a social or professional hierarchy; and that the use of different filler words (“I mean”; “You know”) can suggest whether a speaker is male or female, younger or older, and more or less conscientious. Even the ways in which we use words like “and,” “under,” or “the” can be linked to depression, reactions to stress, social status, cultural norms, gender, and age. “The words we use in natural language reflect our thoughts and feelings in often unpredictable ways,” Pennebaker and his colleague Cindy Chung have written.
Social media seems tailor-made to take this kind of language analysis to the next level. You don’t have to ask for writing samples or diary entries. It’s all already online: tweets, Tumblr posts, and even Instagram captions give researchers access to the language that individuals use on an unprecedented scale. But the world of social-media language analysis is also fraught with difficulties. “The biggest problem with this approach is establishing causality,” Pennebaker said, when I spoke to him last week.
Take a study, out last month, from a group of researchers based at the University of Pennsylvania. The psychologist Johannes Eichstaedt and his colleagues analyzed eight hundred and twenty-six million tweets across fourteen hundred American counties. (The counties contained close to ninety per cent of the U.S. population.) Then, using lists of words—some developed by Pennebaker, others by Eichstaedt’s team—that can be reliably associated with anger, anxiety, social engagement, and positive and negative emotions, they gave each county an emotional profile. Finally, they asked a simple question: Could those profiles help determine which counties were likely to have more deaths from heart disease?
The answer, it turned out, was yes. Counties where residents’ tweets included words related to hostility, aggression, hate, and fatigue (words such as “asshole,” “jealous,” and “bored”) had significantly higher rates of death from atherosclerotic heart disease, including heart attacks and strokes. Conversely, where people’s tweets reflected more positive emotions and engagement, heart disease was less common. The tweet-based model even had more predictive power than other models based on traditional demographic, socioeconomic, and health-risk factors.
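For readers curious about the mechanics, the word-list approach described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual method: the category lists, the county names, the sample tweets, and the function name `emotional_profile` are all invented here, and the real dictionaries are far larger. The sketch simply measures what share of a county's words falls into each category.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only. They are NOT the
# dictionaries used by Pennebaker or by Eichstaedt's team.
CATEGORIES = {
    "hostility": {"hate", "jealous"},
    "fatigue": {"bored", "tired"},
    "positive": {"great", "wonderful"},
}


def emotional_profile(tweets):
    """Return, for each category, the share of all words that appear in its list."""
    counts = Counter()
    total = 0
    for tweet in tweets:
        # Lowercase the tweet and split it into simple alphabetic tokens.
        for word in re.findall(r"[a-z']+", tweet.lower()):
            total += 1
            for category, words in CATEGORIES.items():
                if word in words:
                    counts[category] += 1
    # Avoid dividing by zero when a county has no words at all.
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}


# Made-up tweets for two made-up counties.
county_tweets = {
    "County A": ["So bored and tired today", "I hate this traffic"],
    "County B": ["What a great morning", "Wonderful dinner with friends"],
}

profiles = {name: emotional_profile(tweets) for name, tweets in county_tweets.items()}
for name, profile in profiles.items():
    print(name, profile)
```

A profile like this is only the first step. The study then asked whether such county-level profiles tracked death rates from heart disease, a statistical-modelling stage that this sketch does not attempt.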