The longest tweets may only be 140 characters long, but two Binghamton University researchers have been attempting to make them speak volumes.
Sang Won Yoon, an assistant professor of system sciences and industrial engineering, and Sarah Lam, an associate professor and an associate dean of the Watson graduate school, analyzed 500 million tweets from the New York City area. For each tweet, they could see the date, time and content, as well as the user's location, which was determined from cell phone tower and Wi-Fi signals.
This anonymous information, provided by Xerox research, allowed Yoon and Lam to predict which areas were residential, industrial and recreational.
“We see a lot of Twitter information coming from Long Island at night time,” Yoon said. “And from the same users, we see a lot of tweets coming from the Manhattan area during the day. In this case, we can predict that maybe this person is living on Long Island and has work or school in Manhattan.”
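The reasoning Yoon describes can be illustrated with a minimal sketch. It is not the researchers' actual method: the hour windows, the region labels, the `infer_home_work` function and the sample records are all assumptions made for illustration. The idea is simply to count where each anonymous user tweets at night and during the day, then take the most frequent region in each window as a guess at home and at work or school.

```python
from collections import Counter, defaultdict

# Assumed hour windows; the article does not say which cutoffs were used.
NIGHT_HOURS = set(range(0, 6)) | set(range(20, 24))
DAY_HOURS = set(range(9, 18))


def infer_home_work(tweets):
    """Guess a home and a work/school region for each user.

    tweets: iterable of (user_id, hour_of_day, region) tuples (hypothetical format).
    """
    night = defaultdict(Counter)
    day = defaultdict(Counter)
    for user, hour, region in tweets:
        if hour in NIGHT_HOURS:
            night[user][region] += 1
        elif hour in DAY_HOURS:
            day[user][region] += 1

    result = {}
    for user in set(night) | set(day):
        home = night[user].most_common(1)
        work = day[user].most_common(1)
        result[user] = {
            "home": home[0][0] if home else None,
            "work": work[0][0] if work else None,
        }
    return result


# Made-up records for one anonymous user.
sample = [
    ("u1", 22, "Long Island"),
    ("u1", 23, "Long Island"),
    ("u1", 10, "Manhattan"),
    ("u1", 14, "Manhattan"),
    ("u1", 15, "Long Island"),
]
guesses = infer_home_work(sample)
```

For this sample, the user's night tweets all come from Long Island and most day tweets come from Manhattan, so the sketch guesses a Long Island home and a Manhattan workplace or school.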
According to Lam, the most successful individual user prediction they performed had 90 percent accuracy. They used a process called cross-validation, which shows how well a model generalizes to new data.
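Cross-validation in general works by repeatedly holding out part of the data, fitting the model on the rest, and scoring it on the held-out part. The sketch below shows a plain k-fold version under stated assumptions: the article does not say which model or how many folds the researchers used, so the lookup model, the helper names (`k_fold_indices`, `cross_validate`, `train_lookup`) and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, k=5, seed=0):
    """Return mean held-out accuracy over k folds (assumes len(samples) >= k)."""
    scores = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train_fn(train_x, train_y)
        correct = sum(predict(samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def train_lookup(train_x, train_y):
    """Toy model: predict the most common training label seen for each feature."""
    by_feature = defaultdict(Counter)
    for x, y in zip(train_x, train_y):
        by_feature[x][y] += 1
    fallback = Counter(train_y).most_common(1)[0][0]

    def predict(x):
        if x in by_feature:
            return by_feature[x].most_common(1)[0][0]
        return fallback

    return predict


# Made-up data: time of day as the only feature, area type as the label.
samples = ["night"] * 10 + ["day"] * 10
labels = (["residential"] * 9 + ["commercial"]
          + ["commercial"] * 9 + ["residential"])
accuracy = cross_validate(samples, labels, train_lookup, k=5)
```

Because every score comes from records the model never saw during training, the averaged accuracy is an estimate of how the model would perform on new data rather than on the data it was fitted to.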
In addition to using tweets, Yoon and Lam used census demographic information to account for people who did not use Twitter.
“Older people, like grandparents, are not likely to be doing any tweeting, but younger generations tend to do it more often,” Lam said. “We don’t want to miss out on the population that’s not captured by tweets.”
Although some social media users may be uncomfortable with the idea that their behavior can be predicted from what they post, this is not a new phenomenon. Stores already use the same kind of tracking and pattern recognition that Yoon and Lam applied to tweets.
“Even some local grocery stores do it,” Lam said. “They know what the likely items that you are going to purchase and they will give you the relevant coupons hoping that you’ll buy more. They’re keeping track of individual shopping patterns.”
While they have only looked at data from Twitter, Lam said that the methodology could be modified to work with data from other social media sites, such as Facebook.
“It doesn’t really have to be constrained to Twitter data,” Lam said. “In any other large-scale data we’re looking for relationships, whether it is a group relationship or individual behavior. The methodology can be easily transferred.”