Pear Analytics Twitter Report: Criticisms of the coding methods

Pear Analytics produced a study about the usage of Twitter, and I fear they reveal more about their own organisational ability than they do about Twitter.  I’ve read the public white paper, and I find myself doubting the value of the report simply on the basis of the categorisation Pear used for their Twitter coding.  To describe it as limited, overly broad and prone to motivational bias is a charitable way of saying it’s poor quality, and I’d send it back for revision if this was a conference paper, and bounce the damn thing straight to reject if it was a journal article.  (Don’t really want to think about the pain I’d inflict on a student who turned this in as an essay.)

Discussion of the Categories

(1) News: Any sort of main stream news that you might find on your national news stations such as CNN, Fox or others. This did not include tech news or social media news that you might find on TechCrunch or Mashable.

So by news, they don’t actually mean news that would be “news” in a social media community.  Fair enough. If “social media news” is excluded here, where was it included?  There’s also something suspect about dividing news content in this manner – does this include original news such as the Hudson River tweet, Iran elections, election coverage, sports reports and score updates from live events? Is it restricted to the rebroadcast of news articles with short URLs? Can it include blog posts of original opinion columns similar to those found on the websites of major news stations? Is it video/visual/audio news rather than text?  The selection of CNN, Fox or others indicates a bias towards television style news rather than the print media – which is odd for a written medium.  I have doubts over the nature of this category, and believe it may significantly under-report.

(2) Spam: These are the tweets such as “See how I got 3,000 followers in one day” type of tweets.

Fair definition.  Although I wonder where the line was drawn for content coding – did this include the keyword spam accounts that send @messages based on automated keyword triggers? Or did those @spam triggers fall into the conversation category?  Minor question, and I think this is a fair and well set up definition.

(3) Self-Promotion: These are typical corporate tweets about products, services, or “Twitter only” promos.

Did this include press releases, blog post updates (like the one that appears on Twitter for this post) and private user self-promotion? For example, when I talk about social marketing course work, or presenting at a conference, or announce my attendance at an event, do I sit in the corporate self-promotion category?  As with news, I think this category is possibly under-reporting, and I suspect some of the self-promotional content was counted as conversational.

(4) Pointless Babble: These are the “I am eating a sandwich now” tweets.

If I believed that this was the singular use of the category, I’d still have concerns.  I freely admit to “pointless babble” posts which have sparked long conversations and been retweeted, and more than a few times a single silly tweet from me has had more traffic and mileage than my “serious” tweets.  I’d also be interested to see whether this category included any tweets with hashtags – e.g. the liveblogging of a conference.  Liveblogging isn’t news, spam or self-promotion, and the stuff I did under the #INSM09 tag doesn’t count as conversation either.  Was it pointless babble? Possibly, except that it was a rationale for a lot of people to start following that account.

(5) Conversational: These are tweets that go back and forth between folks, almost in an instant message fashion, as well as tweets that try to engage followers in conversation, such as questions or polls.  Note: Now, if there were any tweets that could fit into more than one category (which was rare), if it started with “@”, we deemed it as conversational, even if it was a news item or self-promotion.

A good piece of clarification that conversational could absorb tweets from any other category area just by virtue of having an @ or being a question or poll.  I think this category grossly overreports, and absorbs from other areas – I’m suspicious that a question can be conversational in nature,
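
To make the absorption effect concrete, here is a minimal sketch of the tie-break rule as the note describes it. The function name, the second argument and the sample tweets are hypothetical; the white paper does not publish its coding procedure in this form, so treat this as an illustration of the stated rule rather than Pear's actual method.

```python
def code_tweet(text, content_category):
    """Apply the stated tie-break: a leading "@" wins over any other category.

    content_category is the label a coder would otherwise have assigned
    (a hypothetical input; the white paper does not describe this step).
    """
    if text.lstrip().startswith("@"):
        return "Conversational"
    return content_category


# Made-up sample tweets, paired with the category their content suggests.
samples = [
    ("@friend Plane has landed in the Hudson River", "News"),
    ("Plane has landed in the Hudson River", "News"),
    ("@customer 20% off today only", "Self-Promotion"),
]

coded = [code_tweet(text, category) for text, category in samples]
print(coded)
```

Under this rule the same news item lands in two different categories depending only on whether it was addressed to someone, which is the over-reporting concern raised above.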

(6) Pass-Along Value: These are any tweets with an “RT” in it.

A rebroadcast tweet counts as a pass-along. Fair enough.  But what about the RT of a “pointless babble” tweet? Would an RT mean the original tweet has value to the ReTweeter, and therefore require a new category?  I would also have liked to know where the 8.70% of RTs originated from – conversation, babble, self-promotion or news?

Broad Concerns with the study

1) What value was placed on hashtags and URL shortening? You’ve recognised RT and @, how about the other advanced usage behaviours?

2) Which category contains the “tech news” or “social media news” that you might find on TechCrunch or Mashable?  Did it become an RT once @Mashable/@Techcrunch posted it, and sit inside self-promotion initially?  Was it classified as conversational once people talked about it? Given the original statement that tech news and social media news were explicitly excluded from being news (despite the fact you can find tech news and social media news on News Corporation owned news sites), it would have been nice to have a statement in the white paper about where these items were included.

3) Defining everything left over as “pointless babble” strikes me as a case of over-reporting to create a desired result, rather than actually assessing the state of play of the Twitter content.  To demonstrate this possible problem, I coded the Foxtel Television channels for content within the existing Twitter categories. Given it’s a one-way broadcast medium, I declined to allocate “Conversational” to any channel.

10%    News (any recognised news network channel)
34%    Pointless Babble (anything not classified elsewhere)
10%    Pass along value (the +2 channels)
41%    Self promotion (any named or branded channel such as National Geographic, MTV, Fox* or Discovery)
5%    Spam (pay per view or home shopping)

* Fox News was counted as self promotion since it’s a for-profit entertainment network rather than a legitimate news media outlet.

Is Australian television mostly pointless babble and self promotion? Well, that depends – I deliberately didn’t cast Fox Sport as news since it’s a named self promoting outlet, and if I recode sport broadcasts as news channels, then news forges ahead to 18%, pointless babble sits at 34% and self promotion drops to 30%.  So TV is just pointless babble and self promotion, and has no merit, right?  The content classification approach is problematic at times, particularly when there’s a heavily negatively worded catch-all category to pick up the unclassified.  Incorporating a judgemental categorisation system designed to condemn rather than report will bias the overall outcome – for example, if I change one label, the summary of the results changes remarkably:

(1) News  3.6%
(2) Spam 3.75%
(3) Self Promotion 5.85%
(4) Collective Goods of Value (Pointless Babble) 40.55%
(5) Conversational 37.55%
(6) Pass-Along Value 8.70%
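
As a quick check on the point, here is a minimal sketch that relabels the catch-all category and confirms nothing else moves. The percentages are the ones listed above; the `relabel` helper and the variable names are hypothetical, written only for this illustration.

```python
# Percentages as listed above, keyed by the original category labels.
results = {
    "News": 3.6,
    "Spam": 3.75,
    "Self-Promotion": 5.85,
    "Pointless Babble": 40.55,
    "Conversational": 37.55,
    "Pass-Along Value": 8.70,
}


def relabel(table, old, new):
    """Return a copy of the table with one category renamed, values untouched."""
    return {(new if name == old else name): share for name, share in table.items()}


relabelled = relabel(results, "Pointless Babble", "Collective Goods of Value")

# The data is identical; only the name of one bucket has changed.
total = round(sum(relabelled.values()), 2)
community_share = round(
    relabelled["Collective Goods of Value"] + relabelled["Conversational"], 2
)
print(total, community_share)
```

The numbers are the same before and after; only the reading of them changes, with the relabelled category and conversation together covering most of the sample.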

Suddenly Twitter is the most vital thing ever if you want community, since it’s so vibrant if you take Rheingold’s (1993) “collective goods of value” as the interpretation of the statement about what you’re having for lunch, along with the existing massive conversation structure. Since conversation and collective goods of value are precursor conditions for the creation of cybercommunity, Twitter is then the perfect cyber community incubator system.  If you code “Misc. other” as the foundation tools for a community, it’s all good.

If you’ve decided that Twitter is a waste of time/space/bandwidth, and arrived at the study with a preset attitude and a desire to prove the waste of space hypothesis, setting a broad classification of “Pointless Babble” is a great way to prove your point, and demonstrate some poor levels of market research, analysis and analytical thinking.

Plus, at 40.55%, I believe there would have been an opportunity to start digging deeper into the nature of these “pointless babble” posts to separate them into Facebook update style posts (“I am here with Y, doing X” and “letting you know I’m alive and okay”), what is truly recognised as “babble” (cat posts, misposts, and other apparently “useless” content), and the social network ping command, as distinct from recommendations, shoutouts, jokes and fourth-wall breaking messages.

Clustering it all under “babble” shows a lack of desire to pursue a more meaningful investigation of the content of the twitterstream. It’s a shame, because with a more active method, a deeper interrogation of the data and a bit of desire, Pear Analytics should be able to produce something remarkable from what they’ve captured.

ETA: Sarah Monahan has been in contact to let me know she’s no longer with Pear Analytics.

ETA2: Pear Analytics responded to some of the community comments on their report on their blog.

References

Experience: The Blog: Twitter’s 40.55% “Pointless Babble”: The Insights Mainstream Media Missed, http://www.experiencetheblog.com/2009/08/twitters-4055-pointless-babble-insights.html (Accessed: Sat Aug 15 2009 16:47:44 GMT+1000 (AUS Eastern Standard Time))

Twitter Study Reveals Interesting Results About Usage | Pear Analytics, http://www.pearanalytics.com/2009/twitter-study-reveals-interesting-results-about-usage/ (Accessed: Sat Aug 15 2009 16:46:52 GMT+1000 (AUS Eastern Standard Time))

Rheingold, H. (1993) The virtual community: Homesteading on the electronic frontier. New York: Harper Collins
