Yesterday afternoon, in jotter's impact diary, a few of us number-crunching types were chatting about whether there are any interesting patterns underlying people's use of dkos.
While we came up with a few things to look at (see below for a couple), we also thought it would be useful to throw the question out to the community at large...
Some of the things we were thinking of looking at:
- Are there identifiable types of users, with distinct patterns of posting?
- Are there any patterns in the way people use (or abuse) tags?
- After registering, how long does the typical user stay active?
- Is there a "best time" to post to make the Recommended list?
- Are there characteristic structures to comment trees?
There was a brief discussion of this on the Open Thread last night as well.
And, it wouldn't be a stats diary without a bunch of numbers, so here are a few things to chew on:
1. Tagging use. I took the list of all the tags and did a frequency analysis on said list:
What does this mean? The bottom axis is the number of uses of each tag, starting with 1 on the left (i.e. a tag used by only one diary) and going up to tags referenced in several thousand diaries. The left axis is the number of tags that have that number of uses. In other words, the point in the upper left corner is the number of tags which are only referenced once. Which, by the way, is 23,000, or roughly 60% of all tags. There are roughly 4500 tags which are used by two diaries, and the number drops from there. 'Iraq' is the champion, used by ~6400 diaries.
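For anyone who wants to try this at home, the counting behind a graph like this can be sketched in a few lines of Python. This is only an illustration, not the script actually used: the function name `tag_usage_histogram` and the toy diaries are made up, and it assumes the tags arrive as one list per diary.

```python
from collections import Counter

def tag_usage_histogram(diary_tags):
    """Map 'number of diaries using a tag' -> 'number of tags with that many uses'.

    diary_tags is assumed to be an iterable with one list of tags per diary.
    """
    uses_per_tag = Counter()
    for tags in diary_tags:
        # set() so a tag repeated within one diary is only counted once
        uses_per_tag.update(set(tags))
    # Count how many tags share each usage count
    return Counter(uses_per_tag.values())

# Toy data, invented for illustration only
diaries = [
    ["Iraq", "Bush"],
    ["Iraq", "media"],
    ["Iraq", "Bush", "polls"],
    ["open thread"],
]
histogram = tag_usage_histogram(diaries)
for uses in sorted(histogram):
    print(uses, histogram[uses])
```

Plotting the keys on the bottom axis and the values on the left axis, both on log scales, gives the kind of picture described above.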
Interestingly, there's a very simple numerical relationship in the frequency of tag use. That's the straight line on the graph. In technical terms, it's a power-law decay with an exponent of -1. In less mathematical language, that means that only a very few tags are used many times, a fair number of tags are used fairly frequently, and lots of tags are used once or twice.
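In symbols, a power law with exponent -1 says the number of tags with k uses is roughly C / k for some constant C, which shows up as a straight line of slope -1 when both axes are logarithmic. One rough way to estimate the exponent is an ordinary least-squares line through the log-log points. The sketch below does that on synthetic data built to follow C / k exactly; the helper name `fit_power_law_exponent` is made up, and this is not necessarily how the line on the graph was drawn.

```python
import math

def fit_power_law_exponent(histogram):
    """Least-squares fit of log(count) against log(uses).

    histogram maps uses -> number of tags with that many uses.
    Returns (exponent, prefactor) for count ~ prefactor * uses ** exponent.
    """
    points = [(math.log(k), math.log(n))
              for k, n in histogram.items() if k > 0 and n > 0]
    if len(points) < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)

# Synthetic histogram following count = 1000 / uses exactly (not real dkos data)
synthetic = {k: 1000.0 / k for k in range(1, 51)}
exponent, prefactor = fit_power_law_exponent(synthetic)
print(round(exponent, 3), round(prefactor, 1))
```

On real data the sparse, noisy points at the high-use end can pull a straight-line fit around quite a bit, so treat the fitted number as a rough guide rather than a precise measurement.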
2. A study of user retention patterns. jotter collected some data on user activity over time:
User activity means "look at the last 24 weeks; count up the number of weeks in which a given user did something (comment, diary, rate, recommend)". What we see is that there's a hard core of constant users, the people over on the right of the graph. These are the people who are present day-in, day-out. Then there are the occasional users, who drop by once in a great while, or who register, post a few comments, and then leave. And then, there's the long middle, representing a mid-point between those extremes.
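The "weeks active out of the last 24" measure is easy to sketch in code. What follows is a guess at the bookkeeping, not jotter's actual method: it assumes each action (comment, diary, rating, recommend) is available as a (user, date) pair, and the function name, the user names, and the end date are all invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

def active_weeks(events, end, window_weeks=24):
    """Count, per user, the distinct weeks with at least one action.

    events is assumed to be an iterable of (user, date) pairs; the window
    covers the window_weeks weeks ending on (and including) `end`.
    """
    start = end - timedelta(weeks=window_weeks)
    weeks_seen = defaultdict(set)
    for user, day in events:
        if start < day <= end:
            # Week 0 is the most recent seven days, week 1 the seven before, etc.
            weeks_seen[user].add((end - day).days // 7)
    return {user: len(weeks) for user, weeks in weeks_seen.items()}

# Toy events, invented for illustration only
end = date(2006, 6, 30)
events = [("regular", end - timedelta(weeks=i)) for i in range(24)]
events += [
    ("drive_by", end - timedelta(days=3)),
    ("drive_by", end - timedelta(days=5)),
    ("old_timer", end - timedelta(weeks=30)),  # outside the window
]
counts = active_weeks(events, end)
print(counts)
```

A histogram of these per-user counts (how many users were active in 1 week, 2 weeks, and so on up to 24) is the sort of thing the graph shows, with the hard core piling up at 24.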
Interestingly, this can also be described by a power-law decay, again with an exponent of -1 (ignoring, of course, the hard-core users at the tail):
By the way, the appearance of patterns from the roaring chaos of 94,000 individual people is a classic example of what is called Emergent Behavior.
So, what sort of things would people be interested in seeing?
-dms