I compile and publish daily and weekly lists of high-impact diaries. This diary serves as a reference for the methods used and gives a more detailed description of the associated statistics.
Earlier methods are described in How to make a list of recommended diaries: A reprise. At that time I was periodically polling the Daily Kos site from home and compiling statistics based on what I was able to pull over. Pretty good, but far from perfect.
Since installing full text search, I've been able to use on-site resources instead. This opened up the possibility of more complete statistical analysis, which I am only now beginning to take advantage of, but it first required modifying and adapting my earlier code. That work is now largely complete.
Overview
Each Tuesday through Saturday I produce a report consisting primarily of a table of the most recommended diaries from the preceding day. On Monday the report covers the preceding weekend. On Sunday the report covers the preceding Sat-Fri and includes lists of recommended authors and recommending readers over that interval. Due to diary length restrictions, the last two items are posted as comments to the main diary.
The report is generated at approximately 6 AM Eastern time based on data retrieved directly from the Daily Kos database. For the time period covered (1, 2, or 7 days) every diary posted is retrieved and its impact calculated. Those with impact above a certain level (currently 0.15 for weekdays and weekends and 0.6 for the full week) are formatted for output in a table, sorted by the number of recommendations received. In addition, I include the uid and name of the last new user registered at the time of the report, as well as the number of new uids registered in the last 24 hours. These data are useful in tracking interest in Daily Kos over time, as detailed here.
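The selection step described above can be sketched roughly as follows. This is an illustration only, not the actual report code: the `Diary` class, the `high_impact` function, the report-kind keys, and the sample values are all made up for the example, and the impact score is taken as already computed because its formula is not given here. "Above" is read as a strict comparison, which is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class Diary:
    title: str
    nrec: int      # number of recommendations
    impact: float  # assumed precomputed; the formula is not shown here

# Thresholds quoted in the text; the keys are illustrative names.
IMPACT_THRESHOLD = {"daily": 0.15, "weekend": 0.15, "weekly": 0.6}

def high_impact(diaries, report_kind):
    """Keep diaries above the impact cutoff, most recommended first."""
    cutoff = IMPACT_THRESHOLD[report_kind]
    kept = [d for d in diaries if d.impact > cutoff]
    return sorted(kept, key=lambda d: d.nrec, reverse=True)

# Made-up sample data, only to show the filtering and ordering.
sample = [
    Diary("A", nrec=40, impact=0.10),
    Diary("B", nrec=120, impact=0.70),
    Diary("C", nrec=75, impact=0.30),
]
daily_titles = [d.title for d in high_impact(sample, "daily")]
weekly_titles = [d.title for d in high_impact(sample, "weekly")]
```

With these sample values the daily list keeps B and C, in that order, while the stricter weekly cutoff keeps only B.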
The full report includes not only the table of diaries, but a summary of those who posted, recommended, and commented on them. In addition, a brief report (the statistical summary and the top 10 diaries) from exactly one year earlier is added.
Currently, although the report is generated automatically, posting is still done in the usual way - cutting and pasting and clicking.
I'd hoped to automate that as well, but got stuck just about the time of Yearly Kos, and haven't been able to make much progress since. Once I manage to automate it I'll probably move the posting time a little earlier in the day.
I'd like to add other statistical features that are useful and appropriate, and I'm very interested in expanding on the idea that the community here is well modeled as a network of nodes (users) and links between users (comments, recommendations, ratings).
Statistical Summary
Below is an example of the statistical summary accompanying every report, followed by explanatory text.
2006-07-07 00:04:21 - 2006-07-07 23:58:40
DailyKos diaries = 295; 295 per day; 12.3 per hour
Active Kogs: 3015 (writes a diary, recommends a diary, or comments on a diary)
Line 1 - the post time of the first and last diaries in the time interval.
Line 2 - total diaries, and average diary production rate, within the time interval.
Line 3 - total number of participating registered users (Kosmopolitans, Kossites, Kossaks, or, in the most recent incarnation, Kogs).
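As a small worked example, the rates on Line 2 can be reproduced from the diary count. The sketch assumes the rate is taken over the nominal report period (1, 2, or 7 days) rather than the exact span between the first and last post times; `production_rate` is a hypothetical helper name.

```python
def production_rate(n_diaries, period_days):
    """Return (diaries per day, diaries per hour) for a report period."""
    per_day = n_diaries / period_days
    per_hour = per_day / 24
    return per_day, per_hour

# Figures from the example summary: 295 diaries in a one-day report.
per_day, per_hour = production_rate(295, 1)
summary_line = (
    f"DailyKos diaries = 295; {per_day:.0f} per day; {per_hour:.1f} per hour"
)
```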
Kogs who | comment | recommend | write a diary | only | all
comment | 1963 | 1130 | 242 | 723 | 132
recommend | 1130 | 2132 | 134 | 1000 | 132
write | 242 | 134 | 294 | 50 | 132
Table 1 - User summary: counts of who did what. The "only" column shows the number of users who engaged in only the activity listed in the first column. The "all" column shows the number of users who engaged in all the activities listed. Otherwise each cell shows the number of users who engaged in both activities listed in the corresponding first column and top row. For example, 1130 people recommended a diary and commented on one. I currently don't include rating a comment in a diary, since that is not connected directly to the diarist, but to the author of the comment.
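A table like this can be tallied with plain set operations. The following is a minimal sketch, assuming each activity is available as a set of user ids; the function name `user_summary` and the toy data are hypothetical. The last lines check the published figures against each other by inclusion-exclusion, which recovers the 3015 active users.

```python
def user_summary(activity):
    """activity maps an activity name to the set of user ids who did it."""
    names = list(activity)
    did_all = set.intersection(*activity.values())
    table = {}
    for row in names:
        # Users who did any activity other than the one in this row.
        others = set().union(*(activity[n] for n in names if n != row))
        cells = {col: len(activity[row] & activity[col]) for col in names}
        cells["only"] = len(activity[row] - others)
        cells["all"] = len(did_all)
        table[row] = cells
    return table

# Made-up toy data: user ids per activity.
toy = {
    "comment": {1, 2, 3, 4},
    "recommend": {2, 3, 5},
    "write": {3, 6},
}
summary = user_summary(toy)

# Consistency check on the published figures (inclusion-exclusion):
# singles minus pairwise overlaps plus the three-way overlap.
active = (1963 + 2132 + 294) - (1130 + 242 + 134) + 132
```

The diagonal cells are simply the totals for each activity, which is why 1963, 2132, and 294 appear there in the table above.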
diaries with >= | 0 | 1 | 10 | 30 | 100
recommendations | 295 | 279 | 117 | 30 | 14
commentators | 295 | 289 | 128 | 29 | 6
connections | 295 | 279 | 206 | 61 | 20
Table 2 - Diaries: counts of how many diaries were visited. Visits are counted three ways. Recommendations equal the number of recommending users, since each person can recommend only once. Commenters are often fewer than comments, since one person can post many comments. "Connections" are the union of recommenders and commenters: if you both comment and recommend, you still count only once as a connection. I can't count "lurkers", those who read but take no visible (in my case, countable) action while there. Though others have asked for this to be added, I confess I haven't looked into it, as I'd argue that it is more reasonable to count only those who do something to make themselves known.
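The three counts and the threshold columns could be computed along these lines. This is a sketch under stated assumptions, not the report's code: each diary is represented as a set of recommender ids plus a list of comment author ids (one entry per comment), the names `threshold_counts` and `diary_table` are hypothetical, and the diary author's own comments get no special treatment.

```python
THRESHOLDS = (0, 1, 10, 30, 100)

def threshold_counts(values, thresholds=THRESHOLDS):
    """For each threshold, count how many values are at least that large."""
    return [sum(1 for v in values if v >= t) for t in thresholds]

def diary_table(diaries):
    """diaries: list of (recommender id set, comment author id list)."""
    nrec = [len(recs) for recs, _ in diaries]
    # Many comments by one person count as a single commenter.
    ntator = [len(set(authors)) for _, authors in diaries]
    # A connection is anyone who recommended or commented, counted once.
    ncnx = [len(recs | set(authors)) for recs, authors in diaries]
    return {
        "recommendations": threshold_counts(nrec),
        "commentators": threshold_counts(ntator),
        "connections": threshold_counts(ncnx),
    }

# Made-up toy data for three diaries.
toy_diaries = [
    ({1, 2, 3}, [2, 2, 4]),  # 3 recommenders, 2 commenters, 4 connections
    (set(), [5]),            # 0 recommenders, 1 commenter, 1 connection
    (set(), []),             # nobody visited
]
table2 = diary_table(toy_diaries)
```

Every diary clears the ">= 0" threshold, which is why the first column always equals the total number of diaries.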
Observed writer-(reader or writer) pairs: 9128
Maximum possible such pairs: 3015 x 294 = 886410
Network Density (observed/possible): 1.0%
Network statistics: If we consider each user a node in a network, and each recommendation or comment from another user to a diary a link between the commenter/recommender and the author, then we can count the number of such links observed and calculate the maximum possible number of such links. The ratio of these two quantities is the network density. It varies from 0, where no one is connected, to 1, where every possible connection is made.
At Daily Kos the number of links observed is the total number of connections over all authors. The maximum possible number of links between authors and readers is the product of the number of authors and the total number of active users. So we can calculate network density, and follow it over time.
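Here is a minimal sketch of that calculation. It assumes links are available as (author, user) pairs and that a repeated pair counts once; the function name `network_density` and the toy data are hypothetical. The last lines plug in the published figures from the example summary.

```python
def network_density(links, n_authors, n_active_users):
    """Density of the author-reader network.

    links: iterable of (author, user) pairs, one per comment or
    recommendation; repeated pairs are counted once.
    """
    observed = len(set(links))
    possible = n_authors * n_active_users
    return observed / possible

# Made-up toy data: 2 authors, 4 active users, 3 distinct pairs.
toy_links = [("ann", "bob"), ("ann", "bob"), ("ann", "cy"), ("dee", "bob")]
toy_density = network_density(toy_links, n_authors=2, n_active_users=4)

# Published example: 9128 observed pairs, 294 authors, 3015 active users.
published_density = 9128 / (294 * 3015)
published_percent = round(100 * published_density, 1)
```

The published figures give a density just over 0.0103, which the report rounds to 1.0%.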
Table 3 - the list of high impact diaries. The column headers are as follows.
- rank - order based on number of recommendations.
- nrec - number of recommendations
- ncom - number of comments
- tator - number of commenters or commenTators (ow, ... sorry, I know).
- cnx - number of connections, which is the total of commenters and recommenders, without duplication.
- impact - calculated currently using nrec and cnx
- Diary - a hyperlink to the Daily Kos diary displaying the title.
- Author - hyperlink to the diary author's Daily Kos home page.
- Time - post time shown as time of day for daily lists and date for weekend and weekly posts.
There is a very useful extension for the Firefox web browser, Tabletools, that allows you to sort tables in place or copy them for further examination in, for example, Excel.
Further thoughts on network density
An example. If you went to a party attended by 100 people and on average everyone met or talked to ("connected with") 10 people, then the network density of the party would be the ratio of the meet-ups that actually took place, 1000 = (100 * 10), to the number that could have taken place, 9900 = (100 people * (100 - 1) other people). So that's 1000/9900, or around 10%.
If you attended a party of 1000 people and everyone still on average met 10 people the network density would be (1000 * 10) / (1000 * 999) ~ 1%. Interesting! Each attendee had (numerically) the same experience at both parties, but the network they were a part of at the large party was much more sparsely connected.
Network density is a number describing the party, not the party people.
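The two parties can be compared with a few lines of arithmetic. This sketch follows the counting used in the example (each person's meetings counted separately, against n * (n - 1) possible ones); `party_density` is a hypothetical name.

```python
def party_density(n_people, avg_met):
    """Ratio of meet-ups that happened to meet-ups that could have."""
    observed = n_people * avg_met
    possible = n_people * (n_people - 1)
    return observed / possible

small_party = party_density(100, 10)    # 1000 / 9900
large_party = party_density(1000, 10)   # 10000 / 999000
```

The per-person experience (10 meetings) is the same in both calls, yet the density drops by roughly a factor of ten as the party grows.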
Perhaps the next statistic to add to the report should be the one mentioned above, the average number of links per user, also called the degree, so we can have some insight into the average user experience. This is already available from the data at hand; average degree from the example data above is ~3 links per user = 9128/3015. I will add this to the report going forward!
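Using the figures from the example summary, the average degree works out as below. The helper name `average_degree` is hypothetical, and the sketch simply divides observed links by active users as described above.

```python
def average_degree(observed_links, n_active_users):
    """Average number of links per active user."""
    return observed_links / n_active_users

# Published example: 9128 observed pairs among 3015 active users.
avg_degree = average_degree(9128, 3015)
avg_degree_rounded = round(avg_degree, 2)
```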
Other network statistics of interest would be average distance between nodes, centrality of nodes, and overall network cohesion.
Timing
One aspect of events at Daily Kos that has not been paid much attention to so far is their time course. This has been due to lack of data, not lack of interest. I am now able to see the times of every recommendation and comment. It would be very interesting to establish the normal time course of commenting and recommendation during the few days of active life for each diary.
In case it isn't obvious, I'm very pleased and honored to have been given the opportunity and the access required to make these lists possible. Their only real justification for existence is as an aid for people who can't or don't wish to spend all their waking hours reading diaries and who may, through the lists, find diaries they might otherwise have missed. Each person contributes what they can when they can (diaries, comments, recommendations), and I hope I have helped make the fruits of their labors accessible to all. A large number of people have helped and contributed to shaping the reports to their present form, and I hope that help will continue going forward. As always, if there are errors or omissions I would appreciate your pointing them out.