This meta-diary describes the methods I use to compile daily and weekly reports of recommended diaries. I'm posting it separately to answer more completely a question I've been asked a lot lately, namely, "How do you do that? Not by hand, I hope!" Nope. Not by hand.
I've divided the description into sections running from harder to softer considerations, in the hope that the hard-core readers will be satisfied sooner while the soft and fuzzy won't mind going along for the ride. The diary ends with a list of previous reports generated with these methods.
As BiPM might once have said, "Toot, Toot! Here we go!".
Do you know how long it took me to consistently spell it "diary" instead of "dairy"? Way, way too long.
How to make a list of recommended diaries
Sections
- Overview
- Hardware
- Databases
- Software and scripts
- Reports
- Motivation
- Concerns
- History
- Previous Reports
Overview
Each Tuesday through Saturday I produce a report of the most recommended diaries from the preceding day. On Monday the report covers the preceding weekend. On Sunday the report covers the preceding week and also includes lists of the most highly recommended authors and the most highly recommending readers. The data needed to produce the reports is obtained by periodically downloading the Daily Kos front page and looking for new diaries, which are then downloaded and stored locally. After a diary closes for recommendations, its number of recommendations and its comment thread are also downloaded and stored. The reports are made using custom scripts.
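The schedule above amounts to a small date calculation. Here is a minimal sketch of it; the name `report_window`, the midnight-to-midnight boundaries, and the Saturday-through-Friday weekly window (read off week labels such as "1-7" for January 1-7, 2005, in the lists at the end) are assumptions, not details of the actual scripts.

```python
from datetime import date, datetime, time, timedelta


def report_window(report_day: date) -> tuple[datetime, datetime]:
    """Return the half-open interval [start, stop) covered by report_day's report.

    Hypothetical helper. Assumes midnight-to-midnight boundaries and a
    Saturday-through-Friday week for the Sunday weekly report.
    """
    weekday = report_day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 6:
        # Sunday: the Saturday-through-Friday week ending two days earlier (assumed).
        stop_day = report_day - timedelta(days=1)  # yesterday (Saturday), excluded
        start_day = stop_day - timedelta(days=7)   # the Saturday before, included
    elif weekday == 0:
        # Monday: the weekend just past, Saturday and Sunday.
        stop_day = report_day
        start_day = report_day - timedelta(days=2)
    else:
        # Tuesday through Saturday: the preceding day only.
        stop_day = report_day
        start_day = report_day - timedelta(days=1)
    return datetime.combine(start_day, time.min), datetime.combine(stop_day, time.min)
```

For example, `report_window(date(2005, 1, 9))`, a Sunday, gives midnight January 1 through midnight January 8, 2005, i.e. the week of January 1-7.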
Hardware
A first-generation LCD iMac (the "lamp" Mac), various routers and firewalls, and a broadband (cable) modem.
Databases
I use simple Python-based databases: gdbm and shelve files.
- one gdbm file keyed on time of posting and holding story urls
- four shelve files keyed on story url and holding stories, recommendations, comment threads, summary fields.
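As a minimal sketch of how the two kinds of file fit together, assuming Python 3's standard `dbm` and `shelve` modules (`dbm.open` uses gdbm only when that backend is installed), the code below keys urls by posting time and stories by url, then pulls back the urls posted between a start and stop time, which is the lookup the reporting scripts need. The function names and the story fields are hypothetical.

```python
import dbm
import os
import shelve
import tempfile
from datetime import datetime

# Posting-time key, YYYYMMDD.hhmmss. Fixed width, so string order is time order.
# (Two diaries posted in the same second would share a key.)
KEY_FORMAT = "%Y%m%d.%H%M%S"


def time_key(posted: datetime) -> str:
    return posted.strftime(KEY_FORMAT)


def store_story(index_path: str, stories_path: str,
                posted: datetime, url: str, story: dict) -> None:
    """Index the url by posting time, and store the story itself under its url."""
    with dbm.open(index_path, "c") as index:
        index[time_key(posted).encode()] = url.encode()
    with shelve.open(stories_path) as stories:
        stories[url] = story


def urls_between(index_path: str, start: datetime, stop: datetime) -> list[str]:
    """Return urls posted in [start, stop), oldest first."""
    lo, hi = time_key(start), time_key(stop)
    hits = []
    with dbm.open(index_path, "r") as index:
        for raw_key in sorted(index.keys()):
            if lo <= raw_key.decode() < hi:
                hits.append(index[raw_key].decode())
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        idx, st = os.path.join(tmp, "index"), os.path.join(tmp, "stories")
        store_story(idx, st, datetime(2005, 1, 15, 9, 30),
                    "http://example.com/a", {"title": "A"})
        print(urls_between(idx, datetime(2005, 1, 15), datetime(2005, 1, 16)))
```

Scanning all keys is fine at a few hundred diaries a day; a gdbm file has no ordered range lookup, which is why the keys are sorted in Python.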
Software and Scripts
- Operating system: OS X v10.3, built on Darwin, a Unix variant.
- Timed execution of programs: cron
- Swiss army tool: Microsoft Excel
- Visualization of graphs: graphviz for OS X, Adobe Illustrator
- Other data graphics: Ploticus
- Scripts: python
- story2db.py: (run every 10-30 minutes)
Python urllib calls to dailykos.com yield 12 recent diaries. Each of these is pulled in, its HTML removed, and its "before the jump" text analyzed for non-stopword stems (the basic work needed for simple duplicate checking). Store author, title, date of post, text, stem words, and url in a Python shelve file. Store the url, indexed by date of post as YYYYMMDD.hhmmss, in a gdbm file.
- updaterec.py: (run once per day in early AM, or as needed)
Retrieve the urls of diaries that lack final recommendation counts. For each url, use urllib to retrieve the recommendations from the "who recommended" page and parse out the names of the recommenders. Store new recommendations indexed by url, along with the time recorded. If recommendations are already recorded, compare the new number to the old one and replace the old if the new is greater.
- parseKos.py: (run as needed, needs to be put in cron)
Retrieve diary and comments, parse and store in shelve file.
- weeklykos.py: (run daily)
Given start and stop times, retrieve the diaries in that interval and the number of recommendations per diary. Analyze and print a report. Optionally generate graph files (dot) or heatmap files (Ploticus), and author and reader reports.
- evalcomments.py: (run as needed)
Given start and stop times, retrieve the comment threads and recommendation data for diaries in that interval. Analyze and print a report.
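The core of a report like the one weeklykos.py prints is counting. A minimal sketch, assuming each diary in the interval has already been loaded as a hypothetical `Diary` record carrying its list of recommenders, and taking "most highly recommended author" to mean the most recommendations summed over that author's diaries (an assumption):

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Diary(NamedTuple):
    """One diary from the report interval (hypothetical record shape)."""
    url: str
    title: str
    author: str
    recommenders: tuple[str, ...]


def top_diaries(diaries: Iterable[Diary], n: int = 10) -> list[Diary]:
    # sorted() is stable, so diaries with equal counts keep their input order.
    return sorted(diaries, key=lambda d: len(d.recommenders), reverse=True)[:n]


def author_totals(diaries: Iterable[Diary]) -> Counter:
    """Recommendations received, summed over each author's diaries."""
    totals: Counter = Counter()
    for d in diaries:
        totals[d.author] += len(d.recommenders)
    return totals


def reader_totals(diaries: Iterable[Diary]) -> Counter:
    """Recommendations given by each reader."""
    return Counter(name for d in diaries for name in d.recommenders)


def format_report(diaries: list[Diary], n: int = 10) -> str:
    lines = [
        f"{rank}. {d.title} by {d.author} ({len(d.recommenders)} recs) {d.url}"
        for rank, d in enumerate(top_diaries(diaries, n), start=1)
    ]
    return "\n".join(lines)
```

`author_totals(...).most_common(10)` and `reader_totals(...).most_common(10)` then give the Sunday author and reader lists.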
Motivation
The primary reason for making these lists available is to cover a temporal blind spot at Daily Kos: good diaries become relatively unavailable to most users once they drop off the "recent diary" and "recommended diary" boxes on the front page. Counting up recommendations and linking back to the most highly recommended diaries from previous days or weeks keeps those diaries available.
Concerns
Does producing lists of past diaries promote a past orientation rather than a current and future one? Do these lists interfere or otherwise conflict with the standard recommended list? Do the lists of most recommended authors and most rewarding readers promote an "award show" mentality? Since I'm still making the lists, you may conclude that, though I have these concerns, on balance the intrinsic value of having access to recent diaries and the good sense of most Kosmopolitans keep me sanguine.
History
I've been posting recommended diary diaries (a form of meta-diary) weekly since December of 2004 and daily since sometime in January this year, 2005.
I started downloading diaries en masse while trying to find a simple way to recognize and flag those with duplicate subjects. A method was developed to do just that and offered to Kos. The scripts for this purpose were written in Python and run on a single desktop machine running Mac OS X (a Unix) with a broadband connection to the internet.
Since once you have a hammer everything looks like a nail, my frustration with the lack of a longer diary history (the "down the memory hole" problem) seemed a likely thing to pound on. Soon I was periodically polling the "recent diaries" on the front page, downloading all new diaries, and making my own lists of recent diaries, as long as I needed to cover my absences. I use cron to poll every 15 minutes (the interval has varied between 10 and 30 minutes according to traffic). I think I miss at most 3-5 diaries per week with this method: I've got a pretty good connection and some simple ways to look for and repair misses.
I later realized that it might be better to see something like the front page recommended list instead of a raw list in chronological order, just about the time I noticed that the full list of recommendations for each diary is available through its own link. A link to the recommendations for each diary appears as "Who's recommended This Diary" in the Recommend Diary box under the Menu box on each diary page. Since I already have a list of the diaries I've downloaded, it's a simple matter to download the recommendations for each of them. The sticky part is making sure to get the final number of recommendations. Because diaries are open for recommendations for only 24 hours, it's necessary to keep track, for each diary, of when it was posted and when its recommendations were last updated, and to make sure there has been an update 24 hours or more after posting.
I haven't said a thing about the front page or about the comment threads that accompany most diaries. The front page changes slowly enough that there is no need to track it, and in any case front page stories don't get recommendations. The comment threads are problematic because they usually never close, so it's not possible to get a definitively closed thread. For these reasons, I avoided the front page and comment threads at first. However, user requests and the example of social democrat's diaries, which update highly rated comments every thirty minutes (!!), drove me to consider comments, and I wrote scripts to download full diary threads and analyze them. Until recently I hadn't analyzed comments more often than weekly, so I ran the updates by hand. Based on requests, I've recently begun incorporating comment counts into the daily summaries, which means I have to download and analyze diaries daily, and some diaries twice: once when they're on the list, and once after they close for recommendations. Devising a usable strategy for handling comments is currently at the center of my attention.
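The stem analysis that story2db.py performs for duplicate checking might look roughly like the following. This is not the method offered to Kos, which isn't described here: the stopword list is abbreviated, `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's, and comparing stem sets by Jaccard overlap with a 0.5 cutoff is a swapped-in choice. All names are hypothetical.

```python
import re
from html.parser import HTMLParser

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = frozenset(
    "a an and are as at be but by for from has have i in is it not of on or "
    "that the this to was were will with".split()
)


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def crude_stem(word: str) -> str:
    # Toy suffix stripping (hypothetical); a Porter stemmer would do far better.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(html: str) -> set[str]:
    """Non-stopword stems of a diary's "before the jump" text."""
    words = re.findall(r"[a-z]+", strip_html(html).lower())
    return {crude_stem(w) for w in words if w not in STOPWORDS}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: shared stems divided by all distinct stems."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_possible_duplicate(new: set[str], earlier: list[set[str]],
                          threshold: float = 0.5) -> bool:
    # The 0.5 cutoff is an arbitrary placeholder, not a tuned value.
    return any(similarity(new, old) >= threshold for old in earlier)
```

Stemming is what lets "voted" and "voting" count as the same word: both reduce to "vot" here.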
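The bookkeeping just described, together with updaterec.py's rule of replacing stored recommendations only when a new fetch finds more, can be sketched with a small record per diary. The `RecRecord` layout and the function names are hypothetical; the 24-hour window and the replace-if-greater rule come from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Diaries accept recommendations for 24 hours after posting.
VOTING_WINDOW = timedelta(hours=24)


@dataclass
class RecRecord:
    """Recommendation state for one diary (hypothetical record layout)."""
    posted: datetime
    recommenders: list[str] = field(default_factory=list)
    last_updated: Optional[datetime] = None

    def update(self, fetched: list[str], fetched_at: datetime) -> None:
        # Keep the stored list unless the new fetch found more names.
        if len(fetched) > len(self.recommenders):
            self.recommenders = list(fetched)
        # Remember the latest time the "who recommended" page was read.
        if self.last_updated is None or fetched_at > self.last_updated:
            self.last_updated = fetched_at

    def is_final(self) -> bool:
        # Final once some fetch happened 24 hours or more after posting.
        return (self.last_updated is not None
                and self.last_updated >= self.posted + VOTING_WINDOW)


def needs_refresh(records: dict[str, RecRecord]) -> list[str]:
    """Urls whose recommendation counts may still change."""
    return sorted(url for url, rec in records.items() if not rec.is_final())
```

Running `needs_refresh` each morning gives exactly the diaries that still need a visit to their recommendation page.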
For the future, I still hold out the hope that a new search page allowing user-specified time limits, sorting criteria (number of comments, number of recommendations), and number of hits to view will supplant these reports. I'm interested in a measure of impact that combines the number of recommendations and the number of comments. I'm also still interested in seeing whether the graph visualizations of who's reading whom are useful in some way rather than just fun to stare at.
Previous Weekly Lists
January 2005
 S  M Tu  W Th  F  S  # week
                   1  # 1-7
 2  3  4  5  6  7  8  # 8-14
 9 10 11 12 13 14 15  # 15-21
16 17 18 19 20 21 22  # 22-28
23 24 25 26 27 28 29  # 29-4
30 31
February 2005
 S  M Tu  W Th  F  S  # week
       1  2  3  4  5  # 5-11
 6  7  8  9 10 11 12  # 12-18
13 14 15 16 17 18 19  # 19-25
20 21 22 23 24 25 26  # 26-4
27 28
March 2005
 S  M Tu  W Th  F  S  # week
       1  2  3  4  5  # 5-11
 6  7  8  9 10 11 12  # 12-18
13 14 15 16 17 18 19  # 19-25
20 21 22 23 24 25 26  #
27 28 29 30 31
Maps
- Recommendation Map Jan 22 - Feb 18 2005
- dailykos in hyperspace
- Recommendation Map Jan 1 - Jan 19 2005
Recommendations
- Rounding up the Notables, March 12-18 2005
- Rounding up the Notables
- Most highly ranked comments 2/26/2005 - 3/4/2005
- Most highly ranked comments 2/19/2005 - 2/25/2005
- Digging into Diaries Feb 5-11 2005
- Diaries Most Commented Upon 1/29/2005 - 2/4/2005
- Most Highly Ranked Diary Threads 1/22/2005 - 1/28/2005
- Most Highly Ranked Diary Threads 1/15/2005 - 1/21/2005
- Most Highly Ranked Diary Threads 1/8/2005 - 1/14/2005