On February 15, Seneca Doane advanced the idea of a DailyKos Diary Flow Gauge, which would measure the rate at which diaries are published on the site. This metric, it was hoped, would help identify whether the "firehose problem" manifested, and in what way. In six diaries, Seneca began gathering data, starting from the Front Page's "Recently Rec'd" list and the Diary page, and began developing statistics from which inferences could be drawn (Diary Flow Gauges, 2/15/11).
This diary announces an automated flow meter with public data, based on the ideas advanced in the Diary Flow Gauge diaries and comments. Refined data is available as CSV-formatted text files here (download) and here (browser). See the end of this diary for links to source data.
Below the fold: Why do this?, definitions for the Diaries List Gauge (DL) and the Recently Recommended Gauge (RR), and finally some notes about timestamps and other data sources.
Why do this?
From Diary Flow Gauge, 2/16-17/11 by Seneca Doane
Some history: there was a big blow-up a month or so ago about how unlimited diaries would affect the site. Some were concerned (and convinced me, among others) that unlimited diaries would turn the Recent List -- no longer on the front page -- into a race track, with diaries shooting down the page so quickly that good ones would be lost, and that this combined with the tendency of readers to "fish only from their own stream" would undermine the sense of "sitewide" community that makes Dkos more than just a digest of stories. This is what led to the innovation of putting a truncated version of the Recent List -- the ReceRec list -- onto the front page, where TUs would filter the chaff out of the full/raw Diaries List and leave us with something that would still be "workable" beyond the front page & Rec List.
The main question this raised was: well, would this work? Consider that the ReceRec list can fail either by being too exclusive or too inclusive. If TUs don't visit the Diaries page or otherwise pull diaries (especially from those without much of a following) from their streams, much more good work gets lost than we'd like; we'd lose much of the exposure to the unexpected that has made DKos so great. (The angels of the Rescue Squad could save a few through their Community Spotlight, but only a relative few.) But if we're too inclusive, then diaries could start to race down the ReceRec list almost as fast as they do the Diaries list, making it like drinking from a firehose.
So far, it looks like we're being quite inclusive -- but it's not yet a problem because the "load" of new diaries is not yet that large. There's reason to be optimistic that we will be able to find the "sweet spot" between either extreme -- if there is one! -- but it's useful to keep track of our progress to see how things will have changed over time.
"Great! Gimme the data!"
I already did. It's here (download) and here (browser), with more stuff here. But you're going to want to know (a) that samples are now taken once per hour and (b) how the data is generated.
Diaries List Gauge (DL)
"How long does it take to get to the bottom of the Diaries List?"
Definition The Diaries List Gauge is a measurement of how long a diary stays on the first page of the Diaries page. DL(n) is used to denote one of two measurements: at any given moment, the time at which the nth diary on the list was published on DailyKos, or the number of minutes since the nth diary was published.
Example If I tell you that at 3 pm EST on Sunday DL(100) was 600, you'll be able to calculate that the 100th diary was published 600 minutes (10 hours) earlier, at 5 am. In other words, a hundred diaries were published between 5 am and 3 pm.
We default to DL(100) because the Diaries page lists the last 100 entries, so the 101st is bumped to the second page.
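As a rough sketch of the definition above, the snippet below computes both forms of DL(n) from a list of publication times. The function name `gauge`, the newest-first ordering, and the sample data are assumptions made for illustration; this is not the code behind the published CSV files.

```python
from datetime import datetime, timedelta

def gauge(published, sample_time, n):
    """Return (long form, short form) for the n-th entry on a list.

    `published` is assumed to hold publication times in list order,
    newest first, as they would appear when reading from the top.
    """
    nth = published[n - 1]  # long form: when the n-th diary was published
    minutes = (sample_time - nth).total_seconds() / 60  # short form: minutes since then
    return nth, minutes

# Hypothetical sample: one diary every 6 minutes, sampled on a Sunday at 3 pm.
sample_time = datetime(2011, 2, 20, 15, 0)
published = [sample_time - timedelta(minutes=6 * i) for i in range(1, 101)]

long_form, short_form = gauge(published, sample_time, 100)
print(long_form, short_form)
```

With this made-up data the 100th diary dates from 5 am, so the short form comes out to 600 minutes, matching the example above. The same function would apply to the Recently Rec'd list by passing that list's publication times instead.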
Recently Recommended Gauge (RR)
"How long does it take to get to the bottom of the Recently Recommended List?"
Definition The Recently Recommended Gauge is a measurement of how long a diary stays on the front page's Recently Rec'd list. RR(n) is used to denote one of two measurements: at any given moment, the time at which the nth diary on the Recently Rec'd list was published, or the number of minutes since the nth diary was published. RR(n) is reported for two default values of n: RR(50) is the last diary shown on the front page, and RR(100) is the 100th diary, which is useful for comparisons to DL(100).
Timestamp
"What time zone is this? What about the 27th diary?"
For the time being, all timestamps are EST. Data will be most reliable beginning on Feb 21 2011, as errors were probably made with manual entry and conversion of samples before that date.
To convert the long form (date format) of any metric to the short form (minutes), subtract the metric from the Timestamp, which gives a difference in days, and multiply by 1440 to convert days into minutes. (Try importing the CSV into GoogleDocs like this.)
Example
Timestamp = Feb 21 2011 12:01:23 PM
RR(100long) = Feb 20 2011 08:25 PM
[ Timestamp - RR(100long) ] * 1440 = RR(100short)
RR(100short) = 936 (dropping the fractional minute)
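For anyone working outside a spreadsheet, the same conversion can be sketched in Python. The format strings and variable names here are assumptions based on how the dates are written in this example; the actual CSV columns may be formatted differently.

```python
from datetime import datetime, timedelta

# Assumed date layouts, copied from the example above.
FMT_TIMESTAMP = "%b %d %Y %I:%M:%S %p"
FMT_METRIC = "%b %d %Y %I:%M %p"

timestamp = datetime.strptime("Feb 21 2011 12:01:23 PM", FMT_TIMESTAMP)
rr100_long = datetime.strptime("Feb 20 2011 08:25 PM", FMT_METRIC)

# A spreadsheet stores the difference of two dates as a number of days,
# so divide by one day first, then multiply by 1440 minutes per day.
days = (timestamp - rr100_long) / timedelta(days=1)
rr100_short = days * 1440

print(int(rr100_short))  # 936
```

The difference is 15 hours, 36 minutes and 23 seconds, so the short form is a little over 936 minutes before truncation.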
Details of the diaries for DL(100), RR(50) and RR(100) are listed here in files named by date and time: (mmddyy-hhmm).txt. Furthermore, lists of all DL(0-100), RR(0-50) and RR(51-100) diaries are also available, just in case they might be interesting (error: urls must be corrected to point to dailykos.com/story...).
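If you want to line the detail files up with the hourly samples, the file name can be turned back into a sample time. This is only a sketch: the example name `022111-1201.txt`, the absence of parentheses in the real names, and the 24-hour clock are all assumptions about the naming scheme described above.

```python
from datetime import datetime
from pathlib import Path

def sample_time_from_name(filename):
    """Parse a detail file name of the assumed form mmddyy-hhmm.txt."""
    stem = Path(filename).stem  # drop the .txt extension
    return datetime.strptime(stem, "%m%d%y-%H%M")  # assumes a 24-hour hhmm

# Hypothetical file name for a sample taken Feb 21 2011 at 12:01.
parsed = sample_time_from_name("022111-1201.txt")
print(parsed)
```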
Thanks for reading
If you visualize this data or have ideas for more metrics that we might gather, please let me know.