DK4 is a marvel: so many new ways to find new and interesting material. I'm casting about for new ways to add value. One of the things that didn't seem to make the cut for DK4 is tag visualizations. You probably remember them: more popular tags are bigger, presented as a jumble of words. I don't really miss them, but there could still be value in there somewhere.
Drew Conway recently (yesterday) published a method that adds that value. I've modified his published code to apply to Daily Kos (thanks Drew - open source rocks). The results are shown below the doodle.
Drew's title was "Visualizing the Language Used by Academics when Protected by Anonymity", and he used posts on a political science blog. So I guess I should really have titled my diary "Visualizing the Language Used by Kossacks", but that seems kind of dry.
Paraphrasing Drew's explanation: in this method, a number of diaries are downloaded and each word in each diary is counted. Then, taking the words two at a time, the number of times each word pair was used in the same diary is counted. Isn't counting great? The higher the pair count, the closer the affiliation of the pair of words. To make things cleaner, only the top 25% of word pairs by count are used.
In order to show the words meaningfully, it is necessary to display them graphically. This is achieved by considering the words as nodes and the word pairs as edges in a network, with the edges weighted by the pair count. Once that is done, standard methods for displaying graphs can be used to place strongly linked words near each other.
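For the curious, the counting step can be sketched in a few lines of Python. This is not Drew's code or my modified version of it, just a minimal illustration under some assumptions of my own: the function names `count_pairs` and `top_quarter` are made up, words are split on whitespace, a pair is counted once per diary no matter how often the words repeat, and "top 25%" is read as the highest-count quarter of all distinct pairs. The three tiny "diaries" are invented for the example.

```python
from collections import Counter
from itertools import combinations

def count_pairs(diaries):
    """Count every word, and for each word pair count the diaries that contain both."""
    word_counts = Counter()
    pair_counts = Counter()
    for text in diaries:
        words = text.lower().split()
        word_counts.update(words)
        # Sorting makes ("union", "wisconsin") and ("wisconsin", "union") the same key;
        # set() means a pair is counted at most once per diary (an assumption).
        for pair in combinations(sorted(set(words)), 2):
            pair_counts[pair] += 1
    return word_counts, pair_counts

def top_quarter(pair_counts):
    """Keep the highest-count 25% of the distinct pairs (always at least one)."""
    keep = max(1, len(pair_counts) // 4)
    return dict(pair_counts.most_common(keep))

# Invented sample data, standing in for downloaded diaries.
diaries = [
    "wisconsin union budget",
    "wisconsin union protest",
    "libya oil budget",
]
word_counts, pair_counts = count_pairs(diaries)
strong_pairs = top_quarter(pair_counts)
```

Here "union" and "wisconsin" share two diaries, so that pair has the highest count and survives the cut. A real run would also want to drop punctuation and very common words before counting.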
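To give a feel for what a graph layout method does, here is a minimal Python sketch of a force-directed (spring) layout. I am assuming this family of methods purely for illustration; it is not necessarily the layout Drew used, and the function name `spring_layout`, its parameters, and the sample edge weights are all made up. Every pair of words pushes apart, and every edge pulls its two words together with a strength proportional to the pair count.

```python
import math
import random

def spring_layout(edges, iterations=300, seed=0):
    """Place words in 2D. `edges` maps (word_a, word_b) -> pair count."""
    rng = random.Random(seed)
    nodes = sorted({word for pair in edges for word in pair})
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # rough ideal spacing between words
    for step in range(iterations):
        move = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of words pushes apart, more strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                move[a][0] += dx / dist * push
                move[a][1] += dy / dist * push
                move[b][0] -= dx / dist * push
                move[b][1] -= dy / dist * push
        # Attraction: each edge pulls its endpoints together, scaled by pair count.
        for (a, b), weight in edges.items():
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = weight * dist * dist / k
            move[a][0] -= dx / dist * pull
            move[a][1] -= dy / dist * pull
            move[b][0] += dx / dist * pull
            move[b][1] += dy / dist * pull
        # Cap each step, shrinking the cap over time so the layout settles.
        limit = 0.1 * (1 - step / iterations)
        for n in nodes:
            length = math.hypot(move[n][0], move[n][1]) or 1e-9
            scale = min(length, limit) / length
            pos[n][0] += move[n][0] * scale
            pos[n][1] += move[n][1] * scale
    return {n: (x, y) for n, (x, y) in pos.items()}

# Invented pair counts: two tight pairs joined through "budget".
edges = {
    ("union", "wisconsin"): 5,
    ("budget", "union"): 1,
    ("budget", "libya"): 1,
    ("libya", "oil"): 5,
}
layout = spring_layout(edges)
```

With these weights, "union" and "wisconsin" should land close together, while "wisconsin" and "oil", which are only connected through a chain of other words, should end up far apart. Graph libraries provide tuned versions of this idea, so a hand-rolled loop like this one is only useful for seeing the mechanics.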
The closer words are, the more times they were used in the same diary. We can use the distance between words in the visualization to create clusters which we hope will represent current topics.
Check Drew's much more complete description for the methods used to do this. Here we just need to know that each of the resulting 8 "topics" is colored differently and the size of each word shows how often it was used overall.
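As a rough idea of how positions can be turned into colored "topics", here is a small Python sketch that groups words by where they sit on the page and then picks a color and a size for each word. I am assuming k-means clustering as the grouping method for the sake of the example; see Drew's description for what he actually does. The function `kmeans`, the coordinates, the counts, and the size formula are all invented, and the toy data uses 2 groups rather than 8.

```python
import math

def kmeans(points, k, iterations=50):
    """Group 2D points into k clusters; returns one cluster index per point."""
    # Farthest-point start: simple, deterministic, and spreads the centers out.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        # Move each center to the mean of its points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its old center
                centers[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels

# Invented layout positions and overall word counts.
layout = {
    "wisconsin": (0.0, 0.0), "union": (0.1, 0.0), "protest": (0.0, 0.1),
    "libya": (5.0, 5.0), "oil": (5.1, 5.0), "gaddafi": (5.0, 5.1),
}
word_counts = {"wisconsin": 40, "union": 25, "protest": 10,
               "libya": 30, "oil": 12, "gaddafi": 8}

words = list(layout)
labels = kmeans([layout[w] for w in words], k=2)
palette = ["red", "blue"]  # one color per cluster
style = {
    w: {"color": palette[lab], "size": 8 + word_counts[w] / 2}
    for w, lab in zip(words, labels)
}
```

In this toy data the three Wisconsin words get one color and the three Libya words the other, and "wisconsin", the most used word, gets the largest size. Which cluster gets which color is arbitrary, so the same color in two separate runs means nothing.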
The visualization above is the result of this analysis, based on the most recent 200 diaries from dailykos/diaries last night (Monday March 7, 2011) and again this morning (Tuesday March 8, 2011). In order to get the figures on the page, I had to make them too small to see, but if you open the images in a new page, you should be able to see and compare the two. One more point: the word coloring should not be taken to indicate similarity between days. The colors were assigned independently, so having the same color on 3/7 and 3/8 is not meaningful.
Comparing the two days, there has been a striking change in the most frequently used words and in the apparent topics that emerge. On both days, there is a central topic and seven "spokes" that show today's emerging interests. At least that's what I see. What do you see?