Yesterday, I was at work tending some equipment, but in my spare time I started going through the tag cloud in an effort to clean it up. After spending 4 or 5 hours on the task, on and off, I made it as far as the beginning of 'D', and would like to report some of my findings. I found a number of problems, some of which appear to be in the underlying code, and some of which are due to user error and/or laziness.
Firstly, the list of tags is woefully incomplete. There are many, many tags which point to diaries but don't show up on the master list. The only way to find these tags is either to go to a diary which uses them or to click on a tag and see the unlisted tags in the 'Similar Tags' sidebar. In addition, for tags that do appear on the master list, the number of diaries using a given tag is almost always inaccurate; usually the number on the master list significantly understates the number of diaries that appear when you click on the tag.
Secondly, names. The prevailing method for tagging people's names is to use the most common form of their first name and their last name. This came up in a diary in late November. Some examples at the time:
Bush: 6
George W. Bush: 1613
Cheney: 3
Dick Cheney: 633
Richard Cheney: 0
Libby: 0
Irving Libby: 0
Scooter Libby: 646
Kerry: 1
John Kerry: 95
John F Kerry: 0
Dean: 6
Howard Dean: 93
In the section of the alphabet I've gone through, I've added first names to as many of the last-name-only tags as I could find. I confess to skipping 'Alito'; going through 150 diaries to insert 'Samuel' in each was a bit much. I am sick of pasting 'George W.' into 80-odd diaries, though.
This sort of thing sounds anal, but there were 20 'Clinton' tags in the database, and they were split pretty evenly between Bill and Hillary. And for God's sake, if you aren't sure of how someone's name is spelled, look it up. Google is good for that sort of thing.
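For anyone who would rather script this than paste by hand, here is a rough sketch of the naming convention in Python. The mapping is my own guess built from the examples above, not anything in the site's code, and the names `CANONICAL`, `AMBIGUOUS`, and `canonicalize` are made up for illustration. The point it shows is that a tag like 'Clinton' can't be fixed automatically and has to be set aside for a human to look at.

```python
# Hypothetical mapping from variant tags to the preferred "first name + last name" form.
# Built from the examples in this diary; a real list would be much longer.
CANONICAL = {
    "Bush": "George W. Bush",
    "Cheney": "Dick Cheney",
    "Richard Cheney": "Dick Cheney",
    "Libby": "Scooter Libby",
    "Irving Libby": "Scooter Libby",
    "Kerry": "John Kerry",
    "John F Kerry": "John Kerry",
    "Dean": "Howard Dean",
}

# Last names that could mean more than one person; these need a human to read the diary.
AMBIGUOUS = {"Clinton"}


def canonicalize(tag):
    """Return (tag_to_use, needs_review) for a single tag."""
    if tag in AMBIGUOUS:
        return tag, True
    return CANONICAL.get(tag, tag), False
```

With this, `canonicalize("Cheney")` gives back 'Dick Cheney' with no review needed, while `canonicalize("Clinton")` leaves the tag alone and flags it.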
Thirdly, typos. A very common mistake was to forget the comma between two tags, resulting in a new tag like 'CT-Sen Joe Lieberman'. I fixed all of those that I could find. Other typo-type mistakes are putting a space at the beginning of a tag, or a period at the end.
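Most of these could probably be caught when the tags are saved. Here is a minimal Python sketch of what I mean. I have no idea how the site actually stores or parses tags, so the function names and the comma-separated input are assumptions. The trailing-period rule is a crude heuristic that leaves initials and abbreviations like 'W.' or 'U.S.' alone, and the missing-comma check only works when both halves are already known tags.

```python
def clean_tag(tag):
    """Collapse extra whitespace and drop a stray trailing period."""
    tag = " ".join(tag.split())
    last_word = tag.split(" ")[-1] if tag else ""
    # Heuristic: keep the period on initials ("W.") and dotted abbreviations ("U.S.").
    if last_word.endswith(".") and "." not in last_word[:-1] and len(last_word) > 2:
        tag = tag[:-1]
    return tag


def split_tags(raw):
    """Split a comma-separated tag field into cleaned, non-empty tags."""
    cleaned = (clean_tag(part) for part in raw.split(","))
    return [tag for tag in cleaned if tag]


def split_run_together(tag, known_tags):
    """If a tag is really two known tags missing a comma, return both; else None."""
    words = tag.split(" ")
    for i in range(1, len(words)):
        left = " ".join(words[:i])
        right = " ".join(words[i:])
        if left in known_tags and right in known_tags:
            return [left, right]
    return None
```

For example, `split_run_together("CT-Sen Joe Lieberman", {"CT-Sen", "Joe Lieberman"})` splits the tag back into its two halves, while a legitimate tag like 'Joe Lieberman' comes back as `None` and is left alone.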
Fourth, a request. When writing a diary that is primarily in response to an article or story from the conventional media, please include a 'New York Times' or 'Washington Post' or whatever tag. If nothing else, that will allow people to quickly check whether a particular media story has been diaried. I didn't do this on my first pass, because it would have required too much time (I took the tags that the author and readers chose, and just altered them for consistency, and didn't add new ones).
Finally, some of the oddities I found:
- There are very many ways to misspell 'Barack Obama'
- There are 2.5 times as many tags for 'Bill O'Reilly' as there are for 'Bill of Rights'
- The tag bestdiaryever points to 4 different diaries
- Most tags in a single diary: 80
- Worst case of comma-forgetfulness: 'Brian Barry Multiculturalism Political Theory America John Kerry'
- The tag tecleo appears 45 times; can somebody please tell me what it means?
- Most amusing tag: 'B-grade Diarists Who'll Never Make the Front Page'.
- Runner up: 'Agent Mulder is after me'
- Ambiguity of the day: There appears to be some confusion as to whether Kos's book is Crashing the Gate or Crashing the Gate*s*. Even kos has used both forms.
I have to say that it was an interesting exercise. Even just skimming the diaries, I saw a cross-section of dKos that I wasn't used to. Front-page stories, recommended diaries, 0-comment diaries, all were on an equal footing.
-dms