Took a small break there... today's tag data follows. There are more new tags, of course, but the tag cleanup is still going on, so the total tag count isn't growing as fast as before.

I'm putting together a proposal for a better tagging scheme. That scheme requires a team of Tag Librarians. If you'd like to join that effort, please add your name to the Tag Librarians page at the dKosopedia.

Note: this account (dKosopedia) will be used for posting dKosopedia items, such as the daily tag runs. If you need to reach me (The Centerfielder), email me at centerfielder atsign centerfieldview dot com. My follow-up comments will be posted as The Centerfielder. Sorry for any confusion caused by my earlier inconsistency.

Tag list | File | Count
all tags | tags-all.txt.zip | 47967
new tags (since 20061001) | tags-new.txt.zip | 221
tags starting with whitespace | tags-whitespace.txt.zip | 2
tags of 5 or more words | tags-multiword-5.txt.zip | 1152
tags containing a period | tags-period.txt.zip | 1432
tags containing a semicolon | tags-semicolon.txt.zip | 65
tags used once | tags-used-once.txt.zip | 29573
tags from 20061001 not found now | tags-old-gone.txt.zip | 136
soundex codes mapped by a single tag | tags-single.txt.zip | 842
soundex codes starting with A | tags-soundex-a.txt.zip | 1948
soundex codes starting with B | tags-soundex-b.txt.zip | 2413
soundex codes starting with C | tags-soundex-c.txt.zip | 3682
soundex codes starting with D | tags-soundex-d.txt.zip | 2611
soundex codes starting with E | tags-soundex-e.txt.zip | 1725
soundex codes starting with F | tags-soundex-f.txt.zip | 2099
soundex codes starting with G | tags-soundex-g.txt.zip | 1852
soundex codes starting with H | tags-soundex-h.txt.zip | 1849
soundex codes starting with I | tags-soundex-i.txt.zip | 1864
soundex codes starting with J | tags-soundex-j.txt.zip | 1708
soundex codes starting with K | tags-soundex-k.txt.zip | 827
soundex codes starting with L | tags-soundex-l.txt.zip | 1770
soundex codes starting with M | tags-soundex-m.txt.zip | 3361
soundex codes starting with N | tags-soundex-n.txt.zip | 1986
soundex codes starting with O | tags-soundex-o.txt.zip | 1008
soundex codes starting with P | tags-soundex-p.txt.zip | 3625
soundex codes starting with Q | tags-soundex-q.txt.zip | 78
soundex codes starting with R | tags-soundex-r.txt.zip | 2762
soundex codes starting with S | tags-soundex-s.txt.zip | 4434
soundex codes starting with T | tags-soundex-t.txt.zip | 2903
soundex codes starting with U | tags-soundex-u.txt.zip | 869
soundex codes starting with V | tags-soundex-v.txt.zip | 680
soundex codes starting with W | tags-soundex-w.txt.zip | 1621
soundex codes starting with X | tags-soundex-x.txt.zip | 11
soundex codes starting with Y | tags-soundex-y.txt.zip | 142
soundex codes starting with Z | tags-soundex-z.txt.zip | 139
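
The category lists above are simple filters over the tag list, and the soundex files bucket tags by how they sound, so near-spellings of the same name end up together. (As a consistency check, the 26 per-letter soundex counts sum to 47967, the all-tags total.) Here is a minimal sketch of how such a report could be rebuilt. It assumes the alltags page has already been scraped into a dict of tag to usage count, plus the set of tags from the 20061001 run. `soundex`, `group_by_soundex` and `tag_report` are hypothetical names, not the scripts behind this run. The soundex is standard American Soundex computed over the tag's letters only, skipping spaces, digits and punctuation; how the real run codes multi-word tags is an assumption.

```python
from collections import defaultdict

# American Soundex digit for each consonant; vowels, h, w and y have no digit.
_SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(tag: str) -> str:
    """American Soundex over the tag's ASCII letters only (assumption for multi-word tags)."""
    letters = [ch for ch in tag.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""  # assumption: a tag with no letters gets an empty code
    code = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate two letters with the same digit
            prev = digit
    return (code + "000")[:4]  # pad or truncate to letter + 3 digits


def group_by_soundex(tags) -> dict[str, list[str]]:
    """Bucket tags by soundex code, e.g. 'Foley' and 'Folley' share F400."""
    groups = defaultdict(list)
    for tag in sorted(tags):
        groups[soundex(tag)].append(tag)
    return dict(groups)


def tag_report(counts: dict[str, int], previous: set[str]) -> dict[str, list[str]]:
    """Rebuild the category lists from {tag: usage count} and the previous run's tags."""
    tags = sorted(counts)
    groups = group_by_soundex(tags)
    return {
        "whitespace": [t for t in tags if t[:1].isspace()],
        "multiword-5": [t for t in tags if len(t.split()) >= 5],
        "period": [t for t in tags if "." in t],
        "semicolon": [t for t in tags if ";" in t],
        "used-once": [t for t in tags if counts[t] == 1],
        "new": sorted(set(tags) - previous),
        "old-gone": sorted(previous - set(tags)),
        # soundex codes that only one tag maps to
        "single": sorted(code for code, members in groups.items() if len(members) == 1),
    }
```

The soundex buckets are what make typo hunting practical: two tags sharing a code are worth a human glance, while a code with a single tag has no near-spelling to compare against.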

Originally posted to dKosopedia on Mon Oct 02, 2006 at 06:32 PM PDT.

  •  What other (8+ / 0-)

    tag runs would you like to see? How about a list of the top n (10, 20, 50, 100) tags? Or "Bush" tags, or...

    Forget the myths the media's created about the White House. The truth is, these are not very bright guys, and things got out of hand. -- Deep Throat

    by The Centerfielder on Mon Oct 02, 2006 at 06:31:31 PM PDT

    •  Request: Collocations (5+ / 0-)

      First, thanks for doing this.

      I'd like to request that you analyze pairs (and possibly tuples) of tags for their frequency of co-occurrence relative to their frequency of occurring independently. There are various formulae you could use here (e.g., mutual information), but basically I'd like to see at least two lists come out of this: the most statistically improbable combinations of tags that happened to occur today, and statistically redundant tags (pairs that always occur together; that is, when one tag occurs, you can be sure the other will as well).

      (-7.75, -6.05).   Life is like this analogy...

      by shock on Mon Oct 02, 2006 at 06:42:35 PM PDT

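The co-occurrence analysis requested here needs per-diary tag sets, which the alltags page alone does not provide. Assuming such data were available as a list of tag sets (one set per diary), pointwise mutual information (PMI) is one standard way to compare how often two tags appear together with how often independence would predict: PMI(a, b) = log2(P(a, b) / (P(a) P(b))). This is a minimal sketch; `pair_pmi` and `most_surprising` are hypothetical helpers, not part of any existing tag tool.

```python
import math
from collections import Counter
from itertools import combinations


def pair_pmi(diaries: list[set[str]], min_count: int = 1) -> dict[tuple[str, str], float]:
    """PMI, log2(P(a,b) / (P(a) * P(b))), for each tag pair co-occurring >= min_count times."""
    n = len(diaries)
    tag_counts = Counter()   # number of diaries carrying each tag
    pair_counts = Counter()  # number of diaries carrying both tags of a pair
    for tags in diaries:
        tag_counts.update(tags)
        pair_counts.update(combinations(sorted(tags), 2))
    return {
        # (c/n) / ((ca/n) * (cb/n)) simplifies to c*n / (ca*cb)
        (a, b): math.log2(c * n / (tag_counts[a] * tag_counts[b]))
        for (a, b), c in pair_counts.items()
        if c >= min_count
    }


def most_surprising(pmi: dict[tuple[str, str], float], k: int = 10) -> list[tuple[str, str]]:
    """The k pairs whose co-occurrence most exceeds what independence predicts."""
    return sorted(pmi, key=pmi.get, reverse=True)[:k]
```

High-PMI pairs co-occur far more often than chance. PMI is known to inflate scores for rare tags, which is why `min_count` is exposed; whether that ranking matches "statistically improbable" in the sense intended here is a judgment call.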

      •  Is this more of a for fun sort of thing (0+ / 0-)

        or what sort of insight would this potentially provide?

        jotter's Lists of High Impact Diaries: daily and weekly archives (bring your own bendy straws)

        by sele on Mon Oct 02, 2006 at 07:11:48 PM PDT

        •  Mainly for fun, but... (1+ / 0-)
          Recommended by:
          Abou Ben Adhem

          In the redundant-tags case (common collocations), it could allow compression of the tagset (e.g., by removing the two separate tags and using only the combined phrase). In the statistically improbable case, which is perhaps more useful to me, it could help identify diaries that are interesting syntheses of unrelated topics that I might enjoy reading.

          (-7.75, -6.05).   Life is like this analogy...

          by shock on Mon Oct 02, 2006 at 07:16:40 PM PDT

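For the redundant-tags case, one simple test (again assuming per-diary tag sets, which the alltags page does not expose) is whether every diary carrying tag a also carries tag b, i.e. the conditional frequency P(b | a) equals 1. A pair that passes in both directions always appears together. This is a sketch with hypothetical helper names; `min_support` keeps a tag used only once from trivially "implying" every other tag on its single diary.

```python
from collections import Counter
from itertools import permutations


def implied_tags(diaries: list[set[str]], min_support: int = 2) -> list[tuple[str, str]]:
    """(a, b) pairs where every diary tagged a is also tagged b, i.e. P(b | a) = 1."""
    tag_counts = Counter()
    pair_counts = Counter()  # ordered pairs: (a, b) and (b, a) are counted separately
    for tags in diaries:
        tag_counts.update(tags)
        pair_counts.update(permutations(tags, 2))
    return sorted(
        (a, b)
        for (a, b), c in pair_counts.items()
        if tag_counts[a] >= min_support and c == tag_counts[a]
    )


def fully_redundant(implied: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pairs that imply each other in both directions (they always appear together)."""
    found = set(implied)
    return [(a, b) for a, b in implied if a < b and (b, a) in found]
```

Only the two-way pairs are candidates for the compression idea. A one-way implication, such as a district tag that always comes with House, reflects a hierarchy rather than a duplicate.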

          •  Not only that, (1+ / 0-)
            Recommended by:
            Abou Ben Adhem

            but the correlations can help a tool like the one I'm building normalize the tags, as well as find slightly wayward tags that just need to be corrected.

            For example, Kos has a preference for how House races are tagged: 'CA-51, House, Bilbray, Busby'. How many diaries actually get tagged that way? Should we recommend a different tagging approach based on what we see statistically, or should we use the tool to add the missing tags to CA-51 diaries that lack them? There are lots of ways this info can be useful.

            As for the tag condensing, my instinct is not to do it except when the tags are semantically redundant. CA-51 and House should be fully redundant statistically (every CA-51 diary should also carry House), and that's a good thing: if I wanted to search across all House races, I'd hate to have to enter 435 tags to do it.

            As for the interesting syntheses, my hope is also to put together a set of recommendations for searching tag intersections (e.g., 'Rep. Mark Foley' and 'Amish') so that people can explore the tag space much more effectively.

            -6.00, -7.03
            "I want my people to be the most intolerant people in the world." - Jerry Falwell

            by johnsonwax on Mon Oct 02, 2006 at 08:35:32 PM PDT

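Both ideas in this comment reduce to set operations once per-diary tags are available. A minimal sketch, assuming a hypothetical dict of diary id to tag set: `missing_companions` lists diaries tagged with a district that lack the companion tags in the preferred 'CA-51, House, Bilbray, Busby' pattern, and `search_all` answers a tag-intersection query through an inverted index (tag to diary ids). The names and data layout are illustrative, not an existing tool.

```python
from collections import defaultdict


def missing_companions(diaries: dict[str, set[str]], trigger: str,
                       expected: set[str]) -> dict[str, set[str]]:
    """For each diary tagged `trigger`, the expected companion tags it lacks (if any)."""
    return {
        diary_id: expected - tags
        for diary_id, tags in diaries.items()
        if trigger in tags and not expected <= tags
    }


def build_index(diaries: dict[str, set[str]]) -> dict[str, set[str]]:
    """Inverted index: tag -> ids of the diaries carrying it."""
    index = defaultdict(set)
    for diary_id, tags in diaries.items():
        for tag in tags:
            index[tag].add(diary_id)
    return dict(index)


def search_all(index: dict[str, set[str]], *tags: str) -> set[str]:
    """Ids of diaries carrying every one of `tags` (a tag intersection)."""
    if not tags:
        return set()
    return set.intersection(*(index.get(tag, set()) for tag in tags))
```

The inverted index keeps intersection queries cheap: each lookup touches only the diaries that carry the queried tags, not the whole archive.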

      •  Agree (0+ / 0-)

        Yes, except that I'm building these lists from the alltags page. I don't have access to the per-diary tag tuples, short of downloading each diary individually. But if someone with access to the database provided a dump of the tags table, we could ask a lot more questions.

        Forget the myths the media's created about the White House. The truth is, these are not very bright guys, and things got out of hand. -- Deep Throat

        by The Centerfielder on Tue Oct 03, 2006 at 04:31:18 AM PDT

    •  Check tags vs. dKosopedia redirects. (0+ / 0-)

      That would be a good automated way of finding and replacing typos and synonymous tags:  If the dKosopedia entry for a tag redirects to another entry, it's probably safe to replace the tag with the title of the redirected entry.

      This could be done with an external tool, or incorporated into the site's tagging mechanism. This method would automatically settle which synonym is the 'canonical' one. And it would let anyone contribute to tag cleanup without laborious manual search-and-replace work: if you see a tag with a typo, or some other tag that should be merged, just add a redirect entry to the dKosopedia, and that tag can be automatically found and changed whenever it reappears in the future.

      (-7.88, -8.97)

      by Abou Ben Adhem on Mon Oct 02, 2006 at 08:24:25 PM PDT

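A minimal sketch of the redirect idea, assuming the dKosopedia redirects have already been collected into a dict mapping each redirect page title to its target title (how to fetch them from the wiki is left open here). It follows chains of redirects and refuses to rewrite a tag caught in a redirect loop. `canonical_tag` and `cleanup_plan` are hypothetical names, and titles are compared exactly: wiki title normalization, such as first-letter capitalization, is not handled.

```python
def canonical_tag(tag: str, redirects: dict[str, str]) -> str:
    """Follow redirect -> target links until reaching a title that is not itself a redirect."""
    seen = {tag}
    current = tag
    while current in redirects:
        current = redirects[current]
        if current in seen:  # redirect loop: leave the tag unchanged
            return tag
        seen.add(current)
    return current


def cleanup_plan(tags, redirects: dict[str, str]) -> dict[str, str]:
    """Map each tag that should be rewritten to its canonical replacement."""
    plan = {}
    for tag in tags:
        target = canonical_tag(tag, redirects)
        if target != tag:
            plan[tag] = target
    return plan
```

Because the plan is recomputed from the wiki each time, a redirect added once keeps catching the same typo whenever it reappears.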

    •  How about last-name-only tags? (0+ / 0-)

      Until a couple of hours ago, when I lost TU, I'd been updating "Foley" tags to add "Mark Foley", and I noticed several "Clinton" tags (some Hillary, some Bill) and "Edwards" tags (some John, some Donna)... get my drift?

      I also noted on the dKosopedia Tag Cleanup page that it would help to add something to the tagging "hints" recommending first and last names, to avoid this kind of search problem.

      Traitor n.: 1. One who places party above people and our laws.

      by musicsleuth on Tue Oct 03, 2006 at 10:50:08 AM PDT

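Last-name-only tags can be flagged mechanically by matching each single-word tag against the final word of the multi-word tags. This is a rough heuristic sketch with a hypothetical function name: it only finds candidates, and a person still has to decide which full name a given diary meant.

```python
from collections import defaultdict


def surname_only_tags(tags) -> dict[str, list[str]]:
    """Map each single-word tag to the multi-word tags whose last word matches it (case-insensitive)."""
    full_names = defaultdict(set)
    for tag in tags:
        words = tag.split()
        if len(words) >= 2:
            full_names[words[-1].lower()].add(tag)
    return {
        tag: sorted(full_names[tag.strip().lower()])
        for tag in tags
        if len(tag.split()) == 1 and tag.strip().lower() in full_names
    }
```

For example, given the tags Clinton, Hillary Clinton, Bill Clinton, Edwards and John Edwards, "Clinton" maps to two candidates and "Edwards" to one. A single candidate does not mean the tag is unambiguous (the other Edwards may simply have no full-name tag yet), and any single-word tag that happens to end a phrase, such as "war" in "Iraq war", will also be flagged, so the output needs review by hand.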

  •  I'm grinding away at building a tool to (7+ / 0-)

    streamline this. Steady progress, though a bit slower than I had hoped.

    I emailed Kos that I was interested in providing some solutions, but that I wanted to take a few days to work out some tests to make sure this would actually work. I'm guessing I'll have a concrete proposal for him to chew on by this weekend.

    -6.00, -7.03
    "I want my people to be the most intolerant people in the world." - Jerry Falwell

    by johnsonwax on Mon Oct 02, 2006 at 06:40:19 PM PDT

  •  I love the people who put in the systemic work (2+ / 0-)
    Recommended by:
    jonah in nyc, Buffy Orpington

    to make a whatever run smoothly.

    I'm not smart enough to figure out the how, but after parts of my world calm (hopefully, soonly) I'd like to be a librarian.

    Many thanks for this project.  And for dkosopedia.

    jotter's Lists of High Impact Diaries: daily and weekly archives (bring your own bendy straws)

    by sele on Mon Oct 02, 2006 at 07:08:49 PM PDT
