  •  What other (8+ / 0-)

    tag runs would you like to see? How about a list of the top n (10, 20, 50, 100) tags? Or "Bush" tags, or...
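
    For the top-n idea, here's a rough Python sketch of how such lists could be produced. It assumes the tags are available as one list per diary; that input format, the function names `tag_counts` and `top_n`, and the sample data are all hypothetical, not how the alltags page actually works:

```python
from collections import Counter

def tag_counts(diary_tags):
    """Count how many diaries carry each tag.

    diary_tags: an iterable of tag lists, one per diary (an assumed input
    format; the alltags page may present its data differently).
    """
    counts = Counter()
    for tags in diary_tags:
        counts.update(set(tags))  # a tag repeated on one diary counts once
    return counts

def top_n(counts, n, contains=None):
    """Return the n most common (tag, count) pairs, optionally restricted to
    tags whose name contains a substring such as "Bush" (case-insensitive)."""
    if contains is not None:
        needle = contains.lower()
        counts = Counter({t: c for t, c in counts.items() if needle in t.lower()})
    return counts.most_common(n)

# Hypothetical sample data, not real site data
diaries = [
    ["Bush", "Iraq", "George W. Bush"],
    ["Bush", "Iraq"],
    ["Mark Foley", "House"],
]
counts = tag_counts(diaries)
print(top_n(counts, 2))  # tags tied on count may come back in any order
print(top_n(counts, 10, contains="Bush"))  # [('Bush', 2), ('George W. Bush', 1)]
```

    The same call covers n = 10, 20, 50 or 100; a real list would probably want a secondary sort key so that tied tags come out in a stable order.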

    Forget the myths the media's created about the White House. The truth is, these are not very bright guys, and things got out of hand. -- Deep Throat

    by The Centerfielder on Mon Oct 02, 2006 at 06:31:31 PM PDT

    •  Request: Collocations (5+ / 0-)

      First, thanks for doing this

      I'd like to request that you analyze pairs (and possibly tuples) of tags for their frequency of co-occurrence relative to their frequency of occurring independently.  There are various formulae you could use here (e.g., mutual information), but basically I'd like to see at least two lists come out of this: the most statistically improbable combinations of tags that occurred today, and statistically redundant tags (pairs that always occur together... that is, when one tag occurs, you can be sure the other will as well).
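
      One way to produce both lists is pointwise mutual information (PMI), the per-pair version of the mutual information mentioned above, plus a plain co-occurrence check for the redundant end. A minimal sketch, assuming each diary's tags are available as a list; the function names and sample diaries are hypothetical:

```python
import math
from collections import Counter
from itertools import combinations

def count_tags(diary_tags):
    """Count diaries per tag and per unordered tag pair."""
    n, single, joint = 0, Counter(), Counter()
    for tags in diary_tags:
        tags = sorted(set(tags))  # dedupe; sorting makes (a, b) order canonical
        n += 1
        single.update(tags)
        joint.update(combinations(tags, 2))
    return n, single, joint

def pmi(n, single, joint, a, b):
    """log2(P(a,b) / (P(a) * P(b))), probabilities as fractions of diaries.
    Assumes a < b and that the pair actually occurred."""
    return math.log2(joint[(a, b)] * n / (single[a] * single[b]))

def improbable_pairs(diary_tags, top=10):
    """Co-occurring pairs ranked by PMI, highest (least expected) first."""
    n, single, joint = count_tags(diary_tags)
    scored = [(pmi(n, single, joint, a, b), a, b) for a, b in joint]
    return sorted(scored, reverse=True)[:top]

def redundant_pairs(diary_tags, min_count=2):
    """Pairs where every diary carrying either tag carries both.
    min_count drops pairs seen only once, which are trivially 'always together'."""
    n, single, joint = count_tags(diary_tags)
    return sorted((a, b) for (a, b), nab in joint.items()
                  if nab >= min_count and single[a] == nab == single[b])

# Hypothetical sample data, not real site data
diaries = [
    ["CA-51", "House", "Bilbray", "Busby"],
    ["CA-51", "House", "Busby"],
    ["House", "Iraq"],
    ["Iraq"],
    ["Mark Foley", "Amish"],
    ["House"],
]
print(improbable_pairs(diaries, top=1))  # the Amish / Mark Foley pair ranks first
print(redundant_pairs(diaries))          # [('Busby', 'CA-51')]
```

      PMI is known to favour rare tags (two tags used once each, on the same diary, score very high), so a real run would probably want a minimum count on the improbable list as well.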

      (-7.75, -6.05).   Life is like this analogy...

      by shock on Mon Oct 02, 2006 at 06:42:35 PM PDT

      •  Is this more of a for fun sort of thing (0+ / 0-)

        or what sort of insight would this potentially provide?

        jotter's Lists of High Impact Diaries: daily and weekly archives (bring your own bendy straws)

        by sele on Mon Oct 02, 2006 at 07:11:48 PM PDT

        •  Mainly for fun, but... (1+ / 0-)
          Recommended by:
          Abou Ben Adhem

          In the redundant tags case (common collocations) it could possibly allow compression of the tagset (e.g., by removing the two separate tags and only using the combined phrase), and (perhaps more usefully, to me) in the statistically improbable combination case, it could help identify diaries that are potentially interesting syntheses of unrelated topics that I might enjoy reading.

          by shock on Mon Oct 02, 2006 at 07:16:40 PM PDT

          •  Not only that, (1+ / 0-)
            Recommended by:
            Abou Ben Adhem

            but the correlations can help a tool like the one I'm building normalize the tags, as well as find slightly wayward tags that just need to be corrected.

            For example, Kos has a preference for how House races are tagged: 'CA-51, House, Bilbray, Busby'. How many diaries actually get tagged that way? Should we recommend a different tagging approach based on what we see statistically, or should we use the tool to add the missing tags to diaries on CA-51 that lack the others? There are lots of ways this info can be useful.
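
            Counting how many CA-51 diaries follow that pattern, and listing what each of the rest is missing, comes down to a set difference per diary. A sketch assuming diaries are available as a mapping from a diary id to its tags; the ids, the sample data, and the `audit_race_tags` name are made up for illustration:

```python
def audit_race_tags(diaries, trigger, pattern):
    """For diaries tagged `trigger`, count how many carry the full `pattern`
    and report which tags each of the others is missing.

    diaries: dict mapping a diary id to its list of tags (assumed format).
    """
    pattern = set(pattern)
    complete, missing = 0, {}
    for diary_id, tags in diaries.items():
        tags = set(tags)
        if trigger not in tags:
            continue  # not a diary about this race
        gap = pattern - tags
        if gap:
            missing[diary_id] = sorted(gap)
        else:
            complete += 1
    return complete, missing

# Hypothetical diaries; the tag pattern is the one quoted above
diaries = {
    101: ["CA-51", "House", "Bilbray", "Busby"],
    102: ["CA-51", "Busby"],
    103: ["CA-51", "House", "Busby"],
    104: ["Mark Foley", "House"],
}
complete, missing = audit_race_tags(
    diaries, "CA-51", ["CA-51", "House", "Bilbray", "Busby"])
print(complete, missing)  # 1 {102: ['Bilbray', 'House'], 103: ['Bilbray']}
```

            The `missing` map is exactly the list a tool would need either to report back statistically or to fill in the absent tags.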

            As for tag condensing, my instinct is not to do it except when the tags are semantically redundant. CA-51 and House should be fully redundant (every CA-51 diary is also a House diary), and that's a good thing. If I wanted to search across all House races, I'd hate to have to put in 435 tags to do it.

            As for the interesting syntheses, my hope is to also put together a set of recommendations for searching tag intersections ('Rep. Mark Foley' and 'Amish') so that people could explore the tag space a lot more effectively.
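
            Searching an intersection like that is typically done with an inverted index (tag → set of diary ids) plus a set intersection. A minimal sketch; the diary ids, sample tags, and the `build_index` / `intersect` names are hypothetical:

```python
from collections import defaultdict

def build_index(diaries):
    """Inverted index: tag -> set of ids of diaries carrying it."""
    index = defaultdict(set)
    for diary_id, tags in diaries.items():
        for tag in tags:
            index[tag].add(diary_id)
    return index

def intersect(index, *tags):
    """Ids of diaries tagged with every one of `tags` (a new set)."""
    if not tags:
        return set()
    # Intersect from the smallest posting set up, so intermediate results stay small
    postings = sorted((index.get(t, set()) for t in tags), key=len)
    return set.intersection(*postings)

# Hypothetical sample data, not real site data
diaries = {
    1: ["Rep. Mark Foley", "House"],
    2: ["Rep. Mark Foley", "Amish"],
    3: ["Amish", "Pennsylvania"],
    4: ["House"],
}
index = build_index(diaries)
print(intersect(index, "Rep. Mark Foley", "Amish"))  # {2}
```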

            -6.00, -7.03
            "I want my people to be the most intolerant people in the world." - Jerry Falwell

            by johnsonwax on Mon Oct 02, 2006 at 08:35:32 PM PDT

      •  Agree (0+ / 0-)

        Yes, except I'm creating these from the alltags page. I don't have access to the tuples, other than downloading each diary individually. But if someone with access to the db provided a dump of the tags table we could ask a lot more questions.
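
        With such a dump, the pair counts would fall out of a single self-join. A sketch using Python's built-in `sqlite3` with an in-memory table; the table name `diary_tags` and its two columns are assumptions, since the real schema isn't described here:

```python
import sqlite3

def pair_counts(rows):
    """rows: (diary_id, tag) tuples, standing in for a hypothetical tags-table dump.
    Returns (tag_a, tag_b, diaries_with_both), most frequent pair first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE diary_tags (diary_id INTEGER, tag TEXT, UNIQUE (diary_id, tag))")
    con.executemany("INSERT OR IGNORE INTO diary_tags VALUES (?, ?)", rows)  # duplicate rows are ignored
    query = """
        SELECT a.tag, b.tag, COUNT(*) AS n
        FROM diary_tags AS a
        JOIN diary_tags AS b
          ON a.diary_id = b.diary_id AND a.tag < b.tag  -- each unordered pair once
        GROUP BY a.tag, b.tag
        ORDER BY n DESC, a.tag, b.tag
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Hypothetical sample rows, not real site data
rows = [(1, "CA-51"), (1, "House"), (1, "House"),
        (2, "CA-51"), (2, "House"), (2, "Busby"), (3, "House")]
print(pair_counts(rows))
# [('CA-51', 'House', 2), ('Busby', 'CA-51', 1), ('Busby', 'House', 1)]
```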

        by The Centerfielder on Tue Oct 03, 2006 at 04:31:18 AM PDT

    •  Check tags vs. dKosopedia redirects. (0+ / 0-)

      That would be a good automated way of finding and replacing typos and synonymous tags:  If the dKosopedia entry for a tag redirects to another entry, it's probably safe to replace the tag with the title of the redirected entry.

      This could be done with an external tool, or incorporated into the site's tagging mechanism. This method would automatically settle which synonym is the 'canonical' one. And it would let anyone contribute to tag cleanup without laborious manual search-and-replace work: if you see a tag with a typo, or some other tag that should be merged, just add a redirect entry to the dKosopedia, and that tag can be found and changed automatically whenever it reappears.
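
      Resolving a tag through the redirects amounts to following a chain of title → target lookups, with a guard against redirect loops. A sketch assuming the redirects have already been pulled into a Python dict; the map contents and the `canonical_tag` / `clean_tags` names are hypothetical:

```python
# Hypothetical redirect map (page title -> redirect target); the real
# dKosopedia entries would have to be fetched or exported some other way.
REDIRECTS = {
    "Mark Folley": "Mark Foley",
    "Hillary": "Hillary Clinton",
    "Hillary Clinton": "Hillary Rodham Clinton",
    "Loop A": "Loop B",
    "Loop B": "Loop A",
}

def canonical_tag(tag, redirects, max_hops=10):
    """Follow redirects from `tag` to the end of the chain.

    On a redirect loop, or a chain of more than max_hops redirects, the tag
    is returned unchanged rather than rewritten to something arbitrary.
    """
    seen = {tag}
    current = tag
    hops = 0
    while current in redirects:
        current = redirects[current]
        hops += 1
        if current in seen or hops > max_hops:
            return tag
        seen.add(current)
    return current

def clean_tags(tags, redirects):
    """Replace each tag with its canonical form, dropping duplicates that result."""
    cleaned = []
    for tag in tags:
        canon = canonical_tag(tag, redirects)
        if canon not in cleaned:
            cleaned.append(canon)
    return cleaned

print(clean_tags(["Mark Folley", "Mark Foley", "Hillary", "House"], REDIRECTS))
# ['Mark Foley', 'Hillary Rodham Clinton', 'House']
```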

      (-7.88, -8.97)

      by Abou Ben Adhem on Mon Oct 02, 2006 at 08:24:25 PM PDT

    •  How about last-name-only tags (0+ / 0-)

      Until a couple of hours ago, when I lost TU, I'd been updating "Foley" tags to add "Mark Foley", and I noticed several "Clinton" tags (some Hillary, some Bill) and "Edwards" tags (some John, some Donna)... get my drift?

      I also noted on the dKosopedia Tag Cleanup page that it would be helpful to add something to the "hints" recommending first and last names, to avoid this kind of search problem.
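
      Last-name-only tags of this kind could be flagged automatically by matching single-word tags against the last word of multi-word tags. A rough heuristic sketch (it assumes the surname is the last whitespace-separated word, which won't always hold); the tag list and the `ambiguous_surname_tags` name are hypothetical:

```python
from collections import defaultdict

def ambiguous_surname_tags(all_tags, min_matches=2):
    """Map each single-word tag to the multi-word tags ending in that word,
    keeping only those with at least min_matches full-name candidates."""
    by_last = defaultdict(set)
    for tag in all_tags:
        words = tag.split()
        if len(words) > 1:
            by_last[words[-1].lower()].add(tag)  # heuristic: last word = surname
    result = {}
    for tag in all_tags:
        if len(tag.split()) == 1:
            full = by_last.get(tag.lower(), set())
            if len(full) >= min_matches:
                result[tag] = sorted(full)
    return result

# Hypothetical tag list, not real site data
TAGS = ["Clinton", "Bill Clinton", "Hillary Clinton", "Edwards", "John Edwards",
        "Donna Edwards", "Foley", "Mark Foley", "House"]
print(ambiguous_surname_tags(TAGS))
# {'Clinton': ['Bill Clinton', 'Hillary Clinton'], 'Edwards': ['Donna Edwards', 'John Edwards']}
```

      With `min_matches=1` the same function would also list "Foley" → "Mark Foley", i.e. last-name tags that have a single obvious full-name replacement.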

      Traitor n.: 1. One who places party above people and our laws.

      by musicsleuth on Tue Oct 03, 2006 at 10:50:08 AM PDT

