
Searching Daily Kos

Quick Search Tips

Search: enter words you want to find in the target collection. By default only targets containing all the words will be returned. Capitalization is ignored. If you want to find several words in a precise order (a phrase) put double quotes around them as in "this is a phrase". Single quotes are not permitted, and are removed. Although "AND" is assumed by default, "OR" and "NOT" are supported, as follows.

  • "more than a box" AND "duck's breath mystery theatre"
  • "Jerome a Paris" NOT energy
  • "George W Bush" OR "Worst President Ever"

Multiple query words are by default understood to be joined by "AND". Certain characters have to be "escaped" by placing a backslash in front of them: to find "/", use "\/"; to find "=", use "\=". Complex queries can be constructed using parentheses and restricted queries.
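As a minimal sketch of the escaping rule just described, the helper below puts a backslash in front of "/" and "=". The function name is made up for illustration, and only the two characters named above are handled; whether other characters need the same treatment is not stated here.

```python
def escape_query_term(term: str) -> str:
    """Prefix "/" and "=" with a backslash so they are searched literally."""
    # Only the two characters named in the text are escaped. Treating this
    # as the complete list is an assumption of this sketch.
    return "".join("\\" + ch if ch in "/=" else ch for ch in term)


print(escape_query_term("a/b=c"))
```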

Restricted queries: limiting the parts of the document to be searched. The available options are described here.

For both stories and comments (documents):

  • By sid; story id. Add sid=2006 to your query to find documents whose story id contains the word 2006, e.g. 2006/2/2/13492/92435. All stories and comments have an sid; a comment's sid is the sid of the story it is attached to, followed by the comment id (cid).
  • By author; add author=("Jerome a Paris" or Darksyde) to your query to find documents by Jerome a Paris or Darksyde
  • By site; add site=cnn to your query to find documents containing a link containing the word cnn. Don't use a bare URL; put the URL in double quotes, or use the form site=(http this that com).
  • By images; add images=cnn to your query to find documents containing references to an image whose link contains the word cnn.

For stories only:

  • By title; add title=meta to your query to find documents containing the word meta in the title.
  • By tag; add tag=metajesus to your query to find documents whose tag list contains the word metajesus

For comments only:

  • By subject; add subject=meta to your query to find comments containing the word meta in the subject.
  • By cid; add 'NOT cid=1' to your query to avoid most tip jars.
  • By ntroll; add 'NOT ntroll=0' to your query to find troll rated comments
  • By nrec; add 'NOT nrec=0' to your query to find only recommended comments
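The restrictions listed above all share one shape, a field name followed by "=" and a word, a quoted phrase, or a parenthesized group. The sketch below assembles such clauses into a query string. The helper names are hypothetical and not part of the site; it assumes that values containing spaces should be double quoted and that several values for one field are joined with OR.

```python
def restrict(field: str, *values: str) -> str:
    """Build a clause such as author=("Jerome a Paris" OR Darksyde)."""
    # Quote any value with embedded spaces so it is treated as a phrase.
    quoted = ['"' + v + '"' if " " in v else v for v in values]
    if len(quoted) == 1:
        return field + "=" + quoted[0]
    return field + "=(" + " OR ".join(quoted) + ")"


def build_query(*clauses: str) -> str:
    """Join clauses with spaces; AND is assumed between them by default."""
    return " ".join(clauses)


query = build_query(
    "Alito",
    restrict("author", "Jerome a Paris", "Darksyde"),
    "NOT cid=1",  # comments only: skip most tip jars
)
print(query)
```

The resulting string is meant to be pasted into the Search box with "Comments" selected in the Find menu, since cid applies to comments only.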

Find: In order to choose what kind of documents to find, use the pull down menu to select one of the following. Each option is identified as either a database search, meaning that the primary database is searched using Scoop directly, or index search, meaning that a specially constructed index of words contained in documents is searched using swish-e. There are three different indices, one for diaries (aka user diaries), one for stories (aka front page stories), and one for comments. User diaries that are elevated to the front page are found in both story and diary indices.

  • Authors - database search
  • Stories and Diaries - index search of story index and diary index (default)
  • Comments - index search of comment index
  • Comments by - database search
  • Diaries - index search of diary index
  • Diaries by - database search
  • Polls - database search
  • Stories - index search of story index
  • Users - database search

From: and To: pull down menus restrict the time interval of the search. The default interval is from one day ago to most recent. Most recent may be as much as 5 minutes out of date.

Sort by: The order of presentation of results. Not all choices are relevant to all searches.

Story and Diary Search sort choices.

  • Relevance - Determined by the swish-e sort engine. Depends on how close to the beginning of the document the query words are found, or whether they are in the title.
  • Author - The author of the comment.
  • Time - The date and time the diary or story was posted.
  • Impact - A number combining both recommends and comments.
  • Recommend - The number of recommendations given to a user diary.
  • Comments - The number of comments associated with a front page story or user diary.
  • Title - The title of the story or the subject of a comment.

Comment search sort choices shared with stories and diaries are "Relevance", "Title", and "Time", which are as described above. Those specific to comments are "Cid", "Nrec", and "Ntroll".

  • cid - Comment number. Tip Jars are often cid=1.
  • Recommendations - The number of recommendations ("4"s) awarded. Also known as nrec.
  • Troll Ratings - The number of troll ratings ("0"s) awarded. Also known as ntroll.

More example queries

  • Alito
  • Alito NOT Armando
  • maryscott AND (Alito not Armando)
  • "Bob Johnson" AND dog NOT jotter
  • author=Conyers
  • Jotter NOT "recommended diaries"
  • tag=(meta OR metadiary) AND author=jerome
  • sid=2006

For comments only. Everything past the "#" is a comment, not part of the query!

  • sid=2006 cid=1 NOT ntroll=0 # find troll rated tip jars
  • sid=2006 cid=1 nrec=0 # find tip jars that need some love

Searching Daily Kos with election tags (etags)

You can now search using election tags as follows.


Etags with embedded spaces should be quoted.

etag="CA-LT. GOV"

You can only use one etag per search.

Good election tags have a two letter postal state abbreviation, separated by a hyphen from either a house district number or a statewide office abbreviation (SEN, GOV, Lt.GOV, SOS, etc).

There is one exception, etag="2006 ELECTION" also works.
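The etag rules above can be sketched as a small check plus a formatter. This is an illustration only: the regular expression is my own reading of "state abbreviation, hyphen, district number or office abbreviation", the function names are hypothetical, and the site's real list of known etags is not reproduced here.

```python
import re

# Assumed shape: two letters, a hyphen, then either digits (house district)
# or an office abbreviation made of letters, dots, and spaces.
_ETAG_SHAPE = re.compile(r"^[A-Za-z]{2}-(\d+|[A-Za-z][A-Za-z. ]*)$")


def looks_like_etag(tag: str) -> bool:
    """Rough shape check; it does not know which etags actually exist."""
    return tag.upper() == "2006 ELECTION" or bool(_ETAG_SHAPE.match(tag))


def etag_clause(tag: str) -> str:
    """Format the search clause, quoting etags with embedded spaces."""
    if " " in tag:
        return 'etag="' + tag + '"'
    return "etag=" + tag


print(etag_clause("CA-LT. GOV"))
```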

Good etags are ones we already know about by finding them in the tag list of an existing diary or by appearing in one of the election roundup diaries.

The etag is used to build a more complicated query that looks for the tag in tags, title, or body text (stories/diaries) or in subject and comment text (comments). Certain meta diaries are excluded, including open threads, jotter's daily lists, sidinny's election roundup, and the top comments diaries.

Extended Documentation

Software used. The search facility currently installed is based on the swish-e search engine: it is fast (compiled C code), open source, and flexible (you can feed it documents out of a database using your own custom preprocessor). It is intended for document collections numbering a million or fewer, which will do for Daily Kos with under half a million documents (excluding comments). In addition, swish-e offers many of the most desired features, including search of phrases and search within configurable document areas (title, author, tags, URLs).

What is indexed. All Daily Kos documents from the beginning to within 5 minutes of the present moment are indexed. Older documents are stored in static yearly indices while more recent documents are indexed and reindexed, frequently at first, then less frequently as they age, until they roll over into a growing, static, permanent yearly index. The right mix of number of index files and frequency of update to obtain the optimal combination of ease of maintenance and speed of search may require further experimentation. At present three different series of indices are maintained, corresponding to (in order of increasing total number of documents) front page stories, user diaries, and comments. For 2005 the number of stories, diaries, and comments was 5097, 88943, and 693224, respectively, containing 56993, 371578, and 2945916 distinct words.

Design Considerations. The utility of search to a community of users is delivered not simply by the results of the text search, but most critically by the ordering of the results such that highest "value" hits are displayed first. (Most search researchers know that no one looks beyond about the first handful of results; almost no one looks beyond the first page.) The key to arranging for best to come first is to understand, have access to, and use the meta data associated with the text documents searched.

Meta Data. At Daily Kos, we have a number of sorts of meta data that are relevant to documents. First among these is the time of posting. What is the latest? When did they say that? Who was first? That sort of thing. So the one thing search has to get right is time, in the sense of being up to date, that is including the latest diaries, as well as in the sense of permitting searches to be organized by reference to the present time.

Making it up to date. Because recommendations are only open for 24 hours after a diary is published, it is important that search indices be refreshed sufficiently often during that interval to provide reasonably up to date information on total recommendations and comments. Currently, diaries up to 6 hours old are refreshed every 5 minutes, and diaries 6 to 24 hours old are refreshed every 15 minutes.
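The refresh schedule just described amounts to a lookup by diary age. A minimal sketch follows; the function name is hypothetical, the handling of a diary that is exactly 6 hours old is an assumption, and None is returned for diaries older than 24 hours because their schedule is not given above.

```python
from datetime import timedelta
from typing import Optional


def refresh_interval(age: timedelta) -> Optional[timedelta]:
    """Return how often a diary of the given age is reindexed."""
    if age < timedelta(hours=6):
        return timedelta(minutes=5)   # first 6 hours: every 5 minutes
    if age < timedelta(hours=24):
        return timedelta(minutes=15)  # 6 to 24 hours: every 15 minutes
    return None                       # older: schedule not stated in the text


print(refresh_interval(timedelta(hours=2)))
```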

More Meta Data. Other important meta data associated with stories and diaries includes comments and recommendations.

The number of recommendations offers insight, imperfect, but useful, into the degree of approval a diary has received from the Daily Kos community.

The number of comments indicates the size of the discussion engendered. The two numbers together give an interesting insight into a diary's history at Daily Kos.

Some user diaries are promoted to the front page, and some diarists ("front pagers") are allowed to post directly to the front page. The latter are not open for recommendation. In order to harmonize listings from searches which include both front page stories and user diaries, a plausible value for number of recommendations (83) is assigned to front page stories, but in lists, this number is indicated by "*" to avoid placing too much emphasis on this made up number.
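The harmonizing rule in the paragraph above can be sketched as two small functions, one giving the number used for ordering and one giving what is shown in lists. The names are hypothetical, and the sketch does not try to say how a user diary promoted to the front page is treated, since that is not spelled out here.

```python
from typing import Optional

# Made-up value the text says is assigned to front page stories.
FRONT_PAGE_RECOMMENDS = 83


def sort_value(is_front_page_story: bool, recommends: Optional[int]) -> int:
    """Number used when ordering a mixed list by recommendations."""
    if is_front_page_story:
        return FRONT_PAGE_RECOMMENDS
    return recommends or 0


def display_value(is_front_page_story: bool, recommends: Optional[int]) -> str:
    """What is shown in result lists: "*" hides the made-up number."""
    return "*" if is_front_page_story else str(recommends or 0)


print(sort_value(True, None), display_value(True, None))
```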

The search query. Search queries can contain individual words, all of which must be found in a text document to constitute a hit. Queries can also contain phrases, indicated by double quotes around the phrase. All words are converted to lower case while searching, so letter case is of no consequence. In addition, queries can include boolean terms, such as "and", "or", and "not", which can be used to construct more complex queries and perhaps more informative results.

By default "and" is assumed between individual words in a query. Finally, queries or parts of queries can be restricted to only search within parts of a document or document metadata. The parts available are shown above, along with examples.

Certain characters have special meaning to the search engine (open and close paren, double quote, equals sign). Only a restricted list of characters is expected to occur in a word.

For the indices here at Daily Kos, the word characters are as follows.

  • The numerals 0-9
  • The "standard" alphabet letters a-z (remember all letters are mapped to lower case)
  • The "extra" alphabet characters ªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

All other characters (non word, non special) are silently replaced by spaces while searching. For instance searching with "Jerome a Paris", "Jerome.a.Paris", or "Jerome,a,Paris" all give the same result.
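To see why those three spellings match the same documents, here is a rough sketch of the replacement rule applied to the inside of a phrase. It is not the engine's own code; the function name is hypothetical, and Python's lower() also folds the accented capitals, which may or may not match what the index does.

```python
# Word characters as listed above: digits, a-z, and the "extra" letters.
EXTRA = (
    "ªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞß"
    "àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
)
WORD_CHARS = set("0123456789abcdefghijklmnopqrstuvwxyz") | set(EXTRA)


def phrase_words(text: str) -> list:
    """Lower-case the text, replace non-word characters with spaces, split."""
    cleaned = "".join(ch if ch in WORD_CHARS else " " for ch in text.lower())
    return cleaned.split()


for spelling in ("Jerome a Paris", "Jerome.a.Paris", "Jerome,a,Paris"):
    print(phrase_words(spelling))
```

All three calls yield the same list of words, which is why the three phrase searches are equivalent.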

This is important in certain cases, for example searching for the url is accomplished with the query site=("").

The reasonable looking but incorrect query doesn't work as expected because in the absence of the double quotes and parentheses, replacement of "." with " " and the default boolean AND understood between search words makes it equivalent to site=riverbendblog AND blogspot AND com, which returns as hits only documents with riverbendblog in a url, and blogspot and com in the body text.
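The rewriting described above can be mimicked in a few lines. This sketch handles only bare words and the equals sign (no quotes, parentheses, or explicit boolean operators), and the function name is hypothetical; it is meant to show the effect, not how swish-e parses queries.

```python
import re


def expand_unquoted(query: str) -> str:
    """Show how an unquoted query is effectively read by the engine."""
    # Anything that is not a digit, a-z, or "=" becomes a space ...
    cleaned = re.sub(r"[^0-9a-z=]+", " ", query.lower())
    # ... and the default AND is understood between the resulting words.
    return " AND ".join(cleaned.split())


print(expand_unquoted("site=riverbendblog.blogspot.com"))
```

Only the first word stays attached to site=; the other two become ordinary body text words, which is the behavior the paragraph above warns about.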

More on Search

If you are intrigued by search and would like to know more about the basics of how it is done, this is a good introduction.

On Search. Basic basics. by Tim Bray.

Since I've emphasized the importance of meta data, I'll also point to another article from the same series, On Search. SearchMeta, with some further thoughts on things meta.

While I'm on the subject of underlying technologies, let me end this section by pointing to an article by Tim Bray on Ajax, the other new addition here at Daily Kos. In it, he also gives credit to Jesse James Garrett for the Ajax acronym in the process of thoroughly describing it.

Final Thoughts
Daily Kos is an amazing place, community, phenomenon. It has been my privilege to have the opportunity to contribute.

-- jotter
