Searching Daily Kos
Quick Search Tips
Search: enter words you want to find in the target collection. By default only targets containing all the words will be returned. Capitalization is ignored. If you want to find several words in a precise order (a phrase) put double quotes around them as in "this is a phrase". Single quotes are not permitted, and are removed. Although "AND" is assumed by default, "OR" and "NOT" are supported, as follows.
Multiple query words are by default understood to be joined by "AND". Certain characters have to be "escaped" by placing a backslash in front of them: to find "/", use "\/"; to find "=", use "\=". Complex queries can be constructed using parentheses and restricted queries.
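The escaping rule can be sketched in a few lines of Python. This is only an illustration: the function name escape_term is hypothetical, and the set of escaped characters is limited to the two named above ("/" and "="), which may not be the full list the search engine requires.

```python
def escape_term(term: str) -> str:
    """Prepare a raw search term: drop single quotes, escape "/" and "="."""
    # Single quotes are not permitted in queries and are removed.
    term = term.replace("'", "")
    escaped = []
    for ch in term:
        # Assumption: only the two characters named in the text need escaping.
        if ch in "/=":
            escaped.append("\\" + ch)
        else:
            escaped.append(ch)
    return "".join(escaped)


if __name__ == "__main__":
    print(escape_term("a/b"))   # the slash gets a backslash in front of it
    print(escape_term("x=y"))   # so does the equals sign
```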
Restricted queries: limiting the parts of the document to be searched. The available options are described here.
For both stories and comments (documents):
For stories only:
For comments only:
Find: To choose what kind of documents to find, use the pull down menu to select one of the following. Each option is identified as either a database search, meaning that the primary database is searched using Scoop directly, or an index search, meaning that a specially constructed index of words contained in documents is searched using swish-e. There are three different indices: one for diaries (aka user diaries), one for stories (aka front page stories), and one for comments. User diaries that are elevated to the front page are found in both the story and diary indices.
From: and To: pull down menus restrict the time interval of the search. The default interval is from one day ago to most recent. Most recent may be as much as 5 minutes out of date.
Sort by: The order of presentation of results. Not all choices are relevant to all searches.
Story and Diary Search sort choices.
Comment search sort choices shared with stories and diaries are "Relevance", "Title", and "Time", which are as described above. Those specific to comments are "Cid", "Nrec", and "Ntroll".
More example queries
For comments only Everything past the "#" is a comment, not part of the query!
Searching Daily Kos with election tags (etags)
You can now search using election tags as follows.
Etags with embedded spaces should be quoted.
You can only use one etag per search.
Good election tags have a two-letter postal state abbreviation separated by a hyphen from either a house district number or a statewide office abbreviation (SEN, GOV, Lt.GOV, SOS, etc).
There is one exception, etag="2006 ELECTION" also works.
Good etags are ones we already know about by finding them in the tag list of an existing diary or by appearing in one of the election roundup diaries.
The etag is used to build a more complicated query that looks for the tag in tags, title, or body text (stories/diaries) or in subject and comment text (comments). Certain meta diaries are excluded, including open threads, jotter's daily lists, sidinny's election roundup, and the top comments diaries.
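The etag format rules above can be sketched as a small check, together with a helper that quotes an etag when it contains a space. This is a rough illustration only: the function names are hypothetical, the office list holds just the abbreviations named in the text (the real list is longer), and the state code is checked only for shape, not against the actual postal abbreviations.

```python
import re

# Office abbreviations named in the text; the "etc" there means this is partial.
OFFICES = {"SEN", "GOV", "LT.GOV", "SOS"}

# Two letters, a hyphen, then a district number or office abbreviation.
ETAG_SHAPE = re.compile(r"^([A-Z]{2})-(.+)$")


def looks_like_good_etag(etag: str) -> bool:
    """Return True if the etag has the shape described in the text."""
    tag = etag.strip().upper()  # capitalization is ignored by search
    if tag == "2006 ELECTION":  # the one stated exception
        return True
    match = ETAG_SHAPE.match(tag)
    if match is None:
        return False
    suffix = match.group(2)
    is_district = suffix.isascii() and suffix.isdigit()
    return is_district or suffix in OFFICES


def etag_clause(etag: str) -> str:
    """Build the etag= part of a query, quoting etags with embedded spaces."""
    if " " in etag:
        return 'etag="%s"' % etag
    return "etag=%s" % etag


if __name__ == "__main__":
    for candidate in ["CA-50", "VA-SEN", "2006 ELECTION", "California-50"]:
        print(candidate, looks_like_good_etag(candidate), etag_clause(candidate))
```

A tag that passes this check is not necessarily one the site knows about; as noted above, good etags are those already found in existing diaries.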
Software used. The search facility currently installed is based on the swish-e search engine: it is fast (compiled C code), open source, and flexible (you can feed it documents out of a database using your own custom preprocessor). It is intended for document collections numbering a million or fewer, which will do for Daily Kos with under half a million documents (excluding comments). In addition, swish-e offers many of the most desired features, including search of phrases and search within configurable document areas (title, author, tags, URLs).
What is indexed. All Daily Kos documents from the beginning to within 5 minutes of the present moment are indexed. Older documents are stored in static yearly indices while more recent documents are indexed and reindexed, frequently at first, then less frequently as they age, until they roll over into a growing, static, permanent yearly index. The right mix of number of index files and frequency of update to obtain the optimal combination of ease of maintenance and speed of search may require further experimentation. At present three different series of indices are maintained, corresponding to (in order of increasing total number of documents) front page stories, user diaries, and comments. For 2005 the number of stories, diaries, and comments was 5097, 88943, and 693224, respectively, containing 56993, 371578, and 2945916 distinct words.
Design Considerations. The utility of search to a community of users is delivered not simply by the results of the text search, but most critically by the ordering of the results such that highest "value" hits are displayed first. (Most search researchers know that no one looks beyond about the first handful of results; almost no one looks beyond the first page.) The key to arranging for best to come first is to understand, have access to, and use the meta data associated with the text documents searched.
Meta Data. At Daily Kos, we have a number of sorts of meta data that are relevant to documents. First among these is the time of posting. What is the latest? When did they say that? Who was first? That sort of thing. So the one thing search has to get right is time, in the sense of being up to date, that is including the latest diaries, as well as in the sense of permitting searches to be organized by reference to the present time.
Making it up to date. Because recommendations are only open for 24 hours after a diary is published, it is important that search indices be refreshed sufficiently often during that interval to provide reasonably up to date information on total recommendations and comments. Currently, diaries up to 6 hours old are refreshed every 5 minutes, and diaries 6 to 24 hours old are refreshed every 15 minutes.
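The refresh schedule just described can be written down as a small lookup. This is a sketch, not the indexer's actual code: the function name is hypothetical, and because the text gives no interval for diaries older than 24 hours, the function returns None for them rather than guessing.

```python
from datetime import timedelta
from typing import Optional


def refresh_interval(age: timedelta) -> Optional[timedelta]:
    """Return how often a diary of the given age is reindexed."""
    if age < timedelta(0):
        raise ValueError("age cannot be negative")
    if age <= timedelta(hours=6):
        # Diaries up to 6 hours old: refreshed every 5 minutes.
        return timedelta(minutes=5)
    if age <= timedelta(hours=24):
        # Diaries 6 to 24 hours old: refreshed every 15 minutes.
        return timedelta(minutes=15)
    # Older than 24 hours: recommendations are closed; interval not stated.
    return None


if __name__ == "__main__":
    for hours in (1, 12, 30):
        print(hours, "hours old ->", refresh_interval(timedelta(hours=hours)))
```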
More Meta Data. Other important meta data associated with stories and diaries includes comments and recommendations.
The number of recommendations offers insight, imperfect, but useful, into the degree of approval a diary has received from the Daily Kos community.
The number of comments indicates the size of the discussion engendered. The two numbers together give an interesting insight into a diary's history at Daily Kos.
Some user diaries are promoted to the front page, and some diarists ("front pagers") are allowed to post directly to the front page. The latter are not open for recommendation. In order to harmonize listings from searches which include both front page stories and user diaries, a plausible value for number of recommendations (83) is assigned to front page stories, but in lists, this number is indicated by "*" to avoid placing too much emphasis on this made up number.
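One way to picture the placeholder scheme is to keep the value used for ordering separate from the value shown in lists. The sketch below assumes a hypothetical Document record and function names; only the number 83 and the "*" marker come from the text.

```python
from dataclasses import dataclass

FRONT_PAGE_RECS = 83  # placeholder value assigned to front page stories


@dataclass
class Document:
    title: str
    recs: int            # real recommendation count (unused for front page)
    is_front_page: bool  # True for stories posted directly to the front page


def sort_recs(doc: Document) -> int:
    """Value used when ordering mixed lists of stories and diaries."""
    return FRONT_PAGE_RECS if doc.is_front_page else doc.recs


def shown_recs(doc: Document) -> str:
    """Value displayed in result lists; the made-up number is hidden."""
    return "*" if doc.is_front_page else str(doc.recs)


if __name__ == "__main__":
    docs = [
        Document("A user diary", 120, False),
        Document("A front page story", 0, True),
        Document("Another diary", 10, False),
    ]
    for doc in sorted(docs, key=sort_recs, reverse=True):
        print(shown_recs(doc), doc.title)
```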
The search query. Search queries can contain individual words, all of which must be found in a text document to constitute a hit. Queries can also contain phrases, indicated by double quotes around the phrase. All words are converted to lower case while searching, so letter case is of no consequence. In addition, queries can include boolean terms, such as "and", "or", and "not", which can be used to construct more complex queries with perhaps more informative results.
By default "and" is assumed between individual words in a query. Finally, queries or parts of queries can be restricted to only search within parts of a document or document metadata. The parts available are shown above, along with examples.
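The default matching rules (every word required, phrases matched in order, case ignored) can be illustrated with a toy matcher. This is not how swish-e works internally, which uses a prebuilt index; the function names are hypothetical, and treating only ASCII letters and digits as word characters is an assumption.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(words: List[str], phrases: List[str], text: str) -> bool:
    """True if the text contains all the words and all the phrases."""
    doc_words = tokenize(text)
    present = set(doc_words)
    # Default "and": every individual query word must occur somewhere.
    if not all(word.lower() in present for word in words):
        return False
    # Phrases: the words must occur next to each other, in the given order.
    padded = " " + " ".join(doc_words) + " "
    return all(" " + " ".join(tokenize(p)) + " " in padded for p in phrases)


if __name__ == "__main__":
    text = "The search index is fast"
    print(matches(["Search", "INDEX"], [], text))  # both words are present
    print(matches([], ["is index"], text))         # words present, wrong order
```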
Certain characters have special meaning to the search engine (open and close paren, double quote, equals sign). Only a restricted list of characters is expected to occur in a word.
For the indices here at Daily Kos, the word characters are as follows.
All other characters (non word, non special) are silently replaced by spaces while searching. For instance searching with "Jerome a Paris", "Jerome.a.Paris", or "Jerome,a,Paris" all give the same result.
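The replacement rule can be sketched as follows. The actual list of word characters is not reproduced here, so the sketch assumes ASCII letters and digits; the function name is hypothetical, and collapsing runs of spaces is added only to make the results easy to compare.

```python
import string

# Assumption: the real word-character list for the indices may differ.
WORD_CHARS = set(string.ascii_letters + string.digits)
# Characters with special meaning to the search engine, per the text.
SPECIAL_CHARS = set('()"=')


def normalize(query: str) -> str:
    """Replace non-word, non-special characters with spaces and lower-case."""
    kept = [
        ch.lower() if (ch in WORD_CHARS or ch in SPECIAL_CHARS) else " "
        for ch in query
    ]
    # Collapse repeated spaces so equivalent queries compare equal.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    for q in ("Jerome a Paris", "Jerome.a.Paris", "Jerome,a,Paris"):
        print(repr(normalize(q)))
```

All three spellings come out as the same string, which is why they return the same results.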
This is important in certain cases, for example searching for the url http://riverbendblog.blogspot.com is accomplished with the query site=("riverbendblog.blogspot.com").
The reasonable looking but incorrect query site=riverbendblog.blogspot.com doesn't work as expected because in the absence of the double quotes and parentheses, replacement of "." with " " and the default boolean AND understood between search words makes it equivalent to site=riverbendblog AND blogspot AND com, which returns as hits only documents with riverbendblog in a url, and blogspot and com in the body text.
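The difference between the two forms can be made concrete with a toy model of how a restriction attaches to terms. This is an illustration of the behavior described above, not the swish-e parser; the function name is hypothetical, and only letters and digits are assumed to be word characters.

```python
import re
from typing import List, Optional, Tuple

Term = Tuple[Optional[str], str]  # (field restriction or None, search text)

# Matches field=("a phrase"), then field=word, then a bare word.
TOKEN = re.compile(r'(\w+)=\(\s*"([^"]*)"\s*\)|(\w+)=(\S+)|(\S+)')


def split_words(text: str) -> List[str]:
    """Replace non-word characters with spaces, lower-case, and split."""
    return re.sub(r"[^A-Za-z0-9]+", " ", text).lower().split()


def split_terms(query: str) -> List[Term]:
    """List the terms of a query with the field each one is restricted to."""
    terms: List[Term] = []
    for m in TOKEN.finditer(query):
        if m.group(1):
            # Quoted and parenthesized: the whole phrase stays in the field.
            terms.append((m.group(1), " ".join(split_words(m.group(2)))))
        elif m.group(3):
            # Unquoted: only the first word stays restricted to the field.
            words = split_words(m.group(4))
            if words:
                terms.append((m.group(3), words[0]))
                terms.extend((None, w) for w in words[1:])
        else:
            terms.extend((None, w) for w in split_words(m.group(5)))
    return terms


if __name__ == "__main__":
    print(split_terms("site=riverbendblog.blogspot.com"))
    print(split_terms('site=("riverbendblog.blogspot.com")'))
```

In the unquoted form only riverbendblog carries the site restriction, while blogspot and com become ordinary words joined by the default AND.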
More on Search
If you are intrigued by search and would like to know more about the basics of how it is done, this is a good introduction.
Since I've emphasized the importance of meta data, I'll also point to another article from the same series, On Search. SearchMeta, with some further thoughts on things meta.
While I'm on the subject of underlying technologies, let me end this section by pointing to an article by Tim Bray on Ajax, the other new addition here at Daily Kos. In it, he also gives credit to Jesse James Garrett for the Ajax acronym in the process of thoroughly describing it.
Daily Kos is an amazing place, community, phenomenon. It has been my privilege to have the opportunity to contribute.