ZIPs: Readability & Potential Impact: Mar31-6

by Quequeg

(This content is not subject to review by Daily Kos staff prior to publication.)

Wednesday, Apr. 11, 2012 at 8:20:13pm PDT

Welcome to the sixth edition of "Zero Impact Posts" (ZIPs). If you missed the first edition, you may want to read it to learn about the many positive aspects of ZIPs, which were the inspiration for the ZIP series. See the fourth edition for the raison d'être of the series.

According to Jotter, during the previous week, there were 1361 posts, of which 1333 received recommendations and 809 had more than 9 commenters.

And according to me, there were only 4 posts that had no recommendations and no comments. They had absolutely zero impact, as judged by these measures. See below for a table with these rare and remarkable posts.

posted (EST)	rec	com	tip	impact	post	author
04/05 09:51 Thu	0	0	0	0.00	I wanna play	Joealan
04/04 14:37 Wed	0	0	0	0.00	Spy Wednesday	R T Saunders
04/01 14:03 Sun	0	0	0	0.00	Health Care Finance Reform	camano intp
04/01 08:03 Sun	0	0	2	0.00	The Late Great Commonwealth: Catching Up to the Republican Primary	brasch

Potential Impact
To wax philosophical for a moment, it can be said that all posts begin as ZIPs, as they are published with zero impact, until someone comments, recommends, or hotlists the post. (The ZIP list highlights those posts that look as though they will persist as ZIPs unless somehow discovered by the right readers, such as through the ZIP list.)

Though all posts start as ZIPs, I think most people would agree that some posts have more potential for impact than others. Which posts receive more involvement is not completely random. There are many possible factors (e.g., the content, the way the content is presented, how the content relates to current events, other posts, the time of day, the readership, the author's followers, etc.). In theory, if you could objectively measure some of these factors and relate them to their influence on impact, then you could estimate the potential impact of a post even when it is still a ZIP.

So, I came up with a new quantity for describing posts called "Potential Impact" (PI). If I could find a way to estimate PI, then I could show the PI number next to each ZIP and sort the ZIP list accordingly. In this way, viewers of the ZIP list could get a quick idea of which posts would be most likely to engage them, whether to enthrall or to enrage.

So, if you consider yourself a thrill-seeking risk-taker, then you would pick the posts from the top of the list, as they would have the highest PI.
[Image: B.A.S.E. jumping from a cliff]

On the other hand, if you prefer to relax on the "Lazy River", then you would pick the posts from the bottom of the list.
[Image: The Lazy River]

Readability
One obvious factor is the readability of a post: a post that is easier to read takes less time to read and is easier to understand. This, in turn, would mean that people would be more likely to absorb the content of the post and subsequently respond in some way. At least, it seems like common sense to me that readability would make a difference.

Microsoft Word 2007 has a feature for showing "Readability Statistics". To use this feature, you need to select the proofing option "show readability statistics" and then run the spelling/grammar checker. After MS Word has finished the spelling/grammar check, it shows the readability statistics. Here is an example:
[Image: MS Word 2007 Readability Statistics]
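For anyone without MS Word, the two Flesch statistics that appear in the charts can be approximated with the standard published formulas. The Python below is only a rough sketch, not what MS Word does internally: the function names are made up for illustration, and the syllable counter is a crude vowel-group heuristic (an assumption), so the numbers will differ somewhat from Word's.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent (e.g., "make"), but keep "-le" (e.g., "table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Standard Flesch formulas; higher ease = easier, grade = U.S. school grade.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level

ease, grade = readability("The cat sat on the mat.")
```

Short sentences with short words push the reading ease up and the grade level down, which is why very simple text can score above 100 on reading ease.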

So, here is the big plan:
1) Use MS Word to get readability statistics for a bunch of posts.
2) Get the impact ratings for the same bunch of posts (i.e., comments + recommendations + hotlists).
3) Find patterns that link the readability statistics with the impact ratings for the posts.
4) Use the patterns to create a formula for estimating Potential Impact (PI).
5) Use the formula to estimate PI for new posts and sort the ZIP list accordingly.
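Steps 2 through 5 of the plan could be sketched roughly as follows in Python. This is only an illustration under assumptions: it uses a one-variable least-squares line as the "formula", which is just one of many possible choices, and the function names and sample numbers are made up for the example rather than taken from the real data.

```python
def impact(comments, recs, hotlists):
    """Step 2: impact rating = comments + recommendations + hotlists."""
    return comments + recs + hotlists

def fit_line(xs, ys):
    """Steps 3-4: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points, with xs and ys the same length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0

def estimate_pi(reading_ease, slope, intercept):
    """Step 5: plug a new post's reading ease into the fitted formula."""
    return slope * reading_ease + intercept

# Made-up numbers for illustration only (not the real data).
reading_ease = [30.0, 40.0, 50.0, 60.0, 70.0]
impacts = [impact(2, 3, 0), impact(4, 3, 1), impact(5, 5, 0),
           impact(6, 6, 1), impact(9, 6, 0)]
slope, intercept = fit_line(reading_ease, impacts)
r2 = r_squared(reading_ease, impacts, slope, intercept)

# Sort hypothetical new ZIPs from highest to lowest estimated PI.
new_posts = {"post A": 35.0, "post B": 65.0}
ranked = sorted(new_posts,
                key=lambda name: estimate_pi(new_posts[name], slope, intercept),
                reverse=True)
```

A real version would likely need more than one factor and would only be as trustworthy as the R-squared of the fit allows.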

The data is in!
I randomly picked 100 posts from last Saturday/Sunday and 100 from last Monday/Tuesday. Then, I collected the data and began my analysis. Here are some charts I made.

Impact Overview - All Posts
This first chart is just an overview showing the range of impact among the posts that I analyzed. It shows the impact ratings of all 200 posts, ordered from lowest to highest impact.
[Chart: All Posts Shown in Order of Impact]

Impact Overview - Some Posts
This chart is like the last one, except it shows only the 87 posts with the lowest impact. I chose to highlight these posts because their impact rises in a roughly linear fashion, which made me think that their data might be more useful for analysis than the data from all 200 posts, whose impact progresses exponentially.
[Chart: Some Posts Shown in Order of Impact]

Impact vs Reading Ease - All Posts
This chart shows impact along with the MS Word statistic "Flesch Reading Ease". The chart includes all 200 posts.
[Chart: All Posts and Reading Ease]

Impact vs Reading Ease - Some Posts
This chart is like the last one, except it includes only posts that have a lower impact rating and more than 100 words.
[Chart: Some Posts and Reading Ease]

Impact vs Grade Level - All Posts
This chart shows impact along with the MS Word statistic "Flesch-Kincaid Grade Level". The chart includes all 200 posts.
[Chart: All Posts and Grade Level]

Impact vs Grade Level - Some Posts
This chart is like the last one, except it includes only posts that have a lower impact rating and more than 100 words.
[Chart: Some Posts and Grade Level]

Impact vs Word Count - All Posts
This chart shows impact along with the MS Word count of the words in the post. The chart includes all 200 posts.
[Chart: All Posts and Word Count]

Impact vs Word Count - Some Posts
This chart is like the last one, except it includes only posts that have a lower impact rating and between 100 and 3300 words.
[Chart: Some Posts and Word Count]

Impact vs Spelling Errors - All Posts
This chart shows impact along with the MS Word count of the spelling errors in the post. The chart includes all 200 posts.
[Chart: All Posts and Spelling Errors]

Impact vs Spelling Errors - Some Posts
This chart is like the last one, except it includes only posts that have a lower impact rating and between 0 and 25 spelling errors.
[Chart: Some Posts and Spelling Errors]

Impact vs Grammar Errors - All Posts
This chart shows impact along with the MS Word count of the grammar errors in the post. The chart includes all 200 posts.
[Chart: All Posts and Grammar Errors]

Impact vs Grammar Errors - Some Posts
This chart is like the last one, except it includes only posts that have a lower impact rating and between 0 and 25 grammar errors.
[Chart: Some Posts and Grammar Errors]

What does the data mean?
I was hoping you'd tell me!

Well, I'll say a couple of things. Obviously, there's not much correlation (as you can see from the low R-squared values on all the charts). But some of the charts hint at patterns. E.g., the chart "Impact vs Reading Ease - Some Posts" suggests that as the reading ease increases, the highest impact reached also increases. If the reading ease is below 40, then the highest impact tends to stay below 20 (with the exception of one outlier). But if the reading ease is above 60, then the highest impact goes up to 34.
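One way to check the "highest impact" pattern without eyeballing a chart would be to group posts into reading-ease bands and take the maximum impact in each band. The Python sketch below assumes the data is a list of (reading ease, impact) pairs. The band edges of 40 and 60 come from the observation above, while the function name and the sample values are hypothetical.

```python
from bisect import bisect_right

def max_impact_by_band(posts, edges=(40.0, 60.0)):
    """Return the highest impact seen in each reading-ease band.

    posts: iterable of (reading_ease, impact) pairs.
    edges: ascending band boundaries; a value equal to an edge
           falls into the higher band.
    """
    edges = list(edges)
    labels = ["below %g" % edges[0]]
    labels += ["%g to %g" % (lo, hi) for lo, hi in zip(edges, edges[1:])]
    labels.append("%g or above" % edges[-1])
    highest = {label: None for label in labels}  # None = no posts in band
    for ease, impact in posts:
        label = labels[bisect_right(edges, ease)]
        if highest[label] is None or impact > highest[label]:
            highest[label] = impact
    return highest

# Made-up values for illustration only (not the real data).
sample = [(35.0, 12), (38.5, 19), (45.0, 22), (62.0, 30), (70.1, 8)]
bands = max_impact_by_band(sample)
```

If the maximum climbs steadily from band to band on the real data, that would support the pattern; with only a few posts per band, a single outlier can easily dominate.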

The charts about spelling/grammar errors would probably have been more valid if I had divided the error counts by the number of words in the post. There are also other issues, such as the fact that MS Word does not know the slang that is used a lot around here (e.g., "Kos", "snark", "fundie") and so counts it as misspelled. It may be that posts that use more slang are more in touch with the Kos community, in which case more spelling errors would be a positive. Still, it looks to me on the "200 post" charts that as the spelling/grammar errors climb, they tend to put a damper on the highest impact.
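Dividing the error count by the number of words, as suggested above, might look like this small Python sketch. Expressing the result per 100 words is an arbitrary choice to keep the numbers readable, and the function name is hypothetical.

```python
def errors_per_100_words(error_count, word_count):
    """Normalize a raw spelling/grammar error count by post length."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * error_count / word_count

# Two made-up posts with the same raw error count but different lengths.
short_post_rate = errors_per_100_words(10, 200)
long_post_rate = errors_per_100_words(10, 2000)
```

With the raw counts, these two posts would land at the same spot on the error axis, even though the shorter post has ten times the error rate.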

Also, I could make similar observations about the charts for "grade level" and "word count".

On the other hand, I have a special facility for seeing patterns where none exist. So, feel free to differ and express your opinion, because I'm sure everyone wants to hear it.

For the very curious, here are links to see the raw data:
- webpage
- Excel 2003 spreadsheet for download
- online spreadsheet
Let me know if you need some other format.