More accurate than polls alone, polling and search trends model is showing Biden with 343 EVs

by PluralVote

(This content is not subject to review by Daily Kos staff prior to publication.)

Friday, Jul. 02, 2021 at 2:18:28pm PDT

Based on polls and search trends, the Plural Vote presidential forecast model for the 2020 race gives Biden a 76.4% probability of winning the Electoral College. This is the highest probability we have recorded since the forecast launched on Friday, March 27th.

For this presidential race, our state projections depend primarily on polls to predict vote outcomes. In addition to polls, which constitute two-thirds of our predictions, we incorporate a unique model based on "media partisanship". Its measurements are gathered from Google Trends in order to capture media polarization and thus predict vote margins. This search trends model tracks shifts in the relative frequency of searches for Fox News, Washington Post, MSNBC, New York Times, and Huffington Post. It is combined with polling to form estimates of how each state will vote. These combined estimates have been more predictive of past election outcomes than polling averages alone.
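As a rough illustration of the two-thirds / one-third weighting described above, here is a minimal Python sketch. The function name `blend_margin` and the example numbers are hypothetical, and the sign convention (negative = Democratic lead) is assumed from the state tables in this article. Our published model is written in R, so this is not the actual implementation.

```python
def blend_margin(poll_margin, trends_margin, poll_weight=2 / 3):
    """Weighted average of two margin estimates, in percentage points."""
    return poll_weight * poll_margin + (1 - poll_weight) * trends_margin


# Hypothetical state: polls show a 6-point Democratic lead and the
# search trends model shows a 3-point Democratic lead.
combined = blend_margin(-6.0, -3.0)
print(round(combined, 2))  # -5.0
```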

For the purpose of transparency, in this article we are detailing our methodology in depth. In addition, the source code for the Search Trends component of our statistical model (programmed in the R language) has been made available on GitHub. The source data for the comparison between polling averages and our estimates is also available on GitHub.

In 2016, polls alone had a coefficient of determination of r^2 = 0.910, whereas our model produces a higher value of r^2 = 0.941.

In 2012, polls had a coefficient of determination of r^2 = 0.954, whereas our model produces a higher value of r^2 = 0.964.
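For readers who want to reproduce this kind of comparison, the sketch below computes r^2 as the square of the Pearson correlation between predicted and actual state margins. That definition is an assumption for illustration (r^2 can also be computed as one minus the ratio of residual to total variance), and the margins in the example are made up rather than taken from our data.

```python
def r_squared(predicted, actual):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov ** 2 / (var_p * var_a)


# Made-up margins for five hypothetical states (negative = Democratic lead).
predicted = [-20.0, -5.0, 1.0, 8.0, 25.0]
actual = [-23.0, -2.0, -1.0, 10.0, 30.0]
print(round(r_squared(predicted, actual), 3))
```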

When our methodology is applied identically to the 2012 and 2016 elections, our state-by-state predictions prove more accurate than polling. Our search trends and polling model correctly predicted 46 out of 50 states in 2016 and 50 out of 50 in 2012. For reference, FiveThirtyEight called 45 out of 50 states correctly in 2016 and 50 out of 50 in 2012. As another point of comparison, raw polling averages called 46 out of 50 states correctly in 2016 and 49 out of 50 in 2012.

The mean absolute error of our model in 2012 across all states was 3.4 points. This represents 0.7 points less error than polls alone, which showed 4.1 points of error. In addition, the mean absolute error of our model in 2016 across all states was 4.3 points. This represents 0.2 points less error than polls alone, which showed 4.5 points of error.
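The two accuracy measures used here, the number of states called correctly and the mean absolute error, can be sketched as follows. The function names and the three-state example are hypothetical, and a state is assumed to count as "called" when the predicted and actual margins have the same sign.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual margins."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def states_called(predicted, actual):
    """Count states where the predicted winner matches the actual winner."""
    return sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))


# Made-up margins for three hypothetical states (negative = Democratic lead).
predicted = [-5.0, 2.0, 1.0]
actual = [-3.0, 4.0, -1.0]
print(mean_absolute_error(predicted, actual))  # 2.0
print(states_called(predicted, actual))        # 2
```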

In order to avoid overfitting, the methodology behind our model is neutrally devised and its retroactive predictions do not incorporate information available with the benefit of hindsight.

The tables below show what our model would have predicted in the 2012 and 2016 races for each state. Negative margins indicate a Democratic lead and positive margins a Republican lead. Empty rows indicate that the state lacked polling and was solidly Republican or Democratic.

Model’s predicted vote margins for 2012:

State	Search Trends + Polling (predicted margin, points)	Absolute Error (points)
District of Columbia
Hawaii	-31.89	10.82
Vermont	-33.35	2.25
New York	-28.96	0.78
Rhode Island	-25.50	1.96
Maryland	-24.46	1.62
Massachusetts	-21.54	1.60
California	-18.87	4.25
Delaware
New Jersey	-15.53	2.28
Connecticut	-15.30	2.03
Illinois	-18.24	1.37
Maine	-16.00	0.71
Washington	-12.25	2.62
Oregon	-10.03	2.06
New Mexico	-5.12	5.03
Michigan	-5.29	4.21
Minnesota	-6.48	1.21
Wisconsin	-2.92	4.02
Nevada	-7.59	0.91
Iowa	-4.05	1.76
New Hampshire	-0.50	5.08
Pennsylvania	-6.38	0.99
Colorado	-3.49	1.88
Virginia	-6.93	3.06
Ohio	-1.93	1.05
Florida	-1.87	0.99
North Carolina	2.16	0.12
Georgia	7.59	0.23
Arizona	8.50	0.56
Missouri	10.35	0.97
Indiana	5.89	4.31
South Carolina	15.49	5.02
Mississippi
Alaska
Montana	11.90	1.75
Texas	13.55	2.23
Louisiana	16.04	1.17
South Dakota	13.96	4.06
North Dakota	16.27	3.36
Tennessee	20.10	0.30
Kansas	15.01	6.71
Nebraska	12.05	9.73
Alabama
Kentucky	12.16	10.53
Arkansas	19.71	3.98
West Virginia	14.71	12.05
Idaho	33.03	1.12
Oklahoma	24.00	9.54
Wyoming
Utah	36.36	11.68

Model’s predicted vote margins for 2016:

State	Search Trends + Polling (predicted margin, points)	Absolute Error (points)
District of Columbia
Hawaii
California	-24.90	5.21
Massachusetts	-32.44	5.24
Maryland	-26.21	0.21
Vermont	-31.64	5.64
New York	-27.21	4.72
Illinois	-16.93	0.04
Washington	-16.70	0.47
Rhode Island	-18.08	2.58
New Jersey	-16.17	2.07
Connecticut	-17.33	3.69
Delaware	-26.86	15.49
Oregon	-14.15	3.17
New Mexico	-9.19	0.97
Virginia	-7.35	2.03
Colorado	-5.66	0.75
Maine	-8.26	5.30
Nevada	-2.09	0.33
Minnesota	-6.49	4.97
New Hampshire	-4.10	3.73
Michigan	-4.39	4.62
Pennsylvania	-4.20	4.92
Wisconsin	-4.33	5.10
Florida	-1.65	2.85
Arizona	4.16	0.66
North Carolina	0.82	2.84
Georgia	6.43	1.34
Ohio	3.61	4.52
Texas	11.72	2.73
Iowa	1.05	8.36
South Carolina	10.53	3.74
Alaska	8.20	6.53
Mississippi	17.94	0.14
Utah	17.86	0.22
Kansas	12.55	5.87
Missouri	10.44	8.07
Indiana	10.98	8.03
Louisiana	18.05	1.59
Montana	19.96	0.46
Nebraska	23.70	1.35
Tennessee	14.96	11.04
Arkansas	23.27	3.65
Alabama
South Dakota	15.96	13.83
Kentucky
Idaho	26.85	4.92
North Dakota
Oklahoma
West Virginia	26.29	15.78
Wyoming

Our Search Trends model predicts the outcomes of an election through the following steps:

1) Retrieve Google Search Trends data for each state for five partisan-correlated media outlets (Fox News, Washington Post, MSNBC, New York Times, and Huffington Post) in the last three months of the previous election cycle.

2) Normalize the data by setting the minimum state value to 0 (the state with the highest frequency of Republican-associated searches for media outlets) and maximum state trend value to 100 (the state with the highest frequency of Democratic-associated searches).

3) Fit an ordinary least squares (OLS) linear regression on the previous election cycle's search trends data for the five media outlets in order to predict the state-by-state election outcomes of the election cycle prior to the previous one.

4) Compute the prediction error of each of the regression's predictions against the outcomes of the previous election cycle.

5) Retrieve Google Search Trends data for each state for five partisan-correlated media outlets (Fox News, Washington Post, MSNBC, New York Times, and Huffington Post) in the last three months of the current election cycle.

6) Normalize the data by setting the minimum state value to 0 (the state with the highest frequency of Republican-associated searches for media outlets) and maximum state trend value to 100 (the state with the highest frequency of Democratic-associated searches).

7) Apply the OLS linear regression used to predict outcomes for the previous election cycle to create state-level predictions for the current election.

8) Subtract from each state the prediction error of the model in the previous cycle.

9) Normalize the predictions for the current election cycle by subtracting the median in order to set the median of the state-level vote estimates to zero.
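The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not our R implementation: the five outlets are collapsed into a single hypothetical partisanship index per state (higher = more Democratic-associated searches), so the regression has one predictor rather than five; the prediction error is taken as prediction minus outcome; and all function names, parameter names, and numbers are invented for the example.

```python
from statistics import median


def min_max_scale(values):
    """Rescale so the smallest value maps to 0 and the largest to 100."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]


def fit_ols(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def search_trends_forecast(prev_index, train_margins, check_margins, curr_index):
    """Return median-centered margin estimates for the current cycle.

    train_margins: outcomes the regression is fitted against.
    check_margins: outcomes used to measure each state's earlier error.
    (The caller may pass the same list for both.)
    """
    # Normalize each cycle's search index to the 0-100 range.
    prev_x = min_max_scale(prev_index)
    curr_x = min_max_scale(curr_index)
    # Fit the regression on the earlier cycle's normalized index.
    slope, intercept = fit_ols(prev_x, train_margins)
    # Per-state error of the regression in the earlier cycle.
    prev_error = [slope * x + intercept - m
                  for x, m in zip(prev_x, check_margins)]
    # Apply the same regression to the current cycle, then remove the error.
    corrected = [slope * x + intercept - e
                 for x, e in zip(curr_x, prev_error)]
    # Center the estimates so the median state margin is zero.
    mid = median(corrected)
    return [c - mid for c in corrected]


# Four hypothetical states, listed in the same order in every input.
estimates = search_trends_forecast(
    prev_index=[10.0, 20.0, 30.0, 40.0],
    train_margins=[20.0, 5.0, -5.0, -20.0],
    check_margins=[18.0, 6.0, -4.0, -22.0],
    curr_index=[12.0, 22.0, 28.0, 41.0],
)
print([round(e, 2) for e in estimates])
```

Because the last step centers the estimates on the median state, the output describes how states sit relative to one another rather than the national margin.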

The R code for the unique search trends portion of our estimates is available on GitHub. The prediction generated by this model receives one-third of the weight in our state-level forecasts, with polling making up the remaining two-thirds. We average polls through a LOESS moving regression. Past polling error informs our modelling of the uncertainty of our predictions. We model the probabilities using the Beta, Weibull, and Logistic distributions. The Electoral College vote and the probability of each candidate winning a majority of electors are estimated using 20,000 Monte Carlo simulations.
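To give a feel for the Monte Carlo step, the sketch below simulates the Electoral College from a list of state margins. It is a deliberately simplified illustration: it uses only a logistic error with a hypothetical scale of 3 points, treats state errors as independent, and runs on four invented states, whereas our actual model draws on the Beta, Weibull, and Logistic distributions and on past polling error. The names `logistic_noise` and `simulate_electoral_college` are made up for this example.

```python
import math
import random


def logistic_noise(rng, scale):
    """Draw one logistic-distributed error via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the rare case random() returns 0
        u = rng.random()
    return scale * math.log(u / (1.0 - u))


def simulate_electoral_college(states, n_sims=20_000, scale=3.0, seed=0):
    """Return (Democratic win probability, mean Democratic electoral votes).

    states: list of (predicted_margin, electoral_votes) pairs, where a
    negative margin means a Democratic lead.
    """
    rng = random.Random(seed)
    total_ev = sum(ev for _, ev in states)
    wins = 0
    ev_sum = 0
    for _ in range(n_sims):
        # The Democrat carries a state when the simulated margin is negative.
        dem_ev = sum(ev for margin, ev in states
                     if margin + logistic_noise(rng, scale) < 0)
        ev_sum += dem_ev
        if dem_ev > total_ev / 2:  # strict majority of electors
            wins += 1
    return wins / n_sims, ev_sum / n_sims


# Four invented states: (predicted margin, electoral votes).
toy_states = [(-8.0, 20), (-1.5, 29), (2.0, 18), (12.0, 38)]
win_prob, mean_ev = simulate_electoral_college(toy_states)
print(f"win probability: {win_prob:.3f}, mean electoral votes: {mean_ev:.1f}")
```

Treating state errors as independent tends to understate uncertainty, since polling misses are often correlated across states; a fuller simulation would add a shared national error term.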

Feel free to follow @plural_vote for regular updates on the current electoral state of the 2020 race. Plural Vote updates its presidential and Senate race models daily.