I created a rudimentary OLS regression model to forecast the upcoming primaries. I built the model on Friday, and it predicted Saturday's elections with an error of 2% in Louisiana, 4% in Nebraska, and 2% in Washington. I then added those results to my dataset and reran the regression, and it forecasted Maine with an error of 3%.
Succinctly, here are my forecasts for Barack Obama's vote percentage in the upcoming elections:
DC: 87.89, Maryland: 68.91, Virginia: 60.41, Hawaii: 52.83, Wisconsin: 40.73
Granted, DC (and its large number of black residents and college graduates) may have broken my model. And Wisconsin is far enough out that things could still change.
Caveats and details below the fold...
I used the same model as in my post earlier today, with a minor change. For those who have not read my earlier post, I ran an OLS regression on the primary results so far in order to forecast later results. If you are unsure about the concept of regression, read Poblano's beautifully written explanation, and come back here.
I realized that the "Fundraising" variable, the dollar amount raised by a candidate in the state, kept changing after the election, so it was unreliable. After removing the variable, R^2, AIC, BIC, and HQC all improved.
Here is the new model:
VARIABLE       COEFFICIENT    P-VALUE
BUSH2004          0.317906    0.00327
CAUCAS           16.9984     <0.00001
Education         1.11997    <0.00001
EdwardsVote      -0.669123   <0.00001
Immigrants       -0.352012    0.04149
Baptists         -0.255519    0.02379
Black             0.810391   <0.00001
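To make the table concrete, here is a minimal Python sketch of how a linear model like this turns a state's inputs into a forecast: multiply each variable by its coefficient and add up the products. The coefficients are copied from the table; the function name `forecast_obama_share` and the input dictionary are hypothetical, and the sketch assumes no intercept because none is listed. The inputs would have to be in the same units as the dataset, which are not spelled out here.

```python
# Coefficients copied from the table above. No intercept is listed there,
# so this sketch ASSUMES the model has none.
COEFFICIENTS = {
    "BUSH2004": 0.317906,
    "CAUCAS": 16.9984,
    "Education": 1.11997,
    "EdwardsVote": -0.669123,
    "Immigrants": -0.352012,
    "Baptists": -0.255519,
    "Black": 0.810391,
}


def forecast_obama_share(state_inputs):
    """Return the linear prediction: sum of coefficient * input value.

    state_inputs is a hypothetical dict mapping variable names to values,
    in the same units as the dataset. Missing variables count as 0.
    """
    unknown = set(state_inputs) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown variables: {sorted(unknown)}")
    return sum(coef * state_inputs.get(name, 0.0)
               for name, coef in COEFFICIENTS.items())
```

Holding everything else fixed, raising the Black variable by one unit moves the forecast up by 0.810391 points, and raising EdwardsVote by one unit moves it down by 0.669123 points.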
The new model statistics:
Sum of squared residuals = 39.6573
Standard error of residuals = 1.40814
Unadjusted R-squared = 0.956368
Adjusted R-squared = 0.943278
F-statistic (7, 20) = 62.6253 (p-value < 0.00001)
Akaike information criterion (AIC) = 101.002
Schwarz Bayesian criterion (BIC) = 110.073
Hannan-Quinn criterion (HQC) = 103.7
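As a sanity check, most of these statistics can be reproduced from the sum of squared residuals. The Python sketch below assumes 27 observations and 7 estimated coefficients (inferred from F(7, 20) together with the adjusted R-squared, not something stated in the output) and uses the usual Gaussian log-likelihood forms of AIC, BIC, and HQC.

```python
import math

ssr = 39.6573          # sum of squared residuals, from the output above
r_squared = 0.956368   # unadjusted R-squared, from the output above
n = 27                 # ASSUMPTION: number of observations
k = 7                  # ASSUMPTION: number of estimated coefficients

df = n - k                                   # residual degrees of freedom
std_error = math.sqrt(ssr / df)              # standard error of residuals
# This form of the adjustment reproduces the reported value under the
# assumptions above.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df

# Gaussian log-likelihood evaluated at the OLS estimates.
log_lik = -0.5 * n * (1 + math.log(2 * math.pi) + math.log(ssr / n))
aic = -2 * log_lik + 2 * k
bic = -2 * log_lik + k * math.log(n)
hqc = -2 * log_lik + 2 * k * math.log(math.log(n))

print(f"std error      = {std_error:.3f}")
print(f"adj. R-squared = {adj_r_squared:.4f}")
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}, HQC = {hqc:.2f}")
```

Under those assumptions the recomputed values agree with the reported ones to the precision shown. Lower AIC, BIC, and HQC indicate a better trade-off between fit and the number of variables, which is why they are useful for comparing this model against the earlier one that included Fundraising.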
This model is, of course, a work in progress, and I'd love to hear feedback or new ideas. If anyone wants to do this themselves, please email me for the dataset; I wouldn't want anyone to waste their time replicating work.
My current to-do list:
- Test the "Racial polarization hypothesis". Some posters have suggested that Obama's relative failure among white people is only relevant in states that are racially divided (i.e., contain a lot of black people).
The common example is Idaho, where Obama won overwhelmingly, despite its lily-white status.
There are a couple of ways to test this, and I will report back sometime tomorrow.
- Run a Probit regression: I have to admit, we only skimmed over the Probit regression model in my Econometrics class, but I've done some reading, and it looks like it can be done with some simple data conditioning. I want this done before Hawaii comes in.
- Use other nonlinear regression models: Other than some modifications to the Black variable, I'm not sure which other nonlinear effects should be modeled. But I'm sure that the readers will come up with something. Post a comment and I will gladly test it.
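One possible reading of the "simple data conditioning" in the Probit item above: convert each vote percentage to a z-score with the inverse normal CDF, run the regression on the z-scores, and convert the forecasts back to percentages. The Python sketch below shows only that transform step, using the standard library. The function names are hypothetical, and a full Probit fit in an econometrics package may work differently.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def share_to_probit(share_pct):
    """Map a vote share strictly between 0 and 100 to a z-score."""
    if not 0.0 < share_pct < 100.0:
        raise ValueError("share must be strictly between 0 and 100")
    return _STD_NORMAL.inv_cdf(share_pct / 100.0)


def probit_to_share(z):
    """Map a z-score back to a vote share in percent."""
    return 100.0 * _STD_NORMAL.cdf(z)


# Round trip on the DC forecast from the top of the post.
z_dc = share_to_probit(87.89)
print(z_dc, probit_to_share(z_dc))
```

The appeal of this kind of transform is that a forecast made on the z-score scale maps back to a percentage between 0 and 100, so a state with extreme demographics like DC cannot push the prediction past 100%.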
Come back tomorrow to see a post detailing the relationship between Race and Obama's vote, and of course, to answer any questions I get between now and then.