Can Artificial Intelligence Predict the U.S. Presidential Election?
A Machine Learning Exploration
As a lifelong registered independent voter, I've cast my ballot for both the Republican and Democratic parties. My data-driven perspective on U.S. politics has often left me indifferent, even apathetic, about voting. Many have sought my insights on predicting the U.S. Presidential Election this November, and here's my unique, data-driven approach to tackling the complexity and recent uncertainty of the U.S. presidential race.
Identify Low-Quality Data and Ignore It
Mostly Useless
1. National Polls are not a good predictor of voter preferences and are at best a limited contributing factor within two months of the election. Any National Poll taken three or more months before voting day is useless. Voters, even if only a small number of them, can change their minds.
2. Economic data, like National Polls, only matter within two months of voting day, and even then they are not good predictors. The unemployment rate, inflation, and other financial conditions can contribute to a good prediction to some extent, but most of the time they do not.
3. Candidate Personality Traits or Scandals do not resonate with U.S. voters, and plenty of Data Scientists, Statisticians, and Academics have tried to isolate this variable with no luck after several decades of work. Since 1960, the American National Election Studies have statistically measured Candidate Personality Traits or Scandals, and they have consistently contributed very little or nothing to the outcome of the U.S. Presidential Race. Partisanship, Policy over Personality, and many other reasons have led voters not to care about this issue. This is not unique to the U.S., either. Based on several data science studies globally, it is a broader human condition.
Completely Useless
4. Endorsements, crowd sizes at rallies, social media metrics, the number of yard signs and bumper stickers, and even some unscientific polls do not translate to voter results or represent the broader electorate. Ignore bad samples of likely voters.
Identify High-Quality Data and Use It for Machine Learning
Very Useful
1. State-level polling in swing states near the election is more predictive due to the Electoral College system.
2. Economic Indicators like unemployment rates, inflation, and growth figures near the election can influence voter sentiment, especially in 2024.
3. Presidential approval ratings, especially in contested states, are a strong indicator, and in 2024 this applies to both candidates.
Mostly Useful
4. Voter registration increases and changes in demographic trends, especially in the swing states, can signal shifts in voting patterns.
5. Substantial fundraising numbers can indicate enthusiasm and an increase in likely voters.
6. Early voting and mail-in ballot data, along with turnout on the morning of election day, may provide some insight.
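To make the point about timing concrete, here is a minimal sketch of how state-level polls could be weighted by how close they are to voting day. It is an illustration, not my production model. The 60-day cutoff mirrors the two-month window discussed above; the `Poll` class, the 14-day half-life, and weighting by sample size are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    """Hypothetical record for one state-level poll."""
    state: str
    days_before_election: int
    candidate_a_share: float  # share of the two-candidate vote, between 0 and 1
    sample_size: int


def recency_weight(days_before_election, cutoff_days=60, half_life_days=14.0):
    """Return 0 for polls older than the cutoff; otherwise halve the weight every half-life."""
    if days_before_election > cutoff_days:
        return 0.0
    return 0.5 ** (days_before_election / half_life_days)


def weighted_state_average(polls, state):
    """Average Candidate A's share in one state, weighted by recency and sample size."""
    weighted_sum = 0.0
    total_weight = 0.0
    for poll in polls:
        if poll.state != state:
            continue
        weight = recency_weight(poll.days_before_election) * poll.sample_size
        weighted_sum += weight * poll.candidate_a_share
        total_weight += weight
    # None signals that no poll inside the cutoff window exists for this state.
    return weighted_sum / total_weight if total_weight > 0 else None
```

With this scheme, a poll taken 90 days out contributes nothing, and a poll taken two weeks out counts half as much as one taken on the eve of the election.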
Exogeneity
External effects or exogenous shocks can cause significant difficulties in predicting the outcome of the U.S. Presidential Election of 2024. Examples include a health pandemic, a natural disaster, a massive U.S. Stock Market Crash, a major international conflict leading to a war, a (domestic) terrorist attack, a cyber-attack, or another political scandal near the election. These events can rapidly shift public opinion, alter campaign strategies, and change the political landscape.
Endogeneity
Internal interference is the most significant risk in predicting the outcome of the U.S. Presidential Election of 2024. Attempts to invalidate a winner's outcome by arguing that it is a false positive can take many forms and are the most troubling: Electoral College disputes over a state's contested irregularities, Congressional Certification Challenges, and last-minute voting law changes, to name a few. Voters want a true positive, but a small group may attempt to escalate to a coup d'etat in the United States of America. I hope not.
Model Framework
No single model exists in isolation that can predict the outcome of the U.S. Presidential Election 2024. Models are representations of reality, and many models in data science could be better at predicting. However, if that reality is complicated and uncertain, move to a multiple-model dashboard with an aggregated narrative.
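As a rough sketch of what a multiple-model dashboard with an aggregated narrative could look like, the snippet below summarizes win probabilities from several models and turns them into one sentence. The function names, the 0.10 disagreement threshold, and the wording of the narrative are hypothetical choices for illustration; the input probabilities would come from whatever models you have trained.

```python
from statistics import mean, pstdev


def aggregate_models(win_probabilities):
    """Summarize several models' win probabilities for one candidate.

    win_probabilities maps a model name to that model's probability (0 to 1).
    """
    probabilities = list(win_probabilities.values())
    if not probabilities:
        raise ValueError("need at least one model")
    return {
        "mean_probability": mean(probabilities),
        "spread": pstdev(probabilities),  # population standard deviation across models
        "models_favoring": sum(1 for p in probabilities if p > 0.5),
        "models_total": len(probabilities),
    }


def narrative(summary, disagreement_threshold=0.10):
    """Turn the summary into a single sentence for the dashboard."""
    if summary["spread"] > disagreement_threshold:
        return "Models disagree; treat the race as uncertain."
    if summary["mean_probability"] > 0.5:
        return "Models lean toward the candidate."
    return "Models lean against the candidate."
```

Reporting the spread alongside the mean is the design choice that matters here: when the models disagree, the dashboard says so instead of hiding the disagreement inside an average.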
A combination of linear, instance-based, ensemble, and neural network models is promising as a classifier for predicting the outcome accurately. However, predicting the win or loss is difficult because of the current uncertainty around the new candidate in the Democratic party, not to mention the shorter time horizon until voting day. The best methodology is to gather more data and evidence and to think in probabilities, so think Bayesian.
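To show what thinking Bayesian can mean in practice, here is a minimal sketch of a Beta-Binomial update for one candidate's share of the two-candidate vote in a single state. It assumes poll respondents behave like independent draws, which real polls do not, so it will overstate confidence. The function names, the flat Beta(1, 1) prior, and the Monte Carlo settings are illustrative assumptions.

```python
import random


def update_beta(alpha, beta, supporters, respondents):
    """Conjugate update: add supporters to alpha and non-supporters to beta."""
    return alpha + supporters, beta + (respondents - supporters)


def posterior_mean(alpha, beta):
    """Expected vote share under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


def prob_majority(alpha, beta, draws=20000, seed=0):
    """Monte Carlo estimate of the probability that the vote share exceeds 50%."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(1 for _ in range(draws) if rng.betavariate(alpha, beta) > 0.5)
    return hits / draws


# Start from a flat prior, then update with a hypothetical poll:
# 520 of 1,000 respondents favor the candidate.
alpha, beta = update_beta(1, 1, supporters=520, respondents=1000)
print(posterior_mean(alpha, beta), prob_majority(alpha, beta))
```

Each new poll is folded in by calling `update_beta` again on the current `alpha` and `beta`, so the estimate shifts as evidence accumulates.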
Conclusion
So, back to the original question: Can AI predict the U.S. Presidential Election? Ask me on October 4th, 2024. Even then, I may not care because I have been indifferent all my life.
By John Thomas Foxworthy
M.S. in Data Science from a Top Ten University w/ a 3.80 GPA or the top 5%
Veteran Data Scientist with his first Data Science Model in 2005
Freelance Artificial Intelligence Consultant for a Start-Up as of February 2024
Deep Learning Artificial Intelligence Instructor at UCSD Extended Studies
Master Instructor at Caltech’s Center for Technology & Management Education for Artificial Intelligence, Deep Learning, and Machine Learning