Some supporters of Bernie Sanders are circulating this paper from two Stanford students as proof that the Clinton camp committed electoral fraud. It's an interesting piece of work, but a weak reed for the claims surrounding it.
One of the key arguments is that states differ in whether they have a paper trail for their electronic voting or not, and that Sanders did much better in the states that have the trail, where fraud is presumably harder to commit.
The basic issue is whether there are other explanations for the discrepancy they found, and whether they adequately investigated them.
The initial numbers look undeniably bad. Sanders won only 35% in states without a paper trail, but 51% in states with a trail. If the 51% is the more accurate number, because it’s insulated from fraud, then the implication is that Sanders would have won in a fair election.
The first obvious question is whether states differed in some other way besides paper or no-paper. The authors are aware of that, and after presenting the raw 51% vs. 35% figure, they do a regression where they can at least in part control for those other factors.
The two variables that they add, besides paper vs. no-paper, are:
- Percent of non-Hispanic whites in a given state
- The “blueness” of the state (its history of voting for Democrats rather than Republicans in presidential elections since 1992).
These are both reasonable first efforts. Much was made of Clinton’s “firewall” among people of color, and Sanders’ better performance among whites, so the first control variable is getting at something real. The two candidates also differed in their appeal to long-time party regulars vs. people who don’t necessarily identify as Democrats. I don’t know how much the exit polling bears this out, but commentary suggested that Sanders appealed both to people who were critical of Clinton from the left, and to some whose sense of the system being rigged left them open either to Sanders or to Trump. So it makes sense to have some sort of variable for the party identification of a given state.
But there are problems with both of the variables they chose, and the clearer problem is with the race variable.
If Clinton’s popularity is higher among non-whites, the relevant number is not percent of non-whites in a state, but percent of non-whites among Democrats in a state. Even better would be percent of non-whites among people who actually voted in the Democratic primary in a state.
The way the authors seem to have defined their control variable, they under-measure Clinton’s people-of-color firewall. Your typical southern state has a large black population, but it also has a comfortable Republican majority, as measured by that party’s success in the region in Senate and presidential races, and this distorts the effect of the authors' variable.
Imagine two states that both have 30% people of color, and for simplicity, assume that all of them vote in the Democratic primary rather than the Republican. In State A, 60% of the population votes in the Democratic primary; therefore, the 30% of the state that are people of color make up 50% of the Democratic electorate. In State B, Democrats make up only 40% of the electorate, so people of color are 75% of the Democratic electorate.
The authors’ variable treats both states the same: they both have 70% non-Hispanic white. But if Clinton really does outperform Sanders among people of color, she should do better than him in State B in the Democratic primary.
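The State A / State B arithmetic can be sketched in a few lines of Python. This is only an illustration of the thought experiment above: the function name is mine, and it bakes in the same simplifying assumption that every person of color votes in the Democratic primary.

```python
def poc_share_of_dem_electorate(poc_share_of_state, dem_share_of_state):
    """Fraction of the Democratic primary electorate made up of people of color.

    Assumes, as in the thought experiment, that all people of color vote in
    the Democratic primary rather than the Republican one.
    """
    return poc_share_of_state / dem_share_of_state


# Both hypothetical states are 30% people of color (70% non-Hispanic white).
state_a = poc_share_of_dem_electorate(0.30, 0.60)  # Democrats are 60% of voters
state_b = poc_share_of_dem_electorate(0.30, 0.40)  # Democrats are 40% of voters

print(f"State A: {state_a:.0%}")  # State A: 50%
print(f"State B: {state_b:.0%}")  # State B: 75%
```

A statewide "percent non-Hispanic white" variable gives both states the same value, even though the composition of the electorate that actually matters differs by 25 points.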
For evidence that this mis-specification might matter, re-sort their list by percent of delegates won by Clinton, from highest to lowest. (To the authors' credit, they make their data easily accessible to others who want to dig deeper.) The highest states are all ones where we might expect the “firewall” effect to be strongest: Mississippi, Alabama, S. Carolina, Louisiana, Georgia, Arkansas, Texas, Florida, Tennessee, Virginia, Maryland, Delaware. Simply put, the South went for Clinton. Except for Alabama and Maryland, none of those states had a paper trail. So if the authors’ race proxy is under-measuring Clinton’s firewall, they’re going to inaccurately attribute a significant effect to the lack of a paper trail.
Also, note Alabama in 2nd place there: 83% to Mississippi’s 86%, and way ahead of S. Carolina’s 74%. The authors’ hypothesis is that a paper trail prevented Clinton from stealing votes, but she did pretty damn well in Alabama, even with a paper trail.
I’m less clear on whether the “blueness” variable is doing what it’s supposed to. It might be, but I don’t understand the connection between the nature of the specifically Democratic electorate and whether a state as a whole tends to vote for Democratic presidential nominees.
Lastly, even if the authors’ work shows evidence of electoral fraud, it doesn’t show evidence of enough fraud to have changed the outcome.
With their two controls, they seem to show that having paper trails cost Clinton 9.4 percentage points of delegates. In other words, if the states that don’t have a paper trail had had one, she would have done 9.4 points worse than she did. That means she would have won only about 56% in those states. And if you combine that with the 49% she won in the states that really did have a paper trail?
She still wins.
The authors looked at the percent of delegates won in each state, rather than the total number of delegates. That’s a reasonable approach for what they were doing, looking for evidence of electoral fraud. But if you want to know whether correcting for the alleged fraud would have made a difference, you have to match the percentages back up with the numbers of delegates.
In fact, even in the paper-trail states, where Sanders averaged 51% of delegates on a state-by-state basis, he only won 48% of them in total.
How is that possible? It's because Clinton won big states like New York, Ohio, and (barely) Illinois; the states that Sanders won were, on average, smaller. So if you average his delegate rate treating all states as equal, you get 51%, but if you count the actual delegates he won, it was only 48%.
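Here is a minimal sketch of how the two averages can come apart. The delegate counts below are invented for illustration and are not the paper's data; the point is only that one big loss can outweigh several small wins once you count delegates instead of states.

```python
# Hypothetical states: (pledged delegates at stake, delegates the candidate won).
# These numbers are made up for illustration only.
states = [
    (200, 80),  # one large state, lost 40% to 60%
    (20, 12),   # three small states, each won 60% to 40%
    (20, 12),
    (20, 12),
]

# Treat every state as equal: average the per-state delegate percentages.
unweighted = sum(won / total for total, won in states) / len(states)

# Count actual delegates: total won divided by total at stake.
weighted = sum(won for _, won in states) / sum(total for total, _ in states)

print(f"State-by-state average: {unweighted:.1%}")  # 55.0%
print(f"Share of all delegates: {weighted:.1%}")    # 44.6%
```

In this toy example the candidate "averages" a win across states while losing the delegate count, which is the same pattern as the 51% vs. 48% gap.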
In the no-paper-trail states, Clinton won 65% of the delegates. If you lower her share in each of those states by the 9.4 points and recount the delegates, she still ends up with 55%.
In other words, she still wins.
(I got the numbers of delegates per state from here. I subtracted the number of super-delegates from each state's total to get the ones that were determined by primaries. I don't know if that's exactly right, but it's probably pretty close.)
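For anyone who wants to redo the recount, the adjustment looks roughly like this. It's a sketch under stated assumptions: the state list is hypothetical rather than the real delegate table, the function name is mine, and it ignores the rounding to whole delegates that real allocation rules would apply.

```python
PAPER_TRAIL_EFFECT = 0.094  # the 9.4-point effect discussed above


def delegate_share(states, shift=0.0):
    """Delegate-weighted share after lowering each state's share by `shift`.

    `states` is a list of (pledged delegates, observed share) pairs.
    Shares are floored at zero; fractional delegates are not rounded.
    """
    total = sum(delegates for delegates, _ in states)
    won = sum(delegates * max(0.0, share - shift) for delegates, share in states)
    return won / total


# Hypothetical no-paper-trail states (invented numbers, not the real data).
no_paper_states = [(100, 0.70), (50, 0.60), (30, 0.55)]

observed = delegate_share(no_paper_states)
adjusted = delegate_share(no_paper_states, PAPER_TRAIL_EFFECT)

print(f"Observed: {observed:.1%}")  # 64.7%
print(f"Adjusted: {adjusted:.1%}")  # 55.3%
```

Because the same shift is applied to every state, the delegate-weighted total drops by the same 9.4 points, so a candidate starting in the mid-60s stays comfortably above 50%.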
Now, I don’t mean to excuse electoral fraud. As I explained above, I think this paper’s evidence of fraud is weak, but if it holds up under further scrutiny, then there’s a problem. (It’s stupid not to have paper verification. I don’t know to what extent states make that decision vs. parties making the decision; whoever’s doing it, it’s stupid.)
My point is simply that, on the evidence of these statistics, even if fraud occurred, Clinton would have won without it.
And lastly (this time I mean it), the authors wrote before the California primary. According to Ballotpedia, California has paper verification, and Clinton won 56% of its delegates. California has by far the most delegates of any state, so including them noticeably weakens the Sanders case.