Voter Satisfaction Efficiency Simulator

By Jameson Quinn, statistician/theorist

This document looks long. Is there a shorter version?

See here.

What is Voter Satisfaction Efficiency?

Voter Satisfaction Efficiency (VSE) is a way of measuring the quality of the outcomes a voting method produces. It relies on making assumptions about what kinds of voters and candidates are likely to occur, simulating a large number of elections under those assumptions, and measuring how satisfied the average simulated voter is with the outcome of each election.

VSE is expressed as a percentage. A voting method that could read voters’ minds and always pick the candidate who would lead to the highest average happiness would have a VSE of 100%. A method that picked a candidate completely at random would have a VSE of 0%. In theory, VSEs down to negative 100% would be possible if a voting method did worse than a random pick, but in practice most real-world voting methods, even horrible ones such as plurality voting, can at least beat a random pick.
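As a rough illustration of that scale, here is a minimal Python sketch, not the simulator’s actual code. It assumes VSE is computed as (chosen − random) / (best − random), where each term is the average voter satisfaction summed over many simulated elections; that is the simplest formula consistent with the 100% and 0% anchors above. The function names, the honest-plurality ballot rule, and the Gaussian toy electorate are all assumptions made for the example.

```python
import random


def average_satisfaction(utilities, candidate):
    """Mean satisfaction of all voters with one candidate."""
    return sum(voter[candidate] for voter in utilities) / len(utilities)


def ideal_winner(utilities):
    """The 'mind-reading' method: pick the candidate with the highest average."""
    n_cands = len(utilities[0])
    return max(range(n_cands), key=lambda c: average_satisfaction(utilities, c))


def plurality_winner(utilities):
    """Honest plurality (illustrative): every voter votes for their favorite."""
    tallies = [0] * len(utilities[0])
    for voter in utilities:
        tallies[voter.index(max(voter))] += 1
    return tallies.index(max(tallies))


def vse(method, elections):
    """(chosen - random) / (best - random), accumulated over all elections."""
    chosen = best = rand = 0.0
    for utilities in elections:
        n_cands = len(utilities[0])
        averages = [average_satisfaction(utilities, c) for c in range(n_cands)]
        chosen += averages[method(utilities)]
        best += max(averages)
        rand += sum(averages) / n_cands  # expected result of a uniformly random pick
    return (chosen - rand) / (best - rand)


if __name__ == "__main__":
    rng = random.Random(0)
    # 500 toy elections, each with 25 voters and 4 candidates (arbitrary sizes).
    elections = [
        [[rng.gauss(0, 1) for _ in range(4)] for _ in range(25)]
        for _ in range(500)
    ]
    print(f"ideal:     {vse(ideal_winner, elections):.1%}")
    print(f"plurality: {vse(plurality_winner, elections):.1%}")
```

A single hand-built election shows how a negative value can arise: with voter satisfactions `[[1, 0], [1, 0], [0, 3]]`, honest plurality elects the first candidate (average 2/3) even though the second has the higher average (1), and the sketch reports a VSE of −100% for that one election.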

Previously, an idea similar to VSE was known as “Bayesian Regret” (BR). The two are related by the following formula: VSE(method) = 1 - [BR(method) / BR(Random Winner)]
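Writing the relation out shows why it matches the 100% and 0% anchors. The derivation below assumes the usual definition of Bayesian Regret as the expected shortfall in average satisfaction relative to the best possible winner. The symbols are shorthand introduced here, not notation from the simulator: $U_m$ is the average satisfaction with the winner under method $m$, $U_{\text{best}}$ is the same for the best candidate, and $U_R$ for a random winner.

$$
\begin{aligned}
\mathrm{BR}(m) &= E[U_{\text{best}}] - E[U_m] \\
\mathrm{VSE}(m) &= 1 - \frac{\mathrm{BR}(m)}{\mathrm{BR}(R)}
 = \frac{\bigl(E[U_{\text{best}}] - E[U_R]\bigr) - \bigl(E[U_{\text{best}}] - E[U_m]\bigr)}{E[U_{\text{best}}] - E[U_R]}
 = \frac{E[U_m] - E[U_R]}{E[U_{\text{best}}] - E[U_R]}
\end{aligned}
$$

Under that assumption, a method that always finds the best candidate has $\mathrm{BR} = 0$ and so a VSE of 1 (100%), while the random-winner method has $\mathrm{BR}(m) = \mathrm{BR}(R)$ and so a VSE of 0.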

What are the philosophical underpinnings of VSE?

VSE is based on the idea of “utility”, which is what economists call it when they represent each person’s happiness or satisfaction by a single number. The idea is that voting methods are better insofar as they increase overall happiness/satisfaction.

This ethical framework, if taken to its logical conclusion, is called “Utilitarianism”, often summed up by the phrase “the greatest good for the greatest number”. But you do not have to subscribe to full-blown Utilitarianism in order to use VSE. Even if you think that there are some aspects of ethics that cannot be reduced to increasing a one-dimensional measure of overall satisfaction, you’d still probably agree that all else equal, increasing satisfaction is better than decreasing it, and increasing it for many is better than doing so for just a few.

Note that VSE uses the word “satisfaction” rather than “utility” for two reasons. For one thing, it’s just a more everyday word, and so easier to understand. But for another, it’s important to draw a distinction between a voter’s short-term satisfaction with a given election result, and the long-term utility that they actually derive from the candidate’s actions in office. We have no better choice than to trust the voter and assume that the former is a worthwhile measure of the latter; but that does not make them the same thing.

What assumptions are necessary in order to calculate VSE?

Any evaluation of voting methods involves some kind of assumptions. In the case of VSE, the assumptions are necessary in order to generate voters, candidates, and ballots many times over to run virtual elections. To do that, you need to make assumptions about the following:

Why is VSE a good measure of many aspects of voting method quality?

In the field of voting theory, there are many desirable criteria a given voting method may or may not pass. Most criteria define a certain kind of undesirable outcome and say that good voting methods should make such outcomes impossible. But it has been shown mathematically that no method can pass all desirable criteria (see the Gibbard-Satterthwaite theorem, Arrow’s theorem, etc.), so tradeoffs are necessary. VSE measures how well a method makes those tradeoffs by looking at outcomes. Instead of asking “can a certain kind of problem ever happen?”, VSE asks “how rarely do problems of all kinds happen?”.

If the voter model, media model, and strategy model are realistic for a particular context, then VSE is probably a good metric for comparing voting methods. If you find a method which robustly gets a relatively high VSE, across a broad range of voter, media, and strategy models, then you can be confident that it reflects the will of the voters, no matter what that will is. That’s democracy.

What does VSE not measure?

VSE cannot measure:

What are the various voter models that were used to get the VSE numbers below?

There are three basic models. The first two are simple but unrealistic; the third is more complex but hopefully more realistic.

Note: As a grad student in statistics, building and working with statistical models is my expertise, so I apologize for the inevitable technicalities in what follows. I’ll try to keep things as understandable for a non-expert audience as I can, but it’s always hard to find the right balance. It’s important to be transparent about how things work and why, but I don’t want to overwhelm you with technicalities.

  1. “Impartial Culture”: Each voter’s satisfaction for each candidate is an independent (normally-distributed) random number.
  2. “N-dimensional ideology”: Voters and candidates each have a location in n-dimensional “ideology space”. A voter’s satisfaction for a given candidate goes down linearly with the ideological distance between the two. Locations are normally distributed, using the same distribution for candidates and voters; and the standard deviations for each dimension follow an exponentially descending sequence such as 1, 1/2, 1/4, 1/8, etc. The number of dimensions is set so as to be large enough that further dimensions would be insignificant. Thus, the only important parameter is the rate of exponential decay; in the example sequence above, it’s 2.
  3. “Hierarchical clusters”: This is a complicated model, which combines the following aspects:
    • Issue dimensions, much as in n-dimensional ideology.
    • However, unlike in n-dimensional ideology these dimensions are grouped into “issue clusters”. Conceptually, one might imagine a cluster of social issues, a cluster of domestic economy issues, a cluster of foreign policy issues, etc.; although of course in the model, these are all merely numbers, and the labels have no impact.
    • The dispersion of individuals decreases both from dimension to dimension within each cluster and, for each cluster’s largest dimension, from one cluster to the next. This is similar to the exponential decay of the n-dimensional ideology above, but it is slightly random; the decay factors between each dimension and the next are numbers between 0 and 1, drawn from a beta distribution (which allows adjusting the average value and dispersion).
    • Within each issue cluster, voters are organized into “identity clusters” (assigned using the Chinese restaurant process). You might imagine that a certain voter was in a liberal cluster on social issues, a pro-Egypt cluster on foreign policy issues, etc. Another voter might share the same cluster on social issues but be in an isolationist cluster on foreign policy. A given voter’s identity clusters on different issue clusters are independent.
    • Each identity cluster has a mean, a standard deviation, and an overall level of caring on the dimensions in that issue cluster.
      • Technical: The standard deviation is based on the overall level of caring; higher for clusters that care less, lower for clusters that care more. The means are also drawn from a normal distribution, chosen so that the square of the standard deviation used to draw the cluster mean and the square of the standard deviation of the individuals inside that cluster add up to a constant for each dimension. That way, if you don’t know the cluster’s mean, your best guess for where an individual would land on a dimension (the marginal distribution) is always the same normal distribution, with the standard deviation associated with that dimension.
    • Technical: thus, this model has 5 parameters that can be usefully varied:
      1. The mean of the beta distribution for the decay of standard deviations of dimensions within a cluster.
      2. The mean of the beta distribution for the decay of standard deviations of dimensions between clusters.
      3. The “α” (alpha) parameter for all the Chinese restaurant processes, which determines the expected number and size of identity clusters for any given issue cluster. A high α leads to many similarly-sized identity clusters; a low α leads to most voters falling into a few dominant clusters.
      4. The mean of the beta distribution for how much voters care about each cluster.
      5. The dispersion (α + β) for all three beta distributions above.
    • The 5 parameters above are set to values that produce realistic-seeming voter sets, and then varied in order to see how robust each method is to different styles of electorate (different amounts of diversity of various kinds).
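The building blocks of these models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator’s code: the function names are hypothetical, the `cutoff` used to decide when further dimensions are “insignificant” is an arbitrary choice, and the full hierarchical model (issue clusters, caring levels, beta-distributed decays) is not reproduced. Only the two simple models and the Chinese restaurant process step used to assign identity clusters are shown.

```python
import math
import random


def impartial_culture(n_voters, n_cands, rng):
    """Model 1: every satisfaction value is an independent standard normal draw."""
    return [[rng.gauss(0, 1) for _ in range(n_cands)] for _ in range(n_voters)]


def dimension_stds(decay, cutoff=0.01):
    """Standard deviations 1, 1/decay, 1/decay**2, ... down to an assumed cutoff."""
    if decay <= 1:
        raise ValueError("decay must be greater than 1")
    stds, s = [], 1.0
    while s >= cutoff:
        stds.append(s)
        s /= decay
    return stds


def ideology_model(n_voters, n_cands, decay, rng):
    """Model 2: satisfaction falls linearly with distance in ideology space."""
    stds = dimension_stds(decay)

    def draw_location():
        # Voters and candidates share the same per-dimension normal distribution.
        return [rng.gauss(0, s) for s in stds]

    voters = [draw_location() for _ in range(n_voters)]
    cands = [draw_location() for _ in range(n_cands)]
    return [[-math.dist(v, c) for c in cands] for v in voters]


def chinese_restaurant_process(n_voters, alpha, rng):
    """Assign voters to clusters: join cluster k with probability proportional
    to its current size, or start a new cluster with probability proportional
    to alpha."""
    sizes, assignment = [], []
    for i in range(n_voters):
        r = rng.uniform(0, i + alpha)  # i voters have already been placed
        cumulative, cluster = 0.0, len(sizes)  # default: open a new cluster
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                cluster = k
                break
        if cluster == len(sizes):
            sizes.append(0)
        sizes[cluster] += 1
        assignment.append(cluster)
    return assignment


if __name__ == "__main__":
    rng = random.Random(1)
    print(dimension_stds(2.0))
    print(chinese_restaurant_process(20, alpha=1.0, rng=rng))
```

In the hierarchical model, a process like `chinese_restaurant_process` would be run independently for each issue cluster, which is what makes a voter’s identity clusters on different issue clusters independent.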

What are the advantages and disadvantages of each of the voter models used?

What are the strategy models used?

The strategy models include various mixes of the following possibilities:

Where’s the code?

The Center for Election Science’s GitHub

What voting methods do you test?

OK, can I see some results?

Click to go to interactive versions of the following graphs:

VSE

Effectiveness of strategies

Or, broken down by scenario type:

VSE

Effectiveness of strategies

Those results are broken down by “Scenario type”. Why?

Classifying each simulated election into one of several “scenario types” makes it less important to get the precise voter model correct. Depending on the voter model used and on the parameters of that voter model, different types of scenarios will be more or less common. However, if you define “scenario types” with an eye to grouping elections with similar strategic considerations, the results for a given voting method within a given scenario type will be less dependent on the voter model details than the overall results are across types. The overall results will basically be a weighted average of the results by scenario type, and adjusting the voter model will mostly change the weights only, not the results within a type.
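The weighted-average claim can be made precise under one assumption: that the VSE within a scenario type is normalized using only the elections of that type. The notation here is introduced for the illustration and is not taken from the simulator. Let $p_t$ be the probability that an election falls into type $t$, and let $D_t = E[U_{\text{best}} - U_R \mid t]$ be the gap between the best and the random winner within that type, assumed positive. Then:

$$
\begin{aligned}
\mathrm{VSE}_t &= \frac{E[U_m - U_R \mid t]}{D_t} \\
\mathrm{VSE} &= \frac{\sum_t p_t \, E[U_m - U_R \mid t]}{\sum_s p_s \, D_s}
 = \sum_t w_t \, \mathrm{VSE}_t,
 \qquad w_t = \frac{p_t \, D_t}{\sum_s p_s \, D_s}
\end{aligned}
$$

The weights $w_t$ are non-negative and sum to 1. To the extent that $\mathrm{VSE}_t$ is stable across voter models, changing the voter model moves the overall VSE only through the weights.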

Could you explain the 6 scenario types you use?

The “type classifier” tries to fit each scenario into 5 types in order, labeling it with the first type that it fits. If it fits none of the 5 types, it’s labeled “other”.
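The first-match logic can be sketched as follows. This is only an illustration of the ordering rule: the labels and tests below are hypothetical placeholders, not the classifier’s real scenario types or criteria.

```python
def classify(scenario, rules):
    """Return the label of the first rule whose test the scenario passes.

    `rules` is an ordered list of (label, test) pairs; order matters because
    a scenario that would fit several types gets only the earliest one.
    """
    for label, fits in rules:
        if fits(scenario):
            return label
    return "other"


if __name__ == "__main__":
    # Hypothetical placeholder rules; a "scenario" is just a number here.
    rules = [
        ("type 1", lambda s: s > 10),
        ("type 2", lambda s: s > 5),
    ]
    for scenario in (12, 7, 1):
        print(scenario, classify(scenario, rules))
```

Because the tests are tried in order, a scenario such as `12` in the toy rules passes both tests but is labeled only with the first.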

The types are:

Can you summarize the outcomes?

See here.