
Saturday, September 19, 2020

More on gerrymandering

I have done some analysis of the variance between the popular vote and the seats held in a legislative body for a few states across the 2016 and 2018 elections. I've tried to find out whether the data can be downloaded in some format. Some of the states do provide downloadable files, but it has been easier to create a spreadsheet and enter the values myself. As a result, developing the data is rather slow, but I have already found a few interesting results for comparison. See the graph below (made with matplotlib; let me know if you are interested in the code). It plots a number of elections, with the Kg coefficient that I have calculated on the horizontal axis and the number of seats in the legislature on the vertical axis. For the state senates, I have used the total number of seats in the senate, even though generally only half the seats are up in any general election. I then plotted 2 lines. The green line is where the marker would be if the overall Kg reflected a variance of 1 seat between the popular vote and the seats held. The yellow line is where the marker would be if the Kg reflected a variance of 2 seats.

Graph of elections
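Here is a rough sketch of how reference lines like the green and yellow ones could be computed. It assumes that Kg is the sum over parties of |V_i/V_total - S_i/S_total>. That form is an assumption and not a quote of the definition. Under that form, suppose a legislature of S seats departs from proportionality by k seats. The side that gains contributes k/S and the side that loses contributes another k/S, so the marker would sit at Kg = 2k/S. The function name `reference_curve` and the range of seat counts are hypothetical.

```python
def reference_curve(seat_variance, seat_counts):
    """Return (Kg, seats) points for legislatures that depart from
    proportional representation by `seat_variance` seats.

    Assumes Kg = sum_i |V_i/V_total - S_i/S_total|: a k-seat surplus for one
    side is matched by a k-seat deficit elsewhere, so Kg = 2k / S_total.
    """
    return [(2 * seat_variance / seats, seats) for seats in seat_counts]


# Seat counts from 30 to 120 in steps of 10 (illustrative range only)
sizes = range(30, 121, 10)
one_seat = reference_curve(1, sizes)   # a "1 seat" reference line
two_seat = reference_curve(2, sizes)   # a "2 seat" reference line
for (kg1, s), (kg2, _) in zip(one_seat, two_seat):
    print(f"{s:4d} seats: Kg = {kg1:.3f} (1 seat), {kg2:.3f} (2 seats)")
```

To draw a line, split each point list into x and y lists and pass them to matplotlib's `ax.plot`.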

One interesting thing we see here is that most legislative bodies seem to reflect some bias. This does not identify which party benefits; it only identifies whether there is a bias. Some bodies do have a pretty high level of bias. For a number of these bodies I included a point for 2016 as well as 2018. The direction of change cannot be predicted. It seems to depend more on what motivates voters to go vote than on the map. The way a map is set up can tilt the playing field overall. However, the change in Kg from election to election shows that migration, candidate quality and voter motivation can all influence the results. For example, note the movement of the Idaho Senate and House between 2016 and 2018: one moved in the direction of more gerrymandering and the other in the direction of less. In compiling the statistics, I was struck by how many districts (about 20%) had only a single candidate. Based on the results I reviewed, this kind of situation clearly depresses voting. In districts with multiple candidates, the number of votes cast was in some cases double. For example, in Oregon, when there was only 1 candidate, the number of "other" votes (typically write-ins and possibly spoiled ballots) was 10 times what it was when candidates from both the Democratic and Republican parties were on the ballot.
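The comparison between uncontested and contested districts can be written as a small summary function. This sketch assumes a record layout in which each race is a pair: a list of candidate vote totals and the number of "other" votes. That layout, the function name `turnout_summary` and the sample numbers are all hypothetical. They are not the spreadsheet behind the graph.

```python
from statistics import mean


def turnout_summary(races):
    """Summarize turnout for uncontested vs. contested races.

    Each race is assumed to be a (candidate_votes, other_votes) pair, where
    candidate_votes lists the votes of each named candidate and other_votes
    counts write-ins and similar ballots.
    """
    uncontested = [r for r in races if len(r[0]) == 1]
    contested = [r for r in races if len(r[0]) > 1]

    def total(race):
        return sum(race[0]) + race[1]

    def avg(values):
        values = list(values)
        return mean(values) if values else 0.0

    return {
        "uncontested_share": len(uncontested) / len(races),
        "mean_votes_uncontested": avg(total(r) for r in uncontested),
        "mean_votes_contested": avg(total(r) for r in contested),
        "mean_other_uncontested": avg(r[1] for r in uncontested),
        "mean_other_contested": avg(r[1] for r in contested),
    }


# Made-up numbers, only to show the shape of the input
sample = [
    ([10000], 900),
    ([12000, 9000], 90),
    ([15000, 11000], 110),
    ([8000, 7000], 50),
    ([20000, 15000], 150),
]
print(turnout_summary(sample))
```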
The next thing I am going to investigate is how the Oregon House and Senate have changed over time, especially whether there was a marked change as a result of the decennial redistricting.


Tuesday, September 15, 2020

Definition of a Gerrymandering coefficient using Oregon voting data 2012-2018

Abstract

Drawing the lines of electoral districts is difficult work. It is also work in which an advantage can be built in systematically to favor a political party. This systematic bias is called gerrymandering. It creates a difference between the aggregate vote for each party and the number of seats each party holds in a legislative body. It can be done in multi-party or two-party systems. We will briefly look at some previous work that has been done to measure the level of gerrymandering. This post introduces a quantitative gerrymandering index [Kg] that makes it possible to compare the level of gerrymandering across jurisdictions. In the USA, district lines are drawn by a state body, so it makes sense to look at this at the state level. We will calculate Kg for Oregon across a number of different elections and draw conclusions about the fairness of how the district lines are drawn.

Theoretical considerations

An assumption of a fair electoral system is that the population, as a whole, gets a legislative body that closely mirrors the overall vote for the various parties that ran candidates. This is represented by equation (1).

$$\frac{V_i}{V_{total}} = \frac{S_i}{S_{total}} \qquad (1)$$

Where Vtotal is the total number of ballots cast in the election, Vi is the number of ballots cast for party i, Stotal is the total number of seats in the legislature, Si is the number of seats won by party i.

We can calculate the difference between the actual proportion of seats held by each party and the theoretical result, as shown in equation (2).

When the aggregate vote totals for each party agree perfectly with the number of seats won, this value is zero. The maximum value is n, the number of parties. One limitation is that the number of seats being considered is a small integer. Seat shares can therefore only change in steps of $1/S_{total}$, and as a result Kg generally cannot be made smaller than about $n/(2S_{total})$.
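The coefficient can be turned into a small function. This sketch assumes that Kg is the sum over all parties of the absolute difference between each party's vote share and its seat share: $K_g = \sum_{i=1}^{n} \left| V_i/V_{total} - S_i/S_{total} \right|$. That form is an assumption. It was chosen because it matches the properties described above: it is zero for perfect agreement, and each term is at most 1, so the total is at most n. The function name `kg` and the dictionary inputs are hypothetical.

```python
def kg(votes, seats):
    """Gerrymandering coefficient, assuming
    Kg = sum over parties of |V_i / V_total - S_i / S_total|.

    votes: party -> ballots cast for that party's candidates
    seats: party -> seats won (parties with no seats may be omitted)
    """
    v_total = sum(votes.values())
    s_total = sum(seats.values())
    parties = set(votes) | set(seats)
    return sum(
        abs(votes.get(p, 0) / v_total - seats.get(p, 0) / s_total)
        for p in parties
    )


# A 10-seat body where a 50/50 vote produced a 6-4 split (illustrative numbers)
print(kg({"D": 500, "R": 500}, {"D": 6, "R": 4}))  # about 0.2
```

Under this assumed form, a one-seat departure from proportionality between two parties adds 2/S_total to Kg. That is why small bodies show larger values for the same seat imbalance.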

Previous work on this topic includes:

https://www.washingtonpost.com/wp-srv/special/politics/gerrymandering/

http://www.svds.com/gerrymandering/

One of the metrics used in previous research is the "squiggliness" of district borders, measured by how far districts deviate from regular shapes. I think this is unrealistic. Borders and coastlines are ragged in many places, and population density varies.

The Silicon Valley Data Science site uses older data, calculated only for the 2012 general election. But it does use a similar methodology to determine the discrepancy between the number of seats and the proportion of the vote, both state by state and nationally. (It reports that, nationally, the Republicans hold 17 more seats in the US House of Representatives than proportional representation would award them. When considered state by state, however, the discrepancy is only 4 seats in favor of the Republicans.)

Analysis of Oregon voting and results

I've calculated Kg for Oregon's US House seats, the Oregon Senate seats and the Oregon House seats, based on data obtained from the Oregon Secretary of State for the 2012 through 2018 elections. The votes for Democratic candidates were summed, and likewise the votes for Republican candidates. The votes for all of the other candidates, together with write-in votes, were totaled as a third category. The seat allocation by party was then compared with each party's proportion of the overall vote, and Kg was calculated from the difference. The results are summarized in the graph below.
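The aggregation step can be sketched as follows. The sketch assumes each district result is available as a (votes_by_party, winner) pair. That layout, the party labels and the name `aggregate` are hypothetical. Anything that is not Democratic or Republican, including write-ins, is pooled into "Other". Kg is computed on the assumption that it is the sum of the absolute differences between vote share and seat share.

```python
from collections import Counter

MAJOR = ("Democratic", "Republican")


def aggregate(results):
    """Sum votes and count seats per party, pooling minor parties and
    write-ins into "Other". `results` holds (votes_by_party, winner) pairs."""
    votes, seats = Counter(), Counter()
    for votes_by_party, winner in results:
        for party, n in votes_by_party.items():
            votes[party if party in MAJOR else "Other"] += n
        seats[winner if winner in MAJOR else "Other"] += 1
    return votes, seats


def kg(votes, seats):
    """Assumed form: sum over parties of |vote share - seat share|.
    Counters return 0 for missing parties."""
    v_total, s_total = sum(votes.values()), sum(seats.values())
    return sum(abs(votes[p] / v_total - seats[p] / s_total)
               for p in set(votes) | set(seats))


# Three made-up districts
results = [
    ({"Democratic": 60, "Republican": 35, "Write-in": 5}, "Democratic"),
    ({"Democratic": 55, "Republican": 45}, "Democratic"),
    ({"Republican": 70, "Libertarian": 30}, "Republican"),
]
votes, seats = aggregate(results)
print(dict(votes), dict(seats), round(kg(votes, seats), 3))
```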

Conclusion and reflection

The US House races have the highest Kg values because of the small number of seats. There isn't really any specific time trend in the values. As you might imagine, the imbalance was mostly in the Democrats' favor, but in one case it favored the Republicans. Comparing the US House vs. the OR Senate vs. the OR House, the OR Senate districts seem to have the least gerrymandering. With only 30 seats, you would expect the Senate's values to be higher than those of the House, which has 60 seats. More data needs to be gathered. Analysis of years prior to 2012 will give insight into whether the redistricting after the 2010 decennial census tilted the political field. Analysis of other states will help calibrate the interpretation of Kg for the number of districts being considered.

In terms of seats, Oregon's US House delegation is actually tilted 1 seat in favor of the Democrats. The Oregon House of Representatives is tilted 4 or 5 seats in favor of the Democrats. The Oregon Senate, however, has shifted over time, tilting by 1 or 2 seats toward each party at different times.
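Translating vote shares into a seat tilt is just the difference between the seats a party actually won and the seats proportional representation would give it: $S_i - S_{total} \cdot V_i / V_{total}$. Here is a short sketch. The function name `seat_tilt` and the numbers are illustrative, not Oregon results.

```python
def seat_tilt(votes, seats):
    """Seats won minus proportional seats, per party.
    Positive means the party holds more seats than its vote share implies."""
    v_total = sum(votes.values())
    s_total = sum(seats.values())
    return {
        party: seats.get(party, 0) - s_total * n / v_total
        for party, n in votes.items()
    }


# Made-up 60-seat chamber
tilt = seat_tilt({"D": 520, "R": 430, "Other": 50}, {"D": 38, "R": 22})
for party, extra in tilt.items():
    print(f"{party}: {extra:+.1f} seats")
```

Because vote shares and seat shares each add up to the whole, the tilts across all parties sum to zero.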

Having delved into this data so deeply has made me realize how simplistic the portrayal of Oregon as a Democratic haven is. Some areas have a large majority of supporters of one of the parties. There are, in fact, a lot of areas that are quite close, where the "Other" vote total was greater than the margin between the Democratic and Republican candidates. With the collapse of the center and the proliferation of litmus-test issues, this raises serious questions about how government can be made to work: how to find compromises between the two factions while avoiding the tyranny of the majority.

Friday, September 11, 2020

We need a Universal Borrowing Privilege for all Oregon Libraries

Knowledge is power. Libraries, especially free public libraries, are a source of knowledge, and we need to do everything possible to encourage their use. I am advocating that Oregon adopt a Universal Borrowing Privilege similar to the one in place in California. Effectively, this would allow every Oregon resident of a district that has a public library to obtain borrowing privileges at any other Oregon public library. Many of the resources loaned by libraries are now electronic, so allowing residents to get library cards at other libraries can smooth out demand for electronic resources. Also, if you happen to live on the edge of a jurisdiction, this would make it possible to use the library closest to you even if it is in another jurisdiction. Further, it can allow a library to justify establishing a narrowly targeted collection, since that collection could then be accessed by all Oregon residents. Already, the Inter-Library Loan system does something similar with physical books, and a number of libraries have established reciprocity agreements among themselves. The proposed change would simply make these reciprocity agreements universal across all of Oregon automatically. This could be enacted simply by adding to ORS section 357 wording similar to that of California Law Sections 20204 and 20205, with just a couple of editorial changes. The California wording is reproduced below:


§20204: (a) Public libraries participating in direct loan programs under this Act shall not charge any fee to non residents for borrowing privileges.

(b) Reserves and interlibrary loan requests shall be accepted by the participating public library under the same rules and policies applied to local residents.

(c) All procedures governing registration of borrowers shall apply equally to residents and non-residents.

(d) All materials normally loaned by a participating public library are available for loan to non-residents under the same rules and policies applied to local residents.

(e) All loan and return rules governing circulation apply equally to residents and nonresidents. If overdue material are returned to a library other than the library from which borrowed, fines may be paid to and retained by the library to which the return is made. Payments for lost or damaged materials are payable to the lending library, and are to be forwarded by the library to which payment is made.

(f) Special loan privileges extended by the participating public library to teachers and other groups within its jurisdiction need not be extended beyond the jurisdiction.


§20205: An eligible non-resident borrower must be a resident of California,

(a) Hold a valid borrowers card issued by their home library, or

(b) Hold or obtain a valid non-resident borrowers card issued by any California public library, or

(c) Hold a valid state borrowers identification card issued by any California public library;

(d) And present any additional identification normally required by a library of its own residents.

(e) Nothing in this section shall prevent the issuing of a non-resident card or charging of fees to a resident of another state, except that loans to such non-residents shall not be counted as reimbursable transactions.


I strongly urge that you write to your State Senator, State Representative and the Governor of Oregon to express your support to pass legislation to give Oregonians a Universal Borrowing Privilege.  Here's a link where you can find their email addresses:

https://www.oregonlegislature.gov/FindYourLegislator/leg-districts.html

Saturday, April 18, 2020

Death rates

In the news, we are told that Covid-19 has killed 37,841 people as of 4/18/2020.  While the rate has varied, it seems to be about 2100-2200 people per day.  So let's get some context around that.
The population of the US is approximately 330 million people, and the average life expectancy is 78.5 years. If the population is roughly stable, that means about 330 million / 78.5 ≈ 4.2 million people die each year. That is about 11,517 people per day. So does this mean that Covid-19 is adding about 20% (2,200 / 11,517) to the daily death toll? Not necessarily. Later, government statistics will show whether Covid-19 was killing people in addition to the normal causes of death or instead of them.
Repeating the calculation for Oregon, with the same life expectancy and a population of 4.2 million, normally about 147 people die each day. Currently, the number of deaths from Covid-19 seems to be about 4 or 5 people per day, a much lower rate than the national picture. Here one might argue that social distancing measures also reduce other causes of death, such as traffic and gun accidents. If so, people dying of Covid-19 may be dying of it instead of those other causes. In other words, Covid-19 is probably not adding to the death toll; people are dying at the same rate as before, but for different reasons. Unfortunately, data compilation, validation and publishing take time, so resources such as the CDC's WONDER database are not going to show what kind of effect Covid-19 has had on the country until later.
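The back-of-the-envelope numbers above rest on a steady-state approximation. If the population is roughly stable, about population ÷ life expectancy people die each year. Here is a small sketch that reproduces the arithmetic. The function name is illustrative.

```python
def daily_deaths(population, life_expectancy_years):
    """Approximate deaths per day, assuming a stable population in which
    about population / life expectancy people die each year."""
    return population / life_expectancy_years / 365


us = daily_deaths(330e6, 78.5)
oregon = daily_deaths(4.2e6, 78.5)
print(f"US: {us:,.0f} per day, Oregon: {oregon:,.0f} per day")
print(f"2,200 Covid-19 deaths/day vs. normal US deaths: {2200 / us:.0%}")
```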
I also want to note that I have greatly simplified the subject. There are regional and seasonal variations. The short-term effect of social distancing may be very hard to separate from the effect of the Covid-19 infection rate. Assumptions of ceteris paribus may be invalid for many reasons. Also, it is not clear that we have reached the peak of the mortality curve for the US, and deaths from Covid-19 could remain level for some weeks. If the daily death rate doubles or stays high, then clearly the virus is adding to the death rate, and life expectancy will go down when the statistics are calculated. For example, assume the 2,200 daily deaths are indeed on top of the normal national death rate. Running the same calculation in reverse, 330 million / ((11,517 + 2,200) × 365), gives a life expectancy of about 65.9 years, which is a pretty significant reduction.
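The life-expectancy figure comes from inverting the same stable-population approximation: life expectancy ≈ population / yearly deaths. Here is a sketch with illustrative function names.

```python
def daily_deaths(population, life_expectancy_years):
    """Deaths per day under a stable-population approximation."""
    return population / life_expectancy_years / 365


def implied_life_expectancy(population, deaths_per_day):
    """Invert the same approximation: life expectancy = population / yearly deaths."""
    return population / (deaths_per_day * 365)


normal = daily_deaths(330e6, 78.5)
print(round(implied_life_expectancy(330e6, normal), 1))         # 78.5
print(round(implied_life_expectancy(330e6, normal + 2200), 1))  # 65.9
```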
Again, I want to stress that we won't be able to figure out the actual effects until all the data has been gathered and analyzed but it sure is fun to speculate about it.

Saturday, April 11, 2020

Random Alien Writing

I've been thinking about this for some time and finally found the time to write the program. The program is at the bottom of the post and is written in Python 3.8. Basically, it creates a canvas filled with strange-looking writing. I was inspired by some of the examples of "alien" writing used in various sci-fi movies and TV shows, and I thought to myself that it should be easy to write a program that creates examples of this kind of writing. I re-used the basic framework I had developed for the program to simulate cellular automata. (I have re-used the framework for some other bits I've made for myself, not (yet) posted here, and it is becoming one of my favorites.) My first attempts, frankly, looked like white noise on a screen. Over a couple of days I refined the parameters to get closer to something the brain will recognize as having semantic content and organization, and the current version is pretty good. I've used it to generate the background I've added to the blog. I hope you like it! I'm not going to go into a detailed explanation of the program elements, but if you have questions about it, post a comment.

The Program




from tkinter import *
from random import randint

class App:
    """Tkinter application that fills a canvas with randomly generated
    "alien" writing.
    Requires imports from tkinter and random."""
    def __init__(self, master):
        """This class is the whole of the application """
        # Necessary Frames
        frame = Frame(master)
        control_frame = Frame(frame)
        control_frame.grid(row = 0, column = 0, sticky = N)
        canvas_frame = Frame(frame)
        canvas_frame.grid(row =0, column = 1)

        #Application variables
        self.char_pts = [(1,1),(7,1),(14,1),(1,7),(7,7),(14,7),(1,14),(7,14)
                         ,(14,14)]
        self.chars = []
        self.text = []
        self.tmpstr = '\nFor another,\nHit Randomize'
        
        #Control FrameWidgets
        self.lab = Label(control_frame, text=self.tmpstr)
        self.lab.pack(side = TOP)
        self.b1 = Button(control_frame, text='Randomize')
        self.b1.config(command=self.randomize)
        self.b1.pack()
        
        # The Canvas
        self.canvas = Canvas(canvas_frame, width = 800, height = 800)
        self.canvas.config(bg='white')
        self.canvas.pack()
        frame.pack()

        #Menu
        menubar = Menu(master)
        menu_1 = Menu(menubar, tearoff=0)
        menu_1.add_command(label='Quit',command=master.destroy)
        menubar.add_cascade(label='File', menu=menu_1)
        master.config(menu=menubar)
        self.randomize()

    def randomize(self):
        self.canvas.delete(ALL)
        # create characters
        self.chars = []
        for j in range(10):
            new_char = []
            b = randint(0,8)
            for i in range(randint(2,7)):
                a = b
                b = (a + randint(1,8))%9
                new_char.append((a,b,randint(1,2)))
            self.chars.append(new_char)
        # create text, in 2-7 letter long words.
        tl = 0
        self.text = []
        while tl <3000:
            wl = randint(2,7)
            for k in range(wl):
                self.text.append(randint(1,len(self.chars)-1))
            tl = tl + wl + 1
            self.text.append(0)
        # now write the text using the characters
        i = 0
        j = 0
        for char in self.text:
            if char != 0:
                for seg in self.chars[char]:
                    start = seg[0]
                    end = seg[1]
                    wt = seg[2]
                    x1 = i*15+self.char_pts[start][0]
                    y1 = j*17+self.char_pts[start][1]
                    x2 = i*15+self.char_pts[end][0]
                    y2 = j*17+self.char_pts[end][1]
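                    if x1==x2 or y1==y2:
                        # horizontal or vertical stroke: straight line
                        self.canvas.create_line(x1,y1,x2,y2,width=wt)
                    else:
                        # diagonal stroke: arc in the box spanned by both points
                        self.canvas.create_arc(x1,y1,x2,y2,width=wt)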
            if i == 50:
                j = j + 1
                i =0
            else:
                i= i + 1
        self.canvas.update()

if __name__ == '__main__':
    root = Tk()
    root.wm_title('Alien Text')
    app = App(root)
    root.mainloop()
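To experiment with the parameters without opening a window, the two random steps in `randomize()` can be pulled out into plain functions: building the glyphs, then building the word stream. This sketch mirrors the logic of the program above. The function names are hypothetical, and it uses a seeded `random.Random` instance so that results are repeatable.

```python
from random import Random

# The 3x3 grid of anchor points inside a character cell (same values as char_pts)
CHAR_PTS = [(1, 1), (7, 1), (14, 1), (1, 7), (7, 7), (14, 7),
            (1, 14), (7, 14), (14, 14)]


def make_glyphs(rng, count=10):
    """Each glyph is a list of (start, end, width) strokes between grid points."""
    glyphs = []
    for _ in range(count):
        strokes = []
        b = rng.randint(0, 8)
        for _ in range(rng.randint(2, 7)):
            a = b
            b = (a + rng.randint(1, 8)) % 9  # adding 1..8 mod 9 never returns to a
            strokes.append((a, b, rng.randint(1, 2)))
        glyphs.append(strokes)
    return glyphs


def make_text(rng, n_glyphs, length=3000):
    """Words of 2-7 glyph indices, each followed by 0, which marks a space."""
    text = []
    while len(text) < length:
        for _ in range(rng.randint(2, 7)):
            text.append(rng.randint(1, n_glyphs - 1))
        text.append(0)
    return text


def stroke_kind(start, end):
    """Horizontal or vertical strokes are drawn as lines, diagonal ones as arcs."""
    (x1, y1), (x2, y2) = CHAR_PTS[start], CHAR_PTS[end]
    return "line" if x1 == x2 or y1 == y2 else "arc"


rng = Random(42)
glyphs = make_glyphs(rng)
text = make_text(rng, len(glyphs))
print(len(text), [stroke_kind(a, b) for a, b, _ in glyphs[1]])
```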
     


Thursday, April 9, 2020

About "random" numbers

It is an interesting thing that if you ask someone to give you a "random" 6-digit number, they tend to try really hard not to repeat digits. People have a tendency to think that random numbers do not have repeating digits. But an analysis of these numbers shows that, in fact, as numbers get longer, it becomes more likely than not that they contain repeated digits. Below is a table of the number of digits and the likelihood that there are repeating digits.
(argh, excuse me but there is not a control to create a table!!)
Number of digits vs. likelihood of repeated digits:

  1. None
  2. 10%
  3. 28%
  4. 50%
  5. 70%
  6. 85%
  7. 94%
  8. 98%
  9. 99%
  10. nearly 100%
So the psychology going on here is that, with 2-digit and maybe 3-digit numbers, people sense that repeated digits are rarer than unique ones. In reality, however, by the time you get to 5- and 6-digit numbers, most numbers have repeated digits.
The math of permutations and combinations is explained here.
The Python math module functions are explained here. The relevant ones are the factorial, comb and perm functions (comb and perm were added in Python 3.8). I wrote some code to re-create these. My own recursive version of the factorial function gave me an error message, so I used the math module version of factorial instead.

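# This defines the permutation function
# and the combination function

from math import factorial

def perm(n,r):
    # number of ordered selections of r items from n (integer division keeps it exact)
    return factorial(n)//factorial(n-r)

def comb(n,r):
    # number of unordered selections of r items from n
    return factorial(n)//(factorial(n-r)*factorial(r))

# probability that an i-digit string has at least one repeated digit
for i in range(2,11):
    print(i,1-perm(10,i)/10**i)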
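Since Python 3.8, `math.perm` and `math.comb` are built in. This means the table can also be computed exactly with `fractions.Fraction`, which avoids the floating-point rounding that turns 10% into 0.0999…. The probability of at least one repeat among n digits (leading zeros allowed) is $1 - P(10, n)/10^n$. `math.perm(10, n)` returns 0 when n > 10, so the probability correctly becomes 1 there. The function name `p_repeat` is illustrative.

```python
from fractions import Fraction
from math import perm  # available in Python 3.8+


def p_repeat(n_digits, base=10):
    """Exact probability that a string of n_digits random digits
    contains at least one repeated digit."""
    return 1 - Fraction(perm(base, n_digits), base ** n_digits)


for n in range(1, 12):
    print(n, f"{float(p_repeat(n)):.2%}")
```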

Also, if you don't believe the math, I wrote a piece of code that does a brute-force count. It does take some time to work its way through all the numbers. It skips numbers that start with 0, but that doesn't change the proportions. With the first digit limited to 1-9, there are 9 × P(9, n-1) numbers with all-distinct digits out of 9 × 10^(n-1) n-digit numbers. That ratio, P(9, n-1)/10^(n-1), is exactly the same as P(10, n)/10^n.

# This program prints out how many of the n-digit numbers
# have at least one repeated digit

n = 2
flag = True
while flag:
    start = 10**(n-1)   # smallest n-digit number
    stop = 10**n        # one past the largest n-digit number
    count = 0
    for test in range(start, stop):
        a = str(test)
        # a digit repeats if there are fewer distinct digits than digits
        if len(set(a)) < len(a):
            count = count + 1
    # divide by how many n-digit numbers were actually checked
    ratio = count / (stop - start)
    print(n, count, ratio)
    ask = input("continue? (n to end) >")
    if ask == 'n':
        flag = False
    n = n+1
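A non-interactive version of the brute-force count is handy for checking it against the formula for small n. The sketch below divides by the number of n-digit numbers actually checked, 9·10^(n-1). With that denominator, the two agree. Whether or not a leading 0 is allowed, the fraction of n-digit strings with all-distinct digits works out to P(9, n-1)/10^(n-1). The function names are illustrative.

```python
from math import isclose, perm


def brute_force_ratio(n):
    """Fraction of n-digit numbers (no leading zero) with a repeated digit."""
    start, stop = 10 ** (n - 1), 10 ** n
    repeats = sum(1 for k in range(start, stop) if len(set(str(k))) < n)
    return repeats / (stop - start)


def formula_ratio(n):
    """1 - P(10, n) / 10**n, the probability computed from permutations."""
    return 1 - perm(10, n) / 10 ** n


for n in range(2, 6):
    b, f = brute_force_ratio(n), formula_ratio(n)
    print(n, round(b, 4), round(f, 4), isclose(b, f))
```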
So it's an interesting bit of psychology, and I hope you enjoy it.

Thursday, December 26, 2019

Social Security Administration Baby Name Database

Introduction

At this link, you can access the SSA's so-called Baby Name data, broken down both nationally and by state. Each data set is a zip file. When you unzip the state file, it gives you a text file named for each 2-letter state code. These files use a comma-separated values format, and each line includes the state (which seems redundant to me), the sex, the year, the name and the number of births. The national data files are broken down by year. Each year's file, again in CSV format, includes the name, the sex and the number of births. I downloaded these files and decided to make a program that shows a graph of a given name in a given locale over the years. In the program, I used Python's csv module to read the file data into lists of lists. One thing to note is that the csv module returns every field as a string, so numeric fields have to be converted to floats.
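As a sketch of that conversion step, here is a filter for rows in the state-file layout described above (State, Sex, Year, Name, Births). It converts the year and the count with `int` rather than `float`, since both are whole numbers, and matplotlib plots either. The function name `state_series` and the sample rows are illustrative, not taken from the SSA files.

```python
import csv
import io


def state_series(lines, the_name, sex):
    """Return (years, births) for one name and sex from State,Sex,Year,Name,Births rows.

    csv.reader yields every field as a string, so the numeric fields
    are converted explicitly.
    """
    years, births = [], []
    for _state, row_sex, year, name, count in csv.reader(lines):
        if name == the_name and row_sex == sex:
            years.append(int(year))
            births.append(int(count))
    return years, births


sample = io.StringIO("OR,M,1950,Martin,40\nOR,F,1950,Mary,300\nOR,M,1951,Martin,45\n")
print(state_series(sample, "Martin", "M"))  # ([1950, 1951], [40, 45])
```

With a real file, the same function can be given an open file object, e.g. `with open(path, newline='') as f: state_series(f, 'Martin', 'M')`.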

The UI

The UI is made with Tkinter. It consists of an Entry for the name, a couple of radio buttons for the sex, an OptionMenu for the state, a button to generate the graph and a bunch of labels.
I used an Entry for the name. For the sex, to reduce user entry errors, I used radio buttons. Again, to ensure valid data in calls to subroutines, I used an OptionMenu (usually called a dropdown) for the state.

The Action

Basically, everything starts after the user hits the Graph button.  This then calls the calc function.  The calc function figures out whether the user selected a national or state level request.  If it is national, then the code calls the national function, if it is state level then it calls the bystate function.
Both the national and bystate functions go and read the appropriate files to build two lists which have corresponding x and y values to be graphed.  These both then call the graph function which uses PyPlot API calls to draw and display the graph in a separate window.  Typical results are below and of course, there's no reason why I would have picked the name Martin for the examples.


The Code

## Code to process downloaded national name data - Using GUI

from tkinter import *
from tkinter import ttk
import matplotlib.pyplot as plt
import csv

filelocation = ''  ## INSERT STRING OF WHERE YOUR FILES ARE (folder path ending with a path separator)

## This function reads the information from the national files.
## Data is Name, M/F, number of births
def national(the_name,sex):
    xdata=[]
    ydata=[]
    for year in range(1880,2019):
        file = open(filelocation+"yob"+str(year)+'.txt', newline='')
        r = csv.reader(file)
        for row in r:
            if row[0]==the_name and row[1]==sex:
                xdata.append(float(year))
                ydata.append(float(row[2]))
        file.close()
    graph(xdata,ydata,the_name,sex,"USA")


## This function reads the information from the State files.
## Data is State,Sex (M/F), Year,Name, Number of births
def bystate(the_name,sex,state):
    xdata=[]
    ydata=[]
    file=open(filelocation+state+'.txt',newline='')
    r = csv.reader(file)
    for row in r:
        if row[3]==the_name and sex==row[1]:
            xdata.append(float(row[2]))
            ydata.append(float(row[4]))
    file.close()
    graph(xdata,ydata,the_name,sex,state)
            
## This part creates the graph using Matplotlib (plt)
def graph(xdata,ydata,the_name,sex,state):
    fig,ax = plt.subplots()
    line1, = ax.plot(xdata,ydata,label=the_name)
    ax.legend(loc='upper left')
    ax.set_title('Births per year in '+state+': '+the_name+' ('+sex+')')
    plt.ylabel('Number of births')
    plt.xlabel('Year')
    plt.show()

def calc():
    mf = ['M','F']
    if sel_state.get()=='USA':
        national(getn.get(),mf[sel_sex.get()])
    else:
        bystate(getn.get(),mf[sel_sex.get()],sel_state.get())
## Below is the data for the state/national selection dropdown
states=['USA','AK','AL','AR','AZ','CA','CO','CT','DC','DE','FL','GA',
        'HI','IA','ID','IL','IN','KS','KY','LA','MA','MD','ME',
        'MI','MN','MO','MS','MT','NC','ND','NE','NH','NJ','NM',
        'NV','NY','OH','OK','OR','PA','RI','SC','SD','TN','TX',
        'UT','VA','VT','WA','WI','WV','WY']
##Below is the set up for the GUI window
root = Tk()
content = ttk.Frame(root)
frame = ttk.Frame(content)
sel_state = StringVar()
sel_sex = IntVar()
lblinstr = ttk.Label(content, text="Enter Name and location")
getn= ttk.Entry(content, text="Name")
male= ttk.Radiobutton(content, text='Male', variable=sel_sex, value=0)
female=ttk.Radiobutton(content, text='Female',variable=sel_sex,value=1)
ok = ttk.Button(content, text="Graph", command=calc)
rlbl = ttk.Label(content, text="Enter Name")
slbl = ttk.Label(content, text="Select Sex")
stlbl = ttk.Label(content, text="Select USA or state")
# ttk.OptionMenu takes the default value before the list of choices
statedd = ttk.OptionMenu(content, sel_state, states[0], *states)
##Below, we put the GUI together
content.grid(column = 0, row = 0)
frame.grid(column=0, row=0)
lblinstr.grid(column=0, row=0)
rlbl.grid(column=0, row=1)
getn.grid(column=1, row=1, sticky=N)
slbl.grid(column=0, row=2)
male.grid(column=1, row=2, sticky=N)
female.grid(column=2, row=2)
stlbl.grid(column=0, row=3)
statedd.grid(column=1, row=3)
ok.grid(column=0, row=4)
## And run!!
root.mainloop()