Exploring the AustralianPoliticians R Package
Australia federated in 1901. Rohan Alexander is unusually interested in the history of Australian politicians, and he decided to convert some of his knowledge into an R package, the appropriately named AustralianPoliticians. In brief, the package has datasets that contain information on every person who has ever sat in the House of Representatives (MPs) or the Senate since 1901. This post is a shameless plug for that package, and shows you how to read in and play around with the data.
Install the package and load in the data
First, let’s load the packages we need and install the AustralianPoliticians package. It’s not on CRAN, so you’ll need to install it from GitHub.
library(tidyverse)
library(lubridate)
devtools::install_github("RohanAlexander/AustralianPoliticians")
The AustralianPoliticians package has a series of datasets built into it. Let’s read in the main dataset, all, and the MP and Senate datasets:
all <- AustralianPoliticians::all %>% as_tibble()
by_division_mps <- AustralianPoliticians::by_division_mps %>% as_tibble()
by_state_senators <- AustralianPoliticians::by_state_senators %>% as_tibble()
The README on GitHub has good explanations of what each dataset contains. Briefly, the all dataset contains one row for each politician, and has information on their name, gender, date of birth, date of death, Wikipedia page etc. The by_division_mps and by_state_senators datasets have info on which electoral divisions / states each politician held. Note that these can change over time, so there can be more than one row/observation per politician. There are dates the positions were held, the reason why the position ended (defeated, resigned, died etc), and other interesting info. The tables are easily joined to the all dataset based on the uniqueID column. There are other datasets available based on party and whether or not the person was a Prime Minister.
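To make the one-to-many join concrete, here is a minimal sketch on made-up data. The two small tibbles and every value in them are hypothetical stand-ins for all and by_division_mps; only the uniqueID key and the mpsDivision column name are taken from the package.

```r
library(dplyr)

# Hypothetical stand-in for the `all` dataset: one row per politician
people <- tibble(
  uniqueID = c("Smith1900", "Jones1910"),
  displayName = c("Smith, Ann", "Jones, Bob")
)

# Hypothetical stand-in for `by_division_mps`: one row per seat held
seats <- tibble(
  uniqueID = c("Smith1900", "Smith1900"),
  mpsDivision = c("Division A", "Division B")
)

# Left join keeps every politician; those who held several seats get several rows,
# and those with no matching seat get NA in the seat columns
joined <- people %>%
  left_join(seats, by = "uniqueID")

joined
```

Naming the key with by = "uniqueID" avoids relying on dplyr to guess the join columns, and makes it obvious why a politician can appear more than once after the join.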
Deaths of Australian politicians
Because I’m a demographer, and a fun sort of person, I wanted to look at the mortality of politicians. The following bit of code calculates the age of death for all those who have died, as well as the year and age they were first elected:
deaths <- all %>%
  rowwise() %>%
  # some people only have a birth year available; arbitrarily assume they were born mid-year
  mutate(birth_final = as_date(ifelse(is.na(birthDate),
                                      ymd(paste(birthYear, "06", "30", sep = "-")),
                                      birthDate))) %>%
  select(uniqueID, displayName, deathDate, birth_final) %>%
  # calculate age at death
  mutate(age_at_death = interval(birth_final, deathDate) / years(1)) %>%
  # filter(!is.na(age_at_death)) %>%
  # join on MP and senate info, using the shared uniqueID key
  left_join(by_state_senators, by = "uniqueID") %>%
  left_join(by_division_mps, by = "uniqueID") %>%
  group_by(uniqueID) %>%
  # just keep the initial election
  filter(row_number() == 1) %>%
  mutate(year_first_active = ifelse(is.na(senatorsFrom), year(mpsFrom), year(senatorsFrom)),
         age_active = ifelse(is.na(senatorsFrom),
                             interval(birth_final, mpsFrom) / years(1),
                             interval(birth_final, senatorsFrom) / years(1)),
         birth_year = year(birth_final)) %>%
  ungroup()
deaths
## # A tibble: 1,776 x 22
## uniqueID displayName deathDate birth_final age_at_death senatorsState
## <chr> <chr> <date> <date> <dbl> <chr>
## 1 Abbott1… Abbott, Ri… 1940-02-28 1859-06-30 80.7 VIC
## 2 Abbott1… Abbott, Pe… 1940-09-09 1869-05-14 71.3 NSW
## 3 Abbott1… Abbott, Mac 1960-12-30 1877-07-03 83.5 NSW
## 4 Abbott1… Abbott, Au… 1975-04-30 1886-01-04 89.3 <NA>
## 5 Abbott1… Abbott, Jo… 1965-05-07 1891-10-18 73.6 <NA>
## 6 Abbott1… Abbott, To… NA 1957-11-04 NA <NA>
## 7 Abel1939 Abel, John NA 1939-06-25 NA <NA>
## 8 Abetz19… Abetz, Eric NA 1958-01-25 NA TAS
## 9 Adams19… Adams, Jud… 2012-03-31 1943-04-11 69.0 WA
## 10 Adams19… Adams, Dick NA 1951-04-29 NA <NA>
## # … with 1,766 more rows, and 16 more variables: senatorsFrom <date>,
## # senatorsTo <date>, senatorsEndReason <chr>, senatorsSec15Sel <int>,
## # senatorsComments <chr>, mpsDivision <chr>, mpsState <chr>,
## # mpsEnteredAtByElection <chr>, mpsFrom <date>, mpsTo <date>,
## # mpsEndReason <chr>, mpsChangedSeat <int>, mpsComments <chr>,
## # year_first_active <dbl>, age_active <dbl>, birth_year <dbl>
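As an aside, the birth-date imputation and age calculation can also be written without rowwise(). The sketch below is an alternative, not the code used in this post: it runs on a made-up two-row tibble (the IDs are hypothetical, and it assumes birthYear is filled in for every row), and swaps ifelse() for coalesce() so the Date class is kept without converting back.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy rows using the same column names as the `all` dataset
toy <- tibble(
  uniqueID = c("A1859", "B1869"),
  birthDate = as.Date(c(NA, "1869-05-14")),
  birthYear = c(1859, 1869),
  deathDate = as.Date(c("1940-02-28", "1940-09-09"))
)

toy <- toy %>%
  mutate(
    # use the exact birth date when known, otherwise 30 June of the birth year
    birth_final = coalesce(birthDate, make_date(birthYear, 6L, 30L)),
    # dividing an interval by a one-year period gives age in (fractional) years
    age_at_death = interval(birth_final, deathDate) / years(1)
  )

toy
```

Because coalesce() and make_date() are both vectorised, this version works on whole columns at once, which should matter more on larger tables than on the roughly 1,800 rows here.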
So what proportion of all politicians have died? Almost 56%:
sum(!is.na(deaths$age_at_death))/nrow(deaths)
## [1] 0.5579955
Let’s look at the proportion of politicians who have died by birth year:
deaths %>%
group_by(birth_year) %>%
summarise(proportion = sum(!is.na(age_at_death))/n()) %>%
ggplot(aes(birth_year, proportion)) +
geom_point() +
theme_bw(base_size = 12) +
ggtitle("Proportion of politicians who are dead by birth year")
So all politicians born before 1916 are now dead. In contrast, no politician born after 1963 has died so far. The oldest living politician is George Pearce, who is almost 102:
deaths %>%
filter(is.na(age_at_death)) %>%
arrange(birth_year) %>%
filter(row_number()==1) %>%
mutate(age = interval(birth_final, today())/years(1)) %>%
select(displayName, age)
## # A tibble: 1 x 2
## displayName age
## <chr> <dbl>
## 1 Pearce, George 102.
Average age at death by cohort
Let’s look at the average age at death of these politicians and compare it to the national average. I got the national data from the Australian Institute of Health and Welfare’s website. The indicator is \(e_{45} + 45\) for males, which is the expected age at death for those who lived at least until age 45. I didn’t want to compare to the usual life expectancy at birth, because we know that politicians already have to survive long enough to become politicians. Looking at the average age that people entered parliament, 45 is not too far off:
deaths %>%
summarise(mean(age_active, na.rm = T))
## # A tibble: 1 x 1
## `mean(age_active, na.rm = T)`
## <dbl>
## 1 45.2
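For readers less familiar with life-table notation, \(e_{45}\) is the average number of further years lived by someone who has reached age 45. Writing \(X\) for age at death (a symbol introduced here for illustration only), the indicator used above is simply the expected age at death conditional on surviving to 45:

```latex
\[
e_{45} + 45 \;=\; \mathbb{E}[X - 45 \mid X \ge 45] + 45 \;=\; \mathbb{E}[X \mid X \ge 45]
\]
```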
I use males because there have been hardly any women in parliament (:( ). Let’s read in the national data and calculate a year mid-point:
e45 <- read_csv("e45.csv")
## Parsed with column specification:
## cols(
## Year = col_character(),
## e45 = col_double()
## )
e45 <- e45 %>%
mutate(start_year = as.numeric(str_sub(Year, 1,4)),
end_year = as.numeric(str_sub(Year, 6,9)),
year = floor((start_year+end_year)/2))
e45
## # A tibble: 37 x 5
## Year e45 start_year end_year year
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1881–1890 68 1881 1890 1885
## 2 1891–1900 69 1891 1900 1895
## 3 1901–1910 69.8 1901 1910 1905
## 4 1920–1922 71 1920 1922 1921
## 5 1932–1934 71.9 1932 1934 1933
## 6 1946–1948 71.8 1946 1948 1947
## 7 1953–1955 72.2 1953 1955 1954
## 8 1960–1962 72.4 1960 1962 1961
## 9 1965–1967 72 1965 1967 1966
## 10 1970–1972 72.1 1970 1972 1971
## # … with 27 more rows
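The mid-point calculation relies on every Year value having the same fixed layout: four digits, an en dash, four digits. Here is a small self-contained check of that logic on two ranges typed in by hand (copied from the output above), rather than read from e45.csv:

```r
library(dplyr)
library(stringr)

# Two ranges in the same "start–end" layout as the Year column;
# "\u2013" is the en dash that separates the two years
ranges <- tibble(Year = c("1881\u20131890", "1920\u20131922"))

midpoints <- ranges %>%
  mutate(start_year = as.numeric(str_sub(Year, 1, 4)),  # characters 1 to 4
         end_year = as.numeric(str_sub(Year, 6, 9)),    # characters 6 to 9, skipping the dash
         year = floor((start_year + end_year) / 2))     # round half-years down

midpoints
```

str_sub() counts characters rather than bytes, so the multi-byte en dash still occupies exactly one position. If a range were written in a different layout, the fixed positions would no longer line up and as.numeric() would return NA.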
Now graph the average age at death for politicians and the national data that we have. The size of the dot represents the number of people who died from that cohort.
deaths %>%
full_join(e45 %>% rename(birth_year = year)) %>%
filter(age_at_death>0) %>%
group_by(birth_year, e45) %>%
summarise(mean_age = mean(age_at_death),
deaths = n()) %>%
ggplot(aes(birth_year, mean_age)) + geom_point(aes(size = deaths)) +
geom_point(aes(birth_year, e45, color = 'National average'), size = 4, pch = 10) +
scale_color_manual(name = "", values = c("National average" = "red")) +
scale_size_continuous(name = "number of deaths") +
ylab("average age at death (years)") + xlab("birth year") +
ggtitle("Average age at death of Australian politicians by birth year") +
theme_bw(base_size = 12)
So the average age at death for politicians is generally well above the national average. There’s a steep drop in the later years, from about 1935 onwards, because these cohorts are still fairly young and only those who died early have been counted so far. The ten who died youngest among those born after 1935 are listed below, along with their reason for leaving parliament:
deaths %>%
filter(!is.na(age_at_death), birth_year>1935) %>%
arrange(age_at_death) %>%
mutate(reason_leaving = ifelse(is.na(senatorsEndReason), mpsEndReason, senatorsEndReason)) %>%
select(displayName, birth_year, age_at_death, reason_leaving) %>%
filter(row_number() %in% 1:10)
## # A tibble: 10 x 4
## displayName birth_year age_at_death reason_leaving
## <chr> <dbl> <dbl> <chr>
## 1 Knight, John 1943 37.3 Died
## 2 Kirwan, Frank 1937 39.0 Defeated
## 3 Gerick, Jane 1963 40.7 Defeated
## 4 Wilton, Greg 1955 44.6 Died
## 5 Bell, Robert 1950 51.1 Defeated
## 6 West, Andrea 1952 57.6 Defeated
## 7 Vigor, David 1939 58.8 Defeated
## 8 Knott, Peter 1956 59.2 Defeated
## 9 Young, Mick 1936 59.5 Resigned
## 10 Haines, Janine 1945 59.5 Term Expired
Summary
If you’ve ever wanted to know about Australian politicians, this is the package for you. These data could be combined with data from other sources, for example Twitter data to study more recent politicians, Hansard data, or data from other countries for international comparisons. This is also a great dataset for studying a relatively privileged group in society.