Analysing complexity of parliamentary speech

Introduction

In this post I’ll walk through the steps of an analysis of parliamentary speech. I’ll use data from ParlSpeechV2 to assess how complex the language politicians use in parliament is, and to analyse what predicts that complexity. I go through how to use the ParlSpeech data and how to merge it with other sources such as ParlGov and EveryPolitician. I then show how to evaluate the complexity of a speech using text-as-data methods, followed by some exploratory analysis and visualization, and some statistical inference.

Content

Data preparation

ParlSpeech data

To start off I’ll be using data from ParlSpeechV2, a dataset collected by Rauh and Schwalbach (2020). It contains parliamentary data from eight western countries, with a coverage of about 21-32 years depending on the country. To simplify matters we’ll only use the Swedish data this time, but the analysis could later be expanded to include the other countries. I first create a directory for this project, then download the Swedish corpus and save it as Corp_Riksdagen_V2.rds in my project folder.

The data can then be loaded into the R session through

> Corp_Riksdagen_V2 <- readRDS("Corp_Riksdagen_V2.rds")

We start by just exploring what the data looks like and what it contains.

> str(Corp_Riksdagen_V2)
'data.frame':	365560 obs. of  11 variables:
 $ date          : chr  "1990-10-02" "1990-10-02" "1990-10-16" "1990-10-16" ...
 $ agenda        : chr  "Välkomstord" "Meddelande om kammarens arbete" "Svar på fråga 1990/91:44 om lagen om anställningsskydd" "Svar på fråga 1990/91:44 om lagen om anställningsskydd" ...
 $ speechnumber  : int  1 2 1 2 3 4 5 6 7 8 ...
 $ speaker       : chr  "TALMANNEN" "TALMANNEN" "Arbetsmarknadsminister MONA SAHLIN" "LARS-OVE HAGBERG" ...
 $ party         : chr  NA NA "S" "V" ...
 $ party.facts.id: num  NA NA 487 830 487 830 487 830 487 690 ...
 $ chair         : logi  TRUE TRUE FALSE FALSE FALSE FALSE ...
 $ terms         : num  563 92 204 255 231 148 39 95 291 242 ...
 $ text          : chr  "Ärade ledamöter! Jag hälsar er alla varmt välkomna tillbaka till riksdagen och till ett nytt arbetsår. Ett år s"| __truncated__ "Såsom framgår av utsänd sammanträdesplan anordnas i morgon partiledardebatt, som tar sin början kl. 10.00. På o"| __truncated__ "Herr talman! Lars-Ove Hagberg har frågat mig om regeringen är beredd att ändra i lagen om anställningsskydd så "| __truncated__ "Herr talman! När Borlänge kommun olyckligtvis var med och planerade ett affärscentrum uppstod problem. Butikern"| __truncated__ ...
 $ parliament    : chr  "SW-Riksdagen" "SW-Riksdagen" "SW-Riksdagen" "SW-Riksdagen" ...
 $ iso3country   : chr  "SWE" "SWE" "SWE" "SWE" ...

Clearly there are many variables we don’t need, and there seem to be speeches from actors in a non-partisan role such as the speaker of the house (Talmannen). To fix this I use the tidyverse package to select the variables of interest and remove all rows from persons with no party affiliation.

library(tidyverse)
Corp_Riksdagen_V2 <- Corp_Riksdagen_V2 %>%
  select(date,party,party.facts.id,speaker,text) %>%
  filter(!is.na(party))

This is much better. However, the party variable is still stored as a character, whereas we want to use it as a factor. To solve this we use

Corp_Riksdagen_V2$party <- factor(Corp_Riksdagen_V2$party)

To see which parties exist in our data we can use the levels command.

> levels(Corp_Riksdagen_V2$party)
 [1] "C"   "FP"  "KD"  "L"   "M"   "MP"  "NYD" "S"   "SD"  "V" 

Those who are familiar with Swedish politics will probably notice an issue here, namely that we have two distinct levels called “FP” and “L”. These in fact represent the same party, which changed its name from Folkpartiet to Liberalerna in 2015. We want these to be the same level, and can adjust this by assigning the label “L” to both positions:

levels(Corp_Riksdagen_V2$party) <- c("C","L","KD","L","M","MP","NYD","S","SD","V")
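Assigning the full vector of levels by position works, but it depends on the levels being in exactly the order shown above. As a minimal sketch on a toy factor (the vector `party` below is made up for illustration), the same merge can be done by name, which relies on the fact that R merges levels that are given the same label:

```r
# Toy factor with both the old ("FP") and the new ("L") party label
party <- factor(c("C", "FP", "L", "M", "FP"))

# Rename the level "FP" to "L"; levels with identical labels are merged
levels(party)[levels(party) == "FP"] <- "L"

levels(party)
table(party)
```

This variant does not break if a level is added or the order changes.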

The last thing before moving on is to create an identifier for each speech, which I often do to keep track of my data. To do this I create a new ID variable with a unique number for each speech.

Corp_Riksdagen_V2$ID <- seq_len(nrow(Corp_Riksdagen_V2))

Merging with EveryPolitician and ParlGov

Now we have our data in order, but before moving on to analyzing the parliamentary speech there are a number of variables I would like to add, namely the gender and age of each politician, and the government status of each party on each date. To get the gender and age of each politician I will try to use data from EveryPolitician (the project is currently paused and does not have 100% coverage, but it will do for this small project). Conveniently, there is an R package available for accessing their data.

library(devtools)
install_github("ajparsons/everypoliticianR")
library(everypoliticianR)

Now to get the data from Sweden we can run

sweden <- everypolitician("Sweden")
sweden <- sweden$persons

This code accesses their entire database for Sweden and then extracts the data on every individual politician. There are two variables of interest, gender and birth date (which we can use to get age), and we want to merge these with our ParlSpeech data. Unfortunately there is no ID key shared by the two data sets that would let us match politicians perfectly, so I’ll try to match every politician on their full name. This is not unproblematic, as some politicians share the same name. Most often those who share a name will also share a gender, so for gender that is not an issue. For age it is more of a problem, but I will ignore that. Another issue is that politicians change names, and there may be misspellings and different name formats, all of which make matching on name imperfect; this I will also ignore. Before matching is possible I have to do some minimal preprocessing, for which I use the stringr package.

The names in the EveryPolitician data are already fairly clean, so I will just make all names lowercase.

> sweden$name <- tolower(sweden$name)
> head(sweden$name)
[1] "björn rosengren"           "lisbeth grönfeldt bergman" "anita bråkenhielm"        
[4] "erik ezelius"              "karin israelsson"          "osama ali maher" 

For the ParlSpeech data there is more work to be done.

> head(Corp_Riksdagen_V2$speaker)
[1] "Arbetsmarknadsminister MONA SAHLIN" "LARS-OVE HAGBERG"                  
[3] "Arbetsmarknadsminister MONA SAHLIN" "LARS-OVE HAGBERG"                  
[5] "Arbetsmarknadsminister MONA SAHLIN" "LARS-OVE HAGBERG" 

Besides these examples, some names are written as “MARIANNE ANDERSSON i Vårgårda”. From all these different strings we want to extract only the name. To do this I use str_replace_all to remove every word that starts with an uppercase letter followed by lowercase letters, as well as every word consisting only of lowercase letters. After this I remove excess spaces and turn the name fully to lowercase.

> Corp_Riksdagen_V2$name <- str_replace_all(Corp_Riksdagen_V2$speaker, "[A-ZÅÄÖ][a-zåäö][a-zåäö-]*|[a-zåäö]+", "")
> Corp_Riksdagen_V2$name <- str_trim(Corp_Riksdagen_V2$name,side="both")
> Corp_Riksdagen_V2$name <- tolower(Corp_Riksdagen_V2$name)
> head(Corp_Riksdagen_V2$name)
[1] "mona sahlin"      "lars-ove hagberg" "mona sahlin"      "lars-ove hagberg" "mona sahlin"     
[6] "lars-ove hagberg"
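To make the cleaning step easier to check, here is a minimal sketch that wraps it in a helper and runs it on a handful of speaker strings. The helper name `clean_name` is hypothetical. The pattern requires a lowercase letter right after the initial capital, so that all-caps hyphenated names such as LARS-OVE are left untouched, and the sketch uses str_squish rather than str_trim so that internal double spaces also collapse.

```r
library(stringr)

# Hypothetical helper: strip titles and place names, keep the upper-case name
clean_name <- function(x) {
  # Remove capitalised words (titles, places) and all-lowercase words ("i", "von")
  x <- str_replace_all(x, "[A-ZÅÄÖ][a-zåäö][a-zåäö-]*|[a-zåäö]+", "")
  # Collapse leading, trailing and repeated spaces, then lowercase
  tolower(str_squish(x))
}

speakers <- c("Arbetsmarknadsminister MONA SAHLIN",
              "LARS-OVE HAGBERG",
              "MARIANNE ANDERSSON i Vårgårda",
              "GUSTAF von ESSEN")

clean_name(speakers)
```

Note that the last example becomes "gustaf essen": lowercase name particles such as "von" are removed along with the other lowercase words, so such names will not match a lookup table that keeps the particle. This is one of the imperfections of name matching mentioned above.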

Now we have our data in order and should be able to merge it with the new data from EveryPolitician. To do this I use the merge command. In it I first write the main dataframe, and then the dataframe I want to add. Given that there are only two variables we want from the sweden dataframe, I only pass those columns (plus the name) to merge. The by argument specifies which column we are merging by, and all.x=T specifies that we keep all rows in the first dataframe, even if there is no match in the second.

Corp_Riksdagen_V2 <- merge(Corp_Riksdagen_V2,sweden[c("birth_date","gender","name")],by="name",all.x = T)

So, how did the matching work? The attentive reader will note that the resulting dataframe contains more rows than the original. That is because some rows in Corp_Riksdagen_V2 matched several rows in the sweden dataframe, which happens when politicians share the same name. To count the number of times this happened we can use the duplicated command. Furthermore, to see how many rows did not find a match we can count the number of rows that are NA in the gender column.

> sum(duplicated(Corp_Riksdagen_V2$ID))
[1] 7536
> sum(is.na(Corp_Riksdagen_V2$gender))
[1] 18318
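The mechanics behind these two numbers can be seen on a small toy example. The names, genders and birth dates below are invented for illustration and do not come from either data set; the sketch only shows how a left join on a non-unique key duplicates rows and leaves NA for unmatched rows.

```r
# Toy speeches: three speeches, each with a unique ID
speeches <- data.frame(
  ID = 1:3,
  name = c("anna berg", "erik lind", "unknown person")
)

# Toy lookup table in which one name occurs twice (two different people)
persons <- data.frame(
  name = c("anna berg", "erik lind", "erik lind"),
  gender = c("female", "male", "male"),
  birth_date = c("1960-01-01", "1950-05-05", "1975-09-09")
)

# Left join: keep every speech, add all matching persons
merged <- merge(speeches, persons, by = "name", all.x = TRUE)

nrow(merged)                # 4 rows: the "erik lind" speech appears twice
sum(duplicated(merged$ID))  # 1 duplicated speech
sum(is.na(merged$gender))   # 1 speech without a match

# Names that are ambiguous in the lookup table
ambiguous <- unique(persons$name[duplicated(persons$name)])
ambiguous
```

Listing the ambiguous names before merging is one simple way to start an error analysis of the matches.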

There are clearly some problems with the match. Most of the missing values seem to originate from lack of coverage rather than misspellings, although a few examples of the latter likely occur. In a real study I would devote more time to solving these matching issues and do an error analysis of the mismatches, but for now we’ll just remove all missing values, and for each duplicated speech keep only one of the matches.

Corp_Riksdagen_V2 <- Corp_Riksdagen_V2 %>%
  group_by(ID) %>%
  filter(row_number() == 1) %>%
  ungroup() %>% 
  filter(!is.na(gender))

Now there is just one variable missing, which is the government status of each party. We could likely code this manually, but in order to be able to extend the analysis to more countries later it is good to find a systematic way of doing it. For this project I’ll use the ParlGov data set. After I’ve downloaded the cabinets data set and stored it as “parl_gov.csv” in the project folder, it can be accessed through

parl_gov <- read.csv("parl_gov.csv")

The file contains information about the government status of a lot of parties, in a lot of countries, across many cabinets. To reduce the data to the information we need I do the following

parl_gov <- parl_gov[parl_gov$country_name_short=="SWE",]
parl_gov$start_date <- as.Date(parl_gov$start_date)
parl_gov <- parl_gov[parl_gov$start_date>as.Date("1985-01-01"),]
parl_gov <- parl_gov[c("party_id","start_date","cabinet_party","party_name")]

Now the data looks like this:

> head(parl_gov)
      party_id start_date cabinet_party                     party_name
12350     1461 1985-09-15             0                  Centerpartiet
12351      892 1985-09-15             0                    Folkpartiet
12352      282 1985-09-15             0              Kristdemokraterna
12353      657 1985-09-15             0                    Moderaterna
12354      904 1985-09-15             1             Socialdemokraterna
12355      882 1985-09-15             0 Vänsterpartiet (kommunisterna)

The key issue now becomes: how can we use this data to create a government variable in our main data set? There are likely many better solutions to this, but what I, and ChatGPT, came up with after a lengthy discussion was to first create a new variable called end_date, which marks the end of each cabinet period for a party (that is, the start date of that party’s next entry). The idea is that we can then check whether a speech was made between these dates, and so match it with a party and the corresponding value of the cabinet_party column. To do this we write

parl_gov <- parl_gov %>%
  arrange(party_id, start_date) %>%
  group_by(party_id) %>%
  mutate(end_date = lead(start_date, order_by = start_date, default = as.Date("2100-01-01"))) %>%
  ungroup()
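To see what lead does here, the following sketch applies the same steps to a small made-up table with two hypothetical party ids. The dates are toy values chosen for illustration, and the far-future default simply marks a period that is still open.

```r
library(dplyr)

# Toy cabinet table: party 1 has three entries, party 2 has two
cabinets <- data.frame(
  party_id = c(1, 1, 1, 2, 2),
  start_date = as.Date(c("1991-10-04", "1994-10-07", "1998-10-06",
                         "1991-10-04", "1994-10-07"))
)

# Within each party, the end of a period is the start of the next one;
# the last period of each party gets the placeholder date 2100-01-01
cabinets <- cabinets %>%
  arrange(party_id, start_date) %>%
  group_by(party_id) %>%
  mutate(end_date = lead(start_date, default = as.Date("2100-01-01"))) %>%
  ungroup()

cabinets
```

Because the rows are already sorted by arrange, the order_by argument is not needed in this toy version.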

Now we are almost there, but how can we match this data on a party level with the other data? We have party_name and something called party_id. While we could likely do this by simply adjusting the party_name variable manually, I want a systematic way of doing it. Luckily, such a way exists. The Party Facts project is an initiative that links together different datasets on political parties. I download their “external parties” mapping file and store it in my project folder.

partyfacts <- read_csv("partyfacts-mapping.csv")

Looking at the data we can see that it contains information on several countries and datasets. To make things easier we keep only the rows for Sweden that refer to ParlGov.

partyfacts <- partyfacts %>%
  filter(!is.na(partyfacts_id)&country=="SWE"&dataset_key=="parlgov")

In the data there are two variables that we are interested in. The first one is dataset_party_id, which is the id number used in ParlGov for each party. The other one is partyfacts_id, which is the “master id” that Party Facts uses to link different datasets together. This id is in fact the one given in the ParlSpeech data, where it is called party.facts.id.

We can now merge this data with the parl_gov data to get a party id we can use

parl_gov <- merge(parl_gov,partyfacts[c("dataset_party_id","partyfacts_id")],
                  by.x="party_id",by.y="dataset_party_id",all.x=T)

Now finally, we can use this data to get the government status of each speech in the ParlSpeech dataframe. To do this we can write

Corp_Riksdagen_V2 <- merge(Corp_Riksdagen_V2,parl_gov,by.x="party.facts.id",by.y="partyfacts_id") %>%
  mutate(is_within_date = date > start_date & date <= end_date,
         government = ifelse(is_within_date, cabinet_party, NA)) %>%
  select(-start_date, -end_date, -cabinet_party, -is_within_date) %>%
  arrange(date, party.facts.id) %>%
  filter(!is.na(government))

This code is not beautiful, but it works… I first merge the parl_gov and Corp_Riksdagen_V2 dataframes by the now shared party id. This produces a very large dataframe, because each individual speech is matched with every row in parl_gov containing that party id. Then, for each speech I check during which government period the speech was made, and create a new variable called government which takes the value of cabinet_party in that period; for all other government periods this variable becomes NA. When this is done I remove the variables I don’t need, and remove all rows in which the government variable is NA. If done correctly, it should be NA only for the excess rows that were added in the initial merge. (This worked almost perfectly: only one row got accidentally excluded, due to a specific case which I’ll ignore here.)
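The merge-then-filter logic described above can be sketched on toy data in base R. The party id, the speech dates and the cabinet periods below are made-up values for illustration; the interval test mirrors the one used in the code above.

```r
# Toy speeches from one party (hypothetical id 1)
speeches <- data.frame(
  ID = 1:2,
  party.facts.id = c(1, 1),
  date = as.Date(c("1992-03-01", "1995-06-01"))
)

# Toy cabinet periods for that party: first in government, then in opposition
periods <- data.frame(
  partyfacts_id = c(1, 1),
  start_date = as.Date(c("1991-10-04", "1994-10-07")),
  end_date = as.Date(c("1994-10-07", "2100-01-01")),
  cabinet_party = c(1, 0)
)

# Step 1: merge on party id, so every speech is paired with every period
expanded <- merge(speeches, periods,
                  by.x = "party.facts.id", by.y = "partyfacts_id")

# Step 2: keep only the pairing whose period contains the speech date
in_period <- expanded$date > expanded$start_date &
  expanded$date <= expanded$end_date
matched <- expanded[in_period, c("ID", "date", "cabinet_party")]
matched <- matched[order(matched$ID), ]

nrow(expanded)  # 4 candidate pairings
matched         # one row per speech
```

One design point worth noting: with a strict `>` on the start date, a speech held on the very day a cabinet starts is assigned to the previous period, or dropped if there is none.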

Now we are almost done with the data preparation. The last thing I’ll do is create a year variable containing the year in which a speech was made. I do this by extracting the first four characters of the date column, which works because the dates are in the format “YYYY-MM-DD”. In the same way I create a birth year variable from the birth_date variable, and with these two I compute the age of the speaker for each speech. Finally I turn the gender and government variables into factors. This is all summarized in the following steps.

Corp_Riksdagen_V2$year <- as.numeric(substr(Corp_Riksdagen_V2$date,1,4))
Corp_Riksdagen_V2$birth_year <- as.numeric(substr(Corp_Riksdagen_V2$birth_date,1,4))
Corp_Riksdagen_V2$age <- Corp_Riksdagen_V2$year-Corp_Riksdagen_V2$birth_year
Corp_Riksdagen_V2$government <- factor(Corp_Riksdagen_V2$government)
Corp_Riksdagen_V2$gender <- factor(Corp_Riksdagen_V2$gender)

Evaluating the Complexity of Language

Now to the task of evaluating how complex a speech is. One popular method, developed for Swedish texts, is the LIX score, which stands for läsbarhetsindex (readability index). It was developed in the 1960s by a Swedish scholar, and boils down to a very concise formula:

$$ LIX = \frac{W}{S}+\frac{L\times 100}{W} $$

In this formula $W$ equals the number of words in the text, $S$ the number of sentences, and $L$ the number of long words, that is, words longer than 6 characters. It is hard to get a valid measure of language complexity across languages (Schoonvelde et al. 2019), but for now this will do, as LIX is fairly widely used for Swedish. To compute the LIX score for all speeches I first create a function that computes the score, and then apply it to every speech in the data.


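calculate_lix <- function(text) {
  # Split into sentences at ., ! and ?, and into words at non-word characters
  sentences <- unlist(strsplit(text, "[.!?]"))
  words <- unlist(strsplit(text, "\\W+"))
  # W / S: average number of words per sentence
  avg_sentence_length <- length(words) / length(sentences)
  
  # L / W: share of long words (more than 6 characters)
  long_words <- sum(nchar(words) > 6)
  proportion_long_words <- long_words / length(words)
  
  lix <- avg_sentence_length + (proportion_long_words * 100)
  
  return(lix)
}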

Corp_Riksdagen_V2$lix <- sapply(Corp_Riksdagen_V2$text, calculate_lix)
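As a worked check of the formula, here is a small self-contained sketch. The function `lix_sketch` is a hypothetical variant, not the function used for the results in this post: it additionally drops empty sentences and empty tokens, and it splits words on anything that is not a letter or digit (how accented letters are classified depends on the locale, so this is an assumption to verify on real data).

```r
# Hypothetical variant of the LIX computation with explicit counts
lix_sketch <- function(text) {
  # Sentences: split at runs of ., ! or ? and drop empty pieces
  sentences <- trimws(unlist(strsplit(text, "[.!?]+")))
  sentences <- sentences[nzchar(sentences)]

  # Words: split at anything that is not a letter or digit, drop empty tokens
  words <- unlist(strsplit(text, "[^[:alnum:]]+"))
  words <- words[nzchar(words)]

  n_words <- length(words)           # W
  n_sentences <- length(sentences)   # S
  n_long <- sum(nchar(words) > 6)    # L

  n_words / n_sentences + 100 * n_long / n_words
}

# 3 words, 2 sentences, 0 long words: 3/2 + 0 = 1.5
lix_sketch("Herr talman! Ja.")

# 6 words, 2 sentences, 3 long words: 6/2 + 100 * 3/6 = 53
lix_sketch("Riksdagen debatterar budgeten. Det tar tid.")
```

Both variants share a limitation of simple punctuation-based splitting: abbreviations and times such as “kl. 10.00” are counted as sentence breaks, which lowers the score somewhat.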

Exploratory analysis

Now with the LIX scores calculated, and all the datasets merged, we can finally explore what the data looks like. First I’ll just use the summary function in R.

> summary(Corp_Riksdagen_V2$lix)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.50   34.58   38.44   38.72   42.64  181.63 

The mean LIX score is about 38.7, and the middle half of all speeches score between roughly 34.6 and 42.6, but there are some extreme outliers. To assess this further I’ll use a histogram.

ggplot(Corp_Riksdagen_V2,aes(lix))+
  geom_histogram(col="black",stat="bin")+
  xlab("LIX score")+
  ylab("Count")+
  theme_bw()

[Figure: histogram of LIX scores across all speeches]

So what do the extremes mean here? What does a simple or a complicated speech look like? In my mind this analysis would require additional validation to make sure that we actually measure what we intend to measure, but to get some understanding of this I look at the simplest and the most complicated speech in the data.

> Corp_Riksdagen_V2[which.min(Corp_Riksdagen_V2$lix),]$text
[1] "Herr talman! Ja."
> Corp_Riksdagen_V2[which.max(Corp_Riksdagen_V2$lix),]$text
[1] "Fru Talman! Valberedningen har enhälligt godkänt gemensamma listor för valen av ledamöter och suppleanter i utskotten. I egenskap av ordförande i valberedningen ber jag att till talmannen få överlämna de gemensamma listorna.  Företogs val av 17 ledamöter i utskotten.  De av Sven Hulterström för dessa val avlämnade gemensamma listorna godkändes av kammaren, varvid följande personer, vilkas namn i här angiven ordning upptagits på respektive listor, befanns valda för återstoden av riksdagens innevarande valperiod till ledamöter i  konstitutionsutskottet  Kurt Ove Johansson Catarina Rönnung Anders Björck Axel Andersson Widar Andersson Birger Hagård Barbro Hietala Nordlund Birgitta Hambraeus Pär-Axel Sahlberg Jerry Martinger Birgit Friggebo Mats Berglind Kenneth Kvist Nikos Papadopoulos Inger René Peter Eriksson Håkan Holmberg  finansutskottet  Jan Bergqvist Sören Lekberg Lars Tobisson Lisbet Calner Bo Nilsson Sonja Rembo Arne Kjörnsberg Per-Ola Eriksson Sonia Karlsson Lennart Hedquist Anne Wibble Susanne Eberstein Johan Lönnroth Kristina Nordström Fredrik Reinfeldt Roy Ottosson Mats Odell  skatteutskottet  Lars Hedfors Anita Johansson Bo Lundgren Sverre Palm Karl Hagström Karl-Gösta Svenson Lisbeth Staaf-Igelström Rolf Kenneryd Björn Ericson Carl Fredrik Graf Isa Halvarsson Inger Lundberg Per Rosengren Ulla Rudin Jan-Olof Franzén Ronny Korsberg Holger Gustafsson  justitieutskottet  Lars-Erik Lövdén Birthe Sörestedt Gun Hellsvik Göran Magnusson Sigrid Bolkéus Göthe Knutson Märta Johansson Ingbritt Irhammar Margareta Sandgren Anders G Högmark Siw Persson Ann-Marie Fagerström Alice Åström Pär Nuder Maud Ekendahl Kia Andreasson Rolf Åbjörnsson  lagutskottet  Anita Persson Bengt Kronblad Rolf Dahlberg Carin Lundberg Rune Berglund Stig Rindborg Karin Olsson Agne Hansson Eva Arvidsson Henrik S Järrel Bengt Harding Olson Inger Segelström Tanja Linderborg Anders Ygeman Tomas Högström Yvonne Ruwaida Birgitta Carlsson  utrikesutskottet  Viola Furubjelke Inga-Britt Johansson 
Göran Lennmarker Nils T Svensson Berndt Ekholm Inger Koch Urban Ahlin Helena Nilsson Carina Hägg Bertil Persson Karl-Göran Biörsmark Tone Tingsgård Eva Zetterberg Agneta Brendt Lars Hjertén Bodil Francke Ohlsson Ingrid Näslund  försvarsutskottet  Britt Bohlin Iréne Vestlund Arne Andersson Christer Skoog Sven Lundberg Henrik Landerholm Karin Wegestål Anders Svärd Ola Rask My Persson Lennart Rohdin Birgitta Gidblom Jan Jennehag Håkan Juholt Olle Lindström Annika Nordgren Åke Carnerö  socialförsäkringsutskottet  Börje Nilsson Margareta Israelsson Gullan Lindblad Maud Björnemalm Anita Jönsson Margit Gennser Lennart Klockare Ingrid Skeppstedt Sven-Åke Nygårds Gustaf von Essen Sigge Godin Ronny Olander Ulla Hoffmann Mona Berglund Nilsson Ulf Kristersson Ragnhild Pohanka Rose-Marie Frebran  socialutskottet  Ingrid Andersson Rinaldo Karlsson Sten Svensson Hans Karlsson Christina Pettersson Liselotte Wågö Marianne Jönsson Roland Larsson Conny Öhman Leif Carlson Barbro Westerholm Mariann Ytterberg Stig Sandström Christin Nilsson Birgitta Wichne Thomas Julin Chatrine Pålsson  kulturutskottet  Åke Gustavsson Berit Oscarsson Elisabeth Fleetwood Anders Nilsson Leo Persson Stig Bertilsson Björn Kaaling Marianne Andersson Monica Widnemark Lennart Fridén Carl-Johan Wilson Agneta Ringman Charlotta L Bjälkebring Annika Nilsson Jan Backman Ewa Larsson Fanny Rizell  utbildningsutskottet  Björkman Jan Silfverstrand Bengt Ask Beatrice Johansson Eva Wärnersson Ingegerd Rydén Rune Lundberg Agneta Carlgren Andreas Danielsson Torgny Melin Ulf Edgren Margitta Eneroth Tomas Danestig-Olofsson Britt-Marie Westerlund Panke Majléne Hjortzberg-Nordlund Hans Goude Gunnar Davidson Inger  trafikutskottet  Monica Öhman Håkan Strömberg Per Westerberg Jarl Lander Per Erik Granström Tom Heyman Krister Örnfjäder Karin Starrin Hans Stenberg Birgitta Wistrand Kenth Skårvik Monica Green Karl-Erik Persson Lena Sandlin Lars Björkman Elisa Abascal Reyes Christina Axelsson  jordbruksutskottet  Sinikka Bohlin Inge 
Carlsson Göte Jonsson Kaj Larsson Leif Marklund Ingvar Eriksson Alf Eriksson Lennart Daléus Ingemar Josefsson Carl G Nilsson Eva Eriksson Ann-Kristine Johansson Maggi Mikaelsson Åsa Stenberg Eva Björne Gudrun Lindvall Lennart Brunander  näringsutskottet  Birgitta Johansson Bo Finnkvist Karin Falkmer Reynoldh Furustrand Mats Lindberg Mikael Odenberg Sylvia Lindgren Kjell Ericsson Barbro Andersson Chris Heister Christer Eirefelt Marie Granlund Lennart Beijer Dag Ericson Ola Karlsson Eva Goës Göran Hägglund  arbetsmarknadsutskottet  Johnny Ahlqvist Sten Östlund Per Unckel Berit Andnor Ingvar Johnsson Kent Olsson Martin Nilsson Elving Andersson Laila Bjurling Patrik Norinder Elver Jonsson Sonja Fransson Hans Andersson Kristina Zakrisson Christel Anderberg Barbro Johansson Dan Ericsson  bostadsutskottet  Lennart Nilsson Rune Evensson Knut Billing Bengt-Ola Ryttar Britta Sundin Sten Andersson Marianne Carlström Rigmor Ahlstedt Lars Stjernkvist Stig Grauers Erling Bager Lena Larsson Owe Hellberg Lilian Virgin Inga Berggren Per Lager Ulf Björklund"

To explore party differences I use a boxplot to show how the LIX score varies between parties.

ggplot(Corp_Riksdagen_V2,aes(x=party,y=lix))+
  geom_boxplot()+
  theme_bw()+
  xlab("Party")+
  ylab("LIX score")

[Figure: boxplots of LIX score by party]

Statistical analysis

To explore these variations systematically I estimate a number of regression models predicting the LIX score of a speech. First we’ll look at gender and age. Do they significantly affect how complex the language of politicians is?

library(stargazer)
m1 <- lm(lix~gender+age,data=Corp_Riksdagen_V2)
# factor(year) gives one dummy per year, i.e. year fixed effects
m2 <- lm(lix~gender+age+factor(year),data=Corp_Riksdagen_V2)
stargazer(m1,m2,type="text",omit="year",
          add.lines=list(c("Year FE","No","Yes")))

              Dependent variable: lix
                  (1)          (2)
gendermale     0.705***     0.670***
               (0.022)      (0.022)
age           -0.058***    -0.054***
               (0.001)      (0.001)
Constant      41.154***    42.213***
               (0.056)      (0.115)
Year FE           No          Yes
Observations   342,510      342,510
R2              0.011        0.017
Note: *p<0.1; **p<0.05; ***p<0.01

Indeed, it seems that male politicians use significantly more complex language, and that older politicians use less complex language. Next, I’ll look at party and government status. Do politicians from different parties talk in a more or less complicated way? And how do politicians talk when their party is in government?

m3 <- lm(lix~government+factor(year),data=Corp_Riksdagen_V2)
m4 <- lm(lix~party+factor(year),data=Corp_Riksdagen_V2)
m5 <- lm(lix~government+party+factor(year),data=Corp_Riksdagen_V2)
stargazer(m3,m4,m5,type="text",omit="year",
          add.lines=list(c("Year FE","Yes","Yes","Yes")))

                    Dependent variable: lix
                  (1)          (2)          (3)
government1    0.839***                  0.824***
               (0.022)                   (0.025)
partyL                      -0.209***    -0.180***
                            (0.049)      (0.049)
partyKD                     -0.457***    -0.425***
                            (0.051)      (0.051)
partyM                       0.257***     0.247***
                            (0.043)      (0.043)
partyMP                     -0.982***    -0.769***
                            (0.052)      (0.053)
partyNYD                    -4.167***    -3.703***
                            (0.126)      (0.126)
partyS                      -0.206***    -0.400***
                            (0.041)      (0.041)
partySD                      0.230***     0.634***
                            (0.085)      (0.086)
partyV                      -0.471***    -0.107**
                            (0.120)      (0.120)
Constant      39.564***    40.087***    39.800***
               (0.100)      (0.105)      (0.105)
Year FE          Yes          Yes          Yes
Observations   342,510      342,510      342,510
R2              0.012        0.014        0.017
Note: *p<0.1; **p<0.05; ***p<0.01

In the first model I look only at the government variable with year fixed effects; the results show that politicians from parties in government talk in a more complicated way. However, it could be that some parties talk more complicated in general, and that this is related to who gets into government. Therefore, in the second model I look at party effects alone. Here the Centre Party is the reference category, represented by the intercept, and all other party coefficients show how those parties differ from it. We see, for example, that the Moderate Party and the Sweden Democrats are the only parties that talk more complicated than the Centre Party, whereas all other parties, and especially Ny Demokrati, talk less complicated. In the third model I again look at the effect of government, now controlling for year and party, meaning that I look at how parties change their rhetoric as they enter and leave government. The effect is still positive, which indicates that parties speak in a more complex way once they are in government.
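Since every party coefficient is read relative to the reference category, it can help to see how the reference level is chosen. The sketch below uses simulated toy data (the LIX values are random numbers, not results) and the base R function relevel to switch the baseline from the first level to another party.

```r
set.seed(1)

# Simulated toy data: three parties, random LIX-like scores
toy <- data.frame(
  party = factor(rep(c("C", "M", "S"), each = 20)),
  lix = rnorm(60, mean = 38, sd = 5)
)

# Default: the first level ("C") is the reference and is absorbed by the intercept
fit_c <- lm(lix ~ party, data = toy)
names(coef(fit_c))

# Make "S" the reference instead; coefficients now compare against S
toy$party_s <- relevel(toy$party, ref = "S")
fit_s <- lm(lix ~ party_s, data = toy)
names(coef(fit_s))
```

In a model with only the party factor, the intercept equals the mean of the reference party, so changing the reference changes every coefficient but not the fitted values.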

The last thing I’ll do is look at all variables at the same time and present the results in a coefficient plot. While tables like the ones above are nice, a coefficient plot can sometimes be easier to read.

library(broom)
m6 <- lm(lix~government+party+gender+age+factor(year),data=Corp_Riksdagen_V2)
coefs <- tidy(m6)
# Drop the intercept and the year dummies from the plot
coefs <- coefs[!grepl("year|Intercept",coefs$term),]
ggplot(coefs,aes(x=term,y=estimate))+
  geom_point()+
  geom_errorbar(aes(ymin=estimate-1.96*std.error,ymax=estimate+1.96*std.error))+
  theme_bw()+
  coord_flip()+
  geom_hline(yintercept=0,col="red",lty="dashed")+
  xlab("Coefficient")+
  ylab("Estimate")

[Figure: coefficient plot for model m6 with 95% confidence intervals]
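The error bars in the plot use the normal approximation, estimate plus or minus 1.96 standard errors. As a small sketch on the built-in mtcars data (used here only as a stand-in), this can be compared with the t-based intervals that confint returns for a linear model:

```r
# Small example model on a built-in data set (32 observations)
fit <- lm(mpg ~ wt + hp, data = mtcars)
est <- coef(summary(fit))

# Normal-approximation interval, as used for the error bars
manual <- cbind(
  lower = est[, "Estimate"] - 1.96 * est[, "Std. Error"],
  upper = est[, "Estimate"] + 1.96 * est[, "Std. Error"]
)

# Exact t-based 95% interval
exact <- confint(fit, level = 0.95)

manual
exact
```

With only 32 observations the t-based intervals are slightly wider. With several hundred thousand speeches the two are practically identical, so the 1.96 shortcut is harmless in this analysis.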

This plot summarizes most of the previous findings. For example, parties talk in a more complicated way when in government, and so do male politicians. One interesting change, however, is that the earlier models estimated SD to talk more complicated than the Centre Party, whereas here SD has a negative coefficient. Why is that? Probably because SD is a male-dominated party: once we control for gender, and thus compare men with men and women with women within each party and year, politicians from SD appear to talk slightly less complicated than those from the Centre Party, which is the reference category in this model. However, they still talk more complicated than politicians from S and KD.

Summary

In this post I have taken parliamentary speech data from ParlSpeech and merged it with data from EveryPolitician and ParlGov. I have then computed the complexity of every speech through the LIX score and done some analysis of the results. Note that there is more to be done for these findings to be conclusive, for example analyzing coverage and validating the LIX measure. Nevertheless, I have gone through some of the most central steps in how I would approach this analysis.