The Varieties of Democracy (V-Dem) dataset is a new approach to conceptualizing and measuring democracy. The data provides a multidimensional and disaggregated perspective that reflects the complexity of the concept of democracy as a system of rule that goes beyond the simple presence of elections. The V-Dem project distinguishes between five high-level principles of democracy: electoral, liberal, participatory, deliberative, and egalitarian, and collects data to measure these principles.
The project regularly surveys 3,000 experts to construct each measure. I’ve selected all the metrics concerning a country’s civil liberties. The V-Dem variables included in the data are:
v2cltort
v2clkill
v2clslavem
v2clslavef
v2cltrnslw
v2clrspct
v2clacjstm
v2clacjstw
v2clacjust
v2clsocgrp
v2cldiscm
v2cldiscw
v2clacfree
v2clrelig
v2clfmove
v2cldmovem
v2cldmovew
v2clstown
v2clprptym
v2clprptyw
v2clgencl
v2clgeocl
v2clpolcl
The data has been aggregated to the country level to keep the file size small. To do this, I first subset the data to the post-World War II period and then averaged each country’s scores over that entire time series. I’ve also included a country-average polity score, an alternative measure of democracy produced by the Center for Systemic Peace (more on this below).
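The two aggregation steps described above (subset to the post-war years, then average within each country) can be sketched in base R as follows. This is a minimal illustration and not the author’s actual script. The `raw` panel, its `country` and `year` column names, all of its values, and the 1946 cutoff are assumptions.

```r
# Hypothetical raw input: a small country-year panel. The country/year
# column names and every value here are made up for illustration.
raw <- data.frame(
  country  = rep(c("A", "B"), each = 4),
  year     = rep(c(1938, 1950, 1980, 2010), times = 2),
  v2cltort = c(-2.0, 0.5, 1.0, 1.5, -1.0, -1.5, -0.5, 0.0),
  v2clkill = c(-1.0, 1.0, 1.2, 1.4,  0.0, -1.0, -0.8, 0.2)
)

# 1. Keep post-World War II country-years (the 1946 cutoff is an assumption).
post_war <- subset(raw, year >= 1946)

# 2. Average every civil-liberty column (names starting with "v2cl")
#    within each country, collapsing the time series to one row per country.
cl_vars <- grep("^v2cl", names(post_war), value = TRUE)
country_means <- aggregate(post_war[cl_vars],
                           by = list(country = post_war$country),
                           FUN = mean)
country_means
```

Country A’s 1938 observation is dropped before averaging, so its `v2cltort` mean comes only from its three post-war years.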
Let’s explore some of the clustering methods used in lecture to see whether we can sort countries into basic regime types (democracies and non-democracies) using the V-Dem measures of each country’s civil liberties record. We’ll then compare our clusters to the averaged polity score to see whether our bins reflect the democracy scale that metric captures.
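The overall workflow can be previewed with a minimal base-R sketch that uses k-means, one common clustering method. This is not necessarily the method used below. The synthetic data frame `toy` stands in for `vdems`, and the choices of `centers = 2` and `nstart = 25` are assumptions. Only the `v2cl*` columns feed the algorithm. `polity` is held out for validation, as in the plan above.

```r
# Hypothetical stand-in for `vdems`: 40 synthetic "countries", half with
# high civil-liberty scores and high polity, half with low values on both.
set.seed(42)
n <- 40
regime <- rep(c(1, -1), each = n / 2)   # latent regime type (synthetic)
toy <- data.frame(
  country   = paste0("country_", seq_len(n)),
  polity    = pmin(pmax(round(6 * regime + rnorm(n, sd = 2)), -10), 10),
  v2cltort  = regime + rnorm(n, sd = 0.5),
  v2clkill  = regime + rnorm(n, sd = 0.5),
  v2clrelig = regime + rnorm(n, sd = 0.5)
)

# Cluster on the standardized civil-liberty measures only.
features <- scale(toy[, grepl("^v2cl", names(toy))])

# k-means with two centers; nstart > 1 reruns from several random starts
# and keeps the best solution, guarding against a poor initialization.
km <- kmeans(features, centers = 2, nstart = 25)
toy$cluster <- km$cluster

# Validate: mean of the held-out polity score within each cluster.
aggregate(polity ~ cluster, data = toy, FUN = mean)
```

Cluster labels from `kmeans()` are arbitrary: cluster 1 is not necessarily the set of democracies. Compare clusters by their mean polity rather than by label.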
Read in the data.
library(tidyverse)  # read_csv(), select(), and the %>% pipe
vdems <- read_csv("Data/vdems_civil_liberties.csv")
Parsed with column specification:
cols(
.default = col_double(),
country = col_character()
)
See spec(...) for full column specifications.
Here’s a quick summary of the data distribution using skimr. Some things to note:
The data includes the polity variable, but we’ll use it to confirm the clusters we draw out of the data rather than as an input to the clustering algorithm.
Based on the skimr histograms, there don’t appear to be any significant distributional issues, and no variable has missing values.
skimr::skim(vdems)
── Data Summary ────────────────────────
Values
Name vdems
Number of rows 177
Number of columns 25
_______________________
Column type frequency:
character 1
numeric 24
________________________
Group variables
── Variable type: character ──────────────────────────────────────────
skim_variable n_missing complete_rate min max empty n_unique whitespace
1 country 0 1 4 32 0 177 0
── Variable type: numeric ────────────────────────────────────────────
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
1 polity 0 1 0.980 5.88 -10 -3.75 0.0959 6 10 ▃▇▇▅▇
2 v2cltort 0 1 0.215 1.28 -2.22 -0.794 -0.0369 1.01 3.06 ▃▇▆▃▃
3 v2clkill 0 1 0.480 1.27 -2.47 -0.604 0.476 1.40 2.95 ▂▇▆▇▅
4 v2clslavem 0 1 0.595 1.05 -2.19 -0.249 0.824 1.38 2.73 ▁▃▅▇▃
5 v2clslavef 0 1 0.536 1.03 -1.99 -0.176 0.652 1.40 2.69 ▂▅▇▇▂
6 v2cltrnslw 0 1 0.258 1.24 -1.79 -0.715 0.0656 1.05 3.29 ▆▇▇▃▂
7 v2clrspct 0 1 0.0599 1.25 -1.99 -0.904 -0.239 0.826 3.56 ▆▇▅▂▁
8 v2clacjstm 0 1 0.288 1.24 -2.32 -0.694 0.127 0.983 3.10 ▂▆▇▃▃
9 v2clacjstw 0 1 0.260 1.23 -2.96 -0.699 0.217 1.05 3.04 ▁▅▇▃▃
10 v2clacjust 0 1 0.595 1.12 -2.17 -0.178 0.722 1.35 2.93 ▂▅▇▇▃
11 v2clsocgrp 0 1 0.332 1.14 -2.24 -0.568 0.381 1.22 2.80 ▂▇▇▇▃
12 v2cldiscm 0 1 0.200 1.23 -2.88 -0.666 -0.0185 1.18 2.86 ▁▆▇▅▃
13 v2cldiscw 0 1 0.169 1.19 -2.81 -0.743 0.0179 1.06 3.03 ▁▆▇▅▃
14 v2clacfree 0 1 0.233 1.26 -2.80 -0.738 0.151 1.09 2.75 ▁▇▇▇▅
15 v2clrelig 0 1 0.428 1.17 -3.34 -0.388 0.616 1.39 2.47 ▁▂▅▇▆
16 v2clfmove 0 1 0.416 1.15 -3.55 -0.479 0.542 1.32 2.45 ▁▂▇▇▆
17 v2cldmovem 0 1 0.468 1.08 -4.22 -0.220 0.694 1.28 2.07 ▁▁▃▆▇
18 v2cldmovew 0 1 0.374 1.20 -4.24 -0.558 0.671 1.29 2.26 ▁▁▆▆▇
19 v2clstown 0 1 -0.0333 1.06 -3.51 -0.762 0.0998 0.818 2.11 ▁▃▆▇▃
20 v2clprptym 0 1 0.479 1.14 -3.41 -0.509 0.812 1.43 2.18 ▁▂▆▆▇
21 v2clprptyw 0 1 0.491 1.22 -2.83 -0.461 0.708 1.37 2.52 ▁▃▆▇▆
22 v2clgencl 0 1 0.497 1.10 -2.80 -0.304 0.476 1.42 2.68 ▁▂▇▆▅
23 v2clgeocl 0 1 0.0856 1.08 -2.50 -0.828 -0.0766 1.19 2.36 ▁▇▇▆▅
24 v2clpolcl 0 1 0.299 1.16 -2.18 -0.556 0.0236 1.14 2.99 ▃▇▇▅▃
Let’s now explore the correlations among the variables. As you can see, all the variables are highly correlated with one another. This isn’t surprising, since they all attempt to get at similar concepts: when a government infringes upon civil liberties along one dimension, it usually does so along others as well. Clustering in and of itself makes no parametric assumptions, so we don’t need to worry about issues such as multicollinearity the way we would in a regression. That said, because the variables overlap so heavily, their shared dimension effectively gets extra weight in any distance calculation, which makes these data likely prime candidates for a decomposition (see reading)!
# Drop the country identifier and the held-out polity score, then plot pairwise correlations
vdems %>%
  select(-country, -polity) %>%
  GGally::ggcorr(.)
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
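The correlation claim can be quantified, and the payoff of a decomposition previewed, with a minimal base-R sketch. It uses principal component analysis (`prcomp()`), one common decomposition and not necessarily the one in the reading. The 177-by-23 matrix `X` is synthetic and assumed to be driven by a single latent factor. It stands in for the real civil-liberty columns and is not the V-Dem data.

```r
# Synthetic stand-in (an assumption, not the V-Dem data): 177 "countries" by
# 23 civil-liberty-like variables that all load on one shared latent factor.
set.seed(7)
n_countries <- 177
n_vars <- 23
latent <- rnorm(n_countries)
X <- sapply(seq_len(n_vars), function(j) latent + rnorm(n_countries, sd = 0.4))
colnames(X) <- paste0("var", seq_len(n_vars))

# Average pairwise correlation, counting each off-diagonal pair once.
R <- cor(X)
avg_cor <- mean(R[upper.tri(R)])
avg_cor

# PCA on standardized columns; proportion of total variance per component.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(var_explained, 3), 3)
```

On the real data, you would replace `X` with `vdems %>% select(-country, -polity)`. If a single component captured most of the variance, that would be consistent with the high correlations in the plot above.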