Data

The Varieties of Democracy (V-Dem) dataset is a new approach to conceptualizing and measuring democracy. The data provides a multidimensional and disaggregated perspective that reflects the complexity of the concept of democracy as a system of rule that goes beyond the simple presence of elections. The V-Dem project distinguishes between five high-level principles of democracy: electoral, liberal, participatory, deliberative, and egalitarian, and collects data to measure these principles.

The project regularly surveys 3000 experts to construct each measure. I’ve selected the metrics concerning civil liberties in a country; the list below outlines the V-Dem variables in the data.

  • Freedom from torture (v2cltort)
  • Freedom from political killings (v2clkill)
  • Freedom from forced labor for men (v2clslavem)
  • Freedom from forced labor for women (v2clslavef)
  • Transparent laws with predictable enforcement (v2cltrnslw)
  • Rigorous and impartial public administration (v2clrspct)
  • Access to justice for men (v2clacjstm)
  • Access to justice for women (v2clacjstw)
  • Social class equality in respect for civil liberties (v2clacjust)
  • Social group equality in respect for civil liberties (v2clsocgrp)
  • Freedom of discussion for men (v2cldiscm)
  • Freedom of discussion for women (v2cldiscw)
  • Freedom of academic and cultural expression (v2clacfree)
  • Freedom of religion (v2clrelig)
  • Freedom of foreign movement (v2clfmove)
  • Freedom of domestic movement for men (v2cldmovem)
  • Freedom of domestic movement for women (v2cldmovew)
  • State ownership of economy (v2clstown)
  • Property rights for men (v2clprptym)
  • Property rights for women (v2clprptyw)

The data has been aggregated to the country level (in an effort to keep the file size small). To do this, I first subsetted the data to the post-World War II period and then averaged each country’s scores over that entire time series. In addition, I’ve included a country-average measure of polity, which offers an alternative measure of democracy generated by the Center for Systemic Peace (more on this below).
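
The aggregation described above can be sketched roughly as follows. This is only an illustration on a made-up country-year panel: the data frame `panel`, its values, and the 1946 cutoff are assumptions, not the actual preprocessing used to build the file.

```r
# Hypothetical country-year panel; names and values are illustrative only
panel <- data.frame(
  country  = c("A", "A", "A", "B", "B", "B"),
  year     = c(1940, 1950, 1960, 1940, 1950, 1960),
  v2cltort = c(-1.0, 0.0, 1.0, 2.0, 1.0, 2.0)
)

# Keep the post-World War II years (1946 onward is an assumed cutoff)
post_war <- subset(panel, year >= 1946)

# Average the score over the remaining years, giving one row per country
country_avg <- aggregate(v2cltort ~ country, data = post_war, FUN = mean)
country_avg
```

With more than one measure, the same idea applies column by column, so every country ends up as a single row of averaged scores.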

Task

Let’s explore some of the clustering methods used in lecture to see if we can cluster countries into basic regime types (democracies and non-democracies) using the V-Dem data on each country’s civil liberties record. We’ll then compare our clustered categories to the averaged polity metric to see if our bins line up with the democracy scale captured by that metric.
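
As a rough preview of the idea, here is a minimal sketch on simulated data. It assumes k-means with two clusters as the clustering method (one option among several), and the columns `lib1`, `lib2`, and `polity` are made up for illustration; none of these values come from the V-Dem file.

```r
set.seed(123)

# Simulated "countries": two well-separated groups on two civil liberty scores
toy <- data.frame(
  lib1   = c(rnorm(10, mean = -1.5, sd = 0.3), rnorm(10, mean = 1.5, sd = 0.3)),
  lib2   = c(rnorm(10, mean = -1.5, sd = 0.3), rnorm(10, mean = 1.5, sd = 0.3)),
  polity = c(rep(-7, 10), rep(8, 10))  # held out, not used for clustering
)

# Cluster on the civil liberty scores only
fit <- kmeans(toy[, c("lib1", "lib2")], centers = 2, nstart = 25)

# Check the clusters against the held-out polity score
polity_by_cluster <- tapply(toy$polity, fit$cluster, mean)
polity_by_cluster
```

If the clusters capture regime type, the average polity score should differ sharply between them. Note that the cluster labels (1 and 2) are arbitrary and carry no ordering.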

Summarize

Read in the data.

vdems <- read_csv("Data/vdems_civil_liberties.csv")
Parsed with column specification:
cols(
  .default = col_double(),
  country = col_character()
)
See spec(...) for full column specifications.

Quick summary of the data distribution using skimr. Some things to note:

  • There are 177 observations, meaning there are 177 countries represented in the data.
  • All the V-Dem variables are already scaled (a consequence of how the measures are generated from the expert surveys). Generally speaking, since we leverage the concept of “distance” to cluster, all the input variables need to be on comparable scales. The only exception here is the polity variable, but we’ll be using that to check the clusters we draw out of the data rather than as an input into the clustering algorithm.
  • There is no missingness in the data.
  • From the mini histograms in the skimr output, there don’t appear to be any significant distributional issues.

skimr::skim(vdems)
── Data Summary ────────────────────────
                           Values
Name                       vdems 
Number of rows             177   
Number of columns          25    
_______________________          
Column type frequency:           
  character                1     
  numeric                  24    
________________________         
Group variables                  

── Variable type: character ──────────────────────────────────────────
  skim_variable n_missing complete_rate   min   max empty n_unique
1 country               0             1     4    32     0      177
  whitespace
1          0

── Variable type: numeric ────────────────────────────────────────────
   skim_variable n_missing complete_rate    mean    sd     p0    p25
 1 polity                0             1  0.980   5.88 -10    -3.75 
 2 v2cltort              0             1  0.215   1.28  -2.22 -0.794
 3 v2clkill              0             1  0.480   1.27  -2.47 -0.604
 4 v2clslavem            0             1  0.595   1.05  -2.19 -0.249
 5 v2clslavef            0             1  0.536   1.03  -1.99 -0.176
 6 v2cltrnslw            0             1  0.258   1.24  -1.79 -0.715
 7 v2clrspct             0             1  0.0599  1.25  -1.99 -0.904
 8 v2clacjstm            0             1  0.288   1.24  -2.32 -0.694
 9 v2clacjstw            0             1  0.260   1.23  -2.96 -0.699
10 v2clacjust            0             1  0.595   1.12  -2.17 -0.178
11 v2clsocgrp            0             1  0.332   1.14  -2.24 -0.568
12 v2cldiscm             0             1  0.200   1.23  -2.88 -0.666
13 v2cldiscw             0             1  0.169   1.19  -2.81 -0.743
14 v2clacfree            0             1  0.233   1.26  -2.80 -0.738
15 v2clrelig             0             1  0.428   1.17  -3.34 -0.388
16 v2clfmove             0             1  0.416   1.15  -3.55 -0.479
17 v2cldmovem            0             1  0.468   1.08  -4.22 -0.220
18 v2cldmovew            0             1  0.374   1.20  -4.24 -0.558
19 v2clstown             0             1 -0.0333  1.06  -3.51 -0.762
20 v2clprptym            0             1  0.479   1.14  -3.41 -0.509
21 v2clprptyw            0             1  0.491   1.22  -2.83 -0.461
22 v2clgencl             0             1  0.497   1.10  -2.80 -0.304
23 v2clgeocl             0             1  0.0856  1.08  -2.50 -0.828
24 v2clpolcl             0             1  0.299   1.16  -2.18 -0.556
       p50   p75  p100 hist 
 1  0.0959 6     10    ▃▇▇▅▇
 2 -0.0369 1.01   3.06 ▃▇▆▃▃
 3  0.476  1.40   2.95 ▂▇▆▇▅
 4  0.824  1.38   2.73 ▁▃▅▇▃
 5  0.652  1.40   2.69 ▂▅▇▇▂
 6  0.0656 1.05   3.29 ▆▇▇▃▂
 7 -0.239  0.826  3.56 ▆▇▅▂▁
 8  0.127  0.983  3.10 ▂▆▇▃▃
 9  0.217  1.05   3.04 ▁▅▇▃▃
10  0.722  1.35   2.93 ▂▅▇▇▃
11  0.381  1.22   2.80 ▂▇▇▇▃
12 -0.0185 1.18   2.86 ▁▆▇▅▃
13  0.0179 1.06   3.03 ▁▆▇▅▃
14  0.151  1.09   2.75 ▁▇▇▇▅
15  0.616  1.39   2.47 ▁▂▅▇▆
16  0.542  1.32   2.45 ▁▂▇▇▆
17  0.694  1.28   2.07 ▁▁▃▆▇
18  0.671  1.29   2.26 ▁▁▆▆▇
19  0.0998 0.818  2.11 ▁▃▆▇▃
20  0.812  1.43   2.18 ▁▂▆▆▇
21  0.708  1.37   2.52 ▁▃▆▇▆
22  0.476  1.42   2.68 ▁▂▇▆▅
23 -0.0766 1.19   2.36 ▁▇▇▆▅
24  0.0236 1.14   2.99 ▃▇▇▅▃
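
To see why scale matters for distance-based clustering, consider the small sketch below. The data frame `toy` and its values are hypothetical and chosen only to make the effect visible; the V-Dem variables themselves are already on comparable scales, so this step is not needed for them.

```r
# Hypothetical data: column a is on a unit-like scale, column b is not
toy <- data.frame(
  a = c(0, 0.1, 3, 1.5),
  b = c(100, 150, 110, 1000),
  row.names = c("X", "Y", "Z", "W")
)

# Euclidean distances on the raw values are dominated by column b
d_raw <- as.matrix(dist(toy))

# After centering and scaling each column, both contribute comparably
d_scaled <- as.matrix(dist(scale(toy)))

d_raw["X", c("Y", "Z")]     # X looks closer to Z on the raw values
d_scaled["X", c("Y", "Z")]  # X looks closer to Y once the columns are scaled
```

The nearest neighbor of X changes depending on whether the columns share a scale, which is exactly the kind of distortion a clustering algorithm would inherit.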

Let’s now explore the correlation of the variables. As you can see, all the variables are highly correlated with one another. This isn’t surprising since they all attempt to get at similar concepts. That is, when a state infringes upon civil liberties along one dimension, it usually also does so along another. Clustering in and of itself makes no parametric assumptions, so we don’t need to worry about lurking issues here, such as multicollinearity. That said, these data are likely prime candidates for a decomposition (see reading)!

vdems %>% 
  select(-country,-polity) %>%  
  GGally::ggcorr(.)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
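
The plot can be complemented with a numeric check. The sketch below uses simulated variables that share one latent factor (an assumption standing in for the real measures), summarizes the pairwise correlations with `cor()`, and then runs a principal component analysis with `prcomp()` as one possible decomposition; the reading may have a different method in mind.

```r
set.seed(42)

# Simulated measures driven by a single latent factor plus noise
latent <- rnorm(100)
toy <- data.frame(
  v1 = latent + rnorm(100, sd = 0.3),
  v2 = latent + rnorm(100, sd = 0.3),
  v3 = latent + rnorm(100, sd = 0.3),
  v4 = latent + rnorm(100, sd = 0.3)
)

# Pairwise correlations, keeping each pair once (upper triangle)
cors <- cor(toy)
off_diag <- cors[upper.tri(cors)]
summary(off_diag)

# Share of total variance captured by each principal component
pca <- prcomp(toy, scale. = TRUE)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(var_share, 3)
```

When variables are this strongly correlated, the first component absorbs most of the variance, which is why highly correlated measures are natural candidates for dimension reduction.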