8 🔃 Loading Data

In R, we aren't just adding 2 + 2. We are often trying to analyze data that we've collected on countries, legislative bodies, or citizens. As a result, many of the calculations we perform in political and social scientific research are too cumbersome to do by hand.

One of the first things you need to know how to do to engage in research is to load a dataset, which records variables in columns and observations in rows.

A variable is a quantification of some concept. An example of a variable we often deal with in political science is the GDP (Gross Domestic Product) of a country. In this example, GDP would be the variable and each row would be a country. Another example would be individual attitudes on highway spending. The variable would be people's attitudes toward highway spending (whether they like or dislike it) and each row would be a person who responded to the survey question. You can visualize the first example below.

               country     gdppc
1          Afghanistan  1629.167
2               Angola  6360.849
3              Albania  9646.582
4 United Arab Emirates 56245.478
5            Argentina 18333.995
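
To make the rows-and-columns idea concrete, here is a minimal sketch that rebuilds the five-row preview by hand. The object name gdp_example is made up for illustration; the values are copied from the table above.

```r
# Rebuild the five-row preview by hand: one row per country, one column per variable.
# `gdp_example` is a hypothetical name, not an object from the class data file.
gdp_example <- data.frame(
  country = c("Afghanistan", "Angola", "Albania", "United Arab Emirates", "Argentina"),
  gdppc   = c(1629.167, 6360.849, 9646.582, 56245.478, 18333.995)
)

gdp_example          # print the whole table
gdp_example$gdppc    # pull out one variable (a column)
gdp_example[2, ]     # pull out one observation (a row): Angola
```

The $ operator selects a column by name, while square brackets with a row number before the comma select a row.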

8.1 How to load data for this class

There are technically a couple of options, but you ideally should use the second one.

Option 1. Load the data by double clicking on the file

Option 2. Write code to load the data

For this class only, both options will work. However, datasets come in many different forms. Sometimes data come to you as .csv files, sometimes as .dta files, sometimes in an SQL database, etc. In these cases, you will not be able to load the data by clicking on the file. You will have to write the code.
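
As a rough sketch of what that code can look like for other formats, the example below writes a tiny .csv file to a temporary location and reads it back with read.csv(), which is part of base R. The names toy, csv_path, and dta_path are assumptions for illustration. The .dta part uses the haven package, which is one common choice but an add-on, so it only runs if haven is installed.

```r
# Write a tiny .csv file to a temporary location so this example is self-contained.
# `toy` and `csv_path` are hypothetical names, not part of the class materials.
toy <- data.frame(country = c("Angola", "Albania"), gdppc = c(6360.849, 9646.582))
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# read.csv() is part of base R and returns a data frame.
toy_csv <- read.csv(csv_path)
toy_csv

# Stata .dta files need an add-on package. haven::read_dta() is one option,
# but it is only available if the haven package is installed, so check first.
if (requireNamespace("haven", quietly = TRUE)) {
  dta_path <- tempfile(fileext = ".dta")
  haven::write_dta(toy, dta_path)
  toy_dta <- haven::read_dta(dta_path)
  print(toy_dta)
}
```

With your own files, you would replace the temporary path with the path to the file on your computer.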

Even for this class, you should ideally write code to load the data. Why? Well, you want everything you do for your analyses documented, including loading the data. One of the main reasons for this is to reduce the number of steps that you or someone else has to take to run your analyses. Writing the code that loads your dataset helps with this.

To write the code to load your data, you can simply enter:

# Load my data
load("PSCI_2075_v2.1.RData")

What did that do? Running that loaded three different datasets: the nes, the world, and the states datasets. These popped up as three objects called nes, world, and states in your RStudio environment.
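
If you want to check for yourself which objects a file restores, load() invisibly returns their names. The sketch below shows this with two toy stand-in objects saved to a temporary file. The names nes_toy, states_toy, and rdata_path are assumptions for illustration, not the class data.

```r
# Stand-ins for the class datasets; these toy objects are assumptions for illustration.
nes_toy    <- data.frame(id = 1:3)
states_toy <- data.frame(st = c("AL", "AK"))

# Save both objects into one .RData file in a temporary location.
rdata_path <- tempfile(fileext = ".RData")
save(nes_toy, states_toy, file = rdata_path)

# Remove them from the environment, then load the file back.
rm(nes_toy, states_toy)
loaded <- load(rdata_path)  # load() invisibly returns the names of the restored objects

loaded   # names of the objects that came out of the file
ls()     # everything currently in your environment
```

The same pattern should work with the class file: assign the result of load() to a name and print it to see what was loaded.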

You can access these the way you would any object: by referring to their names.

If you want a preview of each you can use the head() function:

# Preview first 5 rows of:
    #* NES
head(nes, 5)
# A tibble: 5 × 51
  follow      birthyr turno…¹ vote12 meet  march compr…² ftobama ftblack ftwhite
  <fct>         <int> <fct>   <fct>  <fct> <fct> <fct>     <int>   <int>   <int>
1 Most of th…    1960 Defini… Barac… Extr… Have… Compro…     100     100     100
2 Some of th…    1957 Probab… Not a… A li… Have… Compro…      39       6      74
3 Most of th…    1963 Defini… Mitt … Extr… Have… Sticks…       1      50      50
4 Most of th…    1980 Defini… Barac… Not … Have… Compro…      89      61      64
5 Most of th…    1974 Defini… Mitt … Very… Have… Sticks…       1      61      58
# … with 41 more variables: fthisp <int>, ftgay <int>, fttrump <int>,
#   fthrc <int>, ftsanders <int>, ftpolice <int>, ftfem <int>, ftmuslim <int>,
#   ftsci <int>, econnow <fct>, lcself <fct>, disc_b <fct>, disc_h <fct>,
#   disc_g <fct>, disc_w <fct>, disc_m <fct>, disc_fed <fct>,
#   disc_police <fct>, immig_numb <fct>, terror_worry <fct>, healthspend <fct>,
#   finwell <fct>, warmcause <fct>, freetrade <fct>, stopblack <fct>,
#   stop_ever <fct>, birthright_b <fct>, bo_muslim <fct>, amer_ident <fct>, …
    #* States
head(states, 5)
       state st raperate murderrate abort   density     ineq region   gunfree
1    Alabama AL     31.9        6.9    16  86.17970 46.01423  South  0.444921
2     Alaska AK     73.3        3.1    15   1.08848 34.18490   West  0.932850
3    Arizona AZ     32.0        5.4    20  42.12770 49.00724   West  0.969988
4   Arkansas AR     47.3        6.2    11  49.10960 46.83973  South  0.520867
5 California CA     23.6        5.3    33 211.97700 51.47233   West -3.270630
    alcfree    mjfree  marrfree  freedom knowgov evangel poptotal stuspend
1 -0.024292 -0.054232 -0.009304  21.6063   63.76    42.8  4700000     5273
2 -0.003965  0.076145 -0.010426  28.5846   68.44    18.7   686293     8599
3  0.016018  0.015321 -0.009304  32.4745   52.10    18.1  6500000     4785
4  0.005779 -0.012482 -0.009304  -5.7844   67.08    39.9  2900000     5140
5  0.015981  0.043902  0.041186 -85.7562   88.31    11.5 37000000     5685
  ptratio hsdiploma democrat       pid house senate   inc minwage year
1   15.77      77.5 30.54249 -0.033898    -1  0.976 34650    7.25 2016
2   16.29      90.4 26.38180 -0.350000     0  0.638 45529    7.75 2016
3   20.75      85.1 28.86411 -0.102564     0  0.576 35875    7.90 2016
4   12.90      81.7 27.68069  0.084507     0  0.816 34014    6.25 2016
5   19.80      81.2 37.08743  0.179825    -1 -1.395 44481    9.00 2016
   polscore newimmig popover65 percwom medinc turnout  margin co2 femleg
1 -0.073118     4063    657792   75.49  42590    0.59 59.5446 130   11.4
2  0.036343     1799     54938   79.02  57431    0.59 52.4528  43   18.3
3 -0.058591    20333    881831   84.00  48621    0.53 37.1173  88   33.3
4 -0.022314     2874    419981   88.52  41302    0.51 78.2980  62   17.0
5 -0.088499   210591   4200000   89.94  53367    0.55 35.9504 368   30.8
  corrupt infant trumpwin weed death stand obamawin
1      23   9.53        1    0     1     1        0
2      NA   5.93        1    1     0     1        1
3      18   6.85        1    1     1     1        0
4       8   7.84        1    1     1     0        0
5      52   5.32        0    1     1     0        1
    #* World
head(world, 5)
  iso3c        fdi nourish         aid          oil homicide  military infemale
1   AFG  0.3400968    24.7 40.45103119  0.002378189      3.5 1.8974728     70.7
2   AGO -3.9131508    20.7  0.31614304 38.618626728       NA 4.2448843    100.6
3   ALB  9.1340709      NA  3.09176990  1.530367571      4.4 1.5585919     13.2
4   ARE  3.0752631     5.0          NA 20.177806945      0.8 6.1194678      6.5
5   ARG  2.6751617     5.0  0.03136884  2.129501789       NA 0.8148781     11.6
    inf infmale       co2     trade   health     gdppc womleg gtbeduc
1  75.1    79.3 0.5315226  54.96733 9.197723  1629.167   27.7 0.62951
2 109.6   117.9 0.3523340 105.33771 3.391146  6360.849   38.6 0.78801
3  14.8    16.4 0.3855484  85.46456 5.335035  9646.582   16.4 0.98444
4   7.3     8.0 0.5621849 151.00043 3.929452 56245.478   22.5      NA
5  13.0    14.4 0.4435952  34.97101 6.550156 18333.995   38.5 1.03598
               region              country cid imfcode politycode bankscode
1          South Asia          Afghanistan 700     512        700        10
2  Sub-Saharan Africa               Angola 540     614        540        35
3              Europe              Albania 339     914        339        20
4 ME and North Africa United Arab Emirates 696     466        696        NA
5       Latin America            Argentina 160     213        160        40
  dpicode                  aclpregion          epost                  regime
1     AFG Eastern Europe/Soviet Union      president   Civilian Dictatorship
2     AGO          Sub-Saharan Africa      president   Civilian Dictatorship
3     ALB Eastern Europe/Soviet Union prime minister Parliamentary Democracy
4     ARE                  Oil States      president      Royal Dictatorship
5     ARG               Latin America      president  Presidential Democracy
  stra fhprights fhliberties personal lifeexp polity2 durable    rgdpe  pwtpop
1    0         5           6        0    <NA>      NA       0       NA      NA
2    0         6           5        0    <NA>      -2      11  87251.8 18.0380
3    0         3           3        0    <NA>       9      11  21701.8  3.1814
4    0         6           5        0    <NA>      -8      37       NA      NA
5    5         2           2        0    <NA>       8      25 524470.0 39.7143
    pwthc turnout       colony womyear urban    young ethfrac
1      NA   45.83           UK      NA 23.28       NA  0.7693
2      NA   62.77     Portugal    1975 53.96 46.32196  0.7867
3 2.99982   53.31 Soviet Union    1920 46.14 26.35428  0.2204
4      NA      NA         <NA>      NA    NA       NA      NA
5 2.79697   81.07        Spain    1947 90.26 26.11798  0.2550
# Preview first 10 rows of NES:
head(nes, 10)
# A tibble: 10 × 51
   follow     birthyr turno…¹ vote12 meet  march compr…² ftobama ftblack ftwhite
   <fct>        <int> <fct>   <fct>  <fct> <fct> <fct>     <int>   <int>   <int>
 1 Most of t…    1960 Defini… Barac… Extr… Have… Compro…     100     100     100
 2 Some of t…    1957 Probab… Not a… A li… Have… Compro…      39       6      74
 3 Most of t…    1963 Defini… Mitt … Extr… Have… Sticks…       1      50      50
 4 Most of t…    1980 Defini… Barac… Not … Have… Compro…      89      61      64
 5 Most of t…    1974 Defini… Mitt … Very… Have… Sticks…       1      61      58
 6 Most of t…    1958 Defini… Someo… Mode… Have… Sticks…       0      50      51
 7 Most of t…    1978 Defini… Someo… Extr… Have… Compro…      73     100      70
 8 Most of t…    1951 Defini… Mitt … Not … Have… Compro…       0      70      70
 9 Most of t…    1973 Defini… Mitt … Mode… Have… Sticks…      12      50      50
10 Most of t…    1936 Defini… Barac… Extr… Have… Compro…      87      75      90
# … with 41 more variables: fthisp <int>, ftgay <int>, fttrump <int>,
#   fthrc <int>, ftsanders <int>, ftpolice <int>, ftfem <int>, ftmuslim <int>,
#   ftsci <int>, econnow <fct>, lcself <fct>, disc_b <fct>, disc_h <fct>,
#   disc_g <fct>, disc_w <fct>, disc_m <fct>, disc_fed <fct>,
#   disc_police <fct>, immig_numb <fct>, terror_worry <fct>, healthspend <fct>,
#   finwell <fct>, warmcause <fct>, freetrade <fct>, stopblack <fct>,
#   stop_ever <fct>, birthright_b <fct>, bo_muslim <fct>, amer_ident <fct>, …
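
The second argument to head() controls how many rows you get, and tail() does the same from the bottom of the dataset. Here is a small sketch using mtcars, a dataset that ships with R, so that it runs without the class file. The object names on the left are made up for illustration.

```r
# mtcars ships with R, so nothing needs to be loaded first.
first_three <- head(mtcars, 3)   # first 3 rows
last_three  <- tail(mtcars, 3)   # last 3 rows
default_six <- head(mtcars)      # with no second argument, head() returns 6 rows

first_three
last_three
nrow(default_six)
```

The same calls should work on nes, states, or world once those objects are loaded.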

If you want to see the entire thing, you can use the View() function:

# View NES
View(nes)

# View States
View(states)

# View World
View(world)
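
View() opens a spreadsheet-style viewer, so it is mainly useful in an interactive session such as RStudio. As a hedged sketch, the code below falls back to str(), which prints a compact text summary, when R is not running interactively. It uses the built-in mtcars dataset so that it runs on its own.

```r
# View() needs an interactive session; str() prints a text summary anywhere.
if (interactive()) {
  View(mtcars)
} else {
  str(mtcars)
}
```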

8.2 To view the number of rows you have in your dataset

# Number of observations (rows) for:
    #* NES
nrow(nes)
[1] 1178
    #* States
nrow(states)
[1] 50
    #* World
nrow(world)
[1] 182

8.3 To view the number of columns you have in your dataset

# Number of variables (columns) for:
    #* NES
ncol(nes)
[1] 51
    #* States
ncol(states)
[1] 42
    #* World
ncol(world)
[1] 42
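
If you want both numbers at once, dim() returns the number of rows followed by the number of columns. A minimal sketch using the built-in mtcars dataset (the names dims, n_rows, and n_cols are made up for illustration):

```r
# dim() returns c(number of rows, number of columns).
dims <- dim(mtcars)
dims

n_rows <- dims[1]   # same value as nrow(mtcars)
n_cols <- dims[2]   # same value as ncol(mtcars)
```

Calling dim(nes), dim(states), or dim(world) after loading the class data should give the same counts as the nrow() and ncol() calls above.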