country gdppc
1 Afghanistan 1629.167
2 Angola 6360.849
3 Albania 9646.582
4 United Arab Emirates 56245.478
5 Argentina 18333.995
8 Loading Data
In R, we aren't just adding 2+2; we are often trying to analyze data that we've collected on countries, legislative bodies, or citizens. As a result, many of the calculations we perform in political and social scientific research are too cumbersome to do by hand.
One of the first things you need to know to engage in research is how to load a dataset, which records variables in columns and observations in rows.
A variable is a quantification of some concept. An example of a variable we often deal with in political science is the GDP (Gross Domestic Product) of a country. In this example, GDP would be the variable and each row would be a country. Another example would be individual attitudes on highway spending. The variable would be people's attitudes toward highway spending (whether they like or dislike it) and each row would be a person who responded to the survey question. The small country and gdppc (GDP per capita) table in this chapter illustrates the first example.
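As a rough sketch of this layout, the first three rows of the country and gdppc table can be typed in by hand as a data frame. The object name toy is made up for illustration; the values are copied from that table.

```r
# Each row is one observation (a country);
# each column is one variable (here, the country name and GDP per capita).
toy <- data.frame(
  country = c("Afghanistan", "Angola", "Albania"),
  gdppc   = c(1629.167, 6360.849, 9646.582)
)

print(toy)

# The column names are the variable names.
names(toy)
```

Reading across a row tells you everything recorded about one country; reading down a column shows one variable for every country.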
8.1 How to load data for this class
There are technically a couple of options, but ideally you should use the second one.
Option 1. Load the data by double-clicking on the file
Option 2. Write code to load the data
For this class only, both options will work. However, datasets come in many different forms. Sometimes data come to you as .csv files, sometimes as .dta files, sometimes in an SQL database, and so on. In these cases, you will not be able to load the data by clicking on the file. You will have to write the code.
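As a minimal sketch of what that code can look like for a .csv file, the example below first writes a tiny CSV to a temporary location so that it runs anywhere, then reads it back with base R's read.csv(). The object names csv_path and toy_csv are made up for illustration, and the commented-out .dta line assumes you have the separate haven package installed and a file with that (hypothetical) name.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(country = c("Angola", "Albania"), gdppc = c(6360.849, 9646.582)),
  csv_path,
  row.names = FALSE
)

# Read the CSV back into R as a data frame.
toy_csv <- read.csv(csv_path)
print(toy_csv)

# For a Stata .dta file, one common option is the haven package
# (installed separately; "my_data.dta" is a hypothetical file name):
# toy_dta <- haven::read_dta("my_data.dta")
```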
Even for this class, you should ideally write code to load the data. Why? You want everything you do for your analyses documented, including loading the data. One of the main reasons is to reduce the number of steps that you, or someone else, must take to run your analyses. Writing the code that loads your dataset helps with this.
To write the code to load your data, you can simply enter:
# Load my data
load("PSCI_2075_v2.1.RData")
What did that do? Running it loaded three different datasets: the nes, the world, and the states datasets. These popped up as three objects called nes, world, and states in your RStudio environment.
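If you are ever unsure what a .RData file contains, note that load() also returns the names of the objects it created. The sketch below shows this with a small stand-in file rather than the class file: toy_states, toy_world, rdata_path, and demo_env are all made-up names, and the objects are loaded into a separate environment so nothing in your workspace is overwritten.

```r
# Build two tiny stand-in datasets and save them into one .RData file.
toy_states <- data.frame(state = c("Alabama", "Alaska"), st = c("AL", "AK"))
toy_world  <- data.frame(iso3c = c("AFG", "AGO"))
rdata_path <- tempfile(fileext = ".RData")
save(toy_states, toy_world, file = rdata_path)

# Load the file into its own environment and keep the returned object names.
demo_env <- new.env()
loaded_names <- load(rdata_path, envir = demo_env)

# The names of the objects that load() created.
print(loaded_names)

# The same objects now exist inside demo_env.
ls(demo_env)
```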
You can access these the way you would any object: by referring to their names.
If you want a preview of each, you can use the head() function:
# Preview first 5 rows of:
#* NES
head(nes, 5)
# A tibble: 5 × 51
follow birthyr turno…¹ vote12 meet march compr…² ftobama ftblack ftwhite
<fct> <int> <fct> <fct> <fct> <fct> <fct> <int> <int> <int>
1 Most of th… 1960 Defini… Barac… Extr… Have… Compro… 100 100 100
2 Some of th… 1957 Probab… Not a… A li… Have… Compro… 39 6 74
3 Most of th… 1963 Defini… Mitt … Extr… Have… Sticks… 1 50 50
4 Most of th… 1980 Defini… Barac… Not … Have… Compro… 89 61 64
5 Most of th… 1974 Defini… Mitt … Very… Have… Sticks… 1 61 58
# … with 41 more variables: fthisp <int>, ftgay <int>, fttrump <int>,
# fthrc <int>, ftsanders <int>, ftpolice <int>, ftfem <int>, ftmuslim <int>,
# ftsci <int>, econnow <fct>, lcself <fct>, disc_b <fct>, disc_h <fct>,
# disc_g <fct>, disc_w <fct>, disc_m <fct>, disc_fed <fct>,
# disc_police <fct>, immig_numb <fct>, terror_worry <fct>, healthspend <fct>,
# finwell <fct>, warmcause <fct>, freetrade <fct>, stopblack <fct>,
# stop_ever <fct>, birthright_b <fct>, bo_muslim <fct>, amer_ident <fct>, …
#* States
head(states, 5)
state st raperate murderrate abort density ineq region gunfree
1 Alabama AL 31.9 6.9 16 86.17970 46.01423 South 0.444921
2 Alaska AK 73.3 3.1 15 1.08848 34.18490 West 0.932850
3 Arizona AZ 32.0 5.4 20 42.12770 49.00724 West 0.969988
4 Arkansas AR 47.3 6.2 11 49.10960 46.83973 South 0.520867
5 California CA 23.6 5.3 33 211.97700 51.47233 West -3.270630
alcfree mjfree marrfree freedom knowgov evangel poptotal stuspend
1 -0.024292 -0.054232 -0.009304 21.6063 63.76 42.8 4700000 5273
2 -0.003965 0.076145 -0.010426 28.5846 68.44 18.7 686293 8599
3 0.016018 0.015321 -0.009304 32.4745 52.10 18.1 6500000 4785
4 0.005779 -0.012482 -0.009304 -5.7844 67.08 39.9 2900000 5140
5 0.015981 0.043902 0.041186 -85.7562 88.31 11.5 37000000 5685
ptratio hsdiploma democrat pid house senate inc minwage year
1 15.77 77.5 30.54249 -0.033898 -1 0.976 34650 7.25 2016
2 16.29 90.4 26.38180 -0.350000 0 0.638 45529 7.75 2016
3 20.75 85.1 28.86411 -0.102564 0 0.576 35875 7.90 2016
4 12.90 81.7 27.68069 0.084507 0 0.816 34014 6.25 2016
5 19.80 81.2 37.08743 0.179825 -1 -1.395 44481 9.00 2016
polscore newimmig popover65 percwom medinc turnout margin co2 femleg
1 -0.073118 4063 657792 75.49 42590 0.59 59.5446 130 11.4
2 0.036343 1799 54938 79.02 57431 0.59 52.4528 43 18.3
3 -0.058591 20333 881831 84.00 48621 0.53 37.1173 88 33.3
4 -0.022314 2874 419981 88.52 41302 0.51 78.2980 62 17.0
5 -0.088499 210591 4200000 89.94 53367 0.55 35.9504 368 30.8
corrupt infant trumpwin weed death stand obamawin
1 23 9.53 1 0 1 1 0
2 NA 5.93 1 1 0 1 1
3 18 6.85 1 1 1 1 0
4 8 7.84 1 1 1 0 0
5 52 5.32 0 1 1 0 1
#* World
head(world, 5)
iso3c fdi nourish aid oil homicide military infemale
1 AFG 0.3400968 24.7 40.45103119 0.002378189 3.5 1.8974728 70.7
2 AGO -3.9131508 20.7 0.31614304 38.618626728 NA 4.2448843 100.6
3 ALB 9.1340709 NA 3.09176990 1.530367571 4.4 1.5585919 13.2
4 ARE 3.0752631 5.0 NA 20.177806945 0.8 6.1194678 6.5
5 ARG 2.6751617 5.0 0.03136884 2.129501789 NA 0.8148781 11.6
inf infmale co2 trade health gdppc womleg gtbeduc
1 75.1 79.3 0.5315226 54.96733 9.197723 1629.167 27.7 0.62951
2 109.6 117.9 0.3523340 105.33771 3.391146 6360.849 38.6 0.78801
3 14.8 16.4 0.3855484 85.46456 5.335035 9646.582 16.4 0.98444
4 7.3 8.0 0.5621849 151.00043 3.929452 56245.478 22.5 NA
5 13.0 14.4 0.4435952 34.97101 6.550156 18333.995 38.5 1.03598
region country cid imfcode politycode bankscode
1 South Asia Afghanistan 700 512 700 10
2 Sub-Saharan Africa Angola 540 614 540 35
3 Europe Albania 339 914 339 20
4 ME and North Africa United Arab Emirates 696 466 696 NA
5 Latin America Argentina 160 213 160 40
dpicode aclpregion epost regime
1 AFG Eastern Europe/Soviet Union president Civilian Dictatorship
2 AGO Sub-Saharan Africa president Civilian Dictatorship
3 ALB Eastern Europe/Soviet Union prime minister Parliamentary Democracy
4 ARE Oil States president Royal Dictatorship
5 ARG Latin America president Presidential Democracy
stra fhprights fhliberties personal lifeexp polity2 durable rgdpe pwtpop
1 0 5 6 0 <NA> NA 0 NA NA
2 0 6 5 0 <NA> -2 11 87251.8 18.0380
3 0 3 3 0 <NA> 9 11 21701.8 3.1814
4 0 6 5 0 <NA> -8 37 NA NA
5 5 2 2 0 <NA> 8 25 524470.0 39.7143
pwthc turnout colony womyear urban young ethfrac
1 NA 45.83 UK NA 23.28 NA 0.7693
2 NA 62.77 Portugal 1975 53.96 46.32196 0.7867
3 2.99982 53.31 Soviet Union 1920 46.14 26.35428 0.2204
4 NA NA <NA> NA NA NA NA
5 2.79697 81.07 Spain 1947 90.26 26.11798 0.2550
# Preview first 10 rows of NES:
head(nes, 10)
# A tibble: 10 × 51
follow birthyr turno…¹ vote12 meet march compr…² ftobama ftblack ftwhite
<fct> <int> <fct> <fct> <fct> <fct> <fct> <int> <int> <int>
1 Most of t… 1960 Defini… Barac… Extr… Have… Compro… 100 100 100
2 Some of t… 1957 Probab… Not a… A li… Have… Compro… 39 6 74
3 Most of t… 1963 Defini… Mitt … Extr… Have… Sticks… 1 50 50
4 Most of t… 1980 Defini… Barac… Not … Have… Compro… 89 61 64
5 Most of t… 1974 Defini… Mitt … Very… Have… Sticks… 1 61 58
6 Most of t… 1958 Defini… Someo… Mode… Have… Sticks… 0 50 51
7 Most of t… 1978 Defini… Someo… Extr… Have… Compro… 73 100 70
8 Most of t… 1951 Defini… Mitt … Not … Have… Compro… 0 70 70
9 Most of t… 1973 Defini… Mitt … Mode… Have… Sticks… 12 50 50
10 Most of t… 1936 Defini… Barac… Extr… Have… Compro… 87 75 90
# … with 41 more variables: fthisp <int>, ftgay <int>, fttrump <int>,
# fthrc <int>, ftsanders <int>, ftpolice <int>, ftfem <int>, ftmuslim <int>,
# ftsci <int>, econnow <fct>, lcself <fct>, disc_b <fct>, disc_h <fct>,
# disc_g <fct>, disc_w <fct>, disc_m <fct>, disc_fed <fct>,
# disc_police <fct>, immig_numb <fct>, terror_worry <fct>, healthspend <fct>,
# finwell <fct>, warmcause <fct>, freetrade <fct>, stopblack <fct>,
# stop_ever <fct>, birthright_b <fct>, bo_muslim <fct>, amer_ident <fct>, …
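The second argument to head() is the number of rows to show. As a small sketch of how that argument behaves, the example below uses mtcars, a practice dataset that ships with R, instead of the class data, so it runs without loading any file. It also shows tail(), which previews the last rows rather than the first.

```r
# With no second argument, head() returns the first 6 rows.
nrow(head(mtcars))        # 6

# Ask for a different number of rows with the second argument.
nrow(head(mtcars, 10))    # 10

# tail() works the same way but starts from the bottom of the dataset.
tail(mtcars, 3)
```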
If you want to see the entire thing, you can use the View() function:
# View NES
View(nes)
# View States
View(states)
# View World
View(world)
8.2 To view the number of rows you have in your dataset
# Number of observations (rows) for:
#* NES
nrow(nes)
[1] 1178
#* States
nrow(states)
[1] 50
#* World
nrow(world)
[1] 182
8.3 To view the number of columns you have in your dataset
# Number of variables (columns) for:
#* NES
ncol(nes)
[1] 51
#* States
ncol(states)
[1] 42
#* World
ncol(world)
[1] 42
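If you want both numbers at once, base R's dim() returns the number of rows followed by the number of columns. The sketch below uses a small made-up data frame called toy so that it runs without the class file.

```r
# A stand-in dataset with 3 rows (observations) and 2 columns (variables).
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  st    = c("AL", "AK", "AZ")
)

# dim() reports rows first, then columns.
dim(toy)    # 3 2

# The same two numbers, one at a time.
nrow(toy)   # 3
ncol(toy)   # 2
```

With the class data loaded, dim(states) should report the same two numbers that nrow(states) and ncol(states) give separately.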