# Data Frames

In R, a data frame is a kind of object. Like vectors, data frames store data. However, they differ in that a single data frame holds multiple vectors of equal length. It is important to understand what a data frame is, because it is the most frequently used tool in the statistical analysis of political data.

If you are having a hard time visualizing a data frame, simply think of a spreadsheet. Each column of the data frame is a vector and represents a variable, and each row corresponds to an observation. Most statistical software follows the same convention: variables are stored in columns and observations in rows.
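
To make the spreadsheet picture concrete, here is a minimal sketch of a small data frame built by hand. The object name elections, the column names, and all values are invented purely for illustration; they do not come from any real dataset.

```r
# Hypothetical values, invented purely for illustration
country <- c("A", "B", "C")
turnout <- c(61.5, 72.0, 58.3)
compulsory_voting <- c(FALSE, TRUE, FALSE)

# Each vector becomes one column (variable);
# each position in the vectors becomes one row (observation)
elections <- data.frame(country, turnout, compulsory_voting)

elections          # print the whole data frame
nrow(elections)    # number of observations
ncol(elections)    # number of variables
```

Note that the three vectors have the same length. data.frame() expects this, since every row needs a value for every variable.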

You can create a data frame manually if you want, but in the age of big data you will rarely need to! R itself ships with many example datasets, so they are available in RStudio without any extra download.

Let’s have a look at one of these pre-loaded data frames. The data frame is called longley (a pre-loaded economic dataset).

Let’s use the View function to see the variables included in the dataset:

data("longley")

View(longley)
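
View opens an interactive viewer, which is convenient in RStudio but leaves nothing to read in a script or a console log. As a sketch of console-friendly alternatives, the base R functions below summarise the same data frame as text:

```r
data("longley")

dim(longley)      # number of rows, then number of columns
names(longley)    # the column (variable) names
head(longley)     # the first six rows
str(longley)      # the type and first few values of each column
```

These calls are usually enough to check what a dataset contains before working with it.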


If we want to see an individual column, in other words a specific variable in the data frame, we put the $ sign between the name of the dataset and the name of the variable (e.g. name_of_dataset$name_of_variable). Let’s start by looking at the Unemployed column.

longley$Unemployed

In addition, we often want to access only certain observations (rows) or only certain variables (columns). We subset a data frame with square brackets [ ]. Inside the brackets we give the coordinates of a row and a column. The row always comes first, followed by the column. For example, longley[7, 5] gives us the value in the 7th row and the 5th column. If we leave the column coordinate empty, we get all columns of that row: longley[7, ]. If we leave the row coordinate empty, we get all rows of that column: longley[ , 5].

longley[7,5]

Leave the column coordinate empty to see the whole 7th row

longley[7, ]

Leave the row coordinate empty to see the whole 5th column

longley[ ,5]

We can see the first ten rows of a dataset by putting a range, written with a colon, in the row position

longley[1:10, ]

# Plots

Let’s create a plot from our dataset. Let’s start with a scatterplot in which the horizontal (X) axis represents the Year and the vertical (Y) axis the Gross National Product

plot(longley$Year,longley$GNP)

To create the same plot with a line instead of dots, we add the argument type="l"

plot(longley$Year,longley$GNP,type = "l")
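
Row subsetting and plotting can also be combined. The following is a minimal sketch, not the only way to do it: first_ten is a name chosen here for illustration, and the axis labels and title are our own wording. It keeps the first ten rows of longley and plots GNP against Year with readable labels.

```r
data("longley")

# Keep only the first ten rows (all columns)
first_ten <- longley[1:10, ]

# type = "b" draws both points and connecting lines;
# xlab, ylab and main set the axis labels and the title
plot(first_ten$Year, first_ten$GNP,
     type = "b",
     xlab = "Year",
     ylab = "Gross National Product",
     main = "GNP, first ten observations of longley")
```

Setting xlab and ylab replaces the default labels, which would otherwise show the raw expressions first_ten$Year and first_ten$GNP.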