In R a data frame is a kind of object. Like vectors, data frames store data. However, data frames are differ in that they store multiple vectors. It is important that you understand what a data frame is as it is the most frequently used tool in political statistical analysis.
If you are having a hard time visualizing a dataframe simply think of what a spreadsheet looks like. Each column of the dataframe can be said to be vector, each vector represents a variable and the rows coincide with an observation. In all statistical software variables are represented by columns and observations are by rows.
You may create a data frame manually if you want but living in the age of big data this is rarely the case! There are many example datasets pre-loaded in RStudio.
Let’s have a look at one of these pre-loaded data frames. The data frame is called longley (this is an pre-loaded economic dataset)
View function let’s see the variables included in the dataset
If we want to see individuals columns, in other words, a specific variable in the data frame, then we use the $ sign between the name of the dataset and the name of the variable (e.g name_of_dataset$name_of_variable). Let’s start by observing the Unemployment column.
In addition, often we want to access only certain observations (rows) or only certain variables (columns). By using the square brackets [ ] we subset the data frame. In the square brackets, we insert the coordinates for a row and a column. The row is always first followed by the column. For example, longey[7, 5] gives us the 7th row and the 5th column. If we leave the column coordinate empty then we want to see all columns longey[7, ]. If we leave the row coordinate empty then we want all columns.
Leave the column coordinate empty to see the 7th row
Leave the row empty to see the 5th column
We may see the first ten rows of a dataset by adding a colon in the brackets
Let’s create a plot from our dataset. Let’s start by creating a scatterplot with the one axis (X) representing the Year and the other (Y) axis the Gross National Product
to create the same plot but by using a line instead of dots we add the argument
plot(longley$Year,longley$GNP,type = "l")
title() function, to give labels to the axes, and a title to your plot.
The examples in the help are particularly informative.