By the end of this section you will:
The most common data visualisation package in RStudio is ggplot. We may install ggplot by using the
install.packages() function. We write
install.packages("ggplot2") and we call the package by using the
First of all, as most packages in RStudio the ggplot team created a cheatsheet. You may find it here ggplot Cheatsheet
If we call the
ggplot() function then we will create an empty canvas.
We start by loading our dataset entitled EVS_UK.RData
load("EVS_UK.RData") ggplot(EVS_UK) # this created an empty plot
the next step is to specify the variables we would like to use, as you know we cannot plot the whole dataset!
To specify which variables we would like to plot we have to include in the function the so called aes() section that specifies the aesthetic mappings, in other words, this section specifies how to map our variables.
Let’s start by creating a bar to see how the aes section works - you already know how to do that but this time we will give a name to our plot.
In our analysis we will use two socio-demographics variables gender and education. Gender is a dichotomous variable describing gender. Education is an ordinal variable with three levels - low, medium, and high.
In the dataset the code of the variable describing gender is
EVS_UK$v225, and the name of the variable describing education is
Before we proceed, we will give meaningful names to our variable swhile at the same time we will make sure that our new variables are categorical (factor) variable for gender- as it should be. We can easily do that by using the assignment operator \(<-\) and the
EVS_UK$gender<- factor(EVS_UK$v225, levels = c(1,2), labels = c("Men", "Women")) table(EVS_UK$gender)
## ## Men Women ## 792 996
Note: at the left side of the equation I specified the dataset at which my new variable belongs to - that is
We will do the same for the variable describing education and is currently named
Recall, Lecture 2 (Descriptive Statistics) during the lab session we used the same function to rename
In this example education is an ordinal variable with three levels- low, medium, high.
EVS_UK$education <- ordered(EVS_UK$v243_r_weight, #here you specify that this is ordered variable levels = c(1,2,3), # here you specify the values of the variable labels = c("Low", "Medium", "High")) #here you specify the names of the values
aes()is the function used tell ggplot2 which variables to use in our plot.