Topic 6 Graphing and data visualisation with R
6.3 By the end of this section, you will be able to:
- Describe the ggplot “grammar of visualisation”: coordinates and geoms
- Write a graph function to display multiple variables on a plot
- Amend the titles and legends of a plot
- Save plots in PDF or image formats
6.4 The “grammar of visualisation”
- Graphs are made up of 3 components:
- A dataset
- A coordinate system
- Visual marks to represent data (geoms)
The “grammar of visualisation” #2
- In the above example, the dataset is the studentData that we used previously.
- The grades variable is mapped to the X axis
- The hoursOfStudy variable is mapped to the Y axis
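The three components can be inspected on a plot object. The sketch below is illustrative only: the real studentData is not reproduced in these notes, so a small made-up stand-in with the same column names is assumed.

```r
library(ggplot2)

# Made-up stand-in for studentData (values are illustrative assumptions)
studentData <- data.frame(
  grades       = c(45, 52, 61, 68, 74, 80),
  hoursOfStudy = c(4, 5, 8, 10, 12, 15)
)

p <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_point()

# 1. The dataset attached to the plot
head(p$data)

# 2. The coordinate system (Cartesian unless we change it)
class(p$coordinates)[1]

# 3. The geom used by the first (and only) layer
class(p$layers[[1]]$geom)[1]

# The data that are actually drawn: x holds grades, y holds hoursOfStudy
drawn <- layer_data(p)
all(drawn$x == studentData$grades)
all(drawn$y == studentData$hoursOfStudy)
```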
6.5 How to code a graph
The graph is created using the following code:
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) + geom_point()
In this code, we specify the dataset, the variables for the X and Y axes, and the geom that represents the data visually (in this case, each datum is a point).
6.6 The graph output
library(ggplot2)
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) + geom_point()
6.7 Changing the geoms leads to different visualisations
- If we change from points to lines, for example, we get a different plot:
library(ggplot2)
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) + geom_line()
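Geoms can also be layered on the same plot by adding them one after another. As a minimal sketch, again using a made-up stand-in for studentData (an assumption, since the real data are not shown here), the code below draws a line and then points on top of it. Note that geom_line() joins observations in order of their X values, not in the order they appear in the data.

```r
library(ggplot2)

# Made-up stand-in for studentData, deliberately not sorted by grades
studentData <- data.frame(
  grades       = c(68, 45, 80, 52, 74, 61),
  hoursOfStudy = c(10, 4, 15, 5, 12, 8)
)

# Each geom added with + becomes a separate layer, drawn in order
linePlot <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_line() +
  geom_point()

linePlot

# The line layer is drawn in increasing order of X
lineData <- layer_data(linePlot, 1)
lineData$x
```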
6.8 It is possible to represent more variables on the plot
- If we map the colour of the points to the route variable, the data are colour-coded by route
library(ggplot2)
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) + geom_point(aes(color = route))
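It may help to contrast mapping a colour to a variable with setting a fixed colour. In the sketch below, the data frame and the route names ("Route A", "Route B") are made-up assumptions standing in for the real studentData. Placing color inside aes() ties it to a variable; placing it outside aes() applies one colour to every point.

```r
library(ggplot2)

# Made-up stand-in for studentData (route names are hypothetical)
studentData <- data.frame(
  grades       = c(45, 52, 61, 68, 74, 80),
  hoursOfStudy = c(4, 5, 8, 10, 12, 15),
  route        = c("Route A", "Route B", "Route A", "Route B", "Route A", "Route B")
)

# Mapped: colour depends on route, and a legend is drawn
mappedPlot <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_point(aes(color = route))

# Set: every point is blue, and no legend is needed
fixedPlot <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_point(color = "blue")

# Colours actually used for drawing in each plot
mappedColours <- unique(layer_data(mappedPlot)$colour)
fixedColours  <- unique(layer_data(fixedPlot)$colour)
```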
6.9 It is possible to represent more variables on the plot #2
- If we map the size of the points to the satisfactionLevel variable, the size of each point varies with the student's satisfaction level
library(ggplot2)
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) + geom_point(aes(color = route, size=satisfactionLevel))
## Warning: Using size for a discrete variable is
## not advised.
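The warning appears because size is mapped to a discrete variable (a factor or character column), whereas size suggests a quantity. One possible way to avoid it is to map a numeric version of the variable instead. The sketch below assumes that satisfactionLevel is stored as a factor with the labels "1" to "5"; that coding is a guess for illustration, and satisfactionScore is a hypothetical new column.

```r
library(ggplot2)

# Made-up stand-in for studentData. Storing satisfactionLevel as a factor with
# labels "1" to "5" is an assumption about how the real data are coded.
studentData <- data.frame(
  grades            = c(45, 52, 61, 68, 74, 80),
  hoursOfStudy      = c(4, 5, 8, 10, 12, 15),
  route             = c("Route A", "Route B", "Route A", "Route B", "Route A", "Route B"),
  satisfactionLevel = factor(c("1", "3", "5", "2", "4", "3"))
)

# Convert via as.character() so we get the labels, not the internal level codes
studentData$satisfactionScore <-
  as.numeric(as.character(studentData$satisfactionLevel))

# Size is now mapped to a numeric variable, so a continuous size scale is used
sizePlot <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_point(aes(color = route, size = satisfactionScore))

sizePlot
```

If the levels were text labels such as "low" and "high", this conversion would not apply, and mapping the variable to colour or shape may be a better choice.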
6.10 It is possible to represent more variables on the plot #3
- If we map the shape of the points to the hasDepdendants variable (spelled as in the code below), the shape of each point changes accordingly
library(ggplot2)
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) + geom_point(aes(color = route, size=satisfactionLevel, shape=hasDepdendants))
## Warning: Using size for a discrete variable is
## not advised.
6.11 Plotting summaries of data
- We can summarise the data (e.g. get the mean or standard deviation) using the stat_summary() function
- Below we are making a bar chart with the mean grade for each route
## fun replaces the fun.y argument, which is deprecated as of ggplot2 3.3.0
ggplot(data=studentData, aes(x=route, y= grades, fill=route)) + stat_summary(fun = "mean", geom = "bar")
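To check what the bars show, the same means can be computed directly and compared with the bar heights. This is a sketch under assumptions: the data frame below is a made-up stand-in for studentData with hypothetical route names, and the fun argument requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Made-up stand-in for studentData (route names and grades are hypothetical)
studentData <- data.frame(
  route  = c("Route A", "Route A", "Route A", "Route B", "Route B", "Route B"),
  grades = c(50, 60, 70, 55, 65, 75)
)

# Bar chart of the mean grade for each route
meanPlot <- ggplot(data = studentData, aes(x = route, y = grades, fill = route)) +
  stat_summary(fun = mean, geom = "bar")

# The same means computed with base R
routeMeans <- aggregate(grades ~ route, data = studentData, FUN = mean)
routeMeans

# Bar heights that ggplot2 computed for the plot
barHeights <- layer_data(meanPlot)$y
barHeights
```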
6.12 Changing the axis labels and title on a plot
We can change the axis labels and title using the labs() command:
labs(x="Student Grade", y="Hours of Study", title = "Scatterplot of student data")
library(ggplot2)
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) +
geom_point(aes(color = route, size=satisfactionLevel, shape=hasDepdendants)) +
labs(x="Student Grade", y="Hours of Study", title = "Scatterplot of student data")
## Warning: Using size for a discrete variable is
## not advised.
6.13 Changing the legend on a plot
To change the legend titles, we also use the labs() command, naming the relevant aesthetic (e.g. color, size, shape):
labs(x="Student Grade", y="Hours of Study", title = "Scatterplot of student data", color="Route of study", size="Satisfaction level", shape="Has dependents?")
library(ggplot2)
ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) +
geom_point(aes(color = route, size=satisfactionLevel, shape=hasDepdendants)) +
labs(x="Student Grade", y="Hours of Study", title = "Scatterplot of student data", color="Route of study", size="Satisfaction level", shape="Has dependents?")
## Warning: Using size for a discrete variable is
## not advised.
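ggplot2 accepts both the "color" and "colour" spellings, in aes() and in labs(), and treats them as the same aesthetic. The following sketch mixes the two spellings on purpose; the data frame is a made-up stand-in for studentData with hypothetical route names.

```r
library(ggplot2)

# Made-up stand-in for studentData (route names are hypothetical)
studentData <- data.frame(
  grades       = c(45, 52, 61, 68, 74, 80),
  hoursOfStudy = c(4, 5, 8, 10, 12, 15),
  route        = c("Route A", "Route B", "Route A", "Route B", "Route A", "Route B")
)

legendPlot <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_point(aes(colour = route)) +           # British spelling in aes()
  labs(x = "Student Grade",
       y = "Hours of Study",
       title = "Scatterplot of student data",
       color = "Route of study")              # American spelling in labs()

legendPlot

# The legend title is stored under the name "colour" whichever spelling is used
legendPlot$labels$colour
```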
6.14 Storing plots to be recalled later
- Plots can be assigned to objects in R and recalled later, just like any other piece of data
library(ggplot2)
## Create plot and store it as "myPlot" object
myPlot <- ggplot(data=studentData, aes(x=grades,y=hoursOfStudy)) +
geom_point(aes(color = route, size=satisfactionLevel, shape=hasDepdendants)) +
labs(x="Student Grade", y="Hours of Study", title = "Scatterplot of student data", color="Route of study", size="Satisfaction level", shape="Has dependents?")
6.15 Recalling a stored plot
## Recall myPlot
myPlot
## Warning: Using size for a discrete variable is
## not advised.
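A stored plot can also be extended later by adding to it, without retyping the original code. The sketch below uses a made-up stand-in for studentData (an assumption), and myPlotWithTitle is a hypothetical name chosen for illustration.

```r
library(ggplot2)

# Made-up stand-in for studentData (route names are hypothetical)
studentData <- data.frame(
  grades       = c(45, 52, 61, 68, 74, 80),
  hoursOfStudy = c(4, 5, 8, 10, 12, 15),
  route        = c("Route A", "Route B", "Route A", "Route B", "Route A", "Route B")
)

## Create plot and store it as "myPlot" object
myPlot <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_point(aes(color = route))

## Build on the stored plot; myPlot itself is left unchanged
myPlotWithTitle <- myPlot +
  labs(title = "Scatterplot of student data")

## Recall either version by name
myPlot
myPlotWithTitle
```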
6.17 Plots can also be saved using code
- You might want to include code to save your plot in a script, for example
- This can allow greater control over the output file and plot dimensions:
ggsave(filename="myPlot.pdf", plot=myPlot, width = 4, height = 4)
## Warning: Using size for a discrete variable is
## not advised.
ggsave(filename="myPlot.png", plot=myPlot, width = 4, height = 4, units="cm", dpi=320)
## Warning: Using size for a discrete variable is
## not advised.
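When no units are given, ggsave() reads width and height as inches, so the PDF above is 4 x 4 inches, while the PNG is 4 x 4 cm (roughly 500 pixels on each side at 320 dpi). As a minimal sketch, the code below saves a plot to R's temporary directory and confirms that the file was written. The data frame is a made-up stand-in for studentData, and outFile is a hypothetical name.

```r
library(ggplot2)

# Made-up stand-in for studentData (values are illustrative assumptions)
studentData <- data.frame(
  grades       = c(45, 52, 61, 68, 74, 80),
  hoursOfStudy = c(4, 5, 8, 10, 12, 15)
)

myPlot <- ggplot(data = studentData, aes(x = grades, y = hoursOfStudy)) +
  geom_point()

# Save into the temporary directory so no project files are overwritten
outFile <- file.path(tempdir(), "myPlot.pdf")

# The file extension selects the output format; units makes the size explicit
ggsave(filename = outFile, plot = myPlot, width = 10, height = 8, units = "cm")

# Confirm that the file exists
file.exists(outFile)
```

In a real script, a path inside the project folder would normally replace tempdir().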