2 Working with data in R
2.1 By the end of this section, you will be able to:
- Import data into R from Excel, SPSS and CSV files
- Save data to objects
- Identify different data structures and data types
- Convert data types from one type to another
- Order, filter and group data
- Summarise data
- Create new variables or objects from data
2.2 In this section, we will use the Tidyverse set of packages
- A ‘toolkit’ of packages that are very useful for organising and manipulating data
- We will use the haven package to import SPSS files
- We will use the dplyr package to organise data
- Also includes the ggplot2 and tidyr packages, which we will use later
To install:
install.packages("tidyverse")
(See the previous section on installing packages)
2.3 Import data into R from Excel, SPSS and CSV files
We can import data from a range of sources using the
It is also possible to import data using code, for example:
# importing a .csv file
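A minimal sketch of importing by code follows. The file paths are hypothetical temporary files, created here only so that the example runs on its own, and the use of `read_csv()` (readr) and `read_sav()` (haven) is an assumption about which import functions suit this step. For Excel files, `read_excel()` from the readxl package plays the same role.

```r
library(readr)  # read_csv() / write_csv()
library(haven)  # read_sav() / write_sav()

# A tiny illustrative dataset (three rows of the happiness data used later)
example_data <- data.frame(
  participant  = c(1, 2, 3),
  intervention = c(1, 1, 1),
  happiness    = c(11.580517, 11.947034, 9.909773)
)

# Hypothetical file paths; in practice these point at your own files
csv_path  <- tempfile(fileext = ".csv")
spss_path <- tempfile(fileext = ".sav")
write_csv(example_data, csv_path)
write_sav(example_data, spss_path)

# importing a .csv file
happinessCsv <- read_csv(csv_path, show_col_types = FALSE)

# importing an SPSS (.sav) file
happinessSpss <- read_sav(spss_path)
```

In both cases the imported data are saved to an object (`happinessCsv`, `happinessSpss`) so that they can be used in later steps.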
Once the data are imported, they will be visible in the environment.
2.4 Restructuring and reorganising data in R (long versus wide data)
2.5 Understanding objects in R
In R, an object is anything that is saved to memory. For example, we might do some analysis:
mean(happiness)
However, in the example above, the result would appear in the console but not be saved anywhere. To store the result for reuse later, we save it to an object:
happinessMean <- mean(happiness)
In the above code (reading left to right):
- We name the object “happinessMean”. This name can be anything we want.
- The arrow means that the result of the code on the right will be saved to the object on the left.
- The code on the right of the arrow calculates the mean of happiness data
When this code is run, happinessMean will be stored in the environment window:
To recall an object from the environment, we can simply type its name. For example:
happinessMean
It's important to note that anything can be stored as an object in R and recalled later. This includes dataframes, the results of statistical calculations, plots, etc.
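The steps in this section can be run end to end as a short sketch. The `happiness` vector below holds only the first five happiness scores from the dataset used in this chapter, so that the example is self-contained.

```r
# The first five happiness scores from the dataset used in this section
happiness <- c(11.580517, 11.947034, 9.909773, 6.245260, 9.381039)

# Printed to the console but not saved anywhere
mean(happiness)

# Saved to an object called happinessMean
happinessMean <- mean(happiness)

# Recall the stored value by typing the object's name
happinessMean

# Objects can be reused in later calculations
happiness - happinessMean
```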
2.6 Identify different data structures and variable types
2.6.1 Data structures (sometimes referred to as “data containers”)
There are many different types of data structures that R can work with. For most people, the most common is the data frame. A data frame is what you might consider a “normal” 2-dimensional dataset, with rows of data and columns of variables.
R can also use other data structures.
A vector is a one-dimensional set of values:
A matrix is a two-dimensional set of values (rows and columns). When there are more than two dimensions, R calls the structure an array. The example below is a 3-dimensional array, with 2 groups of 2 rows and 3 columns:
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
We will primarily work with dataframes (and sometimes vectors), as this is how the data in psychology research is usually structured.
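Each of these structures can be created directly in the console. The sketch below uses made-up illustrative values and object names. Note that base R's `matrix()` creates two-dimensional structures, while `array()` handles three or more dimensions.

```r
# A vector: a one-dimensional set of values
scores <- c(11.58, 11.95, 9.91)

# A matrix: two dimensions (rows and columns)
m <- matrix(1:6, nrow = 2, ncol = 3)

# An array: here three dimensions, 2 rows x 3 columns x 2 groups
a <- array(1:12, dim = c(2, 3, 2))

# A data frame: rows of data and columns of variables
df <- data.frame(
  participant = c(1, 2, 3),
  happiness   = scores
)

dim(m)   # number of rows and columns
dim(a)   # rows, columns, groups
str(df)  # structure of the data frame
```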
2.6.2 Types of numerical data
With numerical data, there are 4 key data types:
- factor or nominal (a category or group)
- ordinal (a ranking)
- interval (scale data with no true zero point, so values can be negative)
- ratio (scale data with a true zero point, so values cannot be negative)
R can use all of these variable types:
- Nominal variables are called factors
- Ordinal variables are called ordered factors
- Interval and ratio variables are called numeric data. R stores them either as integers (whole numbers only) or as doubles (numbers that can have decimal places)
R can also use other data types that are not numerical, such as text (character) data.
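To see how R labels each of these types, we can create a small example of each and ask for its class or storage type. The variable names and values below are made up for illustration.

```r
# Nominal data: a factor
group <- factor(c("control", "treatment", "control"))

# Ordinal data: an ordered factor with an explicit level order
rating <- factor(
  c("dislike", "like", "neutral"),
  levels  = c("dislike", "neutral", "like"),
  ordered = TRUE
)

# Interval/ratio data: numeric, stored as integer or double
count  <- c(1L, 2L, 3L)        # the L suffix creates integers
height <- c(1.65, 1.80, 1.72)  # numbers are doubles by default

# Text data: character
name <- c("Ann", "Bob", "Cy")

class(group)    # "factor"
class(rating)   # "ordered" "factor"
typeof(count)   # "integer"
typeof(height)  # "double"
class(name)     # "character"
```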
2.6.3 Convert variables from one data type to another
When we first import data into R, it might not recognise the data types correctly. For example, in the data below, we can see the intervention variable:
participant | intervention | happiness |
---|---|---|
4 | 2 | 6.245260 |
7 | 2 | 8.745944 |
9 | 2 | 8.906846 |
13 | 2 | 9.199057 |
8 | 2 | 9.301780 |
5 | 2 | 9.381039 |
16 | 1 | 9.446345 |
3 | 1 | 9.909773 |
18 | 2 | 10.017880 |
17 | 2 | 10.075152 |
In the intervention variable, the numbers 1 and 2 refer to different intervention groups. Therefore, the variable is a factor. To ensure that R understands this, we can resave the intervention variable as a factor using the as.factor() function:
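A minimal sketch of this conversion, assuming the data are stored in a dataframe called `happinessSample` (the name used later in this chapter). Only the first five participants are included here so that the example is self-contained.

```r
# The first five participants, with intervention stored as numbers
happinessSample <- data.frame(
  participant  = c(1, 2, 3, 4, 5),
  intervention = c(1, 1, 1, 2, 2),
  happiness    = c(11.580517, 11.947034, 9.909773, 6.245260, 9.381039)
)

class(happinessSample$intervention)  # "numeric"

# Overwrite the column with a factor version of itself
happinessSample$intervention <- as.factor(happinessSample$intervention)

class(happinessSample$intervention)   # "factor"
levels(happinessSample$intervention)  # "1" "2"
```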
2.7 Working with dataframes
Dataframes are the more standard data format that we are used to (think of how a dataset looks in SPSS or Excel).
In a dataframe, variables are columns and each row usually represents one measurement or one participant.
2.7.1 View dataframe
To view a dataframe, we can click on it in the
2.7.2 Refer to variables (columns) in a dataframe
Columns in a dataframe are accessed using the “$” sign. For example, to access the happiness column in the happinessSample dataframe, we would type:
happinessSample$happiness
[1] 11.580517 11.947034 9.909773 6.245260 9.381039 11.515421 8.745944
[8] 9.301780 8.906846 11.011479 10.726459 11.337853 9.199057 11.120169
[15] 11.563120 9.446345 10.075152 10.017880 11.284192 12.638480
As we can see above, the result is then displayed.
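As a self-contained sketch, the example below rebuilds the first five rows of `happinessSample` and accesses the happiness column. The double square bracket form is a base R equivalent of `$`, included here only for comparison.

```r
# The first five rows of the happinessSample dataframe
happinessSample <- data.frame(
  participant  = c(1, 2, 3, 4, 5),
  intervention = c(1, 1, 1, 2, 2),
  happiness    = c(11.580517, 11.947034, 9.909773, 6.245260, 9.381039)
)

# Access a single column with $
happinessSample$happiness

# A column accessed this way is a vector, so it can be passed to functions
mean(happinessSample$happiness)

# Equivalent access using double square brackets
happinessSample[["happiness"]]
```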
2.8 Order, filter and group data
If you have the
arrange(happinessSample, happiness)
arrange(happinessSample, desc(happiness)) # Arrange in descending order
participant | intervention | happiness |
---|---|---|
4 | 2 | 6.245260 |
7 | 2 | 8.745944 |
9 | 2 | 8.906846 |
13 | 2 | 9.199057 |
8 | 2 | 9.301780 |
5 | 2 | 9.381039 |
16 | 1 | 9.446345 |
3 | 1 | 9.909773 |
18 | 2 | 10.017880 |
17 | 2 | 10.075152 |
11 | 2 | 10.726459 |
10 | 2 | 11.011479 |
14 | 1 | 11.120169 |
19 | 2 | 11.284192 |
12 | 1 | 11.337853 |
6 | 2 | 11.515421 |
15 | 2 | 11.563120 |
1 | 1 | 11.580517 |
2 | 1 | 11.947034 |
20 | 1 | 12.638480 |
participant | intervention | happiness |
---|---|---|
20 | 1 | 12.638480 |
2 | 1 | 11.947034 |
1 | 1 | 11.580517 |
15 | 2 | 11.563120 |
6 | 2 | 11.515421 |
12 | 1 | 11.337853 |
19 | 2 | 11.284192 |
14 | 1 | 11.120169 |
10 | 2 | 11.011479 |
11 | 2 | 10.726459 |
17 | 2 | 10.075152 |
18 | 2 | 10.017880 |
3 | 1 | 9.909773 |
16 | 1 | 9.446345 |
5 | 2 | 9.381039 |
8 | 2 | 9.301780 |
13 | 2 | 9.199057 |
9 | 2 | 8.906846 |
7 | 2 | 8.745944 |
4 | 2 | 6.245260 |
- Show participants with a happiness score of less than 4
- Show intervention group 2 with happiness scores above 7
participant | intervention | happiness |
---|---|---|
5 | 2 | 9.381039 |
6 | 2 | 11.515421 |
7 | 2 | 8.745944 |
8 | 2 | 9.301780 |
9 | 2 | 8.906846 |
10 | 2 | 11.011479 |
11 | 2 | 10.726459 |
13 | 2 | 9.199057 |
15 | 2 | 11.563120 |
17 | 2 | 10.075152 |
18 | 2 | 10.017880 |
19 | 2 | 11.284192 |
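The two filtering tasks above can be sketched with dplyr's `filter()` function. The dataframe is rebuilt here from the values shown in this chapter so that the example runs on its own. The object names `lowScores` and `group2High` are made up for illustration.

```r
library(dplyr)

# The full happinessSample dataset, as printed in this chapter
happinessSample <- data.frame(
  participant  = 1:20,
  intervention = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1),
  happiness    = c(11.580517, 11.947034, 9.909773, 6.245260, 9.381039,
                   11.515421, 8.745944, 9.301780, 8.906846, 11.011479,
                   10.726459, 11.337853, 9.199057, 11.120169, 11.563120,
                   9.446345, 10.075152, 10.017880, 11.284192, 12.638480)
)

# Rows with a happiness score of less than 4
# (no rows match, because the lowest score in the data is 6.245260)
lowScores <- filter(happinessSample, happiness < 4)

# Rows in intervention group 2 with a happiness score above 7
# (conditions separated by a comma must all be true)
group2High <- filter(happinessSample, intervention == 2, happiness > 7)
group2High
```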
- Group by intervention and show the mean happiness score
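A minimal sketch of grouping and summarising, assuming dplyr's `group_by()` and `summarise()` functions are used for this step. Only the first six participants are included to keep the example short, and the names `grouped`, `groupMeans` and `meanHappiness` are made up for illustration.

```r
library(dplyr)

# The first six participants from the happinessSample dataset
happinessSample <- data.frame(
  participant  = 1:6,
  intervention = c(1, 1, 1, 2, 2, 2),
  happiness    = c(11.580517, 11.947034, 9.909773,
                   6.245260, 9.381039, 11.515421)
)

# Step 1: mark the rows as belonging to groups defined by intervention
grouped <- group_by(happinessSample, intervention)

# Step 2: calculate one mean happiness score per group
groupMeans <- summarise(grouped, meanHappiness = mean(happiness))
groupMeans
```

The result has one row per intervention group, with the group's mean in the `meanHappiness` column.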
2.9 Create new variables / objects from data
To create new variables from data, we can use the mutate() function.
For example, let’s say we wanted to calculate the difference between each person’s happiness score and the mean happiness score.
We could do the following:
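One way to write this is sketched below, assuming the new column is called `difference` and is calculated as each score minus the overall mean. The dataframe is rebuilt from the values shown in this chapter so that the example is self-contained.

```r
library(dplyr)

# The full happinessSample dataset, as printed in this chapter
happinessSample <- data.frame(
  participant  = 1:20,
  intervention = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1),
  happiness    = c(11.580517, 11.947034, 9.909773, 6.245260, 9.381039,
                   11.515421, 8.745944, 9.301780, 8.906846, 11.011479,
                   10.726459, 11.337853, 9.199057, 11.120169, 11.563120,
                   9.446345, 10.075152, 10.017880, 11.284192, 12.638480)
)

# Add a column holding each score's distance from the overall mean,
# and save the result back to the same object
happinessSample <- mutate(happinessSample,
                          difference = happiness - mean(happiness))
happinessSample
```

Printing the dataframe afterwards shows the new difference column alongside the original ones: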
participant | intervention | happiness | difference |
---|---|---|---|
1 | 1 | 11.580517 | 1.2828274 |
2 | 1 | 11.947034 | 1.6493438 |
3 | 1 | 9.909773 | -0.3879166 |
4 | 2 | 6.245260 | -4.0524304 |
5 | 2 | 9.381039 | -0.9166514 |
6 | 2 | 11.515421 | 1.2177310 |
7 | 2 | 8.745944 | -1.5517460 |
8 | 2 | 9.301780 | -0.9959098 |
9 | 2 | 8.906846 | -1.3908438 |
10 | 2 | 11.011479 | 0.7137887 |
11 | 2 | 10.726459 | 0.4287693 |
12 | 1 | 11.337853 | 1.0401634 |
13 | 2 | 9.199057 | -1.0986329 |
14 | 1 | 11.120169 | 0.8224791 |
15 | 2 | 11.563120 | 1.2654296 |
16 | 1 | 9.446345 | -0.8513449 |
17 | 2 | 10.075152 | -0.2225381 |
18 | 2 | 10.017880 | -0.2798103 |
19 | 2 | 11.284192 | 0.9865019 |
20 | 1 | 12.638480 | 2.3407900 |