There are several functions in base R to pivot data. These include the t() and reshape() functions. There are also well-designed functions in the tidyverse for transposing data. Nevertheless, some users will prefer the syntax of proc_transpose. This function provides control over output column naming and an intuitive set of parameters.
To explore proc_transpose, let’s first create some sample data:
# Create input data
dat <- read.table(header = TRUE, text = '
Name Subject Semester1 Semester2
Samma Maths 96 94
Sandy English 76 51
Devesh German 76 95
Rakesh Maths 50 63
Priya English 62 80
Kranti Maths 92 92
William German 87 75')
# View data
dat
# Name Subject Semester1 Semester2
# 1 Samma Maths 96 94
# 2 Sandy English 76 51
# 3 Devesh German 76 95
# 4 Rakesh Maths 50 63
# 5 Priya English 62 80
# 6 Kranti Maths 92 92
# 7 William German 87 75
The proc_transpose function may be executed without any parameters. The default usage will transpose all numeric variables and construct generic column names for the new columns:
# No parameters
res <- proc_transpose(dat)
# View result
res
# NAME COL1 COL2 COL3 COL4 COL5 COL6 COL7
# 1 Semester1 96 76 76 50 62 92 87
# 2 Semester2 94 51 95 63 80 92 75
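For readers who know base R, the default result above can be approximated with t(). The following is a minimal sketch for comparison only, not a description of how proc_transpose is implemented; the names num, tr, and base_res are chosen here for illustration.

```r
# Sample data from above, repeated so the sketch runs on its own
dat <- read.table(header = TRUE, text = '
Name Subject Semester1 Semester2
Samma Maths 96 94
Sandy English 76 51
Devesh German 76 95
Rakesh Maths 50 63
Priya English 62 80
Kranti Maths 92 92
William German 87 75')

# Keep only the numeric columns, mirroring the default usage
num <- dat[sapply(dat, is.numeric)]

# t() returns a matrix with one row per original column
tr <- as.data.frame(t(num))

# Assign generic column names and move the row names into a NAME column
names(tr) <- paste0("COL", seq_len(ncol(tr)))
base_res <- cbind(NAME = rownames(tr), tr)
rownames(base_res) <- NULL

# View result
base_res
```

The extra steps for naming the columns and carrying the variable names along are what proc_transpose handles through its parameters.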
Notice that in the default usage, all output columns have been assigned generic names (NAME, COL1, COL2, and so on). We may control the output column names using the name, prefix, suffix, and id parameters:
# With prefix
res <- proc_transpose(dat, name = "VarName", prefix = "Student")
# View result
res
# VarName Student1 Student2 Student3 Student4 Student5 Student6 Student7
# 1 Semester1 96 76 76 50 62 92 87
# 2 Semester2 94 51 95 63 80 92 75
Here is a similar function call that shortens the prefix and adds a suffix:
# With suffix
res <- proc_transpose(dat, name = "VarName", prefix = "S", suffix = "Score")
# View result
res
# VarName S1Score S2Score S3Score S4Score S5Score S6Score S7Score
# 1 Semester1 96 76 76 50 62 92 87
# 2 Semester2 94 51 95 63 80 92 75
Note that since a variable name in R cannot start with a number, some sort of prefix is required.
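The naming rule can be checked directly in base R. As a small illustration (not part of proc_transpose), make.names() shows what R does to a name that starts with a digit, and paste0() builds names in the prefix, number, suffix pattern shown above. The variable cols is introduced here only for the example.

```r
# A bare number is not a syntactically valid name, so make.names() prepends "X"
make.names("1")

# Names built from a prefix, a sequence number, and a suffix
cols <- paste0("S", 1:7, "Score")
cols

# These names are already valid, so make.names() leaves them unchanged
identical(make.names(cols), cols)
```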
If our data contains a column with appropriate labels for the transposed columns, we can assign it to the id parameter. The id values will then be used for the new column names.
# Assign column names from data
res <- proc_transpose(dat, name = "VarName", id = "Name")
# View result
res
# VarName Samma Sandy Devesh Rakesh Priya Kranti William
# 1 Semester1 96 76 76 50 62 92 87
# 2 Semester2 94 51 95 63 80 92 75
Passing two variables to the id parameter tells the function that you want column names that combine the values of both variables:
# Two id variables
res <- proc_transpose(dat, id = c("Name", "Subject"))
res
# NAME Samma.Maths Sandy.English Devesh.German Rakesh.Maths Priya.English Kranti.Maths William.German
# 1 Semester1 96 76 76 50 62 92 87
# 2 Semester2 94 51 95 63 80 92 75
The default delimiter shown above is a dot (“.”). The delimiter may be changed with the delimiter parameter:
# Underscore delimiter
res <- proc_transpose(dat, id = c("Name", "Subject"), delimiter = "_")
res
# NAME Samma_Maths Sandy_English Devesh_German Rakesh_Maths Priya_English Kranti_Maths William_German
# 1 Semester1 96 76 76 50 62 92 87
# 2 Semester2 94 51 95 63 80 92 75
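As a rough base R analogy (an illustration only, not the function's internals), combined names of this kind can be built with paste() and its sep argument. The vectors below copy the first three rows of the sample data, and all variable names are chosen here for the example.

```r
# First three Name and Subject values from the sample data
name <- c("Samma", "Sandy", "Devesh")
subject <- c("Maths", "English", "German")

# Join the two id values with a dot, then with an underscore
dot_names <- paste(name, subject, sep = ".")
underscore_names <- paste(name, subject, sep = "_")

dot_names
underscore_names
```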
The by parameter tells the function to group by the by variable before transposing. As you can see below, the by variable is then retained on the output dataset so you can identify which rows belong to which group.
# By variable
res <- proc_transpose(dat, by = "Name", id = "Subject", name = "Semester")
# View result
res
# Name Semester German English Maths
# 1 Devesh Semester1 76 NA NA
# 2 Devesh Semester2 95 NA NA
# 3 Kranti Semester1 NA NA 92
# 4 Kranti Semester2 NA NA 92
# 5 Priya Semester1 NA 62 NA
# 6 Priya Semester2 NA 80 NA
# 7 Rakesh Semester1 NA NA 50
# 8 Rakesh Semester2 NA NA 63
# 9 Samma Semester1 NA NA 96
# 10 Samma Semester2 NA NA 94
# 11 Sandy Semester1 NA 76 NA
# 12 Sandy Semester2 NA 51 NA
# 13 William Semester1 87 NA NA
# 14 William Semester2 75 NA NA
The by parameter is a valuable feature of proc_transpose. The by variable can have a significant effect on the shape of the output data. Let’s see what happens when we use a different by variable.
# Different by variable
res <- proc_transpose(dat, by = "Subject", id = "Name")
# View result
res
# Subject NAME Sandy Priya Devesh William Samma Rakesh Kranti
# 1 English Semester1 76 62 NA NA NA NA NA
# 2 English Semester2 51 80 NA NA NA NA NA
# 3 German Semester1 NA NA 76 87 NA NA NA
# 4 German Semester2 NA NA 95 75 NA NA NA
# 5 Maths Semester1 NA NA NA NA 96 50 92
# 6 Maths Semester2 NA NA NA NA 94 63 92
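To make the grouping step concrete, here is a minimal base R sketch that splits the sample data by Subject and transposes each group separately. It is an analogy under stated assumptions, not the implementation of proc_transpose: transpose_group and pieces are hypothetical names, and the sketch returns one data frame per group instead of a single stacked result padded with NA.

```r
# Sample data from above, repeated so the sketch runs on its own
dat <- read.table(header = TRUE, text = '
Name Subject Semester1 Semester2
Samma Maths 96 94
Sandy English 76 51
Devesh German 76 95
Rakesh Maths 50 63
Priya English 62 80
Kranti Maths 92 92
William German 87 75')

# Hypothetical helper: transpose the semester columns of one group
transpose_group <- function(g) {
  tr <- t(g[, c("Semester1", "Semester2")])
  # Use the Name values of this group as the new column names
  colnames(tr) <- as.character(g$Name)
  out <- data.frame(Subject = g$Subject[1], NAME = rownames(tr), tr,
                    check.names = FALSE)
  rownames(out) <- NULL
  out
}

# Split by the grouping variable, then transpose each piece
pieces <- lapply(split(dat, dat$Subject), transpose_group)

# View the Maths group
pieces$Maths
```

Each piece contains only the students of its own group. Stacking the pieces into one data frame would additionally require filling the columns missing from each group with NA, which is the shape of the combined output above.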
Now let’s use two by variables:
# Two by variables
res <- proc_transpose(dat, by = c("Name", "Subject"))
# View results
res
# Name Subject NAME COL1
# 1 Priya English Semester1 62
# 2 Priya English Semester2 80
# 3 Sandy English Semester1 76
# 4 Sandy English Semester2 51
# 5 Devesh German Semester1 76
# 6 Devesh German Semester2 95
# 7 William German Semester1 87
# 8 William German Semester2 75
# 9 Kranti Maths Semester1 92
# 10 Kranti Maths Semester2 92
# 11 Rakesh Maths Semester1 50
# 12 Rakesh Maths Semester2 63
# 13 Samma Maths Semester1 96
# 14 Samma Maths Semester2 94
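By transposing the results of the previous example one more time, we can nearly restore the original data frame. The values match the input; the rows are ordered by Subject and Name rather than in their original order.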
# Restore original data shape
res2 <- proc_transpose(res, by = c("Name", "Subject"), id = "NAME")
# View results
res2[ , c("Name", "Subject", "Semester1", "Semester2")]
# Name Subject Semester1 Semester2
# 1 Priya English 62 80
# 2 Sandy English 76 51
# 3 Devesh German 76 95
# 4 William German 87 75
# 5 Kranti Maths 92 92
# 6 Rakesh Maths 50 63
# 7 Samma Maths 96 94