5.2 Managing vectors

The most basic data structure in R is the vector. A vector is a sequence of data elements of the same basic type: numeric (integer or double), logical, or character. There are two additional vector types, complex and raw, which I will not discuss. This tutorial covers the basics of managing vectors.

5.2.1 Creating vectors

There are four main ways to create a vector: the colon operator :, c(), seq(), and rep(). The colon operator creates a vector of values between two specified numbers, increasing (or decreasing) in steps of 1. The c() function creates a vector by concatenating elements together:

# integer vector
w1 <- 8:17
w1
 [1]  8  9 10 11 12 13 14 15 16 17

# a non-integer starting value produces a double vector
w2 <- 8.35:20
w2
 [1]  8.35  9.35 10.35 11.35 12.35 13.35 14.35 15.35 16.35 17.35 18.35
[12] 19.35

# double vector
x <- c(0.5, 0.6, 0.2)
x
[1] 0.5 0.6 0.2

# logical vector
y1 <- c(TRUE, FALSE, FALSE)
y1
[1]  TRUE FALSE FALSE

# logical vector in shorthand (T and F are ordinary variables that can be
# reassigned, so TRUE and FALSE are safer in scripts)
y2 <- c(T, F, F)
y2
[1]  TRUE FALSE FALSE

# Character vector
z <- c("a", "b", "c") 
z
[1] "a" "b" "c"
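To confirm which type each of these approaches produces, typeof() can be called on the result. The following is a small illustrative sketch; the variable names ints and dbls are placeholders and not part of the examples above.

```r
# typeof() reports the underlying storage type of a vector
ints <- 8:17            # colon operator with whole-number endpoints
dbls <- 8.35:20         # a fractional start yields doubles
typeof(ints)            # "integer"
typeof(dbls)            # "double"
typeof(c(0.5, 0.6))     # "double"
typeof(c(TRUE, FALSE))  # "logical"
typeof(c("a", "b"))     # "character"

# whole numbers typed without the L suffix are stored as doubles
typeof(c(1, 2, 3))      # "double"
typeof(c(1L, 2L, 3L))   # "integer"
```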

The seq() function generates a vector sequence of numbers (or dates) with a specified arithmetic progression. The rep() function allows us to conveniently repeat specified constants into long vectors in a collated or non-collated manner.

# generate a sequence of numbers from 1 to 21 by increments of 2
seq(from = 1, to = 21, by = 2)             
 [1]  1  3  5  7  9 11 13 15 17 19 21

# generate a sequence of 15 equally spaced numbers from 0 to 21
seq(0, 21, length.out = 15)
 [1]  0.0  1.5  3.0  4.5  6.0  7.5  9.0 10.5 12.0 13.5 15.0 16.5 18.0 19.5
[15] 21.0

# replicates the values 1:4 a specified number of times in a collated fashion
rep(1:4, times = 2)   
[1] 1 2 3 4 1 2 3 4

# replicates each value of 1:4 a specified number of times (uncollated)
rep(1:4, each = 2)    
[1] 1 1 2 2 3 3 4 4
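The text mentions that seq() also handles dates, and rep() accepts each and times together. The sketch below illustrates both, along with seq_len(); the start date and variable names are arbitrary choices made for illustration.

```r
# seq() also works on Date objects; `by` accepts units such as "month"
start <- as.Date("2020-01-01")
monthly <- seq(start, by = "month", length.out = 3)
format(monthly)
# "2020-01-01" "2020-02-01" "2020-03-01"

# `each` and `times` can be combined: each is applied first, then times
combined <- rep(1:2, each = 2, times = 2)
combined
# 1 1 2 2 1 1 2 2

# seq_len() builds 1..n and returns an empty vector when n is 0
seq_len(3)   # 1 2 3
seq_len(0)   # integer(0)
```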

You can also use the as.vector() function to initialize vectors or change the vector type:

v <- as.vector(8:17)
v
 [1]  8  9 10 11 12 13 14 15 16 17

# turn numeric vector to character
as.vector(v, mode = "character")
 [1] "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17"

Vectors are atomic, meaning that every element of a vector must be the same type. Combining objects of different atomic modes coerces the “lower” mode elements to the “highest” mode present, according to the hierarchy raw \(\rightarrow\) logical \(\rightarrow\) integer \(\rightarrow\) numeric \(\rightarrow\) complex \(\rightarrow\) character:

# numerics are turned to characters
str(c("a", "b", "c", 1, 2, 3))
 chr [1:6] "a" "b" "c" "1" "2" "3"

# logicals are turned to numerics...
str(c(1, 2, 3, TRUE, FALSE))
 num [1:5] 1 2 3 1 0

# or character
str(c("A", "B", "C", TRUE, FALSE))
 chr [1:5] "A" "B" "C" "TRUE" "FALSE"
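One practical consequence of logical-to-numeric coercion is that TRUE values can be counted with sum() and averaged with mean(). A minimal sketch, using made-up vectors flags and scores:

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)
sum(flags)    # 3: number of TRUE values
mean(flags)   # 0.75: proportion of TRUE values

# the same idea applied to the result of a comparison
scores <- c(4, 9, 12, 15)
sum(scores > 10)   # 2
```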

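Likewise, operators and functions applied to objects of different atomic modes coerce the “lower” mode element to the mode of the “highest” element. In the first three comparisons below the number is converted to a character string and the two strings are compared as text (the exact ordering depends on the locale); in the last one FALSE is converted to 0: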

2 < "george"
[1] TRUE

2 > "george"
[1] FALSE
 
-2 < "-3"
[1] TRUE

-2 < FALSE
[1] TRUE

Objects can also be coerced explicitly with the as.*() functions. This works from a lower atomic mode to a higher one and, when the value allows it, from a higher mode to a lower one:

as.complex(-2)
[1] -2+0i

as.character(-2)
[1] "-2"

as.logical(-2) ### any nonzero number returns TRUE; only 0 returns FALSE
[1] TRUE

as.numeric('2')
[1] 2
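Explicit coercion does not always succeed. The sketch below shows a few cases worth knowing; the input values are invented for illustration, and suppressWarnings() is used only to keep the output quiet.

```r
# values that cannot be converted become NA (normally with a warning)
bad <- suppressWarnings(as.numeric(c("2", "two", "3.5")))
bad          # 2.0  NA 3.5
is.na(bad)   # FALSE  TRUE FALSE

# converting double to integer truncates toward zero rather than rounding
as.integer(c(2.9, -2.9))   # 2 -2

# as.logical() recognises a few strings; anything else becomes NA
as.logical(c("TRUE", "true", "T", "yes"))   # TRUE TRUE TRUE NA
```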

5.2.2 Adding on to Vectors

To add elements onto an existing vector, we can continue to leverage the c() function. Also, note that vectors are always flat so nested c() functions will not add additional dimensions to the vector:

v1 <- 8:17

c(v1, 18:22)
 [1]  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22

# same as
c(v1, c(18, c(19, c(20, c(21:22)))))
 [1]  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
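When new elements need to go somewhere other than the end, base R's append() function takes an after argument. A short sketch, using a throwaway vector base_vec:

```r
base_vec <- 8:12

# by default append() adds to the end, like c()
append(base_vec, 13:14)            # 8 9 10 11 12 13 14

# `after` sets the position after which the new values are inserted
append(base_vec, 99L, after = 2)   # 8 9 99 10 11 12

# after = 0 inserts at the front
append(base_vec, 0L, after = 0)    # 0 8 9 10 11 12
```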

5.2.3 Adding Attributes to Vectors

The attributes that you can add to vectors include names and comments. If we continue with our vector v1 we can see that the vector currently has no attributes:

attributes(v1)
NULL

We can add names to vectors using two approaches. The first uses names() to assign names to each element of the vector. The second approach is to assign names when creating the vector.

# assigning names to a pre-existing vector
names(v1) <- letters[seq_along(v1)]
v1
 a  b  c  d  e  f  g  h  i  j 
 8  9 10 11 12 13 14 15 16 17 

attributes(v1)
$names
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

# adding names when creating vectors
v2 <- c(name1 = 1, name2 = 2, name3 = 3)
v2
name1 name2 name3 
    1     2     3 

attributes(v2)
$names
[1] "name1" "name2" "name3"

We can also add comments to vectors to act as a note to the user. This does not change how the vector behaves; rather, it simply acts as a form of metadata for the vector.

comment(v1) <- "This is a comment on a vector"
v1
 a  b  c  d  e  f  g  h  i  j 
 8  9 10 11 12 13 14 15 16 17 

attributes(v1)
$names
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

$comment
[1] "This is a comment on a vector"
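Attributes can be read back and removed as well as added. The following sketch uses a small made-up vector, scores, so that v1 is left untouched.

```r
scores <- c(a = 8, b = 9, c = 10)
comment(scores) <- "illustrative scores"

# read individual attributes back
names(scores)     # "a" "b" "c"
comment(scores)   # "illustrative scores"

# unname() returns a copy without names; the original is unchanged
unname(scores)    # 8 9 10

# assigning NULL removes an attribute in place
names(scores) <- NULL
comment(scores) <- NULL
attributes(scores)   # NULL
```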

5.2.4 Subsetting Vectors

There are four main ways to subset a vector with square brackets [ ]: with positive integers, with negative integers, with logical values, or with names.

You can also subset with double brackets [[ ]] to simplify the result.

5.2.4.1 Subsetting with positive integers

Subsetting with positive integers returns the elements at the specified positions:

v1
 a  b  c  d  e  f  g  h  i  j 
 8  9 10 11 12 13 14 15 16 17 

v1[2]
b 
9 

v1[2:4]
 b  c  d 
 9 10 11 

v1[c(2, 4, 6, 8)]
 b  d  f  h 
 9 11 13 15 

# note that you can duplicate index positions
v1[c(2, 2, 4)]
 b  b  d 
 9  9 11

5.2.4.2 Subsetting with negative integers

Subsetting with negative integers will omit the elements at the specified positions:

v1[-1]
 b  c  d  e  f  g  h  i  j 
 9 10 11 12 13 14 15 16 17 

v1[-c(2, 4, 6, 8)]
 a  c  e  g  i  j 
 8 10 12 14 16 17
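A few edge cases of integer subsetting are easy to trip over. The sketch below uses a small hypothetical vector, vals, and wraps the failing call in tryCatch() so the script keeps running.

```r
vals <- c(a = 8, b = 9, c = 10)

# a position past the end returns NA rather than an error
vals[5]

# index 0 selects nothing and returns an empty vector
length(vals[0])   # 0

# positive and negative positions cannot be mixed in one call
mixed <- tryCatch(vals[c(-1, 2)], error = function(e) "error")
mixed             # "error"
```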

5.2.4.3 Subsetting with logical values

Subsetting with logical values will select the elements where the corresponding logical value is TRUE:

v1[c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)]
 a  c  e  f  g  j 
 8 10 12 13 14 17 

v1[v1 < 12]
 a  b  c  d 
 8  9 10 11 

v1[v1 < 12 | v1 > 15]
 a  b  c  d  i  j 
 8  9 10 11 16 17 

# if logical vector is shorter than the length of the vector being
# subsetted, it will be recycled to be the same length
v1[c(TRUE, FALSE)]
 a  c  e  g  i 
 8 10 12 14 16
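Missing values need some care with logical subsetting, because a comparison involving NA is itself NA. A minimal sketch with an invented vector, vals, showing two common ways to handle this:

```r
vals <- c(8, NA, 12, 15)

# a comparison with NA yields NA, and NA in a logical index returns NA
vals[vals > 10]          # NA 12 15

# which() returns the positions of TRUE values and drops NA
which(vals > 10)         # 3 4
vals[which(vals > 10)]   # 12 15

# alternatively, exclude missing values explicitly
vals[!is.na(vals) & vals > 10]   # 12 15
```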

5.2.4.4 Subsetting with names

Subsetting with names will return the elements with the matching names specified:

v1["b"]
b 
9 

v1[c("a", "c", "h")]
 a  c  h 
 8 10 15
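Asking for a name that does not exist is not an error; it returns NA. The sketch below, again using a hypothetical vals vector, shows this and one way to keep only the names that are present.

```r
vals <- c(a = 8, b = 9, c = 10)

# a name that does not exist returns NA instead of raising an error
missing_name <- vals[c("a", "z")]
missing_name   # 8 for "a", NA for the unknown name

# matching against names() with %in% keeps only names that are present
present <- vals[names(vals) %in% c("a", "z")]
present        # 8, named "a"
```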

5.2.5 Simplifying vs. Preserving

It’s also important to understand the difference between simplifying and preserving when subsetting. A simplifying subset returns the simplest possible data structure that can represent the output. A preserving subset keeps the structure of the output the same as the input.¹⁰

For vectors, subsetting with single brackets [ ] preserves while subsetting with double brackets [[ ]] simplifies. The change you will notice when simplifying vectors is the removal of names.

v1[1]
a 
8 

v1[[1]]
[1] 8
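Besides dropping names, [[ ]] is stricter than [ ]: it must select exactly one existing element. A short sketch on a made-up vector, vals, with the failing call wrapped in tryCatch():

```r
vals <- c(a = 8, b = 9, c = 10)

# [ ] tolerates a missing position and returns NA
vals[4]

# [[ ]] requires one existing element, so the same index is an error
res <- tryCatch(vals[[4]], error = function(e) conditionMessage(e))
res   # "subscript out of bounds"

# [[ ]] also works with a single name and drops it from the result
vals[["b"]]   # 9
```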

5.2.6 Performing functions on vectors

A key difference between R and many other languages is vectorization. In many languages, applying a function to each element of a vector requires an explicit loop. In R, many functions are vectorized: they operate on a whole vector in a single call, and the looping happens in compiled C code, which is much faster than an R for loop. For example, let’s say you want to add the elements of two separate vectors of numbers (x and y).

x <- c(1, 3, 4)
y <- c(1, 2, 4)

x ; y
[1] 1 3 4
[1] 1 2 4

In other languages you might have to run a loop to add two vectors together. In this for loop I print each iteration to show that the loop calculates the sum for the first elements in each vector, then performs the sum for the second elements, etc.

# empty vector 
z <- as.vector(NULL)

# `for` loop to add corresponding elements in each vector
for (i in seq_along(x)) {
        z[i] <- x[i] + y[i]
        print(z)
}
[1] 2
[1] 2 5
[1] 2 5 8

Instead, in R, + is a vectorized function that operates on entire vectors at once. So rather than writing for loops for many functions, you can use simple syntax:

x + y
[1] 2 5 8

x * y
[1]  1  6 16

x > y
[1] FALSE  TRUE FALSE
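As a quick check that the loop and the vectorized form agree, the two results can be compared with identical(). This sketch uses placeholder vectors p and q and preallocates the loop result, which is generally preferable to growing a vector one element at a time.

```r
# hypothetical inputs for illustration
p <- c(1, 3, 4)
q <- c(1, 2, 4)

# loop version: preallocate the result instead of growing it
loop_sum <- numeric(length(p))
for (i in seq_along(p)) {
  loop_sum[i] <- p[i] + q[i]
}

# vectorized version
vec_sum <- p + q

identical(loop_sum, vec_sum)   # TRUE

# many built-in functions are vectorized in the same way
sqrt(c(4, 9, 16))              # 2 3 4
paste0("id_", 1:3)             # "id_1" "id_2" "id_3"
```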

When performing vector operations in R, it is important to know about recycling. When performing an operation on two or more vectors of unequal length, R will recycle elements of the shorter vector(s) to match the longest vector. For example:

long <- 1:10
short <- 1:5

long
 [1]  1  2  3  4  5  6  7  8  9 10
short
[1] 1 2 3 4 5

long + short
 [1]  2  4  6  8 10  7  9 11 13 15

The elements of long and short are added together starting from the first element of both vectors. When R reaches the end of the short vector, it starts again at the first element of short and continues until it reaches the last element of the long vector. This functionality is very useful when you want to perform the same operation on every element of a vector. For example, say we want to multiply every element of our vector long by 3:

long <- 1:10
multiplier <- 3

long * multiplier
 [1]  3  6  9 12 15 18 21 24 27 30

Remember there are no scalars in R, so multiplier is actually a vector of length 1; in order to multiply every element of long by its value, it is recycled to match the length of long.

When the length of the longer object is a multiple of the shorter object length, the recycling occurs silently. When the longer object length is not a multiple of the shorter object length, a warning is given:

even_length <- 1:10
odd_length <- 1:3

even_length + odd_length
Warning in even_length + odd_length: longer object length is not a multiple
of shorter object length
 [1]  2  4  6  5  7  9  8 10 12 11
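To make recycling explicit, or to avoid the warning when partial recycling is intended, rep_len() can build the recycled vector by hand. The sketch below uses placeholder names long_vec and short_vec.

```r
long_vec  <- 1:10
short_vec <- 1:3

# rep_len() spells out what recycling does: repeat, then cut to length
recycled <- rep_len(short_vec, length(long_vec))
recycled              # 1 2 3 1 2 3 1 2 3 1

# same values as the implicit version, but without the warning
long_vec + recycled   # 2 4 6 5 7 9 8 10 12 11

# a simple check of whether the lengths divide evenly
compatible <- length(long_vec) %% length(short_vec) == 0
compatible            # FALSE
```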

5.2.6.1 Logical Functions on Vectors

Is either \((\overline{A}<\overline{B})\) or \((\overline{A}>\overline{B})\) true?

A <- c(1,2,3,4,5) 

B <- c(6,7,8,9,10)

A < B | A > B
[1] TRUE TRUE TRUE TRUE TRUE

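# || expects single values (operands longer than one are an error since
# R 4.3.0), so collapse each comparison with any() first
any(A < B) || any(A > B)
[1] TRUE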

Are both \((\overline{A}<\overline{B})\) and \((\overline{A}>\overline{B})\) true?

A < B & A > B
[1] FALSE FALSE FALSE FALSE FALSE

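# && expects single values (operands longer than one are an error since
# R 4.3.0), so collapse each comparison with all() first
all(A < B) && all(A > B)
[1] FALSE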
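The double operators && and || differ from & and | in one more way: they short-circuit, evaluating the right-hand side only when needed. A minimal sketch, with made-up vectors P and Q for the any() and all() part:

```r
# && stops as soon as the left side is FALSE, so the right side never runs
FALSE && stop("not evaluated")   # FALSE

# || stops as soon as the left side is TRUE
TRUE || stop("not evaluated")    # TRUE

# any() and all() collapse an element-wise result to a single value
P <- c(1, 2, 3)
Q <- c(3, 2, 1)
P < Q         # TRUE FALSE FALSE
any(P < Q)    # TRUE
all(P < Q)    # FALSE
```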

5.2.6.2 Mathematical Functions on Vectors

A + B
[1]  7  9 11 13 15

A * B ## element-wise multiplication
[1]  6 14 24 36 50

A %*% B ## matrix multiplication; for two vectors this is the inner product
     [,1]
[1,]  130

round(sqrt(A),digits = 3)
[1] 1.000 1.414 1.732 2.000 2.236

round(exp(A),digits = 2)
[1]   2.72   7.39  20.09  54.60 148.41

round(log(A),digits = 3)
[1] 0.000 0.693 1.099 1.386 1.609

round(log10(A),digits=3)
[1] 0.000 0.301 0.477 0.602 0.699

sum(A)
[1] 15

cumsum(A)
[1]  1  3  6 10 15

prod(B)
[1] 30240

cumprod(B)
[1]     6    42   336  3024 30240

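t(A) ### treats A as a column vector and returns its transpose, a 1 x 5 matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5

t(t(A)) ### transposing again gives a 5 x 1 matrix
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5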

abs(-B/2)
[1] 3.0 3.5 4.0 4.5 5.0

table(c(A,B/2)) ### displays how many times each unique value is observed

  1   2   3 3.5   4 4.5   5 
  1   1   2   1   2   1   2 
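The relationship between element-wise and matrix multiplication on vectors can be checked directly. The sketch below uses two short made-up vectors, u_vec and w_vec, and the base functions drop() and outer().

```r
u_vec <- c(1, 2, 3)
w_vec <- c(4, 5, 6)

# %*% on two equal-length vectors gives the inner product as a 1 x 1 matrix
inner <- u_vec %*% w_vec
dim(inner)     # 1 1
drop(inner)    # 32 as a plain number

# the same value from element-wise multiplication followed by sum()
sum(u_vec * w_vec)   # 32

# outer() gives the 3 x 3 matrix of all pairwise products
dim(outer(u_vec, w_vec))   # 3 3
```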

¹⁰ See Hadley Wickham’s section on Simplifying vs. Preserving Subsetting to learn more.