5.2 Managing vectors

The most basic data structure in R is the vector. A vector is a sequence of data elements of the same basic type: numeric (integer or double), logical, or character. (There are two additional vector types, complex and raw, which I will not discuss.) This tutorial provides you with the basics of managing vectors.

5.2.1 Creating vectors

There are four main ways to create a vector: the colon operator :, c(), seq(), and rep(). The colon operator creates a vector of values spaced one unit apart between two specified numbers, while the c() function creates a vector by concatenating elements together.
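A minimal sketch of the colon operator and c(). The variable names and values here are illustrative, not taken from the original examples.

```r
# The colon operator builds a sequence in steps of 1 between two numbers
ints <- 1:10        # 1 2 3 4 5 6 7 8 9 10
down <- 5:1         # it also counts down: 5 4 3 2 1
halves <- -2.5:2    # non-integer start: -2.5 -1.5 -0.5 0.5 1.5

# c() concatenates individual elements into a single vector
dbl <- c(2.5, 4, 7.3, 0.1)     # double vector
chr <- c("a", "b", "c")        # character vector
lgl <- c(TRUE, FALSE, TRUE)    # logical vector
```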

The seq() function generates a vector sequence of numbers (or dates) with a specified arithmetic progression. The rep() function allows us to conveniently repeat specified constants into long vectors in a collated or non-collated manner.
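The following sketch shows seq() and rep() with a few common arguments. The values and dates are made up for illustration.

```r
# seq(): arithmetic progressions of numbers or dates
evens <- seq(from = 2, to = 10, by = 2)     # 2 4 6 8 10
grid  <- seq(0, 1, length.out = 5)          # 0.00 0.25 0.50 0.75 1.00
days  <- seq(as.Date("2024-01-01"), by = "week", length.out = 3)

# rep(): repeat values in a collated or non-collated manner
collated     <- rep(1:3, times = 2)   # 1 2 3 1 2 3
non_collated <- rep(1:3, each = 2)    # 1 1 2 2 3 3
```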

You can also use the as.vector() function to initialize vectors or change the vector type:
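As a rough illustration, as.vector() can create a vector, convert it to another mode, or flatten an object such as a matrix. The object names below are hypothetical.

```r
# Initialize a vector from a sequence
v <- as.vector(8:17)

# Change the vector type through the mode argument
v_chr <- as.vector(v, mode = "character")   # "8" "9" ... "17"

# Applied to a matrix, as.vector() drops the dimensions
m    <- matrix(1:6, nrow = 2)
flat <- as.vector(m)                        # 1 2 3 4 5 6
```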

Vectors are atomic, meaning that every element of a vector must be the same type. Combining objects of different atomic modes coerces the elements of the less flexible mode to the most flexible mode present, according to the hierarchy raw \(\rightarrow\) logical \(\rightarrow\) integer \(\rightarrow\) numeric \(\rightarrow\) complex \(\rightarrow\) character.
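A short sketch of implicit coercion when c() combines different types. The result always takes the type furthest along the hierarchy; the values are illustrative.

```r
# logical + integer -> integer
t1 <- typeof(c(TRUE, 1L))      # "integer"

# integer + double -> double
t2 <- typeof(c(1L, 2.5))       # "double"

# anything + character -> character
t3 <- typeof(c(1.5, "a"))      # "character"

# All elements are converted, not only the "odd one out"
mixed <- c(1, "a", TRUE)       # "1" "a" "TRUE"
```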

Likewise, function calls on objects containing elements of different atomic modes coerce the less flexible elements to the most flexible mode present.

In many cases, objects can also be coerced explicitly from one atomic mode to another with the as.*() functions, such as as.numeric() or as.character().
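The sketch below illustrates both ideas: a function call that coerces its input, and explicit coercion with the as.*() family. The inputs are made up, and the failed conversion is wrapped in suppressWarnings() only to keep the example quiet.

```r
# Function-call coercion: sum() treats TRUE as 1 and FALSE as 0
n_true <- sum(c(TRUE, FALSE, TRUE))          # 2

# Explicit coercion between modes
nums <- as.numeric(c("1", "2.5"))            # 1.0 2.5
ints <- as.integer(c(TRUE, FALSE))           # 1 0
chrs <- as.character(1:3)                    # "1" "2" "3"

# Values that cannot be represented in the target mode become NA
bad <- suppressWarnings(as.numeric("abc"))   # NA
```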

5.2.2 Adding on to Vectors

To add elements onto an existing vector, we can continue to leverage the c() function. Also, note that vectors are always flat so nested c() functions will not add additional dimensions to the vector:
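A minimal sketch of appending with c(). The starting values of v1 are an assumption for illustration.

```r
# Hypothetical starting vector
v1 <- 8:17

# Append new elements to the end of the existing vector
v1 <- c(v1, 18:22)

# Nested c() calls still produce a flat vector
nested <- c(1, c(2, c(3, 4)))   # 1 2 3 4
```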

5.2.3 Adding Attributes to Vectors

The attributes that you can add to vectors include names and comments. If we continue with our vector v1, we can see that the vector currently has no attributes.

We can add names to vectors using two approaches. The first uses names() to assign names to each element of the vector. The second approach is to assign names when creating the vector.
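Both naming approaches can be sketched as follows. The contents of v1 and the names used are assumptions for illustration.

```r
# Hypothetical vector with no attributes yet
v1 <- 8:17
before <- attributes(v1)          # NULL

# Approach 1: assign names to an existing vector with names()
names(v1) <- letters[1:length(v1)]

# Approach 2: assign names while creating the vector
v2 <- c(name1 = 1, name2 = 2, name3 = 3)

# The names now show up as an attribute
attributes(v2)
```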

We can also add comments to vectors to act as a note to the user. This does not change how the vector behaves; rather, it simply acts as a form of metadata for the vector.
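A brief sketch of attaching a comment with comment(). The vector and the comment text are hypothetical.

```r
v1 <- 8:17

# Attach a note to the vector
comment(v1) <- "Integers from 8 to 17"

# Retrieve the comment explicitly
note <- comment(v1)

# The comment is stored as an attribute but does not affect calculations
total <- sum(v1)
```

Printing v1 shows only its values; the comment appears when you call comment(v1) or attributes(v1).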

5.2.4 Subsetting Vectors

The four main ways to subset a vector include combining square brackets [ ] with positive integers, negative integers, logical values, or names.

You can also subset with double brackets [[ ]] for simplifying subsets.
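As a sketch of subsetting with positive integers and with logical values, assuming a hypothetical vector v1:

```r
v1 <- 8:17

# Positive integers return the elements at the given positions
second  <- v1[2]            # 9
picked  <- v1[c(2, 4)]      # 9 11
a_range <- v1[2:4]          # 9 10 11

# Logical values keep the elements where the condition is TRUE
large <- v1[v1 > 15]        # 16 17

# A shorter logical vector is recycled across the whole vector
every_other <- v1[c(TRUE, FALSE)]   # 8 10 12 14 16
```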

5.2.4.2 Subsetting with negative integers

Subsetting with negative integers will omit the elements at the specified positions:
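For example, with a hypothetical vector v1:

```r
v1 <- 8:17

# Drop the first element
no_first <- v1[-1]                  # 9 10 11 12 13 14 15 16 17

# Drop several positions at once
dropped <- v1[-c(2, 4, 6, 8)]       # 8 10 12 14 16 17
```

Positive and negative integers cannot be mixed in a single subset; R raises an error if you try.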

5.2.4.4 Subsetting with names

Subsetting with names will return the elements with the matching names specified:
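A small sketch, assuming v1 has been given the letters a through j as names:

```r
v1 <- 8:17
names(v1) <- letters[1:10]

# A single name returns the matching element, with its name
one <- v1["a"]                     # a = 8

# A character vector of names returns several elements
several <- v1[c("a", "c", "g")]    # a = 8, c = 10, g = 14
```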

5.2.5 Simplifying vs. Preserving

It’s also important to understand the difference between simplifying and preserving when subsetting. A simplifying subset returns the simplest possible data structure that can represent the output. A preserving subset keeps the structure of the output the same as the input. [10]

For vectors, subsetting with single brackets [ ] preserves while subsetting with double brackets [[ ]] simplifies. The change you will notice when simplifying vectors is the removal of names.
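The difference is easiest to see on a named vector. The vector below is hypothetical.

```r
v1 <- c(a = 8L, b = 9L, c = 10L)

# Single brackets preserve: the name is kept
kept <- v1[1]            # a = 8

# Double brackets simplify: the name is dropped
simple <- v1[[1]]        # 8

# Double brackets also accept a single name
by_name <- v1[["b"]]     # 9
```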

5.2.6 Performing functions on vectors

A key difference between R and many other languages is a topic known as vectorization. What does this mean? In many languages, applying a function to each element of a vector of numbers requires an explicit loop. In R, many of these functions operate on whole vectors at once and are implemented in C, so they run much faster than an equivalent for loop would. For example, let’s say you want to add the elements of two separate vectors of numbers (x and y).

In other languages you might have to run a loop to add two vectors together. In this for loop I print each iteration to show that the loop calculates the sum for the first elements in each vector, then performs the sum for the second elements, etc.
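A sketch of such a loop, written in R for comparison. The values of x and y are assumptions for illustration.

```r
x <- c(1, 3, 4)
y <- c(1, 2, 4)

# Pre-allocate an empty vector to hold the sums
z <- numeric(length(x))

# Add the vectors one pair of elements at a time
for (i in seq_along(x)) {
  z[i] <- x[i] + y[i]
  print(z)   # show the result vector after each iteration
}
```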

Instead, in R, + is a vectorized function which can operate on entire vectors at once. So rather than creating for loops for many functions, you can just use simple syntax:
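The vectorized equivalents look like this, using the same illustrative x and y (hypothetical values):

```r
x <- c(1, 3, 4)
y <- c(1, 2, 4)

# Element-wise addition without a loop
total <- x + y       # 2 5 8

# Other operators are vectorized in the same way
product <- x * y     # 1 6 16
greater <- x > y     # FALSE TRUE FALSE
```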

When performing vector operations in R, it is important to know about recycling. When performing an operation on two or more vectors of unequal length, R will recycle elements of the shorter vector(s) to match the longest vector. For example:
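A minimal sketch of recycling, assuming two vectors named long and short whose lengths are 10 and 5:

```r
long  <- 1:10
short <- 1:5

# short is reused from its first element once it runs out
recycled <- long + short   # 2 4 6 8 10 7 9 11 13 15
```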

The elements of long and short are added together starting from the first element of both vectors. When R reaches the end of the short vector, it starts again at the first element of short and continues until it reaches the last element of the long vector. This functionality is very useful when you want to perform the same operation on every element of a vector. For example, say we want to multiply every element of our vector long by 3.

Remember there are no scalars in R, so 3 is actually a vector of length 1; in order to multiply every element of long by its value, it is recycled to match the length of long.
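This can be checked directly. The sketch below assumes long holds the integers 1 to 10.

```r
long <- 1:10

# The single value 3 is a vector of length 1
len <- length(3)                         # 1

# It is recycled across every element of long
tripled <- long * 3                      # 3 6 9 ... 30

# Equivalent to spelling out the recycled vector by hand
by_hand <- long * rep(3, length(long))
```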

When the length of the longer object is a multiple of the shorter object length, the recycling occurs silently. When the longer object length is not a multiple of the shorter object length, a warning is given:
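The sketch below triggers that warning with two hypothetical vectors of length 10 and 3. It uses tryCatch() only to capture the warning text so it can be inspected.

```r
even_length <- 1:10
odd_length  <- 1:3

# The operation still runs, but R warns about the partial recycling
result <- suppressWarnings(even_length + odd_length)
# 2 4 6 5 7 9 8 10 12 11

# Capture the warning message instead of the result
msg <- tryCatch(even_length + odd_length,
                warning = function(w) conditionMessage(w))
msg
```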

5.2.6.1 Logical Functions on Vectors

  • Is either \((\overline{A}<\overline{B})\) or \((\overline{A}>\overline{B})\) true?
  • Are both \((\overline{A}<\overline{B})\) and \((\overline{A}>\overline{B})\) true?
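One possible reading of these two questions is as element-wise logical operations, sketched below with the | (or) and & (and) operators. The vectors A and B and their values are hypothetical, and mapping the questions onto these operators is an assumption rather than something the text confirms.

```r
A <- c(1, 5, 3)
B <- c(2, 4, 3)

# Is either (A < B) or (A > B) true, element by element?
either <- (A < B) | (A > B)    # TRUE TRUE FALSE

# Are both (A < B) and (A > B) true, element by element?
both <- (A < B) & (A > B)      # FALSE FALSE FALSE

# any() and all() collapse a logical vector to a single value
any_either <- any(either)      # TRUE
all_either <- all(either)      # FALSE
```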

  10. See Hadley Wickham’s section on Simplifying vs. Preserving Subsetting to learn more.