R Programming – Quick Starter

Data Structures / R Objects

R provides several commonly used data structures, also called data objects. They are built from the basic data types: logical, integer, numeric (double), complex, character and raw. Some of these structures can also contain other data structures as elements. The most commonly used data objects are:

  1. Vectors
  2. Lists
  3. Matrices
  4. Arrays
  5. Factors
  6. Data Frames
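
As a rough orientation, the six structures listed above can each be created with one call. This is a minimal sketch; the variable names and values are made up for illustration only.

```r
# One small object of each kind (names and values are illustrative).
v  <- c(1.5, 2.5, 3.5)                       # vector: all elements share one type
l  <- list(1, "a", TRUE)                     # list: elements may differ in type
m  <- matrix(1:6, nrow = 2, ncol = 3)        # matrix: two dimensions, one type
a  <- array(1:24, dim = c(2, 3, 4))          # array: any number of dimensions
f  <- factor(c("low", "high", "low"))        # factor: categorical values with levels
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"))    # data frame: columns of equal length

is.list(l)        # TRUE
is.matrix(m)      # TRUE
dim(a)            # 2 3 4
levels(f)         # "high" "low"
nrow(df)          # 3
```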

Vectors

Vectors are the basic data structure in R, and all elements of a vector must be of the same type. The supported types are logical, integer, double, character, complex and raw. If the elements supplied are not all of the same type, R automatically coerces them to a common type. Coercion goes from the less general type to the more general one, i.e. logical to integer to double (numeric) to complex to character.
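
The coercion order described above can be checked with typeof(). The sketch below uses made-up values; each call mixes two types and reports the type that R settles on.

```r
# Mixing types inside c() coerces everything to the more general type.
t1 <- typeof(c(TRUE, 2L))      # "integer"   (TRUE becomes 1L)
t2 <- typeof(c(2L, 3.5))       # "double"
t3 <- typeof(c(3.5, 1+2i))     # "complex"
t4 <- typeof(c(3.5, "a"))      # "character"

# With all types mixed, every element ends up as a character string.
mixed <- c(TRUE, 2L, 3.5, "a")
mixed                          # "TRUE" "2" "3.5" "a"
```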

Creating Vectors

Vectors can be created using:

  1. combine c() function
  2. : operator
  3. seq() function
  4. rep() function

The combine c() function

Vectors with multiple values can be built using the combine c() function. 

Syntax:

vectorName<-c(data_element1,data_element2,...data_elementn)

# Vectors in R can be created using the c() function by providing one or more values of the same or different data types
#create the vector numbers with whole numbers (stored as double; an L suffix such as 5L is needed for the integer type)
>numbers<-c(5,8,1,11,3,7,19,2,14,9)

#create vector with integer and floating numbers
>mixed_numbers <- c(3.14, 2.86, 48, 29, 96.32, 12.3456, 39)

#create a vector with character data type
>world_cities<-c("paris","madrid", "Newyork","Delhi","Lima")

#The character values can also be provided in single quotes but when printed back R displays in double quotes.
>fruits<-c('apples','oranges','mangoes','grapes','plums','berries')

#Other vectors, nested c() calls and arithmetic expressions can be passed as arguments of c(); the results are flattened into a single new vector.
>distances<-c(34, -43, 17, c(74, 56), 2e+2, 3+43*7-26/(3+4*2))
>combo<-c(mixed_numbers, distances)
>cities_distances <- c(world_cities,distances)

#To create a sequence of consecutive numbers in either ascending or descending order, the start:end form can be passed to c()
>asc_sequence <- c(-3:3)
>asc_sequence
[1] -3 -2 -1  0  1  2  3
>asc_seq2 <- c(1:4)
>asc_seq2
[1] 1 2 3 4
>desc_seq <- c(4:1)
>desc_seq
[1] 4 3 2 1

Basic details of a vector can be found using the built-in functions typeof(), class() and mode().

#The type of vector can be obtained using typeof(vector_name) function
#typeof() on vectors numbers, mixed_numbers, distances returns "double"
>typeof(numbers)
[1] "double"

#typeof() on vectors world_cities, fruits return "character"
>typeof(fruits)
[1] "character"

#typeof() on the vector where a numeric vector and a character vector is combined will return "character"
>typeof(cities_distances)
[1] "character"

# class() on vectors numbers, mixed_numbers, distances, combo will return "numeric"
>class(numbers)
[1] "numeric"

#class() on vectors world_cities, fruits, cities_distances will return "character"
>class(cities_distances)
[1] "character"

# mode() on vectors numbers, mixed_numbers, distances, combo will return "numeric"
>mode(mixed_numbers)
[1] "numeric"

# mode() on vectors world_cities, fruits, cities_distances will return "character"
>mode(fruits)
[1] "character"
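
The three functions agree for the vectors above, but they can differ. A small sketch with two hypothetical vectors, ints and dbls, shows where: the : operator produces integers, while c() with plain numeric literals produces doubles.

```r
ints <- 1:3           # built with the : operator, so the elements are integers
dbls <- c(1, 2, 3)    # plain numeric literals are stored as doubles

typeof(ints)   # "integer"
class(ints)    # "integer"
mode(ints)     # "numeric"

typeof(dbls)   # "double"
class(dbls)    # "numeric"
mode(dbls)     # "numeric"

# The L suffix forces integer storage inside c().
typeof(c(1L, 2L, 3L))   # "integer"
```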

The number of elements in a vector (length) can be obtained by length(vector_name) function

#The number of elements in a vector can be obtained using the length() function
>length(numbers)
[1] 10
>length(cities_distances)
[1] 12

The : operator

To create vectors with a sequence of consecutive numbers in either ascending or descending order, the : operator can be used.

#The sequence operator : can be used to create vectors with either an ascending or a descending sequence of numbers.
#This is a shorter form of c(start:end): the c() call is redundant and the : operator can be used on its own.
>asc_seq <- -3:3
>asc_seq
[1] -3 -2 -1  0  1  2  3
>desc_seq <- 4:1
>desc_seq 
[1] 4 3 2 1

The seq() function

Vectors with sequential numbers can also be generated using the seq() function.

Syntax:

vectorName <- seq(from=startValue, to=endValue, by=stepValue)

The argument names from, to and by are optional; when they are omitted, the values are matched by position. Including them improves the readability of the code.

If the step value is not given, 1 is used by default. If no startValue is given, 1 is used by default.

#seq(from=startValue, to=endValue, by=stepValue) can generate consecutive numbers or numbers spaced at a fixed interval.
# A step value of 1 gives consecutive numbers; any other step value gives evenly spaced, non-consecutive numbers.
>even <- seq(from=2, to=10, by=2)
>even
[1]  2  4  6  8 10

# The argument names 'from', 'to' and 'by' are only used for improving readability and need not be included.
>even <- seq(2,10,2)
>even
[1]  2  4  6  8 10

# If the 'step' value is 1, including it in the seq() argument list is optional.
# The commented statements below are valid, but for consecutive numbers the : operator can be used directly.
#sequence1 <- seq(1,4)
#sequence1 <- seq(,4)
>sequence1 <- 1:4
>sequence1
[1] 1 2 3 4

#If the 'from' value is 1, including that in seq() parameter list is optional. Considering the 'step' value to be 1 as well, the statement is
>sequence1 <- seq(4)
>sequence1
[1] 1 2 3 4

# A negative step value or by value is used to create a descending sequence of numbers.
>milesToGo <- seq(20,0,-5)
>milesToGo
[1] 20 15 10  5  0

#If the 'to' value is not reached exactly by the 'step' / 'by' value, the sequence stops at the last value that does not go past 'to'
>milesToGo <- seq(20,8,-5)
>milesToGo
[1] 20 15 10

#The 'step / by' value can also be a non-whole number or a decimal number like 0.5, 0.3 etc
>series <- seq(5,0,by = -0.5)
>series
[1] 5.0 4.5 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0

#The number of elements in the sequence can be set with the 'length.out' argument, in which case the step value is calculated automatically
>lengthOut <- seq(1,5,length.out=3)
>lengthOut
[1] 1 3 5

#If a fractional value is used for 'length.out', it is rounded up to the next integer value.
#For example, if length.out is taken as 2.2 or 2.8, 3 elements are generated as part of the sequence.
>fractionLength <- seq(1,5,length.out=2.2)
>fractionLength <- seq(1,5,length.out=2.8)
>fractionLength
[1] 1 3 5

The rep() function

The function rep(x, times, length.out, each) can also be used for generating vectors.

Syntax:

rep(x, times, length.out, each)

where x is the vector (a single-element or a multi-element vector) to be repeated, times is the number of times the vector as a whole is repeated, length.out is the desired length of the result (it can be abbreviated to len, as in the examples below), and each is the number of times each element is repeated. If the length specified is not a multiple of the number of elements in the vector, the elements are repeated until that length is reached and the last repetition is cut short.

# The rep(vectorName, times, len, each) is used to create vectors. 
# The same vector name is used in all below examples and therefore the values are over written with execution of each command.
>numbers <- rep(c(2,3,4))
>numbers
[1] 2 3 4

#The sequence 2,3,4 is repeated two times by using the times=2 argument
>numbers <- rep(c(2,3,4), times=2)
>numbers
[1] 2 3 4 2 3 4

#The len argument is used to specify the length of the sequence. If len = 4 is used, the result has 4 elements.
>numbers <- rep(c(2,3,4), len = 4)
>numbers
[1] 2 3 4 2
>numbers <- rep(c(2,3,4), len = 5)
>numbers
[1] 2 3 4 2 3

#The each=value argument is used to repeat each element the given number of times.
>numbers <- rep(c(2,3,4), each = 2)
>numbers
[1] 2 2 3 3 4 4

Usually only one of the options is used, but they can also be combined. In that case each is applied first, then times, and if len is also given it takes priority over times. The value of times can itself be a vector giving a separate count for each element, and fractional values of times are truncated to whole numbers.

# A combination of times, len and each can be used as required to create vectors.
>numbers<-rep(c(c(2,4,6), seq(8,16,2)), times=3, each=2)
>numbers
[1]  2  2  4  4  6  6  8  8 10 10 12 12 14 14 16 16  2  2  4  4  6  6  8  8 10 10 12 12
[29] 14 14 16 16  2  2  4  4  6  6  8  8 10 10 12 12 14 14 16 16
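
The remaining combinations mentioned above can be sketched in the same way. The result names r1 to r4 are illustrative, and the behaviour shown is the one described on the rep() help page.

```r
# times given as a vector: one repeat count per element of x.
r1 <- rep(c(2, 3, 4), times = c(1, 2, 3))         # 2 3 3 4 4 4

# A fractional times value is truncated towards zero (2.9 is treated as 2).
r2 <- rep(c(2, 3, 4), times = 2.9)                # 2 3 4 2 3 4

# each is applied first, then the result is cut to length.out.
r3 <- rep(c(2, 3, 4), each = 2, length.out = 5)   # 2 2 3 3 4

# When both are given, length.out takes priority and times is ignored.
r4 <- rep(c(2, 3, 4), times = 3, length.out = 4)  # 2 3 4 2
```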

For more information, read the help page (?rep).

Accessing Elements

The following are the key ways to access specific elements of the vectors. These methods can be used in all kinds of combinations to define the elements to be accessed.

  • In R, indexing does not start at 0; index values run from 1 to n, where n is the length of the vector.
  • The elements in logical, integer, numeric and character vectors can be accessed using the index.
  • Positive integer values are used to specify the position of the element in the vector to access.
  • Multiple index values in the form of a vector can be used to access the specific values of the vector.
  • Negative integer values are used to access all other elements except for the specified indexed elements.
  • Multiple negative values in the form of a vector can be used to access all elements except for the listed values. 
  • Negative and positive index values cannot be mixed while accessing the elements.
  • If real numbers are used as index values, the fractional part is truncated and integer values are considered.
  • A logical vector can be used for accessing elements in the vector.
  • Logical conditions can also be used to access elements in the vectors.
  • The length() function can be used to define the range of elements to be retrieved.
  • A character vector can be used to access elements by name. This is mostly used with named vectors, which are vectors whose elements are labeled.
  • The seq() function can be used to build a sequence of data elements to be retrieved.

# Define a vector using which multiple ways of accessing elements can be demonstrated.
> sample <- c(c(1:12), 3.14, 2.85, 6.9, -4.56)
>sample
[1]  1.00  2.00  3.00  4.00  5.00  6.00  7.00  8.00  9.00 10.00 11.00 12.00  3.14  2.85
[15]  6.90 -4.56

# A positive integer value can be used to specify the position of element to be accessed. 
# To access 4th element from vector 'sample'
>sample[4]
[1] 4

# To access 'first', 'fifth', 'tenth', 'fifteenth' elements of the vector 'sample', create a vector with indexes required.
>sample[ c(1,5,10,15)]
[1]  1.0  5.0 10.0  6.9

# To access a range of consecutive elements from a vector, a sequence can be used.
# To access all elements from 2nd element to 7th element from sample vector.
>sample[2:7]
[1] 2 3 4 5 6 7

# A Negative index value can be used to access all elements except for the element in specified index.
# To access all elements except for the element in 5th index. 
>sample[-5]
[1]  1.00  2.00  3.00  4.00  6.00  7.00  8.00  9.00 10.00 11.00 12.00  3.14  2.85  6.90
[15] -4.56

# Multiple negative values in the form of a vector can be used to access all elements except for the indexed values.
# To access all elements except for 1st, 5th, tenth, fifteenth elements from sample vector.
>sample[c(-1,-5,-10,-15)]
[1]  2.00  3.00  4.00  6.00  7.00  8.00  9.00 11.00 12.00  3.14  2.85 -4.56
# To access all elements except for 2nd to seventh element from the sample vector.
>sample[-2:-7]
[1]  1.00  8.00  9.00 10.00 11.00 12.00  3.14  2.85  6.90 -4.56

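# Negative and positive index values cannot be mixed in the same index vector.
# sample[c(-3,7)] stops with an error saying that only 0's may be mixed with negative subscripts (the exact wording depends on the R version).
# Note that sample[-3,7], with a comma instead of c(), is a different mistake: the comma asks for two-dimensional
# indexing, which a plain vector does not support. sample[-3,7] and sample[-7,3] both give the same error.
>sample[-3,7]
Error in sample[-3, 7] : incorrect number of dimensions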

# The same error is thrown when 0 is used with a comma, because the comma again asks for two-dimensional indexing.
# sample[0,7], sample[7,0], sample[0,-7], sample[-7,0] all throw the same error
>sample[0,7]
Error in sample[0, 7] : incorrect number of dimensions
# However, when 0 is included in an index vector, it is ignored and the rest of the vector is used to select the elements.
# The below statement is equivalent to sample[14]
>sample[c(0,14)]
[1] 2.85
>sample[c(0,-14)]
[1]  1.00  2.00  3.00  4.00  5.00  6.00  7.00  8.00  9.00 10.00 11.00 12.00  3.14  6.90
[15] -4.56
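
The rules for mixing signs and zeros inside one index vector can be tried out safely with tryCatch(), which captures the error instead of stopping the script. This is a sketch on a small made-up vector x; the variable names are illustrative.

```r
x <- c(10, 20, 30, 40)

# Mixing a negative and a positive index in one vector is an error.
# tryCatch() returns the string "error" here instead of halting.
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
mixed            # "error"

# A 0 inside an index vector is simply ignored.
x[c(0, 2)]       # 20
x[c(0, -2)]      # 10 30 40

# An index of 0 on its own selects nothing.
length(x[0])     # 0
```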

# If real numbers are used as index values, the fractional part is ignored and only the integer part is considered.
>sample[c(3.5, 4.8, 14.2)]
[1] 3.00 4.00 2.85
>sample[c(-3.5, -4.8, -14.2)]
[1]  1.00  2.00  5.00  6.00  7.00  8.00  9.00 10.00 11.00 12.00  3.14  6.90 -4.56

# A logical vector can be used for accessing elements in the vector. 
# If the number of elements in the logical vector matches the number of elements in the actual vector, they are applied as is
# If the number of elements in the logical vector is less than the elements in the actual vector, the sequence is repeated across the collection.
>sample[c(TRUE, FALSE, FALSE, TRUE, TRUE)]
[1]  1.00  4.00  5.00  6.00  9.00 10.00 11.00  2.85  6.90 -4.56
# To access the elements in odd positions the below sequence can be used.
>sample[c(TRUE,FALSE)]
[1]  1.00  3.00  5.00  7.00  9.00 11.00  3.14  6.90
# To access the elements in even positions
>sample[c(FALSE,TRUE)]
[1]  2.00  4.00  6.00  8.00 10.00 12.00  2.85 -4.56

Vector elements can be labeled so that they can be referred to by name; such vectors are called named vectors. The elements of a named vector can be retrieved using their labels, by passing a character vector of labels as the index.

# Labels in a named vector can be used to create a character vector to access specific elements 
>shapes <- c("side1" = 10, "side2" = 15, "side3" = 20, "side4"=25, "side5"=30)
> shapes
side1 side2 side3 side4 side5 
   10    15    20    25    30

#To access first, third and fifth element use
> shapes[c("side1", "side3", "side5")]
side1 side3 side5 
   10    20    30 
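
Names can also be attached after a vector has been created. The sketch below uses the names() function on a hypothetical vector called sides; the double-bracket form and unname() are shown as two ways of getting the bare value without its label.

```r
sides <- c(10, 15, 20)
names(sides) <- c("side1", "side2", "side3")   # label an existing vector

names(sides)              # "side1" "side2" "side3"
sides["side2"]            # the element together with its label
sides[["side2"]]          # 15, the bare value
unname(sides["side2"])    # 15, label removed

# Asking for a label that does not exist returns NA rather than an error.
is.na(sides["side9"])
```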

Logical conditions can also be used to access elements in vectors. The conditions are built with the operators R supports (refer to the Operators section). Note that when operators such as +, -, /, *, >, < are applied to a vector, the operation is performed element-wise, i.e. the same operation is applied to every element. A comparison therefore produces a logical vector, which is then used as the index.

# Performing scalar operations on vectors to formulate logical conditions to access specific elements.
# To access all elements in vector sample that are less than 7
>sample[sample<7]
[1]  1.00  2.00  3.00  4.00  5.00  6.00  3.14  2.85  6.90 -4.56

# To access all elements in vector sample that are greater than 7
>sample[sample>7]
[1]  8  9 10 11 12

# To access all elements that are less than 5 or greater than 10
>sample[ sample < 5 | sample > 10 ]
[1]  1.00  2.00  3.00  4.00 11.00 12.00  3.14  2.85 -4.56
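
It can help to look at the intermediate logical vector that a condition produces before it is used as an index. A minimal sketch on a small made-up vector x:

```r
x <- c(3, 9, 12, -1)

x > 7                 # FALSE TRUE TRUE FALSE: one result per element
which(x > 7)          # 2 3: positions where the condition holds
x[x > 7]              # 9 12: the matching elements
sum(x > 7)            # 2: number of matches (TRUE counts as 1)

# Conditions are combined element-wise with & (and) and | (or).
x[x > 0 & x < 10]     # 3 9
```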

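  • The length() function can be used to define the range of elements to access. Because the : operator is evaluated before -, the arithmetic on each side of : must be wrapped in parentheses.

# To access the third-last element back to the fifth-last element (in that order)
>numbers[((length(numbers)-2):(length(numbers)-4))]

# To access the last five elements
>numbers[(length(numbers) - 4):length(numbers)]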

  • The seq() function can be used to build a sequence of index values to be retrieved. For example, to get the elements at odd and even index positions of the vector, use the code below.

#Start at index 1 to get the elements at odd positions.
> numbers[ seq(1, length(numbers), 2)]

#Start at index 2 to get the elements at even positions.
> numbers[ seq(2, length(numbers), 2)]
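
When a range is built from length(), operator precedence is easy to get wrong, because : binds more tightly than -. The sketch below uses a small made-up vector v to show the difference, and tail() as an alternative for taking elements from the end.

```r
v <- c(10, 20, 30, 40, 50, 60, 70)
n <- length(v)

# Without parentheses this is n - (4:n), not (n - 4):n.
n - 4:n          # 3 2 1 0
(n - 4):n        # 3 4 5 6 7

last_five <- v[(n - 4):n]    # 30 40 50 60 70

# tail() gives the same result without any index arithmetic.
tail(v, 5)                   # 30 40 50 60 70
```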