1.11 Matrices, lists and data frames
INTRODUCTION TO R
> rbind(X, Y)
   [,1]  [,2]  [,3]  [,4]  [,5]
X 16.92 24.03  7.61 15.49 11.77
Y  8.37 12.93 16.65 12.20 13.12
Row and column names can be set (and viewed) using the rownames() and colnames() functions:
> colnames(XY)
[1] "X" "Y"
> rownames(XY) <- LETTERS[1:5]
> XY
      X     Y
A 16.92  8.37
B 24.03 12.93
C  7.61 16.65
D 15.49 12.20
E 11.77 13.12
The object LETTERS is a built-in character vector of length 26 that contains the
uppercase letters of the English alphabet. Similarly, letters contains the equivalent
lowercase letters.
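The naming steps above can be sketched end to end as follows. This is a minimal sketch that assumes XY was built by column-binding the X and Y coordinate vectors with cbind(); the values are copied from the printed output above.

```r
# Coordinate values copied from the printed output above
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.20, 13.12)

# Assumption: XY is the column-wise combination of X and Y.
# cbind() takes the column names from the names of the objects.
XY <- cbind(X, Y)
colnames(XY)

# Assign the first five uppercase letters as row names
rownames(XY) <- LETTERS[1:5]
rownames(XY)
```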
1.11.2 Lists
Whilst matrices store vectors of the same type (class) and length, lists are used to store
collections of objects that can be of differing lengths and types. Lists are constructed
using the list() function. For example, we have previously created a number of
isolated vectors (temperature, shade and names and coordinates of sites) that may
actually represent data or information from a single experiment. These objects can be
grouped together such that they all become components of a list object:
> EXPERIMENT <- list(SITE = SITE, COORDINATES = paste(X,
+     Y, sep = ","), TEMPERATURE = TEMPERATURE,
+     SHADE = SHADE)
> EXPERIMENT
$SITE
[1] "A1" "A2" "B1" "B2" "C1" "C2" "D1" "D2" "E1" "E2"
$COORDINATES
[1] "16.92,8.37"  "24.03,12.93" "7.61,16.65"  "15.49,12.2"
[5] "11.77,13.12"
$TEMPERATURE
  Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10
36.1 30.6 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9
CHAPTER 1
$SHADE
 [1] no   full no   full no   full no   full no   full
Levels: no full
Note that this list consists of four components: two character vectors
(SITE and COORDINATES, the latter being a vector of XY coordinates for sites A, B, C, D and E), a
numeric vector (TEMPERATURE) and a factor (SHADE). Note also that while three of
the components have a length of 10, the COORDINATES component has only five elements.
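The differing lengths and classes of the components can be checked directly. The following is a minimal sketch: the component values are copied from the output above, but how the original vectors were created is an assumption, and sapply() is used here simply to apply a function to every component of the list.

```r
# Stand-ins for the objects described above (values copied from the printed list)
SITE <- c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2", "E1", "E2")
COORDINATES <- c("16.92,8.37", "24.03,12.93", "7.61,16.65", "15.49,12.2", "11.77,13.12")
TEMPERATURE <- c(Q1 = 36.1, Q2 = 30.6, Q3 = 31.0, Q4 = 36.3, Q5 = 39.9,
                 Q6 = 6.5, Q7 = 11.2, Q8 = 12.8, Q9 = 9.7, Q10 = 15.9)
# Assumption: SHADE alternates "no"/"full" with "no" as the first level
SHADE <- factor(rep(c("no", "full"), times = 5), levels = c("no", "full"))

EXPERIMENT <- list(SITE = SITE, COORDINATES = COORDINATES,
                   TEMPERATURE = TEMPERATURE, SHADE = SHADE)

# Number of elements in each component: 10, 5, 10 and 10
sapply(EXPERIMENT, length)

# Class of each component: character, character, numeric and factor
sapply(EXPERIMENT, class)
```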
1.11.3 Data frames - data sets
Rarely are single biological variables collected in isolation. Rather, data are usually
collected in sets of variables reflecting investigations of patterns between and/or among
the different variables. Consequently, data sets are best organized into rectangular arrangements of
variables (vectors), all of the same length yet not necessarily of the same type. Hence,
neither lists (whose components may differ in length) nor matrices (whose elements must all be of
one type) are natural containers for data sets. This is the role of
data frames, which are used to store a list of vectors of the same length (yet potentially
different types) in a rectangular, matrix-like structure.
Data frames are generated by combining multiple vectors together such that each
vector becomes a separate column in the data frame. In this way, a data frame is similar
to a matrix in which each column can represent a different vector type. For a data
frame to faithfully represent a data set, the sequence in which observations appear in
the vectors must be the same for each vector, and each vector should have the same
number of observations. For example, the first, second, third, etc. entries in each vector
must represent, respectively, the observations collected from the first, second, third, etc.
sampling units.
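As a brief illustration of the paragraph above, equal-length vectors of different types can be combined with data.frame(). This is only a sketch: the values are those shown earlier for SITE, TEMPERATURE and SHADE, while the object name DATA and the way SHADE is constructed are assumptions made for the example.

```r
# Three vectors of equal length (10) but different types
SITE <- c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2", "E1", "E2")
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
SHADE <- factor(rep(c("no", "full"), times = 5), levels = c("no", "full"))

# Each vector becomes a column; row i holds the observations
# from the i-th sampling unit (DATA is a hypothetical name)
DATA <- data.frame(SITE, TEMPERATURE, SHADE)

dim(DATA)    # 10 rows (sampling units) by 3 columns (variables)
DATA[1:3, ]  # the first three sampling units
```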
Since the focus of this book is on the exploration, analysis and summary of data sets,
and data sets are accommodated in R by data frames, the generation, importation/exportation,
manipulation and management of data frames receive extensive coverage
in chapter 2.
1.12 Object information and conversion
1.12.1 Object information
Everything in R is an object and all objects are of a certain type or class. The class of an
object can be examined using the class() function. For example:
> class(TEMPERATURE)
[1] "numeric"
There is also a family of functions prefixed with is. that evaluate whether an
object is of a particular class (or type). Table 1.3 lists the common object query
functions. All object query functions return a logical vector. Enter methods(is) for a
more comprehensive list.
Table 1.3 Common object query functions and their corresponding return values.
Function            Returns TRUE:
is.numeric(x)       if all elements of x are numeric or integer (x <- c(1,-3.5))
is.null(x)          if x is NULL (the object has no length) (x <- NULL)
is.logical(x)       if all elements of x are logical (x <- c(TRUE,FALSE))
is.character(x)     if all elements of x are character strings (x <- c("A","Quad"))
is.vector(x)        if the object x is a vector (a single dimension). Returns FALSE if
                    object has any attributes other than names
is.factor(x)        if the object x is a factor
is.matrix(x)        if the object x is a matrix (2 dimensions but not a data frame)
is.list(x)          if the object x is a list
is.data.frame(x)    if the object x is a data frame
is.na(x)            for each missing (NA) element in x (x <- c(NA,2))
!                   (‘not’) character as a prefix converts the above functions into
                    ‘is.not.’
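A few of the query functions in Table 1.3 can be tried on small objects, as in the sketch below. The objects x and m are hypothetical examples, not objects defined elsewhere in the text.

```r
x <- c(1, -3.5)
is.numeric(x)      # TRUE
is.character(x)    # FALSE
!is.character(x)   # TRUE: the ! prefix negates the result

is.na(c(NA, 2))    # TRUE FALSE: one logical value per element
is.null(NULL)      # TRUE

m <- matrix(1:4, nrow = 2)
is.matrix(m)       # TRUE
is.vector(m)       # FALSE: m carries a dim attribute
```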
Many R objects also have a set of attributes, the number and type of which are
specific to each class of object. For example, a matrix object has a specific number
of dimensions as well as row and column names. The attributes of an object can be
viewed using the attributes() function:
> attributes(XY)
$dim
[1] 5 2
$dimnames
$dimnames[[1]]
[1] "A" "B" "C" "D" "E"
$dimnames[[2]]
[1] "X" "Y"
Similarly, the attr() function can be used to view and set individual attributes of
an object, by specifying the name of the object and the name of the attribute (as a
character string) as arguments. For example:
> attr(XY, "dim")
[1] 5 2
> attr(XY, "description") <- "coordinates of quadrats"
> XY
      X     Y
A 16.92  8.37
B 24.03 12.93
C  7.61 16.65
D 15.49 12.20
E 11.77 13.12
attr(,"description")
[1] "coordinates of quadrats"
Note that in the above example, the attribute "description" is not a built-in attribute
of a matrix. When a new attribute is set, this attribute is displayed along with the object.
This provides a useful way of attaching a description to an object, thereby reducing the
risk that the purpose of the object is later forgotten.
1.12.2 Object conversion
Objects can be converted or coerced into other objects using a family of functions
with an as. prefix. Note that there are some obvious restrictions on these conversions,
as most objects cannot be completely accommodated by all other object types, and
therefore some information (such as certain attributes) may be lost or modified during
the conversion. Objects and elements that cannot be successfully coerced are returned
as NA. Table 1.4 lists the common object coercion functions. Use methods(as) for a
more comprehensive list.
Table 1.4 Common object coercion functions and their corresponding return values.
Function            Converts object to
as.numeric(x)       a numeric vector (‘integer’ or ‘real’). Factors converted to integers.
as.null(x)          a NULL
as.logical(x)       a logical vector. Non-zero numeric values are converted to TRUE,
                    zero to FALSE
as.character(x)     a character vector
as.vector(x)        a vector. All attributes (including names) are removed.
as.factor(x)        a factor. This is an abbreviated version of factor
as.matrix(x)        a matrix. Any non-numeric elements result in all matrix elements
                    being converted to character strings
as.list(x)          a list
as.data.frame(x)    a data frame. Matrix columns and list columns are converted into
                    separate vectors of the data frame, and character vectors are
                    converted into factors (the default before R 4.0.0; from R 4.0.0
                    they remain character vectors unless stringsAsFactors = TRUE).
                    All previous attributes are removed
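Several of the conversions in Table 1.4 can be illustrated on small objects. The sketch below uses hypothetical objects f and v; suppressWarnings() is used only to silence the warning that R issues when a value cannot be coerced and is returned as NA.

```r
f <- factor(c("no", "full", "no"), levels = c("no", "full"))
as.numeric(f)              # 1 2 1: the internal integer codes of the levels
as.character(f)            # "no" "full" "no"

as.logical(c(0, 1, 2.5))   # FALSE TRUE TRUE: any non-zero value is TRUE

# "abc" cannot be interpreted as a number, so it is returned as NA
suppressWarnings(as.numeric(c("3.2", "abc")))   # 3.2 NA

v <- c(Q1 = 36.1, Q2 = 30.6)
names(as.vector(v))        # NULL: the names attribute has been removed
```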
1.13 Indexing vectors, matrices and lists
This section makes use of a number of objects created in earlier sections. Importantly,
the TEMPERATURE object is a named vector, and thus its output will differ
slightly from that of unnamed vectors in that returned elements are headed by their element
names.
1.13.1 Vector indexing
It is possible to print or refer to a subset of a vector by appending an index vector
(enclosed in square brackets, []) to the vector name. There are four common forms
of vector indexing used to extract a subset of a vector:
(i) Vector of positive integers. A set of integers that indicate which elements of the
vector are to be selected. Selected elements are concatenated in the specified order.
– Select the nth element
> TEMPERATURE[2]
Q2
30.6
– Select elements n through m
> TEMPERATURE[2:5]
  Q2   Q3   Q4   Q5
30.6 31.0 36.3 39.9
– Select a specific set of elements
> TEMPERATURE[c(1, 5, 6, 9)]
  Q1   Q5   Q6   Q9
36.1 39.9  6.5  9.7
(ii) Vector of negative integers. A set of integers that indicate which elements of the
vector are to be excluded from concatenation.
– Select all but the nth element
> TEMPERATURE[-2]
  Q1   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10
36.1 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9
(iii) Vector of character strings. This form of vector indexing is only possible for vectors
whose elements have been named. A vector of element names can be used to select
elements for concatenation.
– Select the named element
> TEMPERATURE["Q1"]
Q1
36.1
– Select the named elements
> TEMPERATURE[c("Q1", "Q4")]
  Q1   Q4
36.1 36.3