1.13 Indexing vectors, matrices and lists
21
INTRODUCTION TO R
1.13.1 Vector indexing
It is possible to print or refer to a subset of a vector by appending an index vector
(enclosed in square brackets, []) to the vector name. There are four common forms
of vector indexing used to extract a subset of a vector:
(i) Vector of positive integers. A set of integers that indicate which elements of the
vector are to be selected. Selected elements are concatenated in the specified order.
– Select the nth element
> TEMPERATURE[2]
  Q2
30.6
– Select elements n through m
> TEMPERATURE[2:5]
  Q2   Q3   Q4   Q5
30.6 31.0 36.3 39.9
– Select a specific set of elements
> TEMPERATURE[c(1, 5, 6, 9)]
  Q1   Q5   Q6   Q9
36.1 39.9  6.5  9.7
(ii) Vector of negative integers. A set of integers that indicate which elements of the
vector are to be excluded from concatenation.
– Select all but the nth element
> TEMPERATURE[-2]
  Q1   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10
36.1 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9
(iii) Vector of character strings. This form of vector indexing is only possible for vectors
whose elements have been named. A vector of element names can be used to select
elements for concatenation.
– Select the named element
> TEMPERATURE["Q1"]
  Q1
36.1
– Select the named elements
> TEMPERATURE[c("Q1", "Q4")]
  Q1   Q4
36.1 36.3
(iv) Vector of logical values. The vector of logical values should be the same length as
the vector being subsetted (a shorter logical vector is recycled) and is usually the result
of an evaluated condition. Logical values of T (TRUE) and F (FALSE) indicate, respectively,
to include and exclude the corresponding elements of the main vector from concatenation.
– Select elements for which the logical condition is true
> TEMPERATURE[TEMPERATURE < 15]
  Q6   Q7   Q8   Q9
 6.5 11.2 12.8  9.7
> TEMPERATURE[SHADE == "no"]
  Q1   Q3   Q5   Q7   Q9
36.1 31.0 39.9 11.2  9.7
– Select elements for which multiple logical conditions are true
> TEMPERATURE[TEMPERATURE < 34 & SHADE == "no"]
  Q3   Q7   Q9
31.0 11.2  9.7
– Select elements for which either of the logical conditions is true
> TEMPERATURE[TEMPERATURE < 10 | SHADE == "no"]
  Q1   Q3   Q5   Q6   Q7   Q9
36.1 31.0 39.9  6.5 11.2  9.7
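To try the four forms of vector indexing above, TEMPERATURE and SHADE can be rebuilt from the values printed in this section. This is a minimal sketch: the code that originally created them is not shown here, so the construction below is an assumption reconstructed from the printed output. The final line uses which(), a standard base R function, to turn a logical condition into positive indices.

```r
# Assumed reconstruction of TEMPERATURE (named numeric vector) and
# SHADE (factor) from the values printed in this section
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(TEMPERATURE) <- paste("Q", 1:10, sep = "")
SHADE <- factor(rep(c("no", "full"), 5), levels = c("no", "full"))

# (i) positive integers: elements are returned in the order requested
TEMPERATURE[c(9, 1)]

# (ii) negative integers: the listed elements are excluded
TEMPERATURE[-c(1, 2)]

# (iii) character strings: elements are selected by name
TEMPERATURE[c("Q4", "Q1")]

# (iv) logical values: elements where the condition is TRUE are kept
TEMPERATURE[TEMPERATURE > 30 & SHADE == "full"]

# which() converts a logical condition into the matching positive indices
which(TEMPERATURE > 30 & SHADE == "full")
```

The last two expressions identify the same elements (Q2 and Q4); which() is handy when the positions themselves are needed rather than the values.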
1.13.2 Matrix indexing
Like vectors, matrices can be indexed by vectors of positive integers, negative
integers, character strings and logical values. However, whereas vectors have only
a single dimension (length), so that each element can be indexed by a single
number, matrices have two dimensions (rows and columns) and therefore require
two indices. Consequently, matrix indexing takes the form
[row.indices, col.indices], where row.indices and col.indices
respectively represent sequences of row and column indices of the form described for
vectors in section 1.13.1. Leaving either position empty selects all rows or all columns.
Before proceeding, re-examine the XY matrix generated in section 1.11.1:
> XY
      X     Y
A 16.92  8.37
B 24.03 12.93
C  7.61 16.65
D 15.49 12.20
E 11.77 13.12
attr(,"description")
[1] "coordinates of quadrats"
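Because section 1.11.1 is not reproduced here, the following is an assumed reconstruction of XY from its printed values: a 5-by-2 numeric matrix with rows named A to E, columns named X and Y, and a "description" attribute. It simply makes the indexing examples that follow reproducible.

```r
# Assumed reconstruction of the XY matrix printed above
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.20, 13.12)
XY <- cbind(X, Y)                  # column names taken from the vector names
rownames(XY) <- LETTERS[1:5]       # rows A to E
attr(XY, "description") <- "coordinates of quadrats"

dim(XY)                            # number of rows, number of columns
```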
The following examples will illustrate the variety of matrix indexing possibilities:
> XY[3, 2]          # select the element at row 3, column 2
[1] 16.65
> XY[3, ]           # select the entire 3rd row
    X     Y
 7.61 16.65
> XY[, 2]           # select the entire 2nd column
    A     B     C     D     E
 8.37 12.93 16.65 12.20 13.12
> XY[, -2]          # select all columns except the 2nd
    A     B     C     D     E
16.92 24.03  7.61 15.49 11.77
> XY["A", 1:2]      # select columns 1 through 2 for row A
    X     Y
16.92  8.37
> XY[, "X"]         # select the column named 'X'
    A     B     C     D     E
16.92 24.03  7.61 15.49 11.77
> XY[XY[, "X"] > 12, ]   # select all rows for which the value of column X is greater than 12
      X     Y
A 16.92  8.37
B 24.03 12.93
D 15.49 12.20
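One detail in the examples above is that selecting a single row or column (e.g. XY[3, ] or XY[, "X"]) returns a plain named vector rather than a matrix. The standard drop = FALSE argument of [ keeps the matrix structure, which can matter when later code expects a matrix. The sketch rebuilds XY from its printed values (an assumption, since section 1.11.1 is not shown) and also combines a logical row index with a column name.

```r
# Assumed reconstruction of XY from the printed values
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.20, 13.12)
XY <- cbind(X, Y)
rownames(XY) <- LETTERS[1:5]

col_vec <- XY[, "X"]                 # dimensions dropped: a named vector
col_mat <- XY[, "X", drop = FALSE]   # still a matrix with one column
is.matrix(col_vec)                   # FALSE
dim(col_mat)                         # 5 1

# Logical row index combined with a column name:
# Y values for the rows whose X value exceeds 12 (rows A, B and D)
XY[XY[, "X"] > 12, "Y"]
```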
1.13.3 List indexing
Lists consist of collections of objects that need not be of the same size or type. The
objects within a list are indexed by appending an index (enclosed in double square
brackets, [[]]) to the list name. A single object within a list can also be referred to by
appending a dollar sign ($) followed by the name of the object to the list name
(e.g. list$object). The elements of objects within a list are indexed according to the
object type. Indices to elements within an object of a list are placed within their own
square brackets after the list's double square brackets (e.g. list[["object"]][1:3]).
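The difference between double and single square brackets is easy to miss. This sketch uses a small hypothetical list, DEMO (not part of the book's data), to show that [[ ]] and $ return the object itself, whereas single brackets [ ] return a sub-list, and that a further [ ] after [[ ]] indexes inside the extracted object.

```r
# A small hypothetical list holding objects of different types and lengths
DEMO <- list(site = c("A1", "A2", "B1"), temp = c(36.1, 30.6, 31.0), n = 3)

DEMO[["temp"]]          # the numeric vector itself (same as DEMO[[2]])
DEMO$temp               # shorthand for DEMO[["temp"]]
DEMO["temp"]            # a list of length 1 that contains the vector
DEMO[["temp"]][2:3]     # elements 2 and 3 of the extracted vector

class(DEMO[["temp"]])   # "numeric"
class(DEMO["temp"])     # "list"
```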
Recall the EXPERIMENT list generated in section 1.11.2:
> EXPERIMENT
$SITE
[1] "A1" "A2" "B1" "B2" "C1" "C2" "D1" "D2" "E1" "E2"

$COORDINATES
[1] "16.92,8.37"  "24.03,12.93" "7.61,16.65"  "15.49,12.2"
[5] "11.77,13.12"

$TEMPERATURE
  Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10
36.1 30.6 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9

$SHADE
[1] no   full no   full no   full no   full no   full
Levels: no full
The following examples illustrate a variety of list indexing possibilities:
> #select the first object in the list
> EXPERIMENT[[1]]
[1] "A1" "A2" "B1" "B2" "C1" "C2" "D1" "D2" "E1" "E2"
> #select the object named 'TEMPERATURE' within the list
> EXPERIMENT[['TEMPERATURE']]
  Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10
36.1 30.6 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9
> #select the first 3 elements of 'TEMPERATURE' within
> #'EXPERIMENT'
> EXPERIMENT[['TEMPERATURE']][1:3]
  Q1   Q2   Q3
36.1 30.6 31.0
> #select only those 'TEMPERATURE' values which correspond
> #to SITEs with a '1' as the second character in their name
> EXPERIMENT$TEMPERATURE[substr(EXPERIMENT$SITE,2,2) == '1']
  Q1   Q3   Q5   Q7   Q9
36.1 31.0 39.9 11.2  9.7
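To reproduce the list examples above, EXPERIMENT can be rebuilt from its printed components. The construction below is an assumption (section 1.11.2 is not shown here), with COORDINATES kept at the five values printed. It also shows one practical difference between the two access forms: [[ ]] accepts a name stored in a variable, whereas $ treats what follows it literally.

```r
# Assumed reconstruction of the EXPERIMENT list from its printed output
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(TEMPERATURE) <- paste("Q", 1:10, sep = "")
EXPERIMENT <- list(
  SITE = c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2", "E1", "E2"),
  COORDINATES = c("16.92,8.37", "24.03,12.93", "7.61,16.65",
                  "15.49,12.2", "11.77,13.12"),
  TEMPERATURE = TEMPERATURE,
  SHADE = factor(rep(c("no", "full"), 5), levels = c("no", "full"))
)

# $ and [[ ]] with a quoted name refer to the same object
identical(EXPERIMENT$TEMPERATURE, EXPERIMENT[["TEMPERATURE"]])   # TRUE

# [[ ]] can take a name held in a variable;
# EXPERIMENT$component would look for an object literally named 'component'
component <- "SHADE"
EXPERIMENT[[component]]

# The same substr() condition applied to another component of the list
EXPERIMENT$SHADE[substr(EXPERIMENT$SITE, 2, 2) == "1"]
```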
1.14 Pattern matching and replacement (character search and replace)
It is often desirable to select a subset of data on the basis of character entries that match
more general patterns. Furthermore, the ability to search and replace character strings
within a character vector can be very useful.
1.14.1 grep - pattern searching
The grep() function searches within a vector for matches to a pattern and returns the
indices of all matching entries.
# select only those 'SITE' values that contain an 'A'
> grep("A", EXPERIMENT$SITE)
[1] 1 2
> EXPERIMENT$SITE[grep("A", EXPERIMENT$SITE)]
[1] "A1" "A2"
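Two closely related base R variants are often handy here: grep(..., value = TRUE) returns the matching elements directly instead of their indices, and grepl() returns a logical vector that can be used like the logical indexing of section 1.13.1, including in combination with & and |. In this sketch SITE is re-created from the printed EXPERIMENT$SITE values so that it runs on its own.

```r
# SITE values as printed for EXPERIMENT$SITE
SITE <- c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2", "E1", "E2")

grep("A", SITE)                  # indices of the matching entries
grep("A", SITE, value = TRUE)    # the matching entries themselves
grepl("A", SITE)                 # TRUE/FALSE for every entry

# Logical results can be combined with other conditions
SITE[grepl("A", SITE) | grepl("2", SITE)]
```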
By default, the pattern can be any valid regular expression (see footnote h), which
provides great pattern-searching flexibility.
# convert the EXPERIMENT list into a data frame
> EXP <- as.data.frame(EXPERIMENT)
# select only those rows that correspond to a 'SITE' value of either an A, B or C followed by a '1'
> grep("[A-C]1", EXP$SITE)
[1] 1 3 5
> EXP[grep("[A-C]1", EXP$SITE), ]
   SITE COORDINATES TEMPERATURE SHADE
Q1   A1  16.92,8.37        36.1    no
Q3   B1  7.61,16.65        31.0    no
Q5   C1 11.77,13.12        39.9    no
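A few more regular expression metacharacters can be tried on a small data frame built from the SITE, TEMPERATURE and SHADE values shown earlier. EXP2 is a simplified, hypothetical stand-in for EXP (it omits COORDINATES). In the patterns, ^ anchors a match to the start of a string, $ anchors it to the end, and [...] matches any one of the enclosed characters.

```r
# Hypothetical, simplified stand-in for EXP (COORDINATES omitted)
EXP2 <- data.frame(
  SITE = c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2", "E1", "E2"),
  TEMPERATURE = c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9),
  SHADE = factor(rep(c("no", "full"), 5), levels = c("no", "full"))
)

# rows whose SITE starts with D or E
EXP2[grep("^[DE]", EXP2$SITE), ]

# temperatures at sites whose name ends in 2
EXP2[grep("2$", EXP2$SITE), "TEMPERATURE"]

# a pattern match combined with a numeric condition
EXP2[grepl("[A-C]1", EXP2$SITE) & EXP2$TEMPERATURE > 35, ]
```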
1.14.2 regexpr - position and length of match
Rather than returning the indices of matching entries, the regexpr() function returns
the position of the first match within each string, together with the length of the
matched text (stored in the "match.length" attribute). Values of -1 correspond to entries
in which the pattern is not found.
# recall the AUST character vector that lists the Australian capital cities
> AUST
[1] "Adelaide"  "Brisbane"  "Canberra"  "Darwin"
[5] "Hobart"    "Melbourne" "Perth"     "Sydney"
# get the position and length of the string of characters that starts with a
# lowercase 'a' and ends with an 'e', with any number of characters in between
> regexpr("a.*e", AUST)
[1] 5 6 2 -1 -1 -1 -1 -1
attr(,"match.length")
[1] 4 3 4 -1 -1 -1 -1 -1
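The positions and lengths returned by regexpr() can be turned back into the matched text. Base R's regmatches() does this directly; the same result can also be obtained by hand with substr() and the "match.length" attribute, as sketched below. AUST is re-created from its printed values so the example is self-contained.

```r
AUST <- c("Adelaide", "Brisbane", "Canberra", "Darwin",
          "Hobart", "Melbourne", "Perth", "Sydney")

m <- regexpr("a.*e", AUST)
regmatches(AUST, m)       # "aide" "ane" "anbe" (entries without a match are dropped)

# Equivalent extraction from the start positions and match lengths
hit <- as.vector(m) > 0                  # entries with a match
pos <- as.vector(m)[hit]                 # start of each match
len <- attr(m, "match.length")[hit]      # length of each match
substr(AUST[hit], pos, pos + len - 1)
```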
Footnote h: A regular expression is a formal computer language consisting of normal printing characters and
special metacharacters (which represent wildcards and other features) that together provide a concise
yet flexible way of matching strings.