
DK4712_C011.fm Page 424 Thursday, March 16, 2006 3:40 PM


Practical Guide to Chemometrics

(response), one at a time. This procedure is performed from top to bottom of the

data set (forward EFA) and from bottom to top (backward EFA) to investigate the

emergence and the decay of the process contributions, respectively. Figure 11.4b

displays the information provided by EFA for an HPLC-DAD example and how to

interpret the results.

Each time a new row is added to the expanding submatrix (Figure 11.4b), a

PCA model is computed and the corresponding singular values or eigenvalues are

saved. The forward EFA curves (thin solid lines) are produced by plotting the saved

singular values or log (eigenvalues) obtained from PCA analyses of the submatrix

expanding in the forward direction. The backward EFA curves (thin dashed lines)

are produced by plotting the singular values or log (eigenvalues) obtained from the

PCA analysis of the submatrix expanding in the backward direction. The lines

connecting corresponding singular values (s.v.), i.e., all of the first s.v., the second

s.v., the ith s.v., indicate the evolution of the singular values along the process and,

as a consequence, the variation of the process components. Emergence of a new

singular value above the noise level delineated by the pool of nonsignificant singular

values indicates the emergence of a new component (forward EFA) or the disappearance of a component (backward EFA) in the process.

Figure 11.4b also shows how to build initial estimates of concentration profiles

from the overlapped forward and backward EFA curves as long as the process evolves

in a sequential way (see the thick lines in Figure 11.4b). For a system with n

significant components, the profile of the first component is obtained by combining the

curve representing the first s.v. of the forward EFA plot and the curve representing

the nth s.v. of the backward EFA plot. Note that the nth s.v. in the backward EFA

plot is related to the disappearance of the first component in the forward EFA plot.

The profile of the second component is obtained by splicing the curve representing the

second s.v. in the forward EFA plot to the curve representing (n − 1)th s.v. from the

backward EFA plot, and so forth. Combining the two profiles into one profile is easily

accomplished in a computer program by selecting the minimum value from the two

s.v. lines to be combined. It can be seen that the resulting four elution profiles

obtained by EFA are good approximations of the real profiles shown in Figure 11.4a.
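The EFA procedure and the minimum-based splicing of forward and backward curves described above can be sketched in Python with NumPy. This is only a minimal illustration on simulated data, not the implementation behind Figure 11.4; the function names `efa` and `efa_profiles`, the Gaussian elution profiles, the random spectra, and the noise level are all assumptions made for the example.

```python
import numpy as np

def efa(D, n_comp):
    """Return forward and backward EFA singular values (rows x n_comp)."""
    n_rows = D.shape[0]
    fwd = np.zeros((n_rows, n_comp))
    bwd = np.zeros((n_rows, n_comp))
    for i in range(n_rows):
        # Forward EFA: the submatrix expands from the top (rows 0..i).
        s = np.linalg.svd(D[: i + 1], compute_uv=False)
        k = min(n_comp, s.size)
        fwd[i, :k] = s[:k]
        # Backward EFA: the submatrix expands from the bottom (rows i..end).
        s = np.linalg.svd(D[i:], compute_uv=False)
        k = min(n_comp, s.size)
        bwd[i, :k] = s[:k]
    return fwd, bwd

def efa_profiles(fwd, bwd):
    """Splice the j-th forward curve with the (n - j + 1)-th backward curve."""
    n_comp = fwd.shape[1]
    return np.column_stack(
        [np.minimum(fwd[:, j], bwd[:, n_comp - 1 - j]) for j in range(n_comp)]
    )

# Simulated sequential elution of three compounds (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(60)
C = np.column_stack([np.exp(-0.5 * ((t - c) / 5.0) ** 2) for c in (18, 30, 42)])
S = rng.random((3, 25))
D = C @ S + 1e-4 * rng.standard_normal((60, 25))

fwd, bwd = efa(D, n_comp=3)
profiles = efa_profiles(fwd, bwd)  # rough initial estimates of C
```

Plotting `np.log10` of `fwd` and `bwd` against the row index gives curves of the kind described for Figure 11.4b. The spliced profiles are in singular-value units, so they are shape estimates only and are not scaled as concentrations.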

The information provided by the EFA plots can be used for the detection and

location of the emergence and decay of the compounds in an evolving process.

As a consequence, the concentration window and the zero-concentration region

for each component in the system are easily determined for any process that evolves

such that the emergence and decay of each component occurs sequentially. For

example, the concentration window of the first component to elute is shown as a

shadowed zone in Figure 11.4b. Uses of this type of information have given rise

to most of the noniterative resolution methods, explained in Section 11.4 [32–39].

Iterative resolution methods, explained in Section 11.5, use the EFA-derived estimates of the concentration profiles as a starting point in an iterative optimization

[40, 41]. The location of selective zones and zones with a number of compounds

smaller than the total rank can also be introduced as additional information to

minimize the ambiguity in the resolved profiles [21, 41, 42].

As mentioned earlier, FSMW-EFA is not restricted in its applicability to evolving

processes, although the interpretation of the final results is richer for this kind of problem.

© 2006 by Taylor & Francis Group, LLC


Multivariate Curve Resolution

















[Figure 11.4: panels showing forward EFA and backward EFA; axes: retention times versus log (eigenvalues).]








FIGURE 11.4 (a) Concentration profiles of an HPLC-DAD data set. (b) Information derived from

the data set in Figure 11.4a by EFA: scheme of PCA runs performed. Combined plot of forward EFA

(solid black lines) and backward EFA (dashed black lines). The thick lines with different line styles

are the derived concentration profiles. The shaded zone marks the concentration window for the first

eluting compound. The rest of the elution range is the zero-concentration window. (c) Information

derived from the data set in Figure 11.4a by FSMW-EFA: scheme of the PCA runs performed. The

straight lines and associated numbers mark the different windows along the data set as a function of

their local rank (number). The shaded zones mark the selective concentration windows (rank 1).


FSMW-EFA does not focus on a description of the evolution of the different components in a system as EFA does; rather, it focuses on the local rank of windows in the

concentration domain (rows) or the local rank of windows in the spectral response

domain (columns).

FSMW-EFA is carried out by conducting a series of PCA analyses on submatrices obtained by moving a window of a fixed size through the data set, starting at

the top of the matrix and moving downward, one row at a time. The singular values

or eigenvalues from the repeated analyses are saved, and a plot is constructed by

connecting the corresponding singular values as done in EFA. Visual examination

of these plots gives a local-rank map of the data set, i.e., a representation of how

many components are simultaneously present in the different zones of the data set

(Figure 11.4c). For each window analyzed, the number of singular values exceeding

the noise level threshold is used to determine the local rank. The local-rank map

helps to identify selective zones in the data set (e.g., zones where the local rank is 1)

and to know the degree of compound overlap in the data set. The unambiguous

determination of the number of compounds present and their identities is only

possible in processes where components evolve sequentially or when more external

information is available. The window size is a parameter that has an effect on the

information obtained (e.g., local-rank maps). Wider windows increase the sensitivity

for detecting minor components, including components completely embedded under

major compounds. Narrower windows can provide more accurate resolution of

boundaries between zones of different rank.
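The moving-window procedure and the thresholding that produces a local-rank map can be sketched as follows. This is an illustrative sketch under assumptions: the function names, the window size of five rows, and the noise threshold of 0.01 are chosen for the simulated data below and are not prescribed by the text.

```python
import numpy as np

def fsmw_efa(D, window):
    """Singular values of every fixed-size window of rows, moved one row at a time."""
    n_windows = D.shape[0] - window + 1
    return np.array([
        np.linalg.svd(D[i : i + window], compute_uv=False)
        for i in range(n_windows)
    ])

def local_rank(sv, noise_level):
    """Number of singular values above the noise threshold in each window."""
    return (sv > noise_level).sum(axis=1)

# Simulated three-compound elution with overlapping peaks (hypothetical data).
rng = np.random.default_rng(1)
t = np.arange(100)
C = np.column_stack([np.exp(-0.5 * ((t - c) / 6.0) ** 2) for c in (30, 50, 70)])
S = rng.random((3, 25))
D = C @ S + 1e-4 * rng.standard_normal((100, 25))

sv = fsmw_efa(D, window=5)              # one row of singular values per window
rank = local_rank(sv, noise_level=0.01)  # local-rank map along the elution
```

Repeating the calculation with a wider or narrower `window` shows the trade-off discussed above between sensitivity to minor components and sharpness of the zone boundaries.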

New algorithms based on EFA and FSMW-EFA have refined the performance

of the parent methods [43, 44] and have widened their applicability to the study of

systems with concurrent processes [45] or complex spatial structure, such as spectroscopic images [46].


Resolution methods are often divided into iterative and noniterative methods. Most
noniterative methods are one-step calculation algorithms that focus on the one-at-a-time recovery of either the concentration or the response profile of each component.
Once all of the concentration (C) or response (S) profiles are recovered, the other
member of the matrix pair, C and S, is obtained by least squares according to the
general CR model, $D = CS^T$ [32–38].

Noniterative methods use information from local-rank maps or concentration

windows in a characteristic way. In mathematical terms, these windows define

subspaces where the different compounds are present or absent. The subspaces can

be combined in clever ways through projections or by extraction of common vectors

(profiles) to obtain the profiles sought.

As mentioned in Section 11.3, the cornerstone of these procedures is the correct

location of concentration windows of the compounds of interest. Limitations of these

methods are linked to this point. Thus, data sets where the compositional evolution

of the compounds does not follow any clear pattern, such as in a series of mixtures

or image pixels, cannot be resolved by these methods because it is practically impossible to determine the concentration windows of components. Evolving processes are



the most suitable systems to be analyzed but, again, attention should be paid to

situations where the pattern by which components emerge and decay is not sequentially ordered. Some examples that violate this requirement are nonunimodal concentration profiles or small embedded peaks under major peaks. In cases such as these,

specialized EFA derivations [39] should be used to avoid incorrect assignment of

component windows. Other problems associated with locating window boundaries

are due to the presence of noise that can blur the extremes of the concentration

windows. Errors from this source can also affect the quality of the final results.

Noniterative methods are fast, but they have clear limitations in their applicability

because of the difficulties associated with correct definition of concentration windows

and local rank. Their use is practically restricted to processes with sequentially

evolving components like chromatography, the components of which fulfill the conditions required by Manne’s theorems, to ensure a correct component resolution [22].


Window factor analysis (WFA) was described by Malinowski and is likely the most

representative and widely used noniterative resolution method [34, 35]. WFA recovers the concentration profiles of all components in the data set one at a time. To do

so, WFA uses the information in the complete original data set and in the subspace

where the component to be resolved is absent, i.e., all rows outside of the concentration window. The original data set is projected into the subspace spanned by the rows where

the component of interest is absent, thus producing a vector that represents the

spectral variation of the component of interest that is uncorrelated to all other

components. This specific spectral information, combined appropriately with the

original data set, yields the concentration profile of the related component. To ensure

the specificity of this spectral information, all other components in the data set should

be present outside of the concentration window of the component to be resolved.

This means, in practice, that component peaks with embedded peak profiles under

them cannot be adequately resolved.

Figure 11.5 illustrates the scheme followed in the WFA resolution. The steps of

the WFA method are listed below, followed by a description clarifying their meaning.

1. A PCA model of the original data matrix, D, is computed.

2. The concentration windows of each component in the data set are determined.

For each component:

3. A PCA model of a submatrix, $D_o$, is computed where the rows related to the concentration window of the nth component to be resolved have been removed.

4. The vector $p_{no}^T$ is computed, which is the part of the spectrum of the nth component orthogonal to the spectra of all other components in the original matrix.

5. The true concentration profile of the nth component is recovered using $p_{no}^T$ and D.



[Figure 11.5 panel labels: concentration window of the nth component; rank n; rank (n − 1).]













FIGURE 11.5 Recovery of the concentration profile of the nth compound by window factor analysis. (a) PCA of the raw data matrix, D, and determination of the concentration window (steps 1 and 2); (b) PCA of the matrix formed by suppression of the concentration window of the nth component, $D_o$ (step 3); (c) recovery of the part of the spectrum of the nth component orthogonal to all the spectra in $D_o$, $p_{no}^T$ (step 4); and (d) recovery of the concentration profile of the nth component (step 5).

As a general last step after obtaining the concentration profiles of all components:

6. The pure-spectrum data matrix $S^T$ is estimated by least squares using D and C.

WFA starts with the PCA decomposition of the D matrix, giving the product of scores and loadings, $TP^T$. In general, the D matrix will have n components, i.e., rank

n. The determination of the location of concentration windows for each component is

carried out using EFA (see Figure 11.4b) or other methods. Steps 3 to 5 are the core

of the WFA method and should be performed as many times as compounds are present

in matrix D to recover the concentration profiles of the C matrix, one at a time.



For each component, a $D_o$ submatrix is constructed by removing the rows related to its concentration window. Then, a PCA model is computed and the product $T_o P_o^T$ is obtained. Note that $D_o$ has rank n − 1 because the variation due to the component of interest disappears when its corresponding window (rows) in the data matrix D is deleted. The loading matrices, $P^T$ and $P_o^T$, describe the space of the n pure spectra in D and the (n − 1) pure spectra in $D_o$, respectively. The rows in these loading matrices are actually “abstract spectra,” and the real spectra can be expressed as a linear combination of them. Using these two loading matrices, $P^T$ and $P_o^T$, it is possible to calculate a vector $p_{no}^T$ that is orthogonal to the (n − 1) $p_{io}^T$ vectors and that belongs to the space defined by $P^T$. This vector completes the set of vectors in $P_o^T$ and contains the part of the spectrum of the removed component uncorrelated to the spectra of the other (n − 1) components in the data matrix. Using this vector with information exclusively related to the removed component, the true concentration profile of this compound can be calculated as follows:

$$D\,p_{no} = c_n$$


The complete C matrix is then formed by appending row-wise the column concentration profiles found for each component in the D matrix. The matrix of spectra, $S^T$, is obtained by least squares using the D and C matrices and the basic equation of CR methods, $D = CS^T$:

$$S^T = (C^T C)^{-1} C^T D$$



Recent modifications of the WFA method attempt to solve some of the problems

caused by poorly defined boundaries for concentration windows [35].
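Steps 1 to 6 of WFA can be sketched numerically as follows. This is a minimal sketch on noise-free simulated data, not Malinowski's published code. The helper names (`bump`, `wfa_profile`), the compactly supported peak shape, and the use of the true simulated profiles to define the concentration windows are assumptions made for illustration; in practice the windows would come from EFA or a similar method. The vector $p_{no}$ is found here through an SVD-based null-space calculation, which is one of several possible ways to do it.

```python
import numpy as np

def bump(t, center, half_width):
    """Peak that is exactly zero outside its window (hypothetical shape)."""
    x = (t - center) / half_width
    return np.where(np.abs(x) < 1, np.cos(0.5 * np.pi * x) ** 2, 0.0)

def wfa_profile(D, window, n_comp):
    """Recover one concentration profile (up to scale) from its row window."""
    # Step 1: loadings P^T of the full data matrix (n_comp x n_columns).
    Pt = np.linalg.svd(D, full_matrices=False)[2][:n_comp]
    # Step 3: loadings P_o^T after deleting the rows of the window.
    Pot = np.linalg.svd(D[~window], full_matrices=False)[2][: n_comp - 1]
    # Step 4: vector in the space of P^T that is orthogonal to every row of P_o^T.
    A = Pt @ Pot.T                          # P_o expressed in the basis P
    q = np.linalg.svd(A, full_matrices=True)[0][:, -1]
    p_no = Pt.T @ q
    # Step 5: D p_no = c_n (the sign is arbitrary, so make the profile positive).
    c = D @ p_no
    return -c if c.sum() < 0 else c

rng = np.random.default_rng(2)
t = np.arange(60.0)
C = np.column_stack([bump(t, c, 12.0) for c in (20.0, 30.0, 40.0)])
S = rng.random((3, 25))
D = C @ S                                   # noise-free D = C S^T

# Step 2 (assumed known here): window = rows where the component is present.
C_hat = np.column_stack([wfa_profile(D, C[:, k] > 0, 3) for k in range(3)])
# Step 6: spectra by least squares from D and the recovered C.
St_hat = np.linalg.lstsq(C_hat, D, rcond=None)[0]
```

Each recovered column of `C_hat` matches the true profile only up to a scale factor, which is the usual intensity ambiguity of curve resolution; the product `C_hat @ St_hat` nevertheless reproduces D.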




Following the idea of using concentration windows and the subspaces that can be

derived, other noniterative methods are focused on the recovery of the response

profiles (spectra). This is the case of subwindow factor analysis (SFA), proposed by

Manne [38], and other derivations of this method, like parallel vector analysis (PVA)

[39]. Unlike WFA, SFA recovers the pure response profile of each component. The individual row response profiles are appended in a columnwise fashion, until the complete $S^T$ matrix is built. The C matrix is easily derived by least squares according to the CR model, $D = CS^T$, as follows:

$$C = DS(S^T S)^{-1}$$


In SFA, the knowledge of the concentration windows is used in such a way

that each pure spectrum is calculated as the intersection of two subspaces that

have only the compound to be resolved in common. Figure 11.6 illustrates the


[Figure 11.6 panel labels: retention times; A, B subspace; B, C subspace.]


FIGURE 11.6 Application of subwindow factor analysis (SFA) for resolution. (a) Concentration profiles of A, B, and C and subwindows used for the resolution of component B (first

containing A and B compounds and second containing B and C compounds). (b) The A,B

plane is defined by the pure spectra of A and B ($s_A$, $s_B$) and the plane B,C by the pure spectra of B and C ($s_B$, $s_C$). The intersection of both planes must necessarily be the pure spectrum of B.

idea behind SFA for a three-component HPLC-DAD system (A, B, and C). Once

the concentration windows of the three components are known, one subwindow

can be constructed with rows including only A and B and another one with rows

where only B and C are present. The intersection of the two planes derived from

these subspaces must necessarily give the pure spectrum of B as an answer. The

same strategy would be applied to recover the spectra of the rest of the compounds

in a general example.

To conduct SFA in practice, the singular-value decomposition (SVD, see Chapter 4) of the two subwindows yields a basis of orthogonal vectors spanning the (A,B) subspace, called $\{e_i\}$, and another basis for the (B,C) subspace, called $\{f_i\}$. The spectrum of B, $s_B$, can be obtained from these two sets of basis vectors as shown in Equation 11.9,

$$s_B = \sum_i a_i e_i = \sum_i b_i f_i \tag{11.9}$$

The SFA algorithm computes the $a_i$ and $b_i$ values that minimize Equation 11.10,

$$\min_{a_i,\, b_i} \left\| \sum_i a_i e_i - \sum_i b_i f_i \right\| \tag{11.10}$$

after which $s_B$ can be obtained by using any of the resulting linear combinations, $\sum_i a_i e_i$ or $\sum_i b_i f_i$.

The spectra of C and A can be obtained in a straightforward fashion, since these

components have selective zones in their elution profiles.
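The intersection of the two subspaces can be illustrated numerically. The sketch below assumes unit-norm coefficient vectors and solves the minimization through the SVD of the product of the two bases (the principal-vector approach); this is one standard way to intersect subspaces and is not necessarily the exact procedure of the published SFA algorithm. The helper names, the peak shape, and the use of the true simulated profiles to pick the subwindows are assumptions for the example.

```python
import numpy as np

def bump(t, center, half_width):
    """Peak that is exactly zero outside its window (hypothetical shape)."""
    x = (t - center) / half_width
    return np.where(np.abs(x) < 1, np.cos(0.5 * np.pi * x) ** 2, 0.0)

def subspace_basis(X, rank):
    """Orthonormal basis (columns) of the row space of X, from its SVD."""
    return np.linalg.svd(X, full_matrices=False)[2][:rank].T

def sfa_spectrum(D1, D2, rank1=2, rank2=2):
    """Common direction of the row spaces of two subwindows."""
    E = subspace_basis(D1, rank1)            # basis {e_i}
    F = subspace_basis(D2, rank2)            # basis {f_i}
    U, s, Vt = np.linalg.svd(E.T @ F)
    a, b = U[:, 0], Vt[0]                    # unit vectors minimizing ||E a - F b||
    s_est = E @ a                            # F @ b is equivalent when s[0] is close to 1
    if s_est.sum() < 0:                      # SVD sign is arbitrary
        s_est = -s_est
    return s_est, s

rng = np.random.default_rng(3)
t = np.arange(60.0)
C = np.column_stack([bump(t, c, 12.0) for c in (20.0, 30.0, 40.0)])  # A, B, C
S = rng.random((3, 25))
D = C @ S                                    # noise-free data

rows_AB = (C[:, 2] == 0) & (C[:, 0] > 0)     # only A and B can be present
rows_BC = (C[:, 0] == 0) & (C[:, 2] > 0)     # only B and C can be present
sB_est, s = sfa_spectrum(D[rows_AB], D[rows_BC])
cosine = sB_est @ S[1] / np.linalg.norm(S[1])
```

A leading singular value `s[0]` close to 1 indicates that the two subspaces share a common direction; with noisy data it falls below 1, and the two combinations $\sum_i a_i e_i$ and $\sum_i b_i f_i$ differ slightly.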

The heuristic evolving latent projections (HELP) method is another pioneering noniterative method that uses local-rank information [36, 37]. It is based on the local-rank analysis of the data set and focuses on finding selective concentration or response windows. When these selective zones exist, the resolution of the system is clear. Thus, for an HPLC-DAD data set, a row related to a selective elution time directly provides the shape of the spectrum of the only component present at that stage of the chromatographic elution. In a similar manner, a column related to a selective wavelength directly provides the chromatographic peak of the only absorbing compound at that wavelength.


HELP works by exploring both the concentration and spectral response spaces

with a powerful graphical tool (the so-called datascope) to visually detect potential

selective zones in the score and loading plots of the data matrix, which are

seen as points (representing rows or columns of the original data set) lying on straight

lines centered near the origin. A statistical method to confirm the presence of

selectivity in the concentration or spectral windows is based on the use of an F-test

to compare the magnitude of eigenvalues related to potential selective zones of the

data set with eigenvalues related to noise zones of the data matrix, i.e., those regions

where no chemical components are supposed to be present. The confirmation of a

selective zone in the data set, which is actually a rank-one window in the data matrix,

will then be obtained when no significant differences are found between the first

eigenvalue of a noise-related zone of the data matrix and the second eigenvalue of

the potential selective zone. Components with selective concentration and response

zones are straightforwardly resolved. Subtraction of the $c_i s_i^T$ contribution of the resolved components from the raw data set can facilitate the resolution of components originally lacking selectivity.


Iterative resolution methods obtain the resolved concentration and response matrices

through the one-at-a-time refinement or simultaneous refinement of the profiles in

C, in $S^T$, or in both matrices at each cycle of the optimization process. The profiles
in C or $S^T$ are “tailored” according to the chemical properties and the mathematical

features of each particular data set. The iterative process stops when a convergence

criterion (e.g., a preset number of iterative cycles is exceeded or the lack of fit goes

below a certain value) is fulfilled [21, 42, 47–50].


Iterative resolution methods are in general more versatile than noniterative

methods. They can be applied to more diverse problems, e.g., data sets with partial

or incomplete selectivity in the concentration or spectral domains, and to data sets

with concentration profiles that evolve sequentially or nonsequentially. Prior knowledge about the data set (chemical or related to mathematical features) can be used

in the optimization process, but it is not strictly necessary. The main complaint about

iterative resolution methods has often been the longer calculation times required to

obtain optimal results; however, improved fast algorithms and more powerful PCs

have overcome this historical limitation.

The next subsection deals first with aspects common to all resolution methods.

These include (1) issues related to the initial estimates, i.e., how to obtain the profiles

used as the starting point in the iterative optimization, and (2) issues related to the

use of mathematical and chemical information available about the data set in the

form of so-called constraints. The last part of this section describes two of the most

widely used iterative methods: iterative target transformation factor analysis (ITTFA)

and multivariate curve resolution–alternating least squares (MCR-ALS).




Starting the iterative optimization of the profiles in C or $S^T$ requires a matrix or a
set of profiles sized as C or as $S^T$ with rough approximations of the concentration

profiles or spectra that will be obtained as the final results. This matrix contains the

initial estimates of the resolution process. In general, the use of nonrandom estimates

helps shorten the iterative optimization process and helps to avoid convergence to

local optima different from the desired solution. It is sensible to use chemically

meaningful estimates if we have a way of obtaining them or if the necessary

information is available. Whether the initial estimates are a C-type or an $S^T$-type matrix can depend on which type of profiles is less overlapped, on which

direction of the matrix (rows or columns) has more information, or simply on the

will of the chemist.

There are many chemometric methods to build initial estimates: some are particularly suitable when the data consists of the evolutionary profiles of a process,

such as evolving factor analysis (see Figure 11.4b in Section 11.3) [27, 28, 51],

whereas some others mathematically select the purest rows or the purest columns

of the data matrix as initial profiles. Of the latter approach, key-set factor analysis

(KSFA) [52] works in the FA abstract domain, and other procedures, such as the

simple-to-use interactive self-modeling analysis (SIMPLISMA) [53] and the orthogonal projection approach (OPA) [54], work with the real variables in the data set to

select rows of “purest” variables or columns of “purest” spectra, that are most

dissimilar to each other. In these latter two methods, the profiles are selected sequentially so that any new profile included in the estimate is the most uncorrelated to all

of the previously selected ones.
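The idea of sequentially selecting the most dissimilar profiles can be illustrated with a simplified successive-projection scheme. This is not the published SIMPLISMA or OPA algorithm, which use their own purity and dissimilarity criteria; it is a hypothetical stand-in that picks the row with the largest norm and then, at each step, the row with the largest part orthogonal to the rows already chosen. The function name and the simulated data are assumptions.

```python
import numpy as np

def bump(t, center, half_width):
    """Peak that is exactly zero outside its window (hypothetical shape)."""
    x = (t - center) / half_width
    return np.where(np.abs(x) < 1, np.cos(0.5 * np.pi * x) ** 2, 0.0)

def select_dissimilar_rows(D, n_comp):
    """Pick n_comp row indices, each as far as possible from the span of the previous picks."""
    R = np.asarray(D, dtype=float).copy()
    selected = []
    for _ in range(n_comp):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        selected.append(i)
        v = R[i] / np.linalg.norm(R[i])
        # Remove the selected direction from every row before the next pick.
        R = R - np.outer(R @ v, v)
    return selected

rng = np.random.default_rng(4)
t = np.arange(60.0)
C = np.column_stack([bump(t, c, 12.0) for c in (15.0, 30.0, 45.0)])
S = rng.random((3, 25))
D = C @ S

rows = select_dissimilar_rows(D, n_comp=3)
St_init = D[rows]          # S^T-type initial estimate: three measured spectra
```

Applying the same function to the transpose of D would select columns instead of rows, giving a C-type initial estimate.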

Apart from using chemometric methods, a matrix of initial estimates can always

be formed with the rows or columns of the data set that the researcher considers most

representative because of chemical reasons, and it can also include external information, such as some reference spectra or concentration profiles, when available.






Although resolution does not require previous information about the chemical system

under study, additional knowledge, when it exists, can be used to tailor the sought

pure profiles according to certain known features and, as a consequence, to minimize

the ambiguity in the data decomposition and in the results obtained.

The introduction of this information is carried out through the implementation

of constraints. A constraint can be defined as any mathematical or chemical property

systematically fulfilled by the whole system or by some of its pure contributions

[55]. Constraints are translated into mathematical language and force the iterative

optimization to model the profiles while respecting the conditions desired.

The application of constraints should always be prudent and soundly grounded,

and constraints should only be set when there is an absolute certainty about the

validity of the constraint. Even a potentially useful constraint can play a negative

role in the resolution process when factors like experimental noise or instrumental

problems distort the related profile or when the profile is modified so roughly that

the convergence of the optimization process is seriously damaged. When well implemented and fulfilled by the data set, constraints can be seen as the driving forces of

the iterative process to the right solution and, often, they are found not to be active

in the last part of the optimization process.

The efficient and reliable use of constraints has improved significantly with

the development of methods and software that allow them to be easily used in

flexible ways. This increase in flexibility allows complete freedom in the way

combinations of constraints can be used for profiles linked in the different concentration and spectral domains. This increase in flexibility also makes it possible

to apply a certain constraint with variable degrees of tolerance to cope with noisy

real data. For example, the implementation of constraints often allows for small

deviations from ideal behavior before correcting a profile [7, 21, 55]. Methods for

correcting the profile to be constrained have evolved into smoother methodologies,

which modify the poorly behaving profile so that the global shape is retained as

much as possible and the convergence of the iterative optimization is minimally

upset [56–61].

There are several ways to classify constraints: the main ones relate either to the

nature of the constraints or to the way they are implemented. In terms of their nature,

constraints can be based on either chemical or mathematical features of the data set.

In terms of implementation, we can distinguish between equality constraints or

inequality constraints [56]. An equality constraint sets the elements in a profile to

be equal to a certain value, whereas an inequality constraint forces the elements in

a profile to be higher or lower than a certain value. The most widely used

types of constraints will be described using the classification scheme based on the

constraint nature. In some of the descriptions that follow, comments on the implementation (as equality or inequality constraints) will be added to illustrate this

concept. Figure 11.7 shows the effects of some of these constraints on the correction

of a profile.
















[Figure 11.7, closure panel label: Σ = ct]


FIGURE 11.7 Effects of some constraints on the shape of resolved profiles. The thin and

the thick lines represent the profiles before and after being constrained, respectively. Constraints shown are (a) nonnegativity, (b) unimodality, and (c) closure.

Nonnegativity

The nonnegativity constraint is applied when it can be assumed that the measured

values in an experiment will always be positive. For example, it can be applied to

all concentration profiles and to many experimental responses, such as UV (ultraviolet) absorbances and fluorescence intensities [42, 47, 48, 56, 59]. This constraint

forces the values in a profile to be equal to or greater than zero. It is an example of

an inequality constraint (see Figure 11.7).

Unimodality

The unimodality constraint allows the presence of only one maximum per profile

(see Figure 11.7) [42, 55, 60]. This condition is fulfilled by many peak-shaped

concentration profiles, like chromatograms or some types of reaction profiles, and

by some instrumental signals, like certain voltammetric responses. It is important

to note that this constraint applies not only to peaks but also to profiles that have a

constant maximum (plateau) or a decreasing tendency. This is the case for many

monotonic reaction profiles that show only the decay or the emergence of a compound [47, 48, 51, 61], such as the most protonated and deprotonated species in an

acid-base titration, respectively.

Closure

The closure constraint is applied to closed reaction systems, where the principle of

mass balance is fulfilled. With this constraint, the sum of the concentrations of all of

the species involved in the reaction (the suitable elements in each row of the C matrix)

is forced to be equal to a constant value (the total concentration) at each stage in the

reaction [27, 41, 42]. The closure constraint is an example of an equality constraint.
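The three constraints just described can each be written as a simple correction applied to a profile or to the C matrix. The versions below are the bluntest possible implementations, given only as a sketch: negative values are set to zero, secondary maxima are cut down to the level of their inner neighbor, and rows are rescaled to a fixed sum. The function names are hypothetical, and the smoother correction methods cited earlier in this section would replace these hard corrections in practice.

```python
import numpy as np

def nonnegativity(profile):
    """Inequality constraint: every element is forced to be >= 0."""
    return np.clip(np.asarray(profile, dtype=float), 0.0, None)

def unimodality(profile):
    """Allow a single maximum: values may only decrease away from the peak."""
    p = np.asarray(profile, dtype=float).copy()
    k = int(np.argmax(p))
    for i in range(k - 1, -1, -1):       # left of the maximum
        p[i] = min(p[i], p[i + 1])
    for i in range(k + 1, p.size):       # right of the maximum
        p[i] = min(p[i], p[i - 1])
    return p

def closure(C, total=1.0):
    """Equality constraint: each row of C is rescaled to sum to `total`."""
    C = np.asarray(C, dtype=float)
    sums = C.sum(axis=1, keepdims=True)
    sums = np.where(sums == 0, 1.0, sums)  # leave all-zero rows untouched
    return total * C / sums

nn = nonnegativity([-0.1, 0.2, -0.3])
um = unimodality([0.0, 1.0, 0.5, 0.8, 0.2])
cl = closure([[1.0, 3.0], [2.0, 2.0]], total=2.0)
```

Because a monotonic profile has its maximum at one end, `unimodality` also leaves plateau-shaped and purely decaying or emerging profiles unchanged, in line with the remark above that the constraint is not limited to peaks.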
