
On the other hand, by Bayes' theorem [8], P(C|X) can be computed as Equation (4), where P(C) and P(X) are the respective prior probabilities of C and X, and P(X|C) is the posterior probability of X conditioned on C.

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \qquad (4)$$



Equation (4) can be extended to local conditions, where each term should be estimated under the local condition δ(X). This yields the Bayesian formula under the local condition, Equation (5).

$$P(C \mid (X, \delta(X))) = \frac{P(X \mid (C, \delta(X)))\,P(C \mid \delta(X))}{P(X \mid \delta(X))} \qquad (5)$$



The Bayesian classifier [10] maximizes P(C|X) according to formula (4) and estimates it over the whole dataset, whereas our method maximizes P(C|X) according to formulas (3) and (5) and estimates it in a local area around the query sample. Under the assumption that near neighbors represent the properties of a query sample better than more distant samples do, estimating the LPP by formula (5) is more reasonable than estimating it by formula (4).

To maximize P(C|(X, δ(X))) according to formula (5), note that P(X|δ(X)) is constant for all classes, so only P(X|(C, δ(X)))P(C|δ(X)) needs to be maximized. The optimization problem can therefore be transformed to

$$\omega = \arg\max_{C} P(X \mid (C, \delta(X)))\,P(C \mid \delta(X)). \qquad (6)$$

2.2 Local Distribution Estimation



Given an arbitrary query sample X and a distance metric, its k nearest neighbors can be obtained from the training set. In this paper, we call the set of the k nearest neighbors the k-neighborhood of sample X and denote it by δk(X). To solve the optimization problem (6), the two terms P(X|(C, δ(X))) and P(C|δ(X)), which are relevant to the local distribution of class C, should be estimated based on δk(X) for each class.

P(C|δ(X)) gives the probability of a sample belonging to class C given that the sample is in the neighborhood of X. If there are Nj samples from class Cj in the k-neighborhood of X, then P(C|δ(X)) can be estimated by

$$P(C_j \mid \delta_k(X)) = N_j / k. \qquad (7)$$



P(X|(C, δ(X))) gives the probability of a sample being equal to X given that the sample is from class C and is in the neighborhood of X; this can be regarded as the local probability distribution density of class C at point X for continuous attributes.

To estimate P(X|(C, δ(X))) accurately, we consider only the continuous attributes in our method; the estimation of P(X|(C, δ(X))) then becomes a problem of probability density estimation in a local area. In our method, we assume that the samples in the neighborhood follow a Gaussian distribution with mean μ and covariance matrix Σ, defined by Equation (8).

$$f(X; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \lvert\Sigma\rvert}}\; e^{-0.5\,(X-\mu)^{T} \Sigma^{-1} (X-\mu)} \qquad (8)$$



where d is the dimension of the data. Thus, for δk(X) and a specified class Cj, we have

$$P(X \mid (C_j, \delta_k(X))) \propto f(X; \mu_{C_j}, \Sigma_{C_j}) \qquad (9)$$

where μCj and ΣCj respectively represent the mean and the covariance matrix of class Cj in δk(X).

Then we need to estimate the mean μ and the covariance matrix Σ from δk(X) for each class. In our approach, to ensure the covariance matrix is positive definite, we take the naive assumption of local class conditional independence, i.e. that an attribute of each class does not correlate with the other attributes in the local area; that is, the covariance matrix (Σ) is a diagonal matrix. If there are Nj samples from class Cj in δk(X), denoted by Xi^{Cj} (i = 1, ..., Nj), the two parameters, the mean (μCj) and the covariance matrix (ΣCj), can be estimated through maximum likelihood estimation by the following Formulae (10) and (11) [15].

$$\hat{\mu}_{C_j} = \frac{1}{N_j} \sum_{i=1}^{N_j} X_i^{C_j} \qquad (10)$$

$$\hat{\Sigma}_{C_j} = \operatorname{diag}\!\left(\frac{1}{N_j} \sum_{i=1}^{N_j} \left(X_i^{C_j} - \hat{\mu}_{C_j}\right)\left(X_i^{C_j} - \hat{\mu}_{C_j}\right)^{T}\right) \qquad (11)$$



where diag(·) converts a square matrix to a diagonal matrix with the same diagonal elements.
Then, we plug the mean (μ) and covariance matrix (Σ) estimated from Formulae (10) and (11), respectively, into Equation (8) to estimate f(X; μCj, ΣCj), and then estimate P(X|(Cj, δk(X))) from Formula (9).
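To make this estimation step concrete, the following is a minimal sketch (not the authors' code) of the local Gaussian density estimation, assuming NumPy and a query point represented as a vector; the small ridge term `eps` added to the variances is our own numerical-stability assumption and is not part of Formulae (10) and (11).

```python
import numpy as np

def local_gaussian_density(x_query, neighbors_c, eps=1e-9):
    """Evaluate f(X; mu_Cj, Sigma_Cj) of Eq. (8) for one class.

    neighbors_c: (N_j, d) array holding the class-Cj samples found in the
    k-neighborhood of x_query.  eps is a small stabiliser (our addition,
    not part of Formulae (10) and (11)).
    """
    d = neighbors_c.shape[1]
    mu = neighbors_c.mean(axis=0)            # Formula (10)
    var = neighbors_c.var(axis=0) + eps      # diagonal of Formula (11)
    # With a diagonal covariance, |Sigma| is the product of the variances
    # and the quadratic form decomposes attribute-wise.
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.prod(var))
    quad = np.sum((x_query - mu) ** 2 / var)
    return norm * np.exp(-0.5 * quad)        # Equation (8)
```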

2.3 Classification Rules

As k is constant for all classes, according to Formulae (7) and (9), the classification problem defined in (6) can be transformed into the optimization problem formulated in Formula (12).

$$\omega = \arg\max_{j=1,\cdots,N_C} \left\{ N_j \cdot f(X; \mu_{C_j}, \Sigma_{C_j}) \right\} \qquad (12)$$

where N_C is the total number of classes, and f(·), μCj and ΣCj are given by Formulae (8), (10) and (11), respectively.

According to the aforementioned process, the LD-kNN approach classifies a query sample by the LPP estimated from the local distribution, which is calculated according to Bayes' theorem in the local area. The query sample is then labeled with the class having the maximum LPP.
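For illustration, here is a minimal sketch of the whole decision rule of Sections 2.2 and 2.3 under the same assumptions as the previous sketch (Euclidean distance, diagonal-Gaussian local densities); `local_gaussian_density` is the hypothetical helper defined above, and the function simply returns the class maximizing N_j · f(X; μCj, ΣCj) as in Formula (12).

```python
import numpy as np

def ld_knn_classify(x_query, X_train, y_train, k):
    """Label x_query by the local-distribution rule of Formula (12)."""
    # 1. Find the k nearest neighbors under the Euclidean distance.
    dist = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dist)[:k]
    neigh_X, neigh_y = X_train[idx], y_train[idx]

    # 2. Score each class present in the neighborhood and keep the best.
    best_class, best_score = None, -np.inf
    for c in np.unique(neigh_y):
        members = neigh_X[neigh_y == c]
        n_j = len(members)                                 # N_j of Eq. (7)
        f_j = local_gaussian_density(x_query, members)     # Eqs. (8)-(11)
        score = n_j * f_j                                  # Formula (12)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```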



2.4 Related Methods



The traditional V-kNN classifies the query sample only by the number of nearest neighbors of each class in the k-neighborhood (i.e. Nj for the jth class). Compared with the V-kNN rule, LD-kNN takes into account the local probability density around the query sample (f(X; μCj, ΣCj)) in addition to the count Nj. For different classes, the local probability densities are not always the same and may play a significant role in classification.

Another classification method related to LD-kNN is the Bayesian classification method. The Bayesian classifier assigns the query sample to the class with the highest posterior probability, which is estimated from the global distribution, whereas LD-kNN estimates the posterior probability from the local distribution around the query sample. The Naive Bayesian Classification (NBC) method can be considered a special case of LD-kNN with k approaching the size of the dataset. Thus, LD-kNN can be more effective and comprehensive for a specific query sample.

In effect, the LD-kNN method may be viewed as a compromise between the nearest neighbor rule and the Bayesian method. The parameter k controls the locality in LD-kNN: when k is close to 1, LD-kNN approaches the nearest neighbor rule; when k is as large as the size of the dataset, the local area extends to the whole dataset and LD-kNN becomes a Bayesian classifier. Thus, LD-kNN may combine the advantages of the two classifiers and become a more effective and comprehensive classification method.

As for CAP and LPC, they consider an equal number of nearest neighbors for each class, and the classification is based on the nearest center. In terms of Equation (12), CAP and LPC use a constant Nj for all classes, and the other term (f(X; μCj, ΣCj)) is estimated only from the center of the Nj samples in each class. Thus, CAP and LPC can be viewed as special cases of LD-kNN.



3 Experiments

3.1 The Datasets



In our experimentation we have selected 15 real datasets from the well-known UCI Irvine repository of machine learning datasets [1]. The selected datasets include six two-class problems and nine multi-class problems, and vary in terms of their domain, size, and complexity. Since the estimation of probability density applies only to continuous attributes, we take only continuous attributes into account in our experiments. Table 1 summarizes the relevant information for these datasets; for more information, please refer to http://archive.ics.uci.edu/ml.

Table 1. Some information about the datasets

datasets      #Instances  #Attributes  #Classes
Abalone       4177        7            3
Australian    690         6            2
Breast        106         9            6
Bupa Liver    345         6            2
Dermatology   366         33           6
Glass         214         9            6
ILPD          583         9            2
Iris          150         4            3
Letters       20000       16           26
Pageblock     5473        10           5
Sonar         208         60           2
Spambase      4601        57           2
spectf        267         44           2
Vehicle       846         18           4
Wine          178         13           3

3.2 Experimental Settings

Before classification, to prevent attributes with an initially large range from inducing bias by outweighing attributes with initially smaller ranges, we use z-score normalization to linearly transform each numeric attribute of a dataset to have mean 0 and standard deviation 1: v' = (v − μ_A)/σ_A, where μ_A and σ_A are the mean and standard deviation, respectively, of attribute A.
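The following is a small illustrative sketch of this normalization step (NumPy assumed); the paper does not state whether the statistics are computed per training fold or on the whole dataset, so using training-fold statistics here is our assumption.

```python
import numpy as np

def zscore_normalize(X_train, X_test):
    """z-score every numeric attribute using training-set statistics."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0   # guard against constant attributes
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```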

In order to achieve an impartial evaluation, we have employed six competing classifiers to provide a comparative analysis and to evaluate the effectiveness of our LD-kNN algorithm. These competing classifiers include base classifiers (V-kNN, DW-kNN [5] and NBC) and state-of-the-art classifiers (CAP [13], LPC [16] and SVM [2]).

For kNN-type classifiers, we use the Euclidean distance to measure the distance between two samples when searching for the nearest neighbors. In addition, since the parameter k in kNN-type classifiers indicates the number of nearest neighbors, we use the average number of nearest neighbors per class (denoted by kpc) to indicate the neighborhood size, i.e. kpc ∗ NC nearest neighbors are searched, where NC is the number of classes.

To assess the generalization capacity, i.e. the ability of a classifier to classify previously unseen samples, the training samples and the test samples should be independent. In our research we use stratified 5-fold cross validation to estimate the misclassification rate of a classifier on each dataset. The data are stratified into 5 folds; 4 folds constitute the training set and the remaining fold is used as the test set. The training and test sessions are performed 5 times, each session using a different test fold and the corresponding training set. To avoid bias, the 5-fold cross validation process is applied to each dataset 10 times and the average misclassification rate (AMR) is calculated to evaluate the performance of the classifier.
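A minimal sketch of this evaluation protocol, assuming scikit-learn's RepeatedStratifiedKFold and a classifier object `clf` with the usual fit/predict interface, could look as follows; the function and variable names are ours, not the authors'.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

def average_misclassification_rate(clf, X, y, n_splits=5, n_repeats=10, seed=0):
    """10 repetitions of stratified 5-fold CV; returns the AMR in percent."""
    rskf = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                   random_state=seed)
    errors = []
    for train_idx, test_idx in rskf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        errors.append(np.mean(pred != y[test_idx]))
    return 100.0 * np.mean(errors)
```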






Table 2. The AMR (%) of the seven methods with corresponding stds on the 15 UCI datasets (the best recognition performance is shown in bold-face for each dataset)

datasets      LD-kNN      V-kNN       DW-kNN      CAP         LPC         SVM         NBC
Abalone       35.26±0.23  34.92±0.20  34.95±0.28  35.52±0.26  35.54±0.43  34.52±0.08  41.55±0.09
Australian    24.49±0.83  24.57±0.47  24.55±0.59  25.00±0.80  25.12±0.65  24.22±0.34  27.87±0.46
Breast        30.19±2.63  33.77±1.45  31.89±1.62  30.38±0.57  30.00±1.18  41.13±2.00  35.38±1.21
Bupaliver     31.86±1.36  34.43±1.28  34.00±1.37  32.09±1.74  32.93±1.22  29.83±1.02  48.06±0.71
Dermatology   1.75±0.44   3.93±0.28   3.83±0.24   2.81±0.41   2.84±0.22   2.70±0.35   7.46±0.59
Glass         26.73±1.47  31.26±1.50  28.83±1.82  28.27±1.75  29.72±1.11  30.61±1.06  59.67±2.73
ILPD          29.97±1.34  28.54±0.61  28.99±0.83  30.67±1.36  30.81±1.25  29.11±0.54  44.80±0.49
Iris          3.67±0.33   3.87±0.65   3.80±0.60   3.73±0.68   3.73±0.68   4.07±0.31   4.33±0.58
Letter        5.08±0.09   8.02±0.08   5.29±0.07   3.93±0.05   4.14±0.10   5.54±0.08   35.75±0.07
Pageblock     3.27±0.10   3.30±0.06   3.19±0.06   3.16±0.13   3.17±0.11   3.99±0.09   13.12±1.39
Sonar         11.30±1.40  14.47±1.42  14.33±1.23  11.68±1.96  14.47±1.42  17.45±1.81  31.11±1.01
Spambase      7.77±0.17   8.60±0.10   7.85±0.20   7.97±0.21   8.13±0.23   6.80±0.12   18.24±0.16
spectf        20.00±1.56  19.40±0.71  20.60±0.92  20.22±0.75  20.45±1.24  21.09±0.71  32.62±1.15
Vehicle       24.04±1.05  28.53±0.86  27.86±0.83  23.96±1.20  24.36±0.73  24.20±0.82  53.95±0.74
Wine          0.84±0.38   2.30±0.69   2.13±0.42   1.85±0.67   1.46±0.45   1.97±0.55   2.58±0.45
Average AMR   17.08       18.66       18.14       17.42       17.79       18.48       30.43
Average Rank  2.13        4.63        3.80        2.90        3.80        3.80        6.93

4 Results and Discussion



The parameter kpc is an important factor that can affect the performance of LD-kNN. If kpc is too small, the estimation of the local distribution may be unstable; if it is too large, there will be many distant neighbors that may have an adverse effect on the local distribution estimation. To investigate the influence of the parameter kpc on the classification results of kNN-type classifiers, we tune kpc as an integer in the range 1 to 30 for each dataset, perform the classification tasks and record the corresponding AMR for each kpc value. This procedure guides the selection of the parameter kpc for classification.
Fig. 1 shows the performance curves with respect to kpc of the five kNN-type methods on several real datasets. Because different real datasets usually have different distributions, the AMR curves with respect to kpc for LD-kNN also differ from dataset to dataset. These performance curves show that, on average, the LD-kNN method can be quite effective for these real problems, and they validate that a modest kpc for LD-kNN usually achieves effective performance.
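A hypothetical sketch of this kpc-tuning procedure is shown below; `make_classifier` is an illustrative factory name, `average_misclassification_rate` is the CV helper sketched earlier, and k = kpc ∗ NC follows the neighborhood-size convention of Section 3.2.

```python
import numpy as np

def tune_kpc(make_classifier, X, y, kpc_range=range(1, 31)):
    """Scan kpc = 1..30 and return the AMR curve plus the best kpc.

    make_classifier(k) must return a fit/predict object that uses k nearest
    neighbors, where k = kpc * NC as described in Section 3.2.
    """
    n_classes = len(np.unique(y))
    curve = {}
    for kpc in kpc_range:
        clf = make_classifier(kpc * n_classes)
        curve[kpc] = average_misclassification_rate(clf, X, y)
    best_kpc = min(curve, key=curve.get)   # kpc with the lowest AMR
    return curve, best_kpc
```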

We use the lowest AMR over kpc values ranging from 1 to 30 to evaluate the performance of a kNN-type classifier. Following the experimental testing, we obtained a comparative performance evaluation of our approach against the alternative approaches. The classification results on each dataset for all the classifiers are shown in Table 2 in terms of AMR with the corresponding standard deviations (stds).
From the results in Table 2 we can see that LD-kNN offers the best performance on 5 datasets, more than any other classifier; this is an improvement over the alternative classifiers. The overall average AMR and rank of LD-kNN on these datasets are 17.08% and 2.13 respectively, lower than those of all other classifiers, which means that the proposed LD-kNN may be more effective than the other classifiers on these datasets.



Fig. 1. The performance curves (AMR vs. kpc) of the five kNN-type methods (LD-kNN, V-kNN, DW-kNN, CAP and LPC) on different real datasets: (a) Bupaliver, (b) Iris, (c) Sonar, (d) Spambase, (e) Vehicle, (f) Wine

To evaluate the statistical significance of the differences between LD-kNN and each of the other classifiers, we have performed a Wilcoxon signed rank test [12] between LD-kNN and each of them. The p-values of the tests between LD-kNN and V-kNN, DW-kNN, CAP, LPC, SVM and NBC are 0.0103, 0.0125, 0.0181, 0.0103, 0.1876 and 0.0001 respectively, all less than 0.05 except that of SVM. Combined with the result that the average AMR of the LD-kNN method is the lowest among these classifiers, it can be seen that LD-kNN outperforms the other classifiers and is comparable with SVM in terms of AMR at the 5% significance level.

Fig. 2. The rm distributions of different methods
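The Wilcoxon comparison described above could be reproduced along the following lines, assuming SciPy and a dictionary `amr` holding, for each method, its 15 per-dataset AMR values from Table 2; the helper name is ours, not the authors'.

```python
from scipy.stats import wilcoxon

def compare_to_ldknn(amr):
    """Paired Wilcoxon signed rank tests of LD-kNN against each competitor.

    amr: dict mapping a method name to its 15 per-dataset AMR values.
    """
    baseline = amr["LD-kNN"]
    return {name: wilcoxon(baseline, scores).pvalue
            for name, scores in amr.items() if name != "LD-kNN"}
```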

To evaluate how well a particular method performs on average across all the problems taken into consideration, we have addressed the issue of robustness. Following the method designed by Friedman [7], we quantify the robustness of a classifier m by the ratio rm of its error rate em to the smallest error rate over all the methods being compared in a particular application (i.e. rm = em / min_{1≤k≤7} ek). The optimal method m* for that application will have rm* = 1, and all other methods will have a greater ratio: the greater the value of this ratio, the worse the performance of the corresponding method for that application among the comparative methods. Thus, the distribution of rm for each method, over all the datasets, provides information concerning its robustness. We illustrate the distribution of rm for each method over the 15 datasets by the box plots in Fig. 2, where it is clear that the spread of rm for LD-kNN is narrow and close to 1.0, which demonstrates that the LD-kNN method performs very robustly over these datasets.
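A small sketch of how the rm ratios and box plots of this kind could be produced is given below, assuming NumPy and Matplotlib and an error-rate matrix with one row per dataset and one column per method; the function names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def robustness_ratios(error_rates):
    """error_rates: (n_datasets, n_methods) array of error rates e_m.

    Returns r_m = e_m / min_k e_k computed per dataset (Friedman's ratio).
    """
    best = error_rates.min(axis=1, keepdims=True)
    return error_rates / best

def plot_rm(error_rates, method_names):
    """Box plots of the r_m distributions, one box per method (cf. Fig. 2)."""
    r = robustness_ratios(error_rates)
    plt.boxplot([r[:, j] for j in range(r.shape[1])], labels=method_names)
    plt.ylabel("r_m")
    plt.show()
```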

From the above analysis, it can be seen that LD-kNN performs better than the other classifiers with respect to the overall AMR. Considering the kNN-type classifiers, DW-kNN improves on the traditional V-kNN by distance weighting, while CAP and LPC improve the kNN method by local centering. LD-kNN is a more comprehensive method that treats the nearest neighbor set as a whole through its local distribution; thus it is reasonable to conclude that among the kNN-type classifiers LD-kNN performs best, followed by CAP, LPC, DW-kNN and V-kNN.






The SVM, as an advanced and highly respected algorithm, can also achieve performance comparable with LD-kNN for certain classification problems; however, the performance of LD-kNN is more robust across applications than that of SVM. That is, SVM may perform effectively on certain datasets but badly on others, and it is not as stable as LD-kNN on the experimental datasets. NBC performs badly in the experimental classification tasks, principally because the class conditional independence assumption is too severe for practical problems.

LD-kNN can be viewed as a Bayesian classification method, as it is predicated on Bayes' theorem. Since the classification is based on the maximum posterior probability, the LD-kNN classifier can in theory achieve the Bayes error rate. Additionally, as a kNN-type classifier, LD-kNN inherits the advantages of the kNN method. Thus, it may be intuitively anticipated that LD-kNN can perform much more effectively than NBC and other kNN-type classifiers in most cases.



5 Conclusion



We have introduced the concept of local distribution into kNN methods for classification. The proposed LD-kNN method essentially groups the k nearest neighbors of the query sample into several integral sets by their class labels and then estimates the local distribution of these sets to obtain the LPP for each class; the query sample is then classified according to the maximum LPP. This approach provides a simple mechanism for quantifying the probability that the query sample belongs to each class and has been shown to present several advantages. The experimental results demonstrate the effectiveness and robustness of LD-kNN and show its potential superiority.

In the proposed method, a significant step is the estimation of the local distribution. In our experiments, we assume that the local probability distribution of the instances of each class can be modeled as a Gaussian distribution. However, the Gaussian assumption may not always be appropriate for all practical problems; other probability distribution estimation methods are available, such as Gaussian mixture models [19] and kernel density estimation [6]. Different local distribution estimation methods for LD-kNN may produce different results. For a particular classification problem in a specific domain of interest, various methods may be tested to achieve good results; this represents a future direction for our research.



References

1. Bache, K., Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
2. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). http://www.csie.ntu.edu.tw/~cjlin/libsvm






3. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Transactions on

Information Theory 13(1), 21–27 (1967)

4. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. John Wiley & Sons

(2012)

5. Dudani, S.: The distance-weighted k-nearest-neighbor rule. IEEE Transactions on

Systems, Man and Cybernetics 4, 325–327 (1976)

6. Duong, T.: ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software 21(7), 1–16 (2007)

7. Friedman, J., et al.: Flexible metric nearest neighbor classification. Unpublished manuscript available by anonymous FTP from playfair.stanford.edu (see pub/friedman/README) (1994)

8. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.:

Bayesian data analysis. CRC Press (2013)

9. Govindarajan, M., Chandrasekaran, R.: Evaluation of k-nearest neighbor classifier

performance for direct marketing. Expert Systems with Applications 37(1), 253–

258 (2010)

10. Han, J., Kamber, M., Pei, J.: Data mining: concepts and techniques. Morgan Kaufmann (2006)

11. Hand, D., Mannila, H., Smyth, P.: Principles of data mining. MIT Press (2001)

12. Hollander, M., Wolfe, D.A.: Nonparametric statistical methods. John Wiley &

Sons, NY (1999)

13. Hotta, S., Kiyasu, S., Miyahara, S.: Pattern recognition using average patterns of

categorical k-nearest neighbors. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 4, pp. 412–415. IEEE (2004)

14. Kononenko, I., Kukar, M.: Machine learning and data mining. Elsevier (2007)

15. Lehmann, E.L., Casella, G.: Theory of point estimation, vol. 31. Springer (1998)

16. Li, B., Chen, Y., Chen, Y.: The nearest neighbor algorithm of local probability centers. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

38(1), 141–154 (2008)

17. Magnussen, S., McRoberts, R.E., Tomppo, E.O.: Model-based mean square error

estimators for k-nearest neighbour predictions and applications using remotely

sensed data for forest inventories. Remote Sensing of Environment 113(3), 476–

488 (2009)

18. Mitani, Y., Hamamoto, Y.: A local mean-based nonparametric classifier. Pattern

Recognition Letters 27(10), 1151–1159 (2006)

19. Reynolds, D.: Gaussian mixture models. In: Encyclopedia of Biometrics, pp. 659–

663 (2009)

20. Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J.,

Ng, A., Liu, B., Philip, S.Y., et al.: Top 10 algorithms in data mining. Knowledge and

Information Systems 14(1), 1–37 (2008)



Immune Centroids Over-Sampling Method for Multi-Class Classification

Xusheng Ai¹, Jian Wu¹(B), Victor S. Sheng², Pengpeng Zhao¹, Yufeng Yao¹, and Zhiming Cui¹

¹ The Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006, China
jianwu@suda.edu.cn
² Department of Computer Science, University of Central Arkansas, Conway 72035, USA



Abstract. To improve the classification performance of imbalanced learning, a novel over-sampling method, Global Immune Centroids Over-Sampling (Global-IC), based on an immune network, is proposed. Global-IC generates a set of representative immune centroids to broaden the decision regions of small class spaces. The representative immune centroids are regarded as synthetic examples in order to resolve the imbalance problem. We utilize an artificial immune network to generate synthetic examples on clusters with high data densities. This approach addresses a problem of synthetic minority over-sampling techniques, which do not take groups of training examples into account. Our comprehensive experimental results show that Global-IC can achieve better performance than renowned multi-class resampling methods.

Keywords: Resampling · Immune network · Over-sampling · Imbalanced learning · Synthetic examples

1 Introduction



The class imbalance problem typically occurs when there are many more instances belonging to some classes than to others in multi-class classification. Recently, reports from both academia and industry have indicated that the imbalanced class distribution of a data set poses a serious difficulty for most classification algorithms, which assume a relatively balanced distribution. Furthermore, identifying rare objects is of crucial importance. In many real-world applications, the classification performance on the small classes is the major concern in determining the quality of a classification model.

In the research community of imbalanced learning, almost all reported solutions are designed for binary classification. However, multi-class imbalanced learning problems appear frequently, and identifying the concept for each class in these problems is usually equally important. When multiple classes are present in an application domain, solutions proposed for binary classification problems may not be directly applicable, or may achieve a lower performance than expected. For example, solutions at the data level suffer from the increased search space, and solutions at the algorithm level become more complicated, since they must consider small classes and it is difficult to learn the corresponding concepts for these small classes. Additionally, learning from multiple classes itself implies a difficulty, since the boundaries among the classes may overlap; the overlap would degrade the learning performance.

There is much existing research on multi-class imbalance learning. However, most existing work transforms multi-class imbalance learning into binary problems using different class decomposition schemes and applies existing binary imbalance learning solutions. These decomposition approaches help reuse existing binary imbalance learning solutions. However, they have their own shortcomings, which will be discussed in the related work section. To overcome these shortcomings, in this paper we present a novel global multi-class imbalance learning approach, which does not need to transform the multi-class problem into binary ones. This novel approach is based on immune network theory and utilizes an aiNet model [3] to generate immune centroids for the clusters of each small class that have high data density; we call it global immune centroids over-sampling (denoted as Global-IC). Specifically, Global-IC resamples each small class by introducing immune centroids of the clusters of the examples belonging to that small class. Our experimental results show that Global-IC achieves better performance compared with existing methods.

The rest of this paper is organized as follows. We review related work in

Section 2. Section 3 presents our proposed over-sampling method Global-IC.

Our experimental results and comparisons are shown in Section 4. Finally, we

conclude this paper in Section 5.



2 Related Work



As stated before, most existing solutions for multi-class imbalance classification problems use different class decomposition schemes to convert a multi-class classification problem into multiple binary classification problems, and then apply binary imbalance techniques to each binary classification problem. For example, Tan et al. [4] used both one-vs-all (OVA) [2] and one-vs-one (OVO) [1] schemes to break down a multi-class problem into binary problems, and then built rule-based learners to improve the coverage of minority class examples. Zhao [20] used OVA to convert a multi-class problem into multiple binary problems, and then used under-sampling and SMOTE [5] techniques to overcome the imbalance issues. Liao [6] investigated a variety of over-sampling and under-sampling techniques with OVA for a weld flaw classification problem. Chen et al. [7] proposed an approach that used OVA to convert a multi-class classification problem into binary problems and then applied some advanced resampling methods to rebalance the data of each binary problem. All these methods are based on multi-class decomposition. Multi-class decomposition oversimplifies the original multi-class problem. It is obvious that each individual classifier learned


