Unsupervised Feature Selection Using Structured Self-Representation

2018-06-29

Journal of Harbin Institute of Technology (New Series), 2018, Issue 3

Yanbei Liu, Kaihua Liu, Xiao Wang, Changqing Zhang and Xianchao Tang

(1. School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China; 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 3. School of Computer and Science Technology, Tianjin University, Tianjin 300072, China)

1 Introduction

Recent years have witnessed a growing number of application areas involving high-dimensional data, such as machine learning, data mining, bioinformatics and computer vision[1-2]. In practice, a learning task rarely depends on all the features; many of them are irrelevant or redundant. Such features not only dramatically increase the computation and storage cost, but also induce over-fitting and incomprehensible models. To overcome these issues, feature selection has been considered an effective tool for dimensionality reduction. Its aim is to obtain a subset of features by removing the noise and redundancy in the original features, so that a more intrinsic representation of the data and better performance are achieved[3].

A large number of feature selection methods have been proposed, and they can be categorized into supervised methods[4-6] and unsupervised methods[7-9]. Supervised feature selection methods improve the performance of learning tasks by utilizing label information. However, label information is usually costly to obtain in practice, so unsupervised feature selection holds great potential in real-world applications.

Early unsupervised feature selection algorithms use feature ranking techniques as the principal criterion for feature selection[8, 10-12]. One of the main limitations of these methods is that they treat features independently, without considering possible correlation among features. To address this problem, a series of algorithms[9, 13-15] have been proposed. Typical examples are spectral clustering based algorithms, which select a feature subset that best preserves the underlying structure between clusters. These methods explore the cluster structure of the data using matrix factorization for spectral analysis, and then select features via sparsity regularization models. Nevertheless, they rely heavily on the learned graph Laplacian, so noise in the features may make them unreliable. Recently, the self-representation technique has shown significant potential in many tasks, such as subspace clustering[16-19] and active learning[20-21]. Motivated by this, some researchers consider feature selection from the perspective of the self-representation property among features[22], i.e., each feature can be well expressed by a linear combination of other relevant features. However, considering the local geometrical structure of the features, it is more reasonable to constrain the representation coefficients of two features to be close when the features are close to each other.

In this paper, we propose a novel method, called Structured Self-Representation (SSR), for unsupervised feature selection. SSR selects the most representative features by exploring the local geometrical structure of features. Specifically, based on the self-representation property, each feature can be well approximated by a linear combination of other similar features. Meanwhile, the representation coefficients of the features are constrained by their local geometrical structure. In the objective function, two regularization terms are employed to incorporate the locality information and to enforce the sparsity of the coefficients for feature reconstruction, respectively. Furthermore, an efficient algorithm is presented to solve the resulting optimization problem. Experiments on a synthetic dataset and six real-world datasets demonstrate that the proposed SSR performs better than other state-of-the-art algorithms.

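Notations: Throughout this paper, boldface uppercase letters indicate matrices and boldface lowercase letters indicate vectors. For a matrix $P=\{p_{ij}\}$, $p^{i}$ and $p_{j}$ denote its $i$-th row and $j$-th column, respectively. For a vector $v$, its $\ell_2$-norm is given as $\|v\|_2=\sqrt{v^{T}v}$; for a matrix $P$, its $\ell_{2,1}$-norm is defined as $\|P\|_{2,1}=\sum_{i}\sqrt{\sum_{j}p_{ij}^{2}}=\sum_{i}\|p^{i}\|_2$.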
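As a small illustration of the two norms used throughout the paper, the sketch below computes them with NumPy. It assumes the usual reading of the $\ell_{2,1}$-norm as the sum of the Euclidean norms of the rows of a matrix; the function names `l2_norm` and `l21_norm` are chosen here for illustration only.

```python
import numpy as np

def l2_norm(v):
    """Euclidean norm: square root of the sum of squared entries."""
    v = np.asarray(v, dtype=float)
    return float(np.sqrt(np.sum(v ** 2)))

def l21_norm(P):
    """l2,1-norm: sum over the rows of P of each row's Euclidean norm."""
    P = np.asarray(P, dtype=float)
    return float(np.sum(np.sqrt(np.sum(P ** 2, axis=1))))

P = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])

print(l2_norm(P[0]))  # norm of the first row
print(l21_norm(P))    # row norms 5, 0 and 13 added together
```

A row that is entirely zero contributes nothing to the sum, which is why penalizing this norm tends to switch off whole rows at once.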

2 Related Work

In this section, we review the main existing unsupervised feature selection methods. These methods can be roughly divided into three categories: filter, wrapper and embedded methods.

Filter methods use feature ranking techniques as the principal criterion for feature selection, owing to their simplicity and the good results reported in practical applications. They usually employ a ranking criterion to score the features and use a threshold to remove the features that score below it. Typical methods include Maximum Variance[10], Laplacian Score (LS)[8], Spectral Feature Selection (SPEC)[11], Trace Ratio[12], and Eigenvalue Sensitive Feature Selection[23]. However, the main limitation of these methods is that they treat features independently, without considering possible correlation among features, so the selected subset may be redundant and therefore suboptimal.

Wrapper methods commonly use clustering as the mining algorithm[24-27]. These methods consider feature selection and clustering simultaneously, and search for the features best suited to clustering with the aim of improving clustering performance. However, the optimization problem of these methods is known to be NP-hard[28], and the search quickly becomes computationally intractable.

Embedded methods are developed to perform feature selection and fit a learning model simultaneously. The basic principle behind these methods is to apply sparsity regularization to the weights of all features, so that the weights indicate which features are selected. To enforce the sparsity of the feature weights, various sparsity-inducing norms have been explored, such as sparse logistic regression[29-30] and group sparsity[13-14, 31-34]. Typical methods include Multi-Cluster Feature Selection (MCFS)[9], joint Feature Selection and Subspace Learning (FSSL)[35], Unsupervised Discriminative Feature Selection (UDFS)[14], and Unsupervised Feature Selection using Feature Similarity (FSFS)[36]. Recently, an unsupervised feature selection method based on regularized self-representation was proposed to choose the most representative features, and it shows state-of-the-art results[22]. However, this method only considers the representativeness of the selected features and neglects the local geometric structure among them.

3 Background

The recent work most related to our proposed method is the regularized self-representation (RSR)[22] for unsupervised feature selection. The key idea of RSR is to select the most representative features based on the self-representation property. The objective function can be formulated as follows:

$$\min_{A}\ \|X-XA\|_{2,1}+\lambda\|A\|_{2,1} \tag{1}$$

where $X=[f_1,f_2,\dots,f_m]\in\mathbb{R}^{n\times m}$ is the data matrix with $n$ samples and $m$-dimensional features, and $f_1,f_2,\dots,f_m\in\mathbb{R}^{n}$ are the corresponding feature vectors. The parameter $\lambda>0$ sets the trade-off between the two terms. $A$ is the coefficient matrix. Note that the $\ell_{2,1}$-norm is first adopted in the loss function, which enforces robustness to outlier samples; meanwhile, the $\ell_{2,1}$-norm is also applied to the coefficient matrix to make it sparse in rows.

With the obtained coefficient matrix $A$, the rows of $A$ are sorted by their row-sum values in decreasing order, and the feature selection task is carried out by choosing the $k$ features corresponding to the top $k$ row-sums of $A$.
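The ranking step can be sketched in a few lines. The sketch below assumes that the score of a feature is the Euclidean norm of the corresponding row of the coefficient matrix, which is the quantity used in the output step of Algorithm 1; the helper name `top_k_features` and the toy matrix are illustrative only.

```python
import numpy as np

def top_k_features(A, k):
    """Rank features by the Euclidean norm of the rows of A (largest first)."""
    scores = np.linalg.norm(A, axis=1)          # one score per feature
    order = np.argsort(-scores, kind="stable")  # descending order
    return order[:k], scores

# Toy coefficient matrix: feature 1 is never used to reconstruct anything,
# so its row is zero and it is ranked last.
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.0, 0.0],
              [0.2, 0.5, 0.4]])

selected, scores = top_k_features(A, k=2)
print(selected)
print(np.round(scores, 3))
```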

4 Unsupervised Feature Selection Using Structured Self-representation

From the objective function (1), it can be observed that RSR reconstructs each feature via a linear combination of all the selected features. Geometrically speaking, it is more reasonable to learn the coefficient matrix $A$ so that the local geometrical information of the features is preserved. For any two features $f_i$, $f_j$, we denote by $\|f_i-f_j\|_2$ the Euclidean distance between them. Intuitively, the smaller $\|f_i-f_j\|_2$ is ($f_i$ is closer to $f_j$), the smaller $\|a_i-a_j\|_2$ should be ($a_i$ is also closer to $a_j$). Therefore, when two features are similar to each other, their representation coefficients should also be similar. Motivated by this, we introduce the local geometrical structure, formulated by minimizing the following[37]:

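$$\min_{A}\ \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}w_{ij}\|a_i-a_j\|_2^{2}=\mathrm{Tr}(ALA^{T}) \tag{2}$$

$$L=D-W,\qquad d_{ii}=\sum_{j=1}^{m}w_{ij} \tag{3}$$

where $W=(w_{ij})$ is the weight matrix measuring the similarity between features $f_i$ and $f_j$, $D$ is the diagonal degree matrix and $L$ is the graph Laplacian matrix.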

Putting Eq.(1) and Eq.(2) together, the proposed method SSR solves the following optimization problem:

$$\min_{A}\ \|X-XA\|_{2,1}+\alpha\|A\|_{2,1}+\beta\,\mathrm{Tr}(ALA^{T}) \tag{4}$$

where $\alpha$ and $\beta$ are two positive trade-off parameters that adjust the degree of penalty.
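To make the three terms of the SSR objective concrete, the following sketch evaluates them numerically. It assumes the objective has the form $\|X-XA\|_{2,1}+\alpha\|A\|_{2,1}+\beta\,\mathrm{Tr}(ALA^{T})$, which is what the stationarity condition in Eq.(5) implies. The construction of the feature graph (a symmetric k-nearest-neighbour graph with heat-kernel weights) and the names `knn_laplacian` and `ssr_objective` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Laplacian L = D - W of a symmetric k-NN graph over the columns of X.

    The heat-kernel weights and the k-NN rule are assumptions of this sketch.
    """
    F = X.T                                            # one row per feature
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    sigma2 = d2.mean() + 1e-12                         # assumed kernel width
    m = F.shape[0]
    W = np.zeros((m, m))
    for i in range(m):
        order = np.argsort(d2[i])
        nn = order[order != i][:k]                     # k nearest other features
        W[i, nn] = np.exp(-d2[i, nn] / sigma2)
    W = np.maximum(W, W.T)                             # make the graph symmetric
    return np.diag(W.sum(axis=1)) - W, W

def ssr_objective(X, A, L, alpha, beta):
    """Reconstruction error + row sparsity + smoothness over the feature graph."""
    fit = np.linalg.norm(X - X @ A, axis=1).sum()      # l2,1-norm of the residual
    sparsity = np.linalg.norm(A, axis=1).sum()         # l2,1-norm of A
    smooth = np.trace(A @ L @ A.T)
    return fit + alpha * sparsity + beta * smooth

rng = np.random.default_rng(0)
X = rng.random((20, 8))
L, W = knn_laplacian(X, k=3)
A = rng.random((8, 8))

# The trace form equals the pairwise form 1/2 * sum_ij w_ij * ||a_i - a_j||^2,
# where a_i is the i-th column of A.
pairwise = 0.5 * sum(W[i, j] * np.sum((A[:, i] - A[:, j]) ** 2)
                     for i in range(8) for j in range(8))
print(np.isclose(np.trace(A @ L @ A.T), pairwise))
print(ssr_objective(X, A, L, alpha=0.1, beta=0.1))
```

The check at the end confirms that the trace form and the pairwise-distance form of the smoothness term coincide for a symmetric weight matrix.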

5 Optimization

Although Eq.(4) is convex, it is not easy to solve directly because it contains two non-smooth terms. In this section, an iterative algorithm is presented to solve this optimization problem.

5.1 Optimization Algorithm

Taking the derivative of Eq.(4) with respect to $A$ and setting the derivative to zero, we get:

$$X^{T}GXA-X^{T}GX+\alpha HA+\beta AL=0 \tag{5}$$

where $G$ denotes a diagonal matrix with its $i$-th diagonal entry as $g_{ii}=(2\|x^{i}-x^{i}A\|_2)^{-1}$, and $H$ denotes a diagonal matrix with its $i$-th diagonal entry as $h_{ii}=(2\|a^{i}\|_2)^{-1}$. Note that the derivative of $\|A\|_{2,1}$ with respect to $A$ is $2HA$. Then Eq.(5) can be rewritten as:
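The step from the objective to Eq.(5) can be spelled out term by term. The derivation below is a sketch under two assumptions: the objective of Eq.(4) has the form $\|X-XA\|_{2,1}+\alpha\|A\|_{2,1}+\beta\,\mathrm{Tr}(ALA^{T})$, and all rows of $X-XA$ and of $A$ are nonzero, so that the norms are differentiable.

```latex
\begin{align*}
\frac{\partial}{\partial A}\,\|X-XA\|_{2,1}
  &= -X^{T}\,\mathrm{diag}\!\left(\frac{1}{\|x^{i}-x^{i}A\|_{2}}\right)(X-XA)
   = 2\,X^{T}G\,(XA-X), \\
\frac{\partial}{\partial A}\,\|A\|_{2,1}
  &= \mathrm{diag}\!\left(\frac{1}{\|a^{i}\|_{2}}\right)A
   = 2\,HA, \\
\frac{\partial}{\partial A}\,\mathrm{Tr}(ALA^{T})
  &= A\,(L+L^{T}) = 2\,AL \qquad (L=L^{T}).
\end{align*}
```

Weighting the second and third lines by $\alpha$ and $\beta$, adding the three lines, setting the sum to zero and dividing by 2 gives $X^{T}GXA-X^{T}GX+\alpha HA+\beta AL=0$.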

$$(X^{T}GX+\alpha H)A+\beta AL=X^{T}GX \tag{6}$$

which is a standard Sylvester equation[38] and has a unique solution, as given in Proposition 1.

Proposition 1 There is a unique solution for the Sylvester Eq.(6).

Proof Matrices $G$, $X^{T}X$ and $H$ are all positive semi-definite, hence the term $(X^{T}GX+\alpha H)$ is positive semi-definite and its eigenvalues are all greater than or equal to zero: $\xi_j\ge 0,\ \forall j$. Meanwhile, matrix $L$ is also positive definite, hence its eigenvalues are all greater than zero: $\varphi_i>0,\ \forall i$. Therefore, we get $\varphi_i+\xi_j>0$ for all the eigenvalues of $(X^{T}GX+\alpha H)$ and $L$. According to Ref.[39], there is a unique solution for the Sylvester Eq.(6).

The Bartels-Stewart algorithm[38] can be used to solve the Sylvester equation. This algorithm transforms the coefficient matrices into Schur form using the QR algorithm, and then solves the resulting triangular system by back substitution.
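The eigenvalue argument behind the uniqueness result can be checked numerically on a small example. The sketch below does not use the Bartels-Stewart algorithm; it rewrites the Sylvester equation $PA+AQ=R$ as one linear system via Kronecker products, which is only practical for small matrices. The matrices `P`, `Q` and `R` are random stand-ins for $X^{T}GX+\alpha H$, $\beta L$ and $X^{T}GX$. Note that $H$ has strictly positive diagonal entries, so the first matrix is in fact positive definite, and the eigenvalue sums stay positive even when the second matrix is only positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
B = rng.standard_normal((m, m))
P = B @ B.T + 0.1 * np.eye(m)    # stand-in for X^T G X + alpha*H (positive definite)
C = rng.standard_normal((m, m))
Q = C @ C.T                      # stand-in for beta*L (positive semi-definite)
R = rng.standard_normal((m, m))  # stand-in for the right-hand side

# With column-major vectorisation: vec(P A + A Q) = (I kron P + Q^T kron I) vec(A).
K = np.kron(np.eye(m), P) + np.kron(Q.T, np.eye(m))

xi = np.linalg.eigvalsh(P)       # eigenvalues of the first coefficient matrix
phi = np.linalg.eigvalsh(Q)      # eigenvalues of the second coefficient matrix
sums = np.sort((xi[:, None] + phi[None, :]).ravel())

# The spectrum of K consists of all pairwise sums xi_j + phi_i.
print(np.allclose(np.sort(np.linalg.eigvalsh(K)), sums))
# All sums are positive, so K is invertible and the solution is unique.
print(sums.min() > 0)

A = np.linalg.solve(K, R.ravel(order="F")).reshape((m, m), order="F")
print(np.allclose(P @ A + A @ Q, R))
```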

In summary, the objective function in Eq.(4) is solved iteratively, as illustrated in Algorithm 1. The most time-consuming part of Algorithm 1 is solving the Sylvester equation in Step 2. Let $m$ denote the number of features. The Bartels-Stewart algorithm has a computational complexity of $O(m^{3})$ for solving the Sylvester equation. If Algorithm 1 runs for $t$ iterations, its overall time complexity is $O(tm^{3})$.

Algorithm 1: Unsupervised feature selection using structured self-representation (SSR)

Input: The data matrix $X\in\mathbb{R}^{n\times m}$.

Initialization: $A\in\mathbb{R}^{m\times m}$, $\xi=10^{-6}$.

While not converged do

1) Compute the diagonal matrix $G$, whose $i$-th diagonal element is $g_{ii}=\frac{1}{2\|x^{i}-x^{i}A\|_2}$, and the diagonal matrix $H$, whose $i$-th diagonal element is $h_{ii}=\frac{1}{2\|a^{i}\|_2}$.

2) Solve the Sylvester Eq.(6) by the Bartels-Stewart algorithm to get the coefficient matrix $A$.

3) Check the convergence condition: $\|X-XA\|<\xi$.

end while

Output: Sort all $m$ features according to $\|a^{i}\|_2$ ($i=1,\dots,m$) in descending order and select the top $k$ ranked features.
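The loop of Algorithm 1 can be sketched with SciPy, whose `scipy.linalg.solve_sylvester` solves equations of the form $PA+AQ=R$. Several details below are assumptions of this sketch rather than choices stated in the paper: the weights $G$ and $H$ start as identity matrices, a small constant `eps` guards against division by zero, the loop stops on the relative change of the objective value, and the feature graph is a dense Gaussian-similarity graph. The names `ssr` and `ssr_objective` are illustrative.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def ssr_objective(X, A, L, alpha, beta):
    """||X - XA||_{2,1} + alpha * ||A||_{2,1} + beta * Tr(A L A^T)."""
    return (np.linalg.norm(X - X @ A, axis=1).sum()
            + alpha * np.linalg.norm(A, axis=1).sum()
            + beta * np.trace(A @ L @ A.T))

def ssr(X, L, alpha=0.1, beta=0.1, max_iter=50, tol=1e-6, eps=1e-10):
    """Iteratively reweighted solver: each step solves one Sylvester equation."""
    n, m = X.shape
    g = np.ones(n)   # diagonal of G (assumed start: unit weights)
    h = np.ones(m)   # diagonal of H (assumed start: unit weights)
    history = []
    for _ in range(max_iter):
        M = X.T @ (g[:, None] * X)                       # X^T G X
        # Solve (X^T G X + alpha*H) A + A (beta*L) = X^T G X for A.
        A = solve_sylvester(M + alpha * np.diag(h), beta * L, M)
        history.append(ssr_objective(X, A, L, alpha, beta))
        if len(history) > 1 and abs(history[-2] - history[-1]) <= tol * abs(history[-2]):
            break
        # Update the reweighting matrices from the new A.
        g = 1.0 / (2.0 * np.maximum(np.linalg.norm(X - X @ A, axis=1), eps))
        h = 1.0 / (2.0 * np.maximum(np.linalg.norm(A, axis=1), eps))
    ranking = np.argsort(-np.linalg.norm(A, axis=1))     # features by row norm
    return A, ranking, history

# Toy usage with a dense Gaussian-similarity graph over the features.
rng = np.random.default_rng(0)
X = rng.random((40, 12))
F = X.T
d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-d2 / d2.mean())
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

A, ranking, history = ssr(X, L)
print(ranking[:5])
print(len(history), history[0], history[-1])
```

The recorded `history` makes it easy to inspect how the objective value evolves from one iteration to the next.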

5.2 Convergence Analysis

This section shows that the objective function in Eq.(4) decreases in each iteration and that the algorithm is guaranteed to converge.

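Lemma 1 For any nonzero vectors $u$ and $v$, the following inequality holds:

$$\|u\|_2-\frac{\|u\|_2^{2}}{2\|v\|_2}\le\|v\|_2-\frac{\|v\|_2^{2}}{2\|v\|_2} \tag{7}$$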

Next, we give a simple proof of the convergence of our algorithm in Theorem 1.

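Theorem 1 Algorithm 1 decreases the objective value of Eq.(4) in each iteration.

Proof Let $A_t$, $G_t$ and $H_t$ denote the variables at the $t$-th iteration. Solving Eq.(6) in Step 2 is equivalent to solving:

$$A_{t+1}=\arg\min_{A}\ \mathrm{Tr}\big((X-XA)^{T}G_t(X-XA)\big)+\alpha\mathrm{Tr}(A^{T}H_tA)+\beta\mathrm{Tr}(ALA^{T}) \tag{8}$$

Therefore, the following inequality holds:

$$\begin{aligned}&\mathrm{Tr}\big((X-XA_{t+1})^{T}G_t(X-XA_{t+1})\big)+\alpha\mathrm{Tr}(A_{t+1}^{T}H_tA_{t+1})+\beta\mathrm{Tr}(A_{t+1}LA_{t+1}^{T})\\&\quad\le\mathrm{Tr}\big((X-XA_t)^{T}G_t(X-XA_t)\big)+\alpha\mathrm{Tr}(A_t^{T}H_tA_t)+\beta\mathrm{Tr}(A_tLA_t^{T})\end{aligned} \tag{9}$$

which is further equivalent to:

$$\begin{aligned}&\sum_{i}\frac{\|x^{i}-x^{i}A_{t+1}\|_2^{2}}{2\|x^{i}-x^{i}A_t\|_2}+\alpha\sum_{i}\frac{\|a_{t+1}^{i}\|_2^{2}}{2\|a_t^{i}\|_2}+\beta\mathrm{Tr}(A_{t+1}LA_{t+1}^{T})\\&\quad\le\sum_{i}\frac{\|x^{i}-x^{i}A_t\|_2^{2}}{2\|x^{i}-x^{i}A_t\|_2}+\alpha\sum_{i}\frac{\|a_t^{i}\|_2^{2}}{2\|a_t^{i}\|_2}+\beta\mathrm{Tr}(A_tLA_t^{T})\end{aligned} \tag{10}$$

According to Lemma 1, we have the following two inequalities:

$$\sum_{i}\left(\|x^{i}-x^{i}A_{t+1}\|_2-\frac{\|x^{i}-x^{i}A_{t+1}\|_2^{2}}{2\|x^{i}-x^{i}A_t\|_2}\right)\le\sum_{i}\left(\|x^{i}-x^{i}A_t\|_2-\frac{\|x^{i}-x^{i}A_t\|_2^{2}}{2\|x^{i}-x^{i}A_t\|_2}\right) \tag{11}$$

$$\alpha\sum_{i}\left(\|a_{t+1}^{i}\|_2-\frac{\|a_{t+1}^{i}\|_2^{2}}{2\|a_t^{i}\|_2}\right)\le\alpha\sum_{i}\left(\|a_t^{i}\|_2-\frac{\|a_t^{i}\|_2^{2}}{2\|a_t^{i}\|_2}\right) \tag{12}$$

Summing Eqs.(10)-(12) on both sides, we obtain:

$$\|X-XA_{t+1}\|_{2,1}+\alpha\|A_{t+1}\|_{2,1}+\beta\mathrm{Tr}(A_{t+1}LA_{t+1}^{T})\le\|X-XA_t\|_{2,1}+\alpha\|A_t\|_{2,1}+\beta\mathrm{Tr}(A_tLA_t^{T}) \tag{13}$$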

Therefore, Theorem 1 is proved. Note that the derivative of Eq.(4) with respect to $A$ is exactly Eq.(5); thus $A$, $G$ and $H$ satisfy Eq.(5) at convergence. Because a closed-form solution can be obtained in each iteration, the proposed algorithm converges fast.

6 Experiments

In this section, we conduct experiments to evaluate the effectiveness of SSR. Following Refs.[9, 13], performances of the proposed algorithm are evaluated in terms of clustering.

6.1 Experimental Settings

1)Datasets.

In the experiments, six public datasets are used to compare the performance of different unsupervised feature selection algorithms. These datasets include one face image dataset (ORL[9]), one object image dataset (COIL20[9]), one handwritten digit image dataset (USPS[40]), one spoken letter recognition dataset (Isolet[9]) and two biological datasets (Colon[41] and Lung[42]). The detailed information of these datasets is listed in Table 1. Example images from three of the datasets are shown in Fig.1.

Table 1 Summary of datasets

Fig.1 Example images of three datasets used in this paper

2)Compared algorithms.

The proposed method SSR is compared with the following representative unsupervised feature selection methods:

·AllFea: The method selects all the features as a baseline method.

·Laplacian Score (LS)[8]: The method selects the features according to their capacity to preserve the locality of the data manifold structure.

·MCFS[9]: The method evaluates the features by using spectral regression with $\ell_1$-norm regularization.

·UDFS[14]: The method exploits feature correlations and local discriminative information simultaneously to select features.

·NDFS[13]: The method selects features by a joint framework of nonnegative spectral analysis and $\ell_{2,1}$-regularized regression.

·RSR[22]: The method chooses the most representative features by using the self-representation property with $\ell_{2,1}$-norm regularization.

There are some parameters to be set in advance. Following Refs.[15, 43], for LS, MCFS, UDFS and NDFS, the neighborhood size is fixed to 5 for all the datasets. To fairly compare all the unsupervised feature selection methods, the parameters are tuned by a "grid-search" strategy over $\{10^{-4},10^{-3},\dots,10^{3},10^{4}\}$ for all the compared algorithms. The number of chosen features is set as {50,100,150,200,250,300} for all the datasets except USPS. Since the total number of features of USPS is 256, we set the number of chosen features as {50,100,150,200,250} for this dataset. After choosing the features, both clustering and classification are performed on the selected features. The results averaged over the different numbers of features are then reported.
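The search space described above is small enough to enumerate explicitly. The sketch below builds the nine-value parameter grid and the list of feature counts for a dataset with a given number of features. Pairing the grid with itself for the two SSR parameters is an assumption about how the search is organised, and the names `PARAM_GRID` and `feature_counts` are illustrative.

```python
from itertools import product

# Candidate values 10^-4, 10^-3, ..., 10^3, 10^4 for each regularisation parameter.
PARAM_GRID = [10.0 ** e for e in range(-4, 5)]

def feature_counts(n_features, step=50, upper=300):
    """Numbers of selected features to try, capped by the dataset's dimensionality."""
    return [k for k in range(step, upper + 1, step) if k <= n_features]

# For a method with two parameters (such as alpha and beta), every pair is tried.
alpha_beta_pairs = list(product(PARAM_GRID, PARAM_GRID))

print(len(PARAM_GRID), len(alpha_beta_pairs))
print(feature_counts(256))    # a 256-dimensional dataset such as USPS
print(feature_counts(1024))   # any dataset with at least 300 features
```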

3) Evaluation metrics.

Following Refs.[11, 14], all the unsupervised feature selection algorithms are evaluated on the clustering task. To estimate the clustering performance, we adopt two metrics, i.e., Accuracy (ACC) and Normalized Mutual Information (NMI). Let $l_i$ be the cluster label obtained by a clustering algorithm and $p_i$ be the ground-truth label provided by the dataset. The ACC is given as follows:

$$ACC=\frac{1}{n}\sum_{i=1}^{n}\delta\big(p_i,\mathrm{map}(l_i)\big) \tag{14}$$

where $n$ is the number of samples and $\delta(x,y)$ denotes the delta function, with $\delta(x,y)=1$ if $x=y$ and $\delta(x,y)=0$ otherwise. $\mathrm{map}(l_i)$ denotes the mapping function, which maps each cluster label $l_i$ to the equivalent label in the dataset. ACC measures the percentage of clustering labels that agree with the ground-truth class labels. The larger the ACC is, the better the performance is. Given two variables $P$ and $Q$, NMI is given as follows:

$$NMI(P,Q)=\frac{R(P,Q)}{\sqrt{H(P)H(Q)}} \tag{15}$$

where $P$ is the clustering result and $Q$ is the true label. $H(P)$ and $H(Q)$ are the entropies of $P$ and $Q$, respectively. $R(P,Q)$ is the mutual information between $Q$ and $P$. NMI shows the consistency between the clustering results and the ground-truth labels; a larger NMI indicates a better clustering result.
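Both metrics can be computed from the contingency table of cluster labels against ground-truth labels. The sketch below makes two assumptions that the text does not fix: the mapping function is taken to be the one-to-one assignment that maximises the number of matched samples, found with the Hungarian method via `scipy.optimize.linear_sum_assignment`, and NMI is normalised by the geometric mean of the two entropies. The function names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def contingency(true_labels, cluster_labels):
    """Count matrix: rows are clusters, columns are ground-truth classes."""
    _, p_idx = np.unique(true_labels, return_inverse=True)
    _, l_idx = np.unique(cluster_labels, return_inverse=True)
    table = np.zeros((l_idx.max() + 1, p_idx.max() + 1), dtype=int)
    np.add.at(table, (l_idx, p_idx), 1)
    return table

def clustering_acc(true_labels, cluster_labels):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    table = contingency(true_labels, cluster_labels)
    rows, cols = linear_sum_assignment(-table)   # negate to maximise matches
    return table[rows, cols].sum() / table.sum()

def clustering_nmi(true_labels, cluster_labels):
    """Mutual information divided by sqrt(H(P) * H(Q)) (assumed normalisation)."""
    joint = contingency(true_labels, cluster_labels) / len(true_labels)
    p_l, p_p = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(p_l, p_p)[nz]))
    h_l = -np.sum(p_l * np.log(p_l))
    h_p = -np.sum(p_p * np.log(p_p))
    if h_l == 0.0 or h_p == 0.0:                 # a single cluster or class
        return 0.0
    return float(mi / np.sqrt(h_l * h_p))

truth = [0, 0, 1, 1, 2, 2]
relabelled = [1, 1, 0, 0, 2, 2]   # same partition, different label names
imperfect = [0, 0, 0, 1, 2, 2]    # one sample placed in the wrong cluster

print(clustering_acc(truth, relabelled), clustering_nmi(truth, relabelled))
print(clustering_acc(truth, imperfect), clustering_nmi(truth, imperfect))
```

Because cluster identifiers are arbitrary, a clustering that only permutes the label names scores perfectly on both metrics.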

6.2 Experiments on Synthetic Data

An experiment on synthetic data is first conducted to demonstrate that the proposed SSR can effectively select representative features using local structure information. The synthetic data has 135 samples and 225 features. The features belong to 6 independent types. Features of the same type can be represented by each other, whereas features of different types are orthogonal. The numbers of samples produced from the six types of features are 10, 15, 20, 25, 30 and 35, respectively. The numbers of features of the six types are 25, 30, 35, 40, 45 and 50, respectively, which are large relative to the number of samples. The features within each type are randomly generated in the range between 0 and 1. Furthermore, to make feature selection more difficult, noise randomly generated in the range between 0 and 0.5 is added to the synthetic features.
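One way to generate data matching this description is a block-diagonal layout, sketched below. This layout is an interpretation, not a construction confirmed by the paper: each feature type occupies its own block of samples and features, so features of different types have disjoint supports and are orthogonal before noise is added, while each block has more features than samples, so features of the same type are linearly dependent. The function name `make_synthetic` is illustrative.

```python
import numpy as np

def make_synthetic(seed=0):
    """Block-diagonal synthetic data: 135 samples, 225 features, 6 feature types."""
    rng = np.random.default_rng(seed)
    samples_per_type = [10, 15, 20, 25, 30, 35]
    features_per_type = [25, 30, 35, 40, 45, 50]
    n, m = sum(samples_per_type), sum(features_per_type)
    clean = np.zeros((n, m))
    feature_type = np.empty(m, dtype=int)
    r = c = 0
    for t, (n_t, m_t) in enumerate(zip(samples_per_type, features_per_type)):
        clean[r:r + n_t, c:c + m_t] = rng.uniform(0.0, 1.0, size=(n_t, m_t))
        feature_type[c:c + m_t] = t
        r, c = r + n_t, c + m_t
    noisy = clean + rng.uniform(0.0, 0.5, size=clean.shape)   # additive noise
    return clean, noisy, feature_type

clean, noisy, feature_type = make_synthetic()
print(noisy.shape)
print(np.bincount(feature_type))
# Features of different types are orthogonal in the noise-free data.
print(clean[:, 0] @ clean[:, 30])
# Within a type there are more features than samples, so the block is rank deficient.
print(np.linalg.matrix_rank(clean[:, :25]))
```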

Fig.2(a) exhibits the matrix $A$ learned in Eq.(1) by RSR, and Fig.2(b) exhibits the matrix $A$ learned in Eq.(4) by SSR. Fig.2(c) is the original similarity matrix of the features. Compared with RSR, the matrix $A$ learned by SSR reveals the underlying feature structure more clearly. The results indicate that our method can effectively select representative features by exploring the local structure of features.

6.3 Experiments on Real-world Data

To better estimate the performance of the unsupervised feature selection algorithms (except for AllFea), the clustering results are averaged over the different numbers of chosen features, since the optimal number of chosen features cannot be known in advance. In the experiment, we compare the clustering performance of the feature selection algorithms in terms of ACC and NMI. The best results are marked in bold. The K-means clustering algorithm is run on the features selected by each competing algorithm. Because the clustering results vary with the initialization, we repeat the experiment 20 times with different random initializations and report the average results for all the competing algorithms.

Fig.2 The matrix A learned by RSR and SSR on the synthetic data. SSR reveals the underlying feature structure more clearly

The experimental results are shown in Table 2 and Table 3. From the two tables, it can be observed that the proposed method SSR is superior to the other algorithms on most of the datasets in terms of ACC and NMI. In particular, compared with the clustering results using all features (AllFea), SSR achieves better performance on all the datasets, which implies that it can not only reduce the number of features, but also improve the clustering performance. As shown in the two tables, LS, UDFS and MCFS all select features that try to preserve the data similarity of the original feature space. UDFS and MCFS utilize a regression model, while LS uses a feature ranking model. The results of UDFS and MCFS are better than those of LS, which demonstrates the superiority of the regression approach. NDFS and RSR both consider the relationship between features. NDFS explores the relationship between two features and adopts a clustering technique to choose the representative features, while RSR utilizes the self-representation property of features to consider their relationship jointly and achieves promising results. In contrast, the proposed SSR considers the representation property and the local structure information of features simultaneously to select representative features, and achieves better performance.

To further evaluate SSR, the feature selection algorithms are also compared in terms of classification accuracy. In the classification task, for each dataset, we randomly select 50% of the data as the training data and use the rest as the testing data. This process is repeated 50 times with random data selection. The nearest neighbor classifier[44] is adopted for the classification. The average results are reported in Table 4. Similarly, bold values in the table indicate the best results. From Table 4, it can be observed that SSR also produces superior classification performance compared with the other algorithms.
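The classification protocol can be sketched as follows: a random 50/50 split, a 1-nearest-neighbour prediction on the selected features, and an average over 50 repetitions. The use of Euclidean distance, the toy data and the function name `nn_classification_accuracy` are assumptions made for illustration.

```python
import numpy as np

def nn_classification_accuracy(X, y, selected, n_repeats=50, seed=0):
    """Mean and std of 1-NN accuracy over repeated random 50/50 splits."""
    rng = np.random.default_rng(seed)
    Xs = np.asarray(X, dtype=float)[:, selected]   # keep only the chosen features
    y = np.asarray(y)
    n = y.size
    accs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        # Squared Euclidean distances between every test and training sample.
        d2 = ((Xs[test][:, None, :] - Xs[train][None, :, :]) ** 2).sum(axis=2)
        pred = y[train][d2.argmin(axis=1)]         # label of the nearest neighbour
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs)), float(np.std(accs))

# Toy data: feature 0 separates the two classes, the other features are noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.random((60, 5))
X[:, 0] += 10.0 * y

mean_acc, std_acc = nn_classification_accuracy(X, y, selected=[0, 1])
print(mean_acc, std_acc)
```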

Table 2 Clustering results (ACC% ± std) on different datasets for different feature selection algorithms (the best performances are highlighted in bold)

| Method | Colon[41] | Lung[42] | USPS[40] | Isolet[9] | ORL[9] | COIL20[9] |
|---|---|---|---|---|---|---|
| AllFea | 55.5±1.13 | 67.3±6.50 | 61.9±3.23 | 50.2±6.08 | 51.1±3.48 | 54.8±3.64 |
| LS[8] | 58.1±1.80 | 67.7±4.03 | 64.8±3.66 | 47.7±2.64 | 46.8±3.07 | 54.6±3.41 |
| MCFS[9] | 53.2±1.35 | 75.6±7.58 | 64.5±4.48 | 51.2±3.19 | **54.3±3.81** | 57.7±5.10 |
| UDFS[14] | 63.9±2.66 | 68.4±5.00 | 65.5±4.10 | 55.4±3.41 | 51.0±2.38 | 58.9±2.52 |
| NDFS[13] | 63.7±1.37 | 72.2±1.01 | 66.6±3.86 | **58.5±3.20** | 51.3±2.49 | 61.6±3.82 |
| RSR[22] | 59.7±2.20 | 72.6±6.10 | 64.5±4.92 | 55.3±3.50 | 50.2±3.10 | 60.7±4.84 |
| SSR | **64.2±3.50** | **80.8±6.88** | **67.1±2.36** | 57.0±2.86 | 52.7±2.03 | **63.7±2.30** |

Table 3 Clustering results (NMI% ± std) on different datasets for different feature selection algorithms (the best performances are highlighted in bold)

| Method | Colon[41] | Lung[42] | USPS[40] | Isolet[9] | ORL[9] | COIL20[9] |
|---|---|---|---|---|---|---|
| AllFea | 1.34±0.20 | 48.5±7.56 | 59.1±1.33 | 68.3±3.08 | 71.6±2.36 | 70.6±2.03 |
| LS[8] | 1.83±0.86 | 49.0±2.72 | 60.3±1.19 | 63.7±1.26 | 68.7±1.91 | 69.9±2.42 |
| MCFS[9] | 2.38±1.05 | 54.2±6.89 | 59.4±1.68 | 67.3±1.41 | **73.2±2.41** | 71.4±2.19 |
| UDFS[14] | 6.07±3.25 | 48.3±2.68 | 59.6±0.94 | 70.2±1.50 | 70.9±1.47 | 73.2±1.41 |
| NDFS[13] | 4.24±8.93 | 50.3±5.07 | 60.9±1.38 | **72.9±1.43** | 72.1±1.43 | 75.5±3.03 |
| RSR[22] | 5.43±0.94 | 53.2±7.11 | 60.3±2.12 | 70.5±1.19 | 70.6±0.54 | 74.1±2.36 |
| SSR | **9.62±4.31** | **58.8±6.94** | **62.3±1.99** | 71.0±1.17 | 72.5±1.46 | **76.0±2.24** |

Table 4 Classification accuracy (ACC% ± std) on different datasets for different feature selection algorithms (the best performances are highlighted in bold)

| Method | Colon[41] | Lung[42] | USPS[40] | Isolet[9] | ORL[9] | COIL20[9] |
|---|---|---|---|---|---|---|
| AllFea | 79.2±0.00 | 91.2±0.00 | 92.8±0.00 | **80.6±0.00** | 83.7±0.00 | 97.1±0.00 |
| LS[8] | 75.6±5.09 | 88.2±6.23 | 93.6±4.08 | 71.3±3.36 | 70.1±4.28 | 92.5±3.08 |
| MCFS[9] | 81.2±2.85 | 92.1±6.78 | 94.8±2.67 | 77.0±1.89 | 85.5±3.48 | 96.8±2.67 |
| UDFS[14] | 82.4±3.66 | 93.0±5.09 | 94.6±3.55 | 77.9±4.01 | **87.8±2.48** | 97.1±2.12 |
| NDFS[13] | 83.28±5.17 | 90.1±3.06 | 94.2±3.11 | 79.8±3.20 | 86.6±2.89 | 96.6±3.10 |
| RSR[22] | 82.6±4.21 | 92.8±5.10 | 95.0±1.98 | 78.9±3.15 | 86.1±3.24 | 97.5±2.26 |
| SSR | **83.9±3.61** | **93.2±4.08** | **95.8±2.19** | 79.2±2.62 | 86.9±1.75 | **98.4±2.50** |

Next, we study the parameter sensitivity of SSR. There are two important parameters in SSR, $\alpha$ and $\beta$, where $\alpha$ controls the sparsity of the selected features and $\beta$ controls the degree to which the local geometrical structure of the original space is preserved. In this subsection, we study how the clustering ACC and NMI of SSR vary with the two parameters. Due to the space limit, we only show the results on the datasets COIL20 and Lung. The experimental results are shown in Fig.3 and Fig.4. It can be observed that the proposed algorithm is not sensitive to $\alpha$ and $\beta$ over wide ranges. Furthermore, experiments are conducted to compare the results obtained with well-chosen parameters to those obtained with the default parameters. In these experiments, $\alpha=0.1$ and $\beta=0.1$ are set as the default parameter values. The comparative results are given in Fig.5. It can be seen that, with the default parameters, the proposed algorithm achieves relatively good results on all the datasets. That is, without parameter tuning, our method works well on the given datasets in most cases. However, the performance of the algorithm is more sensitive to the number of chosen features, the choice of which is still an open issue.

Finally, an experimental study of the convergence speed of Algorithm 1 is given. The convergence curves of the objective function value over all the datasets are given in Fig.6. It can be observed that Algorithm 1 converges very fast, in almost all cases within 10 iterations. Thus, the proposed optimization algorithm is effective and converges quickly.

Fig.3 The ACC and NMI of SSR with respect to the parameters α, β and feature numbers on the dataset COIL20

Fig.4 The ACC and NMI of SSR with respect to the parameters α, β and feature numbers on the dataset Lung

Fig.5 Performance comparison between the cases with the default (i.e., α=0.1 and β=0.1) and the best parameters

Fig.6 Convergence curves of the objective function value over all the datasets

7 Conclusions

In this paper, we propose a novel Structured Self-Representation (SSR) model for unsupervised feature selection that integrates the local geometrical structure and the self-representation property of features simultaneously. Two regularization terms are utilized to incorporate the locality information and to enforce the sparsity of the coefficients for feature reconstruction, respectively. To solve the proposed objective function, a simple but efficient iterative algorithm is presented. In the experiments, the proposed algorithm is applied to both clustering and classification tasks. Experiments on a synthetic dataset and six real-world datasets have validated the effectiveness of the proposed method.

[1]Wang M, Hua X S, Hong R, et al. Unified video annotation via multi-graph learning. IEEE Transactions on Circuits and Systems for Video Technology, 2009, 19(5): 733-746. DOI: 10.1109/TCSVT.2009.2017400.

[2]Gupta M D, Xiao J. Non-negative matrix factorization as a feature selection tool for maximum margin classifiers. Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE, 2011.2841-2848. DOI: 10.1109/CVPR.2011.5995492.

[3]Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003, 3(3): 1157-1182.

[4]Zhao Z, Wang L, Liu H, et al. On similarity preserving feature selection. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(3): 619-632. DOI: 10.1109/TKDE.2011.222.

[5]Nie F, Huang H, Cai X, et al. Efficient and robust feature selection via joint $\ell_{2,1}$-norms minimization. Advances in Neural Information Processing Systems (NIPS), 2010. 1813-1821.

[6]Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27 (8): 1226-1238. DOI:10.1109/TPAMI.2005.159.

[7]Liang D, Shen Z, Xuan L, et al. Local and global discriminative learning for unsupervised feature selection. Proceedings of the 2013 IEEE 13th International Conference on Data Mining (ICDM). Piscataway:IEEE, 2013.131-140.DOI:10.1109/ICDM.2013.23.

[8]He X, Cai D, Niyogi P. Laplacian score for feature selection. Proceedings of the Advances in Neural Information Processing Systems (NIPS). Vancouver, 2005. http://papers.nips.cc/paper/2909-laplacian-score-for-feature-selection.pdf.[2017-05-19]

[9]Cai D, Zhang C, He X. Unsupervised feature selection for multi-cluster data. Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD). New York:ACM, 2010.333-342.DOI: 10.1145/1835804.1835848.

[10]Krzanowski W J. Selection of variables to preserve multivariate data structure, using principal components. Journal of the Royal Statistical Society. Series C (Applied Statistics), 1987,36(1): 22-33. DOI: 10.2307/2347842.

[11]Zhao Z, Liu H. Spectral feature selection for supervised and unsupervised learning. Proceedings of the 24th Annual International Conference on Machine Learning (ICML). New York: ACM, 2007. 1151-1157.DOI: 10.1145/1273496.1273641.

[12]Nie F, Xiang S, Jia Y, et al. Trace ratio criterion for feature selection. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence. Palo Alto:AAAI,2008, 2: 671-676.

[13]Li Z, Yang Y, Liu J, et al. Unsupervised feature selection using nonnegative spectral analysis. Proceedings of the Twenty-sixth AAAI Conference on Artificial Intelligence. Palo Alto:AAAI ,2012.1026-1032.

[14]Yang Y, Shen H T, Ma Z, et al. $\ell_{2,1}$-norm regularized discriminative feature selection for unsupervised learning. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI). Palo Alto: AAAI, 2011. 1589-1597.

[15]Qian M, Zhai C. Robust unsupervised feature selection. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI). Palo Alto:AAAI, 2013.1621-1627.

[16]Elhamifar E, Vidal R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(11): 2765-2781.

[17]Lu C, Feng J, Lin Z, et al. Correlation adaptive subspace segmentation by trace lasso. Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV). Piscataway:IEEE,2013. 1345-1352. DOI: 10.1109/ICCV.2013.170.

[18]Wang S, Yuan X, Yao T, et al. Efficient subspace segmentation via quadratic programming. Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI). Palo Alto:AAAI, 2011.519-524.

[19]Lu C Y, Min H, Zhao Z Q, et al. Robust and efficient subspace segmentation via least squares regression. European Conference on Computer Vision (ECCV). Berlin: Springer, 2012, 7578: 347-360. DOI: 10.1007/978-3-642-33786-4_26.

[20]Nie F, Wang H, Huang H, et al. Early Active Learning via Robust Representation and Structured Sparsity. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI). Palo Alto:AAAI,2013.1572-1578.

[21]Hu Y, Zhang D, Jin Z, et al. Active learning via neighborhood reconstruction. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI). Palo Alto:AAAI, 2013.1415-1421.

[22]Zhu P, Zuo W, Zhang L, et al. Unsupervised feature selection by regularized self-representation. Pattern Recognition, 2015, 48(2): 438-446.DOI: 10.1016/j.patcog.2014.08.006.

[23]Jiang Y, Ren J. Eigenvalue sensitive feature selection. Proceedings of the 28th International Conference on Machine Learning. Washington ,2011. 89-96.

[24]Constantinopoulos C, Titsias M K, Likas A. Bayesian feature and model selection for Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(6): 1013-1023. DOI: 10.1109/TPAMI.2006.111.

[25]Dy J G, Brodley C E, Wrobel S. Feature selection for unsupervised learning. Journal of Machine Learning Research, 2004, 5(8): 845-889.

[26]Law M H C, Figueiredo M A T, Jain A K. Simultaneous feature selection and clustering using mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(9): 1154-1166. DOI: 10.1109/TPAMI.2004.71.

[27]Roth V, Lange T. Feature selection in clustering problems. Advances in Neural Information Processing Systems (NIPS). Massachusetts :MIT, 2003. http://papers.nips.cc/paper/2486-feature-selection-in-clustering-problems.pdf.

[28]Amaldi E, Kann V. On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 1998, 209(1/2): 237-260.DOI: 10.1016/S0304-3975(97)00115-1.

[29]Ng A Y. Feature selection, $L_1$ vs. $L_2$ regularization, and rotational invariance. Proceedings of the Twenty-First International Conference on Machine Learning. New York: ACM, 2004. 78-86. DOI: 10.1145/1015330.1015435.

[30]Cawley G C, Talbot N L C, Girolami M. Sparse multinomial logistic regression via Bayesian $L_1$ regularization. Advances in Neural Information Processing Systems (NIPS). Neural Information Processing Systems Foundation, Inc. 2007. 209-217.

[31]Zhao Z, Wang L, Liu H. Efficient Spectral Feature Selection with Minimum Redundancy. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI). Palo Alto:AAAI, 2010.673-678.

[32]Hou C, Nie F, Yi D, et al. Feature selection via joint embedding learning and sparse regression. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI). Palo Alto:AAAI, 2011.1324-1330.

[33]Liu X, Wang L, Zhang J, et al. Global and local structure preservation for feature selection. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(6): 1083-1095. DOI: 10.1109/TNNLS.2013.2287275.

[34]Li Z, Liu J, Yang Y, et al. Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(9): 2138-2150. DOI: 10.1109/TKDE.2013.65.

[35]Gu Q, Li Z, Han J. Joint feature selection and subspace learning. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence. Palo Alto:AAAI, 2011.1294-1302. DOI:10.5591/978-1-57735-516-8/IJCAI11-219.

[36]Mitra P, Murthy C A, Pal S K. Unsupervised feature selection using feature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(3): 301-312. DOI: 10.1109/34.990133.

[37]Hu H, Lin Z, Feng J, et al. Smooth representation clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014. 3834-3841.

[38]Bartels R H, Stewart G W. Solution of the matrix equation AX+XB=C [F4]. Communications of the ACM, 1972, 15(9): 820-826.

[39]Lancaster P. Explicit solutions of linear matrix equations. Siam Review, 1970, 12(4): 544-566.

[40]Hull J J. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994, 16(5): 550-554. DOI: 10.1109/34.291440.

[41]Alon U, Barkai N, Notterman D A, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 1999, 96(12): 6745-6750.

[42]Bhattacharjee A, Richards W G, Staunton J, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences, 2001, 98(24): 13790-13795. DOI: 10.1073/pnas.191502998.

[43]Wang S, Tang J, Liu H. Embedded unsupervised feature selection. Proceedings of the National Conference on Artificial Intelligence.Palo Alto:AAAI, 2015.470-476.

[44]Lindenbaum M, Markovitch S, Rusakov D. Selective sampling for nearest neighbor classifiers. Machine Learning, 2004, 54(2): 125-152.
