

        Robust Texture Classification via Group-Collaboratively Representation-Based Strategy


        Xiao-Ling Xia and Hang-Hui Huang



Abstract—In this paper, we present a simple but powerful ensemble for robust texture classification. The proposed method uses a single type of feature descriptor, the scale-invariant feature transform (SIFT), and inherits the spirit of the spatial pyramid matching model (SPM). By partitioning the original texture images in a flexible way, our approach can produce sufficient informative local features and thereby form a reliable feature pond or train a new class-specific dictionary. To take full advantage of this feature pond, we develop a group-collaboratively representation-based strategy (GCRS) for the final classification. It is solved by the well-known group lasso, but we go beyond this and propose a locality-constrained method to speed it up, named local constraint-GCRS (LC-GCRS). Experimental results on three public texture datasets demonstrate that the proposed approach achieves competitive results and even outperforms the state-of-the-art methods. In particular, most methods cannot work well when only a few samples of each category are available for training, but our approach still achieves very high classification accuracy, e.g. an average accuracy of 92.1% on the Brodatz dataset when only one image is used for training, significantly higher than any other method.

Index Terms—Dictionary learning, group lasso, local constraint, spatial pyramid matching, texture classification.

        1. Introduction

Texture classification is an important problem in the computer vision community with many applications. Yet despite several decades of research, designing a high-accuracy and robust texture classification system for real-world applications remains a challenge for at least three reasons: the wide range of natural texture types; the presence of large intra-class variations in texture images, e.g. rotation, scale, and viewpoint changes caused by arbitrary viewing and illumination conditions; and the demand for low computational complexity together with a desire to limit algorithm tuning [1].

Liu et al. pointed out in [2] that four basic elements constitute a reliable texture classification system: 1) local texture descriptors, 2) non-local statistical descriptors, 3) the design of a distance/similarity measure, and 4) the choice of classifier. Thanks to the emergence of the bag-of-features (BoF) model, which treats an image as a collection of unordered appearance descriptors extracted from local patches, quantizes them into discrete "visual words", and then computes a compact histogram representation for semantic image classification, recent work on texture classification tends to represent a texture non-locally by the distribution of local textons [1], [3]–[5].
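To make the BoF pipeline concrete, the sketch below quantizes local descriptors with k-means and builds a normalized word histogram; the 200-word vocabulary and the random stand-in descriptors are illustrative assumptions only.

```python
# A minimal BoF sketch: quantize local descriptors into visual words with
# k-means and summarize an image as a normalized word histogram.
import numpy as np
from sklearn.cluster import KMeans

def bof_histogram(descriptors, kmeans):
    """Map each descriptor to its nearest visual word, then histogram."""
    words = kmeans.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist / max(hist.sum(), 1)          # L1-normalized histogram

train_descriptors = np.random.randn(5000, 128)            # stand-in for 128-D SIFT
vocabulary = KMeans(n_clusters=200, n_init=3).fit(train_descriptors)
h = bof_histogram(np.random.randn(300, 128), vocabulary)  # one image's BoF vector
```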

Inspired by the spatial pyramid matching model (SPM) [6], an extension of BoF, our method partitions an image into increasingly finer segments, but in a more flexible way, by exploiting multi-level partitions and permitting various overlapping patterns. Thereby, our method can produce redundant local texture features and form a reliable feature pond containing these feature codes, or a much more compact one (a new dictionary learned from those codes).

To take full advantage of the feature pond, we develop an effective and efficient mechanism for the final classification, the group-collaboratively representation-based strategy (GCRS), which is similar in appearance to sparse representation-based classification (SRC) [7] but essentially differs in employing group sparsity rather than the simple $\ell_1$ sparse penalty. This is the well-known group lasso problem, but we go beyond it by exploring a local constraint (LC) to speed up the group lasso as well as to promote intra-group sparsity. We call our classification mechanism LC-GCRS. The overall flowchart of our method is shown in Fig. 1.

        2. Proposed Texture Classification

        2.1 Local Texture Descriptor

In our work, we use a single type of feature descriptor, the popular scale-invariant feature transform (SIFT) descriptor [8], which is extracted on a dense grid rather than at interest points and has been shown to yield superior classification performance in [9] and [10]. Suppose there are $T$ images from $C$ classes and $L_c$ denotes the index set of the $c$-th class. Let the $t$-th image be represented by a set of dense SIFT descriptors $x_n$ at $N$ locations identified with their indices $n = 1, \dots, N$, and let $m$ regions of interest be defined on the image, with $\mathcal{M}$ denoting the set of these regions. Then $m \in \mathcal{M}_l$ means that the $m$-th region belongs to the $l$-th level, and $\mathcal{M}_l$ indexes the regions in the $l$-th level. We use all the dense SIFT descriptors to train a dictionary $\mathrm{DIC} \in \mathbb{R}^{d \times D}$, where $\mathbb{R}$ denotes the real numbers, $d$ is the dimensionality, and $D$ is the number of atoms. We then employ the learned dictionary to represent each dense SIFT descriptor $x$ as a sparse code vector $a$ via the formulation below:

$a = \arg\min_{a} \| x - \mathrm{DIC}\, a \|_2^2 + \lambda \| a \|_1$.  (1)
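Since (1) is standard $\ell_1$-regularized sparse coding, a small sketch with scikit-learn's dictionary learner can illustrate it; the dictionary size, the regularization weight, and the random stand-in descriptors are assumptions for illustration, not the paper's settings.

```python
# A hedged sketch of (1): learn DIC from dense SIFT descriptors and
# sparse-code each descriptor against it with an l1 solver.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

d, D = 128, 256                           # descriptor dimension, number of atoms
sift = np.random.randn(2000, d)           # stand-in for dense SIFT descriptors

coder = MiniBatchDictionaryLearning(n_components=D, alpha=0.15,
                                    transform_algorithm="lasso_lars",
                                    transform_alpha=0.15)
coder.fit(sift)                           # atoms are stored row-wise in components_

codes = coder.transform(sift)             # one sparse code vector a per descriptor
assert codes.shape == (2000, D)           # a_k = response to the k-th visual word
```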

        Fig. 1. Flowchart of our proposed robust texture classification approach (best seen in color).

Each element $a_k$ of the code vector $a$ indicates the local descriptor's response to the $k$-th visual word in the dictionary. Let $|N_m|$ denote the cardinality of the set $N_m$ of locations falling in region $m$. We align all the SIFT descriptors belonging to region $m$ as a matrix and obtain the corresponding code matrix $A \in \mathbb{R}^{D \times |N_m|}$. Here we aggregate the local descriptors' responses across all the $|N_m|$ locations of this region into $|N_m|$-dimensional response vectors (the $k$-th row of $A$), in which each element $A_{km}$ represents the response of the local descriptor $x_m$ at the $m$-th location to the $k$-th visual word. After obtaining all the feature codes $A$ within a region, we can use a pooling operation to pool them into a single vector $y$ of a fixed dimension. Before the feature pooling, we first address the relevant partition issues.

• Partition issues. Different from the classical SPM method [9], which uses a three-level pyramid comprising pooling regions of {1×1, 2×2, 4×4}, we adopt a more flexible partition strategy and divide the original image into finer regions, e.g. {3×3, 4×4, 5×5}. Through our observation, merely relying on this flexible partition fashion, the proposed method can indeed capture sufficient local features at different scales and is resilient to local rotations. We go further by permitting different overlapping patterns at the same level (a minimal sketch follows below). Various overlapping patterns within a single level produce more regions, and these redundant local texture features can effectively alleviate the difficulty caused by local variance. In conjunction with our proposed classification mechanism described in Subsection 2.3, the proposed method leads to state-of-the-art texture classification performance in the experiments.
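```python
# One way to enumerate the flexible multi-level partition: for each g x g grid
# level, emit region boxes, optionally shifted to create overlapping patterns.
# The half-cell shift is an assumption; the paper does not fix the shift amounts.
def partition_regions(height, width, levels=(3, 4, 5), shifts=(0.0, 0.5)):
    regions = []
    for g in levels:
        ch, cw = height / g, width / g                   # cell size at this level
        for s in shifts:
            for i in range(g):
                for j in range(g):
                    y0, x0 = int(i * ch + s * ch), int(j * cw + s * cw)
                    y1, x1 = int(y0 + ch), int(x0 + cw)
                    if y1 <= height and x1 <= width:     # drop boxes shifted off-image
                        regions.append((y0, x0, y1, x1))
    return regions

print(len(partition_regions(480, 640)))   # 3x3 + 4x4 + 5x5, two shift patterns each
```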

• Feature pooling. Feature pooling maps the response vectors within each region into a statistic via some spatial pooling operation $f$. Among various pooling methods, such as average pooling, max pooling, and others transiting from the average to the max, max pooling is inspired by the mechanism of the complex cells in the primary visual cortex and has been shown to be a powerful operation both empirically and theoretically [9], [11]. In this paper, we adopt max pooling for its translation invariance across different levels of partitions [12]; thus $f = \max$, and the pooled feature code $y$ over the code matrix $A$ of region $m$ is (see the one-line sketch below)

$y_k = \max\{ |A_{k1}|, |A_{k2}|, \dots, |A_{k|N_m|}| \}, \quad k = 1, \dots, D.$

No matter how the sizes of different regions differ, the pooled feature code has the same dimension $D$ and well summarizes the distribution of the SIFT descriptors in each region. This property enables us to adopt the flexible partition scheme and various overlapping patterns within the same level of partition, thereby producing redundant local texture features.
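```python
# The pooling step in one line: per visual word, keep the strongest absolute
# response in the region (abs-max, the common convention in sparse-coding SPM
# [9]); y has dimension D no matter how many descriptors fall in the region.
import numpy as np

def max_pool(A):
    return np.abs(A).max(axis=1)           # y_k = max_j |A_{kj}|

A = np.random.randn(256, 37)               # code matrix: 37 descriptors in this region
y = max_pool(A)                            # pooled feature code, shape (256,)
```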

        2.2 Texture Image Representation

As described in the previous subsection, we store all the pooled feature codes of one image to form a matrix, which serves as the new texture image representation. That is to say, regardless of the region sizes and overlapping patterns, all the pooled feature vectors of the regions are stored in an orderless way. This orderless storage, in conjunction with max pooling, provides invariance to translation, rotation, and scale; we will see its benefit in the experiments of Section 3.

        2.3 Proposed Classification Mechanism

Actually, all the pooled feature codes from regions at various levels of the training images can be seen as redundant feature bases, or a feature pond, which can effectively represent the pooled feature codes of a new image; in this way, scale, translation, and rotation invariance can be achieved. This idea has been explored in the SRC scheme [7].

In SRC, a vectorized test image $z$ is coded collaboratively over the feature pond $Y = [Y_{L_1}, Y_{L_2}, \dots, Y_{L_C}]$ of all the training samples under the $\ell_1$-norm sparsity constraint, where $Y_{L_c}$ consists of all the images from the $c$-th category. For simplicity, SRC first calculates the sparse coefficient vector by

$\hat{a} = \arg\min_{a} \| z - Y a \|_2^2 + \lambda \| a \|_1$.  (2)

Then, SRC determines which class $z$ should belong to. In other words, it calculates the reconstruction error $e_c = \| z - Y_{L_c} \hat{a}_c \|_2$ for all $C$ classes, where $\hat{a}_c$ is the part of $\hat{a}$ that corresponds to $Y_{L_c}$. Finally, it selects $c^* = \arg\min_c e_c$ as the predicted label.
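For concreteness, a compact sketch of this SRC decision rule follows, using scikit-learn's Lasso as a stand-in solver for (2); the regularization value is an illustrative assumption.

```python
# SRC in brief: code z over the pond Y with an l1 solver, then pick the class
# whose columns reconstruct z with the smallest residual.
import numpy as np
from sklearn.linear_model import Lasso

def src_predict(z, Y, class_index):
    """Y: (dim, n_atoms) feature pond; class_index: class label per column."""
    a = Lasso(alpha=0.01, max_iter=5000).fit(Y, z).coef_  # sparse code of z
    errors = {}
    for c in np.unique(class_index):
        a_c = np.where(class_index == c, a, 0.0)          # keep class-c coefficients
        errors[c] = np.linalg.norm(z - Y @ a_c)           # class-wise residual
    return min(errors, key=errors.get)                    # label with smallest residual
```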

SRC uses the $\ell_1$-norm penalty in (2), so the resultant nonzero elements of $a$ scatter over the whole feature pond; it is instead desirable to make the nonzero elements cluster in one part of the feature pond. For this reason, we propose to apply a group-sparse penalty instead of the $\ell_1$-norm penalty, which yields the well-known group lasso problem. Moreover, we also keep the coefficients sparse within each group:

$\hat{a} = \arg\min_{a} \| z - Y a \|_2^2 + \lambda_1 \sum_{c=1}^{C} \| d_{L_c} \circ a \|_2 + \lambda_2 \| a \|_1$  (3)

where "∘" denotes the element-wise product, and $d_{L_c}$ is the group mask whose elements corresponding to $L_c$ are 1 and 0 elsewhere; the masks have the same dimension as $a$. Several toolboxes can solve (3); we do not elaborate the algorithms due to limited space.
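For readers who want a concrete reference point, here is a minimal proximal-gradient (ISTA-style) sketch of (3). The proximal operator of the sparse group lasso penalty is an element-wise soft-threshold followed by group-wise shrinkage; the step size and regularization weights below are illustrative assumptions.

```python
import numpy as np

def prox_sparse_group(a, groups, lam1, lam2, step):
    a = np.sign(a) * np.maximum(np.abs(a) - step * lam2, 0.0)  # l1 soft-threshold
    for g in groups:                        # g: index array of one class's atoms
        norm = np.linalg.norm(a[g])
        scale = 0.0 if norm == 0.0 else max(1.0 - step * lam1 / norm, 0.0)
        a[g] = a[g] * scale                 # group-wise shrinkage
    return a

def sparse_group_lasso(z, Y, groups, lam1=0.1, lam2=0.01, n_iter=200):
    """min_a 0.5*||z - Y a||^2 + lam1 * sum_c ||d_c o a||_2 + lam2 * ||a||_1."""
    a = np.zeros(Y.shape[1])
    step = 1.0 / (np.linalg.norm(Y, 2) ** 2)    # 1/L, L = Lipschitz const of gradient
    for _ in range(n_iter):
        grad = Y.T @ (Y @ a - z)                # gradient of the quadratic data term
        a = prox_sparse_group(a - step * grad, groups, lam1, lam2, step)
    return a
```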

In fact, the number of atoms in the feature pond can be very large, and directly solving (3) is computationally expensive. To circumvent this problem, we borrow the idea of locality-constrained linear coding (LLC) [10], [21] and apply a $K$-nearest neighbors (KNN) search over the feature pond before solving (3): we choose the $K$ nearest neighbors to form $Y(K)$ with indices $H(K)$, and represent the testing image by solving a much lower-complexity sparse group lasso problem, replacing $Y$ in (3) with $Y(K)$ and modifying the group masks $d$ accordingly. After this, an overall coefficient vector (code vector) $a$ is formed by embedding the obtained coefficients at the locations $H(K)$ and zeros elsewhere. The final classification follows the SRC method.
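A hedged sketch of this locality-constrained procedure, reusing the sparse_group_lasso solver sketched above; the choice K = 256 and the Euclidean distance for the neighbor search are assumptions.

```python
# LC speed-up: keep only the K nearest atoms, solve the small sparse group
# lasso, then scatter the coefficients back into a full-length code vector.
import numpy as np

def lc_gcrs_code(z, Y, class_index, K=256, **solver_kw):
    dists = np.linalg.norm(Y - z[:, None], axis=0)     # distance of z to every atom
    H = np.argsort(dists)[:K]                          # indices H(K) of nearest atoms
    sub_groups = [np.flatnonzero(class_index[H] == c)  # group masks remapped to subset
                  for c in np.unique(class_index[H])]
    a_sub = sparse_group_lasso(z, Y[:, H], sub_groups, **solver_kw)
    a = np.zeros(Y.shape[1])
    a[H] = a_sub                                       # embed back, zeros elsewhere
    return a                                           # classify via SRC-style residuals
```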

        3. Experiment

We evaluate the proposed texture classification framework on three public datasets: the Brodatz dataset [13], the KTH-TIPS dataset [14], and the UMD texture database [15]. Due to limited space, we briefly summarize the three datasets in Fig. 2.

Direct comparisons between the proposed method and the state-of-the-art methods on the three datasets are shown in Table 1. Scores were either originally reported or taken from the comparative study of Zhang et al. [4]. For the three datasets, 3, 41, and 20 samples per class are used for training, respectively. Interested readers can refer to the papers of these methods for details. Our method achieves comparable performance to, or even outperforms, the state-of-the-art approaches. It is worth noting that our method uses only a single type of feature descriptor, i.e. SIFT, whereas other methods simultaneously adopt several types of features, such as the multiple histograms in [16]. Moreover, benefiting from our LC-GCRS classification method, we avoid more complex classifiers, such as the combination of several classifiers in [2].

Fig. 3 plots the performance of our method vs. the number of training samples on the three databases, together with the performance of other methods. We compare our method with three methods from [16], Mellor's method [17], and Lazebnik's method [3] on the Brodatz dataset; with the methods of Zhang et al. [4], Lazebnik et al. [3], and Liu et al. [2] on the KTH-TIPS dataset; and with the methods proposed by Lazebnik et al. [3], Xia et al. [16], Xu et al. [1], and Liu et al. [2] on the UMD dataset. From Fig. 3, it is easy to see that our method extracts reliable texture features; even when only a few training images are available, it still achieves promising performance.

        Fig. 2. Summary of texture datasets used in our experiment.

        Fig. 3. Classification rate vs. number of training samples on the three datasets: (a) Brodatz, (b) KTH-TIPS, and (c) UMD.

        Table 1: Direct comparisons between the proposed and the state-of-the-art methods

        Fig. 4. Confusion matrix on KTH-TIPS database.

        Fig. 5. Textures from two categories of KTH-TIPS.

A confusion matrix is presented in Fig. 4, in which the number at row $R$ and column $C$ is the proportion of class $R$ that is classified as class $C$. For example, 7.04% of corduroy images are misclassified as cotton. The average accuracy is 94.8%. As the number of training samples grows, our method continues to achieve decent results. Particularly on the Brodatz dataset, when only one sample per category is used for training, our method achieves an impressive accuracy of 90.12%, largely higher than that of the other methods. Also, on the KTH-TIPS dataset, when only ten sample images of each class are used for training, our method achieves a classification accuracy of 95.14%, much higher than the others. Fig. 5 displays some similar textures from the KTH-TIPS dataset. Some texture images are very similar at various scales, which explains the misclassifications on this dataset.
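For reference, a row-normalized confusion matrix of this kind can be computed as follows; the tiny label arrays are hypothetical stand-ins, not the actual KTH-TIPS results.

```python
# Entry (R, C) is the fraction of class-R samples predicted as class C.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, normalize="true")   # each row sums to 1
print(np.round(cm, 2))
```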

        4. Conclusions and Future Work

Different from many advanced texture classification methods that combine several types of descriptors, we propose a method which uses only a single type of feature descriptor (SIFT). This makes our method much simpler, yet capable of discriminating textures, as demonstrated in the experiments. Benefiting from the flexible partition strategy inspired by SPM, our method can produce redundant features to form a reliable feature pond, even when only a few samples of each category are available for training. Instead of the widely-used SVMs, we develop a new classification mechanism called LC-GCRS, which is a simple and fast implementation of the group lasso with intra-group sparsity, and use the reconstruction error for the final classification. Experiments show that the proposed LC-GCRS is very effective and efficient. As future work, we intend to introduce a new class-specific sub-dictionary [20]–[24] instead of the original feature pond to improve the performance further. This idea can be transformed into a multi-layer dictionary learning problem. Furthermore, besides the low-level SIFT feature descriptor, other features can also be adopted simultaneously to improve the performance, such as multi-feature fusion [25], [26].

        5. References

[1] Y. Xu, X. Yang, H. Ling, and H. Ji, "A new texture descriptor using multifractal analysis in multi-orientation wavelet pyramid," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, 2010, pp. 161–168.

[2] L. Liu, P. Fieguth, G. Kuang, and H. Zha, "Sorted random projections for robust texture classification," in Proc. of IEEE Int. Conf. on Computer Vision, Barcelona, 2011, pp. 391–398.

[3] S. Lazebnik, C. Schmid, and J. Ponce, "A sparse texture representation using local affine regions," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1265–1278, 2005.

[4] J.-G. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, "Local features and kernels for classification of texture and object categories: a comprehensive study," in Proc. of Conf. on Computer Vision and Pattern Recognition Workshop, 2006, doi: 10.1109/CVPRW.2006.121.

[5] M. Crosier and L. D. Griffin, "Using basic image features for texture classification," Int. Journal of Computer Vision, vol. 88, no. 3, pp. 447–460, 2010.

[6] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 2006, doi: 10.1109/CVPR.2006.68.

[7] J. Wright, A.-Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2008.

[8] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[9] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Miami, 2009, pp. 1794–1801.

[10] J.-J. Wang, J.-C. Yang, K. Yu, F.-J. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, 2010, pp. 3360–3367.

[11] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proc. of the 27th Int. Conf. on Machine Learning, Haifa, 2010.

[12] J.-C. Yang, K. Yu, and T. Huang, "Supervised translation-invariant sparse coding," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, 2010, pp. 3517–3524.

[13] P. Brodatz, Textures: A Photographic Album for Artists and Designers, New York: Dover Publications, 1966.

[14] E. Hayman, B. Caputo, M. Fritz, and J.-O. Eklundh, "On the significance of real-world conditions for material classification," Lecture Notes in Computer Science, vol. 3024, pp. 253–266, 2004, doi: 10.1007/978-3-540-24673-2_21.

[15] Y. Xu, H. Ji, and C. Fermuller, "Viewpoint invariant texture description using fractal analysis," Int. Journal of Computer Vision, vol. 83, no. 1, pp. 85–100, 2009.

[16] G.-S. Xia, J. Delon, and Y. Gousseau, "Shape-based invariant texture indexing," Int. Journal of Computer Vision, vol. 88, no. 3, pp. 382–403, 2010.

[17] M. Mellor, B.-W. Hong, and M. Brady, "Locally rotation, contrast, and scale invariant descriptors for texture analysis," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 30, no. 1, pp. 52–61, 2008.

[18] M. Varma and A. Zisserman, "A statistical approach to material classification using image patches," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 11, pp. 2032–2047, 2009.

[19] Y. Xu, S.-B. Huang, H. Ji, and C. Fermuller, "Combining powerful local and global statistics for texture description," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Miami, 2009, pp. 573–580.

[20] M. Yang, L. Zhang, J. Yang, and D. Zhang, "Metaface learning for sparse representation based face recognition," in Proc. of the 17th IEEE Int. Conf. on Image Processing, Hong Kong, 2010, pp. 1601–1604.

[21] S. Kong and D. Wang, "Multi-level feature descriptor for robust texture classification via locality-constrained collaborative strategy," [Online]. Available: http://arxiv.org/abs/1203.0488

[22] S. Kong and D. Wang, "A dictionary learning approach for classification: separating the particularity and the commonality," Lecture Notes in Computer Science, vol. 7572, pp. 186–199, 2012, doi: 10.1007/978-3-642-33718-5_14.

[23] S. Kong and D.-H. Wang, "Learning exemplar-represented manifolds in latent space for classification," Lecture Notes in Computer Science, 2013, doi: 10.1007/978-3-642-40994-3_16.

[24] S. Kong, X.-K. Wang, D.-H. Wang, and F. Wu, "Multiple feature fusion for face recognition," in Proc. of the 10th IEEE Int. Conf. and Workshops on Automatic Face and Gesture Recognition, Shanghai, 2013, doi: 10.1109/FG.2013.6553718.

[25] S. Kong and D. Wang, "Learning individual-specific dictionaries with fused multiple features for face recognition," in Proc. of the 10th IEEE Int. Conf. and Workshops on Automatic Face and Gesture Recognition, Shanghai, 2013, doi: 10.1109/FG.2013.6553710.

Xiao-Ling Xia was born in Hubei, China, in 1966. She received the Ph.D. degree in image processing and pattern recognition from Shanghai Jiao Tong University in 1994. She is now an associate professor with Donghua University, Shanghai, China. Her research interests include image processing and data visualization.

Hang-Hui Huang was born in Shaanxi, China, in 1986. He received the B.S. degree from Donghua University in 2010. He is currently a graduate student with the Department of Computer Science and Technology, Donghua University. His research interests include computer vision, machine learning, and pattern recognition.

Manuscript received May 28, 2013; revised September 27, 2013.

        X.-L. Xia is with the Department of Computer Science and Technology, Donghua University, Shanghai 201620, China (Corresponding author email: sherlysha@dhu.edu.cn).

        H.-H. Huang is with the Department of Computer Science and Technology, Donghua University, Shanghai 201620, China (email: yellow.beyond@mail.dhu.edu.cn).

        Color versions of one or more of the figures in this paper are available online at http://www.intl-jest.com.

        Digital Object Identifier: 10.3969/j.issn.1674-862X.2013.04.014
