

        Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network

2018-09-11 05:13:46
Computers, Materials & Continua, 2018, Issue 8

Yuhong Zhang, Qinqin Wang, Yuling Li and Xindong Wu

Abstract: Recently, the effectiveness of neural networks, especially convolutional neural networks, has been validated in the field of natural language processing, in which sentiment classification for online reviews is an important and challenging task. Existing convolutional neural networks extract the most important features of sentences but ignore local features and the feature sequence. Thus, these models do not perform well, especially for transition sentences. To this end, we propose a Piecewise Pooling Convolutional Neural Network (PPCNN) for sentiment classification. Firstly, with a sentence represented by word vectors, the convolution operation is applied to obtain the convolution feature map vectors. Secondly, these vectors are segmented according to the positions of transition words in the sentences. Thirdly, the most significant feature of each local segment is extracted using the max pooling mechanism, so that features of different aspects can be extracted; notably, the relative sequence of these features is preserved. Finally, after processing with the dropout algorithm, a softmax classifier is trained for sentiment classification. Experimental results show that the proposed PPCNN method is effective and superior to other baseline methods, especially for datasets with transition sentences.

Keywords: Sentiment classification, convolutional neural network, piecewise pooling, feature extraction.

        1 Introduction

Sentiment classification, also called sentiment analysis or opinion mining, studies people’s opinions, sentiments, evaluations, attitudes and emotions from text and reviews [Liu and Zhang (2012)], and is an important task in natural language processing (NLP). With the successful application of deep learning in vision and speech recognition, researchers have applied deep learning models such as recurrent neural networks (RNN) [Yoav (2016); Socher, Pennington, Huang et al. (2011); Sutskever, Vinyals and Le (2014); McCann, Bradbury, Xiong et al. (2017); Li, Luong, Jurafsky et al. (2015); Socher, Perelygin, Wu et al. (2013)] and convolutional neural networks (CNN) [Kim (2014); Kalchbrenner, Grefenstette and Blunsom (2014); Zeng, Liu, Lai et al. (2014); Johnson and Zhang (2015); Yin and Schütze (2016); Wang, Xu, Xu et al. (2015); Soujanya, Erik and Alexander (2016)] to address data sparseness in sentiment classification and obtain better performance. Compared with RNNs, CNNs have attracted more attention because they capture semantics better and are easier to train, with fewer tags, fewer connections and fewer parameters.

Recently, CNNs have been shown to be effective in capturing the syntax and semantics of words in sentences. CNNs [Kim (2014); Zeng, Liu, Lai et al. (2014); Wang, Xu, Xu et al. (2015); Hu, Lu, Li et al. (2014)] usually take a max pooling mechanism to capture the most useful feature of a sentence. Dynamic CNNs [Kalchbrenner, Grefenstette and Blunsom (2014); Yin and Schütze (2016)] use a dynamic k-max pooling operation for the semantic modeling of sentences, which can extract the k most useful features for sentiment classification. However, in practice, people are accustomed to expressing both positive and negative opinions connected by transition words [Tang, Qin and Liu (2016); Vasileios and Kathleen (1997)]. In fact, transition sentences statistically account for a large proportion, about 40%, of several review benchmark datasets. Therefore, the classification of transition sentences has a great impact on the overall classification accuracy.

Most existing convolutional neural networks adopt max pooling or k-max pooling to deal with transition sentences. However, this makes it difficult to capture both the positive and the negative features. Consider, for example, “beautifully filmed, talented actor and acted well, but admittedly problematic in its narrative specifics.” Max pooling based CNN models extract only one feature, “well”, affirming the acting, while omitting the feature “problematic” about the script, which determines the sentiment orientation of this sentence. In contrast, CNN models based on k-max pooling can extract three features, “well”, “talented” and “beautifully”. However, all three extracted features are positive aspects of the filming and the actor’s performance, while the negative information about the screenplay is absent.

In this paper, a piecewise pooling technique is introduced into CNN, forming our Piecewise Pooling Convolutional Neural Network (PPCNN). More specifically, with a transition word lexicon, the feature map vector is segmented, and then the most significant feature of each local segment is extracted using the max pooling mechanism. This not only extracts locally significant features with different sentiment polarities, but also preserves the relative word sequence of these features.

The contributions of this paper are as follows:

1. The text is represented with word embeddings as the input of the CNN, which does not require complicated NLP preprocessing.

2. A piecewise pooling mechanism in CNN is proposed for sentiment classification on transition sentences, which can obtain multiple features with different sentiment polarities and also maintain the relative sequence of words in a sentence.

The remainder of this paper is organized as follows. Section 2 briefly reviews related work about RNN and CNN. Section 3 gives the details of our proposed PPCNN method. Section 4 shows the effectiveness of our proposed method experimentally. Section 5 summarizes the paper.

        2 Related work

Deep learning models have been successfully applied in the fields of computer vision [Krizhevsky, Sutskever and Hinton (2012)] and speech recognition [Graves, Mohamed and Hinton (2013); Kim, Hori and Watanabe (2017)]. In the field of sentiment analysis, researchers have adopted deep learning models to learn better feature representations. These models fall into two categories: sequence-based recursive neural network models [Socher, Pennington, Huang et al. (2011); Sutskever, Vinyals and Le (2014); McCann, Bradbury, Xiong et al. (2017); Li, Luong, Jurafsky et al. (2015)] and convolutional neural network models [Kalchbrenner, Grefenstette and Blunsom (2014); Zeng, Liu, Lai et al. (2014); Johnson and Zhang (2015); Yin and Schütze (2016); Wang, Xu, Xu et al. (2015); Soujanya, Erik and Alexander (2016)].

        2.1 Recursive neural network model

Based on the RNN model, Richard et al. [Socher, Pennington, Huang et al. (2011)] and Sutskever et al. [Sutskever, Vinyals and Le (2014)] proposed a semi-supervised recursive autoencoder and a recursive neural tensor network, respectively, to analyze the sentiment of sentences. Ramy et al. [Ramy, Hazem, Nizar et al. (2017)] created an Arabic Sentiment Treebank (ARSENTB) to explore different morphological and orthographical features at multiple levels of abstraction. Kai et al. [Kai, Socher and Christopher (2015)] combined LSTM networks, with their strong retention of time-series information, to construct a tree-structured LSTM network model. This model outperformed other LSTM baselines on predicting the semantic relevancy between two sentences and on sentiment classification. Generally, RNNs require a large amount of manually tagged words, phrases and sentences.

        2.2 Convolutional neural network model

Compared with RNNs, CNNs are easier to train and require fewer parameters and only sentence-level tags. The standard CNN usually consists of an input layer, a convolution layer, a pooling layer and an output layer. In the input layer, each word is represented by a real-valued vector. The convolution layer learns and extracts features. The pooling layer selects the features most relevant to the task. The output layer performs the classification, usually with a softmax classifier.
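As an illustration of this standard stack, the following minimal NumPy sketch (our own illustration of the description above, not the authors' code) wires the four layers together for a single sentence and a single convolution filter:

```python
import numpy as np

def text_cnn_forward(X, W, b, W_out, b_out):
    """Forward pass of a one-filter text CNN for a single sentence.
    X: (n, d) word-vector matrix; W: (h, d) convolution filter;
    W_out: (num_classes,) output weights. Returns class probabilities."""
    n, d = X.shape
    h = W.shape[0]
    # Convolution layer: apply the filter to every window of h words.
    c = np.array([np.tanh(np.sum(W * X[i:i + h]) + b)
                  for i in range(n - h + 1)])
    # Pooling layer: max pooling keeps the single strongest feature.
    pooled = c.max()
    # Output layer: softmax over the class scores.
    scores = W_out * pooled + b_out
    e = np.exp(scores - scores.max())
    return e / e.sum()
```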

Kim [Kim (2014)] proposed a simple and improved CNN whose input layer took both task-specific and static word vectors for sentiment analysis and classification. Kalchbrenner et al. [Kalchbrenner, Grefenstette and Blunsom (2014)] introduced a dynamic convolutional neural network (DCNN), in which a dynamic k-max pooling operation was used as a nonlinear sampling function to dynamically adjust the number of extracted important features (the k value), accomplishing sentiment classification without requiring parsers or other external features. Zeng et al. [Zeng, Liu, Lai et al. (2014)] established a deep convolutional neural network (DNN), which extracted vocabulary- and sentence-level features for classification; DNN also introduced the relationship label of the noun pair as a position feature into the network. Yin et al. [Yin and Schütze (2016)] proposed a multi-channel variable-size convolutional neural network model (MVCNN) for sentiment classification on sentences and subjectivity classification, where “MV” indicates that texts are initialized with five word-vector training methods such as word2vec and GloVe, and variable-size convolution filters are applied to extract features of sentences over various ranges.

        3 Our proposed approach PPCNN

Aiming to improve sentiment classification for the large number of transition sentences, this paper proposes a novel piecewise pooling convolutional neural network, namely PPCNN. In this model, firstly, a sentence is represented with word embeddings, and the convolution operation is applied to obtain a feature map vector. Then this vector is segmented according to the positions of transition words in the sentence, and the most important local feature is extracted from each fragment to capture the sentiment of the sentence. Finally, the features captured from all segments are used to train a classifier. Fig. 1 shows the architecture of our piecewise pooling neural network for text sentiment classification. Generally, the whole framework includes four parts: data representation, convolution operation, piecewise pooling, and softmax output. We describe these components in detail below.

        3.1 Representation for input data

At present, there are many works about word embedding [Mikolov, Sutskever, Chen et al. (2013); Thang, Richard and Christopher (2013)]. These works point out that word vectors learned from a large-scale unsupervised corpus can capture more semantic information of words. In this paper, Google’s pre-trained word embeddings, GoogleNews-vectors-negative300, trained on a news corpus containing about 100 billion words, are adopted. Given a training sentence of n words, with each word mapped to a d-dimensional vector (d = 300), the sentence can be represented as a two-dimensional matrix, as shown in Eq. (1):

X = [w1; w2; …; wn] ∈ R^(n×d)    (1)
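For concreteness, here is a minimal sketch of building this matrix from pre-trained embeddings; the helper name and the out-of-vocabulary initialization are our assumptions:

```python
import numpy as np

def sentence_to_matrix(tokens, embeddings, d=300):
    """Build the (n, d) sentence matrix of Eq. (1) by stacking the
    word vectors of the n tokens. `embeddings` is any dict-like map
    from word to d-dimensional vector, e.g. loaded from the
    GoogleNews-vectors-negative300 model; out-of-vocabulary words
    get a small random vector (a common convention, our assumption)."""
    rows = [embeddings[w] if w in embeddings
            else np.random.uniform(-0.25, 0.25, d)
            for w in tokens]
    return np.vstack(rows)  # shape (n, d)
```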

        3.2 Convolution operation
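Under the standard text-CNN formulation this model builds on [Kim (2014)], each convolution kernel of height h slides over the sentence matrix X of Eq. (1) and emits one feature per h-word window, yielding one feature map per kernel. Below is a minimal NumPy sketch of this step, assuming a tanh nonlinearity and the filter shapes of Section 4.2; it is our illustration, not the authors' code:

```python
import numpy as np

def convolution_layer(X, filters, biases):
    """Convolution over the sentence matrix X of Eq. (1).
    filters: (k, h, d) array holding k kernels of height h
    (h = 3, 4 or 5 per Section 4.2); biases: (k,).
    Returns k feature maps, each of length n - h + 1."""
    n, d = X.shape
    k, h, _ = filters.shape
    maps = np.empty((k, n - h + 1))
    for j in range(k):
        for i in range(n - h + 1):
            # One feature = nonlinear response of one h-word window.
            maps[j, i] = np.tanh(np.sum(filters[j] * X[i:i + h]) + biases[j])
    return maps
```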

        3.3 Piecewise pooling

Traditional convolutional neural networks [Kim (2014); Wang, Xu, Xu et al. (2015)] take the max pooling mechanism to map input sentences of variable lengths into a representation of the same dimension. In order to express the meaning of the text better, we instead segment each convolution feature map at the positions of the transition words and apply max pooling within each segment, so that one salient feature is kept per segment and the relative order of the segments is preserved.

It is necessary to mention that different sizes of convolution kernels imply different cutting points. In this paper, instead of selecting the single most reasonable point, we use convolution kernels of different sizes to capture richer pooling features, which benefits the classification.

Finally, the feature maps of all convolution kernels are respectively segmented and pooled, and the pooled features are concatenated to obtain the final output vector.

In this way, we can extract the most important information in each segment based on the positions of the transition words in a sample, and finally the feature vectors can be obtained. In order to avoid over-fitting and improve the prediction accuracy, the dropout algorithm [Kim (2014)] is applied, which randomly sets elements of the input to 0 with a certain probability; only the preserved elements are passed through the network to train the softmax classifier.
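The following is a minimal sketch of piecewise pooling followed by dropout, assuming NumPy; the function names are ours and the example data is hypothetical:

```python
import numpy as np

def piecewise_max_pool(feature_map, cut_points):
    """Split one feature map at the transition-word positions and keep
    the strongest feature of each segment; the order of the segments
    (and hence of the pooled features) is preserved. With an empty
    cut_points list this reduces to ordinary max pooling."""
    segments = np.split(feature_map, cut_points)
    return np.array([seg.max() for seg in segments if seg.size > 0])

def dropout(x, p=0.5, rng=np.random.default_rng(0)):
    """Training-time dropout: zero each element with probability p, so
    only the preserved elements reach the softmax classifier."""
    return x * (rng.random(x.shape) >= p)

# Example: a feature map of length 8 with a transition word at index 4.
fmap = np.array([0.9, 0.1, 0.4, 0.2, 0.3, 0.8, 0.1, 0.7])
print(piecewise_max_pool(fmap, [4]))  # -> [0.9 0.8], one feature per segment
```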

In fact, the positions of transition words vary across sentences, which means that a transition word does not necessarily divide a sentence into two balanced parts. Our proposed PPCNN extracts one important feature from each segment regardless of the segment's length. Therefore, the position of a transition word does not influence the performance of our approach. Moreover, when there is no transition word, our approach performs the same as the one proposed in Richard et al. [Socher, Perelygin, Wu et al. (2013)].

        4 Experiment results

        4.1 Data sets

In this section, we compare our proposed PPCNN with 10 relevant algorithms on 6 benchmark datasets to demonstrate its superiority. The details of these 6 datasets are as follows, with a statistical summary shown in Tab. 1.

MR: In this data set of movie reviews, each review is one sentence, and the task is to detect positive/negative reviews. There are 10662 reviews in total, with positive and negative reviews equally balanced. The dataset contains 4647 transition samples, and the average sample length is 20 words. The dataset is available at: https://www.cs.cornell.edu/people/pabo/movie-review-data/.

        Table 1: Details of data sets

SST-1: SST-1, also called the Stanford Sentiment Treebank, is an extension of MR with five labels: very positive, positive, neutral, negative, and very negative. SST-1 can be obtained from: http://nlp.stanford.edu/sentiment/.

        SST-2:SST-2 is the same as SST-1 but with neutral reviews removed and all reviews are converted to binary labels.

Subj: Subj is a subjectivity dataset, where the task is to classify a sentence as subjective or objective. There are 10000 samples in this dataset, including 4411 transition samples, and the average sample length is 23 words. For more details, refer to Wang et al. [Wang and Christopher (2012)].

CR: CR is a dataset about customer reviews of various products, including cameras, MP3s, etc., and the task is to predict positive/negative reviews. There are 1489 transition samples in this dataset. For more details, please refer to Wang et al. [Wang and Christopher (2012)].

        MPQA:MPQA is a dataset for opinion polarity detection, and there are 10606 samples including 461 transition samples. For more details, please refer to http://www.cs.pitt.edu/mpqa/.

        4.2 Baselines and parameters

To demonstrate the effectiveness of our proposed model, four categories of methods, comprising 10 algorithms in total, are used as baselines. Their details are as follows.

1) Traditional classifiers with bag of words: NB and SVM are traditional classifiers using the bag-of-words method.

2) Traditional classifiers with unigrams and bigrams: BiNB, NBSVM and MNB also use the traditional classifiers NB and SVM. Specifically, BiNB trains an NB classifier with unigram and bigram features, while NBSVM and MNB train a Naive Bayes SVM and a Multinomial Naive Bayes classifier with uni- and bigrams [Wang and Christopher (2012)].

3) RNNs: RAE, MV-RNN and RNTN are models based on RNNs that use fully labeled parse trees to learn vector representations of phrases and complete sentences. RAE [Socher, Pennington, Huang et al. (2011)] adopts Recursive Auto Encoders with pre-trained word vectors from Wikipedia. MV-RNN [Richard, Brody, Christopher et al. (2012)] is a Matrix-Vector Recursive Neural Network with parse trees. In contrast, RNTN [Socher, Perelygin, Wu et al. (2013)] adopts a Recursive Neural Tensor Network with tensor-based feature functions and parse trees.

4) CNNs: Both DCNN and CNN are based on the CNN model. CNN [Kim (2014)] is a convolutional neural network with max pooling, while DCNN [Kalchbrenner, Grefenstette and Blunsom (2014)] is a Dynamic Convolutional Neural Network with k-max pooling.

Since the advantages of multiple sizes of convolution kernels have been demonstrated in existing works, we adopt three sizes of filter windows, 3×|V|, 4×|V|, and 5×|V| (|V| = 300), with 100 kernels per size, following the settings in these works [Kim (2014); Kalchbrenner, Grefenstette and Blunsom (2014)]. Meanwhile, the other parameters are kept the same as those in Kim [Kim (2014)]: the dropout rate is set to 0.5, the L2 constraint to 3, and the mini-batch size to 50. The classification accuracy averaged over 10-fold cross-validation is reported in the following subsection.
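For reference, a sketch collecting these reported settings in one configuration object (the dictionary name and structure are our own):

```python
# Hyperparameters reported in Section 4.2 (following Kim (2014));
# collected here for reference. The dict name is our own.
PPCNN_CONFIG = {
    "embedding_dim": 300,         # |V| in the filter shapes 3x|V|, 4x|V|, 5x|V|
    "filter_heights": [3, 4, 5],  # three filter window sizes
    "kernels_per_size": 100,      # 100 kernels for each size
    "dropout_rate": 0.5,
    "l2_constraint": 3,           # max L2 norm of the weight vectors
    "mini_batch_size": 50,
    "cv_folds": 10,               # accuracy averaged over 10-fold CV
}
```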

The word vectors trained on Google's news corpus are utilized to initialize the experimental data in this paper.

We combine the Smart Words list (http://www.smart-words.org/linking-words/transitionwords.html) and the MSU list (https://msu.edu/user/jdowell/135/transw.html) to build a transition word lexicon containing 179 transition words in total. This lexicon is used to locate probable transition points in each sentence.
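A minimal sketch of how such a lexicon can be used to locate cut points, with a hypothetical subset of the lexicon:

```python
def find_transition_points(tokens, transition_words):
    """Return the indices of probable transition words in a tokenized
    sentence; these indices are the cut points for piecewise pooling.
    `transition_words` is the 179-word lexicon described above."""
    return [i for i, w in enumerate(tokens) if w.lower() in transition_words]

# Example with a hypothetical subset of the lexicon:
lexicon = {"but", "however", "although", "yet", "nevertheless"}
tokens = "beautifully filmed but problematic in its narrative".split()
print(find_transition_points(tokens, lexicon))  # -> [2]
```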

        4.3 Classification performance

We compare our proposed PPCNN with the baselines, and the classification accuracies of all methods are shown in Tab. 2. Note that there are some missing values in Tab. 2. On the one hand, for the Subj and CR data sets, the accuracies of the RNN models are not included because these data sets lack phrase-level tag information. On the other hand, the remaining missing values in Tab. 2 are due to the fact that the open source code could not be run on those data sets.

Compared with traditional methods (such as NB, SVM, etc.), the classification performance of the neural network based methods (including RAE, MV-RNN, RNTN, DCNN, CNN, and PPCNN) improves by a range of [3.4%, 6.7%]. This indicates that neural network models can obtain more valuable context information, relieve data sparseness and explore the semantic information of texts more effectively.

        Table 2: Classification accuracy of all algorithms (%)

Compared with the RNNs (including RAE, MV-RNN, and RNTN), convolution-based models (including CNN, DCNN and PPCNN) are more suitable for representing the semantics of texts, and the classification accuracies of the CNNs are improved by [2.8%, 4.2%] on the 6 datasets. This is because CNNs can select richer and more important features in the pooling layer and capture contextual information in the convolution layer. In contrast, RNNs can only capture contextual information through semantic composition over constructed text trees, which depends heavily on the quality of tree construction. In addition, RNNs cost O(n²) time to represent a sentence, whereas CNNs cost only O(n), where n is the length of the text.

Compared with the other CNNs, our proposed PPCNN improves the classification accuracies by [0.6%, 1.6%] on the MR, SST-1, SST-2 and CR datasets, because both positive and negative sentiments, connected by transition words, exist in these four kinds of comment texts. Traditional CNN and DCNN ignore locally important features, which may lead to an incorrect final sentiment label. In contrast, our PPCNN algorithm extracts multiple features with different sentiments from multiple segments, and all of these features are useful for sentiment classification.

As for the MPQA and Subj datasets, our proposed PPCNN method is superior to RAE and the traditional bag-of-words based methods in accuracy, but it is not superior to the other baselines. This is because the sentences in the MPQA dataset are very short (the average sentence length is 3), so the advantage of dividing a sentence at transition words cannot be exploited, and our method performs similarly to the other CNN methods. In addition, the Subj task is to determine whether a text is a subjective evaluation or an objective factual summary, so the results have little to do with transition words.

        4.4 Effectiveness of piecewise pooling

        Table 3: Examples of captured features by CNN, DCNN and PPCNN on MR dataset

In this subsection, we validate the effectiveness of piecewise pooling from two aspects: the quality of the extracted features and the classification accuracy. Firstly, we examine the completeness of the features captured by piecewise pooling. Tab. 3 compares the features extracted by CNN, DCNN and PPCNN on the MR dataset.

In sentences with a transition word, the sentiment polarity turns from positive (negative) to negative (positive). However, CNN can extract only one feature, while DCNN can extract k features according to frequency or position. Taking the first review in Tab. 3 as an example, CNN extracts only one feature, “cleverly”, and DCNN extracts two positive features, “cleverly” and “clean”, and a negative feature, “peep”; both may predict the label incorrectly. Our proposed PPCNN can extract “cleverly” from one segment and “hollow” from another segment according to the transition word, and thus has a larger probability of predicting the label correctly.

It can also be seen that the positions of transition words vary: in some sentences the transition word is centered (examples 1, 3 and 6 in Tab. 3) and in others it is not (examples 2, 4 and 5 in Tab. 3). It can be concluded that the positions of transition words do not influence the performance of our proposed PPCNN. In particular, when a transition word is not centered, the baselines may neglect the smaller part, while our algorithm will not. As a result, PPCNN extracts richer sentiment features.

        Table 4: Comparison results on a Transition subset and a non-Transition subset (%)

In addition, we show the effectiveness of piecewise pooling in terms of accuracy. With each dataset divided into two subsets according to whether a sample contains transition words, we obtain a transition subset (Transi in Tab. 4) and a non-transition subset (NoTransi in Tab. 4). For example, the SST-1 dataset is divided into a transition subset (of size 4762) and a non-transition subset (of size 7093). In Tab. 4, 41.1% is the accuracy of DCNN trained and tested on the transition subset using 10-fold cross-validation.
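A minimal sketch of this split, assuming each sample is a (tokens, label) pair:

```python
def split_by_transition(samples, transition_words):
    """Divide a dataset into the Transi / NoTransi subsets of Tab. 4:
    a sample belongs to Transi if it contains any transition word.
    `samples` is a list of (tokens, label) pairs (our assumption)."""
    transi, no_transi = [], []
    for tokens, label in samples:
        if any(w.lower() in transition_words for w in tokens):
            transi.append((tokens, label))
        else:
            no_transi.append((tokens, label))
    return transi, no_transi
```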

As shown in Tab. 4, the classification accuracies of the three methods on the Transi subsets are significantly lower than those on the NoTransi subsets, which confirms that classifying transition sentences is challenging. On the three Transi subsets, our proposed PPCNN performs best, with an improvement of [0.6%, 3.1%] over DCNN and CNN. This shows that piecewise pooling can capture more representative features for transition sentences. Additionally, when there is no transition word in a sentence, PPCNN does not divide the sentence, and it therefore performs the same as CNN and DCNN, as shown for the NoTransi subsets in Tab. 4.

        5 Conclusion

Transition sentences in real applications make sentiment classification a challenging and attractive task. This paper focuses on transition sentences and proposes a Piecewise Pooling Convolutional Neural Network (PPCNN). For common texts with transitional semantics, it captures important local features from multiple segments of sentences. Experimental results show that the proposed model is superior to current convolutional neural network models on four public customer comment datasets. In the near future, we intend to represent the input data with chunk vectors [Yan, Zheng, Zhang et al. (2017)] to address sentiment classification and improve its efficiency.

Acknowledgement: This work is supported in part by the Natural Science Foundation of China under grants 61503112, 61673152 and 61503116.
