        Research on identity recognition of English mail author based on writing style

        2019-10-21 16:16:15 徐坤豪
        大東方, 2019, Issue 9

        徐坤豪

        Abstract: Emails are often very short, but their language style is distinctive. We therefore assume that, given ideal samples, part of a text's style can be used to identify its author. As feature values we use the proportion of short words in an email, the ratio of distinct word types, the average word length, the mean and variance of lexical density, and the maximum single-word usage ratio. Principal component analysis of these features yields two principal components, which reflect word density and vocabulary non-repetition. Using the two principal components as the independent and dependent variables, we draw a scatter diagram for each author and find that the diagrams follow certain patterns that reflect the differences between authors. We therefore use a BP neural network for identification, with the extracted principal components as input features and a four-bit binary number as the author's number; a certain number of emails from each author is selected for training. We find that the test output is best when the learning rate is 0.01 and the hidden layer is set to 50, with a correct identification rate of 87.5%.

        Key words: text feature; principal component analysis; scatter diagram; BP neural network; pattern identification

        I. Problem Analysis and Model Establishment

        1.1 SPSS principal component analysis

        The extracted feature values are entered into SPSS, and principal component analysis is used to reduce the dimensionality of the feature set.
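        As an illustration of the feature values listed in the abstract, the following minimal sketch computes them from raw email text. The paper does not give exact definitions, so several choices here are assumptions: "short" is taken to mean at most three letters, lexical density is taken as the share of non-function words per sentence, and the function-word list, the names `style_features`, `lexical_density` and `tokenize` are all hypothetical.

```python
import re
from collections import Counter
from statistics import mean, pvariance

SHORT_WORD_MAX_LEN = 3  # assumption: a "short word" has at most 3 letters

# Assumption: a tiny illustrative function-word list. The paper does not
# say how lexical density was defined or which word list was used.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
    "for", "with", "is", "are", "was", "were", "be", "it", "this", "that",
    "i", "you", "we", "he", "she", "they",
}


def tokenize(text):
    """Lower-case the text and return its words (apostrophes allowed inside)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_density(sentence):
    """Share of words in the sentence that are not function words."""
    words = tokenize(sentence)
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)


def style_features(text):
    """Return the six style features for one email as a dict."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    # Keep only sentence fragments that contain at least one word.
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    densities = [lexical_density(s) for s in sentences]
    return {
        "short_word_ratio": sum(len(w) <= SHORT_WORD_MAX_LEN for w in words) / len(words),
        "type_token_ratio": len(counts) / len(words),
        "avg_word_length": mean(len(w) for w in words),
        "density_mean": mean(densities),
        "density_variance": pvariance(densities),
        "max_word_ratio": counts.most_common(1)[0][1] / len(words),
    }


features = style_features("The cat sat. The dog ran fast.")
```

        Each email then becomes one row of six numbers, which is the form a tool such as SPSS expects as input for principal component analysis.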

        The variables appear intuitively to be correlated, but this needs to be tested, so a correlation test is run first. Bartlett's sphericity test gives a P value < 0.001; taken together, the two indexes show that the variables are correlated and suitable for factor analysis. The eigenvalues of components 1 and 2 are greater than 1, and together they explain 79.773% of the variance, which is a good result. We therefore extract components 1 and 2 as the principal components, capturing the main structure of the data.
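        The paper carries out this step in SPSS. For readers without SPSS, the same two checks (Bartlett's sphericity statistic, and keeping components whose eigenvalue exceeds 1) can be sketched with NumPy. The data below is synthetic and purely illustrative, not the paper's, and the function names `bartlett_sphericity` and `pca_kaiser` are hypothetical.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's chi-square statistic and degrees of freedom.

    chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, with p(p - 1)/2 degrees of
    freedom, where R is the correlation matrix of the p variables.
    A p-value can be obtained with scipy.stats.chi2.sf(chi2, dof).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof


def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise columns
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # ascending order
    order = np.argsort(eigvals)[::-1]                 # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()               # variance share per component
    scores = Z @ eigvecs[:, eigvals > 1.0]            # component scores per sample
    return eigvals, explained, scores


# Synthetic stand-in for the feature table: 200 samples, 6 features
# driven by 2 hidden factors plus noise (illustrative only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
X_demo = latent @ loadings + 0.3 * rng.normal(size=(200, 6))

chi2, dof = bartlett_sphericity(X_demo)
eigvals, explained, scores = pca_kaiser(X_demo)
```

        Here `explained[:2].sum()` plays the role of the cumulative variance figure reported above, and `scores` holds the two component values that are later fed to the network.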

        In the eight figures, the abscissa represents principal component 2, namely "the ability to identify the author through average sentence length", and the ordinate represents principal component 1, namely "the ability to identify the author through the proportion of distinct words among total words". Each figure shows the relationship between these two abilities for one author. The SPSS output shows that the two abilities of each author are related and that they clearly differ between authors. We can therefore use these two components as input parameters for training the BP neural network and then identify the authors of the texts.

        1.2 The solution of neural network

        We use the two extracted principal components as the input of the neural network and a four-bit binary number to represent the author, so the S-shaped logarithmic (log-sigmoid) function is chosen as the transfer function of the output neurons. Through repeated testing, we set the learning rate to 0.01, the maximum number of iterations to 10000, and the hidden layer to 50.
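        The configuration described above can be sketched as a small back-propagation network in NumPy. This is an illustration under stated assumptions, not the authors' implementation: the value 50 is interpreted as 50 nodes in a single hidden layer, the hidden activation (tanh) and the weight initialisation are not given in the paper, authors are assumed to be numbered 1 to 8, and the names `train_bp`, `predict` and `author_code` are hypothetical.

```python
import numpy as np


def logsig(x):
    """Log-sigmoid transfer function, mapping any value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def author_code(index, bits=4):
    """Encode an author number as a binary vector, most significant bit first."""
    return np.array([(index >> b) & 1 for b in reversed(range(bits))], dtype=float)


def train_bp(X, T, hidden=50, lr=0.01, max_iter=10000, seed=0):
    """Full-batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    losses = []
    for _ in range(max_iter):
        H = np.tanh(X @ W1 + b1)            # hidden layer (assumption: tanh)
        Y = logsig(H @ W2 + b2)             # log-sigmoid output layer
        err = Y - T
        losses.append(float(np.mean(err ** 2)))
        dY = err * Y * (1.0 - Y) / n        # error signal through log-sigmoid
        dH = (dY @ W2.T) * (1.0 - H ** 2)   # back-propagated through tanh
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Forward pass: one row of four outputs in [0, 1] per sample."""
    W1, b1, W2, b2 = params
    return logsig(np.tanh(X @ W1 + b1) @ W2 + b2)
```

        A call would look like `params, losses = train_bp(X, T)`, where each row of `X` holds the two principal component scores of one email and the matching row of `T` is `author_code(k)` for author `k`.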

        After a large number of neural network runs, we found that seven of the eight selected authors were correctly identified, an accuracy rate of 87.5%. We therefore consider that this model can identify the author of an email. Two of the scatter diagrams were selected as examples.
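        To make the accuracy figure concrete, the sketch below shows one plausible way to turn the four network outputs into an author number and to compute the recognition rate. The 0.5 threshold, the most-significant-bit-first ordering, and the example output values are assumptions made for illustration; they are not taken from the paper's results.

```python
def decode_author(output, threshold=0.5):
    """Round each output to a bit and read the bits as a binary number."""
    index = 0
    for value in output:
        index = (index << 1) | (1 if value >= threshold else 0)
    return index


def recognition_rate(outputs, true_authors):
    """Fraction of samples whose decoded author number matches the true one."""
    predicted = [decode_author(o) for o in outputs]
    correct = sum(p == t for p, t in zip(predicted, true_authors))
    return correct / len(true_authors)


# Made-up outputs for authors 1..8; the row for author 7 decodes to 6.
example_outputs = [
    [0.1, 0.0, 0.2, 0.9],
    [0.0, 0.1, 0.8, 0.1],
    [0.1, 0.2, 0.9, 0.7],
    [0.0, 0.9, 0.1, 0.2],
    [0.1, 0.8, 0.3, 0.9],
    [0.2, 0.7, 0.9, 0.1],
    [0.1, 0.9, 0.8, 0.4],
    [0.9, 0.1, 0.0, 0.2],
]
rate = recognition_rate(example_outputs, [1, 2, 3, 4, 5, 6, 7, 8])
```

        With seven of eight rows decoded correctly, `rate` is 7/8 = 0.875, which matches the form of the 87.5% figure reported above.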

        II. Conclusions

        The lexical and structural features extracted by the model reflect the characteristics of different authors to a certain extent, and the proposed method of identifying the author of an email from vocabulary and structure is effective. Through principal component analysis and scatter plot analysis, we conclude that the lexical features we selected can be used to distinguish different authors, with a recognition rate of up to 87.5%. In training the BP neural network, we found that the number of hidden layers has the greatest impact on the final test accuracy, so determining it correctly is the key factor in training the network; the learning rate is the next most influential factor.

        (Author affiliation: Shandong University of Technology)
