Automatic Aggregation Enhanced Affinity Propagation Clustering Based on Mutually Exclusive Exemplar Processing

2023-12-12 15:51:00 Zhihong Ouyang, Lei Xue, Feng Ding and Yongsheng Duan

Computers, Materials & Continua, 2023, Issue 10

Zhihong Ouyang, Lei Xue, Feng Ding and Yongsheng Duan

Electronic Countermeasure Institute, National University of Defense Technology, Hefei, 230037, China

ABSTRACT Affinity propagation (AP) is a widely used exemplar-based clustering approach with superior efficiency and clustering quality. Nevertheless, a common issue with AP clustering is the presence of excessive exemplars, which limits its ability to perform effective aggregation. This research aims to enable AP to aggregate automatically and produce fewer, more compact clusters without changing the similarity matrix or customizing preference parameters, as existing enhanced approaches do. An automatic aggregation enhanced affinity propagation (AAEAP) clustering algorithm is proposed, which combines a dependable partitioning clustering approach with AP to achieve this purpose. Whenever the AP clustering stabilizes and the exemplars emerge, the partitioning clustering approach generates an additional set of findings with the same number of clusters. Based on these findings, mutually exclusive exemplar detection is conducted on the current AP exemplars, and a pair of exemplars unsuitable for coexistence is recommended. The recommendation is then mapped to a novel constraint, designated mutual exclusion and aggregation. To satisfy this constraint, a modified AP clustering model is derived and the clustering is restarted, which can result in exemplar number reduction, exemplar selection adjustment, and redistribution of other data points. Automatic detection and clustering are repeated until no mutually exclusive exemplars are detected, at which point the clustering is complete and a smaller number of clusters is obtained. Standard classification data sets are adopted for experiments on AAEAP and other clustering algorithms for comparison, and several internal and external clustering evaluation indexes are used to measure the clustering performance. The findings demonstrate that the AAEAP clustering algorithm achieves a substantial automatic aggregation effect while maintaining good clustering quality.

KEYWORDS Clustering; affinity propagation; automatic aggregation enhanced; mutually exclusive exemplars; constraint

1 Introduction

In the field of unsupervised learning, cluster analysis is a crucial technology. It aims at separating elements into distinct categories based on certain similarity assessment criteria and result evaluation indexes. Elements with high similarity are commonly classified into the same cluster, whereas elements in different clusters have lower similarity. With the fast development of information technology, clustering is required to address realistic challenges in numerous technical fields, including image segmentation, text mining, network analysis, target recognition, trajectory analysis, and gene analysis, which in turn drives the continuous development of clustering approaches [1–4].

The existing clustering approaches can be generally divided into the following categories. The partitioning-based approach usually assumes that the data set must be classified into K clusters, and iteratively searches for the optimal centers and data partitioning through certain criteria. Such approaches are easy to understand, but prior knowledge about the number of clusters is usually lacking. The representative approaches are K-means [5], K-medoids [6], and Fuzzy C-means (FCM) [7]. Hierarchical clustering approaches mainly combine data or split clusters according to certain criteria and ultimately represent the findings in a tree structure. Balanced iterative reducing and clustering using hierarchies [8] and clustering using representatives [9] are the representatives. Their most visible advantage is lucid logic, but the drawbacks are that the clustering process is irreversible and requires vast calculations. Density-based clustering approaches mainly consider the density of each data point within a certain range. Data points with high density are commonly identified as centers, whereas those with very low density are considered outliers. These approaches can better adapt to clusters with disparate shapes and do not need the number of clusters to be specified in advance. However, their clustering findings are sensitive to parameters, including distance range and density threshold. The representatives are density-based spatial clustering of applications with noise [10], ordering points to identify the clustering structure [11], and clustering by quick search and finding of density peaks (DP) [12]. The basic idea of grid-based clustering approaches is to divide the data space into a grid structure of numerous cells and achieve classification by processing cells [13]. The benefits are fast calculation and good data adaptability. The drawbacks are that the data structure is ignored and the clustering findings are also influenced by the cell shape and size. Statistical information grid [14] and clustering in quest [15] are well-known grid-based clustering approaches. In addition to these four types of clustering approaches, some approaches based on specific mathematical models have also been widely used, including Gaussian Mixture Model-based clustering [16], spectral clustering (SC) based on graph theory [17], and affinity propagation (AP) clustering based on a message-passing mechanism [18].

The AP clustering in which we are interested comes from the idea of belief propagation [19–21], and it is currently an extremely popular and potent exemplar-based clustering approach. To maximize total similarity, AP locates a set of exemplars and establishes corresponding relationships between exemplars and other data points. The message-passing mechanism is adopted to address the optimization challenge, which has been expressed in a more intuitive binary graphical model [22,23]. First, AP considers all data points as potential exemplars. The messages designated responsibility and availability are then transmitted back and forth between the data points under the objective function and constraints. Finally, the exemplars are chosen and each data point finds its most suitable exemplar. AP offers several advantages, including simple initialization, the absence of a requirement to specify the cluster number, superior clustering quality, and high computational efficiency. AP has been widely employed in manifold aspects such as face recognition [24,25], document clustering [26,27], neural network classifier [28], image analysis [29,30], grid system data clustering [31,32], small cell networks working analysis [33], manufacturing process analysis [34], bearing fault diagnosis [35–37], K-Nearest Neighbor (KNN) positioning [38], psychological research [39], radio environment map analysis [40], building evaluation [41], building materials analysis [42], interference management [43], genome sequences analysis [44], map generalization [45], signal recognizing [46], vehicle counting [47], indoor positioning [48], android malware analysis [49], marine water quality monitoring [50], and groundwater management [51].

Classic AP also possesses some drawbacks, including sensitivity to the preference selection. Generally, the median of all data similarity values is taken as the preference value. Increasing this value leads to more exemplars, which represent more clusters. Conversely, reducing the value leads to a decrease in the number of clusters. Furthermore, AP has superior performance on data sets with a regular distribution such as a spherical shape, but it is challenging to achieve good results on data sets with a nonspherical distribution. Therefore, numerous studies have also concentrated on enhancing the AP clustering theory. The K-AP algorithm [52] makes the cluster number customizable. It introduces a constraint of K exemplars to make the clustering ultimately converge to K clusters. The rapid affinity propagation algorithm [53] enhances the clustering speed and quality by separating the clustering process into a coarsening phase, an exemplar-clustering phase, and a refining phase. The multi-exemplar affinity propagation algorithm [54] expands the single-exemplar model to a multi-exemplar one. It proposes the concept of super exemplar and addresses the multi-subclass challenge. The stability-based affinity propagation algorithm [55] concentrates on addressing the preference selection challenge. It offers a new clustering stability measure and automatically sets preference values, which can generate stable clustering results. The soft-constraint semi-supervised affinity propagation algorithm [56] adds supervision based on AP clustering and implements soft constraints, which can generate more accurate results. Another rapid affinity propagation algorithm [57] enhances the efficiency by compressing the similarity matrix. The adjustable preference affinity propagation algorithm [58] mainly concentrates on preference selection and parameter sensitivity of AP. The message-passing model is derived under additional preference-adjusting constraints, and it results in automatic preference adjustment and better clustering performance. Through density-adaptive preference estimation, an adaptive density distribution-inspired AP clustering algorithm [59] addresses the challenge of preference selection. Additionally, to address the nonspherical cluster problem, the algorithm uses a similarity measurement strategy based on the nearest neighbor search to describe the data set structure. The adaptive spectral affinity propagation algorithm [60] discusses why AP is unsuitable for nonspherical clusters and proposes a model selection procedure that can adaptively determine the number of clusters.

It is clear from the foregoing theoretical development of AP that the enhanced algorithms mainly concentrate on similarity matrix construction, preference selection, and application to nonspherical data clustering. However, there are relatively few studies on the aggregation ability of AP clustering. Aggregation is an important goal of clustering. Enhancing the aggregation ability means reducing the cluster number, which is desirable in numerous application scenarios. People prefer to obtain results such as sunrise and sunset rather than sunrise with fishing boats, sunrise with more clouds, sunset with faster waves, and peaceful sunset, as mentioned in the example of multi-subclass image clustering in [54]. Multiple clusters are undoubtedly significant since they offer more detailed classification and richer cluster information. Our point is that reducing the cluster number is equally valuable, as it can offer more general and comprehensive information.

However, numerous investigations have demonstrated that AP clustering cannot independently converge to a relatively small number of clusters. As mentioned above, reducing preference values will lead to fewer clusters, although there is no explicit analytical relationship between the preference value and the cluster number. By revising preferences and similarity matrices based on analyzing the data set structure or data point density, some enhanced algorithms can also reduce the number of clusters to some extent. However, we should recognize that the revision is somewhat subjective, as it indicates that we already hope some specific data points will become the final exemplars. Additionally, we cannot guarantee the accuracy of the structure analysis. Even assuming that the analysis is accurate enough to exclude potential exemplars, the difficult tasks remain how to set preference values for these exemplars and how to define the function that computes the similarity between exemplars and other data points.

We concentrate on making AP clustering show stronger aggregation while ensuring clustering quality. Aggregation involves merging clusters, but classic AP clustering does not know which clusters can be merged; therefore, it needs a reliable information source to tell it. We expect the information that enhances aggregation to be automatically generated and objective, without human involvement. Based on the foregoing considerations, an automatic aggregation enhanced affinity propagation (AAEAP) clustering algorithm based on mutually exclusive exemplar processing is proposed. The general idea of the AAEAP is as follows.

First, we select a dependable partitioning clustering approach such as FCM clustering, and let it generate M clusters when the AP clustering stabilizes and converges to M clusters. Fig.1 shows various possible differences between AP and FCM clustering findings when the number of clusters is the same. In Fig.1a, both AP and FCM generate two clusters, but the exemplars are disparate, leading to substantial differences in the classification of other data points. The exemplars of AP are just ordinary points in the two clusters of FCM. Fortunately, neither cluster of FCM contains both exemplars of AP. In Fig.1b, the classification findings of AP and FCM are very similar and only a few data points differ in the cluster they select. These situations are understandable and tolerable. However, in Fig.1c, there are substantial differences in the clustering findings between AP and FCM. In particular, the blue cluster generated by FCM contains both the blue and green exemplars of AP, which we consider intolerable. Thus, AP exemplars contained in the same FCM cluster are determined to be mutually exclusive. The number of AP clusters must then be modified, and these mutually exclusive exemplars should not exist as exemplars at the same time.

Figure 1: Possible changes in the findings of various clustering approaches. (a) Changes in exemplars. (b) Changes in data assignment. (c) Two exemplars classified in one cluster

We address the foregoing challenge by adding a novel constraint and altering the message iteration model. The adjusted clustering then converges stably again based on the new model. Mutually exclusive exemplar detection and clustering are repeated until no more conflicting situations remain. Some standard classification data sets were adopted to examine and validate the proposed AAEAP algorithm, and six clustering assessment indexes were employed to compare the quality of the results between AAEAP and eight other clustering algorithms.

The key contributions of the proposed work are as follows:

• We propose a method that can improve the aggregation ability of AP clustering. The core is to employ a partitioning clustering algorithm to detect mutually exclusive exemplars and then reliably guide AP to combine clusters.

• The detection information output by the partitioning clustering approach is mapped to a mutual exclusion and aggregation clustering constraint, and the new message iteration model is derived in detail.

• The overall aggregation-enhanced clustering process is automated and does not need manual intervention, nor does it involve potential exemplar selection and preference revision, which enhances the algorithm's applicability.

The remainder of this study is organized as follows. Section 2 provides a brief review of AP. Section 3 introduces the proposed AAEAP algorithm. Section 4 presents the experimental findings on standard classification data sets. Section 5 concludes the study.

2 Affinity Propagation

AP clustering is an exemplar-based clustering algorithm that simultaneously considers all data points as potential exemplars and exchanges messages between them until a high-quality exemplar set and the corresponding clusters emerge. AP was originally derived as an instance of the max-product algorithm in a loopy factor graph [19]. A simplified max-sum (log-domain max-product) message update form was obtained by reducing the n-ary messages to binary messages [18,22,23], making the message iteration process of AP clustering clearer and easier to extend.

Given a data set $X=\{x_1,x_2,\cdots,x_N\}$, AP clustering generates a set of exemplars and their clusters. The results are expressed using a binary matrix $C=\{c_{ij}\}_{N\times N}$, where $c_{ij}$ is a binary variable: $c_{ij}=1$ if data point $i$ selects data point $j$ as its exemplar, and $c_{ij}=0$ otherwise. In particular, $c_{ii}=1$ if data point $i$ is an exemplar, and $c_{ii}=0$ otherwise.

The complete process of AP clustering is as follows. First, a data similarity measurement function is defined to compute the similarity $s(i,j)$ between data points $x_i$ and $x_j$, which can be the negative Euclidean distance or user-defined, forming a similarity matrix $S=\{s(i,j)\}_{N\times N}$, $i,j\in\{1,\cdots,N\}$. The diagonal elements of $S$ are the preference parameters, denoted as $p_k=s(k,k)$, $k\in\{1,\cdots,N\}$. Based on $S$, a function $S_{ij}$ is defined to denote the similarity between a data point and its exemplar.
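The similarity matrix construction described above can be sketched in a few lines. This is a minimal illustration that assumes the negative Euclidean distance as the similarity and the median of the off-diagonal similarities as the shared preference; the function name `similarity_matrix` and the toy points are hypothetical and are not taken from the authors' implementation.

```python
import math
from statistics import median


def similarity_matrix(points):
    """Negative Euclidean distance similarities with the median as preference."""
    n = len(points)
    s = [[-math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Preference p_k = s(k, k): here the median of all off-diagonal similarities.
    off_diag = [s[i][j] for i in range(n) for j in range(n) if i != j]
    pref = median(off_diag)
    for k in range(n):
        s[k][k] = pref
    return s


# Three collinear toy points (illustrative data only).
S = similarity_matrix([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```

Raising the diagonal value makes every point a more attractive exemplar, which is why the preference controls the number of clusters.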

Meanwhile, we provide two basic constraints, $I$ and $E$. We term $I$ the 1-of-N constraint, which requires that each data point be assigned to exactly one exemplar.

$E$ represents the exemplar consistency constraint, which requires that once a data point is selected as an exemplar by another point, it must select itself as an exemplar.

As previously mentioned, max-sum is the log-domain representation of max-product, so the 1-of-N constraint and the exemplar consistency constraint of the max-product model are rewritten in log-domain form, and the objective function $F$ combines the similarity terms with the two constraint terms.
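For reference, the binary formulation of AP in [22,23] commonly writes the similarity term, the two constraints, and the objective in max-sum form as follows. The notation is assumed to match the definitions above; the display is a sketch of the standard model rather than a reproduction of the numbered equations.

```latex
S_{ij}(c_{ij}) =
\begin{cases}
  s(i,j), & c_{ij}=1 \\
  0,      & \text{otherwise}
\end{cases}
\qquad
I_i(c_{i1},\dots,c_{iN}) =
\begin{cases}
  -\infty, & \sum_{j} c_{ij} \neq 1 \\
  0,       & \text{otherwise}
\end{cases}

E_j(c_{1j},\dots,c_{Nj}) =
\begin{cases}
  -\infty, & c_{jj}=0 \ \text{and}\ \exists\, i \neq j : c_{ij}=1 \\
  0,       & \text{otherwise}
\end{cases}

F = \sum_{i,j} S_{ij}(c_{ij})
  + \sum_{i} I_i(c_{i1},\dots,c_{iN})
  + \sum_{j} E_j(c_{1j},\dots,c_{Nj})
```

Any configuration that violates a constraint receives $F=-\infty$, so maximizing $F$ is equivalent to maximizing the total similarity over valid exemplar assignments.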

The goal of AP clustering is to maximize $F$ by finding a set of exemplars and the corresponding data partition. The process of finding high-quality exemplars is accomplished through the recursive transmission of messages. Fig.2 shows the transmission mechanisms for the two types of messages, $r(i,j)$ and $a(i,j)$. $r(i,j)$ is referred to as responsibility, a message sent by data point $i$ to candidate exemplar $j$, reflecting the accumulated evidence for how well-suited data point $j$ is to serve as the exemplar for $i$. $a(i,j)$ is referred to as availability, a message sent by candidate exemplar $j$ to data point $i$, reflecting the accumulated evidence for how suitable it would be for data point $i$ to select $j$ as its exemplar. In other words, $r(i,j)$ shows how strongly a data point favors one candidate exemplar over other candidates, and $a(i,j)$ shows to what degree each candidate exemplar is available as a cluster center for one data point.

Figure 2: Sending responsibility and availability messages

The messages are initialized as 0 and updated as follows.

It is necessary to introduce a damping factor $\lambda$, with a value range of $[0,1]$, to prevent oscillation during the message update process. The messages $r_{t+1}$ and $a_{t+1}$ of iteration $t+1$ are obtained by combining the messages $r_t$ and $a_t$ of iteration $t$ with the newly computed messages $r'_t$ and $a'_t$, which are themselves derived from $r_t$ and $a_t$.
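In the standard AP formulation [18], the undamped updates and the damping step are usually written as below. The primed symbols follow the description above; the exact form is assumed from the AP literature, and some implementations attach $\lambda$ to the opposite term.

```latex
r'(i,j) = s(i,j) - \max_{k \neq j}\bigl[a(i,k) + s(i,k)\bigr]

a'(i,j) = \min\Bigl\{0,\; r(j,j) + \sum_{k \notin \{i,j\}} \max\{0,\, r(k,j)\}\Bigr\},
\quad i \neq j

a'(j,j) = \sum_{k \neq j} \max\{0,\, r(k,j)\}

r_{t+1} = \lambda\, r_t + (1-\lambda)\, r'_t,
\qquad
a_{t+1} = \lambda\, a_t + (1-\lambda)\, a'_t
```

With $\lambda$ close to 1 the messages change slowly, which suppresses oscillation at the cost of more iterations.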

Responsibility and availability are updated until convergence. AP ultimately outputs an assignment vector $\mathbf{c}=[c_1,\cdots,c_N]$ with $c_i=\arg\max_j[r(i,j)+a(i,j)]$.
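The message iteration and the final assignment rule can be illustrated with a compact pure-Python sketch of classic AP. It assumes the standard responsibility and availability updates from the AP literature with damping applied as described above; the function name, the parameter names `max_iter` and `stable_iter`, and the stopping rule (the assignment stays unchanged for a number of iterations) are illustrative assumptions rather than the authors' implementation.

```python
def affinity_propagation(s, damping=0.9, max_iter=200, stable_iter=20):
    """Classic AP on a similarity matrix s (list of lists, preferences on the diagonal)."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, j)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, j)
    last, stable = None, 0
    for _ in range(max_iter):
        # Responsibility: r(i,j) = s(i,j) - max_{k != j} [a(i,k) + s(i,k)], then damped.
        for i in range(n):
            row = [a[i][k] + s[i][k] for k in range(n)]
            for j in range(n):
                best_other = max(row[k] for k in range(n) if k != j)
                r[i][j] = damping * r[i][j] + (1 - damping) * (s[i][j] - best_other)
        # Availability, column by column, from the positive responsibilities.
        for j in range(n):
            pos = [max(0.0, r[k][j]) for k in range(n)]
            total = sum(pos) - pos[j]  # sum over k != j
            for i in range(n):
                if i == j:
                    new = total
                else:
                    new = min(0.0, r[j][j] + total - pos[i])
                a[i][j] = damping * a[i][j] + (1 - damping) * new
        # Assignment: c_i = argmax_j [r(i,j) + a(i,j)].
        assign = [max(range(n), key=lambda j: r[i][j] + a[i][j]) for i in range(n)]
        stable = stable + 1 if assign == last else 0
        last = assign
        if stable >= stable_iter:
            break
    return last
```

The sketch runs in O(N²) memory and O(N³) time per iteration because of the inner maximum; it is meant for small examples, not as an efficient implementation.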

3 Automatic Aggregation Enhanced Affinity Propagation

We propose an AAEAP clustering algorithm to overcome the difficulty AP has in converging to a small number of exemplars. First, the overall framework of the algorithm is introduced. Then, the basic model is presented, including constraints, objective function, and factor graph. Finally, the message iteration is derived.

3.1 Overall Framework

The distinctive feature of AAEAP is that it uses a dependable partitioning clustering approach to detect whether there is a mutual exclusion situation among the exemplars produced by AP clustering. If there is, a new constraint is added so that the two mutually exclusive exemplars cannot both remain exemplars and the number of clusters decreases. This leads to the merging of clusters and presents an aggregative state. The algorithm converges to a smaller number of exemplars when there are no longer mutually exclusive exemplars. The basic framework of AAEAP is as follows.

The input of the algorithm is an unclassified data set $X=\{x_1,\cdots,x_N\}$, and each data point $x_i$ is a one-dimensional sequence having $d$ features. The similarity between two sequences is calculated based on the negative Euclidean distance used by classic AP, and ultimately the similarity matrix $S$ is obtained. The computation of $S$ can be enhanced based on [53,59,60] to make AP more adaptable to data sets with nonspherical structures, although this is not the focus of this research. The required parameters for AP clustering are initialized, including the maximum iteration number $NI_{max}$, the stable convergence number $NI_{cvg}$, the damping factor $\lambda$, and the similarity matrix $S$. It is also required to specify constraints $I$ and $E$, which determine the way messages are iterated.

Then, the algorithm enters the message iteration. Because the set of mutually exclusive exemplars $MEs$ is initially empty, the algorithm completes the first convergence based on constraints $I$ and $E$ only, and generates an exemplar set $XEs$ containing $M$ exemplars. A dependable partitioning clustering approach such as K-means, K-medoids, or FCM is used to detect mutually exclusive exemplars in $XEs$. Specifically, the partitioning clustering approach is employed to generate $M$ clusters as well, and each of its clusters is checked to determine whether it contains two or more AP exemplars. If so, the included exemplars are considered mutually exclusive. Several pairs of mutually exclusive exemplars may be detected. To simplify the computation, only one pair is randomly selected in each iteration. The foregoing process of partitioning clustering and detection is carried out by the function IdentifyExclusion, which assigns a pair of mutually exclusive exemplars to $MEs$.

Because mutually exclusive exemplars exist, we must define a new constraint $D$ based on $MEs$, which we call the mutual exclusion and aggregation constraint. Its primary purpose is to prevent the mutually exclusive exemplars from becoming exemplars simultaneously again and to reduce the overall number of exemplars by 1. A detailed description is given in the next section. The algorithm begins the second message iteration under the constraints $I$, $E$, and $D$. The expected aggregation effect occurs when stable convergence is achieved. Afterward, the previous steps are repeated, including detecting the existence of mutually exclusive exemplars, updating the constraint $D$, and restarting the message iteration. The algorithm ends when there are no mutually exclusive exemplars left. The output of the AAEAP is the assignment vector $\mathbf{c}=[c_1,\cdots,c_N]$, the same as in classic AP, with $c_i=\arg\max_j[r(i,j)+a(i,j)]$. Finally, the clustering results should be examined to confirm the effectiveness of the algorithm.
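The detection step can be sketched as follows. This is a hypothetical rendering of the IdentifyExclusion step, not the authors' code: it assumes the partition labels of all data points have already been produced by a partitioning method run with M clusters (FCM, K-means, or K-medoids), and the name `identify_exclusion` and its arguments are illustrative.

```python
import random
from collections import defaultdict
from itertools import combinations


def identify_exclusion(exemplars, labels, rng=None):
    """Return one pair of AP exemplars that share a partition cluster, or None.

    exemplars: indices of the current AP exemplars.
    labels:    partition cluster label of every data point (assumed given).
    """
    rng = rng or random.Random()
    by_cluster = defaultdict(list)
    for e in exemplars:
        by_cluster[labels[e]].append(e)
    # Every pair of exemplars inside the same partition cluster is a candidate.
    pairs = [pair
             for group in by_cluster.values()
             for pair in combinations(sorted(group), 2)]
    # Only one pair is selected at random per round, as described above.
    return rng.choice(pairs) if pairs else None
```

Returning `None` when no cluster holds two exemplars gives the caller a natural stopping signal.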
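The control flow of the framework above can be summarized in a short driver loop. The sketch only fixes the order of the steps; the three callables `run_ap`, `partition`, and `find_exclusion` are hypothetical placeholders for constrained AP, the partitioning method, and the detection step, and the `max_rounds` guard is an added safety assumption.

```python
def aaeap(run_ap, partition, find_exclusion, max_rounds=100):
    """Repeat (constrained) AP until no mutually exclusive exemplars remain.

    run_ap(constraint) -> assignment list c; constraint is None or (p, q, M).
    partition(m)       -> partition labels of all points, using m clusters.
    find_exclusion(exemplars, labels) -> (p, q) or None.
    All three callables are hypothetical placeholders.
    """
    assign = None
    constraint = None  # MEs is empty before the first convergence
    for _ in range(max_rounds):
        assign = run_ap(constraint)
        exemplars = sorted(set(assign))
        m = len(exemplars)
        pair = find_exclusion(exemplars, partition(m))
        if pair is None:
            return assign  # no mutually exclusive exemplars left
        # Constraint D for the next run: p, q and the previous exemplar number M.
        constraint = (pair[0], pair[1], m)
    return assign
```

Each round that finds a pair restarts AP with an exemplar budget of M−1, so the loop performs at most as many rounds as there are initial exemplars.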

3.2 Basic Model

Fig.3 shows the factor graph of the AAEAP algorithm. The similarity function $S$ and the three constraints $I$, $E$, and $D$ together influence the variable nodes. The similarity function $S$ influences each variable node in the graph separately, constraint $I$ influences the rows in the graph, constraint $E$ influences the columns, and constraint $D$ affects the diagonal.

Figure 3: Factor graph of the AAEAP

We present the model of the AAEAP algorithm in the max-product form. First, the constraints of AAEAP are given in detail. The 1-of-N constraint $I$ and the exemplar consistency constraint $E$ are the same as those of the classic AP.

The following concentrates on describing the newly added mutual exclusion and aggregation constraint $D$. Constraint $D$ has two functions. The first is to prevent mutually exclusive exemplars from both becoming exemplars the next time, that is, they cannot coexist. Assuming that $p$ and $q$ are the mutually exclusive exemplars detected and recommended by partitioning clustering, the next exemplar set $XEs$ will contain only $p$, only $q$, or neither $p$ nor $q$. The second function is to drive the clustering toward aggregation. Exemplars $p$ and $q$ serve as the core and representative of their clusters, reflecting the basic characteristics of the clusters. Their coexistence as exemplars implies that the clusters generated around $p$ and $q$ differ substantially. Mutual exclusion therefore calls the AP clustering findings into question: it indicates that some data points of the clusters represented by $p$ and $q$ can be combined, while the remaining data points may select other exemplars, leading to a decrease of 1 in the exemplar number.

According to the above considerations, constraint $D$ is defined over the diagonal variables $c_{11},\cdots,c_{NN}$.

$M$ denotes the exemplar number generated by the previous iteration, and $p$ and $q$ are two exemplars from that iteration, that is, $c_{pp}=1$ and $c_{qq}=1$. Once they are identified as mutually exclusive exemplars, $c_{pp}$ and $c_{qq}$ cannot both be 1, and the exemplar number for the next iteration is constrained to $M-1$. Constraint $D$ changes dynamically, as the three variables $p$, $q$, and $M$ need to be defined before each iteration to determine the current messaging model.
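The two sub-constraints can be made concrete with a small check on the diagonal of the assignment matrix. The function below is a reading of the prose description, assuming a max-product factor that equals 0 when either sub-constraint is violated and 1 otherwise; the name `constraint_d` and the list encoding of the diagonal are hypothetical.

```python
def constraint_d(is_exemplar, p, q, m):
    """Max-product value of D for a diagonal c_11..c_NN given as a 0/1 list.

    Returns 0 if p and q are both exemplars or if the number of exemplars
    differs from m - 1; returns 1 otherwise (assumed form of the constraint).
    """
    both = is_exemplar[p] == 1 and is_exemplar[q] == 1
    wrong_count = sum(is_exemplar) != m - 1
    return 0 if both or wrong_count else 1
```

For example, with five points, previous exemplar number M = 3, and mutually exclusive exemplars 0 and 2, the diagonal `[1, 0, 0, 0, 1]` is admissible because only one of the pair survives and exactly M−1 = 2 exemplars remain.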

Then, the max-product objective function is defined as the product of the similarity factors and the constraints $I$, $E$, and $D$.
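Under the assumption that each similarity factor takes the exponential form $e^{S_{ij}(c_{ij})}$, which agrees with $\theta_{ij}=e^{s(i,j)}$ used in the message derivation, and that each constraint equals 1 when satisfied and 0 when violated, the max-product objective can be sketched as:

```latex
F(C) = \prod_{i,j} e^{S_{ij}(c_{ij})}
       \cdot \prod_{i} I_i(c_{i1},\dots,c_{iN})
       \cdot \prod_{j} E_j(c_{1j},\dots,c_{Nj})
       \cdot D(c_{11},\dots,c_{NN})
```

Taking the logarithm turns the products into sums and recovers a max-sum objective with one additional term for $D$.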

3.3 Message Iteration

Fig.4 illustrates the message iteration of AAEAP. As shown in Fig.4a, there are eight types of messages related to diagonal nodes. As illustrated in Fig.4b, there are six types of messages related to other nodes.

Figure 4: Messages of the AAEAP. (a) Messages associated with $c_{ii}$. (b) Messages associated with $c_{ij}$

Based on the max-product algorithm in the factor graph described in [23,52], the message from variable node $x_i$ to function node $f_m$ is

where $Ne(x_i)\setminus f_m$ represents the set of functions related to the variable $x_i$ excluding the function $f_m$, that is, the neighborhood of $x_i$ without $f_m$.

The message from function node $f_m$ to variable node $x_i$ is expressed as

where $Ne(f_m)$ represents the set of all variable nodes related to the function $f_m$, while $Ne(f_m)\setminus x_i$ does not include the variable $x_i$.
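For orientation, the generic max-product messages on a factor graph are usually written as below. The symbol $\mu$ for a message is an assumed notation; the neighborhood sets follow the definitions above.

```latex
\mu_{x_i \to f_m}(x_i) = \prod_{f \in Ne(x_i)\setminus f_m} \mu_{f \to x_i}(x_i)

\mu_{f_m \to x_i}(x_i) =
  \max_{Ne(f_m)\setminus x_i}
  \Bigl[\, f_m\bigl(Ne(f_m)\bigr)
  \prod_{x \in Ne(f_m)\setminus x_i} \mu_{x \to f_m}(x) \Bigr]
```

A variable node multiplies all incoming messages except the one from the recipient, and a function node maximizes over all of its other variables.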

Based on Eq.(13), the messages sent by the variable nodes to the constraint functions in Fig.4 are

Based on Eq.(14), the messages $\theta$ sent by the similarity function $S$ to the variable nodes are

The messages $\eta$ sent to the variable nodes by the $I$ constraint function are

The messages $\alpha$ sent by the $E$ constraint function to the variable nodes are

Additionally, the $\delta$ messages sent by the $D$ constraint function to the variable nodes are expressed as

The above messages are binary messages that can be normalized by a scalar ratio [23,52]: $\beta_{ij}(1)=\beta_{ij}$ and $\beta_{ij}(0)=1$. Then

Similarly, $\rho_{ij}(1)=\rho_{ij}$ and $\rho_{ij}(0)=1$, then

Deriving the $\delta_{ii}$ messages is relatively complex. Although $\zeta_{ij}(1)=\zeta_{ij}$ and $\zeta_{ij}(0)=1$, it is crucial to consider both the sub-constraint of mutual exclusion and the sub-constraint of reducing the exemplar number. Suppose $p$ and $q$ are two mutually exclusive exemplars. Without loss of generality, if $i=p$ and the exemplar number is reduced to $M-1$, then

Define $R=\{1,\cdots,p-1,p+1,\cdots,q-1,q+1,\cdots,N\}$. In Eq.(21), $J_1\subset R$ has $M-2$ elements, $K_1\subset R$ has $N-M$ elements, and they satisfy $J_1\cap K_1=\emptyset$, $J_1\cup K_1=R$, while $J_2\subset R$ has $M-1$ elements, $K_2\subset R$ has $N-M-1$ elements, and they satisfy $J_2\cap K_2=\emptyset$, $J_2\cup K_2=R$. Here, max denotes the selection of $M-2$ nodes with a value of 1 and the remaining $N-M$ nodes with a value of 0 from the $N-2$ variable nodes excluding $c_{pp}$ and $c_{qq}$ in the factor graph. This selection scheme maximizes the product of the $\zeta_{jj}(1)$ messages and the $\zeta_{kk}(0)$ messages. Furthermore, it can be obtained that

where $\Phi$ represents the set of messages $\zeta_{jj}$, $j\in R$, arranged in descending order, $\Phi_1$ represents the maximum value in $\Phi$, and $\Phi_{M-1}$ is the $(M-1)$th largest value in $\Phi$.

And if $i\notin\{p,q\}$, then

Define $R'=\{1,\cdots,i-1,i+1,\cdots,N\}$. In Eq.(23), $J_3\subset R'$ with $\{p,q\}\not\subset J_3$ has $M-2$ elements, $K_3\subset R'$ has $N-M$ elements, and they satisfy $J_3\cap K_3=\emptyset$, $J_3\cup K_3=R'$. Meanwhile, $J_4\subset R'$ with $\{p,q\}\not\subset J_4$ has $M-1$ elements, $K_4\subset R'$ has $N-M-1$ elements, and they satisfy $J_4\cap K_4=\emptyset$, $J_4\cup K_4=R'$. So

where $\Gamma$ denotes the set of messages $\zeta_{jj}$, $j\in R'$, arranged in descending order. Since $p$ and $q$ are two mutually exclusive exemplars, the element $\min\{\zeta_{pp},\zeta_{qq}\}$ must be eliminated from $\Gamma$, and $\Gamma_{M-1}$ is the $(M-1)$th largest value in $\Gamma$.

Through normalization, we have obtained $\eta_{ii}$, $\eta_{ij}$, $\alpha_{ii}$, $\alpha_{ij}$, and $\delta_{ii}$. Because $\theta_{ij}=e^{s(i,j)}$, it is easy to obtain $\beta_{ii}=\delta_{ii}\cdot\theta_{ii}\cdot\alpha_{ii}$ and $\beta_{ij}=\theta_{ij}\cdot\alpha_{ij}$. The expressions for the other messages are

Responsibility messages are expressed as $r(i,j)=\log\rho_{ij}$ and availability messages are expressed as $a(i,j)=\log\alpha_{ij}$ [19–23,52]. By analogy, mutual exclusion and aggregation messages are expressed as $u(i)=\log\delta_{ii}$ and $v(i)=\log\zeta_{ii}$. They are initialized as 0 and updated, respectively, as follows.

where $V'$ represents the set of $v(j)$, $j\in R$, arranged in descending order, and $V'(M-1)$ is the $(M-1)$th largest value in $V'$. Similarly, $V''$ represents the set of $v(j)$, $j\in R'$, arranged in descending order, and $V''(M-1)$ represents the $(M-1)$th largest value in $V''$.
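The order statistics $V'(M-1)$ and $V''(M-1)$ can be computed with a simple sort. The sketch below only illustrates how the two index sets are formed from the descriptions of $R$, $R'$, and the elimination of the smaller of the two messages of $p$ and $q$; it does not reproduce the full update of $u(i)$, and the function names are hypothetical.

```python
def kth_largest(values, k):
    """k-th largest element (1-based) of an iterable."""
    return sorted(values, reverse=True)[k - 1]


def v_prime(v, p, q, m):
    """V'(M-1): (M-1)-th largest v(j) over R, i.e. all indices except p and q."""
    return kth_largest((v[j] for j in range(len(v)) if j not in (p, q)), m - 1)


def v_double_prime(v, i, p, q, m):
    """V''(M-1) for i not in {p, q}: (M-1)-th largest v(j) over R' (all indices
    except i), after also dropping the smaller of v(p) and v(q)."""
    dropped = p if v[p] <= v[q] else q
    return kth_largest(
        (v[j] for j in range(len(v)) if j not in (i, dropped)), m - 1
    )
```

Sorting costs O(N log N) per call; a selection algorithm would be enough when only one order statistic is needed.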

4 Results and Discussion

The experiments were conducted on some standard classification data sets, and the clustering findings were evaluated using internal and external clustering evaluation indexes to verify the aggregation and accuracy of AAEAP. The clustering algorithms for comparison include classic AP, five enhanced AP algorithms, SC, and DP.

4.1 Experimental Setting

The data sets for our experiments are from the UC Irvine Machine Learning Repository [61], Knowledge Extraction based on Evolutionary Learning (KEEL) [62], and S-sets [63]. Table 1 presents brief information on these data sets.

Table 1: Characteristics of the data sets

Before clustering, we conduct min-max normalization on each column to ensure that the impact of each attribute on the clustering process and results is balanced.
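Column-wise min-max normalization can be sketched as follows. The handling of constant columns (mapped to 0.0) is an added assumption, since the text does not say how such columns are treated; the function name is illustrative.

```python
def min_max_normalize(rows):
    """Scale each column of a row-major table to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]
    # A constant column has zero span; map it to 0.0 instead of dividing by zero.
    return [[(x - l) / s if s else 0.0 for x, l, s in zip(row, lo, span)]
            for row in rows]
```

After this step every attribute contributes on the same scale to the Euclidean distance used for the similarity matrix.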

Besides classic AP, SC, and DP, five enhanced AP algorithms are used for comparison, including K-AP, AP clustering based on cosine similarity (CSAP) [48], adjusted preference AP clustering based on twice the median (TMPAP) [46], adjusted preference AP clustering based on quantile (QPAP) [50], and adaptive density distribution inspired affinity propagation clustering (ADDAP) [59]. K-AP concentrates on customizing the number of clusters, CSAP concentrates on modifying the similarity matrix $S$, TMPAP and QPAP concentrate on modifying the preferences, while ADDAP measures the similarities based on nearest neighbor searching and modifies the preferences based on density.

Both the clustering algorithms for comparison and the proposed AAEAP algorithm were implemented in Matlab R2016b. The pertinent settings are as follows: the similarity between data points is measured by Euclidean distance except for CSAP and ADDAP; the preferences of AAEAP, AP, and CSAP are set to the median of all inter-point similarities; the damping factors of AAEAP, AP, and the five enhanced AP algorithms are 0.9; the M parameter of QPAP is set to the value corresponding to the 5th quantile to reduce the number of exemplars; and the partitioning clustering approach used by AAEAP to detect mutually exclusive exemplars is FCM clustering. All experiments were conducted on Windows 7 with an Intel Core i7-9700 processor and 16 GB of memory.

4.2 Clustering Evaluations

An integral component of the clustering process is the validation of the clustering findings,and numerous indexes have been proposed to quantitatively evaluate the performance of clustering algorithms.Effectiveness evaluation indexes can be widely classified into two categories: internal and external evaluation indexes.The internal evaluation indexes primarily examine and assess the clustering results from aspects,including compactness,separation,and overlap,based on the structural information of the data set.The external evaluation indexes are mainly based on available prior information from the data set,including the cluster labels of all data points.The performance is evaluated by comparing the degree of correspondence between clustering results and external information.

We use six extensive evaluation indexes,including Silhouette Coefficient (Sil) [64],In-Group Proportion(IGP)[65],Rand Index(RI)[66],Adjusted Rand Index(ARI)[66],F-measure(FM)[67],and Normalized Mutual Information (NMI) [54] to analyze the clustering results of the proposed AAEAP algorithm.Internal evaluation indexes are Sil and IGP,whereas external evaluation indexes are RI,ARI,FM,and NMI.

Sil is an evaluation index based on compactness within a cluster and separation between clusters. For data point $i$, compute the average distance from it to the other data points in its own cluster, denoted $a(i)$. Then compute the average distance from it to the data points of each other cluster, and denote the minimum of these averages $b(i)$. Its silhouette coefficient can then be expressed as

$$s(i)=\frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}}$$

The value range of $s(i)$ is [-1, 1]. When $s(i)$ approaches 1, the data point matches its assigned cluster well and is distant from other clusters. When $s(i)$ approaches -1, the data point has probably been assigned to the wrong cluster. The Sil index of the overall clustering result is conventionally the average of $s(i)$ over all data points.
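
The definition of $s(i)$ above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Matlab code used in the experiments: the function names are hypothetical, points are assumed to be coordinate tuples compared by Euclidean distance, at least two clusters are assumed to exist, and assigning 0 to a point in a singleton cluster is a common convention that the text does not state.

```python
import math


def silhouette_values(points, labels):
    """Return s(i) for every point (assumes at least two clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    values = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            # Assumption: a singleton cluster gets s(i) = 0.
            values.append(0.0)
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values


def silhouette_index(points, labels):
    """Overall Sil index: the average of s(i) over all data points."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)
```

For two well-separated pairs of points, labeling each pair as one cluster gives a Sil index close to 1, while splitting each pair across clusters gives a negative value.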

IGP is defined, for each cluster, as the proportion of its data points whose nearest neighbor belongs to the same cluster:

$$\mathrm{IGP}(u,X)=\frac{\#\{\,j \mid \mathrm{Class}_X(j)=\mathrm{Class}_X(j_{1NN})=u\,\}}{\#\{\,j \mid \mathrm{Class}_X(j)=u\,\}}$$

where $X$ is the data set, $u$ represents one cluster, $j$ represents a data point in $u$, $j_{1NN}$ is the nearest neighbor of $j$, $\mathrm{Class}_X(j)=\mathrm{Class}_X(j_{1NN})=u$ denotes that $j$ and $j_{1NN}$ both belong to cluster $u$, and $\#$ denotes the number of data points that meet the stated condition. $\mathrm{IGP}(u,X)$ is computed for all nc clusters, and a larger mean, IGP_M, indicates better clustering quality.
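
A minimal Python sketch of this index follows, under the assumption that nearest neighbors are found by Euclidean distance over the whole data set and that ties are broken by taking the lowest index. The function names are hypothetical and the data set is assumed to contain at least two points.

```python
import math


def in_group_proportions(points, labels):
    """Return {cluster u: IGP(u, X)} using Euclidean nearest neighbors."""
    n = len(points)
    same = {}   # points in u whose nearest neighbor is also in u
    size = {}   # number of points in u
    for j in range(n):
        # Nearest neighbor of j among all other points (lowest index on ties).
        nn = min(
            (k for k in range(n) if k != j),
            key=lambda k: math.dist(points[j], points[k]),
        )
        u = labels[j]
        size[u] = size.get(u, 0) + 1
        if labels[nn] == u:
            same[u] = same.get(u, 0) + 1
    return {u: same.get(u, 0) / size[u] for u in size}


def mean_igp(points, labels):
    """IGP_M: the mean of IGP(u, X) over all clusters."""
    igp = in_group_proportions(points, labels)
    return sum(igp.values()) / len(igp)
```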

RI is an external evaluation index that requires the real classification information $C$. Let $K$ be the clustering result, $a$ the number of data pairs that are in the same cluster in both $C$ and $K$, and $b$ the number of data pairs that are in different clusters in both $C$ and $K$. RI is then expressed as

$$\mathrm{RI}=\frac{a+b}{\binom{n}{2}}$$

where $n$ is the number of data points and $\binom{n}{2}=n(n-1)/2$ is the number of data pairs that can be formed from the data set. RI takes values in [0, 1]; a larger value indicates that the clustering result is more consistent with the real classification.
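
The pair counting behind RI can be illustrated as follows. This is a direct, quadratic-time sketch for small examples; the function name is hypothetical and at least two data points are assumed.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """RI = (a + b) / C(n, 2), counted directly over all data pairs."""
    a = b = pairs = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1          # together in both C and K
        elif not same_true and not same_pred:
            b += 1          # separated in both C and K
        pairs += 1
    return (a + b) / pairs
```

Because only pair relations are compared, the index does not depend on how the clusters are numbered.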

ARI is an enhancement of RI that also evaluates a clustering by counting the data pairs assigned to the same or to different clusters in the real labels and in the clustering result. Compared to RI, ARI has higher discrimination. ARI is expressed as

$$\mathrm{ARI}=\frac{2(ad-bc)}{(a+b)(b+d)+(a+c)(c+d)}$$

where the symbols are redefined relative to RI: $a$ denotes the number of data pairs that belong to the same cluster in both the real labels and the clustering result, $b$ denotes the number of data pairs that belong to the same cluster in the real labels but not in the clustering result, $c$ denotes the number of data pairs that do not belong to the same cluster in the real labels but do in the clustering result, and $d$ denotes the number of data pairs that are in different clusters in both the real labels and the clustering result. The range of ARI values is [-1, 1]; the larger the value, the better the clustering effect, and ARI = 1 signifies that the clustering result is completely consistent with the real labels.
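
As an illustration, the four pair counts and the pair-counting form of ARI can be computed as below. The function name is hypothetical, and returning 1.0 when the denominator vanishes (which only happens when the two partitions are identical and trivial, or when there are no pairs) is a convention assumed here rather than something the text specifies.

```python
from itertools import combinations


def adjusted_rand_index(true_labels, pred_labels):
    """ARI from the pair counts a, b, c, d defined in the text."""
    a = b = c = d = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1          # together in real labels only
        elif same_pred:
            c += 1          # together in clustering result only
        else:
            d += 1
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    if denom == 0:
        # Assumption: degenerate identical partitions score 1.0.
        return 1.0
    return 2 * (a * d - b * c) / denom
```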

FM integrates precision and recall to evaluate the clustering effect, and is expressed as

$$\mathrm{FM}=\frac{2\cdot precision\cdot recall}{precision+recall}$$

where $precision=n_{km}/n_k$ and $recall=n_{km}/n_m$. Here $n_k$ denotes the number of data points in the $k$-th cluster of the clustering result, $n_m$ denotes the number of data points in the $m$-th cluster of the real classification, and $n_{km}$ denotes the number of data points shared by the $k$-th and $m$-th clusters. The larger the FM index value, the better the clustering effect.
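
The sketch below computes precision, recall, and their harmonic mean for every pair of a clustering-result cluster and a real class. How the pairwise values are combined into a single score is not recoverable from the text; the aggregation used here (for each real class take the best-matching cluster, then weight by class size) is a common convention and is an assumption, as is the function name.

```python
from collections import Counter


def f_measure(true_labels, pred_labels):
    """Class-size-weighted best F value per real class (assumed aggregation)."""
    n = len(true_labels)
    n_m = Counter(true_labels)                      # sizes of real classes
    n_k = Counter(pred_labels)                      # sizes of result clusters
    n_km = Counter(zip(pred_labels, true_labels))   # shared points

    total = 0.0
    for m, size_m in n_m.items():
        best = 0.0
        for k, size_k in n_k.items():
            shared = n_km.get((k, m), 0)
            if shared == 0:
                continue
            precision = shared / size_k
            recall = shared / size_m
            best = max(best, 2 * precision * recall / (precision + recall))
        total += size_m / n * best
    return total
```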

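NMI assesses the similarity between the clustering result and the real labels from the perspective of information theory. Let $U$ denote the clustering result containing $k$ clusters, $V$ the real labels containing $m$ clusters, and $\mathrm{MI}(U,V)$ the mutual information between the clustering result and the real labels. NMI is then expressed as

$$\mathrm{NMI}(U,V)=\frac{\mathrm{MI}(U,V)}{F\big(H(U),H(V)\big)}=\frac{\sum_{c=1}^{k}\sum_{p=1}^{m} n_{c,p}\log\dfrac{n\,n_{c,p}}{n_c\,n_p}}{\sqrt{\left(\sum_{c=1}^{k} n_c\log\dfrac{n_c}{n}\right)\left(\sum_{p=1}^{m} n_p\log\dfrac{n_p}{n}\right)}}$$

where $F$ is the geometric mean and $H(\cdot)$ is the entropy, $n$ is the number of data points, $n_c$ is the number of data points in the $c$-th cluster of the clustering result, $n_p$ is the number of data points in the $p$-th cluster of the real labels, and $n_{c,p}$ is the number of data points in the intersection of the $c$-th and $p$-th clusters.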
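
A short Python sketch of this normalization follows. It computes the mutual information and the two entropies from the counts $n_c$, $n_p$, and $n_{c,p}$ and divides by the geometric mean of the entropies. The function name is hypothetical, and returning 0.0 when either partition has a single cluster (zero entropy) is a convention assumed here.

```python
import math
from collections import Counter


def normalized_mutual_information(true_labels, pred_labels):
    """NMI(U, V) = MI(U, V) / sqrt(H(U) * H(V))."""
    n = len(true_labels)
    n_c = Counter(pred_labels)                      # clustering result U
    n_p = Counter(true_labels)                      # real labels V
    n_cp = Counter(zip(pred_labels, true_labels))   # intersections

    mi = sum(
        count / n * math.log(n * count / (n_c[c] * n_p[p]))
        for (c, p), count in n_cp.items()
    )
    h_u = -sum(count / n * math.log(count / n) for count in n_c.values())
    h_v = -sum(count / n * math.log(count / n) for count in n_p.values())
    if h_u == 0 or h_v == 0:
        # Assumption: NMI is taken as 0 when a partition has one cluster.
        return 0.0
    return mi / math.sqrt(h_u * h_v)
```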

4.3 Experimental Results

To demonstrate the automatic aggregation process of the AAEAP algorithm, we selected the Iris and Wine data sets. Since both data sets have more than three dimensions, we used the classic t-distributed stochastic neighbor embedding (t-SNE) approach [68] to reduce the dimensionality and then visualized the data to show the clustering process and the aggregation effect.

Fig. 5 shows the results on the Iris data set. AAEAP first converges, as classic AP does, with the cluster number NClass equal to 11. AAEAP then iterates eight times to automatically detect mutually exclusive exemplars and aggregate, stably converging to three clusters, which is consistent with the real number of categories in the Iris data set.

Figure 5: Aggregation effect on the Iris data set

Fig. 6 shows the clustering and aggregation effect on the Wine data set. Starting from classic AP, AAEAP converges to 21 clusters and then aggregates automatically through mutually exclusive exemplar detection. Finally, AAEAP stably converges to three clusters, equal to the real number of categories. However, the experiments show that AAEAP does not converge to three clusters every time; there are also cases of six clusters. Similarly, for the Yeast data set there are cases of aggregation into three or four clusters. Our analysis indicates that the main reason is that AAEAP uses FCM partitioning clustering to detect mutually exclusive exemplars in these experiments, and the random initialization of FCM makes its classification unstable. This directly influences the detection of mutually exclusive exemplars and therefore changes the AAEAP clustering results. A simple solution is to combine multiple FCM partitioning clustering results to weaken the randomness of FCM before conducting mutual exclusion detection, which makes the detection more stable and comprehensive. This also underlines the importance of reliable mutual exclusion detection, which we emphasized earlier: dependable detection brings accurate aggregation and better clustering quality.

Figure 6: Aggregation effect on the Wine data set
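
The text suggests combining several FCM results to reduce the effect of random initialization but does not say how. One possible way, shown here purely as an assumption, is a co-association vote: run the partitioning several times, record for each pair of points the fraction of runs in which they share a cluster, and keep only the relations that hold in a majority of runs. The function names, the hard-label input, and the 0.5 threshold are all hypothetical, and how such stable pairs would feed into the mutual exclusion detection rule is not reproduced here.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of runs in which each pair of points shares a cluster.

    `partitions` is a list of hard label vectors, one per run; the label
    numbering may differ between runs because only co-membership is used.
    """
    n = len(partitions[0])
    runs = len(partitions)
    return {
        (i, j): sum(p[i] == p[j] for p in partitions) / runs
        for i, j in combinations(range(n), 2)
    }


def consistently_grouped(partitions, exemplars, threshold=0.5):
    """Exemplar pairs that share a cluster in more than `threshold` of runs."""
    co = co_association(partitions)
    return [
        (i, j)
        for i, j in combinations(sorted(exemplars), 2)
        if co[(i, j)] > threshold
    ]
```

With three runs, a pair of exemplars that is grouped together in two of them passes the default threshold, while a pair grouped together in only one run does not.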

Next, we use the clustering evaluation indexes to measure the clustering quality of AAEAP. Table 2 shows the experimental results on the Iris, Wine, Yeast, and S1 data sets. AAEAP can converge to the true cluster number, while AP, CSAP, QPAP, and TMPAP cannot; compared to them, AAEAP has a substantial advantage in aggregation performance. ADDAP aims at obtaining the maximum Sil index value, but it cannot obtain the true numbers of clusters for these four data sets. The evaluation indexes of ADDAP are typically superior to those of AP, CSAP, QPAP, and TMPAP. However, AAEAP has better clustering quality than ADDAP, particularly on the external evaluation indexes, demonstrating that the AAEAP clustering results are closer to the real categories. The evaluation indexes of K-AP, SC, and DP are mostly inferior to those of AAEAP for the Iris, Wine, and Yeast data sets, while their indexes for the S1 data set are perfect and slightly superior to those of AAEAP.

Table 3 shows the results on the Segment and Glass data sets, where AAEAP does not converge to the real category numbers and exhibits excessive aggregation. Therefore, we also report the clustering evaluation results obtained when K-AP, SC, and DP converge to the real category numbers.

Table 3: Clustering results on the Segment and Glass data sets

Although the real category number of the Segment data set is seven, Table 3 shows that when K-AP converges into six categories, its evaluation indexes are significantly superior to those obtained with seven categories because fewer elements are classified incorrectly; even so, the overall clustering performance of AAEAP is better than that of K-AP. Some evaluation indexes of AP, CSAP, QPAP, and TMPAP are superior to those of AAEAP, but their aggregation performance is poor. The excessive aggregation of ADDAP is more serious, while its evaluation index values are generally good, particularly the internal indexes. When SC and DP converge into six categories, their evaluation indexes are mostly inferior to those obtained with seven categories. The performance of AAEAP is superior to that of SC and close to that of DP. For the Glass data set, the real number of categories is six, and again the evaluation indexes obtained when K-AP converges into five categories are typically superior to those obtained with the real category number. Overall, the performance of AAEAP is better than that of K-AP. ADDAP obtains fewer clusters, while its evaluation indexes are often inferior to those of AAEAP. Although AP, CSAP, QPAP, and TMPAP cannot converge to the real number of categories, their evaluation indexes are reasonably good. SC and DP each have their own strengths on individual indexes, and the index values of AAEAP often lie between the two. Fig. 7 shows the overall performance of AAEAP, which has no weak evaluation index: its values are substantially superior to the means of the indexes obtained by the other algorithms and close to the maximum values.

Figure 7: Overall performance of AAEAP on the Segment and Glass data sets

Table 4 shows the results on the Banana and Phoneme data sets, where AAEAP does not converge to the real category numbers and exhibits insufficient aggregation. Therefore, we also report the clustering evaluation results obtained when K-AP, SC, and DP converge to the real category numbers.

Table 4: Clustering results on the Banana and Phoneme data sets

The numbers of clusters obtained by AP, CSAP, QPAP, and TMPAP deviate significantly from the real category numbers, and the insufficient aggregation of these four algorithms is obvious. AAEAP converges to numbers of clusters that differ from the real numbers because FCM does not offer more mutually exclusive exemplars. Nevertheless, compared with AP, CSAP, QPAP, and TMPAP, AAEAP still shows strong automatic aggregation ability even for large data sets. The performance of ADDAP is superior to that of most algorithms because it considers the density of the data and obtains the optimal classification through iteration. The density distribution of the Banana and Phoneme data sets is distinctly nonuniform, which allows ADDAP to give full play to this strength. The index values of K-AP, SC, and DP are mostly inferior to those of AAEAP, regardless of whether they obtain the real category numbers or the same number of clusters as AAEAP. Fig. 8 shows the overall performance of AAEAP. The evaluation indexes of AAEAP are superior to the means of the indexes obtained by the other algorithms and close to the maximum values.

5 Conclusions

This study proposes the AAEAP clustering algorithm based on mutually exclusive exemplar processing. Its main objective is to enable AP clustering to aggregate automatically, and the information that enhances aggregation comes from real-time partitioning clustering results rather than from prior knowledge or human intervention. This is also what distinguishes the AAEAP algorithm from semi-supervised AP clustering algorithms. Potential mutually exclusive exemplar pairs are identified by cross-checking the partitioning and AP clustering results. Based on them, the current clusters are disassembled and clustering is restarted under the mutual exclusion and aggregation constraint, achieving aggregation incrementally. The experimental results show that the automatic aggregation effect of AAEAP is substantial and that its overall clustering evaluation index values are superior. However, we also find that the quality of the clustering results depends on whether the partitioning clustering can offer stable and reliable mutual exclusion detection information: the more accurate the information, the better the AAEAP clustering effect. Future studies will concentrate on enhancing AAEAP to make it more adaptable to clustering nonspherical data sets.

Acknowledgement: The authors wish to acknowledge the contribution of the Beijing Institute of Tracking and Telemetry Technology, China. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers.

Funding Statement: This research was supported by Research Team Development Funds of L. Xue and Z. H. Ouyang, Electronic Countermeasure Institute, National University of Defense Technology.

Author Contributions: Study conception and design: Z. H. Ouyang, L. Xue; data collection: Y. S. Duan; analysis and interpretation of results: Z. H. Ouyang, F. Ding; draft manuscript preparation: Z. H. Ouyang, Y. S. Duan. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Data will be made available on request.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
