

        Label correlation for partial label learning

        2022-11-01

        GE Lingchi,FANG Min,LI Haikun,and CHEN Bo

        School of Computer Science and Technology, Xidian University, Xi'an 710071, China

        Abstract: Partial label learning aims to learn a multi-class classifier, where each training example corresponds to a set of candidate labels among which only one is correct. Most studies in the label space have focused only on the difference between candidate labels and non-candidate labels. So far, however, there has been little discussion of label correlation in partial label learning. This paper begins with research on the label correlation, followed by the establishment of a unified framework that integrates the label correlation, the adaptive graph, and the semantic difference maximization criterion. This work generates fresh insight into the acquisition of learning information from the label space. Specifically, the label correlation is calculated from the candidate label sets and is utilized to obtain the similarity of each pair of instances in the label space. After that, the labeling confidence of each instance is updated under the smoothness assumption that two instances should have similar outputs in the label space if they are close in the feature space. At last, an effective optimization program is utilized to solve the unified framework. Extensive experiments on artificial and real-world data sets indicate the superiority of our proposed method over state-of-the-art partial label learning methods.

        Keywords: pattern recognition, partial label learning, label correlation, disambiguation.

        1.Introduction

        Although learning from training examples associated with accurate labels is effective, collecting such labeled data is expensive in many real-world classification tasks. The aim of partial label learning is to learn a multi-class classifier from ambiguously labeled examples, which can be easily obtained. Recently, partial label learning has arisen in many real-world applications, such as ecoinformatics [1], natural language processing [2], and automatic image annotation [3,4].

        The difficulty of partial label learning is that the ground-truth label hidden in the candidate label set cannot be accessed by the learning algorithm directly. Candidate label disambiguation is a straightforward way to deal with partial label learning. The current disambiguation-based approaches can be divided into two categories: the averaging-based strategy and the identification-based strategy. For the averaging-based strategy, the candidate labels of each training example are treated equally and the prediction is made by averaging their modeling outputs [5,6]. For the identification-based strategy, the ground-truth label is regarded as a latent variable which can be determined by an iterative refining procedure [7-12].

        It is well known that label correlation plays a pivotal role in multi-label learning [13]. So far, little attention has been paid to the role of label correlation in multi-class classification because of a lack of label correlation information in the label space. However, partial label learning makes it possible to build a multi-class classifier with the help of label correlation. Specifically, if a pair of classes is hard to distinguish, the two classes are likely to appear together in the candidate label set of the same sample. On the basis of this observation, label correlation can be built from the label space. After that, label correlation is utilized to disambiguate candidate label sets under the smoothness assumption that two instances should have similar outputs in the label space if they are close in the feature space.

        In the label space, the global label information can be extracted by the label correlation, and the label relationship at the instance level can be extracted by the semantic difference maximization criterion [12]. Finally, to overcome the influence of noise and outliers in the feature space, the adaptive graph [10] is integrated into the unified framework. Thus, a novel approach named label correlation, semantic difference maximization, and adaptive graph for partial label learning (PL-LCSA) is proposed in this paper.

        There are two innovations in this paper: (i) To the best of our knowledge, this paper is the first to introduce label correlation to deal with partial label learning, so the information from the label space can be further exploited. (ii) Label correlation, semantic difference maximization, and the adaptive graph are integrated into one learning framework, in which the information from the label space and the feature space can be learned simultaneously. Comprehensive experiments show that PL-LCSA achieves competitive performance against state-of-the-art partial label learning approaches on artificial and real-world data sets.

        2.Related work

        Formally, let χ = R^d be the d-dimensional feature space and Y = {0,1}^l be the label space with l class labels. The training set can be denoted by D = {(x_i, Z_i) | 1 ≤ i ≤ m}, where x_i ∈ χ is a d-dimensional feature vector and Z_i ∈ Y is the candidate label set associated with x_i. The ground-truth label associated with x_i is assumed to reside in the candidate label set Z_i. The aim of partial label learning is to learn a multi-class classifier f: χ → Y from the training set D.
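As a concrete illustration of this setup (toy numbers, not from the paper), a partial label training set can be stored as a feature matrix together with a binary candidate-label matrix:

```python
import numpy as np

# Toy partial label training set: m = 4 instances, d = 3 features, l = 3 labels.
# X[i] is the feature vector x_i; row i of Y encodes the candidate label set Z_i
# (Y[i, j] = 1 iff label j is a candidate for instance i).
X = np.array([[0.1, 0.2, 0.3],
              [0.9, 0.8, 0.7],
              [0.2, 0.1, 0.4],
              [0.8, 0.9, 0.6]])
Y = np.array([[1, 1, 0],   # ground truth hidden among candidates {0, 1}
              [0, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])

# The ground-truth label is assumed to reside in Z_i,
# so every candidate set must be non-empty.
assert (Y.sum(axis=1) >= 1).all()
```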

        The ground-truth label which resides in the candidate label set is not directly accessible to the learning algorithm. Disambiguation is a major approach to recover the ground-truth labeling information. There are currently two main disambiguation strategies, i.e., the averaging-based strategy and the identification-based strategy.

        The averaging-based strategy treats the candidate labels in an identical manner and predicts unseen instances by averaging the outputs of the candidate labels. The k-nearest neighbor technique for partial label learning was proposed in [5], where the candidate label sets of neighboring instances are integrated by weighted voting to make the prediction for an unseen instance, i.e., y* = arg max_{y∈Y} Σ_{x_j∈N_k(x_i)} w_j I(y ∈ Z_j), where N_k(x_i) is the set of k nearest neighbors of x_i. The deficiency of the averaging-based strategy is that the output of the ground-truth label can be overwhelmed by false positive labels.
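The weighted-voting rule above can be sketched as follows; the inverse-distance weighting used here is an illustrative choice, not necessarily the weighting defined in [5]:

```python
import numpy as np

def plknn_predict(x, X_train, Y_train, k=3):
    """k-NN averaging disambiguation (a sketch of the PL-KNN idea):
    each of the k nearest neighbors votes for every label in its
    candidate set, weighted by inverse distance."""
    d = np.linalg.norm(X_train - x, axis=1)          # distances to all training points
    nn = np.argsort(d)[:k]                            # indices of the k nearest neighbors
    w = 1.0 / (d[nn] + 1e-12)                         # inverse-distance weights w_j
    votes = (w[:, None] * Y_train[nn]).sum(axis=0)    # sum_j w_j * I(y in Z_j) per label
    return int(np.argmax(votes))
```

On a toy set with two clusters whose candidate sets share only one label per cluster, the shared label wins the vote, which is exactly the averaging behavior (and its weakness: frequent false positives can outvote the ground truth).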

        To overcome the weakness of the averaging-based strategy, the identification-based strategy makes the ground-truth label a latent variable which can be identified by an iterative process. There are several techniques to iteratively refine the ground-truth labeling confidences, e.g., the maximum latent semantic differences criterion [14], the maximum-likelihood technique [15], the maximum margin criterion [7,16], the dictionary-based learning criterion [17], the boosting technique [18], the heterogeneous loss with sparse and low-rank regularization [19], and artificial neural network techniques [4,20].

        Recently, the feature-aware disambiguation strategy aims to disambiguate the candidate label set with information from the feature space. The reconstruction information from the k nearest neighbors is used to update the labeling confidence matrix iteratively [8]. The manifold structure of the feature space is propagated to the label space for disambiguation [9]. To overcome noise and outliers in the feature space, the adaptive graph is utilized to guide disambiguation [11]. In the label space, the semantic difference maximization criterion aims to maximize the latent semantic differences of two instances which do not share any common candidate labels [14].

        Different from disambiguation methods, some algorithms work by binary decomposition for partial label learning. The disambiguation-free approach learns the predictive model by the error-correcting output codes (ECOC) technique [21]. The one-vs-one decomposition strategy is adapted to solve the partial label learning problem [22].

        Most studies in the label space have focused only on the difference between candidate labels and non-candidate labels, and ignored the label correlation. In the next section, we present a novel approach which introduces label correlation to partial label learning and simultaneously utilizes the information from the feature space and the label space to disambiguate the candidate label sets.

        3.Approach

        In this section, we introduce the proposed approach PL-LCSA. Firstly, we present the label correlation and the uniform learning framework. After that, an effective optimization procedure is utilized to deal with this framework.

        3.1 Label correlation

        Let p_i denote the labeling confidence vector of x_i. The aim of (1) is to normalize p_i, i.e., to make the sum of the labeling confidence vector p_i be 1 for each instance. The aim of (2) is to ensure that the labeling confidence of every non-candidate label is 0, so that the ground-truth label resides in the candidate label set. Once the labeling confidence vectors are normalized, the partial label training set D can be transformed into its disambiguation counterpart D = {(x_i, p_i) | 1 ≤ i ≤ m}. Then the predictive model can be learned from the disambiguation results D.

        The label correlation matrix can be denoted as B ∈ R^{l×l}, where b_{i,j} is the similarity between the ith and the jth labels. The label correlation matrix can be calculated by the cosine similarity

        b_{i,j} = q_i^T q_j / (||q_i|| ||q_j||)   (3)

        where q_i is the ith column of the label matrix Y = [y_1, y_2, ···, y_m]^T, i.e., Y = [q_1, q_2, ···, q_l].
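A minimal sketch of this computation, treating the columns of the candidate label matrix as the vectors q_i (the zero-norm guard is an implementation choice for labels that never occur, not something the paper specifies):

```python
import numpy as np

def label_correlation(Y):
    """Cosine similarity between label columns q_i of the candidate
    label matrix Y (m x l): b_ij = q_i . q_j / (||q_i|| * ||q_j||)."""
    Q = Y.astype(float)
    norms = np.linalg.norm(Q, axis=0)
    norms[norms == 0] = 1.0          # guard against labels that never occur
    Qn = Q / norms                   # normalize each column
    return Qn.T @ Qn                 # B is l x l
```

Two labels that always co-occur in candidate sets get b_{i,j} = 1; labels that never co-occur get b_{i,j} = 0.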

        To disambiguate the candidate label set by the label correlation, the labeling confidence matrix is generated under the smoothness assumption that two points should have similar outputs in the label space if they are close in the feature space.

        According to this assumption, we can calculate the similarity of instances in the label space by

        where simL_{i,j} is the similarity between instance x_i and instance x_j in the label space: the more similar the two instances' candidate labels are, the larger simL_{i,j} is.

        Then the similarity of instances in the feature space can be obtained according to

        We then model the smoothness assumption by minimizing the gap between the instance similarities in the label space and in the feature space, and the labeling confidence matrix can be calculated by

        where 1_l = [1,1,···,1]^T ∈ R^l is an all-ones vector of size l, 0_{m×l} is an all-zeros matrix of size m×l, and the constraints are the matrix forms of (1) and (2), respectively.

        3.2 Uniform learning framework

        The feature and label information are simultaneously utilized to disambiguate the candidate label sets in our uniform learning framework.

        In the feature space, we adopt the adaptive graph to recover the intrinsic manifold structure within the data more robustly and accurately. Let G = (N, Ξ, S) be a weighted graph, where N = {x_i | 1 ≤ i ≤ m} is the node set of the graph, and Ξ = {(x_i, x_j) | x_j ∈ N_k(x_i), 1 ≤ i ≤ m} is the set of edges from x_i to x_j, with an edge present if and only if x_j is among the k nearest neighbors of x_i. S ∈ R^{m×m} encodes the manifold structure in the feature space. L ∈ R^{m×m} is an index matrix where l_{i,j} = 1 if (x_i, x_j) ∈ Ξ; otherwise, l_{i,j} = 0. The labeling confidence matrix can be calculated [11] by solving the following problem:
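The index matrix L of the k-nearest-neighbor graph can be built as in the following sketch (S itself is learned by the optimization, not fixed here; the brute-force distance computation is purely illustrative):

```python
import numpy as np

def knn_index_matrix(X, k):
    """Build the 0/1 index matrix of the k-nearest-neighbor graph:
    l_ij = 1 iff x_j is among the k nearest neighbors of x_i (i != j)."""
    m = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(D, np.inf)          # exclude self-loops
    L = np.zeros((m, m), dtype=int)
    for i in range(m):
        L[i, np.argsort(D[i])[:k]] = 1   # mark the k nearest neighbors of x_i
    return L
```

Note that the resulting graph is directed: x_j being a neighbor of x_i does not imply the converse, which is why L is generally asymmetric.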

        where f(x_i, W) is the predictive model and W is the model parameter. The first term learns W, the second term learns the labeling confidence matrix, the third term determines the adaptive graph weight matrix, and the fourth term is a regularization term.

        In the label space, PL-LCSA aims to disambiguate candidate label sets by the label correlation and the semantic difference maximization. The aim of semantic difference maximization is to maximize the semantic differences of two instances which do not share any common candidate labels. It can be expressed [14] by the following problem:

        By integrating (6), (7), and (8), the final objective function is shown as follows:

        where λ is the regularization parameter, and δ, α, β, and γ are the trade-off parameters.

        3.3 Alternating optimization

        Before the optimization procedure, the matrices P and S should be initialized by solving standard quadratic programming (QP) problems [11] as follows:

        Then, we utilize the alternating optimization procedure to solve problem (9).

        Updating S: with P and W fixed, problem (9) can be stated as follows:

        Updating P: with S and W fixed, problem (9) can be stated as follows:

        In optimization problem (14), the first three terms are convex and the last term is concave. To optimize the second term, its first-order Taylor approximation at P_0 can be utilized to replace it.

        where P_0 is the value of P updated at the previous iteration. After the Taylor approximation of the second term, the first three terms are still convex and the last term is concave, so problem (14) is a constrained convex-concave problem. Fortunately, the concave-convex procedure (CCCP) can be used to solve this optimization problem [23]. A rigorous analysis of the convergence of CCCP is provided by [24]. The idea of CCCP is to linearize the concave part of the objective function. Therefore, the last term can be linearized by its first-order Taylor approximation:
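The CCCP step described above (linearize the concave part at the current iterate, then solve the resulting convex subproblem) can be illustrated on a one-dimensional toy objective chosen purely for illustration, not taken from the paper:

```python
def cccp_minimize(x0, iters=50):
    """Toy CCCP on f(x) = x**4 - 2*x**2, split as convex part x^4
    plus concave part -2x^2 (for x >= 0).
    Each step linearizes the concave part at x_t:
        -2x^2  ~  -2*x_t**2 - 4*x_t*(x - x_t),
    then solves the convex surrogate  min_x  x^4 - 4*x_t*x  in closed
    form: d/dx (x^4 - 4*x_t*x) = 0  =>  x = x_t ** (1/3)."""
    x = x0
    for _ in range(iters):
        x = x ** (1.0 / 3.0)   # closed-form minimizer of the convex surrogate
    return x
```

Starting from x = 0.5, the iterates climb monotonically toward x = 1, a minimizer of f, and each CCCP step is guaranteed not to increase the objective.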

        This is an unconstrained optimization problem. The predictive performance can be improved by kernel technology. Let Φ(·): R^d → R^h be a feature mapping which maps the feature space to an h-dimensional Hilbert space. The linear model is rewritten as g(x_i) = Φ(x_i). We convert problem (16) into an equality-constrained minimization as follows:

        The method of Lagrange multipliers can be used to solve this problem. The Lagrange function is stated as follows, where K = ΦΦ^T is the kernel matrix with elements k_{i,j} = κ(x_i, x_j), and κ(·,·) is a kernel function. For PL-LCSA, we use the Gaussian kernel to calculate K. The modeling output matrix F can be calculated by F = Φ/(2λ). An unseen instance x can be predicted [11] by
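The Gaussian kernel matrix used for K can be computed as in this sketch (the bandwidth σ is an illustrative parameter; the paper does not state its value here):

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Kernel matrix K with k_ij = exp(-||x_i - x_j||^2 / (2 * sigma**2))."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    np.maximum(D2, 0.0, out=D2)                      # clip tiny negatives from round-off
    return np.exp(-D2 / (2.0 * sigma**2))
```

K is symmetric with unit diagonal, as required of a Gaussian kernel matrix.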

        where a_{i,j} is the element of matrix A, and y* is the predicted label for x. The pseudo-code of PL-LCSA is summarized in Algorithm 1.

        4.Experiments

        4.1 Experimental setup

        In this subsection, two series of comparative experiments based on synthetic data sets and real-world data sets are conducted to evaluate the performance of PL-LCSA. Table 1 summarizes the characteristics of six multi-class data sets. Following the widely used controlling protocol [6,8,11,14,25-27], artificial data sets are generated from multi-class data sets by three controlling parameters p, ε, and r. Here, p is the proportion of examples which are ambiguous, r is the number of false positive labels in the candidate label set (i.e., |Z_i| = r + 1), and ε is the probability that a false positive label co-occurs with the ground-truth label. The choice of the false positive label is important, because it is the key factor of the label relationship in the synthetic data sets. In this paper, the reason why we choose multi-label data sets as synthetic data sets is that multi-label data can be used to generate label correlation information. Firstly, (3) is used to calculate the label correlation in the multi-label data sets. Then the label most relevant to the ground-truth label is taken as the false positive label. Table 2 summarizes the characteristics of six real-world data sets, where Avg. CLs is the average number of candidate labels.
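A sketch of this controlling protocol for one example follows. The exact sampling details vary across papers; here `coupled` is a hypothetical map from each ground-truth label to its most correlated false positive, and the uniform fill for the remaining false positives is an assumption:

```python
import random

def make_candidates(y_true, num_labels, p, r, eps, coupled, rng):
    """Generate a candidate label set Z under the controlling protocol:
    with probability p the example becomes ambiguous; the coupled label
    of y_true is added with probability eps; remaining false positives
    are drawn uniformly until |Z| = r + 1."""
    Z = {y_true}                          # the ground truth always resides in Z
    if rng.random() < p:                  # example is ambiguous with probability p
        if rng.random() < eps:            # coupled label co-occurs with probability eps
            Z.add(coupled[y_true])
        while len(Z) < r + 1:             # fill with uniform false positives
            Z.add(rng.randrange(num_labels))
    return Z
```

With p = 1 and eps = 1 every candidate set contains both the ground truth and its coupled label, which is the high-correlation regime where PL-LCSA is expected to benefit most.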

        Table 1 Characteristics of the multi-class data sets

        Table 2 Characteristics of real-world partial label data sets

        The data sets in Table 1 are derived from multi-label benchmark data sets by retaining instances with only one relevant label. They can be collected from Mulan (http://mulan.sourceforge.net/index.html). Because the features of Bibtex and Tmc2007 are relatively sparse, we perform dimensionality reduction by principal component analysis (PCA), and the feature dimensions after dimensionality reduction are shown in the fourth column of Table 1.

        In this paper, five partial label learning algorithms are utilized for comparative studies. Each algorithm is configured as suggested in the respective literature:

        (i) PL-KNN [5]: a k-nearest neighbor approach which makes the prediction for unseen instances by averaging the labeling information of their k nearest neighbors (suggested configuration: k = 10).

        (ii) IPAL [8]: an instance-based approach which disambiguates the candidate label set via an iterative label propagation procedure (suggested configuration: k = 10 and α = 0.95).

        (iii) PL-ECOC [21]: a disambiguation-free approach which learns the predictive model by the error-correcting output codes (ECOC) technique (suggested configuration: τ = 0.1·|D|, L = ⌈40·log2(l)⌉).

        (iv) PL-AGGD [11]: an approach which disambiguates the candidate label sets by the adaptive graph to overcome noise and outliers in the feature space (suggested configuration: k = 10, λ = 1, μ = 1, γ = 0.05, and T = 10).

        (v) SDIM [14]: an approach which aims to maximize the latent semantic differences of two instances whose ground-truth labels are definitely different (suggested configuration: λ = 0.05, β = 0.001).

        The parameters of PL-LCSA are set as δ = 0.5, α = 0.5, β = 0.05, γ = 0.5, λ = 1, k = 10, and T = 10. Ten-fold cross-validation is executed for each algorithm, and the mean prediction accuracy and standard deviation are recorded.

        4.2 Results and discussion

        4.2.1 Real-world data sets

        Table 3 summarizes the classification accuracy (mean ± standard deviation (std)) of each comparing algorithm on the real-world data sets. •/◦ indicates that PL-LCSA is statistically superior/inferior to the comparing algorithm on each data set (pairwise t-test at 0.05 significance level). The performance of every algorithm is poor on the face and gesture recognition network (FG-NET) aging data set, because its Avg. CLs is extremely large.

        As shown in Table 3, it can be seen that:

        Table 3 Classification accuracy (mean ± std) of each comparing algorithm on real-world partial label data sets

        (i) On the Lost, MSRCv2, and Soccer Player data sets, the performance of PL-LCSA is superior to that of all the comparing algorithms;

        (ii) On the BirdSong data set, the performance of PL-LCSA is comparable to PL-ECOC and superior to the other comparing algorithms;

        (iii) On the FG-NET data set, the performance of PL-LCSA is comparable to SDIM, PL-AGGD, and PL-KNN, and superior to PL-ECOC and IPAL;

        (iv) PL-LCSA is never outperformed by any comparing algorithm.

        From Table 3, PL-LCSA achieves better performance than the other comparison methods on the real-world data sets. Because three items are integrated in PL-LCSA, we set up three sets of comparative experiments on the real-world data sets to determine the effect of these modules via the parameters δ and γ. The first set of comparative experiments shows the performance of the adaptive graph alone. The second set shows the performance of the adaptive graph and the label correlation. The third set of comparative experiments shows the performance of the full PL-LCSA.

        Table 4 shows the classification accuracy (mean ± std) of these three sets of comparative experiments. A significantly increased performance is observed in the second set of experiments compared with the first on MSRCv2, BirdSong, Soccer Player, and Yahoo! News. This is because the label correlation contributes to partial label learning. The third set of experiments achieves better performance than the second, and the reason is that the label correlation and the semantic difference maximization jointly promote the performance.

        Table 4 Classification accuracy (mean ± std) of control variables for PL-LCSA on real-world data sets

        4.2.2 Synthetic data sets

        Fig. 1 illustrates the classification accuracy of each comparing algorithm on the synthetic data sets as the co-occurring probability ε varies from 0.1 to 0.9 with step size 0.1, where r = 2 and p = 1.

        Fig. 1 Classification performance on synthetic data sets with ε ranging from 0.1 to 0.9 with step size 0.1 (r = 2, p = 1)

        In general, PL-LCSA achieves competitive or better performance on the synthetic data sets. Fig. 1 reveals a gradual decrease in accuracy over the second half of the range. The reason may be that the greater the similarity between the ground-truth label and the false positive label is, the more difficult they are to distinguish. Compared with PL-AGGD, PL-LCSA achieves competitive performance at smaller values of ε and better performance at larger values of ε on Enron, Scene, and Mediamill. The reason is that the label space on Enron, Scene, and Mediamill is small, so the synthetic data carries no label correlation information at smaller values of ε, which is disadvantageous to PL-LCSA. On the contrary, when the value of ε is high, PL-LCSA achieves superior performance against PL-AGGD. This shows that the more label correlation information there is, the better PL-LCSA performs.

        4.2.3 Parameter sensitivity

        Fig. 2 shows the accuracy of PL-LCSA under different configurations of the parameters δ and γ on Lost and MSRCv2. As γ increases, PL-LCSA starts to take the semantic difference maximization criterion into consideration and the classification accuracy increases. For δ, as the weight of the label correlation increases, the classification accuracy first decreases and then increases. In practice, we suggest that users choose δ and γ around 0.5 for all data sets.

        Fig.2 Parameter sensitivity analysis for PL-LCSA

        5.Conclusions

        In this paper, we propose a unified framework that simultaneously focuses on the label space and the feature space. Meanwhile, this work generates fresh insight into the acquisition of learning information from the label space, i.e., the label correlation. The framework integrates the label correlation, the adaptive graph, and the semantic difference maximization criterion. The relationship of instances can be learned by the adaptive graph in the feature space, the semantic difference maximization analyzes the label relationship at the instance level, and the label correlation learns at the global label level. An effective optimization method is also proposed for this framework. Experiments on real-world and artificial data sets have demonstrated the superiority of PL-LCSA over state-of-the-art partial label learning approaches.
