
        Retraining Deep Neural Network with Unlabeled Data Collected in Embedded Devices


        Hong-Xu Cheng | Le-Tian Huang | Jun-Shi Wang | Masoumeh Ebrahimi

Abstract—Because of computational complexity, the deep neural network (DNN) used in embedded devices is usually trained on high-performance computers or graphics processing units (GPUs), and only the inference phase is implemented in embedded devices. Data processed by embedded devices, such as smartphones and wearables, are usually personalized, so a DNN model trained on public data sets may have poor accuracy when inferring personalized data. As a result, retraining the DNN with personalized data collected locally in embedded devices is necessary. Nevertheless, retraining needs labeled data sets, while the data collected locally are unlabeled, so how to retrain a DNN with unlabeled data is a problem to be solved. This paper proves the necessity of retraining a DNN model, after it is trained with public data sets, with personalized data collected in embedded devices. It also proposes a label generation method in which a fake label is generated for each unlabeled training case according to users' feedback, so that retraining can be performed with unlabeled data collected in embedded devices. The experimental results show that our fake label generation method has both a good training effect and wide applicability. Advanced neural networks can be trained with unlabeled data from embedded devices, and the individualized accuracy of the DNN model can be gradually improved with personal use.

Index Terms—Deep neural network (DNN), embedded devices, fake label, retraining.

1. Introduction

The deep neural network (DNN) has achieved many breakthroughs in various fields, such as image classification, speech recognition, and natural language processing [1]. However, because of the high computational complexity and memory overhead of DNN algorithms, it is seldom fully implemented in embedded devices. Some applications use DNN in the cloud through the Internet [2], but then privacy security and real-time operation cannot be guaranteed [3]. For example, when face recognition is used to unlock a smartphone, the unlocking ought to work correctly and rapidly even without the Internet. Moreover, the users' face pictures captured by the phone are private personal data, so using DNN through the Internet carries a high risk of privacy disclosure. Therefore, for applications with high demands on privacy security or real time, using DNN locally is necessary.

In general, deep learning includes two major phases, training and inference. In training, each layer of the model is assigned weights initialized with random numbers, and then the model is fed with cases of objects to be detected or recognized, predicting the class label of each case. This is the forward pass of the training phase. After that, the predicted label is compared against the real label to compute an error via a loss function. Then the error is propagated backward through the network to update the weights with a weight update algorithm, such as stochastic gradient descent. This is the backward pass of the training phase. Unlike training, inference only comprises a forward pass similar to that of training, in which a trained model is used to infer/predict the label of some samples. Obviously, the training phase has much more overhead than inference when implemented on embedded systems. Thus in some DNN implementations [4], [5], the training phase is performed externally on graphics processing unit (GPU) based high-performance computers, and only the weights are transmitted to the embedded devices for inference after training. This is called off-chip learning. Although off-chip learning is power and computation friendly to embedded systems, it is not appropriate for all application scenarios.
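As a concrete illustration of the two phases, the following is a minimal TensorFlow sketch of one training step (forward pass, loss, backward pass, and weight update) versus inference (forward pass only). The toy dense model, learning rate, and function names are illustrative assumptions, not the setup used in this paper.

```python
import tensorflow as tf

# Hypothetical toy model; the concrete network used in this paper is described in Section 5.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)  # stochastic gradient descent

def train_step(images, labels):
    """One training step: forward pass, loss, backward pass, weight update."""
    with tf.GradientTape() as tape:
        logits = model(images, training=True)             # forward pass
        loss = loss_fn(labels, logits)                    # compare prediction with real label
    grads = tape.gradient(loss, model.trainable_variables)            # backward pass
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # weight update
    return loss

def infer(images):
    """Inference: only the forward pass of the trained model."""
    return tf.argmax(model(images, training=False), axis=-1)
```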

Since the data sets processed by embedded devices, such as smartphones and wearables, are usually personalized, a neural network model trained with public data sets may not be good at inferring the personalized data. Take the handwriting digit recognition application in our experiment as an example: The neural network is fully trained on the public data set (MNIST) and reaches an accuracy of 99.14% on its test data set; the model is then used for inferring handwritten digits written by three different people, with the accuracy shown in Fig. 1. As everyone has his/her own writing habits, the model gets different accuracies of 87.36%, 82.8%, and 88.90%, respectively, all far lower than 99.14%. Consequently, to solve this problem, retraining the neural network model with personalized data sets collected locally in embedded devices is necessary.

Fig. 1. Neural network performance on digits written by different persons, where the network is trained on the training data set of MNIST.

Some on-chip learning implementations of DNN have also been proposed, in which the training and inference phases of the neural network algorithm are both performed in embedded devices [6], [7]. Nevertheless, those studies mainly focus on the hardware implementation and optimization of training algorithms within strict energy and computing power limits, while the personalization of data sets and how to train the neural network with unlabeled data collected in embedded devices are not addressed.

In this paper, we firstly analyze the personalization of the data sets collected in embedded devices and prove by experiments that retraining neural networks in embedded systems is necessary. Secondly, we propose a fake label generation algorithm to solve the problem that data sets collected in embedded devices are unlabeled. The experimental results show that fake labels generated by the algorithm can effectively train the neural network.

Section 2 presents some related works on applying deep neural networks in embedded systems. Section 3 proves by experiments the necessity of retraining neural networks in embedded systems. Section 4 proposes the solution to retrain DNN with unlabeled data collected in embedded devices. Section 5 evaluates the proposed method, and Section 6 gives the conclusion.

2. Related Works

2.1. DNN in Embedded Devices

Many efforts have been made to apply DNN in embedded systems. One idea is model miniaturization, which manages to reduce the size of the neural network model so that embedded devices can run it within their power and performance limits. Two techniques can be applied to model miniaturization. One is adjusting the model structure and then training a small model directly, with methods including the binarized neural network [8], depth-wise convolution [9], and kernel reduction [10]. The other technique is model compression, in which only a small part of the model is modified and retraining is not needed; the corresponding methods include quantization, code optimization, pruning, and their integration [11].

Another idea to run DNN in embedded devices is the accelerator, implemented in either software or hardware. In a software accelerator, such as DeepX [12], a pair of resource control algorithms is used to optimize resource usage, allowing even large-scale deep learning models to execute efficiently on modern mobile processors. The purpose of the hardware accelerator is to enhance the computing power of embedded devices while optimizing energy consumption. Because of the high overhead of the training phase, many hardware accelerators [13]-[16] only implement inference. Even though some works also implement online training in the accelerator [17], the problem of the unlabeled data sets collected locally for training is not mentioned.

In addition, [18] analyzed the challenges and characteristics that can be exploited in embedded DNN processing and introduced some promising algorithmic and processor techniques that bring deep learning to Internet of things (IoT) and edge devices. Some researchers propose to apply DNN for IoT with edge computing to offload cloud tasks, and an elastic model of DNN for IoT with edge computing is formulated [19]. Furthermore, distributed deep neural networks (DDNNs), which can accommodate DNN in the cloud and at the edge and end devices, are proposed to improve the recognition accuracy and reduce the communication cost [20].

To summarize, all the efforts above focus more on running the neural network model faster in embedded devices while consuming fewer resources, such as energy and area. Thus, the personalization and the lack of labels of the data sets collected in embedded devices are usually ignored. In this paper, the necessity of retraining is proven and a solution to train the neural network with unlabeled data is proposed.

2.2. Retraining

The word “retraining” is mentioned in many papers related to hardware neural networks [21]-[23], but the purpose of retraining differs among these works. Reference [21] proposed to leverage the error resiliency of the neural network to mitigate timing errors in neural network accelerators: retraining is used to update the weights of the neural network so that the critical timing errors, which significantly affect the output results, can be mitigated. Reference [22] proposed that the power consumption of the multilayer perceptron accelerator during classification can be reduced by approximation, such as reducing the bit precision and using inexact multiplication; furthermore, retraining the network after approximation can recover the accuracy while retaining the power savings. LightNN [23], [24] was introduced and compared with various quantized DNNs; retraining there was used to compensate for the accuracy loss caused by quantization. All the retraining mentioned above uses the same training data sets as the pre-training for the purpose of compensating for the accuracy loss. Nevertheless, in this paper, the data sets used for retraining are collected in embedded devices, which are different from the public data sets used in pre-training. Further, the personalization of data collected locally is analyzed and the necessity of retraining is proved. Moreover, to retrain with unlabeled collected data, a fake label generation method is proposed.

3. Necessity of Retraining Locally

To prove the necessity of retraining the neural network with data sets collected in embedded devices, a convolutional neural network (CNN) shown in Fig. 2 (described in detail in subsection 5.1) is fully trained on the training data set of MNIST and finally reaches an accuracy of 99.14% on the test data set of MNIST.

Fig. 2. Topology of the CNN for the MNIST data sets used in experiments.

Fig. 1 presents the handwritten digits collected from three different people and those from the MNIST test data set, from which we can see that different people have different handwriting habits, so the digits written by them show obvious differences. These differences may affect the accuracy when a pretrained neural network model is used to recognize digits written by different people. To prove this point, the fully trained model is used to infer ten people's handwritten digits and in every case gets low accuracy (less than 90%), which indicates that a neural network fully trained on public data sets may not perform well when inferring personalized data sets. Further, each person's handwritten digits are divided into two parts, the training set and the test set. The fully pre-trained model is retrained on one person's training set, and then the retrained model is tested on the test set of the same person.

The accuracy of the model before and after retraining on each person's data set is shown in Table 1. Table 1 illustrates that after retraining, the accuracy on each person's testing data increases by more than 10%, and therefore retraining in embedded devices with locally collected data is necessary after the neural network, trained as a cumbersome model with public data sets, is transferred to the device.

Table 1: Comparison of accuracy before and after retraining

4. Training with Unlabeled Data Collected in Embedded Devices

Embedded devices, such as smartphones and wearables, are usually equipped with many sensors, so it is easy for them to collect data in situ [26]. Fig. 3 shows an example of data collection in a handwriting digit recognition application on a smartphone. When someone writes a digit on the touch screen of the smartphone, the digit is sampled as an object in the format of an integer array. On one hand, the handwritten digit object is sent to the neural network model, and then the model infers the digit and gives a prediction. On the other hand, the object is saved to some specific storage, such as the secure digital card in the smartphone, and the collected digit objects will be used to train the neural network. However, the collected digits have no labels, so labeling these digits before using them to train the neural network is necessary.

In this scenario, only the user who writes the digit knows the corresponding label. If the application asks the user to label the digit, the user's experience will be seriously damaged. Although the user cannot label the handwritten digit directly, he/she may give some feedback on the prediction result. For example, the user writes a “9” on the touch screen; if the neural network model recognizes it as “9”, the user may press the “ensure” or “next” button and implicitly give the feedback of “correct prediction” at the same time; otherwise he/she may press the “delete” button and implicitly give the feedback of “wrong prediction”. Such feedback exists widely in embedded applications. For another example, in a speech control application, the user may say “Let there be light” to turn on the lights; if the lights are not turned on, the user would say the words again and meanwhile give the implicit feedback of “wrong prediction”; otherwise the user will not repeat the words and gives the implicit feedback of “correct prediction”. This kind of feedback is named correctness feedback (CF) in this paper. Moreover, as shown in Fig. 4, a labeling method that uses CF to generate the fake label is proposed, and then the unlabeled case coupled with its corresponding fake label can be used to train the neural network.

Fig. 4. Generating the fake label with CF and retraining the neural network.

To figure out how fake labels are generated from CF, we should first understand how the loss function is calculated with real labels. Most neural networks used for classification adopt the softmax layer as the output layer, and the cross entropy between the prediction distribution and the real distribution as the loss function [27]-[29]. As shown in Fig. 5, assume that the output of the last hidden layer is y_j, j ∈ {1, 2, …, n}; then the softmax layer maps the output to a distribution over [0, 1] by

a_j = e^{y_j} / Σ_{k=1}^{n} e^{y_k},  j ∈ {1, 2, …, n}.
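For reference, the softmax mapping above can be transcribed directly into code. This short NumPy sketch is ours; the max-subtraction is only a standard numerical-stability detail not mentioned in the text.

```python
import numpy as np

def softmax(y):
    """Map the last hidden layer's outputs y_j to probabilities a_j in [0, 1]."""
    e = np.exp(y - np.max(y))   # subtracting the max does not change the result
    return e / np.sum(e)
```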

Fig. 5. Neural network adopting the softmax layer as the output layer.

The cross entropy loss function is defined as

h(p, q) = −Σ_{j=1}^{n} p(j) ln q(j)

where p represents the real distribution corresponding to the real label, i.e., for an n-classification model, if the real label indicates class i, then p(i) = 1 and p(j) = 0 for all j ≠ i,

and q is the prediction distribution corresponding to the prediction result of the neural network, i.e., q(j) = a_j, j ∈ {1, 2, …, n}.

For an example with 10 classes, if the real label indicates class 1, then the real distribution is p = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0}. Assume that the prediction distribution, i.e., the output of the softmax layer, is q = {0.010, 0.020, 0.010, 0.910, 0.003, 0.010, 0.009, 0.011, 0.005, 0.012}; then the cross entropy loss can be calculated as h = 4.6052. The neural network then applies the back propagation algorithm with this loss to update each weight, so as to learn from the training case and its real label.
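The numbers in this example can be checked with a few lines of NumPy; this snippet is only a verification of the worked example, not part of the proposed method.

```python
import numpy as np

p = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)   # real label: class 1
q = np.array([0.010, 0.020, 0.010, 0.910, 0.003, 0.010,
              0.009, 0.011, 0.005, 0.012])                    # softmax output

h = -np.sum(p * np.log(q))   # cross entropy; only the true-class term is nonzero
print(round(h, 4))           # 4.6052, as in the text
```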

In the CF scenario proposed in this paper, we do not know the real labels corresponding to the training cases, but we can get feedback on whether the neural network prediction is correct. In this scenario, the feedback on the neural network prediction falls into two cases: correct prediction and wrong prediction. In both cases, we can generate fake labels based on CF, and the loss calculated with fake labels is similar to that calculated with real labels, so the fake labels can be effectively used for neural network training. Algorithm 1 describes the fake label generation algorithm based on CF, and the two cases are discussed below.

Algorithm 1. Fake label generation algorithm according to CF

Input: The softmax output of the neural network, q = {a_1, a_2, …, a_n}.

Correct prediction: If the neural network makes a correct prediction for a training case, we can deduce its real label directly, i.e., its fake label is just the real label. For example, the real class of a training case is the 4th, and the output of the softmax layer is q = {0.010, 0.020, 0.010, 0.910, 0.003, 0.010, 0.009, 0.011, 0.005, 0.012}, which means that the predicted class is the 4th as well, so we get the “correct prediction” feedback from the user. Then in the real label, the probability of the 4th class must be 1 and the others are 0, so the real label can be deduced and used as the fake label f = {0, 0, 0, 1, 0, 0, 0, 0, 0, 0}.

Wrong prediction: If the neural network makes a wrong prediction for a training case, we can construct a fake label in which the probability of the predicted class is 0, because the prediction is wrong. For the probabilities of the other classes, we do not know which should be 1, because the real class of the training case is unknown. Nevertheless, to make the sum of all probabilities equal to 1, we let the other classes share the probability of the predicted class equally. For example, assume that the real class of a training case is the 1st, but the prediction result is q = {0.010, 0.020, 0.010, 0.910, 0.003, 0.010, 0.009, 0.011, 0.005, 0.012}, i.e., the predicted class is the 4th; then we get the “wrong prediction” feedback from the user. Since the result of the neural network prediction is wrong, the probability of the 4th class must be 0. In order to ensure that the sum of the probabilities of all classes is 1, we divide the probability of the 4th class in the prediction result equally and add it to the probabilities of the other 9 classes. Finally, the constructed fake label is f = {0.1111, 0.1211, 0.1111, 0, 0.1041, 0.1111, 0.1101, 0.1121, 0.1061, 0.1131}, and the corresponding cross entropy loss is h = 4.7004, similar to that calculated with the real label.
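Putting the two cases together, a minimal NumPy sketch of the fake label generation described above could look as follows; the function name generate_fake_label and the vectorized form are our assumptions, since the full listing of Algorithm 1 is not reproduced here.

```python
import numpy as np

def generate_fake_label(q, prediction_correct):
    """Build a fake label from the softmax output q and the correctness feedback (CF)."""
    q = np.asarray(q, dtype=float)
    pred = int(np.argmax(q))            # class predicted by the network
    fake = np.zeros_like(q)
    if prediction_correct:
        # Correct prediction: the predicted class must be the real class.
        fake[pred] = 1.0
    else:
        # Wrong prediction: the predicted class gets probability 0, and its predicted
        # probability mass is shared equally among the remaining classes.
        fake = q + q[pred] / (len(q) - 1)
        fake[pred] = 0.0
    return fake

# Reproducing the "wrong prediction" example above:
q = [0.010, 0.020, 0.010, 0.910, 0.003, 0.010, 0.009, 0.011, 0.005, 0.012]
f = generate_fake_label(q, prediction_correct=False)
h = -np.sum(f * np.log(np.asarray(q)))   # about 4.70, close to the real-label loss
```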

5. Evaluation

5.1. Experimental Setup

In order to prove the necessity of retraining the neural network with personal data sets in embedded devices and to evaluate the fake label generation algorithm, the CNN shown in Fig. 2 is constructed with TensorFlow [25]. The size of each convolution kernel in the CNN is 5×5, and the strides in both width and height are 1 with the padding method of “SAME”. The kernel size of the max-pooling layer is 2×2, and the strides in both width and height are 2 with the padding method of “SAME”. The full-connection layer FC5 flattens the results of the last max-pooling layer, and dropout is adopted to reduce overfitting. Finally, the full-connection layer FC6 produces a result vector of length 10, and then the vector is mapped to a probability distribution over [0, 1] by the softmax algorithm.
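As a sketch, the described MNIST CNN could be assembled in Keras roughly as below; the 5×5 kernels, “SAME” padding, 2×2/stride-2 max pooling, flatten, dropout, and 10-way softmax follow the text, while the filter counts, the number of convolution/pooling stages, and the FC5 width are assumptions, since Fig. 2 is not reproduced here.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=5, strides=1, padding="same",
                           activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2, padding="same"),
    tf.keras.layers.Conv2D(64, kernel_size=5, strides=1, padding="same",
                           activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2, padding="same"),
    tf.keras.layers.Flatten(),                        # FC5 flattens the pooled features
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dropout(0.5),                     # dropout to reduce overfitting
    tf.keras.layers.Dense(10),                        # FC6: result vector of length 10
    tf.keras.layers.Softmax(),                        # probability distribution over [0, 1]
])
```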

To evaluate the training effect of using the fake labels generated with CF, a mechanism to simulate the CF scenario is shown in Fig. 6, in which a feedback simulator simulates a user giving CF by comparing the prediction result of the neural network with the real label, i.e., if the prediction is the same as the real label, the feedback simulator gives the “correct prediction” feedback; otherwise it gives the “wrong prediction” feedback. The feedback is then used for generating the fake label to train the neural network. The CNN shown in Fig. 2 is trained on the MNIST training data set from scratch with real labels and with fake labels generated from CF, respectively. The training is performed with an initial learning rate of 10^-4, a dropout rate of 0.5, and a batch size of 50. Moreover, the adaptive moment estimation (ADAM) [30] optimization is adopted. For each training step, the accuracy and loss on the MNIST test data set are measured, and the model is trained for 10000 steps in total.
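The feedback simulator and the fake-label training loop can be sketched as one training step like the following; it reuses the hypothetical generate_fake_label() from Section 4, and the function and variable names are ours. The simulator only uses the withheld real label to produce the one-bit correct/wrong feedback, while the loss itself is computed against the fake label.

```python
import numpy as np
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)   # ADAM, initial learning rate 10^-4

def train_step_with_cf(model, images, real_labels):
    """One fake-label training step driven by simulated correctness feedback (CF).

    real_labels is a NumPy array of integer class indices, used only by the simulator."""
    probs = model(images, training=False).numpy()            # softmax outputs for the batch
    preds = probs.argmax(axis=1)
    feedback = preds == real_labels                           # simulated CF: correct or wrong
    fakes = np.stack([generate_fake_label(q, ok)
                      for q, ok in zip(probs, feedback)]).astype(np.float32)
    with tf.GradientTape() as tape:
        outputs = model(images, training=True)
        loss = tf.reduce_mean(
            tf.keras.losses.categorical_crossentropy(fakes, outputs))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```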

In order to prove that the fake label generation algorithm also works well on other DNNs and data sets, the CNN for the CIFAR-10 [31] data set shown in Fig. 7 is built with TensorFlow. The input layer of this CNN has 3 channels corresponding to the three color channels (i.e., red, green, and blue) of the input images. The convolution kernel is 5×5 with strides of 1, and the pooling kernel has the size of 3×3 with strides of 2. Both the convolution layers and the pooling layers adopt the padding method of “SAME”. There are two local response normalization (LRN) [27] layers in the CNN: one after the first pooling layer S2 and the other after the second convolution layer C4. The CNN shown in Fig. 7 is trained on the CIFAR-10 training data set from scratch with real labels and with fake labels, respectively. The batch size of each training step is 128 and the model is trained for 250000 steps.
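Similarly, the CIFAR-10 CNN could be sketched as below; the 5×5 convolutions with stride 1, the 3×3/stride-2 pooling, the “SAME” padding, and the placement of the two LRN layers (after the first pooling layer S2 and after the second convolution layer C4) follow the text, while the filter counts, the dense head, and the LRN hyperparameters are assumptions, since Fig. 7 is not reproduced here.

```python
import tensorflow as tf

def lrn(x):
    # Local response normalization (LRN) [27]; TF's default parameters are an assumption.
    return tf.nn.local_response_normalization(x)

inputs = tf.keras.Input(shape=(32, 32, 3))                    # 3 color channels (RGB)
x = tf.keras.layers.Conv2D(64, 5, strides=1, padding="same", activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2, padding="same")(x)          # S2
x = tf.keras.layers.Lambda(lrn)(x)                            # LRN after the first pool layer
x = tf.keras.layers.Conv2D(64, 5, strides=1, padding="same", activation="relu")(x)   # C4
x = tf.keras.layers.Lambda(lrn)(x)                            # LRN after the second conv layer
x = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2, padding="same")(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(384, activation="relu")(x)
outputs = tf.keras.layers.Softmax()(tf.keras.layers.Dense(10)(x))
model = tf.keras.Model(inputs, outputs)
```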

As mentioned earlier in this paper, the DNN in embedded devices is mostly pre-trained on some public data sets; therefore, in the CF scenario of embedded systems, the model to be retrained with fake labels generated by CF is fully pre-trained on the training data set of MNIST beforehand and reaches an accuracy of 99.14% on the test data set of MNIST. To evaluate the retraining effect of fake labels in this scenario, we repeat the experiment in Section 3 on ten different people's data sets, with the only difference that the experiment is conducted twice on each person's data set, using the real labels and the fake labels, respectively, to retrain the neural network model corresponding to each person.

5.2. Results

Fig. 8 shows the comparison between the accuracy/loss trained with real labels and with fake labels on the MNIST data sets. It can be seen that the accuracy rise and the loss fall in the fake label training are slower than those in the real label training within the initial few training steps. However, as the training goes on, the accuracy and loss in the two cases gradually coincide, which illustrates that the fake labels generated with CF can effectively train the neural network from scratch.

Fig. 8. Comparison of the accuracy/loss trained with real labels and fake labels on the MNIST data sets.

The training curves of accuracy/loss with real labels and fake labels on CIFAR-10 are shown in Fig. 9. The training curves in Fig. 9 have the same trends as those in Fig. 8, which illustrates that fake labels generated with CF also work well for the CNN shown in Fig. 7 on the CIFAR-10 data sets. The fake label generation algorithm thus has both a good training effect and wide applicability.

Fig. 9. Comparison of the accuracy/loss trained with real labels and fake labels on the CIFAR-10 data sets.

The retraining effect of the fake labels can be seen from Table 2. The handwritten digits of each person listed in Table 2 are divided into two parts, one for training and the other for test. Meanwhile, the CNN shown in Fig. 2 is pre-trained on the training data set of MNIST and reaches an accuracy of 99.14% when tested on the test data set of MNIST. The pre-trained CNN model is then evaluated on each person's test data set, and the corresponding accuracy is shown in the “Before retraining” column. Subsequently, the pre-trained CNN model is retrained with the training data set of each person twice, using real labels and fake labels, respectively, and the retraining accuracy is shown in the “After retraining with real labels” and “After retraining with fake labels” columns, respectively. From the accuracy before retraining, we can see that each person gets different accuracy when the CNN model pre-trained on the public data sets is tested on the personal data set, because each person has his/her own handwriting habits. Therefore, the pre-trained neural network model cannot be applied directly to personal data sets in this scenario, i.e., retraining the pre-trained model with personal data sets is necessary. The accuracy after retraining with real labels indicates that after retraining with personal data sets, the accuracy on every person's test data set is improved a lot, which means that using personal data sets to retrain the neural network model pre-trained on the public data sets is necessary and effective. As can be seen from the comparison between the accuracy obtained after retraining with real labels and that with fake labels, the two retrained models have almost the same accuracy, and therefore the fake labels generated with CF can effectively retrain the neural network in embedded devices.

Table 2: Comparison of final accuracy after retraining with real labels and fake labels

Fig. 10 shows the retraining curves of accuracy/loss with real labels and fake labels on the ten people's personal data sets, respectively.

Fig. 10. Comparison of the accuracy/loss curves retrained with real labels and fake labels from ten different persons: (a) person Y's retraining curve, (b) person H's retraining curve, (c) person K's retraining curve, (d) person G's retraining curve, (e) person A's retraining curve, (f) person J's retraining curve, (g) person Z's retraining curve, (h) person B's retraining curve, (i) person P's retraining curve, and (j) person O's retraining curve.

        These training curves illustrate the following conclusions:

1) Even though the trends of these curves are very similar, different curves have different initial accuracy, final accuracy, rising slope of accuracy, and range of loss. These differences indicate that each person's data set has its own individual characteristics.

2) The accuracy rise and loss fall in the fake label retraining may be slower than those in the real label retraining within the initial few training steps, but as the retraining goes on, the accuracy and loss in the two cases gradually coincide, which illustrates that the fake labels generated with CF can effectively retrain the neural network.

In the field of deep learning, new ideas pop up every single week, bringing state-of-the-art techniques and higher accuracy. Most of these advanced neural networks need to be trained with labeled data, while users' data sets collected in embedded devices are unlabeled, so how to train the neural network model without labeled data is a problem to be solved. The fake label generation algorithm in this paper is a solution to this problem; therefore, the purpose of our method is not to improve the current state-of-the-art accuracy, but to provide a method with which these advanced neural networks can be trained even without labeled data and obtain almost the same training effect as training with real labeled data. A series of experiments is designed to prove the effectiveness of our method, comparing the fake label training effect with the real label training effect of the same neural network. All the results and conclusions above prove that the fake label generation algorithm is effective and widely applicable.

6. Conclusion

Because of the extra overhead of the training phase, many implementations of DNN in embedded devices only focus on the inference stage of the neural network. Even though some accelerators implement the training phase, they mainly optimize the performance and power consumption, and the data sets used for training usually do not get much attention. However, as the data processed by embedded devices, such as smartphones and wearables, are personalized, the DNN model trained on public data sets may have poor accuracy when inferring the personalized data sets collected in embedded devices, and this is proven by experiments in this paper.

Therefore, this paper proposes that retraining with data collected in embedded devices is necessary. Meanwhile, this paper also proves by experiments that retraining the pre-trained neural network model is effective. Furthermore, to solve the problem that the data collected locally are unlabeled, a fake label generation method is proposed, and the fake labels can both train the neural network from scratch and retrain the pre-trained model effectively. This work will be useful in many application scenarios of neural networks. For example, the handwriting input method in the smartphone can use the fake label generation method to retrain the neural network model, so that the recognition accuracy for a particular person can be improved gradually. Because each person has his/her own voice, a pre-trained speech recognition model may not work well for everyone; therefore, the voice-controlled devices in a smart home system can also use this method to improve the speech recognition accuracy. With this work, the accuracy of DNN models can be gradually improved with personal use, i.e., the more the user uses the device, the higher the accuracy of the neural network model becomes.

        Disclosures

        The authors declare no conflicts of interest.
