Jian ZHONG, Wei-ying HE, Han-song TAN
(1School of Computer Science & Engineering, Tianhe College of Guangdong Polytechnic Normal University, Guangzhou 510540, China) (2School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China) (3School of Computer Science and Engineering, Central South University, Changsha 410083, China)
Abstract: More natural and flexible gesture recognition technology is gradually becoming an important human-machine interface for intelligent mobile robot control. To further improve the real-time performance and accuracy of computer-vision-based robot control, a gesture recognition method combining Principal Component Analysis (PCA) dimensionality reduction with a machine learning algorithm is proposed. First, the gesture image captured by the vision camera is preprocessed, including image binarization, median filtering, and morphological transformation. Background subtraction is then used for feature extraction, after which PCA extracts the main features and reduces the dimensionality of the data. Finally, a self-organizing neural network, an advanced unsupervised machine learning model, is applied as the classifier for gesture recognition. Static gesture experiments show that, compared with the BP neural network and the K-means algorithm, the proposed method shortens recognition time and effectively improves recognition accuracy, thereby verifying its effectiveness.
Key words: Gesture recognition, Human-computer interaction, Visual control, SOM, Principal component analysis
As computers become increasingly integrated into human life, human-computer interaction devices such as keyboards, mice, and light pens have been widely used and have dominated as mainstream devices for decades. However, with the rapid development of artificial intelligence technology, the hardware limitations of these traditional interaction devices now constrain the speed and convenience of human-computer communication [1].
Intelligent robots, such as exoskeleton robots, automotive robots, and search-and-rescue robots, are one of the hot research directions in the current computer field. However, because robot systems usually involve large hardware, complicated programs, and cumbersome operation, traditional human-computer interaction equipment can no longer meet the control requirements in complex environments [2]. More natural and flexible human-computer interaction has become a practical problem that must be solved in the field of robot control.
Human gesture recognition and tracking is an emerging human-computer interaction method whose basic principle is computer vision processing. Its applications are extensive and can effectively improve the operational efficiency of human-computer interaction systems such as smart home systems, virtual reality systems, computer-aided design systems, and robot control systems [4]. The biggest advantage of gesture interaction is that it is simple, fast, and better aligned with people's habits: the user can complete the system's input functions through gesture operations alone.
However, as a difficult, multi-disciplinary technical problem, the real-time performance and accuracy of existing gesture recognition methods still cannot meet the requirements of large-scale commercial use, especially in the robot navigation control industry, which has a huge market scale and potential. Therefore, to further improve the real-time performance and accuracy of computer-vision-based gesture recognition, a gesture recognition method based on PCA dimensionality reduction and a machine learning algorithm is proposed. The main features are extracted by PCA, which also reduces the dimensionality of the data, and a self-organizing neural network is used as the classifier for gesture recognition. Static gesture experiments verified the feasibility and advancement of the proposed method.
As mentioned above, accurate, real-time gesture recognition for human-computer interaction plays a very important role in the future development of intelligent robots. Since the 1980s, gesture recognition has attracted the attention of scholars worldwide [5], and a variety of gesture recognition technologies have been proposed. In [6], a PSO-based data-glove gesture recognition technology is proposed, and a general robot gesture control model is established through feature extraction and normalization. In [7], a vision-based gesture recognition method and its implementation on a digital signal processor are presented; feature points are extracted by the internal maximum circle method and the round-cut method, which improves classification accuracy to a certain extent. In [8], gesture recognition based on a dynamic Bayesian network is proposed: a skin color model in HSV space is used for gesture positioning, and a dynamic Bayesian network model is established for recognition. In [9], a Kinect-based directional gesture detection algorithm is proposed, which achieves better accuracy by combining multiple algorithms. In [10], an RFID-tag-based wearable gesture detection system is designed to meet the requirements of Internet of Things systems. In [11], face recognition based on gradient-direction PCA is proposed, which effectively utilizes PCA subspace feature information; the research in [12] adds PCA weighting on that basis.
As an advanced unsupervised machine learning algorithm, the self-organizing neural network introduces the concept of competitive learning [13-14]. Compared with the BP neural network, its network structure is much simpler; compared with the K-means clustering algorithm, it can link classification results with clustering. At present, there are still few studies combining PCA dimensionality reduction with self-organizing neural networks, and no attempt has been made to apply the combination to gesture recognition. Therefore, based on the above analysis, the idea of this paper is to combine PCA dimensionality reduction and a self-organizing neural network to solve the problem of accurate gesture recognition in human-computer interaction.
Since each image is composed of pixels of different values, the original image must first be converted to a single-channel grayscale image so that its features can be better analyzed. Image binarization is then defined as follows.
dst(x,y) = v1, if src(x,y) > thresh; v2, otherwise
(1)
Where src(x,y) represents the value of a pixel in the original image, thresh represents a preset threshold, and v1 and v2 represent the two processed pixel values. The goal of image binarization is that each pixel can only be 0 or 255. The binarization of the gesture image is shown in Fig.1.
Fig.1 Image binarization
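As an illustrative sketch (not the paper's Matlab implementation), the thresholding rule of Eq.(1) can be written in a few lines of Python with NumPy; the threshold value 128 below is an arbitrary example, not one used in the paper:

```python
import numpy as np

def binarize(src, thresh, v1=255, v2=0):
    """Eq. (1): pixels brighter than `thresh` map to v1, all others to v2."""
    return np.where(np.asarray(src) > thresh, v1, v2)

# Toy 2x2 "image": only values above the threshold become foreground (255).
img = np.array([[12, 200],
                [130, 90]])
binary = binarize(img, thresh=128)
```

With v1 = 255 and v2 = 0, every output pixel is either 0 or 255, matching the stated goal of binarization.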
Image smoothing is used to suppress the interference of various kinds of noise (such as thermal noise and impulse noise) so that the accuracy of gesture recognition can be optimized while retaining as much image detail as possible. The method used here is median filtering, a nonlinear filter. Based on rank-order statistics, median filtering uses neighborhood pixels to remove salt-and-pepper noise while preserving sufficient detail. The algorithm is defined in Eq.(2).
g(x,y) = med{f(x-k, y-l), (k,l)∈w}
(2)
Where f(x,y) represents the original image, g(x,y) represents the processed image, and w is the set of offsets (k,l) within the adjacent rectangular window. The window size is typically 3×3 or 5×5. The filtering effect is shown in Fig.2.
Fig.2 Filter processing effect diagram
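A minimal sketch of Eq.(2) in Python, assuming a 3×3 window and replicated borders (the paper does not specify its border handling):

```python
import numpy as np

def median_filter(img, k=3):
    """Eq. (2): replace each pixel with the median of its k x k neighbourhood.
    Borders are handled by replicating edge pixels (an assumption here)."""
    img = np.asarray(img, dtype=float)
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

# A single salt-noise spike in a flat region is removed entirely,
# because the median of the 3x3 neighbourhood ignores the outlier.
noisy = np.zeros((5, 5))
noisy[2, 2] = 255
clean = median_filter(noisy)
```

This illustrates why median filtering removes salt-and-pepper noise while a mean filter would only smear it.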
Morphological transformation mainly consists of the dilation operation, the erosion operation, and the opening and closing operations. The dilation operation is defined in Eq.(3):
A⊕B = {z | (B̂)z ∩ A ≠ ∅}
(3)
Where A represents an image and B represents a structuring element, both subsets of Z², and (B̂)z denotes the reflection of B translated by z.
The erosion operation is defined in Eq.(4):
AΘB = {z | (B)z ⊆ A}
(4)
The opening and closing operations are defined in Eq.(5) and Eq.(6), respectively.
A°B=(AΘB)⊕B
(5)
A·B=(A⊕B)ΘB
(6)
Where the symbol ⊕ denotes the Minkowski sum operation.
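The four operations of Eqs.(3)-(6) can be sketched directly from their set definitions. The following Python code is an illustrative reimplementation (in practice a library routine such as OpenCV's morphology functions would be used); the 7×7 test image and 3×3 structuring element are arbitrary examples:

```python
import numpy as np

def dilate(A, B):
    """Dilation A ⊕ B (Eq. 3): a pixel is set wherever the reflected
    structuring element, shifted there, hits the foreground of A."""
    bh, bw = B.shape
    py, px = bh // 2, bw // 2
    P = np.pad(A, ((py, py), (px, px)))
    Br = B[::-1, ::-1]  # reflection of B
    out = np.zeros_like(A)
    for y in range(A.shape[0]):
        for x in range(A.shape[1]):
            out[y, x] = int(np.any(P[y:y + bh, x:x + bw] & Br))
    return out

def erode(A, B):
    """Erosion A Θ B (Eq. 4): a pixel survives only where B, shifted
    there, fits entirely inside the foreground of A."""
    bh, bw = B.shape
    py, px = bh // 2, bw // 2
    P = np.pad(A, ((py, py), (px, px)))
    out = np.zeros_like(A)
    for y in range(A.shape[0]):
        for x in range(A.shape[1]):
            out[y, x] = int(np.all(P[y:y + bh, x:x + bw][B == 1]))
    return out

def opening(A, B):
    """Opening A ∘ B = (A Θ B) ⊕ B (Eq. 5): removes specks smaller than B."""
    return dilate(erode(A, B), B)

# A 3x3 blob survives opening with a 3x3 element; a lone noise speck does not.
A = np.zeros((7, 7), dtype=int)
A[2:5, 2:5] = 1   # blob (e.g. part of a hand region)
A[0, 6] = 1       # isolated noise speck
B = np.ones((3, 3), dtype=int)
opened = opening(A, B)
```

This is exactly the role morphology plays in gesture preprocessing: opening cleans residual noise after binarization without destroying the hand silhouette.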
After applying the above preprocessing to the gesture data set, n pictures are obtained. The training variables can be expressed as x1, x2, …, xm, with means x̄1, x̄2, …, x̄m and standard deviations S1, S2, …, Sm. The normalization transformation is then given by Eq.(7):
x*ij = (xij − x̄j) / Sj,  i = 1, 2, …, n; j = 1, 2, …, m
(7)
First, if Y1 is the linear combination defined by the orthogonal unit eigenvector of the largest eigenvalue, so that the variance of Y1 is maximal, then Y1 is the first principal component.
Second, if Y2 is defined by the orthogonal unit eigenvector of the next eigenvalue, so that the covariance of Y1 and Y2 is zero and the remaining variance is maximal, then Y2 is the second principal component.
By analogy, up to m principal components can be obtained.
In the cumulative contribution rate calculation, the contribution rate of the i-th principal component Yi, with eigenvalue λi, is given by Eq.(8):
αi = λi / Σ(k=1..m) λk
(8)
Then the cumulative contribution rate of the first n principal components is given by Eq.(9):
α(n) = Σ(i=1..n) λi / Σ(k=1..m) λk
(9)
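The standardization and contribution-rate computations of Eqs.(7)-(9) can be sketched as follows. This is an illustrative NumPy version, not the paper's implementation; the random 100×10 matrix with one nearly redundant column is a made-up example:

```python
import numpy as np

def pca_components_needed(X, target=0.99):
    """Standardize X (Eq. 7), compute eigenvalue contribution rates (Eq. 8),
    and return the number of components whose cumulative contribution
    (Eq. 9) first reaches `target`."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # Eq. (7)
    cov = np.cov(Xs, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]    # descending eigenvalues
    rates = eigvals / eigvals.sum()                     # Eq. (8)
    cum = np.cumsum(rates)                              # Eq. (9)
    n = int(np.searchsorted(cum, target) + 1)
    return n, cum

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)  # nearly redundant feature
n, cum = pca_components_needed(X, target=0.99)
```

Because one column nearly duplicates another, fewer than the full 10 components are needed to reach the 99% cumulative contribution, which is the same dimensionality-reduction effect the paper exploits.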
As a typical extension of SOM, the LVQ neural network uses supervised learning of the cluster centers to classify the input data. Its structure generally includes three layers [15], as shown in Fig.3: an input layer, a hidden (competitive) layer, and an output layer. The hidden layer is mainly responsible for classifying the input vectors from the input-layer neurons.
Fig.3 LVQ neural network structure
The LVQ network algorithm has two variants: LVQ1 and LVQ2. LVQ1 is more commonly used, so this paper adopts the LVQ1 algorithm [16]. The steps are as follows:
Step 1 Initialize the weights between the input layer and the competitive layer, and the learning rate;
Step 2 Present the input vector X = (x1, x2, …, xR)T to the input layer (R is the number of input elements) and calculate the distance between each competitive-layer neuron and the input vector:
dj = sqrt( Σ(i=1..R) (xi − wij)² ),  j = 1, 2, …, Sl
(10)
Where Sl is the number of competitive-layer neurons and wij is the weight between input neuron i and competitive neuron j [17].
Step 3 Select the competitive-layer neuron closest to the input vector. If dj is the smallest, the class label of the output-layer neuron connected to it is Cj;
Step 4 Let Cx be the class label corresponding to the input vector. If Cj = Cx, adjust the weights according to Eq.(11);
Wij-new=Wij-old+η(x-Wij-old)
(11)
Otherwise, adjust the weights according to Eq.(12):
Wij-new=Wij-old-η(x-Wij-old)
(12)
Step 5 Return to Step 2 and repeat until the set number of iterations or error precision is reached. The flow of the LVQ-based gesture recognition method is shown in Fig.4; PCA reduces each gesture image to 8 dimensions, with a cumulative contribution rate of 99%.
Fig.4 Flow of the LVQ gesture recognition algorithm
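Steps 1-5 can be sketched as a short training loop. This is a minimal illustrative version, not the paper's Matlab code; the 2-D toy data set and single prototype per class are assumptions for demonstration:

```python
import numpy as np

def train_lvq1(X, labels, W0, proto_labels, eta=0.1, epochs=100):
    """LVQ1 (Steps 1-5): the winning prototype (Eq. 10) is pulled toward a
    same-class sample (Eq. 11) and pushed away from a wrong-class one (Eq. 12)."""
    W = W0.astype(float).copy()               # Step 1: initial weights
    for _ in range(epochs):                   # Step 5: repeat
        for x, c in zip(X, labels):
            d = np.linalg.norm(W - x, axis=1)  # Eq. (10): distances d_j
            j = int(np.argmin(d))              # Step 3: winning neuron
            if proto_labels[j] == c:           # Step 4: compare class labels
                W[j] += eta * (x - W[j])       # Eq. (11): attract
            else:
                W[j] -= eta * (x - W[j])       # Eq. (12): repel
    return W

def predict(W, proto_labels, x):
    """Classify x by the label of its nearest prototype."""
    return proto_labels[int(np.argmin(np.linalg.norm(W - x, axis=1)))]

# Two toy 2-D classes with one prototype each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
labels = [0, 0, 1, 1]
W0 = np.array([[0.2, 0.1], [0.8, 0.9]])
W = train_lvq1(X, labels, W0, proto_labels=[0, 1])
```

In the paper's setting the inputs would instead be the 8-dimensional PCA feature vectors, with 8 competitive neurons and 5 output classes.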
To analyze and verify the PCA-LVQ gesture recognition method proposed in this paper, experiments were carried out. The hardware environment was an Intel Core i7 2.2 GHz processor with 8 GB of memory; the software environment was the Windows 7 operating system with Matlab 7.0.
The IDIAP hand pose/gesture datasets were selected for static gesture recognition testing. 500 gesture samples were selected, of which 300 formed the training set and 200 the test set. For comparative analysis, under the same experimental environment, PCA-LVQ-based gesture recognition, K-means-based gesture recognition [16], and BP-based gesture recognition [17] were compared. The numbers of neurons in the input, competitive, and output layers of the LVQ neural network are 8, 8, and 5, respectively, and the learning rate is 0.1.
After 1 000 experiments, the training times of the three gesture recognition methods as the number of hidden-layer nodes increases are shown in Fig.5.
Fig.5 Comparison of training time
As can be seen from Fig.5, the training time of all three methods increases with the number of hidden-layer nodes. The K-means algorithm grows fastest, the BP network second, and the proposed method slowest. In addition, for the same number of hidden-layer nodes, PCA-LVQ gesture recognition requires the least training time, especially when the number of hidden-layer nodes is large. It can therefore be concluded that the recognition speed of PCA-LVQ is greatly improved.
When the number of hidden-layer nodes is 1 000, the gesture motion recognition results of the three algorithms are shown in Table 1. It can be seen from Table 1 that, compared with the other two algorithms, the accuracy of the PCA-LVQ algorithm stabilizes at about 90%, so it can be effectively applied to robot-navigation human-computer interaction systems.
Table 1 Performance comparison of three algorithms for gesture motion recognition
This paper proposes a gesture recognition method combining PCA dimensionality reduction with a machine learning algorithm. PCA extracts the main features and reduces the dimensionality of the data, and an LVQ neural network serves as the classifier for gesture recognition. The IDIAP hand pose/gesture datasets were selected for static gesture recognition testing. The results show that gesture recognition based on the PCA-LVQ neural network achieves better accuracy and real-time performance.