

        Robust Background Subtraction Method via Low-Rank and Structured Sparse Decomposition

China Communications, July 2018 issue

Minsheng Ma, Ruimin Hu*, Shihong Chen, Jing Xiao, Zhongyuan Wang

        1 National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China

        2 Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China

        3 Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China

Abstract: Background subtraction is a challenging problem in surveillance scenes. Although low-rank and sparse decomposition (LRSD) methods offer an appropriate framework for background modeling, they fail to account for the image's local structure, which is favorable for this problem. Motivated by this, we propose a background subtraction method via low-rank and SILTP-based structured sparse decomposition, named LRSSD. In this method, a novel SILTP-inducing sparsity norm is introduced to enhance the structured representation of the foreground region. As an assistance, saliency detection is employed to render a rough shape and location of the foreground. The final refined foreground is decided jointly by the sparse component and the attention map. Experimental results on different datasets show its superiority over the competing methods, especially under noise and changing illumination scenarios.

        Keywords: background subtraction; LRSD;structured sparse; SILTP

        I. INTRODUCTION

As an essential step in video analysis, background subtraction has always been a challenging task [1]. Among its numerous challenges, illumination variations and noise are of particular concern. Many methods have been proposed in the previous decade to handle such multi-modal scenarios [2-10]. Nonetheless, these methods tend toward generic applicability rather than being optimized for particular situations. For example, the spatial structure information of adjacent pixels is often ignored, which leads to many limitations in real video scenarios [11].

Recently, LRSD-based (Low-Rank and Sparse Decomposition) methods have received wide attention [12-16]. They offer an appropriate framework for background modeling by dividing the input video matrix into two parts, a low-rank matrix and a sparse matrix, which represent the background and the moving foreground respectively. In [17], Bouwmans et al. provided a preliminary review and comparative evaluation of decomposition into low-rank plus additive matrices for foreground/background separation. The main assumption made on the background is that any variation in its appearance is highly constrained and can be captured by a low-rank matrix [18]. Generally, Principal Component Analysis (PCA) and its extensions can be used to address the high matrix dimension, and have been widely studied and applied in background subtraction [15, 16, 19, 20, 21]. As representative algorithms, Wen et al. used motion as a priori knowledge and proposed the JVFSD-RPCA method for background subtraction, validating its competitiveness [15]. In 2013, the DECOLOR framework was proposed to handle non-rigid motion; however, it may misclassify unmoved objects as background, so it is not suitable for real-time object detection [16]. Liu et al. proposed a two-pass RPCA process which interleaves a motion saliency estimation step for background subtraction, where a new spatio-temporal group sparsity constraint is employed to enhance detection performance [21].

Structural information always exists in actual scenes; however, most LRSD-based algorithms do not take into account the structural features of frames in terms of spatial contiguity and locality around the foreground region, making them less robust in noisy and illumination-varying scenes. On the other side, as efficient local image descriptors, the Local Binary Pattern (LBP) and its variants and extensions have been successfully applied in many fields, such as face recognition and foreground detection. Among these algorithms, the Scale Invariant Local Ternary Pattern (SILTP) can tolerate more image noise and illumination changes due to its scale invariance property [11, 22, 23].

        Inspired by the previous works of LRSD-based and SILTP methods, we propose a novel background subtraction method via low-rank and SILTP-based structured sparse decomposition, named LRSSD. Experiments show that our method can provide better robustness against noise and scale or illumination variations under real surveillance scenarios.

        The main contributions of our research can be summarized as:

1). We formulate a novel SILTP-inducing structured sparsity norm and give the details of the sparse matrix decomposition. This formulation provides robustness to scale and illumination variations and tolerance to local image noise.

2). We propose a new background subtraction framework for surveillance video which fuses saliency detection and our proposed LRSSD method. Saliency detection is used to obtain a rough shape and location of the foreground; the attention map together with the sparse component then decides the final foreground.

The rest of the paper is organized as follows. Section II introduces related research. Section III presents the proposed robust background subtraction method via LRSSD, including the detailed background subtraction framework and process. Section IV presents the experiments and discusses the proposed method. Finally, the paper is concluded in Section V.


        II. RELATED WORKS AND PROBLEM SETUP

        2.1 RPCA-based methods for background subtraction

RPCA-based methods have been widely studied for background subtraction. In [24], the RPCA-LBD method was proposed for foreground detection. However, the block-sparsity property of RPCA-LBD only considers implicit structural information of images, and thus cannot constrain the outlier matrix. Andrews et al. [25] proposed a double-constrained RPCA, in which the sparse component is constrained by attention maps, aiming to improve foreground/background segmentation accuracy and robustness, especially in dynamic scenes. However, this method is not robust enough against noise and illumination changes because it lacks temporal structure information. More recently, Ebadi et al. [26] proposed an approximated RPCA for background subtraction. A tree-structured sparsity model was dynamically applied to describe real moving foreground objects, which showed better performance especially on dynamic-background videos. Although the model can adapt to slow-changing scenes, it cannot subtract the background effectively under sudden light changes.

In addition, some methods focus on improving the norm expressions. Wright et al. [20] first proposed to use the ℓ1-norm to constrain the sparse matrix. However, the ℓ1-norm does not consider any particular structure or possible correlations among blocks [27], so the ℓ1-minimization formulation is not suitable for modeling illumination and shadow variations. Many algorithms have been proposed to remedy this inherent defect [11, 23, 24, 27].

In 2014, Liu et al. [11] proposed a low-rank and structured-sparse matrix decomposition method for foreground detection, which could tolerate dynamic background variations to some extent. Consider n sequential grayscale images (frames) F1, …, Fn captured by a fixed camera, where Ft ∈ ℝ^(I1×I2) is the frame at time t, and I1 and I2 denote the numbers of image rows and columns respectively. The matrix A ∈ ℝ^(m×n) (m = I1 × I2) is defined as the mapping matrix of the observation video, where each column is a vectorized frame.

To describe the algorithm clearly, we use the following predefined matrix norms in this paper:

The infinity norm ‖X‖∞ corresponds to the largest absolute value among all elements of the matrix.

The structured sparsity-inducing norm (SSN) can be defined as follows [23]:

Ω(S) = Σ_{j=1}^{n} Σ_{g∈𝒢} ‖(s_j)_g‖∞   (1)

where S ∈ ℝ^(m×n), s_j denotes the j-th column of S, n is the number of columns, g ∈ 𝒢 stands for an overlapping group, and each group contains a subset of the row indices {1, …, m}.

To exhibit the superiority of this method, we first assume two sparse matrix distributions [23]. According to Eq. (1), the norm takes the maximum value within each group and sums these maxima. Unlike the ℓ1-norm, which focuses on each individual pixel, the structured sparsity-inducing norm pays more attention to the structure and correlation within each group. Inspired by the advantage of the SSN, we introduce the SILTP operator to construct a SILTP-inducing norm for background subtraction, which is presented in Section III.
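To make the group-max-then-sum behavior of Eq. (1) concrete, the following is a minimal sketch; the group layout here is a toy example of our own choosing, not the paper's 3 × 3 overlapping patches:

```python
import numpy as np

def structured_sparsity_norm(S, groups):
    """Sum, over columns and overlapping groups, of the per-group
    infinity norm (largest absolute value), per Eq. (1)."""
    total = 0.0
    for j in range(S.shape[1]):
        col = S[:, j]
        for g in groups:                      # g: list of row indices
            total += np.max(np.abs(col[g]))   # ||(s_j)_g||_inf
    return total

# toy example: 4-pixel frames, two overlapping groups
S = np.array([[0.0, 1.0],
              [2.0, 0.0],
              [0.0, 0.0],
              [0.5, 0.0]])
groups = [[0, 1], [1, 2, 3]]
print(structured_sparsity_norm(S, groups))  # 2 + 2 + 1 + 0 = 5.0
```

Because a large entry shared by two overlapping groups is counted once per group, contiguous activations cost less per pixel than isolated ones, which is exactly the structural bias the text describes.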

        2.2 SILTP operator

Proposed by Ojala et al., LBP has been proved to be an effective local image descriptor [28]. However, when local image noise is present or neighboring pixel values are similar, the LBP operator becomes less effective [22]. Many LBP variants and extensions have been proposed to solve this problem [22, 29, 30]. To improve the robustness of local comparison operators under changing illumination, some research focuses on enhancing intensity scale invariance: when global or local illumination changes suddenly, the scaling transform is roughly a constant factor. In 2016, Guo et al. [31] used a Local SVD Binary Pattern feature to build an adaptive background model, which proved robust against illumination changes by exploiting the latent structure of local regions. As another representative algorithm, Liao et al. proposed the scale invariant local ternary pattern operator (SILTP) for background subtraction, and showed that it tolerates scale and illumination variations to some degree, especially in the presence of moving shadows [22].

Given image F, the following formulation represents the SILTP operator:

SILTP_{N,D}^{τ}(x_c, y_c) = ⊕_{k=0}^{N−1} s_τ(I_c, I_k)   (2)

where (x_c, y_c) is the location of the central pixel c in the image, D is a predefined circle radius, and N is the number of pixels located in the D-radius neighborhood of pixel c. Moreover, I_c is the gray value of pixel c, and I_k are the gray values of the N neighborhood pixels surrounding c; ⊕ stands for the concatenation operator of binary strings, and τ is a small scale factor indicating the comparing range. s_τ can be expressed with the following piecewise function:

s_τ(I_c, I_k) = "01" if I_k > (1 + τ)I_c;  "10" if I_k < (1 − τ)I_c;  "00" otherwise.   (3)

According to Eq. (3), there are three possible values for each comparison ("01", "10" and "00"). To enhance the discriminative ability of feature extraction, the SILTP operator uses two binary bits for encoding (see Eq. (3), where "11" is not defined).

To verify the advantage of the SILTP operator, we set τ = 0.1. As shown in figure 1, when a small noise is introduced (second row), or the intensity scale doubles (third row), the SILTP code of each pixel remains the same as in the original image (right column).

The experiments show that the SILTP operator can tolerate local noise and has a scale invariance property; it is therefore insensitive to global illumination changes. Taking advantage of these properties, we construct a SILTP-inducing structured sparsity norm for background subtraction in this paper.
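As an illustration of the scale invariance discussed above, the following is a minimal single-pixel SILTP sketch, assuming N = 8, D = 1 (the function name and toy image are ours, not the paper's code):

```python
import numpy as np

def siltp_pixel(img, x, y, tau=0.1):
    """SILTP code of pixel (x, y) over its 8-neighborhood (N=8, D=1),
    a sketch of Eqs. (2)-(3)."""
    Ic = float(img[x, y])
    bits = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                      # skip the center pixel
            Ik = float(img[x + dx, y + dy])
            if Ik > (1 + tau) * Ic:
                bits.append("01")
            elif Ik < (1 - tau) * Ic:
                bits.append("10")
            else:
                bits.append("00")             # within the tolerance band
    return "".join(bits)                      # concatenation operator

img = np.array([[90, 100, 110],
                [100, 100, 100],
                [100, 100, 200]], dtype=float)
code = siltp_pixel(img, 1, 1)
# scale invariance: doubling all intensities leaves the code unchanged,
# because both thresholds (1 +/- tau) * Ic scale by the same factor
assert siltp_pixel(img * 2, 1, 1) == code
```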

        III. PROPOSED METHOD

        3.1 Structured sparse norm

In 2015, an approximated RPCA algorithm was presented to cope with some intrinsic constraints of PCA [32], which decomposes the original video sequence matrix A into three components. This matrix decomposition can be expressed as A = L + S + E, where L is a low-rank matrix which stands for the background part, S represents the moving foreground component, and E is a residual noise term. Figure 2 shows the video segmentation process based on low-rank and sparse matrix decomposition. The rightmost image is the foreground mask (outliers), which is acquired by thresholding the matrix S.

To make sparse patterns and operators more structural, we propose a novel structured sparsity-inducing norm and integrate it into the LRSD-based background subtraction framework. Specifically, we incorporate the normalized SILTP operator into the original sparse norm formula to induce locally structured sparsity. Here, locally structured sparsity refers to the foreground movement region (sparse component) in surveillance video, which exhibits consistency of local spatial structure, continuity of motion, and similarity of texture and color.

        Fig. 1. Robustness of SILTP under local noise and global scale changes.

As mentioned in Section 2.1, the input image sequences can be expressed by an observed matrix A ∈ ℝ^(m×n). To take advantage of the correlation and structure among pixels, a new SILTP-inducing norm involving overlapping groups and structured sparsity is constructed for background subtraction. Since the original SILTP descriptor cannot be directly used for modeling the background due to its binary-string form, further normalization and conversion are both necessary. Here we first employ the pattern KDE method to estimate the local pattern kernel [22], obtaining the density function of local pattern q at time t. We then calculate the density function for each pixel and normalize the weights to sum to 1. Finally, an estimate of the probability of the local pattern p_t is obtained, which can be used for the background (or foreground) pixel decision.
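The normalized pattern-probability step can be illustrated with a simple per-pixel frequency estimate. Note this is a deliberate simplification of the pattern KDE of [22]; the function below is our illustrative stand-in, not the paper's estimator:

```python
from collections import defaultdict

def pattern_probability(history, current_code):
    """Frequency estimate of a local SILTP pattern at one pixel.
    `history` is the list of SILTP codes observed at this pixel
    over past frames; the result is normalized so that the
    probabilities over all observed patterns sum to 1."""
    counts = defaultdict(int)
    for code in history:
        counts[code] += 1
    return counts[current_code] / len(history)

# a pixel that showed pattern "0001" in 3 of 4 past frames
history = ["0001", "0001", "0100", "0001"]
print(pattern_probability(history, "0001"))  # 0.75
```

A high probability indicates the current pattern matches the background model; a low one flags the pixel as candidate foreground.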

        Fig. 2. Video segmentation process based on low-rank and sparse matrix decomposition.

        Table I. Comparison of our proposed LRSSD formulation with related works.

In [11] and [23], Liu et al. proposed a structured sparsity norm by defining 3 × 3 overlapping-patch groups. Our SILTP-inducing sparsity norm is built from SILTP and involves the neighboring pixels. Both methods use multiple spatially adjacent pixels to constitute the norm and share a certain formal similarity, so our proposed SILTP-inducing norm can likewise be regarded as "local structured". Compared to the ℓ1-norm, which only promotes independent sparsity between pixels, the new SILTP-inducing sparsity norm can promote error sparsity patterns with contiguity and spatial locality. We incorporate the proposed SILTP-inducing norm term into the LRSD framework, and name the result LRSSD. LRSSD can make full use of the correlation between pixels and, thanks to the scale invariance of SILTP, enhance the structured sparsity of the sparse component, which can be expressed as:

min_{L,S,E} ‖L‖_* + λ1 Δ(S) + λ2 ‖E‖_F²   s.t.  A = L + S + E   (5)

where Δ(S) denotes the SILTP-inducing sparsity norm. As before, L is a low-rank matrix which stands for the background part, S represents the moving foreground component, and E is a residual noise term. The SILTP-inducing norm is computed from the neighborhood pixels around each center point in space, so it can be regarded as "local structured". Table 1 shows the formulation comparison between the proposed method and four recent LRSD-based methods, namely Oreifej et al. [33], Ye et al. [34], Andrews et al. [25] and Liu et al. [11].

        3.2 Solving the LRSSD

The proposed LRSSD formulation is listed in Eq. (5). We select the Augmented Lagrange Multiplier (ALM) method to solve this optimization problem. Compared with other matrix decomposition solvers, such as the accelerated proximal gradient method, ALM achieves a better balance between efficiency and performance.

The augmented Lagrangian function can be formulated as [35]:

ℒ(L, S, E, Y, μ) = ‖L‖_* + λ1 Δ(S) + λ2 ‖E‖_F² + ⟨Y, A − L − S − E⟩ + (μ/2)‖A − L − S − E‖_F²   (6)

where Y ∈ ℝ^(m×n) is the Lagrange multiplier matrix, μ is a small positive scalar serving as the penalty parameter for violation of the linear constraint, and ⟨·,·⟩ denotes the matrix inner product.

Through continuous recursive computation among L, S and E, the variable Y is updated, and convergence is eventually reached. In this way, Eq. (5) can be solved by ALM gradually. These steps can be divided into the following sub-problems:

where the initial value of L can be solved by first fixing S and E. Y can be explicitly updated as:

Y_{t+1} = Y_t + μ_t (A − L_{t+1} − S_{t+1} − E_{t+1})

Similar to Eq. (9), the solution of S_{t+1} in Eq. (8) can be expressed as:

where Δ(S) is the SILTP-inducing sparsity norm as defined in Eq. (5). The form in Eq. (11) turns out to be a proximal operator, which can be evaluated by solving a quadratic min-cost flow problem.

We develop the solutions for the updating process in Eq. (8) following [23, 33]. Our decomposition steps can be summarized as the following process (see Algorithm 1).

Here, svd(·) stands for a full Singular Value Decomposition of a matrix, and G_α(·) is the soft-thresholding (shrinkage) operator, which can be defined as:

G_α(x) = sgn(x) · max(|x| − α, 0)   (12)

G_α(·) is then applied to matrix X in an element-wise way. The detailed decomposition and solving steps of LRSSD are given in Algorithm 1. The loop ends and convergence is reached when Eq. (13) is satisfied:

‖Z‖_F / ‖A‖_F < ε,  with Z = A − L − S − E   (13)

where Z ∈ ℝ^(m×n) is the residual and ε is the error tolerance parameter. The parameter λ1 is a scalar defined as the weighting parameter of the sparse component. Similarly, λ2 is a scalar denoting the weighting parameter of the noise component. In addition, ρ is a constant scalar denoting the growth factor of the parameter μ.
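The overall ALM loop can be sketched as follows. This is an illustrative stand-in, not the paper's Algorithm 1: the SILTP-inducing proximal step for Δ(S) is replaced by plain ℓ1 shrinkage, and the parameter choices (λ1 default, μ initialization and cap) are conventional RPCA defaults rather than values from the paper:

```python
import numpy as np

def shrink(X, alpha):
    """Soft-thresholding (shrinkage) operator, cf. G_alpha in Eq. (12)."""
    return np.sign(X) * np.maximum(np.abs(X) - alpha, 0.0)

def lrsd_alm(A, lam1=None, lam2=1.0, rho=1.5, eps=1e-7, max_iter=500):
    """ALM sketch for  min ||L||_* + lam1*Delta(S) + lam2*||E||_F^2
    s.t. A = L + S + E.  The SILTP-inducing proximal step for Delta(S)
    is replaced here by plain l1 shrinkage (illustrative stand-in)."""
    m, n = A.shape
    if lam1 is None:
        lam1 = 1.0 / np.sqrt(max(m, n))       # conventional RPCA default
    L = np.zeros_like(A); S = np.zeros_like(A); E = np.zeros_like(A)
    Y = np.zeros_like(A)
    mu, mu_max = 1.25 / (np.linalg.norm(A, 2) + 1e-12), 1e10
    for _ in range(max_iter):
        # L-update: singular value thresholding of A - S - E + Y/mu
        U, sig, Vt = np.linalg.svd(A - S - E + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt
        # S-update: proximal step (l1 shrinkage stands in for Delta(S))
        S = shrink(A - L - E + Y / mu, lam1 / mu)
        # E-update: closed form for the lam2*||E||_F^2 ridge term
        E = (mu * (A - L - S) + Y) / (2.0 * lam2 + mu)
        Z = A - L - S - E                     # residual of the constraint
        Y = Y + mu * Z                        # multiplier update
        mu = min(mu * rho, mu_max)            # grow mu by factor rho
        if np.linalg.norm(Z, 'fro') <= eps * np.linalg.norm(A, 'fro'):
            break                             # Eq. (13) satisfied
    return L, S, E
```

On a toy matrix built as a rank-one background plus a sparse spike, the loop returns components whose sum reconstructs the input to within the tolerance of Eq. (13).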

        3.3 Framework for background subtraction

        In this section, we give details of the proposed background subtraction process (see figure 3).

(a) As a preliminary step, for the sake of efficient computation, scaling down (sub-sampling at a four-to-one ratio) and grayscale conversion are performed.

(b) Foreground attention map detection. The surroundedness cue is first used for foreground localization. Subsequently, a salient object detection model pops out the salient moving foreground using the BMS method (Boolean Map based Saliency model) proposed by Zhang et al. [36], chosen for its high speed and accuracy. The final attention map of the surveillance video, obtained under shape and region constraints, reinforces the likelihood that pixels belong to the moving objects.

        Fig. 3. The outline of our proposed background subtraction method.

        Table II. The details of partial test sequences in the experiments

(c) A low-rank and structured sparse decomposition is performed with our proposed LRSSD method; the original video is then separated into two parts: background (low-rank component) and foreground (sparse component, including noise and residual). The detailed solving process is given in Section 3.2.

(d) The refined foreground is jointly decided by the attention map of step (b) and the sparse component of step (c); meanwhile, the noise in the sparse component of step (c) is removed effectively.
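Step (d) can be sketched as a per-pixel conjunction of the two cues; the threshold values below are illustrative placeholders of our own, not values from the paper:

```python
import numpy as np

def refine_foreground(sparse_mag, attention, t_sparse=0.5, t_att=0.5):
    """Hedged sketch of step (d): keep only pixels that are both
    sparse (large magnitude in the LRSSD sparse component) and
    salient (high value in the attention map)."""
    return (sparse_mag > t_sparse) & (attention > t_att)

S_mag = np.array([[0.9, 0.1],
                  [0.8, 0.7]])     # |sparse component| per pixel
att   = np.array([[1.0, 1.0],
                  [0.2, 0.9]])     # attention map per pixel
mask = refine_foreground(S_mag, att)
# only pixels passing both tests survive: (0,0) and (1,1);
# the noisy sparse response at (1,0) is suppressed by low saliency
```

The conjunction is what removes sparse-component noise: an outlier pixel with no saliency support is rejected, matching the text's description of the refinement.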

        IV. EXPERIMENTAL RESULTS

        4.1 Experimental setup

We validate the proposed method with a two-phase experiment. All experiments are carried out with MATLAB R2013a and OpenCV 2.4.3, on a PC with a 2.40 GHz Intel Core i5 and 8 GB RAM. The experiments are conducted on the CDnet 2014 dataset [37], the BMC dataset [38] and the PETS 2001 dataset [39]. All of these datasets provide realistic, camera-captured (no CGI), diverse sets of videos. Our experimental tools include BGSLibrary [40], LRSLibrary [41], BMC Wizard [38], BMS [36], etc.

We perform experiments and compare with four recent pixel-based modeling methods, GMM [7, 8], KDE [9], ViBe [10] and LSBP [31], and two recent LRSD-based algorithms, SCM-RPCA [25] and Liu et al. [11]. For quantitative analysis, we adopt the F-measure (F1) to evaluate the performance of background subtraction:

F1 = 2 × Precision × Recall / (Precision + Recall)   (14)

In general, a bigger F1 means a better detection result.
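The F-measure referenced above (Eq. (14)) can be computed from pixel-level counts; a minimal helper (the example counts are made up for illustration):

```python
def f_measure(tp, fp, fn):
    """F1 = 2PR / (P + R), computed from true-positive,
    false-positive and false-negative pixel counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 80 true positives, 20 false positives, 20 false negatives
print(f_measure(80, 20, 20))  # 0.8
```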

        4.2 Results and analysis

For the first phase of the experiment, we choose eight representative sequences from the three datasets. Table 2 shows the details of the selected sequences. Part of the parameter settings are as follows: N = 4, D = 1, τ = 0.05 and N = 8, D = 2, τ = 0.1, where, as in Eq. (2), D is a predefined circle radius, N is the number of neighborhood pixels surrounding pixel c, and τ is a small scale factor. The values of these thresholds and parameters come from existing research or empirical tuning, and work as a good compromise among the constraints of the optimization in all test sequences. Moreover, considering the many small values in the sparse matrix S, a thresholding criterion is essential to avoid losing important information. To obtain that threshold, we adopt the same strategy as in [18]; for example, λ1 is set accordingly, and the convergence criterion follows Eq. (13).

        We visualize the qualitative results of the proposed method on part of test sequences in figure 4. Subjective observation shows that our method can get a better segmentation effect.

        Fig. 4. Qualitative results of our method on eight videos from CDnet 2014, BMC and PETS 2001 datasets.

As shown in table 3, over the eight sequences, our proposed LRSSD method gives the highest average F-measure among all methods.

Specifically, for the noise and illumination change challenges of the test videos, including Highway, Pedestrians, Backdoor and Video06, our algorithm detects the complete and correct moving foreground region, and thus achieves a higher F-measure. The other methods perform well only in certain situations due to their inherent limitations. Furthermore, our proposed method has the highest F-measure on the Pedestrians, Backdoor and Video03 sequences, which shows that it performs better in the case of noise and shadow changes.

        Table III. Performance of F-measure (%) on selected test sequences.

        Table IV. Average execution times(s) of our and the comparison methods.

        Table V. Performance comparison with six methods on the CDnet 2014 dataset

For the sake of completeness, we compare the average processing times of the LRSD-based methods. The experiments were carried out 5 times over the eight sequences, and the average execution times are listed in table 4. We noticed that pre-processing and saliency detection cost only a little time; the primary cost comes from the SVD computation in each iteration. The experiments on the 8 sequences show that our algorithm is comparable to the other LRSD-based algorithms, so it is competitive in speed. Even though our algorithm costs more time than the pixel-based modeling methods, it remains competitive in performance, and we will focus on improving computational efficiency and real-time applicability in future work.

For the second phase of the experiment, we execute a comprehensive evaluation on the CDnet 2014 dataset and compare with other methods. This dataset contains 11 video categories, each with 4 to 6 video sequences, covering common challenges such as shadows, dynamic background, camera jitter, night scenes, etc. Performance evaluation results are shown in table 5, from which we can see that the F-measure (F1) of our method outperforms most of them. Due to the scale invariance of SILTP, our SILTP-inducing structured sparsity norm enhances the spatial structure of the foreground pixels and makes the method robust to noise and illumination changes. Meanwhile, our hybrid of LRSD and saliency detection takes into account the structural properties of images and the spatial distribution of outliers, making the final detection results more accurate.

        V. CONCLUSION

In this paper, we propose a novel LRSD-based background subtraction method for surveillance video. To facilitate efficient foreground and background segmentation, we construct a SILTP-inducing structured sparse norm and provide an optimization algorithm for the sparse matrix decomposition. The new SILTP-inducing sparsity norm better utilizes known structural information, improving error sparsity patterns with the properties of contiguity and spatial locality. We then employ saliency detection to restrict the foreground region attributed to moving objects, which offers a rough shape and location of the foreground. The refined foreground is decided by the attention map and the sparse component together. Experimental results on different datasets show its superiority over the competing methods, especially under noise and changing illumination. Meanwhile, we noticed that SILTP is easily corrupted by noise when the neighborhood pixels are very similar to the center one, so when the textured region of a video is large or prominent, detection accuracy may suffer. Since our research mainly targets surveillance video applications, the possible impact of video texture regions is not the primary concern of this paper. In the future, we will pay more attention to further improving robustness against global noise changes as well as computational efficiency.

        ACKNOWLEDGEMENTS

The authors would like to thank the reviewers for their detailed reviews and constructive comments, which have helped improve the quality of this paper. This work was supported in part by the EU FP7 QUICK project under Grant Agreement No. PIRSES-GA-2013-612652*, the National Natural Science Foundation of China (No. 61671336, 61502348, 61231015, 61671332, U1736206), the Hubei Province Technological Innovation Major Project (No. 2016AAA015, No. 2017AAA123), the Fundamental Research Funds for the Central Universities (413000048), the National High Technology Research and Development Program of China (863 Program) No. 2015AA016306, and the Applied Basic Research Program of Wuhan City (2016010101010025).
