

        Multi-focus image fusion based on block matching in 3D transform domain


YANG Dongsheng, HU Shaohai*, LIU Shuaiqi, MA Xiaole, and SUN Yuchao

1. Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; 2. Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China; 3. College of Electronic and Information Engineering, Hebei University, Baoding 071002, China; 4. The Third Research Institute, China Electronics Technology Group Corporation, Beijing 100015, China

        1.Introduction

Image fusion [1] refers to the process of obtaining a single synthesized image from two or more source images. The fused image provides a more comprehensive, accurate and reliable description of the scene and is widely used in other image processing and computer vision tasks. Pixel-level fusion methods can be broadly classified into two groups [2]: spatial domain fusion and transform domain fusion.

Currently, the most frequently used methods are based on multi-scale transforms, where fusion is performed on several scales and directions independently. The most typical transform is the discrete wavelet transform (DWT) [3], which is widely used because of its favorable time-frequency characteristics. After the DWT, a series of improved multi-scale transforms appeared, e.g., the Ridgelet [4], Curvelet [5] and Contourlet [6] transforms. Among these, the non-subsampled Contourlet transform (NSCT) [7,8] has been widely adopted owing to its multi-resolution and multi-directional properties. With the introduction of the non-subsampled Shearlet transform (NSST) [9], there is no longer a limit on the number of decomposition directions, so both the effectiveness and the efficiency of image fusion have been enhanced.

Usually, the transform domain fusion framework is relatively fixed. The first step is the multi-scale transform. The coefficients obtained from this step can be divided into a low-frequency component and several high-frequency components. These components are fused by different fusion rules, because the low- and high-frequency components represent approximation and detail information, respectively. The final fused image is constructed by applying the inverse transform to all composite coefficients.

With the introduction of more transform domain methods, the fusion framework has been greatly enriched. For example, the intensity, hue and saturation (IHS) transform was widely introduced into fusion frameworks to achieve color image fusion [10], and the pulse coupled neural network (PCNN) structure has been combined with the selection of coefficients [11]. However, in these frameworks the source images are directly transformed into the transform domain, which leads to the loss of specific spatial domain characteristics such as edge contours and spatial similarity. Since the spatial information cannot be further used, this usually causes distortion or artificial textures in the fused images.

The proposed framework improves on this in two main aspects. First, prior to the transform, it adds spatial domain pre-processing steps, namely blocking and grouping. Second, it replaces the 2D multi-scale geometric transform with a new type of 3D transform. The other steps are basically the same as in the existing framework, except for an aggregation procedure after the inverse transform. This structure is partially similar to that of an image de-noising algorithm, block-matching and 3D filtering (BM3D) [12,13]. BM3D is one of the best performing image de-noising algorithms to date and has been widely used in image and video noise reduction.

The main reason for the excellent performance of BM3D is that it makes good use of the similarity of the noise signal across similar blocks. Since matching and grouping image blocks achieves a better separation of the noise signal within similar regions, the algorithm outperforms traditional ones. Thus, spatial domain information can assist the transform domain processing to achieve outstanding performance. With a BM3D-like structure, i.e., introducing blocking and grouping steps before switching to the transform domain, the similarity in the spatial domain can be fully exploited.

The transforms adopted here are: the 3D transform with the 2D discrete cosine transform (DCT) (BMDCT), the 3D transform with the 2D DWT (BMDWT), the 3D transform with the 2D NSCT (BMNSCT), and the 3D transform with the 2D NSST (BMNSST). As for fusion rules, we mainly adopt choose-max and averaging, image region energy [14] and choose-max by intra-scale grouping [15].

This paper is organized as follows. In Section 2, we introduce the proposed fusion framework in detail. The specific procedures of blocking, matching and grouping can be found in Section 3, and the 3D transform and the other transform domain processes are given in Section 4. Experimental results and analysis are presented in Section 5. Finally, Section 6 contains the relevant conclusions.

        2.Fusion framework

Spatial domain and transform domain techniques are the two major pixel-level techniques. In terms of structure, spatial domain techniques are usually flexible and varied, because they typically combine the input images in a linear or non-linear fashion using weighted-average or variance-based algorithms [16]. The structure of most transform domain methods, however, is relatively fixed, because most of the innovation happens in the transform domain, where the actual fusion takes place. The main motivation for moving into the transform domain is to work within a framework where the salient features are more clearly depicted.

        2.1 Improvement of existing frameworks

A typical transform domain fusion framework is described in Fig.1. Both source images A and B are first transformed into a transform domain (e.g., the DWT domain). Each input image is then represented by a low-frequency coefficient and a series of high-frequency coefficients. The fused low- and high-frequency components are obtained from the coefficients of both images through different fusion rules. The final fused image F is obtained by taking the inverse transform of the composite representation.

        Fig.1 Block diagram of a typical fusion framework of existing transform domain image fusion algorithms

Such a transform domain fusion framework has been widely used. In most cases, the innovations fall into two aspects: one is the innovation of the transform itself, i.e., replacing the transform; the other is the design of new fusion rules. However, the transform coefficients are operated on by the fusion rules directly, which easily produces distortion and artificial texture. It is worth noting that these problems stem not only from the multi-scale transform itself, but also from the failure to make use of spatial information.

Thus, the improvement should not only focus on modifying the transform, but also on introducing more spatial features. By using the similarity in the spatial domain, e.g., the block distance, we can group image patches with salient similar features into a series of 3D arrays. The processing in the transform domain can then use more specific and suitable parameter choices. These improvements enhance the result for each block and thereby improve the overall effect.

The proposed framework can be seen in Fig.2. The main improvements are: blocking the input images A and B with a fixed sliding window; setting a fixed search area around the current block and matching blocks by their similarities; grouping the chosen blocks and arranging them into a 3D array; and transforming these 3D arrays into the transform domain by a 3D transform. After the fusion in the transform domain, the coefficients are transformed back by the inverse 3D transform; the 3D arrays are separated and the image blocks are returned to their original positions; the overlapped pixels are then recalculated by the aggregation algorithm to obtain the fused image F.

        Fig.2 Block diagram of the proposed image fusion framework

        2.2 Comparing with BM3D

As mentioned previously, the idea of BM3D is referenced in the proposed framework. Before gathering coefficients in the transform domain, the blocking and grouping procedure of BM3D is added; analogously, the aggregation step is adopted after the inverse transform. These parts of the structure are boxed by blue dashed lines in Fig.2.

Compared with BM3D in image de-noising, there are a lot of differences within the proposed framework. The BM3D algorithm can be divided into two stages. The first is the basic estimate: the input noisy image is processed by successively extracting reference blocks from it, and for each block the algorithm finds its similar blocks and stacks them together to form a 3D array [13]. After the 3D transform, a threshold is used to help reduce noise. The second stage is the final estimate: the previous result is grouped again by the same process; then the 3D transform is applied to both groups (one from the noisy image, the other from the basic estimate), and Wiener filtering is performed on the noisy one using the energy spectrum of the basic estimate as the true (pilot) energy spectrum [12]. Finally, all blocks obtained after the inverse transform are returned to their original positions.

From the above description, the overall structure of the two estimation stages in BM3D is basically the same; thus, there is no need to adopt a second stage in image fusion. Besides, since the processing differs between image fusion and de-noising, the Wiener filter is not useful for the current coefficients. To further optimize the coefficients, a threshold shrinkage method is adopted here. In the subsequent sections, the spatial domain processing, the 3D transform and the aggregation are introduced in detail.

        3.Spatial domain processes

The proposed framework can be divided into two parts: the spatial domain section and the transform domain section. The spatial domain processing can in turn be divided into blocking and grouping. For blocking, a sliding-window method is adopted. Grouping means stacking 2D blocks with high mutual similarity together to form 3D arrays, and the similarity is measured by calculating block distances.

These steps turn the 2D images into 3D image arrays, and it is the potential similarity (correlation, affinity, etc.) across the blocks that is exploited in the arrays. In this way, a better estimate of each distinct image region can be obtained from data with this potential relevance. The approach of grouping low-dimensional data into higher-dimensional data sets enables the use of high-dimensional filtering to process these data; hence it is called collaborative filtering [12].

        3.1 Blocking and matching

According to a certain window size and a fixed sliding step, a series of image blocks can be obtained. We then filter out blocks within the search area according to a pre-selected searching rule and threshold. The algorithm applied to each image block is illustrated in Fig.3(a). The white boxed area, enlarged on the right, is the search area centered on the reference block (marked R), and the similar blocks (marked S) are pointed out by black dashed arrows. The block matching process is as follows (a code sketch is given after the list):

        (i)Select the current block as reference one;

(ii) Draw a fixed search area centered on the reference block (for 16×16 pixel image blocks, an 80×80 pixel search area is reasonable);

(iii) Denote every image block contained in this area as a candidate block, and calculate the distance metric between each candidate block and the reference block;

(iv) List the distances of all blocks in the region in order; the block with the smallest distance is defined as the most similar one;

(v) Compare the distances with a pre-set threshold; all blocks whose distance is less than the threshold are defined as similar;

(vi) Arrange these similar blocks into an array sorted by their similarity.
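
The following minimal sketch (in Python with NumPy) illustrates steps (i)-(vi); the function name, the normalized squared l2 block distance and the specific window, step and threshold values are illustrative assumptions, not the exact implementation used in the experiments.

```python
import numpy as np

def match_blocks(img, ref_xy, block=16, search=80, step=8, tau_max=2500.0):
    """Steps (i)-(vi): find blocks similar to the reference block at
    ref_xy = (row, col) inside a fixed search area, sorted by similarity."""
    h, w = img.shape
    r0, c0 = ref_xy
    ref = img[r0:r0 + block, c0:c0 + block].astype(np.float64)

    # (ii) fixed search area centered on the reference block
    half = (search - block) // 2
    rows = range(max(0, r0 - half), min(h - block, r0 + half) + 1, step)
    cols = range(max(0, c0 - half), min(w - block, c0 + half) + 1, step)

    candidates = []
    for r in rows:
        for c in cols:
            cand = img[r:r + block, c:c + block].astype(np.float64)
            # (iii) normalized squared l2 block distance (an assumed variant)
            d = np.sum((ref - cand) ** 2) / block ** 2
            # (v) keep only candidates below the pre-set threshold
            if d <= tau_max:
                candidates.append((d, (r, c)))

    # (iv), (vi) sort so that the smallest distance (most similar) comes first
    candidates.sort(key=lambda t: t[0])
    return [xy for _, xy in candidates]
```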

        Fig.3 Schematic diagram of the procedures in spatial domain

        3.2 Grouping by similarity

To better reflect similarity, a typical convention is to use the inverse of some distance measure: for image blocks, the smaller the distance to the reference block, the more similar the blocks are. Typical distance measures are norms, such as the Euclidean distance (p = 2) [17], the p-norm used in de-noising, the lp-norm between different signal fragments, and the Kullback-Leibler distance used in texture detection [18].

In fact, approaches for choosing similar blocks are diverse, and the choice can be considered a clustering or classification problem. A series of works systematically introduces the classic methods, e.g., K-means clustering [19], fuzzy clustering [20] and vector quantization [21]. These classification approaches produce groups with no cross terms, because their idea of classification is based on segmentation or partitioning; in other words, one block can only belong to one specific group. To construct such disjoint groups whose elements have high mutual similarity, the conventional methods require many recursive computation cycles and therefore vast computing power. Moreover, such a screening method leads to an unequal treatment of the fragments, because a fragment close to the centroid is treated as more similar than a farther one; this is the case even in the exceptional circumstance that all fragments are equidistantly distributed.

The proposed matching method instead performs a classification of mutually similar signal fragments in which the groups may intersect. This is done by a pairwise test of the similarity between the reference block and each candidate block. In such a classification, the similarity measure can be regarded as the classification function, and the chosen reference block is the centroid of its group. Thereby, the approach avoids the problems of disjoint groups.

The grouping and matching process is shown in Fig.3(b): to cover the whole image, we traverse all blocks using the same process, i.e., each block is used as the reference block in turn to find its own similar blocks.

        3.3 Similarity measurement

As mentioned above, the proposed framework adopts the lp-norm as the similarity measurement. The two input images A and B are processed by the same steps in the spatial domain; therefore, we only use A as an example for illustration, and the final fused image is denoted by F.

        3.3.1 Modeling and notation

For image A, we denote by x a 2D spatial coordinate whose value belongs to the 2D image domain X ⊂ Z². Thus, any fixed-size N×N block split out of A can be expressed as A_x, where x is the coordinate of the top-left corner of the image block, i.e., A_x is the block of image A anchored at location x. A group of image blocks is represented by a set, denoted by a bold-face capital letter with a subscript giving the set of all coordinates in the group; for example, A_S represents a 3D array composed of blocks A_x whose positions x belong to the coordinate set S.

In addition, we define d as the distance measure calculated between blocks. To distinguish different parameter selections, we use the superscript "ideal" to denote the distance in an ideal condition, i.e., d^{ideal}, and "real" for the practical situation, i.e., d^{real}.

        3.3.2 Block distance

As introduced in Section 3.1, the block distance (dissimilarity) is a pairwise calculation between the reference and candidate blocks. Thus, we define A_{x_R} as the reference block and A_{x_C} as the currently selected candidate block, where x_R ∈ X and x_C ∈ X.

The dissimilarity between blocks is determined by the given reference block and a fixed threshold: a block is deemed similar when its distance to the reference block is smaller than the threshold. The distance is obtained through an l2-norm calculation between the blocks.

In an ideal situation, the block distance for the input defocus image A should be determined by the corresponding blocks of the true image T. Therefore, it can be calculated as

d^{ideal}(A_{x_R}, A_{x_C}) = \| T_{x_R} - T_{x_C} \|_2    (1)

where \|\cdot\|_2 denotes the l2-norm, and T_{x_R} and T_{x_C} denote the blocks of the true image at the locations of the reference and candidate blocks, respectively, with x_R, x_C ∈ X.

Obviously, the true image T is unavailable, and the fused image F, as the best estimate of T, is equally unknown beforehand. Therefore, the distance can only be obtained from A_{x_R} and A_{x_C} themselves, as

d^{real}(A_{x_R}, A_{x_C}) = \| A_{x_R} - A_{x_C} \|_2    (2)

However, such a calculation does not consider the difference between the ideal case and reality. If the gap between d^{ideal} and d^{real} does not exceed the threshold range, it does not affect the grouping result; but if the difference crosses the boundary, a grouping error occurs. In practice, a too small block size or sliding step, or a search area that falls exactly in the defocused region, can all cause a difference between d^{ideal} and d^{real}. In such cases a block may still be matched as similar because the real distance is smaller than the threshold even though the distance in the true image has already exceeded it; analogously, a block may be excluded as dissimilar although its ideal distance is below the threshold.

To address this, we employ a coarse 2D linear prefilter [12] to preprocess the two original blocks. Such prefiltering applies a normalized 2D linear transform to both blocks, and a threshold is then applied to the obtained coefficients. This approach reduces the number of false matches, and the final distance is calculated as

d(A_{x_R}, A_{x_C}) = \| f_{2D}(A_{x_R}) - f_{2D}(A_{x_C}) \|_2    (3)

where f_{2D}(·) represents the function of the 2D linear filter.

As mentioned before, the result calculated by the d-distance (3) is presented in the form of a set. Therefore, the set of all coordinates x of the blocks similar to the reference block A_{x_R} can be expressed as

S_{x_R} = \{ x \in X : d(A_{x_R}, A_x) \leq \tau_{max} \}    (4)

where τ_{max} is a threshold that represents the maximum d-distance for which two blocks are still considered similar. The selection of τ_{max} is based on the acceptable ideal-distance difference for natural images.

Since the reference block itself is also in the search area, d(A_{x_R}, A_{x_R}) = 0, i.e., each reference block A_{x_R} has at least one similar block (itself), so S_{x_R} is never empty. After obtaining the coordinate set S_{x_R}, we can use the similar blocks A_x ∈ A_S, x ∈ S_{x_R}, to form a 3D array of size N×N×N_S, denoted by A_{S_{x_R}}, where N_S denotes the number of similar blocks. After that, we obtain a collection of 3D arrays, as shown in Fig.3(b). The length of each 3D array in the collection is not fixed, but is decided by the number of similar blocks N_S.
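
A sketch of the prefiltered d-distance of (3) and of the stacking of similar blocks into an N×N×N_S array is given below; the choice of an orthonormal 2D DCT as f_2D and the threshold value are assumptions made for illustration only.

```python
import numpy as np
from scipy.fft import dctn

def prefiltered_distance(block_r, block_c, lam=40.0):
    """d-distance of (3): apply a normalized 2D linear transform f_2D (here an
    orthonormal 2D DCT, one possible choice), coarsely hard-threshold the
    coefficients, then take the normalized squared l2 distance."""
    tr = dctn(block_r.astype(np.float64), norm='ortho')
    tc = dctn(block_c.astype(np.float64), norm='ortho')
    tr[np.abs(tr) < lam] = 0.0
    tc[np.abs(tc) < lam] = 0.0
    n = block_r.shape[0]
    return np.sum((tr - tc) ** 2) / n ** 2

def build_group(img, coords, block=16):
    """Stack the similar blocks of one reference block into an
    N x N x N_S array A_{S_xR}, where N_S is the number of similar blocks."""
    blocks = [img[r:r + block, c:c + block].astype(np.float64) for r, c in coords]
    return np.stack(blocks, axis=-1)
```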

        3.3.3 Block-matching effect

In practical applications, for natural images of 512×512 pixel resolution, it is suitable to set the sliding window between 8×8 and 16×16 pixels, so that the blocks contain enough local edge features. Besides, to reduce blocking effects, the step length of the sliding window is typically smaller than the window size. Fig.4 shows the selection of similar blocks. In each image, the red translucent square marks the reference block, and the green translucent squares mark the similar blocks found in the search area. The window size in the first row is 16×16 pixels and in the second row 8×8 pixels.

        Fig.4 Illustration of the selection of similar blocks in natural images

It can be seen that similar details, in the form of small edge segments, exist extensively in natural images. In addition, the similar blocks are scattered around the same focal plane or along the junction of different focal planes. This helps the subsequent algorithm to further integrate information, thus optimizing the fusion result.

The selection of similar blocks between multi-focus image groups can be seen in Fig.5. For the two groups of images (Clock and Pepsi), we search for similar blocks both at a focused position of one image and at the corresponding position of the defocused image. The first row of the illustration shows the focused position and the second row its defocused counterpart. All images have already gone through a rigorous registration process. The window size for the left image of each group is 8×8 pixels and for the right one 16×16 pixels.

        Fig.5 Illustration of the selection of similar blocks in multi-focus image groups

As can be seen, the selection of similar blocks is approximately the same between the focused region of one image and the defocused region of its counterpart. Besides, since the similarity measures are much closer to each other there, the defocused image usually has more similar blocks. Therefore, whether the current group represents the focused region can be determined by comparing the number of similar blocks of this group with that of its counterpart. This can serve as guidance for the subsequent steps, especially for the design of the fusion rules.

        4.Transform domain processes

After the grouping step, the transform coefficients are obtained by a 3D transform. The high- and low-frequency components are then processed by different rules, since they represent detail and approximation information, respectively. Since the fused blocks overlap when returned to their original positions, an aggregation process is used to calculate the final value of each pixel.

        4.1 3D transform

The 3D transform is a combination of a 2D and a 1D transform: for each 2D image block in the 3D array we apply a traditional 2D transform, followed by a 1D transform along each column of the array (i.e., the third dimension). This paper uses several 2D transforms to present a comparative analysis. For the 1D transform, we adopt the DCT, since it reduces the number of significant coefficients.

        4.1.1 Theory of 3D transform

Collaborative filtering, as mentioned in Section 3, is very effective for multi-focus images, because of the use of spatial correlation in the filtering and the sparsity created by the shrinkage after the transform. These processes reduce the uncertainty of the fused image and create the possibility of optimizing the result.

The correlation here means the correlation within a single image block (i.e., intra-block correlation) and within the whole group (i.e., intra-group correlation). The intra-block correlation refers to the relation between different pixel values in one block. The intra-group correlation reflects the similarity between the blocks and their corresponding spatial regions.

The reason for adding a 1D transform along the third dimension is to further optimize the coefficients. For the n blocks in one group, a 2D transform alone yields nλ significant coefficients, where λ denotes the number of coefficients of one block. Such a method is not only inefficient, it also fails to use the intra-group similarity in the transform domain. If we add a 1D transform across the transformed blocks (i.e., a 1D transform along each column of pixels at the same position in the blocks), only about λ significant coefficients are needed to represent the whole group. The coefficients obtained after the 3D transform are shrunk before the fusion rules are applied; to facilitate subsequent calculations, we use a hard-threshold operator to rapidly filter out the significant values.

For the 3D transform of one grouped array, the process can be divided into a 2D transform T_{2D}(·) followed by a 1D transform T_{1D}(·) applied across all blocks.

The process is illustrated in Fig.6, in which the red cross arrow marks the unfolding surfaces of the 2D transform and the one-way arrow indicates the direction of the 1D transform.

        Fig.6 Schematic diagram of 3D transform

Since the set of grouped blocks is denoted by A_{S_{x_R}}, its 3D transform coefficients \widehat{A}_{S_{x_R}} can be expressed as

\widehat{A}_{S_{x_R}} = T_{3D}(A_{S_{x_R}}) = T_{1D}(\widetilde{A}_{S_{x_R}})

where T_{3D}(·) represents the 3D transform and \widetilde{A}_{S_{x_R}} denotes the set of coefficients after the 2D transform, which means

\widetilde{A}_{S_{x_R}} = \{ \widetilde{A}_x : x \in S_{x_R} \}

The decorated letter \widetilde{A}_x means the 2D transform coefficients of the intra-group block A_x, that is, \widetilde{A}_x = T_{2D}(A_x). In the transform domain, the first step is the threshold shrinkage: the coefficients are processed by a preset hard-threshold filter f_{ht}(·), and then, for the two groups of coefficients, different fusion rules are applied to the high- and low-frequency components, respectively. Generally, the hard-threshold filter may be defined as

f_{ht}(\tau) = \begin{cases} \tau, & |\tau| > \tau_{ht} \\ 0, & \text{otherwise} \end{cases}

where τ represents the current input coefficient and τ_{ht} is the fixed threshold parameter. The transform coefficients used in the fusion rules can then be expressed as

\widehat{A}'_{S_{x_R}} = f_{ht}( T_{3D}(A_{S_{x_R}}) )
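
A minimal sketch of the separable 3D transform T_3D and the hard-threshold shrinkage f_ht is shown below. A 2D DCT is used for T_2D and a 1D DCT along the third dimension for T_1D (the BMDCT variant), since both are available in SciPy; the NSCT/NSST-based variants replace T_2D accordingly, and the threshold value would be chosen experimentally.

```python
import numpy as np
from scipy.fft import dctn, idctn, dct, idct

def forward_3d(group):
    """T_3D: 2D DCT on every N x N block, then a 1D DCT along the
    third (grouping) dimension."""
    coeffs_2d = dctn(group, axes=(0, 1), norm='ortho')   # T_2D on each block
    return dct(coeffs_2d, axis=2, norm='ortho')          # T_1D across the group

def inverse_3d(coeffs):
    """Inverse of forward_3d."""
    back_1d = idct(coeffs, axis=2, norm='ortho')
    return idctn(back_1d, axes=(0, 1), norm='ortho')

def hard_threshold(coeffs, tau_ht):
    """f_ht: keep coefficients whose magnitude exceeds tau_ht, zero the rest."""
    out = coeffs.copy()
    out[np.abs(out) <= tau_ht] = 0.0
    return out
```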

        4.1.2 NSST

For the 2D transform, we compare several widely used transforms, including the 2D DCT, the DWT and the NSCT used in existing fusion methods, as well as the NSST, which is the most effective and efficient and is mainly used in this paper. A brief introduction of the NSST follows.

The NSST is constructed through affine systems with composite dilations. When the dimension n = 2, the affine system can be defined as

\Psi_{DS}(\psi) = \{ \psi_{j,l,k}(x) = |\det D|^{j/2}\, \psi(S^{l} D^{j} x - k) : j, l \in \mathbb{Z},\ k \in \mathbb{Z}^2 \}

where ψ ∈ L²(R²), and D and S are both 2×2 invertible matrices with |det S| = 1. The matrix D is known as the dilation matrix, while S stands for the shear matrix.

The discretization of the NSST consists of two phases: multi-scale decomposition and multi-direction decomposition. For the multi-scale decomposition, the NSST adopts the non-subsampled pyramid (NSP).

Using the NSP, one low-frequency sub-band image and k high-frequency sub-band images can be obtained from the source image through k levels of decomposition: each level produces one low-frequency and one high-frequency sub-image, and every subsequent decomposition is applied iteratively to the low-frequency sub-image of the previous level. The NSST decomposition process is illustrated in Fig.7, where "SF" is the abbreviation for "shearing filter".

The multi-direction decomposition is realized through a modified SF in the NSST. Roughly speaking, the conventional SF is realized by translating a window function on the pseudo-polar grid, while the non-subsampled SF maps the pseudo-polar grid back to the Cartesian grid, so the entire process can be completed directly by 2D convolution. The support zones of the NSST are pairs of trapeziform zones of size 2^{2j} × 2^{j}, as shown in Fig.8.

        Fig.7 Schematic diagram of multi-scale decomposition of NSST

        Fig.8 Trapeziform frequency support zones of an SF

        4.2 Fusion rules

Since this paper focuses on the fusion framework, we only integrate some common fusion rules into the framework and do not discuss in depth the influence of different fusion rules.

As described before, the main purpose of the 1D column transform is to exploit the correlation, reduce the number of significant coefficients and facilitate the calculation; it does not destroy the positional distribution of the different frequency components obtained by the preceding 2D multi-scale transform. The fusion takes place on the 2D surfaces of the 3D array. Hence, the high- and low-frequency components and the fusion rules described below still refer to the 2D surfaces.

In addition, because the numbers of similar blocks found for corresponding reference blocks in the two images are not always equal, the smaller number is used as the final number of fused blocks. That is, if there are n blocks in A_{S_{x_R}} and n + m blocks in B_{S_{x_R}}, we only use the first n blocks of B_{S_{x_R}}. As mentioned in Section 3.3.3, the defocused area usually has more similar blocks; therefore, choosing the smaller number provides more focus information for the rules.

        4.2.1 High frequency fusion rules

High-frequency coefficients usually contain salient features such as contours and edges; the larger a high-frequency coefficient is, the more decisively it represents a change in the region. The basic high-frequency fusion rule is "choose max" (CM), which selects the coefficient with the larger absolute value as the result. Another rule improved from CM is "choose max by intra-scale grouping" (CMIS) [15], which introduces a rule across different decomposition levels.

        (i)CM

The CM rule selects the higher-energy coefficient as the fused decomposed representation. Accordingly, the fused coefficient F_x at position (i, j), in the l-th decomposition level and the k-th sub-band, can be represented as

F_x^{l,k}(i,j) = \begin{cases} A_x^{l,k}(i,j), & |A_x^{l,k}(i,j)| \geq |B_x^{l,k}(i,j)| \\ B_x^{l,k}(i,j), & \text{otherwise} \end{cases}

where A_x^{l,k}(i,j) and B_x^{l,k}(i,j) denote the magnitude coefficients of their respective input blocks.

        (ii)CMIS

Since each high-frequency coefficient is correlated with coefficients at other scales and directions, the simple CM rule cannot be well combined with the multi-scale decomposition. Therefore, all high-frequency coefficients at different scales and directions should be compared through a composite result, that is

F_x^{l,k}(i,j) = \begin{cases} A_x^{l,k}(i,j), & \sum_{k}|A_x^{l,k}(i,j)| \geq \sum_{k}|B_x^{l,k}(i,j)| \\ B_x^{l,k}(i,j), & \text{otherwise} \end{cases}

where the judgment condition is the summation of the coefficients over the k sub-bands, so it connects each decomposition level and direction to determine the fused coefficients.
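
The two high-frequency rules can be sketched as follows; the coefficient arrays are assumed to be aligned sub-band surfaces of the two groups, and the function names are hypothetical.

```python
import numpy as np

def fuse_cm(coeff_a, coeff_b):
    """Choose-max (CM): at every position keep the coefficient with the
    larger absolute value."""
    return np.where(np.abs(coeff_a) >= np.abs(coeff_b), coeff_a, coeff_b)

def fuse_cmis(subbands_a, subbands_b):
    """Choose-max by intra-scale grouping (CMIS): the decision is made on the
    magnitudes summed over all k sub-bands and applied to every sub-band, so
    the choice is consistent across scales and directions.

    subbands_a / subbands_b: lists of equally shaped coefficient arrays."""
    sum_a = np.sum([np.abs(s) for s in subbands_a], axis=0)
    sum_b = np.sum([np.abs(s) for s in subbands_b], axis=0)
    mask = sum_a >= sum_b   # True -> take the coefficients from A
    return [np.where(mask, sa, sb) for sa, sb in zip(subbands_a, subbands_b)]
```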

        4.2.2 Low frequency fusion rules

The low-frequency coefficients are fused with two kinds of rules: one is the simple averaging rule, and the other is an effective rule based on region energy.

        (i)Averaging

The low-frequency coefficients reflect the background information, so significant salient features may not be obtained from them even with a high-pass requirement. Hence, the averaging operation is usually used:

F_x(i,j) = \frac{A_x(i,j) + B_x(i,j)}{2}

where A_x and B_x denote the approximation coefficients of the input blocks, respectively.

        (ii)Region energy

For the low-frequency component, using only an algebraic operation such as averaging easily loses some approximation information and causes a larger gray-level difference [22]. Therefore, we adopt the fusion rule based on region energy [14]. The low-frequency sub-image of each image block is subdivided again into several 3×3 or 5×5 pixel regions, and the region energy is calculated. The region energy centered on coordinate (i, j) is denoted by E_n(i, j), where n represents the coefficient A_x or B_x. The formula is

E_n(i,j) = \sum_{a=-(N-1)/2}^{(N-1)/2}\ \sum_{b=-(N-1)/2}^{(N-1)/2} |n(i+a, j+b)|^2

where N is the number of rows (and columns) of the region. Therefore, the fusion rule is

F_x(i,j) = \begin{cases} A_x(i,j), & E_{A_x}(i,j) \geq E_{B_x}(i,j) \\ B_x(i,j), & \text{otherwise} \end{cases}

where F_x represents the fused low-frequency coefficients, and E_{A_x} and E_{B_x} are the region energies of the coefficients A_x and B_x.
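
A sketch of the region-energy rule is given below, assuming a 3×3 region and zero padding at the image block borders (the padding choice is an assumption).

```python
import numpy as np

def region_energy(coeff, n=3):
    """E_n(i, j): sum of squared low-frequency coefficients in an n x n
    neighborhood centered on (i, j), with zero padding at the borders."""
    pad = n // 2
    sq = np.pad(coeff.astype(np.float64) ** 2, pad, mode='constant')
    energy = np.zeros_like(coeff, dtype=np.float64)
    for di in range(n):
        for dj in range(n):
            energy += sq[di:di + coeff.shape[0], dj:dj + coeff.shape[1]]
    return energy

def fuse_low_region_energy(low_a, low_b, n=3):
    """Pick, at each position, the low-frequency coefficient whose local
    region energy is larger."""
    ea, eb = region_energy(low_a, n), region_energy(low_b, n)
    return np.where(ea >= eb, low_a, low_b)
```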

        4.3 Aggregation of blocks

A series of fused 3D arrays F_{S_{x_R}}, which share the same structure as A_{S_{x_R}}, is obtained by the inverse 3D transform. By restoring the image blocks in these arrays to the 2D surface, the fused image F is obtained. In general, pixels of different blocks overlap; here, an averaging operation is adopted to aggregate the overlapped blocks into the 2D image.

The overlap is caused by the block selection, i.e., the same image area may be selected as part of several similar blocks, while after the transform domain processing there may be variance among the corresponding pixels.

For example, F_{x_m} is an image block, located at x_m, that belongs to the array F_{S_{x_M}} built from the reference block at x_M, while the same block F_{x_m} may also exist in another array F_{S_{x_N}}, which is obtained from the reference block at x_N.

To solve this, we calculate the mean value of the overlapping pixels as the final value. A more in-depth explanation of overlapping blocks is given in [23]; roughly speaking, different arrays containing overlapping image blocks are statistically correlated and biased, and each pixel has a different variance. In image de-noising, a weighted averaging method is used, where the weights are inversely proportional to the total sample variance so as to reduce the weight of noise [13]. In image fusion, such a weighting easily leads to edge smoothing, so an averaging method is adopted. Therefore, the weight ω_{x_R} for the pixel values of each image block group can be defined as

\omega_{x_R} = \frac{1}{n_{x_R}}

where n_{x_R} is the number of retained non-zero coefficients in F_{S_{x_R}}, so the final fused image F may be loosely expressed as

F(x) = \frac{\sum_{x_M \in X}\ \sum_{x_m \in S_{x_M}} \omega_{x_M} F_{x_m}(x)}{\sum_{x_M \in X}\ \sum_{x_m \in S_{x_M}} \omega_{x_M}\, \chi_{x_m}(x)}, \quad x \in X

where x_M is the coordinate of an arbitrary reference block, x_m denotes the position of a similar block included in the group located by x_M, and χ_{x_m} is the characteristic (indicator) function of the square support of the block located at x_m.
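
The aggregation can be sketched as follows; the per-group weight 1/n_{x_R} is used here, and a plain average corresponds to setting every weight to 1 (the data layout and function name are illustrative assumptions).

```python
import numpy as np

def aggregate_groups(shape, fused_groups, block=16):
    """Place every fused block back at its original position and resolve the
    overlaps by weighted averaging.

    fused_groups: iterable of (group, coords, weight), where `group` is an
    N x N x N_S array of fused blocks, `coords` lists the (row, col) of each
    block, and `weight` is e.g. 1 / n_xR (retained non-zero coefficients)."""
    num = np.zeros(shape, dtype=np.float64)
    den = np.zeros(shape, dtype=np.float64)
    for group, coords, weight in fused_groups:
        for k, (r, c) in enumerate(coords):
            num[r:r + block, c:c + block] += weight * group[:, :, k]
            den[r:r + block, c:c + block] += weight
    return num / np.maximum(den, 1e-12)
```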

        5.Experimental results

To provide an effective evaluation of the proposed framework, we carry out three groups of comparative experiments. Four different sets of 256-gray-level multi-focus natural images are employed, and the proposed algorithm is analyzed comparatively through subjective visual effects and objective evaluation criteria.

In the subsequent experiments, the size of the image blocks is 16×16 pixels, and the step of the sliding window is 8 pixels. For the DWT, we use a three-level "db2" wavelet. The decomposition level of the NSCT is 4, with 2, 8, 16 and 16 directional sub-bands in the respective levels; for the non-subsampled filter banks, we use "9-7" as the pyramid filter and "pkva" as the direction filter. For the NSST, we use a three-level multi-scale decomposition, the numbers of directional sub-bands of the levels are 10, 10 and 18, and the pyramid filter is "maxflat". For the NSCT in BMNSCT and the NSST in BMNSST, the decompositions are both two levels. All experiments are implemented on an Intel Core i5 at 2.27 GHz with 4 GB RAM; the simulation software is MATLAB 2014a.

        5.1 Evaluation criteria

The experiments use image entropy (EN) [7], average gradient (AVG) [24], normalized mutual information (MI) [25], the edge-based similarity measure (Q^{AB/F}) [26], structural similarity (SSIM) [27] and standard deviation (STD) [28] as the evaluation criteria.

EN represents the richness of information. The larger the entropy is, the more information the image includes.

AVG is an indicator of contrast. The larger the AVG is, the more gradation the image reveals.

MI measures how much information from the source images is transferred to the fusion result. A higher MI means the fused image contains more information about the source images.

Q^{AB/F}, computed using the Sobel operator, gives the similarity of the edges transferred during the fusion process. A higher Q^{AB/F} value indicates that more edge information is preserved.

SSIM measures the structural similarity between the fused and source images. An SSIM value closer to 1 means a better fusion.

STD indicates the distribution of the pixel values. The larger the STD is, the more dispersed the pixel values are and the more information the image contains.
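
For reference, the simpler criteria (EN, AVG and STD) can be computed as below, following their common definitions; the exact formulations in [7], [24] and [28] may differ in details such as normalization.

```python
import numpy as np

def entropy(img, levels=256):
    """EN: Shannon entropy of the gray-level histogram (in bits)."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def average_gradient(img):
    """AVG: mean root-mean-square of the horizontal and vertical gradients."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def standard_deviation(img):
    """STD: standard deviation of the pixel values."""
    return float(np.std(img.astype(np.float64)))
```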

        5.2 Experiment with DCT

The first experiment compares the DCT with the BMDCT, to illustrate the advantage of the proposed framework over plain transform domain fusion. The experiment uses the DCT as the control group and the BMDCT as the experimental group. The results are shown in Fig.9.

As can be seen from Fig.9, especially from the enlarged view, the BMDCT has obvious advantages over the DCT. The proposed framework significantly weakens the artificial texture appearing in the DCT fusion, making the BMDCT result smoother, flatter and more natural.

        Fig.9 Source images and fusion results of DCT and BMDCT

        5.3 Experiments with different transforms

This experiment evaluates the proposed framework together with various transforms. Four improved algorithms are used: the BMDCT, the BMDWT, the BMNSCT and the BMNSST, with the CM and averaging fusion rules. The pair of source images is shown in Fig.10(a) and Fig.10(b).

As can be seen in Fig.11, Fig.11(a) is the result of the BMDCT, Fig.11(b) of the BMDWT, Fig.11(c) of the BMNSCT and Fig.11(d) of the BMNSST; Fig.11(e) and Fig.11(i) are the difference maps of Fig.11(a) with Fig.10(a) and Fig.10(b), and the same applies to the next three columns.

        Fig.10 Source images of experiment

        Fig.11 Fusion results of different transforms on source image“Lab”

In terms of subjective visual effects, among the four transforms with block matching and the 3D transform, the BMNSST approach is the best, because of its clearer edges, more abundant textures and better retention of details. The difference maps show that the BMNSST not only integrates the focused area best, but also produces the least artificial texture and blocking effects, followed by the BMNSCT.

In addition, we also examine the transforms with objective criteria. As shown in Table 1, the BMNSST fusion method has the best scores on EN, STD, MI and Q^{AB/F}, and the second best on AVG. Therefore, the fused image of the BMNSST retains the largest amount of information from the source images.

        Table 1 Objective criteria comparison of different fusion algorithms with different transforms on source image “Lab”

        5.4 Comparing with classic methods

This experiment uses two groups of images to compare some classic fusion algorithms with the BMDWT, BMNSCT and BMNSST algorithms using the improved fusion rules, i.e., CMIS and region energy (RE). The experiment is thus a horizontal comparison of the best combinations in this paper (e.g., BMNSST-CMIS) with some existing transform domain fusion methods (e.g., DWT-MAX, NSCT-MAX and NSST-MAX).

The first pair of source images, "Pepsi", is shown in Fig.10(c) and Fig.10(d), and the second pair, "Clock", in Fig.10(e) and Fig.10(f). The experimental results for "Pepsi" can be seen in Fig.12. Fig.12(a) is the result of DWT-MAX; Fig.12(b), Fig.12(c), Fig.12(d), Fig.12(e) and Fig.12(f) are the results of NSCT-MAX, NSST-MAX, BMDWT-CMIS, BMNSCT-CMIS and BMNSST-CMIS, respectively; Fig.12(g) and Fig.12(m) are the difference maps of Fig.12(a) with the source images, and the same applies to the next five columns. Correspondingly, the results for "Clock" can be seen in Fig.13.

        Fig.12 Fusion effects of each transform domain method on source image “Pepsi”

        Fig.13 Fusion effects of each transform domain method on source image“Clock”

As can be seen from the subjective visual effects, compared with the existing algorithms, the proposed algorithm (i.e., BMNSST-CMIS) performs better on edge details, and some salient features in its result are clearer than in the others. From the comparison of the difference maps, the fused image of the proposed algorithm is more similar to the source images; that is, the proposed method better restores the focused areas of the source images. Besides, the performance of the BMNSCT and the BMDWT is also better than that of their original transforms.

In terms of objective criteria, the results for "Pepsi" and "Clock" are given in Table 2 and Table 3, respectively. Compared with the existing algorithms, the proposed algorithm performs relatively well on four of the six evaluation indexes. Especially on EN and MI, BMNSCT-CMIS and BMNSST-CMIS improve significantly over NSCT-MAX and NSST-MAX, which shows that their results are more similar to both input images in edge structure.

        Table 2 Objective criteria comparison of different transform domain fusion algorithms on source image “Pepsi”

        Table 3 Objective criteria comparison of different transform domain fusion algorithms on source image “Clock”

        6.Conclusions

In this paper, a multi-focus image fusion framework based on block matching and a 3D transform is proposed. Compared with existing frameworks, the blocking and grouping make it possible to further utilize spatial domain correlation in transform domain fusion. The algorithm forms similar blocks into 3D arrays through block-matching steps, then uses a 3D transform consisting of a 2D and a 1D transform to convert the blocks into transform coefficients and processes them with fusion rules. The final fused image is obtained from a series of fused 3D image block groups after the inverse transform, using an aggregation process. Experimental results show that the proposed algorithm outperforms traditional algorithms in terms of qualitative and quantitative evaluations. Owing to the many blocking and matching operations, the efficiency of the algorithm still needs improvement, so reducing the time complexity will be the main research direction in the future. Besides, the fusion rules are not discussed in depth in this paper and also require further study.

        [1]HAGHIGHAT M B A,AGHAGOLZADEH A,SEYEDARABI H.Multi-focus image fusion for visual sensor networks in DCT domain.Computers&Electrical Engineering,2011,37(5):789–797.

[2]ZHANG Z,BLUM R S.A categorization of multiscale decomposition-based image fusion schemes with a performance study for a digital camera application.Proceedings of the IEEE,1999,87(2):1315–1326.

        [3]PAJARES G,CRUZ J M D L.A wavelet-based image fusion tutorial.Pattern Recognition,2004,37(9):1855–1872.

        [4]CANDES E J.Ridgelets:theory and applications.Stanford,USA:Stanford University,1998.

        [5]COHEN R A,SCHUMAKAR L L.Curves and surfaces.Nashville:Vanderbilt University Press,2000.

        [6]DO M N,VETTERLI M.The Contourlet transform:an efficient directional multi-resolution image representation.IEEE Trans.on Image Processing,2005,14(12):2091–2106.

        [7]ZHANG Q,GUO B L.Multifocus image fusion using the nonsubsampled contourlet transform.Signal Processing,2009,89(7):1334–1346.

        [8]WANG J,PENG J Y,FENG X Y,et al.Image fusion with nonsubsampled contourlet transform and sparse representation.Journal of Electronic Imaging,2013,22(4):043019.

        [9]GUO K,LABATE D.Optimally sparse multidimensional representation using shearlets.SIAM Journal on Mathematical Analysis,2007,39(1):298–318.

[10]NUNEZ J,OTAZU X,FORS O,et al.Multiresolution-based image fusion with additive wavelet decomposition.IEEE Trans.on Geoscience and Remote Sensing,2002,37(3):1204–1211.

        [11]GENG P,WANG Z Y,ZHANG Z G,et al.Image fusion by pulse couple neural network with shearlet.Optical Engineering,2010,51(6):067005-1–067005-7.

        [12]DABOV K,FOI A,KATKOVNIK V,et al.Image denoising with block-matching and 3D filtering.Proc.of SPIE-IS&T Electronic Imaging:Algorithms and Systems V,2006,6064:606414-1–606414-12.

[13]DABOV K,FOI A,KATKOVNIK V,et al.Image denoising by sparse 3D transform-domain collaborative filtering.IEEE Trans.on Image Processing,2007,16(8):2080–2095.

        [14]TIAN J,CHEN J,ZHANG C.Multispectral image fusion based on fractal features.Proceedings of SPIE,2004,5308:824–832.

        [15]BHATNAGAR G,WU Q M J,LIU Z.Directive contrast based multimodal medical image fusion in NSCT domain.IEEE Trans.on Multimedia,2013,15(5):1014–1024.

        [16]KUMAR M,DASS S.A total variation-based algorithm for pixel-level image fusion.IEEE Trans.on Image Processing,2009,18(9):2137–2143.

        [17]BUADES A,COLL B,MOREL J M.A review of image denoising algorithms,with a new one.SIAM Journal on Multiscale Modeling and Simulation,2005,4(2):490–530.

[18]DO M N,VETTERLI M.Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance.IEEE Trans.on Image Processing,2002,11(2):146–158.

        [19]MACQUEEN J B.Some methods for classification and analysis of multivariate observations.Proc.of the 5th Berkeley Symposium on Mathematical Statistics and Probability,1967:281–297.

        [21]GERSHO A.On the structure of vector quantizers.IEEE Trans.on Information Theory,1982,28(2):157–166.

        [22]JIANG P,ZHANG Q,LI J,et al.Fusion algorithm for infrared and visible image based on NSST and adaptive PCNN.Laser and Infrared,2014,44(1):108–112.(in Chinese)

[23]GULERYUZ O.Weighted overcomplete denoising.Proc.of the 7th Asilomar Conference on Signals,Systems and Computers,2003,2:1992–1996.

        [24]LIU S,ZHU Z,LI H,et al.Multi-focus image fusion using self-similarity and depth information in nonsubsampled shearlet transform domain.International Journal of Signal Processing,Image Processing and Pattern Recognition,2016,9(1):347–360.

        [25]QU G H,ZHANG D L,YAN P F.Information measure for performance of image fusion.Electronic Letters,2002,38(7):313–315.

        [26]XYDEAS C S,PETROVI V.Objective image fusion performance measure.Electronics Letters,2000,36(4):308–309.

        [27]WANG Z,BOVIK A C,SHEIK H R,et al.Image quality assessment:from error visibility to structural similarity.IEEE Trans.on Image Processing,2004,13(4):600–612.

        [28]MIAO Q G,SHI C,XU P F,et al.Multi-focus image fusion algorithm based on shearlets.Chinese Optics Letters,2011,9(4):25–29.
