Xiangnan Ren | Jinjiang Li | Zhen Hua | Xinbo Jiang
1 School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
2 Co-innovation Center of Shandong Colleges and Universities, Future Intelligent Computing, Shandong Technology and Business University, Yantai, China
3 School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai, China
4 Shandong Provincial Key Laboratory of Software Engineering, Shandong University, Jinan, China
Abstract In a group of images, recurrent foreground objects are considered the key objects of the group; in co-saliency detection they are described as common salient objects. The aim is to guide the user's gaze naturally to these common salient objects, so that users can find them easily without interference from other information. Therefore, a method based on co-saliency detection is proposed for reducing the visual attention attracted by non-common salient objects. Through a co-saliency detection algorithm and a matting algorithm used for image preprocessing, the exact positions of the non-common salient objects (called Regions of Interest here, i.e. ROIs) in the image group are obtained. The attention retargeting algorithm then adjusts the saliency of the ROI areas by considering the internal features of the image. In the HSI colour space, the three components H, S, and I are adjusted separately. First, the hue distribution is constructed with a Dirac kernel function, and the hue distribution most similar to the surrounding environment is selected as the best hue distribution for the ROI areas. The S and I components can then be set as contrast differences between the ROI areas and the surrounding background according to the user's demands. Experimental results show that this method effectively reduces the ROI areas' attraction of the user's visual attention. Moreover, compared with other methods, the saliency adjustment achieved is much better and the processed image is more natural.
With the rapid development of the information age, the amount of video and image data is increasing exponentially. Given this huge amount of image information, people hope that computers can quickly locate the key information in an image, and so image saliency detection technology has developed rapidly and is widely used in image target detection, image segmentation, and video and image compression. However, the information contained in a single image is limited; an image group contains richer information. Therefore, researchers have proposed co-saliency detection algorithms. In an image group, the recurrent foreground objects are considered the key objects of the group; in co-saliency detection, these are called common salient objects. The saliency map obtained by a co-saliency detection algorithm is still a greyscale image: the larger the greyscale value of a pixel, the stronger the saliency at that point. As with a single image, the hope is that subsequent processing of the image group can be performed according to its main content, improving computational efficiency and reducing the computational burden.
In recent years, the co-saliency detection algorithm has been gradually improved and widely used in various fields. For example, co-segmentation [1] uses co-saliency detection to find common salient targets in related image groups and segment them; in video and image compression [2], co-saliency detection is used to determine the compression rate of different areas of a video or of an image in the group: only the areas with high co-saliency are losslessly compressed, while lossy compression is applied to low-saliency or non-common salient areas, so that an efficient compression coding is obtained and less storage space is occupied.
FIGURE 1 An image set containing co-saliency objects
Co-saliency objects should meet the following conditions: first, all co-saliency objects should exist in each image of the group; second, they should be foreground areas in the images; and third, they should have salient features in all images. Figure 1 shows a typical example of co-saliency detection, in which only the players in the red strip are the co-saliency objects of the image set; all other information consists of non-common salient objects.
Saliency guidance information can be used to modify the salient regions in an image so that the co-saliency objects of the image group have unique saliency in each image, thereby guiding human visual attention. This can be applied in many practical scenarios, such as smart advertising, image editing, object enhancement, and improving image aesthetics. According to the saliency guidance information, image features of an area in the original image are changed to manipulate that area's saliency; this line of study is called "attention retargeting". Imagine that consumers see products at first glance in flyers or magazines, so that attention to the products is greatly improved; in the work of amateur photographers, non-important objects are often more conspicuous in position or colour than important ones. An image attention retargeting algorithm based on co-saliency can correct the aesthetic effect of an image, enhance objects, or suppress interfering objects.
However, the saliency map obtained by a saliency detection algorithm consists of sparse pixel values and cannot be used directly as guide information for subsequent image feature processing. That is, in image attention retargeting, saliency estimation and image feature editing are two separate steps; it is not enough to find feature clues with the saliency detection algorithm alone, as the specific location of the area to be edited must also be found. Therefore, a matting algorithm is used to obtain the non-common salient object areas, that is, the image blocks that guide saliency (hereinafter referred to as ROI areas). The cluster-based co-saliency detection algorithm [3] can be applied to both single and multiple images. By comparing the single-image and co-saliency maps, it becomes clear which areas are non-common salient objects, and KNNT (K-Nearest Neighbours Texture) matting [4] then extracts the non-common salient objects accurately and completely. The proposed image attention retargeting algorithm can process multiple areas of an image simultaneously and edits image features in the HSI colour space, consistent with the internal features of the image, so that non-common salient objects become closer in appearance to the background area and their saliency is reduced.
Section 2 summarises related work. Section 3 introduces the preprocessing methods used and the proposed image processing method in detail. Section 4 presents qualitative and quantitative analyses of the experimental results, together with comparative experiments that demonstrate the effectiveness of the method. Finally, conclusions and possible future improvements are presented in Section 5.
The co-saliency detection algorithm is an emerging branch of saliency detection. In 2010, Chen et al. [5] first proposed an image co-saliency detection algorithm. Traditional co-saliency detection research [6–9] divides the input image into several small computing units, such as super-pixel blocks [10] or pixel clusters [3], to model the correspondence between the images of a group effectively. In 2013, Tan et al. [11] proposed a self-contained co-saliency detection algorithm based on a super-pixel affinity matrix.
However, these early algorithms targeted only image pairs. In 2014, Liu et al. [12] proposed an image co-saliency detection algorithm based on hierarchical segmentation. The algorithm has two modules: fine segmentation and coarse segmentation. Region similarity and region contrast are measured on the fine segmentation, while object priors for each area are measured on the coarse segmentation based on connectivity to the image boundaries. Finally, the global similarity of each area is obtained from the region similarity measure and integrated with the internal saliency map and the object prior map to generate a co-saliency map for each image. Li et al. [13] proposed a co-saliency detection model based on graph matching, integrating visual appearance, saliency coherence, and spatial structure continuity for co-saliency detection between image pairs. Zhang et al. [14] proposed a hierarchical, coarse-to-fine image co-saliency detection framework to capture features; this method generates the co-saliency map through a mask-guided fully convolutional network.
In recent years, some researchers have extended deep learning techniques to co-saliency detection and obtained good performance by jointly training on common salient objects [15–21]. Zhang et al. [15–17] proposed a self-paced multiple-instance learning framework for co-saliency detection [17]. On the one hand, self-paced MIL (Multiple-Instance Learning) adapts the instance measurement and discovers common patterns in co-salient areas; on the other hand, it ensures the reliability and stability of learning by simulating the human learning process. Tsai et al. [18] proposed a staged co-saliency detection algorithm: in the first stage, the quality of salient features is evaluated through an unsupervised deep learning model with an SAE (Stacked AutoEncoder); in the second stage, a self-training convolutional neural network alleviates the over-smoothing of saliency maps.
Gao et al. [19] proposed an FCN (Fully Convolutional Network) framework embedded with a common attention module, called Co-Attention FCN (CA-FCN). The method inserts the common attention module into the high-level convolutional layers of the FCN, assigning larger attention weights to common salient objects and smaller weights to the background and non-common distractors, which improves the final saliency detection performance. Although co-saliency detection has developed only in the last few years, it has attracted considerable attention, achieved very satisfactory performance, and shown broad application prospects.
In 1984, Porter et al. [22] gave the first mathematical definition of the matting problem, expressing any observed image I as a linear combination of the foreground image F and the background image B under the α channel, as shown in Equation (1), where α represents the opacity of the foreground object:

I = αF + (1 − α)B.  (1)

Since the foreground image F, the background image B, and the α value are all unknown, the alpha matting expression is ambiguous.
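As a minimal numerical illustration of Equation (1), the following NumPy sketch composes an observed image from a foreground, a background, and an α matte; the array shapes and the function name are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def composite(F, B, alpha):
    """Compose an observed image I = alpha * F + (1 - alpha) * B.

    F, B  : float arrays of shape (H, W, 3), foreground and background.
    alpha : float array of shape (H, W), per-pixel opacity in [0, 1].
    """
    a = alpha[..., None]        # broadcast alpha over the colour channels
    return a * F + (1.0 - a) * B
```

Matting inverts this relation: given only I, it must recover F, B, and α, which is why additional constraints or priors are required.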
In the past decade, natural image matting has received great attention from the research community and can now be considered one of the classic research problems in visual computing. Natural image matting algorithms can be roughly divided into three categories:
(1) Sampling-based matting algorithms. In this approach, the foreground and background colours of each unknown pixel are estimated from the surrounding known pixels, so that the alpha value can be calculated by finding the optimal foreground–background pair in the known area. Ruzon and Tomasi [23] proposed a sampling-based matting method that separates background and foreground by building a colour-space model; it can extract objects from natural images and extends blue-screen matting to backgrounds of almost any colour. Later, scholars found that sampling-based matting suffers from conflicts between multiple sampling criteria and an incomplete sample space, and proposed optimisations [24, 25]. The PDMS (Pixel-level Discrete Multi-object Sampling) method proposed by Huang et al. [24] formalises colour sampling at each unknown pixel as a multi-objective optimisation problem and expands the sample space, greatly improving the boundary clarity of the results.
(2) Propagation-based matting algorithms. In 2004, Sun et al. [26] proposed the Poisson matting algorithm, which assumes that foreground and background are locally smooth; given an input trimap, Dirichlet boundary conditions are used to solve a Poisson equation and reconstruct the matte values of the unknown region. However, because pixels cannot be solved independently, this method parallelises less well than sampling methods. Wang et al. [27] proposed a robust matting algorithm that greatly increases the sampling range and optimises the matte value by adding sampling coefficients. In 2013, Chen et al. [28] applied non-local principles to general alpha matting to extract multiple image layers simultaneously; their algorithm improves non-local matting by considering the first K nearest neighbours in a high-dimensional feature space, and is called KNN matting. Aksoy et al. [29] proposed a natural image matting algorithm based purely on affinity, which uses several definitions of pixel affinity to control the flow of information from known opaque areas and within the unknown region itself.
(3) Learning-based matting algorithms. In recent years, deep learning has been widely applied to natural image matting with good results [30–34]. Cho et al. [30] proposed the DCNN (Deep Convolutional Neural Networks) matting algorithm, introducing deep neural networks to image matting for the first time; the network learns from the results of different methods to fuse local and non-local approaches. Xu et al. [31] proposed a neural network model trained on a large-scale data set, mainly addressing the problem that earlier methods use only low-level features and lack high-level contextual semantics. Tang et al. [34] proposed a hybrid sampling-and-learning method: before opacity estimation, a deep neural network estimates the layer colours, greatly improving the performance of opacity estimation.
FIGURE 2 The main flow chart of the authors' method
One of the earliest results in image attention retargeting is the technique of Su et al. [35], which selectively alters texture variation to reduce the saliency of image regions. Visual attention retargeting methods can be roughly divided into two categories: colour-based methods [36–39] and direction-based methods [40, 41].
Colour-based methods modify each colour component so that the visual saliency inside the ROI increases while the visual saliency outside the ROI decreases. The advantage is that they not only increase the saliency of the ROI area but also maintain the full resolution of the non-ROI areas. Hagiwara et al. [36] proposed an iterative technique for manipulating image saliency: after selecting the ROI, the colour and brightness of the image (inside and outside the ROI) are adjusted at each step to increase saliency; the saliency map of the modified image is then calculated, analysed, and adjusted until the ROI is the most salient region relative to the rest of the image. However, because the saliency map must be recomputed at every step, the computational complexity is high. Mechrez et al. [39] proposed a method for manipulating object saliency that considers the internal colour and saliency of the image; its optimisation framework uses patches from the same image to modify the saliency of the ROI area.
Direction-based methods, on the other hand, guide the user's visual attention to the unblurred area by blurring the regions outside a specified area. Hata et al. [40] proposed a direction-based image modification method that creates the modified image by blurring areas outside the specified area with a Gaussian filter; high-frequency components are suppressed by spatial filtering to control the resolution. Hitomi et al. [41] proposed a saliency map based on the wavelet transform and an accompanying image modification method, which corrects frequency components according to the obtained saliency map to suppress visual saliency outside the specified area.
Recently, Chen et al. [42] proposed a deep learning model, the SaGIM (Saliency-Guidance Image Manipulation) model, which directly optimises between any differentiable saliency estimation method and neural networks that apply image processing to satisfy the saliency guidance.
This work studies an image attention retargeting algorithm. In the image preprocessing stage, the authors adopt the existing KNNT matting algorithm [4] and the cluster-based co-saliency detection algorithm [3]. This section introduces the two algorithms and then describes the proposed image attention retargeting algorithm in detail. Figure 2 shows the main flow chart of the method. First, a saliency detection algorithm is applied to each input image to obtain a single-image saliency map, and the co-saliency detection algorithm is applied to the image group to obtain a co-saliency map. Second, by comparing the two saliency maps, the non-common salient objects (i.e. the ROI) are obtained, and the matting algorithm extracts their corresponding positions. Finally, attention is retargeted away from the ROI area to weaken its saliency.
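As a rough sketch of the first two stages, the fragment below localises non-common salient regions by comparing the two saliency maps. The threshold tau and the simple logical rule are assumptions for illustration only; the paper refines this rough position with a manually drawn trimap and KNNT matting rather than a fixed threshold.

```python
import numpy as np

def extract_roi_mask(single_sal, co_sal, tau=0.5):
    """Rough ROI localisation: pixels that are salient within their own
    image (single_sal high) but not salient across the group (co_sal
    low) are treated as non-common salient objects.

    single_sal, co_sal : float arrays in [0, 1] with the same shape.
    tau                : assumed threshold for illustration only.
    """
    roi = (single_sal > tau) & (co_sal < tau)
    return roi.astype(np.uint8)
```

The resulting binary mask only marks the approximate region; the precise alpha matte of the non-common salient object is obtained afterwards by the matting step.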
FIGURE 3 Flowchart of the cluster-based co-saliency detection algorithm. ROI, Region of Interest
3.1.1 | Co‐saliency detection algorithm
In the saliency map extraction stage, a cluster-based co-saliency detection algorithm [3] was selected. This method uses clustering to maintain global consistency among multiple images and fuses bottom-up cues to generate the final co-saliency map. It does not require deep learning and is simple, versatile, and efficient.
The algorithm performs co-saliency detection on multiple images with a two-layer clustering method; its flow is shown in Figure 3. Given a set of images, K-means clustering groups the pixels within each single image and also clusters pixels across the multiple images to form associations among them. Uniqueness, distance from the image centre, and repeatability are selected as the salient features on single images and across multiple images, called the contrast cue, spatial cue, and corresponding cue, respectively. Finally, a probabilistic framework soft-assigns each pixel according to all cluster-level cues to compute its saliency value, which is used to generate the final saliency map.
Contrast Cue: The contrast cue measures the feature uniqueness of a cluster with respect to all other clusters. The contrast cue c_c(k) of cluster C_k is defined as shown in Equation (2):

c_c(k) = Σ_{i=1, i≠k}^{K} (n_i / N) ‖μ_k − μ_i‖₂,  (2)

where μ_k is the feature centre of cluster C_k, n_i represents the number of pixels in cluster C_i, and N represents the number of pixels in all images.
Spatial Cue: In the human visual system, the closer an object is to the centre of the image, the more attention it attracts. According to the "centre bias rule" in saliency detection, the attention gain decreases as the distance between the object and the image centre increases. The spatial cue c_s(k) of cluster C_k is defined as shown in Equation (3):

c_s(k) = (1 / n_k) Σ_{j=1}^{M} Σ_{i=1}^{N_j} N(‖z_i^j − o^j‖² | 0, σ²) · δ[b(p_i^j) = C_k],  (3)

where z_i^j is the coordinate of pixel p_i^j in image j, o^j is the centre of image j, N(· | 0, σ²) is a Gaussian kernel, δ[·] is the Kronecker delta, and b(·) maps a pixel to its cluster.
Through the above process, the contrast and spatial cues of each single image are obtained, as well as the contrast, spatial, and corresponding cues across multiple images. The single-image saliency map is obtained by fusing the two single-image cues; in the multi-image stage, the three cues and the single-image saliency map are fused to obtain the multi-image co-saliency map. Common fusion methods are linear summation and point-wise multiplication: point-wise multiplication suppresses noise better, while linear summation gives a better recall rate. Because the experiments require a saliency map with higher accuracy, point-wise multiplication is selected in this algorithm to integrate the final saliency features.
Before merging the saliency features, all cluster feature distributions need to be standardised to standard Gaussian distributions. First, the co-saliency probability p(C_k) of cluster C_k is defined as the product of the normalised cues, as shown in Equation (6):

p(C_k) = Π_i c̃_i(k),  (6)

where c̃_i(k) denotes the i-th normalised cue of cluster C_k.
The probability that pixel p belongs to cluster C_k is modelled as a Gaussian on the feature distance, p(p | C_k) ∝ N(‖v_p − μ_k‖₂ | 0, σ_k²), where v_p is the feature vector of pixel p and the variance σ_k² of cluster C_k is taken as the Gaussian variance. Finally, the saliency probability of each pixel is obtained by summing the co-saliency probabilities over the clusters: S(p) = Σ_{k=1}^{K} p(C_k) · p(p | C_k).
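The cluster-level pipeline can be sketched compactly with NumPy and scikit-learn, following the equations as reconstructed above. The corresponding (repeatability) cue is omitted for brevity, and the number of clusters, the Gaussian width of the spatial cue, and the min–max cue normalisation are assumptions, so this is an illustration rather than the authors' implementation of [3].

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_cosaliency(images, K=6):
    """Cluster-based saliency sketch: K-means over the pixels of all
    images, contrast and spatial cues per cluster, point-wise cue
    fusion, and Gaussian soft-assignment back to pixels.

    images : list of float arrays (H, W, 3), all of the same size.
    """
    H, W, _ = images[0].shape
    X = np.concatenate([im.reshape(-1, 3) for im in images])
    km = KMeans(n_clusters=K, n_init=10).fit(X)
    labels, mu = km.labels_, km.cluster_centers_
    n, N = np.bincount(labels, minlength=K), len(X)

    # Contrast cue (Equation (2)): distance to the other cluster
    # centres, weighted by the other clusters' pixel counts.
    wc = np.array([sum(n[i] / N * np.linalg.norm(mu[k] - mu[i])
                       for i in range(K) if i != k) for k in range(K)])

    # Spatial cue (Equation (3)): Gaussian weight of each cluster's
    # pixel distances to the image centre; sigma is an assumption.
    yx = np.stack(np.mgrid[0:H, 0:W], -1).reshape(-1, 2).astype(float)
    d2 = np.tile(((yx - [H / 2, W / 2]) ** 2).sum(1), len(images))
    g = np.exp(-d2 / (2 * (0.3 * max(H, W)) ** 2))
    ws = np.array([g[labels == k].mean() for k in range(K)])

    def norm(w):                       # min-max normalisation of a cue
        return (w - w.min()) / (w.max() - w.min() + 1e-12)

    p_cluster = norm(wc) * norm(ws)    # point-wise multiplicative fusion

    # Pixel-level saliency by Gaussian soft-assignment to the clusters,
    # using each cluster's own variance as in Section 3.1.1.
    sal = np.zeros(N)
    for k in range(K):
        var = ((X[labels == k] - mu[k]) ** 2).sum(1).mean() + 1e-12
        sal += p_cluster[k] * np.exp(-((X - mu[k]) ** 2).sum(1) / (2 * var))
    sal = norm(sal)
    return [s.reshape(H, W) for s in np.split(sal, len(images))]
```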
3.1.2 | KNNT matting algorithm
In Equation (1), the observed image is represented as a linear combination of the foreground F and the background B under the α channel, with α taking values in [0, 1]. When α is 1, the pixel belongs to the foreground area; when α is 0, the pixel belongs to the background area; when α is between 0 and 1, the pixel belongs to the unknown area of image I. How to estimate the α values of the unknown area from the pixels of the known foreground and background areas is therefore the key to the matting problem. In areas where foreground and background have overlapping colours or dense textures, the quality of the estimated matte is greatly affected. The authors use the KNNT matting technique [4], which adds texture features to KNN matting to strengthen the background constraint and gives better experimental results.
Non-local Principle [43]: a denoised pixel p is a weighted sum of pixels with similar characteristics under a kernel function κ(i, j), as follows:

κ(i, j) = exp(−‖X(i) − X(j)‖²_g / h₁² − d²_ij / h₂²),  (8)

X̂(i) = Σ_j κ(i, j) X(j) / Σ_j κ(i, j),  (9)

where X(i) is the feature vector of pixel i, d_ij is the spatial distance between pixels i and j, ‖·‖_g is a Gaussian-weighted norm, and h₁ and h₂ are constants. Similarly to Equation (9), the expected value of α is as follows:

α̂(i) = Σ_j κ(i, j) α(j) / Σ_j κ(i, j).  (10)
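To make the non-local expectation concrete, the sketch below iterates the weighted average of Equation (10) over the K nearest neighbours in feature space, keeping the trimap-known pixels fixed. The spatial term d_ij/h₂ is assumed to be folded into the feature vector, and the fixed-point loop stands in for the closed-form Laplacian solve used by KNN/KNNT matting [4, 28], so this is an illustration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_alpha_propagation(feats, known_alpha, known_mask,
                          k=10, h1=1.0, iters=200):
    """Propagate alpha by repeatedly averaging over feature-space
    neighbours: alpha(i) <- sum_j k(i,j) alpha(j) / sum_j k(i,j).

    feats       : (n, d) per-pixel features, e.g. (cos h, sin h, s, v, x, y).
    known_alpha : (n,) values, valid where known_mask is True (trimap).
    k, h1, iters: assumed hyper-parameters for illustration.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(feats)
    dist, idx = nn.kneighbors(feats)
    dist, idx = dist[:, 1:], idx[:, 1:]          # drop each pixel itself
    w = np.exp(-(dist ** 2) / (h1 ** 2))         # kernel weights kappa(i, j)
    alpha = np.where(known_mask, known_alpha, 0.5)
    for _ in range(iters):                       # fixed-point iteration
        new = (w * alpha[idx]).sum(1) / (w.sum(1) + 1e-12)
        alpha = np.where(known_mask, known_alpha, new)
    return np.clip(alpha, 0.0, 1.0)
```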
Colours are generally described in terms of hue, saturation, and brightness. Hue describes the pure colour attribute; saturation measures the degree to which a pure colour is diluted by white light; and brightness, which is measurable, is the key factor in describing colour. When colour is mentioned, the hue is generally what comes to mind first, while saturation and brightness are less noticeable. In image saliency detection, however, the saturation and brightness of an object's colour have a greater impact on saliency. Therefore, to modify saliency more effectively, the HSI colour space is used to adjust the ROI area in three aspects, hue, saturation, and brightness, separating the brightness component from the colour information to achieve the best saliency reduction. Suppose a natural image is given with its hue, saturation, and brightness, and its selected ROI area. Since the lowest level of
To obtain the most similar hue distribution, the optimum is to minimise G(θ), that is, θ_adj = arg min_θ G(θ). Because the Dirac kernel function is used, several different values of θ may meet the requirement along a flat segment of the minimum, so the mean value of that range is taken as the optimal solution θ_adj. All hues in the ROI area are then adjusted to h′_i = h_i + θ_adj; since the value range of hue h is [0, 2π), when h′_i ≥ 2π, h′_i = h′_i − 2π.
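The hue-adjustment step can be sketched as follows. A kernel-smoothed circular histogram stands in for the Dirac-kernel hue distribution (a von Mises kernel is assumed here, since a sampled histogram cannot carry true Dirac spikes), and G(θ) is taken as the L1 distance between the shifted ROI distribution and the background distribution, with flat minima averaged as described in the text; both choices are assumptions of this illustration.

```python
import numpy as np

def circular_hist(h, bins=360, kappa=50.0):
    """Kernel-smoothed circular histogram of hue angles h in [0, 2*pi)."""
    centres = np.linspace(0, 2 * np.pi, bins, endpoint=False)
    weights = np.exp(kappa * np.cos(h[None, :] - centres[:, None]))
    p = weights.sum(axis=1)
    return p / p.sum()

def best_hue_shift(h_roi, h_bg, bins=360):
    """Grid search for theta_adj = argmin G(theta); over a flat minimum
    the mean of the minimising thetas is returned, as in the text
    (wrap-around of that mean is ignored in this sketch)."""
    p_bg = circular_hist(h_bg, bins)
    thetas = np.linspace(0, 2 * np.pi, bins, endpoint=False)
    G = np.array([np.abs(circular_hist((h_roi + t) % (2 * np.pi), bins)
                         - p_bg).sum() for t in thetas])
    return thetas[np.isclose(G, G.min())].mean()

# Applying the shift with the hue wrap described above:
# h_roi_new = (h_roi + best_hue_shift(h_roi, h_bg)) % (2 * np.pi)
```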
The saturation and brightness channels are further adjusted through the interaction parameters Δs and Δi, which represent, respectively, the saturation difference and the brightness difference between the ROI area and the background area. Because these two parameters are directly related to image saliency, their effect is obvious and they are easy to adjust.
For an input natural image M, to express the influence of the user control parameters Δs and Δi mathematically, two functions Φ(S_M′, R) and Φ(I_M′, R) are defined to calculate the saturation difference and the brightness difference between the ROI area and the background area, as the differences of the channel means inside and outside R:

Φ(S_M′, R) = (1/|R|) Σ_{p∈R} S_M′(p) − (1/|R̄|) Σ_{p∈R̄} S_M′(p),

with Φ(I_M′, R) defined analogously on the brightness channel,
where M′ is the processed image (initially M′ = M), S_M′ and I_M′ denote the corresponding saturation and brightness channels, and R is the ROI area.
Energy terms are then set on saturation and brightness, and matching the desired saturation and brightness differences is cast as a minimum-energy problem of the form

E(M′) = (Φ(S_M′, R) − Δs)² + (Φ(I_M′, R) − Δi)²,

which is minimised when the ROI-background contrasts equal the user-specified targets.
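When Φ is the mean-difference form assumed above, this quadratic energy is minimised exactly by shifting the ROI channels by a constant, which gives the closed-form sketch below; the mean-difference Φ and the clipping to [0, 1] are assumptions of this illustration, not the authors' exact optimisation.

```python
import numpy as np

def retarget_si(S, I, roi, delta_s, delta_i):
    """Shift the ROI saturation and intensity channels so that their
    mean contrast against the background equals the user targets.

    S, I             : float arrays in [0, 1] (HSI channels).
    roi              : boolean mask of the ROI area.
    delta_s, delta_i : target ROI-background contrasts; negative values
                       suppress the ROI relative to its surroundings.
    """
    out_s, out_i = S.copy(), I.copy()
    for ch, target in ((out_s, delta_s), (out_i, delta_i)):
        phi = ch[roi].mean() - ch[~roi].mean()   # current contrast Phi
        ch[roi] += target - phi                  # constant shift to target
        np.clip(ch, 0.0, 1.0, out=ch)            # keep channels valid
    return out_s, out_i
```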
In this section, the setup and results of the experiment to assess the performance of this method are discussed.
The experimental data sets must meet at least two requirements: (1) each image contains at least one co-saliency object; (2) each group contains at least one image with two or more salient targets, each covering at least 10 percent of the image. This prevents both the problem of an excessively large ROI area dominating the image and the problem of an overly small ROI area being neglected. Accordingly, 20 groups of images, 100 images in total, were selected from CMU, Cosal2015, and iCoseg for experimental verification.
The construction of the saliency-guided image blocks (i.e. the ROIs) proceeds as follows. First, the cluster-based co-saliency detection algorithm estimates the co-saliency map of each image group and the single-image saliency map of each image. By comparing the co-saliency map and the single-image saliency map, the rough position of the ROI area is estimated. Then, through the matting algorithm, the grey-level map of the non-common salient objects is obtained accurately and completely; this is the saliency-guided image block. Here, to increase the accuracy of ROI extraction, the trimaps were drawn manually. Figure A1 in the Appendix shows some of the saliency detection results, Figure 4 shows some of the matting extraction results, and Figure 5 shows an enlarged detail of the ROI region. These methods judge foreground and background accurately; even the small tip of the tomato pedicle is detected accurately. However, there remains room for improvement in the handling of edge details.
TABLE 1 A comparison of the results of the single-image saliency map before and after modification on the six evaluation indexes
Both qualitative and quantitative analyses were performed. The qualitative analysis uses the intuitive perception of human observers to evaluate the effect of the visual attention retargeting. The quantitative analysis, which compares the single-image saliency maps before and after processing against the co-saliency maps to obtain specific evaluation index values, is more convincing.
4.3.1 | Qualitative analysis
For image attention retargeting, a questionnaire survey was conducted on the 20 groups of original images and the images processed by the described method. It contained three questions: (1) In the original image, which object is the most salient? (2) In the processed image, which object is the most salient? (3) Are the most salient objects in the two images the same? For the first two questions, several candidate objects were supplied for the testers to choose from. A total of 53 testers participated in the survey. For 71.6% of the testers, the most salient objects selected in the two images differed, and in the second question, 56.6% of the testers did not select the ROI area as the salient object. These results show that the processing changed the saliency of the images to a clear extent and accurately matched the extracted ROI areas.
4.3.2 | Quantitative analysis
To demonstrate more intuitively how the method reduces the saliency of non-common salient objects, the co-saliency map of the original image was used as the reference truth map, while the original single-image saliency map and the single-image saliency map modified by the described method were used as test maps. The saliency evaluation indexes F-Measure, Mean Absolute Error (MAE), the P–R curve, matrix Similarity (SIM), Normalised Scanpath Saliency (NSS), linear Correlation Coefficient (CC), and KL divergence (KLdiv) were selected to evaluate the experimental results [45]. When calculating the F-Measure, β² = 0.3 was used. The results of the indexes are shown in Table 1 and Figure 6; Figure 7 shows the P–R curves of the original and modified single-image saliency maps.
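For reference, the two simplest indexes can be computed as below; the adaptive threshold used for the F-Measure (twice the mean saliency) is a common convention and an assumption here, since the paper does not state its thresholding.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and the reference
    truth map, both scaled to [0, 1]."""
    return np.abs(sal.astype(float) - gt.astype(float)).mean()

def f_measure(sal, gt, beta2=0.3):
    """F-Measure with beta^2 = 0.3 at an adaptive threshold."""
    t = 2.0 * sal.mean()                  # assumed adaptive threshold
    pred, ref = sal >= t, gt >= 0.5
    tp = np.logical_and(pred, ref).sum()
    prec = tp / (pred.sum() + 1e-12)
    rec = tp / (ref.sum() + 1e-12)
    return (1 + beta2) * prec * rec / (beta2 * prec + rec + 1e-12)
```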
FIGURE 6 Comparison results of the single-image saliency map before and after modification on the six evaluation indexes
FIGURE 7 Taking the co-saliency map as the reference truth map, the P–R curves of the original single-image saliency map and the modified single-image saliency map
The evaluation indexes in Figures 6 and 7 and Table 1 show that, with the proposed visual attention retargeting algorithm based on co-saliency detection, the modified single-image saliency map is closer to the original co-saliency map on every index; the method therefore effectively reduces the saliency of non-common salient objects.
In the comparative experiment, the results of this method were compared with those of Su et al. [35], SaGIM [42], and the OHA algorithm [46]; partial results are shown in Figure 8, and more are shown in Figure A2 in the Appendix. In each group, the first column is the processing result of a method and the second column is the single-image saliency map of that result. The described method has obvious advantages in reducing saliency and effectively suppresses the saliency characteristics of the ROI region. Moreover, the colours produced by the described method are more natural, and the adjustment covers the entire ROI, whereas the other algorithms modify only parts of the ROI area in individual images; for the kites in the fourth row of Figure 8, for example, they cannot modify the white area of the original image. The methods of [35, 42] also seriously degrade image quality, reducing the definition of the ROI area, while the method described here maintains image quality well. In the experiments, the authors found that the results on images of people are not perfect: unrealistic colours may appear, such as the green players in the 16th row of Figure A2. This problem needs to be resolved in future work.
The authors propose an image attention retargeting algorithm based on co-saliency detection. The co-saliency detection algorithm obtains the approximate positions of the non-common salient objects, and the matting algorithm then extracts their precise locations accurately and completely. Finally, the authors propose a new attention retargeting method. In the HSI colour space, the hue content of an ROI and its surroundings is described using a polar representation of a perceptually uniform colour space, so that the best hue adjustment minimising the difference between the ROI and its surrounding environment can be determined easily. For saturation and brightness, the parameters can be set according to the desired difference between the user's ROI area and the surrounding environment, which is more user-friendly and more effective. In future work, the authors plan to improve the accuracy of the matting algorithm in image preprocessing, making it more accurate on small parts, and to further optimise the attention retargeting algorithm for colour naturalness, so that ROI features remain realistic while ROI saliency is reduced.
FIGURE 8 Examples of image attention retargeting results by the authors' method with the corresponding single-image saliency maps; the comparative experiments include OHA [46], SaGIM [42], and Su et al. [35]
ACKNOWLEDGEMENTS
This research was supported by the National Natural Science Foundation of China (61772319, 62002200, 61976125, 61976124, 61907026), the Project of Shandong Province Higher Educational Science and Technology Program (J18KA392), and the Project of Shandong Technology and Business University wealth management (2019ZBKY053, 2019ZBKY032).
ORCID
Jinjiang Li https://orcid.org/0000-0002-2080-8678
APPENDIX
FIGURE A1 Example of cluster-based saliency detection results
FIGURE A2 More examples of image attention retargeting results by the authors' method with the corresponding single-image saliency maps; the comparative experiments include OHA [46], SaGIM [42], and Su et al. [35]