

TransHist: Occlusion-robust shape detection in cluttered images

Chu Han, Xueting Liu, Lok Tsun Sinn, and Tien-Tsin Wong

Computational Visual Media, 2018, Issue 2 (published 2018-07-13)

Abstract Shape matching plays an important role in various computer vision and graphics applications such as shape retrieval, object detection, image editing, and image retrieval. However, detecting shapes in cluttered images is still quite challenging due to incomplete edges and changing perspective. In this paper, we propose a novel approach that can efficiently identify a queried shape in a cluttered image. The core idea is to acquire the transformation from the queried shape to the cluttered image by summarizing all point-to-point transformations between the queried shape and the image. To do so, we adopt a point-based shape descriptor, the pyramid of arc-length descriptor (PAD), to identify point pairs between the queried shape and the image having similar local shapes. We further calculate the transformations between the identified point pairs based on PAD. Finally, we summarize all transformations in a 4D transformation histogram and search for the main cluster. Our method can handle both closed shapes and open curves, and is resistant to partial occlusion. Experiments show that our method can robustly detect shapes in images in the presence of partial occlusions, fragile edges, and cluttered backgrounds.

Keywords shape matching; shape detection; transformation histogram

        1 Introduction

Shape matching plays an important role in various computer vision and graphics applications such as shape retrieval, object detection, image editing, and image retrieval. Compared to gradient and texture features, shape features are much more reliable when objects are characterized by distinctive shapes, such as road signs in images and videos. In this paper, we focus on detecting shapes in cluttered images by analyzing point-to-point transformations.

In the early days, methods were proposed to measure shape similarity by transforming shapes into other domains, using, e.g., wavelet-based transforms [1] and Fourier transforms [2–4]. Methods have also been proposed to transform shapes into the curvature domain, using curvature inflection points for shape matching [5, 6]. Later, various shape descriptors were proposed and used for measuring shape similarity. Shapes may also be described using triangle areas, by forming a set of triangles at a reference point [7]. Use has also been made of an integral kernel to extract shape characteristics within a region centered at a reference point [8, 9]. The state-of-the-art shape context shape descriptor utilizes a log-polar diagram to statistically record the spatial distribution of shapes at each sample point [10].

However, most existing shape matching algorithms or shape descriptors are only designed for matching shapes with clean and clear edges. Unsatisfactory results may be obtained if they are used to detect shapes in cluttered images. First of all, edges in cluttered images are fragile: shapes may be cut into fragments (blue box in Fig. 1(b)). With no tolerance for fragile edges, whole-shape descriptors naturally fail to detect shapes with fragile edges. Furthermore, region-based shape descriptors are also unable to extract features from fragile edges since they assume that shapes are closed. Secondly, partial occlusions (yellow box in Fig. 1(b)) typically occur, again hindering whole-shape descriptors from finding the correct solution. Lastly, the background in cluttered images may be extremely noisy (red box in Fig. 1(b)). Noise greatly affects shape descriptors in the spatial domain. Moreover, cluttered backgrounds also increase computational costs.

The inner-distance shape context [11] extended the original shape context to tackle partial-shape matching, but it still requires the shapes to be closed. Recently, Kwan et al. [12] proposed a point-based shape descriptor called the pyramid of arc-length descriptor (PAD) to tackle partial occlusion, and their method is also applicable to open curves. However, their method is greatly affected by cluttered backgrounds since it relies on distance fields for final shape matching. Moreover, it is computationally expensive.

Fig. 1 Challenges in shape detection in cluttered images. (a) Input image. (b) Edge map of (a). (c) PAD with distance field. (d) Our result.

In this paper, we present an efficient shape detection method tailored for shape detection in cluttered images. We found that, while shape detection can be regarded as finding the optimal transformation from a queried shape to the shape in the image, we may first identify the optimal transformation for each point on the shape and then summarize all transformations to get the optimal transformation for the whole shape. In order to find the optimal transformation for each point efficiently, we adopt PAD [12] as the shape descriptor for each point and find the optimal corresponding point in the image by comparing PAD similarity. We use PAD because of its scale, rotational, and translational invariance. Furthermore, PAD is applicable to both closed and open curves, and is robust to partial occlusion. We then calculate the transformation between corresponding points based on PAD (Fig. 5), and form a 4D transformation histogram to summarize the transformations of all points on the shape. The main cluster of transformations is the detected result. Figure 1(c) shows the result detected by directly applying PAD, while Fig. 1(d) shows the result detected by our method. It can be seen that our method is more robust to noise and therefore more appropriate for detecting shapes in cluttered images. We later provide the results of two experiments to validate the robustness of our method to partial occlusion and cluttered backgrounds.

        Our contributions can be summarized as follows:

• an efficient shape detection method for detecting shapes in cluttered images;

• the ability to simultaneously handle fragile edges, partial occlusion, and cluttered backgrounds.

        2 Related work

Detecting shapes in images is a fundamental problem in computer vision and graphics. Over the past few decades, various two-dimensional shape descriptors have been proposed to describe the characteristics of a shape. They can be broadly classified into two categories: global shape descriptors, which describe the characteristics of the whole shape, and local shape descriptors, which describe the local characteristics of a shape at specific points.

        2.1 Global shape descriptors

A popular family of global shape descriptors performs a domain transform on the shape. Fourier descriptors [2–4] can be used as shape signatures that capture shape characteristics (e.g., centroid distance and cumulative angles along the shape boundary) in the Fourier domain. Wavelet descriptors [13] rely on wavelet transforms to obtain a multiresolution representation of a shape from the shape boundary. Radon descriptors [14] rely on R-transforms, a variant of Radon transforms, of the shape to obtain the shape properties. Curvature scale space (CSS) [5, 6] describes a shape by recording its inflection points at different levels of smoothing: it represents the changes in location of inflection points on the shape boundary as smoothing proceeds. Later, Lee et al. [15] proposed shape signature harmonic embedding (SSHE), which uses discrete harmonic functions to replace smoothing in the construction. Descriptors have also been proposed based on moment theory, including Hu moments [16], Zernike moments [17], and image moments [18, 19]. Bernier and Landry [20] proposed a polar representation that plots the orientation of each boundary point referenced from the centroid of the shape as a description.

However, all of the above global shape descriptors describe shapes in an integral manner: they do not extract any local detail of the shape. Thus, we cannot directly use these global shape descriptors to measure shape similarity for open curves or complex shapes. Furthermore, global shape descriptors may also fail when partial occlusions exist.

        2.2 Local shape descriptors

Local shape descriptors are generally point-based, describing local shape characteristics at reference points. One descriptor is built for each reference point, and all descriptors together form a rich description of the whole shape.

Shape context [10] is the state-of-the-art point-based shape descriptor. It describes the shape distribution of boundary points by a log-polar diagram centered at a reference point. Mori et al. [21–23] introduced fast pruning to speed up the matching process for shape contexts. Inner-distance shape contexts [11] are an extension of shape contexts which use inner distance instead of traditional Euclidean distance for length measurement. However, shapes need to be closed to measure inner distances. Furthermore, all methods based on shape contexts are error-prone when used to detect shapes in cluttered images since shape contexts describe shapes in raster space.

Although these point-based shape descriptors describe local shape characteristics, global normalization is still needed to achieve scale invariance, as they are not inherently scale invariant. Consequently, these point-based descriptors cannot be directly applied to detecting shapes in cluttered images.

Recently, Kwan et al. [12] proposed a locally scale-invariant shape descriptor, the pyramid of arc-length descriptor (PAD). The locally scale-invariant property enables more robust detection of shapes in the presence of cluttered edges and noise. However, PAD only represents a very limited range of local shapes at a reference point, so further evaluation is needed for shape matching. While Kwan et al. used a distance field for this purpose, the resulting ability to detect shapes in cluttered images is poor, since distance fields are quite sensitive to noise in raster images. Instead, we adopt a transformation histogram for shape matching, making our method much more robust to noise.

        2.3 Shape-based object detection

Various methods have been proposed to detect objects in 2D or 3D space [26, 27]. Here, we only focus on object detection in 2D, i.e., in images. Most existing methods are based on shape context [10] since it is the state-of-the-art shape descriptor. In particular, Lian et al. [28] proposed to detect shapes with a novel outlier-resistant shape context distance that ignores outliers in the norm-2 distance of the original shape context. Thayanathan et al. [29] introduced a continuity constraint into shape context matching, restricting correspondences to be formed by nearby points. However, these methods cannot achieve scale invariance and are only applicable to detecting shapes at the same scale.

Riemenschneider et al. [30] proposed a partial-shape matching method to locate objects. However, their method is not truly scale invariant. Shape bands [31] can tolerate shape deformation to some extent, within a fixed bandwidth, but this method is also not scale invariant and is error-prone in the presence of cluttered edges. Cheng et al. [32] proposed to use a boundary band map to search for repeated elements with similar shapes, but user interaction is needed. The chordiogram method [33] first forms a set of chords by joining boundary points together, and then uses the lengths, orientations, and normal directions of the boundary points forming the chords to build a chordiogram for shape matching. However, this method is neither scale invariant nor rotational invariant. Chi and Leung [34] proposed to decompose a shape into primitives and perform partial-shape matching by searching an indexed structure of primitives. However, they do not take scale invariance into consideration. In contrast, our approach achieves both scale invariance and rotational invariance; only a single description is needed to describe the local shape in a scale- and rotational-invariant manner.

Methods based on neural networks have also been proposed to detect objects [35–37]. While neural networks may achieve better results than traditional low-level methods, they rely heavily on training data. In this paper, we aim to detect shapes in cluttered images by relying only on the shapes' characteristics.

        3 System overview

In this paper, we propose a novel shape detection method which calculates and analyzes a transformation histogram relating the queried shape to the cluttered image. Figure 2 shows the framework of our method. Given a cluttered input image and a queried shape, we first extract the edges from both the image and the queried shape. To extract edges from the cluttered image, we use the Canny edge detector [38]. For the queried shape, we simply identify all boundary pixels as edges of the shape. We then calculate the local shape feature of each point using an existing point-based shape descriptor, the pyramid of arc-length descriptor (PAD) [12]. Note that PAD can be computed for all points, no matter whether they are on closed or open curves. Moreover, PAD only describes local shape features along a single edge: no redundant or disturbing information from the cluttered background is embedded in PAD. In addition, it is scale, rotational, and translational invariant, allowing the detection of shapes with changes in size and orientation.

The next step is to find all pairs of edge points with similar PAD features (see Section 4.1). We observe that there is a high probability that the points will be correctly matched, and that the transformations for all correctly matched point pairs should be quite similar (Fig. 6). Thus, we first calculate the transformation for each pair of corresponding edge points (see Section 4.2), and then form a transformation histogram using all transformations, to identify the main cluster that represents the transformation of the whole shape (see Section 4.3).

        Our method is fully parallelisable for use on a GPU.

        4 Shape detection in cluttered images

        4.1 Point-to-point matching via PAD

Given the edge maps from an image and a query shape, we first extract the shape features of all reference points on the edges using a local point-based shape descriptor, the pyramid of arc-length descriptor (PAD) [12]. The features extracted by PAD are locally scale invariant, rotational invariant, and translational invariant. PAD can provide the precise transformation between two points with similar local shapes, which perfectly matches our requirement. We note, however, that our framework can accept any local shape descriptor that provides point-to-point transformations.

Fig. 2 Framework of our approach.

For completeness, we briefly introduce PAD and how it is used for local shape matching between two reference points. The PAD shape feature is extracted from the integral of absolute curvature (IAC) [39]. Given a curve (Fig. 3), the integral of absolute curvature τ over a curve segment between points s and t is defined as

τ(s, t) = ∫_s^t |κ(x)| dx

where κ(x) is the curvature at point x on the curve.
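Since PAD is built on the IAC, it may help to see how the IAC behaves on discrete data. On a sampled edge polyline, the integral of absolute curvature between two points can be approximated by summing the absolute turning angles at the interior vertices. A minimal sketch under that discretization (the function names and index convention are ours, not the paper's):

```python
import math

def turning_angles(points):
    """Absolute turning angle at each interior vertex of a polyline."""
    angles = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        d = math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0)
        d = (d + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
        angles.append(abs(d))
    return angles

def iac(points, s, t):
    """Approximate integral of absolute curvature between vertices s and t."""
    return sum(turning_angles(points)[s:t])
```

A straight polyline has zero IAC, while a right-angle corner contributes π/2, matching the continuous definition in the limit of dense sampling.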

PAD encodes the shape by combining the IAC domain and the arc-length domain. It constructs a pyramid of arc-length intervals centered at a point p, such that each interval corresponds to a fixed integral value of absolute curvature (2^i·τ), accumulated from p. The pyramids of arc-lengths L_i and R_i can be extracted by integrating different levels of absolute curvature in the IAC domain. For level i, the two intervals (l_i : p) and (p : r_i) have the same IAC value 2^i·τ, so

τ(l_i : p) = τ(p : r_i) = 2^i·τ

After accumulating arc-length on both the left and right hand sides at different levels, the PAD vector is defined using this set of arc-length values: the corresponding arc-lengths of these IAC intervals form the initial PAD vector M_init. Figure 4 shows the set of intervals sampled for 5 levels and the IAC value accumulated for intervals at each level.
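The interval search above can be sketched as a walk outward from p that accumulates turning angle until each level's IAC target is met. This assumes uniformly spaced samples with a per-vertex absolute turning angle, and uses our own function names; the real PAD additionally normalizes the resulting vector for scale invariance:

```python
def pad_vector(angles, seg_len, p, tau, levels):
    """Arc lengths [L_0, R_0, L_1, R_1, ...] whose accumulated IAC,
    starting from vertex p, reaches 2^i * tau at level i.

    angles[k] is the absolute turning angle at vertex k; seg_len is the
    uniform spacing between samples. Sketch only.
    """
    vec = []
    for i in range(levels):
        target = (2 ** i) * tau
        for step in (-1, +1):              # left interval, then right
            acc, k = 0.0, p
            while 0 <= k + step < len(angles) and acc < target:
                k += step
                acc += angles[k]
            vec.append(abs(k - p) * seg_len)
    return vec
```

For a curve of constant curvature (uniform turning angle), each level's interval is exactly twice as long as the previous one, reflecting the doubling IAC targets.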

Fig. 3 An interval on a curve.

Fig. 4 Intervals sampled with multiple levels of PAD.

The final PAD vector is derived from M_init together with a curvature-sign term s. Here, s ∈ {+1, −1} is the sign of curvature at p, indicating whether the curve is convex (+1) or concave (−1) near the point of interest.

The PAD distance between two local shapes around two points p and q is denoted D_{p,q}, and is the l∞-norm of the difference of the two PAD vectors:

D_{p,q} = ‖M_p − M_q‖_∞

We can now estimate the local shape similarity of two points using the PAD distance defined above. We find all point pairs (one on the queried shape and the other on the image) with PAD distance no larger than K = 0.2 and denote these point pairs as matching pairs. We may decrease K to enforce the matching pairs to have more similar local shapes, but with reduced tolerance for shape deformations.
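Finding matching pairs is then a thresholded descriptor search, treating K as an upper bound on the l∞ distance between PAD vectors (distances below K indicate similar local shapes). A sketch with our own naming, using NumPy for the vectorized distance:

```python
import numpy as np

def matching_pairs(shape_pads, image_pads, K=0.2):
    """All pairs (i, j) whose PAD vectors differ by at most K in l-infinity.

    shape_pads: (m, d) array of descriptors for the queried shape;
    image_pads: (n, d) array for the image edge points.
    """
    pairs = []
    for i, mp in enumerate(shape_pads):
        # l-infinity distance from descriptor i to every image descriptor
        d = np.max(np.abs(image_pads - mp), axis=1)
        pairs.extend((i, int(j)) for j in np.flatnonzero(d <= K))
    return pairs
```

Note that one query point may legitimately match many image points; the ambiguity is resolved later by the transformation histogram, not here.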

        4.2 Transformation of matching pairs

We define the transformation between two points, and thus between two local shapes, to be a 2D transformation comprising scaling, rotation, and translation along the x- and y-axes. For example, Fig. 5 shows a queried shape Q (left) and an edge map E (right) with an identified matching pair p_i ∈ Q and q_j ∈ E. We build a vector (red arrows in Fig. 5) between the endpoints of the last-level coverage of PAD for points p_i and q_j. The relative magnitude of the two vectors indicates the change in scale s_{pi,qj} between the two shapes under the PAD coverage. The angular difference of the two vectors indicates the local change in orientation θ_{pi,qj} of the two shapes. The translation between the two points (x_{pi,qj}, y_{pi,qj}) can be computed as the spatial distance between the two vectors. Then we can write the transformation T_{i,j} as

T_{i,j} = (s_{pi,qj}, θ_{pi,qj}, x_{pi,qj}, y_{pi,qj})

Fig. 5 Vector formed by the last-level PAD coverage.

where i and j are indices of points on the queried shape Q and the edge map E of the cluttered image, respectively.

Note that the transformation model used here first translates the queried shape to the position defined by the vector, and then scales and rotates the queried shape correspondingly. Translation is referenced to the centroid of the shape in order to avoid deviations in position of the matching pair. This avoids translations being affected by the scaling and rotation components.
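The four parameters for one matching pair can be read directly off the two vectors. A sketch (argument names and conventions are ours; in particular we take the raw offset between the matched points as the translation, before any re-referencing to the shape centroid):

```python
import math

def pair_transform(p_vec, q_vec, p, q):
    """Similarity-transform parameters (scale, rotation, tx, ty) for a
    matching pair. p_vec/q_vec span the last-level PAD coverage around
    the matched points p (on the query) and q (in the image)."""
    scale = math.hypot(*q_vec) / math.hypot(*p_vec)   # relative magnitude
    theta = math.atan2(q_vec[1], q_vec[0]) - math.atan2(p_vec[1], p_vec[0])
    tx, ty = q[0] - p[0], q[1] - p[1]                 # spatial offset
    return scale, theta, tx, ty
```

For instance, a coverage vector rotated by 90° and doubled in length yields scale 2 and rotation π/2, independently of where the pair sits in the image.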

        4.3 Transformation histogram

We now obtain a set of transformations T = {T_{i,j}} for all matching pairs. All these transformations hint at possible locations of the queried shape in the cluttered image. It can easily be observed from Fig. 6 that matching pairs between two similar shapes should have similar transformations. Based on this observation, we use a transformation histogram to cluster the transformations.

Fig. 6 Matching pairs along similar boundary portions. Each differently colored point is a PAD point involved in a different matching pair. Dashed lines show correspondences between matching PAD points.

Each matching pair (i, j) contributes a score to the histogram:

S(i, j) = κ(i)·κ(j) / D(i, j)

where D(i, j) is the PAD distance of i and j, and κ denotes curvature. The aim is to account for local similarity and local smoothness of each matching pair (i, j). We want locally more similar matching pairs to contribute higher scores; since a smaller PAD distance means shapes are more similar, we put D(i, j) in the denominator. We further weight the score by the local smoothness of the matching pair (i, j), as smoother edges deliver less shape information and are more likely to be matched with other smooth edges. On the contrary, more rapidly changing edges contain more information. Figure 7 shows examples of matching two smooth edge segments and two more rapidly changing edge segments. Dashed lines indicate pairs of points which are locally similar and form a matching pair. We can easily observe that each point in Fig. 7(a) matches several locally similar points in Fig. 7(b). In contrast, the indicated point in Fig. 7(c) only matches one point in Fig. 7(d). Without consideration of local smoothness, the transformation histogram would be dominated by matching pairs of smooth points. We overcome this issue by weighting the scores of matching pairs by their curvatures, κ(i) and κ(j).

For each bin n corresponding to a certain transformation range [T_n, T_{n+1}), the final score S_n is the sum of the scores of all matching pairs falling into that bin:

S_n = Σ_{(i,j): T_{i,j} ∈ [T_n, T_{n+1})} S(i, j)

Fig. 7 Locally similar matching pairs with similar transformations. (a) and (b) are two smoother edges. (c) and (d) are two more rapidly changing edges. Dashed lines join locally similar points. Smoother edges have many more correspondences than more rapidly changing edges.

In practice, we set the numbers of bins for scale, rotation, and x- and y-translations to {10, 10, 50, 50}. By finding the bin with the largest score in the histogram, we can find the target shape in the image. Since it is possible that no similar shape exists in the cluttered image, we set a threshold on the fraction of points required for a match to exist. If the fraction is too low (empirically < 30%), we conclude that there is no similar shape in the image and return no match.
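The voting step maps naturally onto a weighted multidimensional histogram. A sketch using NumPy's `histogramdd` with the bin counts above; the per-dimension ranges and the per-pair weights are inputs we assume are computed as described earlier (e.g. κ(i)·κ(j)/D(i, j)):

```python
import numpy as np

def best_transform_bin(transforms, scores, ranges, bins=(10, 10, 50, 50)):
    """Vote weighted transforms into a 4D histogram and return the centre
    of the highest-scoring bin.

    transforms: (n, 4) array of (scale, rotation, tx, ty) per matching pair;
    scores: one weight per pair; ranges: (lo, hi) per dimension.
    """
    hist, edges = np.histogramdd(transforms, bins=bins,
                                 range=ranges, weights=scores)
    idx = np.unravel_index(np.argmax(hist), hist.shape)
    # centre of the winning bin along each of the four dimensions
    return tuple((edges[d][i] + edges[d][i + 1]) / 2 for d, i in enumerate(idx))
```

Outlier transformations land in sparsely populated bins and are ignored, which is what makes the histogram robust to clutter compared with distance-field matching.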

        5 Experiments

In this section, we describe several experiments conducted to evaluate our shape detection method. Firstly, we show some results of detecting shapes in cluttered images. Note that all results are plotted directly onto edge maps to aid visualization. We conduct two further experiments to validate robustness to occlusion and cluttered edges, respectively.

In these experiments, we compare our method to 6 existing shape matching methods: PAD using a distance field [12], shape contexts [10], inner-distance shape contexts [11], curvature scale space [6], triangle area representation [7], and integral invariants [8], abbreviated as PAD-DF, SC, IDSC, CSS, TAR, and II, respectively.

5.1 Shape detection in cluttered images

        First we show a few results to demonstrate the ability of our method to detect shapes in cluttered images.

Figure 8(a) shows a real-life photo containing a swan in the background. Since most existing shape descriptors cannot support open curves, we only compare our method with PAD-DF [12] and IDSC [11]. IDSC fails to detect the swan as inner distance is not defined for points across disjoint components (see Fig. 8(b)). PAD-DF fails to locate the swan as it is confused by the clutter (see Fig. 8(c)). Our method successfully finds the location of the swan in the image (see Fig. 8(d)). Since it is unfair to compare IDSC in such cases, in the next two results we only compare our method with PAD-DF.

Figure 9(a) shows an image from an object detection dataset, the ETHZ shape dataset [40]. The aim is to detect the Apple logo (shown at top left in Fig. 9(a)). Leaves on trees in the right part of the image lead to crowded edges (see Fig. 9(b)), which cause incorrect detection results for PAD-DF (the Apple logo marked in red). In contrast, our method successfully avoids the effects of crowded edges and correctly matches the Apple logo in the image (see Fig. 9(c)).

Fig. 8 Matching a swan. (a) Original image with query at top left. (b) IDSC result. (c) PAD-DF result. (d) TransHist result.

We show another example in Fig. 10(a), where the Apple logo is partially occluded by a wire. Figure 10(b) shows the result generated by PAD-DF. When detecting shapes in real images, PAD-DF is much more error-prone in the presence of crowded edges. Even with partial occlusion, our method can still recognize the Apple logo correctly (see Fig. 10(c)).

Fig. 9 Matching an Apple logo on a window. (a) Original image with query at top left. (b) PAD-DF result. (c) TransHist result.

Fig. 10 Matching an occluded Apple logo. (a) Original image with query at top left. (b) PAD-DF result. (c) TransHist result.

        5.2 Tolerance to occlusion

Partial occlusion is one of the major challenges in shape detection, and it is quite common in real-world scenarios. This experiment aimed to explore tolerance to partial occlusion, using the dataset proposed by Kwan et al. [12]. It contains a set of shapes and clipped instances of them. The shapes are from the MPEG-7 dataset [41], and comprise 70 classes each with 20 shapes. Each shape is gradually occluded from left to right. The occlusion rate goes from 0% to 90% in 10% increments, giving 14,000 shapes and partial shapes in total.

We follow the rendering approach used by Kwan et al. [12] to quantify the goodness of matching: we render the clipped instance (after transformation) on top of the original shape. Let C be the set of pixels belonging to the transformed clipped instance in the space of the original shape, and C_o be the set of pixels where C should be, again in the space of the original shape. We then measure the matched fraction γ as follows:

γ = |C ∩ C_o| / |C ∪ C_o|

γ = 1 indicates a perfect match. Due to rasterization and numerical errors, we may not get a perfect match even if the match is visually perfect. Hence, we regard a transformation with γ > 0.95 as a successful match.
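As a sanity check of this criterion, γ can be computed directly on pixel coordinate sets. The sketch below uses an intersection-over-union of the two sets as one plausible instantiation (the paper's exact formula may differ, but any valid choice yields γ = 1 exactly when the rendered and expected pixel sets coincide):

```python
def matched_fraction(C, Co):
    """Overlap between the transformed clipped pixels C and the expected
    pixels Co, both as sets of (x, y) coordinates. IoU-style instantiation:
    returns 1.0 iff the two pixel sets coincide exactly."""
    C, Co = set(C), set(Co)
    return len(C & Co) / len(C | Co)

def is_successful(C, Co, threshold=0.95):
    """Success criterion used in the experiments: gamma > 0.95."""
    return matched_fraction(C, Co) > threshold
```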

Figure 11 plots the success rate against the degree of occlusion. We can see that PAD-DF and our method can still recognize shapes even in the presence of significant occlusion: even with 80% occlusion, we still achieve a success rate of around 25%. Most other descriptors, including TAR, SC, CSS, and II, are unable to deal with partial occlusion, and their success rates drop extremely quickly. Clearly, whole-shape matching descriptors are inappropriate for measuring shape similarity in the presence of occlusion. Our transformation approach also outperforms PAD-DF, since the transformation histogram guarantees that matching pairs with similar transformations are grouped together, while the distance field is greatly affected by the cluttered background.

Fig. 11 Matching success versus occlusion.

        5.3 Tolerance to cluttered edges

We further evaluate robustness to cluttered edges, since this is an important factor when detecting shapes in cluttered images. To mimic cluttered edges, we add random arcs to the edge map of a clean shape, controlling only the total length of all added arcs relative to the length of the edges of the original shape. Figure 12 shows a few examples of cluttered instances with different amounts of clutter.

We collected 96 multi-boundary shapes from the Internet. We considered 9 different relative amounts of clutter: 0%, 25%, 50%, 75%, 100%, 150%, 200%, 250%, and 300%. For each shape and clutter level, we created 4 cluttered instances, giving 3456 cluttered instances in total. Evaluation was performed as in Section 5.2.

Fig. 12 Cluttered instances, indicating the amount of clutter relative to the original total edge length.

Figure 13 plots the results of our experiment. Since most existing descriptors do not support matching shapes with multiple boundaries and open curves, we only compared our approach with shape contexts and PAD-DF. Our method is much more successful at matching than the other two methods in the presence of cluttered edges. Shape contexts are highly influenced by cluttered edges since they affect the global quantity used for normalization. PAD-DF is also affected by cluttered edges because the distance field is extremely sensitive to noise. Our method works best since cluttered edges are filtered out in the transformation domain.

        6 Discussion and conclusions

In this paper we have presented a new shape detection approach that robustly detects shapes in cluttered images. By utilizing PAD, our method is able to support shape matching for both open curves and closed shapes, in a scale-invariant, rotational-invariant, and translational-invariant manner. Moreover, our method can detect shapes in the presence of partial occlusion.

Our method also has certain limitations. It is sensitive to shape deformation, including change of perspective, because the shape descriptor we use to calculate the transformation does not provide perspective invariance. Figure 14(b) shows a failure in such a case: although we successfully find the location of the Apple logo in the image, we fail to match the whole shape with a correct transformation. Our method is also sensitive to noise of a kind that leads to changes in curvature of boundary points.

Fig. 14 Matching under perspective transformation. (a) Input image. (b) Incorrectly detected logo.

        Acknowledgements

This project was supported by the Research Grants Council of the Hong Kong Special Administrative Region, under the RGC General Research Fund (Project No. CUHK 14217516).
