DOI QR코드

DOI QR Code

WLSD: A Perceptual Stimulus Model Based Shape Descriptor

  • Li, Jiatong (School of Information and Electronics, Beijing Institute of Technology) ;
  • Zhao, Baojun (School of Information and Electronics, Beijing Institute of Technology) ;
  • Tang, Linbo (School of Information and Electronics, Beijing Institute of Technology) ;
  • Deng, Chenwei (School of Information and Electronics, Beijing Institute of Technology) ;
  • Han, Lu (School of Information and Electronics, Beijing Institute of Technology) ;
  • Wu, Jinghui (School of Information and Electronics, Beijing Institute of Technology)
  • Received : 2014.07.22
  • Accepted : 2014.11.20
  • Published : 2014.12.31

Abstract

Motivated by the Weber's Law, this paper proposes an efficient and robust shape descriptor based on the perceptual stimulus model, called Weber's Law Shape Descriptor (WLSD). It is based on the theory that human perception of a pattern depends not only on the change of stimulus intensity, but also on the original stimulus intensity. Invariant to scale and rotation is the intrinsic properties of WLSD. As a global shape descriptor, WLSD has far lower computation complexity while is as discriminative as state-of-art shape descriptors. Experimental results demonstrate the strong capability of the proposed method in handling shape retrieval.

Keywords

1. Introduction

With the development of the Internet and the rapid update of the electronic equipment, the efficient and effective image retrieval methods are required to satisfy the image retrieval needs of people. Unlike the traditional textual annotation based retrieval methods, content-based image retrieval (CBIR) systems, which represent the image as features like color, texture and shape to implement image matching, are expected to realize the function of searching image in high efficiency [1]. Shape is an important perception for humans to understand images. However, to establish effective shape feature extraction and matching method is still a challenging task. A robust shape descriptor is required to be invariant to translation, rotation and scale as well as insensitive to noise, tolerant to distortion and even occlusion [2]. In addition, the massive database requires the CBIR has quick retrieval response, therefore, low computation complexity is also necessary for a shape descriptor.

Many shape representation and analysis methods have been proposed in the past decades. They can be generally classified into two categories: contour-based methods and region-based methods. Among the two categories, contour-based methods are in the majority.

There are many contour-based methods. One of the classic methods is Curvature Scale Space (CSS) [3]. The CSS uses the zero-crossings of the contour curvature to divide the whole shape contour into convex/concave arcs. CSS smoothens the contour by Gaussian kernels of increasing scales, and finally the whole contour will be convex with pairs of the curvature zero-crossing points evolving together. The whole evolving process of the curvature zero-crossings can be illustrated by the CSS Image, whose similarity is used for shape matching. Another well-known shape descriptor is Shape Context (SC) [4]. At each reference point, SC uses the spatial location of remaining contour points to form the 2D histogram, and thus finds correspondence by the histogram sets of two shapes. In order to tackle the shape articulation, Inner Distance Shape Context (IDSC) [5] extends the SC, and proposes to replace the traditional Euclidean distance by inner distance, which is the shortest path between landmark points within the shape silhouette, and dynamic programming is used to preserve the ordering of the contour points. Unlike SC and IDSC which utilize local histogram matching methods, Contour Points Distribution Histogram (CPDH) [9] uses one histogram to describe the distribution of the whole contour points of a shape under polar coordinate, and the shape similarity is obtained by Earth Mover’s Distance (EMD). In [8], another shape descriptor called Contour Flexibility (CF) is proposed, which describes the shape contour points by their deformable potential and also uses dynamic programming to do shape matching. Besides designing discriminative shape descriptors, some new shape matching methods are also proposed. One of them is Hierarchical Procrustes Matching (HPM) [6]. HPM is proposed to conduct the shape matching hierarchically by using the longer segment matching result to predict the correspondence of the shorter and finer segment. In [7], Shape tree (ST) describes the shape by hierarchically dividing the shape contour into short sub-curves, and adopts the elastic matching method for shape comparison. The ST representation has good tolerance to shape distortion, and random shape deformations can be obtained by adding noise to the nodes in a shape tree without perceptually identifying the shape category.

The other major category of shape descriptor is region-based methods. For example, the Zernike moments (ZM) [10] is selected as the MPEG7 standard region-based shape descriptor, but it is relatively time consuming and has limited tolerance to shape distortion. Another moment-based shape descriptor is Moment Invariant (MI) [11]. Although MI performs not as well as ZM, it only has 7 feature vectors, with low storage and high speed retrieval response, so it is also a widely used shape analysis method. Recently, Support Vector Shape (SVS) [12] is proposed which uses the decision function trained by Support Vector Machine (SVM) to describe the shape. With the Radial Basis Function (RBF), SVS maps points within the shape to high dimension, and has good performance against large noise.

Recent years, there are some new trends in shape analysis. For instance, some work focus on the post-processing after shape matching [13] [14]. In [13], it is proposed to construct the graph model using the similarity (or distance) of pairs of shapes, and adopt the graph transduction to learn the graph structure implicitly. The context-based shape retrieval methods like graph transduction can well handle the situation in which intra-class distances are larger than inter-class ones and is capable to improve the retrieval rate based on available shape measures. In addition, there are researches about the heuristic-based auxiliary shape descriptor aims at describing some specific kinds of shapes. One of the representative work is [15]. In [15], two perceptually motivated strategies are proposed, the first handles shapes with base structure and “strand” structures, the second handles symmetry shapes. The two strategies can be integrated into existing shape matching methods to improve the retrieval or classification performance. In this paper, we focus more on shape representation and matching and do not focus on the above post-processing methods, since they can be used in all available shape measures.

The remainder of paper is organized as follows. In Section 2, we review the previous shape representation work in multi-scale which is in close relation to our approach. Section 3 describes the new shape representation in detail. Section 4 presents the experimental results using the proposed method for shape retrieval in two datasets, including the widely used MPEG-7 core dataset and Tari 1000 dataset respectively. The experimental result in MPEG-7 shows the competitive retrieval performance to the state-of-the-art shape descriptors, and the results in Tari 1000 demonstrate the proposed method is also very discriminative in handling articulated shapes.

 

2. Related Work

Contour-based methods are more popular than region-based ones and it can further be divided into global description methods and local description methods. Usually, local descriptors are more discriminative than global ones. The massive database requires not only high retrieval rate, but also low storage features and quick retrieval response. However, most local description methods aim to find correspondence of feature sets of contours, consequently sacrificing computation efficiency, which is not suitable for large database retrieval.

Multi-scale methods are an important research branch of shape analysis and have shown high discriminative ability [3][6][7][29][30]. However, most of the existing multi-scale methods still use the scale normalization to get the scale invariance, and circular shift matching to achieve the rotation invariance, all of which add to high computation load. In addition, the general multi-scale shape representation itself is another step that is time-consuming. For example, as discussed in Section 1, CSS achieves the multi-scale representation by smoothing the shape contour using Gaussian kernels. The same representation method is also adopted by [29].

This paper proposes a new contour-based shape descriptor, called Weber’s Law Shape Descriptor (WLSD). WLSD is a global shape descriptor with very low computation complexity while is as discriminative as the state-of-the-art local descriptors. In addition, WLSD is intrinsically extended to multi-scale without extra computation load, and it represents a shape by only 24 dimension vector each scale, which leads to efficient shape matching without scale normalization or circular shift.

WLSD aims to describe the salient variation of a shape contour that stimulate human perception. According to Weber’s Law, under the fact that human perception of a pattern depends not only on the stimulus difference but also on the ratio of the relative stimulus change to the original stimulus intensity, we first design Contour Angular Feature (CAF) to model the so-called stimulus intensity, and then establish WLSD according to the Weber’s Law Equation. WLSD is then extended to multi-scale to enhance its description capacity. We also adopt the feature selection method to select the effective scales of WLSD. The experiments demonstrate the discriminative power and low computation property of WLSD in shape retrieval.

 

3. Weber’s Law Shape Descriptor for Shape Representation

In this section, we first review Weber’s Law and then propose the Contour Angle Feature (CAF). We then adopt the CAF to construct WLSD, and demonstrate the importance of the multi-scale property of WLSD. Finally, we give the WLSD scale selection procedures by SFS.

3.1 Weber’s Law

Weber’s Law is a psychological rule. It demonstrates that the change in stimulus intensity varies by the original stimulus intensity, and the ratio of the just noticeable difference (JND) of the perception to the original stimulus is a constant [23] [24]. The relationship can be expressed as:

where ΔI denotes the JND, I represents the original stimulus intensity, k is the constant corresponding to the certain perception. The fraction ΔI/I is known as the Weber fraction or Weber proportion. For example, an experiment conducted by Weber in 1840 shows that people can feel the increase or decrease of the additional 1 gram if they hold a 52 grams weight object, but can only feel 2 gram if they hold 104 grams, which demonstrates the JND differs according to the original stimulus intensity.

3.2 WLSD

Weber’s Law gives us the motivation to a perceptual stimulus model. The first task is to construct a shape descriptor as the original stimulus intensity. Many shape descriptors have been proposed, such as the distance between the shape centroid and the contour, which is used in [25] to do the trademark retrieval, others, like the contour curvature, signature, Fourier Descriptor (FD), are either difficult to model Weber’s Law or are sensitive to scale or rotation variation, so we design a new shape descriptor called Contour Angular Feature (CAF).

3.2.1 The Original Stimulus Feature - Contour Angle Feature

We first introduce the single scale CAF, and then give the general definition of the multi-scale CAF.

Given a closed contour point O with its nearest right and left points A and B, then ∠AOB is called the CAF of point O. Note ∠AOB has two directions, inwards or outwards to the closed contour, and the sum of two angles are 2π. In this paper, we suppose the CAF always be the outwards one. CAF can be extended to multi-scale naturally, let CAFs signifies the s scale of CAF, then angle ∠AOB is CAF1, and CAF2 is the angle formed by contour point O with its second nearest right and left points, etc. Now we give the general definition of CAFs :

Definition I: Given a closed contour C , and sample it by n clockwise points, whose coordinates are orderly (xi,yi), where i=1,2,…,n, then the CAFs of O(xi,yi) is defined as:

where s=1,2,…,⎿(n-1)/2⏌ (⎿ ⏌ denotes round toward negative infinity), operator × represents the vector cross product, , .

In Definition I, the multi-scale CAFs can be obtained with s varying from 1 to ⎿(n-1)/2⏌, As and Bs are the right and left points relative to O respectively, and both are s points away from O. Consequently, every sample point can have the maximum ⎿(n-1)/2⏌ scale angles. Examples of the CAF1, CAF3, and CAF5 of point O(xi,yi) are shown in Fig. 1 (a). The reason why extending and to three dimension is to highlight the direction property of the vector cross product (for instance, ×>0 indicates that B→O→A is clockwise ordered), therefore, the definition guarantees angle always directing outwards the interior of a closed contour. Analogously, the CAFs of the opposite direction can be defined, which is equivalent to Definition I. The defined varies from 0 to 2π, indicating the local contour of the shape changing perceptually from inwards strand structure to outwards one (as illustrated by Fig. 1 (b)), so the CAFs is capable to describe all variation of a shape contour. Notice CAF is invariant to scale and rotation.

Fig. 1.(a) Illustration of the CAFs (b) CAFs varies from nearly 0 (the bottom square) to near π(the upper square)

3.2.2 Weber’s Law Shape Descriptor

Hinted by the Weber’s Law, we use the CAF difference between the current point and its neighbors to model the difference of the stimulus intensity, and proportion of CAF differences to CAF of the current point to get the saliency variation that stimulate human perception. We then adopt the arctangent function to operate on the proportion [17], which can reduce the negative effect caused by noise. Let θi be CAF1 of the contour point O(xi,yi), then the saliency of point O(xi,yi) can be described as:

where θi denotes the CAF1 of the current point, θmod(i+1,n) and θmod(i-1,n) represent the right and left neighbor points of the current point respectively. We call this feature the Weber’s Law Shape Descriptor (WLSD), and f(θi) the WLSD value of point O(xi,yi).

The idea of the WLSD is partially motivated by Weber’s Law Descriptor (WLD) in [17]. WLD is an image descriptor regards the gray intensity as the original stimulus intensity, and the difference of the stimulus is the intensity differences of a current pixel and its neighbors. While in WLSD, CAF is viewed as the stimulus intensity, and the difference of the stimulus is the CAF difference of a current contour point and its neighbors. However, the description targets of WLSD and WLD are different, leading to the necessity of miming more information of WLSD: the information content of a shape contour is much smaller than that of a gray image, and the CAF of a contour varies relatively more smoothly than that of gray intensity. That leads to the motivation of extending WLSD to multi-scale.

If we only use single neighbor based WLSD shown in Equation (3), the relatively large WLSD value points (salient points) will only occupy a small proportion of the whole contour points. Fig. 2 maps the WLSD value of contour points linearly to the colormap bar illustrated in Fig. 2 (e) so that we can observe the description capacity of WLSD visually (the minimum value maps to leftmost deep blue, maximum to the rightmost deep red). Fig. 2 depicts the single scale WLSD (i.e. the WLSD indicated by Equation (3)) of the contour. Because CAF1 varies slowly, WLSD value has a sparse distribution, with most of the WLSD value focus on the middle of the colormap bar (nearly 0). In addition, the single neighbor based WLSD value presents an intermittent appearance because of the noise or the continuity variation of the shape contour, which is an unstable and less discriminative description. Our goal is to let WLSD give a relatively continuous and robust description of the contour in accordance to human perception. As a result, we introduce the multi-scale WLSD.

Fig. 2.WLSD1→1(single scale WLSD) value linearly mapped to the colormap bar

The scale of WLSD has close relation with the scale of CAF, let WLSDw represents WLSD of scale w, given a contour sampled with n points and its corresponding CAFs, let w=1, then the WLSD1 of point O(xi,yi) is:

where signifies the CAFs defined as Equation (2). Consequently, each WLSDw has a corresponding scale of CAFs. In this sense, the scale of WLSD is also determined by the scale of CAFs implicitly. We use WLSDs→w represents the WLSDw extracted from CAFs, then the notation of the Equation (4) is WLSDs→1. Combining the scales of s and w, we introduce the general definition of WLSDs→w:

Definition II: Given a closed contour C , and sample it by n clockwise points, whose coordinates are orderly (xi,yi), where i=1,2,…,n, the WLSDs→w of point O(xi,yi) is defined as:

where is the CAFs of the contour point O(xi,yi), and w=1,…,⎿⎿(n-1)/2⏌/s⏌(⎿ ⏌ denotes round toward negative infinity).

Therefore, given the CAFs of a contour, the stimulus intensity difference of the current scale w is the difference between the current point’s CAFs and the CAFs of its right and left w points (the adjacent distance within each sides’ points is s).

Fig. 3 illustrates some examples of WLSDs→w. Red represents the current point, green denotes the right and left w number of points corresponding to different scale WLSDs→w.

Fig. 3.The illustration of the WLSDs→w

Fig. 4 illustrates that the multi-scale WLSD can describe the contour saliency well and has the desired continuous salient variation description property. (a) ~ (d) depict the WLSD5→5, and demonstrate that multi-scale WLSD has relatively uniform distribution within the WLSD range of value and can describe the contour saliency continuously. Fig. 4 also indicate WLSD has the similar description of intra-class shapes.

Fig. 4.WLSD5→5 value linearly mapped to the colormap

3.2.3 WLSD Histogram and WLSD scale selection

In order to let the feature be tolerant to the shape distortion to some extent, we uniformly divide the range of WLSDs→w value, and use one dimensional histogram as the global shape feature. Given two histograms H1 and H2, we adopt the commonly used χ2 distance to compare them:

where K represents the number of histogram bins, h1k and h2k signifies number of points falling into the k th bin of H1 and H2 respectively. In the paper we take K=24.

Different scale of WLSD present different discriminative information, for instance, WLSD3→3 can capture local information of the contour, while WLSD16→5 tends to describe the contour in a global view. As mentioned above, each WLSDw has a corresponding scale of CAFs. Given a n points contour, then the whole number of scales of CAF is ⎿(n-1)/2⏌, and each scale of the CAF corresponds to ⎿⎿(n-1)/2⏌/s⏌ scales of WLSD. Consequently, given a n points contour, we can get the whole number of scales of WLSD, which is denoted by SWLSD:

We want to select discriminative scales of WLSD among the whole number of scales shown in Equation (6), and combine the selected scales to be the final effective features. Motivated by the feature selection theory, we view the scale selection as a feature selection problem and use the idea of Sequential Forward Selection (SFS) [36]. First, we define the feature evaluation criterion according to specific application situation, and then select the scale in the candidate set that can improve the defined criterion iteratively until the criterion stops improving. It should be noted that the scales selected by SFS may be not the optimal, but it is relatively more time-saving than many other complicated algorithms, and is very efficient which can be verified by the experiment.

One of the problem above is that, the feature dimension will expand as the iterative times increase, then the shape matching computation will also increase. To overcome the above computation problem, we operate directly on the distance matrix instead of calculating the combined features. Suppose there are M training shapes, then the distance matrix is DM×M, whose elements are the distance between each pair of shapes. Suppose each contour is uniformly sampled by 200 points, according to Equation (6), the size of the initial candidate set SWLSD is 473, we denote them by:

The subscript of D is in accordance with that of WLSDs→w, it can be simplified denoted by {D1,D2,…,Dl,…,DSWLSD}. The WLSD scale selection algorithm steps are shown as Table 1:

Table 1.WLSD Scale Selection Algorithm

After the above training procedures, the selected scales are recorded in L. In specific application, for example, shape retrieval, we first train the samples to acquire the effective WLSD scales, and then compare shapes according to their combination distances of the selected WLSD scales.

 

4. Experimental Results and Analysis

In this section, we demonstrate the performance of WLSD in shape retrieval, and compare it with other state-of-art methods. We first do experiment in MPEG7 shape database. Others, like Kimia’s dataset, is also very popular in shape matching, but it is not suitable for training since its size is relatively small (Kimia’-99 dataset has 99 shapes with 11 shapes per category). Therefore, we choose the Tari dataset [21]. In the MPEG-7 dataset, we further conduct some specific analysis experiments, including the computation complexity analysis, the stability of the scale selection algorithm and the effectiveness of WLSD in comparison with CAF.

In the following experiments, the retrieval rate is measured by the so-called bull’s eye score. Let t be the number of shapes of the same class. Every shape in the database is compared to all other shapes, and the number of shapes from the same class among the 2t most similar shapes is reported. The bull’s eye retrieval rate is the ratio of the total number of shapes from the same class to the highest possible number (in MPEG7 is 20×1400). We apply the five-fold cross validation to demonstrate the performance of WLSD. The dataset is divided into five folds, one split for validation and the others for training. Note that each validation fold is tested on the whole dataset using the scales obtained from the corresponding four training splits, consequently, the combination of the five validation folds’ performance will include retrieval result of all the dataset samples.

Each contour is uniformly sampled by 200 points, according to Equation (6), the number of the initial candidate set SWLSD is 473. The experimental environment is Matlab 7.11.0, and the experimental platform is Intel(R) Core(TM) 2 Duo CPU, 2.53 GHz.

4.1 MPEG-7 dataset

4.1.1 Comparison with existing methods

MPEG-7 CE-Shape-1 Part B dataset is widely used in shape retrieval, it consists of total 1400 shapes, having 70 shape classes, and each class owns 20 pictures. The database is challenging due to the presence of much distortion and examples that are visually dissimilar from other members of their class and highly similar to members of other classes. The first example of each class is shown in Fig. 5:

Fig. 5.Examples of MPEG-7 shape dataset

The bull’s eye score is compared by the recently popular 13 shape analysis algorithms. From Table 2, it can be seen that the bull’s eye score of WLSD performs better than all the other methods, including multi-scale methods-[6][29][30]. Moreover, WLSD is much more efficient than all the methods listed on the table. In real application, after the scale selection and the feature extraction of the database having been done offline, the time complexity of shape retrieval using WLSD is only the time costs by feature extraction of the query shape, and the number of selected scales χ2 distance computation of the 24-dimension vectors. Under the above experimental environment, the pairwise shape matching time of WLSD is only 2.2ms with n=200, where n is the number of contour points.

Table 2.Comparison of bull’s eye score on MPEG-7 dataset

Considering the methods whose retrieval rate are above 85% listed on the table. Hierarchical Procrustes reports 300ms taken by matching two shapes [6]. The computation of pair-wise similarity between two shapes in Symbolic takes 76.5ms on average, with the contours represented by 100 points [33]. Shape L’ Â ne Rouge estimates the density takes on average 2 to 3 minutes per shape, which is also much time consuming than WLSD [7]. With 100 points, IDSC reports 0.31s on a 2.8G PC implemented by optimized Matlab code [5]. The main reasons of time saving of WLSD are two points, one is that WLSD is a global shape descriptor which is much faster than local descriptors that depend on local matching; the other is that WLSD is intrinsically extended to multi-scale without extra extension load, such as filtering the contour used in [18][29].

4.1.2 Experiments of the single scale WLSD

We first test the effectiveness of the single scale WLSD by showing its bull’s eye retrieval rate of each scale in Fig. 6 (a). Because of the relation of WLSDs and WLSDw, the scales mainly spread near the coordinates. The highest score of single scale is 58.30%, with the corresponding scale is WLSD10→5, while lowest score is only 33.16%. The results show that the performance of single scale WLSD is not satisfactory. Consequently, the multi-scale fusion is necessary so that different scales that contain different information about the contour variation can make good combination.

Fig. 6.(a) The retrieval rate of the single scale WLSD (b) The front and left elevation of Fig.6 (a)

From Fig. 6 (a), it can be seen that the outstanding scales of WLSD are mainly focused on the low scales relatively, because the score peaks are grouped around the original point. The trends begin to decrease as the scales become larger.

We also want to observe the performance of WLSDs and WLSDw respectively. So in Fig. 6 (b), the front and the left elevation of Fig.6 (a), the illustration of WLSDs and WLSDw, are drawn respectively. It illustrates that the outstanding scales of WLSDs are mainly range from 10 to 40, and the counterpart of WLSDw, which are relatively lower, range from 1 to 20. Moreover, the scores of WLSDs are obviously higher than those of WLSDw on average. It is precisely able to explain the characteristic of distribution of the selected scales, which we will show in the following multi-scale experiment.

Although it seems WLSDs performs better than WLSDw, however, it does not mean that WLSDw is not important, as it can be seen from Fig. 6 (a) that only after WLSDw starts to grow can the WLSDs→w reaches a jump increasing performance. It proves that WLSDs and WLSDw have good combination, and WLSDw makes necessary supplement to WLSDs.

4.1.3 Experiments of the multi-scale WLSD

In the first experiment in this section, we will demonstrate the effectiveness of the multi-scale integration. First, the detail of the iteration performance is discussed. In Fig. 7 (a), the curve shows that the retrieval rate increases by iteration using WLSD in MPEG-7. It demonstrates that the power of a single scale WLSD is not very satisfactory but the score increases shapely by 73.41% just after the second iteration. The retrieval rate arrives more than 80% with less than 4 scales and reaches above 85% with less than 8 scales, which shows the efficiency of the SFS. Finally, after ten iterations, the performance is relatively stable.

Fig. 7.(a) The improvement trend of the retrieval rate by the increase of the iteration times (b) The first five selected scales of each training set in MPEG-7

Then, we demonstrate the effectiveness of Weber’s Law applied to WLSD, the experiment using only the CAF is conducted and the result is also shown in Fig. 7 (a). The highest score obtained by CAF is 80.59%, which is much lower than WLSD. But the trend of the increasing retrieval rate is similar to that of WLSD. Therefore, WLSD can dramatically improve the discriminative performance compared with only CAF. The main reason is that CAF only has the multi-scale information of a shape contour, whereas WLSD can describe more than that with the saliency variation which implicitly indicates the relation between the contour points. The same increasing trend of the curve shown in Fig. 7 (a) also indicates that the multi-scale selection motivated by SFS is efficient and stable.

The distribution of the first several selected scales is further observed. Fig. 7 (b) illustrates the first five selected scales of the five-fold subsets, where the indication color is orderly green, blue, red, white and magenta. It can be seen that the variance of the selected scale becomes larger as the number of iteration increases, and the variance along WLSDs varies more dramatically than that of WLSDw. So the lower scales of WLSDs→w are relatively more stable.

In the second experiment, we analysis the WLSD scale distribution. The selected scales of five folds are shown in Fig. 8 (a) ~ (e) respectively, where the yellow area is all the possible WLSD scales and the green dots represent the selected scales. It is illustrated that the effective scales of WLSDw are mainly the lower scales between 1~16. Even though the scales of WLSDs spread relatively scattered, they are more concentrated among 1~20, despite of some sparse high scales. We conclude that the discriminative scales of WLSDs→w mainly concentrate in lower scales while higher scales make effective complement.

Fig. 8.(a) ~ (e): The distribution of the selected scales of each training set in MPEG-7 (f): The scales of retrieval rate more than 50% in MPEG-7

The scales of retrieval rate more than 50% are shown in Fig. 8 (f) in blue dots. Compared with the selected scales in Fig. 8 (a) ~ (e), the outstanding individual scales (we call them OIS) are more concentrated. We also find that many selected scales are not in the OIS set although the first selected scales are mainly in the OIS domain. Consequently, less outstanding scales are also very important since they make necessary supplement to the OIS.

In the third experiment, we demonstrate the robustness of the proposed scale selection algorithm. In Fig. 9 (a), the accumulation of the selected scales of the five training sets are illustrated. Compared with Fig. 7 (b), as more scales are included, the scales of WLSDs spread more uniformly than WLSDw. It can be explained that once the high WLSDs scale is selected, the high WLSDw scale is not necessary to some extent, or in another way, high WLSDw scales are not as discriminative as the lower ones. We also want to conduct the quantitative analysis of the variation of the selected scales. Consequently, in Fig. 9 (b), the variance of the first ten selected scales of the training sets are illustrated. From the first two figures, it can be seen that the variance of the first several selected scales of both WLSDs and WLSDw are very stable. Although there are some fluctuations in higher scales, the whole stability is satisfactory. We conclude that the proposed scale selection algorithm is stable in the dominant discriminative scales and have acceptable variation in the complementary scales.

Fig. 9.(a) The accumulation of the first ten selected scale of the training sets (b) The variance of the first ten selected scales

As shown in Fig. 9 (b), WLSDw is relatively more stable than WLSDs. Although WLSDs performs better than WLSDw, WLSDw has made more contribution in the stability of the whole WLSDs→w. We further draw the conclusion that WLSDs dominates the discriminability and WLSDw not only provides necessary supplement but also makes WLSDs→w more stable. Therefore, it demonstrates again the close relation between WLSDs and WLSDw.

4.2 Tari dataset

To demonstrate the WLSD can also handle articulated shapes, we test it on relatively new Tari dataset, which consists of 1000 binary images from 50 shape categories, each category has 20 images. It is designed to have large intra-class shape deformation and many shapes are articulated (see Fig. 10.). It has been extended from the original 180 images [21] to do the shape skeleton research [22].

Fig. 10.Examples of Tari dataset

We also adopt the bull’s eye score here, since the scores reported in this dataset are not as much as that in MPEG-7, the number of the compared algorithms are relatively small (as shown in Table 3).

Table 3.Comparison of bull’s eye score on Tari dataset

From Table 3, IDSC scores higher than WLSD since it is designed to handle articulated shapes and is insensitive to articulation. Overall, WLSD is comparable to the listed methods, which prove the proposed method can handle articulated shapes well. Note the compared methods are also much more time-consuming than WLSD since their local property or local matching scheme.

The iteration plot and the distribution of scales are illustrated in Fig. 11. The curve shows that after the second iteration, the retrieval rate increases significantly and begins approaching to the final rate. The selected scales cluster in lower scales and disperse in the higher scales. Both of the figures are in accordance with the test in MPEG-7 dataset, which demonstrates the discussion and conclusion in MPEG-7 experiment section, and shows the stability of the proposed method.

Fig. 11.(a) The curve shows the retrieval rate increases by iteration in Tari (b) The distribution of the selected scales of WLSD in Tari

In Fig. 12, it shows that the variance of WLSDw is much more stable than WLSDs. Compared with Fig. 9 (b), the average variance of WLSDs→w here is less variable, which can be explained that the Tari dataset has less distortion than MPEG7. In summary, it has the same variation feature as that of MPEG7. It demonstrates again the robustness of the scale selection method.

Fig. 12.The variance of the first ten selected scales

As described in the Abstract and Section 3.2.1, both the CAF and WLSD are insensitive to rotation and scale variation, we now demonstrate this property in the last experiment. We conduct the experiment by rotating the Tari dataset to 30˚ , 60˚ ,90˚ clockwise, and scale the dataset to 0.5, 1.5 and 2 as well. The test result is shown in Table 4. From the table, it can be seen that the rotated and the scaled scores are nearly the same to the original ones (the rotation 0˚ and scale 1), which demonstrates CAF and WLSD are robust to rotation and scale variation. We also find that some scores are even higher than the original, which may because of the dataset itself or the de-noising operation in the image transform. The scores of 1.5 and 2 scale transform are lower than that of 0.5 can validate this discussion.

Table 4.Bull’s eye score on rotated and scaled Tari dataset

 

5. Conclusion

In this paper we propose a new shape descriptor based on Weber’s Law, named Weber’s Law Shape Descriptor (WLSD). The key idea of WLSD is to capture the salient variation of a shape contour to stimulate human perception. We first design Contour Angular Feature (CAF) to construct the original stimulus intensity and propose WLSD according to Weber’s Law Equation, and then develop the WLSD to multi-scale as well as demonstrate the necessity of the multi-scale extension, finally, the feature selection framework is applied to the multi-scale WLSD to extract discriminative scales. The experiments show the effectiveness and the outstanding efficiency of WLSD.

The future work is notable from the theory foundation of WLSD, since Weber’s Law is instinctively associated with saliency. Therefore, we plan to enhance the selection scheme to select the salient WLSD that stimulate human views to further improve the discriminative ability.

References

  1. Mussarat, Yasmin, et al, "Content based image retrieval using combined features of shape, color and relevance feedback," KSII Trans. on Internet and Information Systems, vol. 7, no. 12, pp. 3149-3165, 2013. https://doi.org/10.3837/tiis.2013.12.011
  2. Tak, Yoon-Sik, and Eenjun Hwang, "Pruning and matching scheme for rotation invariant leaf image retrieval," KSII Trans. on Internet and Information Systems, vol. 2, no. 6, pp. 280-298, 2008. https://doi.org/10.3837/tiis.2008.06.001
  3. F. Mokhtarian and S. Abbasi, "Shape similarity retrieval under affine transforms," Pattern Recognition, vol. 35, no. 1, pp. 31-41, 2002. https://doi.org/10.1016/S0031-3203(01)00040-1
  4. Belongie Serge, Jitendra Malik, and Jan Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, 2002. https://doi.org/10.1109/34.993558
  5. Ling Haibin and David W. Jacobs, "Shape classification using the inner-distance," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 29, no. 2, pp. 286-299, 2007. https://doi.org/10.1109/TPAMI.2007.41
  6. G. McNeill and S. Vijayakumar, "Hierarchical procrustes matching for shape retrieval," IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 885-894, 2006.
  7. P. F. Felzenszwalb and J. D. Schwartz, "Hierarchical matching of deformable shapes," IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, 2007.
  8. C. Xu, J. Liu, and X. Tang, "2D shape matching by contour flexibility," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 180-186, 2009. https://doi.org/10.1109/TPAMI.2008.199
  9. X. Shu and X.J. Wu, "A novel contour descriptor for 2D shape matching and its application to image retrieval," Image and vision Computing, vol. 29, no. 4, pp. 286-294, 2011. https://doi.org/10.1016/j.imavis.2010.11.001
  10. W. Y. Kim and Y. S. Kim, "A region-based shape descriptor using Zernike moments," Signal Processing: Image Communication vol. 16, no. 1, pp. 95-102, 2000. https://doi.org/10.1016/S0923-5965(00)00019-9
  11. Zunic Jovisa, Kaoru Hirota, and P. L. Rosin, "A Hu moment invariant as a shape circularity measure," Pattern Recognition vol. 43, no. 1, pp. 47-57, 2010. https://doi.org/10.1016/j.patcog.2009.06.017
  12. Van Nguyen Hien, and Fatih Porikli, "Support vector shape: a classifier-based shape representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 35, no. 4, pp. 970-982, 2013. https://doi.org/10.1109/TPAMI.2012.186
  13. Bai Xiang, et al, "Learning context-sensitive shape similarity by graph transduction," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 861-874, 2010. https://doi.org/10.1109/TPAMI.2009.85
  14. Yang Xingwei, Suzan Koknar-Tezel and Longin Jan Latecki, "Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval," IEEE Conf. on Computer Vision and Pattern Recognition, pp. 357-364, 2009.
  15. Temlyakov, Andrew, et al, "Two perceptually motivated strategies for shape classification," IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2289-2296, 2010.
  16. Qi Heng, et al, "An effective solution for trademark image retrieval by combining shape description and feature matching," Pattern Recognition, vol. 43, no. 6, pp. 2017-2027, 2010. https://doi.org/10.1016/j.patcog.2010.01.007
  17. Chen Jie, et al, "WLD: A robust local image descriptor," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1705-1720, 2010. https://doi.org/10.1109/TPAMI.2009.155
  18. F. Mokhtarian, S. Abbasi, J. Kittler, "Efficient and robust retrieval by shape content through curvature scale space," Series on Software Engineering and Knowledge Engineering, vol. 8, pp. 51-58, 1997.
  19. Z. Tu, A. L. Yuille, "Shape matching and recognition-using generative models and informative features," Computer Vision-ECCV, Springer Berlin Heidelberg, pp. 195-209, 2004.
  20. Nanni Loris, Sheryl Brahnam and Alessandra Lumini, "Local phase quantization descriptor for improving shape retrieval/classification," Pattern Recognition Letters, vol. 33, no. 16, pp. 2254-2260, 2012. https://doi.org/10.1016/j.patrec.2012.07.007
  21. Aslan Cagri, et al, "Disconnected skeleton: Shape at its absolute scale," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 30, no. 12, pp. 2188-2203, 2008. https://doi.org/10.1109/TPAMI.2007.70842
  22. Baseski Emre, Aykut Erdem and Sibel Tari, "Dissimilarity between two skeletal trees in a context," Pattern Recognition, vol. 42, no. 3, pp. 370-385, 2009. https://doi.org/10.1016/j.patcog.2008.05.022
  23. Gescheider, A. George, "Psychophysics: the fundamentals," Psychology Press, 2013.
  24. Jain, K. Anil, "Fundamentals of digital image processing," Prentice-Hall, Inc., 1989.
  25. Qi Heng, et al, "An effective solution for trademark image retrieval by combining shape description and feature matching," Pattern Recognition, vol. 43, no. 6, pp. 2017-2027, 2010. https://doi.org/10.1016/j.patcog.2010.01.007
  26. Zhou Yu, et al, "Shape matching using points co-occurrence pattern," in Proc. of 6th International Conf. on Image and Graphics, pp. 344-349, 2011.
  27. H. Ling, X. Yang, and L. J. Latecki, "Balancing deformability and discriminability for shape matching," Computer Vision-ECCV, Springer Berlin Heidelberg, pp. 411-424, 2010.
  28. Peter Adrian, Anand Rangarajan and Jeffrey Ho, "Shape L'Ane rouge: sliding wavelets for indexing and retrieval," IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
  29. T. Adamek and N. E. O'Connor, "A multiscale representation method for nonrigid shapes with a single closed contour," IEEE Trans. on Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 742-753, 2004. https://doi.org/10.1109/TCSVT.2004.826776
  30. Attalla Emad and Pepe Siy, "Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching," Pattern Recognition, vol. 38, no. 12, pp. 2229-2241, 2005. https://doi.org/10.1016/j.patcog.2005.02.009
  31. I. C. Paula, F. N. S Medeiros and F. N. Bezerra. "Shape retrieval by corners and dynamic space warping." in Proc. of 18th International Conference on Digital Signal Processing, pp. 1-6, 2013.
  32. F. Fotopoulou, G. Economou. "Multivariate angle scale descriptor of shape retrieval," in Proc. of Signal Process. Appl. Math. Electron. Commun (SPAMEC). pp. 105-108, 2011.
  33. M. R. Daliri, V. Torre, "Robust symbolic representation for shape recognition and retrieval," Pattern Recognition, vol. 41, no. 5, pp. 1782-1798, 2008. https://doi.org/10.1016/j.patcog.2007.10.020
  34. B. Wang, "Shape retrieval using combined Fourier features," Optics Communications, vol. 284, no. 14, pp. 3504-3508, 2011. https://doi.org/10.1016/j.optcom.2011.03.063
  35. Z Wang, J. Ouyang, "Shape classes registration and retrieval based on shape parts matching," Journal of Computational Information Systems, vol. 9, no. 4, pp. 1493-1499, 2013.
  36. L. C. Molina, L. Belanche, A. Nebot. "Feature selection algorithms: a survey and experimental evaluation," IEEE International Conf. on Data Mining, pp. 306-313, 2002.