Scaling and Rotation Insensitive 3DSIFT for Volumetric Image Corresponding

Sensors & Transducers (Canada)


Abstract: This paper presents an Improved 3DSIFT (I3DSIFT) method to extract coarse correspondences between two volumetric images. The method is insensitive to local scaling and rotation deformation. We deal with rotation deformation by reorienting a scale-related feature region according to its dominant directions, and we use scale selection during the descriptor construction procedure to handle scaling deformation. Experiments on synthetic images and real lung CT images from different time points show the superiority of I3DSIFT over the state of the art. Furthermore, based on the volumetric image corresponding results, we present an application to feature-guided image registration. Copyright © 2013 IFSA.



Keywords: Volumetric Image Corresponding, Feature Modeling, 3DSIFT

1. Introduction

Volumetric image corresponding is the process of establishing coarse correspondences between two volume images. It is an important and fundamental computational problem in geometric computing for images and videos. With an established one-to-one matching between two volume images, the comparison, integration, and deformation or motion analysis of these volume images (image sequences) can greatly benefit. It also facilitates many subsequent applications such as landmark-based elastic registration [1-3], hybrid intensity- and landmark-based non-rigid registration [1], evaluation of registration [2-5], volumetric ultrasound panorama [6], object recognition [7], and three-dimensional classification [8], to name a few.

Robustly extracting correspondences between two volumetric medical images is often highly challenging, because the acquired data usually characterize the deformations of organs and hence often contain global and local non-rigid transformations. Since the non-rigid transformation between two medical images cannot be known in advance, we simplify the problem to local scaling and rotation transformations. This is reasonable because local scaling and rotation constitute the main part of a non-rigid transformation: a non-rigid transformation can be approximated by the combination of one or more local scaling/rotation transforms.


An effective volumetric image corresponding method is the 3DSIFT [9-11] algorithm. 3DSIFT is an extension of the widely used 2D SIFT descriptor [12, 13], and it shows great discriminative ability on volume images. However, the existing 3DSIFT algorithms are still inadequate for handling dynamically deforming volumetric images such as the aforementioned medical images, because of their sensitivity to scaling and rotation change, especially local scaling and rotation. Global scaling and rotation can be easily pre-calculated for some data, but this approach fails on other image types such as videos, and local scaling and rotation are hard to pre-calculate. Therefore, it is essential to design an image corresponding method that is insensitive to local scaling and rotation.

In this paper, we present an improved 3DSIFT algorithm that is insensitive to global/local scaling and rotation deformation. We extend the method [12] used in planar images to three dimensions to handle the scaling and rotation deformation. We define a scale-related feature region and reorient it using dominant directions; the descriptor is then constructed on the reoriented feature region. In this way, we obtain a scaling- and rotation-insensitive descriptor.

The remainder of this paper is organized as follows. We briefly review related work in Section 2. We then elaborate the improved 3DSIFT in Section 3, and volumetric image corresponding results are illustrated in Section 4. We present an application to feature-guided registration in Section 5. Finally, we conclude in Section 6.

2. Related Works

There are several works on volumetric image corresponding. Scovanner et al. [10] proposed the 3DSIFT descriptor and applied it to action recognition. Since they randomly chose interest points, which carry no scale information, this descriptor is sensitive to local and global scaling. Ni et al. [6] applied 3DSIFT to volumetric ultrasound panorama with scale-space extremum detection and dominant-direction reorientation. Cheung et al. extended SIFT to the so-called N-Dimensional SIFT (N-SIFT) [9] and demonstrated its effectiveness with synthetic matching experiments on volumetric MRI and 4D CT data. The N-SIFT algorithm is still sensitive to both scaling and rotation. Allaire et al. [11] proposed a full 3DSIFT algorithm. They filtered out potential keypoints that either have low contrast or lie on edges or ridges; many correct corresponding keypoints are thereby filtered out unnecessarily. As a consequence, only 0.1 % ~ 8 % of potential keypoints were kept, which is sometimes too few for reliable volumetric matching. This 3DSIFT is also sensitive to scale change.

Cheung et al. [9] proposed the global histogram feature and the reoriented global histogram feature; both use a single histogram over a global keypoint region, while the latter has an additional reorientation step. The reoriented global histogram feature normalizes the directions of the gradients with respect to the highest peak in the histogram, in order to increase robustness to orientation change. Unfortunately, both features were shown to be less discriminative than N-SIFT, due to the lack of spatial information. Han [14] proposed a method based on 3DSURF: feature points are first extracted, and then, over subdivided local regions surrounding each feature point, the partial derivatives and their absolute values are accumulated to compose the descriptor of that feature.

3. Improved 3DSIFT

As discussed in Section 2, a reliable volumetric image corresponding algorithm should be insensitive to scale and rotation change. We propose an improved 3DSIFT (I3DSIFT) algorithm consisting of four main steps: keypoint detection, orientation assignment, direction reorientation, and descriptor construction. Our main contributions lie in the third and fourth steps. Compared with the original 3DSIFT, I3DSIFT is insensitive to rotation and scaling transformations. We elaborate these four steps in the following four subsections.

3.1. Keypoint Detection

Keypoint detection is an essential procedure in volumetric image matching. Keypoints can be defined as salient points that differ from their neighboring points in the image. In many image analysis problems, corners (extrema over very small neighboring regions) are used as keypoints [14], and corner detection algorithms [15] are used to identify them. However, most such corner-detection algorithms are sensitive to deformations as well as noise. Alternatively, keypoints can be identified as extrema over larger neighboring regions. Scale-space analysis provides an effective approach to this goal [16], and it has been shown that descriptors constructed this way are robust against scaling, rotation, and noise. Koenderink [17] and Lindeberg [18] pointed out that the Gaussian function is the only possible scale-space kernel under a series of reasonable assumptions. Therefore, the Gaussian function is widely adopted in the construction of scale space, which is then called Gaussian scale space.

Gaussian scale space can be further extended to 3D. In three-dimensional space, the Gaussian function is defined as

$G(x, y, z, \sigma) = \frac{1}{(2\pi)^{3/2}\sigma^{3}} \, e^{-(x^{2}+y^{2}+z^{2})/(2\sigma^{2})}$, (1)

Given a volume image I, its Gaussian scale space can be defined as a stack of images smoothed with a series of Gaussian functions of variable σ:

$L(x, y, z, \sigma) = G(x, y, z, \sigma) * I(x, y, z)$, (2)

where * is the convolution operation.
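As an illustration of Eqs. (1)-(2), the following minimal sketch (not the authors' code) builds the Gaussian images of one octave with ITK, the toolkit on which the paper's implementation is based; the input file name, the base scale sigma0 = 1.6, and S = 3 levels per octave are assumed placeholder values.

```cpp
#include "itkImage.h"
#include "itkImageFileReader.h"
#include "itkDiscreteGaussianImageFilter.h"
#include <cmath>
#include <vector>

int main(int argc, char* argv[]) {
  if (argc < 2) return 1;
  using ImageType = itk::Image<float, 3>;
  auto reader = itk::ImageFileReader<ImageType>::New();
  reader->SetFileName(argv[1]);

  const double sigma0 = 1.6;  // base scale; an assumed value, not from the paper
  const int S = 3;            // levels per octave (S in Section 3.1)
  std::vector<ImageType::Pointer> octave;
  for (int s = 0; s < S + 3; ++s) {  // S+3 Gaussian images yield S+2 DOG levels
    const double sigma = sigma0 * std::pow(2.0, double(s) / S);
    auto gauss = itk::DiscreteGaussianImageFilter<ImageType, ImageType>::New();
    gauss->SetInput(reader->GetOutput());
    gauss->SetVariance(sigma * sigma);  // ITK expects the variance, not sigma
    gauss->Update();
    ImageType::Pointer smoothed = gauss->GetOutput();
    smoothed->DisconnectPipeline();     // detach so the buffer outlives the filter
    octave.push_back(smoothed);
  }
  // The DOG stack of Section 3.1 is the voxel-wise difference
  // octave[s+1] - octave[s]; the next octave starts from a 2x down-sampling.
  return 0;
}
```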

The DOG (Difference of Gaussian) [12] is usually adopted to detect keypoints. Given a volume image I, its DOG image D can be defined as a stack of differences between images smoothed at neighboring scales,

$D(x, y, z, \sigma) = L(x, y, z, k\sigma) - L(x, y, z, \sigma)$.

According to the properties of Gaussian convolution, the DOG function closely approximates the scale-normalized Laplacian of Gaussian,

$D(x, y, z, \sigma) \approx (k - 1)\,\sigma^{2}\nabla^{2}G * I$.

Therefore, DOG image extrema are stable. For efficiency, an image pyramid with N octaves and S levels per octave is implemented. The first-level image of each octave is a down-sampled (always by a factor of 2) version of the previous octave; each of the other levels is a smoothed version of its previous level. Finally, the extrema over 3 x 3 x 3 x 3 neighborhoods (space and scale) in each octave are defined as keypoints. DOG extrema can automatically be assigned a scale, a process denoted scale selection: the sigma of the Gaussian image in which the keypoint shows up is selected as the keypoint's scale. Note that since an image pyramid is used, the scale is assigned within each octave instead of over the whole pyramid. For example, a keypoint that shows up on the s-th level image of an octave with base smoothing σ₀ is assigned the scale

$scl = \sigma_{0} \cdot 2^{s/S}$. (3)

A keypoint may be detected more than once at different scales, in which case the keypoint has multiple scales.
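To make the 3 x 3 x 3 x 3 extremum rule concrete, here is a small hedged sketch; dog(s, x, y, z) is a hypothetical accessor into the DOG stack (it is not part of the paper), and boundary handling is left to the caller.

```cpp
#include <functional>

// 3x3x3x3 extremum test of Section 3.1: a voxel is a keypoint candidate if
// its DOG value is larger or smaller than all 80 space-scale neighbours.
bool isScaleSpaceExtremum(int s, int x, int y, int z,
                          const std::function<float(int, int, int, int)>& dog) {
  const float v = dog(s, x, y, z);
  bool isMax = true, isMin = true;
  for (int ds = -1; ds <= 1; ++ds)
    for (int dz = -1; dz <= 1; ++dz)
      for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
          if (ds == 0 && dx == 0 && dy == 0 && dz == 0) continue;  // skip centre
          const float n = dog(s + ds, x + dx, y + dy, z + dz);
          isMax = isMax && (v > n);
          isMin = isMin && (v < n);
        }
  return isMax || isMin;
}
```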

3.2. Orientation Assignment

The keypoint descriptor is another key problem in volumetric image matching. The choice of voxel attribute is important in the construction of descriptors: a good attribute should be discriminative and, to some extent, invariant. The gradient is a widely accepted voxel attribute; it has been shown that gradient-based descriptors, such as SIFT [12], are more discriminative than intensity-based descriptors [13].

3DSIFT is a gradient-based descriptor. A voxel gradient can be described by its azimuth θ, elevation φ, and magnitude m_3D, see Fig. 1. They are defined as

$\theta(x, y, z) = \tan^{-1}\!\left(L_{y} / L_{x}\right)$, (4)

$\varphi(x, y, z) = \tan^{-1}\!\left(L_{z} \,/\, \sqrt{L_{x}^{2} + L_{y}^{2}}\right)$, (5)

$m_{3D}(x, y, z) = \sqrt{L_{x}^{2} + L_{y}^{2} + L_{z}^{2}}$, (6)

where L_x, L_y and L_z are the partial derivatives. L_x can be defined in the discrete image space by central differences, $L_{x}(x, y, z) = \left(L(x+1, y, z) - L(x-1, y, z)\right)/2$, and similarly for L_y and L_z.
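A direct transcription of Eqs. (4)-(6) with central differences might look as follows; the accessor L(x, y, z) into the smoothed volume is an assumed placeholder, and std::atan2 is used so that the full angular range is recovered.

```cpp
#include <cmath>

struct Gradient3D { float azimuth, elevation, magnitude; };

// Voxel gradient in spherical form, Eqs. (4)-(6); L is any accessor
// (hypothetical) returning the smoothed intensity at integer coordinates.
template <typename Vol>
Gradient3D voxelGradient(const Vol& L, int x, int y, int z) {
  // central differences: the discrete L_x, L_y, L_z of Section 3.2
  const float Lx = 0.5f * (L(x + 1, y, z) - L(x - 1, y, z));
  const float Ly = 0.5f * (L(x, y + 1, z) - L(x, y - 1, z));
  const float Lz = 0.5f * (L(x, y, z + 1) - L(x, y, z - 1));
  Gradient3D g;
  g.magnitude = std::sqrt(Lx * Lx + Ly * Ly + Lz * Lz);        // m3D, Eq. (6)
  g.azimuth   = std::atan2(Ly, Lx);                            // theta, Eq. (4)
  g.elevation = std::atan2(Lz, std::sqrt(Lx * Lx + Ly * Ly));  // phi,   Eq. (5)
  return g;
}
```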

3.3. Scale-related Direction Reorientation

An important issue is how to deal with image rotation. Typically, this is handled by defining a dominant direction and rotating the keypoint region accordingly. The key problem is how to find a dominant direction that is representative of the keypoint. In this paper, we define a scale-related keypoint region and calculate the dominant direction on it.

In order to calculate the dominant directions, a 36 x 36-bin histogram over azimuth and elevation is computed in the surrounding region of size 6.0 x scl, where scl is the keypoint's scale computed from (3). Each bin covers a range of 360°/36 = 10°. The gradient magnitudes of points in the keypoint region are accumulated into their corresponding bins.

The direction indicated by the bin with the maximal value v_max is the dominant direction. In addition, we select the bins whose values are larger than 0.8 v_max and consider their directions as dominant directions as well.
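A possible implementation of the dominant-direction search is sketched below, under the assumption that both angles are binned over a full turn as the text describes; the Gradient3D struct from the previous sketch is reused, and the 0.8 peak threshold follows the text.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Accumulate a 36x36 azimuth/elevation histogram (10-degree bins) over the
// gradients of the keypoint region and return every direction whose bin
// reaches at least 0.8 of the maximum, following Section 3.3.
std::vector<std::pair<float, float>> dominantDirections(
    const std::vector<Gradient3D>& gradients) {
  const float PI = 3.14159265358979f;
  float hist[36][36] = {};
  for (const auto& g : gradients) {
    // map both angles from [-pi, pi] onto 36 bins
    const int a = int((g.azimuth   + PI) / (2 * PI) * 36) % 36;
    const int e = int((g.elevation + PI) / (2 * PI) * 36) % 36;
    hist[a][e] += g.magnitude;   // magnitude-weighted voting
  }
  float vmax = 0.0f;
  for (const auto& row : hist)
    for (float v : row) vmax = std::max(vmax, v);
  std::vector<std::pair<float, float>> dirs;   // (theta*, phi*) bin centres
  for (int a = 0; a < 36; ++a)
    for (int e = 0; e < 36; ++e)
      if (hist[a][e] >= 0.8f * vmax)
        dirs.emplace_back((a + 0.5f) * 2 * PI / 36 - PI,
                          (e + 0.5f) * 2 * PI / 36 - PI);
  return dirs;
}
```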

Once the dominant directions are calculated, we can reorient the keypoint region by multiplying point locations with a rotation matrix composed from the dominant azimuth θ* and elevation φ* [10], which maps the dominant direction onto the x axis:

$R(\theta^{*}, \varphi^{*}) = R_{y}(\varphi^{*})\, R_{z}(-\theta^{*}) = \begin{pmatrix} \cos\theta^{*}\cos\varphi^{*} & \sin\theta^{*}\cos\varphi^{*} & \sin\varphi^{*} \\ -\sin\theta^{*} & \cos\theta^{*} & 0 \\ -\cos\theta^{*}\sin\varphi^{*} & -\sin\theta^{*}\sin\varphi^{*} & \cos\varphi^{*} \end{pmatrix}$,

where θ* and φ* are the dominant directions.
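Applying the reorientation can be sketched as two elementary rotations: first undoing the azimuth about the z axis, then the elevation about the y axis. The sign conventions here are our assumption and may differ from the exact matrix of [10].

```cpp
#include <array>
#include <cmath>

// Reorient a point about the keypoint (taken as the origin) so that the
// dominant direction (theta, phi) is mapped onto the x axis.
std::array<double, 3> reorient(const std::array<double, 3>& p,
                               double theta, double phi) {
  // Rz(-theta): undo the azimuth
  const double x1 =  std::cos(theta) * p[0] + std::sin(theta) * p[1];
  const double y1 = -std::sin(theta) * p[0] + std::cos(theta) * p[1];
  const double z1 =  p[2];
  // Ry(phi): undo the elevation
  return { std::cos(phi) * x1 + std::sin(phi) * z1,
           y1,
          -std::sin(phi) * x1 + std::cos(phi) * z1 };
}
```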

3.4. Scale-related Descriptor Construction

In order to design a 3DSIFT descriptor that is insensitive to scale change, we adopt a two-step procedure. First, we extract a local estimate of scale for each keypoint; then, the descriptor is constructed over an image patch whose size is related to that scale.

Since the estimate of the keypoint scale has already been obtained automatically through scale selection during scale-space extremum detection (Section 3.1), we can accordingly construct a descriptor that is insensitive to scale change.

In this paper, a keypoint's descriptor is built on a surrounding region centered at the keypoint. After the direction reorientation, we divide the keypoint region into 4 x 4 x 4 sub-regions, where the size of each sub-region is 3.0 x scl. If the keypoint has multiple scales, a descriptor is constructed at every scale.

In the following, the procedure of scale-related descriptor construction is illustrated:

1) We divide θ and φ, θ, φ ∈ [0°, 360°), evenly into 8 x 8 bins, so that each bin represents a range of 360°/8 = 45°.

2) We add magnitudes to their corresponding bins. In order to avoid boundary effects, in which the descriptor changes abruptly as a sample shifts from one sub-region to another or from one orientation bin to another, we propose a 5-linear interpolation method that interpolates over the three sub-region dimensions (X, Y and Z) and the two gradient orientations (θ(x,y,z) and φ(x,y,z)). Each entry into a bin is multiplied by a weight of 1 - d for each dimension, where d is the distance of the sample from the central value of the bin, measured in units of the histogram bin spacing. After that, the 8 x 8 bins are joined together as each sub-region's features.

3) We join features from 4x4x4 sub-regions together as the final descriptor.

Finally, the 3DSIFT descriptor has a length of 4 x 4 x 4 x 8 x 8 = 4096. In this paper, we normalize the descriptor to [0, 1] to discard information due to noise or illumination. We discard features below a threshold (0.2 in this paper) by setting them to 0. After that, we normalize the descriptor to [0, 1] again and obtain the final descriptor.
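The normalization and thresholding just described can be sketched as follows; the [0, 1] normalization by the maximum entry and the 0.2 cutoff follow the text, while everything else is an illustrative assumption.

```cpp
#include <algorithm>
#include <vector>

// Final descriptor post-processing of Section 3.4: normalize to [0, 1],
// zero the entries below 0.2, then normalize again.
void normalizeDescriptor(std::vector<float>& d) {   // d has 4096 entries
  auto rescale = [&d] {
    if (d.empty()) return;
    const float mx = *std::max_element(d.begin(), d.end());
    if (mx > 0.0f)
      for (float& v : d) v /= mx;                   // entries are >= 0
  };
  rescale();
  for (float& v : d)
    if (v < 0.2f) v = 0.0f;                         // drop noise-prone entries
  rescale();
}
```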

4. Experimental Results

In order to demonstrate the effectiveness of I3DSIFT, we conduct experiments on volumetric images. Our experimental materials are lung volumetric images. We perform experiments on a dataset S1, which includes 3 real volume images (SE_1, SE_5, SE_8) from different respiratory phases of the same person. The volume images in S1 have a resolution of 465 x 465 x 20. In preprocessing, we extract Regions of Interest (ROIs) through graph-cut image segmentation and rescale the gray level of each volume image to [0, 255].

In our experiments, we detect scale-space extrema as keypoints. We match descriptors by NN (Nearest Neighbor) search, and only those matches whose ratio between the NN distance and the second-NN distance is smaller than a threshold T (T = 0.8 in this paper) are kept. We perform the NN search in both directions, and a matched pair is kept only if it satisfies NN-optimality in both directions. We conduct the NN search with a k-d tree algorithm for efficiency. We adopt a 1.5-voxel matching accuracy criterion in the experiments on synthetic images. Most existing volumetric image matching algorithms are included to provide a full-range comparison: global histogram (Global) [9], reoriented global histogram (Reor.) [9], N-SIFT [9] and 3DSURF [14].
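The bidirectional ratio-test matching can be sketched as below. For clarity this sketch uses brute-force search in place of the k-d tree mentioned above; the container types are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Desc = std::vector<float>;   // a 4096-d I3DSIFT descriptor

static float dist2(const Desc& a, const Desc& b) {
  float s = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Nearest neighbour of q in `set`, or -1 if the NN / second-NN distance
// ratio is not below the threshold T (0.8 in the paper).
static int ratioNN(const Desc& q, const std::vector<Desc>& set, float T = 0.8f) {
  float d1 = std::numeric_limits<float>::max(), d2 = d1;
  int best = -1;
  for (std::size_t j = 0; j < set.size(); ++j) {
    const float d = dist2(q, set[j]);
    if (d < d1)      { d2 = d1; d1 = d; best = int(j); }
    else if (d < d2) { d2 = d; }
  }
  return (best >= 0 && d2 > 0.0f && std::sqrt(d1 / d2) < T) ? best : -1;
}

// Keep a pair only if it is NN-optimal in both directions (cross-check).
std::vector<std::pair<int, int>> matchDescriptors(const std::vector<Desc>& A,
                                                  const std::vector<Desc>& B) {
  std::vector<std::pair<int, int>> matches;
  for (std::size_t i = 0; i < A.size(); ++i) {
    const int j = ratioNN(A[i], B);
    if (j >= 0 && ratioNN(B[j], A) == int(i))
      matches.emplace_back(int(i), j);
  }
  return matches;
}
```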

We implement a 3DSURF descriptor based on [14] and use the source code provided with [9] for the global histogram, reoriented global histogram, and N-SIFT. I3DSIFT is implemented in C++ and ITK. Note that the original objective of I3DSIFT is to be insensitive to local scaling and rotation. However, it is hard to represent local scaling and rotation with a single formula; we therefore consider the global scaling and rotation transformation, which is a special case of the local one. We show the performance of I3DSIFT on synthetic images to demonstrate its behavior under scale and rotation change. Moreover, we show results on real data.

4.1. On Synthetic Images

Rotation insensitivity is an important property for evaluating volumetric matching methods. In order to demonstrate the performance of I3DSIFT under rotation change, we match SE_1 against its synthetic counterpart rotated by 30° about the Z axis. Table 1 shows the result. I3DSIFT holds the highest accuracy of 84.8 %, about 10 % higher than 3DSURF. In addition, the correct matches extracted by I3DSIFT are about 10 times more numerous than those extracted by the other algorithms. I3DSIFT therefore demonstrates better robustness against image rotation.

Scale insensitivity is another important property for evaluating volumetric matching methods. In order to demonstrate the performance of I3DSIFT under scale change, we show the matching result between a volumetric image and its down-sampled synthetic counterpart, scaled by a factor of 0.8 along each of the 3 axes. Table 2 shows the result.

We can see that I3DSIFT attains an accuracy of 70.8 %, about 2 % - 5 % higher than 3DSURF (68.2 %) and N-SIFT (65.2 %), while the global histogram and reoriented global histogram have accuracies of 35.7 % and 36 %, respectively. Since none of the comparison features carry scale information, they are sensitive to scale change. Both the global histogram and reoriented global histogram features perform much worse than the others due to the lack of spatial information. Moreover, I3DSIFT finds the most correct matches; it is at least partially insensitive to scale change. In Fig. 2 and Fig. 3, we give some matching examples under rotation change and scale change.

In summary, I3DSIFT is very robust against rotation and insensitive to scaling.

4.2. Matching Real Data

Our dataset S1 consists of volumetric CT images scanned during respiratory cycles; SE_i indicates the i-th frame. We pick three frames (1, 5, 8) and use I3DSIFT to match the corresponding pairs SE_15, SE_58 and SE_18. The gray values of corresponding points vary little across the volume images (i.e., the lighting condition is stable). Therefore, we compute the mean of the Sum of Squared Differences, D, to evaluate the matching results; effective matching should produce a small D value.

$D = \frac{1}{N}\sum_{i=1}^{N}\left(I_{F}(x_{F,i}) - I_{M}(x_{M,i})\right)^{2}$, (7)

where I_F and I_M are the fixed image and the moving image, (x_{F,i}, x_{M,i}) is the i-th matched pair between the two images, and N is the number of matches. Table 3 shows the results. We give some matching examples on real data in Fig. 4, Fig. 5 and Fig. 6.
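Eq. (7) amounts to a single loop over the matched pairs; in this sketch the image accessors and the point type are assumed placeholders.

```cpp
#include <utility>
#include <vector>

// Mean sum of squared differences over the matched pairs, Eq. (7).
// Img is any accessor returning the intensity at a point of type Pt.
template <typename Img, typename Pt>
double meanSSD(const Img& IF, const Img& IM,
               const std::vector<std::pair<Pt, Pt>>& matches) {
  if (matches.empty()) return 0.0;
  double sum = 0.0;
  for (const auto& m : matches) {
    const double d = IF(m.first) - IM(m.second);   // I_F(x_Fi) - I_M(x_Mi)
    sum += d * d;
  }
  return sum / matches.size();                     // divide by N
}
```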

5. Feature-guided Image Registration

Assume we have two volume images. One image, the moving image I_M(x), is deformed to fit the other, the fixed image I_F(x). I_F and I_M are each defined on their own spatial domains, Ω_F ⊂ R³ and Ω_M ⊂ R³ respectively. We seek a transformation T(x) = x + u(x) that makes I_M(T(x)) spatially aligned to I_F(x).

The transformation is defined as a mapping from the fixed image to the moving image, i.e. T: Ω_F ⊂ R³ → Ω_M ⊂ R³. The quality of alignment is defined by a cost function C with a regularization term P. Here, we adopt the sum of squared differences with a feature regularization term:

$C(T) = \mathrm{SSD}(I_{F}, I_{M}; T) + \lambda P(T)$, (8)

where λ weighs similarity against the feature constraints. To solve the above minimization problem, there are basically two approaches: parametric and nonparametric. Here we consider the parametric approach, in which the set of possible transformations is limited by introducing a parameterization (model) of the transformation. The original optimization problem thus becomes

$\hat{p} = \arg\min_{p} C(T_{p}; I_{F}, I_{M})$, (9)

where the subscript p indicates that the transformation has been parameterized; the vector p contains the values of the "transformation parameters". Here, we use the non-rigid B-spline transformation

$T(x) = x + \sum_{x_{k} \in N_{x}} p_{k}\, \beta^{3}\!\left(\frac{x - x_{k}}{a}\right)$, (10)

with x_k the knot points, β³(x) the cubic multidimensional B-spline polynomial, p_k the B-spline coefficient vectors (the control point displacements), a the B-spline control point spacing, and N_x the set of all knot points within the compact support of the B-spline at x. The parameters p are formed by the B-spline coefficients p_k.
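The cubic B-spline basis β³ in Eq. (10) has the standard piecewise-polynomial form, which can be written as follows.

```cpp
#include <cmath>

// The standard uniform cubic B-spline basis beta^3 used in Eq. (10);
// it is non-zero only on [-2, 2].
double beta3(double x) {
  x = std::fabs(x);
  if (x < 1.0) return (4.0 - 6.0 * x * x + 3.0 * x * x * x) / 6.0;
  if (x < 2.0) { const double t = 2.0 - x; return t * t * t / 6.0; }
  return 0.0;
}
```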

The similarity term of the cost function is defined as

$\mathrm{SSD}(I_{F}, I_{M}; T) = \frac{1}{|\Omega_{F}|}\sum_{x_{i} \in \Omega_{F}}\left(I_{F}(x_{i}) - I_{M}(T(x_{i}))\right)^{2}$, (11)

with Ω_F the domain of the fixed image I_F and |Ω_F| the number of voxels. Given a transformation T, this measure can easily be implemented by looping over the voxels of the fixed image, taking I_F(x_i), calculating I_M(T(x_i)) by interpolation, and adding the squared difference to the sum.

Based on our proposed feature matching algorithm, we can obtain the feature correspondences between two volume images; they guide the spline fitting. The regularization term therefore minimizes the distance between corresponding feature points:

$P(T) = \frac{1}{|\{x_{F}\}|}\sum_{i}\left\| T(x_{F}^{i}) - x_{M}^{i} \right\|^{2}$,

where |{x_F}| indicates the size of the correspondence set, and x_F^i, x_M^i are corresponding features from the fixed and moving images, respectively.
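The regularization term is again a loop over the correspondence set; in this sketch the transformation T is any callable mapping a 3D point to a 3D point, which is an assumption about the interface rather than the paper's code.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Point3 = std::array<double, 3>;

// Feature regularization term P(T): mean squared distance between the
// mapped fixed-image features and their moving-image correspondences.
template <typename Transform>   // Transform: callable Point3 -> Point3
double featurePenalty(const Transform& T,
                      const std::vector<Point3>& xF,
                      const std::vector<Point3>& xM) {
  if (xF.empty()) return 0.0;
  double sum = 0.0;
  for (std::size_t i = 0; i < xF.size(); ++i) {
    const Point3 y = T(xF[i]);              // T(x_F^i)
    for (int k = 0; k < 3; ++k) {
      const double d = y[k] - xM[i][k];
      sum += d * d;
    }
  }
  return sum / xF.size();
}
```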

Integrating the feature extraction step, we now apply our whole volumetric matching algorithm on the publicly available POPI dataset. In the 3D volume at time frame t, coherent landmarks (a set of 3D points, denoted P_t = {p_{t,1}, p_{t,2}, ..., p_{t,|P|}}) are available and can be used to evaluate the registration.

In our experiments, we adopt 3D consecutive image registration with the alignment of corresponding features enforced. The registration results are evaluated by the mean target registration error (MTRE) between the sets of landmark points {P_0, ..., P_9}. MTRE is defined by

$\mathrm{MTRE}(r, t) = \frac{1}{|P|}\sum_{i=1}^{|P|}\left\| T_{r,t}(p_{r,i}) - p_{t,i} \right\|$, (12)

where p_{t,i} is landmark i at time t, and T_{r,t} is the transformation from time r to time t. In our experiments, we set the control weight in (8) to λ = 0.1. Table 4 shows the comparison between feature-guided registration and the registration algorithm without any feature constraints (λ = 0). By comparing the MTRE errors, we can see that the method with I3DSIFT outperforms (by 8 %) the method that does not consider the feature correspondences.
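Eq. (12) can be computed in the same style; landmarks are assumed to be stored as 3D arrays, and the transformation T_{r,t} is an assumed callable.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Point3 = std::array<double, 3>;

// Mean target registration error, Eq. (12): average Euclidean distance
// between transformed landmarks of frame r and the landmarks of frame t.
template <typename Transform>   // Transform: callable Point3 -> Point3, T_{r,t}
double mtre(const Transform& Trt,
            const std::vector<Point3>& Pr, const std::vector<Point3>& Pt) {
  if (Pr.empty()) return 0.0;
  double sum = 0.0;
  for (std::size_t i = 0; i < Pr.size(); ++i) {
    const Point3 q = Trt(Pr[i]);            // T_{r,t}(p_{r,i})
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) {
      const double d = q[k] - Pt[i][k];
      d2 += d * d;
    }
    sum += std::sqrt(d2);
  }
  return sum / Pr.size();
}
```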

Based on our feature-guided image registration, we can perform dynamic tumor motion modeling. With volumetric registration between consecutive images, we can build up (by linear interpolation) a temporal deformation model, and the motion and dynamics of the tumor can be analyzed using this model. The coherent tracking of the deforming tumor is visualized in Fig. 7.

6. Conclusion

In this paper, we presented an improved 3DSIFT (I3DSIFT) algorithm that is insensitive to scaling and rotation change. We demonstrated the effectiveness of our method on both synthetic and real data by showing that it outperforms existing volumetric matching algorithms such as N-SIFT [9], 3DSURF [14], the global histogram [9], and the reoriented global histogram [9]. We conclude that I3DSIFT can be used for reliable volumetric matching.

A limitation of I3DSIFT is its computational cost: feature extraction based on 3DSIFT requires convolutions in three-dimensional space and is therefore slow. On the other hand, the multiscale computation can easily be parallelized. In the near future, we will explore a GPU implementation to achieve efficient computation.

References

[1]. M. Urschler, C. Zach, H. Ditt, and H. Bischof, Automatic point landmark matching for regularizing nonlinear intensity registration: Application to thoracic CT images, in Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI'06), 2006, pp. 710-717.

[2]. K. Murphy, B. van Ginneken, J. P. Pluim, S. Klein, and M. Staring, Semi-automatic reference standard construction for quantitative evaluation of lung CT registration, in Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI'08), 2008, pp. 1006-1013.

[3]. R. Castillo, E. Castillo, R. Guerra, V. E. Johnson, T. McPhail, A. K. Garg, et al., A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets, Physics in Medicine and Biology, Vol. 54, 2009, p. 1849.

[4]. Y. Yin, E. A. Hoffman, and C.-L. Lin, Mass preserving nonrigid registration of CT lung images using cubic B-spline, Medical Physics, Vol. 36, 2009, p. 4213.

[5]. X. Huang, N. A. Hill, J. Ren, G. Guiraudon, D. Boughner, and T. M. Peters, Dynamic 3D ultrasound and MR image registration of the beating heart, in Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI'05), 2005, pp. 171-178.

[6]. D. Ni, Y. Qu, X. Yang, Y. P. Chui, T.-T. Wong, S. S. Ho, et al., Volumetric ultrasound panorama based on 3D SIFT, in Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI'08), 2008, pp. 52-60.

[7]. G. T. Flitton, T. P. Breckon, and N. M. Bouallagu, Object recognition using 3D SIFT in complex CT volumes, in Proceedings of the British Machine Vision Conference (BMVC'10), 2010, pp. 1-12.

[8]. J. Knopp, M. Prasad, G. Willems, R. Timofte, and L. Van Gool, Hough transform and 3D SURF for robust three dimensional classification, in Proceedings of the European Conference on Computer Vision (ECCV'10), 2010, pp. 589-602.

[9]. W. Cheung and G. Hamarneh, N-SIFT: N-dimensional scale invariant feature transform for matching medical images, in Proceedings of the 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI'07), 2007, pp. 720-723.

[10]. P. Scovanner, S. Ali, and M. Shah, A 3-dimensional SIFT descriptor and its application to action recognition, in Proceedings of the 15th International Conference on Multimedia, 2007, pp. 357-360.

[11]. S. Allaire, J. J. Kim, S. L. Breen, D. A. Jaffray, and V. Pekar, Full orientation invariance and improved feature selectivity of 3D SIFT with application to medical image analysis, in Proceedings of the IEEE Computer Vision and Pattern Recognition Workshops (CVPRW'08), 2008, pp. 1-8.

[12]. D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, Vol. 60, 2004, pp. 91-110.

[13]. K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, 2005, pp. 1615-1630.

[14]. X. Han, Feature-constrained nonlinear registration of lung CT images, MICCAI Workshop EMPIRE10, 2010, p. 5.

[15]. J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, 1986, pp. 679-698.

[16]. A. Witkin, Scale-space filtering: A new approach to multi-scale description, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'84), 1984, pp. 150-153.

[17] . J. J. Koenderink, The structure of images, Biological Cybernetics, Vol. 50, 1984, pp. 363-370.

[18] . T. Lindeberg, Scale-space theory: A basic tool for analyzing structures at different scales, Journal of Applied Statistics, Vol. 21, 1994, pp. 225-270.

Peizhi Chen 1,*, Maoqing Li 1, Shuili Chen 2
1 Department of Automation, Xiamen University, 361005, China
2 School of Sciences, Jimei University, 361005, China
* Tel.: 0086-13599901829, fax: 0086-0592-2580007
E-mail: [email protected]
Received: 18 September 2013 / Accepted: 22 November 2013 / Published: 30 December 2013
(c) 2013 International Frequency Sensor Association
