
Robust Camera Pose Estimation Using 2D Fiducials Tracking for Real-Time Augmented Reality Systems
Fakhr-eddine Ababsa*
Laboratoire Systèmes Complexes, CNRS FRE 2494, 40 Rue du Pelvoux, 91020 Evry, France
Malik Mallem†
Laboratoire Systèmes Complexes, CNRS FRE 2494, 40 Rue du Pelvoux, 91020 Evry, France
Abstract
Augmented reality (AR) deals with the problem of dynamically and accurately aligning virtual objects with the real world. Among existing methods, vision-based techniques are attractive for AR applications: their registration can be very accurate, and there is no delay between the motion of the real and virtual scenes. However, the downfall of these approaches is their high computational cost and lack of robustness. To address these shortcomings, we propose a robust camera pose estimation method based on tracking calibrated fiducials in a known 3D environment; the camera pose is computed dynamically by the Orthogonal Iteration Algorithm. Experimental results show the robustness and the effectiveness of our approach in the context of real-time AR tracking.
Keywords: Augmented reality, fiducials tracking, camera pose estimation, computer vision.
1 Introduction
AR systems attempt to enhance an operator's view of the real environment by adding virtual objects, such as text, 2D images, or 3D models, to the display in a realistic manner. Clearly, the sensation of realism felt by the operator in an augmented reality environment is directly related to the stability and accuracy of the registration between the virtual and real-world objects: if the virtual objects shift or jitter, the effectiveness of the augmentation is lost.
Several AR systems have been developed in recent years; they can be subdivided into two categories: vision-based AR systems (indirect vision) and see-through AR systems (direct vision). Vision-based techniques have several advantages for AR applications. First, the same video camera used to capture real scenes also serves as a tracking device. Second, the pose calculation is most accurate in the image plane, thereby minimizing the perceived image alignment error. Additionally, processing delays in the video and graphics subsystems can be matched, thereby eliminating dynamic alignment errors [Neumann and Cho, 1996]. Recently, several vision-based methods for estimating position information from known landmarks in the real-world scene have been proposed. Bajura and Neumann used LEDs as landmarks and demonstrated vision-based registration for AR systems [Bajura and Neumann, 1995]. Uenohara and Kanade used template matching for object registration [Uenohara and Kanade, 1995]. State et al. proposed a hybrid method combining landmark tracking and magnetic tracking (they used color markers as landmarks) [State et al. 1996].
--------------------------------------------
*e-mail:********************.fr
†e-mail:********************.fr

In this paper we propose a robust camera pose estimation method based on tracking calibrated 2D fiducials in a known 3D environment. To efficiently compute the camera pose associated with the current image, we combine the results of the fiducials tracking method with the Orthogonal Iteration (OI) Algorithm [Lu et al. 2000]. Indeed, the OI algorithm usually converges in five to ten iterations from very general geometrical configurations. In addition, it outperforms the Levenberg-Marquardt method, one of the most reliable optimization methods currently in use, in terms of both accuracy against noise and robustness against outliers. Knowing the camera pose for each image frame, we can integrate virtual objects into a video segment.
The remainder of this paper is organized as follows. Section 2 is devoted to the system overview. Section 3 describes the 2D fiducials tracking algorithm in detail. Section 4 introduces the Orthogonal Iteration Algorithm and its adaptation to compute the camera pose. Experimental results are then presented in section 5, demonstrating the stability of our approach, its robustness to scale and orientation changes, and its computational performance. Finally, section 6 provides conclusions.
2 System Overview
Our vision-based AR system is composed of four main components (Figure 1); a sketch of one iteration of this loop is given after the figure:
• 2D fiducials detection: detect 2D markers in each new video image.
• 2D-3D correspondence: identification of the detected fiducials allows 2D image features to be matched with their calibrated 3D features.
• Camera pose estimation: estimate the camera pose from the 2D-3D correspondences.
• Virtual world registration: the final output of the system is an accurate estimate of the camera pose, which specifies a virtual camera used to project the virtual world into the current video image.
Figure 1. Vision-based AR system architecture
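To make the data flow concrete, here is a rough sketch of how one frame could pass through these four components. It reuses the helper functions sketched in sections 3 and 4 below (extract_candidate_fiducials, match_templates, oi_pose); all names are ours, and the rendering call is a hypothetical placeholder rather than part of the described system:

```python
import numpy as np

def process_frame(frame, stored_templates, corners_3d_db, K, R_prev):
    # 1. 2D fiducials detection (section 3): binarize, extract candidate
    #    markers and their corner points.
    candidates = extract_candidate_fiducials(frame)
    # 2. 2D-3D correspondence: identify each marker against the stored
    #    templates and fetch its calibrated 3D corners from the database.
    ids, _, _ = match_templates([tpl for _, tpl in candidates],
                                stored_templates)
    pts_2d = np.vstack([corners for corners, _ in candidates])
    pts_3d = np.vstack([corners_3d_db[i] for i in ids])
    # 3. Camera pose estimation (section 4): normalize pixel coordinates
    #    by K^-1, then run the OI algorithm seeded with the previous pose.
    ones = np.ones((len(pts_2d), 1))
    norm = (np.linalg.inv(K) @ np.hstack([pts_2d, ones]).T).T[:, :2]
    R, T = oi_pose(pts_3d, norm, R_prev)
    # 4. Virtual world registration: a renderer would now draw the virtual
    #    objects through the virtual camera (K, R, T), e.g.:
    # render_overlay(frame, K, R, T)   # hypothetical rendering call
    return R, T
```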
3 Fiducials Tracking Algorithm
In our approach we have considered a square-shaped fiducial (figure 2.a) with a fixed black band exterior surrounding a unique interior image. The outer black band allows a candidate fiducial to be located in a captured image, and the interior image allows the candidate to be identified from a set of expected images. The four corners of the located fiducial allow for the unambiguous determination of the position and orientation of the marker. The tracking algorithm proceeds as follows:
Image binarization: each captured frame is thresholded so that dark candidate regions can be extracted rapidly; connected regions are then segmented, and their edges and corner points are detected (Figure 3).
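A minimal sketch of this extraction stage, written here with OpenCV for illustration; the 100x100 template size comes from the text, while the threshold settings, area cutoff, and function names are our assumptions:

```python
import cv2
import numpy as np

def extract_candidate_fiducials(frame, template_size=100):
    """Binarize the frame, find dark quadrilateral regions, and unwarp
    each one to a normalized template (cf. Figure 3)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Binarization: adaptive threshold copes with uneven lighting.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 10)
    # Connected regions: external contours of the thresholded image.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        if cv2.contourArea(c) < 400:   # reject tiny regions
            continue
        # Edge/corner detection: keep only convex quadrilateral outlines.
        poly = cv2.approxPolyDP(c, 0.03 * cv2.arcLength(c, True), True)
        if len(poly) != 4 or not cv2.isContourConvex(poly):
            continue
        corners = poly.reshape(4, 2).astype(np.float32)
        # Map the enclosed area to a standard template for recognition;
        # the four-orientation matching below absorbs corner-order ambiguity.
        s = template_size
        dst = np.array([[0, 0], [s - 1, 0], [s - 1, s - 1], [0, s - 1]],
                       dtype=np.float32)
        H = cv2.getPerspectiveTransform(corners, dst)
        template = cv2.warpPerspective(gray, H, (s, s))
        candidates.append((corners, template))
    return candidates
```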
Fiducials recognition: for each selected region, the system takes the four corner points and maps the enclosed area to a standard 100x100 template shape. The normalized templates are then compared to the stored ones at all four orientations. A variety of methods are possible for comparing images; we have used the correlation coefficient method because it is luminance invariant. So, the mean and standard deviations for the normalized template I and stored pattern P are first computed:
$$\mu_I = \frac{1}{N}\sum_x \sum_y I(x,y) \qquad (1)$$

$$\mu_P = \frac{1}{N}\sum_x \sum_y P(x,y) \qquad (2)$$

$$\sigma_I = \left[\sum_x \sum_y \big(I(x,y) - \mu_I\big)^2\right]^{1/2} \qquad (3)$$

$$\sigma_P = \left[\sum_x \sum_y \big(P(x,y) - \mu_P\big)^2\right]^{1/2} \qquad (4)$$

where N is the number of template pixels (here 100x100). Then, the correlation coefficient is computed as:

$$\rho = \frac{\sum_x \sum_y \big[I(x,y) - \mu_I\big]\big[P(x,y) - \mu_P\big]}{\sigma_I \, \sigma_P} \qquad (5)$$
Figure 3. Fiducial extraction process: (a) original image, (b) binarization, (c) connected regions, (d) fiducial edge detection, (e) fiducial corner detection.
Finally, a correlation matrix is created, relating each found marker to each stored template. Markers are then assigned to templates by selecting the greatest correlation coefficient.
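A small NumPy sketch of this recognition step, evaluating equation (5) at the four 90-degree orientations of each normalized template and building the correlation matrix; the function names are ours:

```python
import numpy as np

def correlation_coefficient(I, P):
    """Normalized correlation coefficient (equation 5); luminance invariant."""
    Ic = I - I.mean()                                  # equations (1)-(2)
    Pc = P - P.mean()
    denom = np.sqrt((Ic ** 2).sum()) * np.sqrt((Pc ** 2).sum())  # (3)-(4)
    return (Ic * Pc).sum() / denom

def match_templates(found, stored):
    """Build the correlation matrix between found markers and stored
    patterns, testing all four orientations, and assign each marker to
    the template with the greatest coefficient."""
    C = np.empty((len(found), len(stored)))
    best_rot = np.empty((len(found), len(stored)), dtype=int)
    for i, I in enumerate(found):
        for j, P in enumerate(stored):
            scores = [correlation_coefficient(I, np.rot90(P, k))
                      for k in range(4)]
            best_rot[i, j] = int(np.argmax(scores))
            C[i, j] = max(scores)
    assignments = C.argmax(axis=1)   # greatest coefficient per marker
    return assignments, best_rot, C
```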
4 Camera Pose Estimation
The recognized marker region is used for estimating the current camera position and orientation relative to the world coordinate system. From the coordinates of the four corners of the marker region on the projective image plane, a matrix representing the translation and rotation of the camera in the world coordinate system can be calculated. Several algorithms have been developed in recent years; examples are the Hung-Yeh-Harwood pose estimation algorithm [Hung et al. 1985] and the Rekimoto 3D position reconstruction algorithm [Rekimoto and Ayatsuka, 2000]. In this work we adapted the algorithm proposed by Lu et al. [Lu et al. 2000], namely the Orthogonal Iteration Algorithm, to perform the camera pose estimation.
4.1. Camera Model and Coordinates
The configuration of our system includes only a moving CCD video camera. There are three principal coordinate systems, as illustrated in Figure 4: the world coordinate system W, the camera-centered coordinate system C, and the 2D image coordinate system U.
Figure 4. Camera model and the related coordinate systems
A pinhole camera models the imaging process. The origin of C is at the projection center of the camera. The transformation from W to C is:

$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} R \mid T \end{bmatrix} \cdot \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (6)$$
where the rotation matrix R  and the translation vector T  characterize the orientation and the position of the camera with respect to the world coordinate frame. Under perspective projection, the transformation from W to U is:
$$\lambda \begin{bmatrix} x_u \\ y_u \\ 1 \end{bmatrix} = K \cdot \begin{bmatrix} R \mid T \end{bmatrix} \cdot \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (7)$$

where λ is an arbitrary scale factor and the matrix K:

$$K = \begin{bmatrix} f\alpha_x & 0 & u_0 \\ 0 & f\alpha_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (8)$$
contains the intrinsic parameters of the camera: f is the focal length of the camera, α_x and α_y are the horizontal and vertical pixel sizes on the imaging plane, and (u_0, v_0) is the projection of the camera center (principal point) on the image plane.
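For concreteness, a short NumPy sketch of equations (6)-(8), projecting world points to pixel coordinates; the numerical values in K are purely illustrative, not the paper's calibration:

```python
import numpy as np

def project_points(pts_w, K, R, T):
    """Equations (6)-(7): transform world points to the camera frame,
    then apply the intrinsic matrix and the perspective division."""
    pts_c = (R @ pts_w.T).T + T      # equation (6): world -> camera
    uvw = (K @ pts_c.T).T            # equation (7): homogeneous image coords
    return uvw[:, :2] / uvw[:, 2:3]  # divide out the scale factor lambda

# Example intrinsic matrix in the form of equation (8).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
```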
4.2. Camera Calibration
Internal, as well as, external camera parameters are determined by an automated (i.e. with no user interaction) camera calibration
procedure. A highly precise camera calibration is required for a good initialisation of the camera pose tracker. For that purpose, we have used our fiducilas tracking algorithm to generate enough 2D-3D matched points. The calibration parameters are then computed by an iterative least-squares estimation [Faugeras, 1993].
The intrinsic parameters K  remain constant during the camera tracking mode. The external parameters describe the transformation (rotation and translation) from world to camera coordinates and undergo dynamic changes during a session (e.g. camera motion). Once the camera calibration is finished, the system passes in tracking mode, and uses the obtained external camera parameters for the first initialisation of the camera pose. The current camera pose is then computed using the OI algorithm described below.
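The paper uses the iterative least-squares estimation of [Faugeras, 1993]; as an illustrative modern stand-in, the same 2D-3D matches produced by the fiducials tracker could be fed to OpenCV's calibration routine, e.g.:

```python
import cv2

def calibrate(obj_pts, img_pts, image_size):
    """obj_pts: list of (N,3) float32 arrays of calibrated 3D fiducial
    corners; img_pts: list of matching (N,2) float32 detected corners;
    one pair per calibration view, both produced by the fiducials tracker."""
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, image_size, None, None)
    # rvecs/tvecs hold each view's external parameters (rvec is a Rodrigues
    # vector; cv2.Rodrigues converts it to a rotation matrix). The last
    # view's pose can seed the tracker's first initialisation.
    return K, dist, rvecs[-1], tvecs[-1]
```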
4.3. Orthogonal Iteration Algorithm
The OI algorithm dynamically determines the external camera parameters using the 2D-3D correspondences established by the 2D fiducials tracking algorithm on the current video image.
The main idea of this algorithm is first to define pose estimation in terms of an appropriate object-space error function, in this case the object-space collinearity error vector, and then to show that this function can be rewritten in a way that admits an iteration based on the solution to the 3D-3D pose estimation, or absolute orientation, problem [Arun et al. 1987].
Moreover, the OI algorithm converges to an optimum for any set of observed points and any starting point R(0). However, in order to reduce the average number of iterations the OI algorithm needs to converge, we initialize it near the optimum for each newly acquired image: at time t (corresponding to the current image), we initialize the rotation matrix with the matrix R found at time t-1 (corresponding to the previous image).
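A compact NumPy sketch of the OI iteration as described above, assuming pixel coordinates have already been normalized by K⁻¹; the structure (closed-form translation for a fixed rotation, then an absolute-orientation update [Arun et al. 1987]) follows [Lu et al. 2000], but the code itself is our illustration:

```python
import numpy as np

def optimal_translation(R, pts_3d, V):
    # Closed-form T minimizing the object-space error for a fixed R.
    I = np.eye(3)
    A = sum(I - Vi for Vi in V)
    b = sum((Vi - I) @ (R @ p) for Vi, p in zip(V, pts_3d))
    return np.linalg.solve(A, b)

def absolute_orientation(p, q):
    # Rotation aligning point set p to q in the least-squares sense
    # (SVD solution of Arun et al. 1987).
    pc, qc = p - p.mean(axis=0), q - q.mean(axis=0)
    U, _, Vt = np.linalg.svd(qc.T @ pc)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # guard against reflection
    return U @ D @ Vt

def oi_pose(pts_3d, pts_norm, R0, max_iter=10, tol=1e-9):
    """pts_3d: (n,3) world points; pts_norm: (n,2) image coordinates
    normalized by K^-1; R0: starting rotation, e.g. the previous frame's."""
    v = np.hstack([pts_norm, np.ones((len(pts_norm), 1))])
    # Line-of-sight projection operators V_i = v_i v_i^T / (v_i^T v_i).
    V = [np.outer(vi, vi) / (vi @ vi) for vi in v]
    R, prev_err = R0, np.inf
    for _ in range(max_iter):
        T = optimal_translation(R, pts_3d, V)
        # Project the transformed points onto their lines of sight ...
        q = np.array([Vi @ (R @ p + T) for Vi, p in zip(V, pts_3d)])
        # ... and re-solve the resulting 3D-3D absolute orientation problem.
        R = absolute_orientation(pts_3d, q)
        err = sum(float(np.linalg.norm((np.eye(3) - Vi) @ (R @ p + T)) ** 2)
                  for Vi, p in zip(V, pts_3d))
        if abs(prev_err - err) < tol:   # object-space error has converged
            break
        prev_err = err
    return R, optimal_translation(R, pts_3d, V)
```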
5 Results and Discussion
In our experiments we recorded an image sequence from a moving camera pointing at a wall and a cupboard (Figure 2.b); at least one fiducial is visible in this area at all times. The frame rate is 25 frames/s, and the 40-second sequence contains 1000 frames. We tracked the 2D fiducials in every frame. When the system identifies a detected fiducial, the corresponding overlay information is retrieved from the database (in this case two 3D wireframe models: a cube and a pyramid). Using the estimated camera pose, these virtual objects can be correctly superimposed on the video image.
Figure 5 shows four frames of the video sequence with the virtual objects rendered. For each frame, the camera pose was estimated using two detected 2D fiducials. Figures (5-a) through (5-d) show that the virtual objects are well superimposed on the real world. Our current implementation exhibits an average reprojection error between 0.7 and 1.2 pixels.
Figure 5. Camera tracking results: (a) frame 0, (b) frame 50, (c) frame 70, (d) frame 80.
Figure 6 illustrates the robustness of our approach to:
• Effects of scale: the major advantage of using corners for tracking is that corners are invariant to scale. Figure (6-a) shows that our 2D fiducials tracker can detect and identify markers despite a large range of distances from the camera.
• Poor detection: figure (6-b) illustrates the ability of our system to estimate the camera pose well when only one fiducial is detected.
• Effects of orientation: due to perspective distortion, a square on the original pattern does not necessarily remain square when viewed at a sharp angle and projected into image space. Figure (6-c) illustrates the efficiency of our system in such situations.
Moreover, real-time performance of our system has been achieved by carefully evaluating each processing step. We have implemented our system on an Intel Pentium 3 500 MHz PC equipped with a Matrox 2 acquisition card and an iS2 IS-800 CCD camera. The average processing time per frame when viewing two fiducials is as follows:
Fiducials identification: 29 ms
Camera pose estimation: 4 ms
Augmentation time: 2 ms
As can be seen, these processing times are very acceptable for a real-time implementation.
6 Conclusion
In this paper we described a robust solution for vision-based augmented reality tracking that identifies and tracks, in real time, known 2D fiducials made up of corners, in order to estimate the camera pose. The major advantages of tracking corners are their detection robustness over a large range of distances and their reliability under severe orientations. Additionally, we have adapted the Orthogonal Iteration algorithm to our problem and have demonstrated its efficiency in such applications.
Figure 6. System robustness: (a) effects of scale, (b) poor detection, (c) effects of rotation.

References
Neumann, U., AND Cho, Y. 1996. A self-tracking augmented reality system. In Proceedings of ACM Virtual Reality Software and Technology, 109-115.
Bajura, M., AND Neumann, U. 1995. Dynamic registration correction in augmented reality systems. In Virtual Reality Annual International Symposium (VRAIS '95), 189-196.
Uenohara, M., AND Kanade, T. 1995. Real-time vision-based object registration for image overlay. Computers in Biology and Medicine, 249-260.
State, A., Hirota, G., Chen, D. T., Garrett, W. F., AND Livingston, M. A. 1996. Superior augmented reality registration by integrating landmark tracking and magnetic tracking. In SIGGRAPH '96 Proceedings.
Lu, C. P., Hager, G. D., AND Mjolsness, E. 2000. Fast and globally convergent pose estimation from video images. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 6, 610-622.
Kato, H., AND Billinghurst, M. 1999. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR '99), 85-94.
Hung, Y., Yeh, P., AND Harwood, D. 1985. Passive ranging to known planar point sets. In Proceedings of the IEEE International Conference on Robotics and Automation, vol. 1, 80-85.
Rekimoto, J., AND Ayatsuka, Y. 2000. CyberCode: designing augmented reality environments with visual tags. In Proceedings of DARE 2000 (Designing Augmented Reality Environments).
Faugeras, O. 1993. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press.
Arun, K. S., Huang, T. S., AND Blostein, S. D. 1987. Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, 698-700.
