Hirotake Yamazoe, Hiroshi Habe, Ikuhisa Mitsugami, and Yasushi Yagi
Abstract This paper proposes a depth measurement error model for consumer depth cameras such as the Microsoft Kinect, and a corresponding calibration method. These devices were originally designed as video game interfaces, and their output depth maps usually lack sufficient accuracy for 3D measurement. Models have been proposed to reduce these depth errors, but they only consider camera-related causes. Since the depth sensors are based on projector-camera systems, we should also consider projector-related causes. Also, previous models require disparity observations, which are usually not output by such sensors, and so cannot be employed in practice. We give an alternative error model for projector-camera based consumer depth cameras, based on their depth measurement algorithm and the intrinsic parameters of the camera and the projector; it does not need disparity values. We also give a corresponding new parameter estimation method which simply needs observation of a planar board. Our calibrated error model allows use of a consumer depth sensor as a 3D measuring device. Experimental results show the validity and effectiveness of the error model and calibration procedure.
Keywords consumer depth camera; intrinsic calibration; projector; distortion
Recently, various consumer depth cameras such as the Microsoft Kinect V1/V2, Asus Xtion, etc. have been released. Since such consumer depth sensors are inexpensive and easy to use, these devices are widely deployed in various fields for a wide variety of applications [1, 2].
These consumer depth cameras can be divided into two categories: (i) projector-camera based systems in which a projector casts a structured pattern onto the surface of a target object, and (ii) time-of-flight (ToF) sensors that measure the time taken for light to travel from a source to an object and back to a sensor. ToF sensors generally give more accurate depths than projector-camera based ones, which are, however, still useful because of their simplicity and low cost.
Since projector-camera based devices include cameras and a projector, output errors may be caused by errors in determining the intrinsic parameters. As long as such devices are used as human interfaces for video games, such errors are unimportant; for 3D measurement, however, they are not negligible. For example, even when a Kinect V1 captures a planar object, the resultant depth maps have errors (see Fig. 1, left), as also reported elsewhere [3, 4]. Thus, in this paper, we focus on projector-camera based consumer depth cameras and propose a depth error correction method based on their depth measurement algorithm. Various intrinsic calibration methods have already been proposed for Kinect and other projector-camera based depth cameras [3–11]. Smisek et al. [3] and Herrera et al. [4] proposed calibration and depth correction methods for Kinect that reduce the depth observation errors. Raposo et al. [6] extended Herrera et al.'s method to improve stability and speed. However, their methods only considered distortion due to the infrared (IR) cameras. Since projector-camera based depth sensors include cameras and a projector, we should also consider projector-related sources of error.
We previously proposed a depth error model for Kinect including projector-related distortion [5]. Darwish et al. [8] also proposed a calibration algorithm that considers both camera and projector-related parameters for Kinect. However, these methods, as well as other previous methods, require disparity observations, and these are not generally provided by such sensors. Thus, methods that require disparity observations cannot be employed in practice for error compensation for data from existing commercial sensors.
Some researchers employ non-parametric models for depth correction, but a calibration board needs to be shown perpendicular to the sensor [9, 11], or ground truth data obtained by simultaneous localization and mapping (SLAM) are required [12, 13]. Jin et al. [10] proposed a calibration method using cuboids, but their method is also based on disparity observations. Other researchers proposed error distribution models for Kinect [14, 15], but this research did not focus on error compensation.
To provide straightforward procedures for calibration and error compensation for depth data, including previously captured data, our method introduces a parametric error model that considers (i) both camera and projector distortion, and (ii) errors in the parameters used to convert disparity observations to actual disparity. To estimate the parameters in the error model, we propose a simple method that resembles the common color camera calibration method [16]. Having placed a planar calibration board in front of the depth camera and captured a set of images, our method efficiently optimizes the parameters, allowing us to reduce the depth measurement errors (see Fig. 1, right). Our compensation model only requires depth data, without the need for disparity observations. Thus we can apply our error compensation to any depth data captured by projector-camera based depth cameras.
We note that the calibration method introduced in this paper is designed for Kinect because it is the most common projector-camera based depth sensor. However, it is potentially more generally useful because it is based on a principle common to other projector-camera based depth sensors.
Section 2 describes the measurement algorithm used by Kinect, and Section 3 describes our parametric error model and parameter estimation. Section 4 presents experimental results demonstrating the effectiveness of our proposed method, while Section 5 summarizes our paper.
Fig. 1 Left: observation errors in Kinect output. Right: compensated values using our method.
Since our method is based on the measurement algorithm used by the Kinect, we first outline this algorithm and this depth sensor, which consists of an IR camera and an IR projector. The IR projector projects special fixed patterns (speckle patterns) on the target observed by the IR camera. By comparing the observed and reference patterns captured in advance, Kinect estimates depth information for the target. The reference patterns are observations made by the IR camera when the IR projector casts the speckle pattern on the reference plane Π_0 [17] (see Fig. 2). Letting δ_i denote the disparity between the observed and reference patterns at point x_i, triangulation gives the depth Z_i of the corresponding scene point as
$$Z_i = \frac{w f Z_0}{w f + Z_0 \delta_i}$$
where w is the baseline distance between the camera and the projector, f is the focal length of the IR camera (and the IR projector), and Z_0 is the distance between the reference plane Π_0 and the Kinect.
Fig. 2 Depth measurement by Kinect.
Then X_i, which is the 3D position of point Q_i observed at image position (x_i, y_i), can be calculated as
$$\mathbf{X}_i = \left[\frac{(x_i - x_{cc})Z_i}{f},\ \frac{(y_i - y_{cc})Z_i}{f},\ Z_i\right]^{\mathrm{T}}$$
where x_cc and y_cc are the coordinates of the IR camera's principal point and Z_i is the depth of point Q_i.
Kinect does not output disparity values, but only normalized observations δ′_i from 0 to 2047 (in Kinect disparity units: kdu) [17], where δ_i = mδ′_i + n. The driver software for Kinect (Kinect for Windows SDK and OpenNI) uses these to calculate and output depth values Z_i based on the following equation:
$$Z_i = \frac{w f Z_0}{w f + Z_0 (m\delta'_i + n)} \quad (5)$$
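As an illustration, a minimal sketch of this driver-side conversion follows. All constants below (m, n, w, f, Z_0) are placeholder assumptions for illustration only, not values taken from this paper or from any specific device.

```python
# A minimal sketch of the normalized-disparity-to-depth conversion in
# Eq. (5). The default parameter values are illustrative placeholders.

def disparity_to_depth(delta_norm, m=-1.0, n=1091.0, w=75.0, f=585.0, z0=1000.0):
    """Convert a normalized disparity observation (0-2047 kdu) to depth (mm)."""
    delta = m * delta_norm + n                  # delta_i = m * delta'_i + n
    return (w * f * z0) / (w * f + z0 * delta)  # Eq. (5)
```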
The disparity between the camera and the projector, d_i, can be expressed as follows:
$$d_i = \frac{w f}{Z_i} = \frac{w f}{Z_0} + \delta_i$$
Note that recent versions of the driver software do not support output of disparities δ′_i, so these are generally unobtainable. Instead, we propose a method to calibrate and compensate the depth data obtained by Kinect that does not require either the disparity or normalized disparity observations.
The depth measurement model described above holds only in an ideal case. In practice, when Kinect observes a planar target, the output depth maps have errors, as previously noted [3, 4] (see Fig. 1). To be able to compensate for them, we consider not only camera distortion but also projector distortion in our model.
2.2.1 Distortion parameters
A well-known lens distortion model is
$$\tilde{\mathbf{u}}_{ci} = \left(1 + k_{c1}\|\mathbf{u}_{ci}\|^2 + k_{c2}\|\mathbf{u}_{ci}\|^4\right)\mathbf{u}_{ci}$$
where u_ci and ũ_ci are the ideal and distorted positions of point i in normalized image coordinates, and k_c1 and k_c2 are the distortion parameters of the IR camera.
We assume the same distortion model can be used for the projector:
$$\tilde{\mathbf{x}}_{pi} = \begin{bmatrix} x_{pc} \\ y_{pc} \end{bmatrix} + \left(1 + k_{p1}\|\mathbf{u}_{pi}\|^2 + k_{p2}\|\mathbf{u}_{pi}\|^4\right)\left(\mathbf{x}_{pi} - \begin{bmatrix} x_{pc} \\ y_{pc} \end{bmatrix}\right)$$
where x_pi and x̃_pi are the ideal and distorted 2D positions, and u_pi gives the normalized coordinates of x_pi. [x_pc, y_pc]^T is the principal point of the projector, and k_p1 and k_p2 are the distortion parameters of the IR projector.
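For concreteness, a minimal sketch of this two-coefficient radial distortion model, applicable to either the camera coefficients (k_c1, k_c2) or the projector coefficients (k_p1, k_p2), might look as follows:

```python
import numpy as np

def distort(u, k1, k2):
    """Apply the two-coefficient radial distortion model to points u
    (an N x 2 array in normalized coordinates); (k1, k2) stands in for
    either (k_c1, k_c2) or (k_p1, k_p2)."""
    r2 = np.sum(u ** 2, axis=1, keepdims=True)  # squared radial distance
    return (1.0 + k1 * r2 + k2 * r2 ** 2) * u
```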
We now consider pattern P(x_pi), projected in the direction of point x_pi (see Fig. 3). However, because of projector distortion, pattern P(x_pi) is actually projected in the direction of point x̃_pi, and is thus actually projected onto the position on the target corresponding to x̃_pi rather than x_pi.
2.2.2 Proposed error model
Fig. 3 Proposed error model.
where α and β are parameters for compensating errors in f, w, Z_0, m, and n. A detailed derivation of Eq. (10) is shown in Appendix A. Thus, the ideal disparity d_i can be expressed as follows:
By introducing α and β, we can compensate for errors in the parameters in Eq. (5) without observing the normalized disparity itself. Therefore, we calibrate not only the distortion parameters of the camera and the projector but also α and β, allowing us to compensate for errors in these values. In the next section, we describe parameter estimation for this error model.
In consumer depth sensors, since the projection patterns cannot be controlled, we cannot directly estimate the projector's distortion parameters. Instead, we estimate the error model parameters using the process flow shown in Fig. 4.
First, we obtain N IR images and corresponding depth data for a calibration board (of known size and pattern) in arbitrary poses and positions. This lets us perform intrinsic calibration of the IR camera by Zhang's method [16]. As described in the previous section, we model the depth errors based on Eq. (11), for which the ideal disparity d_i and the observed disparity d̃_i are required. Here, we assume that the poses and positions of the board estimated by intrinsic camera calibration provide ideal depth values, and calculate the ideal disparity d_i from these poses and positions. The observed disparity values d̃_i can be calculated from the observed depth values. Next we estimate the error model parameters by minimizing Eq. (11) based on d_i and d̃_i. Table 1 summarizes the notation used in the following.
Fig. 4 Process flow.
First, intrinsic calibration of the IR camera is performed using the N images captured by the IR camera, using Zhang's method [16]. For camera calibration, X_k, x^(j)_bk, the size of the chessboard, and the number of checker patterns on the chessboard should be given. Zhang's method can estimate the focal length (f), the principal point (u_cc, v_cc), and the camera distortion parameters (k_c = {k_c1, k_c2}). Disparity differences caused by the camera lens distortion must also be considered. Let k′_c be the camera distortion parameter, and ε′_c be the disparity error caused by k′_c. Then k_c and ε_c can be expressed as follows:
Table 1 Notation
In addition, we can obtain the board's poses and positions in each image j: (R^(j), t^(j)). This information is used in the following processes.
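A minimal sketch of this intrinsic calibration step using OpenCV is given below. The board geometry (9×6 inner corners, 25 mm squares) and the variable `ir_image_paths` are assumed examples, not the setup used in the paper.

```python
import cv2
import numpy as np

# A sketch of Zhang's method [16] applied to the N captured IR images.
pattern = (9, 6)   # assumed inner-corner layout of the chessboard
square = 25.0      # assumed square size in mm

obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for path in ir_image_paths:  # hypothetical list of the N IR image files
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    size = img.shape[::-1]
    ok, corners = cv2.findChessboardCorners(img, pattern)
    if ok:
        obj_pts.append(obj)
        img_pts.append(corners)

# K holds f and the principal point; dist holds the camera distortion
# parameters; rvecs/tvecs give the board poses (R^(j), t^(j)) reused later.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, size, None, None)
```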
Next, we estimate the distortion parameters for the projector and the disparity conversion parameters. To do so, we use the relations in Eq. (13) to give the following equation:
Fig. 5 Relationship between 2D and 3D observations.
and we employ the approximate undistorted model [18].
Based on the above equations, we can estimate k_c, k_p, α, and β by minimization as below:
where A_p are the intrinsic parameters of the projector. Using Eq. (7), we can obtain the following equation:
We then estimate the optimal values of ε_pi (and k_p1, k_p2), α, and β by minimizing Eq. (19), starting from these initial values.
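As a sketch of how such a joint minimization might be set up with a standard nonlinear least-squares solver: the residual form below is an assumption for illustration (the paper's actual cost is Eq. (19)), and `disparity_error()` is a hypothetical helper returning the disparity error induced by distortion at each point.

```python
import numpy as np
from scipy.optimize import least_squares

def residual(p, u_cam, u_proj, d_ideal, d_obs):
    """Assumed residual: affine conversion terms plus camera- and
    projector-distortion disparity errors, compared to ideal disparity."""
    kc1, kc2, kp1, kp2, alpha, beta = p
    eps_c = disparity_error(u_cam, kc1, kc2)    # hypothetical camera term
    eps_p = disparity_error(u_proj, kp1, kp2)   # hypothetical projector term
    return alpha * d_obs + beta + eps_c + eps_p - d_ideal

p0 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # start from "no error" values
fit = least_squares(residual, p0, args=(u_cam, u_proj, d_ideal, d_obs))
```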
Finally, we describe the compensation process for the depth data obtained from the depth sensors.
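A minimal sketch of this compensation flow for previously captured depth maps is shown below: convert each depth value to an observed camera-projector disparity, correct it with the calibrated parameters, and convert back. `correct_disparity()` is a hypothetical helper applying the estimated α, β, ε_c, and ε_p; w and f are placeholder values.

```python
def compensate_depth(z_obs, params, w=75.0, f=585.0):
    d_obs = (w * f) / z_obs                   # observed disparity from depth
    d_cor = correct_disparity(d_obs, params)  # apply calibrated error model
    return (w * f) / d_cor                    # compensated depth
```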
We performed the following experiments to confirm the validity of our proposed error model and error compensation. In the experiment, we used a Kinect for Xbox (Device 1, abbreviated Dev. 1, etc.), a Kinect for Windows (Device 2), and an ASUS Xtion Pro (Device 3); all of these devices are based on the same Primesense measurement algorithm [20]. We compared the compensated results using the following three models and the observed raw data:
(a) our proposed method;
(b) a model considering camera distortion and conversion parameters (without ε_p);
(c) a model considering only camera distortion errors (with ε_c);
(d) no compensation, i.e., observed raw data.
We captured 12 observations of the chessboard in different arbitrary poses and positions in the experiments. The distances between the board and the device were about 500–1300 mm. A leave-one-out method was used for evaluating the validity of the proposed error model: one observation was used for evaluation and the remaining observations were used for estimating error model parameters. From the observations, we manually obtained the 2D positions of the chessboard corners (54 points per image).
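A sketch of this leave-one-out protocol follows; `fit_error_model()` and `evaluate()` are hypothetical stand-ins for the parameter estimation of the previous section and the corner-error measurement, respectively, and `observations` is an assumed list of the 12 captures.

```python
errors = []
for i in range(len(observations)):
    train = observations[:i] + observations[i + 1:]   # 11 observations for fitting
    params = fit_error_model(train)                   # hypothetical fitting routine
    errors.append(evaluate(observations[i], params))  # held-out observation
```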
Table 2 shows the residual errors after the calibration phase, and Table 3 shows the errors in evaluations. Here, the errors were calculated as the averaged distances between the compensated (or observed) positions and the ground truth positions of the chessboard corners. We used the 3D positions obtained from the color camera observations as the ground truth positions.
These comparative results show that all three models can reduce errors compared to the uncompensated results, in both the calibration and evaluation phases. The errors compensated by (a) our proposed model were the lowest, followed by (b) the model that considered camera distortion and linear relations, and then (c) the model that considered only camera distortion. The number of parameters used in these models has the same ordering: (a) has the most, followed by (b) and then (c). These results suggest that all the parameters considered in our proposed error model help improve the quality of the 3D depth data.
After calibration, we evaluated the flatness of the compensated observations for the chessboard, measuring plane fitting errors within the chessboard regions. Table 4 shows comparative results for these plane fitting errors.
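A minimal sketch of one common way to compute such a flatness measure: fit a least-squares plane to the compensated 3D points inside the chessboard region and report the point-to-plane RMS distance.

```python
import numpy as np

def plane_fit_rms(points):
    """points: N x 3 array of compensated 3D observations."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)  # last right-singular vector
    normal = vt[-1]                              # = plane normal (least variance)
    dist = (points - centroid) @ normal          # signed point-to-plane distances
    return np.sqrt(np.mean(dist ** 2))
```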
These results show that the plane fitting errors in compensated observations from our proposed model (a) decreased, whereas for the other methods (b) and (c) the plane fitting errors typically increased. These results suggest that all the parameters considered in our proposed error model are required to improve the quality of the 3D depth data.
Table 2 Comparison of averaged errors during calibration
Table 3 Comparison of averaged errors in evaluation
Next, we evaluated the method's robustness to errors in the given baseline length w. Our method assumes the target device's baseline length is that given in such articles as Ref. [19]. However, if it is not given, we need to measure it ourselves. In such cases, the measured length may include errors. Thus, we evaluated the robustness to errors in the baseline length w of up to ±2 mm.
Table 5 shows the errors when the baseline includes errors. As can be seen, our proposed model can reduce errors between the compensated positions and the ground truth positions even when the given baseline length includes errors. This is because our model considers errors in the baseline length w as one of the parameters in Eq. (5).
These experimental results confirm that our proposed model can improve the quality of 3D depth data obtained by consumer depth cameras such as Kinect and Xtion.
Table 4 Comparison of plane fitting errors in evaluation
In this paper, we have proposed and evaluated a depth error model for projector-camera based consumer depth cameras such as the Kinect, and an error compensation method based on calibration of the parameters involved. Since our method only requires depth data, without disparity observations, we can apply it to any depth data captured by projector-camera based depth cameras such as the Kinect and Xtion. Our error model considers (i) both camera and projector distortion, and (ii) errors in the parameters used to convert from normalized disparity to depth data. The optimal model parameters can be estimated by showing a chessboard to the depth sensor at multiple arbitrary distances and poses. Experimental results show that the proposed error model can reduce depth measurement errors for both Kinect and Xtion by about 70%. Our proposed model has significant advantages when using a consumer depth camera as a 3D measuring device.
Future work includes further investigation of the error model, improvement of the optimization approach for parameter estimation, and implementation of a calibration tool based on the proposed error model for various projector-camera based depth cameras, such as the Intel RealSense and Occipital Structure Sensor, as well as the Microsoft Kinect.
Table 5 Residual errors with varying baseline length errors
Appendix A Derivation of Eq. (10)
Considering errors in the parameters in Eq. (5), the observed disparity d̃_i can be expressed as follows:
Acknowledgements
This work was supported by the JST CREST "Behavior Understanding based on Intention-Gait Model" project.