Inexpensive Immersive Projection
Nancy P. Y. Yuen∗
William C. Thibault†
California State University East Bay
∗e-mail: nyuen@horizon.csueastbay.edu
†e-mail: william.thibault@csueastbay.edu
ABSTRACT
Most projector-based immersive displays have numerous problems
relating to complexity, space, and cost. We present a technique for
rendering perspectively correct images using a casual arrangement
of a projector, a mirror and a display surface.
Our technique renders an arbitrarily wide field of view using an
efficient GPU-based single-pass rendering algorithm. The rendering algorithm is preceded by a one-time camera-based geometric
correction calibration step. As we will describe, this technique can
be implemented with inexpensive, commodity hardware and using
readily available display surfaces. Thus, this technique enables immersive projection systems to be used in casual locations, such as a
classroom or even the home.
Keywords: display algorithms, viewing algorithms, camera calibration, projector-camera systems

Index Terms: I.3.3 [Computing Methodologies]: Computer Graphics—Picture/Image Generation; I.4.1 [Image Processing and Computer Vision]: Digitization and Image Capture—Camera Calibration

1 INTRODUCTION

Projectors are often used to create a large field of view in an immersive environment. Such environments have many useful applications in entertainment, production, and education. However, existing projector-based immersive environments have numerous complexity, space, and cost problems. They often require an expensive screen, and expensive projectors and lenses, to function. Also, precise and time-consuming positioning of the projector is often needed. These factors drive the cost of immersive environments beyond the reach of many institutions.

Our goal is to build an immersive projection system that meets these requirements: uses minimal, affordable hardware; takes advantage of existing surfaces for display; requires minimal setup and configuration; produces a large field of view, greater than 180 degrees; requires only a casual arrangement of system components; avoids determining display surface geometry; and supports real-time interaction.

This paper describes an inexpensive immersive projection system built using commodity parts to display images on diffuse surfaces. It uses a one-time camera-based geometric calibration step to determine how the final image should be displayed. It does not require the computation of projector intrinsics or extrinsics, and functions for any arrangement of projector and display surface geometry. The result is a system that eliminates the usual cost and complexity associated with projector-based immersive display environments. Further, our approach uses a GPU-based single-pass rendering algorithm suitable for real-time interaction. Existing applications can be ported to use this approach with minimal modification.

2 PRIOR WORK
Projectors are useful for creating large displays on surfaces of any
shape. However, to surround the viewer with imagery, 3D computer graphics rendering algorithms must address the limitations of
perspective projection. Wide fields-of-view create distortions when
using perspective, and even the widest perspective viewing volumes
must have a field-of-view of less than 180 degrees.
The usual approach is to use a cubemap [11] to store the
panoramic image prior to mapping it to the display surface. Each
face in a cubemap is created with a separate rendering pass over the
dataset. In each pass, the viewing and projection transformations
are set to that particular face. GPUs that support multiple render
targets can create a cubemap with a single pass over the geometry.
Cubemaps are attractive as they have simple, fast hardware implementations. However, most cubemap texture filtering implementations fail to correctly handle edges of the cube.
Traditional “high-end immersive projection VR” systems such
as the CAVE [10] [9] feature high-resolution rear-projection, stereo,
and head-tracking. These systems are effective for creating a sense
of immersion that fills the viewer’s visual field while allowing freedom of motion. Although effective, these systems are expensive in
terms of hardware and physical space.
Small, inexpensive dome displays can be created using a single
projector and a spherical mirror, such as those sold for safety or
security applications [7]. These inexpensive mirrors are far from
perfect, however. Simple geodesic domes can be easily constructed
from common materials such as cardboard [14]. Software written
assuming a specific geometry for the projector, mirror, and dome
screen is becoming available [4].
Cameras are increasingly used to calibrate projected displays
[17] [16] [5] [15]. These typically use homographies (projective transformations) to model the projection process, which limits
their use to planar display surfaces. Another approach to the use
of camera-based calibration is Raskar’s two-pass algorithm [17],
which first renders the desired image to a texture, and in a second
pass renders a model of the display surface that is textured with the
image, using texture coordinates derived from the camera calibration.
The direct use of camera-projector correspondences to perform
this real time warp [19] [8] [13] has the advantage that non-linear
distortions in the projectors are handled. Also, screens of any shape
are supported, as display surface geometry is not needed. Commercial systems using this approach have begun to appear [2] [3]. The
technique creates an image that appears perspectively-correct from
a specific viewing location. The camera is placed at this location
during calibration. Any effects due to screen geometry are captured
in the resulting map from camera to projector coordinates. Thus,
the general technique of calibrating with a camera at the intended
viewing position can support screens of arbitrary shape, given that
the result will only appear correct from the intended viewpoint.
Other systems have used a vertex program to compute nonstandard projections. Raskar [18] used a vertex program to compute
a warp for a quadric display shape.
Our system differs from previous ones by allowing the creation
of the projected immersive imagery with a single pass over the input
geometry. We do not attempt to compute a warp and we do not use
a cubemap. We avoid the resampling errors possible when using the
typical two-pass (render to texture, then render textured geometry)
algorithm. We use a texture to store what is essentially the mapping
from eye coordinates to projector (screen) coordinates. A vertex
program uses that texture to map vertices from eye coordinates to
projector coordinates in a single pass, replacing the perspective projection with a non-parametric one derived from the display system
geometry and projector optics. It therefore supports a wide range
of possible configurations. Also, our technique requires neither expensive equipment nor specialized infrastructure.
3 SYSTEM OVERVIEW
This section gives a brief overview of our immersive projection
technique. It comprises two steps: (i) a one-time calibration
step and (ii) a geometry transformation step. The one-time calibration step determines the correspondences between display pixels
(as imaged by a camera) and projector pixels. We call this set of
correspondences “camera-projector correspondences.”
The geometry transformation step is applied at runtime to each
vertex of the input geometry. It is implemented as a single-pass algorithm in a vertex program.
These steps are described in more detail in the following sections.
4 CALIBRATION
The calibration step determines a set of camera-projector correspondences. Camera pixels correspond to direction vectors in
eye coordinates, once the camera’s intrinsic (lens) parameters are
known. We use a fisheye lens to enable imaging of a display with
a large field-of-view. By projecting a sequence of known patterns,
and imaging them with the camera, correspondences between projector and camera pixels are found.
4.1 Fisheye Camera Calibration
Camera intrinsics are used to transform camera image coordinates
to and from direction vectors. We use a fisheye lens to capture an
extremely wide field-of-view in a single image.
Calibration of fisheye lenses is not supported in most off-the-shelf camera calibration software. We adopt the camera model of Bakstein and Pajdla [6]. Their imaging model assumes a radially-symmetric lens, and accounts for pixel aspect ratio and image center.
Figure 1: (a) eye coordinates; (b) ideal camera image coordinates; (c) actual camera image coordinates.
The transformation from world coordinates, X, to camera coordinates, X̃, is described by a rotation matrix R and vector T :
X̃ = RX + T
Let $\tilde{X} = [x, y, z]^T$. Let $\theta$ be the angle between the ray and the $z$-axis, and $\phi$ the angle between the $x$-axis and the vector $[x, y]^T$ (Figure 1(a)). Then
$$\theta = \tan^{-1}\!\left(\frac{\sqrt{x^2 + y^2}}{z}\right), \qquad \phi = \tan^{-1}\!\left(\frac{y}{x}\right).$$
Now, the distance from the image center, $r$, of the projection is modeled as
$$r = a \tan\frac{\theta}{b} + c \sin\frac{\theta}{d}.$$
Let the location of the feature in the “ideal” camera image be $u' = [r \cos\phi, r \sin\phi, 1]^T$ (Figure 1(b)). To account for the image center, $[u_0, v_0]^T$, and pixel aspect ratio, $\beta$, in the position of the projection of the world point $X$ onto the image plane, $u = [u, v, 1]^T$ in “actual” image pixel coordinates (Figure 1(c)), we have $u = K u'$, where
$$K = \begin{bmatrix} 1 & 0 & u_0 \\ 0 & 1/\beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}.$$
We express this fisheye transformation as $F(\tilde{X}) = u$.

Figure 2: Pixels on the camera and projector image planes for a point on the display surface.
The camera parameters are the three rotation angles, a three-vector for the translation, the four parameters $a$, $b$, $c$, $d$ for the non-linear term, $\beta$, and $u_0$, $v_0$.
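To make the model concrete, the projection $F$ can be written as a short function. The following is a minimal NumPy sketch of the model as described above; the function name and argument layout are ours, not part of the paper.

```python
import numpy as np

def fisheye_project(X, R, T, a, b, c, d, beta, u0, v0):
    # World point X (3-vector) to camera coordinates.
    x, y, z = R @ X + T
    # Angle from the optical (z) axis and azimuth in the image plane.
    theta = np.arctan2(np.hypot(x, y), z)
    phi = np.arctan2(y, x)
    # Radial distance from the image center (Bakstein-Pajdla model).
    r = a * np.tan(theta / b) + c * np.sin(theta / d)
    # "Ideal" image position, then image center and aspect ratio via K.
    u_ideal = np.array([r * np.cos(phi), r * np.sin(phi), 1.0])
    K = np.array([[1.0, 0.0,        u0],
                  [0.0, 1.0 / beta, v0],
                  [0.0, 0.0,        1.0]])
    return K @ u_ideal  # "actual" pixel coordinates [u, v, 1]
```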
Calibration uses an image of a calibration object with a number of point features at known world coordinate positions. Let $N$ be the number of points, $X_i$ the world coordinate position of the $i$-th point, $\tilde{u}_i$ the position of $X_i$ observed in the camera, and $u_i$ the position predicted by the parameters. The parameters are found by minimizing $\sum_{i=1}^{N} \lVert \tilde{u}_i - u_i \rVert$.
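This minimization can be reproduced with any general-purpose optimizer. Below is a sketch using SciPy (the paper itself uses Mathematica's NMinimize, described in Section 6); `fisheye_project` is the sketch above, and `X_world`, `u_obs`, and the initial guess `x0` are assumed inputs.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def total_error(params, X_world, u_obs):
    # Unpack: 3 rotation angles, 3 translation components, a, b, c, d, beta, u0, v0.
    rx, ry, rz, tx, ty, tz, a, b, c, d, beta, u0, v0 = params
    R = Rotation.from_euler("xyz", [rx, ry, rz]).as_matrix()
    T = np.array([tx, ty, tz])
    err = 0.0
    for X, u in zip(X_world, u_obs):
        p = fisheye_project(X, R, T, a, b, c, d, beta, u0, v0)
        err += np.linalg.norm(u - p[:2])  # sum of reprojection distances
    return err

# X_world: (N,3) known feature positions; u_obs: (N,2) detected pixels;
# x0: a 13-element initial guess (assumed supplied by the user).
# result = minimize(total_error, x0, args=(X_world, u_obs), method="Nelder-Mead")
```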
4.2 Camera-Projector Correspondences
The geometry transformation step requires a mapping from camera image coordinates to projector image coordinates. Our technique builds this mapping using camera-projector pixel correspondences. We use the Brown and Seales [8] method, but any comparable method will suffice. A projector projects known patterns of
pixels onto the viewing surface. A camera is placed at the intended
viewing location. There, it captures images of the projected patterns
(structured light) on the display surface as an observer would see
them. These camera images are correlated with the projected images to derive a set of correspondences between camera image pixel coordinates $C_i$ and projector image pixel coordinates $P_i$. Completion of this calibration step yields a set of image pixel coordinate pairs $(C_i, P_i)$ that we refer to as camera-projector correspondences.
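The exact patterns are those of Brown and Seales; a common structured-light choice is Gray-coded stripes, and the sketch below assumes that choice, with a captured image pair (pattern and inverse) per bit plane. It is illustrative, not a reproduction of their implementation.

```python
import numpy as np

def decode_stripes(captures_on, captures_off, min_contrast=10):
    # captures_on/off: grayscale camera images of each Gray-code bit
    # plane and its inverse, most significant bit first.
    h, w = captures_on[0].shape
    code = np.zeros((h, w), dtype=np.int64)
    valid = np.ones((h, w), dtype=bool)
    for on, off in zip(captures_on, captures_off):
        code = (code << 1) | (on > off)   # lit vs. unlit decides the bit
        valid &= np.abs(on.astype(int) - off.astype(int)) > min_contrast
    # Convert Gray code to the binary projector coordinate (one axis).
    binary, shift = code.copy(), code >> 1
    while shift.any():
        binary ^= shift
        shift >>= 1
    return binary, valid
```

Decoding the column patterns and the row patterns separately yields, for each valid camera pixel $C_i$, a projector pixel $P_i$.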
Figure 2 shows the (Ci , Pi ) pair for a point on the display surface.
We interpolate this sparse set of correspondences to obtain a dense
(per-pixel) mapping from camera pixels (essentially direction vectors in eye coordinates) to projector pixels. This dense mapping is
stored in a texture and used at runtime to project vertices, replacing
the usual perspective projection.
5 GEOMETRY TRANSFORMATION
This step maps input geometry from an application from eye coordinates to clip coordinates. For efficiency, we build a camera-to-projector coordinate map, M, and store it in a 2D texture.
5.1 Initialization
The camera-projector correspondences only map a sparse subset of
camera image pixel coordinates to projector image pixel coordinates. The remaining camera pixels are mapped by interpolating
from the nearest known correspondences.
Our technique stores M in a floating point texture, where each
texel represents a camera pixel and the texel’s red and green
channels’ values are normalized projector pixel coordinates. The
camera-projector correspondences form a distorted grid in camera
image space that can be treated as a set of triangles. The position
of each triangle vertex is the camera coordinate position Ci of a
correspondence. The color of the vertex is the projector coordinate
position, Pi , of that correspondence. We create the texture by rendering these triangles directly to the texture memory and configure
the graphics pipeline to automatically interpolate the vertex colors
across the interior pixels of the triangles.
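The same fill-in can be expressed off the GPU for clarity. The sketch below uses SciPy's Delaunay-based linear interpolation in place of the rasterizer's color interpolation; the paper's implementation renders the triangles directly on the GPU, so this is an equivalent CPU formulation, not the actual code. The `filled` mask plays the role of the blue-channel flag described below.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def build_M_map(C, P_norm, cam_w, cam_h):
    # C: (N,2) camera pixel positions of the correspondences.
    # P_norm: (N,2) matching projector coordinates, normalized to [0,1].
    interp = LinearNDInterpolator(C, P_norm)   # Delaunay triangulation
    gx, gy = np.meshgrid(np.arange(cam_w), np.arange(cam_h))
    M = interp(gx, gy)                         # (cam_h, cam_w, 2); NaN outside the hull
    filled = ~np.isnan(M[..., 0])              # analogue of the blue-channel flag
    return np.nan_to_num(M), filled
```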
The resulting M map will not completely fill the camera image
plane, but it will cover the maximum area given by the camera-projector correspondences. Figure 3 shows the display surface, a set
of the structured light patterns captured by the camera, and the resulting M map texture from one of our setups. The irregular border
is due to the low resolution of our current structured light system.
Figure 3: From left to right: (a) Display surface as captured by camera. (b) Structured light patterns. (c) M map as texture. (d) Overlays of the structured light patterns and the texture map.
When reading a texel’s color from the texture, extra information is needed to determine whether that texel was filled or is simply the background color. To make this distinction, we set the vertex color’s blue channel to a known constant value.
The texture-based representation of the mapping only needs to be
generated once for a given projector arrangement. Once the texture
has been generated, it provides a fast method to map from camera
image coordinates to projector image coordinates.
5.2 Vertex Program

A vertex program allows custom per-vertex processing, and our technique takes advantage of this to implement the geometry transformation. It is the vertex program’s responsibility to transform the incoming vertex from the application into homogeneous clip coordinates before passing it down the graphics pipeline for further processing. So, our vertex program first transforms the incoming vertex, V, to eye coordinates using the modelview transformation. Next, the vertex program replaces the perspective transformation that usually follows with our custom processing. The calibrating camera’s parameters are applied to the eye coordinate position of the vertex to compute its camera coordinate position, C.

However, camera image coordinates must be mapped to projector coordinates in order for the projector’s output to appear perspectively correct to the observer. Thus, we apply an additional transformation by using the vertex’s camera image coordinates to index into the M map texture. Figure 4 shows the vertex processing
pipeline with our customized processing in red.
Figure 4: Vertex processing pipeline, with our customized processing in red replacing the usual perspective transformation.
Texture coordinates range from 0.0 to 1.0. Depending on the
projection model, it may be necessary to shift and normalize the
camera coordinates into that range. The texel color read represents
the normalized projector pixel coordinates $P = M(C)$.
Finally, to determine the clip coordinates of the projected vertex, the z-coordinate is set to the distance to the vertex in eye coordinates, and the w-coordinate is set to 1.0. The vertex coordinate output from the vertex program is $V = [P_x, P_y, \mathrm{distance}(\mathrm{modelview}(V)), 1.0]$. The pipeline’s subsequent division by w will have no effect. Points lying along the same viewing direction will have the same x and y coordinates, but different z values, allowing for correct z-buffering.
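The per-vertex work can be summarized as follows. This is a CPU sketch in NumPy rather than actual shader code; `normalize_to_texture` and `sample_bilinear` are hypothetical stand-ins for the shift-and-scale and the texture fetch, and we assume the eye frame coincides with the calibration camera’s frame (R = I, T = 0).

```python
import numpy as np

def transform_vertex(V, modelview, M_map, cam_params):
    # cam_params: (a, b, c, d, beta, u0, v0) from the fisheye calibration.
    # 1. Object coordinates -> eye coordinates via the modelview matrix.
    V_eye = (modelview @ np.append(V, 1.0))[:3]
    # 2. Eye coordinates -> camera image coordinates via the fisheye model.
    u = fisheye_project(V_eye, np.eye(3), np.zeros(3), *cam_params)
    # 3. Camera pixel -> [0,1] texture coordinates, then M-map lookup.
    s, t = normalize_to_texture(u[0], u[1])   # hypothetical helper
    Px, Py = sample_bilinear(M_map, s, t)     # hypothetical texture fetch
    # 4. Clip coordinates: projector position, eye-space distance, w = 1.
    return np.array([Px, Py, np.linalg.norm(V_eye), 1.0])
```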
One limitation of our approach is that the fixed-pipeline rasterization of line, triangle, and polygon primitives assumes they were transformed by a line-preserving transformation. Under certain situations this condition does not hold, and the geometry will be rendered incorrectly if line or triangle primitives are used. This is, however, only a practical concern when extreme close-up viewpoints are chosen, or when models are coarsely tessellated. There is also an issue with clipping primitives that are partially outside the extent of
the M map. Point-based rendering can be used to overcome both of
these problems.
We currently render vertices as hardware-accelerated point primitives and therefore we require densely sampled models to avoid
gaps in the rendered image. However, our method can be extended
to support alternative point-based rendering techniques [12], such
as hierarchical splatting. Our system optionally uses triangle primitives for rendering: used with moderately tessellated models, this
yields images of good quality for typical views.
6 IMPLEMENTATION
This section describes the components in our immersive projection
implementation.
Physically, the system uses a Dell 1850 XGA projector reflected
off an inexpensive, approximately hemispherical security mirror
to spread the light in a wide field of view on the display surface,
thus creating an immersive projection. Figure 5 shows one of
our geodesic dome configurations. A dual-core 2GHz Xeon PC
equipped with NVIDIA’s GeForce 8800 GTS video card runs the
application program and generates output images that are sent to
the projector. The calibration step uses a low cost, low resolution,
Unibrain Fire-i firewire camera with a relatively inexpensive fisheye lens from OmniTech Robotics.
Figure 5: (a) A configuration for a geodesic dome screen. (b) Projection using our system.
The camera was calibrated using a 3D calibration object (Figure
6(a)). The calibration object is made from three pieces of plywood,
arranged orthogonally. A checkerboard of 1.5-inch squares is affixed to the surface. This forms a natural world coordinate system
with the innermost corner of the object as the origin.
Figure 6: From left to right: (a) Calibration object. (b) Calibration object captured by fisheye lens. (c) Feature locations.
The camera is placed within the calibration object, so as to fill as much of the camera’s field of view as possible, and an image is taken (Figure 6(b)). Our feature locating software uses the OpenCV
library’s [1] corner detector to locate the strongest corner features
with sub-pixel accuracy. The corner detector’s strength threshold
is controlled by the user via a slider. The world position of each
corner is determined by manual inspection of the image. We used
the 39 features marked in Figure 6(c).
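A sketch of this detection using OpenCV’s Python bindings (the 2008 implementation would have used the era’s C API; the file name and parameter values here are illustrative):

```python
import cv2
import numpy as np

img = cv2.imread("calibration_object.png", cv2.IMREAD_GRAYSCALE)  # illustrative path
# Strongest corners; qualityLevel is the user-adjustable strength threshold.
corners = cv2.goodFeaturesToTrack(img, maxCorners=50,
                                  qualityLevel=0.05, minDistance=10)
# Refine each corner to sub-pixel accuracy.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
corners = cv2.cornerSubPix(img, np.float32(corners), (5, 5), (-1, -1), criteria)
```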
The camera parameters R, T , a, b, c, d, u0 , v0 , and β were determined using the numerical minimization function NMinimize in
Mathematica 6 [20].
7 FUTURE WORK
As previously mentioned, point-based rendering overcomes issues that arise from the rasterization implemented in the fixed graphics pipeline. Another possible solution to the rasterization issues is to tessellate the input geometry. The latest graphics cards feature geometry shader support that could be used to perform this adaptively on the GPU. We are pursuing alternatives in point-based rendering and geometry shader tessellation.
The use of bilinearly-interpolated texel colors for the screen coordinates of projected vertices introduces artifacts (small ripples) in
the image that we are still investigating as of this writing.
Stereo could be supported, albeit at greater equipment cost, using
separate calibrations for each eye position. For passive stereo, the
two projectors could be casually positioned. Active stereo requires
expensive shutter glasses.
A possible modification to the texture-based transformation we
implement would be to use a cubemap texture instead of a 2D texture. This would allow direct indexing of the texture with the eye
coordinates of the vertex, bypassing the need to perform the fisheye
projection to the camera image plane.
We could also eliminate the need for the parameterized fisheye
projection calculation in the vertex program if we create the 2D
texture using a simpler spherical projection, and use the parameterized fisheye projection’s inverse to transform the camera coordinates used to create the texture. That is, if the simple projection is
$S : \mathbb{R}^3 \rightarrow \mathbb{R}^2$, we would use $S(F^{-1}(C_i))$ as the triangle vertex position for correspondence $i$ when creating the texture, and $S(\tilde{X})$ as the index into the texture at runtime.
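As an illustration, one simple choice of $S$ maps a direction to its azimuth and polar angle. This is our example only; the paper does not fix a particular $S$, and `F_inv`, a numerical inverse of the fisheye model, is assumed to be provided.

```python
import numpy as np

def S(X):
    # Simple spherical projection R^3 -> R^2: azimuth and polar angle.
    x, y, z = X / np.linalg.norm(X)
    return np.array([np.arctan2(y, x), np.arccos(z)])

# Baking: place correspondence i at S(F_inv(C_i)) instead of C_i.
# Runtime: index the texture with S(X_eye), skipping the fisheye math.
```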
8 CONCLUSION
We have presented a technique for creating wide field-of-view images from input geometry using a GPU-based, single-pass rendering algorithm. Our system has several advantages. Existing applications can be easily modified to use our vertex program during
rendering. There are no special requirements for the geometry of
the display surface. Any available diffuse surface in a room, such
as the floor, wall or ceiling, will suffice. Thus any room can be
outfitted with this immersive projection system. The components
only need to be casually aligned, unlike other existing systems that
require precise alignment of projectors and display surfaces. Thus,
our system is easier to build and maintain. Our system uses affordable components. Most institutions already have projectors and
suitable computers. All that is additionally needed to build our system is a low-cost camera with a fisheye lens and a spherical security mirror, at a total cost of approximately $350. Thus, our system does not require a sizable investment to build, and is suitable for low-budget situations such as schools, classrooms, and the home.
Acknowledgments: Thanks to Ifeyinwa Okoye and Robert Sajan for their work on the correspondence finder, to Michael Leung
for equipment support, and to the anonymous reviewers for their
helpful comments and suggestions.
REFERENCES

[1] http://sourceforge.net/projects/opencvlibrary/.
[2] http://www.mersive.com.
[3] http://www.scalabledisplay.com.
[4] http://www.stellarium.com.
[5] M. Ashdown, M. Flagg, R. Sukthankar, and J. M. Rehg. A flexible projector-camera system for multi-planar displays. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004.
[6] H. Bakstein and T. Pajdla. Calibration of a fish eye lens with field of view larger than 180. In Proceedings of the CVWW 2002, pages 276–285, February 2002.
[7] P. Bourke. Spherical mirror: a new approach to hemispherical dome projection. In GRAPHITE ’05: Proceedings of the 3rd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pages 281–284, New York, NY, USA, 2005. ACM Press.
[8] M. S. Brown and W. B. Seales. Low-cost and easily constructed large format display system. Technical Report HKUST TR-CS-01-02, Hong Kong University of Science and Technology, 2001.
[9] C. Cruz-Neira, D. J. Sandin, and T. A. DeFanti. Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In SIGGRAPH ’93: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pages 135–142, New York, NY, USA, 1993. ACM Press.
[10] C. Cruz-Neira, D. J. Sandin, T. A. DeFanti, R. V. Kenyon, and J. C. Hart. The CAVE: audio visual experience automatic virtual environment. Communications of the ACM, 35(6):65–72, 1992.
[11] N. Greene. Environment mapping and other applications of world projections. IEEE Computer Graphics and Applications, 6(11), November 1986.
[12] M. Gross and H. Pfister. Point-Based Graphics. Elsevier, 2007.
[13] M. Harville, B. Culbertson, I. Sobel, D. Gelb, A. Fitzhugh, and D. Tanguay. Practical methods for geometric and photometric correction of tiled projector displays on curved screens. In CVPRW ’06: Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop, page 5, Washington, DC, USA, 2006. IEEE Computer Society.
[14] http://www.astronomyteacher.com/.
[15] C. Jaynes, W. B. Seales, K. Calvert, Z. Fei, and J. Griffioen. The Metaverse: a networked collection of inexpensive, self-configuring, immersive environments. In EGVE ’03: Proceedings of the Workshop on Virtual Environments 2003, pages 115–124, New York, NY, USA, 2003. ACM Press.
[16] A. Raij, G. Gill, A. Majumder, H. Towles, and H. Fuchs. PixelFlex2: A comprehensive, automatic, casually-aligned multi-projector display. In IEEE International Workshop on Projector-Camera Systems, October 2003.
[17] R. Raskar. Immersive planar display using roughly aligned projectors. In IEEE Virtual Reality, March 2000.
[18] R. Raskar, J. van Baar, and T. Willwacher. Quadric transfer for immersive curved screen displays. In EUROGRAPHICS 2004, 2004.
[19] J.-P. Tardif, S. Roy, and M. Trudeau. Multi-projectors for arbitrary surfaces without explicit calibration nor reconstruction. In Fourth International Conference on 3-D Digital Imaging and Modeling, October 2003.
[20] Wolfram Research, Inc. Mathematica, Version 6. Wolfram Research, Inc., 2007.