CN105894567B - Scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene - Google Patents


Info

Publication number
CN105894567B
CN105894567B (application CN201610191451.7A)
Authority
CN
China
Prior art keywords: dimensional, threshold value, scene, user, pixel depth
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN201610191451.7A
Other languages
Chinese (zh)
Other versions
CN105894567A (en)
Inventor
B.M.吉诺瓦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment America LLC
Original Assignee
Sony Computer Entertainment America LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/986,854 external-priority patent/US8619094B2/en
Priority claimed from US12/986,872 external-priority patent/US9183670B2/en
Priority claimed from US12/986,827 external-priority patent/US8514225B2/en
Priority claimed from US12/986,814 external-priority patent/US9041774B2/en
Application filed by Sony Computer Entertainment America LLC filed Critical Sony Computer Entertainment America LLC
Publication of CN105894567A publication Critical patent/CN105894567A/en
Application granted granted Critical
Publication of CN105894567B publication Critical patent/CN105894567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/10 Geometric effects
    • G06T 15/20 Perspective computation
    • G06T 15/205 Image-based rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/128 Adjusting depth or disparity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/172 Processing image signals comprising non-image signal components, e.g. headers or format information
    • H04N 13/178 Metadata, e.g. disparity information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/275 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/30 Image reproducers
    • H04N 2013/40 Privacy aspects, i.e. devices showing different images to different viewers, the images not being viewpoints of the same scene
    • H04N 2013/405 Privacy aspects, i.e. devices showing different images to different viewers, the images not being viewpoints of the same scene, the images being stereoscopic or three dimensional

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Library & Information Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)
  • Control Of Indicators Other Than Cathode Ray Tubes (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Pixel depth values of user-controlled virtual objects in a three-dimensional scene may be rescaled to avoid artifacts when the scene is displayed. A minimum threshold value and a maximum threshold value of the three-dimensional scene may be determined. Each pixel depth value of the user-controlled virtual object may be compared to the minimum threshold value and the maximum threshold value. A depth value for each pixel of the user-controlled virtual object that falls below the minimum threshold value may be set to a corresponding low value. Each pixel depth value of the user-controlled virtual object that exceeds the maximum threshold value may be set to a corresponding high value.

Description

Scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene
This application is a divisional of the application with application number 201180064484.0, filed on December 2, 2011, entitled "Scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene".
Cross Reference to Related Applications
This application is related to commonly assigned, co-pending application No. 12/986,814, entitled "DYNAMIC ADJUSTMENT OF PREDETERMINED THREE-DIMENSIONAL VIDEO SETTINGS BASED ON SCENE CONTENT", filed on January 7, 2011 (attorney docket number SCEA10052US00).
The present application is related to commonly assigned, co-pending application No. 12/986,854, entitled "MORPHOLOGICAL ANTI-ALIASING (MLAA) OF A RE-PROJECTION OF A TWO-DIMENSIONAL IMAGE", filed on January 7, 2011 (attorney docket number SCEA10054US00).
This application is related to commonly assigned, co-pending application No. 12/986,872, entitled "MULTI-SAMPLE RESOLVING OF RE-PROJECTION OF TWO-DIMENSIONAL IMAGE", filed on January 7, 2011 (attorney docket number SCEA10055US00).
Technical Field
Embodiments of the present invention relate to scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene.
Background
The ability to perceive two-dimensional images in three dimensions through many different techniques has become quite popular over the past few years. Providing a depth aspect to a two-dimensional image may create a greater sense of realism for any depicted scene. This introduction of three-dimensional visual representations greatly enhances the audience experience, particularly in the context of video games.
There are many techniques for three-dimensional rendering of a given image. Recently, a technique for projecting one or more two-dimensional images into a three-dimensional space has been proposed, which is referred to as depth image-based rendering (DIBR). This new idea is based on a more flexible joint transmission of monoscopic video (i.e. a single video stream) and associated pixel-by-pixel depth information, compared to previous proposals that often rely on the basic concept of "stereoscopic" video (i.e. the acquisition, transmission and display of two separate video streams, one for the left eye and one for the right eye). From this data representation, one or more "virtual" views of the 3-D scene can then be generated in real time at the receiving side by means of the so-called DIBR technique. This new approach to three-dimensional image rendering brings several advantages over the previous approaches.
First, this approach allows for the adjustment of 3-D projection or display to fit a wide range of different stereoscopic displays and projection systems. Since the required left and right eye views are only generated at the 3D-TV receiver, the rendering of the views in terms of 'perceived depth' can be adapted for specific viewing conditions. This provides the viewer with a customized 3-D experience, which is an experience that can comfortably view any kind of stereoscopic or autostereoscopic 3D-TV display.
DIBR also allows 2D to 3D conversion based on the "structure from motion" approach, which can be used to generate the required depth information for single-image video material that has been recorded. Therefore, for a wide range of programming, 3D video can be generated from 2D video, which may play an important role in the success of 3D-TV.
Head motion parallax (i.e., the apparent displacement or difference in perceived position of an object caused by a change in viewing angle) may be supported under DIBR to provide another additional stereoscopic depth cue. This eliminates the well-known "shear distortion" (i.e., the stereoscopic image appears to follow the viewer as the viewer changes viewing position) often experienced with stereoscopic or autostereoscopic 3D-TV systems.
Furthermore, the photometric asymmetry (e.g., in terms of brightness, contrast, or color) between the left and right eye views that would disrupt stereoscopic perception is eliminated from the outset, because the two views are effectively synthesized from the same original image. Furthermore, the approach enables automatic object segmentation based on depth keying and allows easy integration of synthetic 3D objects into "real world" sequences.
Finally, this approach allows the viewer to adjust the reproduction of depth to suit his/her personal preferences-much like each conventional 2D-TV allows the viewer to adjust the color reproduction by means of (de) saturation control. This is a very important feature because of the differences in depth discrimination between age groups. For example, a recent study by Norman et al demonstrated that: older people are less sensitive to perceiving stereo depth than younger people.
While each viewer may have a unique set of preferred depth settings, each scene presented to the viewer may also have a unique set of preferred depth settings. The content of each scene dictates which range of depth settings should be used for optimal viewing of the scene. One set of re-projection parameters may not be ideal for every scene. For example, different parameters may work better depending on how much of the distant background is in the field of view. However, existing 3D systems do not take the content of the scene into account when determining the re-projection parameters, even though the content changes each time the scene changes.
Embodiments of the present invention arise in this context.
Drawings
Fig. 1A is a flow/schematic diagram illustrating a method for dynamic adjustment of user-determined three-dimensional scene settings, according to an embodiment of the present invention.
Fig. 1B is a schematic diagram showing a basic concept of three-dimensional re-projection.
Fig. 1C is a simplified diagram illustrating an example of virtual camera adjustment of 3D video settings according to an embodiment of the present invention.
Fig. 1D is a simplified diagram illustrating an example of mechanical camera adjustment of 3D video settings according to an embodiment of the present invention.
Fig. 2A to 2B are schematic diagrams illustrating a problem in which a virtual object controlled by a user penetrates elements of a virtual world in a three-dimensional scene.
FIG. 2C is a schematic diagram illustrating pixel depth value scaling to address the problem of user-controlled virtual objects penetrating elements of a virtual world in a three-dimensional scene.
FIG. 3 is a schematic diagram illustrating a method for scaling pixel depth values of a user-controlled virtual object in a three-dimensional scene according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating an apparatus for implementing dynamic adjustment of user-determined three-dimensional scene settings and/or scaling of pixel depth values of user-controlled virtual objects in a three-dimensional scene according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating an example of a cell processor implementation of an apparatus for implementing dynamic adjustment of user-determined three-dimensional scene settings and/or scaling of pixel depth values of user-controlled virtual objects in a three-dimensional scene according to an embodiment of the present invention.
Fig. 6A illustrates an example of a non-transitory computer-readable storage medium having instructions for implementing dynamic adjustment of user-determined three-dimensional scene settings, according to an embodiment of the present invention.
FIG. 6B illustrates an example of a non-transitory computer-readable storage medium having instructions for implementing scaling pixel depth values for user-controlled virtual objects in a three-dimensional scene in accordance with an embodiment of the present invention.
Fig. 7 is an isometric view of three-dimensional viewing glasses in accordance with an aspect of the invention.
FIG. 8 is a system level block diagram of three-dimensional viewing glasses according to an aspect of the present invention.
Detailed Description
For any viewer of the projected three-dimensional image, several characteristics/cues dominate their perception of depth. The ability of each viewer to perceive depth in a three-dimensional projection is unique to their own pair of eyes. Certain cues may provide certain depth characteristics associated with a given scene to a viewer. By way of example, and not by way of limitation, these binocular cues may include stereo vision (stereovision), convergence, and shadow stereo vision.
Stereopsis refers to the ability of the viewer to judge depth by processing information derived from the different projections of objects onto each retina. By using two images of the same scene obtained from slightly different angles, it is possible to triangulate the distance to an object with a high degree of accuracy. If the subject is far away, the aberration (disparity) of that image on both retinas will be small. If the object is close or near, the aberration will be large. By adjusting the angular difference between different projections of the same scene, the viewer may be able to optimize his perception of depth.
Convergence is another binocular cue for depth perception. When two eyeballs are fixated on the same object, they converge. This convergence will stretch the extraocular muscles. It is the kinesthetic sensation of these extraocular muscles that aids in the perception of depth. The angle of convergence is smaller when the eye is gazing on a distant object, and larger when gazing on a closer object. By adjusting the convergence of the eyes for a given scene, the viewer may be able to optimize his perception of depth.
Shadow stereo vision refers to the stereoscopic fusion of shadows to give depth to a given scene. Increasing or decreasing the intensity of the shadows of the scene may further optimize the viewer's perception of depth.
By adjusting the scene settings associated with these binocular cues, the viewer can optimize his overall three-dimensional perception of depth. While a given user may be able to select a common set of three-dimensional scene settings for viewing all scenes, each scene is unique and, therefore, depending on the content of that particular scene, certain visual cues/user settings may need to be dynamically adjusted. For example, in the context of a virtual world, it may be important for a viewer to look at a particular object in a given scene. However, the viewer's predetermined three-dimensional scene setting may not be most advantageous for viewing that particular object. Here, the viewer's settings will be dynamically adjusted according to the scene so that the particular object is perceived under a better set of three-dimensional scene settings.
FIG. 1A is a flow diagram illustrating a method for dynamic adjustment of user-determined three-dimensional scene settings, according to an embodiment of the invention. Initially, a viewer 115 communicates with a processor 113 configured to stream three-dimensional video data to a visual display 111. The processor 113 may be in the form of: a video game console, a computer device, or any other device capable of processing three-dimensional video data. By way of example and not by way of limitation, visual display 111 may be in the form of a 3-D ready television that displays text, numbers, graphical symbols, or other visual objects as stereoscopic images to be perceived by a pair of 3-D viewing glasses 119. Embodiments of 3-D viewing glasses are depicted in fig. 7-8 and described in detail below. The 3-D viewing glasses 119 may be in the form of: active liquid crystal shutter glasses, active "red eye" shutter glasses, passive linearly polarized glasses, passive circularly polarized glasses, interference filter glasses, anaglyphic projectors, or any other pair of 3-D viewing glasses configured to view images projected in three dimensions by visual display 111. The viewer 115 may communicate with the processor 113 by means of a user interface 117, which may take the form of: a joystick, a controller, a remote control, a keyboard, or any other device that may be used in conjunction with a Graphical User Interface (GUI).
The viewer 115 may initially select a set of general three-dimensional video settings to be applied to each three-dimensional scene presented to the viewer 115. By way of example and not by way of limitation, the viewer may select an outer boundary of depth within which the three-dimensional scene is projected. As an additional embodiment, the user may set a predetermined value for stereo, convergence, or shadow stereo. Further, if the user does not set the predetermined values for these parameters, the predetermined values may be factory set default values.
Examples of other 3D video parameter settings that may be set by the user and dynamically adjusted based on scene content include, but are not limited to, the 3D depth effect and the 3D range. The depth setting controls how much 3D effect is presented to the user. The outer boundaries of depth basically represent range and disparity (our depth and effect sliders). In implementations involving re-projection, the re-projection curve may be adjusted as described below. The adjustment may concern the overall shape of the re-projection curve, which may be a straight line or perhaps an S-shape that emphasizes the center. In addition, parameters of the shape may be adjusted. For example, and not by way of limitation, for a linear re-projection curve the endpoints or slope may be adjusted; for an S-shaped re-projection curve, how fast the S-ramp rises may be adjusted, and so on.
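By way of illustration only (this sketch is not from the original text, and all names in it are hypothetical), a depth-to-shift re-projection curve with an adjustable shape might be parameterized as follows:

// Illustrative sketch only: a hypothetical depth-to-shift re-projection curve.
// A linear curve is defined by its endpoint shifts; a positive steepness turns it
// into an S-shape that emphasizes the center, with steepness controlling how fast
// the central ramp rises.
#include <algorithm>
#include <cmath>

struct ReprojectionCurve {
    float nearShift;   // horizontal shift applied at normalized depth 0 (near plane)
    float farShift;    // horizontal shift applied at normalized depth 1 (far plane)
    float steepness;   // 0 = straight line; larger values = steeper central S-ramp

    float shiftFor(float normalizedDepth) const {
        float d = std::clamp(normalizedDepth, 0.0f, 1.0f);
        if (steepness > 0.0f) {
            // Logistic S-curve, renormalized so that 0 maps to 0 and 1 maps to 1.
            float s  = 1.0f / (1.0f + std::exp(-steepness * (d - 0.5f)));
            float s0 = 1.0f / (1.0f + std::exp( steepness * 0.5f));
            float s1 = 1.0f / (1.0f + std::exp(-steepness * 0.5f));
            d = (s - s0) / (s1 - s0);
        }
        return nearShift + (farShift - nearShift) * d;  // blend between the endpoint shifts
    }
};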
In other embodiments involving re-projection, some edge blurring may be provided to fix artifacts such as holes, and the viewer 115 may drive that fix. In addition, embodiments of the present invention, whether using re-projection or other means, may be applied to drive color contrast to help reduce ghosting, allowing scene-by-scene adjustments based on a user-controlled scale. Furthermore, without involving re-projection, the user can adjust how far the scene is zoomed relative to the input camera, or make slight fine adjustments to the camera angle. Other camera settings that may be adjusted on a scene-by-scene basis include depth-of-field settings or the camera aperture.
Because one or more viewers 115 may perceive the three-dimensional visual presentation differently, different viewers may have different general combinations of three-dimensional scene settings according to their preferences. For example, studies have demonstrated that older people are less sensitive to perceiving stereo depth than younger people and, therefore, older people may benefit from scene settings that increase the perception of depth. Similarly, younger people may prefer settings that reduce the perception of depth, which may reduce eye strain and fatigue while still providing a pleasing three-dimensional experience.
While the viewer 115 is observing a steady stream of three-dimensional scenes 103, one or more scenes that have not yet been displayed to the viewer may be stored in the output buffer 101. The scenes 103 may be arranged according to their presentation order. Scene 103 refers to one or more three-dimensional video frames characterized by a set of shared characteristics. For example, a set of video frames representing different views of the same scene may be characterized as one scene. However, a close-up view of the object and a far-view of the object may represent different scenes. It is important to note that: any number of combinations of frames may be characterized as a scene.
The scene 103 passes through two stages before being presented to the viewer. The scenes are first processed to determine one or more characteristics associated with a given scene, as indicated at 105. One or more scaling factors 107 to be applied to the user's predetermined settings are then determined based on those characteristics. The scale factors may then be transmitted as metadata 109 to the processor 113 and applied to dynamically adjust the viewer's settings, as indicated at 110. The scene may then be presented on the display 111 using the adjusted settings, as indicated at 112. This allows each scene to be presented to the viewer in a way that preserves the viewer's basic preferences while still maintaining the visual integrity of the scene by taking the particulars of the scene content into account. In implementations that do not involve re-projection, the metadata may instead be transmitted to the capture device to make adjustments, e.g., to adjust the virtual camera position in a game or to adjust a physical camera, for example, as used in the 3D chat embodiments.
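As a sketch of the two-stage flow just described (hypothetical structure and field names, not the patented implementation), the per-scene metadata might be applied to the viewer's predetermined settings roughly as follows:

// Illustrative sketch of steps 105-112 of FIG. 1A; all types and names are hypothetical.
struct SceneScaleFactors {       // metadata 109 derived from the scene characteristics (105, 107)
    float depthScale;            // scales the viewer's chosen depth effect for this scene
    float rangeScale;            // scales the viewer's chosen depth range for this scene
};

struct ViewerSettings {          // the viewer's predetermined 3D preferences
    float depthEffect;
    float depthRange;
};

// Step 110: dynamically adjust the viewer's settings using the per-scene metadata,
// preserving the viewer's baseline preferences while respecting the scene content.
ViewerSettings applySceneMetadata(const ViewerSettings& viewer, const SceneScaleFactors& metadata) {
    ViewerSettings adjusted = viewer;
    adjusted.depthEffect *= metadata.depthScale;
    adjusted.depthRange  *= metadata.rangeScale;
    return adjusted;
}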
Before describing embodiments of the present method, it is useful to discuss some background on three-dimensional video systems. Embodiments of the present invention may be applied to a re-projection setup for 3D video generated from 2D video by a re-projection process. In re-projection, a left-eye virtual view and a right-eye virtual view of a scene may be synthesized from a normal two-dimensional image and associated pixel-by-pixel depth information for each pixel in the image. This process may be implemented by the processor 113 as follows.
First, the original image points are re-projected into the 3D world using the depth data for each pixel in the original image. These 3D space points are then projected into the image plane of a "virtual" camera positioned at the desired viewing position. The concatenation of the re-projection (2D to 3D) and the subsequent projection (3D to 2D) is sometimes referred to as 3D image warping or re-projection. As shown in FIG. 1B, re-projection can be understood by comparison with the operation of a "real" stereo camera. In "real", high-quality stereo cameras, one of two different approaches is often utilized to establish the so-called zero disparity setting (ZPS), i.e., to select a convergence distance Z_c in the 3D scene. In the "toed-in" approach, the ZPS is selected by a joint inward rotation of the left-eye and right-eye cameras. In the shift-sensor approach, the convergence distance Z_c can be established by a small shift h of the image sensors of a left-eye "virtual" camera and a right-eye "virtual" camera that are placed in parallel and spaced apart by a distance t_c, as shown in FIG. 1B. Each virtual camera may be characterized by a given focal length f, which represents the distance between the virtual camera lens and the image sensor. This distance corresponds to the near-plane distance Z_n of the near plane P_n used in some implementations described herein.
Technically, the "toed-in" approach is easier to implement in a "real" stereo camera. However, the shift-sensor approach is sometimes preferable for re-projection because it does not introduce unwanted vertical disparity between the left-eye and right-eye views, which can be a potential source of eye strain.
Given the depth information Z for each pixel at horizontal and vertical coordinates (u, v) in the original 2D image, the corresponding pixel coordinates (u', v') and (u'', v'') of the left-eye and right-eye views can be generated using the shift-sensor approach according to the following equations:
For the left-eye view and for the right-eye view, the corresponding warping equations are reproduced only as images in the original publication and are not transcribed here.
In the foregoing equations, α_u is the convergence angle in the horizontal direction, as seen in FIG. 1B. The t_hmp term is an optional translation term (sometimes referred to as a head-motion parallax term) that accounts for the actual viewing position of the viewer.
The sensor shift h for the left-eye and right-eye views may be related to the camera spacing t_c, the convergence distance Z_c, and the horizontal convergence angle α_u by corresponding equations for the left-eye view and for the right-eye view, which likewise appear only as images in the original publication.
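For orientation, a commonly used shift-sensor warping has the following standard form (after Fehn's depth-image-based rendering formulation). This is only an illustrative stand-in for the untranscribed equations above; the exact sign conventions and the way t_hmp enters may differ in the original:

u'  = u + (α_u · t_c / 2) · (1/Z - 1/Z_c) + t_hmp,   v'  = v    (left-eye view)
u'' = u - (α_u · t_c / 2) · (1/Z - 1/Z_c) + t_hmp,   v'' = v    (right-eye view)
h   = ± α_u · t_c / (2 · Z_c)    (sensor shift that places zero parallax at Z = Z_c)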
the processor 113 may receive the scene 103 in terms of the original 2D image and per-pixel depth information together with per-scene default zoom settings, such as α, that may be applied to the 3D video parametersu、tc、ZcF and thmpOr a combination thereof (e.g., a ratio). For example, the zoom setting may represent a variation between 0 (for no 3D perception) and some value greater than 1 (for enhanced 3D perception)A multiplier. Changing the 3D video parameter settings of the virtual camera affects the qualitative perception of the 3D video. By way of example and not by way of limitation, some qualitative effects of increasing (+) or decreasing (-) the selected 3D video parameters are described in table I below.
TABLE I
(Table I is reproduced only as an image in the original publication. It summarizes, for each selected 3D video parameter, whether increasing or decreasing that parameter increases or decreases the screen disparity, the perceived depth, and the perceived object size.)
In Table I, the term "screen disparity" refers to the horizontal difference between the left-eye view and the right-eye view; the term "perceived depth" refers to the apparent depth of a displayed scene as perceived by the viewer; the term "object size" refers to the apparent size of an object displayed on the screen 111 as perceived by the viewer.
In some implementations, the mathematical equations used above may be described in terms of a near plane P_n and a far plane P_f rather than the convergence angle α_u and the sensor spacing t_c. The term "near plane" refers to the closest point in the scene that is captured by the camera, i.e., the image sensor. The term "far plane" refers to the farthest point in the scene captured by the camera. Nothing beyond the far plane P_f, i.e., beyond the far-plane distance Z_f, is rendered. A system using the mathematical equations described above may indirectly select the near and far planes by selecting, within those equations, the values of certain variables such as the convergence angle α_u and the sensor spacing t_c.
The operational requirements of the three-dimensional re-projection system can be described as follows: 1) selection of a near plane for a given scene; 2) selection of a far plane for a given scene; 3) definition of a transformation from the near plane to the far plane for the re-projection of the given scene. The transformation, sometimes referred to as a re-projection curve, essentially relates the amount of horizontal and vertical pixel displacement to pixel depth; 4) a method for filtering and/or weighting insignificant/significant pixels; 5) a system for smoothing any changes to 1 to 3 that may occur during a scene change in order to prevent an abrupt, jarring change in depth as perceived by the viewer 115. Three-dimensional video systems also typically include 6) some mechanism that allows the viewer to scale the three-dimensional effect in and out.
A typical re-projection system specifies the above six requirements as follows: 1) the near plane of the camera for the scene; 2) the far plane of the camera for the scene; 3) a transform in which pixels are only horizontally displaced. The fixed displacement (usually called convergence) is adjusted lower by an amount inversely proportional to the depth value of each pixel: the deeper or farther the pixel is, the less the pixel is displaced by the convergence. This requirement can be described, for example, by the mathematical equations provided above; 4) since 1 to 3 are constant, no weighting is necessary; 5) since 1 to 3 are constant, smoothing is not necessary; and 6) a slider can be used to adjust the transformation, for example by linearly scaling the amount by which the pixels will be displaced. This is equivalent to adding a constant scale factor to the second (and possibly third) term of the above equations for u' or u''. Such constant scale factors may be implemented via user-adjustable sliders that tend to move the near and far planes (and thus the average effect) towards the screen plane.
This may result in poor use of three-dimensional space. A given scene may be unbalanced and cause unnecessary eye fatigue. A 3D video editor or 3D game developer has to carefully build all scenes and movies so that all objects within the scene are correctly arranged.
For a given three-dimensional video, there is a viewing comfort zone 121 located in an area close to the visual display. The farther away the perceived image is from the screen, the more uncomfortable it is to view (for most people). Thus, the three-dimensional scene settings associated with a given scene are intended to maximize the use of the comfort zone 121. While some things may be outside the comfort zone 121, it is generally desirable that most things the viewer gazes are within the comfort zone 121. For example, and not by way of limitation, the viewer may set the boundaries of the comfort zone 121 while the processor 113 may dynamically adjust the scene settings such that the use of the comfort zone 121 is maximized for each scene.
A straightforward approach to maximizing the use of the comfort zone 121 may involve: the near plane is set equal to the minimum pixel depth associated with a given scene and the far plane is set equal to the maximum pixel depth associated with the given scene, while preserving properties 3 to 6 as defined above for typical re-projection systems. This will maximize the use of the comfort zone 121, but it does not take into account the effect of objects flying in or out of the scene, which may cause large displacements in three-dimensional space.
By way of example, and not by way of limitation, certain embodiments of the method of the present invention may additionally take into account the average depth of the scene. The average depth of the scene may be driven towards one target. Three-dimensional scene data may set targets for a given scene while allowing users to zoom in on how far from the targets they perceive the scene (e.g., the boundaries of a comfort zone).
The pseudo code for calculating such an average can be conceived as follows:
(The pseudo-code appears only as an image in the original publication and is not transcribed here.)
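In place of the untranscribed pseudo-code, a minimal sketch of such an average-depth computation (assuming a simple flat array of per-pixel depth values) might look like this:

// Minimal sketch, assuming a flat array of per-pixel depth values for the scene.
float averageSceneDepth(const float* depthBuffer, int pixelCount) {
    double sum = 0.0;
    for (int i = 0; i < pixelCount; ++i) {
        sum += depthBuffer[i];            // accumulate every pixel depth in the scene
    }
    return pixelCount > 0 ? static_cast<float>(sum / pixelCount) : 0.0f;
}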
the near plane may be set to a minimum depth value for all pixels in the scene and the far plane may be set to a maximum depth value for all said pixels in said scene. The target perceived depth may be a value that is specifically specified by the content creator and scaled by the user's preference. By using the calculated average with the transformation property 3 from above, it is possible to calculate how far the average scene depth is from the target perceived depth. By way of example and not by way of limitation, the overall perceived scene depth may then be shifted by simply adjusting convergence to a target increment (as shown in table 1). The target increment may also be smoothed as is done below for the near and far planes. Other methods of adjusting the target depth may also be used, such as the method used in 3D movies to ensure consistent depth in scene changes. However, it should be noted that: 3D movies do not currently provide a way for viewers to adjust the depth of a target scene.
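As an illustration of driving the average scene depth toward the target (hypothetical names; the smoothing mirrors the near/far-plane smoothing shown below), the convergence adjustment might be sketched as:

// Illustrative sketch: nudge the convergence so the perceived scene depth approaches the target.
// targetPerceivedDepth is set by the content creator and scaled by the viewer's preference.
float adjustConvergence(float convergence, float averageSceneDepth,
                        float targetPerceivedDepth, float smoothing /* between 0 and 1 */) {
    float targetDelta = targetPerceivedDepth - averageSceneDepth;   // how far we are from the target
    return convergence * smoothing + (convergence + targetDelta) * (1.0f - smoothing);
}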
By way of example and not by way of limitation, one way to determine one or more three-dimensional characteristics associated with a given scene is to determine and use two important scene characteristics: the mean pixel depth of a scene and the standard deviation of the pixel depth of that scene. The pseudo code for calculating the mean and standard deviation of pixel depth can be conceived as follows:
(The pseudo-code appears only as an image in the original publication and is not transcribed here.)
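In place of the untranscribed pseudo-code, a minimal sketch of the mean and standard deviation computation (same assumed depth-buffer layout as above) might be:

// Minimal sketch, assuming a flat array of per-pixel depth values for the scene.
#include <algorithm>
#include <cmath>

void depthStatistics(const float* depthBuffer, int pixelCount, float& mean, float& stdDev) {
    double sum = 0.0, sumSq = 0.0;
    for (int i = 0; i < pixelCount; ++i) {
        double d = depthBuffer[i];
        sum   += d;
        sumSq += d * d;
    }
    mean = pixelCount > 0 ? static_cast<float>(sum / pixelCount) : 0.0f;
    double variance = pixelCount > 0 ? sumSq / pixelCount - static_cast<double>(mean) * mean : 0.0;
    stdDev = static_cast<float>(std::sqrt(std::max(variance, 0.0)));
}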
the near plane may then be set to the average pixel depth of the scene minus the standard deviation of the pixel depth of that scene. Likewise, the far plane may be set to the average pixel depth of a scene plus the standard deviation of the pixel depth of that scene. If these results are insufficient, the re-projection system may convert the data representing the scene into the frequency domain for calculation of the average pixel depth and standard deviation for a given scene. As with the above embodiments, driving to the target depth may be accomplished in the same manner.
To provide a method for filtering and weighting insignificant pixels, the scene may be studied in detail and insignificant pixels marked. Insignificant pixels will likely include particles and other incoherent small geometries in flight. In the context of video games this can be done easily in a rasterization process, otherwise it is likely that an algorithm for finding small cluster depth aberrations will be used. If one can discern where the user is looking, then the depth of nearby pixels should be considered more important-the farther we are from the focus, the less important the pixel. Such a method may include, without limitation: determine whether the cursors or reticles are within the image and their position in the image, or measure eye rotation by utilizing feedback from specialized glasses. Such eyewear may include a simple camera directed at the wearer's eye. The camera may provide an image in which the whites of the user's eyes may be distinguished from dark portions (e.g., pupils). By analyzing the image to determine the location of the pupil and correlating the location with the eye angle, the eye rotation can be determined. For example, a centered pupil would roughly correspond to an eyeball oriented straight ahead.
In some implementations, it may be desirable to emphasize pixels within a central portion of display 111, as the values at the edges are likely to be less important. If the distance between pixels is fixed to a two-dimensional distance that ignores depth, a simple biased weighted statistical model that emphasizes such central pixels or focal points can be conceived with the following pseudo-code:
(The pseudo-code appears only as an image in the original publication and is not transcribed here.)
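In place of the untranscribed pseudo-code, one way such a biased, focus-weighted statistic might be sketched (weights fall off with 2D distance from the focal point, depth ignored) is:

// Illustrative sketch: weight each pixel by its 2D distance from a focal point
// (the screen center or a tracked gaze position); depth is ignored when computing the distance.
#include <algorithm>
#include <cmath>

void weightedDepthStatistics(const float* depthBuffer, int width, int height,
                             float focusX, float focusY,
                             float& weightedMean, float& weightedStdDev) {
    double sumW = 0.0, sumWD = 0.0, sumWD2 = 0.0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float dx = x - focusX, dy = y - focusY;
            double dist = std::sqrt(static_cast<double>(dx) * dx + static_cast<double>(dy) * dy);
            double w = 1.0 / (1.0 + dist);              // closer to the focus => more important
            double d = depthBuffer[y * width + x];
            sumW   += w;
            sumWD  += w * d;
            sumWD2 += w * d * d;
        }
    }
    weightedMean = sumW > 0.0 ? static_cast<float>(sumWD / sumW) : 0.0f;
    double variance = sumW > 0.0 ? sumWD2 / sumW - static_cast<double>(weightedMean) * weightedMean
                                 : 0.0;
    weightedStdDev = static_cast<float>(std::sqrt(std::max(variance, 0.0)));
}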
to provide a system that keeps most of the picture within the comfort zone 121, the near and far planes (or other variables in the mathematical equations described above) should be adjusted in addition to or instead of the convergence described in the above embodiments. The processor 113 may be configured to implement a process as contemplated by the following pseudo-code:
scale = viewerScale * contentScale
nearPlane' = nearPlane * scale + (mean - standardDeviation) * (1 - scale)
farPlane' = farPlane * scale + (mean + standardDeviation) * (1 - scale)
both the viewerScale and contentScale are values between 0 and 1 that control the rate of change. The viewer 115 adjusts the value of the viewerScale and the content creator sets the value of the contentScale. The same smoothing may be applied to the above convergence adjustment.
In some implementations (e.g., video games), because it may be desirable for the processor 113 to be able to drive objects within a scene farther or closer to the screen 111, it may be useful to add such target adjustment steps as:
nearPlane' = nearPlane * scale + (mean + nearShift - standardDeviation) * (1 - scale)
farPlane' = farPlane * scale + (mean + farShift + standardDeviation) * (1 - scale)
A positive shift will tend to move the nearPlane and farPlane back into the scene. Likewise, a negative shift will cause an object to move closer.
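Put together as one runnable helper (illustrative names; the formulas are exactly those given above, with nearShift and farShift defaulting to zero):

// Sketch combining the smoothing formulas above into a single helper.
struct Planes { float nearPlane; float farPlane; };

Planes smoothPlanes(const Planes& current, float mean, float standardDeviation,
                    float viewerScale, float contentScale,
                    float nearShift = 0.0f, float farShift = 0.0f) {
    float scale = viewerScale * contentScale;   // both values lie between 0 and 1
    Planes next;
    next.nearPlane = current.nearPlane * scale + (mean + nearShift - standardDeviation) * (1.0f - scale);
    next.farPlane  = current.farPlane  * scale + (mean + farShift  + standardDeviation) * (1.0f - scale);
    return next;
}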
After determining 105 one or more characteristics (e.g., near plane, far plane, average pixel depth, standard deviation pixel depth, etc.) of a given scene, a set of scale factors 107 may be determined. These scale factors may indicate how the scene is maximized within the boundaries of the user-determined comfort zone 121. Further, one of these scale factors may be used to control the rate at which the three-dimensional settings are modified during the scene change.
Once the scale factors corresponding to the characteristics of a given scene are determined, the scale factors may be stored as metadata 109 within the three-dimensional scene data. The scene 103 (and its accompanying three-dimensional data) may be transmitted to the processor 113 along with metadata 109 associated with that scene. The processor 113 may then adjust the three-dimensional scene settings according to the metadata.
It is important to note that scenes may be processed to determine scale factors and metadata at different stages of the three-dimensional data streaming process, and such processing is not limited to taking place after placement in the output buffer 101. Furthermore, the set of three-dimensional scene settings determined by the user is not limited to setting the boundaries of the three-dimensional projection. By way of example and not by way of limitation, the user-determined scene settings may also include controlling the sharpness of objects within the three-dimensional scene or the intensity of shadows within the three-dimensional scene.
Although the foregoing examples are described in the context of re-projection, embodiments of the present invention are not limited to such implementations. The concept of scaling the depth and range of re-projection may be equally well applied to adjusting input parameters such as the position of a virtual or real stereo camera for real-time 3D video. If the camera feed is dynamic, then adjustments to the input parameters for real-time stereoscopic content may be implemented. Fig. 1C and 1D show examples of dynamic adjustment of camera feeds according to alternative embodiments of the present invention.
As seen in FIG. 1C, the processor 113 may generate left-eye and right-eye views of the scene 103 from three-dimensional data representing the locations of objects and of a virtual stereo camera 114, including left-eye and right-eye cameras 114A and 114B, in a simulated environment 102, such as in a video game or virtual world. For the purposes of this embodiment, the virtual stereo camera may be viewed as part of one unit having two separate cameras. However, embodiments of the invention include implementations in which the two virtual cameras are separate and not part of one unit. It should be noted that the location and orientation of the virtual cameras 114A, 114B determine what is displayed in the scene. For example, assume that the simulated environment is a level of a first-person shooter (FPS) game, in which an avatar 115A represents the user 115. The user controls the movement and actions of the avatar 115A using the processor 113 and a suitable controller 117. In response to user commands, the processor 113 may select the location and orientation of the virtual cameras 114A, 114B. If the virtual cameras 114A, 114B are pointed at distant parts of the scene (e.g., a remote object 116), the maximum depth value calculated for the scene may be much greater than if they are pointed at nearby, non-remote elements (e.g., an object 118 close to the player). The processor 113 may take such scene-by-scene depth values into account when determining default values and/or scale factors for the 3D video parameters (e.g., α_u, t_c, Z_c, f, and t_hmp). By way of example, and not by way of limitation, the processor 113 may implement a look-up table or function relating specific 3D parameters to specific combinations of scene-by-scene values. The tabular or functional relationship between the 3D parameters and the default scene-by-scene values and/or scale factors may be determined empirically. The processor 113 may then modify the individual default values and/or scale factors according to the user's preference settings.
In a variation on the embodiments depicted in FIGs. 1A to 1C, it is also possible to implement similar adjustments to the 3D parameter settings with a motorized physical stereo camera. For example, consider a video chat embodiment, e.g., as depicted in FIG. 1D. In this case, first and second users 115 and 115' interact via first and second processors 113 and 113', first and second 3D video cameras 114 and 114', and first and second controllers 117 and 117', respectively. The processors 113, 113' are coupled to each other through, for example, a network 120, which may be a wired or wireless network, a local area network (LAN), a wide area network, or another communication network. The first user's 3D video camera 114 includes a left-eye camera 114A and a right-eye camera 114B. Both the left-eye image and the right-eye image of the first user's environment are displayed on a video display 111' coupled to the second user's processor 113'. In the same manner, the second user's 3D video camera 114' includes a left-eye camera 114A' and a right-eye camera 114B'. For the purposes of this embodiment, the left-eye and right-eye stereo cameras may be physical parts of one unit with two unified cameras (e.g., separate lens units and separate sensors for the left and right views). However, embodiments of the present invention also include implementations in which the left-eye camera and the right-eye camera are physically independent of each other and are not part of one unit.
Both the left-eye image and the right-eye image of the second user's environment are displayed on a video display 111 coupled to the first user's processor 113. The processor 113 of the first user may determine per-scene 3D values from the left-eye image and the right-eye image. For example, the two cameras typically capture color buffers. Depth information can be recovered from the color buffer information of the left-eye and right-eye cameras using a suitable depth recovery algorithm. The processor 113 may transmit the depth information along with the images to the processor 113' of the second user. It should be noted that the depth information may vary depending on the scene content. For example, the scene captured by the cameras 114A', 114B' may contain objects at different depths, such as the user 115' and the remote object 118'. The different depths of these objects within the scene may affect the average pixel depth and the standard deviation of the pixel depth for the scene.
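The text does not specify the depth recovery algorithm; as one common (assumed) approach for parallel cameras, depth can be triangulated from the horizontal disparity between matched pixels of the left-eye and right-eye images:

// Illustrative sketch of a standard stereo triangulation step (assumed approach, not
// specified by the text): depth from the horizontal disparity d between matched pixels,
// for parallel cameras with focal length f (in pixels) and spacing t_c.
#include <limits>

float depthFromDisparity(float disparityPixels, float focalLengthPixels, float cameraSpacing) {
    if (disparityPixels <= 0.0f) {
        return std::numeric_limits<float>::infinity();  // zero disparity corresponds to a very distant point
    }
    return focalLengthPixels * cameraSpacing / disparityPixels;   // Z = f * t_c / d
}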
The left-eye and right-eye cameras for both the first user's camera 114 and the second user's camera 114' may be motorized so that parameters for the left-eye and right-eye cameras (e.g., f, t_c, and the "toe-in" angle) may be adjusted on the fly. The first user may select initial settings of the 3D video parameters of the cameras 114, such as the inter-camera spacing t_c and/or the relative horizontal rotation angle of the left-eye camera 114A and the right-eye camera 114B (in the case of "toe-in"). For example, as described above, the second user 115' may use the second controller 117' and the second processor 113' to adjust scale factors for the settings of the 3D video parameters of the first user's camera 114 (e.g., f, t_c, or the toe-in angle). Data representing the adjustment to the scale factors may then be transmitted to the first processor 113 via the network 120. The first processor may use the adjustment to adjust the 3D video parameter settings of the first user's camera 114. In a similar manner, the first user 115 may adjust the settings of the second user's 3D video camera 114'. In this way, each user 115, 115' can view a 3D video image of the other party's environment in a comfortable 3D setting.
Scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene
Improvements in three-dimensional image rendering have a significant impact in the area of interactive virtual environments that employ three-dimensional technology. Many video games implement three-dimensional image rendering to create a virtual environment for user interaction. However, simulating real-world physical phenomena to facilitate user interaction with a virtual world is very expensive and quite difficult to implement. Thus, some undesirable visual disturbances may occur during the execution of the game.
One problem arises when artifacts of three-dimensional video cause user-controlled virtual objects (e.g., characters and guns) to penetrate other elements of the virtual world (e.g., background scenery). When the user-controlled virtual object penetrates other elements in the virtual world, the realism of the game is greatly diminished. In the context of a first-person shooter, the first person's line of sight may be obstructed, or certain important elements may be obscured. Therefore, it is necessary for any program featuring user-controlled virtual object interactions within a three-dimensional virtual environment to eliminate the appearance of these visual disturbances.
Embodiments of the present invention may be configured to scale the pixel depths of a user-controlled virtual object to address the problem of user-controlled virtual objects penetrating elements of a three-dimensional scene of a virtual world. In the context of a first-person shooter (FPS) video game, one example of such a virtual object would be the end of the gun barrel as seen from the shooter's perspective.
Fig. 2A-2B illustrate the problem of user-controlled virtual objects penetrating elements of a virtual world in a three-dimensional scene generated using re-projection. When the user-controlled virtual object penetrates other elements in the virtual world, the realism of the game is greatly diminished. As shown in FIG. 2A, in a virtual environment (e.g., a scene) in which scaling of the pixel depth values of user-controlled virtual objects is not effectuated, a user-controlled virtual object 201 (e.g., a gun barrel) may penetrate another element 203 (e.g., a wall) of the virtual world, causing potential viewing obstruction and diminished realism, as discussed above. In the case of a first-person shooter, the first person's line of sight may be obstructed, or some important element (e.g., the end of the barrel) may be obscured. Hidden elements are shown in phantom lines in FIG. 2A.
A common solution for two-dimensional first-person video games is to scale the depth of objects in the virtual world in order to eliminate visual artifacts in the two-dimensional image (or to exchange the artifacts for a different artifact that is not as noticeable). The scaling is typically applied during rasterization of the two-dimensional video images. In the first-person shooter example, this means that the viewer will see the top of the barrel 201 regardless of whether it passes through the wall 203. The solution works well for two-dimensional video; however, problems arise when it is applied to three-dimensional video. The problem is that the scaled depth values no longer represent real points in three dimensions relative to the rest of the two-dimensional image. Thus, when re-projection is applied to generate left-eye and right-eye views, the depth scaling causes the object to appear compressed in the depth dimension and in the wrong position. For example, as shown in FIG. 2B, the barrel 201 is now perceived as "squished" in the depth direction and, although it should appear closer to the physical screen, the barrel is positioned so that it is closest to the viewer. Another problem with re-projection is that the depth scaling also leaves large holes in the image that are difficult to fill.
Furthermore, overwriting the scaled depth values with the true depth values from the three-dimensional scene information means that the viewer will still see the barrel, but the barrel will be perceived as being behind the wall. Despite the fact that the virtual object 201 should be blocked by the wall 203, the viewer will see a phantom portion of the virtual object. This depth-penetration effect is disconcerting because the viewer expects to see the wall rather than the object behind it.
To address this problem, embodiments of the present invention apply a second set of scalings to objects in a scene in order to place them in the proper perceived locations within the scene. The second scaling may be applied after rasterization of the two-dimensional image but before or during re-projection of the image to generate the left-eye and right-eye views. FIG. 2C illustrates a virtual environment (e.g., a scene) in which scaling of user-controlled virtual object pixel depth values is effectuated. Here, by scaling the pixel depths as discussed above, the user-controlled virtual object 201 may approach another element 203 of the virtual world but is restricted from penetrating the element 203. The second scaling restricts depth values to between a near value N and a far value F. In essence, the object may still appear squashed in the depth dimension, but full control may be exerted over its apparent thickness. This is a trade-off, and the viewer may of course be provided with control over this second scaling, e.g., as discussed above.
Thus, visual disturbances caused by the penetration of the elements of the virtual world by the user-controlled virtual object may be eliminated or significantly reduced.
FIG. 3 is a schematic diagram illustrating a method for scaling pixel depth values of a user-controlled virtual object in a three-dimensional scene according to an embodiment of the present invention.
To address this issue, the program may apply a second scaling of pixel depth values for the user-controlled virtual object in accordance with the three-dimensional scene content to be presented to the user.
The scene 103 may be located in the output buffer 101 before being presented to the user. These scenes 103 may be arranged according to their presentation order. Scene 103 refers to one or more three-dimensional video frames characterized by a set of shared characteristics. For example, a set of video frames representing different views of the same scene may be characterized as one scene. However, a close-up view and a far-view of the same object may also represent different scenes. It is important to note that: any number of combinations of frames may be characterized as a scene.
As indicated at 133, an initial depth scaling may be applied to the two-dimensional image of the three-dimensional scene 103. The initial depth scaling is typically carried out using a modified view projection matrix during rasterization of the two-dimensional image. This writes the scaled depth information into a depth buffer of the scene.
Before the scene 103 is presented to the user in three dimensions (e.g., as a left-eye view and a right-eye view), the scene may be studied in detail to determine important characteristics that are critical to solving the problems discussed above. For a given scene 103, a minimum threshold value is first determined, as indicated at 135. This minimum threshold value represents the minimum pixel depth value below which no segment of the user-controlled virtual object may fall. Second, a maximum threshold value is determined, as indicated at 137. This maximum threshold value represents the maximum pixel depth value that no segment of the user-controlled virtual object may exceed. These threshold values set limits on how a user-controlled virtual object can travel within the virtual environment, such that the user-controlled virtual object is restricted from penetrating other elements in the virtual environment.
As the user-controlled virtual object moves in the virtual world, its pixel depth values are tracked and compared to the threshold values determined above, as indicated at 139. Whenever the pixel depth values of any segment of the user-controlled virtual object fall below the minimum threshold value, those pixel depth values are set to a low value, as indicated at 141. By way of example, and not by way of limitation, this low value may be the minimum threshold value. Alternatively, this low value may be a scaled version of the user-controlled virtual object's pixel depth value. For example, the low value may be determined by multiplying the pixel depth value that falls below the minimum threshold value by an inverse ratio and then adding a minimum offset to the product.
Whenever the pixel depth values of any segment of the user-controlled virtual object exceed the maximum threshold value, those pixel depth values are set to high values, as indicated at 143. By way of example, and not by way of limitation, this high value may be the maximum threshold value. Alternatively, this high value may be a scaled value of the user-controlled virtual object pixel depth value. For example, the high value may be determined by multiplying the pixel depth value exceeding the maximum threshold value by an inverse ratio and then subtracting the product from the maximum offset.
Setting the low/high values to the minimum/maximum threshold values works particularly well for virtual objects that are naturally small and do not require an enhanced perception of depth. These low/high values effectively displace the virtual object away from the virtual camera. However, for virtual objects that require an enhanced perception of depth (such as a gun sight), the scaled low/high values described above may work better.
The minimum and maximum threshold values may be determined by the program before the program is executed by the processor 113. These values may also be determined by the processor 113 while the contents of the program are being executed. The comparison of the pixel depth values of the user-controlled virtual object to the threshold value is done by the processor 113 during execution of the program. Similarly, establishing low and high values for user-controlled virtual object pixel depths that exceed or fall below a threshold value is accomplished by the processor during execution of the program.
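A minimal sketch of the comparison and clamping steps just described (hypothetical names; the optional scaled low/high variants mentioned above are noted in comments):

// Illustrative sketch of steps 139-143: clamp (or rescale) the depth of every pixel that
// belongs to the user-controlled virtual object into the [minThreshold, maxThreshold] band.
void scaleObjectPixelDepths(float* depthBuffer, const bool* isObjectPixel, int pixelCount,
                            float minThreshold, float maxThreshold) {
    for (int i = 0; i < pixelCount; ++i) {
        if (!isObjectPixel[i]) continue;     // only pixels of the user-controlled object are affected
        if (depthBuffer[i] < minThreshold) {
            // "Low value": the threshold itself, or alternatively a scaled value such as
            // depthBuffer[i] * inverseRatio + minOffset.
            depthBuffer[i] = minThreshold;
        } else if (depthBuffer[i] > maxThreshold) {
            // "High value": the threshold itself, or alternatively maxOffset - depthBuffer[i] * inverseRatio.
            depthBuffer[i] = maxThreshold;
        }
    }
}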
After the second scaling has been effectuated on the pixel depth values, the processor 113 may use the two-dimensional image and the resulting set of pixel depth values of the user-controlled virtual object to effectuate re-projection in order to generate two or more views (e.g., a left-eye view and a right-eye view) of the three-dimensional scene, as indicated at 145. The two or more views may be displayed on a three-dimensional display, as indicated at 147.
The problem of penetrating other virtual-world elements is solved by setting any pixel depth values of the user-controlled virtual object that fall outside the threshold values to the corresponding low or high values. While simulating the physics of the interaction of a virtual object with its virtual world would effectively solve this problem, doing so is in fact quite difficult to implement. The ability to scale the pixel depth values of a user-controlled virtual object according to the method described above thus provides a simple, cost-effective solution to the problem.
Device
FIG. 4 illustrates a block diagram of a computer device that may be used to implement dynamic adjustment of user-determined three-dimensional scene settings and/or scaling of pixel depth values, according to an embodiment of the present invention. The apparatus 200 may generally include a processor module 201 and a memory 205. The processor module 201 may include one or more processor cores. An example of a processing system that uses multiple processor modules is a cell processor, examples of which are described in detail, for example, in the Cell Broadband Engine Architecture, which is available online at http://www-306.ibm.com/chip/techlib/techlib.nsf/techdocs/1AEEE1270EA2776387257060006E61BA/$file/CBEA_01_pub.pdf and which is incorporated herein by reference.
The memory 205 may be in the form of an integrated circuit, such as RAM, DRAM, ROM, etc. The memory 205 may also be a main memory accessible by all processor modules. In some embodiments, the processor module 201 may have local memory associated with each core. The program 203 may be stored in the main memory 205 in the form of processor readable instructions executable on the processor module. The program 203 may be configured to perform dynamic adjustment of the set of user-determined three-dimensional scene settings. The program 203 may also be configured to effectuate scaling of pixel depth values of user-controlled virtual objects in the three-dimensional scene, e.g., as described above with respect to fig. 3. The program 203 may be written in any suitable processor readable language (e.g., C, C + +, JAVA, Assembly, MATLAB, FORTRAN) and many other languages. Input data 207 may also be stored in memory. Such input data 207 may include a set of user-determined three-dimensional settings, three-dimensional characteristics associated with a given scene, or scaling factors associated with certain three-dimensional characteristics. The input data 207 may also include threshold values associated with the three-dimensional scene and pixel depth values associated with the user-controlled objects. During execution of program 203, portions of program code and/or data may be loaded into memory or a local memory of a processor core for parallel processing by multiple processor cores.
The device 200 may also include well-known support functions 209, such as input/output (I/O) elements 211, power supplies (P/S) 213, a clock (CLK) 215, and a cache 217. The device 200 may optionally include a mass storage device 219, such as a disk drive, CD-ROM drive, tape drive, or the like, to store programs and/or data. The device 200 may optionally include a display unit 221 and a user interface unit 225 to facilitate interaction between the device and a user. By way of example and not by way of limitation, the display unit 221 may be in the form of a 3-D ready television that displays text, numerals, graphical symbols, or other visual objects as stereoscopic images to be perceived with a pair of 3-D viewing glasses 227, which may be coupled to the I/O elements 211. Stereoscopy refers to enhancing the illusion of depth in a two-dimensional image by presenting a slightly different image to each eye. The user interface 225 may include a keyboard, mouse, joystick, light pen, or other device that may be used in conjunction with a graphical user interface (GUI). The device 200 may also include a network interface 223 to allow the device to communicate with other devices over a network, such as the Internet.
The components of the system 200, including the processor 201, memory 205, support functions 209, mass storage 219, user interface 225, network interface 223, and display 221, may be operatively connected to each other via one or more data buses 227. These components may be implemented in hardware, software, firmware, or some combination of two or more of these.
There are a number of additional ways to streamline parallel processing with multiple processors in the device. For example, in some implementations it is possible to "unroll" a processing loop, e.g., by replicating code across two or more processor cores and having each core implement the code to process a different block of data. Such an implementation may avoid the latency associated with setting up the loop. As applied to embodiments of the present invention, multiple processors may determine scale factors for different scenes in parallel. The ability to process data in parallel also saves valuable processing time, resulting in a more efficient and streamlined system both for scaling pixel depth values corresponding to one or more user-controlled virtual objects in a three-dimensional scene and for dynamic adjustment of user-determined three-dimensional scene settings.
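The loop-unrolling idea can be sketched as follows. The use of std::thread, the block partitioning, and the per-pixel operation (a simple multiplicative scale) are illustrative assumptions rather than a description of any particular cell-processor implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// The same per-pixel code is replicated across several cores, each core
// working on a different contiguous block of depth values. scaleBlock stands
// in for whatever per-pixel work the program performs.
void scaleBlock(std::vector<float>& depths, std::size_t begin,
                std::size_t end, float scale)
{
    for (std::size_t i = begin; i < end; ++i) {
        depths[i] *= scale;
    }
}

void scaleInParallel(std::vector<float>& depths, float scale, unsigned numCores)
{
    std::vector<std::thread> workers;
    const std::size_t blockSize = (depths.size() + numCores - 1) / numCores;
    for (unsigned c = 0; c < numCores; ++c) {
        const std::size_t begin = c * blockSize;
        const std::size_t end = std::min(depths.size(), begin + blockSize);
        if (begin >= end) {
            break;                     // no more data for remaining cores
        }
        workers.emplace_back(scaleBlock, std::ref(depths), begin, end, scale);
    }
    for (std::thread& w : workers) {
        w.join();                      // wait for all blocks to finish
    }
}
```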
One example, among others, of a processing system capable of performing parallel processing on three or more processors is a cell processor. There are many different processor architectures that may be categorized as cell processors. By way of example, and not limitation, FIG. 5 illustrates one type of cell processor. The cell processor 300 includes a main memory 301, a single Power Processor Element (PPE) 307, and eight Synergistic Processor Elements (SPEs) 311. Alternatively, the cell processor may be configured with any number of SPEs. Referring to FIG. 5, the memory 301, PPE 307, and SPEs 311 may communicate with each other and with an I/O device 315 via a ring-type element interconnect bus 317. The memory 301 contains input data 303 having the same features as the input data described above and a program 305 having the same features as the program described above. At least one of the SPEs 311 may include in its local store (LS) program instructions 313 and a portion of the input data 303 to be processed in parallel, e.g., as described above. The PPE 307 may include program instructions 309 in its L1 cache. The program instructions 309, 313 may be configured to implement embodiments of the present invention, for example, as described above with respect to FIG. 1 or FIG. 3. By way of example, and not by way of limitation, the instructions 309, 313 may have the same features as the program 203 described above. The instructions 309, 313 and data 303 may also be stored in the memory 301 for access by the SPEs 311 and PPE 307 as needed.
By way of example and not by way of limitation, the instructions 309, 313 may include instructions for implementing dynamic adjustment of user-determined three-dimensional scene settings as described above with respect to FIG. 1. Alternatively, the instructions 309, 313 may be configured to implement scaling of pixel depth values of the user-controlled virtual object, e.g., as described above with respect to FIG. 3.
For example, the PPE 307 may be a 64-bit PowerPC Processor Unit (PPU) with an associated cache. The PPE 307 may include an optional vector multimedia extension unit. Each SPE 311 includes a Synergistic Processor Unit (SPU) and a Local Store (LS). In some implementations, the local store may have a capacity of, for example, about 256 kilobytes for programs and data. An SPU is a less complex computational unit than a PPU in that it typically does not perform system management functions. An SPU may have Single Instruction Multiple Data (SIMD) capability and typically processes data and initiates any required data transfers (subject to the access properties set by the PPE) in order to carry out its assigned tasks. The SPUs allow the system to implement applications that require a higher computational unit density and that can effectively use the provided instruction set. Managing a significant number of SPUs in a system with the PPE allows for cost-effective processing over a wide range of applications. For example, a cell processor may be characterized by an architecture known as the Cell Broadband Engine Architecture (CBEA). In a CBEA-compliant architecture, multiple PPEs may be combined into one PPE group, and multiple SPEs may be combined into one SPE group. For purposes of example, the cell processor is depicted as having a single SPE group with a single SPE and a single PPE group with a single PPE. Alternatively, the cell processor may include multiple groups of power processor elements (PPE groups) and multiple groups of synergistic processor elements (SPE groups). CBEA-compliant processors are described in detail, for example, in Cell Broadband Engine Architecture, which is available online at https://www-306.ibm.com/chips/techlib.nsf/techdocs/1AEEE1270EA277638725706000E61BA/$file/CBEA_01_pub.pdf and which is incorporated herein by reference.
According to another embodiment, instructions for dynamic adjustment of user-determined three-dimensional scene settings may be stored in a computer-readable storage medium. By way of example, and not by way of limitation, FIG. 6A illustrates an example of a non-transitory computer-readable storage medium 400 according to an embodiment of the present invention. The storage medium 400 contains computer-readable instructions stored in a format that can be retrieved, interpreted and executed by a computer processing device. By way of example, and not limitation, a computer-readable storage medium may be a computer-readable memory such as a Random Access Memory (RAM) or a Read Only Memory (ROM), a computer-readable storage disk for a fixed disk drive (e.g., a hard disk drive), or a removable disk drive. In addition, the computer-readable storage medium 400 may be a flash memory device, a computer-readable tape, a CD-ROM, a DVD-ROM, a Blu-Ray disc (Blu-Ray), a HD-DVD, a UMD, or other optical storage media.
The storage medium 400 contains instructions 401 for dynamic adjustment of user-determined three-dimensional scene settings. The instructions 401 may be configured to implement dynamic adjustment in accordance with the method described above with respect to FIG. 1. In particular, the dynamic adjustment instructions 401 may include instructions 403 for determining three-dimensional characteristics of a scene, which determine certain characteristics of a given scene that are relevant to optimizing the scene's three-dimensional viewing settings. The dynamic adjustment instructions 401 may further include scale factor determination instructions 405 configured to determine one or more scale factors that represent certain optimizing adjustments to be made based on the characteristics of the given scene.
The dynamic adjustment instructions 401 may also include instructions 407 for adjusting the user-determined three-dimensional settings, which are configured to apply the one or more scale factors to the user-determined three-dimensional scene settings so that the resulting 3-D projection of the scene takes into account both the user's preferences and the scene's inherent characteristics. The result is a visual representation of the scene according to the user's predetermined settings, modified by certain characteristics associated with the scene, so that each user's perception of a given scene may be uniquely optimized.
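As a purely hypothetical illustration of how a scale factor might be applied to a user-determined setting, consider the following sketch. The structure, the field names, and the multiplicative combination rule are assumptions made for the example, not the claimed adjustment method.

```cpp
// A user-determined setting (e.g., a preferred depth strength) is modulated
// by a per-scene scale factor so that the final 3-D projection reflects both
// the user preference and the scene's own characteristics.
struct SceneSettings {
    float userDepthStrength;   // chosen by the user, e.g., in [0, 1]
    float sceneScaleFactor;    // derived from the scene's 3-D characteristics
};

float adjustedDepthStrength(const SceneSettings& s)
{
    // A simple multiplicative combination; the actual rule is
    // implementation-specific.
    return s.userDepthStrength * s.sceneScaleFactor;
}
```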
The dynamic adjustment instructions 401 may additionally include instructions for displaying a scene 409 configured to display the scene on the visual display according to the dynamically adjusted three-dimensional scene settings obtained above.
According to another embodiment, instructions for scaling pixel depth values of a user-controlled virtual object in a three-dimensional scene may be stored in a computer-readable storage medium. By way of example, and not by way of limitation, FIG. 6B illustrates an example of a non-transitory computer-readable storage medium 410 according to an embodiment of the present invention. The storage medium 410 contains computer-readable instructions stored in a format that can be retrieved, interpreted and executed by a computer processing device. By way of example, and not limitation, a computer-readable storage medium may be a computer-readable memory such as a Random Access Memory (RAM) or a Read Only Memory (ROM), a computer-readable storage disk for a fixed disk drive (e.g., a hard disk drive), or a removable disk drive. Additionally, the computer-readable storage medium 410 may be a flash memory device, a computer-readable tape, a CD-ROM, a DVD-ROM, a Blu-ray disc, a HD-DVD, a UMD, or other optical storage medium.
The storage medium 410 contains instructions 411 for scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene. The instructions 411 may be configured to implement pixel depth scaling in accordance with the method described above with respect to FIG. 3. In particular, the pixel depth scaling instructions 411 may include initial scaling instructions 412 that, when executed, may effect an initial scaling of a two-dimensional image of the three-dimensional scene. The instructions 411 may further include instructions 413 for determining a minimum threshold for the three-dimensional scene, below which the pixel depth values of a user-controlled virtual object may not fall for a particular scene. Similarly, the pixel depth scaling instructions 411 may also include instructions 415 for determining a maximum threshold for the three-dimensional scene, which the pixel depth values of the user-controlled virtual object may not exceed for that scene.
The pixel depth scaling instructions 411 may also include virtual object pixel depth comparison instructions 417 that compare the pixel depth values associated with the user-controlled virtual object to the threshold values determined above. By comparing the pixel depth values of the user-controlled virtual object to the threshold values, the location of the user-controlled virtual object may be continuously tracked to ensure that it does not penetrate other virtual elements in the three-dimensional scene.
The pixel depth scaling instructions 411 may further include instructions 419 for setting the virtual object pixel depth to a low value, which prevent any portion of the virtual object's depth from falling below the minimum threshold value. The low value assigned to a pixel depth that is too low may be the minimum threshold value itself, or a scaled version of the low pixel depth value, as discussed above.
The pixel depth scaling instructions 411 may further include instructions 421 for setting the virtual object pixel depth to a high value, which prevent any portion of the virtual object's depth from exceeding the maximum threshold value. The high value assigned to a pixel depth that is too high may be the maximum threshold value itself, or a scaled version of the high pixel depth value, as discussed above.
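The scaled alternative described in the preceding two paragraphs (and recited in claims 4 and 5 below) can be sketched as follows, assuming floating-point depths; the parameter names inverseScale, minOffset, and maxOffset are illustrative.

```cpp
// Remap an out-of-range depth with an inverse scale and an offset so that
// some relative depth variation survives, instead of pinning the depth to
// the threshold itself.
float scaleLowDepth(float depth, float inverseScale, float minOffset)
{
    // Applied when the depth fell below the minimum threshold.
    return depth * inverseScale + minOffset;
}

float scaleHighDepth(float depth, float inverseScale, float maxOffset)
{
    // Applied when the depth exceeded the maximum threshold.
    return maxOffset - depth * inverseScale;
}
```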
The pixel depth scaling instructions may further include re-projection instructions 423 that use the resulting set of pixel depth values for the user-controlled virtual object to re-project the two-dimensional image to generate two or more views of the three-dimensional scene. The pixel depth scaling instructions 411 may additionally include instructions 425 to display the scene using the resulting set of virtual object pixel depth settings on a visual display.
As mentioned above, embodiments of the present invention may utilize three-dimensional viewing glasses. An example of three-dimensional viewing glasses 501 according to one aspect of the present invention is shown in FIG. 7. The eyewear may include a frame 505 for holding a left LCD eyeglass lens 510 and a right LCD eyeglass lens 512. As mentioned above, each eyeglass lens 510, 512 can be rapidly and selectively darkened to prevent the wearer from seeing through that lens. A left earphone 530 and a right earphone 532 are also preferably connected to the frame 505. An antenna 520 for sending and receiving wireless information may also be included in or on the frame 505. The glasses may be tracked by any means to determine whether they are oriented toward the screen. For example, the front of the glasses may also include one or more photodetectors 540 for detecting whether the glasses are oriented toward the monitor.
Various known techniques may be used to provide alternating display of images from the video feeds. The visual display 111 of FIG. 1 may be configured to operate in progressive scan mode for each video feed shared on the screen. However, embodiments of the present invention may also be configured to work with interlaced video. For standard television monitors, such as those using the interlaced NTSC or PAL formats, the images of the two video feeds may be interlaced, with the lines of an image from one video feed interleaved with the lines of an image from the other video feed. For example, the odd lines derived from an image of the first video feed are displayed, and then the even lines derived from an image of the second video feed are displayed.
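The line-interleaving scheme just described might look like the following sketch, which assumes both feeds are full frames of the same resolution stored as packed pixels. It is an illustration only, and it treats even-numbered rows as belonging to the first feed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Build one combined frame by taking alternate lines from two video feeds:
// even-numbered rows come from feedA and odd-numbered rows from feedB.
void interleaveFeeds(const std::vector<uint32_t>& feedA,
                     const std::vector<uint32_t>& feedB,
                     int width, int height,
                     std::vector<uint32_t>& out)
{
    out.resize(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        const std::vector<uint32_t>& src = (y % 2 == 0) ? feedA : feedB;
        for (int x = 0; x < width; ++x) {
            out[y * width + x] = src[y * width + x];
        }
    }
}
```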
A system level diagram of eyewear that may be used in connection with embodiments of the present invention is shown in fig. 8. The eyewear may include a processor 602 that executes instructions from a program 608 stored in memory 604. The memory 604 may also store data to be provided to or output from the processor 602 and any other memory retrieval/storage elements of the eyewear. The processor 602, memory 604, and other components of the eyewear may communicate with one another via a bus 606. Such other elements may include an LCD driver 610 that provides drive signals that selectively mask the left and right LCD lenses 612, 614. The LCD driver may block the left and right LCD lenses individually at different times and for different durations, or together at the same time or for the same duration.
The frequency at which the LCD lenses are occluded may be stored in the glasses in advance (e.g., based on the known frequency of NTSC). Alternatively, the frequency may be selected by means of a user input 616 (e.g., a knob or buttons used to adjust or type in the desired frequency). In addition, the desired frequency, as well as an initial occlusion start time or other information indicating the time periods during which the LCD lenses should or should not be occluded (whether or not such periods are at a set frequency and duration), may be transmitted to the eyewear via the wireless transmitter/receiver 601 or any other input element. The wireless transmitter/receiver 601 may comprise any wireless transmitter, including a Bluetooth transmitter/receiver.
The audio amplifier 620 may also receive information from the wireless transmitter/receiver 601, i.e., the left and right channels of audio to be provided to the left speaker 622 or the right speaker 624. The eyewear may also include a microphone 630. Microphone 630 may be used in conjunction with a game to provide voice communication; the voice signal may be transmitted to the game console or another device via the wireless transmitter/receiver 601.
The eyewear may also include one or more photodetectors 634. The photodetectors may be used to determine whether the eyewear is oriented toward the monitor. For example, a photodetector may detect the intensity of light incident on it and transmit that information to the processor 602. If the processor detects a substantial drop in light intensity, which may be associated with the user's gaze being diverted away from the monitor, the processor may terminate occlusion of the lenses. Other methods of determining whether the glasses (and hence the user) are oriented toward the monitor may also be used. For example, one or more cameras may be used in place of the photodetectors, and the processor 602 may examine the images they acquire to determine whether the eyewear is oriented toward the monitor, for example by checking contrast levels to detect whether the camera is pointed at the monitor, or by attempting to detect a brightness test pattern on the monitor. The device providing the multiple feeds to the monitor can indicate the presence of such a test pattern by transmitting information to the processor 602 via the wireless transmitter/receiver 601.
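A minimal sketch of the photodetector check described above is given below. The 50% drop ratio and the function name are illustrative assumptions; the specification does not prescribe any particular threshold.

```cpp
// Stop shuttering the lenses when the measured light intensity drops well
// below a reference level, suggesting the wearer is no longer facing the
// monitor. The drop ratio is an arbitrary illustrative choice.
bool shouldStopShuttering(float currentIntensity, float referenceIntensity)
{
    const float dropRatio = 0.5f;
    return currentIntensity < referenceIntensity * dropRatio;
}
```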
It should be noted that certain aspects of embodiments of the invention may be implemented by the glasses themselves, for example through software or firmware running on the processor 602. For example, content-driven color contrast or correction settings that are scaled or adjusted by the user may be implemented in the glasses, with additional metadata streams sent to the glasses. In addition, with improvements in wireless links and LCDs, the processor 113 may broadcast left-eye and right-eye image data directly to the glasses 119, thereby eliminating the need for a separate display 111. Alternatively, the glasses may be fed a monoscopic image and associated pixel depth values from the display 111 or the processor 113. In either case, the re-projection process would actually take place on the glasses.
Although examples of implementations have been described in which stereoscopic 3D images are viewed using passive or active 3D viewing glasses, embodiments of the invention are not limited to such implementations. In particular, embodiments of the present invention may be applied to stereoscopic 3D video technologies that do not rely on head tracking or on passive or active 3D viewing glasses. Such "glasses-free" stereoscopic 3D video technologies are sometimes referred to as autostereoscopy or autostereoscopic technologies. Examples of such technologies include, but are not limited to, technologies based on the use of lenticular lenses. A lenticular lens is an array of magnifying lenses designed so that, when viewed from slightly different angles, different images are magnified. The different images may be selected to provide a three-dimensional viewing effect as the lenticular screen is viewed from different angles. The number of images generated increases in proportion to the number of viewpoints for the screen.
More specifically, in a lenticular lens video system, re-projected images of a scene from slightly different viewing angles may be generated from an original 2D image and depth information for each pixel in the image. Using re-projection techniques, different views of the scene from progressively different viewing angles may be generated from the original 2D image and the depth information. Images representing the different views may be divided into strips and displayed in an interleaved fashion on an autostereoscopic display having a lenticular lens array located between the display screen and the viewing position. The lenses making up the lenticular array may be cylindrical magnifying lenses aligned with the strips and generally twice as wide as the strips. Depending on the angle from which the screen is viewed, the viewer perceives different views of the scene. The different views may be selected to provide the illusion of depth in the displayed scene.
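The strip interleaving for such a display could be sketched as follows; the per-column assignment of views and the packed-pixel representation are assumptions made for the illustration, not a description of any particular autostereoscopic display.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleave N full-size views column by column: column x of the output is
// taken from view (x mod N), so adjacent one-pixel strips come from
// progressively different viewing angles.
void interleaveViews(const std::vector<std::vector<uint32_t>>& views,
                     int width, int height,
                     std::vector<uint32_t>& out)
{
    if (views.empty()) {
        return;                                    // nothing to interleave
    }
    const std::size_t numViews = views.size();
    out.resize(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::vector<uint32_t>& view = views[x % numViews];
            out[y * width + x] = view[y * width + x];
        }
    }
}
```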
Although the present invention has been described in considerable detail with reference to certain preferred versions thereof, other versions are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the preferred versions contained herein. Rather, the scope of the invention should be determined with reference to the appended claims, along with their full scope of equivalents.
All the features disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features. Any feature, whether preferred or not, may be combined with any other feature, whether preferred or not. In the appended claims, the indefinite article "a" or "an" refers to a quantity of one or more of the items following the article, except where the context clearly dictates otherwise. Any element in a claim that does not explicitly state "means for" performing a specified function is not to be interpreted as a "means" or "step" clause as specified in 35 U.S.C. § 112, ¶ 6. In particular, the use of "step of" in the claims herein is not intended to invoke the provisions of 35 U.S.C. § 112, ¶ 6.
The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of any such papers and documents are incorporated herein by reference.

Claims (22)

1. A method for scaling one or more pixel depth values of a user-controlled virtual object in a three-dimensional scene, the method comprising:
a) performing an initial depth scaling of a two-dimensional image of the three-dimensional scene;
b) determining a minimum threshold value of the three-dimensional scene;
c) determining a maximum threshold value of the three-dimensional scene;
wherein the maximum threshold value or the minimum threshold value is determined from a target re-projected into a three-dimensional scene based on pixel depth data in an original image to drive an average depth of the scene towards the target;
d) comparing each pixel depth value of the user-controlled virtual object to the minimum threshold value and the maximum threshold value;
e) setting each pixel depth value of the user-controlled virtual object that falls below the minimum threshold value to a corresponding low value;
f) setting each pixel depth value of the user-controlled virtual object that exceeds the maximum threshold value to a corresponding high value, wherein low and high pixel depth values are set to the respective minimum and maximum threshold values for virtual objects that do not require enhanced perception of depth;
g) performing a re-projection of the two-dimensional image using the resulting set of pixel depth values of the user-controlled virtual object to generate two or more views of the three-dimensional scene; and
h) displaying the two or more views on a three-dimensional display.
2. The method of claim 1, wherein the low value in e) corresponding to pixel depths falling below the minimum threshold value is the minimum threshold value.
3. The method of claim 1, wherein the high value in f) corresponding to a pixel depth exceeding the maximum threshold value is the maximum threshold value.
4. The method of claim 1, wherein the low value in e) corresponding to a pixel depth falling below the minimum threshold value is determined by multiplying the pixel depth by an inverse scale and adding a minimum offset to the product.
5. The method of claim 1, wherein the high value in f) corresponding to a pixel depth exceeding the maximum threshold value is determined by multiplying the pixel depth by an inverse ratio and subtracting the product from a maximum offset.
6. The method of claim 1, wherein the three-dimensional display is a stereoscopic display and the two or more views comprise a left eye view and a right eye view of the three-dimensional scene.
7. The method of claim 1, wherein the three-dimensional display is an autostereoscopic display and the two or more views comprise two or more interleaved views of the three-dimensional scene from slightly different viewing angles.
8. The method of claim 1, wherein the initial depth scaling is performed during rasterization of the two-dimensional image.
9. The method of claim 8, wherein one or more of b), c), d), e), and f) are performed before or during g).
10. The method of claim 1, wherein the user-controlled virtual objects in the three-dimensional scene are located in a simulated environment of the video game.
11. The method of claim 1, wherein the set low pixel depth value is less than the average pixel depth value and the set high pixel depth value is greater than the average pixel depth value.
12. An apparatus for scaling one or more pixel depth values, the apparatus comprising:
a processor;
a memory; and
computer encoded instructions embodied in the memory and executable by the processor, wherein the computer encoded instructions are configured to implement a method for scaling one or more pixel depth values of a user-controlled virtual object in a three-dimensional scene, the method comprising:
a) performing an initial depth scaling of a two-dimensional image of the three-dimensional scene;
b) determining a minimum threshold value of the three-dimensional scene;
c) determining a maximum threshold value of the three-dimensional scene;
wherein the maximum threshold value or the minimum threshold value is determined from a target re-projected into a three-dimensional scene based on pixel depth data in an original image to drive an average depth of the scene towards the target;
d) comparing each pixel depth value of the user-controlled virtual object to the minimum threshold value and the maximum threshold value;
e) setting each pixel depth value of the user-controlled virtual object that falls below the minimum threshold value to a corresponding low value;
f) setting each pixel depth value of the user-controlled virtual object that exceeds the maximum threshold value to a corresponding high value, wherein low and high pixel depth values are set to the respective minimum and maximum threshold values for virtual objects that do not require enhanced perception of depth;
g) performing a re-projection of the two-dimensional image using the resulting set of pixel depth values of the user-controlled virtual object to generate two or more views of the three-dimensional scene; and
h) displaying the two or more views on a three-dimensional display.
13. The apparatus of claim 12, further comprising a three-dimensional visual display configured to display the three-dimensional scene according to scaled pixel depth values corresponding to the one or more virtual objects.
14. The apparatus of claim 13, wherein the three-dimensional display is a stereoscopic display and the two or more views comprise a left eye view and a right eye view of the three-dimensional scene.
15. The apparatus of claim 13, wherein the three-dimensional display is an autostereoscopic display and the two or more views comprise two or more interleaved views of the three-dimensional scene from slightly different viewing angles.
16. The apparatus of claim 12, wherein the initial depth scaling is performed during rasterization of the two-dimensional image.
17. The apparatus of claim 16, wherein one or more of b), c), d), e), and f) are performed before or during g).
18. A non-transitory computer-readable storage medium having computer-readable program code embodied in the medium for scaling one or more pixel depth values of a user-controlled virtual object in a three-dimensional scene, the computer-readable storage medium having computer-readable instructions embodied therein that, when executed, implement a method comprising:
a) performing an initial depth scaling of a two-dimensional image of the three-dimensional scene;
b) determining a minimum threshold value of the three-dimensional scene;
c) determining a maximum threshold value of the three-dimensional scene;
wherein the maximum threshold value or the minimum threshold value is determined from a target re-projected into a three-dimensional scene based on pixel depth data in an original image to drive an average depth of the scene towards the target;
d) comparing each pixel depth value of the user-controlled virtual object to the minimum threshold value and the maximum threshold value;
e) setting each pixel depth value of the user-controlled virtual object that falls below the minimum threshold value to a corresponding low value;
f) setting each pixel depth value of the user-controlled virtual object that exceeds the maximum threshold value to a corresponding high value, wherein low and high pixel depth values are set to the respective minimum and maximum threshold values for virtual objects that do not require enhanced perception of depth;
g) performing a re-projection of the two-dimensional image using the resulting set of pixel depth values of the user-controlled virtual object to generate two or more views of the three-dimensional scene; and
h) displaying the two or more views on a three-dimensional display.
19. The computer-readable storage medium of claim 18, wherein the three-dimensional display is a stereoscopic display and the two or more views comprise a left-eye view and a right-eye view of the three-dimensional scene.
20. The computer-readable storage medium of claim 18, wherein the three-dimensional display is an autostereoscopic display and the two or more views comprise two or more interleaved views of the three-dimensional scene from slightly different viewing angles.
21. The computer-readable storage medium of claim 18, wherein the initial depth scaling is performed during rasterization of the two-dimensional image.
22. The computer-readable storage medium of claim 21, wherein one or more of b), c), d), e), and f) are performed before or during g).
CN201610191451.7A 2011-01-07 2011-12-02 Scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene Active CN105894567B (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US12/986,854 US8619094B2 (en) 2011-01-07 2011-01-07 Morphological anti-aliasing (MLAA) of a re-projection of a two-dimensional image
US12/986,814 2011-01-07
US12/986,872 US9183670B2 (en) 2011-01-07 2011-01-07 Multi-sample resolving of re-projection of two-dimensional image
US12/986,827 US8514225B2 (en) 2011-01-07 2011-01-07 Scaling pixel depth values of user-controlled virtual object in three-dimensional scene
US12/986,827 2011-01-07
US12/986,872 2011-01-07
US12/986,814 US9041774B2 (en) 2011-01-07 2011-01-07 Dynamic adjustment of predetermined three-dimensional video settings based on scene content
US12/986,854 2011-01-07
CN201180064484.0A CN103329165B (en) 2011-01-07 2011-12-02 The pixel depth value of the virtual objects that the user in scaling three-dimensional scenic controls

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201180064484.0A Division CN103329165B (en) 2011-01-07 2011-12-02 The pixel depth value of the virtual objects that the user in scaling three-dimensional scenic controls

Publications (2)

Publication Number Publication Date
CN105894567A CN105894567A (en) 2016-08-24
CN105894567B true CN105894567B (en) 2020-06-30

Family

ID=46457655

Family Applications (7)

Application Number Title Priority Date Filing Date
CN201180064484.0A Active CN103329165B (en) 2011-01-07 2011-12-02 The pixel depth value of the virtual objects that the user in scaling three-dimensional scenic controls
CN201180063813.XA Active CN103348360B (en) 2011-01-07 2011-12-02 The morphology anti aliasing (MLAA) of the reprojection of two dimensional image
CN201180063836.0A Active CN103283241B (en) 2011-01-07 2011-12-02 The multisample of the reprojection of two dimensional image is resolved
CN201610095198.5A Active CN105898273B (en) 2011-01-07 2011-12-02 The multisample parsing of the reprojection of two dimensional image
CN201180063720.7A Active CN103947198B (en) 2011-01-07 2011-12-02 Dynamic adjustment of predetermined three-dimensional video settings based on scene content
CN201610191451.7A Active CN105894567B (en) 2011-01-07 2011-12-02 Scaling pixel depth values of user-controlled virtual objects in a three-dimensional scene
CN201610191875.3A Active CN105959664B (en) 2011-01-07 2011-12-02 The dynamic adjustment of predetermined three-dimensional video setting based on scene content

Family Applications Before (5)

Application Number Title Priority Date Filing Date
CN201180064484.0A Active CN103329165B (en) 2011-01-07 2011-12-02 The pixel depth value of the virtual objects that the user in scaling three-dimensional scenic controls
CN201180063813.XA Active CN103348360B (en) 2011-01-07 2011-12-02 The morphology anti aliasing (MLAA) of the reprojection of two dimensional image
CN201180063836.0A Active CN103283241B (en) 2011-01-07 2011-12-02 The multisample of the reprojection of two dimensional image is resolved
CN201610095198.5A Active CN105898273B (en) 2011-01-07 2011-12-02 The multisample parsing of the reprojection of two dimensional image
CN201180063720.7A Active CN103947198B (en) 2011-01-07 2011-12-02 Dynamic adjustment of predetermined three-dimensional video settings based on scene content

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610191875.3A Active CN105959664B (en) 2011-01-07 2011-12-02 The dynamic adjustment of predetermined three-dimensional video setting based on scene content

Country Status (5)

Country Link
KR (2) KR101851180B1 (en)
CN (7) CN103329165B (en)
BR (2) BR112013016887B1 (en)
RU (2) RU2562759C2 (en)
WO (4) WO2012094077A1 (en)

Also Published As

Publication number Publication date
WO2012094076A9 (en) 2013-07-25
CN103329165A (en) 2013-09-25
BR112013017321A2 (en) 2019-09-24
KR101741468B1 (en) 2017-05-30
CN103329165B (en) 2016-08-24
CN103947198B (en) 2017-02-15
CN105959664A (en) 2016-09-21
WO2012094077A1 (en) 2012-07-12
KR101851180B1 (en) 2018-04-24
CN105898273A (en) 2016-08-24
WO2012094074A2 (en) 2012-07-12
CN105898273B (en) 2018-04-10
KR20130132922A (en) 2013-12-05
KR20140004115A (en) 2014-01-10
RU2013129687A (en) 2015-02-20
RU2573737C2 (en) 2016-01-27
CN103348360B (en) 2017-06-20
BR112013016887B1 (en) 2021-12-14
BR112013016887A2 (en) 2020-06-30
CN103348360A (en) 2013-10-09
RU2013136687A (en) 2015-02-20
CN105959664B (en) 2018-10-30
WO2012094074A3 (en) 2014-04-10
WO2012094075A1 (en) 2012-07-12
WO2012094076A1 (en) 2012-07-12
CN103283241B (en) 2016-03-16
CN103283241A (en) 2013-09-04
CN103947198A (en) 2014-07-23
CN105894567A (en) 2016-08-24
RU2562759C2 (en) 2015-09-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant