WO2002041069A1

WO2002041069A1 - Method for visually representing and interactively controlling virtual objects on an output visual field

Info

Publication number: WO2002041069A1
Application number: PCT/DE2001/004267
Authority: WO
Inventors: Henry Feil
Original assignee: Siemens Aktiengesellschaft
Priority date: 2000-11-14
Filing date: 2001-11-14
Publication date: 2002-05-23
Also published as: DE10056291A1

Abstract

The invention relates to a method for graphically visualizing objects by means of which the objects represented on at least one visual field (203) are comfortably and reliably manipulated, controlled or influenced, with regard to their features and/or actions, by interactive control commands of a user. Technologies used for inputting the control commands issued by the user can, in addition to standard manually operable mechanical or touch-sensitive input mechanisms such as a keyboard, a mouse, a trackball, a joystick, a graphics tablet and stylus, tactile displays, etc., involve the use of devices (501, 502 and 503) for recording, recognizing, interpreting and processing acoustic and/or optical signals of a user. The user is thus no longer dependent upon the presence of additional hardware devices for manually inputting control commands. The evaluation of the input information can instead or additionally ensue by using methods involving signal or pattern recognition. This inventive step enables a conversion of a simple output visual field, with which the user is not provided with any ability to control, into an interactively operable input and output visual field (203). The type of input method can be individually adapted to the abilities of the user.

Description

Process for the visual representation and interactive control of virtual objects on an output field of view

A. Description of the general problem

In the context of communication between man and machine, problems often arise from the insufficient adaptation of the machine to the way the human user takes in, processes and outputs information. On the one hand, this mismatch can lead to a flood of information that the user can no longer handle, especially if several tasks have to be completed. On the other hand, if the user is under-challenged, for example in a highly automated system in which humans only have a control function, the monotony of the work situation leads to a drop in performance, and incidents can no longer be brought under control because of a lack of practice in such situations. The lack of consideration of the knowledge and training status of the user of a machine should also be mentioned here. Human behavior, for example in the selection, evaluation and linking of information, in decision-making, in problem solving and in the planning and execution of actions, is insufficiently taken into account and supported in the design of technical systems.

In order to adapt technical systems to humans, prior knowledge of their properties, behavior patterns, skills and level of knowledge is necessary. In connection with human-machine communication, the sensory, cognitive and motor properties of humans are of interest. On the side of the sensory properties of humans, which are given by the sensory channels, conventional machines and devices for outputting information essentially address the following channels:

- the visual channel (eyes) through optical signals,

- the auditory channel (ears) through acoustic signals, and

- the tactile channel (sense of touch) through haptic signals.

After the signals have been processed in the brain (cognition), the following channels are essentially available on the side of the motor properties of humans, which are given by the output channels:

- arm, hand and finger or leg and foot motor skills as well as body, head, eye or mouth movements, i.e. physical movements, gestures and facial expressions, for mechanical or optical signals,

- speech motor skills for acoustic signals.

Signals can be entered into an information system via these channels in order to trigger a desired action by the system.

B. Known solution to the general problem according to the current state of the art

The aim of developing suitable interfaces between man and machine is to start from the properties of human communication channels and skills in order to provide devices, interaction techniques and interfaces that ensure effective mutual communication via these channels. So-called "virtual realities" (VR) are particularly suitable for achieving this goal. The term "virtual reality" (VR) means the computer-based generation of an intuitively perceptible or tangible scene, consisting of its graphic representation and the interaction options for the user. A virtual environment enables a user to access information that would otherwise not be available at the given place or time. It relies on natural aspects of human perception by using visual information in three spatial dimensions. This information can, for example, be changed in a targeted manner or enriched with additional sensory stimuli. The essential prerequisites are the control of the perspective in real time and the possibility for the user of the system to actively influence the depicted scene.

When navigating through virtual environments, the user can use the type of control that is natural for him. This can include, for example, appropriate arm or leg movements, positioning of the head or eyes, turning the body or walking towards an object. By using the user's existing skills for control, the cognitive load during the interaction between man and machine can be reduced. This increases the bandwidth of communication between man and machine and improves the usability of the machine. While conventional forms of human-machine communication control the machine in a command-oriented manner, no specific commands have to be learned and used when controlling objects in virtual environments: the computer "observes" the user passively and reacts appropriately, under real-time conditions, on the basis of the user's eye, head and/or hand movements etc.

In the case of commercial VR applications, a distinction is made between systems in which the user is fully integrated in the virtual environment ("Immersion") and systems that only present a "window" to virtual reality. In addition to the well-known forms of human-machine communication such as

• direct manipulation of objects through manual fine motor operations (pointing, touching, gripping, moving, holding, etc.),

• formal interaction languages (programming languages, command languages and formal query languages),

• natural language interaction,

• gestural interaction using non-verbal symbolic commands (facial expressions, gestures, movements) and

• hybrid task-oriented forms of interaction

one can also understand virtual realities as a new form of human-machine communication. As the name "virtual reality" already suggests, this requires a certain degree of realism in the presentation: the user should be provided with the sensory information that is required to process a task or to achieve a goal.

Visual perception not only provides information about the position, movement, shape, structure, contour, texture, color or pattern of objects etc., but also information about the relative body position of the viewer and his movements as well as about the nature of the three-dimensional environment. Synthetically generated environments can be made more realistic if as much as possible of the information occurring in natural environments (movement parallax, vanishing points of the perspective representation, spatial depth effect and plasticity, lighting and cast shadows, occlusion, gloss effects, reflection effects and diffuse reflection etc.) is simulated. How much and what information should be presented depends on the respective task. The differences between the real and the virtual world determine how realistic the simulation is perceived to be.

The visual information has to be simulated by a computer in order to realize virtual realities. Similar aspects are relevant as in painting. The computer-aided simulation of three-dimensional worlds usually simulates the projection of individual light beams. The starting point of such a simulation is the specification of the environment to be simulated. To do this, the individual objects with their properties and their location must be defined. The intensities of individual pixels are then calculated for visualization and projected onto the output medium.
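The steps named above (specifying the objects and their location, following individual light beams, calculating pixel intensities) can be illustrated with a minimal ray-casting sketch. It is only an illustration under assumptions that do not come from this document: the scene consists of spheres, the viewer sits at the origin and looks along the negative z axis, and brightness follows a simple Lambertian rule for one directional light. All function names are hypothetical.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter t, or None if the ray misses.
    `direction` is assumed to have unit length."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0 else None

def render(spheres, light_dir, width=16, height=8):
    """Cast one ray per pixel and return a grid of intensities in [0, 1]."""
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            # Map the pixel centre to a point on an image plane at z = -1.
            x = (col + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (row + 0.5) / height * 2.0
            norm = math.sqrt(x * x + y * y + 1.0)
            d = (x / norm, y / norm, -1.0 / norm)
            intensity, nearest = 0.0, None
            for center, radius in spheres:
                t = intersect_sphere((0.0, 0.0, 0.0), d, center, radius)
                if t is not None and (nearest is None or t < nearest):
                    nearest = t
                    hit = [t * di for di in d]
                    n = [(h - c) / radius for h, c in zip(hit, center)]
                    # Lambertian term: brightness depends on the angle to the light.
                    intensity = max(0.0, -sum(ni * li for ni, li in zip(n, light_dir)))
            line.append(intensity)
        image.append(line)
    return image

scene = [((0.0, 0.0, -3.0), 1.0)]  # one sphere in front of the viewer
img = render(scene, light_dir=(0.0, 0.0, -1.0))
```

Pixels whose ray misses every object keep the intensity 0, while pixels near the centre of the sphere are close to full brightness. A real VR renderer would add perspective control in real time, shadows and the other depth cues listed earlier.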

On the one hand, completely new ways of learning and practicing can be realized with the help of these simulations (examples: vehicle or aircraft simulators); on the other hand, a simulation always abstracts from certain aspects of the real world. VR applications therefore simultaneously enrich and limit the user's experience.

VR systems basically consist of sensors and actuators and their coupling. Important hardware components include the following:

• "Displays" to present the virtual environment. For visual presentation, mainly monitors, "Head Mounted Displays" (HMD), "Binocular Omni-Oriented Monitors" (BOOM) and projection systems are used today; however, auditory or tactile displays, which react to acoustic or manual user input, are also used.

• Positioning and orientation systems to record the location and perspective of the user. A distinction is made between the determination of the absolute position ("position tracking") and the measurement of the flexion of joints ("angle measurement"). Electromagnetic, kinematic, acoustic, optical and image processing procedures are used.

• Interaction and manipulation systems that let the user act and react in the virtual environment. Pointing devices (2D or 3D mice, trackballs, joysticks etc.) or tactile devices (touchscreen, electromagnetic graphics tablet and stylus etc.) are used for this; so-called "data gloves" with flexion and pressure sensors are also increasingly being used. Voice control should also be mentioned in this context.

• Calculation systems and software for creating the virtual environment under real-time requirements.

• Networks for the integration of different users, through which new forms of cooperation can develop.

The various technical variants of helmet- or head-based systems for the visualization of virtual realities are collectively referred to as "Visually Coupled Systems" (VCS). They consist of the following important components:

1. a display attached to the head or helmet,

2. a device for determining the user's head and/or gaze movements,

3. a source of visual information that depends on the user's head and/or gaze direction.

When such a system is used for VR applications, information from both the real and the virtual environment can be presented simultaneously. In this case one speaks of "see-through displays" for the representation of enriched realities.

Suitable visual components (lenses, semi-transparent mirrors) of high quality are required for visualization, which enable a sharply focused, enlarged image of the image source. Various systems come into consideration as image sources; however, cathode ray tubes or LCD screens are most commonly used. High resolution and luminance, high color saturation and high contrast as well as small dimensions of the image source are desirable. Two such image sources are required to visualize three-dimensional objects.

Tracking head movements is an important part of VR applications. Usually the position and orientation of the head in space are determined; advanced systems can also follow the direction of the gaze. Most systems use ultrasound, magnetic or light energy to communicate between the head-mounted transmitters and the receivers. Important technical data that play a role in the selection of these systems are:

• the number of degrees of freedom for the directions of movement which can be registered and tracked,

• the detectable angular range,

• the static accuracy (vibration sensitivity),

• the resolving power,

• the reliability,

• the data throughput and the screen sampling frequency,

• the interface to the computer as well as

• further performance aspects.

VR applications can be successfully used in practice in a number of different areas. Some possible applications are outlined below as examples.

• Development of virtual prototypes, for example in the automotive industry: Because of the increasing complexity of the components to be developed and the ever shorter development cycles, it makes sense to carry out planning and design with the support of computers. A higher innovation rate, shorter development cycles and better coordination of the knowledge available in a team can be achieved if several people plan and design the product in parallel in a virtual environment. By using virtual realities, even complex three-dimensional structures can be easily specified. The usability of virtual prototypes can then be evaluated by different users.

• Use in the training area: By learning to deal with (virtual) objects, interactive demonstrations, visualization of abstract concepts, virtual training of behavior in dangerous situations and virtual exploration of distant places or epochs, knowledge can be imparted, creative skills can be developed and behavior patterns can be trained.

• Driving and flight training in appropriate simulators: By using simulators, behavior can be trained, especially in emergency situations.

The technologies available today for entering information into a data processing system can be divided into four groups according to the sensors used: mechanical (e.g. keyboards, mice, trackballs and joysticks), electrical (e.g. tactile displays and graphics tablets), optical (e.g. light pens) and acoustic (e.g. voice input and speech interpretation systems). In the following, the input aids that are used according to the current state of the art for controlling objects in VR applications are briefly discussed.

Keyboards, mice, trackballs and joysticks need a place to rest, i.e. a fixed position. With a touchscreen, on the other hand, it is possible to point directly at objects depicted on the screen without the need for space-consuming additional devices on the desk. Low-resolution touchscreens have 10 to 50 positions in the horizontal and vertical directions and use a horizontal and a vertical row of infrared LEDs and photo sensors to build up a grid of invisible light rays right in front of the screen. When the screen is touched, both vertical and horizontal light beams are interrupted. The current finger position can be determined from this information.
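How a finger position follows from the interrupted beams of such an infrared grid can be sketched as follows. This is a minimal illustration, assuming that the photo sensors report the indices of the interrupted beams and that the centre of the interrupted beams is taken as the position; the function name and data layout are hypothetical.

```python
def finger_position(blocked_columns, blocked_rows):
    """Estimate the touched grid position from the interrupted light beams.

    blocked_columns: indices of the interrupted vertical beams (x direction)
    blocked_rows:    indices of the interrupted horizontal beams (y direction)
    Returns (x, y) as the centre of the interrupted beams, or None if no touch.
    """
    if not blocked_columns or not blocked_rows:
        return None  # a touch interrupts beams in both directions
    x = sum(blocked_columns) / len(blocked_columns)
    y = sum(blocked_rows) / len(blocked_rows)
    return (x, y)

# A finger that covers vertical beams 11 and 12 and horizontal beam 4:
position = finger_position([11, 12], [4])
```

Averaging the interrupted beams gives a position between two grid lines when the finger covers more than one beam, which is one simple way to get slightly more than the raw grid resolution.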

Another known embodiment of touch-sensitive information input devices is the capacitively coupled touch panel. This provides a resolution of approximately 100 positions in each direction. If a user touches the conductively coated glass plate of the touchscreen with a finger, the current finger position can be determined from the change in impedance. Other high-resolution panels use two minimally spaced, transparent layers. One is coated with a conductive material, the other with a resistive material. These two layers touch under the pressure of the finger, and the current finger position can then be determined by measuring the resulting voltage drop. A lower-resolution and cheaper variant of this technology uses a grid of fine wires instead of these layers.
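The voltage measurement of the two-layer panel can be illustrated with a short sketch. It assumes, beyond what the text states, that the resistive layer behaves as a linear voltage divider along one axis, so that the measured voltage is proportional to the touch position; the function name and the parameters are hypothetical.

```python
def touch_coordinate(v_measured, v_supply, positions):
    """Map a measured voltage to one of `positions` discrete positions on an axis,
    assuming the resistive layer acts as a linear voltage divider."""
    if not 0.0 <= v_measured <= v_supply:
        raise ValueError("measured voltage outside the supply range")
    fraction = v_measured / v_supply
    # Clamp so that the full supply voltage maps to the last valid position.
    return min(int(fraction * positions), positions - 1)

# Half of the supply voltage corresponds to the middle of the axis:
middle = touch_coordinate(2.5, 5.0, 100)
```

For a two-dimensional position the same measurement would be repeated for the second axis.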

C. Inadequacies, effects and disadvantages of the known solution

The systems currently used to display and control objects in virtual environments increasingly take into account the ability of humans to record and process information, but they have one major disadvantage: when entering control commands for directly influencing the displayed scene, the user still relies on conventional methods for manually entering information, such as a mouse, trackball, joystick, graphics tablet with stylus or touchscreen. The input mechanisms required for this must first be learned by the user before they can be carried out at an appropriate reaction speed. By contrast, the innate or already learned human skills for communication by means of acoustic signals (e.g. speech) or optical signals (e.g. facial expressions, gestures and movements) are only insufficiently taken into account when entering information for controlling objects.

Manipulating the properties and influencing the actions of objects in a depicted scene requires a complex interplay of sensors, cognitive processing and motor skills, which are influenced by many factors (individual behavior patterns and skills, experience, environmental influences, etc.). Interactions in a virtual world pose additional difficulties. For the control, manipulation or influencing of objects, a reflex-like or cognitive sensor-motor-related feedback is particularly important, for example that of receptors in the skin, kinesthetic sensations, the sense of balance as well as visual and / or acoustic sensations. In many cases this results in redundancy, which is not always the case with VR applications. Due to the insufficient sensory feedback in VR applications, learning motor skills is also made more difficult.

D. Special task to be solved by the invention

An ideal medium for communication between a user and an information system should be tailored to the sensory, perceptual and motor skills as well as to the specific properties of the human user. The information should be structured in such a way that an optimal match between the representation of the information output and the mental model of the user is achieved: if the information to be displayed is presented in such a way that, for example, the user's spatial perception is addressed, the user can deal with amazingly complex amounts of information per unit of time. Likewise, the information system should be able to record, understand and process as many types of information sent by a user as possible and convert them into corresponding actions. This has the advantage that the user can react more efficiently and quickly to new events and situations. Ease of use and appropriateness to the task are therefore typical features of an ideal medium. These characteristics can be expressed as follows:

• correspondence between the type, scope, output speed and presentation of the information output and the sensory properties of the human user,

• consideration of all information channels of the user when recording, recognizing and interpreting received control signals of the user,

• easy to learn and intuitive operability of the medium,

• high bandwidth of information transfer to the brain and high throughput of information,

• dynamic adaptation of the application to the individual properties, skills, tasks, work and organization techniques of the user,

• use of a natural interaction language with high semantic content,

• reliability, robustness and maintainability of the medium,

• social acceptance of the medium in the population,

• consideration of health, ergonomic and safety-related aspects etc.

The object of the present invention is to improve the existing situation using technical means. The invention is therefore primarily devoted to the task of providing comfortable and reliable methods by means of which the user is able to actively control virtual objects, the existing skills of the user being used to transmit information. This object is achieved by a method with features according to claim 1.

E. Solution according to the invention for the special task according to the claims and advantages of this solution

Regarding claim 1:

The user should be enabled, by means of suitable interaction options, to manipulate or influence the properties and/or actions of the objects shown in the field of view 203. For this purpose, additional hardware aids for manual entry of control commands (e.g. via keyboard, mouse, trackball, joystick or touchscreen) are not absolutely necessary. The present invention therefore uses methods of signal or pattern recognition for detecting and processing the control commands transmitted by the user in the form of signals. Only through this inventive step does a pure output field of view become an interactively operable input and output field of view 203 that is adapted to the capabilities of the user. The properties of objects of a depicted scene 204 that can be manipulated by the user can be, for example, features such as location vector, position, dimensions, viewing perspective, geometric shape, structure, contour or texture of the objects. Interaction options are also conceivable with which the type of representation of the objects is changed, such as color or pattern, brightness, contrast against the background, gloss effects, reflections, shading, etc. The actions of objects of the depicted scene can be, for example, movements of the objects. The movement of the objects can include translation and/or rotation components, and the kinematic data of the objects, such as the magnitude and/or direction of their velocity or acceleration vector or their angular velocity or angular acceleration vector, can be changed individually according to the control commands of the user.
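How control commands might change the kinematic data of an object can be sketched as follows. This is only an illustration under assumptions that are not part of the claims: rotation is reduced to a single axis, time is advanced in discrete steps, and the command names "faster", "stop" and "spin" as well as the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualObject:
    position: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    velocity: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    acceleration: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    angle: float = 0.0                 # rotation about one axis, in radians
    angular_velocity: float = 0.0
    angular_acceleration: float = 0.0

    def step(self, dt):
        """Advance translation and rotation by one time step of length dt."""
        for i in range(3):
            self.velocity[i] += self.acceleration[i] * dt
            self.position[i] += self.velocity[i] * dt
        self.angular_velocity += self.angular_acceleration * dt
        self.angle += self.angular_velocity * dt

def apply_command(obj, command):
    """Change the kinematic data according to a recognized control command."""
    if command == "faster":
        obj.velocity = [v * 2.0 for v in obj.velocity]
    elif command == "stop":
        obj.velocity = [0.0, 0.0, 0.0]
        obj.angular_velocity = 0.0
    elif command == "spin":
        obj.angular_velocity += 1.0

obj = VirtualObject(velocity=[1.0, 0.0, 0.0])
apply_command(obj, "faster")   # doubles the magnitude of the velocity vector
obj.step(0.5)                  # the object moves 1.0 unit along x
```

The separation between recognizing a command and applying it to the object keeps the kinematic model independent of the input channel (speech, gesture or gaze).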

Regarding claim 2:

The interactive control commands of the user can be registered with the aid of sensors 502 and/or recording devices 501. After the data acquisition, the acquired input data can be fed to an evaluation and control device 503, where they can be interpreted and processed as control commands. In order to keep the possibility of misinterpretation of signals transmitted by a user as low as possible and thus to increase the reliability of the system, the system must be adapted to the usual properties of the user. For this reason, the system must be repeatedly "taught" all possible input signals of an individual command set of the user concerned in a training phase. According to the signals interpreted by the system as control commands, it is possible to manipulate or influence the objects of a depicted scene and/or to trigger an action. The objects manipulated or influenced in this way and/or the actions triggered thereby can be graphically visualized or acoustically and/or optically displayed to the user on a field of view 203. This provides a feedback loop between output and input data, with the aid of which the reaction to changes in objects of a depicted scene 204 can be made possible in almost real time.
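The training phase and the later interpretation of input signals can be illustrated with a very small pattern-recognition sketch. It assumes that each input signal has already been reduced to a numeric feature vector and uses a nearest-centroid rule with a rejection threshold; this is one simple technique chosen for illustration, not the method prescribed by the claims, and all names are hypothetical.

```python
import math

class CommandRecognizer:
    def __init__(self, reject_distance):
        self.reject_distance = reject_distance
        self.samples = {}  # command name -> list of taught feature vectors

    def teach(self, command, features):
        """Training phase: store one more example of the user's signal."""
        self.samples.setdefault(command, []).append(list(features))

    @staticmethod
    def _centroid(vectors):
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    def recognize(self, features):
        """Return the closest taught command, or None if nothing is close enough."""
        best, best_dist = None, math.inf
        for command, vectors in self.samples.items():
            dist = math.dist(features, self._centroid(vectors))
            if dist < best_dist:
                best, best_dist = command, dist
        return best if best_dist <= self.reject_distance else None

recognizer = CommandRecognizer(reject_distance=0.5)
# Each command is taught repeatedly, as in the training phase described above.
recognizer.teach("rotate", [1.0, 0.0])
recognizer.teach("rotate", [0.9, 0.1])
recognizer.teach("move", [0.0, 1.0])
recognizer.teach("move", [0.1, 0.9])

command = recognizer.recognize([0.95, 0.05])   # close to the "rotate" examples
unknown = recognizer.recognize([5.0, 5.0])     # far from everything: rejected
```

Rejecting signals that are far from every taught command is one way to lower the risk of misinterpretation; the recognized command would then be applied to the scene and the result displayed, which closes the feedback loop.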

Regarding claim 3:

The objects that can be controlled in the context of this method can, for example, be real existing objects in a real environment. A typical application example is the radio remote control of a robot to carry out dangerous work in places that are difficult to access and/or distant, for example to carry out repair work in sewer pipes, to investigate radioactive substances in hermetically sealed high-security rooms or to collect and transmit data on unmanned space missions. Another typical application example is the remote control of work and/or household appliances using non-manual input procedures for physically and/or manually disabled people. The type of input procedure can be tailored to the individual abilities of the user.

Regarding claim 4:

However, exemplary embodiments of this invention are also conceivable in which the objects to be controlled do not really exist but are objects in a virtual environment of a computer-controlled model. The use of systems for interactive control of virtual objects makes sense especially in the training area: by learning how to deal with virtual objects, interactive demonstrations, visualizing abstract concepts, virtual training of behavior in dangerous situations and virtual exploration of distant places or epochs, knowledge can be imparted, creative skills can be developed and behavioral patterns can be trained. Typical areas of application are, for example, flight simulators, with the help of which critical situations in air traffic (engine failures, stalling, emergency landings on water and on land, etc.) can be practiced by trained pilots and learning progress can be assessed quantitatively.

Regarding claim 5:

A natural form of human-machine interaction and of the input of information into an information processing system is natural language communication. For this reason, this form of input lends itself to exemplary embodiments of the present invention. However, the speech input systems used hitherto and in the foreseeable future make only limited use of natural language communication, in that only the words of a basic vocabulary whose scope is fixed by the manufacturer plus a user-specific technical vocabulary (usually a few hundred to a thousand words), or combinations of these words, are allowed. In order to increase the recognition rate, a user has to speak the basic vocabulary as well as his user-specific technical vocabulary repeatedly in a training phase so that the system can adjust to the user's voice and to any deviations from the normal state of the voice, for example as a result of a cold or hoarseness. In one embodiment of speech recognition systems, statistical models (usually "hidden Markov models") are generated by the system during this training phase, with the aid of which, for example, the sequence of individual phonemes or syllables of the words being trained is assigned probability density functions, so that newly spoken words are identified as correct or incorrect with a predictable probability.
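How a hidden Markov model assigns a probability to a spoken sequence can be illustrated with the forward algorithm. The sketch deviates from the description in one labeled respect: it uses discrete symbol probabilities instead of probability density functions, and the two-state model with its numbers is invented purely for illustration.

```python
def forward_likelihood(observations, start, trans, emit):
    """Probability of an observation sequence under a discrete hidden Markov model.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n = len(start)
    # Initialisation with the first observed symbol.
    alpha = [start[i] * emit[i][observations[0]] for i in range(n)]
    # Recursion over the remaining symbols.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
            for j in range(n)
        ]
    return sum(alpha)

# Hypothetical left-to-right model of one word with two states;
# symbols 0 and 1 stand for two classes of sounds.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[0.9, 0.1],
        [0.2, 0.8]]

matching = forward_likelihood([0, 1], start, trans, emit)     # sounds in the trained order
mismatching = forward_likelihood([1, 0], start, trans, emit)  # sounds in reversed order
```

A recognizer built on this idea would evaluate the sequence against the model of every word in the vocabulary and pick the word with the highest likelihood, or reject the input if all likelihoods are too low.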

Regarding claim 6:

The device 503 for recognizing and interpreting the control commands of the user can be, for example, an automatic recognition system for acoustic signals, especially for speech signals. Most of the automatic speech recognition systems used today are, for example, word-, syllable- or phoneme-based systems that are only designed for context-independent speech recognition. This means that the voice commands, consisting of words from a trained vocabulary, currently have to be spoken to the system with sufficiently long pauses ("discretely") in order to achieve acceptable word recognition rates in the range of approx. 90% to 95% for the user. Commercial polyphone-based prototypes of context-dependent speech recognition systems for fluently ("continuously") spoken language already achieve word recognition rates of approx. 95% to 98%. However, the computing effort required for continuous context-dependent speech recognition is so great that it can no longer be carried out in real time on current high-performance computers.

Regarding claim 7:

Another form of human-machine communication that accommodates the properties and skills of the user is the evaluation of optical signals, such as body, head, face, leg, foot, arm, hand and/or finger movements, i.e. gestures and/or facial expressions. The widespread ability of humans to encode information through the position, spatial direction and movement of body parts still poses a major challenge for the design of input methods for human-machine communication.

Regarding claim 8:

Sensors 202 and/or recording devices 201 can be used to receive optical signals from the user. Specifically, these can be optical or acoustic distance sensors and/or video cameras. The distance sensor can have an ultrasound or high-frequency radiation source as well as a detector and devices that focus the sound or radiation on the measurement object and collect the sound or light waves reflected by the measurement object on the detector. Part of the ambient sound or the ambient radiation can be blocked with the help of filters. So that the sensor system functions reliably regardless of the ambient conditions, compression, coding and modulation of the signals emitted by the signal source and suitable signal processing on the detector side for demodulation, decoding and decompression of the received signals can be provided.

Regarding claim 9:

The device 503 for interpreting interactive control commands from a user can also be, for example, an automatic detection system for movement signals, in which methods of image processing can be used. One possibility, for example, is to analyze the user's lip movements when speaking in order to increase the recognition reliability of the speech input. Recent studies have shown that the combination of acoustic and optical speech recognition can reduce the word error rate for individual speakers by 30% to 50% compared to purely acoustic speech recognition. Other embodiments can use head and/or eye movements to input information. A measurement of the head position and/or line of sight is required for this. In an input method known since 1982, which is based on a measurement of the user's viewing direction by contactless detection of the head movement, the head position is visually fed back by means of a visor attached in front of an eye. A small light source is also attached to the head, the position of which is measured with the help of a video camera, so that head movements around two axes of rotation (horizontal and vertical) can be recorded. Another input method, based on measuring eye movements, has been known since 1987. It assumes that the selection of an object displayed on an optical display is one of the most common input operations and that the visual fixation of an object to be selected is normal human behavior. This type of input is particularly recommended when high demands are placed on the speed of input or when the hands are disabled or needed for other tasks.
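Selecting a displayed object by visually fixating it can be sketched as follows. The sketch assumes, as an illustration only, that gaze points arrive as display coordinates, that objects are described by rectangular bounding boxes and that a selection is triggered after a fixed number of consecutive gaze samples on the same object; this dwell criterion and all names are hypothetical and are not taken from the input methods cited above.

```python
def select_by_fixation(gaze_samples, objects, dwell_samples):
    """Return the first object fixated for `dwell_samples` consecutive samples.

    gaze_samples: iterable of (x, y) gaze points on the display
    objects:      dict name -> (x_min, y_min, x_max, y_max) bounding box
    """
    current, count = None, 0
    for x, y in gaze_samples:
        hit = None
        for name, (x0, y0, x1, y1) in objects.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                hit = name
                break
        if hit is not None and hit == current:
            count += 1          # the gaze stays on the same object
        else:
            current, count = hit, (1 if hit is not None else 0)
        if current is not None and count >= dwell_samples:
            return current
    return None

objects = {"door": (0, 0, 10, 10), "lamp": (20, 0, 30, 10)}
gaze = [(5, 5), (25, 5), (25, 6), (24, 5)]
selected = select_by_fixation(gaze, objects, dwell_samples=3)
```

Requiring several consecutive samples is one way to keep a passing glance from being treated as a selection.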

Regarding claim 10:

The output device 505 for displaying a virtual environment can be, for example, a device for generating a so-called "Virtual Retinal Display" (VRD), in which a virtual image area 203 is projected onto the retina 206 of the user. Instead of a screen or a display device, only a coherent light source emitting photon radiation is required. This can be a device from the work or leisure area or an orthosis that brings about an expansion of human eyesight. Compared to real displays, a VRD has the following advantages:

1. The resolution of the VRD is limited only by the diffraction and optical aberration of the light beam in the human eye, but not by the size of an elementary picture element (pixel size) that is technically feasible for screens or real displays. For this reason, images can be generated with a very high resolution.

2. The image brightness that can be achieved with the help of a VRD can be controlled by the intensity of the emitted light beam. When a laser is used as the light source, the image brightness of the VRD can be set high enough for outdoor use.

3. VRDs can be operated either in a mode for virtual realities or in a mode for enriched realities ("see-through mode").

4. The industrial production of a light source for generating a VRD is relatively simple and can be carried out at low production costs in comparison with conventional screens and display devices.

5. Since a large part of the generated light is focused on the retina of the viewer, VRDs work with a high degree of efficiency and have a low power consumption in comparison with screens and real displays.

Regarding claim 11:

The virtual image surface 401 can have, for example, virtual input-sensitive reference points 404, 405, 406 and 407 and/or surfaces 403 in a virtual image plane of a predetermined spatial direction. These can be flat surfaces with a given area. The spatial direction of the virtual image plane can preferably be oriented perpendicular to the viewing direction, that is, such that the normal vector n of the virtual image plane is parallel or antiparallel to the viewing direction vector b. Inputs are triggered when a real object (an object or a body part of the user) touches the plane of the virtual display on such an input-sensitive surface 205 or intersects the surface at any entry angle. With the help of methods of pattern recognition, it can be possible to distinguish an intended input from an unintentional input. For example, the index finger of a user's right hand could be recognized as the only input medium. In this case, interaction is only permitted if this finger intersects or touches an input-sensitive surface in a virtual image plane.
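The trigger condition for an input-sensitive surface can be illustrated geometrically. The sketch below makes several assumptions that are not stated in the claims: the fingertip is tracked as two successive 3D positions, the viewing direction b is the negative z axis so that the image plane is z = plane_z with normal n = (0, 0, 1), and the input-sensitive surfaces are axis-aligned rectangles in that plane. All names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def crossing_point(p_prev, p_now, plane_point, normal):
    """Point where the path from p_prev to p_now meets the plane, or None."""
    d_prev = dot([p - q for p, q in zip(p_prev, plane_point)], normal)
    d_now = dot([p - q for p, q in zip(p_now, plane_point)], normal)
    if d_now == 0:
        return list(p_now)      # the fingertip touches the plane
    if d_prev * d_now > 0:
        return None             # still on the same side of the plane
    s = d_prev / (d_prev - d_now)
    return [p + s * (q - p) for p, q in zip(p_prev, p_now)]

def triggered_surface(p_prev, p_now, plane_z, surfaces):
    """Name of the input-sensitive surface hit by the fingertip, or None.

    surfaces: dict name -> (x_min, y_min, x_max, y_max) in the plane z = plane_z
    """
    hit = crossing_point(p_prev, p_now, (0.0, 0.0, plane_z), (0.0, 0.0, 1.0))
    if hit is None:
        return None
    for name, (x0, y0, x1, y1) in surfaces.items():
        if x0 <= hit[0] <= x1 and y0 <= hit[1] <= y1:
            return name
    return None

surfaces = {"ok": (0.0, 0.0, 2.0, 1.0)}
# The fingertip moves through the plane z = -1 inside the "ok" surface:
pressed = triggered_surface((1.0, 0.5, -0.5), (1.0, 0.5, -1.5), -1.0, surfaces)
```

Because the test uses only the signed distances to the plane, it works for any entry angle; restricting input to a recognized index finger would be a separate pattern-recognition step before this check.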

Regarding claim 12:

The distance between the plumb point of the virtual image plane and the point of sharpest imaging on the retina of one of the eyes 304 of the viewer, or the boundary and spatial orientation of the image surface 303 of the virtual image plane, can be determined by detecting optical signals of the user with the aid of an optical or acoustic distance sensor 502, a video camera 501 and downstream evaluation electronics 503.

Regarding claim 13:

With regard to optical display devices, in addition to conventional analog and digital displays, the screen is of particular importance because the information can be encoded in many different ways (text, graphics, images, animation, etc.). In addition, it offers options for flexible organization of the information, which can support the required search for and selection of individual objects. 3D displays or 3D glasses can be used to display spatial relationships. With all of these optical displays, reading requires not only the user's attention but also movements of the head and/or eyes. To prevent visual distraction of the user, projection devices ("head-up" displays) can be used, for example in aircraft or vehicle guidance, in which the information to be read is superimposed on the viewing area toward which the eye is preferably directed in order to perceive the main task (such as aircraft or vehicle control). A special form is the helmet-based display device, the "head-mounted display" (HMD), in which the display unit is fastened in the helmet of the user. Conventional VR applications use HMDs to present virtual worlds via a stereo display system as well as via surround sound systems or headphones coupled to the HMD. For the three-dimensional virtual environment to appear stable to the viewer even when the head is moving, the head movement must be recorded in all six degrees of freedom and passed on to the display. This must be done with a sufficiently high frequency of at least 70 Hz. A large number of different measuring systems have been used to measure head movement. However, many are either uncomfortable for the viewer or too expensive to procure. In addition, acoustic or haptic output devices can be provided, since low-intensity acoustic and haptic signals do not cause any visual distraction for the user. However, haptic signal transmitters are currently in widespread use only as aids for the blind.

F. Description of the figures

The invention is described in more detail below on the basis of preferred exemplary embodiments as illustrated in FIGS. 1 to 5.

The figures show in detail:

FIG. 1 shows a design sample 101, a schematic diagram 102 and a typical application example 103 of a "Virtual Retinal Display" (VRD) from the user's perspective (source: "Technologies to Watch", May 2000 edition),

FIG. 2 shows a schematic representation of the input method via a VRD,

FIG. 3 shows a schematic illustration for recording the distance of the virtual image plane in a VRD,

FIG. 4 shows an example for the acquisition of the virtual corner coordinates 404, 405, 406 and 407 of the image area 401 of a VRD and

FIG. 5 shows a schematic hardware structure for recording the input and controlling the output of a VRD.

FIG. 1 shows a design sample 101 of a Microvision system for generating a VRD, a basic illustration 102 of the functioning of the VRD and a typical application example 103 of a VRD. The application example shown is a voltage measurement in the motor vehicle sector, in which the serial number, profile, measured values and the measured voltage-time characteristic of the tested component can be displayed to the user by means of a VRD.

FIG. 2 demonstrates the method of input using touch-sensitive buttons 205 for VRDs. The image area 203 of the virtual image plane is shown with the partial areas contained therein for the virtual output area 204 and the virtual buttons or switches 205. If the input areas are touched by a finger of the user 207 or an object or are pierced at any entry angle, an input is recognized and the corresponding action is triggered. The inputs of the user can be recorded using a video camera 201 and / or a distance sensor 202.

FIG. 3 illustrates the method for determining the distance of the virtual image plane from the retina of one of the viewer's eyes 304 in a VRD. The position and the two orientation vectors of the virtual image plane 303 are determined by the position and spatial orientation of the user's extended palm 305. For this purpose, for example, a distance sensor 302 emitting ultrasound or infrared waves, a video camera 301 and a method for signal or pattern recognition can be used. Ideally, the user sets the spatial orientation of the virtual image plane such that the normal vector n of the virtual image plane runs parallel or antiparallel to the viewing direction vector b, that is, the scalar product of the normalized vectors n / ||n||₂ and b / ||b||₂ yields the value +1 or -1. The distance between the virtual plane and the viewer is then the distance between the plumb point, that is, the intersection of the virtual image plane with the line of sight, and the point of sharpest imaging on the retina of one of the viewer's eyes 304.

FIG. 4 shows how the boundary points of a virtual polygonal image area 401 are entered. In the sketched example, a rectangular image area was defined by specifying four corner points. The Cartesian coordinates of the image corner points 404, 405, 406 and 407 can be generated at those spatial positions at which, for example, a finger of a hand 408 of the user or an object touches or intersects the virtual image plane 401. The arrows indicate the movement of the finger from one corner point to the next corner point of the virtual image surface. Via the virtual switches or buttons 403, for example, the functions for switching the image displayed on the virtual output surface 402 on and off, for switching between several information channels, for mode selection for setting system and/or image parameters, and for the archive for storing sound and/or image sequences can be controlled.
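The alignment condition and the distance definition given for FIG. 3 can be made concrete with a short numerical sketch. It is an illustration under stated assumptions, not the patent's implementation: the eye is reduced to a single point standing in for the point of sharpest imaging on the retina, the plane is given by any point on it plus its normal n, and the function names `is_aligned`, `plumb_point` and `viewing_distance` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))  # Euclidean (2-)norm


def unit(a: Sequence[float]) -> Vec:
    length = norm(a)
    return tuple(x / length for x in a)


def is_aligned(n: Sequence[float], b: Sequence[float], tol: float = 1e-6) -> bool:
    """True if n is parallel or antiparallel to b, i.e. the scalar product
    of the normalized vectors is +1 or -1 within the tolerance."""
    return abs(abs(dot(unit(n), unit(b))) - 1.0) <= tol


def plumb_point(eye: Sequence[float], b: Sequence[float],
                plane_point: Sequence[float], n: Sequence[float]) -> Vec:
    """Intersection of the line of sight eye + t*b with the image plane."""
    denom = dot(b, n)
    if abs(denom) < 1e-12:
        raise ValueError("line of sight is parallel to the image plane")
    t = dot([p - e for p, e in zip(plane_point, eye)], n) / denom
    return tuple(e + t * x for e, x in zip(eye, b))


def viewing_distance(eye: Sequence[float], b: Sequence[float],
                     plane_point: Sequence[float], n: Sequence[float]) -> float:
    """Distance from the eye point to the plumb point."""
    foot = plumb_point(eye, b, plane_point, n)
    return norm([f - e for f, e in zip(foot, eye)])


eye = (0.0, 0.0, 0.0)
b = (0.0, 0.0, 2.0)             # viewing direction, deliberately not normalized
n = (0.0, 0.0, -3.0)            # plane normal, antiparallel to b
palm_point = (0.1, 0.2, 0.5)    # assumed point on the plane, e.g. from the palm
distance = viewing_distance(eye, b, palm_point, n)
```

In practice the tolerance in `is_aligned` would have to reflect the accuracy of the distance sensor and camera, which the source does not specify.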

FIG. 5 shows the feedback loop between man and machine, which contains the schematic hardware structure for recording the input and controlling the output in a VRD. The optical information of the user obtained via a video camera 501 or a distance sensor 502 is fed to a central control unit 503, in which the information is recognized, interpreted and processed. After a corresponding action has been triggered, the newly determined or calculated data is forwarded to the control unit 504 for controlling the VRD. This unit then takes over the output of the text and/or image data via, for example, a laser-operated VRD 505. After this output data has been perceived and processed by the human user 506, a new reaction by the user to the changed actual state can begin.

The meaning of the symbols denoted by numerals in FIGS. 1 to 5 can be found in the list of reference symbols below.
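The feedback loop of FIG. 5 can be sketched as a simple processing cycle. The following is a toy model under assumptions of our own: recognized inputs arrive as plain strings, the interpretation step standing in for the central control unit 503 is a lookup table, and the display state has only a power flag and a channel index. None of these names or data structures come from the patent.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class DisplayState:
    """Hypothetical state held by the control unit for the VRD."""
    powered: bool = True
    channel: int = 0
    frames: List[str] = field(default_factory=list)  # what was output so far


def interpret(observation: str) -> Optional[str]:
    """Stand-in for recognition and interpretation in the central control unit.
    Unknown observations count as unintentional input and yield no command."""
    commands = {"touch:power": "toggle_power", "touch:next": "next_channel"}
    return commands.get(observation)


def apply_command(state: DisplayState, command: str, channels: int = 3) -> None:
    """Trigger the action that belongs to a recognized control command."""
    if command == "toggle_power":
        state.powered = not state.powered
    elif command == "next_channel":
        state.channel = (state.channel + 1) % channels


def render(state: DisplayState) -> str:
    """Stand-in for the output stage that drives the display."""
    return f"channel {state.channel}" if state.powered else "off"


def run_loop(observations: Iterable[str],
             state: Optional[DisplayState] = None) -> DisplayState:
    """One pass per observation: record, interpret, act, output."""
    if state is None:
        state = DisplayState()
    for observation in observations:
        command = interpret(observation)
        if command is None:
            continue                      # ignore unintentional input
        apply_command(state, command)
        state.frames.append(render(state))
    return state


final = run_loop(["touch:next", "wave", "touch:next", "touch:power"])
```

The user closes the loop by reacting to each rendered frame, which in this sketch is represented only by the order of the simulated observations.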

LIST OF REFERENCE NUMBERS

No.

101 Design sample of a Microvision system for creating a "Virtual Retinal Display" (VRD)

102 Principle representation of how a VRD system works

103 Application example of a VRD system from the user's perspective

201 video camera

202 Optical or acoustic distance sensor

203 image area in the virtual image plane of the VRD (virtual field of view or virtual display)

204 Virtual output area

205 touch-sensitive input surfaces (virtual keys or virtual switches)

206 e.g. right eye of the user

207 Triggering of an input via the VRD by touching or penetrating a touch-sensitive input surface by a finger of the user

301 video camera

302 Optical or acoustic distance sensor

303 image area in the virtual image plane of the VRD (virtual field of view or virtual display)

304 e.g. right eye of the user

305 e.g. right hand of the user

401 image area in the virtual image plane of the VRD (virtual field of view or virtual display)

402 Virtual output area

403 touch-sensitive input surfaces (virtual keys or virtual switches)

404 Upper left corner of the VRD image area

405 Upper right corner of the VRD image area

406 Lower right corner of the VRD image area

407 Lower left corner of the VRD image area

408 e.g. right hand of the user

501 video camera

502 Optical or acoustic distance sensor

503 Central control unit (CCU)

504 control unit for controlling the VRD

505 control unit for a laser-powered VRD

506 Perception and processing of the recorded information by the human user and reaction of the user to the changed situation

Claims

1. Method for the visual representation of objects on at least one field of view (203), characterized in that at least one displayed object is manipulated, controlled or influenced in its properties and/or actions by means of interactive control commands of a user, and/or the user is enabled to navigate in a depicted scene (204), the detection and processing of the information entered by the user being carried out with the aid of methods of signal or pattern recognition, so that no additional manually operated, mechanical or touch-sensitive hardware devices for entering control commands are required.

2. The method according to claim 1, characterized in that
a) the interactive control commands of the user are registered with the aid of sensors (502) and/or recording devices (501),
b) the recorded input data are fed to an evaluation and control device (503) and interpreted as control commands,
c) the objects are manipulated or influenced according to these control commands and/or an action is triggered, and
d) the manipulated or influenced objects and/or the triggered action are visualized or acoustically and/or optically displayed on a display (204).

3. The method according to any one of the preceding claims, characterized in that the objects to be controlled are actually existing objects in a real environment.

4. The method according to claim 1 or 2, characterized in that the objects to be controlled are non-real objects in a virtual environment of a computer-controlled model.

5. The method according to any one of claims 1 to 4, characterized in that the interactive control commands of a user are acoustic signals, such as voice commands.

6. The method according to any one of claims 1 to 5, characterized in that the device (503) for interpreting control commands is an automatic recognition system for speech signals.

7. The method according to any one of claims 1 to 4, characterized in that the interactive control commands of a user are optical signals, such as body, head, foot, hand and/or finger movements, i.e. gesticulation, gestures and/or facial expressions.

8. The method according to any one of claims 1 to 4 or 7, characterized in that the sensors (502) and/or recording devices (501) for registering optical signals of the user are distance sensors and/or video cameras.

9. The method according to any one of claims 1 to 4, 7 or 8, characterized in that the device for interpreting control commands is an automatic detection system for motion signals.

10. The method according to any one of claims 1 to 9, characterized in that the output device (505) for displaying a virtual environment is a device for generating a so-called "Virtual Retinal Display" (VRD), in which a virtual display (203) is projected onto the user's retina.

11. The method according to any one of claims 1 to 4 or 7 to 10, characterized in that inputs are triggered when a real object intersects or touches the plane of the virtual display (401) on an input-sensitive surface (403).

12. The method according to any one of claims 1 to 4 or 7 to 11, characterized in that the determination of the distance between the plane (303) of the virtual display and the retina of one of the eyes (304) of the user, or of the boundary and spatial orientation of the surface (401) of the virtual display, is carried out by detecting optical signals of the user with the aid of a distance sensor (502) and downstream evaluation electronics (503).

13. The method according to any one of claims 1 to 9, characterized in that the output device (505) for displaying a virtual environment is a so-called "head-mounted display" (HMD).