
CN101690150A - virtual reality-based teleconferencing - Google Patents


Info

Publication number
CN101690150A
CN101690150A
Authority
CN
China
Prior art keywords
user
virtual
sound
audio
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200880012055A
Other languages
Chinese (zh)
Inventor
菲利普·克里斯蒂安·奔迪特
布尔克哈德特·吕本·约瑟夫·詹森·博内洛
马蒂亚斯·韦尔克
保罗·乔纳森·麦凯布
马克·维尔纳·弗莱施曼
赖因哈德·克恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MUSECOM Ltd
Original Assignee
MUSECOM Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/735,463 external-priority patent/US20080252637A1/en
Priority claimed from US11/751,152 external-priority patent/US20080294721A1/en
Priority claimed from US11/774,556 external-priority patent/US20080256452A1/en
Priority claimed from US11/833,432 external-priority patent/US20080253547A1/en
Priority claimed from US11/875,836 external-priority patent/US20090106670A1/en
Application filed by MUSECOM Ltd filed Critical MUSECOM Ltd
Publication of CN101690150A publication Critical patent/CN101690150A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567Multimedia conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M7/00Arrangements for interconnection between switching centres
    • H04M7/006Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M7/00Arrangements for interconnection between switching centres
    • H04M7/12Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal
    • H04M7/1205Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal where the types of switching equipment comprises PSTN/ISDN equipment and switching equipment of networks other than PSTN/ISDN, e.g. Internet Protocol networks
    • H04M7/1225Details of core network interconnection arrangements
    • H04M7/123Details of core network interconnection arrangements where the packet-switched network is an Internet Protocol Multimedia System-type network

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Transfer Between Computers (AREA)
  • Telephonic Communication Services (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Stereophonic System (AREA)

Abstract

A virtual reality environment is applied to teleconferencing such that the environment is used to enter into a teleconference. According to a first aspect, the invention provides a method of controlling the volume of sound data during a teleconference, the method comprising providing a virtual representation including objects that represent users in the teleconference; and controlling the volume of the sound data according to how the users change the location and relative orientation of their objects in the virtual representation. It is also preferred according to a second aspect that a user is represented by an avatar in the virtual reality environment, and wherein the user can control his avatar to move around the virtual reality environment. The method may allow a user to meet another through intuitive actions of the user's avatar. The method may further comprise accepting control inputs from the user to control gestures of the user's avatar.

Description

Virtual reality-based teleconference
Technical Field
The present invention relates to applying a virtual reality environment to a conference call such that the environment is used to enter the conference call.
Disclosure of Invention
According to a first aspect, the present invention provides a method of controlling the volume of sound data during a conference call, the method comprising: providing a virtual representation comprising objects representing users in a teleconference; and controlling the volume of the sound data in dependence on how the users change the position and relative orientation of their objects in the virtual representation.
The method preferably further comprises changing other audio characteristics of the sound data depending on how the user interacts with the virtual representation.
Preferably, the objects in the virtual representation also have an audio range, so that the volume of the sound data is also controlled in dependence on the audio range. Preferably, the audio range is adjustable.
According to the method of the invention, the virtual representation is preferably a virtual environment, and wherein the user is represented by an avatar. In the preferred embodiment, the volume of the sound data between two users is a function of the relative orientation of their avatars.
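The claims leave the exact attenuation law open. Purely as an illustration, the Python sketch below computes a per-pair volume coefficient from avatar distance, an audio range, and the relative orientation of the two avatars; the linear falloff, the cosine orientation factor, and all names are assumptions, not the patented method.

    import math

    def volume_coefficient(src_pos, src_heading, dst_pos, dst_heading, audio_range):
        # Return a coefficient in [0, 1] scaling sound from a source avatar
        # to a sink avatar (names and attenuation model are assumptions).
        dx, dy = dst_pos[0] - src_pos[0], dst_pos[1] - src_pos[1]
        distance = math.hypot(dx, dy)
        if distance > audio_range:       # outside the audio range: inaudible
            return 0.0
        if distance == 0.0:              # co-located avatars: full volume
            return 1.0
        distance_factor = 1.0 - distance / audio_range
        # Relative orientation: loudest when the avatars face each other.
        facing_src = math.cos(math.atan2(dy, dx) - src_heading)
        facing_dst = math.cos(math.atan2(-dy, -dx) - dst_heading)
        orientation = 0.25 * (1.0 + max(facing_src, 0.0)) * (1.0 + max(facing_dst, 0.0))
        return distance_factor * orientation

Under this model, two nearby avatars facing each other hear each other at nearly full volume, while enlarging the (adjustable) audio range widens the circle within which a user can be heard.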
The virtual representation is preferably provided by a server system which calculates sound coefficients for each object as a sound source relative to the sink; and wherein, for each user, controlling the volume comprises: applying each sound coefficient to the sound data of its corresponding object, mixing the modified sound data, and providing the mixed sound data to the sink. For example, the sound data is mixed according to $V_{d_w}(t) = \mathrm{vol}_{d_w} \cdot \sum_{n=1}^{S_{\max}} c_{wn} \cdot V_{s_n}(t)$.
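Read literally, the formula scales each source signal $V_{s_n}(t)$ by its coefficient $c_{wn}$, sums over the $S_{\max}$ sources, and applies the sink's master volume $\mathrm{vol}_{d_w}$. The following Python sketch of that per-sink mix is a minimal interpretation; the frame-of-samples representation and names are assumptions.

    def mix_for_sink(sink_volume, coefficients, source_frames):
        # sink_volume:   master volume vol_dw for this sink
        # coefficients:  c_w1 .. c_wSmax, one per source
        # source_frames: per-source sample lists V_sn(t), all the same length
        frame_len = len(source_frames[0])
        mixed = [0.0] * frame_len
        for c_wn, frame in zip(coefficients, source_frames):
            for t in range(frame_len):
                mixed[t] += c_wn * frame[t]
        return [sink_volume * sample for sample in mixed]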
According to an alternative embodiment of the first aspect of the present invention, there is provided a method comprising: providing a virtual representation; establishing a telephone connection with a plurality of users represented by objects in the virtual representation, each user's representation object being both a sound sink and a sound source; and for each sink, mixing sound data from different sound sources and providing the mixed data to a user associated with the sink, wherein the volume of the sound data from the sources is adjusted according to a topological measure of the sources relative to the sink, so that the users do not communicate directly but instead communicate through a synchronized auditory environment.
Mixing the sound data preferably includes, for each sink: calculating an audio parameter for each paired source, each audio parameter controlling volume as a function of its corresponding source-to-sink proximity; and adjusting the sound data of each paired source with the corresponding audio parameter, mixing the adjusted sound data, and providing the mixed sound data to a user associated with the sink.
The virtual representation preferably comprises other objects as sound sources, wherein the volume of sound data from the sources is adjusted according to a topological measure of the sources relative to the sink; and wherein the adjusted sound data from the other objects is also mixed and provided to the sink. The objects preferably comprise an audio range.
For example, a topological metric is the virtual distance between the source and the sink, and may include distance and orientation. The audio sources are preferably clustered to reduce the computational burden. As in the first embodiment, the sound is mixed according to $V_{d_w}(t) = \mathrm{vol}_{d_w} \cdot \sum_{n=1}^{S_{\max}} c_{wn} \cdot V_{s_n}(t)$.
To reduce the computational burden of mixing sound data for each sink, it is preferable to mix sound data only from those sound sources that make a significant contribution. The audio range of particular objects may automatically be set at or near zero, thereby excluding the sound data of those objects from the mix. It is also preferred that a minimum distance between objects is enforced to reduce the computational burden of mixing the sound data. Preferably, some of the sound data is pre-mixed to reduce the computational burden of mixing the sound data, wherein the pre-mixing comprises mixing sound data for a group of sound sinks and assigning a single coefficient to each sink of the group. The invention also includes making direct connections between source and sink to reduce the computational burden of mixing the sound data.
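As one hedged interpretation of these burden-reduction measures, the sketch below drops sources whose coefficients fall below a significance threshold and pre-mixes a group of streams once so that each sink applies only a single coefficient to the combined result; the threshold value and the grouping scheme are assumptions.

    SIGNIFICANCE_THRESHOLD = 0.01   # assumed cutoff for a "significant" source

    def significant_sources(coefficients, frames):
        # Keep only (coefficient, frame) pairs that contribute audibly.
        return [(c, f) for c, f in zip(coefficients, frames)
                if c > SIGNIFICANCE_THRESHOLD]

    def premix_group(frames):
        # Pre-mix a group of streams once; each sink then applies one
        # coefficient to the result instead of one coefficient per member.
        frame_len = len(frames[0])
        return [sum(f[t] for f in frames) for t in range(frame_len)]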
The first aspect of the present invention also includes a computing system comprising: a telephone-based teleconferencing apparatus; and means for providing a virtual representation comprising objects representing participants in the teleconference, the virtual representation allowing the participants to enter the teleconference using the telephone-based teleconferencing apparatus and controlling the volume during the teleconference, the volume being controlled in dependence on how the user changes the position and relative orientation of the user's objects in the virtual representation.
An alternative communication system according to the first aspect comprises: a server system for providing a virtual representation; and a teleconferencing system for establishing telephone connections with a plurality of users, the users being represented by objects in the virtual representation, the teleconferencing system controlling the volume during a teleconference according to how the users change the position and relative orientation of their representation objects in the virtual representation. Preferably, the representation object of each user is both a sound sink and a sound source; and wherein for each sink, sound data from different sound sources is mixed and the mixed data is provided to a user associated with the sink, wherein the volume of the sound data from the sources is adjusted according to a topological measure of the sources relative to the sink.
According to a second aspect, the invention provides a method comprising applying a virtual reality environment to a conference call such that the environment is used to enter the conference call. The environment preferably allows a user to enter without knowing any other users in the environment, while enabling the user to meet and conference on the phone with at least one other user.
The step of applying the virtual reality environment preferably comprises: the method includes presenting a virtual reality environment to a user, presenting representations of the user and other users in the virtual reality environment, and enabling the representations of the user to experience the virtual reality environment, meet the other users, and enter a conference call. It is also preferred that the virtual reality environment enables a user to conduct a conference call via telephone or via a VoIP phone.
The step of applying the environment may further comprise: the method includes initiating a session with a user, presenting a virtual reality environment to the user, recognizing a telephone call from the user, and adding the telephone call to the session. Alternatively, applying the environment may include: the method includes initiating a first session with a user, presenting the user with a virtual reality environment, initiating a second session in response to a telephone call, and merging the first session and the second session together if the telephone call is made by the user.
The method according to the second aspect of the present invention preferably further comprises: the user is called at the user's request so that the user can be voice-enabled in the virtual reality environment.
It is also preferred that when the user calls another user not represented in the virtual reality environment, a representation of the other user is added to the virtual reality environment.
The virtual reality environment preferably enables a user having only a device that cannot display the virtual reality environment to enter a conference call and experience sounds that convey a view of the virtual reality environment.
According to a preferred embodiment, more than one virtual reality environment may be applied to the teleconference. The user may move in and out of different virtual reality environments. The virtual reality environments are preferably linked, and each virtual reality environment is preferably uniquely addressable.
It is also preferred that at least a portion of the virtual reality environment is private. The virtual reality environment may also have a persistence state and may overlap the real space.
The user preferably establishes a connection with a location in the virtual reality environment.
Preferably, the user has an audio range in the virtual reality environment. The audio range is dynamically adjustable.
According to a preferred embodiment, the audio between users is attenuated as a function of closeness between users.
It is also preferred according to the second aspect that the user is represented by an avatar in the virtual reality environment, and wherein the user can control movement of his avatar around the virtual reality environment. The method also allows a user to meet another user through intuitive actions of the user's avatar. The method may further include accepting control input from the user to control gestures of the user's avatar.
The volume between the user and another user may be a function of their relative orientation in the virtual reality environment.
The user establishes a connection with a location in the virtual reality environment and may also establish a connection with a multimedia source. The user and the other users preferably view the same multimedia by viewing the same window displaying the multimedia and discuss the displayed multimedia via the teleconference at the same time. The user and another user may share multimedia viewing through co-browsing. Alternatively, one user shares the multimedia source with another user by dragging and dropping the multimedia representation in the vicinity of the other user's representation object.
The method of the second aspect may further comprise mixing internet content with a telephone link so that a user can access content on the internet via a telephone interface.
Additional virtual reality environments may be provided for a user, wherein the user is assigned to one of the additional environments based on characteristics of the user. A user may have multiple personal profiles, each representing a different aspect of the user, and the user may switch between the multiple personal profiles. The user may have a personal profile that may be made public; however, the user has the option of remaining anonymous.
The method may further include providing a service agent in the virtual reality environment.
According to the second aspect, the invention also provides an apparatus for applying a virtual reality environment to a teleconference to enable a user to enter the virtual reality environment without knowing any other users in the virtual reality environment, while enabling the user to meet and hold a teleconference with other users in the virtual reality environment.
Furthermore, a second aspect provides a system comprising: means for a conference call; and means for coupling the immersive virtual reality environment with the teleconference. The system is preferably network based.
According to another embodiment, the present invention provides a teleconference method including: entering a virtual reality environment provided by a service provider; maneuvering an avatar around the virtual reality environment; establishing a telephone call with the service provider to become voice-enabled; and conversing with other voice-enabled users represented in the virtual reality environment.
According to a third aspect, the present invention provides a communication system comprising: a server system for providing a virtual representation comprising at least one object; and a teleconferencing system for establishing audio communication with an audio-only device; wherein the objects in the virtual representation are controlled in response to signals from the audio-only device.
Preferably, at least one of the objects is mobile and represents a user of the audio-only device. More preferably, the object representing the user of the audio-only device is an avatar; and wherein the signal from the audio-only device causes the avatar to move around the virtual representation. Preferably, the signal from the audio-only device causes the object to move around the virtual representation; and wherein the teleconferencing system allows a user of the audio-only device to speak with other users represented in the virtual representation without seeing the virtual representation.
The server system preferably provides additional virtual representations, and signals from the audio-only device cause the representation of the audio-only device's user to move to a different virtual representation.
Preferably, a user of an audio-only device may be assigned to a virtual representation by directly dialing the virtual representation.
The virtual representation is preferably a virtual environment, and the signal from the audio-only device allows the user to interact with the virtual environment. The audio-only device may be a telephone and the signal may be a telephone signal. The signal may be a dual-tone multi-frequency (DTMF) signal or a voice signal.
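The patent specifies DTMF or voice signals but no particular key layout. A hypothetical mapping of DTMF digits to object-control commands might look like the following Python sketch; the keypad scheme, the avatar attributes, and the describe action are all assumptions.

    DTMF_COMMANDS = {
        "2": ("move", (0, 1)),     # step forward
        "8": ("move", (0, -1)),    # step backward
        "4": ("move", (-1, 0)),    # step left
        "6": ("move", (1, 0)),     # step right
        "5": ("describe", None),   # request an audio description of surroundings
    }

    def handle_dtmf(avatar, digit):
        # Translate one keypad digit into a state change on the user's object.
        action, arg = DTMF_COMMANDS.get(digit, (None, None))
        if action == "move":
            x, y = avatar.position
            avatar.position = (x + arg[0], y + arg[1])
        elif action == "describe":
            # Hypothetical hook into the audio-description means described below.
            avatar.request_audio_description()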
The communication system preferably further comprises means for providing an audio description of the virtual representation to the audio-only device. Objects closer to the user's representation object in the virtual representation are preferably described in more detail. The virtual representation may also be described from a first-person perspective.
Preferably, a first object in the virtual representation represents an internet resource; and wherein a user of the audio-only device may access the internet by controlling the state of the first object.
The teleconferencing system may include a VoIP system for establishing a VoIP connection with a network connection device.
According to a preferred embodiment of the communication system, the user of the audio-only device is represented in the virtual representation so as to be seen by other users; and the representation object for that user indicates audio-only capability.
The third aspect of the present invention also provides a system comprising: means for providing a virtual representation, the virtual representation comprising an object; means for receiving a signal from an audio-only device; and means for controlling a state of the object in response to the signal.
The third aspect of the present invention also provides a communication system: for providing a virtual environment comprising a plurality of objects, the objects having changeable states; and for establishing audio communication with an audio-only device; the system controls the state of objects in the virtual representation in response to signals from the audio-only device so that a user of the audio-only device can interact with the virtual environment.
Furthermore, a third aspect of the invention provides a method of controlling an object in a virtual environment, the method comprising: receiving a signal from an audio-only device; and controlling a state of the object in response to the signal. The method preferably further comprises providing the audio-only device with an audio description of the virtual environment.
According to a fourth aspect, the present invention provides a method of providing a service, the method comprising: providing a network-accessible virtual environment, the virtual environment including objects representing users of the service; allowing a user to control a representation object of the user in the virtual environment to personally interact with other objects represented in the virtual environment and also become voice-enabled; and enabling those voice-enabled users to speak via telephone with other voice-enabled users.
The users preferably control their representation objects via client devices; and allowing a user to control his object includes receiving a command from the client device and moving the representation object in response to the command. The users may also be allowed to control their representation objects via the internet; and wherein those users that are voice-enabled are enabled to speak with each other via the public switched telephone network.
The fourth aspect of the invention further comprises interacting with the virtual environment to control audio characteristics in the virtual environment. An object in the virtual environment has an audio range, so that the volume of the sound data can also be controlled according to the audio range. Users interact as a function of how close together they are. Closeness between two users may be measured as the distance between the web pages those two users are currently viewing. Alternatively, closeness between two users may be measured as the distance between two coordinates on the web page those two users are currently viewing.
Still preferably, according to the fourth aspect of the present invention, the user is represented by an avatar; and wherein the volume of the sound data between the two users is a function of the relative orientation of their avatars.
The method preferably allows a particular user to personally interact with other users in the virtual environment without having to see the virtual environment. The method also preferably includes calling the user at the user's request so that the user can be voice-enabled in the virtual environment. When a user calls another user not represented in the virtual environment, a representation object of the other user is preferably added to the virtual environment.
Preferably, a plurality of virtual environments are provided; and the user may move in and out of different virtual environments.
Communications may also be performed through shared, changeable objects. The user may also be allowed to communicate through intuitive actions of their avatar.
According to a preferred embodiment, users share a multimedia source by each viewing a window displaying the multimedia and discussing the displayed multimedia via the phone at the same time. Users can share multimedia through co-browsing. A user may also share a multimedia source with another user by dragging and dropping the multimedia representation near another user's representation object.
Additional virtual reality environments may be provided for a user, wherein the user is assigned to one of the additional environments based on characteristics of the user. A user may have multiple personal profiles, each representing a different aspect of the user, and the user may switch between the multiple personal profiles.
The fourth aspect of the present invention also provides a system comprising: means for providing a network-accessible virtual environment, the virtual environment comprising objects representing users of the system; means for allowing a user to control the user's representation object in the virtual environment to personally interact with other objects represented in the virtual environment and also become voice-enabled; and means for enabling those voice-enabled users to speak via telephone with other voice-enabled users.
According to a fourth aspect of the present invention, there is also provided a system comprising: a server system for providing a virtual environment, the virtual environment including objects representing users of the system, the server system allowing users to control their representation objects in the virtual environment to personally interact with other objects represented in the virtual environment; and a telephone system for enabling those voice-enabled users to speak with other voice-enabled users via the telephone. The server system is preferably network based; the server system receives commands from the client devices to control the objects in the virtual environment; and the telephone system enables at least some users to speak via the public switched telephone network.
According to a fifth aspect of the present invention, there is provided a communication system comprising: a teleconference system for hosting a teleconference; and a server system for providing a virtual representation for the teleconferencing system, the virtual representation comprising objects whose states can be commanded to transition gradually, the server system providing clients to the client devices, each client causing its client device to display the virtual representation; each client device is capable of generating and transmitting commands to the server system for gradually transitioning an object to a new state in the virtual representation; and the server system instructs the clients to transition the object to its new state within a specified time.
The server system preferably causes the teleconferencing system to control the audio characteristics in a manner consistent with the virtual representation. The teleconferencing system may include a telephone system.
According to a preferred embodiment of the communication system, when the client device commands an object to gradually transition to a new state, the server system receives the command and generates an event instructing all clients to transition the object to the new state within a specified time. The server system also preferably tracks objects that suddenly transition; and wherein when the client device commands an object to suddenly transition to a new state, the server system receives the command and generates an event commanding all clients to show the object in the new state at the specified time.
Preferably, at least some of the objects are movable and represent users.
In this aspect of the invention, preferably, the virtual representation is a visual virtual environment.
The server system preferably maintains a master model of object states over time, coordinating the transitions of the objects in the virtual representation. More preferably, the server system determines, in response to the command, a first time at which the object will start to transition from the current state and a second time at which the object will reach the new state; and wherein the server system sends the start and stop times and the new state to the clients. The server system may also calculate a movement path including waypoints and arrival times for the waypoints and transmit the movement path to the clients. Alternatively, a client may calculate a transition path and send the transition path to the server system.
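As a sketch of such a server-computed movement path, assuming straight-line motion at constant speed (the plan shape and all names here are assumptions, not taken from the patent):

    def plan_transition(current_pos, new_pos, start_time, speed, n_waypoints=4):
        # Return (waypoint, arrival_time) pairs leading from the current
        # position to the new position; clients animate against this plan
        # so the object reaches the new state at the same moment everywhere.
        dx = new_pos[0] - current_pos[0]
        dy = new_pos[1] - current_pos[1]
        total_time = (dx * dx + dy * dy) ** 0.5 / speed
        plan = []
        for i in range(1, n_waypoints + 1):
            frac = i / n_waypoints
            waypoint = (current_pos[0] + frac * dx, current_pos[1] + frac * dy)
            plan.append((waypoint, start_time + frac * total_time))
        return plan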
The server system is preferably network based. Also preferably, the client may run on a virtual machine. The client may be a Flash client.
The teleconferencing system may include a VoIP system for establishing a VoIP connection with the network connection device.
The communication system preferably further comprises a sound system for generating sound of the object in the virtual representation, wherein the server system also synchronizes the object in the virtual representation with the sound; and wherein the sound system mixes the synchronized sound with audio from the teleconference.
In the communication system of the present invention, the server system preferably includes a world server for generating data for changing temporal audio characteristics of audio between users during a teleconference.
According to the fifth aspect, it is also preferred that the virtual representation is a virtual environment, and wherein the communication system further comprises means for allowing an audio-only device to control objects in the virtual environment. The means controls an object in the virtual environment in response to telephone signals. The system may further include means for providing an audio description of the virtual representation to the audio-only device.
Preferably, the teleconferencing system hosts a plurality of teleconferences between different groups of users; wherein the server system provides separate additional virtual representations and accommodates state transitions of the objects in each virtual representation; and wherein the server system filters communications with the clients, sending communications only to those clients that need to transition objects in the particular virtual representation.
The fifth aspect of the present invention also provides a communication system for a plurality of client devices, comprising: first means for hosting a conference call; and second means for providing virtual representations for the conference call, each virtual representation comprising objects whose states change gradually, the second means providing clients to at least some of the client devices, each client causing its client device to display the virtual representation; each client device is capable of generating and sending commands to the second means for gradually transitioning an object to a new state in the virtual representation; the second means instructs the clients to transition the object to approximately the same state at approximately the same time; and the second means causes the first means to control audio characteristics of the teleconference to conform to the virtual representation.
The fifth aspect of the present invention also provides a method of providing a communication service, the method comprising: hosting a teleconference; providing clients to a plurality of client devices, each client causing its client device to display a virtual representation of the teleconference, the virtual representation including objects whose states change gradually; waiting for object state change commands from the clients, each object state change command for gradually transitioning an object to a new state in the virtual representation; and generating an event in response to each command, the event causing each client to transition the object to approximately the same state at approximately the same time.
Drawings
FIG. 1 is a system diagram according to an embodiment of the invention.
Fig. 2 is a system diagram according to an embodiment of the invention.
Fig. 3 is a diagram of a method according to an embodiment of the invention.
FIG. 4 is an illustration of a virtual reality environment, according to an embodiment of the invention.
FIG. 5 is an illustration of a state diagram of a virtual reality environment.
Fig. 6-7 are illustrations of a method of providing audio to a user according to an embodiment of the present invention.
FIG. 8 is an illustration of two avatars (avatars) facing each other.
Fig. 9-10 are illustrations of methods according to embodiments of the invention.
Fig. 11 is an illustration of a method of reducing the computational burden of acoustic mixing (sound mixing) according to an embodiment of the present invention.
Fig. 12a-12c are illustrations of acoustic mixing according to embodiments of the present invention.
FIG. 13 is an illustration of a method according to an embodiment of the invention.
Fig. 14 is an illustration of a service provided by a service provider according to an embodiment of the present invention.
FIG. 15 is an illustration of a method according to an embodiment of the invention.
Fig. 16 is an illustration of a system according to an embodiment of the invention.
FIG. 17 is an illustration of a method according to an embodiment of the invention.
FIG. 18 is an illustration of a method of acoustic mixing according to an embodiment of the present invention.
FIG. 19 is an illustration of a system according to an embodiment of the invention.
FIG. 20 is an illustration of a method according to an embodiment of the invention.
FIG. 21 is an illustration of a method according to an embodiment of the invention.
FIG. 22 is an illustration of a method according to an embodiment of the invention.
Fig. 23 is an illustration of an activity schedule (timeline) according to an embodiment of the invention.
FIG. 24 is an illustration of a method of acoustic mixing according to an embodiment of the present invention.
Fig. 25-26 are diagrams of a method of calculating waypoints for a moving object according to an embodiment of the present invention.
Fig. 27a-27d are diagrams of different topologies of a communication system according to an embodiment of the present invention.
FIG. 28 is an illustration of a system according to an embodiment of the invention.
FIG. 29 is an illustration of a portion of a system according to an embodiment of the invention.
Detailed Description
Reference is made to fig. 1, which illustrates a teleconferencing system 100 that includes a teleconferencing service provider 110. The service provider 110 applies a virtual reality environment to the teleconference so that the environment can be used to enter the teleconference. In some embodiments, the environment enables a user to enter without knowing any other users in the environment, yet still enables the user to meet and conduct a teleconference with other users in the environment.
The term "user" refers to an entity that utilizes the teleconference service. The entity may be an individual (individual), a group of people (e.g., a family, a business) that collectively appear as a single unit, and so on. The term "another" (when used alone) refers to another user. The term "other" refers to other users.
A user may utilize a user device 120 having a graphical user interface to connect to the service provider 110. Such user devices 120 include, without limitation, computers, tablet PCs, VoIP phones, game consoles, television sets with set-top boxes, certain cellular telephones, and personal digital assistants. For example, a computer may be connected to the service provider 110 via the internet or other network, and its user may enter a virtual reality environment and participate in a conference call.
A user may also connect to the service provider 110 using a user device 130 that does not have a graphical user interface. Such user devices 130 include, without limitation, traditional telephones (e.g., button phones, dial phones), cellular phones, VoIP phones, and other devices that have a telephone interface but no graphical user interface. For example, a traditional telephone may be connected to the service provider 110 via a PSTN network, and its user may enter a virtual reality environment and participate in a conference call.
A user may utilize both device 120 and device 130 during a single teleconference. For example, a user may utilize a device 120, such as a computer, to enter and navigate a virtual reality environment, and a button phone 130 to participate in the conference call.
Reference is made to fig. 21, which illustrates a method of providing a service that allows personal interaction between users. The method comprises: providing a network-accessible virtual environment, the environment including objects representing users of the service (block 2110); allowing users to control their representation objects in the virtual environment to personally interact with other users represented in the virtual environment and also become voice-enabled (block 2120); and enabling those voice-enabled users to speak with other voice-enabled users via telephone (block 2130).
The phone is not limited to any particular type. Examples of telephones include PSTN telephones (e.g., push-button telephones) and VoIP telephones, including VoIP soft phones.
When a user becomes voice-enabled, the user may speak with other voice-enabled users represented in the virtual environment. As a first example of becoming voice-enabled, a user of a traditional telephone may become voice-enabled by placing a call to the service provider. As a second example, a user may become voice-enabled by receiving a call from the service provider.
According to fig. 21, a user interacts with the virtual environment to control audio characteristics in the virtual environment (block 2140). For example, the volume of sound data may be controlled. In some embodiments, the volume between one user and another is a function of the relative orientation of, and the distance between, their representation objects. In some embodiments, the representation objects also have an audio range.
Audio characteristics other than volume may also be controlled depending on how the user interacts with the virtual environment. For example, filters may be applied to sound data to add reverberation, distort sound, and the like. It is possible to change the audio characteristics of an object by applying a filter (e.g., a reverberation device, a room acoustics device) to the sound data of the object. Examples of changing audio characteristics include the following. When an avatar walks from a carpeted room into a stone hall, the parameters of a reverberation filter are adjusted to add more reverberation to the user's speech and the avatar's footsteps. When an avatar walks into a metal room, the parameters of an effect filter are adjusted so that the user's voice and the avatar's footsteps are distorted to sound metallic. When an avatar speaks into a virtual microphone or virtual phone, a filter (e.g., a band-pass filter) is applied to the avatar's voice data so that the user's voice sounds as if it came from a speaker system or phone.
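A minimal sketch of this environment-dependent filtering, assuming a simple room-type attribute on locations; the concrete parameter values are illustrative only and are not taken from the patent.

    ROOM_FILTERS = {
        "carpeted_room": {"reverb_decay": 0.2},                # dry, little reverb
        "stone_hall":    {"reverb_decay": 0.9},                # long reverb tail
        "metal_room":    {"reverb_decay": 0.6, "metallic": True},
        "virtual_phone": {"band_pass_hz": (300.0, 3400.0)},    # narrowband speech
    }

    def filters_for(room_type):
        # Pick filter parameters for an avatar's current surroundings;
        # an unknown room type gets no filtering.
        return ROOM_FILTERS.get(room_type, {})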
Reference is made to fig. 15, which illustrates a method of controlling the volume of sound data during a conference call. The method comprises: providing a virtual representation that includes objects (e.g., avatars) representing participants (i.e., users) in the teleconference (block 1510); and controlling the volume of the sound data according to how the users change the position and relative orientation of their objects in the virtual representation (block 1520).
In some embodiments, the user's object has an audio range. The audio range limits the distance sound can be received and/or broadcast. The audio range facilitates multiple teleconferences in a single virtual representation.
Audio characteristics other than volume may also be controlled depending on how the user interacts with the virtual representation (block 1530). For example, filters may be applied to sound data to add reverberation, distort sound, and the like. Examples will be provided below.
The virtual representation is not limited to any particular type. A first type of virtual representation may be similar to the virtual metaphor shown by Singer et al. in Figs. 3-5 and 8a-8b of U.S. Patent No. 5,889,843 (a graphical user interface that displays icons on a flat surface, where the icons represent audio sources).
A second type of virtual representation is a virtual environment. The virtual environment includes a scene and sound. The virtual environment is not limited to any particular type of scene or sound. As a first example, the virtual environment includes a beach scene with blue water, white sand, and blue sky. Further, the virtual environment includes an audio representation of the beach (e.g., waves beating the coast, gulls chirping). As a second example, the virtual environment includes a club scene along with a bar, a dance floor, and dance music (an exemplary bar scene 310 is shown in FIG. 4). As a third example, the virtual environment includes a park having a microphone and a speaker, wherein sound collected by the microphone is played through the speaker.
The virtual representation includes objects. Objects in a virtual environment have properties that enable a user to perform particular actions on them (e.g., sit, move, and open). An object in the virtual environment may comply with a particular specification (e.g., an API).
At least some of these objects represent users of the communication system 110. These objects representing users may be images, avatars, live video, recorded sound samples, name tags, logos, user profiles, etc. In the case of an avatar, live video or photos may be projected onto it. A user's representation object enables the user to see and communicate with other users in the virtual representation. In some cases, a user cannot see his own representation object, but instead sees the virtual representation as that object would see it (i.e., a first-person perspective).
In some embodiments, the virtual representation is a virtual environment and the user is represented by an avatar. In some embodiments, the volume between one user and another is a function of the relative orientation of their avatars and the distance between their avatars. In some embodiments, the avatar also has an audio range.
Reference is made to fig. 2, which illustrates an exemplary communication system 110 for providing teleconferencing services. The teleconference service may be provided to a user having a client device 120 and an audio-only device 130. The client device 120 refers to a device capable of running a client and providing a graphical interface. An example of a client is a Flash client. Client device 120 is not limited to any particular type. Examples of client devices 120 include, but are not limited to, computers, tablet PCs, VoIP phones, game consoles, television sets with set-top boxes, certain cellular telephones, and personal digital assistants. Other examples of client devices 120 are devices running a remote login (Telnet) program.
Audio-only device 130 refers to a device that provides audio but for whatever reason does not display a virtual representation. Examples of audio-only devices 130 include traditional phones (e.g., push-button phones) and VoIP phones.
A user may utilize both client device 120 and audio-only device 130 during a teleconference. The client device 120 is used to interact with the virtual representation and assist the user in entering the teleconference. The client device 120 also interacts with the virtual representation to control the volume of sound data during the teleconference. The audio-only device 130 is used to speak with at least one other user during the teleconference.
The communication system 110 includes a teleconference system 140 for hosting a teleconference. The teleconferencing system 140 may include a telephone system for establishing telephone connections with conventional telephones (landline and cellular), VoIP telephones, and other audio-only devices 130. For example, a user of a conventional telephone may connect to the teleconferencing system 140 by placing a call to the teleconferencing system. The teleconferencing system 140 may also include devices (e.g., a computer equipped with a microphone, speaker, and teleconferencing software) for establishing a connection with a client device 120 having teleconferencing capabilities.
A conference call is not limited to a conversation between two users. A conference call may involve many users. Also, the teleconference system 140 may host one or more teleconferences at any given time.
The communication system 110 also includes a server system 150 for providing a client 160 to those users having client devices 120. Each client 160 causes its client device 120 to display a virtual representation. The virtual representation provides a venue by which a user can enter a teleconference (e.g., initiate a teleconference, join a teleconference already in progress), even if the user does not know any other users represented in the virtual representation. Communication system 110 enables a user to listen in on one or more teleconferences. Even when joining a teleconference, the user still has the ability to listen to other conferences and seamlessly leave the teleconference and join another teleconference. A user may even join a chain of teleconferences (e.g., a chain of people, where person C hears B and D, person D hears C and E, etc.).
Each client 160 enables its client device 120 to move its user's representation object in the virtual representation. By moving his representation object around the virtual representation, the user can move up to other representation objects to listen to conversations and meet other users. By moving his representation object around the virtual environment, the user can experience the views and sounds provided by the virtual environment.
In a virtual environment, a user's representation object has a changeable state. For example, an avatar has states such as position and orientation. The avatar may be commanded to walk (i.e., transition gradually) from its current position (current state) to a new position (new state).
Other objects in the virtual environment (not representing users) also have changeable states, which may change gradually or abruptly. As a first example, a user may participate in a virtual volleyball game, wherein the volleyball is represented by an object. Hitting the volleyball causes it to follow a path toward a new position. As a second example, a balloon is represented by an object. The balloon may start uninflated (current state) and gradually inflate to a fully inflated size (new state). As a third example, an object represents a jukebox having actions such as play/stop/pause, and attributes such as volume, song list, and song selection. As a fourth example, an object represents an internet object, such as a Uniform Resource Identifier (URI) (e.g., a network address). Clicking the internet object opens the internet connection.
Different objects may provide different sounds. The sound of the jukebox may comprise different songs in the playlist. An avatar's sounds may include walking sounds. Moreover, the walking sounds of different avatars may differ. For example, the walking sound of an avatar wearing high-heeled shoes is different from that of an avatar wearing sandals. Walking sounds may also vary with the terrain. For example, the sound of walking on a wood floor may be different from the sound of walking on snow.
The overall goal is that one user may change a state, and other users will experience the state change. For example, a user may turn down the volume of his jukebox, and everyone represented in the virtual representation hears the reduced volume.
The virtual environment is network accessible. For example, the virtual environment may be accessed via the internet or a Local Area Network (LAN).
A user may utilize a client device to control objects in a virtual environment. A client device refers to a device that can run a client and provide a graphical interface. An example of a client is a Flash client. The client devices are not limited to any particular type. Examples of client devices include, but are not limited to, computers, tablet PCs, game consoles, television sets with set-top boxes, certain cellular telephones, and personal digital assistants. Other examples of client devices are devices that run a text user interface, such as Telnet. Yet another example is a mobile phone running a chat client (e.g., Google Talk), or a phone such as an iPhone.
Such a client causes its client device to display a virtual environment that includes objects. The client device generates commands, and those objects are controlled in response to those commands. By moving his representation object around the virtual environment, the user can experience the views and sounds provided by the virtual environment.
By moving his representation object around the virtual environment, the user can interact with other users. For example, a voice-enabled user may interact with another voice-enabled user by moving into the audio range of the other user. The audio range limits the distance over which sound can be received and/or broadcast. The audio range facilitates multiple conversations in a single virtual environment.
Generally, interaction is a function of "closeness" between two users. Closeness may be measured as the distance between two representation objects in the virtual environment. However, closeness is not so limited; another topological metric may be used to measure closeness. For example, closeness may be the Euclidean distance between two representation objects. The distance may even be a real distance between the user and another user or a real-life object. For example, the real-world distance may be the distance between a user in New York and another user in Berlin. Another topological metric may measure closeness as the distance (in hyperlinks) between the web pages currently being viewed by two users. Yet another topological metric may measure closeness as the distance (e.g., pixel distance) between two coordinates on a web page (e.g., the distance between the coordinates pointed to by the two users' mouse pointers).
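The sketch below illustrates three of these closeness metrics; the link-graph representation and the breadth-first search used for hyperlink distance are assumptions.

    import math
    from collections import deque

    def euclidean_closeness(pos_a, pos_b):
        # Distance between two representation objects (or two real locations).
        return math.dist(pos_a, pos_b)

    def pixel_closeness(cursor_a, cursor_b):
        # Distance between two mouse-pointer coordinates on the same web page.
        return math.dist(cursor_a, cursor_b)

    def hyperlink_closeness(page_a, page_b, link_graph):
        # Shortest hyperlink path between the pages two users are viewing;
        # link_graph maps each page to the pages it links to (assumed shape).
        queue, seen = deque([(page_a, 0)]), {page_a}
        while queue:
            page, hops = queue.popleft()
            if page == page_b:
                return hops
            for neighbor in link_graph.get(page, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        return math.inf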
The virtual environment may overlap with real space. For example, a scene of a real space (e.g., a map of a city, a county, a room) is displayed. A GPS-equipped phone may be utilized to determine the location of a person in the real space. Users with known real locations are represented by avatars at their respective locations in the virtual environment. Alternatively, the place may be real, but the locations are not; instead, a user's avatar may roam to different spaces to meet different people.
The user may also become voice-enabled via the client device. As a first example, the client device initiates a telephone connection by pressing a "call me" button, whereupon the service provider calls the user's telephone. As a second example, a client device may command a co-installed VoIP softphone (e.g., via an XML socket) to establish a connection with the service provider. As a third example, an integrated client/phone, such as a graphical Flash client, may have built-in VoIP capability. As a fourth example, a mobile phone may run a GUI-plus-voice application. As a fifth example, a blind user may issue a text command with a text (Telnet/Braille) client, based on which the service provider calls the user's phone.
If a user wants to talk to other users who are not in the virtual environment at the time, the user may request that the service provider send an invitation to those users (e.g., via email, instant message, or SMS message). In the case of email or instant messaging, the recipient may simply click on a link in the message to download the client program and join the conversation.
Thus, a user may interact with other users using both the client device and the telephone. The client device is used to interact with the virtual environment and help the user meet other users. The telephone is used to speak with at least one other user. However, some phones (e.g., a particular VoIP phone) may also have the functionality of a client device.
Reference is made to fig. 3, which shows an example of how a virtual reality environment may be applied to a conference call. In this example, the service provider runs an online service that enables users to begin a teleconference session (block 200). In some embodiments, the service provider provides teleconferencing services via a website. Using a web browser, the user enters the web site and logs into the service, whereby the service provider starts a session.
After the session begins, the virtual reality environment is presented to the user (block 210). For example, if a service provider runs a website, a web browser may download and display a virtual reality environment to the user.
The virtual reality environment includes a scene and (optionally) sound. The virtual reality environment is not limited to any particular type of scene or sound. As a first example, the virtual reality environment includes a beach scene with blue water, white sand, and blue sky. In addition to this visualization, the virtual reality environment includes an audio representation of the beach (e.g., waves beating the coast, gulls chirping). As a second example, the virtual reality environment provides a club scene along with a bar, a dance floor, and dance music (an exemplary bar scene 310 is shown in FIG. 4).
Scenes in a virtual reality environment are not limited to any particular number of dimensions. The scene may be represented in two, three, or higher dimensions.
Included in the virtual reality environment are representations of the user and other users. These representations may be images, avatars, live video, recorded sound samples, name tags, logos, user profiles, etc. In the case of an avatar, live video and photos may be projected onto it. The service provider assigns a location to each representation in the virtual reality environment. In the virtual reality environment, each user has the ability to view and communicate with other users. In some embodiments, a user cannot see his or her own representation, but rather sees the virtual reality environment as his or her representation would see it (i.e., a first-person perspective).
The user can control the movement of their representation around the virtual reality environment. By moving around the virtual reality environment, the user may experience the different views and sounds provided by the virtual reality environment (block 220). For example, the user's representation may turn on a jukebox and select a song from its playlist. The jukebox then plays the selected song.
Reference is also made to fig. 4, which illustrates a virtual reality environment including a club scene 310. The club scene 310 includes a bar 320 and a dance floor 330. The user is represented by an avatar 340. Other users in the club scene 310 are represented by other avatars. An avatar may be moved from its current location to a new location in the virtual environment by clicking on the new location, pressing keys on a keyboard, entering text, entering voice commands, etc. Dance music plays from speakers (not shown) near the dance floor 330. As the user's avatar 340 approaches the dance floor 330, the music becomes louder. The music is loudest when the user's avatar 340 is in front of the speakers. When the user's avatar 340 moves away from the speakers, the dance music becomes quieter. If the user's avatar 340 moves to the bar 320, the user hears background conversation (which may be actual conversation between other users at the bar 320). The user may hear other background sounds at the bar 320, such as the sounds of drinks being served or orders being called. The audio representation may involve changing the audio characteristics of a speaker by applying a filter (e.g., reverberation, club acoustics) to the sound data of an object. Examples of changing audio characteristics include the following. As an avatar walks from a carpeted room into a stone hall, the parameters of a reverberation filter are adjusted to add more reverberation to the user's speech and the avatar's footsteps. When the avatar walks into a metal room, the parameters of an effect filter are adjusted so that the user's voice and the avatar's footsteps are distorted into metallic sounds. When an avatar speaks into a virtual microphone or virtual phone, a filter (e.g., a band-pass filter) is applied to the avatar's voice data so that the user's voice sounds as if it came from a speaker system or phone.
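A minimal Python sketch of such environment-dependent filtering, using a simple feedback comb filter as a stand-in for a real reverberation filter; the room presets and all parameter values are illustrative assumptions, not values from this disclosure:

```python
import numpy as np

# Illustrative room presets: more feedback and a longer delay
# yield a more reverberant (or more "metallic") space.
ROOM_PRESETS = {
    "carpeted_room": {"delay_s": 0.02, "feedback": 0.2},
    "stone_hall":    {"delay_s": 0.08, "feedback": 0.6},
    "metal_room":    {"delay_s": 0.01, "feedback": 0.8},
}

def apply_room_reverb(samples: np.ndarray, rate: int, room: str) -> np.ndarray:
    """Apply a feedback comb filter: y[n] = x[n] + g * y[n - D]."""
    preset = ROOM_PRESETS[room]
    delay = int(preset["delay_s"] * rate)
    gain = preset["feedback"]
    out = np.copy(samples).astype(np.float64)
    for n in range(delay, len(out)):
        out[n] += gain * out[n - delay]
    return np.clip(out, -1.0, 1.0)

# Example: re-filter the avatar's voice as it walks into the stone hall.
voice = np.random.uniform(-0.1, 0.1, 8000)   # stand-in for one 8 kHz chunk
wet = apply_room_reverb(voice, rate=8000, room="stone_hall")
```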
The user might not know any of the other users represented in the club scene 310. However, the user may enter a teleconference with another user by becoming voice-enabled and bringing his avatar 340 into proximity with the other user's avatar (once the two avatars are within audio range of each other, the users may speak to each other). Users may utilize their audio-only devices 130 to speak with each other (each audio-only device 130 connects with the teleconference system 140, so that the teleconference system 140 completes the connection between the audio-only devices 130). The user may command his avatar 340 to leave the teleconference, wander around the club scene 310, and approach other avatars to listen to other conversations and speak with other people. The user may listen to one or more conversations simultaneously. Even when joined to one conversation, the user has the ability to listen to other conversations and to seamlessly leave one conversation and join another. The user may even join a chain of conversations (e.g., a line of people, where person C hears B and D, person D hears C and E, etc.).
The communication system 110 may host multiple virtual representations simultaneously. The communication system 110 may host multiple teleconferences in each virtual representation. Each teleconference may include two or more individuals.
If more than one virtual representation is available to a user, the user may move in and out of the different virtual representations. Each of these virtual representations is uniquely addressable via a unique telephone number. The server system 150 may then place each user directly into the selected virtual representation.
The user may subscribe to and enter the private virtual representation to maintain a private conversation. The user may also subscribe to and enter a private area of the virtual representation to maintain a private conversation.
This interaction is different from that of a conventional teleconference. In a traditional teleconference, several parties schedule the teleconference in advance. When the time comes, the participants call a number, wait for verification, and then talk. When the participants complete the conversation, they hang up. In contrast, a conference call according to the present invention is dynamic. Multiple teleconferences may occur between different groups of people. A conference call can occur without pre-planning. A user may listen to one or more teleconferences simultaneously, enter and leave a teleconference as desired, and jump from one teleconference to another.
The virtual reality environment just described is considered "immersive". An "immersive" environment is defined herein as an environment with which a user can interact.
Reference is again made to fig. 3. The user may also move their representation around the virtual reality environment to engage other users in the virtual reality environment (block 220). The user's representation may be moved by clicking a location in the virtual reality environment, pressing a key on a keyboard, pressing a key on a phone, entering text, entering a voice command, etc.
There are a number of ways in which the user may engage other users in the virtual reality environment. One way is by wandering around the virtual reality environment and listening to conversations already in progress. The user may hear speech and other sounds as the user moves their representation around the virtual reality environment.
Another way for the user to engage other users is through text messages, video chat, and the like. Another way is by clicking on the representation of another user, thereby displaying a personal profile. The profile provides information about the person behind the representation. In some embodiments, images of other users nearby (e.g., personal profile photos, live webcam feeds) may appear automatically.
Yet another way is to become voice-enabled via the phone (block 230). Becoming voice-enabled enables the user to enter a conference call with other voice-enabled users. For example, suppose the user wants to join a conference call using a telephone. The phone may be a conventional phone or a VoIP phone. To enter a conference call, the user may call the service provider. When calling from a traditional telephone, the user may call into the virtual reality environment (e.g., by calling a unique telephone number, or by calling a universal number and entering a user ID and PIN via DTMF, or by entering a code that the user can find on a web page).
When a call is made through a VoIP phone, the user may call the virtual reality environment by calling its unique SIP address. The user can authenticate by attaching a certificate (credential) to the SIP address.
If the service provider can recognize the user's phone number, the service provider may join the phone call with the ongoing session (block 232). If the service provider cannot recognize the user's telephone number, the user initiates a new session via the telephone (block 234). The service provider then merges the new telephone session with the already ongoing session (block 236).
Instead of calling the service provider, the user may request that the service provider call the user (block 238). For example, the toolbar includes a "CALL" button that the user clicks to become voice-enabled. Once voice-enabled, the user may walk up to another voice-enabled user and immediately begin talking. A phone icon above an avatar's head may indicate that its user is voice-enabled, and/or another graphical symbol, such as a sound wave, may be displayed near the avatar (e.g., in front of its face) to indicate that it is speaking or making other sounds.
In some embodiments, the user has the option to become voice-enabled immediately after the session is started (block 230). This option enables the user to immediately enter a conference call with other voice-enabled users (block 240). The voice-enabled user may even call a person who has not yet entered the virtual reality environment, pulling that person into the virtual reality environment (block 240). Once voice enabled (block 230), the user remains voice enabled until the user interrupts the call (e.g., hangs up the phone).
In some embodiments, the user may connect to the service provider using only a single device 120 (e.g., a computer with a microphone and speaker, or a VoIP phone) that both manipulates the virtual reality environment and carries the teleconference audio. For example, the user connects to a website via the internet, is automatically voice-enabled, meets other users in the virtual reality environment, and enters a conference call (as indicated by the line from block 210 directly to block 240).
VoIP provides certain advantages. VoIP over broadband connections enables truly seamless, persistent connections that allow users to casually "hang out" in one or more environments for long periods of time. From time to time, something of interest may be heard, or someone's voice may be recognized, prompting the user to pay closer attention and walk over to join the chat. Yet another advantage of VoIP is that stereo connections can be easily established.
In some embodiments, the service provider runs a website, but allows the user to log into the teleconference service and enter a teleconference without accessing the website (block 260). The user may only have access to a push-button phone or other device 130 that cannot access the website or display the virtual reality environment. Or the user may have access to a single device (e.g., a cellular telephone) that can either access a website or make a telephone call, but not both. Consider a conventional telephone. With only the phone, the user can call a phone number and connect to the service provider. The service provider may then create a representation of the user in a virtual reality environment. Via telephone signals (e.g., DTMF, voice control), the user can move his representation around the virtual reality environment, listen to other conversations, meet others, and experience the sounds (though not the views) of the virtual reality environment. Although the user cannot see his or her own representation, other users who visit the website can see it.
A conference call is not limited to a conversation between one user and another user (e.g., a single person). A conference call may involve many other users (e.g., groups). Also, other users may be added to the conference call as they meet and engage the users already in the conference call. And once engaged in a teleconference, a person has the ability to "listen in" on other teleconferences and seamlessly leave one teleconference and join another. A user can even join a chain of teleconferences (e.g., a line of people, with person C listening to B and D, person D listening to C and E, etc.).
If more than one virtual reality environment is available to a user, the user may move in and out of these different environments and thus meet even more different groups of people. Each virtual reality environment is uniquely addressable via an internet address or a unique telephone number. The service provider may then place each user directly into the selected target virtual reality environment. A user may subscribe to and enter a private virtual reality environment and conduct a private conversation. The user may also subscribe to and enter a private area of a public environment for private conversations. A web browser or other graphical user interface may include a toolbar or other means for indicating the different environments available to a user. The toolbar allows a user to move in and out of different virtual reality environments, as well as to subscribe to and enter private areas of a virtual reality environment.
A service provider may host multiple teleconferences in a virtual reality environment. A service provider may host multiple virtual reality environments simultaneously. In some embodiments, one user may be in more than one virtual reality environment at the same time.
Reference is now made to FIG. 5, which illustrates a state diagram of a virtual reality environment (the arrows in the diagram indicate actions). The state of the virtual reality environment may be persistent, in that it persists across many user sessions and across the actions of different users. This allows the virtual reality environment to be modified by one user while the modification is seen by other users. For example, graffiti can be written on a wall, light switches in the virtual reality environment can be turned on and off, and so forth.
The user may add, remove, and move objects in the virtual reality environment. Examples of objects include sound sources (e.g., music boxes, bubbling fish tanks), data objects (e.g., modifiable books with text and pictures), visualized music objects, and so forth. Objects may have properties that allow a user to perform specific actions on them. The user may sit on a chair, open a window or operate a jukebox. The object may also have a personal profile. For example, a car in a virtual exhibition hall may have a make, model, year, top speed, number of cylinders, and so forth.
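A minimal sketch of such an object with actions and a personal profile, in Python; all field names and the example car data are illustrative assumptions, not values from this disclosure:

```python
from dataclasses import dataclass, field

@dataclass
class VirtualObject:
    """A persistent object in the virtual environment (names are illustrative)."""
    name: str
    position: tuple                                # (x, y) location in the scene
    actions: set = field(default_factory=set)      # actions users may perform on it
    profile: dict = field(default_factory=dict)    # the object's "personal profile"

    def perform(self, action: str) -> str:
        if action not in self.actions:
            return f"{self.name} does not support '{action}'"
        return f"{action} performed on {self.name}"

# A car in a virtual exhibition hall, with a profile as described above.
car = VirtualObject(
    name="show car",
    position=(12, 7),
    actions={"open door", "view profile"},
    profile={"make": "Example", "model": "GT", "year": 2008,
             "top_speed_kmh": 250, "cylinders": 8},
)
print(car.perform("open door"))
```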
The persistent state also enables "things" to be placed on top of other things. The file may be sent to the user or placed on the floor as a way to share the file with the user. Music or sound files may be placed on the jukebox. Pictures or video may be placed on the projection device to trigger playback/display. A multimedia sample (e.g., an audio clip or video clip containing a message) can be "pinned" to the whiteboard.
The persistent state also allows for meta-representations of files. A meta-representation may be an icon that provides a preview of the actual file. For example, an audio file may be represented as a compact disc; an image file may be represented as a small picture (which may be framed), and so on.
The virtual reality environment may overlap with real space. For example, a scene of a real place (e.g., a map of a city or countryside, a room) is displayed. The locations of people in the real-world location may be determined, for example, using GPS phones. The participating people, whose real-world locations are known, are represented by avatars at their respective locations in the virtual reality environment. Alternatively, the place may be real while the users' locations are not; instead, a user's avatar wanders to different places to meet different people.
Different virtual reality environments may be linked together. The virtual reality environments may be linked to form a continuous open environment, or different virtual reality environments may be linked together in the same manner as web pages are linked together. There may be a link from one virtual reality environment to another. There may be a link from the virtual reality environment, object, or avatar to the network (or vice versa). As an example, a link from a user's avatar may lead to a web version of the user's personal profile. Links from web pages or unique telephone numbers may lead to the user's favorite virtual reality environment or jukebox playlist.
Reference is now made to fig. 6, which illustrates how a user experiences audio in a virtual reality environment. The user has a location in the environment and establishes an audio connection with the location.
At block 510, the locations of all sound sources in the virtual reality environment are determined. Sound sources include representations of objects (e.g., jukeboxes, speakers, streaming water) in the virtual reality environment, as well as those users who are talking.
At block 512, the closeness of each sound source to the user's representation is determined. Closeness is a function of a topological metric. In a virtual reality environment, the metric may be the Euclidean distance between the user and the sound source. The distance may even be the real-world distance between the user and the source. For example, the real-world distance may be the distance between a user in New York City and a sound source (e.g., another user) in Berlin.
At block 514, the audio streams from the sound sources are weighted as a function of closeness to the user representation. Sound sources closer to the user representation get higher weight (more sound) than sound sources further away from the user representation.
At block 516, the weighted streams are combined and presented to the user. The sounds from all sources available to the user are processed (e.g., isolated, filtered, phase shifted) and mixed together and provided to the user. The sound does not include the user's own voice. The audio range of the user and each sound source may have a geometric shape or a shape simulating real life attenuation.
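A minimal Python sketch of blocks 510-516, assuming an inverse-distance weighting function (the disclosure does not fix a particular weighting) and excluding the user's own voice from the mix, as described above:

```python
import numpy as np

def mix_for_user(user_pos, sources, exclude=None):
    """Blocks 510-516: weight each source stream by closeness and combine.

    `sources` maps a source id to (position, stream); the user's own voice
    (`exclude`) is left out of the mix.
    """
    mix = None
    for src_id, (src_pos, stream) in sources.items():
        if src_id == exclude:
            continue
        distance = np.linalg.norm(np.asarray(user_pos) - np.asarray(src_pos))
        weight = 1.0 / (1.0 + distance)      # closer sources get more sound
        weighted = weight * stream
        mix = weighted if mix is None else mix + weighted
    return mix

sources = {
    "jukebox": ((0.0, 0.0), np.random.uniform(-1, 1, 800)),
    "alice":   ((3.0, 4.0), np.random.uniform(-1, 1, 800)),
    "bob":     ((9.0, 0.0), np.random.uniform(-1, 1, 800)),
}
heard_by_bob = mix_for_user((9.0, 0.0), sources, exclude="bob")
```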
Reference is also made to fig. 7, which illustrates performing additional attenuation of sound in a virtual reality environment using audio ranges. The user's representation object is at position P_W, and three other objects are located at positions P_X, P_Y, and P_Z. Suppose MIX_W is the sound heard by the user at position P_W. In a simplified acoustic model, MIX_W can be expressed as

$$MIX_W = a \cdot V_X + b \cdot V_Y + c \cdot V_Z$$

where V_X, V_Y, and V_Z are the sound data from positions P_X, P_Y, and P_Z, and where a, b, and c are sound coefficients. In this simplified model, sound data V_X is adjusted by coefficient a, sound data V_Y is adjusted by coefficient b, and sound data V_Z is adjusted by coefficient c.
The value of each coefficient may be inversely proportional to the distance between the respective sound source and the user's representation object. Thus, as the user's object and a sound source move closer together, the sound becomes louder; as they move farther apart, the sound becomes quieter. The server system generates these sound coefficients. However, volume control is not limited to topological measures such as distance; that is, the closeness of two objects is not limited to distance.
Each object may have an audio range. The audio range is used to determine whether the sound is cut off. The audio ranges of the objects at positions P_W and P_Z are indicated by the circles E_W and E_Z. The audio ranges of the representations at positions P_X and P_Y are indicated by the ellipses E_X and E_Y. An elliptical audio range indicates that the sound from its audio source is directional or asymmetric. A circle indicates that the sound is omnidirectional (i.e., projected equally in all directions).
In some embodiments, when position P_Z is outside the range E_W, the coefficient c is 0; and when positions P_X and P_Y are within the range E_W, the coefficients a and b are 1. In other embodiments, the coefficients may vary between 0 and 1. For example, at the perimeter of the range the coefficient may equal zero, at the location of the user's representation object the coefficient may equal one, and between the perimeter and that location the coefficient takes a fractional value.
In some embodiments, the topological metric may be used in conjunction with an audio range. For example, as the distance between the source and the user's representation object increases, the sound becomes quieter, and cuts off as soon as the sound source is out of range.
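A minimal sketch of a coefficient that combines a distance metric with a circular audio range, as just described; the linear fade is an assumed shape (a directional, elliptical range would additionally depend on orientation):

```python
def sound_coefficient(distance: float, audio_range: float) -> float:
    """Coefficient that is 1 at the listener's location, 0 at the range
    perimeter, fractional in between, and cut off entirely out of range."""
    if distance >= audio_range:
        return 0.0                       # source out of range: sound is cut off
    return 1.0 - distance / audio_range  # linear fade toward the perimeter

assert sound_coefficient(0.0, 10.0) == 1.0
assert sound_coefficient(5.0, 10.0) == 0.5
assert sound_coefficient(12.0, 10.0) == 0.0
```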
The audio range may be a reception range or a broadcast range. If it is a reception range, the user will hear other sources within that range. Thus, the user will hear the users represented by the objects at positions P_X and P_Y, because the audio ranges E_X and E_Y intersect the range E_W. The user does not hear the object at position P_Z, because the audio range E_W does not intersect the range E_Z.

If the audio range is a broadcast range, the user hears those sources within whose broadcast range the user is located. Thus, the user will hear the object at position P_X, because position P_W is within the ellipse E_X. The user does not hear the objects at positions P_Y and P_Z, because position P_W is outside the ellipses E_Y and E_Z.
In some embodiments, the audio range of the user is fixed. In other embodiments, the audio range of the user may be dynamically adjusted. For example, if the virtual environment becomes crowded, the audio range may be reduced. Some embodiments may have functionality that allows for private conversations. This function may be achieved by reducing the audio range (e.g., to a whisper) or by forming a closed "sound bubble". Some embodiments may have a "do not disturb" function, which may likewise be achieved by reducing the audio range.
Different audio frequency ranges may have different shapes and sizes, different attenuation functions, directionality/orientation, state-dependent attenuation, and so on.
With respect to objects representing users, avatars provide certain advantages over other types of objects. The avatar enables one user to interact with another user.
In some embodiments, topological metrics may be used in conjunction with audio ranges. For example, as the distance between the source and the user increases, the sound will decrease, and once the audio source is out of range, the sound will cut off.
In some embodiments, the sound from the user may be projected equally in all directions (i.e., the sound is omnidirectional). In other embodiments, the sound projection may be directional or asymmetric.
The user representation is not limited to avatars. However, avatars offer certain advantages. The avatar enables one user to meet another user through intuitive actions. All the user needs to do is to control his avatar to go to another avatar and face it. The user may then introduce himself and invite another user to the conference call.
Another intuitive action is achieved by controlling the gestures of the avatar. This may be done to convey information from one user to another. For example, a gesture may be triggered by pressing a button on a keyboard or keypad. Different buttons may correspond to gestures such as waving, kissing, smiling, frowning, etc. In some embodiments, the user's gestures may be monitored via a webcam, corresponding control signals may be generated, and these control signals may be transmitted to the service provider. The service provider may then utilize those control signals to control the pose of the avatar.
Yet another intuitive action is achieved through the orientation of two avatars. For example, the volume between two users may be a function of the relative orientation of the two avatars. Avatars facing each other will hear each other more clearly than when one avatar faces away from the other, and much more clearly than when both avatars face away in different directions.
Reference is made to FIG. 8, which shows two avatars A and B facing in the directions of the arrows. If the angles α and β between the avatars' facing directions and their connecting line AB are equal to zero, avatars A and B directly face each other. Assume avatar A is the speaker and avatar B is the listener. The value of the attenuation function may vary differently for changes in α and β; in this case, the attenuation is asymmetric. One advantage of orientation-based attenuation is that it allows a user to engage in one conversation while still being able to overhear other conversations at will.
The attenuation may also be a function of the distance between avatars A and B. The distance between avatars A and B may be taken along the line AB.
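A minimal Python sketch of an attenuation that depends on both distance and the angles α and β; the cosine shaping and the cut-off distance d_max are assumptions for illustration, not values from this disclosure:

```python
import math

def orientation_attenuation(d, alpha, beta, d_max=15.0):
    """Illustrative attenuation f(d, alpha, beta) for speaker A and listener B.

    alpha: angle between A's facing direction and the line AB (radians)
    beta:  angle between the line AB and B's facing direction (radians)
    The cosine terms are an assumed shape; they peak when the avatars
    directly face each other (alpha = beta = 0).
    """
    if d >= d_max:
        return 0.0
    facing = (1 + math.cos(alpha)) / 2 * (1 + math.cos(beta)) / 2
    return (1.0 - d / d_max) * facing

# Facing each other at 2 m is louder than back-to-back at the same distance.
print(orientation_attenuation(2.0, 0.0, 0.0))          # ~0.87
print(orientation_attenuation(2.0, math.pi, math.pi))  # 0.0
```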
The acoustic model may be based on the direction, orientation, distance, and state of the objects associated with sound sources and sound sinks. As an example of state, the volume or audio range of the sound data may be decreased if an object is in whisper mode, or increased if the object is in shout mode. If an object is in do-not-disturb mode, the volume heard by the object, or the object's reception range, may be reduced. The acoustic model may also take into account other factors that affect the volume of the sound data. For example, the user's broadcast audio range may increase when he is detected shouting, and decrease when he is detected whispering.
Suppose V_{d_w}(t) is the sound heard by the user whose representation object is at position P_W, i.e., the sound delivered to sound sink w. In this model, V_{d_w}(t) can be expressed as

$$V_{d_w}(t) = vol_{d_w} \cdot \sum_{n=1}^{S_{max}} c_{wn} \cdot V_{s_n}(t)$$

where

$$c_{wn} = vol_{s_n} \cdot f_{wn}(d_{nw}, \alpha_{nw}, \beta_{nw}, u_n, u_w)$$

and where:

vol_{d_w} is the sink gain of the sound sink w,

S_{max} is the total number of sound sources in the environment,

V_{s_n}(t) is the sound produced by the sound source n,

vol_{s_n} is the source gain of the sound source n,

f_{wn}(d_{nw}, \alpha_{nw}, \beta_{nw}, u_n, u_w) is an attenuation function that determines how the source n is attenuated with respect to the sink w,

d_{nw} is the distance between w and n,

\alpha_{nw} is the angle between the sound emission direction (speaking direction) and the line connecting the user w and the sound source n,

\beta_{nw} is the angle between the line connecting the user w and the sound source n and the sound reception direction (listening direction),

u_n is the state of the object associated with sound source n, and

u_w is the state of the object associated with sound sink w.
The state u_n of the object associated with sound source n reflects any other factor or set of factors affecting the volume from sound source n. For example, if the object associated with sound source n is in whisper mode, state u_n may reduce the volume; if that object is in shout mode, state u_n may increase the volume. Similarly, the state u_w of the object associated with the sound sink w reflects any other factor or set of factors that affect the volume heard at sink w. For example, if the object associated with sink w is in do-not-disturb mode, state u_w may reduce the volume heard at sink w.
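A minimal Python sketch of the model above, computing V_{d_w}(t) for one sink; the particular attenuation function f (inverse-distance falloff modified by whisper and do-not-disturb states) and all numeric values are illustrative assumptions:

```python
import math

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def sink_sample(t, sink, sources, f):
    """V_dw(t) = vol_dw * sum over n of c_wn * V_sn(t), with
    c_wn = vol_sn * f(d_nw, alpha_nw, beta_nw, u_n, u_w).

    Each source dict holds a gain vol_sn, a position, a state u_n, and a
    signal V_sn(t); angles are taken as 0 here for brevity (omnidirectional).
    """
    acc = 0.0
    for src in sources:
        d = distance(sink["pos"], src["pos"])
        c = src["gain"] * f(d, 0.0, 0.0, src["state"], sink["state"])
        acc += c * src["signal"](t)
    return sink["gain"] * acc

def f(d, alpha, beta, u_n, u_w):
    """Assumed attenuation: inverse-distance falloff, reduced by states."""
    base = 1.0 / (1.0 + d)
    if u_n == "whisper":
        base *= 0.5
    if u_w == "do_not_disturb":
        base *= 0.25
    return base

sink = {"pos": (0, 0), "gain": 1.0, "state": "normal"}
sources = [
    {"pos": (3, 4), "gain": 1.0, "state": "normal",
     "signal": lambda t: math.sin(2 * math.pi * 440 * t)},
    {"pos": (1, 0), "gain": 0.8, "state": "whisper",
     "signal": lambda t: math.sin(2 * math.pi * 220 * t)},
]
print(sink_sample(0.001, sink, sources, f))
```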
Reference is made to fig. 9 and 10, which illustrate a first method of controlling the volume of sound data in a conference call. The server system generates sound coefficients, and the teleconference system uses the sound coefficients to change the audio characteristics (e.g., audio volume) of the sound data going from a sound source to a sound sink. A sound sink is a representation object of a user that can hear sound in the virtual environment. The sound coefficients may change the audio volume or other audio characteristics as a function of the closeness of the sound source and sound sink.
A virtual environment is provided (block 710), and telephone connections are established with a plurality of users (block 720). These users are represented by objects in the virtual environment. Each user's representation object may be both a sound sink and a sound source.
At block 730, the locations of all sound sources and sound sinks in the virtual environment are determined. Sound sources include objects (e.g., jukeboxes, speakers, streaming water, users' representation objects) that can provide sound in the virtual environment. A sound source may also be multimedia from an internet connection (e.g., audio from a YouTube video).
The following functions are performed for each sound sink in the virtual environment. At block 740, the closeness of each sound source to the sink is determined. The server system may perform this function because it tracks the states of the objects.
At block 750, the coefficients for each sink/source pair are calculated. Each coefficient varies the volume from the source as a function of closeness to the sink. Closeness is not limited to distance. This function may also be performed by the server system, since it maintains information about the closeness of objects. The server system provides the sound coefficients to the teleconference system.
If the sink is outside the audio range of the source (in the case of a broadcast range), the sound from the source to the sink may be cut off (i.e., not heard). The sound coefficient will reflect this cut-off (e.g., by being set to zero or close to zero). The server system may determine the range, and whether a cut-off occurs, because it manages the object states.
At block 760, the sound data from each sound source is adjusted according to its respective coefficients. Thus, sound data from a sound source is weighted as a function of closeness to the sink.
At block 770, the weighted sound data is combined and sent back to the user over the telephone line or VoIP channel. Accordingly, an auditory environment is synthesized according to sounds of different objects, so that the user hears the synthesized environment.
The processing at blocks 730-750 is performed continuously, because the positions, orientations, and other states in the virtual representation change continuously. The processing at blocks 760-770 is also performed continuously, since the sound data flows continuously (e.g., in 100 ms chunks).
Consider a virtual environment in which each of n sinks hears n sound sources. The computational effort to mix the sound data from all n sources for each sink will be about n² (i.e., O(n²)); for example, 100 sinks each mixing 100 sources requires on the order of 10,000 mixing operations per audio chunk. This poses a significant scaling problem, especially for large teleconferences and dense crowds.
Reference is now made to fig. 11. Any of the following methods, alone or in combination, may be used to reduce the computational burden.
At block 1010, for each sink, sound data is mixed only from those sound sources that contribute significantly. As a first example, the subset includes the loudest sound sources (i.e., those having the highest coefficients). As a second example, the subset includes only those representation objects with which the user is actually conversing.
As a third example, inactive sound sources (i.e., sound sources that do not provide sound data) are excluded. If the user's object is not voice-enabled, it may be excluded. The jukebox may be excluded if its play feature is turned off.
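A minimal sketch of such significance-based source selection for one sink; the cut-off threshold and the number of retained sources are illustrative tuning knobs, not values from this disclosure:

```python
import heapq

def significant_sources(coefficients, k=4, threshold=0.05):
    """Block 1010: for one sink, keep only the k loudest active sources.

    `coefficients` maps source id -> sound coefficient for this sink;
    inactive sources (coefficient at or near zero) are dropped first.
    """
    active = {s: c for s, c in coefficients.items() if c > threshold}
    return heapq.nlargest(k, active, key=active.get)

coeffs = {"alice": 0.9, "bob": 0.4, "jukebox": 0.02, "carol": 0.3,
          "dave": 0.25, "erin": 0.1}
print(significant_sources(coeffs))  # ['alice', 'bob', 'carol', 'dave']
```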
At block 1008, the audio range of the particular object may be automatically set at or near zero such that its coefficients are set at or near zero. Sound data from those objects may be excluded at block 1010.
At block 1020, a minimum distance between objects may be enforced. This policy will prevent users from forming a dense crowd.
At block 1030, the teleconferencing system may also pre-mix the sound data of a group of sound sources. For a given sink, the pre-mixed sound data of the group may then be mixed with other audio data. An example of premixing is shown in fig. 12c.
At block 1040, the teleconferencing system may make a direct connection between a source and a sink, in addition to or instead of the sound mixing shown in figs. 9 and 10 (i.e., without generating a synthesized environment). This may be done if the server system determines that two users essentially only listen to each other. Making a direct connection saves computational power and reduces latency.
Reference is now made to fig. 12a, which shows a group of sound sources (source 0 to source 3) and five objects (sink 5 to sink 9) listening to those sound sources. The five sinks (sink 5 to sink 9) are at different positions relative to the group of sound sources.
Fig. 12b shows a sound mixer that mixes sound data from the group of sources (source 0 to source 3) without pre-mixing. Each sound source (source 0 to source 3) has a coefficient for each sound sink (these coefficients are represented by filled circles with exemplary values). The sound mixer 1110 performs four mixing operations for each sink, for a total of 20 mixing operations.
Fig. 12c shows an alternative sound mixer 1120 that pre-mixes the sound data from this group of sources (source 0 to source 3). The sound sources (source 0 to source 3) are grouped together, and the sound mixer 1120 mixes the sound data from the group. Four mixing operations are performed during premixing.
The sound mixer 1120 calculates a single coefficient for each sink and performs one mixing operation per sink. The value of the coefficient may be a function of the distance from the sink to the group (e.g., the distance from the sink to the centroid of the group). Thus, the sound mixer 1120 performs an additional five mixing operations, for a total of nine mixing operations.
The coefficients of the individual sound sources that are premixed into a group may be determined relative to a particular point, such as the centroid (such coefficients may be indicated by values 0.8, 0.9, and 0.8), or some other metric. Alternatively, these values may be set to one, which means that each sink will hear the same volume from each sound source (source 0 to source 3). However, different sinks may also hear different volumes from the group (as indicated by the different coefficients 0.97, 0.84, 0.75, 0.61, and 0.50).
The sound sources may be grouped in a manner that minimizes mixing operations but also maintains the deviation from the ideal sound (i.e., the sound without premixing) at an acceptable level. Various clustering algorithms (clustering algorithms) may be used to group sound sources (e.g., K-means algorithms; or by iteratively clustering neighbors that are closest to each other).
Additional sources may be mixed without premixing. Fig. 12c shows a fifth sound source (source 4) that is not grouped with this group of sound sources. For sinks 3 and 7, the fifth sound source is assigned its own coefficient. Thus, for sink 3, a single additional mixing operation is performed, while for sink 7, two additional mixing operations are performed.
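A minimal Python sketch contrasting per-pair mixing with premixing for the group of fig. 12c; the per-sink coefficients (0.97, 0.84, 0.75, 0.61, 0.50) and group coefficients (0.8, 0.9, 0.8) follow the exemplary values quoted above, with a fourth group coefficient assumed:

```python
import numpy as np

def mix_without_premix(group_streams, per_sink_coeffs):
    """One mixing operation per (sink, source) pair: 5 sinks x 4 sources = 20 ops."""
    return {sink: sum(c * s for c, s in zip(coeffs, group_streams))
            for sink, coeffs in per_sink_coeffs.items()}

def mix_with_premix(group_streams, group_coeffs, sink_coeffs):
    """Premix the group once (4 ops), then one operation per sink (5 ops)."""
    premix = sum(c * s for c, s in zip(group_coeffs, group_streams))
    return {sink: c * premix for sink, c in sink_coeffs.items()}

streams = [np.random.uniform(-1, 1, 800) for _ in range(4)]  # sources 0-3
group_coeffs = [0.8, 0.9, 0.8, 1.0]   # fourth value is an assumed placeholder
sink_coeffs = {5: 0.97, 6: 0.84, 7: 0.75, 8: 0.61, 9: 0.50}
mixes = mix_with_premix(streams, group_coeffs, sink_coeffs)  # 9 ops vs 20
```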
Reference is now made to fig. 13, which illustrates different activities available to a service provider. The connection is not limited to audio sources. A connection may also be made with a multimedia source (block 1310). Examples of such multimedia include, without limitation, video streams, text chat messages, instant messaging messages, avatar gestures or movements, emotional expressions, emoticons, and web pages.
A multimedia source may be experienced (e.g., viewed, listened to) from within the virtual reality environment (block 1320). For example, a video clip can be viewed on a screen in the virtual reality environment. Sound may be played from within the virtual reality environment.
The multimedia source may be viewed in a separate pop-up window (block 1330). For example, another instance of a web browser is opened and a video clip is played therein.
The virtual reality environment facilitates sharing of multimedia (block 1340). Multiple users can share a media presentation (e.g., view it, edit it, browse, listen to it) and discuss the presentation via a teleconference at the same time. In some embodiments, one of the users may control the presentation of the multimedia. This feature allows all browsers to be synchronized so that all users can view the presentation at the same time. In other embodiments, each user controls the presentation, although the browsers are not synchronized.
The multimedia connection can be shared in a number of ways. One user may share a media connection with another user by dragging and dropping the media representation onto the other user's avatar, or by having their avatar deliver the media representation to the other user's avatar.
As a first example, the first user's avatar drops a video file or photograph onto the second user's avatar. Both the first user and the second user can then view the video in a browser or media player while discussing it via a teleconference.
As a second example, the avatar of the first user drops a URL onto the avatar of the second user. Each user's web browser opens and downloads the content at that URL. The first user and the second user may then co-browse the content while discussing it via a teleconference.
As a third example, one user presents things to surrounding avatars. All users within range can see the presentation (however, they may first be asked if they want to see the presentation).
Multimedia connections offer another advantage: which enables phones and other devices without browsers to access content on the internet. For example, the multimedia connection may provide streaming audio to the virtual reality environment. Streaming audio will be an audio source with a specific location in the virtual reality environment. A user with only a standard telephone may wander around the virtual reality environment and find the audio source. Thus, the user can hear the streaming audio over the telephone.
Reference is now made to fig. 14. The service provider 1400 may provide other services. One service is to assign a particular virtual reality environment to a user based on the user's characteristics (block 1410). The characteristic may be a parameter in the user's profile, or the user's interests, or the user's mood, or some other characteristic.
A user may have multiple personal profiles. Each profile represents a different aspect of the user. Different profiles allow the user to access a particular virtual reality environment. The user may switch between multiple profiles during the session.
The personal profile may state a need. For example, a personal profile may reveal that the user is purchasing a car. The user may be automatically assigned to a virtual exhibition hall including a representation of a car and a representation of a salesperson.
In some embodiments, user profiles may be made public so that they may be viewed by other users. For example, a first user may click on an avatar of a second user, and the second user's personal profile appears as an introduction. Alternatively, the first user may wander in the virtual reality environment looking for a person to meet. The first user may learn about the second user by clicking on the avatar of the second user. In response, the second user's profile will be displayed to the first user. If the personal profile does not reveal the user's actual name and phone number, the second user remains anonymous.
Another service is to provide agents (e.g., operators, security, experts) that serve the users in the virtual reality environment (block 1420). As a first example, a user may talk while watching a movie while an agent looks up information about the cast. As a second example, a user chats with another person, and that person asks an agent to find something using a search engine. As a third example, an agent proactively identifies participants who appear to match and introduces them to each other.
Another service is to provide a video chat service (block 1440). For example, a service provider may receive webcam data from different users and associate the webcam data with different users so that the webcam data of one user can be viewed by a particular other user.
Yet another service is to host different functions in different virtual reality environments (block 1430). Examples of different functions include, but are not limited to, social networks, business meetings, business-to-business services, business-to-customer services, commercial exchanges, conferences, work and entertainment venues, virtual stores, promotional gifts, online gambling and entertainment, virtual games and entertainment shows, virtual schools and universities, online education, tutoring sessions, karaoke, team games, entertainment, prize competitions, clubs, concerts, virtual galleries, theaters, demonstrations, and any scenario available in real life. The virtual reality environment may even be one in which a user hosts a television show or movie.
Reference is made to fig. 16, which illustrates an exemplary network-based communication system 1600. The communication system 1600 includes a VE server system 1610. "VE" refers to a virtual environment.
The VE server system 1610 hosts a website that includes a collection of web pages, images, videos, or other digital resources. The VE server system 1610 includes a web server 1612 for serving web pages, and a media server 1614 for storing videos, images, and other digital resources.
One or more of these web pages embeds the client file. The client file, served by the web server 1612, is composed of Flash(R) objects (.swf files), some of which may be dynamically downloaded when they are needed. Clients are not limited to Flash(R) clients. Other browser-based clients include, but are not limited to, Java(TM) applets, Silverlight(TM) clients, .NET applets, scripts such as JavaScript, and the like. Even downloadable, installable programs may be used.
Using a web browser, the client device downloads the web page from the web server 1612 and then downloads the embedded client file, thereby initiating the client. The client begins running the client file and downloads the remainder of the client file (if any) from the web server 1612.
The entire client, or a portion thereof, may be provided to the client device. Consider an instance of a client that includes a Flash(R) player and one or more Flash(R) objects. The Flash(R) player is already installed on the client device. When the .swf file is sent to and downloaded by the client device, the Flash(R) player causes the client device to display the virtual environment. The client also accepts input (e.g., keyboard input, mouse input) instructing the user's representation object to move around and experience the virtual environment.
The server system 1610 also includes a world server 1616. The "world" refers to all virtual representations provided by the server system 1610. When the client begins to run, it opens a connection to the world server 1616. The server system 1610 selects a description of the virtual environment and sends the selected description to the client. The selected description contains links to graphics or other media in the virtual environment. The description also contains the coordinates and appearance of all objects in the virtual environment. The client downloads media (e.g., images) from the media server 1614 and renders the images (e.g., in isometric or 3-D projection).
The client displays the objects in the virtual environment. Some of these objects are user representation objects, such as avatars. The animated view of an object may include pre-rendered images or instantly rendered 3D models and text; that is, the object may be downloaded as a separate Flash(R) object, a parameterized generic Flash(R) object, an image, a movie, or a 3D model, optionally including text and animation. A user may have a unique/personal avatar or a shared generic avatar.
The objects can be downloaded as needed, which can shorten the initial download time. For example, a low quality or general representation may be downloaded first when the avatar is far away from another object, while a higher quality representation may be downloaded later when the avatar walks into the object.
When a client device wants an object to move to a new location in the virtual environment, its client determines the coordinates of the new location and the desired time to start moving the object and generates a request. The request is sent to the world server 1616.
The world server 1616 receives the request and updates the data structure representing the "world". The world server 1616 manages the state of each object in one or more virtual environments and updates states as they change. Examples of states include avatar states, the objects avatars carry, user states (accounts, permissions, rights, audio range, etc.), and call management. When a user commands an object in the virtual environment into a new state, the world server 1616 commands all clients rendering the virtual environment to transition the state of the object, so that the client devices display the object in approximately the same state at approximately the same time.
The world server 1616 may also manage objects that transition gradually or abruptly. When a client device commands an object to transition to a new state, the world server 1616 receives the command and generates an event that causes all clients to show the object in the new state at a particular time.
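A minimal sketch of this state-transition broadcast; the class and method names are illustrative, and client sessions are assumed to expose a send method:

```python
import time
from dataclasses import dataclass

@dataclass
class StateEvent:
    """Event telling every client to show an object in a new state at a set time."""
    object_id: str
    new_state: dict
    effective_at: float    # wall-clock time at which clients apply the change

class WorldServer:
    """Minimal sketch of the state management described above."""
    def __init__(self):
        self.clients = []   # connected client sessions (assumed to have .send())
        self.world = {}     # object_id -> current state

    def command(self, object_id, new_state, delay=0.1):
        # Update the authoritative "world" data structure, then notify
        # all clients so they transition the object at the same moment.
        self.world[object_id] = new_state
        event = StateEvent(object_id, new_state, time.time() + delay)
        for client in self.clients:
            client.send(event)
```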
The communication system 1600 also includes a teleconferencing system 1620. Some embodiments of the teleconferencing system may include a telephony server 1622 for establishing calls with conventional telephones. For example, the telephony server 1622 may include PBX and ISDN cards for connecting to customers using traditional telephones (e.g., push-button telephones) and digital telephones. The telephony server 1622 may include a mobile network or analog network connector. These cards act as the terminal side of the PBX or ISDN line and, in conjunction with associated software, perform all low-level signaling for establishing a telephone connection. Events (e.g., ringing, connecting, disconnecting) and audio data in chunks (e.g., 100 ms) are passed from the card to the sound system 1626. The sound system 1626, among other things, mixes audio between users in the teleconference, mixes in any environmental sounds (e.g., jukebox sounds, people walking, etc.), and passes the mixed chunks back to the card and thus to the user.
Some embodiments of the teleconferencing system 1620 may transcode a call to VoIP or receive a VoIP stream directly from a third party (e.g., a telecommunications system). In those embodiments, the events will not originate from these cards, but will originate transparently from the IP network.
Some embodiments of the teleconferencing system 1620 may include a VoIP server 1624 that establishes connections with users who call in via VoIP. In this case, a client (e.g., client 160 of fig. 1) may contain functionality by which it attempts to connect to a VoIP audio device using, for example, an XML-socket connection. If the client detects a VoIP phone, it enables VoIP functionality for the user. The user then causes the client to establish a connection (e.g., by clicking a button): the client sends a CALL command (together with the information needed to authenticate the VoIP connection) via the socket to the VoIP phone, which calls the VoIP server 1624.
The world server 1616 associates each authenticated VoIP connection with a client connection. The world server 1616 associates each PBX connection that is authenticated with a client connection.
For devices enabled to run a telnet session, a user may establish a telnet session to receive information, questions, and options, and enter commands. For telnet enabled devices, the apparatus 1617 may provide a written description of the virtual environment.
The telephony server 1622 may also allow a user of an audio-only device to control objects in the virtual environment. A user with only an audio-only device may experience sound in the virtual environment and speak to other users, but cannot see the views in the virtual environment. The telephony server 1622 may utilize telephone signals (e.g., DTMF, voice commands) from the telephone to control the actions of the user's representation in the virtual environment.
The audio-only device generates signals for selecting and controlling objects in the virtual representation, and the telephony server 1622 interprets these signals and notifies the server system to take action, such as changing the state of an object. These signals may be, by way of example, touch-tone (DTMF) signals, voice signals, or some other type of telephony signal. Consider a touch-tone telephone. A particular button on the phone may correspond to a command. A user with a push-button phone or a DTMF-enabled VoIP phone may execute a command by entering it with the DTMF buttons. Each command may be provided with one or more arguments. An argument may be a telephone number or other sequence of digits. In some embodiments, voice commands may be translated and used.
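A minimal sketch of such signal interpretation; the specific key-to-command mapping and the "*"-prefixed argument convention are hypothetical, since the disclosure does not fix particular buttons:

```python
# Hypothetical DTMF-to-action mapping (the patent does not fix specific keys).
DTMF_COMMANDS = {
    "2": "move_forward",
    "8": "move_back",
    "4": "turn_left",
    "6": "turn_right",
    "5": "select_object",
    "0": "describe_surroundings",
}

def handle_dtmf(digits: str):
    """Translate a DTMF sequence into world-server actions.

    A leading '*' is assumed here to introduce a numeric argument,
    e.g. a telephone number to call ('*' + digits + '#').
    """
    if digits.startswith("*"):
        return ("call", digits[1:].rstrip("#"))
    return [DTMF_COMMANDS.get(d, "ignore") for d in digits]

print(handle_dtmf("2245"))        # move, move, turn left, select
print(handle_dtmf("*5551234#"))   # place a call to the entered number
```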
The server system may further comprise an apparatus 1617 for providing an audio description of the virtual environment. For example, the virtual environment may be described to a user from the perspective of the user's avatar. Objects closer to the user's avatar may be described in more detail. The description may include or omit details to maintain a nearly constant overall description length. The user may request a more detailed description of a particular object, whereupon additional details are revealed. The server system may also generate an audio description of available options in response to a command. The teleconferencing system mixes the audio description (if any) with the other audio and provides the mixed sound data to the user's audio-only device. For telnet-enabled devices, the apparatus 1617 may provide a written description of the virtual environment. For other audio-only devices, the apparatus 1617 may include a speech synthesis system for providing a spoken description that is heard on the audio-only device.
The sound system 1626 may play a sound clip, such as sound in a virtual environment. The sound clip is synchronized with the state changes of these objects in the virtual environment. The sound system 1626 starts and stops sound clips at the state transition start and stop times indicated by the world server 1616.
The sound system 1626 may mix the sound of the virtual environment with audio from the teleconference. The sound mixing is not limited to any particular method and may be performed as described above. The teleconferencing system may receive a patch list and a set of coefficients and work through the list. The teleconferencing system may also utilize heuristics to determine whether it has enough time to patch all connections. If not enough time is available, the packet is dropped.
The VE server system 1610 may also include one or more servers that provide additional services. For example, a web container 1610 may be used to implement the servlet and JavaServer Pages (JSP) specifications to provide a Java code environment that operates in conjunction with the web server 1612.
Fig. 29 illustrates several other services that may be provided by the VE server system 1610 in addition to or in place of the services illustrated in fig. 16.
The service repository 1632 provides information about other services provided by the VE server system.
Content store 1634 provides information for bootstrapping and configuring the client at startup and runtime, such as how to set up the virtual environment, dependencies between modules, and the behavior of objects. Preferably, this is done using a domain-specific language.
Object store 1636 provides information to clients about the environment and the objects it contains. To this end, it holds information about the object in a format (e.g., XML format) that allows different object implementations or different versions thereof to use and change it.
Authentication service 1641 checks the user's credentials, such as nickname and password, and provides the client with a token that can be used to authenticate to other services. The token may be stored in a cookie.
The configuration service 1642 provides user profile data and profile-related actions, such as requesting friendship with another user based on the user's profile.
Account services 1643 provide information about the available funds for the user.
The room service 1644 manages rooms by providing methods for clients to enter or leave virtual environments. Furthermore, it tracks all state changes within the virtual environment. All current states of the room and avatars may be retrieved by a client to obtain a current snapshot. If a user wants to log into a room that is not currently running, the service opens the room so that the user can log in. Clients may connect to a messaging service to obtain notifications about room changes and to send their own changes to other clients. The room service also calculates the sound coefficients, sends them to the sound system 1626, and controls the playback of audio samples.
Call services 1645 provide information about the phone rate and may be used by clients to initiate phone connections.
Mail service 1646 can be used by clients to send messages to a particular set of destinations (e.g., a service that further processes those messages).
All servers in the communication system 1600 may run on the same machine or be distributed across multiple machines. Communication may be performed via remote procedure calls. For example, HTTP- or HTTPS-based protocols (e.g., SOAP) may be used by the server(s) for transport and for communicating with clients.
Reference is made to fig. 9 and 24, which illustrate a first method for mixing sound. The world server 1616 generates data, such as sound coefficients, which the sound system 1626 uses to change audio characteristics (e.g., audio volume). The sound coefficients or other data change the audio volume or other audio characteristics as a function of the closeness of the object pair.
At block 2410, the locations of all sound sources in the virtual environment are determined. The sound source includes an object (e.g., a jukebox, a speaker, a streaming water flow) in the virtual environment. Sound sources also include those users that are talking. The sound source may be multimedia from an internet connection (e.g., audio from YouTube video).
The following functions are performed for each sink in the virtual environment. A sink refers to a representation of a user that can hear sounds in the virtual environment. At block 2420, the closeness of each source to each sink is determined. Closeness is not limited to distance. The world server 1616 may perform this function because it maintains information about the locations of sound sources.
At block 2430, coefficients for each sink/source pair are generated. Each coefficient changes the volume from the source as a function of source-to-sink closeness. This function may also be performed by the world server 1616, as it maintains information about the locations of objects. The world server 1616 provides the sound coefficients to the sound system 1626.
If the source is outside the audio range of the sink, the sound from the source to the sink may be cut off (i.e., inaudible). The coefficient will reflect this cut-off (e.g., by being set to zero or close to zero). The world server 1616 can determine the range, and whether a cut-off occurred, because it tracks the object states.
At block 2440, the audio streams from the audio sources are weighted as a function of closeness to the sinks, and the weighted streams are combined and sent back to the user over the telephone line or VoIP channel. The sound system 1626 may include a processor that receives the patch list and the set of coefficients and works through the list. The processor may also utilize heuristics to determine whether it has enough time to patch all connections. If not enough time is available, the packet is dropped.
A more preferred method is shown in fig. 18. Again, the world server 1616 generates sound coefficients that the sound system 1626 uses to change the audio characteristics (e.g., audio volume) of the sound data going from a sound source to a sound sink. A sound sink is a representation object of a user that can hear sound in the virtual environment. A sound coefficient may change the audio volume and other audio characteristics as a function of the closeness of source and sink.
At block 1810, the positions of all sound sources in the virtual environment are determined. Sound sources include objects in the virtual environment (e.g., a jukebox, a speaker, running water) and the representation objects of those users who are talking. A sound source may also be multimedia from an internet connection (e.g., audio from a YouTube video).
The following functions are performed for each sink in the virtual environment. At block 1820, the closeness of each sound source to each sink is determined. Closeness is not limited to distance. The world server 1616 may perform this function because it maintains information about the locations of sound sources.
At block 1830, the coefficient for each sink/source pair is calculated. Each coefficient changes the volume from the source as a function of the closeness of the source to the sink. This function may also be performed by the world server 1616, as it maintains information about the locations of the objects. The world server 1616 provides the sound coefficients to the sound system 1626.
If a source is outside the audio range of a sink, the sound from the source may be cut off (i.e., inaudible). The coefficient may reflect such a cut-off (e.g., by being set to zero or close to zero). The world server 1616 may determine the range, and whether a cut-off occurred, because it tracks object state.
At block 1840, the sound data from each sound source is adjusted with its corresponding coefficient. Sound data from a sound source is thus weighted as a function of its closeness to the sink.
At block 1850, the weighted sound data is combined and sent back to the user over the telephone line or VoIP channel. The sound system 1626 may include a processor that receives the patch list and the coefficient sets and works through the list. The processor may also use heuristics to determine whether it has enough time to patch all connections. If not enough time is available, the packet is dropped.
In addition to or instead of the sound mixing shown in figs. 9 and 18, the teleconferencing system 1620 may switch a source/sink pair over to a direct connection in order to save computational power and reduce latency. This may be done if the world server 1616 determines that two users can essentially only hear each other. The teleconferencing system 1620 may also premix some or all of the sources for several sinks that have similar coefficients. In the latter case, each user's own source may have to be subtracted from the combined mix to produce that user's sink signal.
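The premixing optimization could be sketched as follows (again Python with hypothetical names; it assumes a single shared coefficient is acceptable for the whole group and subtracts each user's own chunk so that users do not hear themselves):

```python
CHUNK_SAMPLES = 800  # as in the previous sketch

def premix_group(sources, sinks, shared_coefficient):
    """Premix once for a group of sinks whose coefficients are similar.

    One combined, weighted sum replaces a separate mix per sink; each
    user's own source is then subtracted from the combined mix to
    produce that user's sink signal.
    """
    combined = [0.0] * CHUNK_SAMPLES
    for src in sources:
        for i, sample in enumerate(src.chunk):
            combined[i] += shared_coefficient * sample

    per_sink = {}
    for sink in sinks:
        own = sink.own_source.chunk     # the user's own voice chunk
        per_sink[sink.user_id] = [
            combined[i] - shared_coefficient * own[i]
            for i in range(CHUNK_SAMPLES)
        ]
    return per_sink
```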
A user may utilize both the client device 120 and the audio-only device 130 during a teleconference. The client device 120 is used to interact with the virtual representation and find others to speak with. The audio-only device 130 is used to speak with them.
However, some users may only have audio-only devices. Such a user may still control objects in the virtual representation. For example, such a user may move their representation object around the virtual environment to listen in on a conference call and to approach and speak to other users. By moving their representation object around the virtual environment, a user with only an audio-only device can hear the sounds, but not see the views, provided by the virtual environment.
Reference is now made to fig. 17. To begin a session using only an audio-only device, the audio-only device establishes audio communication with the teleconferencing system (block 1710). With a traditional telephone, the user may call the virtual representation (e.g., by calling a unique telephone number, or by calling a universal number and entering data such as a user ID and PIN via DTMF). With a VoIP phone, a user may call the virtual representation, for example, by calling the virtual representation's unique VoIP address.
The teleconference system notifies the server system of the session (block 1715). The server system assigns the user to a location within the virtual representation (block 1720).
The audio-only device generates signals for selecting and controlling objects in the virtual representation (block 1730). These signals are not limited to any particular type. They may be, by way of example, dual-tone multi-frequency (DTMF) signals, voice signals, or any other type of telephony signal.
Consider a push-button phone. A particular button on the phone may correspond to a command. A user with a push-button phone or a DTMF-enabled VoIP phone may execute a command by entering it using the DTMF buttons. Each command may be provided with one or more arguments. An argument may be a telephone number or another sequence of digits. In some embodiments, voice commands may be translated and used.
A command argument may expect a value from an option list. The options may be arranged in a tree structure such that the user selects a first group with one digit and is then presented with the resulting subset of remaining options, and so on. The most likely options may be listed first.
For example, a user may press '0' to enter a command menu, where all available commands are read to the user. The user then enters a CALL command (e.g., 2255), followed by a # symbol. The user may then be asked to identify the person to call, for example, by speaking the name of that person, entering the phone number of that person, entering a code corresponding to that person, and so on. Instead of pressing a button to enter the command menu, the user may speak a catchword, such as "computer". The teleconferencing system may also detect, process, and act upon audio signals before a user enters the command menu. For example, the teleconference system may analyze the user's voice, detect emotional changes, and communicate them to the server system. The server system may modify the user's representation object in response, to reflect the emotional change.
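As an illustration, a DTMF command dispatcher along these lines might look as follows (Python; the code 2255 for CALL comes from the example above, everything else is invented for the sketch):

```python
COMMANDS = {
    "2255": "CALL",    # C-A-L-L on the keypad, from the example above
    "6683": "MOVE",    # hypothetical further codes
    "5646": "JOIN",
}

def read_dtmf_command(digits):
    """Parse a DTMF digit string of the form '<code>#<argument>#'.

    '0' alone opens the spoken command menu.  Arguments (for example
    the callee's number for CALL) follow the command code and are
    '#'-terminated.
    """
    if digits == "0":
        return ("MENU", None)
    code, _, rest = digits.partition("#")
    command = COMMANDS.get(code)
    if command is None:
        return ("UNKNOWN", code)
    argument = rest.rstrip("#") or None   # e.g., a phone number
    return (command, argument)

# read_dtmf_command("2255#5551234#") -> ("CALL", "5551234")
```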
Another command may cause an object to move within the virtual environment. The arguments of the command may specify a direction, a distance, a new location, and so on.
Another command may allow the user to switch to another virtual environment; an argument of the command may specify the virtual environment. Another command may allow the user to join a conference call. Another command may allow a user to request information about the environment or about other users. Another command may allow one user's avatar to take another user's avatar by the hand, so that the following avatar will follow the leading avatar.
Another command may allow the user to select an object representing an internet resource, such as a web page. An argument may specify a particular link, URL, or bookmark. For example, a list of available links may be read to a user who enters arguments to select a link (e.g., an internet radio station). In this manner, phones and other devices without browsers can be used to access content on the internet.
For example, the virtual environment may include an internet object. When the object is selected, a connection is established to a site providing streaming audio. The server system provides the streaming audio to the teleconferencing system, which mixes the streaming audio onto the user's telephone line.
Another command may allow one user to grant another user, or a particular group of users, rights or access to one or more of their files or directories. Another command may allow one user to transfer an object (e.g., a document, token, or monetary unit) to another user. Another command may allow one user to record and leave a voice message for another user (the voice message may be converted to a file and left as a text message). Another command may allow one user to present media (such as video, sound samples, and images) to other users (e.g., on a visual screen), change their representation object (e.g., change the avatar's mood), or invite others to or participate in a vote or game.
The teleconference system receives and interprets these signals and notifies the server system to take an action (block 1740), such as changing the state of an object.
The teleconference system may play audio segments, such as sounds of the virtual environment (block 1750). The server system may synchronize the sound clips with state changes in the virtual representation.
The server system may also provide an audio description of the virtual environment (block 1750). For example, the virtual environment may be described to a user from the perspective of the user's avatar. Objects closer to the user's avatar may be described in more detail. The description may include or omit details to maintain a nearly constant overall description length. The user may request a more detailed description of a particular object, upon which additional details are revealed. The server system may also generate audio descriptions of options in response to commands (block 1750). The teleconference system mixes those audio descriptions with the other audio for the user and provides the mixed sound data to the user's audio-only device (block 1760).
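One way to realize this distance-dependent level of detail is sketched below (Python; the short/long description strings and the fixed word budget are assumptions introduced to approximate the "nearly constant overall description length"):

```python
import math

def describe_scene(avatar, objects, word_budget=60):
    """Describe the environment from the perspective of the avatar.

    Nearer objects get their detailed description; once half of the
    word budget is used up, remaining objects get the short form, and
    objects beyond the budget are omitted entirely, keeping the
    overall description length nearly constant.
    """
    nearest_first = sorted(
        objects, key=lambda o: math.dist(o.position, avatar.position))
    parts, used = [], 0
    for obj in nearest_first:
        text = (obj.long_description if used < word_budget // 2
                else obj.short_description)
        words = len(text.split())
        if used + words > word_budget:
            break                       # omit everything past the budget
        parts.append(text)
        used += words
    return " ".join(parts)
```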
The server system may also generate data for controlling audio characteristics over time. For example, the volume of a conversation between two users may be a function of the distance and/or orientation of their two avatars in the virtual environment. In this example, the sound becomes louder as the two avatars move closer together and softer as they move farther apart. The server system generates sound coefficients that vary the volume between the two users as a function of the distance between their avatars. These coefficients are used by the teleconferencing system to change the volume over time (block 1780). In this manner, the server system commands the teleconference system to attenuate or change the sound so that the conversation is consistent with the virtual environment.
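A sound coefficient depending on both distance and relative orientation could, for instance, combine a distance falloff with a facing factor (a hypothetical formula; the description only requires that the volume be some function of distance and/or orientation, and the avatar attributes are assumed):

```python
import math

def conversation_coefficient(speaker, listener, audio_range=10.0):
    """Volume coefficient for one avatar heard by another.

    Distance gives a linear falloff; orientation scales the result so
    that a speaker facing the listener is louder than one turned away.
    """
    dx = listener.x - speaker.x
    dy = listener.y - speaker.y
    distance = math.hypot(dx, dy)
    if distance >= audio_range:
        return 0.0                                 # out of audio range
    falloff = 1.0 - distance / audio_range
    # Angle between the speaker's heading and the direction of the listener:
    toward = math.atan2(dy, dx)
    deviation = abs(math.remainder(speaker.heading - toward, math.tau))
    facing = 0.5 + 0.5 * math.cos(deviation)       # 1 facing, 0 turned away
    return falloff * facing
```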
Reference is made to fig. 19, which illustrates an exemplary network-based system 1900 similar to the system illustrated in fig. 16. The communication system 1900 includes a VE server system 1910. "VE" refers to a virtual environment.
The client device is numbered 1902. The phone is numbered 1904.
The VE server system 1910 hosts a website that includes a combination of web pages, images, videos, and other digital resources. The VE server system 1910 includes one or more web servers 1912 for serving web pages, and one or more media servers 1914 for storing videos, images, and other digital resources.
One or more of these web pages embeds the client file. The client consists of, for example, one or more individual Flash objects (.swf files) provided by the web server 1912, some of which may be dynamically downloaded when they are needed.
Clients are not limited to Flash clients. Other browser-based clients include (without limitation) Java™ applets, Silverlight™ clients, .NET applets, clients written in scripting languages such as JavaScript, and the like. Even downloadable, installable programs may be used.
Using a web browser, the client device 1902 downloads a web page from the web server 1912 and then downloads the embedded client file from the web server 1912. The client file is downloaded to the client device, thereby initiating the client. The client begins running and downloads the remainder of the client files (if any) from the web server 1912.
The entire client, or a portion thereof, may be provided to the client device. Consider an instance of a client that includes a Flash player and one or more Flash objects. The Flash player is already installed on the client device. When the .swf files are sent to and downloaded by the client device, the Flash player causes the client device to display the virtual environment. The client also accepts input (e.g., keyboard input, mouse input) instructing the user's representation object to move around and experience the virtual environment.
The server system 1910 also includes one or more world servers 1916. The "world" refers to the collection of virtual environment representations provided by the server system 1910. When the client begins to run, it opens a connection to the world server 1916. The server system 1910 selects a description of the virtual environment and sends the selected description to the client. The selected description contains links to graphics or other media of the virtual environment, as well as the coordinates and appearance of all objects in the virtual environment. The client downloads the media (e.g., images) from the media server 1914 and projects the images (e.g., isometric or 3-D).
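For illustration, such a description might be structured as follows (all field names and values are invented; the text above only requires media links plus the coordinates and appearance of all objects):

```python
# A hypothetical virtual-environment description sent to the client.
environment_description = {
    "room": "beach_bar",
    "background": "http://media.example/beach_bar/background.jpg",  # media link
    "objects": [
        {"id": "jukebox-1", "type": "jukebox",
         "position": (120, 80), "appearance": "jukebox_red.swf"},
        {"id": "avatar-42", "type": "avatar", "user": "alice",
         "position": (96, 64), "appearance": "avatar_generic.swf"},
    ],
}
```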
The client displays the objects in the virtual environment. Some of these objects are user representation objects, such as avatars. The animated view of an object may include pre-rendered images or instantly rendered 3D models and text; that is, an object may be downloaded as a separate Flash object, a parameterized general Flash object, an image, a movie, or a 3D model, optionally including text and animation. The user may have a unique/personal avatar or a shared general avatar.
When the client device 1902 wants to move an object to a new location in the virtual environment, its client determines the coordinates of the new location and the requested time to begin moving the object, and generates a request. The request is sent to the world server 1916.
The world server 1916 receives the request and updates the data structure representing the "world". The world server 1916 manages the state of each object in one or more virtual environments and updates changed states. Examples of states include avatar states, the objects avatars carry, user states (accounts, permissions, rights, audio range, etc.), and call management. When a user commands an object in the virtual environment into a new state, the world server 1916 commands all clients in which the virtual environment is represented to transition the state of the object, so that the client devices display the object in approximately the same state at approximately the same time. World server 1916 may also perform collision detection and avoidance and path finding, and in general ensure consistent (e.g., physically correct) behavior.
World server 1916 may also manage objects that transition gradually or abruptly. When a client device commands an object to transition to a new state, the world server 1916 receives the command and generates an event that causes all clients to show the object in the new state at a particular time.
The world server 1916 generates coefficients for the acoustic model. For example, the world server 1916 tracks the distance between objects and generates coefficients as a function of the distance between objects. The world server 1916 provides these coefficients to the telephony system 1920, which applies them to the audio data.
The telephone system 1920 establishes telephone connections with traditional telephones (landline and cellular), VoIP telephones, and other telephones 1904. Some embodiments of the telephony system 1920 may include one or more telephony servers 1922 for establishing calls with telephones via the public switched telephone network (PSTN). For example, the telephony server 1922 may use PBX and ISDN cards to connect to customers using traditional telephones (e.g., push-button telephones) and digital telephones. The telephony server 1922 may include mobile-network or analog-network connectors. These cards act as the terminal side of the PBX or ISDN line and, in conjunction with associated software, perform all low-level signaling for establishing a telephone connection. Events (e.g., ring, connect, disconnect) and audio data in chunks (e.g., 100 ms) are passed from the cards to the sound system 1926. The sound system 1926, among other things, mixes audio between users in a teleconference, mixes in any external sounds (e.g., jukebox sounds, people walking, etc.), and passes the mixed chunks back to the cards and thus to the users.
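The hand-off between the cards and the sound system 1926 can be pictured as a per-chunk loop, sketched here with hypothetical card and sound-system interfaces (only the 100 ms chunk size and the event types come from the description above):

```python
CHUNK_MS = 100   # chunk size from the example above

def process_chunk(cards, sound_system):
    """One round of the card/sound-system exchange (illustrative only)."""
    by_line = {card.line_id: card for card in cards}
    inputs = {}
    for card in cards:
        for event in card.poll_events():      # ring, connect, disconnect
            sound_system.handle_event(event)
        inputs[card.line_id] = card.read_chunk(CHUNK_MS)

    # The sound system mixes conference audio and external sounds per
    # line; each mixed chunk goes back to its card and thus to the user.
    for line_id, chunk in sound_system.mix(inputs).items():
        by_line[line_id].write_chunk(chunk)
```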
Some embodiments of the teleconferencing system 1920 may include one or more VoIP servers 1924 that establish connections with users who call in via VoIP. In this case, a client (e.g., client 160 of fig. 1) may contain functionality by which it attempts to connect to a VoIP phone using, for example, an XML-socket connection. If the client detects a VoIP phone, it enables VoIP functionality for the user. The user may then cause the client to establish a connection (e.g., by clicking a button) by sending a CALL command (including the information needed to authenticate the VoIP connection) via the socket to the VoIP phone, which then calls the VoIP server 1924.
Some embodiments of the teleconferencing system 1920 may transcode the call to VoIP or receive a VoIP stream directly from a third party (e.g., a telecommunications provider). In those embodiments, the events do not originate from the cards but arrive transparently from the IP network.
The world server 1916 may associate each authenticated VoIP or PBX connection with a client connection (if any).
The server system 1910 may provide the same virtual representation to different kinds of client devices 1902, possibly with different visual representations (e.g., 3D, isometric, and text), so that users of those different client devices 1902 can still interact with each other. For devices able to run text sessions, such as telnet sessions, a user may establish a text session to receive information, questions, and options, and also to enter commands. For a text device, a written description of the virtual environment may be provided.
The phone system 1920 may also allow the user of a phone to control objects in the virtual environment. Even though a user with only a phone and no client device cannot see the views of the virtual environment, the user may still perceive the sounds of the virtual environment and speak to other users (with or without client devices 1902). The telephony system 1920 may accept telephony signals (e.g., DTMF, voice commands) from a phone to control the actions of the user's representation in the virtual environment. The phone system 1920 may also receive SMS or MMS messages to control these actions.
The phone 1904 generates signals for selecting and controlling objects in the virtual representation, and the phone system 1920 interprets these signals and notifies the server system to take an action, such as changing an object's state. These signals may be, by way of example, DTMF signals, voice signals, or other types of telephony signals. Consider a push-button telephone. A particular button on the phone may correspond to a command. A user with a push-button phone or a DTMF-enabled VoIP phone may execute a command by entering it using the DTMF buttons. The telephony server 1922 detects the (in-band) DTMF keys and converts them to (out-of-band) control signals, which are passed to the world server 1916. Each command may be provided with one or more arguments. An argument may be a telephone number or another sequence of digits. In some embodiments, voice commands may be translated and used.
The server system 1910 may further include means 1917 for providing an audio description of the virtual environment. For example, the virtual environment may be described to a user from the perspective of the user's avatar. Objects closer to the user's avatar may be described in more detail. The description may include or omit details to maintain a nearly constant overall description length. The user may request a more detailed description of a particular object, upon which additional details are revealed. The server system 1910 may also generate audio descriptions of options in response to commands. The teleconferencing system 1920 mixes the audio descriptions (if any) with the other audio and provides the mixed sound data to the user's phone.
The sound system 1926 may play sound clips, such as sounds in the virtual environment. The sound clips are synchronized with the state changes of objects in the virtual environment: the sound system 1926 starts and stops sound clips at the state-transition start and stop times indicated by the world server 1916.
The sound system 1926 may mix the sounds of the virtual environment with the audio from the phones 1904. The sound mixing is not limited to any particular sound model. The telephony system 1920 may receive the patch list and the set of coefficients and work through the list.
The VE server system 1910 may also include one or more servers that provide additional services. For example, one or more network hosts 1918 may implement the servlet and JavaServer Pages (JSP) specifications to provide a Java code environment that operates in conjunction with the web server 1912.
All servers in system 1900 can run on the same machine or be distributed across multiple different machines. Communication between them may be performed using remote procedure calls. For example, HTTP- or HTTPS-based protocols (e.g., SOAP) may be used by the server(s) and networked devices to transport data and communicate with clients.
Reference is now made to FIG. 20, which illustrates an example of utilizing system 1900. At block 2000, the user is allowed to start a session. For example, using a web browser, a user enters a website and logs into the system 1900. The provider of the service starts the session.
After the session begins, the virtual environment is presented to the user (block 2010). For example, if a service provider runs a website, a web browser may download and display the virtual environment to the user.
The user may control their representation object to move around the virtual environment and experience the different views and sounds the virtual environment provides (block 2020). For example, the representation object may turn on the jukebox and select a song from the playlist. The jukebox may play the selected song. The user may also drag and drop songs from a shared or local folder onto the jukebox to have the songs uploaded and played.
The user may also move their representation object around the virtual environment to interact with other users represented in the virtual environment (block 2040). The representation object may be moved by clicking on a location in the virtual environment, pressing keys on a keyboard, pressing keys on a telephone, entering text, entering voice commands, and so forth.
There are a number of ways in which a user may interact with other users in a virtual environment. One way is by wandering around the virtual environment and listening for conversations already in progress. As a user moves their representation object around the virtual environment, the user may hear speech and other sounds.
The user may then engage in a conversation or otherwise interact with other users by becoming voice-enabled via the telephone (block 2040). Becoming voice-enabled allows the user to speak with other voice-enabled users. For example, suppose the user wants to join a conference call with a telephone. To enter the conference call, the user calls the communication system. With a traditional telephone, the user can call the virtual environment in which he is located (e.g., by calling a unique telephone number, or by calling a universal number and entering data such as a user ID and PIN via DTMF). With a VoIP telephone, the user may call the virtual environment by calling the virtual environment's unique VoIP address.
If the service provider can recognize the user's telephone number, the service provider may join the telephone call with the ongoing session (block 2032). If the service provider cannot recognize the user's telephone number, the user initiates a new session via telephone (block 2034), the user identifies himself (e.g., by entering additional data such as a user ID or PIN via DTMF), and the service provider then merges the new telephone session with the already ongoing session (block 2036). Instead of the user calling the service provider, the user may request the service provider to call the user (block 2038).
Once voice-enabled (block 2030), the user may speak to other voice-enabled users using the phone, and remains voice-enabled until the user ends the call (e.g., hangs up the phone).
In some embodiments, the system 1900 allows a user to log into the teleconference service and enter a conversation without accessing the website (block 2060). The user may only have access to a push-button phone or another phone 1904 that is not capable of displaying the virtual environment. Consider a conventional telephone. With only the phone, the user can call a phone number and connect to the service provider. The service provider may then add the user's representation object to the virtual environment. Via telephone signals (e.g., DTMF, voice control) the user can move their representation object around the virtual environment, listen to other conversations, meet others, and experience the sounds (though not the views) of the virtual environment. Although the user cannot see their own representation object, other users viewing the virtual environment can see it.
More than one virtual environment may be hosted at any given time. If more than one virtual environment is available to the user, the user may move in and out of these different virtual environments and thereby interact with even more people. Each of these virtual environments may be uniquely addressable via an internet address or a unique telephone number. The service provider may then place each user directly into the selected target virtual environment. A user may reserve and enter a private virtual environment to hold a private conversation. A user may also reserve and enter a private area of a public environment to hold a private conversation. The web browser or other graphical user interface may include a toolbar, browser extension, or other means for indicating the different environments available to the user. The toolbar allows a user to move in and out of different virtual environments and to reserve and enter private areas of a virtual environment.
Communication between users is not limited to conversation via telephone. The communication may occur in other ways. Examples include (without limitation) video streams, text chat messages, instant messaging messages, avatar gestures or movements, emotional expressions, emoticons, and web pages.
The state of the virtual environment may be persistent in that it persists across many user sessions and across the actions of different users. This allows the virtual environment to be modified by one user while the modification is seen by other users. For example, as a way of signaling to another user, graffiti can be written on a wall, a light switch in the virtual environment can be turned on and off, and so forth.
A user can add, remove, and move objects in a virtual environment as a way of signaling to another user. Examples of objects include sound sources (e.g., music boxes, bubbling fish tanks), data objects (e.g., modifiable books with text and pictures), visualized music objects, and so forth.
Communication between users may also be performed by sharing a particular object. The persistent state also allows "things" to be placed on top of each other. A file may be dropped on a user or on the floor as a way to share the file with that user. Music or sound files may be dropped on the jukebox. Pictures or videos may be dropped on a projection device to trigger playback/display. A multimedia sample (e.g., an audio clip or video clip containing a message) can be pinned to a whiteboard.
Referring back to fig. 2, the virtual representation and the teleconference are generated by two different systems, 140 and 150. Furthermore, the different clients 160 displaying the virtual representation may not be in direct communication with each other (in a pure client-server system, they are not). However, communication system 110 ensures that clients 160 display approximately the same object transitions in the virtual representation at approximately the same time.
If a user commands a new object state in the virtual representation, their client does not directly notify other clients of the new state, nor does the client immediately transition the object to the new state. Instead, the client sends a request to the server system 150 and waits for an indication from the server system 150.
The server system 150 causes all clients displaying the virtual representation to gradually transition the object to its new state within a certain time. When the state of an object in the environment has changed, the server system 150 notifies all necessary clients of the change. In this manner, the server system 150 ensures that all client devices 120 display approximately the same object transition in the virtual representation at approximately the same time.
The communication system 110 may host multiple virtual representations simultaneously. The communication system 110 may host multiple teleconferences in virtual representations.
If more than one virtual representation is available to the user, the user may move in and out of the different virtual representations. Each of these virtual representations may be uniquely addressable via an internet address or a unique telephone number. The server system 150 may then place each user directly into the selected virtual representation. A user may reserve and enter a private virtual representation to hold a private conversation. A user may also reserve and enter a private area of a virtual representation to hold a private conversation. A web browser or other graphical user interface may include a toolbar or other means for indicating the different virtual representations available to the user.
Thus, a user may utilize both the client device 120 and the audio-only device 130 during a teleconference. The client device 120 is used to interact with the virtual representation and find others to speak with. The audio-only device 130 is used to speak with them.
Reference is also made to fig. 22, which illustrates an example of how the communication system 110 manages the state of an object when a client device requests a new state for that object. To illustrate, the object will be described as an avatar representing a user, and the new state will be a new position of the avatar.
On the client side, the client receives input to change the state of an object (block 2210). For example, the new location of the object is received by clicking on the new location in the virtual representation.
In response, client 160 calculates coordinates in the virtual representation from the screen coordinates of the clicked new location (block 2215) and sends a state change request to the server system (block 2220). The state change request includes the coordinates of the new location. The state change request may also include a desired time at which the avatar should begin to move to the new location (block 2215). The desired time is slightly in the future so that the event can be communicated to all clients 160 before that time arrives. The client 160 then enters a wait state (block 2225).
The server system 150 validates the request (block 2230). For example, the server system 150 checks whether the virtual representation contains a path that allows the avatar to move to the new location. This may include determining whether the coordinates of the new location are within a walkable area and whether the avatar is allowed to walk there from its current location at that particular time. If the requested time has already passed, or no time was communicated, the start time is adjusted slightly later if necessary.
If the request is valid, server system 150 may also calculate the arrival time and the path for the representation to transition from the current state to the new state (block 2230). For example, server system 150 may use a path-finding algorithm to calculate a walking route having waypoints and an arrival time for each waypoint. An exemplary path-finding algorithm is described below.
The server system 150 updates the master model, a data structure containing all object states over time (block 2235). For example, the server system 150 adds the avatar's waypoints and their arrival times to the master model.
The server system 150 then generates an event that notifies all clients 160 of the updated object state (block 2240). For example, the event includes the start and stop times of each waypoint on the avatar's walking path. All clients 160 that display the virtual representation move the avatar to the various waypoints at the same arrival times. Thus, all those clients 160 will display approximately the same avatar movement at approximately the same time (approximately, because system clocks are imperfectly synchronized and system latency varies).
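Concretely, the event of block 2240 might carry a timed waypoint schedule along these lines (a hypothetical sketch; none of the field names are from the patent):

```python
import time

def make_move_event(avatar_id, waypoints):
    """Build a state-change event carrying timed waypoints.

    `waypoints` is a list of (x, y, arrival_time) tuples computed by
    the server's path finding; every client moves the avatar along
    the same points at the same arrival times.
    """
    return {
        "type": "avatar_move",
        "object": avatar_id,
        "start": time.time() + 0.5,      # slightly in the future, so the
                                         # event reaches all clients first
        "waypoints": [{"x": x, "y": y, "arrive_at": t}
                      for (x, y, t) in waypoints],
    }

def broadcast(event, clients):
    for client in clients:               # only clients showing this room
        client.send(event)
```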
Further, the server system 150 may instruct the teleconference system 140 to play movement sounds at the appropriate times (block 2260). The teleconference system 140 plays the set of sound segments at the specified time(s) (block 2270). For example, the server system 150 may provide a sound clip of footsteps as the avatar moves to a new location, and the teleconference system 140 plays the sound clip to the user whose avatar is walking.
The server system 150 also synchronizes sound segments with movement and state changes in the virtual representation.
Server system 150 may also generate data for controlling audio characteristics over time (block 2280). For example, the volume of a conversation between two users may be a function of the distance and/or orientation of their two avatars in the virtual environment. In this example, the sound becomes louder as the two avatars move closer together and softer as they move farther apart. The server system 150 generates sound coefficients that change the volume between two users as a function of the distance between their avatars. These coefficients are used by teleconference system 140 to change the volume over time (block 2290). In this manner, the server system 150 commands the teleconference system 140 to attenuate or change the sound so that the conversation is consistent with the virtual environment. The server system 150 may likewise command the teleconference system 140 to play sound clips, record user speech, or change operating parameters that affect sound quality.
Generally, objects in a virtual representation have properties that allow a user to perform particular actions on them (e.g., sit, move, open). Objects (e.g., Flash objects) comply with a particular specification (e.g., an API). As an example, an object may be a jukebox with methods (actions) such as play/stop/pause, and attributes such as volume, song list, and song selection. When the jukebox is turned on and a song is selected, the server system 150 generates an event and commands the teleconference system to play the selected clip.
Client 160 may optionally calculate the transition path and send it to the server system 150 at block 2215. This may be done to ease the workload on the server system 150 (at block 2230), which then does not have to compute the transition path.
Reference is again made to fig. 24. In addition to or instead of the sound mixing shown in fig. 24, the teleconferencing system 1620 may switch a source/sink pair over to a direct connection to save computational power and reduce latency. This may be done if the world server 1616 determines that the two users can essentially only hear each other. Further, the teleconferencing system 1620 may premix some or all of the sources for several sinks that have similar coefficients. In the latter case, each user's own source may have to be subtracted from the combined mix to produce that user's sink signal.
The telephone system 1622 (see fig. 16) may allow a user of an audio-only device to control objects in the virtual environment and move from one virtual environment to another. A user with only an audio-only device may experience the sounds of the virtual environment and speak to other users, but cannot experience the views of the virtual environment. The telephone system 1622 may use telephone signals (e.g., DTMF, voice commands) from the telephone to control the actions of the user's representation in the virtual environment.
A particular button on the phone may correspond to a command. A user with a push button phone or a DTMF enabled VoIP phone may execute a command by entering the command using DTMF buttons. Each command may be provided with one or more arguments. The argument may be a telephone number or other sequence of digits. In some embodiments, voice commands may be translated and used.
For example, a user may press '0' to enter a menu of commands, where all available commands are read to the user. The user then enters a CALL (CALL) command (e.g., 2255), followed by a # symbol. The user may then be asked to identify the person to call, for example, by speaking the name of that person, entering the phone number of that person, entering a code corresponding to that person, and so on.
Another command may cause the avatar to move within its virtual environment. The arguments of the command may specify a direction, a distance, a new location, and so on. Another command may allow the user to switch to another virtual environment; an argument of the command may specify the virtual environment. Another command allows the user to join a conference call. Another command may allow a user to request information about the environment or about other users. Another command may allow one user's avatar to take another user's avatar by the hand, so that the following avatar will follow the leading avatar.
For devices able to run a telnet session, a user may establish a telnet session to receive information, questions, and options, and also to enter commands.
The client device may include a braille terminal. A braille terminal for blind users may be used like a text terminal.
For users with only audio-only devices, the server system 1610 may include means 1617 for providing an alternative description of the virtual environment. For telnet-enabled devices, the means 1617 may provide a written description of the virtual environment. For other audio-only devices, the system may include a speech synthesis system for providing a spoken description that is heard on the audio-only device.
For example, the virtual environment may be described to the user from the perspective of the user's avatar. Objects closer to the user's avatar may be described in more detail. The description may include or omit details to maintain a nearly constant overall description length. The user may request a more detailed description of the particular object, upon which additional details are revealed.
The communication system 1600 provides teleconferencing without requiring the user to install software or acquire special equipment (e.g., a microphone, PC speakers, headphones). If the system is web-based, a web browser may be used to connect to the VE server system 1610, download and run the client, and display the virtual environment. This makes the communication system 1600 easy to connect to and use.
Figs. 25 and 26 illustrate a method of calculating the waypoints of a moving object. The method may be performed by the world server 1616 or by the client 160. Consider the exemplary space shown in fig. 25. The space is bounded by one polygon boundary 2510 and contains two polygon obstacles 2520. Boundary 2510 has vertices A, B, C, D, E and F. One obstacle 2520 has vertices G, H, I, J and K, and the other obstacle 2520 has vertices L, M and N. Boundary 2510 may depict the boundary of the virtual representation, while the obstacles 2520 may represent movable and stationary objects in the virtual representation. The purpose of the method is to find a path from the current position S to the new position T that is obstructed by neither the boundary 2510 nor an obstacle 2520.
A line segment is obstructed by the boundary 2510 if any portion of it lies outside the boundary 2510. A line segment is obstructed by an obstacle 2520 if any portion of it lies within the corresponding open polygon 2520.
The path may be composed of one or more line segments (i.e., a piecewise linear path). The inner vertices of the path (i.e., excluding S and T) are vertices of the boundary 2510 and the obstacle(s) 2520. Vertices such as K are excluded as inner vertices because the path made up of segment GJ is shorter than the path made up of segments GK and KJ. Vertices such as A, B, D, E and F are also excluded as inner vertices, since shorter paths can be composed with other vertices. The path may even run along a boundary (e.g., a line segment along vertices H and I).
Referring to fig. 26, the world server computes a visibility graph (block 2610), for example using a plane-sweep algorithm. The visibility graph includes the vertices of the boundary 2510 and the vertices of each obstacle 2520. Between each pair of vertices, the visibility graph also includes an edge, but only if the connecting line segment is not obstructed by the boundary 2510 or by any obstacle 2520.
The visibility graph is updated whenever an obstacle 2520 moves or a new obstacle appears (block 2620). For example, if a new avatar enters the virtual representation, or if an object (e.g., an avatar) in the virtual environment moves, the visibility graph is updated.
When an object is commanded to move, the object's new location and current location are added to the visibility graph (block 2630), and the shortest path between them is found (block 2640). An algorithm such as Dijkstra's algorithm may be run on the visibility graph to identify the edges of the shortest path.
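Block 2640 could then be a standard Dijkstra run over the visibility graph, for example as follows (a sketch; the graph is assumed to be a dict mapping each vertex, an (x, y) tuple, to the set of vertices visible from it, and the goal is assumed reachable):

```python
import heapq
import math

def shortest_path(graph, start, goal):
    """Dijkstra over a visibility graph.

    `graph[v]` is the set of vertices visible from v (the edges of
    the visibility graph); the edge weight is Euclidean distance.
    """
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, v = heapq.heappop(queue)
        if v == goal:
            break
        if d > dist.get(v, math.inf):
            continue                     # stale queue entry
        for w in graph[v]:
            nd = d + math.dist(v, w)
            if nd < dist.get(w, math.inf):
                dist[w], prev[w] = nd, v
                heapq.heappush(queue, (nd, w))
    path = [goal]
    while path[-1] != start:             # walk predecessors back to start
        path.append(prev[path[-1]])
    return list(reversed(path))
```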
Multiple objects may move within one virtual representation at any given time. The shortest-path determination (block 2640) may also include collision avoidance. One way to achieve collision avoidance between two moving objects is to move each object immediately to its new location. However, collision avoidance is optional when objects are allowed to collide (e.g., pass through each other).
As mentioned above, a system according to the invention is not limited to a single virtual representation. Rather, a system according to the present invention may host multiple independent virtual representations, assign different users to different virtual representations, allow one or more teleconferences in each virtual representation, and manage the state of objects in each virtual representation (e.g., adjust the motion of objects). For example, the system may provide a first virtual environment that includes a club scene and a second virtual environment that includes a beach scene. Some users may be assigned to the first virtual environment, experience the views and sounds of the club scene, and hold conference calls with those users represented by avatars in the virtual club. Other users may be assigned to the second virtual environment, experience the views and sounds of the beach scene, and hold conference calls with those users represented by avatars on the virtual beach. The server system manages the objects in both environments.
The server system also filters communications with the clients, sending a communication only to those clients that need to change the state of an object in a particular virtual representation. The world server 1616 may perform two or more of the following functions:
(1) Create sessions, process session timeouts, and terminate sessions. For example, once the client closes and a timeout has elapsed, the user session terminates.
(2) Dispatch events from the world server 1616 only to those clients that will be affected by them. For example, a user in one virtual environment is not affected by an event in another virtual environment. Thus, the world server 1616 sends events affecting a virtual environment only to those clients represented in that virtual environment.
Filtering reduces excessive communication. Thus, the traffic between the world server and the client is reduced.
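A minimal sketch of function (2), with hypothetical session and event fields:

```python
def dispatch(event, sessions):
    """Send an event only to the clients it affects (function (2) above)."""
    for session in sessions:
        # A client is affected only if it is represented in the same
        # virtual environment the event belongs to.
        if session.environment == event["environment"]:
            session.client.send(event)
```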
The communication system according to the invention is not limited to any particular topology. Several exemplary topologies are shown in figs. 27a-27d. These topologies provide different ways in which clients can communicate with the server system. Figs. 27a-27d do not show the audio-only devices that communicate with the teleconferencing system.
Reference is made to fig. 27a, which shows a pure client-server topology. Clients are represented by circles and server systems are represented by triangles. The server system may include one or more servers.
Reference is now made to FIG. 27b, which illustrates a topology that includes clients (represented by circles), server systems (represented by large triangles), and supernodes (represented by small triangles). Each supernode acts as both a client and a server. Acting as a server, the supernode provides data to all connected clients. Acting as a client, the supernode may display and interact with the virtual representation. The server system and the supernodes coordinate to track objects in the virtual representation. The supernode may be operated by a user or by a communication service provider.
Reference is now made to fig. 27c, which shows a topology comprising peers (represented by hexagons) and a server system (represented by triangles). Each peer is connected to the server system, as shown by the dashed lines, to display and interact with the virtual representation. However, the peers are also connected to each other, as shown by the solid lines, and may thus bypass the server system and pass certain data among themselves. Examples of such data include, but are not limited to, audio files (e.g., sound clips), static files (e.g., background images, user pictures), and live data (e.g., webcam streams). One of the peers may receive such data from the server system and pass it on to its peers.
To ease the burden on the server system, the peers may also exchange data about the virtual environment as well as object commands. The transition path and time may be calculated by the peer commanding the object state change, and the path and time may be distributed to its peers. Such data may also be sent to the server system so that the server system can track the objects in the virtual representation.
Reference is now made to fig. 27d, which shows a topology comprising a server system (represented by triangles), clients (represented by circles), supernodes (represented by squares), and peers (represented by hexagons). Peers exchange data among each other, and clients connect to one or more supernodes (both shown with solid lines). If such a connection fails, the client may connect to a fallback peer (shown by the dotted line), which then becomes a supernode. Clients and peers may also connect to or exchange data with the server system, as shown by the dashed lines.
The topology of fig. 27d provides the advantages of peer-to-peer communication, including reduced traffic and computational load on the server, while also allowing simple client participation. Unlike clients that run in a virtual machine, peers and supernodes may need to be installed.
Reference is made to fig. 28, which illustrates an exemplary communication system 2800 that includes a teleconference system 2820 and a server system 2810 that communicate with peers 2850. Server system 2810 provides a virtual representation to each peer 2850 and ensures that each peer 2850 displays approximately the same object transition at approximately the same time.
Peers 2850 exchange data among each other using peer-to-peer communication. Each peer 2850 includes a graphical user interface 2852, a sound mixer 2854, and audio input/output hardware 2856 (e.g., a microphone and speakers). Each peer 2850 may generate an audio stream using its I/O hardware and distribute the audio stream to one or more other peers 2850. The sound mixer 2854 of a peer 2850 weights the audio streams from other peers and the audio streams from other audio sources. The weighted streams are combined in the sound mixer 2854, and the combined stream is output on the audio I/O hardware 2856. The sound coefficients used to weight the audio streams may be calculated by the peer's graphical user interface 2852 or by the server system 2810. A peer 2850 may also send the combined audio stream to other peers to save bandwidth.
Peer-to-peer communication may also be used to exchange data such as files and events instead of, or in addition to, downloading the data from the server system 2810. A peer-to-peer file sharing protocol such as BitTorrent may be used to transfer static files. This reduces traffic on the media servers of the server system because, in the best case, each file is downloaded only once from the server system 2810.
Each user's media (e.g., representation/avatar graphics, personal profile pictures, files) may be seeded from the user's peer 2850 and distributed among multiple peers 2850. State change commands, text messages, and webcam data or pictures may also be downloaded only once from server system 2810 and distributed in a peer-to-peer manner to reduce traffic on server system 2810.
The system is not limited to a particular architecture. For example, the system of FIG. 1 may be implemented as a client-server system. In such a system, the service provider includes one or more servers and the different user devices are client devices. Certain types of client devices (e.g., computers) may be connected to a server via a network such as the internet. Other types of client devices may be connected via different networks. For example, a traditional telephone may be connected via a PSTN line, a VoIP telephone may be connected through the internet, and so on.
A conference call according to the invention is convenient. Entering a conference call is as simple as entering a website and clicking a mouse button (perhaps a few times). A telephone number does not have to be arranged in advance. No pre-meeting introduction is necessary. No special software or equipment (e.g., webcam, sound card, microphone) is needed, since voice communication can be provided over the phone. The communication is intuitive and therefore easy to learn. Audio-visual dynamic group communication is enabled. A user may move from one group to another to change the people with whom they communicate.
The system according to the invention allows different communication technologies to be brought together and integrated. Users with traditional phones, VoIP phones, devices with GUI interfaces and internet connectivity, and so on, can join the same conference call.

Claims (35)

1. A method of controlling the volume of sound data during a teleconference, the method comprising: providing a virtual representation comprising objects representing users in the teleconference; and controlling the volume of the sound data in dependence on how the user changes the position and relative orientation of their object in the virtual representation.
2. The method of claim 1, further comprising: changing other audio characteristics of the sound data according to how the user interacts with the virtual representation.
3. The method of claim 1, wherein objects in the virtual representation also have an audio range, thereby controlling the volume of the sound data also as a function of the audio range.
4. The method of claim 3, wherein the audio range is adjustable.
5. The method of claim 1, wherein the virtual representation is a virtual environment; and wherein the user is represented by an avatar.
6. The method of claim 5, wherein the volume of sound data between two users is a function of the relative orientation of the avatars of the two users.
7. The method of claim 1, wherein the virtual representation is provided by a server system that calculates a sound coefficient for each object as a sound source relative to a sink; and wherein, for each user, controlling the volume comprises: applying the sound coefficients to the sound data of their corresponding objects, mixing the modified sound data, and providing the mixed sound data to the sink.
8. The method of claim 7, wherein the sound data is mixed according to the formula of Figure A2008800120550003C1.
9. A method, comprising:
providing a virtual representation;
establishing a telephone connection with a plurality of users, said users being represented by objects in said virtual representation, each user represented object being both a sound sink and a sound source; and
for each sink, mixing sound data from different sound sources and providing the mixed data to the user associated with the sink, wherein the volume of sound data from a source is adjusted according to a topological measure of the source relative to the sink;
so that the users are not directly connected but communicate through a synthesized auditory environment.
10. The method of claim 9, wherein mixing the sound data for each sink comprises:
calculating an audio parameter for each source/sink pair, each audio parameter controlling volume as a function of the closeness of the source to the corresponding sink; and
adjusting the sound data of each source with the corresponding audio parameter, mixing the adjusted sound data of the sources, and providing the mixed sound data to the user associated with the sink.
11. The method of claim 9, wherein the virtual representation includes other objects as sound sources, wherein the volume of sound data from a source is adjusted according to a topological measure of the source relative to the sink; and wherein the adjusted sound data from the other objects is also mixed and provided to the sink.
12. The method of claim 9, wherein the objects comprise an audio range.
13. The method of claim 9, wherein the topological metric is the visual distance between the source and the sink.
14. The method of claim 9, wherein the topological metrics include distance and orientation.
15. The method of claim 9, wherein audio sources are grouped together to reduce the computational burden.
16. The method of claim 9, wherein the sound is mixed according to the formula of Figure A2008800120550004C1.
17. The method of claim 9, wherein, to reduce the computational burden of mixing the sound data for each sink, only the sound data of those sound sources that make a significant contribution is mixed.
18. The method of claim 17, wherein the audio range of particular objects is automatically set to zero or close to zero, thereby excluding sound data of those particular objects from the mix.
19. The method of claim 9, wherein a minimum distance between objects is enforced to reduce the computational burden of blending the sound data.
20. The method of claim 9, wherein at least some sound data is pre-mixed to reduce the computational burden of mixing the sound data; wherein the pre-mixing comprises mixing sound data for a group of sound sinks and assigning a single coefficient to each sink of the group.
21. The method of claim 9, wherein a direct connection is made between a source and a sink to reduce the computational burden of mixing the sound data.
22. A communication system, comprising:
a telephone-based teleconferencing apparatus; and
means for providing a virtual representation comprising objects that represent participants in a conference call, the virtual representation enabling the participants to enter the conference call using the telephone-based teleconferencing apparatus and to control volume during the conference call, the volume being controlled according to how the users change the positions and relative orientations of their objects in the virtual representation.
23. A communication system, comprising:
a server system for providing a virtual representation; and
a teleconferencing system for establishing telephone connections with a plurality of users, the users being represented by objects in the virtual representation;
wherein the teleconferencing system controls the volume during a teleconference according to how the users change the positions and relative orientations of the objects that represent them in the virtual representation.
24. The system of claim 23, wherein each user-represented object is both a sound sink and a sound source; and wherein, for each sink, sound data from different sound sources is mixed and the mixed data is provided to the user associated with the sink, the volume of sound data from a source being adjusted according to a topological metric of the source relative to the sink.
25. A method comprising applying a virtual reality environment to a teleconference.
26. An apparatus for applying a virtual reality environment to a teleconference, enabling a user to enter the virtual reality environment without knowing the other users in it and to meet and hold a teleconference with those other users within the virtual reality environment.
27. A system, comprising:
means for holding a conference call; and
means for coupling an immersive virtual reality environment with the conference call.
28. A communication system, comprising:
a server system for providing a virtual representation comprising at least one object; and
a teleconferencing system for establishing audio communication with audio-only devices;
wherein the objects in the virtual representation are controlled in response to signals from the audio-only devices.
29. A system, comprising:
means for providing a virtual representation, the virtual representation comprising an object;
means for receiving a signal from an audio-only device; and
means for controlling a state of the object in response to the signal.
30. A communication system for providing a virtual environment comprising a plurality of objects, said objects having changeable states, and for establishing audio communication with an audio-only device; the system controlling the states of the objects in the virtual environment in response to signals from the audio-only device, thereby enabling a user of the audio-only device to interact with the virtual environment.
31. A method of controlling objects in a virtual environment, comprising:
receiving a signal from an audio-only device; and
controlling a state of the object in response to the signal.
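Claims 28 through 31 cover control of a virtual object from a device with no display. One plausible realization, sketched below, maps DTMF key presses to movement and rotation commands; the particular key assignments and class names are invented for this sketch and are not specified by the patent.

import math

DTMF_COMMANDS = {
    "2": ("move", 1.0),    # one step forward
    "8": ("move", -1.0),   # one step backward
    "4": ("turn", -15.0),  # rotate left 15 degrees
    "6": ("turn", 15.0),   # rotate right 15 degrees
}

class AvatarState:
    # Minimal state of a user-represented object (claims 28-31).
    def __init__(self):
        self.position = [0.0, 0.0]
        self.heading_deg = 0.0

    def apply_signal(self, digit):
        # Control the state of the object in response to a signal
        # received from an audio-only device (claim 31).
        command = DTMF_COMMANDS.get(digit)
        if command is None:
            return  # ignore unmapped keys
        kind, value = command
        if kind == "move":
            rad = math.radians(self.heading_deg)
            self.position[0] += value * math.cos(rad)
            self.position[1] += value * math.sin(rad)
        elif kind == "turn":
            self.heading_deg = (self.heading_deg + value) % 360.0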
32. A method of providing a service, comprising:
providing a network-accessible virtual environment comprising objects that represent users of the service;
enabling each user to control the object representing that user in the virtual environment, to personally interact with other represented objects in the virtual environment, and to become voice-enabled; and
enabling those users that are voice-enabled to talk to other voice-enabled users via the telephone.
33. A system, comprising:
means for providing a network-accessible virtual environment comprising objects that represent users of the system;
means for enabling each user to control the object representing that user in the virtual environment, to personally interact with other represented objects in the virtual environment, and to become voice-enabled; and
means for enabling those voice-enabled users to talk to other voice-enabled users via the telephone.
34. A communication system, comprising:
a teleconference system for hosting a teleconference; and
a server system for providing a virtual representation for the teleconference system, the virtual representation comprising objects whose states can be commanded to transition gradually, the server system providing clients to client devices, each client causing its client device to display the virtual representation;
wherein each client device is capable of generating commands for gradually transitioning an object to a new state in the virtual representation and sending the commands to the server system;
and wherein the server system commands the clients to transition the object to its new state within a specified time.
35. A communication system for a plurality of client devices, comprising:
first means for holding a conference call; and
second means for providing virtual representations enabling said conference call, each virtual representation comprising an object whose state changes gradually, said second means providing clients to at least some of said client devices, each client causing its client device to display a virtual representation;
wherein each client device is capable of generating a command for gradually transitioning an object to a new state in the virtual representation and sending the command to the second means;
the second means instructing the clients to transition the object to substantially the same state at substantially the same time; and
the second means causing the first means to control audio characteristics of the conference call so as to conform to the virtual representation.
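Claims 34 and 35 describe gradual, synchronized state transitions driven by server commands. A minimal client-side sketch follows, assuming the object state is a dictionary of numeric values, linear interpolation, and a 30 Hz display update rate, none of which is specified by the claims.

import time

def transition(start_state, new_state, duration_s, set_state):
    # Client-side interpolation for claims 34-35: the server commands a
    # target state and a time budget; each client interpolates locally so
    # that all displays reach substantially the same state at
    # substantially the same time.
    t0 = time.monotonic()
    while True:
        fraction = min((time.monotonic() - t0) / duration_s, 1.0)
        current = {key: start_state[key] + fraction * (new_state[key] - start_state[key])
                   for key in new_state}
        set_state(current)  # hand the interpolated state to the renderer
        if fraction >= 1.0:
            return
        time.sleep(1.0 / 30.0)  # ~30 updates per second (assumed)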
CN200880012055A 2007-04-14 2008-04-10 virtual reality-based teleconferencing Pending CN101690150A (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US11/735,463 US20080252637A1 (en) 2007-04-14 2007-04-14 Virtual reality-based teleconferencing
US11/735,463 2007-04-14
US11/751,152 US20080294721A1 (en) 2007-05-21 2007-05-21 Architecture for teleconferencing with virtual representation
US11/751,152 2007-05-21
US11/774,556 2007-07-06
US11/774,556 US20080256452A1 (en) 2007-04-14 2007-07-06 Control of an object in a virtual representation by an audio-only device
US11/833,432 2007-08-03
US11/833,432 US20080253547A1 (en) 2007-04-14 2007-08-03 Audio control for teleconferencing
US11/875,836 US20090106670A1 (en) 2007-10-20 2007-10-20 Systems and methods for providing services in a virtual environment
US11/875,836 2007-10-20
PCT/EP2008/054359 WO2008125593A2 (en) 2007-04-14 2008-04-10 Virtual reality-based teleconferencing

Publications (1)

Publication Number Publication Date
CN101690150A true CN101690150A (en) 2010-03-31

Family

ID=39673479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200880012055A Pending CN101690150A (en) 2007-04-14 2008-04-10 virtual reality-based teleconferencing

Country Status (3)

Country Link
EP (1) EP2145465A2 (en)
CN (1) CN101690150A (en)
WO (1) WO2008125593A2 (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750258A (en) * 2011-03-30 2012-10-24 微软公司 Mobile device configuration based on status and location
CN105103227A (en) * 2013-03-15 2015-11-25 英特尔公司 Mechanism for facilitating dynamic adjustment of audio input/output (I/O) setting devices at conferencing computing devices
CN105138554A (en) * 2015-07-20 2015-12-09 广东小天才科技有限公司 Method and device for chatting of mobile terminal
CN105487657A (en) * 2015-11-24 2016-04-13 小米科技有限责任公司 Sound loudness determination method and apparatus
CN105827610A (en) * 2016-03-31 2016-08-03 联想(北京)有限公司 Information processing method and electronic device
CN105933790A (en) * 2016-04-29 2016-09-07 乐视控股(北京)有限公司 Video play method, device and system based on virtual movie theater
CN107211058A (en) * 2015-02-03 2017-09-26 杜比实验室特许公司 Dialogue-based dynamic meeting segmentation
CN107292050A (en) * 2017-07-07 2017-10-24 四川云图瑞科技有限公司 House ornamentation design system based on illusory 4 engine technique
CN107632703A (en) * 2017-09-01 2018-01-26 广州励丰文化科技股份有限公司 Mixed reality audio control method and service equipment based on binocular camera
CN108141693A (en) * 2015-10-09 2018-06-08 索尼公司 Signal handling equipment, signal processing method and computer program
CN108141665A (en) * 2015-10-26 2018-06-08 索尼公司 Signal processing apparatus, signal processing method and program
CN108886599A (en) * 2015-12-11 2018-11-23 索尼公司 Information processing unit, information processing method and program
CN109643527A (en) * 2016-04-01 2019-04-16 易客斯特解决方案公司 Virtual Reality Platform for retail environment emulation
CN109690540A (en) * 2016-12-05 2019-04-26 谷歌有限责任公司 The access control based on posture in virtual environment
CN110035250A (en) * 2019-03-29 2019-07-19 维沃移动通信有限公司 Audio-frequency processing method, processing equipment, terminal and computer readable storage medium
CN110164464A (en) * 2018-02-12 2019-08-23 北京三星通信技术研究有限公司 Audio-frequency processing method and terminal device
CN110473561A (en) * 2019-07-24 2019-11-19 天脉聚源(杭州)传媒科技有限公司 A kind of audio-frequency processing method, system and the storage medium of virtual spectators
CN111246014A (en) * 2020-01-13 2020-06-05 维沃移动通信有限公司 Calling method, head-mounted device and medium
WO2020135366A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Audio signal processing method and apparatus
WO2020240512A1 (en) * 2019-05-31 2020-12-03 Chain Technology Development Co., Ltd. Collaborative immersive cave network
CN112492231A (en) * 2020-11-02 2021-03-12 重庆创通联智物联网有限公司 Remote interaction method, device, electronic equipment and computer readable storage medium
CN113093917A (en) * 2015-09-28 2021-07-09 微软技术许可有限责任公司 Unified virtual reality platform
CN113973103A (en) * 2021-10-26 2022-01-25 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium
US11321892B2 (en) 2020-05-21 2022-05-03 Scott REILLY Interactive virtual reality broadcast systems and methods
CN114816315A (en) * 2021-01-20 2022-07-29 辉达公司 Volume control for audio and video conferencing applications
WO2023231598A1 (en) * 2022-05-31 2023-12-07 腾讯科技(深圳)有限公司 Call interaction method and apparatus, computer device, and storage medium
CN117597916A (en) * 2021-05-06 2024-02-23 卡特迈科技公司 Protect private audio and applications in virtual meetings
WO2024194698A1 (en) * 2023-03-21 2024-09-26 International Business Machines Corporation Adjusting audible area of avatar's voice

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9176579B2 (en) * 2008-12-29 2015-11-03 Avaya Inc. Visual indication of user interests in a computer-generated virtual environment
US9294722B2 (en) * 2010-10-19 2016-03-22 Microsoft Technology Licensing, Llc Optimized telepresence using mobile device gestures
US8958567B2 (en) 2011-07-07 2015-02-17 Dolby Laboratories Licensing Corporation Method and system for split client-server reverberation processing
CN104065911B (en) * 2013-03-18 2019-01-15 联想(北京)有限公司 Display control method and device
CN105204813B (en) * 2014-05-28 2019-01-22 腾讯科技(深圳)有限公司 The method and apparatus of play sound effect
EP3254456B1 (en) 2015-02-03 2020-12-30 Dolby Laboratories Licensing Corporation Optimized virtual scene layout for spatial meeting playback
US10567185B2 (en) 2015-02-03 2020-02-18 Dolby Laboratories Licensing Corporation Post-conference playback system having higher perceived quality than originally heard in the conference
US11575531B2 (en) 2020-06-02 2023-02-07 Preciate Inc. Dynamic virtual environment
WO2022056492A2 (en) * 2020-09-14 2022-03-17 NWR Corporation Systems and methods for teleconferencing virtual environments
US11070768B1 (en) 2020-10-20 2021-07-20 Katmai Tech Holdings LLC Volume areas in a three-dimensional virtual conference space, and applications thereof
US11457178B2 (en) 2020-10-20 2022-09-27 Katmai Tech Inc. Three-dimensional modeling inside a virtual video conferencing environment with a navigable avatar, and applications thereof
US11076128B1 (en) 2020-10-20 2021-07-27 Katmai Tech Holdings LLC Determining video stream quality based on relative position in a virtual space, and applications thereof
US11095857B1 (en) 2020-10-20 2021-08-17 Katmai Tech Holdings LLC Presenter mode in a three-dimensional virtual conference space, and applications thereof
US10952006B1 (en) 2020-10-20 2021-03-16 Katmai Tech Holdings LLC Adjusting relative left-right sound to provide sense of an avatar's position in a virtual space, and applications thereof
US10979672B1 (en) 2020-10-20 2021-04-13 Katmai Tech Holdings LLC Web-based videoconference virtual environment with navigable avatars, and applications thereof
US11184362B1 (en) 2021-05-06 2021-11-23 Katmai Tech Holdings LLC Securing private audio in a virtual conference, and applications thereof
WO2022235916A1 (en) * 2021-05-06 2022-11-10 Katmai Tech Inc. Securing private audio in a virtual conference, and applications thereof
US11849257B2 (en) * 2021-08-04 2023-12-19 Google Llc Video conferencing systems featuring multiple spatial interaction modes
US11637991B2 (en) * 2021-08-04 2023-04-25 Google Llc Video conferencing systems featuring multiple spatial interaction modes
US12141500B2 (en) 2021-08-18 2024-11-12 Target Brands, Inc. Virtual reality system for retail store design
US11954404B2 (en) 2022-06-10 2024-04-09 Qualcomm Incorporated Verbal communication in a virtual world
US12022235B2 (en) 2022-07-20 2024-06-25 Katmai Tech Inc. Using zones in a three-dimensional virtual environment for limiting audio and video
US11876630B1 (en) 2022-07-20 2024-01-16 Katmai Tech Inc. Architecture to control zones
US11651108B1 (en) 2022-07-20 2023-05-16 Katmai Tech Inc. Time access control in virtual environment application
US12009938B2 (en) 2022-07-20 2024-06-11 Katmai Tech Inc. Access control in zones
US11928774B2 (en) 2022-07-20 2024-03-12 Katmai Tech Inc. Multi-screen presentation in a virtual videoconferencing environment
US11700354B1 (en) 2022-07-21 2023-07-11 Katmai Tech Inc. Resituating avatars in a virtual environment
US11741664B1 (en) 2022-07-21 2023-08-29 Katmai Tech Inc. Resituating virtual cameras and avatars in a virtual environment
US11711494B1 (en) 2022-07-28 2023-07-25 Katmai Tech Inc. Automatic instancing for efficient rendering of three-dimensional virtual environment
US11956571B2 (en) 2022-07-28 2024-04-09 Katmai Tech Inc. Scene freezing and unfreezing
US11682164B1 (en) 2022-07-28 2023-06-20 Katmai Tech Inc. Sampling shadow maps at an offset
US11704864B1 (en) 2022-07-28 2023-07-18 Katmai Tech Inc. Static rendering for a combination of background and foreground objects
US11776203B1 (en) 2022-07-28 2023-10-03 Katmai Tech Inc. Volumetric scattering effect in a three-dimensional virtual environment with navigable video avatars
US11562531B1 (en) 2022-07-28 2023-01-24 Katmai Tech Inc. Cascading shadow maps in areas of a three-dimensional environment
US11593989B1 (en) 2022-07-28 2023-02-28 Katmai Tech Inc. Efficient shadows for alpha-mapped models
US11748939B1 (en) 2022-09-13 2023-09-05 Katmai Tech Inc. Selecting a point to navigate video avatars in a three-dimensional environment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6448965B1 (en) * 1998-06-19 2002-09-10 George Mason University Voice-controlled immersive virtual reality system
GB2349055B (en) * 1999-04-16 2004-03-24 Mitel Corp Virtual meeting rooms with spatial audio
JP4608400B2 (en) * 2005-09-13 2011-01-12 株式会社日立製作所 VOICE CALL SYSTEM AND CONTENT PROVIDING METHOD DURING VOICE CALL

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750258B (en) * 2011-03-30 2018-06-22 微软技术许可有限责任公司 Mobile device configuration based on state and position
CN102750258A (en) * 2011-03-30 2012-10-24 微软公司 Mobile device configuration based on status and location
CN105103227A (en) * 2013-03-15 2015-11-25 英特尔公司 Mechanism for facilitating dynamic adjustment of audio input/output (I/O) setting devices at conferencing computing devices
US10522151B2 (en) 2015-02-03 2019-12-31 Dolby Laboratories Licensing Corporation Conference segmentation based on conversational dynamics
CN107211058A (en) * 2015-02-03 2017-09-26 杜比实验室特许公司 Dialogue-based dynamic meeting segmentation
CN107211058B (en) * 2015-02-03 2020-06-16 杜比实验室特许公司 Session dynamics based conference segmentation
CN105138554A (en) * 2015-07-20 2015-12-09 广东小天才科技有限公司 Method and device for chatting of mobile terminal
CN105138554B (en) * 2015-07-20 2019-06-14 广东小天才科技有限公司 Method and device for chatting of mobile terminal
CN113093917A (en) * 2015-09-28 2021-07-09 微软技术许可有限责任公司 Unified virtual reality platform
CN108141693B (en) * 2015-10-09 2021-10-29 索尼公司 Signal processing apparatus, signal processing method, and computer-readable storage medium
CN108141693A (en) * 2015-10-09 2018-06-08 索尼公司 Signal handling equipment, signal processing method and computer program
US10674304B2 (en) 2015-10-09 2020-06-02 Sony Corporation Signal processing apparatus and signal processing method
CN108141665A (en) * 2015-10-26 2018-06-08 索尼公司 Signal processing apparatus, signal processing method and program
CN105487657A (en) * 2015-11-24 2016-04-13 小米科技有限责任公司 Sound loudness determination method and apparatus
CN108886599A (en) * 2015-12-11 2018-11-23 索尼公司 Information processing unit, information processing method and program
CN108886599B (en) * 2015-12-11 2021-04-27 索尼公司 Information processing apparatus, information processing method, and program
CN105827610B (en) * 2016-03-31 2020-01-31 联想(北京)有限公司 information processing method and electronic equipment
CN105827610A (en) * 2016-03-31 2016-08-03 联想(北京)有限公司 Information processing method and electronic device
CN109643527A (en) * 2016-04-01 2019-04-16 易客斯特解决方案公司 Virtual Reality Platform for retail environment emulation
CN105933790A (en) * 2016-04-29 2016-09-07 乐视控股(北京)有限公司 Video play method, device and system based on virtual movie theater
CN109690540B (en) * 2016-12-05 2023-08-29 谷歌有限责任公司 Gesture-based access control in a virtual environment
CN109690540A (en) * 2016-12-05 2019-04-26 谷歌有限责任公司 The access control based on posture in virtual environment
CN107292050A (en) * 2017-07-07 2017-10-24 四川云图瑞科技有限公司 House ornamentation design system based on illusory 4 engine technique
CN107632703A (en) * 2017-09-01 2018-01-26 广州励丰文化科技股份有限公司 Mixed reality audio control method and service equipment based on binocular camera
CN110164464A (en) * 2018-02-12 2019-08-23 北京三星通信技术研究有限公司 Audio-frequency processing method and terminal device
US11917391B2 (en) 2018-12-29 2024-02-27 Huawei Technologies Co., Ltd. Audio signal processing method and apparatus
WO2020135366A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Audio signal processing method and apparatus
CN110035250A (en) * 2019-03-29 2019-07-19 维沃移动通信有限公司 Audio-frequency processing method, processing equipment, terminal and computer readable storage medium
WO2020240512A1 (en) * 2019-05-31 2020-12-03 Chain Technology Development Co., Ltd. Collaborative immersive cave network
CN110473561A (en) * 2019-07-24 2019-11-19 天脉聚源(杭州)传媒科技有限公司 A kind of audio-frequency processing method, system and the storage medium of virtual spectators
CN111246014A (en) * 2020-01-13 2020-06-05 维沃移动通信有限公司 Calling method, head-mounted device and medium
US11321892B2 (en) 2020-05-21 2022-05-03 Scott REILLY Interactive virtual reality broadcast systems and methods
US12136157B2 (en) 2020-05-21 2024-11-05 Tphoenixsmr Llc Interactive virtual reality broadcast systems and methods
CN112492231A (en) * 2020-11-02 2021-03-12 重庆创通联智物联网有限公司 Remote interaction method, device, electronic equipment and computer readable storage medium
CN114816315A (en) * 2021-01-20 2022-07-29 辉达公司 Volume control for audio and video conferencing applications
CN114816315B (en) * 2021-01-20 2025-03-18 辉达公司 Volume control for audio and video conferencing applications
CN117597916A (en) * 2021-05-06 2024-02-23 卡特迈科技公司 Protect private audio and applications in virtual meetings
CN113973103A (en) * 2021-10-26 2022-01-25 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium
CN113973103B (en) * 2021-10-26 2024-03-12 北京达佳互联信息技术有限公司 Audio processing method, device, electronic equipment and storage medium
WO2023231598A1 (en) * 2022-05-31 2023-12-07 腾讯科技(深圳)有限公司 Call interaction method and apparatus, computer device, and storage medium
WO2024194698A1 (en) * 2023-03-21 2024-09-26 International Business Machines Corporation Adjusting audible area of avatar's voice

Also Published As

Publication number Publication date
EP2145465A2 (en) 2010-01-20
WO2008125593A2 (en) 2008-10-23
WO2008125593A3 (en) 2009-01-15

Similar Documents

Publication Publication Date Title
CN101690150A (en) virtual reality-based teleconferencing
US20090106670A1 (en) Systems and methods for providing services in a virtual environment
US20080252637A1 (en) Virtual reality-based teleconferencing
US11444990B1 (en) System and method of enabling a non-host, participant-initiated breakout session in a videoconferencing system utilizing a virtual space, and simultaneously displaying a session view of a videoconferencing session and the participant-initiated breakout session
US9030523B2 (en) Flow-control based switched group video chat and real-time interactive broadcast
US20080294721A1 (en) Architecture for teleconferencing with virtual representation
JP6404912B2 (en) Live broadcasting system
CN106105246B (en) Display methods, apparatus and system is broadcast live
US7574474B2 (en) System and method for sharing and controlling multiple audio and video streams
CN103238317B (en) The system and method for scalable distributed universal infrastructure in real-time multimedia communication
US20060008117A1 (en) Information source selection system and method
CN110910860B (en) Online KTV implementation method and device, electronic equipment and storage medium
US20080253547A1 (en) Audio control for teleconferencing
US20150121252A1 (en) Combined Data Streams for Group Calls
US12262145B2 (en) Integration of remote audio into a performance venue
JP6719166B2 (en) Live broadcasting system
TW201323041A (en) System and method for managing audio and video channels for video game players and spectators
US20080256452A1 (en) Control of an object in a virtual representation by an audio-only device
TW201141226A (en) Virtual conversing method
JP7143874B2 (en) Information processing device, information processing method and program
Tanaka Telematic music transmission, resistance and touch
KR101647435B1 (en) Apparatus for supporting online game
WO2021235173A1 (en) Information processing device, information processing method, and program
Nassani et al. Implementation of Attention-Based Spatial Audio for 360° Environments
WO2011158493A1 (en) Voice communication system, voice communication method and voice communication device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20100331