
CN119620891A - A user experience interaction method based on VR engine - Google Patents

A user experience interaction method based on VR engine Download PDF

Info

Publication number
CN119620891A
CN119620891A (application CN202411830551.0A)
Authority
CN
China
Prior art keywords
user
gesture
engine
model
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411830551.0A
Other languages
Chinese (zh)
Inventor
郭建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Axis Animation Technology Development Beijing Co ltd
Original Assignee
New Axis Animation Technology Development Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Axis Animation Technology Development Beijing Co ltd filed Critical New Axis Animation Technology Development Beijing Co ltd
Priority to CN202411830551.0A priority Critical patent/CN119620891A/en
Publication of CN119620891A publication Critical patent/CN119620891A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a user experience interaction method based on a VR engine, relating to the technical field of interaction experience. The method extracts features from voice, gesture, touch and gaze signals through deep learning models and fuses the signals with a cross-modal attention mechanism, enhancing the ability to understand user behavior. The user's position and operation context in the real environment are captured in real time with simultaneous localization and mapping (SLAM), making the interaction more natural and fluid. By combining the AR engine's light-and-shadow consistency calculation, physical simulation and dynamic environment adaptation, virtual objects are deeply fused with the real environment, resolving the virtual-real rupture of traditional methods and greatly improving the immersion of the interaction. In addition, user intention, including the interaction type, target object and task goal, is predicted through dynamic Bayesian network modeling, and the prediction model is optimized with real-time feedback and a self-learning mechanism, so that the method gradually adapts to the user's individual needs.

Description

VR engine-based user experience interaction method
Technical Field
The invention relates to the technical field of interaction experience, in particular to a user experience interaction method based on a VR engine.
Background
Conventional interaction experiences are centered on the graphical user interface: the user completes operations by clicking, sliding, voice input and other means, and the interaction is simple and direct. As user requirements grow more complex, this conventional mode can no longer satisfy user experience needs in multimodal and virtual scenarios; mobile devices can be controlled through voice commands, while desktop devices still rely on mouse or keyboard operation, so the interaction experience remains fragmented.
To address these problems, some interaction schemes introduce voice assistants to process natural-language instructions and adopt gesture recognition to make operation more natural, optimizing user experience to a certain extent and bringing human-device interaction closer to natural behavior. Some schemes further introduce augmented reality (AR) and virtual reality (VR), optimizing virtual-real interaction through virtual object rendering and ambient light adaptation to deliver a more immersive experience. However, these improvements still do not fully solve the problem of insufficient interaction fusion: users must still switch frequently between different input modes, and truly seamless cross-modal interaction is not achieved. Current interaction schemes are also weak in adaptability and dynamic optimization, unable to adjust the interaction mode and displayed content according to the user's real-time behavior. A higher-level, dynamically adaptive fusion interaction method, namely a VR engine-based user experience interaction method, is therefore needed to solve these problems.
Disclosure of Invention
The present invention has been made in view of the above-described problems occurring in the prior art.
The invention provides a VR engine-based user experience interaction method to solve the problem of insufficient interaction fusion, in which the user still has to switch between different input modes instead of truly achieving seamless cross-modal interaction.
In order to solve the technical problems, the invention provides the following technical scheme:
the embodiment of the invention provides a user experience interaction method based on a VR engine, which comprises the following steps,
Step S1, a signal interface is arranged in an AR engine, user input signals including voice commands, gesture actions, touch operation and sight line changes are captured, and a deep learning model is adopted to extract characteristics of the input signals, so that signal characteristics are generated;
S2, fusing signal characteristics, generating a user intention model by using a probability map model, and predicting the operation purpose of a user, wherein the predicted content comprises an interaction type, a target object and a task target;
Step S3, driving the virtual scene to correspondingly adjust by the AR engine according to the prediction result of the user intention model, wherein the adjustment comprises positioning, scaling or rotation of the virtual object and adaptive optimization of display content;
S4, the adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine, and meanwhile, the environment light change is dynamically adapted to obtain a final interaction effect;
And S5, outputting and feeding back the final interaction effect to the user, re-capturing the further operation of the user, and updating to the input signal of the step S1 of a new round by adopting a feedback mechanism.
As a preferred scheme of the VR engine-based user experience interaction method of the present invention, the AR engine provides a synchronous multimodal data acquisition framework, captures user input signals in real time, and locates the user's position in the real environment through simultaneous localization and mapping (SLAM).
As a preferred scheme of the VR engine-based user experience interaction method, the invention adopts a deep learning model to extract characteristics of input signals, generates signal characteristics,
The method comprises the steps of encoding the voice waveform with a Transformer model to extract semantic information, wherein the encoding formula is as follows:
zspeech=fspeech(xspeech; θspeech),
Wherein z speech is the extracted semantic feature vector, x speech is the input voice waveform signal, θ speech is the training parameter of the voice model, and f speech is the deep learning function for voice feature extraction;
capturing continuous gesture motion frames of a user, and modeling by combining a 3D convolution network CNN and a long-short-term memory network LSTM, wherein the capturing formula is as follows:
zgesture=fgesture(xgesture; θgesture),
Wherein z gesture is the high-dimensional feature representation of the gesture signal, x gesture is the input gesture video sequence, θ gesture is the training parameter of the gesture model, and f gesture is the gesture feature extraction model function;
the touch signal characteristics are extracted, including clicking positions and sliding tracks, and are encoded by using a multi-layer perceptron MLP, wherein the encoding formula is as follows:
ztouch=ftouch(xtouch; θtouch),
Wherein z touch is a feature vector of a touch signal, x touch is touch input data comprising a click position and a sliding track, θ touch is a touch model parameter, and f touch is a touch feature extraction function;
Extracting sight signal characteristics, processing sight track data by ResNet, extracting characteristics of a user gazing area, wherein an extraction formula is as follows:
zgaze=fgaze(xgaze; θgaze),
Wherein z gaze is the high-dimensional feature of the sight line signal, x gaze is the sight line track input data, θ gaze is the training parameter of the sight line model, and f gaze is the sight line feature extraction model function;
fusing the characteristics, wherein the fusion formula is as follows:
zfused=ffusion(zspeech,zgesture,ztouch,zgaze; θfusion),
Wherein z fused is the multi-modal feature representation after fusion, f fusion is the multi-modal fusion function, and θ fusion is the multi-modal fusion parameter.
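To make the feature-extraction interface above concrete, the following is a minimal Python/NumPy sketch of the fspeech/fgesture/ftouch/fgaze/ffusion call pattern. It is illustrative only: the real encoders described in this document (Transformer, 3D CNN+LSTM, MLP, ResNet) are stood in for by simple projections, and all dimensions, parameter names and dummy inputs are assumptions.

```python
# Illustrative sketch only: the encoder internals (Transformer, 3D-CNN+LSTM,
# MLP, ResNet) are stood in for by simple projections; all names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def encode(x, theta):
    """Generic stand-in for f_modality(x; theta): a linear projection plus tanh."""
    return np.tanh(theta @ x)

# Per-modality parameters theta_* (here: random projection matrices to a 64-d space).
D = 64
theta_speech  = rng.standard_normal((D, 128))   # speech waveform features (128-d frame)
theta_gesture = rng.standard_normal((D, 256))   # gesture video clip embedding (256-d)
theta_touch   = rng.standard_normal((D, 4))     # click position (x, y) + swipe vector (dx, dy)
theta_gaze    = rng.standard_normal((D, 32))    # gaze-trajectory descriptor (32-d)

def fuse(z_list, theta_fusion):
    """f_fusion: concatenate modality features and project to one fused vector."""
    return np.tanh(theta_fusion @ np.concatenate(z_list))

theta_fusion = rng.standard_normal((D, 4 * D))

# One interaction frame's raw inputs (dummy data).
x_speech, x_gesture = rng.standard_normal(128), rng.standard_normal(256)
x_touch, x_gaze = np.array([0.4, 0.7, 0.1, -0.05]), rng.standard_normal(32)

z_speech  = encode(x_speech,  theta_speech)
z_gesture = encode(x_gesture, theta_gesture)
z_touch   = encode(x_touch,   theta_touch)
z_gaze    = encode(x_gaze,    theta_gaze)
z_fused   = fuse([z_speech, z_gesture, z_touch, z_gaze], theta_fusion)
print(z_fused.shape)   # (64,)
```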
The method for fusing the signal features further comprises the steps of analyzing depth information of a real scene and a three-dimensional structure of a virtual scene by using an AR engine to obtain environmental features, and fusing the environmental features with the signal features input by a user, wherein the depth information comprises object distances and object surface characteristics.
As a preferred scheme of the VR engine-based user experience interaction method, the invention comprises the steps of fusing signal characteristics, generating a user intention model by using a probability map model, predicting the operation purpose of a user,
Taking the multi-mode signal characteristic z fused in the step S1 as an input characteristic, extracting an environmental characteristic e by adopting an AR engine, wherein the e comprises depth information e depth of a real scene and a three-dimensional structure e 3D of a virtual scene, the depth information comprises the distance and surface characteristics of an object, and the three-dimensional structure is the shape and the spatial distribution of the object;
feature fusion is carried out, and a fusion formula is as follows:
zcombined=fattention(zfused,e;θfusion),
Wherein z combined is the feature after fusion, f attention is the cross-modal attention mechanism, and θ fusion is the feature fusion parameter;
Modeling the user intention model through a dynamic Bayesian network (DBN), and calculating the posterior probability of the intention, wherein the calculation formula is as follows:
P(I|zcombined) = P(zcombined|I)·P(I) / Σi P(zcombined|Ii)·P(Ii),
Where P(I|zcombined) is the posterior probability of the user intent I given the fused feature zcombined, P(zcombined|I) is the conditional probability of the feature given the intent, P(I) is the prior probability of the intent, and Σi(·) denotes normalization over all possible intentions Ii.
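A minimal sketch of this posterior computation, assuming a small discrete intent set and a Gaussian stand-in for the observation model P(zcombined|I); the intent labels, prior values and prototype features are illustrative and not part of the original disclosure.

```python
# Minimal sketch of P(I | z_combined) ∝ P(z_combined | I) · P(I), with a Gaussian
# class-conditional likelihood standing in for the DBN observation model.
import numpy as np

intents = ["select_object", "move_object", "scale_object", "rotate_object"]
prior = np.array([0.4, 0.3, 0.2, 0.1])                              # P(I), hypothetical
prototypes = np.random.default_rng(1).standard_normal((4, 64))      # per-intent feature prototypes

def likelihood(z_combined, proto, sigma=8.0):
    """P(z_combined | I): isotropic Gaussian around the intent prototype."""
    d2 = np.sum((z_combined - proto) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def intent_posterior(z_combined):
    lik = np.array([likelihood(z_combined, p) for p in prototypes])
    unnorm = lik * prior
    return unnorm / unnorm.sum()        # normalization over all intents I_i

z_combined = np.random.default_rng(2).standard_normal(64)
post = intent_posterior(z_combined)
print(dict(zip(intents, np.round(post, 3))))
```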
As a preferable scheme of the VR engine-based user experience interaction method, the method comprises the steps of predicting the result according to a user intention model, driving a virtual scene by an AR engine to correspondingly adjust,
Positioning and adjusting the virtual object, dynamically updating the three-dimensional space position of the virtual object according to the target object in the user intention and the operation area prediction,
The update formula is:
pnew=pold+Δp,
Wherein, p new is the new position vector of the virtual object, p old is the original position vector of the virtual object, and Δp is the position change vector;
scaling adjustment of the virtual object is carried out, the proportion of the virtual object is adjusted according to the task target prediction,
The scaling formula is:
Snew=Sold·k,
Wherein S new is the updated scaling, S old is the original scaling, and k is the scaling factor;
performing a rotation adjustment of the virtual object, adjusting a direction of the virtual object by a rotation operation of a user,
The rotation formula is:
Rnew=Rold·R(θ,u),
Wherein R new is the updated rotation matrix, R old is the initial rotation matrix, R (θ, u) is the rotation matrix, which is composed of the rotation axis u and the rotation angle θ,
Optimizing the layout of display contents in the virtual scene according to the user intention model, wherein the optimization formula is as follows:
Lopt = argminL Σi=1..n wi·Di(L),
Wherein Lopt is the optimized display layout, L is a candidate display layout, wi is the weight in the user intention model, n is the total number of content items, and Di(L) is the distance between the user's attention target and the displayed content.
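The layout optimization can be read as picking, among candidate layouts, the one minimizing the weighted distance to the user's attention target. Below is a hedged sketch under that reading; the candidate layouts, weights and the Euclidean choice of Di(L) are assumptions.

```python
# Sketch of L_opt = argmin_L Σ_i w_i · D_i(L): each candidate layout assigns 2D anchor
# positions to n content items; D_i(L) is taken as the distance to the attention target.
import numpy as np

def layout_cost(layout, attention_target, weights):
    dists = np.linalg.norm(layout - attention_target, axis=1)   # D_i(L)
    return float(np.dot(weights, dists))                        # Σ_i w_i · D_i(L)

def choose_layout(candidates, attention_target, weights):
    costs = [layout_cost(L, attention_target, weights) for L in candidates]
    return int(np.argmin(costs)), costs

# Two hypothetical candidate layouts for n = 3 panels, weights from the intent model.
candidates = [
    np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]),
    np.array([[0.6, 0.6], [0.7, 0.5], [0.8, 0.4]]),
]
weights = np.array([0.6, 0.3, 0.1])          # w_i: importance of each content item
attention_target = np.array([0.7, 0.5])      # gaze/intent focus in normalized screen space

best, costs = choose_layout(candidates, attention_target, weights)
print(best, [round(c, 3) for c in costs])    # expects layout 1 (closer to the target)
```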
As a preferred scheme of the VR engine-based user experience interaction method of the present invention, the step of making corresponding adjustments further includes,
Recording and self-learning the user's operation habits, and dynamically adjusting the initial state of the virtual scene, wherein the self-learning formula is pinitial=E[puser],
Wherein pinitial is the initial position, puser is the position distribution in the user's operation history, and E denotes the expected-value calculation; the adjusted virtual scene is conveyed to the user through visual and tactile feedback.
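A tiny sketch of the self-learning rule pinitial=E[puser], assuming the expectation is taken as the mean of positions logged from past sessions; the history values are made up.

```python
# pinitial = E[puser]: the mean of positions in the user's operation history is used
# as the virtual scene's starting placement. History values below are placeholders.
import numpy as np

user_position_history = np.array([   # p_user samples from past sessions (x, y, z)
    [0.10, 1.20, -0.40],
    [0.05, 1.25, -0.35],
    [0.12, 1.18, -0.45],
])
p_initial = user_position_history.mean(axis=0)   # E[p_user]
print(np.round(p_initial, 3))
```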
As a preferred scheme of the VR engine-based user experience interaction method, the method comprises the steps of performing depth fusion with a real environment through the light and shadow consistency calculation and physical characteristic simulation technology of an AR engine,
The AR engine captures illumination conditions in a real environment in real time, including the direction, the intensity and the color temperature of a light source, and renders shadows and highlights through a global illumination model so that the virtual object is consistent with the change of the shadows in the real environment;
The AR engine simulates physical interaction characteristics of a virtual object and a real environment;
The AR engine dynamically adjusts the expression form of the virtual object according to the environmental data collected in real time.
As a preferred scheme of the VR engine-based user experience interaction method, the feedback mechanism comprises the following steps:
Adjustment of signal capture and feature extraction parameters,
And adjusting the user intent model parameters based on the user repetitive behavior.
As a preferred scheme of the VR engine-based user experience interaction method, the step of updating to the new round of step S1 input signals by adopting a feedback mechanism is that,
Capturing voice and gesture signals, and carrying out corresponding parameter adjustment, wherein the adjustment formula is as follows:
θspeech,new=θspeech,old+Δθspeech,
θgesture,new=θgesture,old+Δθgesture,
Wherein θ speech,new and θ gesture,new are updated speech and gesture capture parameters, respectively, θ speech,old and θ gesture,old are original speech and gesture capture parameters, respectively, Δθ speech and Δθ gesture are delta parameters adjusted by user feedback,
Optimizing the feature extraction model, wherein the optimization formula is as follows:
z′=z+Δzfeedback,
Wherein z' is an optimized feature vector, z is an original feature vector, and Δz feedback is an optimized increment introduced by a feedback mechanism;
Updating prior probabilities and conditional probabilities in the user intention model according to the repeated behaviors of the user and the historical operation modes,
Historical behavior modeling is carried out, and a model formula is as follows:
P(I)new=αPhistory(I)+(1-α)Pprior(I),
Wherein P (I) new is the updated prior probability of user intention, alpha is a historical weight factor, P history (I) represents the prior probability of user intention I based on user historical behavior statistics, P prior (I) represents the initial prior probability of preset user intention I,
Updating the conditional probability, wherein the updating formula is as follows:
P(z′|I)new=P(z′|I)+ΔPfeedback(z′|I),
Wherein,
P(z′|I)new is the conditional probability after optimization, P(z′|I) is the conditional probability before optimization, and ΔPfeedback(z′|I) is the conditional probability increment based on user operation feedback,
In addition to the model probabilities, the feedback mechanism also adjusts the global parameter set, wherein the adjustment formula is as follows:
θ′=θ+Δθfeedback,
Wherein θ' is the global parameter set after optimization, θ is the global parameter set before optimization, and Δθ feedback is the parameter increment calculated by the feedback mechanism.
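The feedback updates above are all additive increments plus a prior blend. The sketch below shows that update pattern in NumPy; the increment values, intent set and clipping safeguard are assumptions added for illustration.

```python
# Sketch of the step S5 feedback updates: additive parameter increments and the
# prior blend P(I)_new = α·P_history(I) + (1−α)·P_prior(I).
import numpy as np

def update_params(theta_old, delta_feedback):
    """θ_new = θ_old + Δθ_feedback (applies to capture, feature and global params)."""
    return theta_old + delta_feedback

def blend_prior(p_history, p_prior, alpha):
    """P(I)_new = α·P_history(I) + (1−α)·P_prior(I), renormalized for safety."""
    p_new = alpha * p_history + (1.0 - alpha) * p_prior
    return p_new / p_new.sum()

def update_conditional(p_cond, delta_feedback):
    """P(z'|I)_new = P(z'|I) + ΔP_feedback(z'|I), clipped to remain a valid probability."""
    return np.clip(p_cond + delta_feedback, 0.0, 1.0)

theta_speech = np.array([0.8, 1.2, 0.5])
theta_speech = update_params(theta_speech, np.array([0.05, -0.1, 0.0]))

p_prior   = np.array([0.25, 0.25, 0.25, 0.25])   # preset intent prior
p_history = np.array([0.55, 0.25, 0.15, 0.05])   # frequencies from the user's history
print(blend_prior(p_history, p_prior, alpha=0.7))
print(update_conditional(0.42, 0.03))
```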
The invention has the beneficial effects that features of voice, gesture, touch and gaze signals are extracted through deep learning models and the signals are fused with a cross-modal attention mechanism, enhancing the ability to understand user behavior; meanwhile, the user's position and operation context in the real environment are captured in real time with simultaneous localization and mapping (SLAM), making the interaction more natural and fluid. In addition, user intention, including the interaction type, target object and task goal, is modeled and predicted through a dynamic Bayesian network, and the prediction model is optimized with real-time feedback and a self-learning mechanism, so that the AR engine gradually adapts to the user's individual needs. In terms of virtual scene adjustment, the AR engine dynamically adjusts the position, scale and rotation angle of virtual objects according to the user's intention while optimizing the display content layout, and records the user's operation habits through the self-learning mechanism, further reducing redundant operations and improving interaction efficiency;
According to the method, the signal capturing parameters, the characteristic extraction model and the intention prediction model are dynamically optimized through a feedback mechanism, a self-adaptive closed-loop optimization flow is constructed, and the adaptability to dynamic changes and the support capability of complex scenes are enhanced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a VR engine-based user experience interaction method of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Embodiment 1, referring to fig. 1, this embodiment provides a VR engine-based user experience interaction method, including:
Step S1, a signal interface is arranged in an AR engine, user input signals including voice commands, gesture actions, touch operation and sight line changes are captured, and a deep learning model is adopted to extract characteristics of the input signals, so that signal characteristics are generated;
The AR engine provides a synchronous multimodal data acquisition framework, captures user input signals in real time, and locates the user in the real environment through simultaneous localization and mapping (SLAM);
The deep learning model is adopted to extract the characteristics of the input signal, the step of generating the signal characteristics is that,
The method comprises the steps of encoding the voice waveform with a Transformer model to extract semantic information, wherein the encoding formula is as follows:
zspeech=fspeech(xspeech; θspeech),
Wherein z speech is the extracted semantic feature vector, x speech is the input voice waveform signal, θ speech is the training parameter of the voice model, and f speech is the deep learning function for voice feature extraction;
capturing continuous gesture motion frames of a user, and modeling by combining a 3D convolution network CNN and a long-short-term memory network LSTM, wherein the capturing formula is as follows:
zgesture=fgesture(xgesture; θgesture),
Wherein z gesture is the high-dimensional feature representation of the gesture signal, x gesture is the input gesture video sequence, θ gesture is the training parameter of the gesture model, and f gesture is the gesture feature extraction model function;
the touch signal characteristics are extracted, including clicking positions and sliding tracks, and are encoded by using a multi-layer perceptron MLP, wherein the encoding formula is as follows:
ztouch=ftouch(xtouch; θtouch),
Wherein z touch is a feature vector of a touch signal, x touch is touch input data comprising a click position and a sliding track, θ touch is a touch model parameter, and f touch is a touch feature extraction function;
Extracting sight signal characteristics, processing sight track data by ResNet, extracting characteristics of a user gazing area, wherein an extraction formula is as follows:
zgaze=fgaze(xgaze; θgaze),
Wherein z gaze is the high-dimensional feature of the sight line signal, x gaze is the sight line track input data, θ gaze is the training parameter of the sight line model, and f gaze is the sight line feature extraction model function;
fusing the characteristics, wherein the fusion formula is as follows:
zfused=ffusion(zspeech,zgesture,ztouch,zgaze; θfusion),
Wherein z fused is a fused multi-modal feature representation, f fusion is a multi-modal fusion function, and θ fusion is a multi-modal fusion parameter;
Specifically, the characteristics of voice, gestures, touch and sight are extracted through the deep learning model, the multi-modal characteristics are fused by using a cross-modal attention mechanism, unified high-dimensional characteristic representation is generated, and the integrity and consistency of signal characteristics are ensured.
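As a rough illustration of the cross-modal attention fusion mentioned here, the sketch below scores each modality feature against a context query and takes a softmax-weighted sum; the scaled dot-product scoring and all dimensions are assumptions, not the patent's specified mechanism.

```python
# Sketch of a cross-modal attention fusion: a query derived from the current context
# scores each modality feature; the fused vector is the softmax-weighted sum.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_modal_attention(z_modalities, query, W_k):
    """z_modalities: (M, D) stack of z_speech, z_gesture, z_touch, z_gaze."""
    keys = z_modalities @ W_k.T                     # project each modality to key space
    scores = keys @ query / np.sqrt(query.size)     # scaled dot-product scores
    weights = softmax(scores)                       # one attention weight per modality
    return weights @ z_modalities, weights          # weighted sum = fused feature

rng = np.random.default_rng(3)
D = 64
z_modalities = rng.standard_normal((4, D))          # speech, gesture, touch, gaze
query = rng.standard_normal(D)                       # e.g., derived from scene context
W_k = rng.standard_normal((D, D))

z_fused, weights = cross_modal_attention(z_modalities, query, W_k)
print(np.round(weights, 3), z_fused.shape)
```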
S2, fusing signal characteristics, generating a user intention model by using a probability map model, and predicting the operation purpose of a user, wherein the predicted content comprises an interaction type, a target object and a task target;
The method for fusing the signal features further comprises the steps of analyzing depth information of a real scene and a three-dimensional structure of a virtual scene by using an AR engine to obtain environmental features, and fusing the environmental features with the signal features input by a user;
The signal characteristics are fused, a probability map model is used for generating a user intention model, and the operation purpose of the user is predicted by the steps of,
Taking the multi-mode signal characteristic z fused in the step S1 as an input characteristic, extracting an environmental characteristic e by adopting an AR engine, wherein the e comprises depth information e depth of a real scene and a three-dimensional structure e 3D of a virtual scene, the depth information comprises the distance and surface characteristics of an object, and the three-dimensional structure is the shape and the spatial distribution of the object;
feature fusion is carried out, and a fusion formula is as follows:
zcombined=fattention(zfused,e;θfusion),
Wherein z combined is the feature after fusion, f attention is the cross-modal attention mechanism, and θ fusion is the feature fusion parameter;
Modeling the user intention model through a dynamic Bayesian network (DBN), and calculating the posterior probability of the intention, wherein the calculation formula is as follows:
P(I|zcombined) = P(zcombined|I)·P(I) / Σi P(zcombined|Ii)·P(Ii),
Where P(I|zcombined) is the posterior probability of the user intent I given the fused feature zcombined, P(zcombined|I) is the conditional probability of the feature given the intent, P(I) is the prior probability of the intent, and Σi(·) denotes normalization over all possible intentions Ii;
Specifically, the interaction type, the target object and the task target of the user are predicted in real time by fusing the multi-mode signal characteristics and the environment characteristics and utilizing dynamic Bayesian network modeling.
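The dynamic (temporal) aspect of the DBN can be illustrated as recursive filtering: propagate the intent belief through a transition model, then re-weight it by the per-frame observation likelihood. The transition matrix and likelihood values below are invented for illustration.

```python
# Sketch of the temporal part of a DBN intent model: belief propagation through a
# transition matrix P(I_t | I_{t-1}) followed by correction with P(z_t | I_t).
import numpy as np

transition = np.array([           # rows: I_{t-1}, columns: I_t, for select / move / scale
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
])

def dbn_step(belief_prev, likelihood_t):
    """One filtering step: predict with the transition model, correct with P(z_t | I)."""
    predicted = transition.T @ belief_prev     # Σ_{I_{t-1}} P(I_t|I_{t-1}) P(I_{t-1}|z_{1:t-1})
    posterior = likelihood_t * predicted
    return posterior / posterior.sum()

belief = np.array([1 / 3, 1 / 3, 1 / 3])                       # uniform initial belief
observation_likelihoods = [np.array([0.7, 0.2, 0.1]),          # frames favoring "select"
                           np.array([0.3, 0.6, 0.1]),          # then drifting toward "move"
                           np.array([0.2, 0.7, 0.1])]
for lik in observation_likelihoods:
    belief = dbn_step(belief, lik)
print(np.round(belief, 3))
```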
Step S3, driving the virtual scene to correspondingly adjust by the AR engine according to the prediction result of the user intention model, wherein the adjustment comprises positioning, scaling or rotation of the virtual object and adaptive optimization of display content;
according to the prediction result of the user intention model, the AR engine drives the virtual scene to carry out corresponding adjustment,
Positioning and adjusting the virtual object, dynamically updating the three-dimensional space position of the virtual object according to the target object in the user intention and the operation area prediction,
The update formula is:
pnew=pold+Δp,
Wherein, p new is the new position vector of the virtual object, p old is the original position vector of the virtual object, and Δp is the position change vector;
scaling adjustment of the virtual object is carried out, the proportion of the virtual object is adjusted according to the task target prediction,
The scaling formula is:
Snew=Sold·k,
Wherein S new is the updated scaling, S old is the original scaling, and k is the scaling factor;
performing a rotation adjustment of the virtual object, adjusting a direction of the virtual object by a rotation operation of a user,
The rotation formula is:
Rnew=Rold·R(θ,u),
Wherein R new is the updated rotation matrix, R old is the initial rotation matrix, R (θ, u) is the rotation matrix, which is composed of the rotation axis u and the rotation angle θ,
Optimizing the layout of display contents in the virtual scene according to the user intention model, wherein the optimization formula is as follows:
Lopt = argminL Σi=1..n wi·Di(L),
Wherein Lopt is the optimized display layout, L is a candidate display layout, wi is the weight in the user intention model, n is the total number of content items, and Di(L) is the distance between the user's attention target and the displayed content,
The step of making a corresponding adjustment may further comprise,
Recording and self-learning the user's operation habits, and dynamically adjusting the initial state of the virtual scene, wherein the self-learning formula is pinitial=E[puser],
Wherein pinitial is the initial position, puser is the position distribution in the user's operation history, and E denotes the expected-value calculation; the adjusted virtual scene is conveyed to the user through visual and tactile feedback;
specifically, through positioning, scaling, rotation and display content optimization, the AR engine can dynamically adjust virtual scenes, respond to the effect of user intention accurately, introduce a self-learning mechanism, and optimize initial states and response logic after multiple interactions of users.
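A compact sketch of the three geometric adjustments (pnew=pold+Δp, Snew=Sold·k, Rnew=Rold·R(θ,u)), with R(θ,u) built from the axis-angle (Rodrigues) formula; all numeric values are placeholders.

```python
# Step S3 adjustments: translation, scaling, and rotation via the Rodrigues formula.
import numpy as np

def rotation_matrix(theta, u):
    """R(θ, u): rotation by angle θ (radians) about unit axis u."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    K = np.array([[0, -u[2], u[1]],
                  [u[2], 0, -u[0]],
                  [-u[1], u[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

p_old, delta_p = np.array([0.0, 1.2, -0.5]), np.array([0.1, 0.0, 0.05])
p_new = p_old + delta_p                       # positioning adjustment

s_old, k = 1.0, 1.25
s_new = s_old * k                             # scaling adjustment

r_old = np.eye(3)
r_new = r_old @ rotation_matrix(np.pi / 6, [0, 1, 0])   # 30° about the vertical axis

print(p_new, s_new)
print(np.round(r_new, 3))
```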
S4, the adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine, and meanwhile, the environment light change is dynamically adapted to obtain a final interaction effect;
The adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine,
The AR engine captures illumination conditions in the real environment in real time, including the direction, intensity and color temperature of a light source, and renders shadows and highlights through the global illumination model so that the virtual object is consistent with the change of the shadows in the real environment;
The AR engine simulates physical interaction characteristics of the virtual object and the real environment;
The AR engine dynamically adjusts the expression form of the virtual object according to the environmental data acquired in real time;
Specifically, through light and shadow consistency calculation, physical characteristic simulation and dynamic environment adaptation, the depth fusion of the virtual scene and the real environment is realized, and the immersion feeling and interaction naturalness of a user are enhanced.
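As a loose illustration of light-and-shadow consistency, the sketch below shades a virtual surface with a Lambertian diffuse term driven by the estimated light direction and intensity, plus a crude color-temperature tint; the tint mapping is a rough assumption rather than a physically exact model, and a real AR engine would use a full global illumination pass.

```python
# Shading a virtual object to match an estimated real-world light: Lambert diffuse
# term plus a very rough warm/cool tint for the correlated color temperature.
import numpy as np

def color_temperature_tint(kelvin):
    """Crude warm-to-cool RGB tint for a correlated color temperature (assumption)."""
    t = np.clip((kelvin - 2000.0) / (8000.0 - 2000.0), 0.0, 1.0)
    return np.array([1.0, 0.85 + 0.15 * t, 0.7 + 0.3 * t])

def shade(albedo, normal, light_dir, intensity, kelvin, ambient=0.15):
    n = np.asarray(normal, float); n /= np.linalg.norm(n)
    l = np.asarray(light_dir, float); l /= np.linalg.norm(l)
    diffuse = max(np.dot(n, l), 0.0) * intensity              # Lambert term
    return np.clip(albedo * color_temperature_tint(kelvin) * (ambient + diffuse), 0, 1)

# Hypothetical per-frame light estimate from the AR engine's capture step.
albedo = np.array([0.6, 0.3, 0.2])
print(np.round(shade(albedo, normal=[0, 1, 0], light_dir=[0.3, 1.0, 0.2],
                     intensity=0.9, kelvin=3200), 3))
```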
Step S5, outputting and feeding back the final interaction effect to the user, re-capturing the further operation of the user, and updating to the input signal of the step S1 of a new round by adopting a feedback mechanism;
The feedback mechanism includes:
Adjustment of signal capture and feature extraction parameters,
Adjusting user intent model parameters based on user repetitive behaviors,
The step of updating the input signal to the new round of step S1 using the feedback mechanism is,
Capturing voice and gesture signals, and carrying out corresponding parameter adjustment, wherein the adjustment formula is as follows:
θspeech,new=θspeech,old+Δθspeech,
θgesture,new=θgesture,old+Δθgesture,
Wherein θ speech,new and θ gesture,new are updated speech and gesture capture parameters, respectively, θ speech,old and θ gesture,old are original speech and gesture capture parameters, respectively, Δθ speech and Δθ gesture are delta parameters adjusted by user feedback,
Optimizing the feature extraction model, wherein the optimization formula is as follows:
z′=z+Δzfeedback,
Wherein z' is an optimized feature vector, z is an original feature vector, and Δz feedback is an optimized increment introduced by a feedback mechanism;
Updating prior probabilities and conditional probabilities in the user intention model according to the repeated behaviors of the user and the historical operation modes,
Historical behavior modeling is carried out, and a model formula is as follows:
P(I)new=αPhistory(I)+(1-α)Pprior(I),
Wherein P (I) new is the updated prior probability of user intention, alpha is a historical weight factor, P history (I) represents the prior probability of user intention I based on user historical behavior statistics, P prior (I) represents the initial prior probability of preset user intention I,
Updating the conditional probability, wherein the updating formula is as follows:
P(z′|I)new=P(z′|I)+ΔPfeedback(z′|I),
Wherein,
P(z′|I)new is the conditional probability after optimization, P(z′|I) is the conditional probability before optimization, and ΔPfeedback(z′|I) is the conditional probability increment based on user operation feedback,
In addition to the model probabilities, the feedback mechanism also adjusts the global parameter set, wherein the adjustment formula is as follows:
θ′=θ+Δθfeedback,
wherein θ' is the global parameter set after optimization, θ is the global parameter set before optimization, and Δθ feedback is the parameter increment calculated by the feedback mechanism;
Specifically, through a feedback mechanism, self-adaptive optimization is realized in each link of signal capturing, feature extraction and intention modeling, and the adaptive capacity to dynamic change is gradually improved.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (10)

1. A user experience interaction method based on a VR engine, characterized by comprising:
Step S1, deploying a signal interface in the AR engine to capture user input signals, including voice commands, gesture actions, touch operations and gaze changes, and using a deep learning model to extract features from the input signals to generate signal features;
Step S2, fusing the signal features, using a probabilistic graphical model to generate a user intention model, and predicting the user's operation purpose, the predicted content including the interaction type, target object and task goal;
Step S3, according to the prediction result of the user intention model, driving the virtual scene by the AR engine to make corresponding adjustments, including positioning, scaling or rotation of virtual objects and adaptive optimization of displayed content;
Step S4, deeply fusing the adjusted virtual scene with the real environment through the light-and-shadow consistency calculation and physical property simulation technology of the AR engine, while dynamically adapting to ambient light changes, to obtain the final interaction effect;
Step S5, outputting the final interaction effect as feedback to the user, recapturing the user's further operations, and updating them into a new round of step S1 input signals through a feedback mechanism.
2. The VR engine-based user experience interaction method according to claim 1, characterized in that the AR engine provides a synchronous multimodal data acquisition framework, captures user input signals in real time, and locates the user's position in the real environment through simultaneous localization and mapping (SLAM).
3. The VR engine-based user experience interaction method according to claim 2, characterized in that the step of extracting features from the input signals with a deep learning model to generate signal features comprises:
encoding the voice waveform with a Transformer model to extract semantic information, the encoding formula being zspeech=fspeech(xspeech;θspeech), wherein zspeech is the extracted semantic feature vector, xspeech is the input voice waveform signal, θspeech is the training parameter of the speech model, and fspeech is the deep learning function for speech feature extraction;
capturing the user's continuous gesture action frames and modeling them with a 3D convolutional network (CNN) combined with a long short-term memory network (LSTM), the capture formula being zgesture=fgesture(xgesture;θgesture), wherein zgesture is the high-dimensional feature representation of the gesture signal, xgesture is the input gesture video sequence, θgesture is the training parameter of the gesture model, and fgesture is the gesture feature extraction model function;
extracting touch signal features, including click positions and sliding trajectories, and encoding them with a multilayer perceptron (MLP), the encoding formula being ztouch=ftouch(xtouch;θtouch), wherein ztouch is the feature vector of the touch signal, xtouch is the touch input data containing the click position and sliding trajectory, θtouch is the touch model parameter, and ftouch is the touch feature extraction function;
extracting gaze signal features, processing gaze trajectory data with ResNet to extract features of the user's gaze area, the extraction formula being zgaze=fgaze(xgaze;θgaze), wherein zgaze is the high-dimensional feature of the gaze signal, xgaze is the gaze trajectory input data, θgaze is the training parameter of the gaze model, and fgaze is the gaze feature extraction model function;
fusing the features, the fusion formula being zfused=ffusion(zspeech,zgesture,ztouch,zgaze;θfusion), wherein zfused is the fused multimodal feature representation, ffusion is the multimodal fusion function, and θfusion is the multimodal fusion parameter.
4. The VR engine-based user experience interaction method according to claim 3, characterized in that the manner of fusing signal features further comprises using the AR engine to parse the depth information of the real scene and the three-dimensional structure of the virtual scene to obtain environmental features, and fusing the environmental features with the signal features input by the user; the depth information includes object distances and object surface characteristics.
5. The VR engine-based user experience interaction method according to claim 4, characterized in that the step of fusing the signal features, generating a user intention model with a probabilistic graphical model, and predicting the user's operation purpose comprises:
taking the multimodal signal feature zfused of step S1 as the input feature, extracting the environmental feature e with the AR engine, e including the depth information edepth of the real scene and the three-dimensional structure e3D of the virtual scene, the depth information containing the distance and surface characteristics of objects, and the three-dimensional structure being the shape and spatial distribution of objects;
performing feature fusion, the fusion formula being zcombined=fattention(zfused,e;θfusion), wherein zcombined is the fused feature, fattention is the cross-modal attention mechanism, and θfusion is the feature fusion parameter;
modeling the user intention model through a dynamic Bayesian network (DBN) and computing the posterior probability of the intention, the calculation formula being P(I|zcombined) = P(zcombined|I)·P(I) / Σi P(zcombined|Ii)·P(Ii), wherein P(I|zcombined) is the posterior probability of the user intention I given the fused feature zcombined, P(zcombined|I) is the conditional probability of the feature given the intention, P(I) is the prior probability of the intention, and Σi(·) denotes normalization over all possible intentions Ii.
6. The VR engine-based user experience interaction method according to claim 5, characterized in that the step of driving the virtual scene by the AR engine to make corresponding adjustments according to the prediction result of the user intention model comprises:
adjusting the positioning of the virtual object, dynamically updating its three-dimensional spatial position according to the target object and operation-area prediction in the user intention, the update formula being pnew=pold+Δp, wherein pnew is the new position vector of the virtual object, pold is the original position vector of the virtual object, and Δp is the position change vector;
adjusting the scaling of the virtual object, adjusting its proportion according to the task goal prediction, the scaling formula being Snew=Sold·k, wherein Snew is the updated scale, Sold is the original scale, and k is the scaling factor;
adjusting the rotation of the virtual object, adjusting its orientation through the user's rotation operation, the rotation formula being Rnew=Rold·R(θ,u), wherein Rnew is the updated rotation matrix, Rold is the initial rotation matrix, and R(θ,u) is the rotation matrix formed by the rotation axis u and rotation angle θ;
optimizing the layout of displayed content in the virtual scene according to the user intention model, the optimization formula being Lopt = argminL Σi=1..n wi·Di(L), wherein Lopt is the optimized display layout, L is a candidate display layout, wi is the weight in the user intention model, n is the total number of content items, and Di(L) is the distance between the user's attention target and the displayed content.
7. The VR engine-based user experience interaction method according to claim 6, characterized in that the step of making corresponding adjustments further comprises:
recording and self-learning the user's operation habits and dynamically adjusting the initial state of the virtual scene, the self-learning formula being pinitial=E[puser], wherein pinitial is the initial position, puser is the position distribution in the user's operation history, and E is the expected-value calculation; the adjusted virtual scene is conveyed to the user through visual and tactile feedback.
8. The VR engine-based user experience interaction method according to claim 7, characterized in that the step of deeply fusing the adjusted virtual scene with the real environment through the light-and-shadow consistency calculation and physical property simulation technology of the AR engine comprises:
the AR engine capturing the illumination conditions in the real environment in real time, including light source direction, intensity and color temperature, and rendering shadows and highlights through a global illumination model so that the virtual object is consistent with the light and shadow changes in the real environment;
the AR engine simulating the physical interaction characteristics of the virtual object and the real environment;
the AR engine dynamically adjusting the presentation of the virtual object according to environmental data collected in real time.
9. The VR engine-based user experience interaction method according to claim 8, characterized in that the feedback mechanism comprises:
adjustment of signal capture and feature extraction parameters,
and adjustment of the user intention model parameters based on the user's repetitive behavior.
10. The VR engine-based user experience interaction method according to claim 9, characterized in that the step of updating into a new round of step S1 input signals through the feedback mechanism comprises:
capturing voice and gesture signals and making corresponding parameter adjustments, the adjustment formulas being θspeech,new=θspeech,old+Δθspeech and θgesture,new=θgesture,old+Δθgesture, wherein θspeech,new and θgesture,new are the updated speech and gesture capture parameters respectively, θspeech,old and θgesture,old are the original speech and gesture capture parameters respectively, and Δθspeech and Δθgesture are incremental parameters adjusted through user feedback;
optimizing the feature extraction model, the optimization formula being z′=z+Δzfeedback, wherein z′ is the optimized feature vector, z is the original feature vector, and Δzfeedback is the optimization increment introduced by the feedback mechanism;
updating the prior probabilities and conditional probabilities in the user intention model according to the user's repeated behaviors and historical operation patterns;
modeling historical behavior, the model formula being P(I)new=αPhistory(I)+(1-α)Pprior(I), wherein P(I)new is the updated prior probability of the user intention, α is the history weight factor, Phistory(I) is the prior probability of user intention I based on the user's historical behavior statistics, and Pprior(I) is the preset initial prior probability of user intention I;
updating the conditional probability, the update formula being P(z′|I)new=P(z′|I)+ΔPfeedback(z′|I), wherein P(z′|I)new is the conditional probability after optimization, P(z′|I) is the conditional probability before optimization, and ΔPfeedback(z′|I) is the conditional probability increment based on user operation feedback;
in addition to the model probabilities, the feedback mechanism also adjusts the global parameter set, the adjustment formula being θ′=θ+Δθfeedback, wherein θ′ is the global parameter set after optimization, θ is the global parameter set before optimization, and Δθfeedback is the parameter increment calculated by the feedback mechanism.
CN202411830551.0A 2024-12-12 2024-12-12 A user experience interaction method based on VR engine Pending CN119620891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411830551.0A CN119620891A (en) 2024-12-12 2024-12-12 A user experience interaction method based on VR engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411830551.0A CN119620891A (en) 2024-12-12 2024-12-12 A user experience interaction method based on VR engine

Publications (1)

Publication Number Publication Date
CN119620891A true CN119620891A (en) 2025-03-14

Family

ID=94906819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411830551.0A Pending CN119620891A (en) 2024-12-12 2024-12-12 A user experience interaction method based on VR engine

Country Status (1)

Country Link
CN (1) CN119620891A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112424727A (en) * 2018-05-22 2021-02-26 奇跃公司 Cross-modal input fusion for wearable systems
CN114691839A (en) * 2020-12-30 2022-07-01 华为技术有限公司 Intention slot position identification method
US20230401976A1 (en) * 2022-06-12 2023-12-14 The Travelers Indemnity Company Systems and methods for artificial intelligence (ai) virtual reality (vr) emotive conversation training
CN118379403A (en) * 2024-03-12 2024-07-23 南阳印迹影视文化传媒有限公司 Emotion symbiotic communication system
CN118708085A (en) * 2024-08-29 2024-09-27 杭州川核灵境科技有限公司 A virtual scene control method and system based on MR large space

Similar Documents

Publication Publication Date Title
US11163987B2 (en) Compact language-free facial expression embedding and novel triplet training scheme
US20240353918A1 (en) Machine interaction
CN110622109A (en) Computer animation based on natural language
TW202238532A (en) Three-dimensional face animation from speech
CN113255457A (en) Animation character facial expression generation method and system based on facial expression recognition
CN118379403A (en) Emotion symbiotic communication system
CN119600159B (en) Interactive digital person generation method and system based on artificial intelligence
CN119229336A (en) A method for building an efficient sign language translation system based on deep learning
CN118820787A (en) A virtual population generation training method, system, medium and program product
Shao et al. Computer vision-driven gesture recognition: Toward natural and intuitive human-computer
Kavitha et al. Advancing Human-Computer Interaction: Real-Time Gesture Recognition and Language Generation Using CNN-LSTM Networks
Li et al. Pose-aware 3D talking face synthesis using geometry-guided audio-vertices attention
CN120852604B (en) Meta-universe digital person generation method and system based on deep learning
US20250299426A1 (en) Interactivity and generative rendering for virtual and wearable display systems
US20250259362A1 (en) Prompt editor for use with a visual media generative response engine
CN119620891A (en) A user experience interaction method based on VR engine
CN119027557A (en) Speech-driven 3D facial animation generation method and device based on reinforcement learning
EP4345755A1 (en) Expression transfer to stylized avatars
Zhang Computer animation interaction design driven by neural network algorithm
CN119474956A (en) A method for intelligently controlling digital human using action tags
Shao et al. Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer Interfaces
WO2022164849A1 (en) Deep relightable appearance models for animatable face avatars
US20250165060A1 (en) Determining Body Pose From Environmental Data
US20250259272A1 (en) Blending user interface for blending visual media using a visual media generative response engine
WO2025166188A1 (en) System/ method for generative body, gesture, and facial expression in 3d characters

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination