CN119620891A - A user experience interaction method based on VR engine - Google Patents
A user experience interaction method based on a VR engine
- Publication number
- CN119620891A (application CN202411830551.0A)
- Authority
- CN
- China
- Prior art keywords
- user
- gesture
- engine
- model
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04815—Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/04883—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Databases & Information Systems (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a user experience interaction method based on a VR engine, relating to the technical field of interaction experience. The method extracts features from speech, gesture, touch, and gaze signals with deep learning models and fuses the signals through a cross-modal attention mechanism, enhancing the understanding of user behavior. Simultaneous localization and mapping (SLAM) captures the user's position and operation context in the real environment in real time, making interaction more natural and fluid. By combining light-and-shadow consistency computation, physical simulation, and dynamic environment adaptation, the AR engine achieves deep fusion of virtual objects with the real environment, resolving the virtual-real rupture of traditional methods and greatly improving interaction immersion. In addition, user intent, including the interaction type, target object, and task goal, is predicted through dynamic Bayesian network modeling, and the prediction model is optimized with real-time feedback and a self-learning mechanism, so that the method gradually adapts to the user's personalized requirements.
Description
Technical Field
The invention relates to the technical field of interaction experience, in particular to a user experience interaction method based on a VR engine.
Background
Conventional interaction experiences center on a graphical user interface: the user completes operations by clicking, sliding, voice input, and similar means, and this interaction mode is simple and direct. As user requirements grow more complex, however, conventional interaction can no longer satisfy the user experience under multi-modal and virtual-scene conditions; mobile devices can be controlled through voice commands, while desktop devices still rely on mouse or keyboard operation, so the interaction experience is fragmented.
To address these problems, some interaction schemes introduce a voice assistant to process natural-language instructions and adopt gesture recognition technology to improve the naturalness of operation, optimizing the user experience to a certain extent so that interaction between person and device becomes closer to natural behavior. Some schemes further introduce augmented reality (AR) and virtual reality (VR), optimizing virtual-real interaction through virtual-object rendering and ambient-light adaptation to bring users a more immersive experience. However, these improvements still cannot thoroughly solve the problem of insufficient interaction fusion: users must frequently switch between different input modes, and truly seamless cross-modal interaction is not achieved. In addition, current interaction schemes remain deficient in adaptability and dynamic optimization, unable to adjust the interaction mode and displayed content dynamically according to the user's real-time behavior. A higher-level, dynamically adaptive fusion interaction method, namely a VR engine-based user experience interaction method, is therefore needed to solve these problems.
Disclosure of Invention
The present invention has been made in view of the above-described problems occurring in the prior art.
The invention provides a VR engine-based user experience interaction method to solve the problem of insufficient interaction fusion and the problem that a user still needs to switch between different input modes rather than achieving truly seamless cross-modal interaction.
In order to solve the technical problems, the invention provides the following technical scheme:
the embodiment of the invention provides a user experience interaction method based on a VR engine, which comprises the following steps,
Step S1, a signal interface is arranged in an AR engine, user input signals including voice commands, gesture actions, touch operation and sight line changes are captured, and a deep learning model is adopted to extract characteristics of the input signals, so that signal characteristics are generated;
S2, fusing signal characteristics, generating a user intention model by using a probability map model, and predicting the operation purpose of a user, wherein the predicted content comprises an interaction type, a target object and a task target;
Step S3, driving the virtual scene to correspondingly adjust by the AR engine according to the prediction result of the user intention model, wherein the adjustment comprises positioning, scaling or rotation of the virtual object and adaptive optimization of display content;
S4, the adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine, and meanwhile, the environment light change is dynamically adapted to obtain a final interaction effect;
And S5, outputting and feeding back the final interaction effect to the user, re-capturing the further operation of the user, and updating to the input signal of the step S1 of a new round by adopting a feedback mechanism.
As a preferred scheme of the VR engine-based user experience interaction method, the AR engine provides a synchronous multi-mode data acquisition framework, captures user input signals in real time and locates the position of a user in a real environment through synchronous locating and mapping SLAM technology.
As a preferred scheme of the VR engine-based user experience interaction method, the invention adopts a deep learning model to extract characteristics of input signals, generates signal characteristics,
The method comprises the steps of encoding a voice waveform by using a Transformer model and extracting semantic information, wherein the encoding formula is as follows:
zspeech=fspeech(xspeech;θspeech),
Wherein z speech is the extracted semantic feature vector, x speech is the input voice waveform signal, θ speech is the training parameter of the voice model, and f speech is the deep learning function for voice feature extraction;
capturing continuous gesture motion frames of a user, and modeling by combining a 3D convolution network CNN and a long-short-term memory network LSTM, wherein the capturing formula is as follows:
zgesture=fgesture(xgesture;θgesture),
Wherein z gesture is the high-dimensional feature representation of the gesture signal, x gesture is the input gesture video sequence, θ gesture is the training parameter of the gesture model, and f gesture is the gesture feature extraction model function;
the touch signal characteristics are extracted, including clicking positions and sliding tracks, and are encoded by using a multi-layer perceptron MLP, wherein the encoding formula is as follows:
ztouch=ftouch(xtouch;θtouch),
Wherein z touch is a feature vector of a touch signal, x touch is touch input data comprising a click position and a sliding track, θ touch is a touch model parameter, and f touch is a touch feature extraction function;
Extracting sight signal characteristics, processing sight track data by ResNet, extracting characteristics of a user gazing area, wherein an extraction formula is as follows:
zgaze=fgaze(xgaze;θgaze),
Wherein z gaze is the high-dimensional feature of the sight line signal, x gaze is the sight line track input data, θ gaze is the training parameter of the sight line model, and f gaze is the sight line feature extraction model function;
fusing the characteristics, wherein the fusion formula is as follows:
zfused=ffusion(zspeech,zgesture,ztouch,zgaze;θfusion),
Wherein z fused is the multi-modal feature representation after fusion, f fusion is the multi-modal fusion function, and θ fusion is the multi-modal fusion parameter.
The method for fusing the signal features further comprises the steps of analyzing depth information of a real scene and a three-dimensional structure of a virtual scene by using an AR engine to obtain environmental features, and fusing the environmental features with the signal features input by a user, wherein the depth information comprises object distances and object surface characteristics.
As a preferred scheme of the VR engine-based user experience interaction method, the invention comprises the steps of fusing signal characteristics, generating a user intention model by using a probability map model, predicting the operation purpose of a user,
Taking the multi-mode signal characteristic z fused in the step S1 as an input characteristic, extracting an environmental characteristic e by adopting an AR engine, wherein the e comprises depth information e depth of a real scene and a three-dimensional structure e 3D of a virtual scene, the depth information comprises the distance and surface characteristics of an object, and the three-dimensional structure is the shape and the spatial distribution of the object;
feature fusion is carried out, and a fusion formula is as follows:
zcombined=fattention(zfused,e;θfusion),
Wherein z combined is the feature after fusion, f attention is the cross-modal attention mechanism, and θ fusion is the feature fusion parameter;
Modeling a user intention model through a dynamic Bayesian network DBN, and calculating the posterior probability of the intention, wherein the calculation formula is as follows:
P(I|z combined) = P(z combined|I)·P(I) / Σ i P(z combined|I i)·P(I i),
Where P (I|z combined) is the posterior probability of the user's intent I given the fused feature z combined, P (z combined |I) is the conditional probability of the feature given the intent, P (I) is the prior probability of the intent, and Σ i (·) represents normalization over all possible intentions I i.
As a preferable scheme of the VR engine-based user experience interaction method, the method comprises the steps of predicting the result according to a user intention model, driving a virtual scene by an AR engine to correspondingly adjust,
Positioning and adjusting the virtual object, dynamically updating the three-dimensional space position of the virtual object according to the target object in the user intention and the operation area prediction,
The update formula is:
pnew=pold+Δp,
Wherein, p new is the new position vector of the virtual object, p old is the original position vector of the virtual object, and Δp is the position change vector;
scaling adjustment of the virtual object is carried out, the proportion of the virtual object is adjusted according to the task target prediction,
The scaling formula is:
Snew=Sold·k,
Wherein S new is the updated scaling, S old is the original scaling, and k is the scaling factor;
performing a rotation adjustment of the virtual object, adjusting a direction of the virtual object by a rotation operation of a user,
The rotation formula is:
Rnew=Rold·R(θ,u),
Wherein R new is the updated rotation matrix, R old is the initial rotation matrix, R (θ, u) is the rotation matrix, which is composed of the rotation axis u and the rotation angle θ,
Optimizing the layout of display contents in the virtual scene according to the user intention model, wherein the optimization formula is as follows:
L opt = argmin L Σ i=1..n w i·D i(L),
Wherein L opt is the optimized display layout, L is the candidate display layout, w i is the weight in the user intent model, n is the total number of contents, and D i (L) is the distance between the user interest object and the display content.
As a preferred scheme of the VR engine-based user experience interaction method of the present invention, the step of making corresponding adjustments further includes,
Recording and self-learning the user's operation habits and dynamically adjusting the initial state of the virtual scene, wherein the self-learning formula is p initial=E[p user],
Wherein, p user is the position distribution in the user operation history, E is the expected value calculation, and the adjusted virtual scene is transmitted to the user through visual and tactile feedback.
As a preferred scheme of the VR engine-based user experience interaction method, the method comprises the steps of performing depth fusion with a real environment through the light and shadow consistency calculation and physical characteristic simulation technology of an AR engine,
The AR engine captures illumination conditions in a real environment in real time, including the direction, the intensity and the color temperature of a light source, and renders shadows and highlights through a global illumination model so that the virtual object is consistent with the change of the shadows in the real environment;
The AR engine simulates physical interaction characteristics of a virtual object and a real environment;
The AR engine dynamically adjusts the expression form of the virtual object according to the environmental data collected in real time.
As a preferred scheme of the VR engine-based user experience interaction method, the feedback mechanism comprises the following steps:
Adjustment of signal capture and feature extraction parameters,
And adjusting the user intent model parameters based on the user repetitive behavior.
As a preferred scheme of the VR engine-based user experience interaction method, the step of updating to the new round of step S1 input signals by adopting a feedback mechanism is that,
Capturing voice and gesture signals, and carrying out corresponding parameter adjustment, wherein the adjustment formula is as follows:
θspeech,new=θspeech,old+Δθspeech,
θgesture,new=θgesture,old+Δθgesture,
Wherein θ speech,new and θ gesture,new are updated speech and gesture capture parameters, respectively, θ speech,old and θ gesture,old are original speech and gesture capture parameters, respectively, Δθ speech and Δθ gesture are delta parameters adjusted by user feedback,
Optimizing the feature extraction model, wherein the optimization formula is as follows:
z′=z+Δzfeedback,
Wherein z' is an optimized feature vector, z is an original feature vector, and Δz feedback is an optimized increment introduced by a feedback mechanism;
Updating prior probabilities and conditional probabilities in the user intention model according to the repeated behaviors of the user and the historical operation modes,
Historical behavior modeling is carried out, and a model formula is as follows:
P(I)new=αPhistory(I)+(1-α)Pprior(I),
Wherein P (I) new is the updated prior probability of user intention, alpha is a historical weight factor, P history (I) represents the prior probability of user intention I based on user historical behavior statistics, P prior (I) represents the initial prior probability of preset user intention I,
Updating the conditional probability, wherein the updating formula is as follows:
P(z′|I)new=P(z′|I)+ΔPfeedback(z′|I),
Wherein,
P (z′|I) new is the conditional probability after optimization, P (z′|I) is the conditional probability before optimization, and ΔP feedback (z′|I) is the conditional probability increment based on user operation feedback,
The feedback mechanism adjusts not only the individual model parameters but also the global parameter set, wherein the adjustment formula is as follows:
θ′=θ+Δθfeedback,
Wherein θ' is the global parameter set after optimization, θ is the global parameter set before optimization, and Δθ feedback is the parameter increment calculated by the feedback mechanism.
The invention has the following beneficial effects. Features of speech, gesture, touch, and gaze signals are extracted through deep learning models and fused with a cross-modal attention mechanism, enhancing the understanding of user behavior; at the same time, simultaneous localization and mapping (SLAM) captures the user's position and operation context in the real environment in real time, making interaction more natural and smooth. In addition, user intent, including the interaction type, target object, and task goal, is modeled and predicted with a dynamic Bayesian network, and the prediction model is optimized through real-time feedback and a self-learning mechanism, so that the AR engine gradually adapts to the user's individual requirements. For virtual-scene adjustment, the AR engine dynamically adjusts the position, scale, and rotation angle of the virtual object according to the user's intent, optimizes the display content layout, and records the user's operation habits through the self-learning mechanism, further reducing redundant operations and improving interaction efficiency;
According to the method, the signal capturing parameters, the characteristic extraction model and the intention prediction model are dynamically optimized through a feedback mechanism, a self-adaptive closed-loop optimization flow is constructed, and the adaptability to dynamic changes and the support capability of complex scenes are enhanced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a VR engine-based user experience interaction method of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Embodiment 1, referring to fig. 1, this embodiment provides a VR engine-based user experience interaction method, including:
Step S1, a signal interface is arranged in an AR engine, user input signals including voice commands, gesture actions, touch operation and sight line changes are captured, and a deep learning model is adopted to extract characteristics of the input signals, so that signal characteristics are generated;
The AR engine provides an acquisition framework for synchronous multi-mode data, captures user input signals in real time, and positions the user in a real environment through synchronous positioning and mapping SLAM technology;
The deep learning model is adopted to extract the characteristics of the input signal, the step of generating the signal characteristics is that,
The method comprises the steps of encoding a voice waveform by using a Transformer model and extracting semantic information, wherein the encoding formula is as follows:
zspeech=fspeech(xspeech;θspeech),
Wherein z speech is the extracted semantic feature vector, x speech is the input voice waveform signal, θ speech is the training parameter of the voice model, and f speech is the deep learning function for voice feature extraction;
capturing continuous gesture motion frames of a user, and modeling by combining a 3D convolution network CNN and a long-short-term memory network LSTM, wherein the capturing formula is as follows:
zgesture=fgesture(xgesture;θgesture),
Wherein z gesture is the high-dimensional feature representation of the gesture signal, x gesture is the input gesture video sequence, θ gesture is the training parameter of the gesture model, and f gesture is the gesture feature extraction model function;
the touch signal characteristics are extracted, including clicking positions and sliding tracks, and are encoded by using a multi-layer perceptron MLP, wherein the encoding formula is as follows:
ztouch=ftouch(xtouch;θtouch),
Wherein z touch is a feature vector of a touch signal, x touch is touch input data comprising a click position and a sliding track, θ touch is a touch model parameter, and f touch is a touch feature extraction function;
Extracting sight signal characteristics, processing sight track data by ResNet, extracting characteristics of a user gazing area, wherein an extraction formula is as follows:
zgaze=fgaze(xgaze;θgaze),
Wherein z gaze is the high-dimensional feature of the sight line signal, x gaze is the sight line track input data, θ gaze is the training parameter of the sight line model, and f gaze is the sight line feature extraction model function;
fusing the characteristics, wherein the fusion formula is as follows:
zfused=ffusion(zspeech,zgesture,ztouch,zgaze;θfusion),
Wherein z fused is a fused multi-modal feature representation, f fusion is a multi-modal fusion function, and θ fusion is a multi-modal fusion parameter;
Specifically, the characteristics of voice, gestures, touch and sight are extracted through the deep learning model, the multi-modal characteristics are fused by using a cross-modal attention mechanism, unified high-dimensional characteristic representation is generated, and the integrity and consistency of signal characteristics are ensured.
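As an illustrative sketch only, the following PyTorch code shows how such per-modality encoders and the fusion function f fusion could be wired together; the layer sizes, the 3D-CNN plus LSTM gesture encoder, and the concatenation-and-projection fusion are assumptions for illustration rather than details fixed by this embodiment.

```python
# Illustrative sketch of a per-modality encoder and the fusion step.
# Dimensions, module choices and the concatenation-based fusion are assumptions,
# not taken from the patent text.
import torch
import torch.nn as nn

class GestureEncoder(nn.Module):
    """3D-CNN over gesture frames followed by an LSTM, as described for f_gesture."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),          # pool space, keep the time axis
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=feat_dim, batch_first=True)

    def forward(self, x_gesture):                         # (B, 3, T, H, W)
        h = self.cnn(x_gesture)                           # (B, 16, T, 1, 1)
        h = h.squeeze(-1).squeeze(-1).transpose(1, 2)     # (B, T, 16)
        _, (h_n, _) = self.lstm(h)
        return h_n[-1]                                    # z_gesture: (B, feat_dim)

class MultiModalFusion(nn.Module):
    """Concatenate z_speech, z_gesture, z_touch, z_gaze and project: f_fusion."""
    def __init__(self, dims=(128, 128, 64, 64), fused_dim=256):
        super().__init__()
        self.proj = nn.Linear(sum(dims), fused_dim)

    def forward(self, z_speech, z_gesture, z_touch, z_gaze):
        z = torch.cat([z_speech, z_gesture, z_touch, z_gaze], dim=-1)
        return torch.relu(self.proj(z))                   # z_fused
```

A speech Transformer encoder, a touch MLP, and a gaze ResNet would plug in the same way, each producing a fixed-length vector that is consumed by the fusion module.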
S2, fusing signal characteristics, generating a user intention model by using a probability map model, and predicting the operation purpose of a user, wherein the predicted content comprises an interaction type, a target object and a task target;
The method for fusing the signal features further comprises the steps of analyzing depth information of a real scene and a three-dimensional structure of a virtual scene by using an AR engine to obtain environmental features, and fusing the environmental features with the signal features input by a user;
The signal characteristics are fused, a probability map model is used for generating a user intention model, and the operation purpose of the user is predicted by the steps of,
Taking the multi-mode signal characteristic z fused in the step S1 as an input characteristic, extracting an environmental characteristic e by adopting an AR engine, wherein the e comprises depth information e depth of a real scene and a three-dimensional structure e 3D of a virtual scene, the depth information comprises the distance and surface characteristics of an object, and the three-dimensional structure is the shape and the spatial distribution of the object;
feature fusion is carried out, and a fusion formula is as follows:
zcombined=fattention(zfused,e;θfusion),
Wherein z combined is the feature after fusion, f attention is the cross-modal attention mechanism, and θ fusion is the feature fusion parameter;
Modeling a user intention model through a dynamic Bayesian network DBN, and calculating the posterior probability of the intention, wherein the calculation formula is as follows:
P(I|z combined) = P(z combined|I)·P(I) / Σ i P(z combined|I i)·P(I i),
Where P (I|z combined) is the posterior probability of the user intent I given the fused feature z combined, P (z combined |I) is the conditional probability of the feature given the intent, P (I) is the prior probability of the intent, and Σ i (·) represents normalization over all possible intentions I i;
Specifically, the interaction type, the target object and the task target of the user are predicted in real time by fusing the multi-mode signal characteristics and the environment characteristics and utilizing dynamic Bayesian network modeling.
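As a minimal sketch of the posterior computation in step S2 (the per-intent Gaussian likelihood model and the example intent labels below are assumptions; the embodiment only requires some conditional probability P(z combined|I)):

```python
# Minimal sketch of the intent posterior P(I | z_combined) used in step S2.
# The likelihood model (a per-intent Gaussian over the fused feature) is an
# assumption; the disclosure only requires some P(z_combined | I).
import numpy as np
from scipy.stats import multivariate_normal

def intent_posterior(z_combined, priors, likelihood_models):
    """priors: {intent: P(I)}, likelihood_models: {intent: (mean, cov)}."""
    unnormalised = {}
    for intent, p_i in priors.items():
        mean, cov = likelihood_models[intent]
        p_z_given_i = multivariate_normal.pdf(z_combined, mean=mean, cov=cov)
        unnormalised[intent] = p_z_given_i * p_i            # P(z|I) * P(I)
    total = sum(unnormalised.values())                       # Σ_i P(z|I_i) P(I_i)
    return {intent: v / total for intent, v in unnormalised.items()}

# Example with three coarse intent classes over a 2-D fused feature.
priors = {"select": 0.5, "scale": 0.3, "rotate": 0.2}
models = {
    "select": (np.array([0.0, 0.0]), np.eye(2)),
    "scale":  (np.array([2.0, 0.0]), np.eye(2)),
    "rotate": (np.array([0.0, 2.0]), np.eye(2)),
}
print(intent_posterior(np.array([1.8, 0.1]), priors, models))  # "scale" dominates
```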
Step S3, driving the virtual scene to correspondingly adjust by the AR engine according to the prediction result of the user intention model, wherein the adjustment comprises positioning, scaling or rotation of the virtual object and adaptive optimization of display content;
according to the prediction result of the user intention model, the AR engine drives the virtual scene to carry out corresponding adjustment,
Positioning and adjusting the virtual object, dynamically updating the three-dimensional space position of the virtual object according to the target object in the user intention and the operation area prediction,
The update formula is:
pnew=pold+Δp,
Wherein, p new is the new position vector of the virtual object, p old is the original position vector of the virtual object, and Δp is the position change vector;
scaling adjustment of the virtual object is carried out, the proportion of the virtual object is adjusted according to the task target prediction,
The scaling formula is:
Snew=Sold·k,
Wherein S new is the updated scaling, S old is the original scaling, and k is the scaling factor;
performing a rotation adjustment of the virtual object, adjusting a direction of the virtual object by a rotation operation of a user,
The rotation formula is:
Rnew=Rold·R(θ,u),
Wherein R new is the updated rotation matrix, R old is the initial rotation matrix, R (θ, u) is the rotation matrix, which is composed of the rotation axis u and the rotation angle θ,
Optimizing the layout of display contents in the virtual scene according to the user intention model, wherein the optimization formula is as follows:
L opt = argmin L Σ i=1..n w i·D i(L),
Wherein L opt is the optimized display layout, L is the candidate display layout, w i is the weight in the user intention model, n is the total number of contents, and D i (L) is the distance between the user attention target and the display contents,
The step of making a corresponding adjustment may further comprise,
Recording and self-learning the user's operation habits and dynamically adjusting the initial state of the virtual scene, wherein the self-learning formula is p initial=E[p user],
Wherein, p user is the position distribution in the user operation history, E is the expected value calculation, and the adjusted virtual scene is transmitted to the user through visual and tactile feedback;
Specifically, through positioning, scaling, rotation, and display-content optimization, the AR engine can dynamically adjust the virtual scene and respond accurately to the user's intent; a self-learning mechanism is introduced to optimize the initial state and response logic after repeated user interactions.
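A minimal sketch of the step S3 geometric adjustments and layout selection is given below; the candidate-layout representation and the numeric values are assumptions used only for illustration.

```python
# Sketch of the step-S3 adjustments: translation, scaling, axis-angle rotation,
# and choosing the candidate layout that minimises the weighted distance cost.
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula for R(theta, u) about unit axis u."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    K = np.array([[0, -u[2], u[1]],
                  [u[2], 0, -u[0]],
                  [-u[1], u[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def adjust_object(p_old, delta_p, s_old, k, r_old, axis, angle):
    p_new = p_old + delta_p                                # p_new = p_old + Δp
    s_new = s_old * k                                      # S_new = S_old · k
    r_new = r_old @ rotation_matrix(axis, angle)           # R_new = R_old · R(θ, u)
    return p_new, s_new, r_new

def optimal_layout(candidate_layouts, interest_points, weights):
    """argmin over layouts of Σ_i w_i · D_i(L): distance of each content slot to its interest point."""
    def cost(layout):
        return sum(w * np.linalg.norm(layout[i] - interest_points[i])
                   for i, w in enumerate(weights))
    return min(candidate_layouts, key=cost)

# Example: nudge, enlarge and yaw an object, then pick the better of two layouts.
p, s, R = adjust_object(np.zeros(3), np.array([0.1, 0.0, 0.0]),
                        1.0, 1.5, np.eye(3), axis=[0, 1, 0], angle=np.pi / 6)
layouts = [np.array([[0.0, 0.0, 1.0]]), np.array([[0.5, 0.0, 1.0]])]
best = optimal_layout(layouts, interest_points=[np.array([0.4, 0.0, 1.0])], weights=[1.0])
```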
S4, the adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine, and meanwhile, the environment light change is dynamically adapted to obtain a final interaction effect;
The adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine,
The AR engine captures illumination conditions in the real environment in real time, including the direction, intensity and color temperature of a light source, and renders shadows and highlights through the global illumination model so that the virtual object is consistent with the change of the shadows in the real environment;
The AR engine simulates physical interaction characteristics of the virtual object and the real environment;
The AR engine dynamically adjusts the expression form of the virtual object according to the environmental data acquired in real time;
Specifically, through light and shadow consistency calculation, physical characteristic simulation and dynamic environment adaptation, the depth fusion of the virtual scene and the real environment is realized, and the immersion feeling and interaction naturalness of a user are enhanced.
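As a rough sketch of how ambient intensity and color cast might be estimated from a camera frame and applied to a virtual object's base color (the averaging heuristic and the simple material model are assumptions standing in for the AR engine's own light estimation and global illumination rendering):

```python
# Rough sketch of dynamic ambient-light adaptation for a virtual object: the
# camera frame, the white-balance heuristic, and the material model are all
# assumptions used only for illustration.
import numpy as np

def estimate_ambient_light(frame_rgb):
    """frame_rgb: (H, W, 3) float array in [0, 1]. Returns (intensity, tint)."""
    mean_rgb = frame_rgb.reshape(-1, 3).mean(axis=0)
    intensity = float(mean_rgb.mean())            # overall brightness of the scene
    tint = mean_rgb / (mean_rgb.mean() + 1e-6)    # relative colour cast (rough colour temperature)
    return intensity, tint

def shade_virtual_object(base_color, frame_rgb):
    intensity, tint = estimate_ambient_light(frame_rgb)
    return np.clip(np.asarray(base_color) * tint * intensity, 0.0, 1.0)

# A warm, dim frame darkens and reddens a neutral grey object.
frame = np.full((4, 4, 3), [0.5, 0.35, 0.25])
print(shade_virtual_object([0.8, 0.8, 0.8], frame))
```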
Step S5, outputting and feeding back the final interaction effect to the user, re-capturing the further operation of the user, and updating to the input signal of the step S1 of a new round by adopting a feedback mechanism;
The feedback mechanism includes:
Adjustment of signal capture and feature extraction parameters,
Adjusting user intent model parameters based on user repetitive behaviors,
The step of updating the input signal to the new round of step S1 using the feedback mechanism is,
Capturing voice and gesture signals, and carrying out corresponding parameter adjustment, wherein the adjustment formula is as follows:
θspeech,new=θspeech,old+Δθspeech,
θgesture,new=θgesture,old+Δθgesture,
Wherein θ speech,new and θ gesture,new are updated speech and gesture capture parameters, respectively, θ speech,old and θ gesture,old are original speech and gesture capture parameters, respectively, Δθ speech and Δθ gesture are delta parameters adjusted by user feedback,
Optimizing the feature extraction model, wherein the optimization formula is as follows:
z′=z+Δzfeedback,
Wherein z' is an optimized feature vector, z is an original feature vector, and Δz feedback is an optimized increment introduced by a feedback mechanism;
Updating prior probabilities and conditional probabilities in the user intention model according to the repeated behaviors of the user and the historical operation modes,
Historical behavior modeling is carried out, and a model formula is as follows:
P(I)new=αPhistory(I)+(1-α)Pprior(I),
Wherein P (I) new is the updated prior probability of user intention, alpha is a historical weight factor, P history (I) represents the prior probability of user intention I based on user historical behavior statistics, P prior (I) represents the initial prior probability of preset user intention I,
Updating the conditional probability, wherein the updating formula is as follows:
P(z′|I)new=P(z′|I)+ΔPfeedback(z′|I),
Wherein,
P (z′|I) new is the conditional probability after optimization, P (z′|I) is the conditional probability before optimization, and ΔP feedback (z′|I) is the conditional probability increment based on user operation feedback,
The feedback mechanism adjusts not only the individual model parameters but also the global parameter set, wherein the adjustment formula is as follows:
θ′=θ+Δθfeedback,
wherein θ' is the global parameter set after optimization, θ is the global parameter set before optimization, and Δθ feedback is the parameter increment calculated by the feedback mechanism;
Specifically, through a feedback mechanism, self-adaptive optimization is realized in each link of signal capturing, feature extraction and intention modeling, and the adaptive capacity to dynamic change is gradually improved.
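An illustrative sketch of the step S5 feedback updates follows; the dictionary-based parameter store and the renormalization of the blended prior are assumptions added for robustness rather than requirements of the embodiment.

```python
# Sketch of the step-S5 feedback updates: additive parameter deltas and the
# α-blended intent prior. The dict-based parameter store is an assumption.
import numpy as np

def update_capture_params(params, deltas):
    """θ_new = θ_old + Δθ for each named parameter group (speech, gesture, ...)."""
    return {name: params[name] + deltas.get(name, 0.0) for name in params}

def update_intent_prior(p_history, p_prior, alpha=0.7):
    """P(I)_new = α·P_history(I) + (1-α)·P_prior(I), renormalised for safety."""
    blended = {i: alpha * p_history[i] + (1 - alpha) * p_prior[i] for i in p_prior}
    total = sum(blended.values())
    return {i: v / total for i, v in blended.items()}

params = {"theta_speech": np.array([0.1, 0.2]), "theta_gesture": np.array([0.3])}
deltas = {"theta_speech": np.array([0.01, -0.02])}
print(update_capture_params(params, deltas))
print(update_intent_prior({"select": 0.6, "scale": 0.3, "rotate": 0.1},
                          {"select": 0.4, "scale": 0.4, "rotate": 0.2}))
```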
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411830551.0A CN119620891A (en) | 2024-12-12 | 2024-12-12 | A user experience interaction method based on VR engine |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411830551.0A CN119620891A (en) | 2024-12-12 | 2024-12-12 | A user experience interaction method based on VR engine |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119620891A true CN119620891A (en) | 2025-03-14 |
Family
ID=94906819
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411830551.0A Pending CN119620891A (en) | 2024-12-12 | 2024-12-12 | A user experience interaction method based on VR engine |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119620891A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112424727A (en) * | 2018-05-22 | 2021-02-26 | 奇跃公司 | Cross-modal input fusion for wearable systems |
| CN114691839A (en) * | 2020-12-30 | 2022-07-01 | 华为技术有限公司 | Intention slot position identification method |
| US20230401976A1 (en) * | 2022-06-12 | 2023-12-14 | The Travelers Indemnity Company | Systems and methods for artificial intelligence (ai) virtual reality (vr) emotive conversation training |
| CN118379403A (en) * | 2024-03-12 | 2024-07-23 | 南阳印迹影视文化传媒有限公司 | Emotion symbiotic communication system |
| CN118708085A (en) * | 2024-08-29 | 2024-09-27 | 杭州川核灵境科技有限公司 | A virtual scene control method and system based on MR large space |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112424727A (en) * | 2018-05-22 | 2021-02-26 | 奇跃公司 | Cross-modal input fusion for wearable systems |
| CN114691839A (en) * | 2020-12-30 | 2022-07-01 | 华为技术有限公司 | Intention slot position identification method |
| US20230401976A1 (en) * | 2022-06-12 | 2023-12-14 | The Travelers Indemnity Company | Systems and methods for artificial intelligence (ai) virtual reality (vr) emotive conversation training |
| CN118379403A (en) * | 2024-03-12 | 2024-07-23 | 南阳印迹影视文化传媒有限公司 | Emotion symbiotic communication system |
| CN118708085A (en) * | 2024-08-29 | 2024-09-27 | 杭州川核灵境科技有限公司 | A virtual scene control method and system based on MR large space |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11163987B2 (en) | Compact language-free facial expression embedding and novel triplet training scheme | |
| US20240353918A1 (en) | Machine interaction | |
| CN110622109A (en) | Computer animation based on natural language | |
| TW202238532A (en) | Three-dimensional face animation from speech | |
| CN113255457A (en) | Animation character facial expression generation method and system based on facial expression recognition | |
| CN118379403A (en) | Emotion symbiotic communication system | |
| CN119600159B (en) | Interactive digital person generation method and system based on artificial intelligence | |
| CN119229336A (en) | A method for building an efficient sign language translation system based on deep learning | |
| CN118820787A (en) | A virtual population generation training method, system, medium and program product | |
| Shao et al. | Computer vision-driven gesture recognition: Toward natural and intuitive human-computer | |
| Kavitha et al. | Advancing Human-Computer Interaction: Real-Time Gesture Recognition and Language Generation Using CNN-LSTM Networks | |
| Li et al. | Pose-aware 3D talking face synthesis using geometry-guided audio-vertices attention | |
| CN120852604B (en) | Meta-universe digital person generation method and system based on deep learning | |
| US20250299426A1 (en) | Interactivity and generative rendering for virtual and wearable display systems | |
| US20250259362A1 (en) | Prompt editor for use with a visual media generative response engine | |
| CN119620891A (en) | A user experience interaction method based on VR engine | |
| CN119027557A (en) | Speech-driven 3D facial animation generation method and device based on reinforcement learning | |
| EP4345755A1 (en) | Expression transfer to stylized avatars | |
| Zhang | Computer animation interaction design driven by neural network algorithm | |
| CN119474956A (en) | A method for intelligently controlling digital human using action tags | |
| Shao et al. | Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer Interfaces | |
| WO2022164849A1 (en) | Deep relightable appearance models for animatable face avatars | |
| US20250165060A1 (en) | Determining Body Pose From Environmental Data | |
| US20250259272A1 (en) | Blending user interface for blending visual media using a visual media generative response engine | |
| WO2025166188A1 (en) | System/ method for generative body, gesture, and facial expression in 3d characters |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |