CN119620891A - A user experience interaction method based on VR engine - Google Patents
A user experience interaction method based on a VR engine
- Publication number
- CN119620891A (application CN202411830551.0A)
- Authority
- CN
- China
- Prior art keywords
- user
- gesture
- engine
- model
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04815—Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/04883—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Databases & Information Systems (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a user experience interaction method based on a VR engine, relating to the technical field of interaction experience. The method extracts features from speech, gesture, touch, and gaze signals with deep learning models and fuses the signals through a cross-modal attention mechanism, enhancing the understanding of user behavior. Simultaneous localization and mapping (SLAM) captures the user's position and operation context in the real environment in real time, making interaction more natural and fluid. By combining light-and-shadow consistency computation, physical simulation, and dynamic environment adaptation, the AR engine achieves deep fusion of virtual objects with the real environment, resolving the virtual-real rupture of traditional methods and greatly improving interaction immersion. In addition, user intent, including the interaction type, target object, and task goal, is predicted through dynamic Bayesian network modeling, and the prediction model is optimized with real-time feedback and a self-learning mechanism, so that the method gradually adapts to the user's personalized requirements.
Description
Technical Field
The invention relates to the technical field of interaction experience, in particular to a user experience interaction method based on a VR engine.
Background
Conventional interaction experiences center on a graphical user interface: the user completes operations by clicking, sliding, voice input, and similar means, and this interaction mode is simple and direct. As user requirements grow more complex, however, conventional interaction can no longer satisfy the user experience under multi-modal and virtual-scene conditions; mobile devices can be controlled through voice commands, while desktop devices still rely on mouse or keyboard operation, so the interaction experience is fragmented.
To address these problems, some interaction schemes introduce a voice assistant to process natural-language instructions and adopt gesture recognition technology to improve the naturalness of operation, optimizing the user experience to a certain extent so that interaction between person and device becomes closer to natural behavior. Some schemes further introduce augmented reality (AR) and virtual reality (VR), optimizing virtual-real interaction through virtual-object rendering and ambient-light adaptation to bring users a more immersive experience. However, these improvements still cannot thoroughly solve the problem of insufficient interaction fusion: users must frequently switch between different input modes, and truly seamless cross-modal interaction is not achieved. In addition, current interaction schemes remain deficient in adaptability and dynamic optimization, unable to adjust the interaction mode and displayed content dynamically according to the user's real-time behavior. A higher-level, dynamically adaptive fusion interaction method, namely a VR engine-based user experience interaction method, is therefore needed to solve these problems.
Disclosure of Invention
The present invention has been made in view of the above-described problems occurring in the prior art.
The invention provides a VR engine-based user experience interaction method to solve the problem of insufficient interaction fusion and the problem that a user still needs to switch between different input modes rather than achieving truly seamless cross-modal interaction.
In order to solve the technical problems, the invention provides the following technical scheme:
the embodiment of the invention provides a user experience interaction method based on a VR engine, which comprises the following steps,
Step S1, a signal interface is arranged in an AR engine, user input signals including voice commands, gesture actions, touch operation and sight line changes are captured, and a deep learning model is adopted to extract characteristics of the input signals, so that signal characteristics are generated;
S2, fusing signal characteristics, generating a user intention model by using a probability map model, and predicting the operation purpose of a user, wherein the predicted content comprises an interaction type, a target object and a task target;
Step S3, driving the virtual scene to correspondingly adjust by the AR engine according to the prediction result of the user intention model, wherein the adjustment comprises positioning, scaling or rotation of the virtual object and adaptive optimization of display content;
S4, the adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine, and meanwhile, the environment light change is dynamically adapted to obtain a final interaction effect;
And S5, outputting and feeding back the final interaction effect to the user, re-capturing the further operation of the user, and updating to the input signal of the step S1 of a new round by adopting a feedback mechanism.
As a preferred scheme of the VR engine-based user experience interaction method, the AR engine provides a synchronous multi-mode data acquisition framework, captures user input signals in real time and locates the position of a user in a real environment through synchronous locating and mapping SLAM technology.
As a preferred scheme of the VR engine-based user experience interaction method, the invention adopts a deep learning model to extract characteristics of input signals, generates signal characteristics,
The method comprises the steps of encoding a voice waveform by using a Transformer model and extracting semantic information, wherein the encoding formula is as follows:
zspeech=fspeech(xspeech;θspeech),
Wherein z speech is the extracted semantic feature vector, x speech is the input voice waveform signal, θ speech is the training parameter of the voice model, and f speech is the deep learning function for voice feature extraction;
capturing continuous gesture motion frames of a user, and modeling by combining a 3D convolution network CNN and a long-short-term memory network LSTM, wherein the capturing formula is as follows:
zgesture=fgesture(xgesture;θgesture),
Wherein z gesture is the high-dimensional feature representation of the gesture signal, x gesture is the input gesture video sequence, θ gesture is the training parameter of the gesture model, and f gesture is the gesture feature extraction model function;
the touch signal characteristics are extracted, including clicking positions and sliding tracks, and are encoded by using a multi-layer perceptron MLP, wherein the encoding formula is as follows:
ztouch=ftouch(xtouch;θtouch),
Wherein z touch is a feature vector of a touch signal, x touch is touch input data comprising a click position and a sliding track, θ touch is a touch model parameter, and f touch is a touch feature extraction function;
Extracting sight signal characteristics, processing sight track data by ResNet, extracting characteristics of a user gazing area, wherein an extraction formula is as follows:
zgaze=fgaze(xgaze;θgaze),
Wherein z gaze is the high-dimensional feature of the sight line signal, x gaze is the sight line track input data, θ gaze is the training parameter of the sight line model, and f gaze is the sight line feature extraction model function;
fusing the characteristics, wherein the fusion formula is as follows:
zfused=ffusion(zspeech,zgesture,ztouch,zgaze;θfusion),
Wherein z fused is the multi-modal feature representation after fusion, f fusion is the multi-modal fusion function, and θ fusion is the multi-modal fusion parameter.
The method for fusing the signal features further comprises the steps of analyzing depth information of a real scene and a three-dimensional structure of a virtual scene by using an AR engine to obtain environmental features, and fusing the environmental features with the signal features input by a user, wherein the depth information comprises object distances and object surface characteristics.
As a preferred scheme of the VR engine-based user experience interaction method, the invention comprises the steps of fusing signal characteristics, generating a user intention model by using a probability map model, predicting the operation purpose of a user,
Taking the multi-mode signal characteristic z fused in the step S1 as an input characteristic, extracting an environmental characteristic e by adopting an AR engine, wherein the e comprises depth information e depth of a real scene and a three-dimensional structure e 3D of a virtual scene, the depth information comprises the distance and surface characteristics of an object, and the three-dimensional structure is the shape and the spatial distribution of the object;
feature fusion is carried out, and a fusion formula is as follows:
zcombined=fattention(zfused,e;θfusion),
Wherein z combined is the feature after fusion, f attention is the cross-modal attention mechanism, and θ fusion is the feature fusion parameter;
Modeling a user intention model through a dynamic Bayesian network DBN, and calculating the posterior probability of the intention, wherein the calculation formula is as follows:
P(I|z combined) = P(z combined|I)·P(I) / Σ i P(z combined|I i)·P(I i),
Where P (I|z combined) is the posterior probability of the user's intent I given the fused feature z combined, P (z combined |I) is the conditional probability of the feature given the intent, P (I) is the prior probability of the intent, and Σ i (·) represents normalization over all possible intentions I i.
As a preferable scheme of the VR engine-based user experience interaction method, the method comprises the steps of predicting the result according to a user intention model, driving a virtual scene by an AR engine to correspondingly adjust,
Positioning and adjusting the virtual object, dynamically updating the three-dimensional space position of the virtual object according to the target object in the user intention and the operation area prediction,
The update formula is:
pnew=pold+Δp,
Wherein, p new is the new position vector of the virtual object, p old is the original position vector of the virtual object, and Δp is the position change vector;
scaling adjustment of the virtual object is carried out, the proportion of the virtual object is adjusted according to the task target prediction,
The scaling formula is:
Snew=Sold·k,
Wherein S new is the updated scaling, S old is the original scaling, and k is the scaling factor;
performing a rotation adjustment of the virtual object, adjusting a direction of the virtual object by a rotation operation of a user,
The rotation formula is:
Rnew=Rold·R(θ,u),
Wherein R new is the updated rotation matrix, R old is the initial rotation matrix, R (θ, u) is the rotation matrix, which is composed of the rotation axis u and the rotation angle θ,
Optimizing the layout of display contents in the virtual scene according to the user intention model, wherein the optimization formula is as follows:
L opt = argmin L Σ i=1..n w i·D i(L),
Wherein L opt is the optimized display layout, L is the candidate display layout, w i is the weight in the user intent model, n is the total number of contents, and D i (L) is the distance between the user interest object and the display content.
As a preferred scheme of the VR engine-based user experience interaction method of the present invention, the step of making corresponding adjustments further includes,
Recording and self-learning the user's operation habits and dynamically adjusting the initial state of the virtual scene, wherein the self-learning formula is p initial=E[p user],
Wherein, p user is the position distribution in the user operation history, E is the expected value calculation, and the adjusted virtual scene is transmitted to the user through visual and tactile feedback.
As a preferred scheme of the VR engine-based user experience interaction method, the method comprises the steps of performing depth fusion with a real environment through the light and shadow consistency calculation and physical characteristic simulation technology of an AR engine,
The AR engine captures illumination conditions in a real environment in real time, including the direction, the intensity and the color temperature of a light source, and renders shadows and highlights through a global illumination model so that the virtual object is consistent with the change of the shadows in the real environment;
The AR engine simulates physical interaction characteristics of a virtual object and a real environment;
The AR engine dynamically adjusts the expression form of the virtual object according to the environmental data collected in real time.
As a preferred scheme of the VR engine-based user experience interaction method, the feedback mechanism comprises the following steps:
Adjustment of signal capture and feature extraction parameters,
And adjusting the user intent model parameters based on the user repetitive behavior.
As a preferred scheme of the VR engine-based user experience interaction method, the step of updating to the new round of step S1 input signals by adopting a feedback mechanism is that,
Capturing voice and gesture signals, and carrying out corresponding parameter adjustment, wherein the adjustment formula is as follows:
θspeech,new=θspeech,old+Δθspeech,
θgesture,new=θgesture,old+Δθgesture,
Wherein θ speech,new and θ gesture,new are updated speech and gesture capture parameters, respectively, θ speech,old and θ gesture,old are original speech and gesture capture parameters, respectively, Δθ speech and Δθ gesture are delta parameters adjusted by user feedback,
Optimizing the feature extraction model, wherein the optimization formula is as follows:
z′=z+Δzfeedback,
Wherein z' is an optimized feature vector, z is an original feature vector, and Δz feedback is an optimized increment introduced by a feedback mechanism;
Updating prior probabilities and conditional probabilities in the user intention model according to the repeated behaviors of the user and the historical operation modes,
Historical behavior modeling is carried out, and a model formula is as follows:
P(I)new=αPhistory(I)+(1-α)Pprior(I),
Wherein P (I) new is the updated prior probability of user intention, alpha is a historical weight factor, P history (I) represents the prior probability of user intention I based on user historical behavior statistics, P prior (I) represents the initial prior probability of preset user intention I,
Updating the conditional probability, wherein the updating formula is as follows:
P(z′|I)new=P(z′|I)+ΔPfeedback(z′|I),
Wherein,
P (z′|I) new is the conditional probability after optimization, P (z′|I) is the conditional probability before optimization, and ΔP feedback (z′|I) is the conditional probability increment based on user operation feedback,
The feedback mechanism adjusts not only the individual model parameters but also the global parameter set, wherein the adjustment formula is as follows:
θ′=θ+Δθfeedback,
Wherein θ' is the global parameter set after optimization, θ is the global parameter set before optimization, and Δθ feedback is the parameter increment calculated by the feedback mechanism.
The invention has the following beneficial effects. Features of speech, gesture, touch, and gaze signals are extracted through deep learning models and fused with a cross-modal attention mechanism, enhancing the understanding of user behavior; at the same time, simultaneous localization and mapping (SLAM) captures the user's position and operation context in the real environment in real time, making interaction more natural and smooth. In addition, user intent, including the interaction type, target object, and task goal, is modeled and predicted with a dynamic Bayesian network, and the prediction model is optimized through real-time feedback and a self-learning mechanism, so that the AR engine gradually adapts to the user's individual requirements. For virtual-scene adjustment, the AR engine dynamically adjusts the position, scale, and rotation angle of the virtual object according to the user's intent, optimizes the display content layout, and records the user's operation habits through the self-learning mechanism, further reducing redundant operations and improving interaction efficiency;
According to the method, the signal capturing parameters, the characteristic extraction model and the intention prediction model are dynamically optimized through a feedback mechanism, a self-adaptive closed-loop optimization flow is constructed, and the adaptability to dynamic changes and the support capability of complex scenes are enhanced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a VR engine-based user experience interaction method of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Embodiment 1, referring to fig. 1, this embodiment provides a VR engine-based user experience interaction method, including:
Step S1, a signal interface is arranged in an AR engine, user input signals including voice commands, gesture actions, touch operation and sight line changes are captured, and a deep learning model is adopted to extract characteristics of the input signals, so that signal characteristics are generated;
The AR engine provides an acquisition framework for synchronous multi-mode data, captures user input signals in real time, and positions the user in a real environment through synchronous positioning and mapping SLAM technology;
The deep learning model is adopted to extract the characteristics of the input signal, the step of generating the signal characteristics is that,
The method comprises the steps of encoding a voice waveform by using a Transformer model and extracting semantic information, wherein the encoding formula is as follows:
zspeech=fspeech(xspeech;θspeech),
Wherein z speech is the extracted semantic feature vector, x speech is the input voice waveform signal, θ speech is the training parameter of the voice model, and f speech is the deep learning function for voice feature extraction;
capturing continuous gesture motion frames of a user, and modeling by combining a 3D convolution network CNN and a long-short-term memory network LSTM, wherein the capturing formula is as follows:
zgesture=fgesture(xgesture;θgesture),
Wherein z gesture is the high-dimensional feature representation of the gesture signal, x gesture is the input gesture video sequence, θ gesture is the training parameter of the gesture model, and f gesture is the gesture feature extraction model function;
the touch signal characteristics are extracted, including clicking positions and sliding tracks, and are encoded by using a multi-layer perceptron MLP, wherein the encoding formula is as follows:
ztouch=ftouch(xtouch;θtouch),
Wherein z touch is a feature vector of a touch signal, x touch is touch input data comprising a click position and a sliding track, θ touch is a touch model parameter, and f touch is a touch feature extraction function;
Extracting sight signal characteristics, processing sight track data by ResNet, extracting characteristics of a user gazing area, wherein an extraction formula is as follows:
zgaze=fgaze(xgaze;θgaze),
Wherein z gaze is the high-dimensional feature of the sight line signal, x gaze is the sight line track input data, θ gaze is the training parameter of the sight line model, and f gaze is the sight line feature extraction model function;
fusing the characteristics, wherein the fusion formula is as follows:
zfused=ffusion(zspeech,zgesture,ztouch,zgaze;θfusion),
Wherein z fused is a fused multi-modal feature representation, f fusion is a multi-modal fusion function, and θ fusion is a multi-modal fusion parameter;
Specifically, the characteristics of voice, gestures, touch and sight are extracted through the deep learning model, the multi-modal characteristics are fused by using a cross-modal attention mechanism, unified high-dimensional characteristic representation is generated, and the integrity and consistency of signal characteristics are ensured.
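As an illustrative sketch only, the following PyTorch code shows how such per-modality encoders and the fusion function f fusion could be wired together; the layer sizes, the 3D-CNN plus LSTM gesture encoder, and the concatenation-and-projection fusion are assumptions for illustration rather than details fixed by this embodiment.

```python
# Illustrative sketch of a per-modality encoder and the fusion step.
# Dimensions, module choices and the concatenation-based fusion are assumptions,
# not taken from the patent text.
import torch
import torch.nn as nn

class GestureEncoder(nn.Module):
    """3D-CNN over gesture frames followed by an LSTM, as described for f_gesture."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),          # pool space, keep the time axis
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=feat_dim, batch_first=True)

    def forward(self, x_gesture):                         # (B, 3, T, H, W)
        h = self.cnn(x_gesture)                           # (B, 16, T, 1, 1)
        h = h.squeeze(-1).squeeze(-1).transpose(1, 2)     # (B, T, 16)
        _, (h_n, _) = self.lstm(h)
        return h_n[-1]                                    # z_gesture: (B, feat_dim)

class MultiModalFusion(nn.Module):
    """Concatenate z_speech, z_gesture, z_touch, z_gaze and project: f_fusion."""
    def __init__(self, dims=(128, 128, 64, 64), fused_dim=256):
        super().__init__()
        self.proj = nn.Linear(sum(dims), fused_dim)

    def forward(self, z_speech, z_gesture, z_touch, z_gaze):
        z = torch.cat([z_speech, z_gesture, z_touch, z_gaze], dim=-1)
        return torch.relu(self.proj(z))                   # z_fused
```

A speech Transformer encoder, a touch MLP, and a gaze ResNet would plug in the same way, each producing a fixed-length vector that is consumed by the fusion module.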
S2, fusing signal characteristics, generating a user intention model by using a probability map model, and predicting the operation purpose of a user, wherein the predicted content comprises an interaction type, a target object and a task target;
The method for fusing the signal features further comprises the steps of analyzing depth information of a real scene and a three-dimensional structure of a virtual scene by using an AR engine to obtain environmental features, and fusing the environmental features with the signal features input by a user;
The signal characteristics are fused, a probability map model is used for generating a user intention model, and the operation purpose of the user is predicted by the steps of,
Taking the multi-mode signal characteristic z fused in the step S1 as an input characteristic, extracting an environmental characteristic e by adopting an AR engine, wherein the e comprises depth information e depth of a real scene and a three-dimensional structure e 3D of a virtual scene, the depth information comprises the distance and surface characteristics of an object, and the three-dimensional structure is the shape and the spatial distribution of the object;
feature fusion is carried out, and a fusion formula is as follows:
zcombined=fattention(zfused,e;θfusion),
Wherein z combined is the feature after fusion, f attention is the cross-modal attention mechanism, and θ fusion is the feature fusion parameter;
Modeling a user intention model through a dynamic Bayesian network DBN, and calculating the posterior probability of the intention, wherein the calculation formula is as follows:
P(I|z combined) = P(z combined|I)·P(I) / Σ i P(z combined|I i)·P(I i),
Where P (I|z combined) is the posterior probability of the user intent I given the fused feature z combined, P (z combined |I) is the conditional probability of the feature given the intent, P (I) is the prior probability of the intent, and Σ i (·) represents normalization over all possible intentions I i;
Specifically, the interaction type, the target object and the task target of the user are predicted in real time by fusing the multi-mode signal characteristics and the environment characteristics and utilizing dynamic Bayesian network modeling.
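As a minimal sketch of the posterior computation in step S2 (the per-intent Gaussian likelihood model and the example intent labels below are assumptions; the embodiment only requires some conditional probability P(z combined|I)):

```python
# Minimal sketch of the intent posterior P(I | z_combined) used in step S2.
# The likelihood model (a per-intent Gaussian over the fused feature) is an
# assumption; the disclosure only requires some P(z_combined | I).
import numpy as np
from scipy.stats import multivariate_normal

def intent_posterior(z_combined, priors, likelihood_models):
    """priors: {intent: P(I)}, likelihood_models: {intent: (mean, cov)}."""
    unnormalised = {}
    for intent, p_i in priors.items():
        mean, cov = likelihood_models[intent]
        p_z_given_i = multivariate_normal.pdf(z_combined, mean=mean, cov=cov)
        unnormalised[intent] = p_z_given_i * p_i            # P(z|I) * P(I)
    total = sum(unnormalised.values())                       # Σ_i P(z|I_i) P(I_i)
    return {intent: v / total for intent, v in unnormalised.items()}

# Example with three coarse intent classes over a 2-D fused feature.
priors = {"select": 0.5, "scale": 0.3, "rotate": 0.2}
models = {
    "select": (np.array([0.0, 0.0]), np.eye(2)),
    "scale":  (np.array([2.0, 0.0]), np.eye(2)),
    "rotate": (np.array([0.0, 2.0]), np.eye(2)),
}
print(intent_posterior(np.array([1.8, 0.1]), priors, models))  # "scale" dominates
```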
Step S3, driving the virtual scene to correspondingly adjust by the AR engine according to the prediction result of the user intention model, wherein the adjustment comprises positioning, scaling or rotation of the virtual object and adaptive optimization of display content;
according to the prediction result of the user intention model, the AR engine drives the virtual scene to carry out corresponding adjustment,
Positioning and adjusting the virtual object, dynamically updating the three-dimensional space position of the virtual object according to the target object in the user intention and the operation area prediction,
The update formula is:
pnew=pold+Δp,
Wherein, p new is the new position vector of the virtual object, p old is the original position vector of the virtual object, and Δp is the position change vector;
scaling adjustment of the virtual object is carried out, the proportion of the virtual object is adjusted according to the task target prediction,
The scaling formula is:
Snew=Sold·k,
Wherein S new is the updated scaling, S old is the original scaling, and k is the scaling factor;
performing a rotation adjustment of the virtual object, adjusting a direction of the virtual object by a rotation operation of a user,
The rotation formula is:
Rnew=Rold·R(θ,u),
Wherein R new is the updated rotation matrix, R old is the initial rotation matrix, R (θ, u) is the rotation matrix, which is composed of the rotation axis u and the rotation angle θ,
Optimizing the layout of display contents in the virtual scene according to the user intention model, wherein the optimization formula is as follows:
L opt = argmin L Σ i=1..n w i·D i(L),
Wherein L opt is the optimized display layout, L is the candidate display layout, w i is the weight in the user intention model, n is the total number of contents, and D i (L) is the distance between the user attention target and the display contents,
The step of making a corresponding adjustment may further comprise,
Recording and self-learning the user's operation habits and dynamically adjusting the initial state of the virtual scene, wherein the self-learning formula is p initial=E[p user],
Wherein, p user is the position distribution in the user operation history, E is the expected value calculation, and the adjusted virtual scene is transmitted to the user through visual and tactile feedback;
Specifically, through positioning, scaling, rotation, and display-content optimization, the AR engine can dynamically adjust the virtual scene and respond accurately to the user's intent; a self-learning mechanism is introduced to optimize the initial state and response logic after repeated user interactions.
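A minimal sketch of the step S3 geometric adjustments and layout selection is given below; the candidate-layout representation and the numeric values are assumptions used only for illustration.

```python
# Sketch of the step-S3 adjustments: translation, scaling, axis-angle rotation,
# and choosing the candidate layout that minimises the weighted distance cost.
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula for R(theta, u) about unit axis u."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    K = np.array([[0, -u[2], u[1]],
                  [u[2], 0, -u[0]],
                  [-u[1], u[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def adjust_object(p_old, delta_p, s_old, k, r_old, axis, angle):
    p_new = p_old + delta_p                                # p_new = p_old + Δp
    s_new = s_old * k                                      # S_new = S_old · k
    r_new = r_old @ rotation_matrix(axis, angle)           # R_new = R_old · R(θ, u)
    return p_new, s_new, r_new

def optimal_layout(candidate_layouts, interest_points, weights):
    """argmin over layouts of Σ_i w_i · D_i(L): distance of each content slot to its interest point."""
    def cost(layout):
        return sum(w * np.linalg.norm(layout[i] - interest_points[i])
                   for i, w in enumerate(weights))
    return min(candidate_layouts, key=cost)

# Example: nudge, enlarge and yaw an object, then pick the better of two layouts.
p, s, R = adjust_object(np.zeros(3), np.array([0.1, 0.0, 0.0]),
                        1.0, 1.5, np.eye(3), axis=[0, 1, 0], angle=np.pi / 6)
layouts = [np.array([[0.0, 0.0, 1.0]]), np.array([[0.5, 0.0, 1.0]])]
best = optimal_layout(layouts, interest_points=[np.array([0.4, 0.0, 1.0])], weights=[1.0])
```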
S4, the adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine, and meanwhile, the environment light change is dynamically adapted to obtain a final interaction effect;
The adjusted virtual scene is subjected to depth fusion with the real environment through the light and shadow consistency calculation and physical characteristic simulation technology of the AR engine,
The AR engine captures illumination conditions in the real environment in real time, including the direction, intensity and color temperature of a light source, and renders shadows and highlights through the global illumination model so that the virtual object is consistent with the change of the shadows in the real environment;
The AR engine simulates physical interaction characteristics of the virtual object and the real environment;
The AR engine dynamically adjusts the expression form of the virtual object according to the environmental data acquired in real time;
Specifically, through light and shadow consistency calculation, physical characteristic simulation and dynamic environment adaptation, the depth fusion of the virtual scene and the real environment is realized, and the immersion feeling and interaction naturalness of a user are enhanced.
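As a rough sketch of how ambient intensity and color cast might be estimated from a camera frame and applied to a virtual object's base color (the averaging heuristic and the simple material model are assumptions standing in for the AR engine's own light estimation and global illumination rendering):

```python
# Rough sketch of dynamic ambient-light adaptation for a virtual object: the
# camera frame, the white-balance heuristic, and the material model are all
# assumptions used only for illustration.
import numpy as np

def estimate_ambient_light(frame_rgb):
    """frame_rgb: (H, W, 3) float array in [0, 1]. Returns (intensity, tint)."""
    mean_rgb = frame_rgb.reshape(-1, 3).mean(axis=0)
    intensity = float(mean_rgb.mean())            # overall brightness of the scene
    tint = mean_rgb / (mean_rgb.mean() + 1e-6)    # relative colour cast (rough colour temperature)
    return intensity, tint

def shade_virtual_object(base_color, frame_rgb):
    intensity, tint = estimate_ambient_light(frame_rgb)
    return np.clip(np.asarray(base_color) * tint * intensity, 0.0, 1.0)

# A warm, dim frame darkens and reddens a neutral grey object.
frame = np.full((4, 4, 3), [0.5, 0.35, 0.25])
print(shade_virtual_object([0.8, 0.8, 0.8], frame))
```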
Step S5, outputting and feeding back the final interaction effect to the user, re-capturing the further operation of the user, and updating to the input signal of the step S1 of a new round by adopting a feedback mechanism;
The feedback mechanism includes:
Adjustment of signal capture and feature extraction parameters,
Adjusting user intent model parameters based on user repetitive behaviors,
The step of updating the input signal to the new round of step S1 using the feedback mechanism is,
Capturing voice and gesture signals, and carrying out corresponding parameter adjustment, wherein the adjustment formula is as follows:
θspeech,new=θspeech,old+Δθspeech,
θgesture,new=θgesture,old+Δθgesture,
Wherein θ speech,new and θ gesture,new are updated speech and gesture capture parameters, respectively, θ speech,old and θ gesture,old are original speech and gesture capture parameters, respectively, Δθ speech and Δθ gesture are delta parameters adjusted by user feedback,
Optimizing the feature extraction model, wherein the optimization formula is as follows:
z′=z+Δzfeedback,
Wherein z' is an optimized feature vector, z is an original feature vector, and Δz feedback is an optimized increment introduced by a feedback mechanism;
Updating prior probabilities and conditional probabilities in the user intention model according to the repeated behaviors of the user and the historical operation modes,
Historical behavior modeling is carried out, and a model formula is as follows:
P(I)new=αPhistory(I)+(1-α)Pprior(I),
Wherein P (I) new is the updated prior probability of user intention, alpha is a historical weight factor, P history (I) represents the prior probability of user intention I based on user historical behavior statistics, P prior (I) represents the initial prior probability of preset user intention I,
Updating the conditional probability, wherein the updating formula is as follows:
P(z′|I)new=P(z′|I)+ΔPfeedback(z′|I),
Wherein,
P (z′|I) new is the conditional probability after optimization, P (z′|I) is the conditional probability before optimization, and ΔP feedback (z′|I) is the conditional probability increment based on user operation feedback,
The feedback mechanism adjusts not only the individual model parameters but also the global parameter set, wherein the adjustment formula is as follows:
θ′=θ+Δθfeedback,
wherein θ' is the global parameter set after optimization, θ is the global parameter set before optimization, and Δθ feedback is the parameter increment calculated by the feedback mechanism;
Specifically, through a feedback mechanism, self-adaptive optimization is realized in each link of signal capturing, feature extraction and intention modeling, and the adaptive capacity to dynamic change is gradually improved.
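An illustrative sketch of the step S5 feedback updates follows; the dictionary-based parameter store and the renormalization of the blended prior are assumptions added for robustness rather than requirements of the embodiment.

```python
# Sketch of the step-S5 feedback updates: additive parameter deltas and the
# α-blended intent prior. The dict-based parameter store is an assumption.
import numpy as np

def update_capture_params(params, deltas):
    """θ_new = θ_old + Δθ for each named parameter group (speech, gesture, ...)."""
    return {name: params[name] + deltas.get(name, 0.0) for name in params}

def update_intent_prior(p_history, p_prior, alpha=0.7):
    """P(I)_new = α·P_history(I) + (1-α)·P_prior(I), renormalised for safety."""
    blended = {i: alpha * p_history[i] + (1 - alpha) * p_prior[i] for i in p_prior}
    total = sum(blended.values())
    return {i: v / total for i, v in blended.items()}

params = {"theta_speech": np.array([0.1, 0.2]), "theta_gesture": np.array([0.3])}
deltas = {"theta_speech": np.array([0.01, -0.02])}
print(update_capture_params(params, deltas))
print(update_intent_prior({"select": 0.6, "scale": 0.3, "rotate": 0.1},
                          {"select": 0.4, "scale": 0.4, "rotate": 0.2}))
```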
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411830551.0A CN119620891A (en) | 2024-12-12 | 2024-12-12 | A user experience interaction method based on VR engine |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411830551.0A CN119620891A (en) | 2024-12-12 | 2024-12-12 | A user experience interaction method based on VR engine |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119620891A true CN119620891A (en) | 2025-03-14 |
Family
ID=94906819
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411830551.0A Pending CN119620891A (en) | 2024-12-12 | 2024-12-12 | A user experience interaction method based on VR engine |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119620891A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112424727A (en) * | 2018-05-22 | 2021-02-26 | 奇跃公司 | Cross-modal input fusion for wearable systems |
| CN114691839A (en) * | 2020-12-30 | 2022-07-01 | 华为技术有限公司 | Intention slot position identification method |
| US20230401976A1 (en) * | 2022-06-12 | 2023-12-14 | The Travelers Indemnity Company | Systems and methods for artificial intelligence (ai) virtual reality (vr) emotive conversation training |
| CN118379403A (en) * | 2024-03-12 | 2024-07-23 | 南阳印迹影视文化传媒有限公司 | Emotion symbiotic communication system |
| CN118708085A (en) * | 2024-08-29 | 2024-09-27 | 杭州川核灵境科技有限公司 | A virtual scene control method and system based on MR large space |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112424727A (en) * | 2018-05-22 | 2021-02-26 | 奇跃公司 | Cross-modal input fusion for wearable systems |
| CN114691839A (en) * | 2020-12-30 | 2022-07-01 | 华为技术有限公司 | Intention slot position identification method |
| US20230401976A1 (en) * | 2022-06-12 | 2023-12-14 | The Travelers Indemnity Company | Systems and methods for artificial intelligence (ai) virtual reality (vr) emotive conversation training |
| CN118379403A (en) * | 2024-03-12 | 2024-07-23 | 南阳印迹影视文化传媒有限公司 | Emotion symbiotic communication system |
| CN118708085A (en) * | 2024-08-29 | 2024-09-27 | 杭州川核灵境科技有限公司 | A virtual scene control method and system based on MR large space |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11163987B2 (en) | Compact language-free facial expression embedding and novel triplet training scheme | |
| US20240353918A1 (en) | Machine interaction | |
| CN110622109A (en) | Computer animation based on natural language | |
| TW202238532A (en) | Three-dimensional face animation from speech | |
| CN113255457A (en) | Animation character facial expression generation method and system based on facial expression recognition | |
| CN118379403A (en) | Emotion symbiotic communication system | |
| CN119600159B (en) | Interactive digital person generation method and system based on artificial intelligence | |
| CN119229336A (en) | A method for building an efficient sign language translation system based on deep learning | |
| CN118820787A (en) | A virtual population generation training method, system, medium and program product | |
| Shao et al. | Computer vision-driven gesture recognition: Toward natural and intuitive human-computer | |
| Kavitha et al. | Advancing Human-Computer Interaction: Real-Time Gesture Recognition and Language Generation Using CNN-LSTM Networks | |
| Li et al. | Pose-aware 3D talking face synthesis using geometry-guided audio-vertices attention | |
| CN120852604B (en) | Meta-universe digital person generation method and system based on deep learning | |
| US20250299426A1 (en) | Interactivity and generative rendering for virtual and wearable display systems | |
| US20250259362A1 (en) | Prompt editor for use with a visual media generative response engine | |
| CN119620891A (en) | A user experience interaction method based on VR engine | |
| CN119027557A (en) | Speech-driven 3D facial animation generation method and device based on reinforcement learning | |
| EP4345755A1 (en) | Expression transfer to stylized avatars | |
| Zhang | Computer animation interaction design driven by neural network algorithm | |
| CN119474956A (en) | A method for intelligently controlling digital human using action tags | |
| Shao et al. | Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer Interfaces | |
| WO2022164849A1 (en) | Deep relightable appearance models for animatable face avatars | |
| US20250165060A1 (en) | Determining Body Pose From Environmental Data | |
| US20250259272A1 (en) | Blending user interface for blending visual media using a visual media generative response engine | |
| WO2025166188A1 (en) | System/ method for generative body, gesture, and facial expression in 3d characters |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |