CN119474415A - Data processing method and device
- Publication number
- CN119474415A (application CN202411533513.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- behavior
- user
- multimodal
- location
- Prior art date
- Legal status: Pending
Classifications
- G06F16/435—Filtering based on additional data, e.g. user or group profiles
- G06F16/4387—Presentation of query results by the use of playlists
- G06F16/4393—Multimedia presentations, e.g. slide shows, multimedia albums
- G06F16/487—Retrieval characterised by using metadata, using geographical or spatial information, e.g. location
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0464—Convolutional networks [CNN, ConvNet]
Abstract
The present application discloses a data processing method and a device thereof, belonging to the field of data processing technology. The method comprises acquiring multimodal behavior data; determining behavior description data of user behavior according to the multimodal behavior data; screening first multimodal behavior data associated with the behavior description data from the multimodal behavior data; and generating user memory data corresponding to the user behavior based on the behavior description data and the first multimodal behavior data.
Description
Technical Field
The application belongs to the technical field of data processing, and particularly relates to a data processing method and a device thereof.
Background
With the rapid development of electronic device technology and the mobile internet, more and more users manually edit records of important life experiences, such as gatherings with friends or family, travel, and watching performances, through various applications on their electronic devices, hoping to recall the details of those moments from past records when looking back one day.
However, the information covering these important life experiences is wide-ranging and varied. For a self-driving trip with friends, for example, it may include images and videos shot during the trip, the driving route track, the scenic spots visited, the food eaten, the performances watched, and the songs heard on the road. If the user edits all of this information manually, the complexity of editing increases and the efficiency of editing decreases.
Disclosure of Invention
The embodiment of the application aims to provide a data processing method, a data processing device, electronic equipment and a storage medium, which can reduce the complexity of information editing and improve the efficiency of information editing.
In a first aspect, an embodiment of the present application provides a data processing method, including:
Acquiring multimodal behavior data;
Determining behavior description data of a user behavior according to the multimodal behavior data;
Screening first multimodal behavior data associated with the behavior description data from the multimodal behavior data;
Generating user memory data corresponding to the user behavior based on the behavior description data and the first multimodal behavior data.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
the acquisition module, configured to acquire the multimodal behavior data;
the determining module, configured to determine behavior description data of the user behavior according to the multimodal behavior data;
the screening module, configured to screen the first multimodal behavior data associated with the behavior description data from the multimodal behavior data;
and the generating module, configured to generate user memory data corresponding to the user behavior based on the behavior description data and the first multimodal behavior data.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions implementing the steps of the data processing method as shown in the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the data processing method as described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, the chip including a processor and a display interface, the display interface being coupled to the processor, the processor being configured to execute programs or instructions to implement the steps of the data processing method as shown in the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to perform the steps of the data processing method as shown in the first aspect.
In the embodiment of the application, behavior description data of a user behavior is determined from the user's multimodal behavior data, first multimodal behavior data associated with the behavior description data is screened from the multimodal behavior data, and user memory data corresponding to the user behavior is then generated based on the behavior description data and the first multimodal behavior data. Collecting multimodal behavior data thus enriches the types of data elements and improves the comprehensiveness and completeness of the data acquired about the user behavior, which in turn improves the accuracy of determining the sources of the user behavior data. Because the first multimodal behavior data is summarized and organized according to the behavior description data, the user does not need to manually edit the information about their experiences, which reduces the complexity of information editing and improves its efficiency.
Drawings
FIG. 1 is a flow chart of a data processing method provided by some embodiments of the present application;
FIG. 2 is a flow chart of image data processing according to some embodiments of the present application;
FIG. 3 is a flow chart of a multi-modal behavior data process in a data processing method according to some embodiments of the present application;
FIG. 4 is a schematic diagram of a data processing apparatus according to some embodiments of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
FIG. 6 is a schematic diagram of the hardware structure of an electronic device according to some embodiments of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments of the application are capable of operation in sequences other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
Gatherings with friends or family, travel, important performances and the like are important life experiences, and most users want to commemorate them, hoping to recall the details and the feeling of the moment from past records when looking back one day. For a trip with friends, for example, the information the user wants to record may include photos taken during the trip, the self-driving route track, the scenic spots visited, the food eaten, the performances watched, and the songs heard while driving.
In the related art, the image recall function of an album application summarizes images or videos shot by the user in a certain place and plays them in a loop, and some map navigation applications provide a "light up the corners of the earth" function, in which the user lights up the corresponding positions of cities passed through while navigating, thereby recording the user's footprints.
However, the information about the user's life experience acquired in these two ways is single-dimensional. Single-dimensional data can be organized into a memory, but the data cannot be directly related to each other, so the user has to repeatedly switch among multiple applications when viewing the information, which increases the operations required of the user. Compared with the whole journey the user experienced, a large amount of data in other dimensions, such as food and scenic spots, is missing, so the user must invest considerable effort in secondary processing or re-editing, which increases the complexity of information editing and reduces its efficiency.
In order to solve the problems in the related art, the embodiment of the application provides a data processing method, a data processing device and electronic equipment. The following describes in detail the data processing method provided by the embodiment of the present application through specific embodiments and application scenarios thereof with reference to fig. 1 to 6.
First, a data processing method provided in an embodiment of the present application is described in detail with reference to fig. 1.
Fig. 1 is a flow chart of a data processing method according to some embodiments of the present application.
As shown in fig. 1, the data processing method provided in the embodiment of the present application may be applied to an electronic device, and based on this, the data processing method may include steps 110 to 140, which are specifically shown below.
Step 110, acquiring multimodal behavior data.
Step 120, determining behavior description data of a user behavior according to the multimodal behavior data.
Step 130, screening first multimodal behavior data associated with the behavior description data from the multimodal behavior data.
Step 140, generating user memory data corresponding to the user behavior based on the behavior description data and the first multimodal behavior data.
In this way, collecting multimodal behavior data enriches the types of data elements and improves the comprehensiveness and completeness of the data acquired about the user behavior, which in turn improves the accuracy of determining the sources of the user behavior data. Because the first multimodal behavior data is summarized and organized according to the behavior description data, the user does not need to manually edit the information about their experiences, which reduces the complexity of information editing and improves its efficiency.
The above steps are described in detail below.
Referring to step 110, in the embodiment of the present application, the multimodal behavior data may include, but is not limited to, data generated when a user operates on a platform such as an application program, an applet, or a web page in an electronic device, including behavior data such as browsing, clicking, sliding, and long-pressing, that is, data perceived in all aspects of the user's electronic device. It may specifically include at least one of the following data types: image data, audio data, text data, and numerical data.
The image data may include, but is not limited to, images or videos in an album application, images collected in an application of an electronic device (such as an instant messaging application, a shopping application), videos, reference images in an image editing process, processing parameters, and the like.
The audio data may include, but is not limited to, audio in an album application, a recording, local music, downloaded audio, audio played by a music playing application.
The text data may include, but is not limited to, text information viewed or edited by the user on the electronic device, such as text message content, calendar entries, notes, and favorited links.
The numerical data includes, but is not limited to, numerical data provided to the user by the operating system of the electronic device, such as battery level, weather conditions, calendar, and alarm clock information, and numerical data provided by various applications based on user behaviors, such as the geographic location and received text messages.
It should be noted that, in the embodiment of the present application, data collection points (instrumentation) are deployed on the electronic device to obtain multimodal behavior data of these data types, thereby recording the user's behavior over a period of time.
Therefore, the embodiment of the application can integrate various types of data generated on the electronic device, covering the four data types of image, audio, text, and numerical data, thereby solving the problem in the prior art that the integrated elements are limited and the information sources for determining user behavior are deficient.
Referring to step 120, in some embodiments of the application, step 120 may include, in particular, step 1201 and step 1202.
Step 1201, determining a user behavior according to the multimodal behavior data, wherein the user behavior comprises at least one of a first user behavior corresponding to a user experience time, a second user behavior corresponding to a user experience location, and a third user behavior corresponding to a user experience event. The user experience time comprises at least one of an anniversary marked by the user, a festival provided by the device system, a holiday provided by the device system, and a solar term provided by the device system; the experience location comprises at least one of a user temporary location and a user-marked commemorative location; and the user experience event comprises at least one of a non-daily event and a planned event.
User-marked anniversaries may include, but are not limited to, birthdays, wedding anniversaries, memorial days, and the like. Holidays may include, but are not limited to, statutory holidays and Chinese and foreign festivals. User-marked commemorative locations may include, but are not limited to, places the user has visited, places the user plans to visit, places the user likes or dislikes, and places appearing in images, text, or web pages the user has browsed.
Step 1202, generating behavior description data of the user behavior according to the user behavior.
Therefore, scene collection and integration are carried out on the multimodal behavior data of various data types, which solves the prior-art problem that the sources of user behavior data are not comprehensive; further, analyzing the complete and comprehensive multimodal behavior data can improve the accuracy of determining the user behavior.
In the embodiment of the present application, the multimodal behavior data related to the step 1201 may include various contents, and the determining of the user behavior may also include various contents.
In some embodiments of the application, the first user behavior may be determined based on the user experience time.
Based on this, the multimodal behavior data in the embodiment of the application may cover the user experience time, including at least one of an anniversary marked by the user, a festival provided by the device system, and a holiday provided by the device system. The user behavior can thus be determined based on the user's behavior during the user experience time. For example, on the user's birthday, the user's behavior on that day may be taken as the first user behavior; the user may stay at the resident location and perform routine actions, i.e., go to work and come home as usual, or may leave the resident location and perform non-daily actions.
In some embodiments of the application, the second user behavior may be determined from a user experience location.
Based on this, the multimodal behavior data comprises location data of places the user located within N time periods, the location data comprising position data, a location type, and a user dwell time; the second user behavior comprises a behavior of the user at a user temporary location, and N is an integer greater than 1. On this basis, step 1201 may specifically comprise steps 12011 to 12013.
Step 12011, determining a user resident location and a user movement track for each of the N time periods according to the position data of the places the user located within the N time periods.
Specifically, the step 12011 may specifically include steps 120111 to 120113.
Step 120111, fitting the located places in each time period with a density clustering algorithm, according to the position data of the places the user located within the N time periods, to obtain the user stay locations in each time period.
The location data in the embodiment of the application comprises, but is not limited to, satellite positioning location data, bluetooth positioning location data, wireless network communication technology (Wi-Fi) positioning location data and base station positioning location data. Density clustering algorithms include, but are not limited to, density-based anti-noise clustering algorithms (Density-Based Spatial Clustering of Applications with Noise, DBSCAN).
For example, based on the positioning function of the electronic device, position data of located places are collected at preset intervals. Since the located places are discrete, there may be multiple position records for the same place; therefore, the located places within each time period may be fitted by DBSCAN based on their position data, yielding the user stay locations in that time period, such as user stay location A, user stay location B, and user stay location C.
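As a concrete illustration of this fitting step, the following is a minimal sketch using scikit-learn's DBSCAN on raw position fixes; the eps radius and min_samples values are assumptions for illustration, not values specified by the application.

```python
# A minimal sketch of fitting discrete positioning records into stay
# locations with DBSCAN, as in step 120111. Thresholds are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

def fit_stay_locations(points_deg: np.ndarray) -> list[dict]:
    """points_deg: (n, 2) array of [lat, lon] sampled at a preset interval."""
    # Haversine metric works on radians; eps of roughly 100 m on the surface.
    radians = np.radians(points_deg)
    labels = DBSCAN(
        eps=100.0 / 6_371_000.0,  # metres -> radians (Earth radius)
        min_samples=3,
        metric="haversine",
    ).fit_predict(radians)

    stays = []
    for label in sorted(set(labels) - {-1}):  # -1 is DBSCAN noise
        members = points_deg[labels == label]
        stays.append({
            "center": members.mean(axis=0),  # fitted stay location
            "num_fixes": len(members),       # how many raw fixes were merged
        })
    return stays
```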
Step 120112, generating a user movement track for each time period based on the user stay point in each time period.
For example, if the user stay locations in the time period include user stay location A, user stay location B, and user stay location C, they may be sorted and recorded according to the chronological order in which their positions were collected, so as to obtain the user movement track for that time period.
Here, a user stay location is obtained by fitting the position data of at least one located place. If it is fitted from the position data of a single located place, the time of the user stay location may be the time at which that position data was collected; if it is fitted from the position data of at least two located places, the time of the user stay location may be the earliest of the collection times of those position data.
Step 120113, screening the user resident location from the user stay locations, wherein the user resident location satisfies at least one of the following conditions: the number of occurrences of its position data in the N user movement tracks is greater than or equal to a preset number; its location type matches a preset location type; and its user dwell time matches a preset dwell time.
The preset site types in the embodiment of the application include, but are not limited to, industrial sites, commercial sites, residential sites, public facility sites, sports sites and agricultural sites.
For example, if N is 10 and user stay location A occurs in the 10 user movement tracks 10 times, which is greater than the preset number of 5, user stay location A may be considered the user resident location. Likewise, if the location type of user stay location A is residential, which matches a preset location type such as residential, it may be considered the user resident location. And if the user's dwell time at user stay location A is 8 hours, which matches a preset dwell time range such as 5 to 24 hours, it may also be considered the user resident location.
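The screening conditions of step 120113 can be expressed as a small predicate. The following sketch assumes the illustrative thresholds above (5 occurrences, a residential location type, a 5 to 24 hour dwell range); the StayLocation field names are hypothetical.

```python
# A sketch of the resident-location screening in step 120113. The patent
# requires at least one of the three conditions to hold; all values here
# follow the illustrative numbers in the text and are not normative.
from dataclasses import dataclass

@dataclass
class StayLocation:
    place_id: str
    place_type: str       # e.g. "residential", "commercial"
    dwell_hours: float    # user dwell time at this location
    track_hits: int       # occurrences across the N movement tracks

def is_resident_location(stay: StayLocation,
                         min_hits: int = 5,
                         wanted_type: str = "residential",
                         dwell_range: tuple[float, float] = (5.0, 24.0)) -> bool:
    lo, hi = dwell_range
    return (stay.track_hits >= min_hits
            or stay.place_type == wanted_type
            or lo <= stay.dwell_hours <= hi)
```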
Step 12012, screening the user temporary location from the user movement track of each time period based on the user resident location.
For example, the user stay location in the fitted user movement track may be compared with the user resident location to obtain the user temporary location.
Further, the user stay point in the fitted user movement track can be compared with the user resident point to obtain a non-daily stay point, and then the user temporary stay point is further screened from the non-daily stay points based on the position data, the place type and the user stay time of the non-daily stay point.
Illustratively, a non-daily stay location may be determined to be the user temporary location when the user dwell time at that non-daily stay location is 24 hours.
Step 12013, determining the behavior of the user occurring at the user temporary location as a second user behavior.
Thus, by defining the user resident location and the user temporary location and distinguishing between the two, the irregular, sporadic behavior occurring at the user temporary location is determined as the second user behavior.
In some embodiments of the application, the third user behavior may be determined from a user experience event. In this embodiment, two ways are provided based on different properties of the event experienced by the user, as is shown in detail below.
In some examples, the third user behavior includes a behavior corresponding to a non-daily event, based on which the step 1201 may include steps 12014 through 12017, in particular.
In step 12014, based on the multimodal behavior data, the daily behavior data of the user is determined through a frequent mining algorithm, where the daily behavior data of the user includes behavior data of the user that occurs regularly or frequently.
Step 12015, determining non-daily behavior data of the user from the multi-modal behavior data based on the daily behavior data of the user.
Step 12016, performing data analysis on the non-daily behavior data of the user to obtain a non-daily event.
For example, regular or frequent behavior data is defined as daily events. On this basis, the multimodal behavior data may include usage data of each application on the electronic device, and frequent patterns of application use may be mined with conventional machine learning algorithms such as the association rule mining algorithm Apriori or decision trees. For instance, a user who listens to songs in an audio player usually listens at regular times, but on a certain day listens to sad songs all day, with the number and duration of songs played far exceeding the usual level; it can then be inferred that the user may have gone through a breakup at that time, which may be regarded as a non-daily event.
In step 12017, the behavior corresponding to the non-daily event is determined as the third user behavior.
Illustratively, continuing the example in step 12016, the behavior corresponding to the non-daily breakup event, i.e., the behavior of repeatedly listening to sad songs, may be determined as the third user behavior.
Thus, by defining daily events and non-daily events and distinguishing between the two, the behaviors corresponding to irregular, sporadic events are marked as the third user behavior; a rough sketch of this separation follows.
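As a rough illustration of steps 12014 to 12016, the following sketch flags non-daily behavior with a simple per-application frequency baseline rather than a full Apriori implementation; the z-score threshold and data layout are assumptions.

```python
# A minimal sketch of separating daily from non-daily behavior. A day whose
# app usage deviates strongly from the user's routine baseline is treated
# as a candidate non-daily event; this stand-in for frequent-pattern mining
# is an assumption for illustration.
import statistics

def find_non_daily_events(daily_usage: dict[str, list[float]],
                          z_threshold: float = 3.0) -> list[tuple[str, int]]:
    """daily_usage maps an app name to its per-day usage minutes."""
    events = []
    for app, minutes in daily_usage.items():
        if len(minutes) < 2:
            continue
        mean = statistics.fmean(minutes)
        stdev = statistics.stdev(minutes) or 1.0
        for day, value in enumerate(minutes):
            if (value - mean) / stdev > z_threshold:  # far above the routine
                events.append((app, day))             # candidate non-daily event
    return events

# e.g. a day with several hours of sad songs in the music player would
# surface here and be passed on for analysis as in step 12016.
```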
In other examples, the third user behavior includes a behavior corresponding to a planned event, based on which the step 1201 may include steps 12018 through 12020, in particular.
Step 12018, screening the data of the user pre-planned event from the multi-modal behavior data.
Step 12019, data analysis is performed on the data of the user pre-planned event to obtain the planned event.
In step 12020, the behavior corresponding to the planning event is determined as a third user behavior.
For example, if the user's ticketing data is identified, such as a purchased ticket for a singer's concert on October 10 or a reserved flight to place A on October 1, the user's planned events, namely a performance planning event and a travel planning event, can be confirmed. At this time, the third user behavior may be determined based on the behaviors surrounding the concert ticket purchase that correspond to the performance planning event, or based on behaviors such as booking accommodation at place A or querying routes to scenic spots that correspond to the travel planning event.
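The following is an illustrative sketch of steps 12018 and 12019, pulling planned events out of text-type behavior data with simple patterns; the regular expressions and message formats are assumptions, not part of the application.

```python
# A sketch of extracting pre-planned events (tickets, flights) from text
# behavior data. The patterns are hypothetical examples only.
import re

TICKET_RE = re.compile(r"(concert|performance) .*?on (\w+ \d{1,2})", re.I)
FLIGHT_RE = re.compile(r"flight to (\w+) on (\w+ \d{1,2})", re.I)

def extract_planned_events(messages: list[str]) -> list[dict]:
    events = []
    for text in messages:
        if m := TICKET_RE.search(text):
            events.append({"type": "performance_plan", "date": m.group(2)})
        if m := FLIGHT_RE.search(text):
            events.append({"type": "travel_plan", "place": m.group(1),
                           "date": m.group(2)})
    return events
```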
Referring to step 130, since the multimodal behavior data in the embodiment of the present application includes at least one data type of image data, audio data, text data, and numerical data, the acquired multimodal behavior data may be screened and preprocessed to remove irrelevant information and retain the content related to the user's important memories, as shown in detail below.
Based on this, step 130 may specifically include step 1301 and step 1302. This step covers data cleansing, deduplication, format conversion, and the like, ensuring data quality.
Step 1301, processing the data of each data type in the multimodal behavior data with the encoder corresponding to that data type, to obtain user data associated with the behavior description data under each type.
Step 1302, associating, by a large model, the user data associated with the behavior description data under each type, to obtain the first multimodal behavior data.
Illustratively, the base data is converted into a numeric matrix by an encoder and then fed into the embedding-layer space of the large model as supplemental input. The four data types correspond to four data encoders: the visual data encoder converts image pixels into image embedded data, the audio data encoder converts audio signals into audio embedded data, the text data encoder converts multilingual text into text embedded data, and the numeric data encoder splices numeric features and converts them into numeric embedded data.
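The following sketch shows, at a high level, how the four encoders described above could feed a shared embedding space; the Encoder protocol and tensor shapes are assumptions for illustration.

```python
# A high-level sketch of the four-encoder design: each modality is mapped
# to an embedding matrix and appended as supplemental input for the large
# model. Text is fed to the model directly (as noted later in this section).
from typing import Protocol
import torch

class Encoder(Protocol):
    def __call__(self, raw: object) -> torch.Tensor: ...  # (tokens, dim)

def build_supplemental_input(sample: dict[str, object],
                             encoders: dict[str, Encoder]) -> torch.Tensor:
    # Non-text modalities go through their dedicated encoders and are
    # concatenated token-wise into one matrix for the LLM embedding space.
    parts = [encoders[mod](raw) for mod, raw in sample.items()
             if mod in encoders]
    return torch.cat(parts, dim=0)
```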
Therefore, the user behavior data is automatically organized and aggregated based on the large model, and important elements in the user behavior data are extracted based on intention understanding and emotion analysis, which solves the problems in the prior art that emotion analysis is lacking and it is difficult for users to empathize with the results.
Here, since the embodiment of the present application involves multiple data types, a manner of determining the user data associated with the behavior description data is provided for each data type, as follows.
Based on this, in some embodiments, the encoder comprises a visual data encoder, the user data comprises image embedded data, and the step 1301 may specifically comprise steps 13011 through 13014.
Step 13011, matching, by the visual data encoder, the aspect ratio of the image data with the preset aspect ratio to obtain a target preset aspect ratio matched with the image data.
Illustratively, as shown in FIG. 2, a dynamic aspect ratio matching step may be performed: the visual data encoder dynamically matches the optimal aspect ratio from a predefined set of aspect ratios to preserve the natural aspect ratio of the image. The set may include aspect ratios such as 1:1, 1:2, and 2:1, and the visual data encoder matches the input image's aspect ratio against these predefined ratios to obtain the target preset aspect ratio matched with the image data.
Step 13012, performing image segmentation processing on the image corresponding to the image data according to the target preset aspect ratio to obtain at least two image segments.
For example, an image division and thumbnail step may be performed: after the appropriate aspect ratio is determined, the image is resized to the corresponding resolution and then divided into at least two image segments of 448 x 448 pixels.
In addition, to capture the global context, the model also generates a thumbnail of the entire image, which is also scaled to 448 x 448 pixels.
And 13013, performing pixel shuffling on at least two image segments to obtain at least two rearranged image segments, wherein the resolution of the rearranged image segments is greater than that of the image.
Thus, a pixel shuffle step for reducing the number of visual tokens can be performed, by which the number of visual tokens representing an image can be reduced to one quarter of the original number, which helps improve the model's computational efficiency when processing high-resolution images.
Step 13014, adjusting the data dimension of at least two rearranged image segments to the preset input data dimension of the large model to obtain the image embedded data.
Illustratively, the at least two rearranged image segments may further pass through a fully connected layer (MLP projector) to align the dimension of the image data output with the 256 tokens of the large model input, resulting in image embedded data that can be input into the large model.
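A sketch of the visual pipeline of steps 13011 to 13014 might look as follows in PyTorch; the predefined ratio set, the use of pixel_unshuffle to fold 2x2 patches into channels (one way to cut the token count to a quarter), and all sizes are assumptions layered on the description above, with the ViT backbone omitted.

```python
# A sketch of dynamic aspect ratio matching, 448x448 tiling, token
# reduction, and MLP projection into the LLM embedding width.
import torch
import torch.nn as nn
import torch.nn.functional as F

RATIOS = [(1, 1), (1, 2), (2, 1), (2, 2)]  # predefined aspect-ratio set

def match_ratio(w: int, h: int) -> tuple[int, int]:
    return min(RATIOS, key=lambda r: abs(w / h - r[0] / r[1]))

def tile_image(img: torch.Tensor) -> torch.Tensor:
    """img: (3, H, W) -> (n_tiles + 1, 3, 448, 448); last is the thumbnail."""
    cols, rows = match_ratio(img.shape[2], img.shape[1])
    resized = F.interpolate(img[None], size=(rows * 448, cols * 448),
                            mode="bilinear", align_corners=False)[0]
    tiles = [resized[:, r*448:(r+1)*448, c*448:(c+1)*448]
             for r in range(rows) for c in range(cols)]
    thumb = F.interpolate(img[None], size=(448, 448), mode="bilinear",
                          align_corners=False)[0]  # global context thumbnail
    return torch.stack(tiles + [thumb])

def shuffle_and_project(feats: torch.Tensor, proj: nn.Linear) -> torch.Tensor:
    """feats: (n, C, 32, 32) backbone feature maps -> (n, 256, llm_dim)."""
    # pixel_unshuffle folds 2x2 patches into channels: 1024 -> 256 tokens.
    folded = F.pixel_unshuffle(feats, 2)        # (n, 4C, 16, 16)
    tokens = folded.flatten(2).transpose(1, 2)  # (n, 256, 4C)
    return proj(tokens)                         # align to LLM input width
```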
In other embodiments, where the encoder comprises an audio data encoder and the user data comprises audio embedded data, step 1301 may comprise steps 13015-13017.
Step 13015 extracts user audio features from the audio waveforms of the audio data.
By way of example, features may be extracted from the original audio waveform; common features include Mel-frequency cepstral coefficients (MFCCs) and the like. These features capture user audio characteristics such as intonation and speech rate.
In step 13016, the user audio features are converted into audio conversion data by a small convolutional neural network, wherein the feature dimensions of the audio conversion data are smaller than those of the audio data.
Illustratively, the features are converted by a small convolutional neural network (SCNN) into an embedded sequence that has a lower dimension while capturing the important information of the audio; the sequence is sub-sampled to reduce its length until the last convolutional layer outputs a vector of a given dimension as the embedding.
In step 13017, discretizing the audio conversion data by a multi-scale residual vectorization algorithm to obtain audio embedded data.
The embedded sequence may be converted into a discrete token sequence using residual vector quantization techniques, illustratively, by a multi-scale residual vector quantization algorithm (Residual Vector Quantization, RVQ).
Therefore, the RVQ algorithm encodes information with different granularities at different layers, the integrity and the compactness are balanced, and the accuracy of acquiring the audio embedded data is improved.
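The audio pipeline of steps 13015 to 13017 could be sketched as follows, assuming torchaudio for MFCC extraction; the SCNN layout, codebook sizes, and number of RVQ levels are illustrative assumptions.

```python
# A sketch of MFCC features, a small strided CNN that sub-samples the
# sequence, and a residual vector quantization (RVQ) stage.
import torch
import torch.nn as nn
import torchaudio

mfcc = torchaudio.transforms.MFCC(sample_rate=16_000, n_mfcc=40)

scnn = nn.Sequential(  # strided convs halve the sequence length each time
    nn.Conv1d(40, 128, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Conv1d(128, 256, kernel_size=5, stride=2, padding=2), nn.ReLU(),
)

def rvq(x: torch.Tensor, codebooks: list[torch.Tensor]) -> torch.Tensor:
    """x: (T, D). Each codebook (K, D) quantizes the previous residual."""
    residual, tokens = x, []
    for cb in codebooks:
        idx = torch.cdist(residual, cb).argmin(dim=1)  # nearest code
        tokens.append(idx)
        residual = residual - cb[idx]  # finer levels encode what remains
    return torch.stack(tokens, dim=1)  # (T, n_levels) discrete tokens

waveform = torch.randn(1, 16_000)                      # 1 s of dummy audio
feats = scnn(mfcc(waveform))                           # (1, 256, T')
codebooks = [torch.randn(512, 256) for _ in range(4)]  # 4 RVQ levels
audio_tokens = rvq(feats[0].T, codebooks)
```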
In still other embodiments, where the encoder comprises a numeric data encoder and the user data comprises numeric embedded data, step 1301 may specifically comprise steps 13018 through 13020.
In step 13018, the numerical data is adjusted by a numerical data encoder according to a preset standardized quantitative parameter to obtain standardized quantitative numerical data.
For example, the numeric data encoder may perform data preprocessing and normalization on the numeric data, i.e., normalize the longitude and latitude of the geographic location, the temperature and humidity of the weather, and the like to a uniform scale, for instance using Z-score or min-max normalization. Categorical values are converted into numeric form using one-hot encoding.
Step 13019, extracting scene characteristic data corresponding to the data type from the standardized quantitative numerical data according to the data type of the numerical data.
For example, key features may be selected based on the nature and importance of the data, such as latitude and longitude of the geographic location, average temperature of the weather, humidity, precipitation probability, and so forth. And based on the data scene understanding, new features are built, such as extracting places frequently visited by the user from the geographical location data, or extracting trends and seasonal features from the time series data. For time series data, the time stamps are converted to time features such as hours, weeks, months, etc., and their association with a particular event.
Step 13020, performing data reconstruction on the scene feature data according to the sequence of the occurrence time of the scene feature data to obtain numerical embedded data.
Illustratively, the time dependence may be captured by a sequence model such as a recurrent neural network (RNN), a long short-term memory model (LSTM), or a gated recurrent unit (GRU). The embedding layers are then spliced, mapping the discrete features to continuous vectors in a high-dimensional space, thereby obtaining the numeric embedded data.
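The numeric pipeline of steps 13018 to 13020 might be sketched as follows; the feature names, category count, and LSTM dimensions are assumptions for illustration.

```python
# A sketch of z-score normalization, one-hot encoding of categorical
# fields, and an LSTM reconstructing time-ordered scene features into an
# embedding.
import torch
import torch.nn as nn

def z_score(x: torch.Tensor) -> torch.Tensor:
    return (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)

def encode_numeric(series: torch.Tensor,      # (T, n_numeric), e.g. lat/lon, temp
                   categories: torch.Tensor,  # (T,), e.g. location-type ids
                   n_cats: int,
                   lstm: nn.LSTM) -> torch.Tensor:
    numeric = z_score(series)
    one_hot = nn.functional.one_hot(categories, n_cats).float()
    feats = torch.cat([numeric, one_hot], dim=1)  # scene features per step
    out, _ = lstm(feats[None])                    # keep the time ordering
    return out[0, -1]                             # final state as embedding

lstm = nn.LSTM(input_size=3 + 4, hidden_size=64, batch_first=True)
emb = encode_numeric(torch.randn(24, 3), torch.randint(0, 4, (24,)), 4, lstm)
```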
Also, as shown in fig. 3, since text data can be input into the large model directly, no extra processing of the text data is required. As shown in fig. 3, if data of at least two data types is present, the encoders corresponding to those data types may run in parallel, thereby improving the efficiency of determining the user data.
Therefore, scene collection and integration are performed based on multimodal data: encoders are defined for the four types of base data (image, audio, text, and numeric), and the base data is input into the large model's embedding space. This solves the problems in the prior art that user behavior data sources are not comprehensive, that data spanning time and regions cannot be integrated, and that daily behavior cannot be distinguished from important behavior.
In addition, based on the screened important behaviors, the base data embedding matrix is input into the large model; the large model selects the content in the base data related to the behavior data through prompt words and returns the indices of the selected base data in the input matrix. Based on this, step 1302 may specifically include steps 13021 to 13024.
And step 13021, carrying out semantic analysis on the behavior description data through the large model to obtain the user behavior intention and the context environment information characterized by the behavior description data.
Illustratively, the intent and context of the behavior data are analyzed through prompt words in the large model: intent recognition is performed on the behavior data as directed by the prompt words, analyzing the intent behind the user's behavior. For example, a user taking a picture at a particular time may be recording an important event. The contextual environment in which the behavior data occurs, including the time, place, participants, and event type, is analyzed to improve the accuracy of intent recognition. The behavior data is indexed, including timestamps, behavior types, and behavior durations, to facilitate quick retrieval and matching.
Step 13022 extracts user behavior element data associated with the user behavior intention from the user data of each data type associated with the behavior description data.
Illustratively, in conjunction with the image, audio, text, and numeric data encoders, important cross-modal elements are extracted, such as key objects in images, emotional tones in audio, and emotional words in text. An attention layer is added after the large model's embedding matrix so that, using the attention mechanism, the large model can automatically allocate more attention to the regions of the embedding matrix most correlated with the behavior data.
And 13023, scoring the user behavior element data according to the context environment information to obtain a correlation score of the user behavior element.
Illustratively, the relevance of the base data is scored through prompt words, i.e., the large model scores the relevance of the base data to the behavior data in light of the context analysis results.
Step 13024, screening the first multi-modal behavior data from the user behavior element data, wherein the correlation score of the first multi-modal behavior data is greater than or equal to the preset threshold.
Therefore, the data is automatically organized and aggregated based on the large model, and the base data related to the user behavior data is selected through the large model's understanding of the multimodal data, which solves the problems in the prior art that emotion analysis is lacking and the aggregated results make it difficult for users to empathize.
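Steps 13021 to 13024 could be sketched as a prompt-and-filter loop like the following; the prompt wording, the threshold, and the `llm` scoring callable are hypothetical stand-ins, since the application does not specify a model interface.

```python
# A sketch of prompting the large model to score how relevant each
# extracted behavior element is to the behavior description, then keeping
# only elements above a threshold (step 13024).
from typing import Callable

PROMPT = ("Given the user behavior '{behavior}' occurring in context "
          "'{context}', rate the relevance of this element on [0, 1]: "
          "'{element}'. Answer with a number only.")

def screen_elements(behavior: str, context: str, elements: list[str],
                    llm: Callable[[str], str],  # hypothetical model interface
                    threshold: float = 0.6) -> list[str]:
    kept = []
    for element in elements:
        reply = llm(PROMPT.format(behavior=behavior, context=context,
                                  element=element))
        try:
            score = float(reply.strip())
        except ValueError:
            continue              # skip unparseable model output
        if score >= threshold:    # keep only high-relevance data
            kept.append(element)
    return kept
```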
Referring to step 140, in an embodiment of the present application, step 140 may include step 1401 and step 1402, in particular.
Step 1401, classifying and summarizing the first multimodal behavior data according to the behavior description data to obtain multimodal summarized behavior data.
Step 1402, associating the behavior description data with the multimodal summarized behavior data to obtain the user memory data.
For example, the behavior description data and the multimodal summarized behavior data can be associated to obtain user memory data, which can then be stored and provided to various types of applications for scene analysis and data presentation.
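The association of steps 1401 and 1402 can be pictured as a simple record type like the following; all field names and example values are assumptions for illustration.

```python
# A sketch of the user memory data record: the behavior description
# associated with the classified, summarized multimodal data.
from dataclasses import dataclass, field

@dataclass
class UserMemoryRecord:
    behavior: str                  # behavior description data
    occurred_at: str               # e.g. an ISO timestamp
    summary: dict[str, list[str]] = field(default_factory=dict)

    def add(self, modality: str, item: str) -> None:
        # classify-and-summarize: group first multimodal data by modality
        self.summary.setdefault(modality, []).append(item)

memory = UserMemoryRecord("concert trip to place A", "2024-10-10")
memory.add("image", "photo_0231.jpg")
memory.add("audio", "encore_recording.m4a")
memory.add("track", "route: home -> venue -> hotel")
```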
In summary, the embodiment of the application automatically fuses multimodal behavior data and interprets it as the user's important-moment data. With the powerful understanding capability of the large model, the various kinds of user data perceived by the electronic device every day can be aggregated and understood automatically, achieving the goals of automatically organizing and building a library of the user's important scenes, providing the user with a convenient emotional memory management scheme, and bringing the user greater value for personal growth.
It should be noted that, in the data processing method provided by the embodiment of the present application, the execution body may be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a palm computer, a wearable device, and the like. In some embodiments of the present application, an electronic device is taken as an execution body to execute a data processing method, which is described in the embodiments of the present application.
In the data processing method provided by the embodiment of the present application, the execution body may be a data processing apparatus. In the embodiment of the present application, a data processing apparatus executing the data processing method is taken as an example to describe the apparatus for the data processing method provided in the embodiment of the present application.
The application also provides a data processing device. This is described in detail with reference to fig. 4.
Fig. 4 is a schematic structural diagram of a data processing apparatus according to some embodiments of the present application.
As shown in fig. 4, the data processing apparatus 40 may be applied to an electronic device, and the data processing apparatus 40 may specifically include:
the obtaining module 401 is configured to obtain multi-modal behavior data.
A determining module 402, configured to determine behavior description data of the user behavior according to the multimodal behavior data.
A screening module 403, configured to screen the first multimodal behavior data associated with the behavior description data from the multimodal behavior data.
The generating module 404 is configured to generate user memory data corresponding to the user behavior based on the behavior description data and the first multimodal behavior data.
The data processing apparatus 40 in the embodiment of the present application will be described in detail as follows.
In some embodiments of the present application, the determining module 402 is further configured to determine a user behavior according to the multimodal behavior data, where the user behavior includes at least one of a first user behavior corresponding to a user experience time, a second user behavior corresponding to a user experience location, and a third user behavior corresponding to a user experience event; the user experience time includes at least one of a user-marked anniversary, a festival provided by the device system, a holiday provided by the device system, and a solar term provided by the device system; the experience location includes at least one of a user temporary location and a user-marked commemorative location; and the user experience event includes at least one of a non-daily event and a planned event;
the generating module 404 is further configured to generate behavior description data of the user behavior according to the user behavior.
In some embodiments of the present application, in a case where the multimodal behavior data includes location data of places the user located within N time periods, the location data including position data, a location type, and a user dwell time, the second user behavior including a behavior of the user at a user temporary location, and N being an integer greater than 1, the determining module 402 is further configured to determine the user resident location and the user movement track of each of the N time periods based on the location data of the places the user located within the N time periods;
The screening module 403 is further configured to screen the user temporary location from the user movement track of each time period based on the user resident location;
the determining module 402 is further configured to determine, as the second user behavior, a behavior of the user occurring at the user-temporary location.
In some embodiments of the present application, the data processing apparatus 40 further includes a fitting module, configured to, according to position data of the user positioning locations in N time periods, perform fitting on the positioning locations in each time period by using a density clustering algorithm, to obtain a user stay location in each time period;
The generating module 404 is further configured to generate a user movement track for each time period based on the user stay location in each time period;
The screening module 403 is further configured to screen the user resident locations from the user resident locations, where the user resident locations satisfy at least one condition that the number of occurrences of the location data of the user resident locations in the N user movement tracks is greater than or equal to a preset number of times, the location type of the user resident locations matches the preset location type, and the user resident time of the user resident locations matches the preset resident time.
In some embodiments of the present application, in a case where the third user behavior includes a behavior corresponding to a non-daily event, the determining module 402 is specifically configured to determine, based on the multimodal behavior data, daily behavior data of the user through a frequent pattern mining algorithm, the daily behavior data including behavior data that the user produces regularly or frequently;
determining non-daily behavior data of the user from the multi-modal behavior data based on the daily behavior data of the user;
carrying out data analysis on the non-daily behavior data of the user to obtain a non-daily event;
and determining the behavior corresponding to the non-daily event as a third user behavior.
In some embodiments of the present application, the determining module 402 is specifically configured to, in a case where the third user behavior includes a behavior corresponding to the planned event, screen the data of the user pre-planned event from the multimodal behavior data;
carrying out data analysis on the data of the user pre-planned event to obtain the planned event;
and determining the corresponding behavior of the planning event as a third user behavior.
In some embodiments of the present application, the filtering module 403 is specifically configured to, in a case where the multimodal behavior data includes data of at least one data type selected from the group consisting of image data, audio data, text data, and numeric data, process, by an encoder corresponding to each data type, the data corresponding to each type in the multimodal behavior data to obtain user data associated with behavior description data under each type;
and correlating, through a large model, the user data correlated with the behavior description data under each type, to obtain the first multimodal behavior data.
In some embodiments of the present application, the filtering module 403 is specifically configured to, when the encoder includes a visual data encoder and the user data includes image embedded data, match, by the visual data encoder, an aspect ratio of the image data with a preset aspect ratio to obtain a target preset aspect ratio matched with the image data;
according to the target preset aspect ratio, performing image segmentation processing on an image corresponding to the image data to obtain at least two image fragments;
performing pixel shuffling on at least two image segments to obtain at least two rearranged image segments, wherein the resolution of the rearranged image segments is greater than that of the image;
and adjusting the data dimension of at least two rearranged image fragments to the preset input data dimension of the large model to obtain the image embedded data.
In some embodiments of the present application, the filtering module 403 is specifically configured to extract the user audio feature from the audio waveform of the audio data in the case where the encoder includes an audio data encoder and the user data includes audio embedded data;
Converting the audio characteristics of the user into audio conversion data through a small convolutional neural network, wherein the characteristic dimension of the audio conversion data is smaller than that of the audio data;
And performing discretization processing on the audio conversion data through a multi-scale residual vectorization algorithm to obtain audio embedded data.
In some embodiments of the present application, the filtering module 403 is specifically configured to, when the encoder includes a numeric data encoder and the user data includes numeric embedded data, adjust the numeric data according to a preset standardized quantitative parameter by the numeric data encoder to obtain standardized quantitative numeric data;
Extracting scene characteristic data corresponding to the data type from the standardized quantitative numerical data according to the data type of the numerical data;
And carrying out data reconstruction on the scene characteristic data according to the sequence of the occurrence time of the scene characteristic data to obtain numerical embedded data.
In some embodiments of the present application, the screening module 403 is specifically configured to perform semantic analysis on the behavior description data through a large model, so as to obtain the user behavior intention and the context information represented by the behavior description data;
extracting user behavior element data associated with user behavior intents from user data of each data type associated with behavior description data;
scoring the user behavior element data according to the context environment information to obtain a correlation score of the user behavior element;
and screening the first multimodal behavior data from the user behavior element data, wherein the correlation score of the first multimodal behavior data is greater than or equal to a preset threshold value.
In some embodiments of the present application, the generating module 404 is specifically configured to classify and summarize the first multimodal behavior data according to the behavior description data to obtain multimodal summarized behavior data;
and associating the behavior description data with the multimodal summarized behavior data to obtain the user memory data.
The data processing device 40 in the embodiment of the present application may be an electronic device, or may be a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a mobile internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and may also be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, a self-service machine, etc.; the embodiments of the present application are not particularly limited in this regard.
The data processing device 40 in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an IOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The data processing device 40 provided in the embodiment of the present application can implement each process implemented by the embodiments of the data processing method shown in fig. 1 to 3, so as to achieve the same technical effects, and for avoiding repetition, a detailed description is omitted here.
Based on the above, the data processing device provided by the embodiment of the application determines the behavior description data of the user behavior through the multi-modal behavior data of the user, screens the first multi-modal behavior data associated with the behavior description data from the multi-modal behavior data, and then generates the user memory data corresponding to the user behavior based on the behavior description data and the first multi-modal behavior data. Therefore, the types of the data elements can be enriched by collecting the multi-modal behavior data, the comprehensiveness and the completeness of the data for acquiring the user behavior are improved, the accuracy for determining the source of the user behavior data is further improved, the first multi-modal behavior data is summarized and arranged according to the behavior description data, the user is not required to manually edit the information experienced by the user, the information editing complexity is reduced, and the information editing efficiency is improved.
Optionally, as shown in fig. 5, the embodiment of the present application further provides an electronic device 50, including a processor 501 and a memory 502, where the memory 502 stores a program or an instruction that can be executed on the processor 501, and the program or the instruction implements each step of the above-mentioned data processing method embodiment when executed by the processor 501, and the steps achieve the same technical effect, so that repetition is avoided, and no further description is given here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
The electronic device 600 includes, but is not limited to, a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, and the like.
Those skilled in the art will appreciate that the electronic device 600 may further include a power source (e.g., a battery) for powering the various components; the power source may be logically connected to the processor 610 through a power management system, so that functions such as charge management, discharge management, and power consumption management are performed by the power management system.
The electronic device structure shown in fig. 6 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, which is not described in detail here.
In an embodiment of the present application, the processor 610 is configured to obtain multimodal behavior data. The processor 610 is further configured to determine behavior description data of a user behavior based on the multimodal behavior data. The processor 610 is further configured to screen first multimodal behavior data associated with the behavior description data from the multimodal behavior data. The processor 610 is further configured to generate user memory data corresponding to the user behavior based on the behavior description data and the first multimodal behavior data.
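By way of non-limiting illustration only, the following minimal Python sketch shows the four-step flow just described; the function names, record fields, and the tag-based stand-ins for description determination and association are hypothetical and are not part of the claimed method.

```python
from dataclasses import dataclass

@dataclass
class UserMemoryData:
    description: str   # behavior description data
    evidence: list     # first multimodal behavior data

def determine_behavior_description(records: list) -> str:
    # Hypothetical stand-in: take the most frequent tag as the behavior description.
    tags = [r.get("tag", "unknown") for r in records]
    return max(set(tags), key=tags.count)

def is_associated(record: dict, description: str) -> bool:
    # Hypothetical stand-in: a record is associated if it carries the tag.
    return record.get("tag") == description

def build_user_memory(records: list) -> UserMemoryData:
    description = determine_behavior_description(records)          # steps 1-2
    related = [r for r in records if is_associated(r, description)]  # step 3
    return UserMemoryData(description, related)                     # step 4

memory = build_user_memory([
    {"tag": "trip", "type": "image", "path": "IMG_001.jpg"},
    {"tag": "trip", "type": "gps", "lat": 31.2, "lon": 121.5},
    {"tag": "work", "type": "text", "body": "weekly report"},
])
print(memory.description, len(memory.evidence))  # trip 2
```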
The components of the electronic device 600 are described in detail below.
In some embodiments of the present application, the processor 610 is further configured to determine, based on the multimodal behavior data, a user behavior including at least one of a first user behavior corresponding to a user experience time, a second user behavior corresponding to a user experience place, and a third user behavior corresponding to a user experience event, wherein the user experience time includes at least one of a user-tagged holiday and a holiday provided by the device system, the user experience place includes at least one of a user temporary residence place and a user-tagged commemorative place, and the user experience event includes at least one of a non-daily event and a planned event;
and to generate behavior description data of the user behavior according to the user behavior.
In some embodiments of the present application, the processor 610 is further configured to, in a case where the multimodal behavior data includes positioning data of the user's locations within N time periods, the positioning data including position data, a place type, and a user dwell time, the second user behavior including a behavior of the user at a user temporary residence place, and N being an integer greater than 1, determine a user movement trajectory for each of the N time periods based on the positioning data of the user within the N time periods;
screen the user temporary residence place from the user movement trajectories of the respective time periods based on the user resident places;
and determine the behavior of the user at the user temporary residence place as the second user behavior.
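As a non-limiting sketch only: one plausible reading of "screening based on the resident places" is set exclusion, i.e., any place visited in a period's trajectory that is not a resident place is treated as temporary. The function name and place identifiers below are hypothetical.

```python
def screen_temporary_places(trajectories: list, resident_places: set) -> set:
    # A place appearing in some period's movement trajectory but not among
    # the user's resident places is treated as a temporary residence place.
    visited = {place for track in trajectories for place in track}
    return visited - resident_places

tracks = [["home", "office", "cafe"], ["home", "hotel_x", "office"]]
print(screen_temporary_places(tracks, {"home", "office"}))
# e.g. {'cafe', 'hotel_x'}
```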
In some embodiments of the present application, the processor 610 is further configured to fit, through a density clustering algorithm, the positioning points in each time period according to the position data of the user's locations within the N time periods, to obtain the user stay places in each time period;
generate the user movement trajectory of each time period based on the user stay places in that time period;
and screen the user resident places from the user stay places, wherein a user resident place satisfies at least one of the following conditions: the frequency of occurrence of its position data in the N user movement trajectories is greater than or equal to a preset frequency, its place type matches a preset place type, and its user dwell time matches a preset dwell time.
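For illustration only, a minimal sketch of these two steps, using DBSCAN as a representative density clustering algorithm; the eps/min_pts values, the place-type whitelist, and the stay-place record fields are all assumptions, not values from the application.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import DBSCAN

def stay_places(points: np.ndarray, eps: float = 0.001, min_pts: int = 5) -> list:
    # Density clustering of one period's raw positioning points; each cluster
    # centroid is taken as one stay place (DBSCAN noise, label -1, is dropped).
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(points)
    return [points[labels == k].mean(axis=0) for k in set(labels) - {-1}]

def screen_resident(tracks, min_freq=3, types=frozenset({"home", "office"}),
                    min_dwell_h=2.0) -> set:
    # tracks: one list of stay-place dicts {"id", "type", "dwell_h"} per period.
    # A stay place is kept as resident if it satisfies at least one condition:
    # it occurs at least min_freq times across the N trajectories, its place
    # type matches a preset type, or its dwell time reaches the preset value.
    freq = Counter(p["id"] for t in tracks for p in t)
    return {p["id"] for t in tracks for p in t
            if freq[p["id"]] >= min_freq or p["type"] in types
            or p["dwell_h"] >= min_dwell_h}
```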
In some embodiments of the present application, the processor 610 is further configured to, in a case where the third user behavior includes a behavior corresponding to a non-daily event, determine, based on the multimodal behavior data and through a frequent pattern mining algorithm, daily behavior data of the user, the daily behavior data including behavior data of the user that occurs regularly or frequently;
determine non-daily behavior data of the user from the multimodal behavior data based on the daily behavior data of the user;
perform data analysis on the non-daily behavior data of the user to obtain the non-daily event;
and determine the behavior corresponding to the non-daily event as the third user behavior.
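A minimal sketch of the daily/non-daily split, substituting a simple day-count frequency rule for the frequent mining algorithm (the actual algorithm is not specified here); the (place, action) pattern key and min_days value are assumptions.

```python
def split_daily(events: list, min_days: int = 5) -> tuple:
    # Hypothetical frequency-based stand-in for the frequent mining step: a
    # (place, action) pattern seen on at least min_days distinct days counts
    # as daily behavior; all remaining records are non-daily behavior data.
    days = {}
    for e in events:
        days.setdefault((e["place"], e["action"]), set()).add(e["date"])
    daily_keys = {k for k, d in days.items() if len(d) >= min_days}
    daily = [e for e in events if (e["place"], e["action"]) in daily_keys]
    non_daily = [e for e in events if (e["place"], e["action"]) not in daily_keys]
    return daily, non_daily
```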
In some embodiments of the present application, the processor 610 is further configured to, in a case where the third user behavior includes a behavior corresponding to a planned event, screen data of an event pre-planned by the user from the multimodal behavior data;
perform data analysis on the data of the pre-planned event to obtain the planned event;
and determine the behavior corresponding to the planned event as the third user behavior.
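For illustration only: the sketch below assumes calendar and reminder entries as the sources of pre-planned data, which the application does not specify; the field names and source labels are hypothetical.

```python
from datetime import date

def planned_events(records: list, today: date) -> list:
    # Assumed sources of pre-planned data: calendar and reminder entries.
    # An entry whose scheduled date has arrived is analyzed as a planned event.
    return [r for r in records
            if r.get("source") in {"calendar", "reminder"} and r["when"] <= today]

events = planned_events(
    [{"source": "calendar", "when": date(2024, 10, 1), "title": "concert"},
     {"source": "chat", "when": date(2024, 10, 2), "title": "memo"}],
    today=date(2024, 10, 30))
print([e["title"] for e in events])  # ['concert']
```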
In some embodiments of the present application, the processor 610 is further configured to, in a case where the multimodal behavior data includes data of at least one data type among image data, audio data, text data, and numeric data, process the data of each data type in the multimodal behavior data through an encoder corresponding to that data type, to obtain user data associated with the behavior description data under each data type;
and correlate, through a large model, the user data associated with the behavior description data under each data type to obtain the first multimodal behavior data.
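A minimal sketch of the per-type dispatch only; the encoder bodies below are toy stand-ins (real encoders would produce embeddings), and the subsequent correlation by a large model is not shown.

```python
def encode_image(x):   return {"modality": "image",   "vec": [float(len(x))]}
def encode_audio(x):   return {"modality": "audio",   "vec": [float(len(x))]}
def encode_text(x):    return {"modality": "text",    "vec": [float(len(x))]}
def encode_numeric(x): return {"modality": "numeric", "vec": list(x)}

ENCODERS = {"image": encode_image, "audio": encode_audio,
            "text": encode_text, "numeric": encode_numeric}

def encode_all(records: list) -> list:
    # Route each record to the encoder registered for its data type; the
    # resulting per-type embeddings are what the large model later correlates.
    return [ENCODERS[r["type"]](r["payload"]) for r in records]

print(encode_all([{"type": "text", "payload": "morning run"},
                  {"type": "numeric", "payload": [62, 118]}]))
```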
In some embodiments of the present application, the processor 610 is further configured to, in a case where the encoder includes a visual data encoder and the user data includes image embedded data, match the aspect ratio of the image data with preset aspect ratios through the visual data encoder to obtain a target preset aspect ratio matched with the image data;
perform image segmentation processing on the image corresponding to the image data according to the target preset aspect ratio to obtain at least two image segments;
perform pixel shuffling on the at least two image segments to obtain at least two rearranged image segments, wherein the resolution of a rearranged image segment is greater than that of the original image;
and adjust the data dimension of the at least two rearranged image segments to the preset input data dimension of the large model to obtain the image embedded data.
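A minimal sketch of aspect-ratio matching, tiling, and pixel shuffling; the preset tilings and the scale factor s are assumptions, and the pixel shuffle is implemented as the common space-to-depth rearrangement (the final projection to the large model's input dimension is not shown).

```python
import numpy as np

PRESET_RATIOS = [(1, 1), (1, 2), (2, 1), (2, 2)]  # assumed preset tilings (w x h)

def best_ratio(w: int, h: int) -> tuple:
    # Pick the preset tiling whose aspect ratio is closest to the image's.
    return min(PRESET_RATIOS, key=lambda r: abs(w / h - r[0] / r[1]))

def split_tiles(img: np.ndarray, ratio: tuple) -> list:
    # Cut the H x W x C image into a grid of equal segments per the tiling.
    cols, rows = ratio
    th, tw = img.shape[0] // rows, img.shape[1] // cols
    return [img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

def pixel_shuffle(tile: np.ndarray, s: int = 2) -> np.ndarray:
    # Space-to-depth rearrangement: each s x s pixel block becomes one spatial
    # position with s*s times the channels, so each position carries more
    # pixel information before projection to the model's input dimension.
    h, w, c = tile.shape
    t = tile[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s, c)
    return t.transpose(0, 2, 1, 3, 4).reshape(h // s, w // s, s * s * c)

img = np.zeros((448, 896, 3))
tiles = [pixel_shuffle(t) for t in split_tiles(img, best_ratio(896, 448))]
print(best_ratio(896, 448), tiles[0].shape)  # (2, 1) (224, 224, 12)
```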
In some embodiments of the present application, the processor 610 is further configured to, in a case where the encoder includes an audio data encoder and the user data includes audio embedded data, extract user audio features from the audio waveform of the audio data;
convert the user audio features into audio conversion data through a small convolutional neural network, wherein the feature dimension of the audio conversion data is smaller than that of the audio data;
and perform discretization processing on the audio conversion data through a multi-scale residual vector quantization algorithm to obtain the audio embedded data.
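For illustration only, a minimal sketch of the residual vector quantization (discretization) step, assuming the CNN-reduced features are already available; the codebook sizes and feature dimensions are arbitrary assumptions.

```python
import numpy as np

def residual_vq(x: np.ndarray, codebooks: list) -> list:
    # Multi-stage residual vector quantization: each stage quantizes the
    # residual left by the previous one, yielding one code index per frame
    # per stage, i.e., a discrete representation of the audio features.
    residual, codes = x.copy(), []
    for cb in codebooks:                         # cb: (codebook_size, dim)
        d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(axis=1)                   # nearest code per frame
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 8))                # e.g. CNN-reduced audio features
books = [rng.normal(size=(16, 8)) for _ in range(3)]
print([c.shape for c in residual_vq(frames, books)])  # 3 stages of (10,) indices
```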
In some embodiments of the present application, the processor 610 is further configured to, in a case where the encoder includes a numeric data encoder and the user data includes numeric embedded data, adjust the numeric data according to preset standardization and quantization parameters through the numeric data encoder to obtain standardized and quantized numeric data;
extract, according to the data type of the numeric data, scene feature data corresponding to that data type from the standardized and quantized numeric data;
and perform data reconstruction on the scene feature data in the order of its occurrence time to obtain the numeric embedded data.
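A minimal sketch of the three numeric-encoder steps, assuming step-count samples as the scene-relevant data type; the field names and the mean/std standardization parameters are hypothetical.

```python
import numpy as np

def numeric_embedding(samples: list, mean: float, std: float) -> np.ndarray:
    # samples: [{"t": timestamp, "kind": "steps" | "heart_rate", "v": value}, ...]
    # 1) keep the scene-relevant kind, 2) standardize with preset parameters,
    # 3) reassemble the features in order of their occurrence time.
    scene = [s for s in samples if s["kind"] == "steps"]  # assumed scene type
    scene.sort(key=lambda s: s["t"])
    return np.array([(s["v"] - mean) / std for s in scene])

emb = numeric_embedding(
    [{"t": 2, "kind": "steps", "v": 5200},
     {"t": 1, "kind": "steps", "v": 4800},
     {"t": 1, "kind": "heart_rate", "v": 62}],
    mean=5000.0, std=400.0)
print(emb)  # [-0.5  0.5]
```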
In some embodiments of the present application, the processor 610 is further configured to perform semantic analysis on the behavior description data through a large model to obtain the user behavior intent and the contextual information characterized by the behavior description data;
extract, from the user data of each data type associated with the behavior description data, user behavior element data associated with the user behavior intent;
score the user behavior element data according to the contextual information to obtain a relevance score of each user behavior element;
and screen the first multimodal behavior data from the user behavior element data, wherein the relevance score of the first multimodal behavior data is greater than or equal to a preset threshold.
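For illustration only, a minimal sketch of the scoring and threshold screening, using cosine similarity as a toy stand-in for the large model's context-aware scoring; the embeddings and the 0.6 threshold are assumptions.

```python
import numpy as np

def relevance_scores(intent_vec: np.ndarray, element_vecs: np.ndarray) -> np.ndarray:
    # Toy scorer: cosine similarity between the intent embedding and each
    # behavior-element embedding stands in for context-aware scoring.
    a = intent_vec / np.linalg.norm(intent_vec)
    b = element_vecs / np.linalg.norm(element_vecs, axis=1, keepdims=True)
    return b @ a

def screen(elements: list, scores: np.ndarray, threshold: float = 0.6) -> list:
    # Keep only elements whose relevance score reaches the preset threshold.
    return [e for e, s in zip(elements, scores) if s >= threshold]

intent = np.array([1.0, 0.0])
vecs = np.array([[0.9, 0.1], [0.1, 0.9]])
print(screen(["photo_17", "note_03"], relevance_scores(intent, vecs)))
# ['photo_17']
```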
In some embodiments of the present application, the processor 610 is further configured to classify and summarize the first multimodal behavior data according to the behavior description data to obtain multimodal summarized behavior data;
and associate the behavior description data with the multimodal summarized behavior data to obtain the user memory data.
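A minimal sketch of this final assembly step, assuming classification by modality as the summarization criterion; the record fields and dictionary layout are hypothetical.

```python
from collections import defaultdict

def summarize(first_data: list, description: str) -> dict:
    # Group the screened records by modality (classification and summary),
    # then attach the behavior description so the resulting memory entry
    # can be retrieved from either the description or the data side.
    by_modality = defaultdict(list)
    for r in first_data:
        by_modality[r["type"]].append(r)
    return {"description": description, "summary": dict(by_modality)}

memory = summarize(
    [{"type": "image", "path": "IMG_001.jpg"},
     {"type": "gps", "lat": 31.2, "lon": 121.5}],
    "weekend trip to the waterfront")
print(sorted(memory["summary"]))  # ['gps', 'image']
```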
It should be appreciated that the input unit 604 may include a graphics processing unit (GPU) 6041 and a microphone 6042; the graphics processing unit 6041 processes image data of still images or video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The display unit 606 may include a display panel, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 607 includes at least one of a touch panel 6071 and other input devices 6072. The touch panel 6071, also called a touch screen, may include two parts: a touch detection apparatus and a touch controller. The other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail here.
The memory 609 may be used to store software programs and various data. The memory 609 may mainly include a first storage area storing programs or instructions and a second storage area storing data, wherein the first storage area may store an operating system, and application programs or instructions required for at least one function (such as a sound playing function and an image playing function), and the like. Furthermore, the memory 609 may include a volatile memory or a nonvolatile memory, or both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 609 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 610 may include one or more processing units. Optionally, the processor 610 integrates an application processor and a modem processor, where the application processor primarily handles operations involving the operating system, the user interface, application programs, and the like, and the modem processor, such as a baseband processor, primarily handles wireless communication signals. It will be appreciated that the modem processor may alternatively not be integrated into the processor 610.
An embodiment of the present application further provides a readable storage medium on which a program or instruction is stored; when executed by a processor, the program or instruction implements each process of the above data processing method embodiments and achieves the same technical effects. To avoid repetition, details are not repeated here.
The processor is the processor in the electronic device of the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In addition, an embodiment of the present application further provides a chip including a processor and a display interface, where the display interface is coupled to the processor, and the processor is configured to run a program or instruction to implement each process of the above data processing method embodiments and achieve the same technical effects. To avoid repetition, details are not repeated here.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-level chip, a chip system, or a system-on-chip, etc.
An embodiment of the present application further provides a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the above data processing method embodiments and achieve the same technical effects; to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a computer software product stored on a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and including instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. In light of the teachings of the present application, those of ordinary skill in the art may make many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411533513.9A CN119474415A (en) | 2024-10-30 | 2024-10-30 | Data processing method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119474415A true CN119474415A (en) | 2025-02-18 |
Family
ID=94591429
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411533513.9A Pending CN119474415A (en) | 2024-10-30 | 2024-10-30 | Data processing method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119474415A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |