CN117556065B - Deep learning-based large model data management system and method - Google Patents
- Publication number
- Publication: CN117556065B; Application: CN202410040885.1A
- Authority
- CN
- China
- Prior art keywords
- data
- historical
- real
- matrix
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 50
- 238000013523 data management Methods 0.000 title claims abstract description 20
- 238000013135 deep learning Methods 0.000 title claims abstract description 17
- 238000012545 processing Methods 0.000 claims abstract description 31
- 230000008569 process Effects 0.000 claims abstract description 9
- 239000011159 matrix material Substances 0.000 claims description 107
- 238000001228 spectrum Methods 0.000 claims description 36
- 230000015654 memory Effects 0.000 claims description 32
- 239000003086 colorant Substances 0.000 claims description 25
- 238000004458 analytical method Methods 0.000 claims description 13
- 238000004422 calculation algorithm Methods 0.000 claims description 10
- 238000004364 calculation method Methods 0.000 claims description 10
- 238000013500 data storage Methods 0.000 claims description 9
- 238000003066 decision tree Methods 0.000 claims description 9
- 238000007635 classification algorithm Methods 0.000 claims description 8
- 238000012163 sequencing technique Methods 0.000 claims description 7
- 239000000284 extract Substances 0.000 claims description 5
- 230000011218 segmentation Effects 0.000 claims description 4
- 238000013528 artificial neural network Methods 0.000 claims description 3
- 238000009432 framing Methods 0.000 claims description 3
- 230000006870 function Effects 0.000 claims description 3
- 238000010801 machine learning Methods 0.000 claims description 3
- 230000003595 spectral effect Effects 0.000 claims description 3
- 230000000903 blocking effect Effects 0.000 claims 1
- 238000005516 engineering process Methods 0.000 abstract description 5
- 238000011161 development Methods 0.000 abstract description 4
- 230000008447 perception Effects 0.000 description 4
- 230000009471 action Effects 0.000 description 3
- 230000018109 developmental process Effects 0.000 description 3
- 239000000969 carrier Substances 0.000 description 2
- 238000011835 investigation Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 239000011324 bead Substances 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 235000013361 beverage Nutrition 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000002360 explosive Substances 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/45—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/41—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/65—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
- H04L67/306—User profiles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/55—Push-based network services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a deep learning-based large model data management system and method, belonging to the technical field of large-scale data classification. With the popularization of the Internet and the rapid development of digital technology, the world has entered the big data age, in which conventional computers struggle to process large-scale data sets; large-scale data processing modes such as blockchain and databases, and parallel computing frameworks such as MapReduce, were therefore developed in succession to process large-scale data rapidly. The method analyzes large-scale data and performs a rapid preliminary classification of the large-scale data set; specific classification is then carried out according to the key information of each type of data, and data of different types are stored in different storage spaces, so that users can retrieve information more conveniently and rapidly. Finally, the user's data-browsing preference is predicted from the user's history of browsing data, and accurately classified data is pushed to the user.
Description
Technical Field
The invention relates to the technical field of data classification, in particular to a large model data management system and method based on deep learning.
Background
With the popularization of the internet and the rapid development of digital technology, the amount of data has shown explosive growth. Human activities generate large amounts of data at all times, and the volume and value of networked data information are steadily increasing; the world is in the big data age. Against this background, conventional computers find it difficult to process large-scale data sets, so large-scale data processing modes such as blockchain and databases, and parallel computing frameworks such as MapReduce and Spark, were developed in succession to enable rapid processing of large-scale data. Large-scale data information processing mainly analyzes and processes large-scale data sets through information acquisition, preprocessing and information analysis technologies; as a result, many intelligent data processing systems based on large-scale data have emerged, and the accurate acquisition, real-time sharing, confidentiality and storage of large-scale data have become increasingly important.
Data processing techniques in the big data context can deeply mine the hidden logical relations in massive, disordered large-scale data, and the classification of large-scale data can support many kinds of decisions for society. With the development of the Internet, finding the desired part within huge amounts of data has become more and more difficult: when data is acquired, it arrives as a mixture of many types, and when data is read and managed, specific data cannot be located accurately and effectively. Classifying the data is therefore vital: when data is classified and stored according to its class, the data reading speed can be greatly increased and data management becomes more convenient.
Disclosure of Invention
The invention aims to provide a large model data management system and method based on deep learning, so as to solve the problems in the background technology.
In order to solve the technical problems, the invention provides the following technical scheme:
a deep learning based large model data management method, the method comprising the steps of:
s1, acquiring various real-time data information in a database query and sensor perception mode, and summarizing and transmitting the acquired data information to a system for processing;
s2, analyzing historical data, wherein the historical data is divided into three types: text, video and audio; under the three data types, the information in the historical data is calculated to obtain the key information of each type of historical data as the standard for judging and classifying real-time data;
the historical data is classified into the three types of text, video and audio, and each of the three types is further classified according to its content in the following specific ways:
in the aspect of text data, the historical data is investigated and classified according to content, and all text historical data is divided into n categories by a decision tree classification algorithm. The historical data of the n categories is then examined: the number of occurrences of each word in the historical data of each category is counted, and the proportion of each word among all words is obtained. Let P be the proportion of occurrences of a word, c the number of occurrences of that word, and C the total number of words that occur; the proportion is calculated as: P = c / C;
the word with the largest occurrence proportion in the historical data of each category is determined as the keyword of that category of historical data, and the keyword is used as the standard for classifying real-time text data;
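The keyword extraction above can be sketched as follows; the function name and the sample documents are invented for illustration and are not from the patent, but the proportion computed is exactly P = c / C:

```python
from collections import Counter

def keyword_of_category(documents):
    """Return the word with the largest occurrence proportion P = c / C
    across all text documents belonging to one category."""
    words = [w for doc in documents for w in doc.lower().split()]
    counts = Counter(words)          # c: occurrences of each word
    total = len(words)               # C: total number of words
    word, c = counts.most_common(1)[0]
    return word, c / total           # keyword of the category and its P

docs = ["stock prices rise", "stock market report", "market stock news"]
kw, ratio = keyword_of_category(docs)   # kw == "stock", ratio == 3/9
```

The returned keyword then serves as the category's standard when real-time text data is compared against it in step S3.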
in the aspect of audio data, the historical data is investigated and classified according to content, and all audio historical data is divided into m categories by a decision tree classification algorithm. All historical audio data is observed with a spectrometer to obtain the frequency spectrum of each type of historical audio data, and the amplitude and frequency in the spectra of the m categories of historical audio data are observed and calculated in the following specific manner:
collecting historical audio data, and performing time domain analysis on the audio data to obtain time domain waveforms of the audio dataTime domain waveform for audio data>Performing Fourier transform to obtain a spectrum of the audio data in a frequency domain, wherein the formula is as follows: ;
Fourier transform is carried out on the historical audio data through a formula to obtain a frequency spectrumAfter all the historical audio data are subjected to Fourier transform, m frequency spectrums are obtained, in the Fourier transform, the amplitude is the intensity or energy of different frequency components of the signal in the frequency domain, and the amplitude is calculated through the result of the Fourier transform;
spectral representation of fourier transformsCalculating amplitude as a complex function consisting of real and imaginary parts, requiring calculation of the amplitude spectrum of each frequency component, setting the frequency spectrum +.>The magnitude spectrum is +.>WhereinAs real part->Is imaginary, f is frequency, ">Is angular frequency; the amplitudes and frequencies in all spectra are calculated as:;/>;
The waveform with the same spectral amplitude and frequency that occurs most often in the historical audio data under each category is determined as the key waveform of the corresponding historical audio data, and the key waveform is used as the standard for classifying real-time audio data;
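A discrete-time version of the Fourier-transform and amplitude-spectrum steps can be sketched as below. This is a minimal illustration under stated assumptions: the function name and sample tone are invented, and a real system would use an FFT library rather than this O(N²) DFT loop:

```python
import cmath
import math

def amplitude_spectrum(x, fs):
    """Naive DFT, the discrete analogue of X(f) = integral of x(t)e^(-j2pi f t).
    Returns (frequency, amplitude) pairs, with amplitude = sqrt(Re^2 + Im^2)."""
    N = len(x)
    pairs = []
    for k in range(N // 2):  # keep non-negative frequencies only
        Xk = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        pairs.append((k * fs / N, abs(Xk)))  # abs() == sqrt(Re^2 + Im^2)
    return pairs

# a pure 50 Hz tone sampled at 200 Hz for one second
fs, N = 200, 200
tone = [math.sin(2 * math.pi * 50 * n / fs) for n in range(N)]
peak_freq, peak_amp = max(amplitude_spectrum(tone, fs), key=lambda p: p[1])
# peak lands at 50 Hz with amplitude N/2 = 100
```

The per-bin (frequency, amplitude) pairs play the role of the spectrum features from which the key waveform of each audio category would be identified.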
in the aspect of video data, the historical data is investigated and classified according to content, and all video historical data is divided into r categories by a decision tree classification algorithm. All video historical data is framed, and each frame image is segmented into blocks. In each frame image, the RGB color collocation of the central block and the surrounding 8 blocks in a 3×3 segmented matrix forms a matrix color, so each frame is divided into a number of block color matrices; each block color matrix consists of 3×3 RGB values, with b0 being the RGB value of the center of the color matrix and b1 to b8 the RGB values of the 8 surrounding blocks. The specific matrix is: [ b1 b2 b3 ; b4 b0 b5 ; b6 b7 b8 ];
The number of each matrix color in a frame image is counted, and the proportion of each matrix color among all matrix colors in the frame is calculated to obtain the matrix color q with the largest proportion in that frame. Each video historical data item is divided into j frames, and the largest matrix color q of each frame forms a set Q = {q1, q2, …, qj}. The j matrix colors are compared, identical matrix colors are merged, the number of each matrix color and its proportion among all matrix colors across all frames are calculated, and the matrix color with the largest proportion is determined as the key matrix color of the video data; the key matrix color is used as the standard for classifying real-time video data.
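The matrix-color procedure can be sketched as follows; as a simplifying assumption (not stated in the patent), each frame is already reduced to a 2-D grid of per-block RGB tuples, and all names are invented for illustration:

```python
from collections import Counter

def matrix_colors(frame):
    """frame: 2-D grid of RGB tuples, one per segmented block.
    Every 3x3 window (center block + 8 neighbours) forms one 'matrix color'."""
    h, w = len(frame), len(frame[0])
    windows = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            windows.append(tuple(frame[i + di][j + dj]
                                 for di in (-1, 0, 1) for dj in (-1, 0, 1)))
    return windows

def key_matrix_color(frames):
    """Dominant matrix color q of each frame, then the most frequent
    dominant across all j frames: the video's key matrix color."""
    dominants = [Counter(matrix_colors(f)).most_common(1)[0][0] for f in frames]
    return Counter(dominants).most_common(1)[0][0]

RED, BLUE = (255, 0, 0), (0, 0, 255)
frames = [[[RED] * 4 for _ in range(4)]] * 2 + [[[BLUE] * 4 for _ in range(4)]]
key = key_matrix_color(frames)   # the all-red 3x3 window dominates 2 of 3 frames
```

The resulting key matrix color is the per-category standard against which real-time video data is compared in S3.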
S3, comparing the most significant information in the collected real-time data with the key information by utilizing the information standard of the historical data classification, so as to realize automatic classification of the real-time data;
Using the obtained classification standards for text, audio and video data, various new real-time data is acquired through database queries and sensor perception, and the newly acquired real-time data is classified: the obtained key information forms a data set, and deep machine learning with a neural network algorithm is used to realize classification of the different data. The specific method is as follows:
for real-time text data, the number of occurrences of each word in the acquired real-time text data is counted, and the word with the largest count is compared with the keyword of each category of historical text data; if they are the same, the text data is assigned to the data type corresponding to that keyword, and if no keyword is the same, the text data forms a data type of its own;
for real-time audio data, the collected real-time audio data is observed with a spectrometer to obtain its frequency spectrum, and the number of occurrences of each waveform in the spectrum is counted. The waveform s with the largest number of occurrences is compared with the key waveform of each category of historical audio data; if they are the same, the real-time audio data is assigned to the audio data type represented by that key waveform, and if no key waveform is the same, the data represented by the waveform s is defined as a new data category;
for real-time video data, the acquired video data is framed; after each frame image is segmented into blocks, the RGB color collocation of the central block and the surrounding 8 blocks in the 3×3 block matrix forms a matrix color. The number of each matrix color in a frame image is counted, and the proportion of each matrix color among all matrix colors in the frame is calculated to obtain the matrix color with the largest proportion in that frame. Each real-time video data item is divided into j frames, and the largest matrix color of each frame forms a set; the j matrix colors are compared, identical matrix colors are merged, and the number of each matrix color and its proportion across all frames are calculated to obtain the matrix color with the largest proportion. This real-time key matrix color is compared with the key matrix color of each type of historical video data; if they are the same, the real-time video data is assigned to the video data type represented by that key matrix color, and if no key matrix color has the same color collocation, the video data is defined as a new video data category.
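The compare-or-create logic of S3 is the same for all three data types; the text case can be sketched as below (category names, keywords and the sample sentence are invented examples, not from the patent):

```python
from collections import Counter

def classify_realtime_text(text, category_keywords):
    """Compare the most frequent word of incoming real-time text with each
    category's keyword; if none matches, the text founds a new category."""
    top_word = Counter(text.lower().split()).most_common(1)[0][0]
    for category, keyword in category_keywords.items():
        if top_word == keyword:
            return category
    return "new:" + top_word   # no keyword matched -> new data category

standards = {"finance": "stock", "sport": "match"}
label = classify_realtime_text("stock rally lifts stock index", standards)
```

For audio the comparison key would be the dominant spectrum waveform, and for video the key matrix color, but the routing structure is identical.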
S4, the acquired real-time data are safely stored after being analyzed and tidied, and a plurality of storage spaces are divided for storage according to different information contents contained in the real-time data;
The classified data is encrypted by the AES symmetric encryption algorithm to obtain encrypted ciphertext data. Before the ciphertext data is stored, the storage space in the system is partitioned: the storage space of the whole system is divided into three data memories, namely a text memory, an audio memory and a video memory. The text memory is divided into n storage units according to the different keywords, the audio memory is divided into m storage units according to the different key waveforms, and the video memory is divided into r storage units according to the different key matrix colors, so that the different data types are stored in different storage units according to their corresponding key information.
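The partitioned storage layout can be sketched as follows. This is a hypothetical in-memory model: the class name and key values are invented, and the AES encryption step is deliberately elided (a real system would encrypt the payload before calling `store()`):

```python
class PartitionedStore:
    """Sketch of S4's layout: one memory per data type (text/audio/video),
    one storage unit per key-information value within that memory."""
    def __init__(self):
        self.memories = {"text": {}, "audio": {}, "video": {}}

    def store(self, data_type, key_info, ciphertext):
        # route the (already-encrypted) payload to the unit for its key info
        self.memories[data_type].setdefault(key_info, []).append(ciphertext)

    def lookup(self, data_type, key_info):
        # reading hits exactly one unit, which is what speeds up retrieval
        return self.memories[data_type].get(key_info, [])

store = PartitionedStore()
store.store("text", "stock", b"...ciphertext...")
store.store("audio", "waveform-s", b"...ciphertext...")
```

Because every lookup addresses a single unit keyed by the data's key information, retrieval avoids scanning the whole store, which is the stated benefit of the partitioning.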
When data is stored, hyperlinks are used in documents, videos exist within text data, and text exists within video data; when such mixed data is stored in a classified manner, refined analysis and judgment are needed. The specific analysis process is as follows:
when classifying the mixed data, judging a subject and an object of the mixed data, and further analyzing and classifying according to the subject after obtaining the subject data;
in mixed data, combinations of text and video or text and audio exist. During classification, it is judged which of the three data types is the subject of the mixed data; all data other than the subject are objects. The subject is the carrier that mainly expresses the information in the mixed data, while the objects are carriers that serve the subject and further explain it;
step1, to judge the subject and object in the mixed data, the text data in the mixed data is calculated: first the word count of the text data in the mixed data is computed, obtaining the total word count of the text data as count;
step2, the subtitles and text in the video or audio data of the mixed data are extracted, the words among all words in the mixed data are counted, and the word with the most occurrences is determined as the subject information of the mixed data; the number of occurrences of the subject information in the video subtitles and text is obtained as i, and the number in the text data as v;
step3, judging the subject and object in the mixed data reduces to judging whether the text data in the mixed data is the subject: if not, the other data is the subject, and if the text data is the subject, the other data is the object. Specifically, the text data is judged to be the subject when: count > t and v > i;
here t is a word-count threshold whose value is determined by the duration of the video or audio data: the longer the duration, the larger t. When judging whether the text data in the mixed data is the subject, the amount of text data is judged first; if the word count is too small, the text data cannot be the subject of the mixed data, and the criterion for the word count is t. When the other data in the mixed data lasts longer, the word-count criterion is correspondingly relaxed, and vice versa. Because the word count of the text data alone cannot accurately determine whether the text data is the subject, it is also necessary to compare how much of the subject content of the mixed data is expressed by the text data versus the other data; the data expressing more subject content is the subject. Therefore, when the number v of subject-information occurrences in the text data is larger than the number i in the video subtitles and text, and both conditions are satisfied, the text data in the mixed data is judged to be the subject; otherwise the video (or audio) data is the subject. Mixed data consisting of video and audio does not need to be judged: since audio exists within video data, the video data is taken as the subject when video and audio are mixed;
when the mixed data is classified and stored, it is stored according to its subject: the mixed data is stored in the text memory when the subject is text data, and in the corresponding other type of memory when the subject is other data, completing the classified storage of the mixed data.
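Steps 1–3 above can be sketched as a single decision function. The constant `words_per_sec` used to derive t from the duration is an assumption of this sketch, not a value given in the patent, as are all the names and sample inputs:

```python
from collections import Counter

def subject_of_mixed_data(text_words, caption_words, duration_s,
                          words_per_sec=2):
    """Text is the subject only if it is long enough (count > t, with t
    scaled by the other component's duration) AND it carries the theme word
    more often than the captions/transcript do (v > i)."""
    theme = Counter(text_words + caption_words).most_common(1)[0][0]
    count = len(text_words)           # total words of the text component
    t = duration_s * words_per_sec    # duration-dependent threshold
    v = text_words.count(theme)       # theme occurrences in the text data
    i = caption_words.count(theme)    # theme occurrences in subtitles/text
    return "text" if (count > t and v > i) else "other"

text = ["stock"] * 30 + ["rally"] * 5
captions = ["stock", "stock", "index"]
subject = subject_of_mixed_data(text, captions, duration_s=10)
```

The returned subject then decides which memory (text or the other type's) receives the mixed data.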
S5, after the data are classified and stored, predicting user data browsing preference according to the history of the user browsing the data, and performing accurate data classified pushing on the user.
Predicting the preference of a user for browsing data according to the browsing record of the user in the system, and carrying out accurate data pushing according to the data type, wherein the method specifically comprises the following steps:
browsing data is recorded when a user uses the system, and all historical browsing data is extracted. Key information is extracted from the user's historical browsing data according to the key-information extraction method used during data classification, and the historical browsing data is classified according to the classification standards to obtain a set of data categories. The amount of historical browsing data in each category is counted, and the categories in the set are sorted from largest to smallest by amount. From this ordering of historical browsing data categories, the user's degree of preference for each kind of data is predicted: the larger the amount of historical browsing data, the more the user favors that data category;
The sorting result is analyzed. Let the largest amount of historical browsing data be u and the smallest be e; then, according to the formulas: o = (u − e) / 3; g1 = u − o; g2 = u − 2o;
where o is one third of the difference between the maximum and minimum amounts of historical browsing data, and g1 and g2 are both criteria for segmenting the sorted list;
the sorting result is divided into three data sets according to the above formulas: taking g1 and g2 as standards, categories whose amount of historical browsing data lies in the interval [g1, u] are defined as the user's most favored data set, those in the interval [g2, g1) as the user's favored data set, and those in the interval [e, g2) as the data set the user does not favor. According to these three data sets, when the user uses the system, the data types in the user's most favored data set are actively pushed, the data types in the set the user does not favor are shielded, and data is pushed intelligently according to the user's preference.
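The three-way preference split can be sketched as follows; the category names and counts are invented sample data, while the thresholds follow o = (u − e)/3, g1 = u − o, g2 = u − 2o as described above:

```python
def segment_preferences(category_counts):
    """Split browsing-history categories into most-favored / favored /
    not-favored sets using thirds of the max-min range of counts."""
    ranked = sorted(category_counts.items(), key=lambda kv: kv[1], reverse=True)
    u = ranked[0][1]       # largest amount of historical browsing data
    e = ranked[-1][1]      # smallest amount
    o = (u - e) / 3
    g1, g2 = u - o, u - 2 * o
    most = [c for c, n in ranked if n >= g1]          # [g1, u]
    liked = [c for c, n in ranked if g2 <= n < g1]    # [g2, g1)
    disliked = [c for c, n in ranked if n < g2]       # [e, g2)
    return most, liked, disliked

most, liked, disliked = segment_preferences(
    {"finance": 90, "sport": 55, "music": 40, "news": 12})
# u=90, e=12 -> o=26, g1=64, g2=38
```

Categories in `most` would be actively pushed, and those in `disliked` shielded, as described above.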
The large model data management system based on deep learning comprises a data acquisition module, a data classification processing module, a data storage module and an intelligent data pushing module;
the data acquisition module is used for acquiring various different real-time data in a database query and sensor mode and transmitting the different real-time data to the data processing module;
The data classification processing module is used for classifying, integrating and analyzing the data acquired in real time to obtain different data types;
the data storage module is used for safely storing the acquired real-time data after analysis and arrangement, and dividing a plurality of storage spaces for storage according to different data types;
the intelligent push data module is used for predicting user data browsing preference according to the history of the user browsing data after classifying and storing the data, and carrying out accurate data classified push on the user.
The data acquisition module acquires various real-time data information in a database query and sensor sensing mode, and gathers and transmits the acquired real-time data information to the data processing module.
The data processing module receives the real-time data information from the data acquisition module, analyzes all real-time data and classifies historical data into three types of characters, videos and audios; under three types of historical data, calculating information in the historical data to obtain key information of each type of historical data as a standard for judging and classifying real-time data; by utilizing the information standard of real-time data classification, the automatic classification of the real-time data is realized by comparing the information in all the real-time data with the key information.
The data storage module comprises a data encryption unit and a classification storage unit;
the data encryption unit is used for encrypting the classified data through an AES symmetric encryption algorithm to obtain encrypted ciphertext data;
the classification storage unit divides the storage space according to the data sets obtained by the data processing module's classification, and data of the same data set is stored in the same storage space, so that data can be read quickly.
The intelligent push data module records browsing data when a user uses the system, extracts all historical browsing data, extracts key information according to a method for extracting key information when the data is classified, classifies all the historical browsing data of the user according to classification standards, calculates the number of the historical browsing data of each class, ranks the historical browsing data according to the number from large to small, predicts the favorite degree of the user for each data, and intelligently pushes the data according to user preference.
Compared with the prior art, the invention has the following beneficial effects:
the invention obtains the standard for large-scale data classification by searching for the key information of large-scale data; by comparing the information in the data with this standard and classifying all the acquired data information, rapid classification of large-scale data is realized, and each kind of data is stored in a different storage space. Since mixed data affects storage, the classification result of mixed data is obtained by judging its subject and object, and the mixed data is classified and stored accordingly, so that users can retrieve information more conveniently and rapidly.
According to the method and the device, after the large-scale data is classified and stored, the user's data-browsing preference is predicted from the user's history of browsing the large-scale data, accurately classified data is pushed to the user, and the data the user dislikes is shielded so that, when the user is not actively searching, disliked data types are not pushed. This reduces the appearance of data types the user dislikes and improves the convenience and comfort of use, allowing the user to browse preferred data types accurately.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of the deep learning based large model data management method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides the following technical solutions:
a deep learning based large model data management method, the method comprising the steps of:
S1, acquiring various real-time data information through database queries and sensor perception, and summarizing and transmitting the acquired data information to the system for processing;
S2, analyzing historical data, which is divided into three types: text, video, and audio; within each of the three data types, the information in the historical data is computed to obtain the key information of each category of historical data as the standard for judging and classifying real-time data;
The historical data is first divided into the three types of text, video, and audio, and each type is then classified according to content, in the following specific ways:
For text data, the historical data is surveyed and classified according to content; all text historical data is divided into n categories by a decision tree classification algorithm. The historical data of the n categories is surveyed, the number of occurrences of each word in each category's historical data is counted, and the proportion of each word among all words is obtained. Let Z be the proportion of a word's occurrences, c the number of occurrences of that word, and C the total number of words that occur; the proportion is computed as: Z = c / C;
The word with the largest occurrence proportion in each category of historical data is determined as the key word of that category, and the key words are used as the standard for classifying real-time text data;
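The word-proportion step above can be sketched in Python; the sample corpus, the whitespace tokenization, and the function name are illustrative assumptions, not the patent's actual implementation:

```python
from collections import Counter

def keyword_of_category(texts):
    """Return the key word of one category of historical text data and its
    proportion Z = c / C, where c is the word's occurrence count and C is
    the total number of words that occur."""
    words = [w for t in texts for w in t.lower().split()]
    total = len(words)                 # C: all words that occur
    word, c = Counter(words).most_common(1)[0]
    return word, c / total            # key word and its proportion Z

# Hypothetical category corpus:
kw, ratio = keyword_of_category(["stock market report", "market news", "market close"])
```

Here "market" occurs 3 times out of 7 words, so Z = 3/7, and "market" becomes the category's key word.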
For audio data, the historical data is surveyed and classified according to content; the historical audio data is divided into m categories by a decision tree classification algorithm. All historical audio data is observed with a spectrometer to obtain the frequency spectrum of each category of historical audio data, and the amplitudes and frequencies in the spectra of the m categories of historical audio data are observed and calculated in the following specific way:
Historical audio data is collected and time-domain analysis is performed on it to obtain the time-domain waveform x(t) of the audio data; a Fourier transform is applied to x(t) to obtain the spectrum of the audio data in the frequency domain, with the formula: X(f) = ∫ x(t)·e^(−j2πft) dt;
The historical audio data is Fourier-transformed by this formula to obtain the spectrum X(f); after all historical audio data has been Fourier-transformed, m spectra are obtained. In the Fourier transform, the amplitude is the intensity or energy of the signal's different frequency components in the frequency domain, and it is calculated from the result of the Fourier transform;
The spectrum X(f) of the Fourier transform is a complex-valued function consisting of a real part and an imaginary part; to calculate the amplitude, the magnitude spectrum of each frequency component is needed. Writing the spectrum as X(f) = Re(f) + j·Im(f), where Re(f) is the real part, Im(f) is the imaginary part, f is the frequency, and ω is the angular frequency, the amplitudes and frequencies in all spectra are calculated as: |X(f)| = sqrt(Re(f)² + Im(f)²); f = ω / (2π);
The waveform whose amplitude and frequency occur most often in the spectra of the historical audio data of each category is obtained; this waveform, obtained for each category, is determined as the key waveform of the corresponding historical audio data, and the key waveforms are used as the standard for classifying real-time audio data;
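As a check on the transform and magnitude formulas above, a plain discrete Fourier transform over a hypothetical pure tone can be sketched (a real system would use an FFT routine; the signal and names here are assumptions):

```python
import cmath
import math

def dft_magnitudes(x, fs):
    """Discrete analogue of X(f) = ∫ x(t)·e^(-j2πft) dt: for each bin k,
    return (frequency k·fs/N, magnitude), where abs() of the complex sum
    equals sqrt(Re² + Im²), i.e. the amplitude-spectrum formula."""
    N = len(x)
    return [(k * fs / N,
             abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                     for n in range(N))))
            for k in range(N // 2)]

# Hypothetical signal: 64 samples of a 100 Hz sine sampled at 800 Hz.
fs, N = 800, 64
x = [math.sin(2 * math.pi * 100 * n / fs) for n in range(N)]
peak_freq, peak_mag = max(dft_magnitudes(x, fs), key=lambda p: p[1])
```

For a unit-amplitude sine over whole periods the peak bin lands at 100 Hz with magnitude N/2, which is how a key waveform's amplitude and frequency would be read off a spectrum.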
For video data, the historical data is surveyed and classified according to content; all video historical data is divided into r categories by a decision tree classification algorithm. All video historical data is divided into frames, and each frame image is partitioned into blocks: under each frame image, the RGB colors of the central block of a 3×3 block matrix and of the 8 surrounding blocks together form the matrix color of that frame. Each block color matrix consists of 3×3 RGB values, i.e. a color matrix of nine RGB values, with the RGB value of the center at the middle of the matrix.
The number of each matrix color in a frame image is counted, and the proportion of each matrix color among all matrix colors in each frame image is calculated, yielding the matrix color q with the largest proportion in that frame. Each item of video historical data is divided into j frames, and the largest matrix color q obtained from each frame forms a set; the j matrix colors are compared, identical matrix colors are identified as the same, the number of each matrix color across all frames and its proportion of the total are calculated, and the matrix color with the largest proportion is determined as the key matrix color of the video data. The key matrix colors are used as the standard for classifying real-time video data.
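A minimal sketch of the 3×3 matrix-color computation, assuming a frame is a row-major grid of (R, G, B) tuples and that each block is summarized by its average color (the patent does not say how a block's single RGB value is chosen):

```python
def matrix_color(frame):
    """Reduce a frame to its 3x3 block-matrix color: partition the frame
    into a 3x3 grid of blocks (center block plus 8 surrounding blocks)
    and average each block's pixels into one (R, G, B) value."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // 3, w // 3
    blocks = []
    for bi in range(3):
        for bj in range(3):
            pix = [frame[i][j]
                   for i in range(bi * bh, (bi + 1) * bh)
                   for j in range(bj * bw, (bj + 1) * bw)]
            blocks.append(tuple(sum(p[c] for p in pix) // len(pix)
                                for c in range(3)))
    return tuple(blocks)

# Hypothetical 6x6 frame that is solid red:
red_frame = [[(255, 0, 0)] * 6 for _ in range(6)]
```

Counting how often each distinct matrix color recurs across the j frames then yields the key matrix color.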
S3, using the information standards from the historical data classification, the most significant information in the collected real-time data is compared with the key information, realizing automatic classification of the real-time data;
Using the obtained classification standards for text, audio, and video data, various new real-time data is acquired through database queries and sensor perception, and the newly acquired real-time data is classified; the obtained key information forms a data set, and deep machine learning is performed with a neural network algorithm to classify the different data, in the following specific way:
For real-time text data, the number of occurrences of each word in the acquired real-time text data is counted, and the most frequent word is compared with the key word of each category of historical data; if they are the same, the text data is assigned to the data category corresponding to that key word, and if no key word matches, a new data category is formed;
For real-time audio data, the collected real-time audio data is observed with a spectrometer to obtain its spectrum, the number of occurrences of each waveform in the spectrum is calculated, and the most frequent waveform s is compared with the key waveform of each category of historical audio data; if they are the same, the real-time audio data is assigned to the audio data category represented by that key waveform, and if no key waveform matches, the data represented by waveform s is defined as a new data category;
For real-time video data, the obtained video data is divided into frames, each frame image is partitioned into blocks, and the RGB colors of the central block of the 3×3 block matrix and of the 8 surrounding blocks are combined to form matrix colors. The number of each matrix color in one frame image is counted, and the proportion of each matrix color among all matrix colors in each frame is calculated to obtain the matrix color with the largest proportion in that frame. Each item of video data is divided into j frames, and the largest matrix color obtained from each frame forms a set; the j matrix colors are compared, identical matrix colors are identified as the same, and the number of each matrix color across all frames and its proportion of the total are calculated to obtain the matrix color with the largest proportion and largest number. This real-time matrix color is compared with the key matrix color of each category of historical video data; if they are the same, the real-time video data is assigned to the video data category represented by that key matrix color, and if no key matrix color matches, the video data it represents is defined as a new video data category.
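All three per-type comparisons in S3 reduce to the same lookup pattern, sketched here with hypothetical text-category standards (the "new category" fallback mirrors the claims' creation of a new data category when no key information matches):

```python
def classify(key_info, standards):
    """Assign real-time data to the category whose stored key information
    (key word, key waveform, or key matrix color) equals key_info; if
    none matches, the data founds a new category."""
    for category, key in standards.items():
        if key == key_info:
            return category
    return "new category"

# Hypothetical key words per historical text category:
standards = {"finance": "market", "sports": "match"}
```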
S4, the acquired real-time data is analyzed, organized, and securely stored, and several storage spaces are partitioned for storage according to the different information contents of the real-time data;
The classified data is encrypted with the AES symmetric encryption algorithm to obtain encrypted ciphertext data. Before the ciphertext data is stored, the storage space in the system is divided into regions: the storage space of the whole system is split into three data memories, a text memory, an audio memory, and a video memory. The text memory is divided into n storage units by key word, the audio memory into m storage units by key waveform, and the video memory into r storage units by key matrix color, and the different data types are stored in the corresponding storage units according to their key information.
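The region-segmented storage can be sketched as nested maps from data type to key-information unit. AES is not in the Python standard library, so a byte-wise XOR stands in here purely as a reversible placeholder for the AES step, and every name is illustrative:

```python
class SegmentedStore:
    """Three memories (text / audio / video), each divided into storage
    units keyed by key word, key waveform, or key matrix color."""

    def __init__(self):
        self.memories = {"text": {}, "audio": {}, "video": {}}

    @staticmethod
    def _cipher(data: bytes, key: int = 0x5A) -> bytes:
        # Placeholder XOR, NOT AES: reversible, so _cipher(_cipher(d)) == d.
        return bytes(b ^ key for b in data)

    def store(self, dtype: str, key_info: str, data: bytes) -> None:
        unit = self.memories[dtype].setdefault(key_info, [])
        unit.append(self._cipher(data))       # "encrypt" before storing

    def load(self, dtype: str, key_info: str) -> list:
        return [self._cipher(c) for c in self.memories[dtype][key_info]]

store = SegmentedStore()
store.store("text", "market", b"quarterly report")
```

A production system would replace `_cipher` with a real AES implementation, e.g. from a cryptography library.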
When data is stored, mixed data occurs: hyperlinks are used in documents, video appears within text data, and text appears within video data. When such mixed data is classified and stored, a refined analysis and judgment is required, as follows:
When classifying mixed data, the subject and object of the mixed data are judged; once the subject data is obtained, further analysis and classification proceed according to the subject;
Mixed data includes combinations of text with video and of text with audio. During classification, the subject of the mixed data is judged to be one of the three data types, and all data other than the subject are objects; the subject is the carrier that mainly expresses the information in the mixed data, while the objects are carriers that serve the subject and further explain it;
Step 1, to judge the subject and object of the mixed data, the text data in the mixed data is calculated: first the number of words in the text data is counted, giving the total word count of the text data in the mixed data as count;
Step 2, the subtitles and text in the video or audio data of the mixed data are extracted, the words among all text in the mixed data are counted, and the most frequent word is determined as the topic information of the mixed data; the number of occurrences i of the topic information in the video subtitles and text, and the number of occurrences v of the topic information in the text data, are obtained respectively;
Step 3, judging the subject and object of the mixed data is converted into judging whether the text data in the mixed data is the subject; if the text data is not the subject, the other data is the subject, and if the text data is the subject, the other data is the object. The judgment on the text data is specifically: count > t; v > i;
The first condition requires that the total word count of the text data in the mixed data be greater than t, a word-count threshold whose value is determined by the duration of the video or audio data: the longer the duration, the larger t. When judging whether the text data in mixed data is the subject, its quantity is judged first; if the word count is too small, the text data cannot be the subject of the mixed data. The judgment standard for the word count is t, which is relatively relaxed when the other data in the mixed data lasts longer, and vice versa. Because the word count of the text data alone cannot accurately determine whether it is the subject, the amounts of topic content expressed by the text data and by the other data in the mixed data must also be compared, and whichever expresses more topic content is judged the subject. Therefore, when the number of occurrences v of the topic information in the text data is greater than the number of occurrences i in the video subtitles and text, and both conditions are satisfied, the text data in the mixed data is judged to be the subject; otherwise the video data is the subject. Mixed data of video and audio needs no judgment: since audio already exists within video data, the video data is taken as the subject when video and audio are mixed;
When mixed data is classified and stored, it is stored according to its subject: mixed data whose subject is text data is stored in the text memory, and mixed data whose subject is other data is stored in the memory of that type, completing the classified storage of mixed data.
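The two-part subject test (count > t and v > i) and the resulting storage routing can be written directly; the function and memory names are assumptions:

```python
def text_is_subject(count: int, t: int, v: int, i: int) -> bool:
    """Text is the subject only when its total word count exceeds the
    duration-dependent threshold t AND the topic word occurs more often
    in the text data (v) than in the video/audio subtitles and text (i)."""
    return count > t and v > i

def storage_target(count: int, t: int, v: int, i: int) -> str:
    # Mixed data is stored in the memory of its subject's data type.
    return "text memory" if text_is_subject(count, t, v, i) else "other memory"
```

With the figures of Example 2 (count = 15000, t = 8000, v = 4000, i = 105) the document routes to the text memory.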
S5, after the data is classified and stored, the user's data browsing preferences are predicted from the history of the user's data browsing, and accurately classified data is pushed to the user.
The user's data browsing preferences are predicted from the user's browsing records in the system, and accurate data pushing is performed by data type, specifically:
Browsing data is recorded while the user uses the system, and all historical browsing data is extracted; key information is extracted from the user's historical browsing data by the same method used when classifying data, and the historical browsing data is classified according to the classification standards to obtain a set of data categories. The number of historical browsing data in each category is counted, the categories in the set are sorted from largest to smallest by count, and the user's degree of preference for each data category is predicted from this ordering: the more historical browsing data a category has, the more the user favors it;
The sorting result is analyzed. Let u be the largest count of historical browsing data and e the smallest; according to the formulas: o = (u − e) / 3; u − o; e + o;
where o is one third of the difference between the largest and smallest counts of historical browsing data, and u − o and e + o are the standards for segmenting the ordering;
The sorting result is divided into three data sets by these standards: categories whose historical browsing data count lies in the interval (u − o, u] are defined as the data set the user most favors, those in (e + o, u − o] as the data set the user favors, and those in [e, e + o] as the data set the user does not favor. Based on these three data sets, while the user uses the system the data types in the most-favored set are actively pushed, the data types in the not-favored set are shielded from being pushed, and data is pushed intelligently according to the user's preferences.
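The o = (u − e) / 3 segmentation can be sketched as follows; the inclusive/exclusive boundary handling and the sample counts are assumptions, since the claims do not pin down the interval endpoints:

```python
def preference_sets(category_counts):
    """Split browsing-history categories into most-favored, favored, and
    not-favored sets using o = (u - e) / 3 from step S5."""
    u, e = max(category_counts.values()), min(category_counts.values())
    o = (u - e) / 3
    most, favored, least = [], [], []
    for cat, n in sorted(category_counts.items(), key=lambda kv: -kv[1]):
        if n > u - o:
            most.append(cat)        # actively pushed
        elif n > e + o:
            favored.append(cat)     # left as-is
        else:
            least.append(cat)       # shielded
    return most, favored, least

# Hypothetical yearly browsing counts per category:
counts = {"animation": 90, "catering": 80, "military": 50,
          "science": 20, "animal": 15, "life": 10}
most, favored, least = preference_sets(counts)
```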
The large model data management system based on deep learning comprises a data acquisition module, a data classification processing module, a data storage module and an intelligent data pushing module;
the data acquisition module is used for acquiring various different real-time data through database queries and sensor perception and transmitting them to the data classification processing module;
The data classification processing module is used for classifying, integrating and analyzing the data acquired in real time to obtain different data types;
the data storage module is used for safely storing the acquired real-time data after analysis and arrangement, and dividing a plurality of storage spaces for storage according to different data types;
the intelligent push data module is used for predicting the user's data browsing preferences from the history of the user's data browsing after the data is classified and stored, and pushing accurately classified data to the user.
The data acquisition module acquires various real-time data information in a database query and sensor sensing mode, and gathers and transmits the acquired real-time data information to the data processing module.
The data classification processing module receives the real-time data information from the data acquisition module and analyzes all real-time data; the historical data is divided into the three types of text, video, and audio, and within the three types the information in the historical data is computed to obtain the key information of each category of historical data as the standard for judging and classifying real-time data. Using these information standards, the information in all real-time data is compared with the key information, realizing automatic classification of the real-time data.
The data storage module comprises a data encryption unit and a classification storage unit;
the data encryption unit is used for encrypting the classified data through an AES symmetric encryption algorithm to obtain encrypted ciphertext data;
the classification storage unit divides the storage space according to the data sets obtained by the data classification processing module, and data of the same data set is stored in the same storage space, so that data can be read quickly.
The intelligent push data module records browsing data while a user uses the system, extracts all historical browsing data, extracts key information from it by the same method used when classifying data, classifies all of the user's historical browsing data according to the classification standards, counts the historical browsing data in each category, ranks the categories from largest to smallest by count, predicts the user's degree of preference for each data category, and pushes data intelligently according to the user's preferences.
Example 1
The user's data browsing over the last year is collected and analyzed, and the number of times the user browsed each data category in the year is obtained through statistical calculation, specifically:
The six data categories, ranked from largest to smallest by browsing count, are: animation, catering, military, science popularization, animal, and life;
The segmentation standards are calculated with the formulas: o = (u − e) / 3; u − o; e + o;
The sorting result is divided into three data sets by these standards: the categories whose historical browsing data count lies in the interval (u − o, u] form the data set the user most favors, comprising the animation and catering category data; the categories in (e + o, u − o] form the data set the user favors, comprising the military data; and the categories in [e, e + o] form the data set the user does not favor, comprising the science popularization, animal, and life category data. Based on these three data sets, while the user uses the system the animation and catering data types in the most-favored set are actively pushed, the military data in the favored set is left unprocessed, the science popularization, animal, and life data in the not-favored set is shielded, and data is pushed intelligently according to the user's preferences.
Example 2
When a document containing hyperlinks appears and video exists within the document, the subject of the document is judged before it is stored, with the following specific steps:
Step 1, the number of words of the text data in the document is calculated, giving 15000 characters;
Step 2, the subtitles and text in the video within the document are extracted, and the words among all text in the document are counted; the most frequent word, "future environment", is determined as the topic information of the mixed data. It appears 105 times in the video subtitles and text and 4000 times in the text data, giving i = 105 and v = 4000;
Step 3, as to whether the text data is the subject: since the video duration is two minutes and ten seconds, the specified word-count standard is t = 8000, and the judgment by the formulas is: count = 15000 > 8000; v = 4000 > 105 = i;
Finally, the text data in the document is judged to be the subject, and the document is stored in the text memory.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: the foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto, but it is to be understood that modifications and equivalents of some of the technical features described in the foregoing embodiments may be made by those skilled in the art, although the present invention has been described in detail with reference to the foregoing embodiments. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. A large model data management method based on deep learning, characterized in that the method comprises the following steps:
S1, acquiring various real-time large-scale data information through database queries and sensor perception, and summarizing and transmitting the acquired data information to the system for processing;
S2, analyzing historical data, which is divided into three types: text, video, and audio; within each of the three data types, the information in the historical data is computed to obtain the key information of each category of historical data as the standard for judging and classifying real-time data;
The historical data is first divided into the three types of text, video, and audio, and each type is then classified according to content, in the following specific ways:
For text data, the historical data is surveyed and classified according to content; all text historical data is divided into n categories by a decision tree classification algorithm. The historical data of the n categories is surveyed, the number of occurrences of each word in each category's historical data is counted, and the proportion of each word among all words is obtained. Let Z be the proportion of a word's occurrences, c the number of occurrences of that word, and C the total number of words that occur; the proportion is computed as: Z = c / C;
The word with the largest occurrence proportion in each category of historical data is determined as the key word of that category, and the key words are used as the standard for classifying real-time text data;
For audio data, the historical data is surveyed and classified according to content; the historical audio data is divided into m categories by a decision tree classification algorithm. All historical audio data is observed with a spectrometer to obtain the frequency spectrum of each category of historical audio data, and the amplitudes and frequencies in the spectra of the m categories of historical audio data are observed and calculated in the following specific way:
Historical audio data is collected and time-domain analysis is performed on it to obtain the time-domain waveform x(t) of the audio data; a Fourier transform is applied to x(t) to obtain the spectrum of the audio data in the frequency domain, with the formula: X(f) = ∫ x(t)·e^(−j2πft) dt;
The historical audio data is Fourier-transformed by this formula to obtain the spectrum X(f); after all historical audio data has been Fourier-transformed, m spectra are obtained. In the Fourier transform, the amplitude is the intensity or energy of the signal's different frequency components in the frequency domain, and it is calculated from the result of the Fourier transform;
The spectrum X(f) of the Fourier transform is a complex-valued function consisting of a real part and an imaginary part; to calculate the amplitude, the magnitude spectrum of each frequency component is needed. Writing the spectrum as X(f) = Re(f) + j·Im(f), where Re(f) is the real part, Im(f) is the imaginary part, f is the frequency, and ω is the angular frequency, the amplitudes and frequencies in all spectra are calculated as: |X(f)| = sqrt(Re(f)² + Im(f)²); f = ω / (2π);
The waveform whose amplitude and frequency occur most often in the spectra of the historical audio data of each category is obtained; this waveform, obtained for each category, is determined as the key waveform of the corresponding historical audio data, and the key waveforms are used as the standard for classifying real-time audio data;
For video data, the historical data is surveyed and classified according to content; all video historical data is divided into r categories by a decision tree classification algorithm. All video historical data is divided into frames, and each frame image is partitioned into blocks: under each frame image, the RGB colors of the central block of a 3×3 block matrix and of the 8 surrounding blocks together form the matrix color of that frame. Each block color matrix consists of 3×3 RGB values, i.e. a color matrix of nine RGB values, with the RGB value of the center at the middle of the matrix. The number of each matrix color in a frame image is counted, and the proportion of each matrix color among all matrix colors in each frame image is calculated, yielding the matrix color q with the largest proportion in that frame. Each item of video historical data is divided into j frames, and the largest matrix color q obtained from each frame forms a set; the j matrix colors are compared, identical matrix colors are identified as the same, the number of each matrix color across all frames and its proportion of the total are calculated, and the matrix color with the largest proportion is determined as the key matrix color of the video data; the key matrix colors are used as the standard for classifying real-time video data;
S3, using the information standards from the historical data classification, the most significant information in the collected real-time data is compared with the key information, realizing automatic classification of the real-time data;
Using the obtained classification standards for text, audio, and video data, various new real-time data is acquired through database queries and sensor perception, and the newly acquired real-time data is classified; the obtained key information forms a data set, and deep machine learning is performed with a neural network algorithm to classify the different data, in the following specific way:
For real-time text data, the number of occurrences of each word in the acquired real-time text data is counted, and the most frequent word is compared with the key word of each category of historical data; if they are the same, the text data is assigned to the data category corresponding to that key word, and if no key word matches, a new data category is formed;
For real-time audio data, the collected real-time audio data is observed with a spectrometer to obtain its spectrum, the number of occurrences of each waveform in the spectrum is calculated, and the most frequent waveform s is compared with the key waveform of each category of historical audio data; if they are the same, the real-time audio data is assigned to the audio data category represented by that key waveform, and if no key waveform matches, the data represented by waveform s is defined as a new data category;
For real-time video data, the obtained video data is divided into frames, each frame image is partitioned into blocks, and the RGB colors of the central block of the 3×3 block matrix and of the 8 surrounding blocks are combined to form matrix colors. The number of each matrix color in one frame image is counted, and the proportion of each matrix color among all matrix colors in each frame is calculated to obtain the matrix color with the largest proportion in that frame. Each item of video data is specifically divided into j frames, and the largest matrix color obtained from each frame forms a set; the j matrix colors are compared, identical matrix colors are identified as the same, and the number of each matrix color across all frames and its proportion of the total are calculated to obtain the matrix color with the largest proportion and largest number;
The real-time matrix color is compared with the key matrix color of each category of historical video data; if they are the same, the real-time video data is assigned to the video data category represented by that key matrix color, and if no key matrix color matches, the video data represented by the real-time matrix color is defined as a new video data category;
s4, the acquired real-time data is safely stored after analysis and arrangement, and the acquired real-time data is stored according to the information contained in the real-time data
Different capacities, a plurality of storage spaces are partitioned for storage;
the classified data is encrypted by the AES symmetric encryption algorithm to obtain encrypted ciphertext data; before the ciphertext data is stored, the storage space in the system is subjected to region segmentation processing: the storage space of the whole system is divided into three data memories, namely a character memory, an audio memory and a video memory; the character memory is divided into n storage units according to different keywords, the audio memory is divided into m storage units according to different key waveforms, and the video memory is divided into r storage units according to different key color pictures; the different data types are then stored in different storage units according to their corresponding key information;
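The region-segmented store described above can be sketched as a two-level mapping: one top-level memory per data type, each pre-partitioned into units keyed by that type's key information. `PartitionedStore` and its method names are illustrative assumptions, and the ciphertext bytes stand in for the AES output the patent describes.

```python
class PartitionedStore:
    """Hypothetical sketch of the three-memory, key-partitioned store."""

    def __init__(self, keys_by_type):
        # e.g. {"text": keywords, "audio": key waveforms, "video": key pictures}
        self.units = {
            dtype: {key: [] for key in keys}
            for dtype, keys in keys_by_type.items()
        }

    def store(self, dtype, key, ciphertext):
        """File one encrypted record in the unit matching its key information."""
        self.units[dtype][key].append(ciphertext)

    def load(self, dtype, key):
        """Return all records held by one storage unit."""
        return list(self.units[dtype][key])
```

Keeping records of one class in one unit is what lets same-category data be read back together, as the classified storage unit in claim 5 requires.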
when the data stored in real time is recorded into the different types of storage units after classification, mixed data may be present; when classifying mixed data, the subject and object of the mixed data are judged, and after the subject data is obtained, further analysis and classification are carried out according to the subject;
in the mixed data there are cases where text is combined with video and where text is combined with audio; when classifying, the subject of the mixed data is determined to be one of the three kinds of data, and all data other than the subject are objects; the subject is the carrier that expresses the information within the mixed data, while the object serves the subject and further interprets it; the process of judging the subject data in the mixed data is specifically as follows:
step1, to judge the subject and object in the mixed data, the character data in the mixed data is measured: the number of characters of the character data is counted to obtain the total character count of the character data in the mixed data, denoted count;
step2, the subtitles and transcript of the video or audio data in the mixed data are extracted, and over all text in the mixed data the most frequently occurring word is defined as the topic information of the mixed data; the number of occurrences of the topic information in the video subtitles or transcript, i, and the number of occurrences in the character data, v, are obtained respectively;
step3, judging the subject and object in the mixed data is converted into judging whether the character data in the mixed data is the subject: if the character data is the subject, the other data is the object; if the character data is not the subject, the other data is the subject and the character data is the object; the judgment is specifically as follows: the total character count of the character data must be greater than t, where t is the character-count standard for judging whether the character data is the subject, and the value of t is determined by the duration of the video or audio data; in addition, the number of occurrences v of the topic information in the character data must be greater than the number of occurrences i in the video subtitles or transcript; when both conditions are met, the character data in the mixed data is judged to be the subject, otherwise the video or audio is the subject; mixed data consisting of video and audio needs no judgment, and the video data is taken as the subject;
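The two-condition test of step3 reduces to a small predicate. This is a minimal sketch assuming the quantities already counted in step1 and step2; the function name and return labels are illustrative, not from the patent.

```python
def subject_of_mixed_data(total_chars, v, i, t):
    """Decide whether the text part of mixed data is the subject.

    total_chars: character count of the text part (step1's count)
    v: occurrences of the topic word in the text part
    i: occurrences of the topic word in the subtitles/transcript
    t: duration-dependent character-count threshold
    """
    if total_chars > t and v > i:
        return "text"          # both conditions hold: text is the subject
    return "audio_or_video"    # otherwise the other medium is the subject
```

Video-plus-audio mixtures bypass this test entirely, with video taken as the subject.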
when the mixed data is classified and stored, it is stored according to its subject: when the subject is character data, the mixed data is stored in the character memory, and when the subject is other data, it is stored in the corresponding other type of memory, completing the classified storage of mixed data;
S5, after the data is classified and stored, the user's data browsing preference is predicted according to the user's history of browsing data, and accurate classified data pushing is performed for the user;
the user's preference for browsing data is predicted from the user's browsing records in the system, and accurate data pushing is carried out by data category, specifically:
browsing data is recorded when the user uses the system, and all historical browsing data is extracted; key information is extracted from the user's historical browsing data by the key-information extraction method used for data classification, and all of the user's historical browsing data is classified according to the classification standard to obtain a set of data categories; the amount of historical browsing data in each category of the set is calculated, and the categories are sorted from largest to smallest amount; the user's preference for each kind of data is predicted from this sorting of historical browsing data categories: the greater the amount of historical browsing data, the more the user favors that data category;
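The counting-and-sorting step above amounts to a frequency ranking of browsed categories. A minimal sketch, assuming `history` is the list of category labels assigned to the user's historical browsing records (the function name is illustrative):

```python
from collections import Counter

def rank_categories(history):
    """Count each browsed category and sort from most to least browsed.

    Returns a list of (category, amount) pairs in descending order of
    amount, i.e. the sorting the preference prediction is based on.
    """
    return Counter(history).most_common()
```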
the sorting result is analyzed; let the maximum amount of historical browsing data be u and the minimum amount be e, then according to the formula: o = (u - e) / 3; where o is one third of the difference between the maximum and minimum amounts of historical browsing data, and e + o and e + 2o are the standards for segmenting the sorting;
the sorting result is divided into three data sets according to the above formula: categories whose amount of historical browsing data lies in the interval (e + 2o, u] are defined as the user's favorite data set; categories in the interval (e + o, e + 2o] are defined as the moderately favored data set; categories in the interval [e, e + o] are defined as the data set not favored by the user; according to these three data sets, when the user uses the system, the data categories in the user's favorite data set are actively pushed, the moderately favored data set is not processed, and the data categories in the disfavored data set are masked, achieving intelligent pushing of data according to user preference.
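The three-way split can be sketched as follows. The exact threshold placement is a reading of the garbled formulas (o = (u - e)/3 with cut points at e + o and e + 2o), so treat the boundaries as an assumption; the function and list names are illustrative.

```python
def segment_preferences(counts):
    """Split browsed categories into favorite / moderate / disliked bands.

    counts maps each data category to its amount of historical browsing
    data. With u = max, e = min and o = (u - e) / 3, amounts above
    e + 2o are favorites, amounts above e + o are moderately favored,
    and the rest are not favored.
    """
    u, e = max(counts.values()), min(counts.values())
    o = (u - e) / 3
    favorite, moderate, disliked = [], [], []
    for cat, n in counts.items():
        if n > e + 2 * o:
            favorite.append(cat)      # actively pushed
        elif n > e + o:
            moderate.append(cat)      # left as-is
        else:
            disliked.append(cat)      # masked
    return favorite, moderate, disliked
```

Note the degenerate case u == e makes o zero and places every category in the "disliked" band; a production version would need to special-case it.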
2. A deep learning-based large model data management system applying the deep learning-based large model data management method as set forth in claim 1, characterized in that: the data management system comprises a data acquisition module, a data classification processing module, a data storage module and an intelligent data pushing module;
the data acquisition module is used for acquiring various different real-time data in a database query and sensor mode and transmitting the different real-time data to the data processing module;
the data classification processing module is used for classifying, integrating and analyzing the data acquired in real time to obtain the data type of each piece of data;
the data storage module is used for safely storing the acquired real-time data after analysis and arrangement, and dividing a plurality of storage spaces for storage according to different data types;
the intelligent data pushing module is used for, after the data is classified and stored, predicting the user's data browsing preference according to the user's history of browsing data and accurately performing classified data pushing for the user.
3. The deep learning-based large model data management system of claim 2, wherein: the data acquisition module acquires various real-time data information by database query and sensor sensing, summarizes the acquired real-time data information, and transmits it to the data processing module.
4. The deep learning-based large model data management system of claim 2, wherein: the data processing module receives the real-time data information from the data acquisition module and analyzes all real-time data; historical data is divided into three types: text, video and audio; the information in the historical data under the three historical data types is counted, and the key information of each type of historical data is obtained as the standard for judging and classifying real-time data; using this real-time data classification standard, automatic classification of real-time data is achieved by comparing the information in all real-time data with the key information.
5. The deep learning based large model data management system of claim 2, wherein: the data storage module comprises a data encryption unit and a classification storage unit;
the data encryption unit is used for encrypting the classified data through an AES symmetric encryption algorithm to obtain encrypted ciphertext data;
the classified storage unit divides the storage space according to the data sets obtained by the data processing module's classification, and stores the data of the same data set in the same storage space, ensuring the data reading speed.
6. The deep learning-based large model data management system of claim 2, wherein: the intelligent data pushing module records browsing data when the user uses the system and extracts all historical browsing data; key information is extracted from the user's historical browsing data by the key-information extraction method used for classification, all of the user's historical browsing data is classified according to the classification standard, the amount of historical browsing data in each category is calculated, and the categories are sorted from largest to smallest amount; the user's preference for each kind of data is predicted according to this sorting of historical browsing data categories, and data is intelligently pushed according to the user's preference.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410040885.1A CN117556065B (en) | 2024-01-11 | 2024-01-11 | Deep learning-based large model data management system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117556065A CN117556065A (en) | 2024-02-13 |
CN117556065B true CN117556065B (en) | 2024-03-26 |
Family
ID=89820887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410040885.1A Active CN117556065B (en) | 2024-01-11 | 2024-01-11 | Deep learning-based large model data management system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117556065B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117669513B (en) * | 2024-01-30 | 2024-04-12 | 江苏古卓科技有限公司 | Data management system and method based on artificial intelligence |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561825A (en) * | 2009-06-02 | 2009-10-21 | 北京迈朗世讯科技有限公司 | Media technology platform system, data acquisition system and network content supplying method |
CN102881125A (en) * | 2012-09-25 | 2013-01-16 | 杭州立高科技有限公司 | Alarm monitoring system based on multi-information fusion centralized processing platform |
CN109408751A (en) * | 2018-09-27 | 2019-03-01 | 腾讯科技(成都)有限公司 | A kind of data processing method, terminal, server and storage medium |
CN110225001A (en) * | 2019-05-21 | 2019-09-10 | 清华大学深圳研究生院 | A kind of dynamic self refresh net flow assorted method based on topic model |
CN112165634A (en) * | 2020-09-29 | 2021-01-01 | 北京百度网讯科技有限公司 | Method for establishing audio classification model and method and device for automatically converting video |
CN116017085A (en) * | 2022-12-02 | 2023-04-25 | 合众新能源汽车股份有限公司 | Method, device, equipment and medium for adding subtitles and barrage to hard disk video recording |
CN116665083A (en) * | 2022-02-17 | 2023-08-29 | 腾讯科技(深圳)有限公司 | Video classification method and device, electronic equipment and storage medium |
CN116992067A (en) * | 2023-08-09 | 2023-11-03 | 四川轻化工大学 | A digital display system and method for intangible cultural heritage |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6785645B2 (en) * | 2001-11-29 | 2004-08-31 | Microsoft Corporation | Real-time speech and music classifier |
US11461785B2 (en) * | 2008-07-10 | 2022-10-04 | Ron M. Redlich | System and method to identify, classify and monetize information as an intangible asset and a production model based thereon |
Non-Patent Citations (3)
Title |
---|
Research Progress on Big Data Applications in Traffic Video; Zhao Ying et al.; Science & Technology Review; 2019-03-28; Vol. 37, No. 06; pp. 73-83 *
Research Progress on Big Data Analysis Methods Based on Artificial Intelligence Technology; Wang Wanliang et al.; Computer Integrated Manufacturing Systems; 2018-08-20; Vol. 25, No. 03; pp. 529-547 *
Research on Data Acquisition and Processing Methods for In-Service Assessment of Equipment; Zhou Yuhang et al.; Command Control & Simulation; 2022-02-15; Vol. 44, No. 01; pp. 121-126 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||