CN115098788A

CN115098788A - Recommendation method for massive digital information

Info

Publication number: CN115098788A
Application number: CN202210879288.9A
Authority: CN
Inventors: Liao Qiuping (廖秋平)
Original assignee: Guangzhou Shengwang Information Technology Co ltd
Current assignee: Guangzhou Shengwang Information Technology Co ltd
Priority date: 2022-07-25
Filing date: 2022-07-25
Publication date: 2022-09-23

Abstract

The invention discloses a recommendation method for massive digital information. It relates to the technical field of data information identification and addresses the problem of recommending data information. The method comprises the following steps: step one, providing a dynamic information representation; step two, adaptively filtering the original digital information; step three, presenting recommendation information to a user; step four, receiving feedback from the user; and step five, running an adaptive information processing algorithm. Through the adaptive information processing algorithm, the original digital information within dynamic data information can be filtered continuously and adaptively, which improves both the usability of the data information and the data processing efficiency. The method can dynamically recommend different data information to the user; classify, compute and diagnose the recommended data information; improve the efficiency of recommending different data information by means of function calculation; and greatly improve the information identification capability.

Description

Recommendation method for massive digital information

Technical Field

The invention relates to the technical field of data information identification, and in particular to a recommendation method for massive digital information.

Background

With the development of information technology, and of information and communication technology in particular, the internet, social networks, the Internet of Things, the mobile internet, cloud computing and the like have steadily entered people's daily work and lives, and the global volume of data has grown explosively. Enterprises' need to process large amounts of data has in turn driven the development of big data technology. Big data has accelerated the development of China's new media industry: it has broken down the boundaries between traditional media such as television, radio and newspapers, blurred the boundaries between countries, communities and industries, and dissolved the boundary between information senders and receivers.

Faced with massive data, the ability to acquire data information quickly and effectively improves the user's experience of different data. In conventional approaches, users obtain data information from massive digital information through the hot-post recommendations and links of conventional portal sites, which cannot recommend data information quickly. In the prior art, the recommendation of massive digital information has been improved by means of search engines, but this approach does not let users autonomously receive recommendations and discover information within massive data, and the data processing remains difficult.

Disclosure of Invention

In view of these technical shortcomings, the invention discloses a recommendation method for massive digital information that continuously and adaptively filters the original digital information within dynamic data information, thereby improving the usability of the data information and the data processing efficiency.

The invention adopts the following technical scheme:

A recommendation method for massive digital information comprises the following steps:

Step one, providing a dynamic information representation;

This step inputs the massive digital information and provides dynamic information features by means of a feature template;

Step two, adaptively filtering the original digital information;

This step adaptively screens the unprocessed original digital information to be recommended to communities and users. The information filtering device includes an extraction device that can be coupled to the network layer and can receive a data stream from it; the extraction device identifies and extracts the original digital information from the data stream;

Step three, presenting recommendation information to a user;

In this step, a plurality of profiles are encoded and the recommended original digital information is conveyed to the user. The profiles include an adaptive content profile and an adaptive collaboration profile, and the original digital information is adaptively filtered in response to the dynamic information representation to generate the recommendation information;

Step four, receiving feedback from the user: the communication device receives feedback from the user in response to the proposed original digital information;

Step five, the adaptive information processing algorithm: the fed-back digital information is processed with the term frequency-inverse document frequency (TF-IDF) weighting method and the minimum description length (MDL) method, and the dynamic information representation is updated accordingly. The adaptive information processing algorithm includes credibility filtering of the original digital information in response to adaptive credibility, updating a credibility table in response to user feedback, and responding to the user feedback table.
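Step five mentions a credibility table that is updated from user feedback and used to filter original digital information, but does not state the update rule. As a minimal sketch, assuming a per-source score in [0, 1] updated by an exponential moving average (the `alpha` rate, the 0.5 prior and the 0.4 threshold are illustrative choices, not values taken from the method), such a table could look like this in Python:

```python
from dataclasses import dataclass, field


@dataclass
class CredibilityTable:
    """Hypothetical credibility table: source identifier -> score in [0, 1]."""
    alpha: float = 0.2    # assumed learning rate for feedback updates
    default: float = 0.5  # assumed prior score for sources never seen before
    scores: dict = field(default_factory=dict)

    def score(self, source):
        return self.scores.get(source, self.default)

    def update(self, source, liked):
        # Move the score toward 1.0 on positive feedback and toward 0.0 on negative feedback.
        target = 1.0 if liked else 0.0
        self.scores[source] = (1 - self.alpha) * self.score(source) + self.alpha * target

    def filter(self, items, threshold=0.4):
        # Credibility filtering: keep items whose source score meets the (assumed) threshold.
        return [it for it in items if self.score(it["source"]) >= threshold]
```

Two negative feedback events take an unseen source from 0.5 to 0.4 and then to 0.32, after which its items no longer pass `filter`.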

As a further technical solution of the present invention, the feature template includes a first feature information module, a second feature information module and a conversion module. The first and second feature information modules implement format conversion and protocol conversion, respectively, through the conversion module; the first feature information module extracts the format of the digital information, and the second feature information module extracts its protocol.

As a further technical solution of the invention, adaptively filtering the original digital information comprises the following steps:

(S21) outputting the data information stream through the network layer;

(S22) coupling the extraction device to the network layer and receiving the data stream from the network layer;

(S23) adaptively filtering, by the extraction device, the original digital information to be transmitted to the user, and converting the filtered data information into recommendation information through the communication device;

As a further technical solution of the invention, the information filtering method comprises the following steps:

coupling an extraction device to the network layer and receiving the data stream from the network layer, the extraction device identifying and extracting the original digital information from the data stream; and setting a data information communication protocol and using it to select different frequency bands for the original data information, so that the data information within the set frequency bands is filtered.

As a further technical solution of the invention, the recommendation information is generated by an algorithm model.

As a further technical solution of the present invention, the algorithm model works as follows:

(1) Set the number of clusters to $K$, assign $K$ a random initial value, and use K-means to compute, for each value of $K$, the distance between the cluster centers and the recommendation information;

(2) Calculate the trend of change of this value with the following formula:

(1)

(3) From the result of formula (1), determine the interval in which the value changes fastest; using the trend of change and the distance between each recommended value and the cluster center, determine the range of values:

(2)

(4) Narrow this interval by bisection to finally obtain the value of $K$;

(5) If the recommendation information falls within the range given by formula (2), it is recommended; otherwise, it is not recommended.
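Formulas (1) and (2) are not reproduced in the text, so the following Python sketch is only one hedged reading of steps (1) to (5). It assumes that the tracked value is the K-means sum of squared point-to-center distances (SSE), that the fastest-changing interval is the coarse grid interval with the largest SSE drop, and that the recommendation range of a cluster is the largest member-to-center distance. All function names and parameters are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means with a seeded random initialization (step (1))."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def sse(points, k):
    """Assumed 'value' of step (2): sum of squared distances to the assigned centers."""
    centers, labels = kmeans(points, k)
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def choose_k(points, k_lo, k_hi, step):
    """Steps (3) and (4): find the interval where SSE changes fastest, then bisect it."""
    cache = {}

    def f(k):
        if k not in cache:
            cache[k] = sse(points, k)
        return cache[k]

    grid = list(range(k_lo, k_hi + 1, step))
    if grid[-1] != k_hi:
        grid.append(k_hi)
    a, b = max(zip(grid, grid[1:]), key=lambda ab: f(ab[0]) - f(ab[1]))
    while b - a > 1:
        m = (a + b) // 2
        if f(a) - f(m) >= f(m) - f(b):
            b = m  # the steeper change lies in the left half
        else:
            a = m
    return b


def recommend(points, candidates, k):
    """Step (5): keep candidates that fall inside the radius of their nearest cluster."""
    centers, labels = kmeans(points, k)
    radius = [
        max((math.dist(p, centers[j]) for p, lab in zip(points, labels) if lab == j), default=0.0)
        for j in range(k)
    ]
    chosen = []
    for c in candidates:
        j = min(range(k), key=lambda i: math.dist(c, centers[i]))
        if math.dist(c, centers[j]) <= radius[j]:
            chosen.append(c)
    return chosen
```

For two well-separated groups of points, `choose_k(points, 1, 5, 2)` first selects the interval [1, 3] and then bisects it to K = 2. Selecting the largest drop is not the same as the usual elbow heuristic; a different reading of formula (1) would change `choose_k`.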

As a further technical solution of the invention, the adaptive information processing algorithm comprises the following steps:

Step 1: term frequency-inverse document frequency (TF-IDF) weighting.

Each piece of original digital information is processed into a normalized vector whose length is between 20,000 and 100,000 tokens. First, the more often a token $t$ occurs in a digital information document $d$, the more closely $t$ is related to the subject of $d$; second, the more digital information documents $t$ occurs in across the whole collection, the worse it distinguishes document $d$. For a given document, the two terms can be combined into a weight by multiplying the term frequency of each token by its inverse document frequency, so the weight of token $t$ in document $d$ is:

$$w(t,d) = \mathrm{tf}(t,d)\cdot\log\frac{|D|}{\left|\{d' \in D : t \in d'\}\right|} \tag{3}$$

In formula (3), $w(t,d)$ denotes the weight and $D$ denotes the set of all digital information documents;

Step 2: the digital information in each category is converted into TF-IDF vectors, each normalized to unit length, and these are then averaged to obtain a prototype vector for that category;

Step 3: minimum description length (MDL).

The description length of the entire data set is minimized in order to measure the quality and cost of the predicted user recommendation information data set. The rule function is:

$$H^{*} = \arg\max_{H} P(H \mid D) = \arg\max_{H} \frac{P(D \mid H)\,P(H)}{P(D)} \tag{4}$$

In formula (4), $H$ denotes the quality of the user recommendation information data set, $D$ denotes all of the recommendation information data, $P(D)$ is the probability function of all recommendation information data, $P(H)$ is the probability function of the quality of all user recommendation information data sets, $P(H \mid D)$ is the probability of the predicted user recommendation data set quality given all recommendation information data, and $P(D \mid H)$ is the probability of the recommendation information data given that quality;

Step 4: assuming that the data are independent given the document length $l_d$ and the category $c$, the probability of the user recommendation information data set cost is the product of the probabilities of the individual tokens, so its probability function is:

$$P(\mathbf{t}_d \mid l_d, c) = \prod_{i} P(t_i \mid l_d, c) \tag{5}$$

In formula (5), each token $t_i$ takes a binary value. If $t_i$ follows a simple binomial distribution, its probability does not depend on the document length $l_d$, and the function is expressed as:

$$P(t_i \mid l_d, c) = P(t_i \mid c) = \theta_{i,c}^{\,t_i}\,(1-\theta_{i,c})^{1-t_i} \tag{6}$$

In formula (6), $\theta_{i,c}$ is the probability that token $t_i$ is present in a document of category $c$. Assuming that each token has such a distribution function in the massive data information, the function for predicting the user recommendation information data set is:

$$B(d, c) = -\log_2 P(\mathbf{t}_d \mid l_d, c) = -\sum_{i} \log_2 P(t_i \mid c) \tag{7}$$

In formula (7), $B(d,c)$ represents the total bits of the predicted user recommendation information data set.
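Formulas (5) to (7) describe a token model in which the predicted category is the one whose encoding of the document needs the fewest bits. The Python sketch below illustrates that idea under stated assumptions: tokens are binary presence indicators over a fixed vocabulary, the per-category presence probability theta is estimated with Laplace (add-one) smoothing, which the source does not specify, and no prior term for the category is added. All function names are hypothetical.

```python
import math


def fit_bernoulli(docs, labels, vocab):
    """theta[c][t] = P(token t present | category c), Laplace-smoothed (assumption)."""
    theta = {}
    for c in set(labels):
        members = [set(d) for d, y in zip(docs, labels) if y == c]
        theta[c] = {t: (sum(t in m for m in members) + 1) / (len(members) + 2) for t in vocab}
    return theta


def total_bits(doc, theta_c, vocab):
    """Bits to encode the binary token vector of `doc`: -sum_i log2 P(t_i | c)."""
    present = set(doc)
    bits = 0.0
    for t in vocab:
        p = theta_c[t]
        bits -= math.log2(p if t in present else 1.0 - p)
    return bits


def classify(doc, theta, vocab):
    # The most probable category is the one that needs the fewest bits.
    return min(theta, key=lambda c: total_bits(doc, theta[c], vocab))
```

With two "sport" and two "politics" training documents, a new document containing "goal" and "ball" costs about 3.66 bits under the sport model and about 8.4 bits under the politics model, so it is assigned to sport.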

The invention differs from the conventional technology in that it can dynamically recommend different data information to the user; classify, compute and diagnose the recommended data information; improve the efficiency of recommending different data information by means of function calculation; greatly improve the information identification capability; and facilitate the processing of massive digital information.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive labor, wherein:

FIG. 1 is a schematic flow chart of a recommendation method of the present invention;

FIG. 2 is a schematic structural diagram of an embodiment of adaptively filtering the original digital information according to the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.

As shown in FIG. 1 and FIG. 2, a recommendation method for massive digital information includes:

Step one, providing a dynamic information representation;

Step two, adaptively filtering the original digital information;

Step three, presenting recommendation information to a user;

In this step, a plurality of profiles are encoded and the recommended original digital information is conveyed to the user; the profiles comprise an adaptive content profile and an adaptive collaboration profile, and the original digital information is adaptively filtered in response to the dynamic information representation to generate the recommendation information;

Step four, receiving feedback from the user: the communication device receives feedback from the user in response to the proposed original digital information;

Step five, the adaptive information processing algorithm: the fed-back digital information is processed with the term frequency-inverse document frequency (TF-IDF) weighting method and the minimum description length (MDL) method, and the dynamic information representation is updated accordingly. The adaptive information processing algorithm includes reliability filtering of the original digital information in response to adaptive reliability, updating a reliability table in response to user feedback, and responding to the user feedback table;

In particular embodiments, the user feedback table can facilitate providing targeted recommendation information to a preference group: in response to user preference criteria, users who share the same feature information are grouped into a preference group, for example within the community to which the users belong, and recommendation information is then provided to that group.
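The grouping described above can be illustrated with a short Python sketch. It assumes a hypothetical schema in which each user has a dictionary of feature values, the "user preference criterion" is a single feature name, and an item is recommended to a group when at least `min_likes` members gave it positive feedback; none of these rules come from the method itself.

```python
from collections import defaultdict


def build_preference_groups(user_features, criterion):
    """Group users whose value for the `criterion` feature is identical."""
    groups = defaultdict(list)
    for user, feats in user_features.items():
        groups[feats.get(criterion)].append(user)
    return dict(groups)


def group_recommendations(feedback, members, min_likes=2):
    """Items with positive feedback from at least `min_likes` group members (assumed rule)."""
    counts = defaultdict(int)
    for user in members:
        for item, liked in feedback.get(user, {}).items():
            if liked:
                counts[item] += 1
    return sorted(item for item, n in counts.items() if n >= min_likes)
```

Here the user feedback table is modeled as `feedback[user][item] -> bool`; a real system would likely weight feedback rather than count it.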

In the above embodiment, the feature template includes a first feature information module, a second feature information module and a conversion module. The first and second feature information modules implement format conversion and protocol conversion, respectively, through the conversion module; the first feature information module extracts the format of the digital information, and the second feature information module extracts its protocol.

In a specific embodiment, data information in different formats is output through the first feature information module, and data information using different protocols is output through the second feature information module; at least one basic template can be extracted during application, and the training information is decoded with the respective feature modules, thereby realizing the data information representation.
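The feature template is described only at the level of modules. As an illustration, assuming each item is identified by a URL, that the file suffix stands for its format and the URL scheme for its protocol, and that the conversion module maps both onto a small hypothetical set of canonical names, the three modules might be sketched in Python as:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical canonical names used by the conversion module.
FORMAT_ALIASES = {".htm": "html", ".html": "html", ".json": "json", ".xml": "xml", ".txt": "text"}
PROTOCOL_ALIASES = {"http": "http", "https": "http", "ftp": "ftp", "mqtt": "mqtt"}


def extract_format(url):
    """First feature information module: infer the data format from the resource suffix."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return FORMAT_ALIASES.get(suffix, "unknown")


def extract_protocol(url):
    """Second feature information module: infer the transport protocol from the URL scheme."""
    return PROTOCOL_ALIASES.get(urlparse(url).scheme.lower(), "unknown")


def characterize(url):
    """Conversion module: merge both features into one normalized feature record."""
    return {"format": extract_format(url), "protocol": extract_protocol(url)}
```

Mapping `https` and `http` to the same canonical protocol is one example of the kind of normalization a conversion module could perform.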

The method for adaptively filtering the original digital information comprises the following steps:

(S21) outputting the data information stream through the network layer;

In a specific embodiment, the network layer may be the global internet; the information filtering device can also be used with other types of networks, for example an enterprise-wide network or "intranet" that uses a network layer. Through the network layer, user #1 can communicate with other users, such as user #2 and user #3, and with distributed network resources, such as distributed network resource #1 and resource #2. A computer storage device stores the profiles, including the adaptive content profile and the adaptive collaboration profile. Additional trend-tracking information may be stored in the information storage for later retrieval, or transmitted to the network layer, for example by user #2, for remote analysis;

(S22) coupling the extraction device to the network layer and receiving a data stream from the network layer;

In a specific embodiment, the extraction device can identify and extract the original digital information from the data stream, each item of which has information content. The extraction device uses the adaptive content filter and at least part of the adaptive content profile to analyze the data stream for the presence of original digital information, that is, those data entities whose content identifies them as being of interest in general, or of interest to the community coupled to the information filtering device. The extraction device can delete duplicate information, even when it comes from different sources, so that user resources are not wasted on processing and viewing duplicated and accumulated information. The extraction device may also use at least part of the community profile and the user profile of user #1 to determine whether the information content is relevant to the community to which user #1 belongs;

In a specific embodiment, the adaptive content filter of the extraction device may include a plurality of community filters and a plurality of user-side filters, each with its own community or user-side profile. As the extraction device filters the original digital information, it forwards, in response to the respective community and member profiles, the information predicted to suit a particular member client of a particular community, such as user #1, where it is converted into recommendation information. If desired, the extraction device may also include a trustworthiness filter that filters the original digital information according to a trustworthiness profile.
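The extraction device's duties in (S22), namely removing duplicates that arrive from different sources and then applying community, user-side and optional trustworthiness filters, can be sketched in Python. The sketch assumes that duplicates are detected by a SHA-256 hash of case- and whitespace-normalized content, that profiles are keyword sets, and that the trustworthiness profile is a set of allowed sources; these representations are illustrative, not part of the method.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    source: str
    content: str
    keywords: frozenset


def fingerprint(text):
    """SHA-256 of the content after collapsing whitespace and case (assumed normalization)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(items):
    """Keep the first copy of each distinct content, whatever source it came from."""
    seen, unique = set(), []
    for it in items:
        fp = fingerprint(it.content)
        if fp not in seen:
            seen.add(fp)
            unique.append(it)
    return unique


def extract_for_member(items, community_profile, user_profile, trusted_sources=None):
    """Deduplicate, then apply community, user-side and optional trustworthiness filters."""
    selected = [it for it in deduplicate(items) if it.keywords & community_profile]
    selected = [it for it in selected if it.keywords & user_profile]
    if trusted_sources is not None:
        selected = [it for it in selected if it.source in trusted_sources]
    return selected
```

Because the community filter runs before the user-side filter, each member only sees items that already passed the community's profile, mirroring the two-level filtering described above.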

In the above embodiment, the information filtering method includes:

In the above embodiment, the recommendation information is generated by an algorithm model.

In the above embodiment, the algorithm model works as follows:

(1) Set the number of clusters to $K$, assign $K$ a random initial value, and use K-means to compute, for each value of $K$, the distance between the cluster centers and the recommendation information;

(2) Calculate the trend of change of this value with the following formula:

(1)

(3) From the result of formula (1), determine the interval in which the value changes fastest, together with the trend of change, to obtain the range of values:

(2)

from which the value of $K$ is obtained;

In the above embodiments, the dynamic information representation is used to adaptively filter the original digital information and generate the recommendation information. In other embodiments, the acquired data information can be classified and an identifier used to determine the cluster type of the target recommendation data information, with different cluster types corresponding to different pre-constructed recommendation information generation models. The user identifier of the querying user is acquired, and the user interest information of that user is determined from it, the user interest information having been generated in advance from the user's recommendation information data. The user interest information and the query information of the querying user are then input into the recommendation information generation model corresponding to the cluster type to obtain the target recommendation information. In this way, the recommendation information is clustered in advance and a separate recommendation information generation model is built for each cluster type.
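The per-cluster routing in this embodiment can be sketched as follows. The sketch assumes a hypothetical "generation model" that filters a cluster's catalog by the query term and ranks hits by tag overlap with the user's precomputed interests; the catalog layout, the interest store and the function names are all illustrative.

```python
def make_model(catalog):
    """Hypothetical generation model for one cluster type; catalog maps item -> set of tags."""
    def model(interests, query):
        hits = [name for name, tags in catalog.items() if query in tags]
        # Items sharing more tags with the user's interests rank first; ties are broken by name.
        return sorted(hits, key=lambda n: (-len(catalog[n] & interests), n))
    return model


def route_and_recommend(cluster_type, user_id, query, models, interest_store):
    """Pick the model built for this cluster type and feed it the user's stored interests."""
    model = models[cluster_type]
    interests = interest_store.get(user_id, set())  # generated in advance from the user's data
    return model(interests, query)
```

Keeping one model per cluster type means each model only scans its own cluster's catalog, which is the practical benefit of clustering the recommendation information in advance.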

In a further embodiment, a weight value is set for each of a plurality of recommendation algorithms; multiple groups of objects to be recommended, output by these algorithms from the recommendation information data, are acquired, the recommendation information data including the types of the objects and the number of times the user has interacted with data information; and recommendation information is generated for the user from the weight values and the groups of objects to be recommended.
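A common way to combine several recommenders with per-algorithm weights is a weighted sum of their scores. The Python sketch below assumes each algorithm returns a dictionary of object scores and that the top `top_n` objects by combined score are recommended; the score scale and `top_n` are assumptions, not details from this embodiment.

```python
from collections import defaultdict


def weighted_hybrid(candidate_lists, weights, top_n=3):
    """Merge candidate lists from several recommendation algorithms by weighted score sums."""
    if len(candidate_lists) != len(weights):
        raise ValueError("one weight per recommendation algorithm is required")
    scores = defaultdict(float)
    for candidates, w in zip(candidate_lists, weights):
        for obj, score in candidates.items():
            scores[obj] += w * score
    # Highest combined score first; ties broken by object identifier for determinism.
    return sorted(scores, key=lambda o: (-scores[o], o))[:top_n]
```

Shifting weight from the first algorithm to the second reorders the output, which is how the weight values steer the final recommendation.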

In the above embodiment, the adaptive information processing algorithm includes the following steps:

Step 1: each piece of original digital information is processed with term frequency-inverse document frequency (TF-IDF) into a normalized vector whose length is between 20,000 and 100,000 tokens. First, the more often a token $t$ occurs in a digital information document $d$, the more closely $t$ is related to the subject of $d$; second, the more digital information documents $t$ occurs in across the collection, the worse it distinguishes document $d$. For a given document, the two terms can be combined into a weight by multiplying the term frequency of each token by its inverse document frequency, so the weight of token $t$ in document $d$ is:

$$w(t,d) = \mathrm{tf}(t,d)\cdot\log\frac{|D|}{\left|\{d' \in D : t \in d'\}\right|} \tag{3}$$

In formula (3), $w(t,d)$ denotes the weight and $D$ denotes the set of all digital information documents. Because a digital information document $d$ typically contains only a small portion of the total vocabulary, the words that appear carry far more meaning than those that do not; to emphasize the stronger information content of the words that appear, the similarity between vectors is measured by the cosine of the angle between them;

Step 2: the digital information in each category is converted into TF-IDF vectors, normalized to unit length, and then averaged to obtain a prototype vector for that category; the advantages of this approach include faster computation and a more compact representation. To classify a new digital information document, the document is compared with each prototype vector and a predicted rating is given based on its cosine similarity to each category; in this step, linear regression may be used to convert the results of the classification process into continuous values;

Step 3: the minimum description length (MDL) principle is used to minimize the description length of the entire data set.

It measures the quality and the cost of the predicted user recommendation information data set, optimizing both and balancing the quality of the predicted data set against the complexity cost of achieving that quality.

The rule function is:

$$H^{*} = \arg\max_{H} P(H \mid D) = \arg\max_{H} \frac{P(D \mid H)\,P(H)}{P(D)} \tag{4}$$

In formula (4), $H$ denotes the quality of the user recommendation information data set, $D$ denotes all of the recommendation information data, and $P(D)$ denotes the probability function of all recommendation information data.

According to Bayes' rule, maximizing $P(H \mid D)$ is equivalent to maximizing $P(D \mid H)\,P(H)/P(D)$. Because $P(D)$ is essentially independent of $H$, it is sufficient to maximize $P(D \mid H)\,P(H)$, or equivalently to minimize $-\log_2 P(D \mid H) - \log_2 P(H)$. The interpretation of this expression is that, to find the most probable hypothesis for the given data, one should find the hypothesis that minimizes the total code length, which equals the number of bits required to encode the hypothesis plus the number of bits required to encode the data given that hypothesis. Given a document $d$ with token vector $\mathbf{t}_d$ (the non-zero unique tokens contained in the digital information) and training data $D$, the most probable category $c$ for $d$ is the one that minimizes the number of bits required to encode $\mathbf{t}_d$.

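Assuming that the tokens are independent given the document length $l_d$ and the category $c$, the probability of the document is the product of the probabilities of the individual tokens:

$$P(\mathbf{t}_d \mid l_d, c) = \prod_{i} P(t_i \mid l_d, c) \tag{5}$$

In formula (5), each token $t_i$ takes a binary value; if $t_i$ follows a simple binomial distribution, its probability does not depend on the document length $l_d$, and the function is expressed as:

$$P(t_i \mid l_d, c) = P(t_i \mid c) = \theta_{i,c}^{\,t_i}\,(1-\theta_{i,c})^{1-t_i} \tag{6}$$

In formula (6), $\theta_{i,c}$ is the probability that token $t_i$ is present in a document of category $c$; assuming that each token has such a distribution function, the total code length is:

$$B(d, c) = -\log_2 P(\mathbf{t}_d \mid l_d, c) = -\sum_{i} \log_2 P(t_i \mid c) \tag{7}$$

In formula (7), $B(d,c)$ represents the total bits of the predicted user recommendation information data set.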

The total bits are larger than the complexity cost of the additional category-specific parameters of the user recommendation information. With this probability model, the bits required for an actual digital information document can be computed quickly, which addresses the low computational efficiency of processing massive digital information.
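Step 2 of the adaptive information processing algorithm (unit-normalized TF-IDF vectors averaged into per-category prototypes, cosine comparison of a new document with each prototype, and an optional linear regression that turns a similarity into a continuous rating) can be sketched in Python. Vectors are assumed to be sparse dictionaries from token to weight, and the regression is ordinary least squares on a single feature; the training pairs for the regression would come from past user ratings, which is an assumption here.

```python
import math


def unit(vec):
    """Scale a sparse vector (token -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def prototype(vectors):
    """Average of the unit-normalized vectors of one category."""
    total = {}
    for v in map(unit, vectors):
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def class_scores(doc_vec, prototypes):
    """Cosine similarity of a new document to every category prototype."""
    return {c: cosine(doc_vec, p) for c, p in prototypes.items()}


def fit_line(xs, ys):
    """Ordinary least squares for rating = a * similarity + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # no spread in similarity: predict the mean rating
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict_rating(similarity, coef):
    a, b = coef
    return a * similarity + b
```

The predicted category is `max(scores, key=scores.get)`, and `predict_rating` maps the winning similarity onto the rating scale learned by `fit_line`.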

TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. TF stands for term frequency and IDF for inverse document frequency. TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus: the importance of a word increases in proportion to the number of times it appears in the document, but decreases as its frequency across the corpus increases. Search engines often apply various forms of TF-IDF weighting as a measure or rating of the relevance between a document and a user query. In addition to TF-IDF, internet search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.

The main idea of TF-IDF is that if a word or phrase appears frequently in one article (high TF) and rarely in other articles, it is considered to have good category-discriminating ability and to be suitable for classification. TF-IDF is in fact the product TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. TF represents the frequency with which a term appears in document $d$. The main idea of IDF is that the fewer documents contain the term $t$, that is, the smaller $n$ is, the larger the IDF and the better $t$ distinguishes between categories.

However, suppose that $m$ documents of a certain class $C$ contain the term $t$ and $k$ documents of other classes contain it, so that the total number of documents containing $t$ is $n = m + k$. When $m$ is large, $n$ is also large, and the IDF formula yields a small value, suggesting that $t$ discriminates poorly between categories. In practice, though, a term that appears frequently in the documents of one class represents the characteristics of that class well; such terms should be given a higher weight and selected as characteristic words of that class to distinguish its documents from those of other classes.

In a given document, the term frequency (TF) is the frequency with which a given term appears in that document. It normalizes the raw term count to prevent a bias toward long documents, since the same word may have a higher count in a long document than in a short one regardless of whether it is important. For a term $t$ in a particular document $d$, its importance can be expressed as:

$$\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$$

where the numerator $f_{t,d}$ is the number of occurrences of the term in the document and the denominator is the total number of occurrences of all terms in the document.
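The normalized term frequency above and the weight of formula (3) can be combined into a short Python sketch. It assumes whitespace-tokenized documents given as lists of tokens and uses the natural logarithm for IDF; the log base is an assumption, since the source does not state it, and only rescales the weights.

```python
import math
from collections import Counter


def tf(tokens):
    """Normalized term frequency: occurrences of t divided by all token occurrences in the document."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {t: c / total for t, c in counts.items()}


def idf(corpus):
    """log(|D| / number of documents containing t), using the natural log (assumed base)."""
    n_docs = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log(n_docs / n) for t, n in df.items()}


def tfidf(corpus):
    """One sparse TF-IDF vector (token -> weight) per document."""
    weights = idf(corpus)
    return [{t: f * weights[t] for t, f in tf(doc).items()} for doc in corpus]
```

A term that occurs in every document gets IDF log(1) = 0, so its weight vanishes however often it appears, which is exactly the discrimination argument made above.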

This method can improve the recommendation capability.

Although specific embodiments of the invention have been described herein, it will be understood by those skilled in the art that these embodiments are merely illustrative and that various omissions, substitutions and changes in the form and details of the methods and systems described may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the methods described above to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims

1. A recommendation method for massive digital information, characterized in that the method comprises the following steps:

step one, providing a dynamic information representation;

step two, adaptively filtering the original digital information;

step three, presenting recommendation information to a user;

in this step, a plurality of profiles are encoded and the recommended original digital information is conveyed to the user; the profiles comprise an adaptive content profile and an adaptive collaboration profile, and the original digital information is adaptively filtered in response to the dynamic information representation to generate the recommendation information;

2. The recommendation method for massive digital information according to claim 1, characterized in that the feature template comprises a first feature information module, a second feature information module and a conversion module, wherein the first and second feature information modules implement format conversion and protocol conversion, respectively, through the conversion module, the first feature information module extracts the format of the digital information, and the second feature information module extracts the protocol of the data information.

3. The recommendation method for massive digital information according to claim 1, characterized in that adaptively filtering the original digital information comprises the following steps:

(S21) outputting the data information stream through the network layer;

(S23) adaptively filtering, by the extraction device, the original digital information to be transmitted to the user, and converting the filtered data information into recommendation information through the communication device.

4. The recommendation method of massive digital information according to claim 3, wherein: the information filtering method comprises the following steps:

5. The recommendation method for massive digital information according to claim 1, characterized in that the recommendation information is generated by an algorithm model.

6. The recommendation method for massive digital information according to claim 5, characterized in that

the algorithm model works as follows:

(1) setting the number of clusters to $K$, assigning $K$ a random initial value, and computing with K-means, for each value of $K$, the distance between the cluster centers and the recommendation information;

(2) calculating the trend of change of this value with the following formula:

(1)

(3) determining, from the result of formula (1), the interval in which the value changes fastest, together with the trend of change, to obtain the range of values:

(2)

from which the value of $K$ is obtained;

7. The recommendation method for massive digital information according to claim 1, characterized in that the adaptive information processing algorithm comprises the following steps:

step 1: using term frequency-inverse document frequency;

processing each piece of original digital information into a normalized vector whose length is between 20,000 and 100,000 tokens; assuming that the more often a token $t$ occurs in a digital information document $d$, the more closely $t$ is related to the subject of $d$, and that the more digital information documents $t$ occurs in across the whole collection, the worse it distinguishes document $d$; and multiplying the term frequency of each token by its inverse document frequency to combine the two into a weight, the weight of token $t$ in document $d$ being:

$$w(t,d) = \mathrm{tf}(t,d)\cdot\log\frac{|D|}{\left|\{d' \in D : t \in d'\}\right|} \tag{3}$$

in formula (3), $w(t,d)$ denotes the weight and $D$ denotes the set of all digital information documents;

step 2: converting the digital information in each category into TF-IDF vectors, normalizing them to unit length, and then averaging them to obtain a prototype vector for that category;

step 3: using the minimum description length;

minimizing the description length of the entire data set in order to measure the quality and cost of the predicted user recommendation information data set, the rule function being:

$$H^{*} = \arg\max_{H} P(H \mid D) = \arg\max_{H} \frac{P(D \mid H)\,P(H)}{P(D)} \tag{4}$$

in formula (4), $H$ denotes the quality of the user recommendation information data set, $D$ denotes all of the recommendation information data, and $P(D)$ denotes the probability function of all recommendation information data;

step 4: assuming data independence given the document length $l_d$ and the category $c$, so that the probability of the user recommendation information data set cost is the product of the probabilities of the individual tokens, the probability function of the user recommendation information data set cost is: