
CN115098788A - Recommendation method for massive digital information - Google Patents


Info

Publication number
CN115098788A
Authority
CN
China
Prior art keywords
information
recommendation
data
digital information
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210879288.9A
Other languages
Chinese (zh)
Inventor
廖秋平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shengwang Information Technology Co ltd
Original Assignee
Guangzhou Shengwang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shengwang Information Technology Co ltd
Priority to CN202210879288.9A
Publication of CN115098788A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9536 Search customisation based on social or collaborative filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation method for massive digital information, which relates to the technical field of data information identification and addresses the problem of recommending data information. The method comprises the following steps: step one, providing a dynamic information representation; step two, adaptively filtering original digital information; step three, presenting recommendation information to a user; step four, receiving feedback from the user; and step five, applying an adaptive information processing algorithm. Through the adaptive information processing algorithm, original digital information can be continuously and adaptively filtered within dynamic data information, and the recommendation method improves both the usability of the data information and the data processing efficiency. The method can dynamically recommend different data information to the user and classify, calculate and diagnose the recommended data information; by means of function calculations it improves the recommendation efficiency for different data information and greatly improves the information identification capability.

Description

Recommendation method for massive digital information
Technical Field
The invention relates to the technical field of data information identification, in particular to a recommendation method for massive digital information.
Background
With the development of information technology, especially information and communication technology, the Internet, social networks, the Internet of Things, the mobile Internet, cloud computing and the like have steadily entered people's daily work and life, and the global volume of data information has grown explosively. Enterprises' need to process large amounts of data has driven the development of big-data technology. Big data has accelerated the development of China's new-media industry, breaking down the boundaries between traditional media such as television, radio and newspapers, between countries, communities and industries, and between information senders and receivers.
Within massive data information, the ability to acquire information quickly and effectively determines the quality of the user's experience. In the conventional technology, users facing massive digital information obtain it through hot-post recommendations and links on portal sites, which cannot deliver information recommendations quickly. Other prior-art approaches improve the recommendation of massive digital information by means of search engines, but these cannot support users in autonomously discovering and being recommended information within massive data, and the data processing involved is difficult.
Disclosure of Invention
To address these technical defects, the invention discloses a recommendation method for massive digital information that continuously and adaptively filters original digital information within dynamic data information, thereby improving both the usability of the data information and the data processing efficiency.
The invention adopts the following technical scheme:
a recommendation method for massive digital information comprises the following steps:
step one, providing a dynamic information representation;
this step inputs massive digital information and provides dynamic information characteristics through a feature template;
step two, adaptively filtering original digital information;
this step adaptively screens unprocessed original digital information to be recommended to communities and users; the information filtering device includes an extracting device that can be coupled with the network layer, receive the data stream from the network layer, and identify and extract the original digital information from the data stream;
step three, presenting recommendation information to the user;
this step conveys the recommended original digital information to the user by means of a plurality of encoded profiles (configuration files); the profiles include an adaptive content profile and an adaptive collaboration profile, and the original digital information is adaptively filtered in response to the dynamic information representation, thereby generating the recommendation information;
step four, receiving feedback from the user; the communication device receives the user's feedback on the recommended raw digital information;
step five, applying the adaptive information processing algorithm; the fed-back digital information is processed by the term frequency-inverse document frequency (TF-IDF) weighting method and the minimum description length (MDL) method, and the dynamic information representation is updated accordingly; the adaptive information processing algorithm includes credibility filtering of the original digital information in response to adaptive credibility, updating the credibility table in response to user feedback, and responding to the user feedback table.
As a further technical solution of the present invention, the feature template includes a first feature information module, a second feature information module, and a conversion module, wherein the first feature information module and the second feature information module respectively implement format information conversion and protocol conversion through the conversion module, the first feature information module extracts a format of the digital information, and the second feature information module extracts a protocol of the data information.
As a further technical scheme of the invention, the method for adaptively filtering the original digital information comprises the following steps:
(S21) outputting the data information stream through the network layer;
(S22) coupling with the network layer through the extracting means and receiving the data stream from the network layer;
(S23) adaptively filtering the original digital information by the extracting means, and converting the filtered data information into recommendation information through the communication means for transmission to the user;
As a further technical scheme of the invention, the information filtering method comprises the following steps:
coupling an extraction device to the network layer and receiving the data stream from the network layer, the extraction device identifying and extracting the original digital information from the data stream; and setting a data information communication protocol, through which different frequency bands are selected for the original data information so that the data information in the set frequency bands is filtered.
As a further technical scheme of the invention, the recommendation information is generated by a clustering algorithm model.
As a further technical solution of the present invention, the working method of the algorithm model comprises the following steps:
(1) the number of clusters is set to K and K is assigned random values; for each K value, K-means is used to calculate the distance between the clustering result and the recommendation information;
(2) the change trend of this distance value is calculated as follows:
[Formula (1): image not available]
(3) from the result of step (1), the interval in which the distance value changes fastest is determined from the change trend and the distance between each recommended value and its cluster centre, giving the range of K values:
[Formula (2): image not available]
(4) the interval is narrowed by bisection to finally obtain the K value;
(5) when the recommendation information falls within the range given by formula (2), it is recommended; otherwise, it is not recommended.
As a further technical scheme of the invention, the adaptive information processing algorithm comprises the following steps:
step 1: term frequency-inverse document frequency (TF-IDF) weighting is used;
each piece of raw digital information is processed into a normalized vector whose length is between 20000 and 100000 tokens. For a token t: first, the more times t occurs in a digital information document d, the more relevant t is to the subject matter of d; second, the more digital information documents t occurs in, the worse t discriminates between documents. For a given document, the two terms are combined into a weight by multiplying the term frequency of each token by its inverse document frequency, so the weight of token t in document d is:
$$ w(t,d) = \mathrm{tf}(t,d)\cdot\log\frac{|N|}{\mathrm{df}(t)} \qquad (3) $$
in formula (3), w(t,d) represents the weight, tf(t,d) is the number of occurrences of t in d, N represents the set of all digital information documents, and df(t) is the number of documents in N that contain t;
step 2: the digital information in each category is converted into TF-IDF vectors, which are normalized to unit length and then averaged to obtain a prototype vector for that category;
step 3: the minimum description length (MDL) principle is adopted;
the description length of the entire data set is minimized, and MDL is used to measure the quality and cost of the predicted user recommendation information data set. Bayes' rule is:
$$ P(H\mid D) = \frac{P(D\mid H)\,P(H)}{P(D)} \qquad (4) $$
in formula (4), H denotes the quality of the user recommendation information data set, D denotes all of the recommendation information data, P(D) is the probability function of all recommendation information data, P(H) is the probability function of the quality of the user recommendation information data sets, P(H|D) is the probability of the predicted quality of the user recommendation data set given all recommendation information data, and P(D|H) is the probability of the recommendation information data given that quality;
step 4: given the information length l_d and the category c, and assuming that the data are independent, so that the probability of the cost of the user recommendation information data set is the product of the probabilities of the individual tokens, the probability function output of the cost of the user recommendation information data set is:
$$ P(T_d \mid l_d, c) = \prod_{i} P(t_i \mid l_d, c) \qquad (5) $$
in equation (5), each token t_i takes a binary value (present or absent); if the distribution of t_i is a simple binomial distribution, then, independently of the length of document d, the function of token t_i is expressed as:
[Formula (6): image not available]
in equation (6), assuming that a distribution function exists for each token t_i in the massive data information, the function for predicting the user recommendation information data set is:
[Formula (7): image not available]
in formula (7), the result represents the total bits of the predicted user recommendation information data set.
The invention differs from the conventional technology in that it can dynamically recommend different data information to the user, classify, calculate and diagnose the recommended data information, improve the recommendation efficiency for different data information by means of function calculations, and greatly improve the information identification capability, which benefits the processing of massive digital information.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive labor, wherein:
FIG. 1 is a schematic flow chart of a recommendation method of the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of adaptively filtering original digital information according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
As shown in fig. 1 and fig. 2, a method for recommending massive digital information includes:
step one, providing a dynamic information representation;
this step inputs massive digital information and provides dynamic information characteristics through a feature template;
step two, adaptively filtering original digital information;
this step adaptively screens unprocessed original digital information to be recommended to communities and users; the information filtering device includes an extracting device that can be coupled with the network layer, receive the data stream from the network layer, and identify and extract the original digital information from the data stream;
step three, presenting recommendation information to the user;
this step conveys the recommended raw digital information to the user by means of a plurality of encoded profiles; the profiles comprise an adaptive content profile and an adaptive collaboration profile, and the original digital information is adaptively filtered in response to the dynamic information representation, thereby generating the recommendation information;
step four, receiving feedback from the user; the communication device receives the user's feedback on the recommended raw digital information;
step five, applying the adaptive information processing algorithm; the fed-back digital information is processed by the TF-IDF weighting method and the MDL method, and the dynamic information representation is updated accordingly; the adaptive information processing algorithm includes reliability filtering of the original digital information in response to adaptive reliability, updating the reliability table in response to user feedback, and responding to the user feedback table;
in particular embodiments, a user feedback table may help provide targeted recommendation information to a preference group: users whose feature information matches are grouped into the same preference group according to user preference criteria, such as the preference criteria of the community to which the user belongs.
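The grouping idea can be illustrated with a minimal sketch. It assumes that each user's feature information is a dictionary and that a "preference criterion" is simply a tuple of feature names whose values must match; the function names `group_by_preference` and `targeted_recommendations` and the feature keys are hypothetical and do not come from the source.

```python
from collections import defaultdict

def group_by_preference(user_features, criteria):
    """Group users whose features agree on every criterion key.

    user_features: dict user id -> dict feature name -> value (hypothetical layout).
    criteria: tuple of feature names defining a preference group (assumed).
    Returns dict mapping the tuple of criterion values -> sorted user ids.
    """
    groups = defaultdict(list)
    for user_id, features in user_features.items():
        key = tuple(features.get(name) for name in criteria)
        groups[key].append(user_id)
    return {key: sorted(members) for key, members in groups.items()}

def targeted_recommendations(groups, group_items):
    """Give every member of a preference group the items chosen for that group."""
    result = {}
    for key, members in groups.items():
        for user_id in members:
            result[user_id] = list(group_items.get(key, []))
    return result
```

With users `user1` and `user2` sharing `{"topic": "sports", "lang": "en"}`, both land in the `("sports", "en")` group and receive the same group-level recommendations.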
In the above embodiment, the feature template includes a first feature information module, a second feature information module, and a conversion module, where the first feature information module and the second feature information module respectively implement format information conversion and protocol conversion through the conversion module, the first feature information module extracts a format of the digital information, and the second feature information module extracts a protocol of the data information.
In a specific embodiment, data information in different formats is output through the first feature information module, and data information using different protocols is output through the second feature information module; at least one basic template can be extracted during application, and the training information is decoded by the respective feature modules, thereby realizing the data information representation.
The method for adaptively filtering the original digital information comprises the following steps:
(S21) outputting the data information stream through the network layer;
in particular embodiments, the network layer may be the global Internet; the information filtering apparatus may also be used with other types of networks, for example enterprise-wide networks or "intranets" that use a network layer, in which user #1 may communicate with other users, such as user #2 and user #3, and with distributed network resources, such as distributed network resource #1 and resource #2. A computer storage means is also included for storing profiles, including the adaptive content profile and the adaptive collaboration profile; additional trend-tracking information may be stored in the information storage for later retrieval, or transmitted to the network layer, for example by user #2, for remote analysis;
(S22) coupling with and receiving a data stream from the network layer through the extracting means;
in particular embodiments, the extraction device may identify and extract raw digital information from the data stream, each item of which has information content; the extraction means uses the adaptive content filter and at least part of the adaptive content profile to analyze the data stream for the presence of raw digital information, that is, those data entities whose content identifies them as being of interest in general or of interest to the community coupled to the information filtering device; the extraction means can delete duplicate information, even when it comes from different sources, so that user resources are not wasted on processing and viewing duplicated and accumulated information; the extraction means may also use at least part of the community profile and the user profile of user #1 to determine whether the information content is relevant to the community to which user #1 belongs;
(S23) adaptively filtering the original digital information by the extracting means, and converting the filtered data information into recommendation information through the communication means for transmission to the user;
in a specific embodiment, to convert the information into recommendation information for user #1, the adaptive content filter of the extraction device may include a plurality of community filters and a plurality of user-side filters, each with its own community profile or user-side profile; as the raw digital information is filtered by the extraction means, the information predicted to suit a particular member client of a particular community, such as user #1, is communicated to that client in accordance with the respective community and member-client profiles; if desired, the extraction device may also include a trustworthiness filter that enables it to filter the raw digital information by trustworthiness according to a trustworthiness profile.
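The extraction steps (S21) to (S23) can be sketched as a small filtering pipeline. This is only an illustration under stated assumptions: the community and user-side filters are passed in as plain predicates, duplicates are detected by hashing case- and whitespace-normalised text, and the trustworthiness profile is modelled as a per-source score moved toward user feedback by an exponential moving average. The names `fingerprint`, `TrustTable` and `extract_for_member`, the default score of 0.5, the update rate of 0.2 and the threshold of 0.4 are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass, field

def fingerprint(text):
    """Hash of case- and whitespace-normalised content (assumed normalisation)."""
    normalised = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

@dataclass
class TrustTable:
    """Hypothetical per-source trustworthiness scores in [0, 1]."""
    scores: dict = field(default_factory=dict)
    default: float = 0.5
    rate: float = 0.2  # assumed weight of each new feedback signal

    def score(self, source):
        return self.scores.get(source, self.default)

    def update(self, source, feedback):
        """Move a source's score toward feedback (1.0 useful, 0.0 not useful)."""
        self.scores[source] = (1 - self.rate) * self.score(source) + self.rate * feedback

def extract_for_member(stream, community_filter, member_filter, trust, min_trust=0.4):
    """Filter (source, text) items for one community member, dropping duplicates."""
    seen = set()
    selected = []
    for source, text in stream:
        if not (community_filter(text) and member_filter(text)):
            continue  # community filter, then user-side filter
        if trust.score(source) < min_trust:
            continue  # trustworthiness filter
        fp = fingerprint(text)
        if fp in seen:
            continue  # same content already taken from another source
        seen.add(fp)
        selected.append((source, text))
    return selected
```

In this sketch "Rates rise" from one site and "rates   rise " from another share a fingerprint, so only the first copy is delivered; a source whose score has fallen below the threshold after negative feedback is dropped entirely.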
In the above embodiment, the information filtering method includes:
coupling an extraction device to the network layer and receiving the data stream from the network layer, the extraction device identifying and extracting the original digital information from the data stream; setting a data information communication protocol, and selecting different frequency bands for the original data information through the data information communication protocol so as to filter the data information in the set frequency bands.
In the above embodiment, the recommendation information is generated by a clustering algorithm model.
In the above-described embodiment of the present invention, the working method of the algorithm model comprises the following steps:
(1) the number of clusters is set to K and K is assigned random values; for each K value, K-means is used to calculate the distance between the clustering result and the recommendation information;
(2) the change trend of this distance value is calculated as follows:
[Formula (1): image not available]
(3) from the result of step (1), the interval in which the distance value changes fastest is determined from the change trend and the distance between each recommended value and its cluster centre, giving the range of K values:
[Formula (2): image not available]
(4) the interval is narrowed by bisection to finally obtain the K value;
(5) when the recommendation information falls within the range given by formula (2), it is recommended; otherwise, it is not recommended.
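Because formulas (1) and (2) are not recoverable, the following is only a sketch of one plausible reading of steps (3) to (5). It assumes that `distance(K)` returns the mean distance between the recommended values and their nearest cluster centre after running K-means with K clusters (the K-means run itself is not shown), and that bisection keeps whichever half of the interval shows the steeper average decrease. The names `select_k` and `should_recommend`, and the fixed `radius` standing in for the range of formula (2), are hypothetical.

```python
def select_k(distance, k_low, k_high):
    """Bisection search for the K after which the distance curve stops falling fast.

    distance: callable K -> mean distance to cluster centres (assumed to come
    from a K-means run for that K). At each step the interval is halved and the
    half with the steeper average decrease is kept (an assumed reading of
    steps (3) and (4)).
    """
    lo, hi = k_low, k_high
    cache = {}

    def d(k):
        # Each evaluation may mean a full K-means run, so reuse results.
        if k not in cache:
            cache[k] = distance(k)
        return cache[k]

    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_rate = (d(lo) - d(mid)) / (mid - lo)
        right_rate = (d(mid) - d(hi)) / (hi - mid)
        if left_rate >= right_rate:
            hi = mid
        else:
            lo = mid
    return hi

def should_recommend(distance_to_centre, radius):
    """Step (5): recommend only items within an assumed radius of their centre."""
    return distance_to_centre <= radius
```

For a distance curve that drops sharply between K = 3 and K = 4 and then flattens, the search narrows [1, 8] to [1, 4], then [2, 4], then [3, 4], and returns 4.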
In the above embodiments, the original digital information is adaptively filtered in response to the dynamic information representation to generate the recommendation information. In other embodiments, the obtained data information can be classified and its classification identifier used to determine the cluster type of the target recommendation data information, with different cluster types corresponding to different pre-constructed recommendation information generation models; the user identifier of a querying user is acquired, and the user interest information of that user is determined from the identifier, the user interest information having been generated in advance from the user's recommendation information data; the user interest information and the query information of the querying user are then input into the recommendation information generation model corresponding to the cluster type to obtain the target recommendation information. The recommendation information is clustered in advance, and a separate recommendation information generation model is constructed for each cluster type.
In a further embodiment, a plurality of weight values is set for a plurality of recommendation algorithms; the groups of objects to be recommended that the algorithms output from the recommendation information data are acquired, where the recommendation information data comprises the types of the objects and the number of times the user has interacted with the data information; and recommendation information for the user is generated from the weight values and the groups of objects to be recommended.
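The weighted combination of several recommendation algorithms can be sketched as a linear score fusion. This assumes that each algorithm returns a score per candidate object and that the final score is the weighted sum; the source does not say how the weights are combined, and the function name `blend_recommendations` and the `top_n` cut-off are hypothetical.

```python
from collections import defaultdict

def blend_recommendations(candidate_lists, weights, top_n=3):
    """Combine several recommenders' outputs by weighted score.

    candidate_lists: one dict per algorithm mapping object id -> score.
    weights: one weight per algorithm (the "plurality of weight values").
    Returns the top_n object ids by combined score; ties broken by id.
    """
    if len(candidate_lists) != len(weights):
        raise ValueError("one weight is needed per recommendation algorithm")
    combined = defaultdict(float)
    for scores, weight in zip(candidate_lists, weights):
        for obj, score in scores.items():
            combined[obj] += weight * score
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [obj for obj, _ in ranked[:top_n]]
```

For example, with a popularity-style list `{"a": 0.9, "b": 0.5, "c": 0.1}` weighted 0.3 and a collaborative-style list `{"b": 0.8, "c": 0.9, "d": 0.4}` weighted 0.7, the combined scores are roughly b 0.71, c 0.66, d 0.28 and a 0.27, so the top three are b, c and d.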
In the above embodiment, the adaptive information processing algorithm includes the steps of:
step 1: each piece of original digital information is processed with term frequency-inverse document frequency (TF-IDF) into a normalized vector whose length is between 20000 and 100000 tokens. For a token t: first, the more times t occurs in a digital information document d, the more relevant t is to the subject matter of d; second, the more digital information documents t occurs in, the worse t discriminates between documents. For a given document, the two terms are combined into a weight by multiplying the term frequency of each token by its inverse document frequency, so the weight of token t in document d is:
$$ w(t,d) = \mathrm{tf}(t,d)\cdot\log\frac{|N|}{\mathrm{df}(t)} \qquad (3) $$
in formula (3), w(t,d) represents the weight, tf(t,d) is the number of occurrences of t in d, N represents the set of all digital information documents, and df(t) is the number of documents in N that contain t. Since a digital information document d typically contains only a small portion of the total vocabulary, the words that appear in it carry much more meaning than the words that do not; to emphasize the stronger information content of the words that appear, the similarity between two vectors is measured by the cosine of the angle between them;
step 2: the digital information in each category is converted into TF-IDF vectors, which are normalized to unit length and then averaged to obtain a prototype vector for that category; the advantages of this approach include faster computation and a more compact representation. To classify a new digital information document, the document is compared with each prototype vector, and a predicted rating is given based on the cosine similarity to each rating category; in this step, linear regression may be used to convert the results of the classification process into continuous values;
step 3: the minimum description length (MDL) principle is used to minimize the description length of the entire data set; MDL measures the quality of the predicted user recommendation information data set and its cost, optimizing the balance between that quality and the complexity cost of achieving it. Bayes' rule is:
$$ P(H\mid D) = \frac{P(D\mid H)\,P(H)}{P(D)} \qquad (4) $$
in formula (4), H denotes the quality of the user recommendation information data set, D denotes all of the recommendation information data, P(D) is the probability function of all recommendation information data, P(H) is the probability function of the quality of the user recommendation information data sets, P(H|D) is the probability of the predicted quality of the user recommendation data set given all recommendation information data, and P(D|H) is the probability of the recommendation information data given that quality;
according to Bayes' rule, minimizing the description length is equivalent to maximizing P(H|D). Because P(D) is essentially independent of H, it suffices to maximize P(D|H)P(H), or equivalently to minimize
$$ -\log_2 P(D\mid H) - \log_2 P(H). $$
This expression means that the most likely hypothesis for given data is the one that minimizes the total code length, where the code length equals the number of bits required to encode the hypothesis plus the number of bits required to encode the data given that hypothesis. Given a document d with token vector T_d (whose non-zero entries are the unique tokens contained in the digital information) and training data D_train, the most probable category c of d is the one that minimizes the number of bits required to encode T_d together with c;
step 4: given the information length l_d and the category c, and assuming that the data are independent, so that the probability of the cost of the user recommendation information data set is the product of the probabilities of the individual tokens, the probability function output of the cost of the user recommendation information data set is:
$$ P(T_d \mid l_d, c) = \prod_{i} P(t_i \mid l_d, c) \qquad (5) $$
in equation (5), each token t_i takes a binary value (present or absent); if the distribution of t_i is a simple binomial distribution, then, independently of the length of document d, the function of token t_i is expressed as:
[Formula (6): image not available]
in equation (6), assuming that a distribution function exists for each token t_i in the massive data information, the function for predicting the user recommendation information data set is:
[Formula (7): image not available]
in formula (7), the result represents the total bits of the predicted user recommendation information data set.
The total bits are weighed against the complexity cost of the additional category-specific parameters of the user recommendation information; in this way the probability model quickly calculates the bits required by an actual digital information document, which addresses the low computational efficiency of processing massive digital information.
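The token-independence model of step 4 can be sketched in Python as follows, assuming that each binary token follows a Bernoulli distribution with a per-category parameter `theta` (a hypothetical parameter that would have to be estimated elsewhere, for example from training data). `total_bits` sums $-\log_2$ of the per-token probabilities, which turns the product of token probabilities into a bit count, and `most_probable_category` picks the category with the shortest encoding, optionally adding the bits for the category itself. All names and numbers are illustrative assumptions.

```python
import math


def total_bits(doc_tokens, theta, vocabulary):
    """Bits needed to encode a binary token vector under one category model.

    doc_tokens: set of tokens present in the document (t_{i,d} = 1).
    theta:      dict token -> probability that the token occurs in a document
                of this category; values must lie strictly between 0 and 1.
    vocabulary: all tokens considered, present or absent.
    """
    bits = 0.0
    for token in vocabulary:
        p = theta[token]
        # Bernoulli probability: p if the token is present, 1 - p if absent
        prob = p if token in doc_tokens else 1.0 - p
        bits -= math.log2(prob)  # product of probabilities -> sum of bits
    return bits


def most_probable_category(doc_tokens, models, vocabulary, priors=None):
    """Category whose encoding (plus the bits for the category itself) is shortest."""
    def cost(c):
        prior_bits = -math.log2(priors[c]) if priors else 0.0
        return prior_bits + total_bits(doc_tokens, models[c], vocabulary)
    return min(models, key=cost)


vocabulary = ["goal", "match", "stock", "market"]
models = {  # hypothetical per-category token probabilities
    "sports": {"goal": 0.8, "match": 0.7, "stock": 0.1, "market": 0.2},
    "finance": {"goal": 0.1, "match": 0.2, "stock": 0.8, "market": 0.7},
}
print(most_probable_category({"stock", "market"}, models, vocabulary))  # finance
```

Working in bits rather than raw probabilities avoids floating-point underflow when the vocabulary contains tens of thousands of tokens.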
TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining, where TF stands for term frequency and IDF for inverse document frequency. TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus: the importance of a word increases in proportion to the number of times it appears in the document, but decreases as its frequency across the corpus increases. Search engines often apply various forms of TF-IDF weighting as a measure or rating of the relevance between a document and a user query. In addition to TF-IDF, Internet search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.
The main idea of TF-IDF is that if a word or phrase appears frequently in one article (high TF) and rarely in other articles, it has good category-discriminating ability and is suitable for classification. TF-IDF is in fact the product TF × IDF, where TF is the term frequency and IDF is the inverse document frequency; TF represents the frequency with which a term appears in document $d$.

The main idea of IDF is that the fewer documents contain term $t$ (that is, the smaller $n$ is), the larger the IDF, and the better term $t$ distinguishes between categories. However, suppose that $m$ documents in a class $C$ contain term $t$ and $k$ documents in other classes contain $t$, so that the total number of documents containing $t$ is $n = m + k$. When $m$ is large, $n$ is also large, and the IDF formula yields a small value, suggesting that $t$ discriminates poorly between categories. In practice, however, a term that appears frequently in the documents of one class represents the characteristics of that class's texts well; such terms should be given higher weight and selected as feature words of that class to distinguish its documents from those of other classes.

In a given document, the term frequency (TF) is the frequency with which a given term appears in that document. It is the term count normalized by the document length, which prevents a bias toward long documents (the same word may occur more often in a long document than in a short one, regardless of its importance). For a term $t_i$ in a particular document $d_j$, its importance can be expressed as:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{r} n_{r,j}}$$
where the numerator $n_{i,j}$ is the number of occurrences of the term in document $d_j$ and the denominator is the sum of the numbers of occurrences of all terms in $d_j$.
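A minimal TF-IDF sketch in Python, using the length-normalized term frequency described above. The IDF form $\log(N / \mathrm{df})$, where $N$ is the number of documents and df the number of documents containing the term, is an assumption here (it is the common textbook form); the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # document frequency: number of documents that contain each term
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = sum(counts.values())  # denominator: all term occurrences in this document
        weights.append({
            # normalized TF multiplied by IDF
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in counts.items()
        })
    return weights


docs = [["apple", "apple", "banana"], ["banana", "cherry"], ["banana"]]
w = tf_idf(docs)
print(w[0]["banana"])  # 0.0
```

A term that occurs in every document, such as "banana" here, receives weight 0 regardless of how often it appears, which is the IDF behaviour the paragraph describes.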
The recommendation capability can be improved through the method.
Although specific embodiments of the invention have been described herein, it will be understood by those skilled in the art that these embodiments are merely illustrative and that various omissions, substitutions and changes in the form and details of the methods and systems described may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the methods described above to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims (7)

1. A recommendation method for massive digital information, characterized in that the method comprises the following steps:
step one, providing a dynamic information representation;
this step inputs massive digital information and provides dynamic information features through a feature template;
step two, adaptively filtering the original digital information;
this step adaptively screens the unprocessed original digital information to be recommended to communities and users; an extraction device included in the information filtering device is coupled to the network layer, receives the data stream from the network layer, and identifies and extracts the original digital information from the data stream;
step three, presenting recommendation information to the user;
this step conveys the recommended original digital information to the user using a plurality of profiles encoded therein; the profiles comprise an adaptive content profile and an adaptive collaboration profile, and the original digital information is adaptively filtered in response to the dynamic information representation to generate the recommendation information;
step four, receiving feedback from the user; the communication device receives feedback from the user in response to the proposed original digital information;
step five, adaptive information processing algorithm; the fed-back digital information is processed by the term frequency-inverse document frequency (TF-IDF) weighting method and the minimum description length (MDL) method, and the dynamic information representation is updated in response; the adaptive information processing algorithm includes credibility filtering of the original digital information in response to an adaptive credibility, updating the credibility table in response to user feedback, and responding to the user feedback table.
2. The recommendation method for massive digital information according to claim 1, characterized in that the feature template comprises a first feature information module, a second feature information module and a conversion module; the first and second feature information modules implement format conversion and protocol conversion, respectively, through the conversion module; the first feature information module extracts the format of the digital information, and the second feature information module extracts the protocol of the data information.
3. The recommendation method for massive digital information according to claim 1, characterized in that adaptively filtering the original digital information comprises the following steps:
(S21) outputting the data information stream through the network layer;
(S22) coupling the extraction device to the network layer and receiving the data stream from the network layer through it;
(S23) adaptively filtering, by the extraction device, the original digital information to be transmitted to the user, and converting the data information into recommendation information through the communication device.
4. The recommendation method for massive digital information according to claim 3, wherein the information filtering method comprises the following steps:
coupling the extraction device to the network layer and receiving the data stream from the network layer, the extraction device identifying and extracting the original digital information from the data stream; and setting a data information communication protocol and using it to select different frequency bands for the original data information, so as to filter the data information within the set frequency bands.
5. The recommendation method for massive digital information according to claim 1, characterized in that the method for generating the recommendation information comprises a K-means algorithm model.
6. The recommendation method for massive digital information according to claim 5, wherein the working method of the K-means algorithm model comprises the following steps:
(1) setting the number of clusters to $K$, taking random candidate values, and calculating by K-means, for each value, the distance between the cluster centres and the recommendation information;
(2) calculating the change trend of the distance value by the following formula:
[formula (1), image not available]
(3) determining, from the result of step (1), the interval in which the rate of change of the distance value changes most rapidly, the formula involving the change trend and the distance between each recommended value and the cluster centre, and determining the value range as:
[formula (2), image not available]
(4) narrowing the interval by bisection to finally obtain the value of $K$;
(5) recommending the information when it falls within the range given by formula (2), and not recommending it otherwise.
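The selection of $K$ in claim 6 resembles the elbow heuristic. The following Python sketch is hedged: because the exact forms of formulas (1) and (2) are not given in the text, it assumes that the "distance" is the K-means sum of squared errors (SSE) and that the bend lies where the ratio between successive SSE decreases is largest; it also scans every candidate $k$ instead of narrowing the interval by bisection. The names `kmeans_sse` and `choose_k` are hypothetical, and the farthest-point initialisation is chosen only to make the result deterministic.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-point initialisation.

    Returns the sum of squared distances (SSE) from each point to its nearest centre.
    """
    centres = [points[0]]
    while len(centres) < k:
        # next initial centre: the point farthest from all centres chosen so far
        centres.append(max(points, key=lambda p: min(_sq_dist(p, c) for c in centres)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _sq_dist(p, centres[i]))
            clusters[nearest].append(p)
        # new centre = mean of its cluster; keep the old centre if the cluster is empty
        new_centres = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sum(min(_sq_dist(p, c) for c in centres) for p in points)


def choose_k(points, k_max):
    """Elbow heuristic (assumed criterion): k where the SSE decrease slows most sharply."""
    if k_max < 3:
        raise ValueError("k_max must be at least 3")
    sse = {k: kmeans_sse(points, k) for k in range(1, k_max + 1)}
    # drop[k]: SSE reduction from k-1 to k clusters, floored to avoid division by zero
    drop = {k: max(sse[k - 1] - sse[k], 1e-12) for k in range(2, k_max + 1)}
    # a large drop followed by a small one marks the bend of the curve
    return max(range(2, k_max), key=lambda k: drop[k] / drop[k + 1])


points = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,), (20.0,), (20.1,), (20.2,)]
print(choose_k(points, k_max=5))  # 3
```

For three well-separated groups the SSE falls sharply up to $k = 3$ and then barely changes, so the ratio of successive decreases peaks at 3.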
7. The recommendation method for massive digital information according to claim 1, wherein the adaptive information processing algorithm comprises the following steps:
step 1: term frequency-inverse document frequency (TF-IDF) weighting is applied;
each piece of original digital information is processed into a normalized vector with a length of between 20,000 and 100,000 tokens; it is assumed, first, that the more times token $t$ occurs in digital information document $d$, the more relevant $t$ is to the subject matter of $d$, and second, that the more times $t$ occurs across all digital information documents, the worse $t$ discriminates between documents; for a given document, the two terms can be combined into a weight by multiplying the term frequency $\mathrm{TF}(t,d)$ by the logarithm of the inverse document frequency of each token, so that the weight of token $t$ in document $d$ is:
$$w(t,d) = \mathrm{TF}(t,d)\cdot\log\frac{|N|}{\mathrm{DF}(t)} \qquad (3)$$
in formula (3), $w(t,d)$ represents the weight, $N$ represents the set of all digital information documents, and $\mathrm{DF}(t)$ is the number of documents in $N$ that contain $t$;
step 2: the digital information in each category is converted into TF-IDF vectors, which are normalized to unit length and then averaged to obtain a prototype vector for the category;
step 3: the minimum description length (MDL) principle is adopted to minimize the description length of the entire data set and to measure the quality and cost of predicting the user recommendation information data set; the rule function is:
$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)} \qquad (4)$$
in formula (4), $h$ indicates the quality of the user recommendation information data set, $D$ represents all pieces of recommendation information data, $P(D)$ represents the probability function of all recommendation information data, $P(h)$ represents the probability function of the quality of all user recommendation information data sets, $P(h \mid D)$ represents the probability of the predicted quality of the user recommendation data set given all recommendation information data, and $P(D \mid h)$ represents the output value of the probability function of the recommendation information data given that quality;
step 4: assuming data independence given the document length $l_d$ and a category $c$, the probability of the token vector $T_d$ of document $d$ (the user recommendation information data set cost) is the product of the probabilities of the individual tokens, so the probability function output is:
$$P(T_d \mid l_d, c) = \prod_{i} P(t_{i,d} \mid l_d, c) \qquad (5)$$
in equation (5), the token $t_{i,d}$ is a binary value; if $t_{i,d}$ follows a simple binomial distribution that does not depend on the document length, its probability function is:
$$P(t_{i,d} \mid l_d, c) = \theta_{i,c}^{\,t_{i,d}}\,(1-\theta_{i,c})^{1-t_{i,d}} \qquad (6)$$
where $\theta_{i,c}$ denotes the probability that token $i$ occurs in a document of category $c$; assuming that each token $t_{i,d}$ in equation (6) has such a distribution function in the massive data information, the function for predicting the user recommendation information data set is:
$$L(d \mid c) = -\sum_{i} \log_2 P(t_{i,d} \mid l_d, c) \qquad (7)$$
in formula (7), $L(d \mid c)$ represents the total bits of the predicted user recommendation information data set.
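Steps 1 and 2 of claim 7 can be sketched in Python as follows, assuming the weight $w(t,d) = \mathrm{TF}(t,d)\cdot\log(|N|/\mathrm{DF}(t))$ with raw term counts as TF, unit-length normalization, and a per-category average as the prototype vector. The function names and the sparse dict representation are hypothetical choices for illustration.

```python
import math
from collections import Counter


def tfidf_vector(doc, df, n_docs):
    """Unit-length TF-IDF vector of one document as a sparse {token: weight} dict."""
    counts = Counter(doc)
    vec = {t: tf * math.log(n_docs / df[t]) for t, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    # normalize to unit length (an all-zero vector, e.g. only ubiquitous tokens, is kept as-is)
    return {t: w / norm for t, w in vec.items()} if norm > 0 else vec


def category_prototypes(labelled_docs):
    """labelled_docs: list of (category, token list). Returns category -> prototype vector."""
    n_docs = len(labelled_docs)
    df = Counter()
    for _, doc in labelled_docs:
        df.update(set(doc))  # DF(t): number of documents containing t
    sums, sizes = {}, Counter()
    for cat, doc in labelled_docs:
        acc = sums.setdefault(cat, Counter())
        for t, w in tfidf_vector(doc, df, n_docs).items():
            acc[t] += w
        sizes[cat] += 1
    # average the unit vectors of each category to obtain its prototype
    return {cat: {t: w / sizes[cat] for t, w in acc.items()} for cat, acc in sums.items()}


labelled = [
    ("sports", ["goal", "goal", "match"]),
    ("sports", ["match", "team"]),
    ("finance", ["stock", "market"]),
]
protos = category_prototypes(labelled)
print(sorted(protos["sports"]))  # ['goal', 'match', 'team']
```

Averaging unit-length vectors keeps long documents from dominating a category's prototype.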
CN202210879288.9A 2022-07-25 2022-07-25 Recommendation method for massive digital information Pending CN115098788A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210879288.9A CN115098788A (en) 2022-07-25 2022-07-25 Recommendation method for massive digital information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210879288.9A CN115098788A (en) 2022-07-25 2022-07-25 Recommendation method for massive digital information

Publications (1)

Publication Number Publication Date
CN115098788A (en) 2022-09-23

Family

ID=83299548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210879288.9A Pending CN115098788A (en) 2022-07-25 2022-07-25 Recommendation method for massive digital information

Country Status (1)

Country Link
CN (1) CN115098788A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867799A (en) * 1996-04-04 1999-02-02 Lang; Andrew K. Information system and method for filtering a massive flow of information entities to meet user information classification needs
CN112328671A (en) * 2020-11-03 2021-02-05 北京思特奇信息技术股份有限公司 A data format conversion method, system, storage medium and device

Non-Patent Citations (1)

Title
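Sun Wei (孙伟) et al.: "Research on Intelligent Retrieval and Recommendation Methods for All Services in Distribution Transformer Areas" (配电台区全业务智能检索与推荐方法研究), Information Technology (《信息技术》) *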

Similar Documents

Publication Publication Date Title
CN111966917B (en) Event detection and summarization method based on pre-training language model
US8560466B2 (en) Method and arrangement for automatic charset detection
CN106934035A (en) Concept drift detection method in a kind of multi-tag data flow based on class and feature distribution
US10387805B2 (en) System and method for ranking news feeds
CN101692639A (en) Bad webpage recognition method based on URL
CN107391772A (en) A kind of file classification method based on naive Bayesian
US11874866B2 (en) Multiscale quantization for fast similarity search
CN104820703A (en) Text fine classification method
CN118410152B (en) Information processing method, question-answering method and question-answering system
US20160224636A1 (en) Scalable system and method for weighted similarity estimation in massive datasets revealed in a streaming fashion
CN111859955A (en) A public opinion data analysis model based on deep learning
Karwa et al. Automated hybrid Deep Neural Network model for fake news identification and classification in social networks
Sumathi et al. Machine learning algorithm-based spam detection in social networks
CN118014703A (en) Visual intelligent decision system and method based on digital platform
CN114491033A (en) Method for building user interest model based on word vector and topic model
CN116455861A (en) Big data-based computer network security monitoring system and method
CN111784402B (en) Order rate prediction method, device and readable storage medium based on multi-channel
CN103218368A (en) Method and device for discovering hot words
CN110413770B (en) Method and device for classifying group messages into group topics
CN105117466A (en) Internet information screening system and method
TW201243627A (en) Multi-label text categorization based on fuzzy similarity and k nearest neighbors
CN115098788A (en) Recommendation method for massive digital information
CN114282119B (en) Scientific and technological information resource retrieval method and system based on heterogeneous information network
Liao et al. A Deep Concept-aware Model for predicting and explaining restaurant future status
CN116992119A (en) Content pushing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220923