CN115098788A - Recommendation method for massive digital information - Google Patents
- Publication number: CN115098788A
- Application number: CN202210879288.9A
- Authority: CN (China)
- Prior art keywords: information, recommendation, data, digital information, user
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
- G06F16/9536—Search customisation based on social or collaborative filtering
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a recommendation method for massive digital information, which relates to the technical field of data information identification and addresses the problem of data information recommendation. The method comprises: step one, providing dynamic information representation; step two, adaptively filtering original digital information; step three, presenting recommendation information to a user; step four, receiving feedback from a user; and step five, applying an adaptive information processing algorithm. The original digital information can thus be continuously and adaptively filtered within dynamic data information, and the recommendation method improves both the application capability of the data information and the data processing efficiency. The method can dynamically recommend different data information to the user; classify, calculate and diagnose the recommended data information; improve the recommendation efficiency for different data information by a function calculation method; and greatly improve the information identification capability.
Description
Technical Field
The invention relates to the technical field of data information identification, in particular to a recommendation method for massive digital information.
Background
With the development of information technology, especially information and communication technology, the internet, social networks, the Internet of Things, the mobile internet, cloud computing and the like have steadily entered people's daily work and life, and the global volume of data is growing exponentially. The demand for processing large amounts of data within enterprises has driven the development of big data. The arrival of big data has advanced the new media industry in China, opened up the boundaries between traditional media such as television, radio and newspapers, and dissolved the boundaries between countries, communities and industries, as well as between senders and receivers of information.
Amid massive data information, acquiring data information quickly and effectively improves the user's data experience. Conventionally, users facing massive digital information obtain it through hot-post recommendations and links on portal sites, which cannot deliver rapid recommendation of data information. The prior art also improves massive digital information recommendation by means of search engines, but this approach cannot satisfy users' autonomous recommendation and discovery of massive data information, and processing the data information remains difficult.
Disclosure of Invention
Aiming at the technical defects, the invention discloses a recommendation method of massive digital information, which can continuously and adaptively filter original digital information in dynamic data information, improve the application capability of the data information and improve the data processing efficiency through the recommendation method.
The invention adopts the following technical scheme:
a recommendation method for massive digital information comprises the following steps:
step one, providing dynamic information representation;
the step is used for inputting massive digital information and providing dynamic information characteristics through a characteristic template;
step two, adaptively filtering original digital information;
the method is used for adaptively screening unprocessed original digital information recommended to communities and users; according to the extracting device included in the information filtering device, the extracting device can be coupled with the network layer and can receive the data stream from the network layer, and the extracting device can identify and extract the original digital information from the data stream;
step three, presenting recommendation information to a user;
this step conveys the recommended original digital information to the user, with a plurality of configuration files encoded therein; the configuration files include an adaptive content configuration file and an adaptive collaboration configuration file, and the original digital information is adaptively filtered in response to the dynamic information characterization, so that recommendation information is generated;
step four, receiving feedback from a user; receiving, by the communication device, feedback from the user in response to the proposed raw digital information;
step five, adaptive information processing algorithm; the fed-back digital information is processed by the term frequency-inverse document frequency weighting method and the minimum description length method, and the dynamic information characterization is updated in response; the adaptive information processing algorithm includes credibility filtering of the original digital information in response to adaptive credibility, updating the credibility table in response to user feedback, and responding to the user feedback table.
As a further technical solution of the present invention, the feature template includes a first feature information module, a second feature information module, and a conversion module, wherein the first feature information module and the second feature information module respectively implement format information conversion and protocol conversion through the conversion module, the first feature information module extracts a format of the digital information, and the second feature information module extracts a protocol of the data information.
As a further technical scheme of the invention, the method for adaptively filtering the original digital information comprises the following steps:
(S21) outputting the data information stream through the network layer;
(S22) coupling with the network layer through the extracting means and receiving the data stream from the network layer;
(S23) adaptively filtering the original digital information by the extracting means, and converting the filtered data information into recommendation information that is transmitted to the user through the communication means.
As a further technical scheme of the invention, the information filtering method comprises the following steps:
coupling an extraction device to the network layer and receiving the data stream from the network layer, the extraction device identifying and extracting the original digital information from the data stream; setting a data information communication protocol, and selecting different frequency bands for the original data information through the data information communication protocol so as to filter the data information in the set frequency bands.
As a further technical scheme of the invention, the recommendation information is generated by a K-means algorithm model.
As a further technical solution of the present invention, the working method of the K-means algorithm model comprises the following steps:
(1) setting the number of clusters to K, taking random values of K, and calculating by K-means the distance between each K value and the recommendation information;
(3) determining, from the result of step (1), the interval in which the K value changes fastest, based on the trend of the change and the distance between each recommended value and the clustering center, and thereby determining the range of K values given by formula (2);
(4) narrowing the interval by the bisection method to finally obtain the K value;
(5) recommending the information when the recommendation information is within the range of formula (2), and not recommending it when the recommendation information is outside the range of formula (2).
As a further technical scheme of the invention, the self-adaptive information processing algorithm comprises the following steps:
step 1: term frequency-inverse document frequency (TF-IDF) is used;
each item of original digital information is processed into a normalized vector with a length of between 20,000 and 100,000 tokens. First, the more times a token $t$ occurs in a digital information document $d$ (the term frequency $tf_{t,d}$), the more likely $t$ is related to the subject matter of $d$; second, the more often $t$ occurs across all digital information documents (the document frequency $df_t$), the worse $t$ differentiates between documents. For a given document, the two terms are combined into a weight by multiplying $tf_{t,d}$ by the inverse document frequency of each token; the weight of token $t$ in document $d$ is:
$$w(t,d) = tf_{t,d} \cdot \log\frac{|N|}{df_t} \qquad (3)$$
in formula (3), $w(t,d)$ denotes the weight and $N$ denotes the set of all digital information documents;
step 2: the digital information in each category is converted to TF-IDF vectors, which are normalized to unit length and then averaged to obtain a prototype vector for that category;
step 3: minimum description length (MDL) is adopted;
the description length of the entire data set is minimized; MDL is used to measure the quality and cost of the predicted user recommendation information data set, and the MDL rule function is:
$$p(H \mid D) = \frac{p(D \mid H)\,p(H)}{p(D)} \qquad (4)$$
in formula (4), $H$ denotes the quality of the user recommendation information data set, $D$ denotes all of the recommendation information data, $p(D)$ denotes the probability function of all recommendation information data, $p(H)$ denotes the probability function of the quality of all user recommendation information data sets, $p(H \mid D)$ denotes the probability of the predicted data set quality given all recommendation information data, and $p(D \mid H)$ denotes the probability of the recommendation information data given that quality;
step 4: given the information length and the category, and assuming the tokens are independent of one another, the probability of the cost of the user recommendation information data set is the product of the probabilities of the individual tokens, and the output of this probability function is given by formula (5);
in formula (5), each token is a binary value; if the token distribution is assumed to be a simple binomial distribution that is independent of the information length, the token probability function is given by formula (6);
in formula (6), assuming each token has a distribution function over the massive data information, the function for predicting the user recommendation information data set is given by formula (7);
in formula (7), the result represents the total bits of the predicted user recommendation information data set.
The invention is different from the conventional technology in that: the method and the device can dynamically recommend different data information to the user, classify, calculate and diagnose the recommended data information, improve the recommendation efficiency of the different data information by a function calculation method, greatly improve the information identification capability, and are beneficial to processing mass digital information.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive labor, wherein:
FIG. 1 is a schematic flow chart of a recommendation method of the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of adaptively filtering original digital information according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
As shown in fig. 1 and fig. 2, a method for recommending massive digital information includes:
step one, providing dynamic information representation;
the step is used for inputting massive digital information and providing dynamic information characteristics through a characteristic template;
step two, adaptively filtering original digital information;
the method is used for adaptively screening unprocessed original digital information recommended to communities and users; according to the extracting device included in the information filtering device, the extracting device can be coupled with the network layer and can receive the data stream from the network layer, and the extracting device can identify and extract the original digital information from the data stream;
step three, presenting recommendation information to a user;
this step conveys the recommended original digital information to the user, with a plurality of configuration files encoded therein; the configuration files comprise an adaptive content configuration file and an adaptive collaboration configuration file, and the original digital information is adaptively filtered in response to the dynamic information representation, so that recommendation information is generated;
step four, receiving feedback from a user; the communication device receives feedback from the user in response to the recommended original digital information;
step five, adaptive information processing algorithm; the fed-back digital information is processed by the term frequency-inverse document frequency weighting method and the minimum description length method, and the dynamic information representation is updated in response; the adaptive information processing algorithm comprises credibility filtering of the original digital information in response to adaptive credibility, updating the credibility table in response to user feedback, and responding to the user feedback table;
in particular embodiments, the user feedback table makes it easier to provide targeted recommendation information to a preference group: users who share common feature information are grouped into a preference group according to their preference criteria relative to the preference criteria of the community to which they belong.
In the above embodiment, the feature template includes a first feature information module, a second feature information module, and a conversion module, where the first feature information module and the second feature information module respectively implement format information conversion and protocol conversion through the conversion module, the first feature information module extracts a format of the digital information, and the second feature information module extracts a protocol of the data information.
In a specific embodiment, the first feature information module outputs data information in different data information formats, and the second feature information module outputs data information under different data information protocols; at least one basic template can be extracted during application, and the training information is decoded using the different feature modules respectively, thereby realizing the data information representation.
The method for adaptively filtering the original digital information comprises the following steps:
(S21) outputting the data information stream through the network layer,
in particular embodiments, the network layer may be the global internet; the information filtering apparatus may be used with other types of networks, for example, including enterprise-wide networks or "intranets" using a network layer, where user #1 may communicate with other users, such as user #2 and user #3, and may also communicate with distributed network resources, such as distributed network resource #1 and resource # 2; also included is a computer storage means for storing a profile, including an adaptive content profile and an adaptive collaboration profile; additional trend tracking information may be stored in the information storage for later retrieval, or may be transmitted, for example, by user #2 to the network layer for remote analysis;
(S22) coupling with and receiving a data stream from the network layer through the extracting means;
in particular embodiments, the extraction device may identify and extract original digital information from the data stream; each item of original digital information has information content; the extraction device uses the adaptive content filter and at least a portion of the adaptive content profile to analyze the data stream for the presence of original digital information; the original digital information consists of those data entities whose content identifies them as being of interest in general, or of possible interest to a community coupled to the information filtering device; the extraction device can delete duplicate information even if the information comes from different sources, so that user resources are not wasted on processing and viewing duplicate and accumulated information; the extraction device may also determine whether the information content is related to the community to which user #1 belongs, using at least part of the community profile and the user profile of user #1;
(S23) adaptively filtering the original digital information by the extracting means, and converting the filtered data information into recommendation information that is transmitted to the user through the communication means;
in a specific embodiment, after the data information is converted into recommendation information for user #1, the adaptive content filter of the extraction device may include a plurality of community filters and a plurality of user-side filters, each filter having a community or user-side configuration file respectively; as the original digital information is filtered by the extraction device, the information predicted to be suitable for a particular member client of a particular community, such as user #1, is communicated to that client in response to the respective community and member client profiles; if desired, the extraction device may also include a credibility filter that enables it to perform credibility filtering on the original digital information according to a credibility profile.
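The duplicate removal attributed to the extraction device in step (S22) can be sketched as follows. This is a minimal illustration only: the text does not say how duplicates are detected, so hashing a whitespace- and case-normalized copy of the content is an assumption, and the names `content_fingerprint` and `drop_duplicates` are hypothetical.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Hash of the text after lowercasing and collapsing whitespace (assumed rule)."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(items):
    """Keep the first item for each distinct content fingerprint.

    `items` is an iterable of (source, text) pairs; the source is ignored
    when deciding whether two items are duplicates.
    """
    seen = set()
    unique = []
    for source, text in items:
        key = content_fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique.append((source, text))
    return unique


stream = [
    ("feed-a", "Quarterly results  released today"),
    ("feed-b", "quarterly results released today"),
    ("feed-c", "New product announced"),
]
unique_items = drop_duplicates(stream)
```

Because the source name plays no part in the fingerprint, the copy arriving from `feed-b` is dropped even though it comes from a different source than the first copy.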
In the above embodiment, the information filtering method includes:
coupling an extraction device to the network layer and receiving the data stream from the network layer, the extraction device identifying and extracting the original digital information from the data stream; setting a data information communication protocol, and selecting different frequency bands for the original data information through the data information communication protocol so as to filter the data information in the set frequency bands.
In the above embodiment, the recommendation information is generated by a K-means algorithm model.
In the above embodiment, the working method of the K-means algorithm model comprises the following steps:
(1) setting the number of clusters to K, taking random values of K, and calculating by K-means the distance between each K value and the recommendation information;
(3) determining, from the result of step (1), the interval in which the K value changes fastest, based on the trend of the change and the distance between each recommended value and the clustering center, and thereby determining the range of K values given by formula (2);
(4) narrowing the interval by the bisection method to finally obtain the K value;
(5) recommending the information when the recommendation information is within the range of formula (2), and not recommending it when the recommendation information is outside the range of formula (2).
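The distance calculation of step (1) and the range test of step (5) can be sketched as follows. This is an illustration under stated assumptions rather than the patented model: the range of formula (2) is not reproduced in the text, so `max_distance` is a hypothetical stand-in for it; the first K points seed the centers instead of random values so that the result is repeatable; and the function names are invented for this example.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centers (an assumption)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def should_recommend(item, centers, max_distance):
    """Recommend when the item lies within max_distance of its nearest center."""
    return min(math.dist(item, c) for c in centers) <= max_distance


history = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]
centers = kmeans(history, k=2)
near = should_recommend((0.5, 0.5), centers, max_distance=1.0)
far = should_recommend((5.0, 5.0), centers, max_distance=1.0)
```

Here an item close to one of the two cluster centers passes the test, while an item midway between them does not.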
In the above embodiments, the dynamic information characterization adaptively filters the original digital information to generate the recommendation information. In other embodiments, the obtained data information can be classified and identified to determine the cluster type of the target recommendation data information, with different cluster types corresponding to different pre-constructed recommendation information generation models; a user identifier of the querying user is acquired, and the user's interest information is determined from that identifier, the interest information having been generated in advance from the user's recommendation information data; the interest information and the query information of the querying user are then input into the recommendation information generation model corresponding to the cluster type to obtain the target recommendation information. The recommendation information is clustered in advance, and a recommendation information generation model is constructed independently for each cluster type.
In a further embodiment, a plurality of weight values of a plurality of recommendation algorithms are set; acquiring a plurality of groups of objects to be recommended output by the recommendation algorithms according to recommendation information data, wherein the recommendation information data comprises the types of the objects and the data information times of a user; and generating recommendation information for the user according to the plurality of weighted values and the plurality of groups of objects to be recommended.
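The weighted combination of several recommendation algorithms described above can be sketched as follows. The text does not state how the weight values and the candidate groups are combined, so the reciprocal-rank scoring used here is an assumption, and `blend_recommendations` and the sample object names are hypothetical.

```python
from collections import defaultdict


def blend_recommendations(candidate_lists, weights, top_n=2):
    """Merge ranked lists from several recommenders into one weighted ranking.

    Each list contributes weight * (1 / rank) to every object it contains.
    """
    scores = defaultdict(float)
    for ranked, weight in zip(candidate_lists, weights):
        for rank, obj in enumerate(ranked, start=1):
            scores[obj] += weight / rank
    # Sort by descending score, breaking ties by name for a stable result.
    return sorted(scores, key=lambda obj: (-scores[obj], obj))[:top_n]


content_based = ["article-3", "article-1", "article-7"]
collaborative = ["article-1", "article-9", "article-3"]
blended = blend_recommendations([content_based, collaborative], weights=[0.6, 0.4])
```

An object ranked well by the more heavily weighted algorithm and also present in the other list rises to the top of the blended result.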
In the above embodiment, the adaptive information processing algorithm includes the steps of:
step 1: each item of original digital information is processed into a normalized vector using term frequency-inverse document frequency (TF-IDF), the length of the vector being between 20,000 and 100,000 tokens. First, the more times a token $t$ occurs in a digital information document $d$ (the term frequency $tf_{t,d}$), the more likely $t$ is related to the subject matter of $d$; second, the more often $t$ occurs across all digital information documents (the document frequency $df_t$), the worse $t$ differentiates between documents. For a given document, the two terms are combined into a weight by multiplying $tf_{t,d}$ by the inverse document frequency of each token; the weight of token $t$ in document $d$ is:
$$w(t,d) = tf_{t,d} \cdot \log\frac{|N|}{df_t} \qquad (3)$$
in formula (3), $w(t,d)$ denotes the weight and $N$ denotes the set of all digital information documents; since a digital information document $d$ typically contains only a small portion of the total vocabulary, the words that appear carry much more meaning than the words that do not; to emphasize the stronger information content of the words that appear, the cosine of the angle between two vectors is used to measure the similarity between them;
step 2: the digital information in each category is converted to TF-IDF vectors, which are normalized to unit length and then averaged to obtain a prototype vector for that category; the advantages of this approach include faster computation and a more compact representation. To classify a new digital information document, it is compared with each prototype vector and given a predicted rating based on its cosine similarity to each rating category; in this step, linear regression may be used to convert the results of the classification process into continuous values;
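The prototype-vector classification of step 2 can be sketched as follows. It is a minimal illustration that assumes the documents are already available as sparse token-weight dictionaries; the category names, the sample weights and the helper names are hypothetical, and the optional linear regression step is left out.

```python
import math
from collections import defaultdict


def unit(vec):
    """Scale a sparse {token: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def prototype(vectors):
    """Average the unit-length versions of the vectors, token by token."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in unit(vec).items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    u, v = unit(u), unit(v)
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def classify(vec, prototypes):
    """Return the category whose prototype has the highest cosine similarity."""
    return max(prototypes, key=lambda c: cosine(vec, prototypes[c]))


rated = {
    "interesting": [{"market": 2.0, "stock": 1.0}, {"market": 1.0, "earnings": 1.0}],
    "boring": [{"cup": 1.0, "final": 1.0}, {"team": 2.0, "cup": 1.0}],
}
prototypes = {category: prototype(vecs) for category, vecs in rated.items()}
predicted = classify({"stock": 1.0, "market": 1.0, "cup": 0.2}, prototypes)
```

Only one comparison per category is needed at prediction time, which is the computational advantage the step refers to.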
step 3: the description length of the entire data set is minimized using the minimum description length (MDL); MDL is used to measure and optimize the quality and cost of the predicted user recommendation information data set, balancing the quality of the predicted data set against the complexity cost of achieving that quality; the MDL rule function is:
$$p(H \mid D) = \frac{p(D \mid H)\,p(H)}{p(D)} \qquad (4)$$
in formula (4), $H$ denotes the quality of the user recommendation information data set, $D$ denotes all of the recommendation information data, $p(D)$ denotes the probability function of all recommendation information data, $p(H)$ denotes the probability function of the quality of all user recommendation information data sets, $p(H \mid D)$ denotes the probability of the predicted data set quality given all recommendation information data, and $p(D \mid H)$ denotes the probability of the recommendation information data given that quality;
according to Bayes' rule, maximizing $p(H \mid D)$ is equivalent to maximizing $p(D \mid H)\,p(H)/p(D)$; because $p(D)$ is essentially independent of $H$, $p(D \mid H)\,p(H)$ can be maximized instead, which is the same as minimizing $-\log p(D \mid H) - \log p(H)$. The MDL interpretation of this expression is that, to find the most likely hypothesis for the given data, the hypothesis that minimizes the total code length should be found; the code length is equal to the number of bits required to code the hypothesis plus the number of bits required to code the data given that hypothesis. Given a document $d$ with token vector $T_d$ (including $l_d$, the number of non-zero unique tokens in the digital information) and training data $D_{train}$, the most probable category $c_i$ for $d$ is the one that minimizes the bits required to encode $T_d$;
step 4: given the information length and the category, and assuming the tokens are independent of one another, the probability of the cost of the user recommendation information data set is the product of the probabilities of the individual tokens, and the output of this probability function is given by formula (5);
in formula (5), each token is a binary value; if the token distribution is assumed to be a simple binomial distribution that is independent of the information length, the token probability function is given by formula (6);
in formula (6), assuming each token has a distribution function over the massive data information, the function for predicting the user recommendation information data set is given by formula (7);
in formula (7), the result represents the total bits of the predicted user recommendation information data set.
The total bit count is larger than the complexity cost of the additional category-specific parameters of the user recommendation information; the probability model is chosen so that the bits required for an actual digital information document can be calculated quickly, which addresses the low computational efficiency of processing massive digital information.
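The idea of choosing the category that needs the fewest bits can be sketched as follows. Formulas (5) to (7) are not reproduced in the text, so this is a stand-in under stated assumptions: each token is treated as an independent binary (present or absent) feature, its probability is estimated with add-one smoothing, and the cost of naming the category is added to the cost of the tokens. The function names and training data are hypothetical.

```python
import math


def train(labelled_docs):
    """Count documents per category and, per category, documents containing each token."""
    model = {}
    vocabulary = set()
    for category, tokens in labelled_docs:
        entry = model.setdefault(category, {"docs": 0, "token_docs": {}})
        entry["docs"] += 1
        for t in set(tokens):
            entry["token_docs"][t] = entry["token_docs"].get(t, 0) + 1
            vocabulary.add(t)
    return model, vocabulary


def code_length_bits(tokens, category, model, vocabulary):
    """Bits to encode the category plus each known token's presence or absence."""
    entry = model[category]
    total_docs = sum(e["docs"] for e in model.values())
    bits = -math.log2(entry["docs"] / total_docs)  # cost of naming the category
    present = set(tokens)
    for t in vocabulary:
        # Add-one smoothed probability that token t occurs in a document of this category.
        p = (entry["token_docs"].get(t, 0) + 1) / (entry["docs"] + 2)
        bits += -math.log2(p if t in present else 1.0 - p)
    return bits


def most_probable_category(tokens, model, vocabulary):
    """Category with the shortest total code length for the document."""
    return min(model, key=lambda c: code_length_bits(tokens, c, model, vocabulary))


training = [
    ("finance", ["stock", "market", "earnings"]),
    ("finance", ["market", "rate"]),
    ("sport", ["team", "cup"]),
    ("sport", ["cup", "final", "team"]),
]
model, vocabulary = train(training)
best = most_probable_category(["market", "stock"], model, vocabulary)
```

Tokens that never appeared in training are ignored by this sketch; a fuller treatment would also charge bits for them.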
TF-IDF (term frequency-inverse document frequency) is a commonly used weighting technique for information retrieval and data mining. TF is the term frequency and IDF is the inverse document frequency. TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears in the corpus. Various forms of TF-IDF weighting are often applied by search engines as a measure or rating of the degree of relevance between a document and a user query. In addition to TF-IDF, search engines on the internet use ranking methods based on link analysis to determine the order in which documents appear in search results.
The main idea of TF-IDF is: if a word or phrase appears in one article with a high term frequency (TF) and rarely appears in other articles, the word or phrase is considered to have good classification capability and is suitable for classification. TF-IDF is in fact TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. TF represents the frequency with which a term appears in document d. The main idea of IDF is: the fewer documents contain the entry t, that is, the smaller n is, the larger the IDF is, and the better the category distinguishing capability of the entry t. If the number of documents containing the entry t in a certain class of documents C is m, and the total number of documents containing the entry t in other classes is k, then the number of documents containing t is n = m + k; when m is large, n is also large, and the IDF value obtained from the IDF formula is small, which suggests that the category distinguishing capability of the entry t is not strong. In practice, however, if a term appears frequently in the documents of one class, the term represents the characteristics of the text of that class well; such terms should be given higher weight and selected as characteristic words of that class to distinguish its documents from those of other classes. In a given document, the term frequency (TF) refers to the frequency with which a given term appears in the document. This number is a normalization of the term count to prevent a bias towards long documents (the same word may have a higher count in a long document than in a short one, regardless of whether the word is important). For a word $t_i$ in a particular document $d_j$, its importance may be expressed as:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
The numerator in the above equation is the number of occurrences of the word in the document, and the denominator is the sum of the number of occurrences of all words in the document.
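The weighting described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: whitespace tokenization, the natural logarithm and the names `tfidf_vectors` and `cosine` are assumptions. Raw counts are used for TF because each vector is scaled to unit length afterwards, which removes the effect of document length in the same way the normalized TF does.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {token: weight} dict per document, scaled to unit length."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: in how many documents each token occurs.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(tokenized)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


def cosine(u, v):
    """Dot product of two sparse unit vectors, i.e. the cosine of their angle."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


docs = [
    "stock market rises on earnings",
    "stock market falls on rate fears",
    "team wins cup final",
]
vecs = tfidf_vectors(docs)
related = cosine(vecs[0], vecs[1])
unrelated = cosine(vecs[0], vecs[2])
```

The two finance headlines share tokens and so have a positive similarity, while the sports headline shares none with the first and scores zero.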
The recommendation capability can be improved through the method.
Although specific embodiments of the invention have been described herein, it will be understood by those skilled in the art that these embodiments are merely illustrative and that various omissions, substitutions and changes in the form and details of the methods and systems described may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the methods described above to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.
Claims (7)
1. A recommendation method for massive digital information, characterized in that the method comprises the following steps:
step one, providing a dynamic information representation;
this step receives massive digital information as input and provides dynamic information features through a feature template;
step two, adaptively filtering original digital information;
this step adaptively screens the unprocessed original digital information to be recommended to communities and users; the information filtering device includes an extraction device, which can be coupled to the network layer, can receive the data stream from the network layer, and can identify and extract the original digital information from the data stream;
step three, presenting recommendation information to a user;
in this step, a plurality of configuration files are encoded and the recommended original digital information is conveyed to the user; the configuration files comprise an adaptive content configuration file and an adaptive collaboration configuration file, and the original digital information is adaptively filtered in response to the dynamic information representation, thereby generating the recommendation information;
step four, receiving feedback from the user; the communication device receives feedback from the user in response to the presented original digital information;
step five, applying an adaptive information processing algorithm; the fed-back digital information is processed by the term frequency-inverse document frequency weighting method and the minimum description length method, and the dynamic information characterization is updated in response; the adaptive information processing algorithm includes credibility filtering of the original digital information in response to an adaptive credibility, updating the credibility table in response to user feedback, and responding to the user feedback table.
2. The recommendation method of massive digital information according to claim 1, characterized in that: the feature template comprises a first feature information module, a second feature information module and a conversion module, wherein the first feature information module and the second feature information module respectively realize format information conversion and protocol conversion through the conversion module, the first feature information module extracts the format of digital information, and the second feature information module extracts the protocol of data information.
3. The recommendation method of massive digital information according to claim 1, characterized in that: the method for adaptively filtering the original digital information comprises the following steps:
(S21) outputting the data information stream through the network layer;
(S22) coupling the extraction device with the network layer and receiving the data stream from the network layer;
(S23) adaptively filtering, by the extraction device, the original digital information to be transmitted to the user, and converting the filtered data information into recommendation information through the communication device.
4. The recommendation method of massive digital information according to claim 3, wherein: the information filtering method comprises the following steps:
coupling an extraction device to the network layer and receiving the data stream from the network layer, the extraction device identifying and extracting the original digital information from the data stream; setting a data information communication protocol, and selecting different frequency bands for the original data information through the data information communication protocol so as to filter the data information in the set frequency bands.
6. The recommendation method of massive digital information according to claim 5, wherein the working method of the algorithm model comprises the following steps:
(1) setting the number of clusters to $K$; $K$ takes random values, and for each $K$ value the distance between that value and the recommendation information is calculated by K-means;
(3) determining, from the result of step (1), the interval in which the rate of change of the $K$ value changes most rapidly, and determining the range of $K$ values, as given by formula (2), from the trend of change and from the distance between each recommended value and the cluster center;
(4) narrowing the interval by the bisection method, and finally obtaining the $K$ value;
(5) recommending the information when the recommendation information is within the range given by formula (2), and not recommending it when the recommendation information is outside that range.
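One way to read steps (1) and (3) of claim 6 is as an "elbow" search: run K-means for several cluster counts, record the total distance to the nearest cluster center, and look for the cluster count at which the rate of change shifts most. The sketch below illustrates only that reading on one-dimensional data. The function names, the deterministic initialization, and the use of a second difference to locate the bend are all assumptions, since the formulas of the claim are not preserved in this text. The interval-halving refinement of step (4) is not shown.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd's algorithm on 1-D data.

    Initial centers are k evenly spaced order statistics (an assumption,
    chosen so that the example is repeatable).
    """
    data = sorted(values)
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the previous center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
    return centers


def total_distance(values, centers):
    """Sum of distances from each value to its nearest cluster center."""
    return sum(min(abs(x - c) for c in centers) for x in values)


def elbow_k(values, k_max):
    """Return (k, costs), where k is the cluster count at the sharpest bend."""
    costs = [
        total_distance(values, kmeans_1d(values, k)) for k in range(1, k_max + 1)
    ]
    # Second difference of the cost curve: largest where the rate of change
    # shifts most. bends[j] belongs to cluster count j + 2.
    bends = [
        costs[i - 1] - 2 * costs[i] + costs[i + 1]
        for i in range(1, len(costs) - 1)
    ]
    return bends.index(max(bends)) + 2, costs


# Hypothetical recommendation scores forming three tight groups.
scores = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 8.9, 9.0, 9.1]
best_k, costs = elbow_k(scores, k_max=5)
```

On this toy input the total distance drops sharply up to three clusters and barely changes afterwards, so the bend is found at three clusters.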
7. The recommendation method of massive digital information according to claim 1, wherein the adaptive information processing algorithm comprises the following steps:
step 1: applying term frequency-inverse document frequency;
each piece of original digital information is processed into a normalized vector having a length of between 20000 and 100000 tokens; it is assumed, first, that the more often a token $t$ occurs in a digital information document $d$, the more $t$ is related to the subject matter of $d$; second, that the more often $t$ occurs across all digital information documents $D$, the worse $t$ differentiates the document $d$; for a given document, the two terms are combined into a weight for each token by multiplication, and the weight of token $t$ in document $d$ is:

$$w(t,d) = \mathrm{tf}_{t,d} \cdot \log\frac{|D|}{\mathrm{df}_t} \tag{3}$$

in formula (3), $w(t,d)$ represents the weight, $\mathrm{tf}_{t,d}$ the number of occurrences of token $t$ in document $d$, $\mathrm{df}_t$ the number of documents in which $t$ occurs, and $D$ the set of all digital information documents;
step 2: converting the digital information in each category into TF-IDF vectors, normalizing them to unit length, and then averaging them to obtain the prototype vector of the class;
step 3: applying the minimum description length;
the description length of the entire data set is minimized; the minimum description length is used to measure the quality and cost of predicting a user recommendation information data set, and the Bayes rule function is:

$$p(H \mid D) = \frac{p(D \mid H)\,p(H)}{p(D)} \tag{4}$$

in formula (4), $H$ indicates the quality of the user recommendation information data set, $D$ represents all of the recommendation information data, $p(D)$ is the probability function of all recommendation information data, $p(H)$ is the probability function of the quality of all user recommendation information data sets, $p(H \mid D)$ is the probability of the predicted quality of the user recommendation data set given all recommendation information data sets, and $p(D \mid H)$ is the probability function output value of the recommendation information data given the quality of the recommendation information data sets;
step 4: assuming that the data are independent given the information length $l_d$ and the category $c$, the probability of the user recommendation information data set cost is the product of the probabilities of the individual tokens, and the probability function output of the user recommendation information data set cost is:

$$p(d \mid c, l_d) = \prod_{i} p(t_{i,d} \mid c, l_d) \tag{5}$$

in formula (5), the token $t_{i,d}$ is a binary value; if the distribution of the token $t_{i,d}$ is a simple binomial distribution, it is independent of the information length $l_d$, and the function of the token is expressed by formula (6) (not reproduced in this text);
in formula (6), each token $t_{i,d}$ is assumed to have a distribution function in the massive data information, which yields the function for predicting the user recommendation information data set (formula not reproduced in this text).
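The steps of claim 7 can be illustrated with a small sketch: build unit-length TF-IDF vectors, average them into one prototype vector per category, and measure a document's description length as a sum of per-token code lengths under the independence assumption. This is a hedged illustration, not the claimed implementation. The weight tf × log(|D| / df), the cosine comparison against prototypes, the function names, and the toy feedback data are assumptions introduced here.

```python
import math
from collections import Counter


def tfidf_unit_vector(tokens, doc_freq, n_docs):
    """Weight tokens by tf * log(|D| / df), then scale the vector to unit length."""
    counts = Counter(t for t in tokens if t in doc_freq)  # ignore unseen tokens
    vec = {t: n * math.log(n_docs / doc_freq[t]) for t, n in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def prototype(vectors):
    """Average unit-length sparse vectors into one prototype vector for a class."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds the values key by key
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def description_length_bits(tokens, token_prob):
    """Code length in bits when token probabilities multiply (independence)."""
    return -sum(math.log2(token_prob[t]) for t in tokens)


# Hypothetical labelled feedback: category -> tokenized documents.
training = {
    "liked": [["grid", "fault", "repair"], ["grid", "fault", "outage"]],
    "disliked": [["sports", "score", "match"], ["sports", "match", "news"]],
}
all_docs = [doc for docs in training.values() for doc in docs]
doc_freq = Counter(t for doc in all_docs for t in set(doc))
prototypes = {
    label: prototype([tfidf_unit_vector(d, doc_freq, len(all_docs)) for d in docs])
    for label, docs in training.items()
}

new_doc = tfidf_unit_vector(["grid", "outage", "fault"], doc_freq, len(all_docs))
scores = {label: cosine(new_doc, proto) for label, proto in prototypes.items()}
predicted = max(scores, key=scores.get)
```

Taking the negative logarithm turns the product of token probabilities into a sum, so minimizing the description length is equivalent to maximizing the probability of the data; `description_length_bits` shows that sum for a given, hypothetical table of token probabilities.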
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210879288.9A CN115098788A (en) | 2022-07-25 | 2022-07-25 | Recommendation method for massive digital information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115098788A (en) | 2022-09-23 |
Family
ID=83299548
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210879288.9A Pending CN115098788A (en) | 2022-07-25 | 2022-07-25 | Recommendation method for massive digital information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115098788A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5867799A (en) * | 1996-04-04 | 1999-02-02 | Lang; Andrew K. | Information system and method for filtering a massive flow of information entities to meet user information classification needs |
CN112328671A (en) * | 2020-11-03 | 2021-02-05 | 北京思特奇信息技术股份有限公司 | A data format conversion method, system, storage medium and device |
Non-Patent Citations (1)
Title |
---|
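Sun Wei et al.: "Research on an intelligent full-service retrieval and recommendation method for distribution station areas", Information Technology (《信息技术》) * |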
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220923 |