Disclosure of Invention
To improve the overall quality of topic selection across multiple platforms, the application provides a multi-platform fused intelligent topic-selection inspiration generation method and a topic-selection inspiration engine.
In a first aspect, the application provides a multi-platform fused intelligent topic-selection inspiration generation method, which adopts the following technical scheme:
A method for generating topic-selection inspiration comprises the following steps:
Crawling the information lists on the corresponding platforms through a plurality of channel interfaces;
Adding the plurality of information lists into a reserved topic pool, wherein each information list serves as an independent topic list;
Calculating a first topic weight corresponding to each piece of information based on the rank of that piece of information in its topic list;
Selecting, based on a keyword library, a keyword corresponding to the title of each piece of information, or a keyword group formed by a plurality of keywords, and calculating a corresponding second topic weight based on the content heat of the keyword or keyword group;
Calculating a third topic weight corresponding to the keyword or keyword group based on the number of times the keyword or keyword group appears in the body text of the corresponding piece of information and the number of times it appears in the body text of the other pieces of information in the topic list;
Calculating a final weight corresponding to each keyword or keyword group based on the first topic weight, the second topic weight and the third topic weight;
And selecting, in descending order of final weight, a preset number of keywords or keyword groups from the topic pool to construct a topic selection list, wherein the topic selection list comprises the final weight value corresponding to each keyword or keyword group and a link to the corresponding information.
In some embodiments, after crawling the information list on the corresponding platform through the plurality of channel interfaces, the method further comprises the following steps:
Judging whether a plurality of pieces of information on each information list belong to negative information or not through a preset negative filtering model, and filtering the negative information;
judging whether specific filtering words exist in a plurality of pieces of information on each information list through a preset specific word stock, and filtering the information with the specific filtering words.
In some embodiments, calculating the first topic weight corresponding to each piece of information based on its rank in the topic list includes the following steps:
Generating a preliminary weight based on the rank of the information in the topic list, wherein the higher the rank, the larger the preliminary weight;
Generating a maximum weight based on the length of the topic list;
Generating an effect addition coefficient based on the platform corresponding to the topic list;
And calculating the first topic weight by multiplying the ratio of the preliminary weight to the maximum weight by the effect addition coefficient.
In some embodiments, calculating the corresponding second topic weight based on the content popularity of each keyword or the keyword group includes the following steps:
Selecting a daily heat range and a real-time heat range from the keyword library, wherein the daily heat range and the real-time heat range comprise a plurality of keywords;
Acquiring a heat retention value of each keyword in the keyword library, and adding the keywords whose heat retention value is greater than a first preset value into the daily heat range;
Acquiring a heat mutation value of each keyword in the keyword library, and taking the keywords whose heat mutation value has increased by more than a second preset value as candidate keywords;
Performing an Internet-wide search based on the candidate keywords, acquiring the event time corresponding to each candidate keyword, and adding the candidate keyword into the real-time heat range if the event time matches the current time;
Generating the corresponding second topic weight based on whether the keyword or keyword group is in the daily heat range or the real-time heat range.
In some embodiments, if the information corresponds to the keyword, calculating a third topic weight corresponding to the keyword, including the following steps:
Acquiring the number of times the keyword appears in the body text of the corresponding information, and defining it as the original number;
Acquiring the number of times the keyword appears in the body text of the other information in the topic list, and selecting the maximum number and the minimum number among them;
Calculating the third topic weight based on the following formula:
third topic weight = (original number - minimum number) / (maximum number - minimum number).
In some embodiments, if the information corresponds to the keyword group, a third topic weight corresponding to the keyword group is calculated, including the following steps:
acquiring a plurality of keywords in the keyword group;
Acquiring the number of times each keyword appears in the body text of the corresponding information, and defining it as the original number for that keyword;
Acquiring the number of times each keyword appears in the body text of the other information in the topic list where the keyword is located, and selecting the maximum number and the minimum number corresponding to each keyword;
Calculating the third topic weight corresponding to each keyword based on the following formula:
third topic weight = (original number - minimum number) / (maximum number - minimum number);
Judging whether the difference between the third topic weights of the plurality of keywords is greater than a preset value;
If so, selecting the third topic weight with the largest value as the third topic weight corresponding to the keyword group;
If not, taking the average of the third topic weights as the third topic weight corresponding to the keyword group.
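The keyword-group rule above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `group_third_weight`, the reading of "difference" as the spread between the largest and smallest per-keyword weights, and the threshold used in the example are all assumptions.

```python
from typing import Sequence


def group_third_weight(keyword_weights: Sequence[float], threshold: float) -> float:
    """Combine per-keyword third topic weights into one weight for the group.

    Assumption: the "difference" between weights is taken as max - min.
    """
    if not keyword_weights:
        raise ValueError("a keyword group needs at least one keyword weight")
    spread = max(keyword_weights) - min(keyword_weights)
    if spread > threshold:
        # Weights diverge strongly: the dominant keyword represents the group.
        return max(keyword_weights)
    # Weights are close together: use their average.
    return sum(keyword_weights) / len(keyword_weights)


# Example with a hypothetical threshold of 0.5.
dominant = group_third_weight([0.9, 0.2], threshold=0.5)
balanced = group_third_weight([0.6, 0.4], threshold=0.5)
```

Taking the maximum when the weights diverge keeps one very prominent keyword from being diluted by the others in its group.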
In some embodiments, calculating the final weight corresponding to each keyword or the keyword group based on the first topic weight, the second topic weight and the third topic weight includes the following steps:
The final weight is calculated by the following formula:
[formula not reproduced in the source text]
where n denotes the number of topic lists, and k1, k2 and k3 are calculation coefficients whose sum is 1.
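The formula itself does not appear in this text, so the sketch below shows only one form that is consistent with the stated constraints: a weighted sum of the three topic weights, accumulated over the n topic lists in which the keyword appears. The summation form, the function name `final_weight` and the example coefficients are assumptions, not the application's formula.

```python
from typing import Iterable, Tuple


def final_weight(
    per_list_weights: Iterable[Tuple[float, float, float]],
    k1: float,
    k2: float,
    k3: float,
) -> float:
    """Assumed form: sum over topic lists of k1*w1 + k2*w2 + k3*w3.

    per_list_weights holds one (first, second, third) weight triple per
    topic list in which the keyword or keyword group appears.
    """
    if abs(k1 + k2 + k3 - 1.0) > 1e-9:
        raise ValueError("k1 + k2 + k3 must equal 1")
    return sum(k1 * w1 + k2 * w2 + k3 * w3 for w1, w2, w3 in per_list_weights)


# A keyword seen on two topic lists, with hypothetical coefficients.
score = final_weight([(1.0, 0.6, 0.6), (0.5, 0.6, 0.2)], k1=0.5, k2=0.3, k3=0.2)
```

Under this assumed form, a keyword that appears on more platforms accumulates a larger final weight, which matches the stated aim of comparing keywords across platforms.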
In some embodiments, the calculating the final weight corresponding to each keyword or the keyword group based on the first topic weight, the second topic weight and the third topic weight further includes the following steps:
Judging whether the information list on the corresponding platform has a classification list;
If so, acquiring the classification information corresponding to the classification list, and determining how many items of each classification are present in the topic pool;
If the keyword or keyword group corresponds to the classification list, multiplying the calculated final weight by a corresponding classification coefficient;
Wherein the classification coefficient is greater than 1, and its size is inversely proportional to the number of items in the topic pool that belong to the classification of that classification list.
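A coefficient that is always greater than 1 yet shrinks as a classification becomes more common can be sketched as below. The text does not give a concrete expression; the form `1 + scale / count`, the function names and the default `scale` are assumptions chosen only to satisfy both stated properties.

```python
def classification_coefficient(count_in_pool: int, scale: float = 1.0) -> float:
    """Assumed form: 1 + scale / count, so the part above 1 is inversely
    proportional to how many items of this classification are in the pool."""
    if count_in_pool < 1:
        raise ValueError("the classification must have at least one item in the pool")
    if scale <= 0:
        raise ValueError("scale must be positive so the coefficient exceeds 1")
    return 1.0 + scale / count_in_pool


def apply_classification(final_weight: float, count_in_pool: int) -> float:
    """Boost a final weight for a keyword that belongs to a classification list."""
    return final_weight * classification_coefficient(count_in_pool)
```

With this form, a keyword from a rarely represented classification receives a larger boost than one from a classification that already fills the topic pool.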
In some embodiments, in the topic selection list, the method for generating the link of the information corresponding to each keyword or the keyword group includes the following steps:
Acquiring, among all the information corresponding to the keyword or keyword group, the information with the largest first topic weight as target information;
Acquiring the link corresponding to the target information through the channel interface;
And attaching the link to the keyword or keyword group.
In a second aspect, the application provides a topic-selection inspiration engine, which adopts the following technical scheme:
A topic-selection inspiration engine comprising:
A channel interface, used for interfacing with each platform;
A crawler module, used for crawling the information lists on the platforms corresponding to the channel interfaces;
A topic pool, used for holding the plurality of information lists and converting each information list into a corresponding independent topic list;
A weight extraction module, used for calculating a first topic weight corresponding to each piece of information based on the rank of that piece of information in its topic list; selecting, based on a keyword library, a keyword corresponding to the title of each piece of information, or a keyword group formed by a plurality of keywords, and calculating a corresponding second topic weight based on the content heat of the keyword or keyword group; and calculating a third topic weight corresponding to the keyword or keyword group based on the number of times the keyword or keyword group appears in the body text of the corresponding information and the number of times it appears in the body text of the other information in the topic list;
A topic weight calculation module, used for calculating the final weight corresponding to each keyword or keyword group based on the first topic weight, the second topic weight and the third topic weight;
And a topic generation module, used for selecting, in descending order of final weight, a preset number of keywords or keyword groups from the topic pool to construct a topic selection list, wherein the topic selection list comprises the final weight value corresponding to each keyword or keyword group and a link to the corresponding information.
By the technical scheme provided by the embodiment of the application, the following technical effects are achieved:
A large number of platforms are interfaced and integrated, and the information lists on the various platforms are crawled, so that the keywords extracted from the hot information of multiple platforms are compared horizontally through a weight algorithm. The heat of each keyword is thus judged as a whole, and the latest and hottest keywords across all current platforms are obtained quickly and accurately as preferred topics. The user obtains topic inspiration directly from the calculated and screened keywords, which reduces the workload.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples for a clearer understanding of the objects, technical solutions and advantages of the present application. However, it will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In some instances, well known methods, procedures, systems, components, and/or circuits have been described at a high-level so as not to obscure aspects of the present application with unnecessary description. It will be apparent to those having ordinary skill in the art that various changes can be made to the disclosed embodiments of the application and that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Thus, the present application is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the scope of the application as claimed.
The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit the present invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In the description of the present application, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. The terms "first" and "second" are used only to distinguish technical features and should not be construed as indicating or implying relative importance, the number of the technical features indicated, or their order of precedence.
In the description of the present application, the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The embodiment of the application discloses a multi-platform fusion intelligent topic selection inspiration generation method.
As shown in FIG. 1, the intelligent topic selection inspiration generation method with multi-platform fusion comprises the following steps:
S100, crawling information list on the corresponding platform through a plurality of channel interfaces.
The channel interfaces are used to interface with mainstream news platforms, self-media platforms, short-video platforms and the like on the Internet, and a crawler algorithm crawls the latest information list on each mainstream popular channel. Each information list is a hot list ranked by the platform using its own popularity calculation method, and the crawled content includes the title, body text and source of each piece of information.
S200, adding a plurality of information lists into the reserved topic pool, wherein each information list is used as an independent topic list.
The topic pool is an integrated library that stores the information lists crawled from a preset number of platforms; a user can also freely view the information list of any platform in the topic pool.
After the information list of each platform enters the topic pool and undergoes the subsequent information preprocessing, it becomes an independent topic list, in which the pieces of information are arranged in the same order as in the information list.
S300, calculating a first topic weight corresponding to each piece of information based on the rank of that piece of information in its topic list.
First, the first topic weight of each piece of information is obtained from its rank on the topic list. Because each topic list is generated from an information list that the platform has already sorted by popularity, a piece of information ranked nearer the top generally has higher popularity, and its first topic weight is correspondingly larger.
The rank of each piece of information serves as the first reference factor for judging the heat of a topic.
S400, selecting, based on the keyword library, a keyword corresponding to the title of each piece of information, or a keyword group composed of a plurality of keywords, and calculating a corresponding second topic weight based on the content heat of the keyword or keyword group.
The keyword library is a word library generated, when the topic-selection inspiration engine is created, by a keyword extraction model trained in advance with a large model and a neural network algorithm. It stores a large number of keywords that conform to common semantics and human language; a keyword library commonly used in the prior art, such as the NLTK library or the jieba library, may also be used. Keywords are extracted from the title of the information because the title summarizes the key content of the information in relatively few words, so the extracted keywords are more accurate and fewer in number.
The number of keywords selected from a title varies with the content of the information. For example, for the information "express companies accelerate their expansion into villages, and parcel volume in some regions increases tenfold", the selected keyword is "express company"; for the information "xx player wins gold medal", the selected keywords are "xx sports meeting" and "xxx player".
Keywords are mainly selected from the following categories: major activities, major meetings, major policies, major awards, major industrial businesses, public figures, and the like. Generally, to reduce the subsequent topic selection workload and improve accuracy, the keyword group of one piece of information contains at most three keywords; when more than three keywords could be selected from one piece of information, the three keywords that appear first in the word order of the information are selected by default.
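The "at most three keywords, earliest first" rule can be sketched as below. This is an illustration under simplifying assumptions: keywords are matched by plain substring search against a small in-memory list, whereas the application describes a trained extraction model; the function name `select_keywords` and the sample data are hypothetical.

```python
from typing import List

MAX_KEYWORDS = 3  # the text caps a keyword group at three keywords


def select_keywords(title: str, keyword_library: List[str]) -> List[str]:
    """Return up to three library keywords found in the title,
    ordered by where they first appear in the title."""
    hits = []
    for keyword in keyword_library:
        position = title.find(keyword)  # -1 when the keyword is absent
        if position != -1:
            hits.append((position, keyword))
    hits.sort()  # earliest position first
    return [keyword for _, keyword in hits[:MAX_KEYWORDS]]


# Hypothetical title containing four library keywords; only the first three are kept.
library = ["gold", "beta player", "alpha games", "gamma city"]
title = "alpha games: beta player wins gold in gamma city final"
selected = select_keywords(title, library)
```

A single returned keyword corresponds to the "keyword" case in the text, and two or three returned keywords form a "keyword group".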
After the keyword or keyword group corresponding to the information is selected, its content heat is used as the second reference factor for judging the heat of a topic. The content heat of a keyword or keyword group represents how active the corresponding event or object is in society during the current period; for example, during holidays, keywords related to holiday travel have relatively high content heat.
The second topic weight mainly represents the activity level of the social public opinion environment, as inferred from the information through its keywords.
S500, calculating a third topic weight corresponding to the keyword or the keyword group based on the number of times the keyword or the keyword group appears in the text of the corresponding information and the number of times the keyword or the keyword group appears in the text of other information in the topic list.
The number of times a keyword or keyword group appears in each piece of information serves as the third reference factor for judging the heat of a topic. If a keyword appears frequently in several pieces of information on one information list, the event or person corresponding to the keyword is currently very hot, and the corresponding third topic weight is larger.
S600, calculating the final weight corresponding to each keyword or keyword group based on the first topic weight, the second topic weight and the third topic weight.
The final weight corresponding to each keyword or keyword group is calculated from the sub-weights obtained above. The final weight expresses the weight that the keyword or keyword group obtains across the information lists of all crawled platforms: the higher the final weight, the hotter the keyword or keyword group is across the platforms, and the more suitable it is as topic content.
S700, selecting, in descending order of final weight, a preset number of keywords or keyword groups from the topic pool to construct a topic selection list, wherein the topic selection list comprises the final weight value corresponding to each keyword or keyword group and a link to the corresponding information.
After the final weight of each keyword or keyword group is obtained, the keywords or keyword groups with the highest final weights are selected and a topic selection list is generated. The topic selection list displays the selected keywords as topics with strong hot-spot potential, together with the weight value of each keyword and a link to the information matching it.
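Step S700 can be sketched as a sort followed by a link lookup. The link rule used here, choosing the piece of information with the largest first topic weight, is the one stated for link generation in the disclosure; the dictionary layout, the function name `build_topic_selection_list` and the sample values are assumptions for illustration.

```python
from typing import Dict, List


def build_topic_selection_list(candidates: List[Dict], top_n: int) -> List[Dict]:
    """Keep the top_n candidates by final weight and attach one link to each.

    Each candidate is assumed to look like:
    {"keyword": str, "final_weight": float,
     "items": [{"first_weight": float, "link": str}, ...]}
    """
    ranked = sorted(candidates, key=lambda c: c["final_weight"], reverse=True)
    result = []
    for candidate in ranked[:top_n]:
        # Target information: the item with the largest first topic weight.
        target = max(candidate["items"], key=lambda item: item["first_weight"])
        result.append(
            {
                "keyword": candidate["keyword"],
                "final_weight": candidate["final_weight"],
                "link": target["link"],
            }
        )
    return result


# Hypothetical candidates; links are placeholders, not real addresses.
pool = [
    {"keyword": "a", "final_weight": 0.4,
     "items": [{"first_weight": 0.5, "link": "link-1"},
               {"first_weight": 0.9, "link": "link-2"}]},
    {"keyword": "b", "final_weight": 0.8,
     "items": [{"first_weight": 0.3, "link": "link-3"}]},
    {"keyword": "c", "final_weight": 0.1,
     "items": [{"first_weight": 0.2, "link": "link-4"}]},
]
top_two = build_topic_selection_list(pool, top_n=2)
```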
In this method, a large number of platforms are interfaced and integrated, and the information lists on each type of platform are crawled. The keywords extracted from the hot information of the different platforms are compared horizontally through the weight algorithm, so that the heat of each keyword is judged as a whole and the latest and hottest keywords across all current platforms are obtained quickly and accurately as preferred topics. The user obtains topic inspiration directly from the calculated and screened keywords, which reduces the workload.
In other embodiments, after crawling the information list on the corresponding platform through the plurality of channel interfaces, the method further comprises the following steps:
S110, judging, through a preset negative filtering model, whether the pieces of information on each information list are negative information, and filtering out the negative information.
Some short-video platforms and self-media platforms strongly promote and drive traffic to sensitive or negative content. To reduce the probability that negative keywords are generated during topic selection, and thereby avoid misleading new-media practitioners, information preprocessing can be performed after the information lists are crawled.
First, compliance detection is performed, and content such as illegal behavior, war, natural disasters and fraud is filtered out through large-model technology.
The negative filtering model is obtained with a common machine learning training algorithm: the model is trained on a large training set of negative and illegal words, and the trained model is then applied to classify and filter real-time text, automatically judging whether the text contains bad or illegal content.
S111, judging whether specific filtering words exist in the information on each information list through a preset specific word stock, and filtering the information with the specific filtering words.
The specific word library stores a large number of classification sets for different industries, fields and types, and each classification set contains the keywords of the corresponding industry, field or type. The user can choose which parts of the specific word library to apply. For example, if a new-media user mainly produces content about the electric vehicle industry, the user can select the classification sets of unrelated industries, such as stocks, economy and entertainment, as the screening range, and any piece of information containing the corresponding specific filtering words is deleted. The topics the user obtains are then better suited to the user's own field, and topics from unrelated fields do not occupy positions in the final topic selection list. In addition, deleting irrelevant information makes the subsequent topic weight calculation more accurate for the specific fields and industries concerned.
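The word-based filter of step S111 can be sketched as follows. This is a minimal illustration: the classification sets, the field names `title` and `body`, and the function names are hypothetical, and matching is done by plain substring search rather than any particular tokenizer.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical classification sets of a specific word library.
SPECIFIC_WORD_LIBRARY: Dict[str, Set[str]] = {
    "stocks": {"share price", "stock index"},
    "entertainment": {"box office", "variety show"},
}


def collect_filter_words(selected_sets: Iterable[str]) -> Set[str]:
    """Union of the filter words in the classification sets the user selected."""
    words: Set[str] = set()
    for name in selected_sets:
        words |= SPECIFIC_WORD_LIBRARY.get(name, set())
    return words


def filter_information(items: List[Dict[str, str]], filter_words: Set[str]) -> List[Dict[str, str]]:
    """Drop every item whose title or body contains any filter word."""
    return [
        item
        for item in items
        if not any(word in item["title"] or word in item["body"] for word in filter_words)
    ]


items = [
    {"title": "New battery plant opens", "body": "Output will double next year."},
    {"title": "Weekend box office report", "body": "Ticket sales rose."},
]
kept = filter_information(items, collect_filter_words(["stocks", "entertainment"]))
```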
In other embodiments, calculating the first topic weight corresponding to each piece of information based on its rank in the topic list includes the following steps:
S310, generating a preliminary weight based on the rank of the information in the topic list, wherein the higher the rank, the larger the preliminary weight.
Each topic list is an independent list, and the preliminary weight is calculated from the position of each piece of information on the topic list to which it belongs. The information list of each platform is ranked by some algorithm, such as heat priority, time priority or heat-per-unit-time priority, and the information with the highest heat or the most recent release time is always at the front of the information list, so the heat of a piece of information can be judged from its rank on the list.
S320, generating the maximum weight based on the length of the topic list.
The value of the maximum weight depends on the length of the topic list: if the topic list contains 10 pieces of information, the maximum weight is 10, and the longer the list, the larger the maximum weight.
S330, generating an effect addition coefficient based on the platform corresponding to the topic selection list.
Different platforms differ in scale, as measured by awareness, number of users, activity, number of users online, clicks, forwards, comments and the like. These measures represent the size of the platform: the larger the platform, the greater the authority and audience of the information on it, and the larger the effect addition coefficient corresponding to its topic list.
Because the final weight represents the heat of a keyword of a piece of information, the effect addition coefficient acts as a boost applied to the first topic weight during calculation: the larger the platform and the more active its users, the larger the boost to the heat. In the embodiment of the application, the effect addition coefficient is a value between 1 and 2; when the scale of the platform is extremely small, the effect addition coefficient is 1, indicating that there is no platform boost.
S340, calculating the first topic weight by multiplying the ratio of the preliminary weight to the maximum weight by the effect addition coefficient.
The first topic weight is calculated as (preliminary weight / maximum weight) × effect addition coefficient, and represents the information weight of a single piece of information.
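Steps S310 to S340 can be sketched as below. The maximum weight equal to the list length and the coefficient range of 1 to 2 come from the text; the linear preliminary weight (list length minus rank plus one, with rank 1 at the top) is an assumption, since the text only requires that a higher rank gives a larger preliminary weight.

```python
def first_topic_weight(rank: int, list_length: int, effect_coefficient: float = 1.0) -> float:
    """(preliminary weight / maximum weight) * effect addition coefficient.

    rank is 1-based, with 1 being the top of the topic list.
    """
    if not 1 <= rank <= list_length:
        raise ValueError("rank must lie between 1 and the list length")
    if not 1.0 <= effect_coefficient <= 2.0:
        raise ValueError("the effect addition coefficient is a value between 1 and 2")
    preliminary = list_length - rank + 1  # assumed linear form: top item scores highest
    maximum = list_length                 # per the text: a 10-item list has maximum weight 10
    return preliminary / maximum * effect_coefficient


# Top item of a 10-item list on a small platform (no platform boost).
top_small_platform = first_topic_weight(rank=1, list_length=10)
# The same position on a large platform with a hypothetical coefficient of 1.5.
top_large_platform = first_topic_weight(rank=1, list_length=10, effect_coefficient=1.5)
```

Dividing by the maximum weight puts lists of different lengths on the same scale before the platform coefficient is applied.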
In other embodiments, calculating the corresponding second topic weight based on the content popularity of each keyword or keyword group includes the following steps:
S410, selecting a daily heat range and a real-time heat range from the keyword library, wherein the daily heat range and the real-time heat range each comprise a plurality of keywords.
Two ranges, that is, two initially empty sets, are set up in the reserved keyword library, and any number of qualifying keywords can be placed in each set. One set corresponds to the daily heat range, which stores keywords that maintain a certain heat over a long time, such as "semiconductor" and "electric car"; the other set corresponds to the real-time heat range, which stores keywords whose heat rises suddenly within a certain period, such as "launch event" and "concert".
S420, acquiring a heat retention value of each keyword in the keyword library, and adding the keywords with the heat retention values larger than a first preset value into a daily heat range.
First, the heat retention value of each keyword is determined from historical data such as the number of times the keyword appears on the lists and how long it stays there. For example, if a keyword appears frequently on the information lists over 2 to 3 months, its overall heat may not be high, but its retention time is long, so its calculated heat retention value is large. When the heat retention value is greater than the first preset value, the keyword is added into the daily heat range.
S430, acquiring the heat mutation value of each keyword in the keyword library, and taking the keywords whose heat mutation value has increased by more than a second preset value as candidate keywords.
The heat mutation value is determined by comparing the number of appearances and retention time of each keyword on the lists in the historical data with its number of appearances and retention time over the most recent period. If a low-heat keyword that rarely appeared in the past suddenly appears on the lists frequently and stays there for a period of time, its heat mutation value is considered large. When the heat mutation value is greater than the second preset value, the keyword is taken as a candidate keyword and enters the next round of judgment.
S440, performing an Internet-wide search based on the candidate keywords, acquiring the event time corresponding to each candidate keyword, and adding the candidate keyword into the real-time heat range if the event time matches the current time.
For example, if the candidate keyword is "concert", the keyword is searched through an interface in the corresponding Internet search engine to obtain the event corresponding to the keyword and the event time of that event. If the event is that xxx is holding a concert, the event time represents the period during which public opinion about the event is hot. For example, if the concert takes place on 2024.5.20-2024.5.23, the event time can be extended forward and backward by half a month each, giving 2024.5.5-2024.6.4; the extension covers the additional public attention around a high-heat event, such as the pre-sale period before the concert and the period of discussion after it.
If the current time falls within the event time, the candidate keyword is considered to be within the occurrence period of a high-heat event, so the heat mutation is plausible and reasonable, and the candidate keyword can be added into the real-time heat range.
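The time-matching check in step S440 can be sketched with the standard `datetime` module. The symmetric padding of 15 days is an assumption standing in for "half a month", and the function name `in_event_window` and the sample dates are hypothetical; how the event dates are obtained from a search engine is outside this sketch.

```python
from datetime import date, timedelta


def in_event_window(today: date, event_start: date, event_end: date,
                    padding_days: int = 15) -> bool:
    """True when today falls within the event period extended on both sides.

    padding_days is an assumed stand-in for the "half a month" extension.
    """
    if event_end < event_start:
        raise ValueError("event_end must not precede event_start")
    padding = timedelta(days=padding_days)
    return event_start - padding <= today <= event_end + padding


# Hypothetical event running from 20 to 23 May 2024.
start, end = date(2024, 5, 20), date(2024, 5, 23)
during_presale = in_event_window(date(2024, 5, 10), start, end)
long_after = in_event_window(date(2024, 7, 1), start, end)
```

A candidate keyword would be added to the real-time heat range only when this check succeeds for the event found by the search.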
S450, generating corresponding second topic weights based on whether the keywords or the keyword groups are in the daily heat range or the real-time heat range.
The second topic weight differs according to the range in which the keyword or keyword group falls. In general, a keyword in the daily heat range is a reasonably effective hot topic over a longer period: as a topic it may not be the optimal choice at the current time, but it retains a certain heat at any time. A keyword in the real-time heat range is among the hottest topics at the current time, and using it as a topic can bring higher traffic and attention. Therefore, the second topic weight of a keyword in the real-time heat range is larger than that of a keyword in the daily heat range. The specific value of the second topic weight is calculated from the time, the number of times and the like that the keyword has appeared on the information lists in the historical data, and the calculation method can be modified according to actual conditions.
Meanwhile, if the keyword or keyword group is in neither the daily heat range nor the real-time heat range, the keyword has not been a hot topic in the current or historical time, so the second topic weight generated is smaller than that generated in the other two cases.
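The ordering described in S450 can be illustrated with a minimal sketch. The names HeatRange, BASE_WEIGHT, classify_heat_range and second_topic_weight, as well as the numeric values 1.0, 0.6 and 0.2, are hypothetical and chosen only for illustration; the description fixes only the ordering of the three cases, and the actual values would be derived from historical data.

```python
from enum import Enum


class HeatRange(Enum):
    REAL_TIME = "real_time"
    DAILY = "daily"
    NONE = "none"


# Hypothetical base values: the description fixes only their ordering
# (real-time > daily > neither), not the numbers themselves.
BASE_WEIGHT = {
    HeatRange.REAL_TIME: 1.0,
    HeatRange.DAILY: 0.6,
    HeatRange.NONE: 0.2,
}


def classify_heat_range(keyword, daily_range, real_time_range):
    """Return the heat range that a keyword falls into.

    Assumption: a keyword present in both ranges counts as real-time.
    """
    if keyword in real_time_range:
        return HeatRange.REAL_TIME
    if keyword in daily_range:
        return HeatRange.DAILY
    return HeatRange.NONE


def second_topic_weight(keyword, daily_range, real_time_range):
    """Look up the second topic weight for the keyword's heat range."""
    heat_range = classify_heat_range(keyword, daily_range, real_time_range)
    return BASE_WEIGHT[heat_range]
```

In this sketch the two ranges are plain sets of keywords; a keyword found in neither set receives the smallest weight.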
In other embodiments, if a keyword is selected for the information, calculating the third topic weight corresponding to the keyword includes the following steps:
S510, the number of times the keyword appears in the text of the corresponding information is obtained and defined as the original number of times.
The number of occurrences of the keyword in the body of the information in which it is located is calculated.
S511, the number of times that the keyword appears in the text of other information in the topic selection list where the keyword is located is obtained, and the maximum number of times and the minimum number of times are selected.
The number of occurrences of the keyword in the text of each other piece of information in the topic selection list where the keyword is located is calculated, and the maximum and minimum numbers of occurrences are selected.
S512, calculating a third topic weight based on the following formula:
third topic weight = (original number - minimum number) / (maximum number - minimum number).
The weight represented by the number of occurrences of each keyword is calculated through normalization. For example, if the original number is 6, the minimum number is 3 and the maximum number is 8, the corresponding third topic weight is (6 - 3) / (8 - 3) = 3/5. The closer the original number is to the maximum number, the larger the third topic weight; conversely, the closer the original number is to the minimum number, the smaller the third topic weight.
If a keyword appears in a plurality of pieces of information in the corresponding topic list, the third topic weight of the keyword is calculated separately for each of those pieces of information, the calculated third topic weights are added, and the sum is taken as the third topic weight of the keyword in that topic list.
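Steps S510-S512 and the subsequent addition can be sketched as follows. The function names normalized_weight and third_topic_weight are hypothetical, and returning 0.0 when there is nothing to compare against or when the maximum equals the minimum is an assumption that the description does not address.

```python
def normalized_weight(original, other_counts):
    """Min-max normalize `original` against the counts in the other texts."""
    if not other_counts:
        return 0.0  # assumption: no other information to compare against
    lowest, highest = min(other_counts), max(other_counts)
    if highest == lowest:
        return 0.0  # assumption: avoid division by zero when all counts match
    return (original - lowest) / (highest - lowest)


def third_topic_weight(counts, selected_indexes):
    """Sum the normalized weights over every information item for which
    the keyword was selected.

    counts[i] is the number of occurrences of the keyword in the text of
    the i-th information item of one topic list.
    """
    total = 0.0
    for index in selected_indexes:
        others = counts[:index] + counts[index + 1:]
        total += normalized_weight(counts[index], others)
    return total


# Worked example from the description: original 6, minimum 3, maximum 8.
print(normalized_weight(6, [3, 5, 8]))  # 0.6
```

Because the minimum and maximum are taken over the other pieces of information only, a single value can fall outside the range 0 to 1 when the original number is itself the largest or smallest in the list.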
In other embodiments, if a keyword group is selected for the information, calculating the third topic weight corresponding to the keyword group includes the following steps:
S520, acquiring the plurality of keywords in the keyword group.
If a keyword group is selected, the keyword group is decomposed into a plurality of keywords.
S521, the number of times each keyword appears in the text of the corresponding information is obtained and defined as the original number of times.
S522, the number of times that each keyword appears in the text of other information in the topic selection list where the keyword is located is respectively obtained, and the maximum number of times and the minimum number of times corresponding to each keyword are respectively selected.
S523, calculating third topic weights corresponding to the keywords based on the following formulas:
third topic weight = (original number - minimum number) / (maximum number - minimum number).
The third topic weight of each keyword is obtained through the above steps in the same way as in S510-S512; when a keyword appears in more than one piece of information in the topic list, its final third topic weight is obtained by addition.
S524, judging whether the difference value between the third topic weights of the keywords is larger than a preset value.
If the information corresponds to a keyword group, the difference between the third topic weights of the decomposed keywords also needs to be determined; the third topic weight of the keyword group is selected by a different method depending on whether the difference is larger than the preset value.
S525, if the difference is larger than the preset value, selecting the third topic weight with the largest value as the third topic weight corresponding to the keyword group.
If the difference between the third topic weights of the keywords is larger than the preset value, the individual heat of the keywords within the keyword group is unbalanced, which may be caused by the different semantic roles that the words play in the text. For example, the title of a piece of information is "Athlete y wins a gold medal at the x Games", and the keyword group is "x Games - y". In the text, "x Games" denotes the overall event and y is a person's name. Following the semantic logic and line of writing, the text introduces the Games at the beginning and then describes athlete y in the subsequent content, so y occurs far more often than "x Games". However, "x Games" describes the overall event, and its heat is not reduced by its lower number of occurrences. Therefore, to keep the weight derived from the number of occurrences accurate, the value of the keyword with the most occurrences, namely the largest third topic weight, can be selected as the third topic weight of the keyword group, so that a keyword with very few occurrences but high heat does not distort the overall weight value.
S526, if not, taking the average value of the third topic weights as the third topic weight corresponding to the keyword group.
If the difference between the third topic weights of the keywords in the keyword group is smaller than the preset value, the keywords occur with similar frequency and contribute evenly to the frequency-based weight. In this case, because the keyword group contains a plurality of keywords, the calculated third topic weights are averaged to obtain the third topic weight of the keyword group, which keeps the third topic weights of single keywords and of keyword groups numerically comparable.
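The selection rule of S524-S526 can be sketched as follows. The function name group_third_topic_weight and the threshold value used in the usage line are hypothetical. For groups of more than two keywords, the "difference" is taken here as the gap between the largest and smallest third topic weights, which is an assumption; the description does not define it for that case.

```python
from statistics import mean


def group_third_topic_weight(keyword_weights, preset_value):
    """Combine the third topic weights of the keywords in one keyword group.

    keyword_weights: third topic weight of each keyword in the group.
    preset_value: threshold on the difference between the weights.
    """
    if not keyword_weights:
        raise ValueError("a keyword group needs at least one keyword")
    # Assumption: the difference is the spread between largest and smallest.
    difference = max(keyword_weights) - min(keyword_weights)
    if difference > preset_value:
        # S525: unbalanced group, keep the largest weight.
        return max(keyword_weights)
    # S526: balanced group, use the average.
    return mean(keyword_weights)


unbalanced = group_third_topic_weight([0.9, 0.2], preset_value=0.5)
balanced = group_third_topic_weight([0.6, 0.4], preset_value=0.5)
```

Taking the largest value in the unbalanced case mirrors the "x Games - y" example, where the rarely repeated event name should not pull the group weight down.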
In other embodiments, the final weight corresponding to each keyword or keyword group is calculated based on the first topic weight, the second topic weight, and the third topic weight, including the steps of:
S610, calculating a final weight by the following formula:
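final weight = Σ (i = 1 to n) [k1 × (first topic weight)_i + k2 × (second topic weight)_i + k3 × (third topic weight)_i], the subscript i denoting the i-th topic list,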
where n is the number of topic lists, k1, k2 and k3 are calculation coefficients, and the sum of k1, k2 and k3 is 1.
k1, k2 and k3 are numerical coefficients representing the proportion that each topic weight contributes to the final weight. In the embodiment of the present application, k1 and k2 are both 0.3 and k3 is 0.4.
The final weight adds up the weights obtained in each topic list because a keyword may appear in several topic lists; after the topic lists are compared horizontally, the values calculated for each topic list are added to obtain the overall heat weight of the keyword or keyword group.
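As a minimal sketch of S610, the final weight can be computed as below. The sketch assumes that the final weight is the sum, over the n topic lists, of the combination of the three topic weights weighted by k1, k2 and k3; this form is inferred from the surrounding description. The function name final_weight and the sample values are hypothetical; the coefficients 0.3, 0.3 and 0.4 are those of the embodiment.

```python
K1, K2, K3 = 0.3, 0.3, 0.4  # coefficients given in the embodiment


def final_weight(per_list_weights, k1=K1, k2=K2, k3=K3):
    """Sum the weighted combination of the three topic weights over all lists.

    per_list_weights: one (first, second, third) tuple per topic list
    in which the keyword or keyword group appears.
    """
    if abs(k1 + k2 + k3 - 1.0) > 1e-9:
        raise ValueError("k1, k2 and k3 must sum to 1")
    return sum(
        k1 * first + k2 * second + k3 * third
        for first, second, third in per_list_weights
    )


# Hypothetical keyword appearing in two topic lists.
weights = [(0.5, 1.0, 0.6), (0.2, 1.0, 0.0)]
total = final_weight(weights)
```

A keyword that appears in more topic lists accumulates more terms in the sum, which is how the horizontal comparison across platforms raises its overall heat weight.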
In other embodiments, the final weight corresponding to each keyword or keyword group is calculated based on the first topic weight, the second topic weight, and the third topic weight, and further comprising the steps of:
S620, judging whether the information list on the corresponding platform has classification lists.
A classification list is a list divided by specific field or industry. For example, in addition to the overall information list, some platforms also provide an economic information list, an entertainment information list, an industrial information list and the like.
S621, if yes, obtaining the classification information corresponding to each classification list, and determining the number of occurrences of each kind of classification information in the topic pool.
If classification lists exist, the specific classification information corresponding to each classification list is obtained, and the number of lists of each classification present in the topic pool is counted.
S622, if the keyword or the keyword group corresponds to the classification list, multiplying the calculated final weight by the corresponding classification coefficient.
If the keyword or keyword group corresponds to a classification list, its final weight is multiplied by a coefficient that serves as a bonus for lists of finer classifications. For example, a keyword that appears in an economic list very likely concerns economic content and is unlikely to appear in lists of other types. Its absence from other lists does not mean that its current heat is low; it simply appears less often because the list type is specialized. Therefore, to prevent keywords of a specific field or industry from receiving an inaccurate final weight because of this limitation, the final weight is multiplied by the classification coefficient to compensate for the specialized list type.
The classification coefficient is larger than 1, and its size is inversely proportional to the number of occurrences in the topic pool of the classification information corresponding to the classification list. That is, the fewer times the field of a classification list occurs in the topic pool, the more niche that field is, the larger the classification coefficient to be multiplied, and the greater the support given to the weight calculation; conversely, the more times the field occurs in the topic pool, the more mainstream that field is, the smaller the classification coefficient, and the smaller the support given.
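One possible form of the classification coefficient is sketched below. The description only requires that the coefficient be larger than 1 and decrease as the classification occurs more often in the topic pool; the specific form 1 + scale / count, the parameter scale and the function names are assumptions made for illustration.

```python
def classification_coefficient(count_in_pool, scale=1.0):
    """Return a coefficient above 1 that shrinks as the count grows.

    count_in_pool: number of lists of this classification in the topic pool.
    scale: hypothetical tuning parameter controlling the size of the bonus.
    """
    if count_in_pool < 1:
        raise ValueError("the classification must occur at least once")
    if scale <= 0:
        raise ValueError("scale must be positive so the coefficient exceeds 1")
    return 1.0 + scale / count_in_pool


def adjusted_final_weight(final_weight_value, count_in_pool, scale=1.0):
    """S622: multiply the final weight by the classification coefficient."""
    return final_weight_value * classification_coefficient(count_in_pool, scale)


niche = classification_coefficient(1)    # classification seen on one platform
common = classification_coefficient(5)   # classification seen on five platforms
```

With this form the bonus approaches 1 for classifications present on many platforms, so widely covered fields receive almost no compensation.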
In other embodiments, in the topic selection list, the method for generating the links of the information corresponding to each keyword or keyword group includes the following steps:
S710, obtaining, among all the information containing the keyword or keyword group, the information with the largest first topic weight as target information.
S720, obtaining the link corresponding to the target information based on the channel interfaces.
S730, binding the link to the keyword or keyword group.
When a keyword or keyword group corresponds to a plurality of pieces of information, the piece with the largest first topic weight is preferentially selected; it is the highest-ranked piece of information containing the keyword or keyword group, namely the hottest one.
The selected information is taken as the target information, and its link is obtained through the channel interface. When a user selects a keyword or keyword group and wants to view the corresponding information content, the user can directly select the bound link to jump to the corresponding information.
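Steps S710-S730 amount to picking the highest-ranked piece of information and attaching its link, which can be sketched as follows. The Information record, its field names, the function select_target_link and the example URLs are hypothetical; in practice the link would be obtained through the channel interface rather than stored on the record.

```python
from dataclasses import dataclass


@dataclass
class Information:
    title: str
    link: str
    first_topic_weight: float


def select_target_link(items):
    """Return the link of the information with the largest first topic weight."""
    if not items:
        raise ValueError("the keyword has no associated information")
    target = max(items, key=lambda item: item.first_topic_weight)
    return target.link


# Hypothetical pieces of information sharing the keyword "concert".
candidates = [
    Information("Concert tickets sell out", "https://example.com/a", 0.4),
    Information("Concert opens tonight", "https://example.com/b", 0.9),
]
bound_links = {"concert": select_target_link(candidates)}
```

The dictionary bound_links stands in for the binding of S730, so that selecting a keyword in the topic selection list leads directly to its hottest source.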
In other embodiments, after the user selects the keyword or the keyword group corresponding to the corresponding topic, the method further includes:
The user selects the link corresponding to the keyword or keyword group and jumps to the text of the information. If the user wants to generate a topic view based on the information, the text of the information is transmitted, negative views in the text are identified and filtered out by a negative filtering model, key sentences in the text, including positive view sentences, summary sentences and the like, are extracted by other big data models, and the topic view is generated based on the corresponding data.
In other embodiments, after the topic view is generated, a topic script may be generated by a pre-trained large model. The large model learns the writing style, popular internet expressions, writing techniques and the like of a specific type of article from a large number of script learning sets, and a topic script in the corresponding style is generated by feeding the topic view into the large model.
As shown in fig. 2, the present application also discloses a topic-selecting inspiration engine, which comprises:
a channel interface, used for connecting to each platform;
a crawler module, used for crawling the information lists on the platforms corresponding to the channel interfaces;
a topic selection pool, used for holding the plurality of information lists and converting each information list into a corresponding independent topic selection list;
a weight extraction module, used for calculating a first topic weight corresponding to each piece of information based on the rank of each piece of information in the topic selection list; selecting, based on the keyword library, keywords corresponding to the titles of the information or keyword groups formed by a plurality of keywords, and calculating corresponding second topic weights based on the content heat of the keywords or keyword groups; and calculating a third topic weight corresponding to the keyword or keyword group based on the number of times the keyword or keyword group appears in the text of the corresponding information and the number of times it appears in the text of other information in the topic selection list;
a topic selection weight calculation module, used for calculating the final weight corresponding to each keyword or keyword group based on the first topic weight, the second topic weight and the third topic weight;
and a topic selection generating module, used for selecting a preset number of keywords or keyword groups in order of final weight from high to low in the topic selection pool to construct a topic selection list, wherein the topic selection list comprises the final weight values corresponding to the keywords or keyword groups and links to the corresponding information.
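The behavior of the topic selection generating module can be illustrated with a short sketch. The function name build_topic_selection_list, the dictionary-based inputs and the sample data are hypothetical; the sketch only shows the ranking by final weight and the pairing of each entry with its weight value and link.

```python
def build_topic_selection_list(final_weights, links, preset_number):
    """Select the top keywords by final weight and attach weight and link.

    final_weights: mapping from keyword (or keyword group) to final weight.
    links: mapping from keyword (or keyword group) to the bound link.
    preset_number: how many entries the topic selection list should hold.
    """
    ranked = sorted(final_weights.items(), key=lambda pair: pair[1], reverse=True)
    return [
        {"keyword": keyword, "final_weight": weight, "link": links.get(keyword)}
        for keyword, weight in ranked[:preset_number]
    ]


# Hypothetical data for three keywords.
sample_weights = {"concert": 1.05, "x Games - y": 1.40, "weather": 0.30}
sample_links = {
    "concert": "https://example.com/b",
    "x Games - y": "https://example.com/d",
}
topic_selection_list = build_topic_selection_list(sample_weights, sample_links, 2)
```

A keyword without a bound link is kept in the list with a link value of None, so that the ranking does not depend on whether a link was retrieved.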
The implementation principle is as follows:
A large number of platforms are connected and integrated, and the information lists on the various platforms are crawled. Keywords extracted from the hot information of the multiple platforms are compared horizontally through the weight algorithm, so that the heat of each keyword is judged as a whole and the latest and hottest keywords across all current platforms are obtained quickly and accurately as preferred topics. The user obtains topic inspiration directly from the keywords screened by the calculation, which reduces the workload.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein.
The above embodiments are not intended to limit the scope of the present application, so: all equivalent changes in structure, shape and principle of the application should be covered in the scope of protection of the application.