CN108628832A - A kind of information keyword acquisition methods and device - Google Patents
- Publication number
- CN108628832A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The present invention provides a method and device for acquiring information intelligence keywords. By computing the union of the keyword set for current hot-topic information, the set of keywords common to the tracking objects, and the set of keywords individual to each tracking object, the information intelligence keywords can be determined quickly. The resulting keywords both cover current hot-topic information and are targeted, so they can meet the individual needs of each user (that is, each tracking object), and they are multi-dimensional with wide coverage.
Description
Technical field
The invention relates to the field of communication technology, and in particular to a method and device for acquiring information intelligence keywords.
Background
To adapt to changes in the market environment, mainstream operators, equipment vendors, and Internet companies at home and abroad have long attached great importance to intelligence work, which provides decision support for company strategy.
According to a Gartner analysis report, the world's top 30 telecom operators all have dedicated competitive intelligence departments and attach great importance to intelligence work. AT&T designed a professional portal to manage its intelligence work as early as 2007. Verizon established a self-service model as early as 2008 to improve the efficiency of its intelligence services and recruits intelligence analysts on LinkedIn year-round, having posted 77 related positions as of 2016. The head of Deutsche Telekom's market and competitive intelligence department is also an expert at the World Competitive Intelligence Conference and the European Competitive Intelligence Conference.
Keywords are the main basis for retrieving information materials, so how to acquire information intelligence keywords is a major problem in information intelligence management.
Summary of the invention
In view of the above deficiencies in the prior art, the present invention provides a method and device for acquiring information intelligence keywords, to at least partially solve the problem of how to acquire such keywords automatically.
To solve the above technical problem, the present invention adopts the following technical solutions:
The present invention provides a method for acquiring information intelligence keywords, the method comprising:
determining a second keyword set, where the keywords in the second keyword set are keywords of current hot-topic information;
determining a third keyword set, where the keywords in the third keyword set are keywords common to the tracking objects;
computing the union of the second keyword set, the third keyword set, and a preset first keyword set to determine the information intelligence keywords, where the keywords in the first keyword set are the individual keywords of each tracking object.
Preferably, determining the third keyword set specifically comprises:
computing the coverage of each candidate keyword;
determining a first temporary set according to the coverage of each candidate keyword, a preset threshold, the first keyword set, and the second keyword set;
determining the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set.
Preferably, computing the coverage of each candidate keyword specifically comprises:
obtaining the number of tracking objects related to each candidate keyword, and the total number of tracking objects;
computing, for each candidate keyword, the ratio of the number of tracking objects related to that keyword to the total number of tracking objects, to obtain the coverage of each candidate keyword.
Preferably, determining the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set specifically comprises:
comparing the number of keywords in the first temporary set with the preset number of keywords in the third keyword set;
if the former is greater than or equal to the latter, sorting the keywords in the first temporary set by coverage in descending order and selecting the first preset number of keywords in that order as the elements of the third keyword set, where the preset number is the number of keywords in the third keyword set;
if the former is less than the latter, taking the first temporary set as the third keyword set.
Preferably, determining the first temporary set according to the coverage of each candidate keyword, the preset threshold, the first keyword set, and the second keyword set specifically comprises:
comparing the coverage of each candidate keyword with the preset threshold and, if the former is greater than the latter, taking the corresponding candidate keyword as an element of a second temporary set;
computing the intersection of the second temporary set, the first keyword set, and the second keyword set and taking its complement, to obtain the first temporary set.
The present invention also provides a keyword management device, the device comprising a first processing module, a second processing module, and a third processing module.
The first processing module is configured to determine a second keyword set, where the keywords in the second keyword set are keywords of current hot-topic information.
The second processing module is configured to determine a third keyword set, where the keywords in the third keyword set are keywords common to the tracking objects.
The third processing module is configured to compute the union of the second keyword set, the third keyword set, and a preset first keyword set to determine the information intelligence keywords, where the keywords in the first keyword set are the individual keywords of each tracking object.
Preferably, the second processing module is specifically configured to: compute the coverage of each candidate keyword; determine a first temporary set according to the coverage of each candidate keyword, a preset threshold, the first keyword set, and the second keyword set; and determine the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set.
Preferably, the second processing module is specifically configured to: obtain the number of tracking objects related to each candidate keyword and the total number of tracking objects; and compute, for each candidate keyword, the ratio of the number of tracking objects related to that keyword to the total number of tracking objects, to obtain the coverage of each candidate keyword.
Preferably, the second processing module is configured to: compare the number of keywords in the first temporary set with the preset number of keywords in the third keyword set; when the former is greater than or equal to the latter, sort the keywords in the first temporary set by coverage in descending order and select the first preset number of keywords in that order as the elements of the third keyword set, where the preset number is the number of keywords in the third keyword set; and when the former is less than the latter, take the first temporary set as the third keyword set.
Preferably, the third processing module is configured to: compare the coverage of each candidate keyword with the preset threshold; when the former is greater than the latter, take the corresponding candidate keyword as an element of a second temporary set; and compute the intersection of the second temporary set, the first keyword set, and the second keyword set and take its complement, to obtain the first temporary set.
By computing the union of the keyword set for current hot-topic information, the set of keywords common to the tracking objects, and the set of keywords individual to each tracking object, the present invention can quickly determine the information intelligence keywords. The resulting keywords both cover current hot-topic information and are targeted, so they can meet the individual needs of each user (that is, each tracking object), and they are multi-dimensional with wide coverage.
Description of the drawings
Fig. 1 is a flowchart of information intelligence keyword acquisition provided by an embodiment of the present invention;
Fig. 2 is the first flowchart for determining the third keyword set provided by an embodiment of the present invention;
Fig. 3 is the second flowchart for determining the third keyword set provided by an embodiment of the present invention;
Fig. 4 is a flowchart for determining the first temporary set provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the keyword management device provided by an embodiment of the present invention.
Detailed description
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention provides a method for acquiring information intelligence keywords. The method is applied to an information intelligence resource system comprising: a material collection device, a resource pool establishment device, an information intelligence resource pool, a tracking object database, a tracking object management device, a keyword database, and a keyword management device. The tracking object management device updates the tracking object database, and the keyword management device updates the keyword database. The material collection device obtains tracking object information materials and keyword information materials through the tracking object management device and the keyword management device, respectively. The resource pool establishment device encodes the tracking object information materials and keyword information materials to obtain intelligence information, and stores the intelligence information in the information intelligence resource pool.
In the embodiments of the present invention, the tracking objects mainly include mainstream global operators and large Internet companies, and the tracking objects are stored in the tracking object database.
Information intelligence keywords include individual keywords and common keywords. Keywords are distilled from the company's strategy and key work. The keywords that must be searched when retrieving information for every tracking object are common keywords, for example 5G, cloud computing, big data, and the Internet of Things; common keywords are stored in the common keyword module of the keyword database. The keywords distilled from a tracking object's current hot-topic information are that tracking object's individual keywords; individual keywords are stored in the individual keyword module of the keyword database.
As shown in Fig. 1, the method for acquiring information intelligence keywords includes the following steps:
Step 101: determine a second keyword set, where the keywords in the second keyword set are keywords of current hot-topic information.
Specifically, the number of keywords in the second keyword set kw2 is n2, where n2 is a preset value and n2≥0. kw2={kw2_1, kw2_2, kw2_3, …, kw2_n2}. Each keyword in kw2 is a keyword of current hot-topic information and can be computed with a most-active selection algorithm; this is an existing algorithm and is not described further here.
Step 102: determine a third keyword set, where the keywords in the third keyword set are keywords common to the tracking objects.
Specifically, the number of keywords in the third keyword set kw3 is n3, where n3 is a preset value and n3≥0. kw3={kw3_1, kw3_2, kw3_3, …, kw3_n3}. Each keyword in kw3 is a keyword common to the tracking objects, including hot terms that all tracking objects follow, for example the Internet of Things, blockchain, and big data.
The specific implementation for determining kw3 is described in detail later with reference to Fig. 2.
Step 103: compute the union of the second keyword set, the third keyword set, and the preset first keyword set to determine the information intelligence keywords.
Specifically, the number of keywords in the first keyword set kw1 is n1, where n1 is a preset value and n1≥0. The total number of information intelligence keywords is n, with n=n1+n2+n3, 0≤n1≤n, 0≤n2≤n, and 0≤n3≤n. kw1={kw1_1, kw1_2, kw1_3, …, kw1_n1}. The keywords in kw1 are the individual keywords of each tracking object, including the fields and hot terms that each tracking object has followed over the long term; each keyword in kw1 can be set by the respective tracking object.
The finally determined set of information intelligence keywords is kw, where kw=kw1∪kw2∪kw3.
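Step 103 can be sketched in Python as a plain set union. This is a minimal illustration, not the patent's implementation: the function name `information_keywords` and the sample keywords are assumptions made for this example.

```python
def information_keywords(kw1, kw2, kw3):
    """Return kw = kw1 ∪ kw2 ∪ kw3 (step 103).

    kw1: individual keywords of the tracking objects (preset)
    kw2: keywords of current hot-topic information
    kw3: keywords common to the tracking objects
    """
    return set(kw1) | set(kw2) | set(kw3)


# Hypothetical sample data, chosen only for illustration.
kw1 = {"5G", "edge computing"}
kw2 = {"5G", "spectrum auction"}
kw3 = {"Internet of Things", "big data"}

kw = information_keywords(kw1, kw2, kw3)
```

Because a union removes duplicates, `len(kw)` equals n1+n2+n3 only when the three sets share no keywords; here "5G" appears in both `kw1` and `kw2`, so `kw` holds five keywords rather than six.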
Steps 101 to 103 show that, by computing the union of the keyword set for current hot-topic information, the set of keywords common to the tracking objects, and the set of keywords individual to each tracking object, the present invention can quickly determine the information intelligence keywords. The resulting keywords both cover current hot-topic information and are targeted, so they can meet the individual needs of each user (that is, each tracking object), and they are multi-dimensional with wide coverage.
Further, as shown in Fig. 2, determining the third keyword set (step 102) specifically includes the following steps:
Step 201: compute the coverage of each candidate keyword.
Specifically, obtain the number T1_i of tracking objects related to each candidate keyword and the total number T of tracking objects, then compute the ratio of the two to obtain the coverage q_i of each candidate keyword, that is, q_i = T1_i / T, where i denotes the candidate keyword.
A tracking object related to a candidate keyword is one that follows that keyword, that is, a tracking object that has selected the candidate keyword as a common keyword and/or an individual keyword.
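The coverage calculation of step 201 can be sketched as follows. The data layout (a dictionary mapping each tracking object to the keywords it follows), the function name `keyword_coverage`, and the sample names are assumptions for illustration; the source does not specify how the tracking object database is structured.

```python
def keyword_coverage(candidates, tracked_keywords):
    """Compute q_i = T1_i / T for each candidate keyword.

    tracked_keywords maps a tracking object name to the set of keywords
    it follows (common and/or individual); this layout is an assumption.
    """
    total = len(tracked_keywords)  # T: total number of tracking objects
    if total == 0:
        raise ValueError("at least one tracking object is required")
    coverage = {}
    for keyword in candidates:
        # T1_i: number of tracking objects that follow this keyword
        related = sum(1 for followed in tracked_keywords.values() if keyword in followed)
        coverage[keyword] = related / total
    return coverage


# Hypothetical tracking objects and keywords.
tracked_keywords = {
    "operator_a": {"5G", "cloud computing"},
    "operator_b": {"5G"},
    "operator_c": {"5G", "big data"},
    "internet_co": {"cloud computing", "big data"},
}
q = keyword_coverage(["5G", "cloud computing", "big data", "blockchain"], tracked_keywords)
```

With four tracking objects, "5G" is followed by three of them and gets coverage 0.75, while "blockchain" is followed by none and gets 0.0.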
Step 202: determine the first temporary set temp1kw according to the coverage q_i of each candidate keyword, the preset threshold Q, the first keyword set kw1, and the second keyword set kw2.
Specifically, the process for determining temp1kw is described in detail later with reference to Fig. 4.
Step 203: determine the third keyword set kw3 according to the preset number n3 of keywords in kw3 and the first temporary set temp1kw.
Specifically, the keywords in temp1kw may be exactly the final information intelligence keywords, in which case kw3 is identical to temp1kw. Alternatively, the keywords in temp1kw may differ from the final information intelligence keywords, in which case temp1kw is larger than kw3.
How kw3 is determined is described in detail later with reference to Fig. 3.
Steps 201 to 203 show that using the coverage q_i of each candidate keyword as the criterion for choosing the keywords in kw3 selects keywords with higher and wider coverage, and thus covers the differing needs of the tracking objects.
The process for determining the third keyword set kw3 (step 203) is described in detail below with reference to Fig. 3. As shown in Fig. 3, the process includes the following steps:
Step 301: compare the number of keywords in the first temporary set with the preset number of keywords in the third keyword set; if the former is greater than the latter, perform step 302; otherwise, perform step 304.
Specifically, let the number of keywords in temp1kw be n'. Compare n' with the number n3 of keywords in kw3. If n'>n3, temp1kw holds more keywords than kw3 requires, so the more suitable keywords must be selected from temp1kw and placed in kw3 (steps 302 and 303). If n'≤n3, temp1kw holds no more keywords than kw3 requires, so all the keywords in temp1kw are placed in kw3 (step 304).
Step 302: sort the keywords in the first temporary set by coverage in descending order.
Specifically, sort the n' keywords in temp1kw by coverage q_i in descending order; the coverage q_i of each keyword was already computed in step 201.
Step 303: select the first preset number of keywords in that order as the elements of the third keyword set.
Specifically, the preset number is the number n3 of keywords in the third keyword set; in other words, the first n3 keywords in the coverage ordering form kw3.
Step 304: the third keyword set is the first temporary set.
Specifically, if the number n' of keywords in temp1kw does not reach the number of keywords required by kw3, the whole of temp1kw is used as kw3.
Steps 301 to 303 show that screening the keywords in temp1kw yields information intelligence keywords (that is, the third keyword set kw3) with greater and wider coverage.
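Steps 301 to 304 amount to a top-n3 selection by coverage, which can be sketched as below. The function name `select_third_keyword_set` and the sample values are assumptions. The source does not say how keywords with equal coverage are ordered, so the alphabetical tie-break used here is also an assumption, added only to make the result deterministic.

```python
def select_third_keyword_set(temp1kw, coverage, n3):
    """Determine kw3 from temp1kw (steps 301 to 304)."""
    # Step 301: compare n' = |temp1kw| with the preset size n3.
    if len(temp1kw) > n3:
        # Step 302: sort by coverage, largest first.
        # Ties are broken alphabetically (an assumption, not from the source).
        ranked = sorted(temp1kw, key=lambda keyword: (-coverage[keyword], keyword))
        # Step 303: keep the first n3 keywords.
        return set(ranked[:n3])
    # Step 304: temp1kw does not exceed n3, so use it whole.
    return set(temp1kw)


# Hypothetical coverage values, for illustration only.
coverage = {"5G": 0.75, "cloud computing": 0.5, "big data": 0.5}
temp1kw = {"5G", "cloud computing", "big data"}

kw3_small = select_third_keyword_set(temp1kw, coverage, n3=2)
kw3_large = select_third_keyword_set(temp1kw, coverage, n3=5)
```

With `n3=2` the sketch keeps "5G" and, by the assumed tie-break, "big data"; with `n3=5` the whole temporary set is returned unchanged.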
The process for determining the first temporary set temp1kw (step 202) is described in detail below with reference to Fig. 4. As shown in Fig. 4, the process includes the following steps:
Step 401: compare the coverage of each candidate keyword with the preset threshold; if the former is greater than the latter, perform step 402; otherwise, discard the candidate keyword.
Specifically, a threshold Q is set in advance, and the coverage q_i of each candidate keyword is compared with Q. If q_i>Q, the candidate keyword qualifies and is placed in the second temporary set temp2kw (step 402). If q_i≤Q, the candidate keyword does not qualify and is discarded.
Step 402: take the corresponding candidate keyword as an element of the second temporary set.
Step 403: compute the intersection of the second temporary set, the first keyword set, and the second keyword set and take its complement, to obtain the first temporary set.
Specifically, the first temporary set temp1kw is the complement of temp2kw∩kw1∩kw2. In this way, keywords that are the same across the second temporary set temp2kw, the first keyword set kw1, and the second keyword set kw2 are excluded, which avoids duplicate keywords in the third keyword set kw3 when kw3 is subsequently determined from temp1kw.
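Steps 401 to 403 can be sketched as a threshold filter followed by a set difference. The text does not state the universe against which the complement is taken; this sketch assumes the complement is relative to temp2kw, so the result is temp2kw minus (temp2kw ∩ kw1 ∩ kw2). The function name `first_temporary_set` and the sample data are likewise assumptions.

```python
def first_temporary_set(coverage, threshold, kw1, kw2):
    """Determine temp1kw from candidate coverages (steps 401 to 403)."""
    # Steps 401 and 402: keep candidates whose coverage is strictly above Q.
    temp2kw = {keyword for keyword, q_i in coverage.items() if q_i > threshold}
    # Step 403: intersect with kw1 and kw2, then take the complement.
    # Assumption: the complement is taken relative to temp2kw.
    shared = temp2kw & set(kw1) & set(kw2)
    return temp2kw - shared


# Hypothetical values, for illustration only.
coverage = {"5G": 0.75, "cloud computing": 0.5, "big data": 0.5, "blockchain": 0.0}
kw1 = {"5G", "edge computing"}
kw2 = {"5G", "spectrum auction"}

temp1kw = first_temporary_set(coverage, threshold=0.4, kw1=kw1, kw2=kw2)
```

Here "blockchain" fails the threshold, and "5G" is dropped because it already appears in both `kw1` and `kw2`. A stricter variant that removes any keyword found in either set would use `temp2kw - (set(kw1) | set(kw2))`; the wording of step 403 does not settle which reading is intended.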
Based on the same technical concept, an embodiment of the present invention also provides a keyword management device. As shown in Fig. 5, the keyword management device includes a first processing module 51, a second processing module 52, and a third processing module 53.
The first processing module 51 is configured to determine a second keyword set, where the keywords in the second keyword set are keywords of current hot-topic information.
The second processing module 52 is configured to determine a third keyword set, where the keywords in the third keyword set are keywords common to the tracking objects.
The third processing module 53 is configured to compute the union of the second keyword set, the third keyword set, and a preset first keyword set to determine the information intelligence keywords, where the keywords in the first keyword set are the individual keywords of each tracking object.
Preferably, the second processing module 52 is specifically configured to: compute the coverage of each candidate keyword; determine a first temporary set according to the coverage of each candidate keyword, a preset threshold, the first keyword set, and the second keyword set; and determine the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set.
Preferably, the second processing module 52 is specifically configured to: obtain the number of tracking objects related to each candidate keyword and the total number of tracking objects; and compute, for each candidate keyword, the ratio of the number of tracking objects related to that keyword to the total number of tracking objects, to obtain the coverage of each candidate keyword.
Preferably, the second processing module 52 is configured to: compare the number of keywords in the first temporary set with the preset number of keywords in the third keyword set; when the former is greater than or equal to the latter, sort the keywords in the first temporary set by coverage in descending order and select the first preset number of keywords in that order as the elements of the third keyword set, where the preset number is the number of keywords in the third keyword set; and when the former is less than the latter, take the first temporary set as the third keyword set.
Preferably, the third processing module 53 is configured to: compare the coverage of each candidate keyword with the preset threshold; when the former is greater than the latter, take the corresponding candidate keyword as an element of a second temporary set; and compute the intersection of the second temporary set, the first keyword set, and the second keyword set and take its complement, to obtain the first temporary set.
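The division of work among modules 51, 52, and 53 can be sketched as one Python class whose methods follow the module assignments stated above. Everything here is a hypothetical illustration: the class and method names, the data layout, the alphabetical tie-break, and the assumption that the complement is taken relative to the second temporary set are not from the source. The hot-topic keywords are passed in directly because the most-active selection algorithm is not described.

```python
class KeywordManagementDevice:
    """Sketch of the device in Fig. 5; all names are assumptions."""

    def __init__(self, kw1, n3, threshold):
        self.kw1 = set(kw1)         # preset individual keywords
        self.n3 = n3                # preset size of kw3
        self.threshold = threshold  # preset coverage threshold Q

    def determine_kw2(self, hot_keywords):
        # Module 51: stands in for the unspecified most-active selection.
        return set(hot_keywords)

    def first_temporary_set(self, coverage, kw2):
        # Module 53 (per the text): threshold filter, then complement of the
        # intersection, taken relative to temp2kw (an assumption).
        temp2kw = {k for k, q_i in coverage.items() if q_i > self.threshold}
        return temp2kw - (temp2kw & self.kw1 & set(kw2))

    def determine_kw3(self, candidates, tracked_keywords, kw2):
        # Module 52: coverage, then top-n3 selection from temp1kw.
        total = len(tracked_keywords)
        if total == 0:
            raise ValueError("at least one tracking object is required")
        coverage = {
            k: sum(1 for followed in tracked_keywords.values() if k in followed) / total
            for k in candidates
        }
        temp1kw = self.first_temporary_set(coverage, kw2)
        if len(temp1kw) > self.n3:
            ranked = sorted(temp1kw, key=lambda k: (-coverage[k], k))
            return set(ranked[: self.n3])
        return temp1kw

    def union(self, kw2, kw3):
        # Module 53: kw = kw1 ∪ kw2 ∪ kw3.
        return self.kw1 | set(kw2) | set(kw3)


# Hypothetical usage.
tracked_keywords = {
    "operator_a": {"5G", "cloud computing"},
    "operator_b": {"5G"},
    "operator_c": {"5G", "big data"},
    "internet_co": {"cloud computing", "big data"},
}
device = KeywordManagementDevice(kw1={"5G", "edge computing"}, n3=1, threshold=0.4)
kw2 = device.determine_kw2({"5G", "spectrum auction"})
kw3 = device.determine_kw3(
    ["5G", "cloud computing", "big data", "blockchain"], tracked_keywords, kw2
)
kw = device.union(kw2, kw3)
```

In this run the temporary set holds "cloud computing" and "big data" at equal coverage, so with `n3=1` the assumed tie-break keeps "big data".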
It should be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principle of the present invention, and the present invention is not limited to them. A person of ordinary skill in the art can make various modifications and improvements without departing from the spirit and essence of the present invention, and these modifications and improvements also fall within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810431832.7A CN108628832B (en) | 2018-05-08 | 2018-05-08 | Method and device for acquiring information keywords |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108628832A (en) | 2018-10-09 |
CN108628832B (en) | 2022-03-18 |
Family
ID=63695891
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810431832.7A Active CN108628832B (en) | 2018-05-08 | 2018-05-08 | Method and device for acquiring information keywords |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628832B (en) |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1912872A (en) * | 2006-07-25 | 2007-02-14 | 北京搜狗科技发展有限公司 | Method and system for abstracting new word |
US7181438B1 (en) * | 1999-07-21 | 2007-02-20 | Alberti Anemometer, Llc | Database access system |
CN101296128A (en) * | 2007-04-24 | 2008-10-29 | 北京大学 | A method for monitoring abnormal state of Internet information |
CN101520878A (en) * | 2009-04-03 | 2009-09-02 | 华为技术有限公司 | Method, device and system for pushing advertisements to users |
CN102110269A (en) * | 2011-02-25 | 2011-06-29 | 中兴通讯股份有限公司 | Advertisement releasing method and system |
CN103425677A (en) * | 2012-05-18 | 2013-12-04 | 阿里巴巴集团控股有限公司 | Method for determining classified models of keywords and method and device for classifying keywords |
CN103530398A (en) * | 2013-10-23 | 2014-01-22 | 合山市科学技术情报研究所 | Information collecting, processing and retrieving system |
CN103714413A (en) * | 2013-11-21 | 2014-04-09 | 清华大学 | Position information-based competence model construction system and method |
CN103744873A (en) * | 2013-12-18 | 2014-04-23 | 天脉聚源(北京)传媒科技有限公司 | Method, device and browser for displaying hotspot keyword |
CN104035997A (en) * | 2014-06-13 | 2014-09-10 | 淮阴工学院 | Scientific and technical information acquisition and pushing method based on text classification and image deep mining |
CN104679787A (en) * | 2013-11-27 | 2015-06-03 | 华为技术有限公司 | Interest information statistical method and device |
CN104965893A (en) * | 2015-06-18 | 2015-10-07 | 山东师范大学 | Big data advertisement delivery method |
CN106126588A (en) * | 2016-06-17 | 2016-11-16 | 广州视源电子科技股份有限公司 | Method and device for providing related words |
CN106227735A (en) * | 2016-07-11 | 2016-12-14 | 苏州天梯卓越传媒有限公司 | A kind of word cloud Topic Selection for Publishing Industry and system |
CN106453423A (en) * | 2016-12-08 | 2017-02-22 | 黑龙江大学 | Spam filtering system and method based on user personalized setting |
CN107341199A (en) * | 2017-06-21 | 2017-11-10 | 北京林业大学 | A kind of recommendation method based on documentation & info general model |
CN107786595A (en) * | 2016-08-26 | 2018-03-09 | 阿里巴巴集团控股有限公司 | The processing method of keyword, apparatus and system in distributed memory system |
Non-Patent Citations (1)
Title |
---|
Tian Ye et al.: "Deep Web crawler crawling strategy based on keyword relevance", Computer Engineering (《计算机工程》) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111651657A (en) * | 2020-06-04 | 2020-09-11 | 深圳前海微众银行股份有限公司 | Intelligence monitoring method, apparatus, device and computer-readable storage medium |
CN111651657B (en) * | 2020-06-04 | 2024-05-24 | 深圳前海微众银行股份有限公司 | Information monitoring method, device, equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108628832B (en) | 2022-03-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||