CN104615715A - Social network event analyzing method and system based on geographic positions - Google Patents

Info

Publication number
CN104615715A
Authority
CN
China
Prior art keywords
social network
network data
data text
geographic position
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510061722.2A
Other languages
Chinese (zh)
Inventor
李建欣
吴博
张日崇
于伟仁
胡春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201510061722.2A
Publication of CN104615715A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and system for analyzing social network events based on geographic location. The method includes: performing word segmentation on each social network data text to obtain the words of that text; establishing a mapping between each social network data text and its corresponding geographic location, where the corresponding geographic location is a location-related word among the words of the text; determining the social network data texts corresponding to each preset target geographic location; and, for each target geographic location, calculating weights for the words of the corresponding social network data texts, obtaining the keywords of those texts, and pushing the keywords as the popular events of the target geographic location. The solution provided by the invention helps users intuitively obtain popular events related to a geographic location.

Description

Social network event analysis method and system based on geographic location

Technical Field

The present invention relates to the field of data mining, and in particular to a method and system for analyzing social network events based on geographic location.

Background

The social nature of social networks and their rapid, timely dissemination have attracted a large number of users with a high demand for real-time information. Anyone in the world can become an information source whose content spreads globally, which greatly increases the amount of information carried by social network events. Social network events aggregate massive amounts of news, events, and information that are updated and circulated every day and have a large impact on society. In the dissemination of information about emergencies in particular, social networks have surpassed traditional media and become a channel for rapid information dissemination. Information on social networks is not only published promptly but is also a microcosm of real social life, so mining the information in social network events helps analyze the real world from different angles.

With the explosive growth of the mobile Internet, devices with positioning functions are increasingly widespread, and users can easily obtain more precise geographic location information, so more and more data carries geographic attributes. At the same time, demand for analyzing such location-bearing data is growing in fields such as urban planning, tourism, and security.

Social networks, represented by Weibo, have become the fastest-growing Internet application in China. Weibo is a platform for sharing, disseminating, and obtaining information based on user relationships. Currently a geographic location can be tagged when a post is published, but these locations are relatively isolated: each is associated only with its own post. Although massive numbers of Weibo posts can be connected through comments, reposts, and friend relationships, they cannot be connected in real geographic space, and factors related to location-based services (LBS) are lacking.

Summary of the Invention

The present invention provides a method and system for analyzing social network events based on geographic location, used to analyze the social network events related to a geographic location.

A first aspect of the present invention provides a method for analyzing social network events based on geographic location, including:

performing word segmentation on each social network data text to obtain the words of the social network data text;

establishing a mapping between the geographic location corresponding to the social network data text and the social network data text, where the geographic location corresponding to the social network data text is a location-related word among the words of the social network data text;

determining, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location;

for each target geographic location, calculating weights for the words of the social network data texts corresponding to the target geographic location, obtaining the keywords of those texts, and pushing the keywords as the popular events of the target geographic location.

Another aspect of the present invention provides a system for analyzing social network events based on geographic location, including:

a word segmentation module, configured to perform word segmentation on each social network data text to obtain the words of the social network data text;

a geographic location acquisition module, configured to establish a mapping between the geographic location corresponding to the social network data text and the social network data text, where the geographic location corresponding to the social network data text is a location-related word among the words of the social network data text;

a geographic location analysis module, configured to determine, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location;

an event analysis module, configured to, for each target geographic location, calculate weights for the words of the social network data texts corresponding to the target geographic location, obtain the keywords of those texts, and push the keywords as the popular events of the target geographic location.

The method and system provided by the present invention study social network data texts, identify the texts associated with each geographic location, and push the keywords of the texts corresponding to each geographic location as the popular events of that location, which helps users intuitively obtain popular events related to a geographic location.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of the method for analyzing social network events based on geographic location provided by Embodiment 1 of the present invention;

FIG. 2 is a schematic structural diagram of the system for analyzing social network events based on geographic location provided by Embodiment 2 of the present invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart of the method for analyzing social network events based on geographic location provided by Embodiment 1 of the present invention. As shown in FIG. 1, the method includes:

101. Perform word segmentation on each social network data text to obtain the words of the social network data text.

In practice, a certain number of social network data texts published within a certain time period, for example Weibo posts, may first be obtained from a big data analysis platform. Accordingly, before step 101 the method may further include: obtaining a preset number of the social network data texts published within a preset time period.

Specifically, the social network data texts in this embodiment may come from the ElasticSearch search engine of a big data analysis platform. Taking Weibo as an example, all posts may be original posts, excluding posts reposted by users.

After the social network data texts are obtained, they need to be preprocessed, and the preprocessing consists mainly of word segmentation. Again taking Weibo as an example, word segmentation is performed on the text of each post; the text here does not include pictures or other non-text content published by users.
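The acquisition step described above can be sketched as follows. This is a minimal illustration that filters an in-memory list rather than querying ElasticSearch; the `Post` record, its field names, and the `select_posts` helper are assumptions made for the example and do not appear in the source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical record for one social network data text."""
    text: str                 # text only; pictures are not part of the record
    created_at: datetime
    is_repost: bool = False


def select_posts(posts, start, end, limit):
    """Keep original posts published in [start, end), at most `limit` of them."""
    selected = [
        p for p in posts
        if not p.is_repost and start <= p.created_at < end
    ]
    return selected[:limit]
```

Here `start`, `end`, and `limit` play the role of the preset time period and preset number mentioned in the text.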

In practice, word segmentation can be implemented in several ways, for example with a tokenizer. Optionally, step 101 may include: performing word segmentation on the social network data text using the IKAnalyzer tokenizer. Taking a Weibo post as an example, the tokenizer first loads a dictionary, analyzes the post text, and takes a token. The search keyword is segmented by layer-by-layer iterative lookup from the largest word to the smallest word: the dictionary is searched for the largest segmentable word in the search term, and this iterative lookup continues in the same way until segmentation is complete.
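The largest-word-first dictionary lookup can be illustrated with a forward maximum matching sketch. This is a simplified stand-in, not IKAnalyzer's actual implementation; the `segment` function and the small dictionary in the usage note are assumptions for the example.

```python
def segment(text, dictionary, max_len=None):
    """Split `text` into words, preferring the longest dictionary match."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

For instance, with the dictionary `{"北京", "北京市", "朝阳区", "发生", "火灾"}`, the call `segment("北京市朝阳区发生火灾", dictionary)` prefers `"北京市"` over the shorter entry `"北京"` because the longer candidate is tried first.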

102. Establish a mapping between the geographic location corresponding to the social network data text and the social network data text, where the geographic location corresponding to the social network data text is a location-related word among the words of the social network data text.

In practice, the geographic location corresponding to a social network data text can be obtained by filtering the location-related words out of the words of the text. Specifically, Sogou's three-level administrative division gazetteer may be used for the filtering. Taking Weibo as an example, if the post text is found to contain geographic location information from the gazetteer, that information is extracted in combination with the context of the post. After the geographic location corresponding to the social network data text is obtained, the text is associated with these location-related words. This amounts to a geographic text analysis method based on named entity recognition: using Sogou's three-level administrative division gazetteer, any geographic location information from the gazetteer that appears in a post is extracted in combination with the context of the post and associated with that post.

Optionally, to save processing resources and improve processing efficiency, a social network data text that contains no location-related word is determined to carry no geographic location information and may be discarded without further processing.
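The mapping in step 102 can be sketched as an index from place name to post identifiers. This sketch only performs gazetteer lookup and omits the context-based disambiguation mentioned above; the function name, the post identifiers, and the toy gazetteer in the usage note are assumptions, and a real gazetteer such as Sogou's would be loaded from its own file.

```python
from collections import defaultdict


def build_location_index(segmented_posts, gazetteer):
    """Map each place name to the ids of the posts that mention it.

    segmented_posts: dict of post id -> list of words from segmentation.
    gazetteer: set of known place names.
    """
    index = defaultdict(list)
    for post_id, words in segmented_posts.items():
        places = {w for w in words if w in gazetteer}
        # Posts without any place name add nothing, so they are dropped.
        for place in places:
            index[place].append(post_id)
    return dict(index)
```

With posts `{1: ["Beijing", "fire"], 2: ["weather"], 3: ["Shanghai", "Beijing", "flight"]}` and the gazetteer `{"Beijing", "Shanghai"}`, post 2 is discarded and post 3 is associated with both places.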

103. Determine, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location.

In practice, the social network data texts corresponding to each target geographic location can be determined from the mapping between the social network data texts and geographic locations.

Specifically, the target geographic locations can be chosen according to actual needs. For example, popular event analysis may be performed for the geographic locations shown on a visual map. Accordingly, before step 103 the method may further include: taking the geographic locations on the visual map as the target geographic locations. For example, the visualization of the geographic location analysis may be a Web map application built on the Baidu Map API.
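Step 103 then reduces to a lookup in the location mapping. The sketch below assumes the mapping is a dictionary from place name to post identifiers and that the target locations are supplied as a list of place names; both shapes, and the function name, are assumptions for illustration.

```python
def texts_for_targets(location_index, target_locations):
    """Return the post ids associated with each preset target location."""
    # A target with no matching posts maps to an empty list.
    return {loc: list(location_index.get(loc, [])) for loc in target_locations}
```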

104. For each target geographic location, calculate weights for the words of the social network data texts corresponding to the target geographic location, obtain the keywords of those texts, and push the keywords as the popular events of the target geographic location.

Specifically, after the social network data texts corresponding to each target geographic location are determined, the TF-IDF method may be used to calculate a weight for each word of these texts. Keywords are extracted according to the calculation results and pushed as the popular events of the target geographic location, for example by annotating the target geographic location with them. Accordingly, calculating weights for the words of the social network data texts corresponding to the target geographic location in step 104 may include: calculating the weights using the TF-IDF method.

Specifically, obtaining the keywords may include the following steps: normalized term frequency calculation; inverse document frequency calculation; and calculation of the term weights to extract popular words.

More specifically, term frequency (TF) is the frequency with which a given word appears in a document. The count is normalized by the number of words in the document so that it is not biased toward long documents. Inverse document frequency (IDF) captures the fact that the fewer documents contain a term, the larger its IDF, which indicates that the term discriminates well between categories. The TF-IDF value of a term is then the product of its TF value and its IDF value. Finally, the words are sorted by TF-IDF value from largest to smallest, and the top several are selected as the result of the popular event analysis.

In practice, the usual TF-IDF algorithm does not work well when applied directly to short texts such as Weibo posts. If all posts are treated as a single document, the IDF information is lost. If each post is treated as a document, the post contains so few words that almost every word occurs exactly once, so the TF information is lost. Either way, the accuracy of the selected keywords suffers.

To improve the accuracy of the analysis results, calculating weights for the words of the social network data texts corresponding to the target geographic location using the TF-IDF method in step 104, and obtaining the keywords of those texts, specifically includes:

for each word $t_i$ of the social network data texts corresponding to the target geographic location, taking the social network data texts corresponding to the target geographic location as a first document, and calculating the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the sum of the numbers of occurrences of all words in the first document;

taking the social network data text to which the word belongs as a second document, and calculating the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;

calculating the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula: $tfidf_{i,j} = tf_{i,j} \times idf_i$;

sorting the words of the social network data texts corresponding to the target geographic location by weight from largest to smallest, and taking the top $k$ words as the keywords of those texts, where $k$ is a preset value.

In this implementation, each individual social network data text is treated as a document when calculating IDF, and all social network data texts corresponding to the target geographic location are treated together as one document when calculating TF. Term frequencies therefore differ between words and the IDF information is retained, which improves the accuracy of the keyword results and, in turn, of the popular event analysis.
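The weighting scheme described above, with TF computed over the pooled texts of one location and IDF computed over individual texts, can be sketched as follows. The function name and argument shapes are assumptions; the sketch also assumes that the location's posts are a subset of the corpus, so every word has a document frequency of at least 1, and it uses the natural logarithm because the source does not state the base.

```python
import math
from collections import Counter


def top_keywords(location_posts, corpus_posts, k):
    """Rank the words of one location by TF-IDF and return the top k.

    location_posts: list of word lists, the posts tied to one target location.
    corpus_posts: list of word lists for the whole corpus; each post is one
        document for IDF. Assumed to include every post in location_posts.
    """
    # TF: pool all posts of the location into a single document.
    counts = Counter(w for post in location_posts for w in post)
    total = sum(counts.values())

    # Document frequency: number of posts in the corpus containing each word.
    num_docs = len(corpus_posts)
    doc_freq = Counter(w for post in corpus_posts for w in set(post))

    scores = {}
    for word, n in counts.items():
        tf = n / total
        idf = math.log(num_docs / doc_freq[word])
        scores[word] = tf * idf

    # Sort by descending weight; ties are broken alphabetically for stability.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]
```

Pooling the location's posts gives frequent event words a TF well above that of words seen once, while per-post IDF still penalizes words that are common across the whole corpus.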

Optionally, a visual map may be used to push the popular events corresponding to each target geographic location, for example by calling the Baidu Map API. This can be divided into the following steps: (1) register for the Baidu Map API and load the API JS file; (2) create a map container, then instantiate and initialize the map; (3) integrate and display the map interaction data. Specifically, each target geographic location and its popular events are encapsulated in JSON format and passed to the front end for parsing, and the parsed coordinates and their keywords, that is, the popular events, are annotated on the visual map.
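The JSON hand-off to the front end might look like the sketch below. The source does not specify the payload layout, so the field names `location`, `lng`, `lat`, and `keywords`, the `build_payload` helper, and the coordinate lookup table are all assumptions; the map rendering itself is done by the front end and is not shown.

```python
import json


def build_payload(hot_events, coordinates):
    """Serialize locations, coordinates, and keywords for the map front end.

    hot_events: dict of place name -> list of keywords.
    coordinates: dict of place name -> (longitude, latitude).
    """
    markers = []
    for location, keywords in hot_events.items():
        if location not in coordinates:
            continue  # a location without coordinates cannot be placed on the map
        lng, lat = coordinates[location]
        markers.append(
            {"location": location, "lng": lng, "lat": lat, "keywords": keywords}
        )
    # ensure_ascii=False keeps Chinese place names readable in the output.
    return json.dumps(markers, ensure_ascii=False)
```

A call such as `build_payload({"Beijing": ["fire", "rescue"]}, {"Beijing": (116.40, 39.90)})`, with approximate coordinates, yields a JSON array with one marker object that the front end can parse and annotate.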

In practice, with the spread of mobile devices and advances in wireless communication technology, more and more data carries geospatial attributes. Taking Weibo as an example, because posts are published in real time and in large volume, a large number of them contain geographic location information. A single geographic location may be associated with thousands of posts. Users cannot read all of them, and reading only one or two does not give a full picture of the events associated with the location.

Common social network analysis tools mostly perform general analysis of post text, including analysis of post characteristics, topic mining, and emergency detection. Social network data contains a large amount of geographic location information, including the location of the user, the location from which a post was published, and the location of the event being discussed. However, the location analysis offered by these tools is weak. It focuses mainly on user location, is rarely applied to event location, and does not bring out the geographic characteristics of Weibo events.

The method for analyzing social network events based on geographic location provided by this embodiment studies social network data texts, identifies the texts associated with each geographic location, and pushes the keywords of the texts corresponding to each geographic location as the popular events of that location, which helps users intuitively obtain popular events related to a geographic location.

FIG. 2 is a schematic structural diagram of the system for analyzing social network events based on geographic location provided by Embodiment 2 of the present invention. As shown in FIG. 2, the system includes:

a word segmentation module 21, configured to perform word segmentation on each social network data text to obtain the words of the social network data text;

a geographic location acquisition module 22, configured to establish a mapping between the geographic location corresponding to the social network data text and the social network data text, where the geographic location corresponding to the social network data text is a location-related word among the words of the social network data text;

a geographic location analysis module 23, configured to determine, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location;

an event analysis module 24, configured to, for each target geographic location, calculate weights for the words of the social network data texts corresponding to the target geographic location, obtain the keywords of those texts, and push the keywords as the popular events of the target geographic location.

In practice, a certain number of social network data texts published within a certain time period, for example Weibo posts, may first be obtained from a big data analysis platform. Accordingly, the system further includes: an acquisition module, configured to obtain a preset number of the social network data texts published within a preset time period before the word segmentation module 21 performs word segmentation on each social network data text.

Specifically, the social network data texts in this embodiment may come from the ElasticSearch search engine of a big data analysis platform. Taking Weibo as an example, all posts may be original posts, excluding posts reposted by users.

After the social network data texts are obtained, they need to be preprocessed, and the preprocessing consists mainly of word segmentation. Again taking Weibo as an example, word segmentation is performed on the text of each post; the text here does not include pictures or other non-text content published by users.

In practice, word segmentation can be implemented in several ways, for example with a tokenizer. Optionally, the word segmentation module 21 may be configured to perform word segmentation on the social network data text using the IKAnalyzer tokenizer. Taking a Weibo post as an example, the word segmentation module 21 first loads a dictionary, analyzes the post text, and takes a token. The search keyword is segmented by layer-by-layer iterative lookup from the largest word to the smallest word: the dictionary is searched for the largest segmentable word in the search term, and this iterative lookup continues in the same way until segmentation is complete.

In practice, the geographic location acquisition module 22 can obtain the geographic location corresponding to a social network data text by filtering the location-related words out of the words of the text. Specifically, the geographic location acquisition module 22 may use Sogou's three-level administrative division gazetteer for the filtering. Taking Weibo as an example, if the post text is found to contain geographic location information from the gazetteer, that information is extracted in combination with the context of the post. After obtaining the geographic location corresponding to the social network data text, the geographic location acquisition module 22 associates the text with these location-related words. This amounts to a geographic text analysis method based on named entity recognition: using Sogou's three-level administrative division gazetteer, any geographic location information from the gazetteer that appears in a post is extracted in combination with the context of the post and associated with that post.

Optionally, to save processing resources and improve processing efficiency, the geographic location acquisition module 22 determines that a social network data text containing no location-related word carries no geographic location information, and may discard it without further processing.

In practice, the social network data texts corresponding to each target geographic location can be determined from the mapping between the social network data texts and their geographic locations.
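One way to read this step is as an inversion of the text-to-location mapping, restricted to the preset targets. The sketch below assumes the mapping is a dictionary from a text id to its list of location words; the function name `texts_by_target` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def texts_by_target(
    text_locations: Dict[str, List[str]], targets: Set[str]
) -> Dict[str, List[str]]:
    """Group text ids under each target location they mention."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text_id, locations in text_locations.items():
        for location in locations:
            if location in targets:  # ignore locations that are not targets
                grouped[location].append(text_id)
    return dict(grouped)
```

A text that mentions several target locations is listed under each of them, so it contributes to the keyword analysis of every location it names.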

Specifically, the target geographic locations can be chosen according to actual needs. For example, hot event analysis may be performed on the geographic locations shown in a visual map. In that case, the geographic location analysis module 23 is further configured to take the geographic locations in the visual map as the target geographic locations before determining, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location. For example, the visualization of the geographic analysis may be a Web map application built on the Baidu Maps API.

Specifically, after the geographic location analysis module 23 determines the social network data texts corresponding to each target geographic location, the event analysis module 24 may use the TF-IDF method to compute a weight for each word of those texts, extract keywords according to the results, and push the keywords as hot events of the target geographic location, for example by annotating them at that location. Accordingly, the event analysis module 24 may be specifically configured to use the TF-IDF method to perform weight calculation on the words of the social network data texts corresponding to the target geographic location.

Specifically, the event analysis module 24 may be used for normalized term frequency calculation, inverse document frequency calculation, and calculation of term weights to extract hot words.

More specifically, term frequency (TF) is the frequency with which a given word appears in a document. The raw count is normalized by the number of words in the document so that the measure is not biased toward long documents. Inverse document frequency (IDF) reflects how few documents contain a term: the fewer documents contain it, the larger its IDF, which indicates that the term discriminates well between categories. The TF-IDF value of a term is then the product of its TF value and its IDF value. Finally, the words are sorted by TF-IDF value in descending order, and the top several are selected as the result of the hot event analysis.

In practice, when the texts being analyzed are short, as with Weibo posts, applying the usual TF-IDF algorithm directly to extract keywords runs into a problem. If all microblogs are treated as a single document, the IDF information is lost. If each microblog is treated as a document, each contains so few words that nearly every word occurs exactly once, and the TF information is lost. Either way, the accuracy of the selected keywords suffers.
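The two failure modes can be seen on a toy example. The posts below are invented for illustration and are assumed to be already segmented; IDF is computed with the usual logarithm of the corpus size over the document frequency.

```python
import math
from collections import Counter

# Hypothetical, already segmented short posts.
posts = [
    ["rainstorm", "subway", "suspended"],
    ["rainstorm", "flooding"],
    ["subway", "delayed"],
]

# Case 1: all posts merged into a single document.
# The corpus then has one document containing every word, so IDF = log(1/1) = 0.
merged = [word for post in posts for word in post]
corpus = [merged]
idf_merged = {
    word: math.log(len(corpus) / sum(1 for doc in corpus if word in doc))
    for word in set(merged)
}

# Case 2: each post is its own document.
# Posts are so short that every word occurs once per post, so raw counts carry no signal.
per_post_counts = [Counter(post) for post in posts]
```

In the first case every IDF value is zero, so every TF-IDF weight collapses to zero. In the second case every per-post count equals one, so TF only reflects post length.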

To improve the accuracy of the analysis results, the event analysis module 24 may specifically include:

A first computing unit, configured to, for each word $t_i$ of the social network data texts corresponding to the target geographic location, take those social network data texts as a first document and calculate the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the sum of the occurrences of all words in the first document;

A second computing unit, configured to take the social network data text to which the word belongs as a second document and calculate the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;

A third computing unit, configured to calculate the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula: $tfidf_{i,j} = tf_{i,j} \times idf_i$;

A processing unit, configured to sort the words of the social network data texts corresponding to the target geographic location in descending order of weight and take the top k words as the keywords of those social network data texts, where k is a preset value.

The value of k can be chosen according to the number of keywords actually required and is not limited in this embodiment.

In this embodiment, when the first computing unit calculates TF, all social network data texts corresponding to the target geographic location are treated as one document; when the second computing unit calculates IDF, each individual social network data text is treated as one document. In this way the word frequencies differ from word to word and the IDF information is retained, which improves the accuracy of the extracted keywords and, in turn, of the hot event analysis.
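The scheme can be sketched as below, with TF taken over the merged posts of one location and IDF taken over individual posts. The text does not say which posts make up the corpus used for IDF; this sketch assumes it is the full set of collected posts across all locations, passed in as `corpus`. The function name `hot_keywords` and the alphabetical tie-breaking are also assumptions made for the example.

```python
import math
from collections import Counter
from typing import List, Tuple


def hot_keywords(
    location_posts: List[List[str]], corpus: List[List[str]], k: int
) -> List[Tuple[str, float]]:
    """Rank the words of one location's posts by TF-IDF and return the top k."""
    # TF: all posts of the location are merged into one document.
    counts = Counter(word for post in location_posts for word in post)
    total = sum(counts.values())

    # IDF: every single post in the corpus counts as one document.
    doc_freq = Counter(word for post in corpus for word in set(post))
    n_docs = len(corpus)

    scores = {}
    for word, count in counts.items():
        tf = count / total
        # max(..., 1) guards against a word that is absent from the corpus.
        idf = math.log(n_docs / max(doc_freq[word], 1))
        scores[word] = tf * idf

    # Descending weight, ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
```

With this split, a word repeated across many posts of one location gets a high TF, while a word that appears in posts everywhere gets a low IDF, so location-specific topics tend to rise to the top.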

Optionally, a visual map can be used to push the hot events corresponding to each target geographic location. Specifically, each target geographic location and its hot events can be packaged in JSON format and passed to the front end for parsing; the parsed coordinates and the corresponding keywords, that is, the hot events, are then annotated on the visual map.
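A possible shape for that JSON payload is sketched below. The field names (`location`, `lng`, `lat`, `keywords`) and the coordinate lookup table are assumptions made for illustration; the text does not specify a schema, and the map API calls on the front end are not shown.

```python
import json
from typing import Dict, List, Tuple


def build_payload(
    hot_events: Dict[str, List[str]],
    coordinates: Dict[str, Tuple[float, float]],
) -> str:
    """Serialize each target location, its coordinates, and its keywords as JSON."""
    markers = []
    for location, keywords in hot_events.items():
        if location not in coordinates:
            continue  # skip locations that cannot be placed on the map
        lng, lat = coordinates[location]
        markers.append(
            {"location": location, "lng": lng, "lat": lat, "keywords": keywords}
        )
    # ensure_ascii=False keeps Chinese place names and keywords readable.
    return json.dumps(markers, ensure_ascii=False)
```

The front end can parse the string with its own JSON parser and place one marker per entry, using the keywords as the marker label.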

The geographic-location-based social network event analysis system provided in this embodiment examines social network data texts, identifies the texts associated with geographic locations, and pushes the keywords of the texts corresponding to each geographic location as the hot events of that location, helping users see at a glance the hot events related to a geographic location.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the system described above can be found in the corresponding process of the foregoing method embodiments and is not repeated here.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, and that some or all of their technical features can be replaced by equivalents, without the essence of the resulting technical solutions departing from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A geographic-location-based social network event analysis method, characterized by comprising:
performing word segmentation on each social network data text to obtain the words of the social network data text;
establishing a mapping between the social network data text and its corresponding geographic location, the corresponding geographic location being the word, among the words of the social network data text, that relates to a geographic location;
determining, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location;
for each target geographic location, performing weight calculation on the words of the social network data texts corresponding to the target geographic location, obtaining the keywords of those social network data texts, and pushing the keywords as hot events of the target geographic location.
2. The method according to claim 1, characterized in that performing weight calculation on the words of the social network data texts corresponding to the target geographic location comprises:
using the TF-IDF method to perform weight calculation on the words of the social network data texts corresponding to the target geographic location.
3. The method according to claim 2, characterized in that using the TF-IDF method to perform weight calculation on the words of the social network data texts corresponding to the target geographic location, and obtaining the keywords of those social network data texts, specifically comprises:
for each word $t_i$ of the social network data texts corresponding to the target geographic location, taking those social network data texts as a first document and calculating the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the sum of the occurrences of all words in the first document;
taking the social network data text to which the word belongs as a second document and calculating the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;
calculating the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula: $tfidf_{i,j} = tf_{i,j} \times idf_i$;
sorting the words of the social network data texts corresponding to the target geographic location in descending order of weight, and taking the top k words as the keywords of those social network data texts, where k is a preset value.
4. The method according to any one of claims 1-3, characterized in that performing word segmentation on the social network data text comprises:
using the IKAnalyzer segmenter to perform word segmentation on the social network data text.
5. The method according to any one of claims 1-3, characterized in that, before determining, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location, the method further comprises:
taking the geographic locations in a visual map as the target geographic locations.
6. The method according to any one of claims 1-3, characterized in that, before performing word segmentation on each social network data text, the method further comprises:
obtaining a preset number of the social network data texts published within a preset time period.
7. A geographic-location-based social network event analysis system, characterized by comprising:
a word segmentation module, configured to perform word segmentation on each social network data text to obtain the words of the social network data text;
a geographic location acquisition module, configured to establish a mapping between the social network data text and its corresponding geographic location, the corresponding geographic location being the word, among the words of the social network data text, that relates to a geographic location;
a geographic location analysis module, configured to determine, according to the geographic location corresponding to each social network data text, the social network data texts corresponding to each preset target geographic location;
an event analysis module, configured to, for each target geographic location, perform weight calculation on the words of the social network data texts corresponding to the target geographic location, obtain the keywords of those social network data texts, and push the keywords as hot events of the target geographic location.
8. The system according to claim 7, characterized in that
the event analysis module is specifically configured to use the TF-IDF method to perform weight calculation on the words of the social network data texts corresponding to the target geographic location.
9. The system according to claim 8, characterized in that the event analysis module comprises:
a first computing unit, configured to, for each word $t_i$ of the social network data texts corresponding to the target geographic location, take those social network data texts as a first document and calculate the term frequency $tf_{i,j}$ of the word $t_i$ according to a first formula: $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the first document and $\sum_k n_{k,j}$ is the sum of the occurrences of all words in the first document;
a second computing unit, configured to take the social network data text to which the word belongs as a second document and calculate the inverse document frequency $idf_i$ of the word $t_i$ according to a second formula: $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents in the corpus that contain the word $t_i$;
a third computing unit, configured to calculate the weight $tfidf_{i,j}$ of the word $t_i$ according to a third formula: $tfidf_{i,j} = tf_{i,j} \times idf_i$;
a processing unit, configured to sort the words of the social network data texts corresponding to the target geographic location in descending order of weight and take the top k words as the keywords of those social network data texts, where k is a preset value.
10. The system according to any one of claims 7-9, characterized in that
the word segmentation module is specifically configured to use the IKAnalyzer segmenter to perform word segmentation on the social network data text.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510061722.2A CN104615715A (en) 2015-02-05 2015-02-05 Social network event analyzing method and system based on geographic positions

Publications (1)

Publication Number Publication Date
CN104615715A true CN104615715A (en) 2015-05-13

Family

ID=53150157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510061722.2A Pending CN104615715A (en) 2015-02-05 2015-02-05 Social network event analyzing method and system based on geographic positions

Country Status (1)

Country Link
CN (1) CN104615715A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079024A (en) * 2006-06-19 2007-11-28 腾讯科技(深圳)有限公司 Special word list dynamic generation system and method
CN101694659A (en) * 2009-10-20 2010-04-14 浙江大学 Individual network news recommending method based on multitheme tracing
US20130031458A1 (en) * 2011-07-27 2013-01-31 Microsoft Corporation Hyperlocal content determination
CN102364473A (en) * 2011-11-09 2012-02-29 中国科学院自动化研究所 Network news retrieval system and method integrating geographic information and visual information
CN102982157A (en) * 2012-12-03 2013-03-20 北京奇虎科技有限公司 Device and method used for mining microblog hot topics
CN104331483A (en) * 2014-11-05 2015-02-04 北京航空航天大学 Method and equipment for detecting area events based on short text data

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106257448A (en) * 2015-06-19 2016-12-28 阿里巴巴集团控股有限公司 The methods of exhibiting of a kind of key word and device
US11727075B2 (en) 2015-06-19 2023-08-15 Advanced New Technologies Co., Ltd. Enhancing accuracy of presented search keywords
US11403357B2 (en) 2015-06-19 2022-08-02 Advanced New Technologies Co., Ltd. Enhancing accuracy of presented search keywords
EP3312738A4 (en) * 2015-06-19 2019-02-20 Alibaba Group Holding Limited METHOD AND DEVICE FOR DISPLAYING KEYWORD
CN107016556B (en) * 2016-01-27 2021-02-05 创新先进技术有限公司 Data processing method and device
CN107016556A (en) * 2016-01-27 2017-08-04 阿里巴巴集团控股有限公司 Data processing method and device
CN107454121A (en) * 2016-05-30 2017-12-08 北京搜狗科技发展有限公司 A kind of method, apparatus of location tracking, mobile terminal and server
CN107454121B (en) * 2016-05-30 2021-09-14 北京搜狗科技发展有限公司 Position tracking method and device, mobile terminal and server
CN106202232A (en) * 2016-06-27 2016-12-07 中国南方电网有限责任公司电网技术研究中心 Power failure event analysis method and device
CN109117446A (en) * 2017-06-26 2019-01-01 精彩旅图(北京)科技发展有限公司 Show the dynamic method, apparatus of user, system and computer-readable medium
CN109255023A (en) * 2017-07-11 2019-01-22 中国移动通信集团浙江有限公司 Hint information processing method and processing device
CN107908766A (en) * 2017-11-28 2018-04-13 深圳市城市规划设计研究院有限公司 A kind of city focus incident dynamic monitoring method and system
CN107908766B (en) * 2017-11-28 2019-11-19 深圳市城市规划设计研究院有限公司 A kind of city focus incident dynamic monitoring method and system
CN108446274A (en) * 2018-03-15 2018-08-24 北京科技大学 A kind of keyword extracting method based on time-sensitive tf-idf
CN108509589A (en) * 2018-03-29 2018-09-07 优视科技(中国)有限公司 Information flow methods of exhibiting and system, computer readable storage medium
CN111291176A (en) * 2018-12-06 2020-06-16 北京国双科技有限公司 Hot event mining method and device
CN111323040A (en) * 2018-12-14 2020-06-23 上海博泰悦臻网络技术服务有限公司 Method, system, medium and vehicle-mounted terminal for displaying geographic position information
CN115757565A (en) * 2023-01-09 2023-03-07 无锡容智技术有限公司 Text data geographic position positioning method and device


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150513
