Computer Science (计算机科学), 2019, Vol. 46, Issue 10: 316-321. doi: 10.11896/jsjkx.180901624
邓尧1, 冀汶莉1, 李勇军2, 高兴1
DENG Yao1, JI Wen-li1, LI Yong-jun2, GAO Xing1
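Abstract: Inferring a user's fine-grained location from user generated short text (UGST) is important for location-based service applications. Existing fine-grained location inference methods rarely use the semantic information in UGST and do not consider the weights of semantic entities in UGST, so their performance is low. To address these problems, this paper proposes a fine-grained location inference method for UGST based on a location-based social network (LBSN). The method consists of three steps: 1) building an association model between entities and locations from UGST in Foursquare, to address the sparsity of location labels; 2) determining whether the UGST whose location is to be inferred contains location information, and filtering out UGST that carries no location semantics, to eliminate interference from noisy short texts; 3) inferring possible candidate locations from the UGST content, ranking each candidate, and selecting the top-ranked location as the inferred location. Experimental results verify the effectiveness of the proposed method.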
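The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the association model, the filter, or the ranking function, so everything below is an assumption made for illustration. In particular, the function names (`build_model`, `entity_weight`, `infer_location`), the logarithmic entity weight, and the use of "no known entity" as the filter for texts without location semantics are hypothetical stand-ins.

```python
import math
from collections import Counter, defaultdict


def build_model(tips):
    """Step 1 (assumed form): count how often each entity co-occurs with each location.

    tips: iterable of (location, entities) pairs, e.g. extracted from Foursquare text.
    Returns a mapping entity -> Counter of locations.
    """
    model = defaultdict(Counter)
    for location, entities in tips:
        for entity in set(entities):
            model[entity][location] += 1
    return model


def entity_weight(model, entity, n_locations):
    # Hypothetical weight: an entity tied to fewer locations is more informative.
    return math.log(1 + n_locations / len(model[entity]))


def infer_location(model, entities, n_locations):
    """Steps 2 and 3 (assumed form): filter, then rank candidate locations."""
    known = [e for e in entities if e in model]
    if not known:
        # Step 2 stand-in: a text with no known entity carries no location semantics.
        return None
    scores = Counter()
    for e in known:
        weight = entity_weight(model, e, n_locations)
        total = sum(model[e].values())
        for location, count in model[e].items():
            # Weighted share of this entity's mentions that belong to the location.
            scores[location] += weight * count / total
    # Step 3: the top-ranked candidate is the inferred location.
    return scores.most_common(1)[0][0]
```

In this sketch a short text that mentions only entities unseen in the model is filtered out and yields `None`, while a text mentioning one location-specific entity and one shared entity is assigned to the location favoured by the more specific entity.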