CN111552865A - User interest portrait method and related equipment - Google Patents
- Publication number
- CN111552865A (application CN202010243221.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- interest
- target
- probability value
- registration information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a user interest portrait method and related equipment. The method determines, according to a user's identification information, whether each of multiple websites holds registration information for the user, thereby obtaining multiple target websites; generates a registration feature vector for the user from those determination results; uses a clustering method to determine a first probability value for each interest tag of the user from the registration feature vector; crawls multiple target named entities of the user from each target website; calculates a second probability value for each interest tag with a trained neural network; calculates a third probability value for each interest tag by a statistical method; takes the maximum of the first, second, and third probability values of each interest tag as that tag's target probability value; and determines the interest tags whose target probability value is greater than a first preset threshold to be the user's interest tags. The invention improves the accuracy of extracting a user's interest tags.
Description
Technical field
The present invention relates to the technical field of entity recognition, and in particular to a user interest portrait method, an apparatus, a computer device, and a computer-readable storage medium.
Background
The hobbies and interests recorded in user interest portraits are important data in modern financial scenarios and are widely used in marketing, services, and even risk control.
A user interest portrait requires extracting the user's interest tags (such as travel or programming study). Existing methods extract interest tags from a user's social and usage-habit data on a single platform, so the accuracy of the extracted tags tends to be low because the data comes from one source or is defective. How to extract a user's interest tags accurately has become an urgent problem.
Summary of the invention
In view of the above, it is necessary to provide a user interest portrait method, an apparatus, a computer device, and a computer-readable storage medium that can extract a user's interest tags according to the user's registration information on various websites.
A first aspect of the present application provides a user interest portrait method, which includes:
obtaining multiple websites, multiple interest tags, and identification information of a user;
determining, according to the identification information, whether each of the multiple websites holds registration information of the user, to obtain multiple target websites that hold the user's registration information;
generating a registration feature vector of the user according to the results of determining whether the multiple websites hold the user's registration information;
using a clustering method to determine a first probability value of each interest tag of the user according to the user's registration feature vector;
crawling multiple target named entities of the user from each target website;
using a trained neural network to calculate a second probability value of each interest tag according to the multiple target named entities and the target website to which each target named entity belongs;
calculating a third probability value of each interest tag based on a statistical method;
determining the maximum of the first, second, and third probability values of each interest tag as the target probability value of that interest tag;
determining the interest tags whose target probability value is greater than a first preset threshold as the user's interest tags.
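The last two steps above (taking the maximum of the three probability values, then applying the first preset threshold) can be sketched as follows. This is only an illustrative sketch: the function name `fuse_interest_tags`, the dictionary layout, the sample values, and the choice to treat a missing estimate as 0.0 are assumptions, not part of the patent text.

```python
from typing import Dict, List


def fuse_interest_tags(
    first: Dict[str, float],
    second: Dict[str, float],
    third: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Return the tags whose target probability exceeds the threshold."""
    selected = []
    for tag in first:
        # Target probability = maximum of the three estimates.
        # Treating a missing estimate as 0.0 is an assumption of this sketch.
        target = max(first[tag], second.get(tag, 0.0), third.get(tag, 0.0))
        if target > threshold:
            selected.append(tag)
    return selected


# Hypothetical probability values for two interest tags.
first = {"travel": 0.5, "education": 0.2}
second = {"travel": 0.3, "education": 0.8}
third = {"travel": 0.4, "education": 0.1}
tags = fuse_interest_tags(first, second, third, threshold=0.6)
```

With these sample values only "education" passes, because its largest estimate (0.8) exceeds 0.6 while the largest estimate for "travel" (0.5) does not.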
In another possible implementation, determining according to the identification information whether the multiple websites hold the user's registration information includes:
searching for the identification information on a designated website among the multiple websites;
if the search results of the designated website include the identification information, the designated website holds the user's registration information;
if the search results of the designated website do not include the identification information, the designated website does not hold the user's registration information.
In another possible implementation, determining according to the identification information whether the multiple websites hold the user's registration information includes:
querying the user's registration information, according to the identification information, through an interface authorized by a designated website among the multiple websites;
if the designated website returns the user's registration information, the designated website holds the user's registration information;
if the designated website returns no registration information for the user, or the returned value is empty, the designated website does not hold the user's registration information.
In another possible implementation, using a clustering method to determine the first probability value of each interest tag of the user according to the user's registration feature vector includes:
obtaining multiple first historical users;
clustering the multiple first historical users according to their registration feature vectors, to obtain multiple user clusters and the center vector of each user cluster;
determining the target user cluster to which the user belongs according to the distance between the user's registration feature vector and the center vector of each user cluster;
determining the mean of the probability values of a specified interest tag over the target users in the target user cluster as the first probability value of the user's specified interest tag; or determining, as that first probability value, the ratio of the number of target users in the target user cluster whose probability value for the specified interest tag is greater than a second preset threshold to the total number of target users in the target user cluster.
In another possible implementation, using the trained neural network to calculate the second probability value of each interest tag according to the multiple target named entities and the target website to which each target named entity belongs includes:
encoding each target named entity and the target website to which it belongs into a feature vector of that target named entity;
inputting the feature vector of each target named entity into the trained neural network, to obtain the probability value of each interest tag for that target named entity;
calculating, for each interest tag, the mean of its probability values over the multiple target named entities, to obtain the second probability value of the interest tag.
In another possible implementation, calculating the third probability value of each interest tag based on a statistical method includes:
obtaining multiple second historical users who have registration information on the multiple target websites, where the user interest portrait of each second historical user includes multiple tags of that user;
counting a first number, namely the number of second historical users whose user interest portrait contains the interest tag;
counting a second number, namely the total number of the second historical users;
calculating the ratio of the first number to the second number and taking this ratio as the third probability value.
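The ratio described above can be illustrated with a short sketch. The function name `third_probability`, the representation of each historical portrait as a set of tag strings, and the sample data are assumptions made for illustration; returning 0.0 when there are no historical users is also an assumption, since the text does not cover that case.

```python
from typing import Dict, List, Set


def third_probability(
    historical_portraits: List[Set[str]], tags: List[str]
) -> Dict[str, float]:
    """For each tag, return (users whose portrait has the tag) / (all users)."""
    total = len(historical_portraits)  # the "second number"
    if total == 0:
        # Assumption: with no historical users, fall back to 0.0.
        return {tag: 0.0 for tag in tags}
    result = {}
    for tag in tags:
        # The "first number": portraits that contain this tag.
        count = sum(1 for portrait in historical_portraits if tag in portrait)
        result[tag] = count / total
    return result


# Hypothetical portraits of four second historical users.
portraits = [
    {"travel", "fitness"},
    {"travel"},
    {"education"},
    {"travel", "education"},
]
probs = third_probability(portraits, ["travel", "education", "fitness"])
```

Here three of the four portraits contain "travel", two contain "education", and one contains "fitness", so the ratios are 3/4, 2/4, and 1/4.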
In another possible implementation, before determining according to the identification information whether the multiple websites hold the user's registration information, the user interest portrait method further includes: obtaining the user's authorization.
A second aspect of the present application provides a user interest portrait apparatus, which includes:
an obtaining module, configured to obtain multiple websites, multiple interest tags, and identification information of a user;
a judging module, configured to determine, according to the identification information, whether each of the multiple websites holds registration information of the user, to obtain multiple target websites that hold the user's registration information;
a generating module, configured to generate a registration feature vector of the user according to the results of determining whether the multiple websites hold the user's registration information;
a first determining module, configured to use a clustering method to determine a first probability value of each interest tag of the user according to the user's registration feature vector;
a crawling module, configured to crawl multiple target named entities of the user from each target website;
a first calculating module, configured to use a trained neural network to calculate a second probability value of each interest tag according to the multiple target named entities and the target website to which each target named entity belongs;
a second calculating module, configured to calculate a third probability value of each interest tag based on a statistical method;
a second determining module, configured to determine the maximum of the first, second, and third probability values of each interest tag as the target probability value of that interest tag;
a third determining module, configured to determine the interest tags whose target probability value is greater than a first preset threshold as the user's interest tags.
A third aspect of the present application provides a computer device that includes a processor, the processor implementing the user interest portrait method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the user interest portrait method.
By determining the user's interest tags from the websites associated with the user's interests and from the user's target named entities on the target websites, the invention can improve the accuracy of identifying the user's interest tags. Because the target probability value of an interest tag is determined from the first probability value obtained by clustering, the second probability value obtained by the neural network, and the third probability value obtained by statistics, the risk of bias is reduced. The invention therefore extracts a user's interest tags from the user's registration information on various websites and improves the accuracy of that extraction.
Description of drawings
FIG. 1 is a flowchart of the user interest portrait method provided by an embodiment of the present invention.
FIG. 2 is a structural diagram of the user interest portrait apparatus provided by an embodiment of the present invention.
FIG. 3 is a schematic diagram of the computer device provided by an embodiment of the present invention.
Detailed description
To make the above objects, features, and advantages of the present invention easier to understand, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. Where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the invention. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the invention.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field of the invention. The terms used in this specification serve only to describe specific embodiments and are not intended to limit the invention.
Preferably, the user interest portrait method of the present invention is applied in one or more computer devices. A computer device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), and embedded devices.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. It may interact with the user through a keyboard, a mouse, a remote control, a touchpad, or a voice-control device.
Embodiment 1
FIG. 1 is a flowchart of the user interest portrait method provided by Embodiment 1 of the present invention. The method is applied to a computer device and extracts a user's interest tags according to the user's registration information on various websites.
As shown in FIG. 1, the user interest portrait method includes:
101. Obtain multiple websites, multiple interest tags, and identification information of a user.
In a specific embodiment, the multiple websites may include NetEase Cloud Music, NetEase Cloud Classroom, Baidu Tieba, CSDN, Weibo, Xiaohongshu, Ctrip, and the like.
The multiple interest tags may include fitness, education, mother and baby, travel, and the like. Users with different interests are likely to use websites associated with those interests (that is, a website associated with a user's interest holds the user's registration information), so a user's interests are associated with the websites on which the user has registered. For example, travel is associated with Ctrip, and education is associated with NetEase Cloud Classroom.
The identification information may be received as input from the user or as transmitted by a user identification device.
In a specific embodiment, the identification information includes a mobile phone number, an ID card number, an encrypted mobile phone number, or an encrypted ID card number.
For example, the mobile phone number or ID card number that the user types on a keyboard may be received, or the user's ID card number may be received from a text recognition device that reads the number on the user's ID card. The mobile phone number may be encrypted with a hash encryption algorithm or the MD5 algorithm to obtain the encrypted mobile phone number, and the ID card number may be encrypted in the same way to obtain the encrypted ID card number.
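As a rough illustration of deriving an "encrypted" identifier, the sketch below hashes the identifier with Python's standard `hashlib` module. The function name `hash_identifier` and the sample phone number are hypothetical. The default of SHA-256 is a choice made for this sketch; the text itself names hashing and MD5, and `"md5"` can be passed as the algorithm name where the Python build provides it.

```python
import hashlib


def hash_identifier(identifier: str, algorithm: str = "sha256") -> str:
    """Return the hexadecimal digest of an identifier string."""
    digest = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    digest.update(identifier.encode("utf-8"))
    return digest.hexdigest()


# Hypothetical phone number used only for illustration.
hashed_phone = hash_identifier("13800000000")
```

The same input always yields the same digest, which is what allows a hashed phone number to be used as a lookup key in place of the plain number.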
In another embodiment, the identification information may further include fingerprint information, iris information, face information, or the like.
102. Determine, according to the identification information, whether each of the multiple websites holds the user's registration information, to obtain multiple target websites that hold the user's registration information.
In a specific embodiment, determining according to the identification information whether the multiple websites hold the user's registration information includes:
querying the user's registration information, according to the identification information, through an interface authorized by a designated website among the multiple websites;
if the designated website returns the user's registration information, the designated website holds the user's registration information;
if the designated website returns no registration information for the user, or the returned value is empty, the designated website does not hold the user's registration information.
For example, user A's registration information is requested from CSDN's registration information query interface, with user A's phone number as the query parameter. If CSDN returns user A's registration information (such as registration time, registration status, and user name), CSDN holds user A's registration information.
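The decision rule above (non-empty response means registered; no response or an empty response means not registered) can be sketched as below. The real query interfaces are not described in the text, so each website is represented here by a hypothetical lookup callable; `find_target_sites`, `csdn_lookup`, `weibo_lookup`, and the returned fields are all illustrative stand-ins, not actual website APIs.

```python
from typing import Callable, Dict, List, Optional

# A lookup takes an identifier and returns registration info, or None/empty.
Lookup = Callable[[str], Optional[dict]]


def find_target_sites(identifier: str, lookups: Dict[str, Lookup]) -> List[str]:
    """Return the websites whose lookup yields non-empty registration info."""
    targets = []
    for site, lookup in lookups.items():
        info = lookup(identifier)
        if info:  # None or an empty result means "not registered"
            targets.append(site)
    return targets


def csdn_lookup(identifier: str) -> Optional[dict]:
    # Hypothetical stub standing in for an authorized query interface.
    if identifier == "13800000000":
        return {"username": "user_a", "status": "active"}
    return None


def weibo_lookup(identifier: str) -> Optional[dict]:
    # Hypothetical stub that always returns an empty result.
    return {}


targets = find_target_sites(
    "13800000000", {"CSDN": csdn_lookup, "Weibo": weibo_lookup}
)
```

Passing the lookups in as callables keeps the decision rule separate from how any particular website is actually queried.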
In another embodiment, determining according to the identification information whether the multiple websites hold the user's registration information includes:
using the identification information to register a new account on a designated website among the multiple websites;
if the designated website indicates that the user is already registered, the designated website holds the user's registration information;
if the designated website prompts for registration verification information, the designated website does not hold the user's registration information.
For example, a request to register a new account may be sent to CSDN with user A's phone number. If CSDN prompts for registration verification information (such as a verification code that CSDN sends to user A's phone number), CSDN does not hold user A's registration information.
In another embodiment, determining according to the identification information whether the multiple websites hold the user's registration information includes:
searching for the identification information on a designated website among the multiple websites;
if the search results of the designated website include the identification information, the designated website holds the user's registration information;
if the search results of the designated website do not include the identification information, the designated website does not hold the user's registration information.
103. Generate the user's registration feature vector according to the results of determining whether the multiple websites hold the user's registration information.
For example, the registration feature vector generated for user A is (1, 1, 0, 1, 0). Reading from left to right, the 1 in the first dimension indicates that NetEase Cloud Music holds user A's registration information; the 1 in the second dimension indicates that Baidu Tieba holds it; the 0 in the third dimension indicates that CSDN does not; the 1 in the fourth dimension indicates that Weibo holds it; and the 0 in the fifth dimension indicates that Xiaohongshu does not.
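The example vector above can be reproduced with a few lines of code. The function name `registration_vector` and the dictionary of check results are illustrative assumptions; the website order follows the example in the text.

```python
from typing import Dict, List


def registration_vector(sites: List[str], registered: Dict[str, bool]) -> List[int]:
    """Build a 0/1 vector with one dimension per website, in a fixed order."""
    return [1 if registered.get(site, False) else 0 for site in sites]


sites = ["NetEase Cloud Music", "Baidu Tieba", "CSDN", "Weibo", "Xiaohongshu"]
# Hypothetical results of the registration check for user A.
registered = {
    "NetEase Cloud Music": True,
    "Baidu Tieba": True,
    "CSDN": False,
    "Weibo": True,
    "Xiaohongshu": False,
}
vector = registration_vector(sites, registered)
# Websites with a 1 in the vector are the target websites.
target_sites = [site for site, bit in zip(sites, vector) if bit == 1]
```

The website order must be the same for every user so that each dimension of the vector always refers to the same website.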
104. Use a clustering method to determine the first probability value of each interest tag of the user according to the user's registration feature vector.
In a specific embodiment, using a clustering method to determine the first probability value of each interest tag of the user according to the user's registration feature vector includes:
(1) Obtain multiple first historical users.
(2) Cluster the multiple first historical users according to their registration feature vectors, to obtain multiple user clusters and the center vector of each user cluster.
(3) Determine the target user cluster to which the user belongs according to the distance between the user's registration feature vector and the center vector of each user cluster. For example, suppose clustering yields two user clusters (a first user cluster and a second user cluster), the Euclidean distance between the user's registration feature vector and the center vector of the first user cluster is num1, and the Euclidean distance to the center vector of the second user cluster is num2. If num1 is greater than num2, the second user cluster is determined to be the target user cluster.
(4) Determine the mean of the probability values of a specified interest tag over the target users in the target user cluster as the first probability value of the user's specified interest tag; or determine, as that first probability value, the ratio of the number of target users in the target user cluster whose probability value for the specified interest tag is greater than a second preset threshold to the total number of target users in the target user cluster. For example, if the target user cluster contains 3 users, the specified interest tag is travel, and the three users' probability values for the travel tag are 0.5, 0.6, and 0.4, then the first probability value of the user's travel tag is 0.5. The second preset threshold is a preset value tuned according to experimental data.
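Steps (3) and (4) can be sketched as follows, assuming the cluster center vectors from step (2) are already available (the text does not name a specific clustering algorithm, so none is implemented here). The function names, the two center vectors, and the threshold value 0.45 are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Optional, Sequence


def nearest_cluster(vector: Sequence[float], centers: List[Sequence[float]]) -> int:
    """Return the index of the center with the smallest Euclidean distance."""
    distances = [math.dist(vector, center) for center in centers]
    return distances.index(min(distances))


def first_probability(
    member_probs: Sequence[float], second_threshold: Optional[float] = None
) -> float:
    """Mean of member probabilities, or share of members above a threshold."""
    if not member_probs:
        return 0.0  # assumption: an empty cluster contributes 0.0
    if second_threshold is None:
        return sum(member_probs) / len(member_probs)
    above = sum(1 for p in member_probs if p > second_threshold)
    return above / len(member_probs)


# Hypothetical center vectors of two user clusters.
centers = [(0.0, 0.0, 1.0, 0.0, 1.0), (1.0, 1.0, 0.0, 1.0, 0.0)]
cluster_index = nearest_cluster((1, 1, 0, 1, 0), centers)

# Travel-tag probabilities of the three users in the target cluster.
mean_value = first_probability([0.5, 0.6, 0.4])
ratio_value = first_probability([0.5, 0.6, 0.4], second_threshold=0.45)
```

With the sample values, the mean variant gives approximately 0.5, matching the example in step (4), and the ratio variant gives 2/3 because two of the three members exceed the hypothetical threshold of 0.45.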
105. Crawl multiple target named entities of the user from each target website.
Multiple web page texts of the user may be crawled from each target website. These texts include the user's social information and behavior information on that website, and the target named entities for that website are extracted from them.
For example, if the target website is NetEase Cloud Music, the web page texts include the playlists that user A follows or shares, and named entity extraction on these texts yields target named entities such as "folk song" and "campus" (users from whom "folk song" and "campus" are extracted have been found to generally like "travel").
As another example, if the target website is Xiaohongshu, the web page texts include the cross-border online shopping experiences that user B follows or shares, and named entity extraction yields target named entities such as "milk powder" and "stroller" (users from whom "milk powder" and "stroller" are extracted have been found to generally lean toward "mother and baby").
As a further example, if the target website is NetEase Cloud Classroom, the web page texts include the video introductions that user B follows or shares, and named entity extraction yields target named entities such as JAVA and SPRING (users from whom JAVA and SPRING are extracted have been found to generally lean toward "programming education").
106. Use a trained neural network to calculate a second probability value for each interest tag according to the multiple target named entities and the target website to which each target named entity belongs.
In a specific embodiment, using the trained neural network to calculate the second probability value of each interest tag according to the multiple target named entities and the target website to which each target named entity belongs includes:
encoding each target named entity, together with the target website to which it belongs, as a feature vector of that target named entity;
inputting the feature vector of each target named entity into the trained neural network to obtain, for that target named entity, a probability value for each interest tag;
calculating, for each interest tag, the mean of the probability values obtained for the multiple target named entities, to obtain the second probability value of that interest tag.
For example, the two named entities are JAVA and SPRING. The probability value of JAVA for "programming education" (interest tag) is 0.9 and the probability value of SPRING for "programming education" (interest tag) is 0.7, so the second probability value of "programming education" (interest tag) is (0.9 + 0.7) / 2 = 0.8.
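The averaging in step 106 can be sketched as follows. This is a minimal illustration, assuming the network outputs are already available as a dictionary; the function name `second_probabilities` and the data layout are hypothetical and do not come from the patent.

```python
def second_probabilities(per_entity_probs):
    """Average each interest tag's probability over all target named entities.

    per_entity_probs maps entity name -> {interest tag: probability}.
    This layout is an assumption of the sketch, not part of the method.
    """
    if not per_entity_probs:
        return {}
    tags = set()
    for probs in per_entity_probs.values():
        tags.update(probs)
    count = len(per_entity_probs)
    # Assumption: a tag that was not scored for an entity counts as 0.0.
    return {
        tag: sum(probs.get(tag, 0.0) for probs in per_entity_probs.values()) / count
        for tag in tags
    }


# Values taken from the JAVA / SPRING example in the text.
network_outputs = {
    "JAVA": {"programming education": 0.9},
    "SPRING": {"programming education": 0.7},
}
second = second_probabilities(network_outputs)
print(round(second["programming education"], 2))  # 0.8
```

Treating an unscored tag as 0.0 is one possible convention; a real network would normally emit a probability for every tag, in which case the default never applies.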
Encoding each target named entity and the target website to which it belongs as the feature vector of that target named entity includes:
encoding the target named entity as a first intermediate vector using a preset encoder (such as a one-hot encoder or a word2vec encoder);
encoding the target website to which the target named entity belongs as a second intermediate vector using the preset encoder;
concatenating the first intermediate vector and the second intermediate vector, or multiplying the first intermediate vector and the second intermediate vector element-wise, to obtain the feature vector of the target named entity.
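The encoding steps above can be sketched with a one-hot encoder. The vocabularies, the function names `one_hot` and `encode_feature`, and the `mode` parameter are assumptions made for illustration. Element-wise multiplication requires both intermediate vectors to have the same length, which is more natural with dense embeddings such as word2vec than with one-hot vectors over different vocabularies.

```python
def one_hot(item, vocabulary):
    """Return a one-hot vector for item over an ordered vocabulary."""
    return [1.0 if item == entry else 0.0 for entry in vocabulary]


def encode_feature(entity, website, entity_vocab, website_vocab, mode="concat"):
    """Combine the entity vector and the website vector into one feature vector."""
    first = one_hot(entity, entity_vocab)      # first intermediate vector
    second = one_hot(website, website_vocab)   # second intermediate vector
    if mode == "concat":
        return first + second
    if mode == "multiply":
        if len(first) != len(second):
            raise ValueError("element-wise product needs vectors of equal length")
        return [a * b for a, b in zip(first, second)]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical vocabularies built from the examples in the text.
ENTITIES = ["JAVA", "SPRING"]
WEBSITES = ["NetEase Cloud Music", "NetEase Cloud Classroom", "Xiaohongshu"]

feature = encode_feature("JAVA", "NetEase Cloud Classroom", ENTITIES, WEBSITES)
print(feature)  # [1.0, 0.0, 0.0, 1.0, 0.0]
```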
Training the neural network may include:
obtaining a training sample and a label of the training sample, where the training sample includes a target named entity and the code of the target website to which that target named entity belongs;
encoding the target named entity as a first vector according to a preset encoding table, and encoding the target website to which the target named entity belongs as a second vector according to the preset encoding table;
concatenating the first vector and the second vector to obtain the feature vector of the target named entity;
inputting the feature vector of the target named entity into a neural network with initialized parameter values to obtain an output vector;
optimizing the parameter values of the neural network through the back-propagation algorithm according to the output vector and the label of the training sample.
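The training loop above can be sketched with the smallest possible network: a single softmax layer trained by gradient descent. The patent does not specify the architecture, loss function or optimizer, so the cross-entropy loss, the learning rate and the names `train` and `predict` are all assumptions of this sketch.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, biases, features):
    """Forward pass: one linear layer followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + bias
        for row, bias in zip(weights, biases)
    ]
    return softmax(scores)


def train(samples, num_tags, epochs=200, learning_rate=0.5, seed=0):
    """samples is a list of (feature_vector, tag_index) pairs."""
    rng = random.Random(seed)
    dim = len(samples[0][0])
    # Initialize parameter values with small random numbers.
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(num_tags)]
    biases = [0.0] * num_tags
    for _ in range(epochs):
        for features, label in samples:
            probs = predict(weights, biases, features)
            for k in range(num_tags):
                # Gradient of cross-entropy loss with respect to score k.
                grad = probs[k] - (1.0 if k == label else 0.0)
                for i in range(dim):
                    weights[k][i] -= learning_rate * grad * features[i]
                biases[k] -= learning_rate * grad
    return weights, biases


# Toy data: concatenated one-hot entity and website vectors (hypothetical),
# with tag 0 = "programming education" and tag 1 = "mother and baby".
training_samples = [
    ([1.0, 0.0, 0.0, 1.0, 0.0], 0),
    ([0.0, 1.0, 0.0, 1.0, 0.0], 0),
    ([0.0, 0.0, 1.0, 0.0, 1.0], 1),
]
weights, biases = train(training_samples, num_tags=2)
probabilities = predict(weights, biases, [1.0, 0.0, 0.0, 1.0, 0.0])
```

A deeper network would propagate the same output-layer gradient backwards through its hidden layers; the single layer is used here only to keep the sketch short.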
107. Calculate a third probability value for each interest tag based on a statistical method.
Calculating the third probability value of each interest tag based on a statistical method includes:
acquiring multiple second historical users who have registration information on the multiple target websites, where the user interest portrait of each second historical user includes multiple tags of that second historical user;
counting a first number, namely the number of second historical users whose user interest portrait contains the interest tag;
counting a second number, namely the total number of the multiple second historical users;
calculating the ratio of the first number to the second number and taking that ratio as the third probability value.
For example, four second historical users (User A, User B, User C and User D) who have registration information on NetEase Cloud Music and Ctrip are acquired; the first number, that is, the number of second historical users whose user interest portrait contains "travel" (interest tag), is 3; the second number of second historical users is 4; the third probability value of "travel" (interest tag) is therefore 3 / 4 = 0.75.
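The ratio in step 107 can be sketched as below. The function name `third_probability` and the portrait contents other than "travel" are invented for illustration; only the counts 3 and 4 come from the example.

```python
def third_probability(tag, historical_portraits):
    """Share of second historical users whose portrait contains the tag.

    historical_portraits maps user name -> set of interest tags (assumed layout).
    """
    if not historical_portraits:
        return 0.0  # Assumption: no historical users means no statistical evidence.
    first_number = sum(1 for tags in historical_portraits.values() if tag in tags)
    second_number = len(historical_portraits)
    return first_number / second_number


portraits = {
    "User A": {"travel", "fitness"},
    "User B": {"travel"},
    "User C": {"travel", "education"},
    "User D": {"education"},
}
print(third_probability("travel", portraits))  # 0.75
```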
108. Determine the maximum of the first probability value, the second probability value and the third probability value of each interest tag as the target probability value of that interest tag.
For example, the first probability value of "travel" (interest tag) is 0.65, the second probability value of "travel" (interest tag) is 0.70, and the third probability value of "travel" (interest tag) is 0.75, so 0.75 is determined as the target probability value of "travel" (interest tag).
109. Determine each interest tag whose target probability value is greater than a first preset threshold as an interest tag of the user.
For example, the target probability value of "travel" (interest tag) is 0.75, the target probability value of "programming education" (interest tag) is 0.85, and the first preset threshold is 0.80, so "programming education" is determined as an interest tag of the user.
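Steps 108 and 109 amount to a per-tag maximum followed by a threshold filter. The sketch below uses hypothetical function names; the three "travel" values, the 0.85 target for "programming education" and the 0.80 threshold follow the examples, while the other two "programming education" values are made up so that the maximum comes out at 0.85.

```python
def target_probabilities(first, second, third):
    """Per interest tag, take the largest of the three probability values."""
    tags = set(first) | set(second) | set(third)
    # Assumption: a tag missing from one source contributes 0.0 from that source.
    return {
        tag: max(first.get(tag, 0.0), second.get(tag, 0.0), third.get(tag, 0.0))
        for tag in tags
    }


def select_interest_tags(targets, first_threshold):
    """Keep the tags whose target probability exceeds the first preset threshold."""
    return sorted(tag for tag, value in targets.items() if value > first_threshold)


first = {"travel": 0.65, "programming education": 0.60}
second = {"travel": 0.70, "programming education": 0.85}
third = {"travel": 0.75, "programming education": 0.40}

targets = target_probabilities(first, second, third)
selected = select_interest_tags(targets, first_threshold=0.80)
print(selected)  # ['programming education']
```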
The user interest portrait method of Embodiment 1 determines the user's interest tags from the websites associated with the user's interests and from the user's target named entities on the target websites, which can improve the accuracy of identifying the user's interest tags. The target probability value of an interest tag is determined from the first probability value obtained by the clustering method, the second probability value obtained by the neural network and the third probability value obtained by the statistical method, which can reduce the risk of bias. Embodiment 1 extracts the user's interest tags according to the user's registration information on various websites, which improves the accuracy of extracting the user's interest tags; describing the user interest portrait with the extracted interest tags improves the accuracy of the portrait.
In another embodiment, before determining according to the identification information whether the multiple websites hold registration information of the user, the user interest portrait method further includes: obtaining the user's authorization.
Before determining according to the identification information whether the multiple websites hold registration information of the user, an authorization option box may be presented to the user, and the authorization option checked by the user in the box is received.
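Embodiment 2
FIG. 2 is a structural diagram of the user interest portrait apparatus provided by Embodiment 2 of the present invention. The user interest portrait apparatus 20 is applied to a computer device. The user interest portrait apparatus 20 is used to extract a user's interest tags according to the user's registration information on various websites.
As shown in FIG. 2, the user interest portrait apparatus 20 may include an acquisition module 201, a judgment module 202, a generation module 203, a first determination module 204, a crawling module 205, a first calculation module 206, a second calculation module 207, a second determination module 208 and a third determination module 209.
The acquisition module 201 is configured to acquire multiple websites, multiple interest tags and identification information of a user.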
In a specific embodiment, the multiple websites may include NetEase Cloud Music, NetEase Cloud Classroom, Baidu Tieba, CSDN, Weibo, Xiaohongshu, Ctrip and the like.
The multiple interest tags may include fitness, education, mother and baby, travel and the like. Users with different interests are likely to use websites associated with those interests (that is, a website associated with a user's interest holds the user's registration information), so there is an association between a user's interests and the websites on which the user is registered. For example, travel is associated with Ctrip, and education is associated with NetEase Cloud Classroom.
The identification information may be received as input from the user or as transmitted by a user identification device.
In a specific embodiment, the identification information includes a mobile phone number, an ID card number, an encrypted mobile phone number or an encrypted ID card number.
For example, a mobile phone number or ID card number typed by the user on a keyboard may be received, or the user's ID card number transmitted by a text recognition device may be received; the text recognition device can recognize the ID card number on the user's ID card. The mobile phone number may be encrypted with hash encryption or the MD5 algorithm to obtain the encrypted mobile phone number, and the ID card number may be encrypted in the same way to obtain the encrypted ID card number.
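Deriving the "encrypted" identifier can be sketched with Python's standard `hashlib` module. The function name `hash_identifier` and the phone number are fictitious. Strictly speaking, MD5 and similar algorithms are one-way hash functions rather than encryption; the sketch defaults to SHA-256, and MD5 can be selected by name where the platform provides it.

```python
import hashlib


def hash_identifier(identifier, algorithm="sha256"):
    """Return the hexadecimal digest of an identifier such as a phone number."""
    digest = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    digest.update(identifier.encode("utf-8"))
    return digest.hexdigest()


# Fictitious phone number used only for illustration.
phone_digest = hash_identifier("13800000000")
print(len(phone_digest))  # 64
```

The same input always yields the same digest, so the hashed value can serve as a lookup key without the raw number being passed around.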
In another embodiment, the identification information may further include fingerprint information, iris information, face information or the like.
The judgment module 202 is configured to determine, according to the identification information, whether the multiple websites hold registration information of the user, to obtain multiple target websites that hold registration information of the user.
In a specific embodiment, determining according to the identification information whether the multiple websites hold registration information of the user includes:
querying the user's registration information, according to the identification information, through an interface authorized by a designated website among the multiple websites;
if the designated website returns registration information of the user, the designated website holds registration information of the user;
if the designated website returns no registration information of the user or returns an empty value, the designated website holds no registration information of the user.
For example, User A's registration information is queried through the registration information query interface of CSDN (the query parameter is User A's phone number); if CSDN returns User A's registration information (such as User A's registration time, registration status and user name), CSDN holds registration information of User A.
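The interface-based check can be sketched as a loop over the websites. No real website API is used here: `query_registration` stands for whatever authorized interface a site provides and is passed in as a callable, and `fake_query` with its records is a made-up stand-in for demonstration.

```python
def find_target_websites(websites, identifier, query_registration):
    """Return the websites whose authorized interface reports a registration.

    query_registration(website, identifier) is a hypothetical callable that
    returns the registration record, or None / an empty value if none exists.
    """
    targets = []
    for website in websites:
        info = query_registration(website, identifier)
        if info:  # None or empty means no registration information
            targets.append(website)
    return targets


# Stand-in data and lookup used only for this sketch.
FAKE_RECORDS = {
    ("CSDN", "13800000000"): {"user_name": "user_a", "status": "active"},
}


def fake_query(website, identifier):
    return FAKE_RECORDS.get((website, identifier))


targets = find_target_websites(["CSDN", "Weibo"], "13800000000", fake_query)
print(targets)  # ['CSDN']
```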
In another embodiment, determining according to the identification information whether the multiple websites hold registration information of the user includes:
using the identification information to register a new account with a designated website among the multiple websites;
if the designated website indicates that the user is already registered, the designated website holds registration information of the user;
if the designated website prompts for registration verification information, the designated website holds no registration information of the user.
For example, a request to register a new account may be sent to CSDN with User A's phone number; if CSDN prompts for registration verification information (such as a verification code that CSDN sends to User A's phone number), CSDN holds no registration information of User A.
In another embodiment, determining according to the identification information whether the multiple websites hold registration information of the user includes:
searching for the identification information on a designated website among the multiple websites;
if the search results of the designated website include the identification information, the designated website holds registration information of the user;
if the search results of the designated website do not include the identification information, the designated website holds no registration information of the user.
The generation module 203 is configured to generate a registration feature vector of the user according to the results of determining whether the multiple websites hold registration information of the user.
For example, the generated registration feature vector of User A is (1, 1, 0, 1, 0), where, from left to right, the 1 in the first dimension indicates that NetEase Cloud Music holds registration information of User A; the 1 in the second dimension indicates that Baidu Tieba holds registration information of User A; the 0 in the third dimension indicates that CSDN holds no registration information of User A; the 1 in the fourth dimension indicates that Weibo holds registration information of User A; and the 0 in the fifth dimension indicates that Xiaohongshu holds no registration information of User A.
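Building the registration feature vector can be sketched as one binary entry per website, in a fixed order. The function name `registration_vector` is hypothetical; the website order and the result follow the User A example.

```python
def registration_vector(websites, target_websites):
    """Return 1 for each website that holds the user's registration, else 0."""
    registered = set(target_websites)
    return [1 if website in registered else 0 for website in websites]


WEBSITE_ORDER = ["NetEase Cloud Music", "Baidu Tieba", "CSDN", "Weibo", "Xiaohongshu"]
user_a_targets = ["NetEase Cloud Music", "Baidu Tieba", "Weibo"]

vector = registration_vector(WEBSITE_ORDER, user_a_targets)
print(vector)  # [1, 1, 0, 1, 0]
```

The order of `WEBSITE_ORDER` must be the same for every user, since clustering compares vectors dimension by dimension.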
The first determination module 204 is configured to determine, by a clustering method, a first probability value for each interest tag of the user according to the user's registration feature vector.
In a specific embodiment, determining by a clustering method the first probability value of each interest tag of the user according to the user's registration feature vector includes:
(1) Acquire multiple first historical users.
(2) Cluster the multiple first historical users according to their registration feature vectors to obtain multiple user clusters and the center vector of each user cluster.
(3) Determine the target user cluster to which the user belongs according to the distance between the user's registration feature vector and the center vector of each user cluster. For example, clustering yields two user clusters (a first user cluster and a second user cluster); the Euclidean distance between the user's registration feature vector and the center vector of the first user cluster is num1, and the Euclidean distance between the user's registration feature vector and the center vector of the second user cluster is num2; if num1 is greater than num2, the second user cluster is determined as the target user cluster.
(4) Determine the mean of the probability values of the designated interest tag over the target users in the target user cluster as the first probability value of the user's designated interest tag; or determine the ratio of the number of target users in the target user cluster whose probability value for the designated interest tag is greater than a second preset threshold to the total number of target users in the target user cluster as the first probability value of the user's designated interest tag. For example, the target user cluster includes 3 users and the designated interest tag is travel; the probability values of the travel interest tag for the 3 users are 0.5, 0.6 and 0.4, so the first probability value of the user's travel interest tag is 0.5. The second preset threshold is a preset value tuned according to experimental data.
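Steps (3) and (4) can be sketched as follows, assuming the clustering of step (2) has already produced the center vectors. The center values, the 0.45 threshold and all function names are hypothetical; the member probabilities 0.5, 0.6 and 0.4 come from the example.

```python
import math


def nearest_cluster(user_vector, centers):
    """Return the name of the cluster whose center is closest (Euclidean distance)."""
    return min(centers, key=lambda name: math.dist(user_vector, centers[name]))


def first_probability_mean(member_probs):
    """Variant 1: mean of the members' probability values for the tag."""
    return sum(member_probs) / len(member_probs)


def first_probability_ratio(member_probs, second_threshold):
    """Variant 2: share of members whose value exceeds the second preset threshold."""
    above = sum(1 for value in member_probs if value > second_threshold)
    return above / len(member_probs)


# Hypothetical center vectors produced by an earlier clustering step.
centers = {
    "first user cluster": [0.1, 0.2, 0.9, 0.1, 0.8],
    "second user cluster": [0.9, 0.8, 0.1, 0.9, 0.2],
}
target_cluster = nearest_cluster([1, 1, 0, 1, 0], centers)

travel_probs = [0.5, 0.6, 0.4]  # members of the target user cluster
mean_value = first_probability_mean(travel_probs)
ratio_value = first_probability_ratio(travel_probs, second_threshold=0.45)
print(target_cluster, round(mean_value, 2))  # second user cluster 0.5
```

With the assumed threshold of 0.45, two of the three members qualify, so the ratio variant gives 2/3 rather than 0.5; the two variants are alternatives and need not agree.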
The crawling module 205 is configured to crawl multiple target named entities of the user from each target website.
Multiple web page texts of the user may be crawled from each target website. The web page texts of a target website contain the user's social information and behavior information on that website, and multiple target named entities for that website are extracted from them. For example, the target website is NetEase Cloud Music and the web page texts include playlists that User A follows or shares; named entity extraction on these texts yields target named entities such as "folk song" and "campus" (users from whom "folk song" and "campus" are extracted have been found to generally like "travel"). As another example, the target website is Xiaohongshu and the web page texts include cross-border shopping experiences that User B follows or shares; named entity extraction yields target named entities such as "milk powder" and "stroller" (users from whom "milk powder" and "stroller" are extracted have been found to generally lean toward "mother and baby"). As a further example, the target website is NetEase Cloud Classroom and the web page texts include video introductions that User B follows or shares; named entity extraction yields target named entities such as JAVA and SPRING (users from whom JAVA and SPRING are extracted have been found to generally lean toward "programming education").
The first calculation module 206 is configured to use a trained neural network to calculate a second probability value for each interest tag according to the multiple target named entities and the target website to which each target named entity belongs.
In a specific embodiment, using the trained neural network to calculate the second probability value of each interest tag according to the multiple target named entities and the target website to which each target named entity belongs includes:
encoding each target named entity, together with the target website to which it belongs, as a feature vector of that target named entity;
inputting the feature vector of each target named entity into the trained neural network to obtain, for that target named entity, a probability value for each interest tag;
calculating, for each interest tag, the mean of the probability values obtained for the multiple target named entities, to obtain the second probability value of that interest tag. For example, the two named entities are JAVA and SPRING. The probability value of JAVA for "programming education" (interest tag) is 0.9 and the probability value of SPRING for "programming education" (interest tag) is 0.7, so the second probability value of "programming education" (interest tag) is (0.9 + 0.7) / 2 = 0.8.
Encoding each target named entity and the target website to which it belongs as the feature vector of that target named entity includes:
encoding the target named entity as a first intermediate vector using a preset encoder (such as a one-hot encoder or a word2vec encoder);
encoding the target website to which the target named entity belongs as a second intermediate vector using the preset encoder;
concatenating the first intermediate vector and the second intermediate vector, or multiplying the first intermediate vector and the second intermediate vector element-wise, to obtain the feature vector of the target named entity.
Training the neural network may include:
obtaining a training sample and a label of the training sample, where the training sample includes a target named entity and the code of the target website to which that target named entity belongs;
encoding the target named entity as a first vector according to a preset encoding table, and encoding the target website to which the target named entity belongs as a second vector according to the preset encoding table;
concatenating the first vector and the second vector to obtain the feature vector of the target named entity;
inputting the feature vector of the target named entity into a neural network with initialized parameter values to obtain an output vector;
optimizing the parameter values of the neural network through the back-propagation algorithm according to the output vector and the label of the training sample.
The second calculation module 207 is configured to calculate a third probability value for each interest tag based on a statistical method.
Calculating the third probability value of each interest tag based on a statistical method includes:
acquiring multiple second historical users who have registration information on the multiple target websites, where the user interest portrait of each second historical user includes multiple tags of that second historical user;
counting a first number, namely the number of second historical users whose user interest portrait contains the interest tag;
counting a second number, namely the total number of the multiple second historical users;
calculating the ratio of the first number to the second number and taking that ratio as the third probability value.
For example, four second historical users (User A, User B, User C and User D) who have registration information on NetEase Cloud Music and Ctrip are acquired; the first number, that is, the number of second historical users whose user interest portrait contains "travel" (interest tag), is 3; the second number of second historical users is 4; the third probability value of "travel" (interest tag) is therefore 3 / 4 = 0.75.
The second determination module 208 is configured to determine the maximum of the first probability value, the second probability value and the third probability value of each interest tag as the target probability value of that interest tag.
For example, the first probability value of "travel" (interest tag) is 0.65, the second probability value of "travel" (interest tag) is 0.70, and the third probability value of "travel" (interest tag) is 0.75, so 0.75 is determined as the target probability value of "travel" (interest tag).
The third determination module 209 is configured to determine each interest tag whose target probability value is greater than a first preset threshold as an interest tag of the user.
For example, the target probability value of "travel" (interest tag) is 0.75, the target probability value of "programming education" (interest tag) is 0.85, and the first preset threshold is 0.80, so "programming education" is determined as an interest tag of the user.
The user interest portrait apparatus 20 of Embodiment 2 determines the user's interest tags from the websites associated with the user's interests and from the user's target named entities on the target websites, which can improve the accuracy of identifying the user's interest tags. The target probability value of an interest tag is determined from the first probability value obtained by the clustering method, the second probability value obtained by the neural network and the third probability value obtained by the statistical method, which can reduce the risk of bias. Embodiment 2 extracts the user's interest tags according to the user's registration information on various websites, which improves the accuracy of extracting the user's interest tags; describing the user interest portrait with the extracted interest tags improves the accuracy of the portrait.
In another embodiment, the acquisition module is further configured to obtain the user's authorization before it is determined, according to the identification information, whether the multiple websites hold registration information of the user.
Before determining according to the identification information whether the multiple websites hold registration information of the user, an authorization option box may be presented to the user, and the authorization option checked by the user in the box is received.
Embodiment 3
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above user interest portrait method embodiment, for example steps 101-109 shown in FIG. 1:
101. Acquire multiple websites, multiple interest tags and identification information of a user;
102. Determine, according to the identification information, whether the multiple websites hold registration information of the user, to obtain multiple target websites that hold registration information of the user;
103. Generate a registration feature vector of the user according to the results of determining whether the multiple websites hold registration information of the user;
104. Determine, by a clustering method, a first probability value for each interest tag of the user according to the user's registration feature vector;
105. Crawl multiple target named entities of the user from each target website;
106. Use a trained neural network to calculate a second probability value for each interest tag according to the multiple target named entities and the target website to which each target named entity belongs;
107. Calculate a third probability value for each interest tag based on a statistical method;
108. Determine the maximum of the first probability value, the second probability value and the third probability value of each interest tag as the target probability value of that interest tag;
109. Determine each interest tag whose target probability value is greater than a first preset threshold as an interest tag of the user.
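Alternatively, when executed by a processor, the computer program implements the functions of the modules in the above apparatus embodiment, for example modules 201-209 in FIG. 2:
the acquisition module 201, configured to acquire multiple websites, multiple interest tags and identification information of a user;
the judgment module 202, configured to determine, according to the identification information, whether the multiple websites hold registration information of the user, to obtain multiple target websites that hold registration information of the user;
the generation module 203, configured to generate a registration feature vector of the user according to the results of determining whether the multiple websites hold registration information of the user;
the first determination module 204, configured to determine, by a clustering method, a first probability value for each interest tag of the user according to the user's registration feature vector;
the crawling module 205, configured to crawl multiple target named entities of the user from each target website;
the first calculation module 206, configured to use a trained neural network to calculate a second probability value for each interest tag according to the multiple target named entities and the target website to which each target named entity belongs;
the second calculation module 207, configured to calculate a third probability value for each interest tag based on a statistical method;
the second determination module 208, configured to determine the maximum of the first probability value, the second probability value and the third probability value of each interest tag as the target probability value of that interest tag;
the third determination module 209, configured to determine each interest tag whose target probability value is greater than a first preset threshold as an interest tag of the user.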
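Embodiment 4
FIG. 3 is a schematic diagram of the computer device provided by Embodiment 4 of the present invention. The computer device 30 includes a memory 301, a processor 302, and a computer program 303, such as a user interest portrait program, that is stored in the memory 301 and can run on the processor 302. When executing the computer program 303, the processor 302 implements the steps of the above user interest portrait method embodiment, for example steps 101-109 shown in FIG. 1: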
101. Acquire multiple websites, multiple interest tags and identification information of a user;
102. Determine, according to the identification information, whether the multiple websites hold registration information of the user, to obtain multiple target websites that hold registration information of the user;
103. Generate a registration feature vector of the user according to the results of determining whether the multiple websites hold registration information of the user;
104. Determine, by a clustering method, a first probability value for each interest tag of the user according to the user's registration feature vector;
105. Crawl multiple target named entities of the user from each target website;
106. Use a trained neural network to calculate a second probability value for each interest tag according to the multiple target named entities and the target website to which each target named entity belongs;
107. Calculate a third probability value for each interest tag based on a statistical method;
108. Determine the maximum of the first probability value, the second probability value and the third probability value of each interest tag as the target probability value of that interest tag;
109. Determine each interest tag whose target probability value is greater than a first preset threshold as an interest tag of the user.
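Alternatively, when executed by the processor, the computer program implements the functions of the modules in the above apparatus embodiment, for example modules 201-209 in FIG. 2:
an acquisition module 201, configured to acquire multiple websites, multiple interest tags and identification information of a user;
a judgment module 202, configured to judge, according to the identification information, whether registration information of the user exists on the multiple websites, and to obtain multiple target websites on which registration information of the user exists;
a generation module 203, configured to generate a registration feature vector of the user according to the result of judging whether registration information of the user exists on the multiple websites;
a first determination module 204, configured to determine, by a clustering method, a first probability value for each interest tag of the user according to the user's registration feature vector;
a crawling module 205, configured to crawl multiple target named entities of the user from each target website;
a first calculation module 206, configured to calculate, with a trained neural network, a second probability value for each interest tag according to the multiple target named entities and the target website to which each target named entity belongs;
a second calculation module 207, configured to calculate a third probability value for each interest tag based on a statistical method;
a second determination module 208, configured to determine the maximum of the first, second and third probability values of each interest tag as the target probability value of that interest tag;
a third determination module 209, configured to determine each interest tag whose target probability value is greater than a first preset threshold as an interest tag of the user.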
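As a rough illustration of what the judgment module 202 and the generation module 203 produce, the following Python sketch builds a registration feature vector from per-website judgment results. The 0/1 encoding, the function name `build_registration_vector`, the website names and the `registered` lookup table are assumptions; the text above only states that the vector is generated from the judgment results, and how registration is actually checked from the identification information is outside this sketch.

```python
from typing import Dict, List, Tuple


def build_registration_vector(
    websites: List[str], registered: Dict[str, bool]
) -> Tuple[List[str], List[int]]:
    """Return (target_websites, registration_feature_vector).

    `registered` is a hypothetical stand-in for the judgment result of
    module 202: it maps a website to whether the user's registration
    information exists there. Websites missing from it count as unregistered.
    """
    # One component per website, in the order the websites were acquired.
    vector = [1 if registered.get(site, False) else 0 for site in websites]
    # Target websites are those on which registration information exists.
    targets = [site for site, bit in zip(websites, vector) if bit == 1]
    return targets, vector


# Hypothetical website names, for illustration only.
websites = ["site_a", "site_b", "site_c", "site_d"]
registered = {"site_a": True, "site_b": False, "site_d": True}

target_websites, reg_vector = build_registration_vector(websites, registered)
print(target_websites)  # ['site_a', 'site_d']
print(reg_vector)       # [1, 0, 0, 1]
```

Under this assumed encoding the vector has one component per acquired website, which gives the clustering in module 204 a fixed-length input for every user.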
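For example, the computer program 303 may be divided into one or more modules, which are stored in the memory 301 and executed by the processor 302 to carry out the method. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program 303 in the computer device 30. For example, the computer program 303 may be divided into the acquisition module 201, judgment module 202, generation module 203, first determination module 204, crawling module 205, first calculation module 206, second calculation module 207, second determination module 208 and third determination module 209 in FIG. 2; for the specific function of each module, see Embodiment 2.
Those skilled in the art will understand that FIG. 3 is only an example of the computer device 30 and does not limit it. The computer device 30 may include more or fewer components than shown, combine certain components, or use different components; for example, it may further include input/output devices, network access devices, buses and the like.
The processor 302 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor 302 may be any conventional processor. The processor 302 is the control center of the computer device 30 and connects the parts of the entire computer device 30 through various interfaces and lines.
The memory 301 may be used to store the computer program 303. The processor 302 implements the various functions of the computer device 30 by running or executing the computer programs or modules stored in the memory 301 and by calling the data stored in the memory 301. The memory 301 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the computer device 30. In addition, the memory 301 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the modules integrated in the computer device 30 are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes of the method embodiments above may also be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).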
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The integrated module implemented in the form of software functional modules may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the user interest portrait method described in the embodiments of the present invention.
It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments above, and that the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the invention. No reference sign in the claims shall be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. Several modules or apparatuses recited in a system claim may also be implemented by a single module or apparatus through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or replaced with equivalents without departing from the spirit and scope of those technical solutions.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010243221.7A CN111552865A (en) | 2020-03-31 | 2020-03-31 | User interest portrait method and related equipment |
PCT/CN2020/105900 WO2021196474A1 (en) | 2020-03-31 | 2020-07-30 | User interest profiling method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111552865A (en) | 2020-08-18 |
Family
ID=72003804
Also Published As
Publication number | Publication date |
---|---|
WO2021196474A1 (en) | 2021-10-07 |