
CN103577405A - Interest analysis based micro-blogger community classification method - Google Patents


Info

Publication number
CN103577405A
Authority
CN
China
Prior art keywords
label
blogger
microblogging
occurrence
frequency
Prior art date
Legal status
Pending
Application number
CN201210250181.4A
Other languages
Chinese (zh)
Inventor
施晓菁
梁循
曹润
周晨曦
Current Assignee
Renmin University of China
Original Assignee
Renmin University of China
Priority date
2012-07-19
Filing date
2012-07-19
Publication date
2014-02-12
Application filed by Renmin University of China
Priority to CN201210250181.4A
Publication of CN103577405A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention proposes a microblog blogger community classification method based on interest analysis, which classifies bloggers objectively and directly according to their microblog content. Existing research on microblog content has been limited to extracting keywords for various events and has not analysed bloggers intuitively on the basis of that content. In the present invention, suitable API access technology is used to crawl microblog information; the microblogs published by each blogger are analysed, a number of manual tags suited to each blogger are extracted, and the bloggers are classified according to the pooled tag sets of all bloggers. The invention provides a new method for classifying microblog bloggers and a new reference for personalized recommendation to microblog bloggers, and has considerable practical significance.

Description

Microblog blogger community classification method based on interest analysis
Technical field
The invention belongs to the field of community classification methods and specifically relates to a microblog blogger community classification method based on tag extraction.
Background art
With the rapid development of information technology and networks, people can choose from an ever-growing range of Web 2.0 services, and microblogging sites emerged to meet people's need to share information instantly and communicate with others. A microblog is an information sharing and propagation platform built on users following one another and reposting microblogs. Its bloggers can publish posts of up to 140 characters through computers, mobile phones, instant-messaging (IM) software, external API interfaces and other channels, largely unconstrained by time and place. Updating is fast and convenient, and the platform is highly immediate and interactive. Compared with traditional blogs, microblogs are characterized as "short, smart and fast", which better suits the fast pace of modern life, so microblogs, being quick and easy to use, rapidly became popular worldwide. At present, the number of microblog users in China exceeds 300 million.
In the Web 2.0 era, users are not only viewers of website content but also its creators. Users are the soul of the network, so analysing their characteristics has become a focus of network development. On a microblog platform, whose content is built from what bloggers publish, bloggers' behaviour directly affects how the platform develops, which makes analysing bloggers' interests particularly important.
Apart from some "lurkers" who seldom post or repost, most bloggers use microblogs mainly to publish their status, moods, events and similar information at any time and place. A blogger's posts reveal his or her habitual expressions, points of interest and personality traits. Analysing the content a blogger publishes therefore makes it possible, to a large extent, to analyse the blogger's current interests and assign the blogger personalized labels, laying the groundwork for future personalized services (for example, recommending similar microblogs or bloggers with similar interests).
Existing microblog research already offers many theories on classifying bloggers. A prominent one divides bloggers into three categories: mass media, celebrities and grassroots users; bloggers can also be classified by professional field. Such classifications, however, fall far short for studying individual bloggers. They are only coarse divisions, bloggers within each category differ greatly, and the majority of ordinary bloggers who make up the grassroots category cannot be divided well. On domestic microblog platforms, bloggers can join groups or "micro-groups" to communicate with like-minded bloggers, but many bloggers' intrinsic differences are still ignored. Bloggers can only classify themselves, and may well overlook many of their own characteristics; what is lacking is a more objective and complete partitioning mechanism.
The present invention proposes a microblog blogger community classification method based on interest analysis, which classifies bloggers objectively and directly according to their microblog content. In the invention, suitable API access technology is used to crawl microblog information; the microblogs published by each blogger are analysed, a number of manual tags suited to the blogger are extracted (to avoid repetition, "manual tag" is hereafter called "label"), and bloggers are classified according to their labels. The invention provides a new method for classifying microblog bloggers and a reference for future recommendation to microblog bloggers. For example, if most of a blogger's labels belong to one class, other labels in that class can be recommended to him; and two bloggers whose labels mostly fall into the same class can be recommended to each other as friends.
Summary of the invention
The invention uses existing open microblog platforms: through their open APIs it crawls the content of microblog bloggers over a specified time period, including the microblogs they publish, their comments and their reposts. This content undergoes text analysis, suitable labels are extracted from it, and the bloggers are finally classified by label.
All domestic microblog platforms are now open, and each has successively published APIs for use with its platform. These APIs provide the basis and the means for crawling microblog data. The overall design framework of the crawling program, shown in Fig. 2, consists mainly of: the open microblog platform, OAuth authorization, API access, the API source program, the access-queue control program, the storage control program and a SQL Server database.
After the data are obtained, the microblog content is preprocessed, then segmented into words, and stop words are removed.
Labels are extracted from the microblog data after stop-word removal. There are two types of labels:
1. keywords extracted from the blogger's microblog content, such as hobbies and living habits;
2. everyday words extracted from the blogger's microblog content, such as pet phrases and habitual expressions.
Keyword extraction can use the existing TF-IDF method, which works as follows:
1. TF-IDF is a statistical method for evaluating how important a word is to one document within a document collection or corpus. In the present invention it is used to assess the importance of the words in a microblog; ranking words by importance yields the microblog's keywords.
2. TF (term frequency) expresses how strongly a word is associated with a document; here it is the frequency with which a word occurs in a given microblog: TF = N / Nt, where N is the number of times the word occurs in the microblog and Nt is the total number of words in that microblog.
3. IDF (inverse document frequency) expresses how much weight a word carries in representing the topic of a document. It compares the number of microblogs containing the word with the total number of microblogs: the more microblogs the word appears in, the smaller its weight. IDF = -log(D / Dt), where D is the number of the blogger's microblogs that contain the word and Dt is the blogger's total number of microblogs.
4. Finally, each word's TF × IDF value is taken as its TF-IDF score, and words are sorted in descending order of score to give the blogger's keywords.
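The four TF-IDF steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the inventors' Java program: each microblog is assumed to arrive already segmented and stop-word filtered as a list of words, the document collection is one blogger's own microblogs (so Dt is that blogger's post count), and the per-post cutoff `top_k` is a hypothetical parameter, since the text does not say how many keywords are kept.

```python
import math
from collections import Counter


def tfidf_keywords(posts, top_k=3):
    """Rank the words of each of one blogger's microblogs by TF * IDF.

    posts: list of microblogs, each a list of words (already segmented,
    stop words removed). top_k is a hypothetical per-post cutoff.
    Returns, for each post, a list of (word, score) pairs, best first.
    """
    dt = len(posts)  # Dt: the blogger's total number of microblogs
    # D for each word: how many of the blogger's microblogs contain it
    doc_freq = Counter(word for post in posts for word in set(post))
    results = []
    for post in posts:
        nt = len(post)  # Nt: total number of words in this microblog
        if nt == 0:
            results.append([])
            continue
        counts = Counter(post)  # N: occurrences of each word in this microblog
        scores = {
            # TF = N / Nt; IDF = -log(D / Dt), written here as log(Dt / D)
            word: (n / nt) * math.log(dt / doc_freq[word])
            for word, n in counts.items()
        }
        # Descending by score; ties broken alphabetically for determinism
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_k])
    return results
```

Because IDF is computed only over one blogger's own posts, a word that appears in every one of those posts scores 0 in this sketch.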
Everyday words are extracted as follows:
1. Extract all notional words, auxiliary words and interjections from a preprocessed microblog. These include words that reflect the blogger's writing habits and personality traits.
2. Deduplicate these words and also treat them as the blogger's keywords.
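A minimal sketch of the everyday-word method, assuming the segmenter also outputs a part-of-speech tag for each word. The tag prefixes used here (`n` noun, `v` verb, `a` adjective as notional words, `u` auxiliary word, `e` interjection) follow an ICTCLAS/PKU-style tagset and are an assumption; the source does not list the tags it used. The function name `everyday_words` is hypothetical.

```python
# Assumed ICTCLAS/PKU-style tag prefixes: n noun, v verb, a adjective
# (notional words), u auxiliary word, e interjection.
EVERYDAY_TAG_PREFIXES = ("n", "v", "a", "u", "e")


def everyday_words(tagged_post, prefixes=EVERYDAY_TAG_PREFIXES):
    """Return the distinct words of one tagged microblog whose POS tag starts
    with one of `prefixes`, in first-seen order (step 1 extracts, step 2 dedupes)."""
    seen = set()
    words = []
    for word, tag in tagged_post:
        if tag.startswith(prefixes) and word not in seen:
            seen.add(word)
            words.append(word)
    return words
```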
The keywords above are combined with the everyday words to form each blogger's keywords. Count how many times each of these keywords is used in the microblogs the blogger published in the last 30 days (excluding reposts), sort them in descending order, and take the top L as the blogger's labels (if there are fewer than L, use them all; for convenience this number is still denoted L).
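The label-selection step can be sketched as below. The post representation `(post_date, is_repost, tokens)` and the function name `select_labels` are hypothetical; the 30-day window excludes reposts as the text requires, and ties in usage count are broken alphabetically, an arbitrary choice made here for determinism.

```python
from collections import Counter
from datetime import date, timedelta


def select_labels(keywords, posts, end, L=30, days=30):
    """Pick a blogger's top-L labels by keyword usage in the `days` days up to `end`.

    keywords: the blogger's keywords (TF-IDF keywords plus everyday words).
    posts: iterable of (post_date, is_repost, tokens) tuples (hypothetical format).
    """
    start = end - timedelta(days=days)
    usage = Counter()
    for post_date, is_repost, tokens in posts:
        # Only the blogger's own (non-reposted) posts inside the window count
        if is_repost or not (start < post_date <= end):
            continue
        usage.update(tok for tok in tokens if tok in keywords)
    # Descending by usage count; alphabetical tie-break is an arbitrary choice
    ranked = sorted(keywords, key=lambda w: (-usage[w], w))
    return ranked[:L]


if __name__ == "__main__":
    labels = select_labels(
        {"football", "coffee"},
        [(date(2012, 5, 30), False, ["football", "coffee", "football"])],
        end=date(2012, 5, 31),
    )
    print(labels)  # → ['football', 'coffee']
```

If a blogger has fewer than L keywords, the slice simply returns all of them, matching the text's convention of still calling that number L.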
Each blogger thus obtains L labels. Pooling the label sets of all U bloggers (i.e. all selected bloggers) yields the total label collection, and the following algorithm performs community classification of the U bloggers. Two concepts are defined first:
1. Co-occurrence: when two labels both appear in the same blogger's label set, the two labels are said to co-occur once.
2. Classification degree: the ratio of the number of nodes already assigned to a class to the total number of nodes.
The algorithm is a network analysis based on label co-occurrence:
1. Pool the L-label sets of all bloggers obtained above, deduplicate the total label collection, count each label's frequency and sort the labels in descending order of frequency. Here a label's frequency is the number of distinct bloggers who have that label. A frequency threshold of 3 is set, so labels with frequency 1 or 2 are removed. The rationale is that the more frequently a label occurs, the more labels co-occur with it, and vice versa; removing labels with frequency 1 or 2 therefore lets the remaining labels be analysed more clearly.
2. Process the data to obtain the number of times each pair of labels co-occurs; let C be the maximum co-occurrence count.
3. Place the labels in a network graph for analysis, with each label as a node. First connect (undirected) every pair of nodes whose co-occurrence value is C, then every pair whose co-occurrence value is C-1, and so on. Stop this step when the classification degree reaches 90%.
4. Remove isolated nodes and components consisting of only two connected nodes; each remaining connected graph is taken as a class.
5. Once the label classes are obtained, bloggers can be analysed: the classes to which a blogger's L labels belong indicate the blogger's main characteristics.
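Steps 3 and 4 above can be sketched as follows. The input format (a dict from label pairs to co-occurrence counts) and the name `classify_labels` are hypothetical. Two readings are assumptions made here: classification degree is measured as the fraction of nodes that touch at least one edge, and it is checked after each co-occurrence level has been fully connected rather than after every single edge.

```python
from collections import defaultdict


def classify_labels(cooc, nodes, target=0.9, min_size=3):
    """Threshold the co-occurrence network level by level, then return the
    connected components with at least `min_size` labels as classes.

    cooc: dict mapping a (label_a, label_b) pair to its co-occurrence count.
    nodes: all labels that survived the frequency filter.
    """
    nodes = list(nodes)
    if not nodes:
        return []
    adj = defaultdict(set)
    # Co-occurrence levels from the maximum C down to 1
    for level in sorted({n for n in cooc.values() if n > 0}, reverse=True):
        for (a, b), n in cooc.items():
            if n == level:
                adj[a].add(b)  # undirected edge
                adj[b].add(a)
        # Assumed reading of classification degree: share of nodes on an edge
        degree = sum(1 for v in nodes if adj.get(v)) / len(nodes)
        if degree >= target:
            break
    # Connected components by iterative depth-first search
    seen, classes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for w in adj.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        # Drop isolated nodes and two-node components (step 4)
        if len(component) >= min_size:
            classes.append(sorted(component))
    return classes
```

Connecting a whole level at a time makes the result independent of the order in which pairs with equal co-occurrence values are visited.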
After the label classes are obtained, other labels in a class can be recommended to a blogger, or bloggers whose labels resemble the blogger's can be recommended as friends. The specific rules are as follows:
Rule 1: for a given blogger who has been assigned L labels, first recommend the labels that are not in the blogger's label set and have the largest value in the frequent set (i.e. the co-occurrence value), then continue recommending in decreasing order of that value;
Rule 2: for a given blogger who has been assigned L labels, determine the class to which most of the blogger's labels belong, and recommend the other labels in that class to the blogger;
Rule 3: for a given blogger who has been assigned L labels, determine the class to which most of the blogger's labels belong, and recommend as friends other bloggers most of whose labels also belong to that class;
Rule 4: special case 1: if the blogger's labels are evenly distributed across the classes, randomly select from each class labels not in the blogger's label set and recommend them to the blogger;
Rule 5: special case 2: if all labels have the same frequency, then for a given blogger simply select labels at random from all labels not in the blogger's label set and recommend them.
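Rules 2 and 3 can be sketched as follows, taking the classes produced by the network analysis as a list of label sets. The function names are hypothetical, and when several classes tie for the most labels the sketch makes no recommendation, leaving that situation to Rules 4 and 5, which are not implemented here.

```python
from collections import Counter


def dominant_class(labels, classes):
    """Index of the class containing most of `labels`; None if no class
    contains any of them or several classes tie (left to Rules 4 and 5)."""
    hits = Counter(i for i, cls in enumerate(classes) for t in labels if t in cls)
    if not hits:
        return None
    (best, count), *rest = hits.most_common()
    if rest and rest[0][1] == count:
        return None
    return best


def recommend_labels(labels, classes):
    """Rule 2: the other labels of the blogger's dominant class."""
    i = dominant_class(labels, classes)
    return [] if i is None else sorted(set(classes[i]) - set(labels))


def recommend_friends(user, user_labels, classes):
    """Rule 3: other bloggers whose labels mostly fall in the same class."""
    i = dominant_class(user_labels[user], classes)
    if i is None:
        return []
    return sorted(
        other
        for other, labels in user_labels.items()
        if other != user and dominant_class(labels, classes) == i
    )
```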
To implement the method, the following steps are carried out:
Step 1: Obtain basic information on a set of microblog bloggers to form the basic blogger list.
Step 2: Crawl all microblogs published by each blogger in the list during the six months up to the present. If a blogger published more than 100 microblogs in that period, the blogger is considered to post frequently, i.e. a heavyweight blogger, and processing continues with the following steps; otherwise the blogger is discarded and the next blogger is processed.
Step 3: Because reposted content does not reflect the current blogger's characteristics, remove reposts from each blogger's microblogs and preprocess the remaining microblogs.
Step 4: Segment the preprocessed microblog content into words and remove stop words.
Step 5: Extract the keywords of each microblog from the stop-word-filtered data.
Step 5.1: For microblogs longer than 30 characters, extract keywords with the TF-IDF method described above.
Step 5.2: For microblogs of 30 characters or fewer, extract keywords with the everyday-word extraction method described above.
Step 6: For each blogger's keywords obtained above, count how many times each keyword is used in the microblogs the blogger published in the last 30 days (excluding reposts), sort in descending order, and take the top L as the blogger's labels (if there are fewer than L, use them all; for convenience this number is still denoted L).
Step 7: Pool the label sets of the U bloggers and perform community classification on the total label collection with the algorithm described above. In a manner similar to frequent itemsets in association rules, labels that frequently occur together are assigned to the same class, revealing the associations among labels and bloggers.
Step 8: Obtain the classification results and display them graphically, then analyse the bloggers according to Rules 1 to 5 above.
Description of the drawings
Fig. 1 is a schematic diagram of Tencent Weibo.
Fig. 2 is the system flowchart for crawling microblog information.
Fig. 3 is the system flowchart of the present invention.
Fig. 4.1 and Fig. 4.2 are examples of the database table for the microblog blogger list in the present invention; because the rows are long, the table is shown in two parts, database table (1) and database table (2).
Fig. 5.1 and Fig. 5.2 are word-segmentation examples: Fig. 5.1 is an original passage before segmentation and Fig. 5.2 is the segmentation result.
Fig. 6.1 to Fig. 6.4 are examples of TF-IDF results: Fig. 6.1 shows the TF values of some of a blogger's words, Fig. 6.2 some IDF values, Fig. 6.3 some TF-IDF scores, and Fig. 6.4 some keywords of user "liubulang".
Fig. 7.1 to Fig. 7.6 are examples of application results, showing class 1 to class 6 respectively.
Embodiment
The method of the invention is further described below with reference to the drawings and an embodiment.
The method is illustrated using Tencent Weibo as an example:
Step 1: Obtain basic information on a set of microblog bloggers to form the basic blogger list.
Step 2: Microblog content acquisition, implemented in Java using Eclipse. The open platform's API is called and microblog data are obtained through the platform's OAuth authorization. All microblogs published by the bloggers in the list during the most recent six months (December 2011 to May 2012) are crawled, and only heavyweight bloggers are added to the tables. Two tables are generated in the database: a blogger information table and a microblog information table. Because the crawled data volume is very large, the data of 52 bloggers (i.e. U = 52) are randomly selected from the database tables for the demonstration that follows. The table formats are as follows:
Table 1: userinfo (blogger information table) [table image not reproduced]
Table 2: status (microblog information table) [table image not reproduced]
Step 3: Remove reposted content from the selected bloggers' microblogs, i.e. the rows whose zhuanfa field in the database table has the value "1". In addition, some parts of microblog content appear frequently but do not help the subsequent processing, so they are removed first in a preprocessing pass:
1) Remove symbols in the microblogs, such as emoticons, the @ sign together with the blogger nickname that follows it, and the topic sign "##".
2) Also remove punctuation, spaces, line breaks and similar content from the microblogs.
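The preprocessing above can be approximated with regular expressions. The exact surface forms are assumptions, not taken from the source: emoticons are taken to look like `[name]`, and a mention is `@nickname` ending at whitespace, a colon or a comma. The `#` topic markers are stripped together with the other punctuation while the topic text itself is kept, which is one reading of removing the topic sign "##". The names `preprocess`, `EMOTICON`, `MENTION` and `PUNCT_SPACE` are hypothetical.

```python
import re

# Assumed surface forms; the source does not specify them.
EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")  # e.g. "[哈哈]"
MENTION = re.compile(r"@[^\s:：,，]+")      # "@nickname" up to space/colon/comma
# Non-word characters: punctuation (including "#"), spaces and line breaks
PUNCT_SPACE = re.compile(r"[\W_]+")


def preprocess(text):
    """Step 3 preprocessing: drop emoticons and @-mentions, then strip
    punctuation, whitespace and line breaks. Topic text between "#" is kept."""
    text = EMOTICON.sub(" ", text)
    text = MENTION.sub(" ", text)
    return PUNCT_SPACE.sub("", text)
```

Python's `\W` is Unicode-aware for `str` patterns, so Chinese characters count as word characters and survive, while full-width punctuation such as "：" and "！" is removed.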
Step 4: Segment each blogger's microblogs into words and remove stop words, as follows.
1) Because microblog language is distinctive, some words carry microblog-specific features and some bloggers' words are missing from the segmentation dictionary, so a blogger dictionary is first added to complete it.
2) Several word-segmentation approaches exist; here the ICTCLAS segmentation system of the Chinese Academy of Sciences is used, and microblogs are segmented according to the imported blogger dictionary. Examples of segmentation results are shown in Fig. 5.1 and Fig. 5.2.
3) Stop words are removed from the segmented microblogs according to a given stop-word list, which removes pronouns, articles, conjunctions and the like. These words occur very frequently but do not help label extraction, and removing them does not affect the topic of a microblog. In addition, because some of the blogger's habitual expressions are selected as labels, care must be taken not to add these words to the stop-word list. Examples of stop words include "he" and "with".
4) The segmented, stop-word-filtered content is stored separately for each blogger, with each microblog placed in its own .txt document.
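ICTCLAS itself is not reproduced here. As a plainly swapped-in stand-in, the sketch below uses forward maximum matching over a dictionary to show why adding a blogger dictionary matters, and then filters stop words. The names `fmm_segment` and `segment_and_filter` and the `max_len` default are hypothetical.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    word (up to max_len characters), falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment_and_filter(text, base_dict, blogger_dict, stop_words):
    """Segment with the base dictionary extended by the blogger dictionary
    (item 1), then drop stop words (item 3)."""
    dictionary = set(base_dict) | set(blogger_dict)
    return [t for t in fmm_segment(text, dictionary) if t not in stop_words]
```

With the blogger dictionary, the internet slang word 给力 stays one token; without it, forward maximum matching falls back to the single characters 给 and 力.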
Step 5: Keywords are extracted from the stop-word-filtered microblog data by a Java program written according to the methods described above.
Step 5.1: For microblogs longer than 30 characters, keywords are extracted with the TF-IDF method described above. Fig. 6.1 to Fig. 6.4 are examples of TF-IDF results for the blogger whose name field is "liubulang": Fig. 6.1 shows the TF values of some of this blogger's words, Fig. 6.2 some IDF values, Fig. 6.3 the TF-IDF scores, and Fig. 6.4 the resulting keywords.
Step 5.2: For microblogs of 30 characters or fewer, keywords are extracted with the everyday-word extraction method described above.
Step 6: After each blogger's keywords are obtained, count how many times each keyword is used in the microblogs the blogger published in the last 30 days (May 1 to May 31), excluding reposts, sort in descending order, and take the top 30 (i.e. L = 30) as the blogger's labels. The 52 bloggers thus yield 52 × 30 = 1560 labels in total, which form the total label collection.
Step 7: The following classification steps are applied to the total label collection, using the method described above.
Step 7.1: All labels are stored in an array of (label, user) pairs, i.e. array[] = (tag, user). Label frequencies are counted, sorted in descending order and stored in the array frequency[] = (tag, frequency). With the frequency threshold set to 3, labels with frequency 1 or 2 are removed, leaving 1347 labels with frequency ≥ 3.
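Step 7.1 can be sketched as below. Following the definition given earlier, a label's frequency is the number of distinct bloggers who have it, so duplicate (tag, user) entries are counted once. The function name `frequent_labels` is hypothetical, as is the alphabetical tie-break.

```python
from collections import defaultdict


def frequent_labels(pairs, min_freq=3):
    """pairs: (tag, user) entries, i.e. array[] = (tag, user).
    Returns frequency[] = (tag, frequency) for labels held by at least
    `min_freq` distinct bloggers, sorted by descending frequency."""
    users_by_tag = defaultdict(set)
    for tag, user in pairs:
        users_by_tag[tag].add(user)  # frequency counts distinct bloggers
    frequency = [(tag, len(users)) for tag, users in users_by_tag.items()
                 if len(users) >= min_freq]
    frequency.sort(key=lambda tf: (-tf[1], tf[0]))  # ties: alphabetical
    return frequency
```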
Step 7.2: The data are processed by looping over the above array and comparing entries: whenever two labels have the same user value, their co-occurrence count is incremented by 1. This yields the co-occurrence count of every pair of labels, from which a 1347 × 1347 label-label co-occurrence matrix is built, with labels on both dimensions and the co-occurrence counts between pairs of labels as entries; that is, if labels i and j co-occur n times, k_ij = n. The maximum co-occurrence count is 43.
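The co-occurrence matrix of Step 7.2 can be built directly from the bloggers' label sets, as in this sketch (the name `cooccurrence_matrix` is hypothetical). Writing B for the label-by-blogger 0/1 incidence matrix, the same counts k_ij are the off-diagonal entries of B·Bᵀ; the loop below counts label pairs per blogger explicitly instead of multiplying matrices.

```python
def cooccurrence_matrix(labels, tag_sets):
    """Build k with k[i][j] = number of bloggers whose label set contains
    both labels[i] and labels[j] (diagonal left at 0); also return C = max k."""
    index = {tag: i for i, tag in enumerate(labels)}
    n = len(labels)
    k = [[0] * n for _ in range(n)]
    for tags in tag_sets:
        # Keep only labels that survived the frequency filter
        present = sorted(index[t] for t in set(tags) if t in index)
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                i, j = present[a], present[b]
                k[i][j] += 1  # symmetric: co-occurrence is undirected
                k[j][i] += 1
    c_max = max((max(row) for row in k), default=0)
    return k, c_max
```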
Step 7.3: The labels are placed in a network for analysis, with each label as a node. A two-dimensional array co_occur[i][j] stores the co-occurrence count of labels i and j, and a second two-dimensional array link[i][j], initialized to 0, records whether labels i and j are connected. A numeric variable num serves as a counter. Starting from the maximum co-occurrence value 43, the loop searches co-occurrence values in descending order; for each pair found, the link value of the two nodes is set to 1 and num is incremented by 1. Because connections are undirected, a connection between nodes i and j is recorded by setting both link[i][j] and link[j][i] to 1, so num counts each connection twice. The loop stops when the classification degree reaches 90%, i.e. when num/2 exceeds 1347 × 90% and 1212 labels have found their class, and classification is then complete.
Step 7.4: A graph drawing algorithm is used to draw the result as a network graph. Isolated nodes and components consisting of only two connected nodes are removed, and each remaining connected graph is taken as a class. In this example, 6 classes are obtained.
Step 8: Obtain the classification results and display them graphically; see Fig. 7.1 to Fig. 7.6. The labels in Fig. 7.1 relate mainly to entertainment, together with some labels related to media and news. For example, if the labels this method assigns to a blogger include the word "outdoors", we infer that the blogger is also interested in "spring outing", "media", "film and television", "travel" and so on, and related products are recommended to the blogger. The labels in Fig. 7.2 relate mainly to the internet, together with labels such as marketing and management, suggesting that the internet has brought great opportunities to new industries and that bloggers pay relatively close attention to this area. Fig. 7.3 contains mainly labels related to fashion, beauty and performing careers, which are more associated with women. The labels in Fig. 7.4 relate to online activities that bloggers engage in fairly often, including internet slang such as "praising" and "practical joke". The labels in Fig. 7.5 mainly concern sports, public welfare and finance; compared with Fig. 7.3, they have a more male character. Fig. 7.6 is a large graph containing many labels that cover all kinds of topics.
Once these classes are obtained, recommendations can be made from each blogger's 30 labels: whenever a blogger has labels in a class, related products for that class's vocabulary are recommended, and the more labels the blogger has in a class, the higher that class's products are prioritized. For example, most of the labels of blogger "yueguangtaotao", such as "fashion" and "beauty", belong to class 3, so under Rule 2 other labels in that class, such as "star" and "shopping", are recommended to her. If most of the labels of another blogger, "xingganxuexue", also belong to class 3, then under Rule 3 "xingganxuexue" is recommended to "yueguangtaotao" as a friend.

Claims (7)

1. For the crawled microblog data, only bloggers who published more than 100 microblogs in the last six months are selected and added to the table as heavyweight users.
2. Each of a blogger's microblogs, after preprocessing, word segmentation and stop-word removal, is checked for length. Keywords are extracted from microblogs longer than 30 characters with the TF-IDF algorithm; for microblogs of 30 characters or fewer, all notional words, auxiliary words and interjections in the microblog are taken directly as keywords, since these words reflect the blogger's writing habits and personality traits.
3. After each blogger's keywords are obtained, count how many times these keywords are used in the microblogs the blogger published in the last 30 days (excluding reposts), sort in descending order, and take the top L as the blogger's manual tags (hereinafter "labels").
4. All labels obtained are stored in an array of (label, blogger) pairs, i.e. array[] = (tag, user); label frequencies are counted, sorted in descending order and stored in the array frequency[] = (tag, frequency). Here a label's frequency is the number of distinct bloggers who have that label. With the frequency threshold set to 3, labels with frequency 1 or 2 are removed.
5. By looping over and comparing the entries of the above array, the co-occurrence count of two labels with the same user value is incremented by 1, yielding the co-occurrence count of every pair of labels, from which a label-label co-occurrence matrix is built with labels on both dimensions and the co-occurrence counts between pairs of labels as entries; that is, if labels i and j co-occur n times, k_ij = n. Here, co-occurrence means that when two labels both appear in the same blogger's label set, the two labels are said to co-occur once.
6. The labels are placed in a network for analysis, with each label as a node. A two-dimensional array co_occur[i][j] stores the co-occurrence count of labels i and j, and a two-dimensional array link[i][j], initialized to 0, records whether labels i and j are connected. A numeric variable num serves as a counter. Starting from the maximum co-occurrence value C, co-occurrence values are searched in descending order; for each pair found, the link value of the two nodes is set to 1 and num is incremented by 1. The loop stops when the classification degree reaches 90%, completing the classification.
7. A graph drawing algorithm is used to draw the result as a network graph. Isolated nodes and components consisting of only two connected nodes are removed, and each remaining connected graph is taken as a class.
CN201210250181.4A 2012-07-19 2012-07-19 Interest analysis based micro-blogger community classification method Pending CN103577405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210250181.4A CN103577405A (en) 2012-07-19 2012-07-19 Interest analysis based micro-blogger community classification method

Publications (1)

Publication Number Publication Date
CN103577405A true CN103577405A (en) 2014-02-12

Family

ID=50049213

Country Status (1)

Country Link
CN (1) CN103577405A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100094840A1 (en) * 2007-03-30 2010-04-15 Stuart Donnelly Method of searching text to find relevant content and presenting advertisements to users
CN107916256A (en) * 2017-12-07 2018-04-17 王艺璇 Phosphatidase

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
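Rada Mihalcea et al.: "TextRank: Bringing Order into Texts", UNT Scholarly Works *
Wei Wu et al.: "Automatic Generation of Personalized Annotation Tags for Twitter Users", Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL *
Wang Xinguang (王昕光): "Research on User Interest Modeling Methods Based on Keyword Dependency", China Master's Theses Full-text Database, Basic Sciences *
Shi Jihui (石季辉) et al.: "Research on Data Preprocessing for User Interest Mining in Online Communities", Information Studies: Theory & Application (情报理论与实践) *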

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376041B (en) * 2014-10-11 2018-05-18 北京中搜网络技术股份有限公司 Information extraction method based on microblog classification
CN104376041A (en) * 2014-10-11 2015-02-25 北京中搜网络技术股份有限公司 Information extraction method based on classification of microblogs
CN105989077A (en) * 2015-02-09 2016-10-05 北京字节跳动科技有限公司 Recommendation-based interest community user guide method
CN105989077B (en) * 2015-02-09 2019-05-07 北京字节跳动科技有限公司 Recommendation-based interest community user guide method
CN105447196A (en) * 2015-12-31 2016-03-30 深圳中泓在线股份有限公司 Key blogger tracking confirmation method and device
CN105447196B (en) * 2015-12-31 2019-03-05 深圳中泓在线股份有限公司 Key blogger tracking confirmation method and device
CN105740366A (en) * 2016-01-26 2016-07-06 哈尔滨工业大学深圳研究生院 Inference method and device of MicroBlog user interests
CN105809559A (en) * 2016-03-15 2016-07-27 微梦创科网络科技(中国)有限公司 Method and device for mining ability user in social network
CN106294564A (en) * 2016-07-27 2017-01-04 乐视控股(北京)有限公司 Video recommendation method and device
CN106453069A (en) * 2016-12-21 2017-02-22 北京奇虎科技有限公司 Instant communication message sending, viewing and viewing control method and device thereof
CN108305180A (en) * 2017-01-13 2018-07-20 中国移动通信有限公司研究院 Friend recommendation method and device
CN108305180B (en) * 2017-01-13 2020-08-25 中国移动通信有限公司研究院 Friend recommendation method and device
CN107864192A (en) * 2017-10-25 2018-03-30 广东欧珀移动通信有限公司 Information pushing method and device, server and readable storage medium
CN107864192B (en) * 2017-10-25 2019-12-31 Oppo广东移动通信有限公司 Information pushing method and device, server and readable storage medium
CN111538859A (en) * 2020-04-23 2020-08-14 北京达佳互联信息技术有限公司 Method and device for dynamically updating video label and electronic equipment
CN111538859B (en) * 2020-04-23 2023-10-10 北京达佳互联信息技术有限公司 Method and device for dynamically updating video tag and electronic equipment
CN111708952A (en) * 2020-06-18 2020-09-25 小红书科技有限公司 Label recommendation method and system
CN111708952B (en) * 2020-06-18 2023-10-20 小红书科技有限公司 Label recommending method and system
CN112800222A (en) * 2021-01-26 2021-05-14 天津科技大学 A multi-task-assisted extreme multi-label short text classification method using co-occurrence information
CN112800222B (en) * 2021-01-26 2022-07-19 天津科技大学 Multi-task-assisted extreme multi-label short text classification method using co-occurrence information

Similar Documents

Publication Publication Date Title
CN103577405A (en) Interest analysis based micro-blogger community classification method
CN106980692B (en) Influence calculation method based on microblog specific events
CN105512245B (en) Method for building an enterprise profile based on a regression model
CN102831234B (en) Personalized news recommendation device and method based on news content and theme feature
CN104111941B (en) Method and apparatus for displaying information
CN102929873B (en) Method and device for extracting searching value terms based on context search
CN103324665B (en) Hot spot information extraction method and device based on micro-blog
CN105723402A (en) Systems and methods for determining influencers in a social data network
CN106886567B (en) Microblog emergency detection method and device based on semantic extension
CN104035972B (en) Knowledge recommendation method and system based on microblogs
CN104866554B (en) A personalized search method and system based on social annotation
CN103226576A (en) Comment spam filtering method based on semantic similarity
CN109214454B (en) A Weibo-Oriented Emotional Community Classification Method
CN104915443B (en) Method for extracting evaluation objects from Chinese microblogs
CN106354818A (en) Dynamic user attribute extraction method based on social media
CN104572757B (en) Microblogging group processing method and processing device
US20150120708A1 (en) Information aggregation, classification and display method and system
CN106126605B (en) Short text classification method based on user portrait
CN106294358A (en) Information search method and system
JP5952711B2 (en) Prediction server, program and method for predicting future number of comments in prediction target content
Narducci et al. Exploiting big data for enhanced representations in content-based recommender systems
CN112989208A (en) Information recommendation method and device, electronic equipment and storage medium
Al-Saqqa et al. Stemming effects on sentiment analysis using large arabic multi-domain resources
JP6130270B2 (en) Comment list public server, program and method for sorting and specifying comment sets corresponding to media contents
JP2014085694A (en) Search device, search program, and search method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140212