US20150019565A1 - Method And System For Scoring Credibility Of Information Sources - Google Patents
- Publication number
- US20150019565A1 (U.S. application Ser. No. 13/939,755)
- Authority
- US
- United States
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- G06F17/30861—
Definitions
- The present invention relates generally to information extraction. More particularly, the present invention relates to classifying or ranking information sources and events during extraction.
- The internet is one of the primary sources of information in modern life. However, on the web, valuable, useful and accurate information coexists with misleading or inaccurate information. There also exist sources of information that are more trusted, sources that are less trusted, and other sources that cannot readily be identified as trusted or not trusted. General web-based searching can return information that is harmful or misleading.
- The use of non-credible sources of information as a basis for decisions can have a severe impact in fields like politics, health and finance, among many others. For instance, in the 2008 U.S. presidential campaign of Barack Obama, misleading information connecting the future president to a Muslim faith organization resulted in substantial confusion among voters.
- Various other instances of false or misleading reports emanating from the internet have been documented, and have had consequences affecting lives and decisions.
- Prior art approaches to this problem have attempted to reduce web spam by developing credibility based link analysis algorithms like the ones used in common search engines.
- Common examples include the PageRank algorithm developed and used by GoogleTM, the TrustRank algorithm developed by Stanford University and Yahoo!TM, and the HITS algorithm which was a precursor to the PageRank algorithm.
- Each of these prior art approaches relies on the assumption that the quality of a web page is correlated to the quality of its links, and returns, in response to a search query, a ranked list of web pages.
- Spammers have created several ways to take advantage of how search engines operate, such as "hijacking" trusted web pages and building "honeypots", or groups of legitimate-appearing web pages, to induce trusted pages to link to them. Recent studies (such as (i) D.
- The present invention does not attempt to determine if a source is spam, but rather attempts to assess the underlying credibility of sources and the probability that information from the underlying source, such as an event or a purported fact having occurred, is truthful or reliable.
- Events or facts may be derived from more than one source, and it is the events or facts themselves that are assessed for their reliability, rather than the web pages themselves.
- A computer-implemented method for ranking information stored on a computer readable medium includes extracting a content item describing an event from an information source; parsing by a parsing module the content item for a plurality of source links; attributing by a content analysis module a sentiment score to each source link, wherein the sentiment score is indicative of the relative credibility of each of the source links; scoring by a scoring module the information source based on the source links and on the sentiment score; and ranking the content item based on a score associated with the information source.
- The scoring comprises an iterative calculation in which T is a transition matrix and λ is a predetermined constant.
- The method further includes storing the score for the information source in a score database implemented on a computer readable medium.
- The method further includes, prior to the parsing step, determining whether the information source has an associated score in the score database and, upon determining that it has, retrieving the score and returning to the extracting step.
- The method further includes identifying an event from each source in a set of information sources; calculating an event score for the content item describing the event based on a credibility score for each of the known sources in the set of information sources; and combining the score for a respective information source with the credibility score to determine a cumulative event score.
- Each of the sources in the set of information sources is classified as one of a known good source, a known bad source and an unknown reliability source, and the calculating of an event score is biased towards sources identified as known good sources.
- The event score is calculated as:
- EventScore = A, if the event contains a KG; A/m, if the event contains b unknown sources and no KG nor KB; A/n, if the event has fewer than b unknown sources and no KG nor KB; A/p, if the event has no KG and includes a KB.
- A, m, n and p are parameters selected such that A/p < A/n < A/m < A; KG is a known good source; KB is a known bad source.
- The identified events are compared to identify contradictions, and the calculating of an event score includes biasing events from known good sources to resolve the contradictions.
- The combining comprises calculating an event ranking representative of the event being reliable.
- The event ranking is calculated as:
- EventRank = a·EventScore + b·Σ LinkScore + c·LinksToEvent
- LinksToEvent is calculated over the n sources, where LinkScore is the score of the information source and Sent is the sentiment score.
- The method further includes calculating an accumulated event rank for the information provider from a plurality of event ranks, where:
- EventRank(j) is one of a plurality of event scores for a plurality of content items and N_i is the total number of content items of the information provider in the known source database.
- A computer-implemented method for ranking information stored on a computer readable medium includes identifying an event from each source in a set of information sources; calculating an event score for a content item describing the event based on a credibility score for each of the known sources in the set of information sources; and combining a score for a respective information source with the credibility score to determine a cumulative event score.
- Each of the sources in the set of information sources is classified as one of a known good source, a known bad source and an unknown reliability source, and the calculating of an event score is biased towards sources identified as known good sources.
- The event score is calculated as:
- EventScore = A, if the event contains a KG; A/m, if the event contains b unknown sources and no KG nor KB; A/n, if the event has fewer than b unknown sources and no KG nor KB; A/p, if the event has no KG and includes a KB.
- A, m, n and p are parameters selected such that A/p < A/n < A/m < A; KG is a known good source; KB is a known bad source.
- The identified events are compared to identify contradictions, and the calculating of an event score includes biasing events from known good sources to resolve the contradictions.
- The combining comprises calculating an event ranking representative of the event being reliable.
- EventRank = a·EventScore + b·Σ LinkScore + c·LinksToEvent
- LinksToEvent is calculated over the n sources, where LinkScore is the score of the information source and Sent is the sentiment score.
- The method further includes calculating an accumulated event rank for the information provider from a plurality of event ranks, where:
- EventRank(j) is one of a plurality of event scores for a plurality of content items and N_i is the total number of content items of the information provider in the known source database.
- The score for a respective information source is determined by: extracting a content item describing an event from an information source; parsing by a parsing module the content item for a plurality of source links; attributing by a content analysis module a sentiment score to each source link, wherein the sentiment score is indicative of the relative credibility of each of the source links; scoring by a scoring module the information source based on the source links and on the sentiment score; and ranking the content item based on a score associated with the information source.
- The scoring comprises an iterative calculation in which T is a transition matrix and λ is a predetermined constant.
- The method further includes storing the score for the information source in a score database implemented on a computer readable medium.
- The method further includes, prior to the parsing step, determining whether the information source has an associated score in the score database and, upon determining that it has, retrieving the score and returning to the extracting step.
- The sentiment score is derived using a sentiment scorer that was created at least in part using:
- The ranking score is used for any one or more of insurance underwriting, assessing suspected fraudulent activity, credit decisioning, and securities trading.
- Non-transitory computer readable media include computer executable instructions for carrying out the methods as herein described.
- Computer systems for implementing the methods of the above-described embodiments are also disclosed.
- FIG. 1 shows a high-level architecture of a system for acquiring content items and applying scoring and ranking to the content.
- FIG. 2 shows a schematic of a computer system that may be used to implement various parts of the invention.
- FIG. 3 shows a flow diagram of the method of scoring links associated with the content item.
- FIG. 4 shows a flow diagram of the method of the event analyzer where the event classification and scoring is performed.
- FIG. 5 shows an example of event classification and scoring.
- The present invention assesses the underlying credibility of sources and the probability that information from underlying sources, such as an event or a purported fact having occurred, is truthful or reliable.
- Events or facts may be derived from more than one source, and it is the events or facts themselves that are assessed for their reliability, rather than the web pages themselves.
- The term "event" is used to describe a piece of information that is being subjected to the credibility assessment.
- An event as used herein may be any piece of information or purported fact, generally determined to be of significance to a request for information, such as a web search.
- The term event is used, in part, because by definition the invention in its preferred embodiment is used to assess the reliability of a reported event having occurred, or of facts identified as being relevant to a reported event.
- The invention does not, per se, relate to determining whether known facts are applicable to a user's query, for example, whether a particular scientific formula is relevant to solving a problem posed by a user's query.
- The preferred embodiments are described with respect to online news sources, but the sources of information for assessing the credibility of a reported event are not limited to these.
- Other sources may equally be used as inputs to the invention for the credibility analysis, including but not limited to RSS feeds, discussion forums, social media posts (such as Facebook™ or Twitter™), emails, electronic journals, databases and/or web pages from a multitude of other sources. The invention may also be applied to information available on local networks that is not generally available to the public. In this manner, where the invention is being used by an institution for diligence purposes, such as fraud, insurance or personnel research, sources of information belonging to or accessible only by the institution can also be included in the search universe to generate a higher degree of confidence in the results.
- FIG. 1 shows a network of computer systems 2 having an information provider 4 that provides information content via the Internet 6.
- Client devices such as a desktop computer 8, a tablet computer 10, or a mobile smartphone 12 request the information content using hypertext transfer protocol (HTTP) requests that are transmitted over a wired or wireless link via the Internet 6 to the server systems of the information provider 4.
- The information provider in turn supplies the requested article to the client device.
- The computing structure 14 can reside on the client device, on a proxy server(s), on other trusted computer system(s) on the Internet 6, or on a combination thereof.
- FIG. 2 shows a computer system 2, which includes a number of physical and logical components, including a central processing unit ("CPU") 24, random access memory ("RAM") 28, an input/output ("I/O") interface 32, a network interface 36, non-volatile storage 4, a display 40 and a local bus 44 enabling the CPU 24 to communicate with the other components.
- The CPU 24 executes an operating system and a number of software systems and/or software modules.
- RAM 28 provides relatively responsive volatile storage to the CPU 24.
- The I/O interface 32 allows human-computer input to be received from one or more devices, such as a keyboard, a mouse, a touch screen, etc., and outputs information to output devices, such as a display and/or speakers.
- The network interface 36 (e.g. Ethernet, WiFi, Bluetooth, etc.) permits communication with elements in network communication, and provides access to the internet.
- A number of these computer systems may be networked together, host information from other sources, etc.
- Non-volatile storage 4 stores the operating system and programs. During operation of the computer system, the operating system, the programs and the data may be retrieved from the non-volatile storage 4 and placed in RAM 28 to facilitate execution.
- These computer systems are known in the art, and their communications with the internet and other networks are also known. It is within this infrastructure that the preferred embodiments of the invention operate.
- The invention provides for two complementary approaches to generating a rank or score, although it is worth noting that each of the approaches could also be used independently to arrive at partial or intermediate results that are also useful.
- First, a method is described that assesses the reliability of the source of information, particularly the reliability of links.
- Second, a method is described that assesses the reliability of the content of the information regarding the event itself.
- A method of combining these approaches completes the preferred embodiment, providing a two-pronged approach to assessing the reliability of derived information.
- Referring to FIG. 3, there is illustrated a method for ranking or scoring links obtained from a source of information, such as a webpage. While prior art methods exist for assessing the quality of web pages by virtue of the links on those web pages, the present invention provides this functionality in a more robust manner, as will shortly be described.
- Prior art processes and algorithms use a random or biased web crawler to evaluate the rank of a page. After a certain number of iterations, the random crawler will locate the pages with a higher rank with a higher probability. This approach assumes that, at a given web page, the crawler selects the links located at the web page randomly, or selects pages related to given subjects in a biased manner.
- This assumption is to some extent contrived, since a real crawler will not act randomly when selecting links to follow, but will select links based mainly on the information accompanying the link.
- Some prior art web crawlers will mainly follow positive links and discard the negative ones. For example, if a link to a page said "this content is wrong" and another link said "here is the right answer", the positive link would receive a positive bias.
- One implementation of this includes assigning a probability factor (score) associated with the sentiment related to a link: a value in the range (0-1) that maps from negative sentiment to positive sentiment.
- The map could be done from a discrete sentiment score base, for example with three levels (positive, negative, no sentiment), or it can be done from a continuous sentiment score base assigning sentiment scores to terms.
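The discrete three-level mapping can be sketched as follows; the particular probability values (0.1, 0.5, 0.9) are illustrative assumptions, not values given in the text:

```python
# Sketch of the discrete sentiment-to-probability mapping described above.
# The specific values are illustrative assumptions; the text requires only
# that they map negative sentiment toward 0 and positive sentiment toward 1.

def sentiment_to_probability(label: str) -> float:
    """Map a discrete sentiment label to a (0-1) probability factor."""
    mapping = {
        "negative": 0.1,   # least likely to be followed
        "none": 0.5,       # no sentiment attached to the link
        "positive": 0.9,   # most likely to be followed
    }
    return mapping[label]

assert sentiment_to_probability("positive") > sentiment_to_probability("none")
assert sentiment_to_probability("none") > sentiment_to_probability("negative")
```

A continuous scorer would instead return any value in (0, 1) derived from sentiment scores assigned to terms around the link.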
- The preferred embodiment of the present invention includes a crawler that considers the sentiment (more specifically, the trust or credibility) relating to the link source to influence the crawling decisions.
- A probability function is determined that assigns different probability values to the links in the content item (typically a web page) according to the sentiments attached to the link sources.
- A sentiment, sentiment score or sentiment ranking refers to the relative trust or credibility of links or references found at an information source in respect of an event.
- The links on a particular source of information can be manually reviewed and identified as having a net "positive" or a net "negative" sentiment.
- The sentiment does not have to be a binary indicator, and it is preferable to have a plurality of degrees of sentiment.
- One method of establishing sentiment is described below. Alternate methods of determining sentiment are also contemplated, including algorithms, references to databases of known sentiment levels, etc.
- A content item in respect of an event or other piece of information is retrieved from the information provider via a source on the internet 6.
- Sources (e.g. source links) are identified within the content item.
- Link sentiment can be composed of both a component of the source and the fundamental text around, and included in, the link to the content being analyzed.
- The sentiment can be learned in an iterative semi-supervised or unsupervised approach post-seeding.
- A "dictionary" of known events or facts can be used to train the sentiment analysis scorer. As an example, consider the event "the Toronto Maple Leafs won the Stanley Cup in 1967" as a true event.
- The training process may include:
- The link sentiment information is then used to build, via module 312, a sentiment-adjusted matrix which includes the sentiments related to the link connections.
- The scoring calculation engine 314 uses the sentiment-adjusted matrix to iteratively compute the sentiment scores of each of the source links in the content item.
- The sentiment scores are then saved in the score database 304, which can be further accessed for fast score returns during future content item evaluations, as described above.
- An algorithm is provided to score or rank the sources based on their link connections using equation (1):
- λ represents a decay factor, a constant that adjusts for the reliability of information and represents the probability that the crawler will follow an outlink from a given page.
- The decay constant is usually in the range 0.8-0.9.
- This rank estimation is similar to the TrustRank algorithm, where linear dependencies on the numbers of in-links and out-links are considered. The rank is evaluated iteratively, ensuring that convergence conditions are fulfilled. Generation of the transition matrix is generally known from the TrustRank approach and from other sources, and is not described in further detail herein.
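Equation (1) itself is not reproduced in this excerpt. A TrustRank-style power iteration consistent with the transition matrix T and decay factor λ described here might be sketched as follows; the exact update form and the uniform seed distribution are assumptions based on the TrustRank family of algorithms, not the patent's own equation:

```python
# Hedged sketch of a TrustRank-style iteration: T is a column-stochastic
# transition matrix, lam is the decay factor (typically 0.8-0.9), and d is
# a seed distribution (here uniform). score is refined iteratively until
# convergence: score_i = lam * sum_j T[i][j] * score_j + (1 - lam) * d_i.

def iterate_scores(T, lam=0.85, iterations=50):
    n = len(T)                      # number of sources
    d = [1.0 / n] * n               # uniform seed distribution (assumption)
    score = d[:]
    for _ in range(iterations):
        score = [
            lam * sum(T[i][j] * score[j] for j in range(n)) + (1 - lam) * d[i]
            for i in range(n)
        ]
    return score

# Two sources linking only to each other: total score mass is conserved.
T = [[0.0, 1.0],
     [1.0, 0.0]]
scores = iterate_scores(T)
assert abs(sum(scores) - 1.0) < 1e-6
```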
- The sentiment score of each of the source links is used to adjust the values of the links in the transition matrix, giving different probabilities for different links depending on the sentiment score.
- The probabilities for the transitions (e.g. link clicks) depend on the sentiment of each link. If a node has A positive links, B negative links and C sentiment-unknown links, it is possible to calculate these probabilities, where x+, x- and x_non are the probabilities for positive, negative and non-sentiment links respectively.
- It is assumed that a positive link will be clicked with higher probability than a non-sentiment link, and that a negative link will have the smallest probability.
- A positive sentiment transition matrix can be constructed by removing all source links with negative sentiment.
- A negative sentiment transition matrix can be constructed by removing all links with positive sentiment. From the positive transition matrix, the highest ranks are obtained for the most trusted sources. In the case of the negative matrix, the highest ranks are obtained for the least trusted sources. These two ranks or scores can then be merged in a normalized way to obtain a final score.
- One simple approach is to divide every value by the maximum score in each case (positive and negative). Non-sentiment links can be treated together with the positive links, giving smaller probabilities for these transitions in the transition matrix.
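The per-node probabilities can be sketched as follows. The text requires only x+ > x_non > x- and that the probabilities over a node's out-links sum to one; the relative weights (3, 2, 1) are illustrative assumptions:

```python
# Sketch of per-node transition probabilities weighted by link sentiment.
# A node with a_pos positive, b_neg negative and c_non sentiment-unknown
# out-links gets probabilities normalized so the total over out-links is 1:
# a_pos * x_pos + b_neg * x_neg + c_non * x_non == 1.

def transition_probabilities(a_pos: int, b_neg: int, c_non: int):
    w_pos, w_neg, w_non = 3.0, 1.0, 2.0      # assumed relative weights
    total = a_pos * w_pos + b_neg * w_neg + c_non * w_non
    return w_pos / total, w_neg / total, w_non / total

x_pos, x_neg, x_non = transition_probabilities(2, 1, 1)
# Probabilities over all out-links of the node sum to one.
assert abs(2 * x_pos + 1 * x_neg + 1 * x_non - 1.0) < 1e-9
assert x_pos > x_non > x_neg
```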
- The event-based engine generally assumes that a source of a content item describing an event has a credibility score based on other known sources describing the same or a similar event. For example, a trusted source will share more information on an event with a known good (KG) source and less information with a known bad (KB) source.
- A content item is retrieved from the Internet 6.
- The request for the content item can be made manually by direct user request 404 or by an automated crawler 406.
- Event analyzer 410 is a software-implemented module that carries out a classification based on the similarity to other events stored within a defined time window.
- The event analyzer 410 retrieves the translated content item from the translation engine and passes it through a natural language processing (NLP) algorithm 412 to identify the events present within the content item, creating a set of identified events.
- NLP 412 algorithms are based on similarity measures plus keyword search, and can have machine learning components known to those of skill in the art.
- Regular expressions (regex) or other pattern-based approaches are used to identify set events as represented by set patterns of text. For example, "Acme Co.
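The "Acme Co." example is truncated in this excerpt, so the acquisition pattern below is a hypothetical stand-in illustrating the regex-based event extraction described:

```python
import re

# Illustrative regex-based event pattern (hypothetical; the patent's own
# "Acme Co." example is cut off). It captures "X acquires Y" style events,
# where company names are runs of capitalized tokens.
EVENT_PATTERN = re.compile(
    r"(?P<acquirer>[A-Z][\w.]*(?:\s[A-Z][\w.]*)*)\s+"
    r"(?:acquires|acquired|to acquire)\s+"
    r"(?P<target>[A-Z][\w.]*(?:\s[A-Z][\w.]*)*)"
)

match = EVENT_PATTERN.search("Acme Co. acquires Widget Corp. for $2 billion")
assert match is not None
assert match.group("acquirer") == "Acme Co."
assert match.group("target") == "Widget Corp."
```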
- The identified events and associated information provider may be stored in an events database 414 for further use.
- A calendar may provide temporal information, such as time and date, to the event analyzer 410 in order to reduce noise levels when comparing events.
- The event analyzer 410 then performs a time-limited query of the events database 414 to identify candidate events for comparison to the identified events from the content item.
- An analysis of the candidate events with respect to the identified events from the content item is carried out, and determines whether any of the candidate events are contradictory to each other.
- The candidate events and associated sources, as well as the identified events from the content item, are then evaluated by an event scoring module 420.
- The event scoring module 420 optionally first performs a query of a scores database 422 for each of the candidate events in order to determine whether the source of information is a Known Good (KG), a Known Bad (KB), or an unknown source in terms of credibility and trust.
- The event score (or rank) for the content item is then determined according to the distribution between KG and KB sources by the following formula:
- EventScore = A, if the event contains a KG; A/m, if the event contains b unknown sources and no KG nor KB; A/n, if the event has fewer than b unknown sources and no KG nor KB; A/p, if the event has no KG and includes a KB.
- An event contained in at least one KG source will have an event score A for the corresponding news source.
- An event with no KG or KB included, but with b unknowns included, is scored as A/m.
- With fewer than b unknowns and no KG nor KB, the score will be A/n.
- The score is A/A1 if the event contradicts a KB and A/p1 when the event contradicts a KG, where A1 and p1 are likewise predetermined parameters, selected depending on the context in which the invention is used. Generally, A1 < 1 and p1 < p.
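The scoring rules above can be collected into a single function. The parameter values here (A=1, m=2, n=4, p=8, b=3, A1=0.5, p1=16) are illustrative assumptions chosen only to satisfy A/p < A/n < A/m < A:

```python
# Minimal sketch of the EventScore rules. n_kg, n_kb and n_unknown count
# the KG, KB and unknown-reliability sources reporting the event; the
# contradiction flags handle the A/A1 and A/p1 cases. All parameter
# values are illustrative, not taken from the patent.

def event_score(n_kg: int, n_kb: int, n_unknown: int,
                contradicts_kg=False, contradicts_kb=False,
                A=1.0, m=2.0, n=4.0, p=8.0, b=3, A1=0.5, p1=16.0):
    if contradicts_kg:
        return A / p1                  # contradicting a known good source
    if contradicts_kb:
        return A / A1                  # contradicting a known bad source
    if n_kg > 0:
        return A                       # event contains a KG source
    if n_kb > 0:
        return A / p                   # no KG and includes a KB
    if n_unknown >= b:
        return A / m                   # b unknowns, no KG nor KB
    return A / n                       # fewer than b unknowns, no KG nor KB

assert event_score(n_kg=1, n_kb=0, n_unknown=0) == 1.0
assert event_score(n_kg=0, n_kb=1, n_unknown=2) == 1.0 / 8.0
assert event_score(n_kg=0, n_kb=0, n_unknown=3) == 1.0 / 2.0
assert event_score(n_kg=0, n_kb=0, n_unknown=1) == 1.0 / 4.0
```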
- A suitable corpus from a known set of seed sources must be present in the events database 414.
- This seed set of sources comprises KG and KB sources that can be progressively adjusted and enriched as more content items (and their sources) are analyzed.
- The seed set can be manually generated, or could rely on other approaches based on spam or trust detection. For example, the New York Times™ may receive a high trust score from an anti-spam algorithm and thus would fall into the KG classification in the database.
- An input seed set can initially include a list of relevant sources together with the highest-ranked sources resulting from the link-based algorithm of the present embodiment.
- An information provider or source can gain KG status if a sufficient number of content items and events is consistently shared with other previously KG sources within the database. Similarly, an information provider can gain KB status if a sufficient number of content items and events is consistently shared with other previously KB sources. Sources whose content items and events are shared with both KG and KB sources, or only with untrusted sources, will continue to be labeled as untrusted.
- Content items and events with no corresponding KG or KB sources can be scored according to the number of sources agreeing or disagreeing, at the risk of adding additional noise to the scoring process. If the noise is too great, these content items can be omitted when ranking.
- Another approach is to use the previous link-based score, which already gives an independent score base characterizing the sources, to gain insight into the distribution and apply a corresponding score. For example, consider the case of five news sources to be characterized, four of which are un-trusted sources (with no KG or KB included); there will be one source that contradicts the other four. If the distribution of sources is known, or most likely, to be dominated by un-trusted sources, the source that contradicts more often can be scored as showing a signature of trust; the opposite applies in the case of a distribution dominated by trusted sources.
- The event ranking module combines the EventScore, the LinkScore associated with the sources within the content item, and a third term related to links and sentiment attached to the event itself, as will be further described.
- The second term takes into account the LinkScores of the t sources which reproduced the event, and LinksToEvent maps all links directing to the event.
- n is the number of links directing to the content that contains the given event.
- LinkScore is the score of the source of the link directing to the event, and Sent is a factor which considers the sentiment attached to the link.
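The three-term combination can be sketched as follows. The exact LinksToEvent formula is not reproduced in this excerpt, so the sentiment-weighted sum used here is an assumption consistent with the definitions of LinkScore and Sent above; the weights a, b and c are illustrative:

```python
# Sketch of EventRank = a*EventScore + b*sum(LinkScore) + c*LinksToEvent.
# source_link_scores: LinkScores of the t sources that reproduced the event.
# link_scores / sentiments: score and sentiment factor of each of the n
# links directing to the content containing the event (LinksToEvent is
# assumed here to be their sentiment-weighted sum).

def event_rank(event_score, source_link_scores, link_scores, sentiments,
               a=0.5, b=0.3, c=0.2):
    sum_link_score = sum(source_link_scores)
    links_to_event = sum(s * sent for s, sent in zip(link_scores, sentiments))
    return a * event_score + b * sum_link_score + c * links_to_event

rank = event_rank(1.0, [0.8, 0.6], [0.9, 0.4], [1.0, 0.5])
assert abs(rank - (0.5 * 1.0 + 0.3 * 1.4 + 0.2 * 1.1)) < 1e-9
```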
- the content of a source could be rather unique. For example, comparing a local newspaper with a national or international newspaper, there will likely be very little overlap in events, simply because they cover different subjects and geographies or have different interests. This does not mean a source is untrusted, and these cases are complemented by the LinkScore algorithm.
- a final source score will be obtained after normalization of both LinkScore and AccumulatedEventRank.
- the final source score can be calculated, in a first approach, as the weighted average of both scores, which we call SourceRank:
- weightLink and weightEvent are weighting factors which are estimated using a test database with result cases, and can be optimized by one skilled in the art based on the data set being used.
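A minimal sketch of the SourceRank weighted average follows; the weight values are assumed placeholders to be fit against a test database, not values taken from the description:

```python
def source_rank(link_score, accumulated_event_rank,
                weight_link=0.6, weight_event=0.4):
    # SourceRank as a weighted average of the two normalized scores.
    # The weights are illustrative; they would be estimated from a
    # test database with known result cases.
    total = weight_link + weight_event
    return (weight_link * link_score
            + weight_event * accumulated_event_rank) / total
```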
- event scoring and source scoring approaches allow for detection and handling of badly extracted data.
- An important consideration is that event detection, even with state-of-the-art systems, can be error-prone. In such cases, the extracted information may not be accurate compared to the source.
- One advantage to the invention, as described, is that an incorrectly extracted event can be discriminated against via a low event trust score, removing noise from the event extraction process. If a particular source is extracted in a particularly noisy fashion, perhaps because of the way the source is structured, then this is reflected in the SourceRank.
- The figure shows 8 sources with hypothetical link connections, each with an associated discrete 3-level sentiment score.
- the original transition matrix would look like:
- Neural networks, cluster models, hidden Markov models, Bayesian networks, or other machine learning methods can also be used to classify or create clusters for further analysis, potentially optimizing the best-fitting algorithms, performing the calculations on a subset of documents or acting as a replacement or first pass against large sets of documents. Alternatively, decision trees or other path optimization approaches can be used.
- the methods described may be useful in assessing possible fraudulent activity.
- Automated monitoring systems may generate many alerts based on detected “events” that may not be verified.
- the methods, as described, can be used to score events to determine their validity. Alternatively, abnormal or unexpected events or facts could be flagged for further scrutiny.
- the methods described may be employed to help with credit decisioning, either by an automated system or to support the decision of a loan officer.
- assessing the truthfulness or validity of detected information can be an important part of determining what information impacts the credit decision.
- the methods described may be used for securities trading, either as support for a human trader or as part of an automated system.
- Automated systems that trade on news or events detected are already used by traders. Adding the ability to measure the trustworthiness of detected events could be an important advantage for these systems, for example, by preventing trading decisions based on false or poor information.
Abstract
A method for classifying information sources and content based on credibility, reliability, or trust. A content item describing an event is retrieved from an information provider and parsed for links. Each link is evaluated and attributed a sentiment score. The same event is identified in a set of known sources and an event score is calculated based on the credibility of each of the known sources. Finally, the content item is ranked based on the event and sentiment scores.
Description
- The present invention relates generally to information extraction. More particularly, the present invention relates to classifying or ranking information sources and events during extraction.
- The internet is one of the primary sources of information of modern life. However, on the web, a great deal of valuable, useful and accurate information coexists with misleading or inaccurate information. There also exist sources of information that are more trusted, sources that are less trusted, and other sources which cannot readily be identified as trusted or not trusted. General web-based searching can return information that is harmful or misleading. The use of non-credible sources of information as a basis for decisions can have a severe impact in fields like politics, health, finance and many others. For instance, in the 2008 U.S. presidential campaign of Barack Obama, misleading information connecting the future president to a Muslim faith organization resulted in substantial confusion among voters. Various other instances of false or misleading reports emanating from the internet have been documented, and have had consequences affecting lives and decisions. In more daily and personal applications, information obtained from the internet serves as a basis for decision making in insurance underwriting processes, credit and lending decisions, mergers and acquisitions, fraud detection, hiring decisions and many others. In this sense, credibility assessments are becoming of increasing importance in order to build judgment skills to properly discern between different sources of information, and to address contradictions in information from various sources.
- Prior art approaches to this problem have attempted to reduce web spam by developing credibility-based link analysis algorithms like the ones used in common search engines. Common examples include the PageRank algorithm developed and used by Google™, the TrustRank algorithm developed by Stanford University and Yahoo!™, and the HITS algorithm, which was a precursor to the PageRank algorithm. Each of these prior art approaches relies on the assumption that the quality of a web page is correlated to the quality of its links, and returns, in response to a search query, a ranked list of web pages. Spammers have created several ways to take advantage of how search engines operate, such as "hijacking" trusted web pages and building "honeypots", or groups of legitimate-appearing web pages, to induce trusted pages to link to them. Recent studies (such as (i) D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics. WebDB, 2004 and (ii) Z. Gyongyi, H. Garcia-Molina, and J. Pedersen. Combating Web spam with TrustRank. VLDB, 2004.) suggest 26% of web content is spam. On top of this, there is some amount of inaccurate or mistrusted information that cannot be properly described as spam.
- As is evident, prior art approaches have been suitable for ranking web pages and providing a list of hits in response to a search request, but are inadequate for assessing the reliability of the information, the reliability of the links to other sources on web pages, or the reliability of events being described with sufficient confidence to permit decision-makers to rely on this information without a significant due diligence burden.
- In contrast to prior art approaches, the present invention does not attempt to determine if a source is spam, but rather, attempts to assess the underlying credibility of sources and the probability that information from the underlying source, such as an event or a purported fact has occurred, is truthful or reliable. Events or facts may be derived from more than one source, and it is the events or facts themselves that are assessed for their reliability, rather than the web pages themselves.
- According to one embodiment of the invention, there is provided a computer-implemented method for ranking information stored on a computer readable medium; the method includes extracting a content item describing an event from an information source; parsing by a parsing module the content item for a plurality of source links; attributing by a content analysis module a sentiment score to each source link; wherein the sentiment score is indicative of the relative credibility of each of the source links; scoring by a scoring module the information source based on the source links and on the sentiment score; and ranking the content item based on a score associated with the information source.
- According to one aspect of this embodiment, the scoring comprises
- calculating r from equation (1):
-
r=α*T*r+(1−α)*d (1) - where d is a non-zero static score distribution vector, T is a transition matrix, and α is a predetermined constant; and, wherein each term in the transition matrix is modified by a non-zero sentiment score.
- According to another aspect of this embodiment, the method further includes storing on a score database implemented on a computer readable medium the score for the information source.
- According to another aspect of this embodiment, the method further includes prior to the parsing step determining whether the information source has an associated score in the score database, and upon determining that the information source has an associated score in the score database, retrieving the score and returning to the extracting step.
- According to another aspect of this embodiment, the method further includes identifying an event from each source in a set of information sources; calculating an event score for the content item describing the event based on a credibility score for each of the known sources in the set of information sources; and combining the score for a respective information source with the credibility score to determine a cumulative event score.
- According to another aspect of this embodiment, each of the sources in the set of information sources is classified as one of a known good source, a known bad source and an unknown reliability source, and wherein the calculating an event score is biased towards sources identified as known good sources.
- According to another aspect of this embodiment, the event score is calculated as:
- EventScore=A (at least one KG source); A/m (no KG or KB, at least b unknown sources); A/n (no KG or KB, fewer than b unknown sources); A/p (KB involved, no KG)
- where A, m, n and p are parameters selected such that A/p<A/n<A/m<A; KG is a known good source; KB is a known bad source.
- According to another aspect of this embodiment, the identified events are compared to identify contradictions, and the calculating an event score includes biasing events from known good sources to resolve the contradictions.
- According to another aspect of this embodiment, the combining comprises calculating an event ranking representative of the event being reliable.
- According to another aspect of this embodiment, the event ranking is calculated as:
-
EventRank=a*EventScore+b*ΣLinkScore+c*LinksToEvent - where a, b and c are weighted coefficients and LinksToEvent is calculated as:
-
LinksToEvent=Σi=1 nLinkScore(i)×Sent - where n is the number of sources and LinkScore is the score of the information source, and Sent is the sentiment score.
- According to another aspect of this embodiment, the method further includes calculating an accumulated event rank for the information provider from a plurality of event ranks by:
- AccumulatedEventRank=Σj=1 Ni EventRank(j)/Ni
- where EventRank(j) is a plurality of event scores for a plurality of content items and Ni is a total number of content items of the information provider in the known source database.
- According to a second embodiment of the invention, there is provided a computer-implemented method for ranking information stored on a computer readable medium; the method including identifying an event from each source in a set of information sources; calculating an event score for a content item describing the event based on a credibility score for each of the known sources in the set of information sources; and combining a score for a respective information source with the credibility score to determine a cumulative event score.
- According to one aspect of this second embodiment, each of the sources in the set of information sources is classified as one of a known good source, a known bad source and an unknown reliability source, and wherein the calculating an event score is biased towards sources identified as known good sources.
- According to another aspect of this second embodiment, the event score is calculated as:
- EventScore=A (at least one KG source); A/m (no KG or KB, at least b unknown sources); A/n (no KG or KB, fewer than b unknown sources); A/p (KB involved, no KG)
- where A, m, n and p are parameters selected such that A/p<A/n<A/m<A; KG is a known good source; KB is a known bad source.
- According to another aspect of this second embodiment, the identified events are compared to identify contradictions, and the calculating an event score includes biasing events from known good sources to resolve the contradictions.
- According to another aspect of this second embodiment, the combining comprises calculating an event ranking representative of the event being reliable.
- According to another aspect of this second embodiment, the event ranking is calculated as:
-
EventRank=a*EventScore+b*ΣLinkScore+c*LinksToEvent - where a, b and c are weighted coefficients and LinksToEvent is calculated as:
-
LinksToEvent=Σi=1 nLinkScore(i)×Sent - where n is the number of sources and LinkScore is the score of the information source, and Sent is the sentiment score.
- According to another aspect of this second embodiment, the method further includes calculating an accumulated event rank for the information provider from a plurality of event ranks by:
- AccumulatedEventRank=Σj=1 Ni EventRank(j)/Ni
- where EventRank(j) is a plurality of event scores for a plurality of content items and Ni is a total number of content items of the information provider in the known source database.
- According to another aspect of this second embodiment, the score for a respective information source is determined by: extracting a content item describing an event from an information source; parsing by a parsing module the content item for a plurality of source links; attributing by a content analysis module a sentiment score to each source link; wherein the sentiment score is indicative of the relative credibility of each of the source links; scoring by a scoring module the information source based on the source links and on the sentiment score; and ranking the content item based on a score associated with the information source.
- According to another aspect of this second embodiment, the scoring comprises
- calculating r from:
-
r=α*T*r+(1−α)*d - where d is a non-zero static score distribution vector, T is a transition matrix, and α is a predetermined constant; and, wherein each term in the transition matrix is modified by a non-zero sentiment score.
- According to another aspect of this second embodiment, the method further includes storing on a score database implemented on a computer readable medium the score for the information source.
- According to another aspect of this second embodiment, the method further includes prior to the parsing step determining whether the information source has an associated score in the score database, and upon determining that the information source has an associated score in the score database, retrieving the score and returning to the extracting step.
- According to another aspect of this second embodiment, the sentiment score is derived using a sentiment scorer that was created at least in part using:
- a training set of known true and known false events; and
- known links and the associated text of the known links to content that specifies the known true and known false events.
- According to another aspect of this second embodiment, the ranking score is used for any one or more of insurance underwriting, assessing suspected fraudulent activity, credit decisioning, and securities trading.
- According to other aspects of the invention, non-transitory computer readable media include computer executable instructions for carrying out the methods as herein described. In still other embodiments, computer systems for implementing the methods of the above-described embodiments are disclosed.
- An embodiment will now be described, by way of example only, with reference to the attached Figures, wherein:
-
FIG. 1 shows a high-level architecture of a system for acquiring content items and applying scoring and ranking to the content. -
FIG. 2 shows a schematic of a computer system that may be used to implement various parts of the invention. -
FIG. 3 shows a flow diagram of the method of scoring links associated with the content item. -
FIG. 4 shows a flow diagram of the method of the event analyzer where the event classification and scoring is performed. -
FIG. 5 shows an example of event classification and scoring.
- As mentioned earlier, the present invention assesses the underlying credibility of sources and the probability that information from underlying sources, such as an event or a purported fact having occurred, is truthful or reliable. Events or facts may be derived from more than one source, and it is the events or facts themselves that are assessed for their reliability, rather than the web pages themselves. For the purposes of this description, the term "event" is used to describe a piece of information that is being subjected to the credibility assessment. An event as used herein may be any piece of information or purported fact, generally determined to be of significance to a request for information, such as a web search. The term event is used in part because, by definition, the invention in its preferred embodiment is used to assess the reliability of a reported event having occurred, or of facts identified as being relevant to a reported event. The invention does not, per se, relate to determining whether known facts are applicable to a user's query, for example, whether a particular scientific formula is relevant to solving a problem posed by a user's query.
- Furthermore, the preferred embodiments are described with respect to online news source, but the sources of information for assessing the credibility of a reported event are not limited to these. Other sources may equally be used as inputs to the invention for the credibility analysis, including but not limited to RSS feeds, discussion forums, social media, such as Facebook™ or Twitter™, posts, emails, electronic journals, databases and/or web pages from a multitude of other sources. It may also be applied to information available on local networks that are not generally available to the public. In this manner, where the invention is being used by an institution for diligence purposes, such as fraud, insurance or personnel research, source of information belonging to or accessible only by the institution can also be included in the search universe to generate a higher degree of confidence in the results.
-
FIG. 1 shows a network of computer systems 2 having an information provider 4 that provides information content via the Internet 6. Client devices such as a desktop computer 8, a tablet computer 10, or a mobile smartphone 12 request the information content using a hypertext transfer protocol (HTTP) transmitted over a wired or wireless link to the Internet 6 to the server systems of the information provider 4. The information provider in turn supplies the requested article to the client device. The computing structure 14 can reside on the client device, a proxy server(s), or other trusted computer system(s) on the Internet 6, or a combination thereof. -
FIG. 2 shows a computer system 2, which includes a number of physical and logical components, including a central processing unit ("CPU") 24, random access memory ("RAM") 28, an input/output ("I/O") interface 32, a network interface 36, non-volatile storage 4, a display 40 and a local bus 44 enabling the CPU 24 to communicate with the other components. The CPU 24 executes an operating system and a number of software systems and/or software modules. RAM 28 provides relatively responsive volatile storage to the CPU 24. The I/O interface 32 allows for human-computer input to be received from one or more devices, such as a keyboard, a mouse, a touch screen, etc., and outputs information to output devices, such as a display and/or speakers. The network interface 36 (e.g., Ethernet, WiFi, Bluetooth, etc.) permits communication with elements in network communication and provides access to the internet. A number of these computer systems may be networked together, host information from other sources, etc. Non-volatile storage 4 stores the operating system and programs. During operation of the computer system, the operating system, the programs and the data may be retrieved from the non-volatile storage 4 and placed in RAM 28 to facilitate execution. These computer systems are known in the art, and their communications with the internet and other networks are also known. It is within this infrastructure that the preferred embodiments of the invention operate.
- Broadly, the invention provides for two complementary approaches for generating a rank or score, although it is worth noting that each of the approaches could also be used independently to arrive at partial or intermediate results that are also useful. First, a method is described that assesses the reliability of the source of information, particularly the reliability of links. Next, a method is described that assesses the reliability of the content of the information regarding the event itself.
A method of combining these approaches completes the preferred embodiment, providing a two-pronged approach to assessing the reliability of derived information.
- Assessing the Reliability of Sources
- Turning now to
FIG. 3 , there is illustrated a method for ranking or scoring links obtained from a source of information, such as a webpage. While prior art methods for assessing the quality of web pages by virtue of the links on those web pages do exist, the present invention provides this functionality in a more robust manner, as will shortly be described. Prior art processes and algorithms use a random or biased web crawler to evaluate the rank of a page. After a certain number of iterations, the random crawler will locate the pages with a higher rank with a higher probability of being relevant. This approach includes the assumption that at a given web page the crawler randomly selects the links located at the web page, or selects pages related to given subjects in a biased manner. This assumption is to some extent contrived, since a real crawler will not act randomly when selecting links to follow but will select links based mainly on information accompanying the link. Some prior art web crawlers will mainly follow positive links and discard the negative ones. For example, if a link to a page said "this content is wrong" and another link said "here is the right answer", the positive link would receive a positive bias. One implementation of this includes assigning a probability factor (score) associated with the sentiment related to a link, a (0-1) value which would map from negative sentiment to positive sentiment. The map could be done from a discrete sentiment score base, for example with 3 levels (positive, negative, no sentiment), or from a continuous sentiment score base assigning sentiment scores to terms. A similar method was applied for blog distillation; see Giacomo Berardi et al., "Blog Distillation via Sentiment-Sensitive Link Analysis", Natural Language Processing and Information Systems, Lecture Notes in Computer Science, Volume 7337, 2012, pp. 228-233. - The preferred embodiment of the present invention includes a crawler that considers the sentiment (more specifically, the trust or credibility) relating to the link source to influence the crawling decisions. A probability function is determined that assigns different probability values to the links in the content item (typically a web page) according to the sentiments attached to the link sources.
- For the purposes of this disclosure, a sentiment or sentiment score or sentiment ranking refers to the relative trust or credibility of links or references found at an information source in respect of an event. To illustrate this in simplest terms, the links on a particular source of information can be manually reviewed and identified as having a net “positive” or a net “negative” sentiment. Of course, the sentiment does not have to be a binary indicator, and it is preferable to have a plurality of degrees of sentiment. One method of establishing sentiment is described below. Alternate methods of determining sentiment are also contemplated, including algorithms, references to databases of known sentiment levels, etc.
- As shown in
FIG. 3 , a content item in respect of an event or other piece of information is retrieved from the information provider via a source on the internet 6. Optionally, a determination is made at 302 whether the source of the retrieved content item has been previously scored. If the source has been previously scored, the sentiment score is retrieved from the saved score database 304. If the source of the content item has not been previously scored, a parsing engine 306 parses the content item for links to other sources (e.g. source links). Each of the source links found is extracted by an extraction module 308, optionally along with the associated information of the source link, such as the sentence in which the source link is found, the link descriptor, etc. The associated source itself, or the information contained therein, may then be analyzed by a content analysis module 310 to determine the sentiment or credibility associated with the source link.
- In one embodiment, link sentiment can be composed from both a component of the source and the fundamental text around and included in the link to the content being analyzed. In one embodiment, the sentiment can be learned in an iterative semi-supervised or unsupervised approach post-seeding. In such an approach, a "dictionary" of known events or facts can be used to train the sentiment analysis scorer. As an example, consider the event "the Toronto Maple Leafs won the Stanley Cup in 1967" as a true event. The training process may include:
-
- A. Create a dictionary of “trusted events” known to have occurred.
- B. For each trusted event:
- B1. Determine known content that specify this event.
- B2. Find sources that link to that content.
- B3. Extract the text from the source that corresponds to the link.
- C. Build a corpus of “trusted texts” representing the links to trusted events
- D. Repeat B-C against controversial or untrue events to build a corpus of “untrusted texts” representing the links to untrusted events.
- E. Build a text classifier or scorer based on similarity measures or other approaches to determine the link sentiment of unknown texts.
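Steps A-E above could be sketched, for illustration, with a simple bag-of-words similarity scorer. The corpus construction (steps B1-B3) is assumed to have already produced the trusted and untrusted link texts, and all names below are hypothetical:

```python
from collections import Counter
import math

def _vec(text):
    # crude bag-of-words vector for a piece of link text
    return Counter(text.lower().split())

def _cos(a, b):
    # cosine similarity between two bag-of-words vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_sentiment(link_text, trusted_texts, untrusted_texts):
    """Score in [-1, 1]: positive when the link text is closer to the
    corpus of trusted texts than to the untrusted corpus (step E)."""
    v = _vec(link_text)
    t = max((_cos(v, _vec(x)) for x in trusted_texts), default=0.0)
    u = max((_cos(v, _vec(x)) for x in untrusted_texts), default=0.0)
    return t - u
```

A production scorer would of course use a trained text classifier rather than raw cosine similarity, as the description itself suggests.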
- For a given event, the link sentiment information is then used to build a sentiment-adjusted matrix via
module 312, which includes the sentiments related to the link connections. The scoring calculation engine 314 uses the adjacency matrix to iteratively compute the sentiment scores of each of the source links in the content item. The sentiment scores are then saved in the score database 304, which can be further accessed for fast score returns during future content item evaluations, as described above. - While the general method described above is thought to be novel, additional details of implementation will now be described, which enable certain method steps in a manner that would not be apparent to one skilled in the art. These details of implementation are considered non-obvious contributions to the art.
- In particular embodiments, an algorithm is provided to score or rank the sources based on their link connections using equation (1):
-
r=α*T*r+(1−α)*d (1) - where r is the score, d is a static score distribution vector with a given non-zero entry, T is the transition matrix, and α represents a decay factor: a constant, usually in the range 0.8-0.9, representing the probability that the crawler will follow an outlink from a given page. This rank estimation is similar to the TrustRank algorithm, where linear dependencies on the number of in-links and out-links are considered. The rank is evaluated iteratively, ensuring convergence conditions are fulfilled. Generating the transition matrix is generally known from the TrustRank approach and other sources, and is not described in further detail herein.
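The iterative evaluation of equation (1) can be sketched as a standard fixed-point iteration; writing r on the right-hand side (r = α·T·r + (1−α)·d) reflects the iterative reading of the equation, and the iteration count and convergence tolerance below are assumed details:

```python
def link_rank(T, d, alpha=0.85, iters=100, tol=1e-10):
    """Iteratively solve r = alpha*T*r + (1 - alpha)*d for the score
    vector r, where T is the sentiment-adjusted transition matrix
    (row-major, T[i][j]) and d is the static distribution vector."""
    n = len(d)
    r = [1.0 / n] * n  # uniform starting scores
    for _ in range(iters):
        nxt = [alpha * sum(T[i][j] * r[j] for j in range(n))
               + (1 - alpha) * d[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt  # converged
        r = nxt
    return r
```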
- Of particular pertinence to this preferred embodiment is that the sentiment score of each of the source links is used to adjust the values of the links in the transition matrix, giving different probabilities for different links depending on their sentiment scores. At a given node in the transition matrix, the probabilities for the transitions (e.g. link clicks) must be evaluated depending on the number of nodes and sentiments. If the node has A positive links, B negative links and C sentiment-unknown links, then it is possible to calculate this probability as:
-
A*x+ + B*x− + C*xnons = 1 (2) - where x+, x−, xnons are the probabilities for positive, negative and non-sentiment links, respectively. In general, it is assumed that the positive link will be clicked with higher probability than the non-sentiment link, and that the negative link will have the smallest probability, for example:
-
- where m and n are parameters that can be varied with the only condition that n>m. The sum of the probabilities must be normalized to 1.
- For example, in an extreme case a positive sentiment transition matrix can be constructed by removing all source links with negative sentiment. Similarly, a negative sentiment transition matrix can be constructed by removing all links with positive sentiment. From the positive transition matrix one obtains the highest ranks for the most trusted sources. From the negative matrix the highest ranks are obtained for the least trusted sources. These two ranks or scores can then be merged in a normalized way to obtain a final score. One simple approach is to divide every value by the maximum score in each case (positive and negative). Non-sentiment links can be treated together with the positive links, with smaller probabilities assigned to these events in the transition matrix.
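One way to realize the sentiment-dependent transition probabilities of equation (2) is sketched below. The specific ratios x+ = n·x− and xnons = m·x− are an assumed example satisfying the stated ordering (positive highest, negative smallest, n > m > 1), not a choice prescribed by the description:

```python
def link_probabilities(A, B, C, m=2.0, n=4.0):
    """Assign click probabilities to A positive, B negative and C
    non-sentiment links so that A*x+ + B*x- + C*xnons = 1.
    Assumed example ratios: x+ = n*x-, xnons = m*x-, with n > m > 1."""
    x_neg = 1.0 / (A * n + B + C * m)
    return n * x_neg, x_neg, m * x_neg  # (x+, x-, xnons)
```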
- It will be understood that all approaches described herein are implemented on computer readable media and executed by a computer system as described earlier.
- Assessing the Trustworthiness of Events
- Turning now to
FIG. 4 , the event-based engine generally assumes that a source of a content item describing an event has a credibility score based on other known sources describing the same or similar event. For example, a trusted source will share more information on an event with a known good (KG) source and less information with a known bad (KB) source. A content item is retrieved from the Internet 6. The request for the content item can be made manually by direct user request 404 or by an automated crawler 406. - The information is translated via a
translation engine 408 into a form readable by event analyzer 410. Event analyzer 410 is a software implemented module that carries out a classification based on the similarity to other events stored within a defined time window. The event analyzer 410 retrieves the translated content item from the translation engine and passes it through a natural language processing (NLP) algorithm 412 to identify the events present within the content item to create a set of identified events. The NLP 412 algorithms are based on similarity measures plus keyword search and can have machine learning components known to those of skill in the art. In one approach, regular expressions (regex) or other pattern based approaches are used to identify set events as represented by set patterns of text. For example, "Acme Co. was acquired by Bob's company" could be represented by a rule that extracts based on a pattern that looks for text containing "was acquired by". More sophisticated examples, such as parser-based extraction, knowledge-based extraction, etc., are described by Hogenboom et al (http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/DeRiVE/derive2011_submission—1.pdf). The identified events and associated information provider may be stored in an events 414 database for further use. -
events database 414 to identify candidate events for comparison to the identified events from the content item. An analysis of the candidate events with respect to the identified events in the content item is then carried out to determine if any of the candidate events contradict each other. - The candidate events and associated sources, as well as the identified events from the content item, are then evaluated by an
event scoring module 420. The event scoring module 420 optionally first performs a query of a scores database 422 for each of the candidate events in order to determine if the source of information is a Known Good (KG), a Known Bad (KB), or an unknown source in terms of credibility and trust. The event score (or rank) for the content item is then determined according to the distribution between KG and KB sources by the following formula:
- EventScore = A, if at least one KG source contains the event; A/m, if no KG or KB is included and at least b unknown sources are included; A/n, if no KG or KB is involved and fewer than b unknown sources are included; A/p, if no KG is included and a KB is involved
- where an event contained in at least one KG source will have an event score A for the corresponding news source. An event with no KG or KB included but with at least b unknowns included is scored as A/m. In the case of an event with no KG or KB involved and a number of unknowns less than a set value b, the score will be A/n. An event with no KG and with a KB involved will result in the score A/p, where A can be any natural number (for example A=10) and m, n and p are also parameters which can be adjusted for better score discrimination, provided that m<n<p and A/p<A/n<A/m<A. Particular values for each of these parameters can be optimized depending on the context in which the invention is used. Once a score has been generated for the content item, it can be stored so that future content requests drawing from the same source can reference the stored score rather than being processed again.
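The piecewise rule above can be sketched as follows. The concrete values A=10, m=2, n=4, p=8 and the unknown-source threshold b=3 are illustrative assumptions; the disclosure only requires m<n<p so that A/p<A/n<A/m<A.

```python
# Illustrative parameter choices (uppercase names mirror A, m, n, p, b above).
A, M, N, P, B = 10.0, 2.0, 4.0, 8.0, 3

def event_score(num_kg, num_kb, num_unknown):
    if num_kg > 0:
        return A          # event carried by at least one Known Good source
    if num_kb > 0:
        return A / P      # no KG, but a Known Bad source is involved
    if num_unknown >= B:
        return A / M      # no KG/KB, at least b unknown sources agree
    return A / N          # no KG/KB, fewer than b unknown sources agree
```

With these values the required ordering holds: 1.25 (KB) < 2.5 (few unknowns) < 5.0 (many unknowns) < 10.0 (KG).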
- As noted above, the system identifies contradictory information to an event, or other data that throws the trustworthiness of an event into dispute. This contradictory information is referred to herein as an anti-event, i.e., when the content item contradicts another known source. These anti-events can be scored as A/A1 if the event contradicts a KB and A/p1 when the event contradicts a KG, where A1 and p1 are likewise predetermined parameters, selected depending on the context in which the invention is used. Generally, A1 is less than or approximately equal to 1 and p1 is less than or approximately equal to p.
- Preferably, a suitable corpus from a known set of seed sources should be present in the events database 414. This seed set of sources comprises KG and KB sources that can be progressively adjusted and enriched as more content items (and their sources) are analyzed. The seed set can be manually generated or can rely on other approaches based on spam or trust detection. For example, the New York Times™ may receive a high trust score from an anti-spam algorithm and thus would fall into the KG classification in the database. In one embodiment, an input seed set can initially include a list of relevant sources together with the highest-ranked sources from the link-based algorithm of the present embodiment.
- An information provider or source can gain KG status if a sufficient number of content items and events are consistently shared with other, previously KG, sources within the database. Similarly, an information provider can gain KB status if a sufficient number of content items and events are consistently shared with other, previously KB, sources. Sources whose content items and events are shared with both KG and KB sources, or only with untrusted sources, will continue to be labeled as untrusted.
- Alternatively, content items and events with no corresponding KG or KB sources can be scored according to the number of sources agreeing or disagreeing, at the risk of adding noise to the scoring process. If the noise is too great, these content items can be omitted when ranking. Another approach is to use the previous link-based score, which already provides an independent score characterizing the sources, to gain insight into the distribution and apply a corresponding score. For example, consider five news sources to be characterized, four of which are un-trusted (with no KG or KB included), with one source contradicting the other four. If we know that the sources' distribution is, or is most likely, dominated by un-trusted sources, we can treat the source that contradicts more often as exhibiting a signature of trust, and the opposite in the case of a distribution dominated by trusted sources.
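The fallback heuristic above can be sketched as a toy scoring function. The sign convention and the boolean prior are assumptions made for illustration; the disclosure does not fix a formula for this case.

```python
def fallback_score(n_agree, n_contradict, mostly_untrusted):
    """Toy agreement-based score for events with no KG/KB sources.

    mostly_untrusted: prior belief (e.g., inferred from LinkScores) that the
    surrounding source distribution is dominated by un-trusted sources.
    """
    if mostly_untrusted:
        # Contradicting an un-trusted majority is treated as a signature of trust.
        return n_contradict - n_agree
    return n_agree - n_contradict
```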
- Cumulative Scoring
- The event ranking module combines the EventScore, the LinkScore associated with the sources within the content item, and a third term related to links and sentiment attached to the event itself, as will be further described.
-
EventRank=a*EventScore+(b/t)*Σ i=1..t LinkScore(i)+c*LinksToEvent - where the a, b and c coefficients are weights, the second term takes into account the LinkScores of the sources which reproduced the event (t is the number of sources), and LinksToEvent maps all links. In this case we want to differentiate links directing to a webpage from links directing to a given event. From our adjacency matrix we know the links directing to a webpage, so we can select the ones directing only to the event we are analyzing and sum their LinkScores (for each source with a link directing to the event), together with the sentiment analysis attached to each link. This factor can be estimated as:
-
LinksToEvent=Σ i=1..n LinkScore(i)×Sent - where n is the number of links directing to the content that contains the given event, LinkScore is the score of the source of the link directing to the event, and Sent is a factor which considers the sentiment attached to the link.
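The two formulas above can be combined in a short sketch. All coefficient and input values are illustrative assumptions; in the disclosure a, b and c are tuned weights.

```python
def links_to_event(link_scores, sentiments):
    # LinksToEvent: sum of LinkScore(i) x Sent over links directed at the event.
    return sum(ls * s for ls, s in zip(link_scores, sentiments))

def event_rank(event_score, source_link_scores, lte, a=1.0, b=1.0, c=1.0):
    # Second term averages the LinkScores of the t sources reproducing the event.
    t = len(source_link_scores)
    return a * event_score + (b / t) * sum(source_link_scores) + c * lte
```

For example, an event with EventScore 10.0, reproduced by two sources with LinkScores 0.4 and 0.6, and one positive and one negative inbound link, yields an EventRank near 10.4 with unit weights.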
- The integrated AccumulatedEventRank will be evaluated in both cases as:
-
AccumulatedEventRank=(1/Ni)*Σ j=1..Ni EventRank(j)
- where the individual scores are added for an information provider and the result is normalized by the total number of events Ni of the respective information provider.
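As described, the accumulated rank is a per-provider average; a minimal sketch:

```python
def accumulated_event_rank(event_ranks):
    # Sum a provider's EventRank values and normalize by its event count Ni.
    return sum(event_ranks) / len(event_ranks)
```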
- In practice, the content of a source can be rather unique. For example, comparing a local newspaper with a national or international newspaper, there will likely be very little overlap in events simply because they cover different subjects or geographies, or have different interests. This does not mean a source is un-trusted, and these cases will be complemented by the LinkScore algorithm.
- A final source score is obtained after normalization of both the LinkScore and the AccumulatedEventRank. In a first approach, the final source score can be calculated as the weighted average of both scores, which we call the SourceRank:
-
SourceRank=weightLink*LinkScore+weightEvent*AccumulatedEventRank - where weightLink and weightEvent are weighting factors which are estimated using a test database with result cases, and can be optimized by one skilled in the art based on the data set being used.
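The weighted combination is straightforward; the 0.5/0.5 defaults below are placeholder assumptions standing in for weights tuned on a labeled test database.

```python
def source_rank(link_score, accumulated_event_rank,
                weight_link=0.5, weight_event=0.5):
    # Weighted average of the link-based and event-based scores.
    return weight_link * link_score + weight_event * accumulated_event_rank
```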
- Note that both the event scoring and source scoring approaches allow for detection and handling of badly extracted data. An important consideration is that event detection, even with state-of-the-art systems, can be error-prone. In such cases, the extracted information may not accurately reflect the source. One advantage of the invention, as described, is that an incorrectly extracted event can be discriminated against via a low event trust score, removing noise from the event extraction process. If a particular source is extracted in a particularly noisy fashion, perhaps because of the way the source is structured, then this is reflected in the SourceRank.
- As an example, the FIG. shows 8 sources with hypothetical link connections and a discrete 3-level sentiment score associated with each link. The original transition matrix would look like:
-
- If we consider the sentiments attached to the links, we can generate a new transition matrix, for example:
-
- where we selected some probability values for the links according to the attached sentiments. The real probability distribution might differ from this example. We can create a positive and a negative transition matrix in the same way, by taking only positive and only negative sentiment links.
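Given a sentiment-weighted transition matrix T and the static distribution vector d, the link-based score of claim 2, r = α*T + (1−α)*d, can be computed by power iteration, applying T to the current score vector at each step. The following sketch assumes a column-stochastic T; matrix values used in testing are invented for illustration.

```python
def link_scores(T, d, alpha=0.85, iters=200):
    # Iterate r <- alpha * T @ r + (1 - alpha) * d until (approximate) convergence.
    n = len(d)
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [alpha * sum(T[i][j] * r[j] for j in range(n)) + (1 - alpha) * d[i]
             for i in range(n)]
    return r
```

Sources carrying more incoming links with positive sentiment from high-scoring sources accumulate a higher r.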
- In the FIG. we specified that the seed weight is placed on sources 2 and 4, giving the static score distribution vector:
-
d=[0,½,0,½,0,0,0,0]
- In the last example we only took into account link connections between sources as a whole; for the EventRank we also need to include links directing to a given event.
- As an example, in
FIG. 5 , consider seven sources (labeled S1-S7): four unknown sources which are to be evaluated for the EventRank value, and two KG sources and one KB source which were previously analyzed and are now used as seeds. Suppose an event (news item) appears in the unknown sources and in KG source 2. The EventScore is evaluated and, in parallel, the link-based analysis is conducted to determine the LinkScores associated with the event. - If we continue running many events, data accumulates for the AccumulatedEventRank and the final SourceRank. The best and worst SourceRank values provide feedback into the seed sets of KG and KB sources, gradually improving the EventRank estimation. Convergence conditions need to be settled so that optimum EventRank and SourceRank results are obtained.
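The seed-feedback loop described above can be sketched as follows. The promotion/demotion thresholds (0.8 and 0.2) are illustrative assumptions; the disclosure only states that best and worst SourceRank values feed back into the seed sets.

```python
def update_seeds(source_ranks, kg, kb, hi=0.8, lo=0.2):
    """Promote top-ranked sources to the KG seed set and bottom-ranked to KB.

    source_ranks: mapping of source name -> normalized SourceRank.
    kg, kb: current seed sets, updated in place and returned.
    """
    for src, rank in source_ranks.items():
        if rank >= hi:
            kg.add(src)
        elif rank <= lo:
            kb.add(src)
    return kg, kb
```

Repeating the score-then-update cycle until the seed sets stop changing is one simple convergence condition.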
- Neural networks, cluster models, hidden Markov models, Bayesian networks, or other machine learning methods can also be used to classify or create clusters for further analysis, potentially optimizing the best-fitting algorithms, performing the calculations on a subset of documents, or acting as a replacement or first pass against large sets of documents. Alternatively, decision trees or other path optimization approaches can be used.
- The above-described embodiments may be useful in a number of contexts where the integrity of an event or fact may be critical to ascertain. Several examples of use are now described; however, these examples are not meant to be comprehensive. One example is scoring and verifying information for an insurance underwriting applicant. In such an example, it is important that an applicant is not, for example, denied insurance based on incorrect information.
- In another example, the methods described may be useful in assessing possible fraudulent activity. Automated monitoring systems may generate many alerts based on detected “events” that may not be verified. The methods, as described, can be used to score events to determine their validity. Alternatively, abnormal or unexpected events or facts could be flagged for further scrutiny.
- In another example, the methods described may be employed to help with credit decisioning, either by an automated system or to support the decision of a loan officer. In such a scenario, assessing the truthfulness or validity of detected information can be an important part of determining what information impacts the credit decision.
- In another example, the methods described may be used for securities trading, either as support for a human trader or as part of an automated system. Automated systems that trade on news or events detected are already used by traders. Adding the ability to measure the trustworthiness of detected events could be an important advantage for these systems, for example, by preventing trading decisions based on false or poor information.
- The above-described embodiments are intended to be examples of the present invention and alterations and modifications may be effected thereto, by those of skill in the art, without departing from the scope of the invention, which is defined solely by the claims appended hereto.
Claims (25)
1. A computer-implemented method for ranking information stored on a computer readable medium; the method comprising:
extracting a content item describing an event from an information source;
parsing by a parsing module the content item for a plurality of source links;
attributing by a content analysis module a sentiment score to each source link; wherein said sentiment score is indicative of the relative credibility of each of said source links;
scoring by a scoring module said information source based on said source links and on said sentiment score; and
ranking said content item based on a score associated with said information source.
2. The method according to claim 1 , wherein said scoring comprises
calculating r from equation (1):
r=α*T+(1−α)*d
where d is a non-zero static score distribution vector, T is a transition matrix, and α is a predetermined constant; and, wherein each term in said transition matrix is modified by a non-zero sentiment score.
3. The method of claim 1 , further comprising storing on a score database implemented on a computer readable medium said score for said information source.
4. The method according to claim 3 , further comprising prior to said parsing step determining whether said information source has an associated score in said score database, and upon determining that said information source has an associated score in said score database, retrieving said score and returning to said extracting step.
5. The method according to claim 1 , further comprising
identifying an event from each source in a set of information sources;
calculating an event score for the content item describing said event based on a credibility score for each of the known sources in said set of information sources; and
combining said score for a respective information source with said credibility score to determine a cumulative event score.
6. The method according to claim 5 , wherein each of said sources in said set of information sources is classified as one of a known good source, a known bad source and an unknown reliability source, and wherein said calculating an event score is biased towards sources identified as known good sources.
7. The method according to claim 6 , wherein said event score is calculated as:
where A, m, n and p are parameters selected such that A/p<A/n<A/m<A; KG is a known good source; KB is a known bad source.
8. The method according to claim 5 , wherein said identified events are compared to identify contradictions, and said calculating an event score includes biasing events from known good sources to resolve said contradictions.
9. The method according to claim 5 , wherein said combining comprises calculating an event ranking representative of said event being reliable.
10. The method according to claim 9 , wherein calculating said event ranking is calculated as:
EventRank=a*EventScore+b*ΣLinkScore+c*LinksToEvent
where a, b and c are weighted coefficients and LinksToEvent is calculated as:
where n is the number of sources and LinkScore is the score of said information source, and Sent is the sentiment score.
11. The method according to claim 10 , further comprising calculating an accumulated event rank for the information provider from a plurality of event ranks by:
where EventRank(j) is a plurality of event scores for a plurality of content items and Ni is a total number of content items of the information provider in the known source database.
12. A computer-implemented method for ranking information stored on a computer readable medium; the method comprising:
identifying an event from each source in a set of information sources;
calculating an event score for a content item describing said event based on a credibility score for each of the known sources in said set of information sources; and
combining a score for a respective information source with said credibility score to determine a cumulative event score.
13. The method according to claim 12 , wherein each of said sources in said set of information sources is classified as one of a known good source, a known bad source and an unknown reliability source, and wherein said calculating an event score is biased towards sources identified as known good sources.
14. The method according to claim 13 , wherein said event score is calculated as:
where A, m, n and p are parameters selected such that A/p<A/n<A/m<A; KG is a known good source; KB is a known bad source.
15. The method according to claim 12 , wherein said identified events are compared to identify contradictions, and said calculating an event score includes biasing events from known good sources to resolve said contradictions.
16. The method according to claim 12 , wherein said combining comprises calculating an event ranking representative of said event being reliable.
17. The method according to claim 16 , wherein calculating said event ranking is calculated as:
EventRank=a*EventScore+b*ΣLinkScore+c*LinksToEvent
where a, b and c are weighted coefficients and LinksToEvent is calculated as:
where n is the number of sources and LinkScore is the score of said information source, and Sent is the sentiment score.
18. The method according to claim 17 , further comprising calculating an accumulated event rank for the information provider from a plurality of event ranks by:
where EventRank(j) is a plurality of event scores for a plurality of content items and Ni is a total number of content items of the information provider in the known source database.
19. The method according to claim 23 , wherein said score for a respective information source is determined by:
extracting a content item describing an event from an information source;
parsing by a parsing module the content item for a plurality of source links;
attributing by a content analysis module a sentiment score to each source link; wherein said sentiment score is indicative of the relative credibility of each of said source links;
scoring by a scoring module said information source based on said source links and on said sentiment score; and
ranking said content item based on a score associated with said information source.
20. The method according to claim 19 , wherein said scoring comprises
calculating r from equation (1):
r=α*T+(1−α)*d
where d is a non-zero static score distribution vector, T is a transition matrix, and α is a predetermined constant; and, wherein each term in said transition matrix is modified by a non-zero sentiment score.
21. The method of claim 19 , further comprising storing on a score database implemented on a computer readable medium said score for said information source.
22. The method according to claim 21 , further comprising prior to said parsing step determining whether said information source has an associated score in said score database, and upon determining that said information source has an associated score in said score database, retrieving said score and returning to said extracting step.
23. The method according to claim 17 , whereby the sentiment score is derived using a sentiment scorer that was created at least in part using:
a training set of known true and known false events; and
known links and the associated text of said known links to content that specifies the known true and known false events.
24. The method of claim 1 , wherein the ranking score is used for any one or more of insurance underwriting, assessing suspected fraudulent activity, credit decisioning, and securities trading.
25. The method of claim 12 , wherein the ranking score is used for any one or more of insurance underwriting, assessing suspected fraudulent activity, credit decisioning, and securities trading.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/939,755 US20150019565A1 (en) | 2013-07-11 | 2013-07-11 | Method And System For Scoring Credibility Of Information Sources |
US15/237,163 US20160364392A1 (en) | 2013-07-11 | 2016-08-15 | Method and system for scoring credibility of information sources |
US15/629,355 US10678798B2 (en) | 2013-07-11 | 2017-06-21 | Method and system for scoring credibility of information sources |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/939,755 US20150019565A1 (en) | 2013-07-11 | 2013-07-11 | Method And System For Scoring Credibility Of Information Sources |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/237,163 Continuation US20160364392A1 (en) | 2013-07-11 | 2016-08-15 | Method and system for scoring credibility of information sources |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150019565A1 true US20150019565A1 (en) | 2015-01-15 |
Family
ID=52278002
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/939,755 Abandoned US20150019565A1 (en) | 2013-07-11 | 2013-07-11 | Method And System For Scoring Credibility Of Information Sources |
US15/237,163 Abandoned US20160364392A1 (en) | 2013-07-11 | 2016-08-15 | Method and system for scoring credibility of information sources |
US15/629,355 Active 2034-08-03 US10678798B2 (en) | 2013-07-11 | 2017-06-21 | Method and system for scoring credibility of information sources |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/237,163 Abandoned US20160364392A1 (en) | 2013-07-11 | 2016-08-15 | Method and system for scoring credibility of information sources |
US15/629,355 Active 2034-08-03 US10678798B2 (en) | 2013-07-11 | 2017-06-21 | Method and system for scoring credibility of information sources |
Country Status (1)
Country | Link |
---|---|
US (3) | US20150019565A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160034565A1 (en) * | 2014-07-29 | 2016-02-04 | International Business Machines Corporation | Managing credibility for a question answering system |
US20160283952A1 (en) * | 2013-11-04 | 2016-09-29 | Agingo Corporation | Ranking information providers |
US20170220952A1 (en) * | 2016-02-03 | 2017-08-03 | International Business Machines Corporation | Intelligent selection and classification of oracles for training a corpus of a predictive cognitive system |
CN107370664A (en) * | 2017-07-17 | 2017-11-21 | 陈剑桃 | A kind of effective microblogging junk user finds system |
US20200356615A1 (en) * | 2017-02-21 | 2020-11-12 | Sony Interactive Entertainment LLC | Method for determining news veracity |
CN112153221A (en) * | 2020-09-16 | 2020-12-29 | 北京邮电大学 | Communication behavior identification method based on social network diagram calculation |
US11170319B2 (en) * | 2017-04-28 | 2021-11-09 | Cisco Technology, Inc. | Dynamically inferred expertise |
US20220358521A1 (en) * | 2021-05-07 | 2022-11-10 | Rovi Guides, Inc. | Mechanism to add insightful intelligence to flowing data by inversion maps |
US11954750B2 (en) * | 2019-03-08 | 2024-04-09 | GoWare Inc. | Systems and methods for multi-factor validation of information, dynamically supporting common operational picture and decision making, in real time |
US12231311B2 (en) | 2009-10-23 | 2025-02-18 | Www.Trustscience.Com Inc. | Parallel computational framework and application server for determining path connectivity |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201713821D0 (en) * | 2017-08-29 | 2017-10-11 | Factmata Ltd | Content scoring |
US10963688B2 (en) | 2017-12-10 | 2021-03-30 | Walmart Apollo, Llc | Systems and methods for a customer feedback classification system |
US11195178B2 (en) * | 2018-03-14 | 2021-12-07 | Coupa Software Incorporated | Integrating tracked transaction data into approval chains for digital transactions |
CN108595574B (en) * | 2018-04-16 | 2021-11-02 | 上海达梦数据库有限公司 | Database cluster connection method, device, equipment and storage medium |
US20220342943A1 (en) * | 2018-11-14 | 2022-10-27 | Hints Inc. | System and Method for Detecting Misinformation and Fake News via Network Analysis |
US20240256522A1 (en) * | 2019-03-08 | 2024-08-01 | GoWare Inc. | Systems And Methods For Autonomous Validation of Crowd Sourced and Open Sourced Information |
WO2021262180A1 (en) * | 2020-06-25 | 2021-12-30 | Hints Inc. | System and method for detecting misinformation and fake news via network analysis |
US12153587B1 (en) * | 2021-11-05 | 2024-11-26 | Validate Me LLC | Decision making analysis engine |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070239701A1 (en) * | 2006-03-29 | 2007-10-11 | International Business Machines Corporation | System and method for prioritizing websites during a webcrawling process |
US20080228675A1 (en) * | 2006-10-13 | 2008-09-18 | Move, Inc. | Multi-tiered cascading crawling system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6981040B1 (en) * | 1999-12-28 | 2005-12-27 | Utopy, Inc. | Automatic, personalized online information and product services |
US7567959B2 (en) * | 2004-07-26 | 2009-07-28 | Google Inc. | Multiple index based information retrieval system |
JP4359787B2 (en) * | 2007-07-02 | 2009-11-04 | ソニー株式会社 | Information processing apparatus, content reputation search method, and content reputation search system |
US8346756B2 (en) * | 2007-08-31 | 2013-01-01 | Microsoft Corporation | Calculating valence of expressions within documents for searching a document index |
US8117223B2 (en) * | 2007-09-07 | 2012-02-14 | Google Inc. | Integrating external related phrase information into a phrase-based indexing information retrieval system |
US20130325877A1 (en) * | 2012-05-31 | 2013-12-05 | Razieh Niazi | Uses Of Root Cause Analysis, Systems And Methods |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070239701A1 (en) * | 2006-03-29 | 2007-10-11 | International Business Machines Corporation | System and method for prioritizing websites during a webcrawling process |
US20080228675A1 (en) * | 2006-10-13 | 2008-09-18 | Move, Inc. | Multi-tiered cascading crawling system |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12231311B2 (en) | 2009-10-23 | 2025-02-18 | Www.Trustscience.Com Inc. | Parallel computational framework and application server for determining path connectivity |
US20160283952A1 (en) * | 2013-11-04 | 2016-09-29 | Agingo Corporation | Ranking information providers |
US9886479B2 (en) * | 2014-07-29 | 2018-02-06 | International Business Machines Corporation | Managing credibility for a question answering system |
US20160034456A1 (en) * | 2014-07-29 | 2016-02-04 | International Business Machines Corporation | Managing credibility for a question answering system |
US20160034565A1 (en) * | 2014-07-29 | 2016-02-04 | International Business Machines Corporation | Managing credibility for a question answering system |
US9886480B2 (en) * | 2014-07-29 | 2018-02-06 | International Business Machines Corporation | Managing credibility for a question answering system |
US20170220952A1 (en) * | 2016-02-03 | 2017-08-03 | International Business Machines Corporation | Intelligent selection and classification of oracles for training a corpus of a predictive cognitive system |
US20200356615A1 (en) * | 2017-02-21 | 2020-11-12 | Sony Interactive Entertainment LLC | Method for determining news veracity |
US12072943B2 (en) * | 2017-02-21 | 2024-08-27 | Sony Interactive Entertainment LLC | Marking falsities in online news |
US11170319B2 (en) * | 2017-04-28 | 2021-11-09 | Cisco Technology, Inc. | Dynamically inferred expertise |
CN107370664A (en) * | 2017-07-17 | 2017-11-21 | 陈剑桃 | A kind of effective microblogging junk user finds system |
US11954750B2 (en) * | 2019-03-08 | 2024-04-09 | GoWare Inc. | Systems and methods for multi-factor validation of information, dynamically supporting common operational picture and decision making, in real time |
CN112153221A (en) * | 2020-09-16 | 2020-12-29 | 北京邮电大学 | Communication behavior identification method based on social network diagram calculation |
US20220358521A1 (en) * | 2021-05-07 | 2022-11-10 | Rovi Guides, Inc. | Mechanism to add insightful intelligence to flowing data by inversion maps |
Also Published As
Publication number | Publication date |
---|---|
US20170286431A1 (en) | 2017-10-05 |
US10678798B2 (en) | 2020-06-09 |
US20160364392A1 (en) | 2016-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: OUTSIDE INTELLIGENCE, INC., CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LIJACHEV, VICTOR FERNANDOVICH COMAS;ADAMSON, DANIEL RYAN;REEL/FRAME:030779/0963 Effective date: 20130710 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: EXIGER CANADA, INC., ONTARIO Free format text: MERGER;ASSIGNOR:OUTSIDE INTELLIGENCE INC.;REEL/FRAME:045473/0632 Effective date: 20171001 |