EP1999692A2 - Knowledge repository - Google Patents
Knowledge repository
Info
- Publication number
- EP1999692A2 (EP06765371A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- knowledge
- query
- user
- natural language
- facts
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Definitions
- Keyword indexing documents can provide no general solution.
- Another problem may be that the knowledge is conceptualised in a way that is different from the way that it is described on the web page. For example, if one is trying to locate bi-monthly magazines with a search engine, one is unlikely to turn up any examples where they are described as being published "every two months". Another example would be trying to find all hotels within two kilometres of a specific geographical location. It is extremely unlikely that any description of the hotel will be expressed in exactly that form, so any keyword search for this will fail. That is, because search engines do not generally understand the knowledge within a document, they cannot infer new knowledge from what is said.
- An example of a prior art search-engine interaction illustrating some of these problems is shown in Figure 1.
- the user has typed a very simple question about a popular musician in the search box (102) and the search engine has responded with a list of documents (104).
- the web contains a very strong bias towards contemporary people, especially celebrities, and there is no shortage of information on the web which would allow a perfect system to answer this question. In fact there are many thousands of web pages with information in them suitable for answering it.
- the list of documents bears very little similarity to what is being asked and the user would have to experiment further and read through a number of documents to get an answer.
- The problems with keyword searching are even more extreme when the user is not human but rather an automated system such as another computer.
- the software within a website or other automated system needs the knowledge it requires for its processing in a form it can process.
- documents found with keyword searching are not sufficiently processable to provide what is needed.
- almost all the world's computer systems have all the knowledge they need stored in a local database in a local format.
- automated scheduling systems wanting to know whether a particular date is a national holiday access a custom written routine to provide this information; they do not simply consult the internet to find out the answer.
- Knowledge in structured form is knowledge stored in a form designed to be directly processable by a computer. It is designed to be read and processed automatically. Structured form means that it is not stored as natural language.
- Knowledge in structured form will include identifiers which denote objects in the real world and examples will include assertions of information about these identified objects.
- An example of such an assertion would be the assertion that an identified relationship exists between two or more identified objects or that a named attribute applies to an identified object. (Individual instances of structured knowledge are referred to herein as "facts" or "assertions".)
- General knowledge in structured form has a variety of uses by a computer, including direct answering of natural language questions, and assistance with other forms of natural language processing (such as mining data from documents). It can even assist with keyword searching. For example, with the example above, if the structured knowledge exists that the strings "Abe Lincoln" and "President Abraham Lincoln" both denote the same unique entity, a search engine using such a knowledge base could return documents containing either term when only one was entered by the user.
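The keyword-search use just described can be sketched as follows. This is a minimal illustration, not the patent's implementation: the mapping `DENOTATIONAL_STRINGS`, the entity id, and the helper names `expand_terms` and `search` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical structured knowledge: entity id -> strings that denote it.
DENOTATIONAL_STRINGS: Dict[str, Set[str]] = {
    "abraham lincoln": {"Abe Lincoln", "President Abraham Lincoln"},
}


def expand_terms(term: str) -> Set[str]:
    """Return the term plus every other string known to denote the same entity."""
    expanded = {term}
    for strings in DENOTATIONAL_STRINGS.values():
        if term in strings:
            expanded |= strings
    return expanded


def search(term: str, documents: List[str]) -> List[str]:
    """Return documents containing the term or any equivalent denotational string."""
    terms = expand_terms(term)
    return [doc for doc in documents if any(t in doc for t in terms)]
```

A search for either string then returns documents that mention the other, while a term unknown to the knowledge base falls back to plain substring matching.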
- adding knowledge should desirably not require any previous knowledge of what is already in the knowledge base. If prior familiarity with the ontology or other facts that are already in the knowledge base is required, untrained users will find it more difficult to add knowledge.
- Embodiments of the present invention may be considered as internet-based knowledge repositories of general knowledge, stored in structured form, to which anyone may add. Various embodiments include a static knowledge base of general knowledge stored in structured form in one or more persistent computer-accessible stores. The knowledge is represented within the static knowledge base using a structured knowledge representation method.
- a knowledge representation system which includes a data store having a knowledge base stored therein comprising first knowledge represented in a structured, machine-readable format which encodes meaning.
- the system also includes at least one computing device operable to add second knowledge to the knowledge base.
- the second knowledge is generated with reference to input from a plurality of users which is not in the structured, machine-readable format. At least some of the input from the users is in a natural language.
- the at least one computing device is further operable to generate third knowledge not represented in the knowledge base by inferring the third knowledge from at least one of the first knowledge and the second knowledge.
- the at least one computing device is further operable to respond to queries using at least one of the first knowledge, the second knowledge, and the third knowledge.
- methods and apparatus are provided for facilitating addition to a knowledge base.
- the knowledge base includes first knowledge represented in a structured, machine-readable format which encodes meaning.
- At least one interface is provided by which a first user may enter information which is not in the structured, machine-readable format. At least some of the information is in a natural language.
- the at least one interface is operable to transmit the information to at least one remote computing device for generation of second knowledge represented in the machine-readable format for addition to the knowledge base.
- Responses to knowledge requests are presented using at least one of the first knowledge, the second knowledge, and third knowledge not represented in the knowledge base and inferred from at least one of the first knowledge and the second knowledge.
- a computing system in a network is provided.
- the system includes a knowledge repository which includes first knowledge represented in a structured, machine-readable format operable to store information about any entity that can be denoted in a natural language.
- the system further includes at least one computing device operable to facilitate addition of second knowledge to the knowledge repository by collecting input from a plurality of users via the network using natural language requests, and translating the input to the machine-readable format.
- a computing system in a network includes a knowledge repository which includes first knowledge represented in a structured, machine-readable format operable to store information about any entity that can be denoted in a natural language.
- the system further includes at least one computing device operable to facilitate addition of second knowledge to the knowledge repository by a plurality of users without requiring knowledge of the machine-readable format by the users.
- methods and apparatus are provided for responding to knowledge requests.
- a first natural language response to a first knowledge request is presented.
- the first natural language response is derived from knowledge represented in a structured, machine-readable format operable to store information about any entity that can be denoted in a natural language.
- Search results are presented in response to a second knowledge request where a second natural language response derived from the knowledge is not available.
- the search results include a plurality of natural language documents identified using a conventional search engine
- methods and apparatus are provided for responding to a knowledge request.
- a natural language response to the knowledge request is presented.
- the natural language response is derived from knowledge represented in a structured, machine-readable format operable to store information about any entity that can be denoted in a natural language.
- Search results are presented in conjunction with the natural language response.
- the search results include a plurality of natural language documents retrieved using a conventional search engine.
- methods and apparatus are provided for facilitating addition of a first entity to a knowledge base by a first human user.
- Identification by the first human user of a class to which the first entity belongs is facilitated.
- Generation by the first human user of at least one first natural language string denoting the first entity is facilitated.
- Generation by the first human user of first recognition data corresponding to the first entity is facilitated.
- the first recognition data is specified to facilitate unique recognition of the first entity by humans.
- Transmission is facilitated of data representing the class, the first natural language strings and the first recognition data for storage in a knowledge base in association with an identifier identifying the first entity within the knowledge base.
- second recognition data is presented to the first human user.
- the second recognition data corresponds to a second entity represented in the knowledge base and is specified to facilitate unique recognition of the second entity by humans. Verification by the first human user that the first entity is distinct from the second entity is facilitated.
- FIG 1 gives an example of a prior art search with a search-engine. A question has been turned into a list of documents based on them containing similar words.
- FIG 2 shows an embodiment of the present invention "plugged into” the same search engine and responding to the same question using structured knowledge. A perfect answer is provided to the user and the list of documents is relegated to serving as supplementary information.
- FIG 3 illustrates components in the preferred embodiment of the invention.
- FIG 4 shows a method for answering a query with "no" instead of "unknown".
- FIG 5 shows how knowledge about the completeness of the results returned can be given in query processing.
- FIG 6 shows how queries are processed in one embodiment.
- FIG 7 shows a question answered with multiple answers and completeness information provided.
- FIG 8 shows a question answered with both a concise and a detailed explanation.
- FIG 9 shows a method for translating a question or fact assertion from natural language into internal form.
- FIG 10 shows a method for eliminating improbable candidate translations using semantic constraint knowledge.
- FIG 11 shows how multiple translation candidates are dealt with more generally.
- FIG 12 shows two example questions with ambiguity being dealt with.
- FIG 13 illustrates the profile system with four different profiles being given for the same entity.
- FIG 14 illustrates the profile showing system specific data.
- FIG 15 shows a method for selecting a default profile template for a given object.
- FIG 16 shows a method for turning a profile template and object into a profile.
- FIG 17 shows part of a profile template being processed.
- FIG 18 shows part of a profile template containing iterator nodes being processed.
- FIG 19 shows a method of authenticating a user using their real world identity.
- FIG 20 shows a method of selecting an object.
- FIG 21 shows a method of allowing a user to add a new (non class, non relation) object.
- FIG 22 illustrates an exemplary interaction with a user adding a new object.
- FIG 23 is a continuation of FIG 22.
- FIG 24 shows a method of allowing a user to add a new class.
- FIG 25 illustrates an exemplary interaction with a user adding a new class.
- FIG 26 is a continuation of FIG 25.
- FIG 27 shows a method of allowing a user to add a new relation.
- FIG 28 illustrates an exemplary interaction with a user adding a new relation.
- FIG 29 is a continuation of FIG 28.
- FIG 30 is a continuation of FIG 29.
- FIG 31 is a continuation of FIG 30.
- FIG 32 shows a method of dealing with a sequence of facts collected for assertion by a process.
- FIG 33 shows a method of collecting denotational strings for a new object.
- FIG 34 shows a method of allowing a user to add a new fact to the static knowledge base.
- FIG 35 illustrates a user adding a new fact where all but one element has been pre-specified.
- FIG 36 shows a method for collecting essential facts from a user about a newly added object.
- FIG 37 shows a method for collecting temporal data from a user pertaining to a transient fact.
- FIG 38 shows a method for collecting source information about a fact from a user.
- FIG 39 shows a method usable in the user assessment subsystem for collecting endorsements or contradictions of a fact from a user.
- FIG 40 shows a method usable in the system assessment subsystem for automatically calculating various types of state information about a fact.
- FIG 41 illustrates an exemplary interaction with a user where user assessment and system assessment methods allow an incorrect fact to be removed from the static knowledge base and the correct version to be published.
- FIG 42 illustrates an exemplary interaction with a user where the user's attempts to abusively assert knowledge are thwarted by two different abuse prevention techniques.
- FIG 43 shows a method of utilising a prior art search engine in combination with an embodiment of the current invention to process a user search query.
- FIG 44 shows a method of enhancing a user search query using knowledge obtainable from an embodiment of the present invention.
- the structured knowledge representation employed by specific embodiments of the invention uses primarily a collection of assertions of named relationships between pairs of named entities.
- Each assertion (also referred to herein as a "fact") is also a named entity, and temporal data about when a fact is true can be asserted using similar assertions.
- the preferred embodiment supports "negative facts" (assertions of a relationship not being true) and "parametered objects", where entities are identified by a combination of a class with one or more other named entities.
- the structured knowledge representation described herein is advantageous in that it allows representation of knowledge of an extremely broad class. That is, it is operable to represent any entity
- the structured knowledge representation is also operable to represent the presence or absence of any relationship between two or more such entities, and whether or not a particular attribute applies to a specific entity.
- the structured knowledge representation is also operable to represent points in time when these relationships are valid.
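One way to picture this representation is the sketch below. It is only an illustration under stated assumptions: the `Fact` type, its field names, the fact ids and the relation names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    """One assertion: a named relationship between two named entities."""
    fact_id: str           # a fact is itself a named entity
    left: str              # identifier of the first entity
    relation: str          # identifier of the named relationship
    right: str             # identifier of the second entity
    positive: bool = True  # False models a "negative fact"


# Hypothetical identifiers and relation names, chosen only for illustration.
FACTS: List[Fact] = [
    Fact("fact.1", "abraham lincoln", "is an instance of", "human being"),
    Fact("fact.2", "abraham lincoln", "is married to", "mary todd lincoln"),
    # Temporal data is asserted about a fact by another fact that names it.
    Fact("fact.3", "fact.2", "applies for timeperiod", "1842 to 1865"),
    # A negative fact asserts that a relationship does not hold.
    Fact("fact.4", "abraham lincoln", "is an instance of",
         "fictional character", positive=False),
]


def facts_about(entity_id: str) -> List[Fact]:
    """Return every fact whose left-hand entity is entity_id."""
    return [fact for fact in FACTS if fact.left == entity_id]
```

Because `fact.2` is an ordinary identifier, `facts_about("fact.2")` retrieves the temporal assertion made about that fact in the same way as facts about any other entity.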
- the information represented and manipulated is of an extremely narrow domain.
- the developer typically creates a schema of database tables to store the entities and the relationships between entities that the application needs.
- the developer then hard-codes a program that manipulates the data in these tables, e.g., using SQL.
- queries and query answering are also supported.
- Queries are a machine-readable analogue to a question or knowledge request designed to elicit knowledge from the system.
- the query answering system can answer queries with a list of objects that match the query and can answer "truth queries” (the query analogue to a yes/no question) with "yes", "no” and "unknown” responses.
- "Completeness information" (whether the list of responses contains all the possible responses) can be provided when the query requests a list of entities.
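The three-valued truth query and the completeness flag can be sketched as below. This is a simplified illustration: deriving "no" from a stored negative fact and marking a relation as exhaustive are assumptions made here, and the names `KNOWN`, `truth_query` and `list_query` are hypothetical.

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]


class Truth(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# True marks a positive fact, False a negative fact (hypothetical data).
KNOWN: Dict[Triple, bool] = {
    ("paris", "is the capital of", "france"): True,
    ("lyon", "is the capital of", "france"): False,
}


def truth_query(left: str, relation: str, right: str) -> Truth:
    """Answer a yes/no question; absence of knowledge yields UNKNOWN, not NO."""
    value = KNOWN.get((left, relation, right))
    if value is None:
        return Truth.UNKNOWN
    return Truth.YES if value else Truth.NO


def list_query(relation: str, right: str,
               exhaustive: FrozenSet[str] = frozenset()
               ) -> Tuple[List[str], Optional[bool]]:
    """Return matching entities plus a completeness flag (None means unknown)."""
    matches = sorted(
        left for (left, rel, obj), positive in KNOWN.items()
        if positive and rel == relation and obj == right
    )
    complete = True if relation in exhaustive else None
    return matches, complete
```

The point of the sketch is the distinction between "no" and "unknown": a missing fact is not treated as a false one, and a list answer is only reported as complete when something positively says so.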
- Knowledge generation enables facts to be generated by the system which are not present in the static knowledge base. This can be achieved by inference from the facts in the static knowledge base.
- the knowledge generation system can also generate facts sourced from a third-party database or dynamic source such as (for example) financial information.
- Knowledge generation is implemented in the preferred embodiment via a collection of "generators” which comprise a pattern of the facts which they can generate in combination with one or more mechanisms to generate facts which match this pattern. Some generators achieve this by providing a query linked to the pattern which if answered provides values for unknowns in the pattern thus enabling the generation of the facts (“dumb generators”). Other generators use some executable code possibly in combination with a query to generate facts matching the pattern (“smart generators"). Smart generators can be used to generate facts sourced from an external source or database by accessing this external source and converting the knowledge so retrieved into facts matching its pattern. Smart generators can also be used to do inference where at least one calculation step is needed to generate the new facts.
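The two kinds of generator can be sketched as follows. This is a toy model under stated assumptions, not the patent's design: the triple format, the "?"-prefixed variables, the relation names and all function names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical static knowledge base.
STATIC_FACTS: List[Triple] = [
    ("london", "is located within", "england"),
    ("england", "is located within", "united kingdom"),
]


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals fact, or return None on conflict."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):           # terms starting with "?" are unknowns
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def answer(query: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Answer a conjunctive query against the static facts."""
    if not query:
        yield binding
        return
    for fact in STATIC_FACTS:
        extended = unify(query[0], fact, binding)
        if extended is not None:
            yield from answer(query[1:], extended)


# Dumb generator: a pattern plus a query whose answers fill the pattern's unknowns.
DUMB_PATTERN: Triple = ("?a", "is located within", "?c")
DUMB_QUERY: List[Triple] = [
    ("?a", "is located within", "?b"),
    ("?b", "is located within", "?c"),
]


def dumb_generate() -> List[Triple]:
    """Generate facts matching DUMB_PATTERN from answers to DUMB_QUERY."""
    return [
        (b["?a"], DUMB_PATTERN[1], b["?c"])
        for b in answer(DUMB_QUERY, {})
    ]


def smart_generate(x: int, y: int) -> Optional[Triple]:
    """Smart generator for (?x, "is greater than", ?y): needs a calculation step."""
    return (str(x), "is greater than", str(y)) if x > y else None
```

Here the dumb generator derives that London is located within the United Kingdom purely by answering its linked query, whereas the smart generator runs code because no stored fact could enumerate every numeric comparison.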
- Various embodiments also support the creation of detailed natural language explanations of how a query was answered.
- the preferred embodiment additionally supports a summarised concise explanation showing only the facts in the static knowledge base (or an essential subset thereof) that were used to respond to the query.
- the preferred embodiment also supports question translation. This is the capability to translate natural language questions or knowledge requests provided by a user into a query. In combination with the query answering system this enables internet users to type a natural language question directly into the system and obtain an answer directly. Various embodiments also support ambiguity resolution by elimination of improbable interpretations of the question.
- Various embodiments also support the retranslation of a query back into unambiguous natural language. In combination with the question translation system, this enables the user to have confidence that their question has been correctly understood. If the question translation system determines that the user's question is ambiguous it also enables it to present the list of interpretations of their question for selection of the user's intended query.
- various embodiments also support use by remote automated systems.
- a number of services are provided including responding to queries.
- this service can be of genuine use by a remote non-human user in a way that a traditional document-returning search-engine cannot.
- Knowledge addition in the preferred embodiment is achieved by a number of "processes" which interact with general internet users via a sequence of web pages containing prompts, text input boxes and buttons. These processes receive, check and refine the answers provided by users and include confirmation pages. Processes can also call other processes as sub-processes (which can in turn call additional processes etc.), creating intervening additional sequences of pages within the parent process. For example, when a user adds a new entity to the knowledge base and asserts that this entity belongs to a class which is also not in the knowledge base, the process for adding the class can be immediately implemented, returning the user to the initial process (with the class so added) when it is finished. The calling parent process receives the class name exactly as if it was an existing class which had been selected by the user.
- the knowledge addition system comprises processes for adding new classes, new relations and new entities of other types.
- the preferred embodiment also has support for natural language translation of facts asserted by users whereby a natural language sentence can be translated into a combination of one or more facts using a method similar to the translation of questions and, after confirmation, this knowledge added to the static knowledge base. Prompting for the two objects and the named relationship individually is used as a fall-back if the entire assertion cannot be understood.
- Various embodiments also support "user assessment" where users can endorse or contradict facts in the static knowledge base and these assessments are used to remove or hide untrue facts.
- links to endorse or contradict a fact are provided next to facts in the static knowledge base displayed to the user. For example, this occurs when presenting the summary explanation generated in response to a question or knowledge request provided by a user. When a great deal of confidence has been gained in the veracity of a fact the preferred embodiment ceases to accept user assessment on it.
- users can authenticate themselves with the id of class [human being] that corresponds to their real identity.
- the preferred embodiment additionally contains mechanisms for users to establish that they have not appropriated the identity of someone other than themselves.
- knowledge addition and user assessment are associated with the user's true identity as the reporter, thereby giving a clear record of the provenance of the knowledge.
- Various embodiments also contain a "system assessment” component operable to assess the veracity of facts based at least on their semantic interaction with other facts in the knowledge base.
- facts can be labelled as “contradicted” (in semantic conflict with other facts in the static knowledge base) and “superfluous” (believed true but which can already be generated by the system).
- System assessment is done on all newly added facts to the static knowledge base and the user who has added a fact that is contradicted by other facts in the static knowledge base is given an opportunity to use user assessment to draw attention to and potentially change the status of any of those facts which they believe to be untrue.
- system assessment can be used to resuscitate facts previously thought to be untrue when for example, one or all of the facts in conflict with the newly added fact is later reassessed (via user assessment or otherwise) as untrue.
- Other embodiments may use system assessment to prevent untrue facts from being added to the system at all.
- Various embodiments also support additional mechanisms for preventing the addition of untrue facts by mistaken or abusive users, including the ability to block certain patterns of facts from being added and ranking of users based on their track record of adding high quality knowledge. More trust is associated with the users of higher rank, more weight is given to the facts they assert and more weight to their user assessments, resulting in a higher probability of publication.
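The text does not give a formula for how rank affects publication, so the following is purely an assumed scheme for illustration: each assessment carries the assessor's rank weight, endorsements add and contradictions subtract, and a fact is published once the total reaches a hypothetical threshold.

```python
from typing import Iterable, Tuple

# Each assessment is (rank_weight, is_endorsement); both the weights and the
# threshold below are assumptions made for this sketch.
Assessment = Tuple[float, bool]


def assessment_score(assessments: Iterable[Assessment]) -> float:
    """Sum rank weights: endorsements count positively, contradictions negatively."""
    return sum(weight if endorses else -weight for weight, endorses in assessments)


def should_publish(assessments: Iterable[Assessment], threshold: float = 1.0) -> bool:
    """Publish a fact when its weighted assessments reach the threshold."""
    return assessment_score(assessments) >= threshold
```

Under this scheme a single endorsement from a highly ranked user can outweigh a contradiction from a low-ranked one, which is the qualitative behaviour the text describes.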
- Various embodiments also support the generation of "profiles" giving general information about a particular entity based on its class and the knowledge about that entity in the system. This is implemented in the preferred embodiment via a collection of profile templates which define the contents of an information screen and what queries need to be run to populate it.
- the preferred embodiment supports one or more different profiles being supported for a particular class giving a different emphasis to the object being profiled. It is also possible to navigate through the classes that an object is a member of, giving a profile tailored to that class for the same entity.
- links can be provided enabling a user to add the missing knowledge with only the missing knowledge being prompted for.
- various embodiments support user interactions with the system via multiple natural languages and with the users using different natural languages sharing access to at least some of the structured knowledge.
- Various embodiments also comprise a search-engine component operable to produce a list of documents (e.g. web pages) ordered by relevance to a query entered by a user. This component can be used to produce results in addition to the normal response to a user's question or as a fall-back when the question has not been successfully translated or the system cannot respond to the query.
- the present invention is implemented as a "plug-in" to a pre-existing search engine.
- the search-engine query entered by the user is processed by both the search-engine to produce a list of documents and by this embodiment to possibly produce a result originating from the structured-knowledge source.
- a successful response from the plug-in is presented above the search-engine results. If unsuccessful, the standard search-engine output is presented to the user and the user is no worse off than they would have been without the plug-in.
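The plug-in behaviour can be sketched as a thin wrapper around an existing search function. This is a minimal sketch: the `respond` function and the two callables passed to it are hypothetical stand-ins for the search engine and the structured-knowledge system.

```python
from typing import Callable, List, Optional


def respond(user_query: str,
            search: Callable[[str], List[str]],
            structured_answer: Callable[[str], Optional[str]]) -> List[str]:
    """Run both paths; put a structured answer, if any, above the documents."""
    documents = search(user_query)          # the conventional search always runs
    answer = structured_answer(user_query)  # None when the plug-in cannot answer
    if answer is None:
        return documents                    # user is no worse off than before
    return [answer] + documents
```

The design choice worth noting is that the conventional search result is never discarded, so a failure in the structured path degrades to the ordinary search-engine output.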
- a user interaction with this plug-in embodiment is illustrated in Figure 2.
- the user question or knowledge request (202) has been passed both through the search-engine search to produce a list of documents and additionally through an embodiment of this invention.
- the question translation component has received the user question and produced a query.
- the query answering system has then processed this query using knowledge generation and references to structured knowledge facts in the static knowledge base, producing an answer which is translated back into natural language for presentation to the user (204).
- the query answering system has also produced a concise explanation for the answer by presenting the facts in the static knowledge base which were used to answer this query (206). (The needed generated facts are not shown.)
- One of the facts used to answer the question can be confirmed or contradicted by the user (207) via the user assessment system.
- a detailed explanation including the generated facts and the steps taken to generate them was also produced, accessible to the user via a link (212).
- This embodiment has also retranslated the query back into unambiguous natural language to demonstrate that the user's question has been understood (208).
- the prior art list of web pages is still produced but has now been relegated to supplementary information (210).
- Figure 3 shows some of the components in the preferred embodiment. (Many of these components are optional and simply add to the overall functionality/utility of the system. They may not be present in other embodiments.)
- One or more client computers (302) with a human user (303) can access the system via a web-interface (310) on at least one server (308).
- one or more remote computers making automated queries (306) can access the system via a remote computer interface (312).
- the remote computer interface is described in section 2.15.
- the underlying knowledge is stored in one or more static knowledge bases (318).
- the static knowledge base is described in section 2.2 and the preferred embodiment knowledge representation method used to represent the knowledge stored in the static knowledge base is described in section 2.3.
- Knowledge can be added to the static knowledge base by users using the knowledge addition subsystem (326). This component and its subcomponents are described in section 2.9. Users are also able to correct and endorse added knowledge via the user assessment component (334). This is described in section 2.10.
- the system is also able to analyse and label facts using system assessment (316). This is described in section 2.11.
- Natural language translation (324) enables translation between natural language and internal representations, e.g. it can translate a natural language question into a query and natural language assertions of knowledge into one or more corresponding facts. (Translation of questions is described in section 2.6.6; translation of factual assertions is described in section 2.6.9.) Both these components are implemented in the preferred embodiment by referring to a store of translation templates (325). These provide a pre-determined pattern for matching against natural language strings and further data enabling natural language strings matching the pattern to be converted to the internal representation.
- Query processing (314) enables the retrieval of knowledge from the system. Queries may be the output of the natural language translation system (324) or provided by remote computers (306). Query processing is described in section 2.5.
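A translation template, as described above, pairs a pattern with the data needed to build the internal form. The sketch below uses regular expressions as the pattern language, which is an assumption made here; the template contents, the tuple-shaped query and the function name `translate` are likewise hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

Query = Tuple[str, str, str, str]  # (kind, left, relation, right)

# Each template: a pattern to match, and a builder turning the match into a query.
TEMPLATES: List[Tuple["re.Pattern[str]", Callable[["re.Match[str]"], Query]]] = [
    (
        re.compile(r"^is (?P<a>.+) married to (?P<b>.+?)\??$", re.IGNORECASE),
        lambda m: ("truth", m["a"].lower(), "is married to", m["b"].lower()),
    ),
    (
        re.compile(r"^who is (?P<a>.+?) married to\??$", re.IGNORECASE),
        lambda m: ("list", m["a"].lower(), "is married to", "?x"),
    ),
]


def translate(question: str) -> Optional[Query]:
    """Return the query for the first matching template, or None if none match."""
    text = question.strip()
    for pattern, build in TEMPLATES:
        match = pattern.match(text)
        if match:
            return build(match)
    return None  # caller can fall back, e.g. to a conventional search
```

For instance, `translate("Who is Abraham Lincoln married to?")` yields a list query with one unknown, while an unrecognised sentence yields `None` so that a fall-back path can take over.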
- the knowledge generation subsystem (320) provides facts which are not present in the static knowledge base often by inferring new knowledge from the facts that are present in the static knowledge base.
- the preferred embodiment uses a store of generators (322) which describe patterns of fact which they are capable of generating along with one or more mechanisms to generate these facts. Such a mechanism can be just a query (a dumb generator), or some program code optionally in combination with a query (a smart generator).
- Knowledge generation is described in section 2.4.
- the profile generation system (330) enables the creation of a collection of information about a particular object. In the preferred embodiment this is a web page.
- profile generation is achieved by use of a store of profile templates (332) which specify the knowledge to be displayed, its format and how to obtain it.
- a "static knowledge base" is the term for a computer-accessible persistent store comprising knowledge represented in structured form.
- a persistent store could be a memory or memories of any type capable of holding the knowledge long term.
- various embodiments may hold the data in a long term store but temporarily cache it in a fast non-persistent memory such as RAM for access by other components of the system.
- the static knowledge base is a collection of facts represented using the knowledge representation method of the preferred embodiment described below, stored in one or more relational databases on one or more server computers.
- Knowledge representation is the methodology by which knowledge in structured form is represented within at least the static knowledge base.
- Methods of representing knowledge in structured form include:
- Semantic nets: graph-like representations where the nodes correspond to objects and the edges to relationships.
- Logic: a machine-readable mathematical language of pre-determined syntax used to represent the knowledge. Logics are substantially simpler and more rigorously defined than natural language. Types of logic include predicate logic and propositional logic.
- Frames: represent objects as a set of slots (attributes) and associated values.
- Embodiments of the current invention can contain a static knowledge base containing facts using at least one alternative structured knowledge representation.
- the preferred embodiment uses primarily a combination of simple assertions asserting a named relationship between two objects to represent knowledge.
- the relation can be negative and certain objects can comprise one or more further objects ("parametered objects").
- Each fact is also an object allowing facts to make assertions about other facts.
- Objects are individual entities. They can include physical objects in the real world (individual people, places, buildings etc.), conceptual objects (numbers, organisations etc.), attributes, quantities, classes etc.
- All identified objects have a unique id within the system. This name must be unique to identify the object and in the preferred embodiment should correspond to a common, yet fairly specific natural language noun or noun phrase for the same object (for relations, see section 2.3.3, a present tense central form is used). Instances are usually given the proper name for the object if there is one. If the proper name is not unique then a noun phrase is used including the proper name. In the preferred embodiment these names can include spaces, making them very close to natural language.
- one class of objects in the preferred embodiment is the class of strings (sequences of characters). Instances of this class are simply the string itself put in quotes, e.g. ["William"] is the name for the sequence of characters 'W' 'i' 'l' 'l' 'i' 'a' 'm'; it means nothing more than that. Such objects are useful for stating information used for translation and for parametered objects.
- denotational strings are strings which are used in natural language to denote an object in the system. For example, the strings "Abe Lincoln", "Abraham Lincoln" and "President Lincoln" are denotational strings for former US president Abraham Lincoln, "green" is a denotational string for the attribute green, etc. Denotational strings can also denote objects of all types including relations, classes etc.
- Some classes contain an infinite (or extremely large) number of objects that can be consistently understood in some way. We can choose to denote such objects by a combination of the class name and data.
- the syntax of a parametered object in the preferred embodiment is
- Parametered objects have at least one object within the name as a parameter
- the number of parameters can be fixed for a particular class, e.g. timepoint (a moment in time), or vary, e.g. group (a collection of objects regarded as a single object). For some objects, strings containing the important information are used as the parameter or parameters. This is especially useful where there is already a well-established "real-world" syntax for members of the class.
- a simple example is the class of integers, e.g. [integer: ["8128"] ] . Integers already have a universal syntax and meaning using the digits 0-9 in sequence and the decimal system.
- another example is a chess position, where a standard way of denoting it as strings (and including all the other information such as the side to move and castling rights) has already been established, e.g. [chess position: ["R7/5p1p/5Kp1/8/k6P/p1r5/2P5/8 b - -"]].
- Another common class of parametered objects used in the preferred embodiment is the timepoint class.
- a single string object is used with a format that is not widely used. It is a sequence of integers separated by "/" characters, denoting (in order), the year, the month, the day, the hour in 24-hour clock, the minute, and the second. Any further integers are tenths, hundredths, thousandths of seconds, etc., e.g.
- [timepoint: ["1999/6/3/15/0"]] is 3pm on the 3rd of June 1999 UTC. The accuracy of this timepoint is within one minute. [timepoint: ["1999"]] specifies a "moment" of time but the accuracy is one year.
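The timepoint format described above can be illustrated with a short Python sketch. The names `Timepoint`, `parse_timepoint` and `accuracy` are hypothetical and chosen only for this illustration; the sketch assumes that the accuracy of a timepoint is simply the finest field that was written.

```python
from dataclasses import dataclass

# Field names in the order the text gives them; any integer after "second"
# is a successively finer decimal fraction of a second.
FIELDS = ("year", "month", "day", "hour", "minute", "second")


@dataclass(frozen=True)
class Timepoint:
    parts: tuple  # the integers exactly as written, most significant first

    @property
    def accuracy(self) -> str:
        """Name of the finest unit that was actually specified."""
        n = len(self.parts)
        if n <= len(FIELDS):
            return FIELDS[n - 1]
        return f"10^-{n - len(FIELDS)} second"


def parse_timepoint(text: str) -> Timepoint:
    """Parse the string parameter of a timepoint object, e.g. "1999/6/3/15/0"."""
    # int() raises ValueError for an empty or non-numeric field.
    return Timepoint(tuple(int(p) for p in text.split("/")))


june = parse_timepoint("1999/6/3/15/0")
print(june.parts, june.accuracy)
```

Keeping the integers as a tuple means two timepoints of the same accuracy compare in chronological order with ordinary tuple comparison.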
- Parametered objects are compared by comparing each parameter in turn. If the nature of the class means that order is unimportant (e.g. group) the parameters need to be considered in a pre-determined order (e.g. alphabetical) so that the same objects will be compared as equal.
- parametered objects can also have other parametered objects as parameters.
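The comparison rule for parametered objects, including nested ones, might be sketched as follows. The class `Parametered`, the set `UNORDERED_CLASSES` and the bracketed rendering are assumptions made for this illustration (the rendering is inferred from the examples in the text); plain object names are passed as ordinary strings.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Assumption: the classes whose parameter order carries no meaning.
UNORDERED_CLASSES = {"group"}

Param = Union[str, "Parametered"]


@dataclass(frozen=True)
class Parametered:
    class_name: str
    params: Tuple[Param, ...]

    def canonical(self) -> "Parametered":
        """Return an equivalent object with parameters in a pre-determined order."""
        params = tuple(
            p.canonical() if isinstance(p, Parametered) else p
            for p in self.params
        )
        if self.class_name in UNORDERED_CLASSES:
            # Alphabetical order of the rendered parameter, as the text suggests.
            params = tuple(sorted(params, key=render))
        return Parametered(self.class_name, params)


def render(obj: Param) -> str:
    """Render an object in the bracketed style used by the examples."""
    if isinstance(obj, str):
        return f"[{obj}]"
    inner = "; ".join(render(p) for p in obj.params)
    return f"[{obj.class_name}: {inner}]"


def same_object(a: Parametered, b: Parametered) -> bool:
    """Compare parameter by parameter after putting both in canonical order."""
    return a.canonical() == b.canonical()


g1 = Parametered("group", ("abraham lincoln", "mary todd lincoln"))
g2 = Parametered("group", ("mary todd lincoln", "abraham lincoln"))
print(same_object(g1, g2))
```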
- This nested nature of parametered objects can be extended indefinitely deeply. For example, we could define a class "pair" specifically for objects consisting of exactly two things, e.g.
- unique recognition data is data associated with an object which has the following properties:
- someone's name plus their passport number would be perceivable (people can read names and passport numbers). It would also uniquely distinguish that person from all other people (passport numbers are unique). However, it would not be generally appreciable in most circumstances if that person's name was common as most people do not know other people's passport numbers. As it is not generally appreciable, it would not count as unique recognition data. However, the name of a person, a collection of common details about them and a photograph probably would count, as most people wishing to identify that person are likely to be able to pick out enough detail from the data to uniquely identify that person from anyone else it might be, even if some of the data was not known to them.
- a unique recognition string is unique recognition data coded as a sequence of printable characters, readable and understandable by a human user.
- objects are associated with a unique recognition string. This association is done with a simple fact using the relation [uniquely translates as] (see section 2.3.6 for how facts are asserted). This fact might be generated (see section 2.4).
- the purpose of this string is both to uniquely distinguish the object from all other objects which may have similar names and to do so in a manner which allows this to happen in the minds of all (or almost all) the human users who may see this string and who have some familiarity with the object.
- MacDonald, software developer, date of birth 3rd of April 1975, resident in Cambridge England and employed by Ficton Engineering Ltd may be sufficient for a non-famous person as even people who are not very familiar with that individual will probably see enough of what they know to make an identification.
- an identifying image may be part of the unique recognition data. In cases where everyone who wishes to communicate about the object has seen it (or knows what it looks like), it may be the only unique recognition data.
- Embodiments may use a collection of stored knowledge about the object together as unique recognition data. Embodiments can offer this via a profile of the object (see section 2.7). For example, an embodiment could display the id for the object linked to a profile for the object. If the user didn't recognise the id, they could click on the link to see the profile and use this information collected together to recognise the object.
2.3.3 Relations
- Relationships are things which link together objects.
- the preferred embodiment uses relationships between two objects. Relationships can exist between physical objects and also between physical objects and non-physical objects (concepts), e.g. "John is married to Sarah" is a natural language assertion about a relationship between two physical objects (in this case people). "The apple is green" asserts a relationship between the attribute "green" and the instance of apple being talked about. "The book is about Albert Einstein's career" asserts a relationship between a book and the concept of Albert Einstein's career.
- relationships are also objects. For example:
- [is married to] is the object (relation) that corresponds to the Western concept of marriage between a man and woman, i.e. a formalised monogamous marriage.
- [is an instance of] relates an instance object to a class object, e.g. the relationship between Albert Einstein and the class [human being] .
- [applies to] relates an attribute object to another object, i.e. it says that a certain property applies to something.
- This second object can be anything: an instance, a class, a relation or even another attribute.
- [is a subclass of] relates one class to another and says that the first class is a more specific class than the second and that all objects that are members of the first class are also members of the second. For example, this relationship applies between the class [apple] and the class [fruit] .
- relations are typically named by finding a present tense verb phrase that unambiguously describes the relationship.
- all objects are members of at least one class.
- Classes define objects with similar characteristics. Class information is thus useful for generation and profile screens (see section 2.7).
- An object is related to a class of which it is a member by the [is an instance of] relation.
- Classes are related by the relation [is a subclass of] , so if B is a subclass of A then all objects which are members of B are also members of A. For example all members of [human being] are members of [living thing] because [human being] is a subclass of [living thing] .
- classes can also partially overlap.
- a class could be defined of male living things which would be a subclass of [living thing] with the attribute [male] .
- members would include male human beings as well as male animals while female human beings would be excluded.
- Another example would be the classes (say) [blonde person] and [woman].
- Classes with no member in common have the relation [is a distinct class from]
- the classes in the knowledge base can be considered a tree with the [object] class as the root.
- a permanent class is one where membership by an object cannot change as time goes by.
- the object is a member of that class for the entire timeline, i.e. the properties of the class are so core to objects within it, that it is reasonable to say that the object would cease to be that object (i.e. a different identifier would be needed) if those properties were ever to change.
- An example of a permanent class would be [tree].
- any object which is a tree is always a tree and if something radical were to be done to it to make it not a tree, such as cutting it down and turning it into a table, it is reasonable to think of the new object as a different object with a different identifier.
- the table would be the successor object to the tree but it would be represented as a different object in a different permanent class.
- An example of a non-permanent class would be [lawyer].
- a particular lawyer can only be an instance of this class for part of the time. Prior to qualifying (e.g. during his or her childhood) and perhaps after leaving the profession they would not be a member of the class. However he or she is a member of the class [human being] for the entire timeline as [human being] is a permanent class.
- an object is considered a member of a permanent class for the entire timeline even for parts of the timeline where that object isn't alive or doesn't exist.
- an object is considered a member of a non-permanent class only for the time period when the relevant attributes/class membership applied.
- PC: principal class
- a PC is a class which is considered the most useful in instantly identifying what sort of object something is. In general it should be sufficiently specific a class as to give most of the common properties of an object, yet not so specific as to represent an obscure concept. Examples might include the class of human beings, nation states, trees, cities.
- the PC is useful for quickly stating what an object is in a way that a human user will understand. In some embodiments it can be used by the system for identification purposes too. For example, if several objects have the same name the system may use the principal class in combination with the name to uniquely identify the object to the user.
- depending on the embodiment, all objects must have a PC, or having one is strongly encouraged.
- a class cannot be a PC for some objects and not others which are members of it (i.e. it is a property of the class). For this reason when an object is added to the knowledge base and an assertion is made about a class of which the object is a member, there must normally be a PC on the way up the tree (if the asserted class itself is not principal).
- the PC of the object is the lowest (most specific) principal class of which the object is a member.
- One method for finding the principal class of an object is first to identify the classes of which the object is a member, i.e. a query is done looking for objects to which the entity has the relation [is an instance of] .
- the resulting class objects are then ordered using the [is a subclass of] relation and the most specific class labelled as a principal class is then considered the PC for the object.
- Principal classes are organised so that they are distinct from any other principal class at the same level in the ontology so there are no complications with overlapping (non-distinct) classes which would prevent identifying a single principal class for the object.
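The method for finding the principal class described above can be sketched in Python. The dictionaries `instance_of` and `subclass_of`, the set `principal`, and the sample ontology are all hypothetical stand-ins for queries against the knowledge base; which classes are labelled principal here is an assumption made for the example.

```python
def ancestors(cls, subclass_of):
    """All classes reachable from cls via [is a subclass of], including cls."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(subclass_of.get(c, ()))
    return seen


def principal_class(obj, instance_of, subclass_of, principal):
    # Step 1: every class the object is a member of, directly or by inheritance.
    classes = set()
    for c in instance_of.get(obj, ()):
        classes |= ancestors(c, subclass_of)
    # Step 2: keep the classes labelled principal and pick the most specific,
    # i.e. the one that has every other candidate among its ancestors.
    candidates = classes & principal
    for c in candidates:
        if candidates <= ancestors(c, subclass_of):
            return c
    return None  # no principal class on the way up the tree


# Hypothetical sample data.
subclass_of = {
    "lawyer": ["human being"],
    "human being": ["living thing"],
    "living thing": ["object"],
}
instance_of = {"abraham lincoln": ["lawyer"]}
principal = {"human being", "living thing"}

print(principal_class("abraham lincoln", instance_of, subclass_of, principal))
```

The sketch relies on the property stated above that principal classes at the same level are distinct, so the candidates form a single chain and one of them is the most specific.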
- classes can also be represented in terms of attributes. For example, being a member of the class [human being] can also be thought of as having the attribute [human] .
- a single attribute is equivalent to class membership. For some classes more than one attribute may be equivalent. For others a Boolean equation of attributes may define class membership.
2.3.5 Data/document objects
- URLs are named within the invention by using a parametered class [url] with a single-string object parameter, e.g. [url: ["http://www.semscript.com/"]].
- the relation [is a url of] relates the object name for a document to a URL which contains the document's data.
- Core to the preferred embodiment knowledge representation method is the four object fact.
- the basic syntax is:
- [name of fact]: [object 1] [object 2] [object 3] i.e. four objects listed in order on one line, with a colon after the first one.
- Object 1 and Object 3 can be of any type.
- Object 2 has to be a relation. This fact itself is an object with the name [name of fact] . When asserting knowledge all four objects have to be names.
- the names of facts are of the form [fact.<unique string>@<network.machine.name>]
- the network machine name is, for example, an internet host name
- An alternative embodiment would associate the machine with the fact but include the name of the machine separately from the fact name.
- the other advantage of the fact concept is its lack of complexity.
- a sequence of four objects with an extremely straightforward syntax can be regarded as a permanent atom of knowledge.
- An unordered collection of such atoms can communicate and permanently store real knowledge without any of the problems of natural language.
- Yet another advantage of the representation is that facts such as the above can easily be stored in a standard relational database consisting of four columns with each field being text. Use of indexes means that combinations of known and unknown objects can rapidly be looked up.
- a further advantage is that as each atom of knowledge has a name, it is very easy to represent facts about facts. This is typically how time is represented (see section 2.3.7 below) but could also include knowledge about when the fact was added to the knowledge base, what person or entity added it or any of a large number of other possible assertions. The naming also gives a source that "owns" the fact enabling all sorts of possibilities relating to maintaining and verifying the fact over a network.
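As a rough illustration of the relational storage described above, the following Python sketch stores four-object facts in a four-column SQLite table with text fields. The table name, column names, index names and the helper `lookup` are assumptions made for this example; the two sample facts are the president-of-Peru fact and its temporal partner that appear elsewhere in this description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "name TEXT PRIMARY KEY, object1 TEXT, relation TEXT, object2 TEXT)"
)
# Indexes so that combinations of known and unknown objects can be looked up.
conn.execute("CREATE INDEX by_left ON facts (object1, relation)")
conn.execute("CREATE INDEX by_right ON facts (relation, object2)")

conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("[fact.2143@semscript.com]", "[alejandro toledo]",
         "[is the president of]", "[peru]"),
        # A fact about a fact: its first object is the name of another fact.
        ("[fact.2144@semscript.com]", "[fact.2143@semscript.com]",
         "[applies for timeperiod]",
         '[timeperiod: [timepoint: ["2001/7/28"]]; [iafter]]'),
    ],
)


def lookup(conn, object1=None, relation=None, object2=None):
    """Return facts matching the known fields; None marks an unknown."""
    clauses, args = [], []
    for column, value in (("object1", object1),
                          ("relation", relation),
                          ("object2", object2)):
        if value is not None:
            clauses.append(f"{column} = ?")
            args.append(value)
    where = " AND ".join(clauses) or "1"
    sql = ("SELECT name, object1, relation, object2 FROM facts "
           f"WHERE {where} ORDER BY name")
    return conn.execute(sql, args).fetchall()


# Who is the president of Peru?
print(lookup(conn, relation="[is the president of]", object2="[peru]"))
```

Because every fact has a name, the temporal partner is an ordinary row whose first object is another row's name; no extra schema is needed for facts about facts.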
- Natural language generally asserts or implies tense relative to the present.
- static knowledge can be stored long term and we express time in absolute terms, i.e. we assert that things are true for periods or moments of time expressed as a date/time-of-day and not relative to the moment when they are expressed, i.e.
- Temporal data is associated with facts which in the preferred embodiment assert when the facts are true. Alternative methods are possible but doing this avoids the complexity of having to adjust the meaning of facts from moment to moment as time goes by.
- Temporal partners are facts that reference other facts and make assertions about when another fact is valid, i.e. we represent the temporal data about when a fact is true with one or more further facts.
- for example:
  [fact.2143@semscript.com]: [alejandro toledo] [is the president of] [peru]
  [fact.2144@semscript.com]: [fact.2143@semscript.com] [applies for timeperiod] [timeperiod: [timepoint: ["2001/7/28"]]; [iafter]]
- attributes [young] and [asleep] are examples of transient attributes
- [blood group o] and [sagittarian] are examples of permanent attributes.
- Attributes which apply to a relationship and which are a consequence of their semantics, such as [symmetric] are permanent.
- this method is used for facts used for translating to and from natural language. The reason being partly that their use is in translating questions and statements that happen in the present and thus old versions of these facts are not very useful, partly because they would almost never be used and partly because they change very infrequently. Temporal partners could be included but it would needlessly complicate the translation process. Another common situation where this method is (has to be) used is when querying the system for the current time. A temporal partner for such a fact would be pointless. (An alternative approach for translation knowledge is to make such relations permanent. Although not strictly true, in practice words don't change their meaning very frequently and this approach is practical in a similar way.)
- a third situation where true now methodology is used is when the semantics of the fact are based partly or entirely on what is in the knowledge base.
- the relation [is a direct subclass of] (whether one class is immediately below another in the ontology) has the attribute [relation is true-now] as its meaning is affected by whether an intervening class is present in the knowledge base.
- This relation could exist between two classes and then cease to exist when someone inserted a new class between them.
- Another situation is temporal partners asserting a time period terminating with the [iafter] object. As this can be closed at any time such an assertion uses true now methodology.
- a temporal partner using the object [timeperiod: [timepoint: ["1987"]]; [iafter]] asserts the time period from 1987 until the indefinite future.
- the fact may cease to be true.
- the fact ceases to be true and a new assertion needs to be made with the closing time period being an absolute time point. (Other embodiments could simply update the fact rather than asserting a new fact and labelling the old one as false.)
- the [timeperiod] class is a class of parametered objects where the two descriptive objects are the point in time when the period of time commenced and the point in time when it finished.
- the first is [iafter] which indicates an unknown point in the future. It is used for things that are true at the time they were asserted but which are not guaranteed to remain true.
- the second and third are [time zero] and [forever] which indicate respectively a point in time infinitely long ago and a point in time in the infinite future. They are used to indicate infinite periods of time, for example the object [timeperiod: [time zero]; [forever]] indicates the entire time line and would be used, for example, in a temporal partner for facts that are true by definition.
- the preferred embodiment has a special timepoint called [earliest meaningful point] .
- This is useful for situations where the user may not know or care about the timepoint when the relationship started but knows it was always true for as long as the fact could have been meaningful. In these situations [time zero] may be inaccurate and the alternative would be to just assert a recent time point when the user was sure the relation was true without saying it wasn't true before.
- An example would be asserting that the English city of Cambridge is geographically located within the English county of Cambridgeshire. Neither Cambridge nor Cambridgeshire has existed for all time but for as long as they both existed one has been located within the other. [earliest meaningful point] thus saves the user from investigating what this earliest meaningful date might be.
- facts are categorised as either permanent, true-now or transient.
- Permanent facts have one of the forms:
  <anything> [is an instance of] <permanent class>
  <anything> <permanent relation> <anything>
  <permanent attribute> [applies to] <anything>
  <anything> [applies for timeperiod] [timeperiod: <fixed start>; <fixed end>]
- the Golden Rule is that a relationship cannot both exist and not exist between the same pair of objects at the same moment in time. Contradictions or inconsistencies in knowledge represented by facts are produced by finding or logically generating breaches of this rule. Note that the representation of a timepoint is imprecise no matter how accurately it is specified. In order to create a contradiction we have to show that a relationship between the same pair of objects both existed and did not exist for two overlapping periods of time implied by the accuracy of the timepoint. For example the British queen Victoria was both alive and dead (not alive) in 1901 : she was alive in the part of 1901 before her death and dead in the rest of it. If someone remarries an hour after their divorce goes through they are married to two different people on the same day but without being married bigamously. If, however, you can show that someone was alive for one timeperiod and dead for another and show that the two time periods overlap, only then have you found a contradiction.
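One way to operationalise the overlap test in the Golden Rule is to treat each imprecise timepoint as a range of possible instants and to report a contradiction only when the two periods overlap however that imprecision is resolved. The Python sketch below makes that assumption; the function names, the tuple representation of timepoints and the constant `BIG` are hypothetical.

```python
BIG = 10**9   # stands in for the latest possible value of an unspecified field
WIDTH = 6     # year, month, day, hour, minute, second
FOREVER = (BIG,)


def earliest(tp):
    """Earliest instant consistent with a partially specified timepoint."""
    return tuple(tp) + (0,) * (WIDTH - len(tp))


def latest(tp):
    """Latest instant consistent with a partially specified timepoint."""
    return tuple(tp) + (BIG,) * (WIDTH - len(tp))


def definitely_overlap(period_a, period_b):
    """True only if the periods overlap whatever the true instants are."""
    (start_a, end_a), (start_b, end_b) = period_a, period_b
    latest_start = max(latest(start_a), latest(start_b))
    earliest_end = min(earliest(end_a), earliest(end_b))
    return latest_start < earliest_end


def contradicts(positive_period, negative_period):
    """Golden Rule check for one relationship between one pair of objects."""
    return definitely_overlap(positive_period, negative_period)


# Queen Victoria: alive until 1901, not alive from 1901 (year accuracy only).
alive = ((1819,), (1901,))
not_alive = ((1901,), FOREVER)
print(contradicts(alive, not_alive))           # the periods merely touch
print(contradicts(alive, ((1890,), FOREVER)))  # a genuine overlap
```

With year accuracy the two 1901 endpoints cannot be ordered, so no contradiction is reported, which matches the Victoria example in the text.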
- this golden rule is used to answer "no" to yes/no queries. See section 2.5 for details.
2.3.10 Categories of knowledge
- Various embodiments of the system classify knowledge into certain categories in order to determine appropriate policies and actions for facts within these categories.
- Various embodiments can analyse a fact to determine (in at least some cases) which of these categories it falls into and act accordingly (e.g. when assessing the reliability of a fact or the penalties for it later being contradicted). For example, the [uniquely translates as] relation is always associated with true by choice facts.
- Examples include the capital city of a country.
- Another category is things that are widely believed, yet also untrue according to the strong preponderance of evidence. Some urban myths would fall into this category. In the preferred embodiment these are essentially untrue facts that would be dealt with like other factual knowledge with strict policies for what is needed before they can be asserted and removed using the same methods by which other knowledge is removed. In other embodiments special policies may be needed for knowledge that appeared in this class.
- Various embodiments can store and process fictional knowledge in the knowledge base by clearly labelling all "true by declaration" and "true from evidence" facts as belonging to a specific context (e.g. a fictional movie or novel). This way inappropriate facts can be ignored by the query processing system unless the query is specifically about the context requested. When a specific context is part of the query, all "true by declaration" and "true from evidence" facts not belonging to that context can be ignored and correct answers returned. "True by definition" knowledge can be used across contexts, even fictional ones. This method can also be used to extend the knowledge base to include facts belonging to contexts which are not strictly fiction but would otherwise fail to be considered as fact.
- the unique recognition string (see section 2.3.2.1) of a fictional object must make this clear to avoid any confusion. Thus the unique recognition string for [sherlock holmes] might be ["The fictional detective Sherlock Holmes"].
- Contexts can also sometimes be inferred directly from a reference in a query to an object or relationship that only belongs to one particular context. For example, the question "What is the address of Sherlock Holmes?" would infer the context from the reference to the fictional character. "True from evidence" facts include the assertion of the class membership of an entity (e.g. of its principal class) so the fact [sherlock holmes] [is a member of] [human being] would be associated with a fictional context and not the base context. Some embodiments also use contexts to store conflicting "true by declaration" and "true by choice" facts. For example, when two different authorities disagree. Users can then resolve these conflicts by selecting contexts which they wish to be used when queries are answered. These selections can be permanently associated with a user and used until the user changes them. Knowledge associated with a particular religion can be modelled this way by associating it with a context pertaining to that religion.
- the universe is modelled as a huge array of objects and relationships between pairs of objects.
- named relationships between pairs of objects spring in and out of existence.
- Some of those relationships are known to exist at a particular timepoint, some of those relationships are known not to exist at a particular timepoint (negative facts) and with others the embodiment does not know.
- Queries are a machine-readable representation of a question, i.e. data which communicates to an embodiment what knowledge is desired.
- a number of representations are possible and the representation will often be at least partly determined by the chosen knowledge representation method.
- queries look very much like a series of facts but the purpose is to see whether they can be justified from knowledge found in, or inferred from, the knowledge base rather than to assert information.
- Variables can also replace objects in the facts (including objects within parametered objects).
- the query
  query
  f: [abraham lincoln] [is married to] [mary todd lincoln]
  f [applies at timepoint] [timepoint: ["1859/5/3"]]
  asks the question "Was Abraham Lincoln married to Mary Todd Lincoln on the 3rd of May 1859?".
- Variables can also be used in place of other objects in the facts. For example:
  query a
  f: a [is married to] [abraham lincoln]
  f [applies at timepoint] [timepoint: ["1859/5/3"]]
  asks the question "Who was married to Abraham Lincoln on the 3rd of May 1859?".
- the query representation is extremely elementary in form and yet also extremely expressive in what questions can be represented. This simplicity in form has many advantages for automatic processing and the efficacy of additional techniques. Embodiments with more complicated or additional syntax in the query - e.g. with constructs taken from logic or programming languages — would fail to have these advantages.
- this simple representation means that the semantics of the query is unrelated to the order of the lines. Each line places a constraint on the value or values of each variable within the line. The collection of constraints define the information being sought and the query header specifies what variable values are the results of the query. Although the semantics of the query is unaltered by the line order, some lines may need to be processed prior to other lines in order to obtain results from the knowledge base. The query processing engine is thus free to reorder or choose to process lines in a different order should the query be presented in an order which cannot be processed.
- a more complicated query is the following:
  query a
  a [is an instance of] [nation state]
  t: a [is geographically located within] [the continent of Europe]
  t [applies at timepoint] [timepoint: ["1999"]]
  t1: f [is the capital of] a
  t1 [applies at timepoint] [timepoint: ["1999"]]
  f [commonly translates as] d
  c [is the first letter of] d
  c [equals] ["p"]
  which translates as "Which continental European countries have capital cities whose names start with a 'p' in 1999?".
- the first line will generate a list of several hundred possible values for a (current and former countries) which will be whittled down by the tests in the next few lines (for location within Europe, etc.).
- the capital cities are looked up, translated into strings which are their usual English names and the first letter is checked to be a "p". Any values of a remaining after the last line is checked are returned by the query.
- lines in the query can be regarded as filters if they reference variables that have been mentioned in earlier lines. Such lines reduce the possible values for that variable by doing tests on it, substituting in all previously found values one by one and seeing if the resulting fact can be found (directly or after inference) in the knowledge base.
- if a line uses a variable for the first time it can be regarded as something that generates values, finding all possible values for the variable that are passed downwards. If any values (or combinations of values) survive the generating lines and filters to the end of the query they result in a "Yes" answer for a truth query, or a list of objects for object queries.
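The generate-and-filter behaviour of query lines can be sketched in Python as below. This is a deliberately reduced model under stated assumptions: facts are plain three-object tuples looked up directly (no fact names, temporal partners or inference), a term is treated as a variable when it does not start with "[", and `run_query` and the sample facts are hypothetical.

```python
def is_variable(term):
    # Assumption: named objects are written in square brackets, variables are bare.
    return not term.startswith("[")


def run_query(result_vars, lines, facts):
    bindings = [{}]  # start with a single empty set of variable values
    for line in lines:
        next_bindings = []
        for env in bindings:
            # Substitute all previously found values into the line.
            pattern = tuple(env.get(term, term) for term in line)
            for fact in facts:
                new_env = dict(env)
                for p, value in zip(pattern, fact):
                    if is_variable(p):
                        # First use of a variable generates a value for it.
                        if new_env.setdefault(p, value) != value:
                            break
                    elif p != value:
                        break  # a known object failed the test
                else:
                    next_bindings.append(new_env)
        bindings = next_bindings
    if not result_vars:
        return bool(bindings)  # truth query
    return sorted({tuple(env[v] for v in result_vars) for env in bindings})


# Hypothetical sample knowledge.
facts = [
    ("[france]", "[is an instance of]", "[nation state]"),
    ("[peru]", "[is an instance of]", "[nation state]"),
    ("[france]", "[is geographically located within]", "[the continent of Europe]"),
    ("[paris]", "[is the capital of]", "[france]"),
    ("[lima]", "[is the capital of]", "[peru]"),
]
lines = [
    ("a", "[is an instance of]", "[nation state]"),  # generates values for a
    ("a", "[is geographically located within]", "[the continent of Europe]"),  # filters a
    ("f", "[is the capital of]", "a"),  # generates f, filters a
]
print(run_query(["a", "f"], lines, facts))
```

An empty list of result variables turns the same function into a truth query that answers whether any combination of values survives to the end.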
- the preferred embodiment also contains certain parameters that can be added to lines in a query for efficiency and other reasons. These include: /s means that the current line should only be processed using static knowledge. There is no need to use knowledge generation to find this out (see section 2.4). A typical situation for this is to see whether a common attribute applies. If the attribute is a fundamental property that can be assumed to be always stored statically if it applies, then there is no point in doing anything more complicated to find it, e.g. a line in a query might be:
- this parameter also enables the query to "see" superfluous facts which have been labelled as invisible.
- Various embodiments of the present invention can generate facts not asserted directly in the static knowledge base usually (but not exclusively) by referencing and inferring these new facts from facts in the static knowledge base (and possibly other generated facts).
- One method of doing this is to hard code the generation rules using program code.
- the preferred embodiment takes a more flexible scheme by using generators.
- a "generator” is a stored entity used by the knowledge generation system to generate facts not present in the static knowledge base.
- a generator has one or more target lines which specify a pattern for the facts that can be generated by this generator (these are termed "target lines" herein) in combination with mechanisms for generating facts that match this pattern.
- in a "dumb generator" such a mechanism may simply be a query.
- the query gives values to the unknowns in the target line or lines and the results of the query are substituted into the target line (or lines) to generate the facts, if the query is successful.
- in a "smart generator" there is some program code (termed a "tool" herein), optionally in combination with a query, which is used to generate the facts.
- the query format of the preferred embodiment, although very expressive, is not Turing powerful. This has many advantages in terms of efficient processing of the query but means that some inference steps cannot be achieved without additional processing.
- By adding a Turing powerful step to the header query, as described here, the full universe of possible inference steps can be achieved.
- a simple example of a dumb generator is the following:
  generator a%,b%,tp
  f: a% [is married to] b%
  f [applies for timeperiod] tp
- Dumb generators express inferences about how, for example, the existence of a relationship implies the existence of other relationships or how the existence of an attribute can be used to infer other facts.
- the query answering system first checks information stored statically, and then goes on to look at generators later by matching the line of the query it is currently on with lines in the footer of the generator (i.e. it works backwards). Only the lines marked with an asterisk can be matched. If the line matches, the top of the generator is run as a query (perhaps with values substituted for variables) to see whether the bottom lines can be considered as facts. If they are, the footer facts are generated and the generated facts are added to a cache. Any objects that match variables are included in the answering of the query.
- the character that ends a variable name indicates rules on what can be matched with it. Sometimes, when comparing the current line of a query with the asterisked footer line, a variable will match a variable, sometimes a named object will match a variable, and sometimes a variable will match a named object. Such matches can happen within parametered objects as well as at the top level.
- the percent sign after the variables in the matched line says that the variable can be either left as a variable (i.e. matched with a variable in the query line and filled by the query in the top half of the generator) or textually substituted for a name. If substituted, the variable is removed from the query statement at the top, and the object name is substituted into the header query wherever the footer variable appears.
- A dollar sign following the variable says that the variable must be replaced and textually substituted for a real object name from the query line being looked at; matching with other variables is not permitted and the generator will not be used if that is the kind of match found. If the variable has no percent or dollar sign it must correspond to a variable in the query line. By 'must' we mean that we cannot use the generator if the correct match is not present.
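The suffix rules above can be illustrated with a minimal Python sketch. It assumes that named objects are written in square brackets and that anything else is a variable; the function names are hypothetical and the sketch covers only top-level terms, not parametered objects.

```python
def is_variable(term: str) -> bool:
    """Assumption: named objects are bracketed, everything else is a variable."""
    return not term.startswith("[")


def target_term_matches(target: str, query: str) -> bool:
    """Decide whether one term of a generator target line may match one query term."""
    if not is_variable(target):
        # A named object in the target matches the same name or a query variable.
        return is_variable(query) or target == query
    if target.endswith("$"):
        # '$': must be textually replaced by a named object from the query line.
        return not is_variable(query)
    if target.endswith("%"):
        # '%': may stay a variable or be replaced by a named object.
        return True
    # No suffix: must correspond to a variable in the query line.
    return is_variable(query)
```

A generator would be usable for a query line only if every term of its target line passes this check.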
- the unique fact names for the results of a generator are created automatically by the inference engine and are assigned to variables if they are needed for temporal partners (as with the above example). Facts generated by generators are also inserted into a temporary cache by the engine so they can be quickly found for use in subsequent processing of the query. This cache is checked by the engine even before searching statically-stored local facts. The cache enables facts generated in earlier parts of the query to be accessed without running the generator a second time with the same objects. By keeping a record of what generators with what parameters generated items in the cache, the engine can avoid doing the same operation twice simply by using the cache items.
- This generator is vital as it simply is not practical to list, say, every instant when two people are married as there are an infinite number of instants in any time period. We instead statically store a period of time and if a query asks whether they are married at a given instant the above smart generator is put into action.
- the timeperiod_to_timepoint tool is essentially an executable function
- If the tool determines that the time point lies within the timeperiod, it generates the footer with an appropriate name for the newly-generated fact; otherwise it does not. Note that it is not possible to do this directly using a dumb generator as calculation is needed to determine whether one point in time lies within a named time period.
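The kind of calculation such a tool performs might look like the following sketch. It is not the patent's implementation: it assumes time points are "year/month/day/..." strings, treats both ends of the period as inclusive, and uses `None` as a hypothetical stand-in for an open-ended period. It also ignores the accuracy of the time point, which a real tool would need to handle.

```python
from typing import Optional, Tuple

Timepoint = Tuple[int, ...]


def parse_timepoint(text: str) -> Timepoint:
    """Parse a "year/month/day/..." string into a tuple of integers."""
    return tuple(int(part) for part in text.split("/"))


def timepoint_in_period(point: str, start: str, end: Optional[str]) -> bool:
    """Return True if the point lies within [start, end]; end=None means open-ended."""
    p = parse_timepoint(point)
    if p < parse_timepoint(start):
        return False
    return end is None or p <= parse_timepoint(end)
```

A smart generator built on this check would emit the footer fact only when the function returns `True`.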
- Smart generators can also be used to retrieve highly dynamic knowledge from a conventional database. For example, a smart generator could be written to return the current share price of a particular company by querying systems in the stock market. (This knowledge in turn may be used by another generator to calculate the company's market capitalization.) In this case, as with the example of the current time, the smart generator is retrieving knowledge from a third source rather than calculating from facts originating from the static knowledge base.
- the computer code (“tool") that provides the intelligence to the smart generator is named in the preferred embodiment by name@machine.on.internet
- machine.on.internet is a named machine which owns the tool and where the code can possibly be executed remotely.
- the term "local" refers to the code that can be found on the local machine and/or is part of the local knowledge processing engine.
- the generator description is stored in a relational database which is accessed by the query answering system.
- the name of the tool identifies the computer code to run.
- Many tools are hard-coded within the system and not accessible externally.
- the preferred embodiment also allows for users to add generators including smart generator tools using an interpreted language and an approval step. This is described in more detail in section 2.9.14.
- age@local a [is the age of] [group: b$; tp$]
- the way queries are answered is determined in part by the knowledge representation and query representation method chosen.
- queries can be run in a number of modes. Establish mode simply checks whether values can be found in the knowledge base that confirm the facts: "no" and "unknown” are thus the same result for truth queries.
- Step 402 involves searching for a temporal partner for the first line of the query. If there is one, step 404 is performed: creating a reverse query by making the relation negative (or positive if it is negative), and switching the semantics of the temporal partner between the concept of "within" and "for all of" for the corresponding timeperiod (or, in the case of a time point, the time period implied by the accuracy of the time point). So, the [applies at timepoint] relation is replaced by the [applies for all of timepoint] relation and the [applies for timeperiod] relation is replaced by [applies for some of timeperiod], and vice versa.
- In step 406 the reverse query created is simply the query line with a positive relation made negative or a negative relation made positive.
- The reverse query created in step 404 or 406 is then run, and the result examined (step 408).
- a "yes" answer to the reverse query means that the routine can answer the original query with a "no" (step 410). If the answer to the reverse query is "no", then the answer to the original query remains unknown (step 412).
- For example, although it might be possible for both the facts "John is married to Sarah in 1999" and "John is not married to Sarah in 1999" to be true (if they divorced in that same year), it would not be possible for both to be true if the second statement was instead "John is not married to Sarah for all of 1999", and in this case one statement being true implies that the other is false.
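The construction of the reverse query can be sketched as follows. The `QueryLine` class and `reverse_query` function are hypothetical names introduced for illustration; the relation swaps are the ones named in the text.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Temporal relation swaps described in the text (each pair works both ways).
SWAP = {
    "[applies at timepoint]": "[applies for all of timepoint]",
    "[applies for all of timepoint]": "[applies at timepoint]",
    "[applies for timeperiod]": "[applies for some of timeperiod]",
    "[applies for some of timeperiod]": "[applies for timeperiod]",
}


@dataclass(frozen=True)
class QueryLine:
    left: str
    relation: str
    right: str
    negative: bool = False


def reverse_query(
    line: QueryLine, partner: Optional[QueryLine] = None
) -> Tuple[QueryLine, Optional[QueryLine]]:
    """Flip the sign of the relation and, if present, swap the temporal partner."""
    flipped = replace(line, negative=not line.negative)
    if partner is None:
        return flipped, None
    return flipped, replace(partner, relation=SWAP[partner.relation])
```

If running the reversed pair yields "yes", the original truth query can be answered "no"; otherwise the answer stays unknown.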
- Queryline objects are parametered objects that represent a possible line in a query (excluding the fact name). Each queryline object, therefore, has exactly three parameters. These parameters are either the special object [queryline unknown], which represents a variable, or they are the names of specific objects. For example, the possible line of a query n [is a child of] [president james monroe] and all similar lines with another variable are represented by the single queryline object [queryline: [queryline unknown] ; [is a child of] ; [president james monroe] ]
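As a rough illustration, a query line could be normalised into a queryline object as below. The helper names are hypothetical, named objects are assumed to be bracketed, and the output spacing is simplified relative to the notation in the text.

```python
QUERYLINE_UNKNOWN = "[queryline unknown]"


def is_named_object(term: str) -> bool:
    """Assumption: named objects are written in square brackets."""
    return term.startswith("[") and term.endswith("]")


def to_queryline_object(left: str, relation: str, right: str) -> str:
    """Replace every variable with [queryline unknown] and wrap the three slots."""
    slots = [
        term if is_named_object(term) else QUERYLINE_UNKNOWN
        for term in (left, relation, right)
    ]
    return "[queryline: " + "; ".join(slots) + "]"
```

Lines that differ only in the variable name map to the same queryline object, which is what allows a single fact about the line to cover all of them.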
- Step 504 looks up whether any information is available on the number of objects known to exist for the query. In the preferred embodiment, it does this by converting the query to a queryline object and running a second query to see whether there is a [has order] fact in the knowledge base. If there is no information on the number of objects, the completeness flag is set to unknown (step 506), and that line of the query is run (step 508); the flag will then stay unknown for the remainder of the query. If there is information on the number of objects, it compares the number of results found after executing the line (step 510) with the number of objects known to exist (step 512), as asserted by the queryline fact in the preferred embodiment. If they match, the completeness status is preserved as complete. If the number of objects found is smaller than the number indicated, the flag is set to incomplete (step 514). (If larger, there is an inconsistency in the knowledge base, so the completeness is unknown, and the flag is set accordingly: step 516.)
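The comparison of the result count with the known order reduces to a small decision function. This is a sketch; the function name and the string status values are illustrative choices.

```python
from typing import Optional


def completeness_after_line(found: int, known_order: Optional[int]) -> str:
    """Return the completeness flag after executing the first line of a query."""
    if known_order is None:
        return "unknown"      # no [has order] information (step 506)
    if found == known_order:
        return "complete"
    if found < known_order:
        return "incomplete"   # some objects are missing (step 514)
    return "unknown"          # more found than known: inconsistency (step 516)
```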
- Step 518 checks whether there are further lines in the query. If there are no further lines, the process simply returns the objects found, and the status of the completeness flag. If there are further lines, then, for as long as the completeness flag remains complete, the engine does extra work to determine whether the results it has found so far continue to be complete.
- Subsequent lines in the query may filter the objects found (i.e. the line may include only a variable used to generate the objects on a previous line so when reached it substitutes the previously found objects in and only ones which can be justified survive).
- the completeness status is checked (step 520).
- If the completeness status going into a filtering line is unknown, the remaining lines of the query are executed (step 522), but no further checks on completeness will be undergone (the flag remains set to unknown). If the status is incomplete, the completeness status changes to unknown afterwards no matter what the result (step 524): we do not know whether the missing objects would have passed through the filter or not without knowing what they are.
- If the completeness flag is set to complete it then becomes important to do extra work if the object fails to pass through that line (step 526). If the answer can be shown as a "no" then the completeness status of the query so far is unchanged. If, however, it is unknown, then the completeness flag has to be changed to unknown as well.
- the method used to determine between "no” and “unknown” is exactly the same as the one used to answer a truth query with "no" described above (and illustrated in Figure 4): essentially the relation in the query line is made negative and any temporal partner is added to cover all of the timeperiod specified - if this new query is found to be true we can answer "no" to the original mini-query and preserve the status so far as complete.
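The effect of a filtering line on the completeness flag can be summarised in a few lines. The function name is hypothetical; `failed_answers` is assumed to hold, for each object that did not pass the filter, the outcome ("no" or "unknown") of the reverse-query test described above.

```python
from typing import Iterable


def completeness_after_filter(status: str, failed_answers: Iterable[str]) -> str:
    """Update the completeness flag after a filtering line has been applied."""
    if status != "complete":
        # Both "unknown" and "incomplete" end up as "unknown" after a filter.
        return "unknown"
    # The flag stays complete only if every rejected object is a provable "no".
    if all(answer == "no" for answer in failed_answers):
        return "complete"
    return "unknown"
```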
- One of the desirable (but optional) features of various embodiments is the generation of a justification for its answer to a query. Such explanations are a helpful feature because they demonstrate where the answer "magically" produced came from, thus greatly improving the confidence the user has in the result. Moreover, although the results may have come from a computer, a human being ultimately has to use that knowledge and take responsibility for its accuracy.
- Another advantage in embodiments which include user assessment is that the user has a chance to see where an incorrect answer came from and do something about the incorrect fact or facts that resulted in that incorrect response.
- the preferred embodiment is operable to produce two types of explanation: a detailed explanation which is essentially a step-by-step proof of the answer and a concise explanation designed to give the user a hint about where the result came from.
- Other embodiments may produce one or the other (or none).
- Figure 8 shows an example of both types of explanation in an embodiment. (This figure is described in more detail in section 2.5.6.)
- a data structure which is a linked list of items where each item can either be a string containing a line of natural language (typically describing an event in the processing of a query), or a fact.
- This data structure can either hold the entire explanation, or the explanation for some part of the answering of the query.
- many smaller queries are executed because many of the lines in the query involve the use or possible use of generators and the header queries in the generators need to be run. Some of these generator queries succeed and some fail - when they succeed, the explanation for those queries producing the used fact forms part of the parent explanation.
- the final step is to translate our data structure into the natural language explanation. Translation involves the following three steps:
- the third line can be eliminated.
- Various embodiments including the preferred embodiment can also display a concise explanation.
- this is just the statically stored facts that were referenced on the way to answering the query using a method similar to that described above but with all the inference steps and inferred facts not shown.
- the human user can intuitively understand any inference that was done and any incorrect knowledge used to answer the query is most likely to be in static form. (If the generator is incorrect in some way this can be seen with the detailed explanation which can be selected by the user if they cannot understand what has happened.)
- Purely calculated facts are generated facts which are not inferred from static facts, i.e. they are facts which the generator has sourced from somewhere external to the static knowledge base.
- this abbreviated explanation enables links to be placed next to the static facts referenced thereby allowing the user rapid access to this functionality.
- the concise explanation is also often short enough that it can be displayed under the answer to the question without occupying excessive screen space.
- generating a concise explanation can be achieved by scanning the lines of the detailed explanation and extracting out the facts that came from the static knowledge base (avoiding duplication).
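A minimal sketch of this scan is shown below. It assumes the detailed explanation is held as a plain Python list (rather than a linked list) of strings and facts, and that a static fact is recognisable because it directly follows the introduction line used for statically stored knowledge. The `Fact` class is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Fact:
    left: str
    relation: str
    right: str


STATIC_INTRO = "I know from statically stored knowledge that"


def concise_explanation(detailed: List[Union[str, Fact]]) -> List[Fact]:
    """Collect each static fact once, in the order it first appears."""
    concise: List[Fact] = []
    seen = set()
    previous = None
    for item in detailed:
        follows_intro = isinstance(previous, str) and previous == STATIC_INTRO
        if isinstance(item, Fact) and follows_intro and item not in seen:
            seen.add(item)
            concise.append(item)
        previous = item
    return concise
```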
- Alternative embodiments can generate a concise explanation from scratch without the need to generate a detailed explanation.
- the concise explanation is generated by keeping track of the essential facts which were referenced while the query was being processed.
- Various embodiments may refine the concise explanation to include only an essential subset of the static facts referenced when answering the query in order to make the information presented to the user even more concise.
- Candidates for elimination are the more unintuitive facts such as properties of relationships which users may know intuitively anyway, e.g. [symmetric] [applies to] [is married to].
- facts whose veracity is not in dispute and which have these characteristics are an especially high priority for elimination.
- Facts come from three sources: (1) the static knowledge base, (2) the knowledge generation system and (3) a cache of all facts previously discovered when processing this query (and in some embodiments possibly earlier than processing this query if the cache is not flushed between queries).
- the routines that retrieve from these three sources are static_search, generator_search and cache_search.
- the static facts are stored in a table in a standard relational database (the 'facts' table).
- the table has the objects in the fact stored in fields id, left_object, relation and right_object. Each combination of these is indexed for speed.
- negative: a Boolean field which makes the relation negative (corresponding to the presence of the tilde '~' character when the fact is written out).
- true: whether the system believes the fact is true (set by user assessment and system assessment - see below).
- visible: whether the fact is being used to answer queries. All untrue facts are invisible and some superfluous ones are also invisible in certain embodiments.
- superfluous: whether the fact can be generated by the system anyway.
- contradicted: whether the fact is in semantic conflict with other believed-true facts.
- challengeable: Boolean: whether further user assessment is allowed for this fact.
- last_update: the date and time of the last system assessment of this fact.
- superfluous and contradicted are set by system assessment.
- the true field is set by system assessment (sometimes using user assessment data). User assessment is described in section 2.10. System assessment is described in section 2.11.
- the parameters passed to the static_search routine are:
- the queryline currently being searched;
- a pointer to a list of facts into which the routine will place the static facts that match the queryline (i.e. a place to put the returned facts);
- a pointer to a list of explanations to explain each fact returned;
- a pointer to the query that is being processed.
- When the routine is called it builds a SQL SELECT statement to retrieve the static facts from the table that may match the queryline.
- the WHERE clause also needs to specify the negative field according to whether the relation in the queryline is positive or negative.
- the SQL query is executed to retrieve a list of static facts. Each of these facts is then tested against the queryline if necessary to ensure it matches. The facts that match are added to the fact list with a simple explanation added to the explanation list.
- the explanation consists of two lines: "I know from statically stored knowledge that" and the fact itself.
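The following sketch shows one way such a SELECT statement could be assembled, using Python's built-in sqlite3 module as a stand-in for the relational database. The column names follow the fields listed above; filtering on the visible field, the use of `None` for an unknown, and the function names are assumptions of this sketch.

```python
import sqlite3
from typing import List, Optional, Tuple

COLUMNS = ("left_object", "relation", "right_object")
Queryline = Tuple[Optional[str], Optional[str], Optional[str]]


def build_static_select(queryline: Queryline, negative: bool):
    """Build a parameterised SELECT; None in the queryline marks an unknown."""
    clauses = ["negative = ?", "visible = 1"]
    params: List[object] = [int(negative)]
    for column, value in zip(COLUMNS, queryline):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = ("SELECT id, left_object, relation, right_object FROM facts WHERE "
           + " AND ".join(clauses))
    return sql, params


def static_search(conn: sqlite3.Connection, queryline: Queryline, negative: bool = False):
    """Return (fact, explanation) pairs for the static facts matching the queryline."""
    sql, params = build_static_select(queryline, negative)
    rows = conn.execute(sql, params).fetchall()
    return [(row, ["I know from statically stored knowledge that", row]) for row in rows]
```

Passing values as bound parameters rather than concatenating them into the SQL text avoids quoting problems with object names.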
- the generator_search routine receives as parameters the queryline and a pointer to a list of facts and explanations where the matching generated facts are to be placed. In combination with the generators themselves and tool implementations it forms part of the knowledge generation subsystem in the preferred embodiment. If the queryline ends "/s" generator_search simply exits. If it ends "/1" it exits if or when there is one returned value.
- the first thing it does is assemble a list of generators that are capable of producing facts which match the queryline provided. It does this by matching the queryline against the target lines of the generators and selecting the generators that have one that matches. In embodiments where generators can have more than one line to match, the routine may need to scan later lines in the query to match against the other target lines once the first line has been matched. In these embodiments, a pointer to the query will need to be passed to enable this scanning. For each matching generator it then does the following: If there is a header query it:
- This explanation is the explanation for the set of values used, generated by the processing of the header query, plus an introduction line, plus the facts generated using this set of values.
- the introduction line is "Therefore:” and the name of the generator.
- For smart generators without a header query it is "By calculation:" and the name of the smart generator.
- the cache is where facts previously found using the other two sources are stored.
- the cache contains the facts and the best (shortest) explanation associated with each fact.
- the routine receives a queryline and a pointer to fact list and explanation list as parameters.
- the facts in the cache that match the queryline are to be placed in the fact list and their corresponding explanations in the explanation list.
- the correspondence between the explanation and fact is established by the ordering, e.g. the 5th explanation in the list corresponds to the 5th fact in the list. It also receives a pointer to the query being processed as a parameter. This enables the routine to keep the detailed explanation a little neater by avoiding explaining the same fact twice.
- the process_query routine maintains a record of all the queries that are currently being recursively processed by maintaining a pointer in the query object that points to its parent query.
- Child queries are queries which are being processed to provide answers for another query. That is, a child query is the query that is formed from the remaining lines of a query when the first line is resolved (see below for how this is done) or a query in the header of a generator called when processing a queryline for a parent query.
- the query object holds a 'pre-explanation' which contains the explanation for a set of values which is pending while the remainder of the lines using those values are evaluated. It also contains a standard explanation which is the partial explanation so far for the query.
- the cache_search routine can determine whether this fact has been explained previously.
- one implementation is to hash each fact several times to enable fast lookup even with the unknowns.
- one simple implementation designed to rapidly locate facts in the cache could create three open (externally-chained) hash tables for left_object, relation and right_object pointing at all facts with a named object in the hashed position.
- Possible cache matches for a queryline could then be located by looking up cache facts that match the known object(s)/positions(s) in the queryline.
- a full check needs to be done on the candidates but the hash tables would mean the number of candidates checked was substantially smaller than an exhaustive scan of the cache.
- a faster implementation is to additionally create a hash table for each combination of two known objects, e.g. facts matching a queryline containing a known left object and known relation could be rapidly looked up if all facts were hashed on their objects in those positions.
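The simpler, per-position variant can be sketched with Python dictionaries standing in for the hash tables. The `FactCache` class is a hypothetical name; facts are plain (left, relation, right) triples and `None` marks an unknown in the queryline.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, str, str]
Queryline = Tuple[Optional[str], Optional[str], Optional[str]]


class FactCache:
    def __init__(self) -> None:
        self.facts: List[Fact] = []
        # (slot number, object name) -> positions of facts with that name in that slot
        self.index: Dict[Tuple[int, str], Set[int]] = defaultdict(set)

    def add(self, fact: Fact) -> None:
        position = len(self.facts)
        self.facts.append(fact)
        for slot, name in enumerate(fact):
            self.index[(slot, name)].add(position)

    def search(self, queryline: Queryline) -> List[Fact]:
        buckets = [
            self.index.get((slot, name), set())
            for slot, name in enumerate(queryline)
            if name is not None
        ]
        # Start from the smallest bucket; with no known object, scan everything.
        candidates = sorted(min(buckets, key=len)) if buckets else range(len(self.facts))
        # Full check on the candidates, as the text requires.
        return [
            self.facts[i]
            for i in candidates
            if all(q is None or q == f for q, f in zip(queryline, self.facts[i]))
        ]
```

The faster variant described in the text would add further dictionaries keyed on pairs of positions, trading memory for fewer candidates to check.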
- the process_query routine receives the following parameters:
- a pointer to the query to be processed;
- a pointer to a list of strings used to return variable results.
- the number of sets of results can be determined by dividing the number of strings in the string list by the number of header variables in the query. (For truth queries no variable values are returned.)
- the process_query routine also returns a status value indicating the status of the query when processing has finished.
- the possible return values for truth queries are: - Yes: the truth query can be satisfied.
- Figure 6 shows the process_query method of the preferred embodiment. This figure assumes the query is being run in full mode and that explanations are being generated. (If it isn't, the steps necessary for completeness, answering no and generating explanations can be skipped.)
- In order to avoid infinite loops a record of all querylines currently being recursively processed is maintained, the "unresolved stack". The first thing that is done with the queryline is to check whether it is anywhere in this stack (608). If it is, unknown/completeness unknown is returned (610) and the routine ends. Otherwise the queryline is added to the unresolved stack (612).
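The loop guard can be sketched as a small wrapper around whatever routine actually resolves a queryline. The names `guarded_process` and `looping_resolver` are hypothetical; the second is a deliberately pathological resolver that always recurses on the same queryline.

```python
from typing import Callable, List, Tuple

Queryline = Tuple[str, str, str]


def guarded_process(
    queryline: Queryline,
    unresolved: List[Queryline],
    resolve: Callable[[Queryline, List[Queryline]], str],
) -> str:
    """Return "unknown" if the queryline is already being processed further up."""
    if queryline in unresolved:
        return "unknown"
    unresolved.append(queryline)
    try:
        return resolve(queryline, unresolved)
    finally:
        # The queryline is removed from the stack before completion.
        unresolved.pop()


def looping_resolver(queryline: Queryline, unresolved: List[Queryline]) -> str:
    """A resolver that would recurse forever without the guard."""
    return guarded_process(queryline, unresolved, looping_resolver)
```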
- the "queryline cache” is a record of all querylines that have been successfully processed. By keeping a record of all processed querylines and storing every result matched to a queryline in a cache, the static search and generator search routines can be skipped when the queryline has been processed before, making the routine more efficient. (For this reason both the queryline cache and the fact cache must be flushed simultaneously or not at all.) In step 618 the queryline cache is checked.
- If the queryline has not been cached the static and generator searches are undertaken (step 620) and the queryline added to the queryline cache (step 622). (Either or both of these search routines may be skipped if the queryline ends "/1" and a fact has already been found.) Control then passes to step 624 which sees whether the queryline contains any variables and whether any matching facts have been found.
- If there are no variables and no results, we test for "no" as described above (step 626) and return no/complete if successful (step 628) or unknown/completeness unknown if not (step 610). In either case, the queryline is removed from the unresolved stack before completion (step 611). If there are results or variables in the queryline, control goes to step 630 where a check is made to see whether there are any facts found which match the queryline.
- If there are none, the routine returns unknown/completeness unknown (step 610).
- In step 632 duplicate facts are removed. If there are duplicate facts, the one with the shortest associated explanation is kept. Control then proceeds to step 634 where a provisional return result is set. If it is a truth query the provisional result is yes; if an object query and the order isn't known, the result is completeness unknown; if an order query and the number of matching facts matches the order, the result is set to complete; otherwise the result is set to incomplete.
- Each query has an explanation called a 'preexplanation' that is used to retain a potential part of the query's explanation should the query be successful. It is the explanation for the fact which is being substituted into the remaining lines. It is also scanned by the cache_search routine to avoid explaining the same fact twice.
- Each child query has its preexplanation stored and set as the explanation for the fact being used to generate it.
- the header variables for each subquery are also reduced for each variable that is matched to the current fact. For example if the header query contains the variable "a" and the queryline contains an "a”, the child query will no longer have "a” as a query variable as this is now satisfied in the child query.
- We now have, for each matching fact: a preexplanation of the fact; a set of results for the corresponding query and an explanation for each set; a return value for the query; and a set of header variable values that were determined from the first line (possibly null).
- Success of a child query is defined as follows: an object query returning >0 results; a truth query returning yes; or a truth query returning no when the current query is a truth query and all other child queries have returned no as well.
- In step 638 all duplicate sets of results are eliminated from those that succeeded. When duplicates are located, the result that is retained is the one with the shortest explanation.
- Control then passes to step 640 where the explanations are taken care of. This is done by merging the preexplanation for the fact with the explanation returned by the query that returned the results. This combined explanation is appended to the explanation for the main query and associated with the returned result set by adding it and the result set to the lists passed as parameters to the process_query call.
- In step 642 the return result is calculated and returned.
- the return result is no if all the child queries returned no, yes if any_yes is set and unknown otherwise.
- the return result is completeness unknown if any_unknown is true; otherwise it is the result set provisionally in step 634.
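The final calculation can be sketched as two small functions, one for each of the two rules above. The function names are hypothetical, and treating an empty list of child results as "unknown" is an assumption of this sketch rather than something the text states.

```python
from typing import List


def truth_result(child_results: List[str]) -> str:
    """Combine child results: "no" if all are no, "yes" if any is yes."""
    if child_results and all(result == "no" for result in child_results):
        return "no"
    if any(result == "yes" for result in child_results):
        return "yes"
    return "unknown"


def object_result(provisional: str, any_unknown: bool) -> str:
    """Fall back to the provisional result unless a child reported unknown."""
    return "completeness unknown" if any_unknown else provisional
```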
- One approach in some embodiments is to simply leave it to the person writing the query (e.g. in the translation template) to put the lines into a sensible order.
- Another approach is to add some line reordering code in the process_query routine where a flag is set if the current queryline is potentially producing too many results to store and, instead of just failing, the line is reordered to the end of the query. Failure would only occur if the line failed a second time (when being processed in its new position).
- the question "Is Sean Connery resident in the UK?" has been entered into a web browser connected to an embodiment of the invention (802).
- the question has been entered into the embodiment's "general prompt” (804).
- the embodiment is able to immediately answer the question in the negative (806) and produce a list of the static facts it used to provide that answer (808).
- the key one of importance to the human user is that he has been resident in the Bahamas since at least the 15th of March 1996.
- the static fact expressing this is:
- the process_query routine proceeds line by line as described above in section 2.5.4.
- the first line is readily solved by a smart generator which generated the single fact: [current time] [applies to] [timepoint: ["2006/7/3/11/12/02"] ] satisfying the first line.
- the solution for the variable now was then substituted into the remaining lines to produce the following query: query f : [sean connery] [is living in] [united kingdom] f [applies at timepoint] [timepoint: ["2006/7/3/11/12/02"] ]
- process_query then tries to answer "no" to the question by inverting the relationship and changing the relation in the temporal partner to the corresponding one as described above.
- the resulting query is: query f : [sean connery] ~[is living in] [united kingdom] f [applies for all of timepoint] [timepoint: ["2006/7/3/11/12/02"] ]
- This query is then passed recursively to the process_query routine which sets about trying to justify the first line.
- the tool [equals2@local] is passed the values of a$ and b$ ( [the bahamas] and [united kingdom] ) and simply checks that they are different objects. It then generates the fact: [the bahamas] ~[equals] [united kingdom]
- bahamas [is an instance of] [geographical area] f : [the bahamas] ~[is geographically located within] [united kingdom] f [applies for timeperiod] tp
- the first line of this query is satisfied from the static knowledge that:
- generator tp,c% t a$ [is an instance of] b /s t [applies for timeperiod] tp b [is a subclass of] c%
- bahamas [is geographically located within] [united kingdom] the generator [geog_distinct2@semscript.com] is used: generator t3 a$ ~[equals] b$ f1: a$ [is an instance of] c f1 [applies for timeperiod] t1
- Generator [generator.geog_distinct3@semscript.com] gives meaning to the relation [is geographically distinct from] : generator t v: a$ [is geographically distinct from] b$ v [applies for timeperiod] t
- timeperiod [timeperiod: [timepoint: ["1996/3/15"] ] ; [iafter] ] the tool is able to generate the fact that it is true for all of this timepoint.
- Denotational strings are strings in a specific natural language that denote objects in the knowledge base.
- Denotational strings are linked to their corresponding objects via facts. These facts can be stored statically or in some cases generated by the knowledge generation system. e.g. The facts:
- Generators can also be used to generate denotational strings.
- the tool string_to_id simply converts a string in the form "[<id>]" to [<id>] and creates the fact in the event that the right object is specified and the left not; converts an id to its string form with square brackets around it in the event that only the left object is specified; does nothing if neither are specified; and checks that the two match and generates the fact if they do, if both are specified.
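The four cases handled by the tool can be illustrated as follows. This sketch names its arguments by role (`string_form` and `object_id`) rather than by left or right position, represents both as bracketed text, and returns the generated fact as a triple or `None`; all of these are assumptions made for illustration.

```python
from typing import Optional, Tuple

Fact = Tuple[str, str, str]


def string_to_id(string_form: Optional[str], object_id: Optional[str]) -> Optional[Fact]:
    """Generate a [can denote] fact linking a bracketed string to its internal id."""
    if string_form is None and object_id is None:
        return None                      # neither specified: do nothing
    if string_form is None:
        string_form = object_id          # id only: write it out in string form
    elif object_id is None:
        if not (string_form.startswith("[") and string_form.endswith("]")):
            return None                  # not in the "[<id>]" form
        object_id = string_form          # string only: read the id from it
    elif string_form != object_id:
        return None                      # both specified but they do not match
    return (f'["{string_form}"]', "[can denote]", object_id)
```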
- This generator thus generates all facts of the form: [" [abraham lincoln] "] [can denote] [abraham lincoln] in response to any query line with the relation and at least one specified object.
- This generator enables users to use any internal id to communicate with an embodiment.
- timepoint_parser receives the string s$ (and a% if specified) and sees whether s$ corresponds to any of the various formats that we use to specify points in time. If the string can denote one or more points in time the corresponding facts are generated (after comparing to see if they match a% in the unlikely event that a% is specified).
- This generator can generate facts like: ["the 3rd of January 1992"] [can denote] [timepoint: ["1992/1/3"] ] and ["June 1732"] [can denote] [timepoint: ["1732/6"] ]
- the common translation string is a concept which exists in various embodiments. It is a natural short string that denotes the object in natural language. It need not be unique but needs to be fairly specific and suitable for communication about the object in context. Common translation strings are asserted with the [commonly translates as] relation. An example is: [william jefferson clinton] [commonly translates as] ["Bill Clinton"]
- generators can be used to generate common translation strings for certain special objects such as integers, strings, timepoints etc.
- 2.6.3 Unique translation
- generators can be used to generate unique recognition strings for certain classes of object such as strings, timepoints, parametered objects etc. e.g. [integer: ["8128”] ] [commonly translates as] ["8128"]
- [group: [abraham lincoln] ; [florence nightingale] ] [commonly translates as] ["Abraham Lincoln and Florence Nightingale”] are all examples of translation facts generated by generators.
- the third example uses a smart generator to query the knowledge base for the common translation strings for each object in the group and then ties them together into a list.
- the preferred embodiment contains the following smart generator: [tool.centralpresentformconversion1@semscript.com] generator
- Another string translation is [is an attribute form of] where the form in combination with the second object can be considered a kind of attribute of the first object, e.g. ["the capital of"] [is an attribute form of] [is the capital of]
- the example embodiments described give support for the English language. However the principles described herein can also be used to create embodiments which support other natural languages. There are several thousand living languages used throughout the world and a desirable feature in various embodiments is to provide support to either an alternative language to English or to multiple languages either including or not including English. As the underlying knowledge representation method is distinct from natural language (unlike document based systems) this support can allow access to at least some of the same underlying facts to users communicating in multiple natural languages.
- a translation template in the preferred embodiment contains: the pattern: a sequence of known and unknown strings using variables for the unknown strings;
- An example translation template is: "what is"/"what's" a b
- the top line is the template. Any sequence of three recognised strings where the first is "What is” or “what's” will be matched with this line and the query at the top run to see if it produces results.
- the templates are indexed by facts in the form [<string>] [is part of the translation] [<template name>]. When analyzing the string, we therefore only need to look at a small number of templates which may match; we do not need to scan them all.
- this search for the string in the knowledge base is done with two checks. The first to see if it is labelled as being part of a translation template using the query: query
- Recognised strings can be hashed to save having to check whether they are recognised more than once.
- [is an attribute form of] is a translation relation that describes how English phrases can express a relation in a function-like way. For example, "the spouse of", "the mother of", "a child of", etc.
- [can denote] is the translation relation that relates singular nouns (or noun phrases) to an object name within the knowledge representation system.
- the query is then run and the results will then be substituted into the bottom query as the correct translation of the question: query e
- This query is the correct (and only) translation of the natural language question. This query is then executed as follows:
- the first line will result in a smart generator call to a tool which will give a single value to the variable now.
- the second line will be found in the static database with e given the value [the french city of paris] and f given its fact name.
- the final line will finally be verified by using the smart generator which infers the truth of [applies at timepoint] statements from [applies for timeperiod] statements found in the static database.
- the final line will be verified as true if the current time lies within it (or at least one of them if more than one time period is found).
- Figure 9 shows the method of translating an item of natural language using translation templates.
- Step 902 is to break the natural language question into sequences of recognised substrings.
- Step 904 checks to see whether there are any unprocessed sequences left, and ends the process if there are no more (or none to start with). If there are sequences still to be examined, the next one is selected (step 906) and all translation templates that might translate this sequence are then looked up (step 908).
- Step 910 checks to see whether any of these possible translation templates remain and returns to step 904 if not, otherwise it proceeds to step 912 where the next unprocessed translation template is selected.
- the current translation template is compared with the current sequence of strings (step 914), and if they do not match then control is passed back to step 910. (These steps ensure that every sequence is matched with every possible translation template that might match.) If they do match, step 916 is then done, and substitutions are created between the variables in the template representing unspecified strings and the strings that actually appear in the sequence. These string objects are substituted for those variables in the header query.
- Step 918 which executes the query is then done.
- Step 920 sees whether any results from this query are still to be processed and if so it selects the next set (step 922) and substitutes the results into the translation query to produce a possible translation (step 924). If not, it returns control to step 910.
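The loop of Figure 9 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the dictionaries ATTRIBUTE_FORMS and DENOTES stand in for knowledge base queries, a template is a plain dictionary with hypothetical "pattern", "header" and "translation" entries, and all templates are scanned rather than looked up through the [is part of the translation] index.

```python
# Hypothetical stand-ins for [is an attribute form of] and [can denote] facts.
ATTRIBUTE_FORMS = {"the capital of": ["is the capital of"]}
DENOTES = {"france": ["france"]}

TEMPLATES = [{
    # Tuples are alternative known strings; plain names are variables.
    "pattern": [("what is", "what's"), "a", "b"],
    # Header query: resolve the strings bound to a and b to objects.
    "header": lambda a, b: [{"r": r, "o": o}
                            for r in ATTRIBUTE_FORMS.get(a, [])
                            for o in DENOTES.get(b, [])],
    # Translation query with the header results substituted in.
    "translation": lambda res: [("?e", res["r"], res["o"])],
}]


def sequences(words, recognised):
    """Step 902: every split of the words into recognised substrings."""
    if not words:
        return [[]]
    out = []
    for i in range(1, len(words) + 1):
        head = " ".join(words[:i])
        if head in recognised:
            out += [[head] + rest for rest in sequences(words[i:], recognised)]
    return out


def match(pattern, seq):
    """Steps 914-916: return variable bindings, or None if there is no match."""
    if len(pattern) != len(seq):
        return None
    bindings = {}
    for slot, string in zip(pattern, seq):
        if isinstance(slot, tuple):
            if string not in slot:
                return None
        else:
            bindings[slot] = string
    return bindings


def translate(question):
    words = question.lower().rstrip("?").split()
    recognised = set(ATTRIBUTE_FORMS) | set(DENOTES) | {"what is", "what's"}
    translations = []
    for seq in sequences(words, recognised):              # steps 904-906
        for template in TEMPLATES:                        # steps 908-912
            bindings = match(template["pattern"], seq)    # step 914
            if bindings is None:
                continue
            for result in template["header"](**bindings):  # steps 918-922
                translations.append(template["translation"](result))  # step 924
    return translations
```

Each element of the returned list is one candidate translation, which can then be passed to the ambiguity resolution steps.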
- question templates can also contain fields which help the system translate the question or fact assertion back into natural language. Translating back into natural language has value in demonstrating to the user that the system has correctly understood the question asked. In cases where the question is ambiguous, it also has value in enabling the system to list various alternative understandings of the question asked so that the user can select the one intended.
- the field is a sequence of natural language strings and variables resolved by the queries in the template.
- the system translates the objects into natural language and outputs the sequence of pre-determined strings and translations to generate a translation of the entire question.
- the variables are all generated by a further query (equery) which generates string objects from variables and objects resolved with the other queries in the translation. These string objects are the ones referenced in the translation sequence.
- Ambiguity is where the natural language has more than one potential translation. Ambiguity can sometimes be resolved from other information in the knowledge base.
- semantic constraint knowledge is knowledge about the meaning/use of objects in the knowledge base which limits how they are used by any entity that understands the object's meaning.
- Semantic constraint knowledge can be used to distinguish between translations which are likely to have been intended and those which are unlikely.
- Left and Right classes of a relation are properties of a relation present in some embodiments including the preferred embodiment.
- Left and right classes are a form of semantic constraint knowledge used in the preferred embodiment.
- the engine can resolve ambiguity as a last resort by asking the user for more information. It does this by translating the queries back into English and listing them on the screen. The user then selects the query that he or she intended to ask. Although individual words and phrases translating into multiple objects are a common cause of ambiguity, different translations may also come from different translation templates.
- [is the right class of] and [is the left class of] are permanent relations. Furthermore, in the preferred embodiment the classes they indicate are always permanent classes. This simplifies the ambiguity resolution as there is no need for temporal partners.
- attributes can also have a class associated with them. [human being] [defines the scope of] [unmarried]
- the scope of an attribute is defined by the semantics of the concept the attribute represents and thus provides a sanity check on any interpretation where the object is outside this scope.
- ["single”] [can denote] [single track music recording] so queries can also be generated with lines starting: [single track music recording] [applies to] ... which can be eliminated by the fact that the left class of [applies to] is [attribute] and [single track music recording] is a [class] and not an attribute.
- the template for this question could be:
- the header query will eliminate the [single track music recording] interpretation without semantic constraint knowledge even being needed.
- FIG. 10 shows a process of testing a single translation to see whether it can be rejected. Step 1002 sees whether there are any remaining lines in the current translation that have not yet been tested. If not, the translation is declared OK (step 1004) and the process ends.
- Otherwise, the next unchecked line is selected (step 1006) and a check is made to see whether the relation in the line is a variable or a known object (step 1008). If it is a variable, control is passed back to step 1002, otherwise a check is made to see whether the left object is named (step 1010). If yes, the knowledge base is consulted to see whether the allowed classes of the relation determined by [is the left class of] facts contradict the actual class of the left object (step 1012). If they do, the translation is rejected (step 1014) and the process ends. If the information is not there, or if the class is OK, control passes to step 1016 where a check is made to see if the right object is named.
- At step 1018 a check is made to see whether the query line is a test of an attribute against an object. If it is, a check is made to see whether the object is outside the scope of the attribute (step 1020) and the query is rejected if so. If it isn't, a check is made on the right object against the right class of the relation (step 1022) and again the query is rejected if it fails (step 1014). If all the checks are passed, control passes back to step 1002.
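The checks of FIG. 10 can be sketched in Python as below. The dictionaries are hypothetical stand-ins for knowledge base lookups, variables are assumed to be written with a leading "?", and class membership is a flat set lookup rather than an inference over the class hierarchy, so this is an illustration of the control flow only.

```python
# Hypothetical class membership and semantic constraint facts.
CLASSES_OF = {
    "single track music recording": {"class"},
    "unmarried": {"attribute"},
    "abraham lincoln": {"human being"},
    "eiffel tower": {"building"},
}
LEFT_CLASS = {"applies to": "attribute"}   # [is the left class of]
RIGHT_CLASS = {}                           # [is the right class of]
SCOPE = {"unmarried": "human being"}       # [defines the scope of]


def is_variable(term):
    return term.startswith("?")


def in_class(obj, cls):
    """True or False if membership is known, None if there is no information."""
    known = CLASSES_OF.get(obj)
    return None if known is None else cls in known


def can_reject(translation):
    for left, relation, right in translation:            # steps 1002, 1006
        if is_variable(relation):                         # step 1008
            continue
        if not is_variable(left):                         # step 1010
            cls = LEFT_CLASS.get(relation)
            if cls is not None and in_class(left, cls) is False:  # step 1012
                return True                               # step 1014
        if not is_variable(right):                        # step 1016
            if relation == "applies to" and not is_variable(left):  # step 1018
                scope = SCOPE.get(left)
                if scope is not None and in_class(right, scope) is False:
                    return True                           # step 1020
            else:
                cls = RIGHT_CLASS.get(relation)
                if cls is not None and in_class(right, cls) is False:
                    return True                           # step 1022
    return False                                          # step 1004
```

Missing information never causes a rejection here: a translation is rejected only when a known class contradicts a known constraint.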
- a process for dealing with the results of translation, including rejecting ones that can be rejected, presenting possibilities on the display, and using a fall-back strategy (see section 2.6.8 below) is illustrated in Figure 11.
- Step 1102 obtains a list of possible translations (possibly using the process illustrated in Figure 9 described above).
- Step 1104 tests to see whether there are any remaining translations and if there are not it advances to step 1112. If there are, the next one is selected (step 1106), and it is tested to see whether it can be rejected (step 1108). This step perhaps uses the process described in Figure 10 as explained above. If it can be rejected it is deleted (step 1110) and control returns to step 1104.
- step 1112 tests to see how many translations remain. If more than one translation remains, step 1114 is done: all the remaining translations are displayed on screen, and the user is asked to select the intended one (an example being illustrated in Figure 12 and described in more detail below). If exactly one translation remains, it is assumed to be correct and presented as the answer (step 1116). If no translations remain, step 1118 is done, in which the system reports that it was unable to translate the question and uses a fall-back strategy. This fall-back is described in more detail in section 2.6.8 below.
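The three-way outcome of Figure 11 can be summarised in a short Python sketch. The function name and the callables passed in (can_reject, execute, fall_back) are hypothetical; they stand for the rejection test of FIG. 10, query execution and the fall-back strategy respectively.

```python
def handle_translations(translations, can_reject, execute, fall_back):
    """Filter candidate translations, then answer, ask the user, or fall back."""
    remaining = [t for t in translations if not can_reject(t)]  # steps 1104-1110
    if len(remaining) > 1:                                      # step 1112
        return ("choose", remaining)                            # step 1114
    if len(remaining) == 1:
        return ("answer", execute(remaining[0]))                # step 1116
    return ("fallback", fall_back())                            # step 1118
```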
- Figure 12 illustrates how the question "When was Paris released?” would be dealt with by one embodiment of the present invention.
- the system found eight translations for the string "paris" and created queries for seven of them.
- the one involving the city in France was rejected by the translation template because the initial query asked for the translation to be an [animated visual medium] (but it might also have been rejected later by checks using the semantic constraint knowledge that the left class for [was published at timepoint] has to be an [animated visual medium] ).
- the possible results were translated back into English and presented to the user to select from (screen 1202).
- the user selected one of them by clicking on the link and the result of that selection is the corresponding query being executed and the result displayed (screen 1204). This is achieved by encoding the query as a string and passing it as a parameter in the URL of an HTTP GET request.
- a refinement found in some embodiments is to track the frequency of use of differing objects corresponding to a single denotational string and use this data to suppress very rare interpretations. For example, a contemporary non-famous person named "Abraham Lincoln" would be entitled to have a fact saying that his name can denote him. However, it is very likely that anyone using his name is trying to denote the former US President and being offered a choice every time in such circumstances could cause irritation to users. Avoiding this can be achieved by associating the denotational possibilities (string and object) with each translation used and logging the selection when a user selects from a list of possibilities. When one denotational possibility is noticed to be significantly less commonly used than the others (e.g. if it is the intended selection less than one in a hundred times) the embodiment can choose to suppress it completely or relegate it to a list behind a link (e.g. saying "click here for other less common interpretations").
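A minimal Python sketch of this frequency-based refinement follows. The class name DenotationLog and its methods are hypothetical. The one-in-a-hundred threshold comes from the example above, while the min_samples guard (no suppression until enough selections have been logged) is an added assumption.

```python
from collections import Counter, defaultdict


class DenotationLog:
    """Logs which object users intend for each denotational string."""

    def __init__(self, threshold=0.01, min_samples=100):
        self.threshold = threshold      # "less than one in a hundred times"
        self.min_samples = min_samples  # assumption: avoid acting on sparse data
        self.counts = defaultdict(Counter)

    def record(self, string, obj, times=1):
        self.counts[string][obj] += times

    def split(self, string, candidates):
        """Return (common, rare) candidate objects for the string."""
        counts = self.counts[string]
        total = sum(counts.values())
        if total < self.min_samples:
            return list(candidates), []
        common = [c for c in candidates if counts[c] / total >= self.threshold]
        rare = [c for c in candidates if counts[c] / total < self.threshold]
        return common, rare
```

The rare list can either be dropped or shown behind a "less common interpretations" link, as described above.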
- a further refinement extends this disambiguation strategy further by seeing whether the answers to the various questions are the same before prompting the user to choose between them. If the answers are all the same, the answer is then output instead of asking the user to choose the intended question. With only a relatively small number of possible interpretations a further embodiment may output the answer to each interpretation after each interpretation instead of letting the user select first.
- Two questions having the same answer may happen by coincidence when (say) the objects being identified have the same answer.
- a question asking the nationality of a person where the name entered denotes two different people need not ask the user which of these two people is meant if they both have the same nationality.
- Another example is when the question is parsed in two distinct but nevertheless semantically similar ways.
- the phrase "british city" within a question may be parsed as identifying a specific subclass of cities [british city] or it may be parsed as identifying members of the class [city] with the attribute [british] .
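The same-answer refinement can be sketched in Python as follows. The function name and the run_query callable are hypothetical, and answers are assumed to be comparable with ordinary equality.

```python
def answer_or_ask(interpretations, run_query):
    """Answer directly when every interpretation yields the same answer."""
    answers = [run_query(q) for q in interpretations]
    if answers and all(a == answers[0] for a in answers):
        return ("answer", answers[0])
    # Otherwise return each interpretation with its answer so that the
    # embodiment can prompt the user or list the answers side by side.
    return ("ask", list(zip(interpretations, answers)))
```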
- An additional refinement present in some embodiments is to eliminate duplicate queries.
- Multiple translation templates may produce identical queries from a different way of viewing the translation.
- Eliminating the duplicates when this happens involves a test for equality. Testing queries for equality can be done with the following steps:
- Sorting the lines of the query into a pre-determined order (unaffected by variable names). This can be achieved by assigning all variable names a fixed value and sorting the lines into alphabetical order.
- Normalising variable names. This can simply be done by renaming the variables in the order they appear in the sorted lines, taking variable names from a pre-determined list, e.g. v1, v2, v3 etc. A substitution table is maintained so that variables that have already been renamed can have their new name substituted in. The header variables also need looking up and substituting from this table. Equality is then a matter of testing for: an identical sequence of lines, and the same set of header variables.
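The two steps above can be sketched in Python as follows. A query is assumed to be a pair of header variables and lines, each line being a tuple of terms, with variables written with a leading "?"; this representation is an assumption made for illustration. One limitation of this simple version: lines that are identical apart from variable names tie in the sort, so their relative order depends on the input order.

```python
def is_var(term):
    return term.startswith("?")


def normalise(header_vars, lines):
    # Step 1: sort the lines with every variable replaced by one fixed value,
    # so that the order does not depend on the variable names chosen.
    ordered = sorted(lines, key=lambda line: tuple(
        "?" if is_var(t) else t for t in line))
    # Step 2: rename variables in order of appearance, keeping a
    # substitution table so repeated variables get the same new name.
    table = {}
    renamed = []
    for line in ordered:
        new_line = []
        for term in line:
            if is_var(term):
                if term not in table:
                    table[term] = f"v{len(table) + 1}"
                new_line.append(table[term])
            else:
                new_line.append(term)
        renamed.append(tuple(new_line))
    # Header variables are looked up in the same substitution table.
    header = frozenset(table.get(v, v) for v in header_vars)
    return tuple(renamed), header


def queries_equal(q1, q2):
    """Equal if the normalised lines and header variable sets are identical."""
    return normalise(*q1) == normalise(*q2)
```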
- When the engine fails to translate the natural language text entered by the user, it can do better than simply say "Sorry".
- the program lists all the sub-strings of the question that it has recognised. This information gives feedback to the user about how close the system came to understanding the question and which bits were not understood.
- the string profile screen includes any objects that are denoted by the string. Clicking on those gives a profile screen for the object. It is possible that a standard profile for a recognised object will answer the question that the user asked even though the question was not fully understood.
- the preferred embodiment can often translate assertions of fact using a method almost identical to the question translation described above in section 2.5.
- This is achieved by the creation of an entity called a FACTLIST which looks a lot like a query but with no variables.
- a FACTLIST is simply a list of assertions of fact.
- To translate assertions from natural language the template simply has a FACTLIST as the result of the translation instead of a query.
- the preferred embodiment will then also prompt for when the fact is true.
- ambiguity resolution techniques described above can also apply to FACTLISTs as semantic constraint knowledge applies to facts as well as querylines.
- a FACTLIST can be looked at as structurally similar to a truth query.
- the preferred embodiment provides the user with an unambiguous retranslation of their question back into natural language which is done without referencing the original question provided by the user. As seen above this enables the user to have confidence that their question has been correctly understood. In the case that there are several interpretations of their question, it also enables the user to select the intended one.
- Various other embodiments are also operable to translate a query into natural language if these fields are absent or if the query came from somewhere other than being the output of the translation system.
- the unique recognition strings can be looked up with a query.
- Similar special cases can be generated with either the left object or right object unknown or when the timepoint is specified.
- When the left and right objects are unknown, various embodiments can refine the language by checking the [left unique] and [right unique] properties of the relation.
- Other common patterns of queries can be translated by similar matching.
- the fall-back translation can be used when the query doesn't match any of the checked-for patterns. It may be less natural than a pre-determined translation but can still be understandable. It can be implemented in some embodiments by:
- Determining the most specific likely class for each variable in the query. This can be achieved by using the semantic constraint knowledge to determine a class based on the variable's position within a query line and selecting the smaller class if more than one is generated (distinct classes would imply a query that cannot be answered).
- the class will start as [object] (the root class) .
- a “profile” is a collection of user-perceivable information pertaining to a specific object represented within the system.
- Profile generation is the facility for an embodiment of the invention to generate profiles.
- the user perceivable information is an information screen delivered as a web page. It is commonly used when users wish to find out general information about an object rather than something specific (where they may choose to type a question instead).
- the preferred embodiment also implements its profile generation system by the use of multiple profile templates.
- Profile templates are data which describe the general form of a profile and, in combination with knowledge extracted from the system, enable the profile generation system to generate a profile for a specific object.
- a translation template exists which will translate a single denotational string of an object to a specially formatted query starting "profile:”. Queries matching this format are passed to the profile system for rendering instead of to the query answering system, thereby generating an information page. This enables users to see a profile for an object just by typing a denotational string which can denote that object.
- the profile generation system of the preferred embodiment includes the ability to generate a profile of an object showing key information about the object in a standard form. Any object within the system can be the subject of a profile, including objects, classes, relations, facts etc.
- the information shown about an object, and the format in which it is displayed, is a consequence of the profile template selected and the class the object belongs to: for example, a profile of a human being might include information about their date of birth and occupation, while a profile of a fact might include information about when the fact was asserted and by whom.
- Profiles in the preferred embodiment can contain both knowledge from the knowledge base (e.g. Abraham Lincoln's date of birth) and information about the knowledge base (e.g. the history of people endorsing a fact). That is, even if the implementation of the embodiment stores certain system specific information outside the static knowledge base the embodiment can choose to display it in a profile.
- the system also allows that the same class of object may have multiple types of profile available for different purposes. These different types of profile may be formatted in different ways, and may also contain different information. For example, the 'employment' profile of a human being might show their current and previous occupations, while the 'family' profile of the same human being may show their parents, spouse and children.
- This embodiment could still show emphasised profiles in a similar fashion by adding classes to accommodate multiple profiles.
- the family profile described above could be attached to a [human being with family] class, essentially with the same members as [human being] .
- the data about what information is included in a particular profile and how it is formatted is encapsulated in the template.
- profiles are output as HTML for display to the user, but other embodiments may include output of profile information in any perceivable format, even including non-visual formats such as synthesised speech.
- Figure 13 shows an example of the profile system in operation in the preferred embodiment.
- the object [abraham lincoln] is being profiled through several different profiles.
- Screen 1302 shows him being profiled through a special profile designed specifically for members of the class [us president] (current and former Presidents of the United States). This is the narrowest class of which [abraham lincoln] is a member and is the default if nothing else was specified.
- This screen gives information specific to this class such as the start and end dates of his term of office and his predecessor and successors in the job.
- Each profile screen contains a drop-down list of classes of which the object is a member and which have one or more profiles attached to them (1304).
- In screen 1306 the user has switched the selection from "us president" to "human being" and is now being shown [abraham lincoln] through the default [human being] profile.
- US president related knowledge is absent but information common to all humans is shown, including date of birth, place of birth and marital status (the marital status fact is at death for deceased people and the current time for live ones in this embodiment).
- a second drop-down list enables the user to navigate between profiles for a specific class (1308).
- In screen 1310 the user has selected the "family" profile for [human being] and the system has responded with a screen emphasising Abraham Lincoln's family members.
- Figure 14 illustrates how the profile system can also display knowledge stored outside the static knowledge base and how profile screens can be linked together.
- Screen 1402 shows a profile screen of a single fact in the static knowledge base. It describes the fact (1403), giving details of any temporal partners (or subject facts) with links, gives access to user assessment (see section 2.10) by providing endorse (1404) and contradict (1406) buttons, gives the status of the fact (1408) and provides a button to immediately redo the system assessment (1409) (System assessment is described in section 2.11). It also provides an endorsement/contradiction history of the fact (1410). Screen 1412 is a standard [human being] profile that could be obtained by clicking on any of the links under [william tunstall-pedoe] in screen 1402.
- Screen 1414 is the [human being] profile with the emphasis on their contribution to adding knowledge to the illustrated embodiment.
- This subcategory of the [human being] template is labelled "worldkb user”. It contains statistical information about the number of facts reported, as well as listing recent fact assertions and assessments by this user which can be browsed by clicking on the link to open the relevant corresponding profiles.
- the choice of profile template is a function of a particular class that the object belongs to (called the “profile class”) and a string (called the “profile type”), both of which are optionally specified by the user. If one or both of these parameters is unspecified, the behaviour is as follows:
- the system finds the most specific class to which the object belongs which has a profile template. This is achieved (in the preferred embodiment) with the following steps shown in Figure 15.
- this process will yield only one result, but if there is more than one the system can prompt the user to choose between these possibilities, or the system can choose automatically based on some deterministic criteria (for example, choosing the most frequently used profile class). Other embodiments may attempt to determine which class has the smallest number of elements.
- the [us president] class will be used since this is the narrowest class in the set. If the profile type parameter is unspecified, the string "default" is used.
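The selection of a profile template from an optional profile class and profile type can be sketched in Python as below. The dictionaries, the file names and the single-parent class hierarchy are hypothetical simplifications. Where more than one candidate class is found, this sketch simply takes the first, whereas an embodiment may prompt the user or apply some other deterministic rule.

```python
# Hypothetical stand-ins for knowledge base facts and stored templates.
SUBCLASS_OF = {"us president": "human being", "human being": "object"}
MEMBER_OF = {"abraham lincoln": ["us president"]}
PROFILE_TEMPLATES = {
    ("us president", "default"): "us_president.xml",
    ("human being", "default"): "human_being.xml",
    ("human being", "family"): "human_family.xml",
}


def ancestors(cls):
    """The class followed by its superclasses, narrowest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def candidate_classes(obj, profile_type):
    """Most specific classes of obj that have a template of this type."""
    found = []
    for direct in MEMBER_OF.get(obj, []):
        for cls in ancestors(direct):
            if (cls, profile_type) in PROFILE_TEMPLATES:
                if cls not in found:
                    found.append(cls)
                break
    return found


def select_template(obj, profile_class=None, profile_type=None):
    profile_type = profile_type or "default"
    if profile_class is None:
        candidates = candidate_classes(obj, profile_type)
        if not candidates:
            return None
        profile_class = candidates[0]  # assumption: first candidate wins
    return PROFILE_TEMPLATES.get((profile_class, profile_type))
```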
- Alternative embodiments may use a procedure for selecting a profile template that can be customised to suit a particular user.
- the template is expanded to generate a profile screen to display to the user (1508)
- When a profile contains transient facts, it may be that the facts in question do not have meaningful values at the current time because the object in question no longer exists.
- the preferred embodiment deals with this by showing a profile for the last time at which the object existed (e.g. a dead person's date of death).
- Other embodiments may deal with this in various ways, including prompting the user for a different timepoint to generate data for, displaying a historical view of all values of data over the course of the object's lifetime, only displaying values which are applicable at the current time, or a combination of these techniques.
- profile templates are stored as XML documents.
- the template can intersperse XHTML nodes (which have their ordinary meanings regarding formatting content) with system-defined nodes (which have special behaviour associated with them).
- system-defined nodes can contain arbitrary XML data inside them (including XHTML nodes, other system-defined nodes or character data) and carry out a variety of operations, including
- nodes can be combined with each other to carry out arbitrarily complex operations.
- Figure 16 shows the process of expanding the profile template. At the beginning of this process, the profile template is selected as described above (1602).
- the output of this parse process is a tree structure where each node is represented by an object.
- Each node object has an (ordered) array of references to child node objects, and a single reference to a parent object.
- Each node object can have an arbitrary list of parameters, extracted from the node attributes in the original XML source, which can affect the output of the subsequent processing step.
- the preferred embodiment uses an object-oriented model where each node object is an instance of a class document_node, or some subclass.
- the class document_node provides a method called render(), which can be overridden by child classes to provide special behaviour for these nodes.
- a namespace prefix is used. For the purposes of this document, the prefix 'tmpl' will be used to identify nodes relevant to the templating system, although any prefix could be used so long as it is consistently applied.
- node objects will perform the render() method on each of their children in turn, although particular types of node object may override this behaviour.
- the results of each of these render functions are combined together (in a way that may depend on the type of the node in question) and returned to the caller
- the value returned by the root node of the parse tree is the HTML document to be displayed to the user.
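A minimal Python sketch of the parse and render steps is shown below. It uses the standard xml.etree.ElementTree parser to build a tree of node objects; the class names document_node and character_data_node follow the text, while the helper build, the constructor signatures and the escaping behaviour are assumptions made for illustration. System-defined 'tmpl' nodes are not handled here.

```python
import xml.etree.ElementTree as ET
from html import escape


class document_node:
    def __init__(self, tag=None, params=None, parent=None):
        self.tag = tag
        self.params = dict(params or {})  # taken from the XML attributes
        self.parent = parent              # single reference to the parent
        self.children = []                # ordered references to children

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def render(self):
        # Default behaviour: render each child in turn and combine the results.
        inner = "".join(child.render() for child in self.children)
        if self.tag is None:
            return inner
        attrs = "".join(f' {k}="{escape(v)}"' for k, v in self.params.items())
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


class character_data_node(document_node):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def add_child(self, child):
        raise ValueError("character data nodes cannot have children")

    def render(self):
        return escape(self.text)


def build(element):
    """Convert a parsed XML element into a tree of node objects."""
    node = document_node(element.tag, element.attrib)
    if element.text:
        node.add_child(character_data_node(element.text))
    for child in element:
        node.add_child(build(child))
        if child.tail:
            node.add_child(character_data_node(child.tail))
    return node
```

Calling render() on the root node returns the complete mark-up, mirroring the statement that the root node's return value is the document shown to the user.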
- Figure 17 shows an example template expansion
- the example template is shown at 1702. This includes two query objects (1708 and 1710), which fetch information from the knowledge base. It also includes two value-of nodes (1712 and 1714), which identify places within the mark-up where the results of these queries will be embedded.
- the output will be HTML suitable for displaying to the user, with the corresponding values expanded (1706).
- 2.7.4 Template node class hierarchy
- the character_data_node represents character data from the XML document. Identifying which parts of the template XML to treat as character data is the job of the XML parser. Nodes of this type are forbidden to have any child nodes (attempting to add a child node throws an exception). During the parse phase, the character data is copied from the template document.
- values prefixed with a '$' symbol indicate special variables, which may be expanded by the profile system to allow information about the environment to be passed in to the profile.
- the variable '$object' will be replaced with the ID of the object that is being profiled, which can be used both in knowledge base queries and in text to be displayed to the user. This is seen in Figure 17 where '$object' is expanded to the string [sean connery] during profile expansion.
- a query_node can carry out a query to the knowledge base or to any other source of data (e.g. SQL database) accessible by the system on which the template expansion is executing.
- a query_node object is instantiated when a tmpl:query node is encountered in the source template.
- this query is conceptually carried out when the query_node is first encountered (though execution can in fact be delayed for optimization purposes).
- An iterator is a pointer that runs through values in a result set and executes other nodes for each value.
- a class that inherits from iterator_controlled_node is one that will vary its behaviour depending on the presence or otherwise of an iterator that can control it.
- the iterator_controlled_node class has an abstract method find_controlling_iterator, which implements the logic for searching through the page hierarchy for an iterator that controls the output of this node.
- 2.7.4.5 value_of_node (extends iterator_controlled_node)
- An instance of the value_of_node class is generated by a tmpl:value-of node in the source XML. It is forbidden to have any child elements.
- This node selects only one variable from a result set: this variable is specified by the select attribute.
- the query from which to select results is specified by the "query" attribute.
- the value selected from the result set may be influenced by a controlling iterator.
- a value_of_node will regard another node as a controlling iterator if it satisfies all of the following conditions:
- the iterator is in the node hierarchy above the current node
- the iterator is selecting from the same query as the current node
- the iterator is selecting the same result variable as the current node. If a controlling iterator is found, then the node requests the current value of the select variable for the controlling iterator.
- If no controlling iterator is found, the value_of_node selects the value of the variable specified in the result set specified. If there is more than one result in the specified result set, then it will take the first result in this set (according to the default ordering of this result set).
- the for_each_node object is generated from a tmpl:for-each node in the source XML. It is an iterator that acts on a result set from a query_node object.
- a variable to select can also be specified, in which case the iterator ranges over distinct values of this variable.
- for_each_node can also itself be controlled by an iterator, allowing for nested loops.
- a for_each_node will regard another iterator as a controlling iterator if it satisfies the following conditions:
- the iterator is in the node hierarchy above the current node
- the iterator is selecting from the same query as the current node
- the iterator is selecting a different result variable to the current node.
- Figure 18 shows part of an example template being transformed.
- This template is designed to produce a list of European countries and cities within them, formatted as HTML.
- the template draws from a data set (1804).
- a sample data set showing a possible result of the "european_cities" query is shown at 1814.
- When iterator A is rendered, it searches for a controlling iterator and finds none. Therefore it uses the entire result set, and iterates over distinct values of the variable "country". It renders all its child nodes once for each of these three values.
- the value-of node on line 13 finds iterator B as its controlling iterator, and displays the current value of this iterator each time it is executed.
- After these child nodes have been executed, iterator A then carries out the same process with each of the remaining elements in its result set. Iterator C has no controlling iterator, so it simply iterates over all distinct values of the "City" variable. The value-of node on line 20 is controlled by iterator C, and displays the corresponding value each time it is executed.
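- The effect of a controlled iterator nested inside an uncontrolled one can be sketched with plain loops. The rows below are hypothetical and do not reproduce the data set of Figure 18; the assumption that a controlled iterator only sees rows matching its controller's current value is an interpretation of the text, as are the function names and the HTML tags chosen.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical rows standing in for the "european_cities" result set.
ROWS: List[Row] = [
    {"country": "France", "city": "Paris"},
    {"country": "France", "city": "Lyon"},
    {"country": "Germany", "city": "Berlin"},
    {"country": "Spain", "city": "Madrid"},
]


def distinct(rows: List[Row], variable: str) -> List[str]:
    """Distinct values of `variable`, in first-seen order."""
    seen: List[str] = []
    for row in rows:
        if row[variable] not in seen:
            seen.append(row[variable])
    return seen


def render(rows: List[Row]) -> str:
    """Outer loop over countries; inner loop over that country's cities."""
    lines: List[str] = []
    for country in distinct(rows, "country"):      # iterator with no controller
        lines.append(f"<h2>{country}</h2>")
        controlled = [r for r in rows if r["country"] == country]
        lines.append("<ul>")
        for city in distinct(controlled, "city"):  # iterator controlled by the outer one
            lines.append(f"<li>{city}</li>")
        lines.append("</ul>")
    return "\n".join(lines)
```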
- An attribute_node modifies an attribute on the parent node it belongs to. It is generated by a tmpl:attribute node in the template XML.
- the attribute_node first renders all its child nodes and concatenates the result. First, the character_data node corresponding to the string "http://" is rendered, and then the value_of_node is rendered (fetching a result from the specified query). These two results are concatenated (to produce a valid URL) and returned to the <img> node.
- the <img> node sets the resultant URL string as an attribute (with the name "src") on the node when it produces the opening XML tag. Therefore, the output might look like this:
- An instance of if_node is created in response to a tmpl:if node in the template source.
- the if_node allows a condition to be specified.
- the condition is evaluated, and if it evaluates to true the content of the child nodes is included, otherwise the child nodes are ignored.
- An instance of choose_node is created in response to a tmpl:choose node in the template source. It acts similarly to a switch statement in C, i.e. it conditionally executes one of several branches depending on which condition is satisfied.
- the choose_node expects its children to be of type when_node or otherwise_node, and will execute the first one in the list for which the corresponding condition is satisfied.
- 2.7.4.10 when_node (extends document_node)
- a when_node has a condition attached to it, which has to evaluate to true in order for the parent choose_node to execute it.
- the condition attached to a when_node may be an arbitrarily complex Boolean expression, and may include, among other types of operation, fetching results from query_node objects.
- An otherwise_node is equivalent to a when_node whose condition always evaluates to true. This has the effect that the branch below this node will be executed if and only if none of the previous when_node conditions evaluates to true.
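- The first-match behaviour of choose, when and otherwise can be sketched as follows. The `Branch` class, the helper names and the example conditions are assumptions made for illustration; conditions are modelled as plain callables rather than parsed template expressions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Branch:
    """Hypothetical stand-in for a when_node or otherwise_node."""
    condition: Callable[[], bool]
    content: str


def when(condition: Callable[[], bool], content: str) -> Branch:
    return Branch(condition, content)


def otherwise(content: str) -> Branch:
    # An otherwise branch behaves like a when branch whose condition is always true.
    return Branch(lambda: True, content)


def choose(branches: List[Branch]) -> Optional[str]:
    """Return the content of the first branch whose condition holds, if any."""
    for branch in branches:
        if branch.condition():
            return branch.content
    return None


population = 2_100_000  # illustrative value only
label = choose([
    when(lambda: population > 5_000_000, "large city"),
    when(lambda: population > 1_000_000, "medium city"),
    otherwise("small city"),
])
```

- Because branches are tested in order, placing the otherwise branch last means it is reached only when every earlier condition has failed.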
- a macro_node defines a section of node tree that can be repeated later on in the document with certain parameter values expanded.
- the "name" attribute defines a name that will be used to denote a call to the macro later on.
- the "params" attribute is a comma-separated list of parameters that will be made available when invoking the macro later on.
- the preferred embodiment requires the system to know who is using the knowledge base when changes to the knowledge base are asserted (e.g. addition of knowledge or user assessment).
- One embodiment of the present invention uses a local identifier for users in similar fashion.
- the real-world identifiers within the system are used.
- Other embodiments combine both schemes, allowing authentication with local user entities and real entities and/or a subsequent step of linking the local entities to a real-world id.
- the process of authentication in this embodiment is illustrated in Figure 19.
- In order to log on to the system the user must first assert his/her real name, identifying him/herself in the same way that any other object is identified (step 1902 - the "select_object" process with [human being] as a parameter, described in section 2.9.6).
- the process checks to see whether that entity has an associated password (step 1904). If a password exists, the user then authenticates him/herself with that password (step 1906). The system associates the user's real-world identifier with that session of interaction with the system.
- the real world identifier is the same one as identifies the person within the knowledge base.
- the system first prompts the user to say who he/she is (step 1902). The user responds by entering his/her name (e.g. "Michael Smith"). The system then looks up this natural language string in the knowledge base ["michael smith"] to see which entities it could denote. If it only denotes one entity, the system moves immediately on to prompting for a password (step 1906). If two or more entities in the system are denoted by this string, the system lists the unique recognition strings for these entities and asks the user to select which entity he/she is (e.g. "Are you (1) Michael James Smith, date of birth 29th January 1969; (2) Michael R.S. Smith, the children's book author").
- This screen also has a link to follow to add a new entity if none of the alternatives are correct (see section 2.9.7.1).
- the user can also short-cut any ambiguity by entering the internal object name in square brackets (e.g. [michael james smith 32] ).
- the square brackets show that he/she is entering an internal name and not a natural language name.
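- The name-resolution step of logging on can be sketched as a lookup that treats bracketed input as an internal name. The index contents, the second internal name, and the function and action names are hypothetical; only the bracket convention and the three outcomes (no match, one match, several matches) come from the description.

```python
from typing import Dict, List

# Hypothetical denotation index: lower-cased string -> internal object names.
DENOTATIONS: Dict[str, List[str]] = {
    "michael smith": ["michael james smith 32", "michael r s smith"],
    "jane doe": ["jane doe 7"],
}
KNOWN_IDS = {name for names in DENOTATIONS.values() for name in names}


def resolve_user_entity(text: str) -> List[str]:
    """Return candidate internal names for what the user typed."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        # Square brackets mark an internal name, so no denotation lookup is needed.
        internal = text[1:-1].strip()
        return [internal] if internal in KNOWN_IDS else []
    return DENOTATIONS.get(text.lower(), [])


def next_action(candidates: List[str]) -> str:
    """Map the number of candidates to the next prompt."""
    if not candidates:
        return "add_new_entity"
    if len(candidates) == 1:
        return "prompt_for_password"
    return "ask_user_to_choose"
```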
- the password entered by the user is checked for validity (step 1908), and if invalid, another opportunity given to enter the correct password (step 1910).
- If the entity trying to log on to the system is not present, he/she is first taken through the process of adding him/herself as an object to the system using the normal object addition sequence of screens/prompts (see section 2.9.7.1). In the preferred embodiment this is the one situation where an unauthenticated user entity is allowed to add knowledge.
- the knowledge asserted is labelled as coming from the entity added.
- a password is prompted for (twice to guard against the possibility of mistyping) to be associated with this entity and used for authentication in the future (step 1912).
- the password entered by the user should be checked for suitability (step 1914), and if unsuitable an opportunity given to enter a better password (step 1916).
- the password created by the user is then associated with the entity in the knowledge base (step 1918).
- the user entity can be logged in (step 1920).
- a check is then performed to see whether or not the user is a new addition to the knowledge base (step 1922). If the entity had to be added as a new object, his/her user rank and the time when he/she became a user of the knowledge base are asserted (step 1924). (Embodiments without a system of user ranks would omit this last step.) It is useful to request core facts about a new user at this stage (step 1926 - the
- true-identity establishment is the system/methods used to prove that the real-world identity being asserted by a user as corresponding to him or herself truly is him or herself.
- True-identity establishment is used to limit the possibility of people impersonating people whom they are not and is used in various embodiments incorporating real identity user authentication.
- users can be given a temporary id when they first interact with the system and that temporary id is linked to their claimed identity. In this way, more than one user could potentially be linked with a real identity until the methods described herein allow one of them to win out.
- This method also enables facts labelled with the temporary id of someone who is later established to not be who they are claiming, to be suppressed or to have a low weight associated with their user assessments.
- each of these methods provides evidence that the user is not impersonating someone whom they are not.
- Various ways of combining this evidence into an overall belief are possible.
- each item of evidence is given a score corresponding to an estimate of the quality of the evidence and the user is labelled "true identity establishment proven" once a total score threshold has been reached.
- Other embodiments could use a probability based approach where each item of evidence is incorporated into a probability calculation giving an estimate of the chances they are truly who they say they are.
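- Both ways of combining evidence can be sketched briefly. The evidence names, scores, threshold and likelihood ratios are assumed values, since the description gives none, and the probability version additionally assumes that the items of evidence are independent of one another.

```python
from typing import Dict, Iterable

# Hypothetical evidence types and scores; the source does not give values.
EVIDENCE_SCORES: Dict[str, int] = {
    "documentary_id_received": 60,
    "email_on_official_domain": 30,
    "trusted_user_vouches": 20,
}
PROVEN_THRESHOLD = 80  # assumed threshold


def identity_score(evidence: Iterable[str]) -> int:
    """Sum the scores of distinct evidence items."""
    return sum(EVIDENCE_SCORES.get(item, 0) for item in set(evidence))


def is_identity_proven(evidence: Iterable[str]) -> bool:
    """True once the total score reaches the threshold."""
    return identity_score(evidence) >= PROVEN_THRESHOLD


def probability_genuine(likelihood_ratios: Iterable[float], prior: float = 0.5) -> float:
    """Probability-based alternative: scale the prior odds by one ratio per item."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)
```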
- the first method is to allow people to validate themselves using a real-world documentary id.
- the system can present the user with a form containing a unique code number which is proof that they have logged on and invite them to mail the form with a copy of a real-world id such as a driver's license or passport belonging to the person they are asserting they are.
- the combination of the id document together with the code number would be evidence that the user possessed the document sent in and thus was who they asserted they were.
- This second link can be achieved in a variety of ways.
- the domain on which the email address is based may belong to the entity or another entity closely associated with the entity.
- the domain may house a website which is recognised as the official website of the person or their employer. Representations by trusted users that this is the case can also be used to infer the link between the real world person and their email address.
- identifiers linked to users will denote human beings - i.e. the actual person who is logging in.
- other entities which are considered capable of asserting knowledge can also be supported by various embodiments.
- an identifier which denotes a business can also be used. The business would be responsible for limiting the authentication method (e.g. knowledge of the password) to people to whom it grants the right to represent the business in asserting knowledge.
- certain denotational strings can be translated appropriately.
- the translation routines can parse words such as "my” and "I” and successfully infer denotational facts relating to the user entity as a result.
- a further advantage is in managing the privacy of users.
- Various embodiments can allow an authenticated user to configure which personal knowledge is published, for privacy and other reasons. For example, with instructions from an authenticated user, facts of the form [email address: ["joesmith571@hotmail.com"]] [is an email address of] [joe smith] could be suppressed or only published to authenticated friends of [joe smith] according to the policies and selections of the user.
- 2.8.4 Authentication for third-party systems
- One embodiment could use public key cryptography: once the user has authenticated, the embodiment signs a message with its private key and transmits it to the third-party machine as proof of that authentication.
- the signed message can contain data provided by the third party machine relating to this session.
- The Needham-Schroeder protocol is used with the embodiment acting as the authentication server.
- the details of the Needham-Schroeder protocol have been widely published elsewhere and need not be repeated here.
- Knowledge addition refers to the techniques by which knowledge may be added to the system by users.
- the preferred embodiment is designed to enable almost everything needed to make the system produce and display knowledge to be added to by general users including the addition of individual objects, relations, classes and attributes; the assertion of facts; and the addition of profile templates, generators, tools and translation templates.
- the preferred embodiment uses a natural-language based, interrogative approach, interacting with the user by asking natural language questions and obtaining input from the user in response, often in an extended sequence, i.e.
- the knowledge addition subsystem can be considered in various embodiments as a natural language interrogation system designed to collect real-world knowledge from human users for addition to the static knowledge base in structured form.
- the users are interacting with at least one remote server by feeding input into a local client computer.
- An interface is provided to the user at the user's computer which transmits data determined by the actions of the user (e.g. entering natural language text, clicking buttons) to the remote server.
- Prompts and other responses relating to activity at the server computer are presented to the user on the computer screen at the user's location.
- actions such as providing interfaces and presenting responses which take place locally to the user.
- the interface comprises one or more web pages specified in HTML containing form elements.
- the web-browser on the local client computer displays the web page containing instructions and form elements and actions by the user result in data being transmitted using HTTP over the internet back to the remote web server.
- the source of all facts in the knowledge base should be published and thus obtained during knowledge addition. This allows other users to judge the veracity of a fact by examining these sources. At a minimum an identity for the user adding the knowledge can be recorded. In some embodiments this also enables automatic assessment to be done on the likely veracity of the fact.
- the first category of source is the user entity him/her/itself. In this case, when interacting with the system, the user asserts that the knowledge asserted is known to be true directly by the user (from the user's own experience). An example of this would be something the user has seen. In this case the user is the direct source of the knowledge. Other valid reasons would be for facts which are true by definition. Various embodiments could also enable a user to label themselves as the source when there are numerous independent sources, they are certain and they are happy to take responsibility for the fact being true.
- the second category is where the user asserts that the knowledge comes from another named source.
- the user is representing that the named source of the fact is the entity described and this entity is the direct source of the knowledge. Obtaining this information is a matter of prompting for it during the user's interaction with the system when the knowledge is being asserted.
- a second source can be identified (and if necessary added first) in the same way that any other real-world entity is identified.
- the preferred embodiment also prompts the user for an optional natural language statement of the source of the fact. This string is also stored with the fact and can be used for later assessing of the validity of the fact by editors and/or others.
- the source is a named web page
- the preferred embodiment takes and stores a local copy of the page.
- the knowledge base can contain the fact [the cia] [is responsible for content at] [domain name: ["www.cia.gov"]] which would allow any document copied from that website to have [the cia] automatically assigned as the source.
- the preferred embodiment asks the user to provide one, giving the user the option to say that there isn't one or that there is one but only for the page cited. If the user asserts a source for that page only, the source and document is associated with the fact. If the user asserts a source for that domain an [is responsible for content at] fact is asserted.
- One reason some embodiments include indirect sources of knowledge is that this enables the system to establish confidence based at least partly on the number of independent sources of a fact that appear to exist. For example, an embodiment which labelled the source solely as the user asserting the fact could give an incorrectly high degree of confidence if a magazine made an assertion that was then repeated by a large number of independent users who had read that magazine. In this case the probability that the fact is incorrect is the probability that the magazine was incorrect, not the probability that each of the individual users was in error. With an indirect source listed, a high degree of confidence can be inferred from the number of users that the magazine did indeed assert this fact, but the confidence in the fact itself can be assessed on the basis that there was only a single source.
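- The magazine example can be sketched by collapsing each assertion onto its direct source before counting. The `Assertion` type, its field names and the sample values are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional, Set


class Assertion(NamedTuple):
    user: str                      # entity asserting the fact
    named_source: Optional[str]    # indirect source cited by the user, if any


def independent_sources(assertions: List[Assertion]) -> Set[str]:
    """Collapse assertions onto their direct source: the cited source, else the user."""
    return {a.named_source or a.user for a in assertions}


def users_citing(assertions: List[Assertion], source: str) -> int:
    """Number of distinct users reporting that `source` made the assertion."""
    return len({a.user for a in assertions if a.named_source == source})


# Fifty readers repeat one magazine's claim; one person reports it first-hand.
readers = [Assertion(f"reader {i}", "some magazine") for i in range(50)]
eyewitness = Assertion("a witness", None)
```

- Here fifty readers support the claim that the magazine said it, while the fact itself still rests on one source until the first-hand report is added.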
- the preferred embodiment uses a number of different protocols to determine when and if additions by users are used widely. Other protocols can be used in alternative embodiments.
- the "immediate publication" protocol can be used for the addition of new objects, classes and relations and permanent facts to the knowledge base, i.e. the creation of a new id and various core facts about the object added.
- Facts added using the "deferred publication" protocol are not immediately published to any user other than the one who asserted them, i.e. they are not used in the answering of queries initiated by any user other than one labelled as the user who asserted them. However, they are visible to users who specifically request a list of such facts and these users can use user assessment (see section 2.13) to endorse the fact. Once a number of users have endorsed the fact it becomes visible to all users. As a fact asserted a second or more time counts as an endorsement of the original fact, it isn't a requirement that the fact can only be endorsed by users who specifically request such a list. In various embodiments, this is implemented by endorsements and contradictions contributing to a total score for the fact.
- the difference between facts published using deferred publication and immediate publication is that with deferred publication, the threshold is high enough that the assertion of the fact by the original user is insufficient for the fact to immediately be made visible.
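- The threshold difference between the two protocols can be sketched as a signed score compared against a per-protocol threshold. The threshold values, the default weight and the per-user weights are assumptions; the sketch also treats the original assertion as one endorsement.

```python
from typing import Dict, List, Tuple

# Assumed numbers for illustration; the source gives no values.
THRESHOLDS: Dict[str, int] = {"immediate": 1, "deferred": 3}
DEFAULT_WEIGHT = 1


def fact_score(assessments: List[Tuple[str, int]], weights: Dict[str, int]) -> int:
    """Sum signed assessments: +1 endorses, -1 contradicts, scaled by user weight."""
    return sum(sign * weights.get(user, DEFAULT_WEIGHT) for user, sign in assessments)


def is_published(protocol: str, assessments: List[Tuple[str, int]],
                 weights: Dict[str, int]) -> bool:
    """A fact is published once its score reaches the protocol's threshold."""
    return fact_score(assessments, weights) >= THRESHOLDS[protocol]


def visible_to(viewer: str, asserter: str, published: bool) -> bool:
    """Unpublished facts are still used when answering the asserting user's queries."""
    return published or viewer == asserter


original = [("alice", 1)]  # the original assertion counts as one endorsement
```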
- Deferred publication can be used for certain sensitive facts where an incorrect fact has a reasonable probability of being asserted incorrectly or maliciously and where relying on immediate publication and later suppression by user assessment is insufficient.
- the preferred embodiment uses deferred publication in just a few special cases checked for in the system assessment system when summing the endorsements and contradictions generated by user assessment. These cases include asserting a date of death for someone who has a date of birth within a hundred years of the current time and when the user is not related to the person whose date of death is being asserted (checked for with a query). Another example is the assertion of the end of a marriage (assertion of a timeperiod with an ending timepoint that isn't [iafter] when a timeperiod ending [iafter] is in the kb). These examples are things which might be asserted maliciously and which, as they can become true at any time, cannot be dealt with easily using system assessment or fact pattern suppression. They are also examples that could cause distress if they were published incorrectly.
- Editor (or staff) approval is where a high ranking user must first explicitly approve the item added before it is widely used. In the preferred embodiment it is used for added generators, tools, translation and profile templates. Facts published under the deferred publication protocol can also be essentially approved by high ranking users as they can also visit the list of such facts and use user assessment to make them appear. Being high ranking users, their user assessment can be configured to result in immediate publication as the contribution to the sum that their endorsement gives can be set to the total above the publication threshold in all cases. The difference between editor approval and deferred publication is that with the "editor approval" protocol, low ranked users cannot contribute in any way to the item being published.
- a protocol used in some embodiments is to immediately publish all facts asserted by trusted users.
- Embodiments with a separate user assessments table can do this by linking tables.
- this is used for deferred publication of facts and additions of unapproved translation templates, generators and profile templates. Using this technique in these latter cases allows users to upload and test the effects of what they are adding without immediately affecting others.
- a documentor string may be prompted for during the creation of an object.
- Documentor strings are particularly useful in describing class, relation, and attribute objects, and consequently a documentor is always requested (though not necessarily required) during the creation of these types of object. Whether or not a documentor string is requested during the creation of other types of object depends on the complexity or abstract nature of the object concerned, and the information about whether or not to request one is held at the level of its principal class.
- "This class contains all objects which are pre-recorded displays of moving images, e.g. movies, television adverts, flash animations. Members are not physical objects, i.e. the sequence of images is identified, not the medium on which it may be recorded."
- This documentor of a class would typically also be displayed on the profile screen (see section 2.7) describing the class object, i.e. the profile screen for objects of class [class] . It can also be used whenever a user is using the class to add knowledge as an extra check they are using it correctly.
- As used here, "process" denotes an interactive, automated method for communication between an embodiment of the invention and a user. Most processes are designed to elicit knowledge from that user.
- this interrogative interaction is achieved with a sequence of web pages containing form elements, natural language prompts and explanations and buttons.
- the user enters answers into the form elements and selects appropriate buttons based on the prompts.
- Information entered is then re-presented to the user ideally in a different form for confirmation.
- the user then has the chance to confirm what they said or to return and try again.
- the knowledge obtained from the user is added to the static knowledge base increasing the knowledge that is known about.
- part of the process may involve another process which in turn may require another process etc. (termed herein as "sub-processes"). For example, when adding a new object to the knowledge base, the user may be prompted for the name of a class to which this object belongs.
- the user may choose to add the class, opening the "add class" process as a sub-process. Once they have finished adding the class, the process for adding the new object needs to continue on from where it left off.
- processes can be implemented simply by coding the sequence of pages using a server-side scripting language and opening a new browser window for each sub-process. The user can then simply close the new browser window when the sub-process is finished and return to the original window, now able to continue.
- the sub-process happens in a continuous sequence of pages, optionally with a single page introducing and terminating the sub-process with simple messages like "We will now begin the process of adding this class" and "thank you for adding this class, we will now return you to where you left off".
- all processes are coded using PHP but other server side scripting languages are also suitable. (A great deal of information on implementing web interactions in PHP and other server side scripting languages is described elsewhere and the details need not be repeated here.)
- an array (the "user workspace") is created. This array is stored in the PHP session to make the data persistent.
- One of the elements of the user workspace array is another array, the "process stack".
- User interaction with the system is conceptualised as a series of processes ('select_object', 'add_object', etc.). Ongoing state information for the processes is stored in the process stack with the current process sitting at the top of the stack.
- Each process is modelled as an array (the "process workspace"), itself stored as an element in the process stack. Processes can be pushed onto the stack and popped from it as required.
- each process has a single controller script. It also has a series of pages (for user interaction) associated with it, also written in PHP. For convenience, the files (controller and pages) for each process are stored in a separate directory belonging exclusively to that process. The controller handles which pages are shown to the user and in what order, responds according to the user's inputs, and performs operations such as writing knowledge to the knowledge base. In the preferred embodiment pages do not make changes to process data directly, but may look at process data and do other operations solely for purposes such as determining appropriate wording for questions. This distinction between the relative roles of the controller and page scripts is not strictly necessary in terms of producing an implementation but was found to have some software engineering advantages.
- a process is started by running its controller script.
- the controller resumes the current session, and stores references to certain elements of the user workspace, including the process stack, in an object (this is a matter of convenience — other embodiments might store a copy of the whole user workspace as an array variable, for example).
- the controller needs to know whether its process is already in existence as the current process (i.e. the process at the top of the process stack), or whether it needs to push its process onto the top of the stack as a new process.
- Each process has a name; the name of the current process is stored in the user workspace, and each process stores the name of its parent process (the one below it in the stack) in its own process workspace.
- the controller stores a reference to the current process workspace. If the process associated with the controller is different from the current process name stored in the user workspace, then a new process workspace is pushed onto the process stack with its parent process set to the current process name from the user workspace, and the current process name in the user workspace set to the new process name.
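- The controller's start-up check can be sketched as follows. The preferred embodiment is described as PHP with the workspace held in the session; this sketch uses a plain Python dictionary instead, and the key names and function names are assumptions.

```python
from typing import Any, Dict, List


def new_user_workspace() -> Dict[str, Any]:
    """Hypothetical layout of the user workspace kept in the session."""
    return {"current_process": None, "process_stack": []}


def enter_process(workspace: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return the process workspace for `name`, pushing a new one if needed."""
    stack: List[Dict[str, Any]] = workspace["process_stack"]
    if workspace["current_process"] != name:
        stack.append({
            "name": name,
            "parent": workspace["current_process"],  # the process below it in the stack
            "steps": [],
            "data": {},
        })
        workspace["current_process"] = name
    return stack[-1]


ws = new_user_workspace()
add_object = enter_process(ws, "add_object")
add_object_again = enter_process(ws, "add_object")  # already current, so nothing is pushed
add_class = enter_process(ws, "add_class")          # opened as a sub-process
```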
- a step can be thought of as a stage in the process at which the user is asked for an input via a page.
- Each step has a name, and the process workspace includes an array of the steps visited so far as one of its elements. This array of steps is treated as a stack, with the current step at the top. Advancing to a later step involves pushing a new step name onto the step stack, and running the controller until it finds the block of code corresponding to the step at the top of the stack.
- process termination is handled by a method on the user workspace object, and this method has a return page argument which specifies the page to go to if there is no parent process.
- a process that is frequently used by other processes is what is called the select_object process in the preferred embodiment. It enables a user to identify another object of any type. If the object is already in the knowledge base, its id is returned. If not, the user is given an opportunity to add it (using an appropriate sub-process) and then the id of the newly added object is returned.
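- The step stack and the termination method can be sketched together. The class, method and step names are hypothetical, and the string returned for resuming a parent process is a placeholder for whatever redirect an implementation would perform.

```python
from typing import Any, Dict, List, Optional


class UserWorkspace:
    """Hypothetical user workspace object; method and key names are assumptions."""

    def __init__(self) -> None:
        self.current_process: Optional[str] = None
        self.process_stack: List[Dict[str, Any]] = []

    def push_process(self, name: str) -> Dict[str, Any]:
        process = {"name": name, "parent": self.current_process, "steps": []}
        self.process_stack.append(process)
        self.current_process = name
        return process

    def terminate_process(self, return_page: str) -> str:
        """Pop the current process; resume the parent, or go to `return_page`."""
        finished = self.process_stack.pop()
        self.current_process = finished["parent"]
        if self.current_process is None:
            return return_page
        return f"{self.current_process}/controller"


def advance(process: Dict[str, Any], step: str) -> None:
    """Move to a later step by pushing its name onto the step stack."""
    process["steps"].append(step)


def current_step(process: Dict[str, Any]) -> str:
    """The current step is the one at the top of the step stack."""
    return process["steps"][-1]


workspace = UserWorkspace()
parent = workspace.push_process("add_object")
advance(parent, "ask_name")
child = workspace.push_process("add_class")
advance(child, "ask_class_name")
```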
- all objects must have extensive natural language information recorded about them as they are registered in the knowledge base, including as many denotational strings as possible and a generally appreciated unique recognition string (see section 2.3.2.1).
- This enables other users to find the object (and thus the identifier); it greatly reduces the risk of a single object in the real world being given two identifiers within the knowledge base, as for this to happen two users would have to have no terms in common for what they were denoting.
- one internal identifier might be [abraham lincoln] .
- the internal identifiers are a natural language phrase and are distinguished from normal language by placing them in square brackets. This enables experienced users to short-cut the object selection process by simply typing the internal identifier in square brackets. The system will then know that the user is directly identifying an object, and (after checking that the identifier exists) can skip the screen where alternatives are listed or the unique recognition string of the object is displayed for confirmation purposes.
- Other embodiments use different syntax to distinguish between an internal identifier and a natural language string (e.g. the square brackets could be a different character). This also enables objects within the knowledge base to be identified and readily recognised in contexts very different from interactions with the preferred embodiment.
- a name in square brackets included on a printed business card or paper advertisement can be instantly recognised as an identifier pertaining to the preferred embodiment and users can then enter it in the system for more information, perhaps to obtain a profile screen or within a natural language question. (In the preferred embodiment, such identifiers can appear and be parsed within a natural language question.)
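- Recognising bracketed identifiers, alone or inside a natural language question, can be sketched with a regular expression. The pattern assumes identifiers contain no nested square brackets, which would not hold for parameterised objects; the function names are hypothetical.

```python
import re
from typing import List

# Assumes identifiers contain no nested square brackets.
IDENTIFIER = re.compile(r"\[([^\[\]]+)\]")


def internal_identifiers(text: str) -> List[str]:
    """Return the internal identifiers written in square brackets within `text`."""
    return [match.strip() for match in IDENTIFIER.findall(text)]


def is_internal_identifier(text: str) -> bool:
    """True when the whole input is a single bracketed identifier."""
    return IDENTIFIER.fullmatch(text.strip()) is not None
```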
- a third embodiment can do away with any natural language in the identifier and use an internal identifier for objects (e.g. a unique number). This embodiment would rely on natural language being used to identify the object.
- Figure 20 illustrates the process of identifying and selecting an object in the preferred embodiment. The process begins by asking the user for the object that he/she wishes to select (step 2002). The user may either enter a natural language string or the object's internal identifier if he/she knows it. A request is then sent to the knowledge base for objects matching the string (step 2004).
- The number of matches found is examined (step 2006), and the user is given options accordingly. If only one match was found, the user is asked to confirm whether or not the matching object is the right one, and given alternative options, if the matching object is not what was sought, of trying again, or adding the desired object (step 2008). If the user entered an internal identifier, and a match was found, then the process omits step 2008, and continues as though confirmation had been given. If no matches were found, the user is given the options of trying again, or adding the desired object (step 2010). If more than one match was found, the user is presented with the unique recognition strings of a list of matches (each linked to their profile) and asked to select the one intended, but is also given the alternative of trying again, or adding the desired object (step 2012).
- step 2012 would only be entered if the number of matches were below some number judged to be reasonable (otherwise the user would be returned to step 2002 with a notice asking him/her to enter a more specific string).
- Step 2014 is a check on the user's response to the options given in step 2008, 2010, or 2012. If the user opted to try again, the process returns to step 2002. If an object was selected, the process terminates, returning that object. If, however, the user opted to add the desired object, a check is made to see whether the object's class is complete (i.e. labelled as having all members already fully identified in the knowledge base). If the class is complete, objects can't be added to it. This is explained to the user (step 2018), and the process returns to step 2002.
- Otherwise, the process must first examine the class of the object being requested (step 2020). If the object is a class, then the "add_class" process is initiated (step 2022 - described in section 2.9.7.3). If the object is a relation, then the "add_relation" process is initiated (step 2024 - see section 2.9.7.5). If, however, the object is of any other type, a check is made to see whether the object could be a class or a relation, i.e. whether class or relation are subclasses of the class of the object being requested (step 2026), and, if necessary, the user is asked to clarify (step 2030).
- If the user's response is that the object is a class or a relation (step 2032), then the class is reset accordingly (step 2034), and the process returns to step 2020. If the object is not a class or a relation, then the "add_object" process is initiated (step 2028 - see section 2.9.7.1).
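- The branching on the number of matches (steps 2006 to 2012) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `NextStep`, `next_step`, `RETURN_OBJECT` and the threshold value of 20 are assumptions, since the text only says the limit is "some number judged to be reasonable".

```python
from enum import Enum


class NextStep(Enum):
    ASK_AGAIN = 2002         # too many matches: ask for a more specific string
    CONFIRM_SINGLE = 2008    # one match: ask the user to confirm it
    NO_MATCH_OPTIONS = 2010  # no matches: offer "try again" or "add object"
    CHOOSE_FROM_LIST = 2012  # several matches: pick one from a list
    RETURN_OBJECT = 0        # hypothetical marker: match accepted without confirmation


# Assumed value; the text does not give a number.
MAX_LISTED_MATCHES = 20


def next_step(matches, entered_identifier=False, max_listed=MAX_LISTED_MATCHES):
    """Choose the next step of select_object from the matches returned."""
    count = len(matches)
    if count == 0:
        return NextStep.NO_MATCH_OPTIONS
    if count == 1:
        # An internal identifier that matched skips the confirmation step 2008.
        return NextStep.RETURN_OBJECT if entered_identifier else NextStep.CONFIRM_SINGLE
    if count < max_listed:
        return NextStep.CHOOSE_FROM_LIST
    return NextStep.ASK_AGAIN
```

- Keeping the threshold as a parameter reflects that the text leaves the "reasonable" number open to each embodiment.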
- One type of knowledge that a user may wish to assert is the existence of an object not already present in the knowledge base. This task may be a goal in itself, or it will come up when the absence of an identifier for an object is discovered during the assertion of other knowledge.
- The act of adding a new object includes the creation of an internal identifier for the new object, an assertion of at least one class the object is a member of, the storage of a unique recognition string (or other unique recognition data) for the object and the collection and storage of at least one denotational string for the object.
- Embodiments also seek to collect other useful knowledge about the new object in the process of interacting with the user.
- Adding new class and relation objects is sufficiently different in terms of the knowledge collected that they are implemented in separate processes. All other objects are handled by the add_object process.
- The add_core_facts process (section 2.9.11.1) mitigates this somewhat by collecting additional knowledge from the user tailored to the specific class of the object added.
- This process is for adding a new object to the knowledge base. This process is used when the object is not a class or relation as these have sufficiently different needs to use different processes (see below). add_object is used for all individual objects, physical or conceptual including attributes.
- Figure 21 shows the steps involved in adding a new object to the knowledge base.
- The class for the object is set first (to the root class [object] by default, but can also be set to another class by a calling process, e.g. during authentication, the class can be set to [human being]).
- The process begins with the user being asked for the most common term for the object to be added (step 2102) - this will be assigned as the common output translation string.
- The knowledge base is queried for other instances of the same string within the same class, and if one (or more) is found, the user is presented with the unique recognition string of the corresponding object, and asked to confirm that it is not the one that he/she is in the process of adding.
- The user's response is tested (step 2104) - if one of the matching objects is the intended one, an assertion is made that the string is the common output translation of that object (step 2105), and the process terminates returning that object.
- The process attempts to identify the principal class of the object by consulting the ontology of the knowledge base (step 2106). Whether or not it is able to do this will depend on the circumstances in which the process was called (if the class is the default root class, no principal class will be found, but if the class has been set to [human being] then [human being] will be the principal class). If a principal class can be established, then it is assigned as the principal class for the object (step 2108). If the process cannot find a principal class, then the class of the object may not be specific enough, so the "select_object" process is initiated for the user to identify and select the most specific class for the object (step 2110 - described in section 2.9.6).
- The class returned by "select_object" is then tested to see whether a principal class can be determined from it (step 2112). If a principal class can be determined, then it is assigned as the principal class for the object (step 2108). If not, then the user is asked to confirm that the selected class really is the most specific possible (step 2114). A change of mind at this point returns the user to the "select_object" process, but otherwise the user is permitted to continue adding the object with no principal class.
- The object's class is then tested to see whether or not it is permanent (step 2116), and if it is not then the "select_timeperiod_for_fact" process is initiated for the user to state the period of time during which the object was a member of the class (step 2118 - described in section 2.9.12).
- The next step (2120) is to request a unique recognition string for the object.
- The knowledge base is queried for any other instance of the same string, and in the (unlikely) event that one is found, the user is presented with the corresponding object, and asked to confirm that it is the one that he/she is in the process of adding.
- The user's response is tested (step 2122) - if the matching object is the intended one, all the knowledge gathered so far is asserted to be true of that object (step 2105), and the process terminates returning that object. If the matching object is not the intended one, the user is returned to step 2120.
- Denotational strings are important in avoiding duplication within the knowledge base and in translating as effectively as possible, so as many should be added as the user can think of.
- The common output translation string and unique recognition string already added can themselves be regarded as denotational strings, and are set accordingly by default.
- The process then requests additional denotational strings (step 2124 - illustrated in detail in Figure 33 and described in section 2.9.9), which are checked for matches in turn.
- The addition of denotational strings may be terminated if a match is found and the user confirms that it is the object that he/she wanted to add (step 2126). In this case all the knowledge gathered so far is asserted to be true of that object (step 2105), and the process terminates returning it. Otherwise the user continues adding strings until he/she can think of no more.
- If the object is an attribute (determined from its class), two additional pieces of knowledge will be required. First the user is asked to identify the attribute's scope (the most general class of objects to which it can apply) via the "select_object" process (step 2130). Next the user is asked whether or not the attribute is permanent in its application (step 2132). As these are the only two extra items of knowledge required by the preferred embodiment for attributes, there is no special add_attribute process. Other embodiments may have special handling for other classes here or may have additional special processes for objects of a certain type. It is now desirable to choose an identifier for the object.
- The system creates a valid identifier from the common output translation string (to be valid an identifier must be unique, must only contain certain characters, and must be within a particular range of lengths).
- This identifier is presented to the user, who is given the choice of accepting it or creating a different one (step 2134). If the user chooses to create a different identifier, this is checked for validity before the process can continue. Once a valid identifier has been chosen, if the principal class is one that takes a documentor string, then the user is given the option of adding such a string (step 2136).
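- A minimal sketch of identifier creation and validation, under stated assumptions. The text does not say which characters or lengths are allowed, so the character set (lower-case letters, digits and spaces), the 1 to 40 character range, the numeric suffix used to resolve clashes, and the names `is_valid_identifier` and `suggest_identifier` are all hypothetical.

```python
import re

# Assumed rules; the source only names the three kinds of constraint.
ALLOWED = re.compile(r"[a-z0-9 ]+")
MIN_LEN, MAX_LEN = 1, 40


def is_valid_identifier(identifier, existing):
    """Check uniqueness, permitted characters and length."""
    return (
        MIN_LEN <= len(identifier) <= MAX_LEN
        and ALLOWED.fullmatch(identifier) is not None
        and identifier not in existing
    )


def suggest_identifier(common_output_string, existing):
    """Derive a valid identifier from the common output translation string."""
    # Lower-case the string and replace runs of disallowed characters with a space.
    base = re.sub(r"[^a-z0-9]+", " ", common_output_string.lower()).strip()
    base = base[:MAX_LEN].strip() or "object"
    candidate, n = base, 2
    while not is_valid_identifier(candidate, existing):
        # Assumed clash policy: append an increasing number.
        suffix = f" {n}"
        candidate = base[: MAX_LEN - len(suffix)].strip() + suffix
        n += 1
    return candidate
```

- The same `is_valid_identifier` check could be applied to an identifier typed by the user at step 2134 before the process continues.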
- If the object being added is a human being, and the user is not already logged in or adding him/herself via the authentication process, then it is necessary to know whether the person being added is, in fact, the user. The user is asked about this, if necessary, at step 2138.
- The "add_core_facts" process can then be initiated (step 2142 - illustrated in Figure 36 and described in section 2.9.11.1).
- Step 2142 is omitted if the object added was the user him/herself (in such a case "add_core_facts" is called instead at the end of the authentication process).
- The process then terminates, returning the new object's identifier.
- An illustrative session of a user using an implementation of the add_object process to add the US state of Oregon to an embodiment is shown in Figure 22 and Figure 23.
- Screen 2202 is the initial screen of the add_object process, where the user is prompted for the most normal name of the object being added (additional instructions and examples are omitted for space reasons). The user enters "Oregon" and proceeds by clicking the "enter" button (2203).
- Confirmation screens act as a double check against incorrectly entered information and allow the user to change their mind and replace what they have entered.
- A general philosophy of the preferred embodiment is that confirmation screens should ideally re-present the knowledge given by the user in as different a way as possible from the way that the knowledge was initially prompted for, to ensure that the user fully understands the significance of the knowledge they are providing.
- The query lines ["us state"] [can denote] a and a [is an instance of] [class] are run, which produces one result.
- select_object presents the one result and asks for confirmation that this is the one intended. If more than one result had been returned (an ambiguous denotational string) the user would have been given the option to select the one intended. The option to try again or add a new class corresponding to this denotation string is also provided.
- The user is happy to confirm that the unique recognition string for the class, "state of the United States of America", corresponds to what they were intending to say and the process proceeds to the confirmation screen 2210.
- The user confirms that they are indeed trying to say that Oregon is a state and the process controller then checks to see whether the class is permanent or temporary with the query: query [class is permanent] [applies to] [us state]
- A permanent class is one where its members cannot cease to be members without being considered something fundamentally different. As the current US states were in existence prior to joining the union and could conceivably someday leave the union and still continue to exist, the class [us state] was considered to be a temporary class when first added to the knowledge base. (An alternative ontology could make it permanent and consider the independent version of each state to be a different entity with a different id. In this case, this would also have been a practical approach.)
- The add_object process now calls the select_timeperiod_for_fact process (section 2.9.12) to obtain a period of time for Oregon's membership.
- The next screen in the add_object process is 2220.
- The user is prompted for a unique recognition string for Oregon.
- The user enters "the US state of Oregon". As there is only one US state called Oregon and as everyone wanting to denote Oregon would know it was a US state this is sufficient.
- On screen 2304 the user is prompted to create a list of as many denotational strings as possible for Oregon. Screen 2304 continues to loop, adding the strings entered by the user to the list until the user indicates that the list is complete by clicking another button (not shown for space reasons). If any of the denotational strings can denote any pre-existing object not in a distinct class, the unique recognition strings of these objects would be shown to the user for confirmation that this is not the object they were intending to add.
- Screen 2312 is the final confirmation screen. It presents all the facts gathered from the interaction with the user and by default sets the source as the user. If the user wants to communicate another source and/or document at this point they can do so by entering it in the add new source box. Doing so would repaint this screen with a drop-down list next to each fact allowing the user to change the source for one or more of the presented facts.
- One type of knowledge that a user may wish to assert is the existence of a new class that is not already present within the knowledge base.
- The procedure in the preferred embodiment is very similar to the process for adding any other object.
- The process used in the preferred embodiment for adding a class object is illustrated in Figure 24.
- The process begins with the user being asked for the most common term for the class to be added (step 2402) - this will be assigned as its common output translation string.
- The knowledge base is queried for other classes denoted by the same string, and if one (or more) is found, the user is presented with that class, and asked to confirm that it is not the one that he/she is in the process of adding.
- The user's response is tested (step 2404) - if one of the matching classes is the intended one, an assertion is made that the string is the common output translation of that class (step 2440), and the process terminates returning it.
- The next step (2406) is to request a unique recognition string for the class.
- The knowledge base is queried for any other classes denoted by the entered string, and if one is found, the user is presented with its unique recognition string, and asked to confirm that it is not the one that he/she is in the process of adding.
- The user's response is tested (step 2408) - if the matching class is the intended one, all the knowledge gathered so far (the common output translation string and the unique recognition string) is asserted to be true of that class (step 2440), and the process terminates returning it. If the matching class is not the intended one, the user is returned to step 2406.
- The common output translation string and unique recognition string already added can be regarded as denotational strings, and are set as such.
- The process then requests additional denotational strings for the class (step 2410), using the loop illustrated in Figure 33 and described in section 2.9.9.
- The addition of denotational strings may be terminated if a match is found and the user confirms that it is the class that he/she was in the process of adding (step 3312). In this case all the knowledge gathered so far is asserted to be true of that class (step 2440), and the process terminates, returning the matching class. Otherwise the user continues adding strings until he/she can think of no more.
- The next step is to establish the position of the class being added within the ontology of the knowledge base.
- The process initiates the "select_object" process and asks the user to identify and select the most specific parent class for the class being added (step 2416 - described in section 2.9.6). If the parent class has any direct subclasses, the user is asked whether each is distinct from the class being added, or is a partial or full subset of it and this knowledge is recorded for later assertion (step 2418). Two classes are said to be distinct if they cannot have any members in common.
- For example, the class of conceptual objects is distinct from the class of physical objects and the class of leopards is distinct from the class of trees. If a subclass is a partial subset of the class being added, then that subclass's own direct subclasses are found, and the user is asked the same question of each of them. If a subclass is a full subset of the class being added, then it can be asserted that it is a subclass of the class being added. Refinements to this step are possible in certain embodiments.
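- The questioning of step 2418 can be sketched as a walk over the parent's subclasses. This is an illustrative sketch only: `place_new_class`, the `direct_subclasses` mapping and the `ask` callback (which stands in for the user prompt and returns "distinct", "partial" or "full") are hypothetical names, and the breadth-first order is an assumption.

```python
def place_new_class(parent, direct_subclasses, ask):
    """Classify the subclasses of `parent` relative to the class being added.

    Returns (distinct, full): classes recorded as distinct from the new class,
    and classes that are full subsets of it (so become its subclasses).
    """
    distinct, full = [], []
    pending = list(direct_subclasses.get(parent, []))
    while pending:
        cls = pending.pop(0)
        answer = ask(cls)
        if answer == "distinct":
            distinct.append(cls)
        elif answer == "full":
            full.append(cls)
        elif answer == "partial":
            # A partial overlap means the same question is asked of the
            # subclass's own direct subclasses.
            pending.extend(direct_subclasses.get(cls, []))
        else:
            raise ValueError(f"unexpected answer for {cls!r}: {answer!r}")
    return distinct, full


# Hypothetical ontology fragment and canned answers for a new class under "animal".
subclasses = {"animal": ["mammal", "dinosaur"], "mammal": ["dog", "whale"], "dog": ["lap dog"]}
answers = {"mammal": "partial", "dinosaur": "distinct", "dog": "partial",
           "whale": "distinct", "lap dog": "full"}
distinct, full = place_new_class("animal", subclasses, answers.__getitem__)
```

- Here `distinct` ends up as `["dinosaur", "whale"]` and `full` as `["lap dog"]`; the facts themselves would be asserted later, with the rest of the gathered knowledge.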
- Some embodiments take the user through the ontology from a particular starting class (for example, a parent class suggested by the user or even the root [object] class if the user was unable to identify a parent), find the direct subclasses of that class, and ask the user whether any of those classes is a parent of the class being added. The user would then be asked about the subclasses of each class to which he or she had answered 'yes', and this question and answer process would continue until he or she had said 'yes' or 'no' to all the possible classes.
- Some embodiments usefully insist on the selection of just one parent class for the class being added, but others can permit the selection of multiple parent classes. For example, in an ontology containing the classes [mammal] and [sea-dwelling animal] a user could legitimately (and usefully) select both as parents when adding the class [whale].
- Embodiments which permit the selection of multiple parents during the "add_class" process need to check that none of the selected parents are a distinct class from, or a subclass of, one of the others (it would be pointless to select [mammal] and [whale] as parents of [blue whale] , and wrong to select [invertebrate] and [whale] ).
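- One way to sketch this check on multiple parents is shown below. It is an assumption-laden illustration: `ancestors_or_self`, `check_parents`, the `parents_of` mapping and the `distinct_pairs` set are hypothetical, and the sketch assumes distinctness asserted between two classes carries down to all of their subclasses.

```python
def ancestors_or_self(cls, parents_of):
    """Return `cls` together with every class reachable through parent links."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents_of.get(current, []))
    return found


def check_parents(selected, parents_of, distinct_pairs):
    """List the pairs of selected parents that cannot sensibly be combined."""
    problems = []
    for i, a in enumerate(selected):
        for b in selected[i + 1:]:
            up_a = ancestors_or_self(a, parents_of)
            up_b = ancestors_or_self(b, parents_of)
            if b in up_a or a in up_b:
                problems.append((a, b, "one is a subclass of the other"))
            elif any(frozenset((x, y)) in distinct_pairs for x in up_a for y in up_b):
                problems.append((a, b, "distinct classes"))
    return problems


# Hypothetical ontology fragment built around the examples in the text.
parents_of = {
    "whale": ["mammal"],
    "mammal": ["vertebrate"],
    "vertebrate": ["animal"],
    "invertebrate": ["animal"],
    "sea-dwelling animal": ["animal"],
}
distinct_pairs = {frozenset(("vertebrate", "invertebrate"))}
```

- With this data, [mammal] and [sea-dwelling animal] pass the check, [mammal] with [whale] is flagged as pointless, and [invertebrate] with [whale] is flagged as wrong.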
- The parent class is tested to see whether it is permanent or temporary (step 2420). If the parent class is temporary, then the class being added must also be temporary, so the process can add this fact to its array of assertions to be made (step 2422). If the parent class is permanent, then the user is asked whether or not the class being added is also permanent (step 2424). (In embodiments where there may be more than one parent class, having any temporary class as a parent is sufficient to say that the class to be added is temporary.)
- The process next looks to see whether the parent class has a principal class, i.e. is itself labelled as Principal, or is below a class which is so labelled (step 2426).
- The principal class of a class's parent class will also be the principal class of the class itself. If the parent class has a principal class, then the fact that the class being added is not Principal can be added to the array of assertions to be made (step 2428). If a principal class could not be found for the parent, then the user is asked whether the class that he/she is adding can be asserted to be Principal (step 2430). The user's response is tested (step 2432), and if he/she has said that the class is not Principal, then a warning is given about the apparent lack of specificity of the class, and confirmation is requested (step 2434).
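- The two inferences made from the parent class (permanence at steps 2420 to 2424, principal class at steps 2426 to 2430) can be sketched as follows. The function names, the single-parent `parent_of` mapping and the example ontology are assumptions for illustration; a return value of `None` means nothing could be inferred and the user must be asked.

```python
def inferred_permanence(parent_classes, temporary):
    """Return False if any parent is temporary; None means "ask the user"."""
    return False if any(p in temporary for p in parent_classes) else None


def find_principal_class(cls, parent_of, principal):
    """Return `cls` or its nearest ancestor labelled Principal, else None."""
    seen = set()
    current = cls
    while current is not None and current not in seen:
        if current in principal:
            return current
        seen.add(current)  # guards against a malformed, cyclic ontology
        current = parent_of.get(current)
    return None


# Hypothetical ontology fragment; [human being] is Principal as in the text.
parent_of = {"human being": "mammal", "mammal": "animal", "animal": "object"}
principal = {"human being"}
```

- If `find_principal_class` returns a class for the chosen parent, the new class is recorded as not Principal; if it returns `None`, the question of step 2430 is put to the user.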
- The next step is to choose an identifier for the class.
- The system creates a valid identifier from the common output translation string. This identifier is presented to the user, who is given the choice of accepting it or creating a different one (step 2436). If the user chooses to create a different identifier, this is checked for validity before the process can continue.
- The user is then presented with a page (step 2438) requesting a documentor string (the user has the option to leave this empty).
- The process is then ready to make the assertions gathered from the user's responses and the system's own inferences (step 2440 - illustrated in detail in Figure 32 and described in section 2.9.8).
- The system then terminates, returning the identifier of the new class.
- On screen 2502 the user is prompted for the common translation of the class they wish to add.
- The user enters "sequoia".
- Control then goes to screen 2504 where the user is prompted for the unique recognition string for the class.
- The user enters "sequoia tree (the California redwood, sequoia sempervirens)" here. As this combines both common names for the species, the word "tree" and the strict Latin name for the species, it is sufficient.
- As with add_object, all possible denotational strings are prompted for on screen 2506. The user continues to add denotational strings and then clicks the "no more" button when the list is complete.
- Class denotational strings may need to be pluralised or recognised in their plural form.
- A smart generator generates the English plural of each string, but for confirmation the results for each denotational string are presented to the user and the user is allowed to correct any errors made by the smart generator (2510).
- Alternative embodiments could just prompt for the plurals. These plurals are then confirmed.
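- A rough sketch of what such a plural generator might do is given below. The patent does not describe the generator's rules, so the suffix rules, the small irregular-noun table and the names `pluralise` and `pluralise_string` are assumptions; errors are expected, which is why the text has the user confirm every result.

```python
import re

# A deliberately small, assumed table of irregular plurals.
IRREGULAR = {"man": "men", "woman": "women", "child": "children",
             "person": "people", "mouse": "mice", "foot": "feet"}


def pluralise(noun):
    """Guess the English plural of a single word using common suffix rules."""
    lower = noun.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if re.search(r"(s|x|z|ch|sh)$", lower):
        return noun + "es"
    if re.search(r"[^aeiou]y$", lower):
        return noun[:-1] + "ies"
    return noun + "s"


def pluralise_string(denotational_string):
    """Pluralise the last word of a (possibly multi-word) denotational string."""
    head, sep, last = denotational_string.rpartition(" ")
    return head + sep + pluralise(last)
```

- For instance, `pluralise_string("sequoia")` gives "sequoias" and `pluralise_string("California redwood")` gives "California redwoods"; the suggestions would then be shown to the user for correction.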
- The immediate parent class of the newly added class is then prompted for. The user asserts that a sequoia is a kind of tree.
- Additional prompts attempting to firmly place this new class within the current ontology would take place at this stage.
- The knowledge base can be consulted for subclasses of the selected parent class and the user asked if any of them is a more specific parent class than the one indicated.
- Each immediate child class of the selected parent can be prompted for and the user asked whether it is possible for these classes to overlap or not. If the answer is "no", facts of the form class1 [is a distinct class from] class2 are generated. If "yes", similar prompts are made for the immediate subclasses of the overlapping class.
- Screen 2602 shows where the user is prompted about the [class is permanent] property of the class.
- The class is clearly permanent and the user indicates this.
- The confirmation screen 2612 is then displayed.
- Desirable information requested about a relation includes the class of the objects that the relation can assert a relationship between (one for each side) and whether the relationship is permanent or not.
- The class of each side of the relation can be used to resolve ambiguity in questions. Permanence is important in knowing when the relationship holds. Other knowledge can also be collected.
- The process used in the preferred embodiment for adding a relation object is illustrated in Figure 27.
- The process begins with the user being asked for the most common term for the relation to be added (step 2702) - this will be assigned as its common output translation string.
- The knowledge base is queried for other relations denoted by the same string, and if one (or more) is found, the user is presented with that relation, and asked to confirm that it is not the one that he/she is in the process of adding.
- The user's response is tested (step 2704) - if one of the matching relations is the intended one, an assertion is made that the string is the common output translation of that relation (step 2705), and the process terminates returning it.
- The next step (2706) is to request a unique recognition string for the relation.
- The knowledge base is queried for any other relations matching the entered string, and if one is found, the user is presented with it, and asked to confirm that it is the one that he/she is in the process of adding.
- The user's response is tested (step 2708) - if the matching relation is the intended one, all the knowledge gathered so far (the common output translation string and the unique recognition string) is asserted to be true of that relation (step 2705), and the process terminates returning it. If the matching relation is not the intended one, the user is returned to step 2706.
- The common output translation string and unique recognition string already added can be regarded as present central strings, and are set as such. These are similar to denotational strings collected in add_object and add_class.
- The process then requests additional present central strings for the relation (step 2710), using the loop illustrated in Figure 33 and described in section 2.9.9.
- The addition of present central strings may be terminated if a match is found and the user confirms that it is the relation that he/she was in the process of adding (step 3312) after seeing the unique recognition string of the match. In this case all the knowledge gathered so far is asserted to be true of that relation (step 2705), and the process terminates, returning the matching relation. Otherwise the user continues adding strings until he/she can think of no more.
- The process then goes on to establish the left and right classes of the relation being added.
- The process initiates the "select_object" process with a message requesting the left class of the relation (step 2714 - described in section 2.9.6).
- The object returned by "select_object" is stored as the left class.
- The process reinitiates "select_object" to request the right class (step 2716).
- Step 2718 represents the collection of various core properties of the relation.
- First, the user is asked whether the relation is permanent. If it isn't, a check is made to see whether the left and right classes contain objects which can have a creation date, and if this is the case for either, the user is asked whether the object on that side of the relation must exist for facts involving the relation to be meaningful. If the left and right classes are different (and neither is a subclass of the other), then it can be inferred that the relation is antisymmetric and antitransitive, otherwise the user must be asked whether it is symmetric and/or transitive. If the relation is transitive then it cannot be left unique, but if it isn't transitive the user must be asked about the left uniqueness. If the relation is not left unique, the present central strings are checked for the presence of the definite article, and if it is not found, the user is asked whether the relation is "anti left unique". (A relation such as [is a child of] is neither left unique nor anti left unique - "is the
- Left possessive strings can be generated from present central strings, so new present central strings may be created from the left possessive strings entered by the user. If any new present central forms are created, they are shown to the user, who is given the opportunity to reject any that are wrong (step 2724).
- The next step is to choose an identifier for the relation.
- The system creates a valid identifier from the common output translation string if it is unique - adding a number to make a unique id if it is not. This identifier is presented to the user, who is given the choice of accepting it or creating a different one (step 2726). If the user chooses to create a different identifier, this is checked for validity before the process can continue.
- The user is then presented with a page (step 2728) requesting a documentor string (the user has the option to leave this empty).
- The process is then ready to make the assertions gathered from the user's responses and the system's own inferences (step 2730 - illustrated in detail in Figure 32 and described in section 2.9.8).
- The user is then asked two further questions (step 2732). The first is whether a more general form of the relation exists (e.g. [is married to] is a more general form of [is the wife of]). The second, which is only asked if the relation is not symmetric, is whether the relation has a natural-sounding reverse form (e.g. [is a parent of] is the reverse form of [is a child of]). (This second question is also omitted if the relation being added is the reverse form of an existing relation.) Both questions are optional - the user can choose not to answer them. If either is answered, the user's input is sent to the "select_object" process for identification (described in section 2.9.6). These additional assertions are then made (step 2734 - illustrated in detail in Figure 32 and described in section 2.9.8).
- The user wishes to add the relation linking a person with the geographical area where they are normally resident so that facts asserting such information are supported by the system.
- On screen 2802 the user is prompted for the common translation of the relation they wish to add.
- English grammar rules e.g. "has been”, “is not”, “have not been”
- Various embodiments allow the user to override the insistence on this requirement and express the relation in other ways, prompting the user for confirmation of the other forms later in the process.
- Screen 2804 shows the user being prompted for the generally appreciated unique recognition string of the relation.
- This relation refers to the general residence of the person so only a slightly augmented version of the common translation string is entered.
- Screen 2806 is where the user provides as many alternative denotational strings for the relation as possible to maximise the chances of the relation being hit when other users attempt to denote it. For this screen central present forms not starting with "is" are permitted. As with add_object and add_class the translation strings are added automatically to this list.
- The left and right classes of a relation are a consequence of the semantics of the relation. They provide the largest class of objects which can appear on the left of the relation and the largest class of objects which can appear on the right. Any object which is not in the left and right class cannot have the relation with any other object.
- This knowledge is used to disambiguate ambiguous translations of questions (see section 2.6.7).
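- How left and right classes might be used to discard implausible readings can be sketched as follows. This is a hypothetical illustration, not the method of section 2.6.7: the function names, the object identifiers and the tiny ontology are all invented for the example.

```python
def in_class(obj, target, direct_classes, parents_of):
    """True if `obj` belongs to `target`, directly or through a superclass."""
    stack, seen = list(direct_classes.get(obj, [])), set()
    while stack:
        cls = stack.pop()
        if cls == target:
            return True
        if cls not in seen:
            seen.add(cls)
            stack.extend(parents_of.get(cls, []))
    return False


def fits_relation(left, right, left_class, right_class, direct_classes, parents_of):
    """Check a candidate (left, right) pair against a relation's two classes."""
    return (in_class(left, left_class, direct_classes, parents_of)
            and in_class(right, right_class, direct_classes, parents_of))


# Hypothetical data: "Georgia" could denote either a place or a person.
direct_classes = {
    "john smith": ["human being"],
    "georgia (us state)": ["us state"],
    "georgia (person)": ["human being"],
}
parents_of = {"us state": ["geographical area"]}
candidates = [("john smith", "georgia (us state)"), ("john smith", "georgia (person)")]
kept = [pair for pair in candidates
        if fits_relation(*pair, "human being", "geographical area",
                         direct_classes, parents_of)]
```

- Only the reading in which the right object is a geographical area survives, so `kept` contains the single pair with "georgia (us state)".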
- Screens 2810 and 2812 prompt for the left class of the relation. In this case it is the class of human beings.
- Screens 2814 and 2816 prompt for the right class. In this case it is the class of geographical areas. This information is also useful for steering and explaining the later stages of the process.
- Screen 2904 is the confirmation screen for this step. As [human being] is a subclass of the class [object with a creation date] the process then enquires whether the relationship can only hold when the left object is in existence (in this case alive). The semantics of some relationships require this and others do not. Screen 2906 prompts for this property and 2908 confirms it. The user says that this property holds. Note that the page uses the word "alive" as it can word its prompts intelligently according to what has been entered. As it knows that the left class is a subclass of [biological object] it uses the word "alive" in the prompt. Otherwise the word "exists" would have been used.
- Screens 2910 and 2912 do the same for the right class (rewording with the word "exists" as it can do a query to show that [geographical area] is not a subclass of animal).
- Screen 2918 asks about the property of whether it is possible for a single entity to have the relationship despite it not being required: [anti left unique]. This property would not have been asked about if the relationship was [left unique] as it could then be inferred that it does not apply. This property is useful with the English language for determining whether the definite article "the" can be used in denoting the relationship. In embodiments where English is not used, this step might be skipped. As it is just possible for a person to be the only resident of a particular geographical area (a private island or small estate perhaps), the user answers this question yes and their answer is confirmed on 2920. In 3002 the user is asked about the [right unique] property of the relationship. As the concept being captured is the primary residence of a person, this relationship is [right unique] and the answer to the question is "no". This is confirmed on 3004.
- At 3010 the controller has used the left possessive forms given by the user to suggest some other present central forms possibly missed by the user.
- The articles "a/an" or "the" chosen are partly determined by the user's responses to the [left unique] and [anti left unique] properties. As neither applies, both articles are used in generating the possible central present forms.
- Screen 3014 is where the user is prompted for a documentor. Documentors are particularly important for relations.
- Screen 3102 is where the collected facts are presented to the user and alternative sources can be specified. This is similar to the corresponding steps in add_object and add_class. When the user confirms these, the facts are written to the static knowledge base and system assessed as with the other add knowledge processes.
- This knowledge can be used to generate more general forms of a relation from a more specific fact stored in the static knowledge base.
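- As a hedged illustration of this kind of generalisation, the sketch below restates a stored fact with each more general relation in turn. The `more_general_form` mapping, the function name `generalise` and the object identifiers are hypothetical; only the [is the wife of] / [is married to] pairing comes from the text.

```python
def generalise(fact, more_general_form):
    """Yield the fact restated with each successively more general relation."""
    left, relation, right = fact
    seen = {relation}
    while relation in more_general_form:
        relation = more_general_form[relation]
        if relation in seen:  # guard against a cyclic mapping
            break
        seen.add(relation)
        yield (left, relation, right)


more_general_form = {"is the wife of": "is married to"}
derived = list(generalise(("alice example", "is the wife of", "bob example"),
                          more_general_form))
```

- Here `derived` holds the single fact `("alice example", "is married to", "bob example")`, which need not be stored separately in the static knowledge base.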
- a reverse form is a semantically identical relationship where the left and right objects are reversed. In this case, the reverse form of the relation was already present in the knowledge base. If it was not, the add_relation process would have been repeated for the reverse form (and by passing the name of the reverse relation to the process it would be able to skip many steps where the answers could be logically inferred from the properties of the relation which were the reverse, i.e. the left and right classes and properties).
- When a reverse relation is specified in add_relation, the preferred embodiment labels the more newly added relation with the property [reverse form preferred].
- This property is used by add_fact and the query processing system to switch around relations which have this property by changing them for their reverse relation and swapping the left and right objects.
- This keeps the static knowledge base "tidy" by not having semantically identical facts in two formats (e.g. having <attribute> [applies to] <object> facts as well as <object> [is] <attribute> facts).
- the generator which generates reverse forms can be ignored, gaining some efficiency.
- Alternative embodiments which have the generator active and allow static facts to be asserted both ways around are also believed practical though.
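- The switching of relations labelled [reverse form preferred] can be sketched as follows. The lookup table and the function name are hypothetical, and which of the two example relations carries the label is chosen arbitrarily for illustration.

```python
# Hypothetical table: relation labelled [reverse form preferred] -> its reverse relation.
REVERSE_FORM_PREFERRED = {"[applies to]": "[is]"}


def normalise_fact(left: str, relation: str, right: str) -> tuple:
    """Rewrite a fact into its preferred direction before storing or querying it."""
    reverse = REVERSE_FORM_PREFERRED.get(relation)
    if reverse is None:
        return (left, relation, right)      # relation is already in its preferred form
    # Swap the relation for its reverse and exchange the left and right objects.
    return (right, reverse, left)
```

Applying the same normalisation in both add_fact and query processing means a fact asserted either way round is stored, and looked up, in a single format.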
- the first step (3202) is to loop through the array of assertions checking that each is permissible. If any are not permissible (for example, if one of the denotational strings supplied by the user to denote a particular person is suppressed for knowledge addition), then the list of assertions is shown to the user (step 3204) with the problem assertions highlighted. The user is asked to correct the problems. Continuing from this point will take the user back to the step associated with the problem assertion (step 3206) - if there is more than one problem assertion, then the user is taken back to the earliest one in the process.
- the user is shown the assertions as a list, together with the source for each (step 3208).
- the source is the user him/herself.
- This page gives the user options to add a new source to the available sources (by entering the name of the source in an input box), confirm the assertions as presented, or change a particular assertion. If more than one source is available, he/she can associate particular assertions with particular sources before confirming. The user's response is then tested (step 3210).
- If the user chose to change an assertion ('disagree'), then he/she is taken back to the step associated with that assertion (step 3206). If the user chose to add a new source, the string input must be identified as a source, and, if possible, an animate source identified (step 3212). The method for doing this is illustrated in detail in Figure 38 (described in 2.9.13). Once the source has been identified, it is added to the list of sources available (step 3214). The user is returned to the assertions confirmation page (step 3208). Users can add as many sources as desired (one at a time) by looping through steps 3208 to 3214.
- Denotational strings are related to their object by various relations, including [can denote], as shown in examples in section 2.6.1. They are names or phrases which may be used to denote the object, and are important in translating user queries and in avoiding the addition of duplicate objects to the knowledge base. In the preferred embodiment the same method for gathering these strings is used by the "add_object", "add_relation", and "add_class" processes. This method is illustrated in Figure 33.
- a page is presented to the user, requesting a name or phrase which could be used to denote the object being added (step 3302).
- the page also gives options to delete an already added string, or to stop adding strings.
- the user's response is checked at step 3304. If the user chooses to add a new string, the knowledge base is queried for objects which can also be denoted with that string and which may be the object the user is intending to add. In some embodiments this check may involve verifying that the possible matching object is not in a class distinct from any known class of the object being added (step 3306). If there are no matches, the string is added (step 3308), and the user is taken back to step 3302.
- If one (or more) matches is found, the user is presented with the unique recognition strings for the corresponding objects, and asked to confirm that it is not the one that he/she is in the process of adding (step 3310).
- the user's response is tested (step 3312) — if one of the matching objects is the intended one, the loop ends, and that object is returned (step 3314). If the user is sure that the matching object is not the one being added, the string is added (3308), and the user is taken back to step 3302. If the user's response at step 3304 is to delete a string, the string is deleted (step 3316), and the user is taken back to step 3302.
- If the response at step 3304 is to stop adding strings, the user is shown a list of the strings he/she has added, and asked to confirm that they can all be used to denote the object (step 3318). The response to this message is tested (step 3320). If the user won't confirm, then he/she is returned to step 3302 (where any problem strings can be deleted). When the user is happy with the list of strings to be associated with the object, the loop ends, and the list of strings is returned (step 3322).
- a count is kept of how frequently each denotational string is used by users of the system to denote an object. These counts can be used to present denotational strings representing an object in order of popularity when displaying (say) a profile of the object.
- the preferred embodiment also keeps a count of how frequently each ambiguous denotational string is used to denote each of the possible objects it may refer to. In situations where one object is many times more common than another (e.g. a celebrity and a much less famous person with the same name), it can be used in some embodiments to assume that the more frequent choice is intended, thereby saving each user from having to choose every time. In the preferred embodiment it is also used to list ambiguous translations in order of likelihood.
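- The usage counts described above could be kept along the following lines. This is only a sketch: the in-memory structure, the function names and the dominance factor of ten used to decide that one object is "many times more common" are all assumptions.

```python
from collections import Counter, defaultdict

# usage[string][object_id] = number of times users meant object_id by that string.
usage = defaultdict(Counter)


def record_use(string: str, object_id: str) -> None:
    """Count one use of a denotational string for a particular object."""
    usage[string.lower()][object_id] += 1


def ranked_objects(string: str) -> list:
    """Possible objects for a string, most frequently intended first."""
    return [obj for obj, _ in usage[string.lower()].most_common()]


def assumed_object(string: str, dominance: float = 10.0):
    """Return the clear favourite, or None if the user should be asked to choose."""
    ranked = usage[string.lower()].most_common(2)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (top, top_n), (_, second_n) = ranked
    return top if top_n >= dominance * second_n else None
```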
- One of the desirable actions that a user can perform is to assert new factual knowledge.
- this is that a named relationship exists between two named entities, and, in the case of a non-permanent relationship, when that relationship exists (i.e. they are also prompted for a timeperiod).
- negative relationships can also be asserted.
- the process by which relationships are asserted is illustrated in Figure 34.
- the process begins (step 3402) by requesting the fact in natural language ("as you would tell it to another person").
- the system attempts to parse the string entered by the user (step 3404). If it can be parsed, the elements of the fact (at least left object, relation, and right object, but possibly also negativity and temporal information) are extracted from the string (step 3406). Translation is described in section 2.6.9. If the user's string is not understood, then the user is presented with a page (step 3408) where the left object, relation, and right object are entered as separate elements along with detailed explanation and examples. A check is made that all three elements have been entered (step 3410) - once they have, the process can continue.
- the next stage is to identify each of the three fact elements.
- the relation is sent for identification by the "select_object” process (step 3412— described in section 2.9.6). Once the intended relation has been established, the left and right classes of the relationship are found (step 3414).
- the first object is then sent for identification by the "select_object” process (step 3416) - it is sent with the left class of the relationship as a parameter to ensure that "select_object” only looks for relevant objects.
- the second object is sent to "select_object” with the right class as a parameter (step 3418).
- a translation of the fact is created (using the unique recognition string of each element) and shown to the user for confirmation (step 3420).
- the user's reaction is tested (step 3422). If the user does not agree that the fact as stated is the fact that he/she is intending to add, then the process returns to the beginning.
- If the user confirms the fact translation, the process continues by testing whether or not the fact is a permanent one, and acts accordingly (step 3424). If it is not inferred to be permanent, a timeperiod for the fact is requested using the "select_timeperiod_for_fact" process (step 3426 - described in section 2.9.12).
- the source for the assertion might already have been set as the user.
- the process tests to see whether the source is the user (step 3428). If the source is the user, then he/she can be attributed as the animate source for the assertion (step 3430). If the source is still unknown, the user is asked to specify a source (step 3432). The user can state that he/she is the source, or provide a different source, perhaps a named individual or work of reference, or the URL of a web document. The user's response is examined (step 3434).
- If the user has stated that he/she is the source, a check is made to see whether he/she is logged in (step 3436). If not, he/she is required to log in (step 3438 - the "authenticate" process illustrated in Figure 19 and described in section 2.8). Once the user's identity has been established, he/she can be attributed as the animate source for the assertion (step 3430). If the user is not the source of the fact, then the specified source must be identified and an attempt made to establish an associated animate source (step 3440 - illustrated in detail in Figure 38 and described in section 2.9.13).
- In step 3442 the relationship, the source of the assertion, and (if relevant) any timeperiods are asserted. If the fact is already known to the knowledge base, then this assertion will count as an endorsement. Facts can also be parsed from complete natural language assertions typed by the user into the main system prompt in some embodiments (e.g. "Paris is the capital of France"). If the translation system translates this into an assertion of a fact, the add_fact process can be started at step 3406 exactly as if the initial assertion had been typed into the prompt corresponding to step 3402.
- the add_fact process can be called with one or more of the three objects already filled in.
- An example of this is illustrated in Figure 35.
- a user has typed "Victoria the Empress of India" into the general prompt. The system has translated this into a request for a profile screen for the historical figure [queen victoria] and displayed the default profile, which is the default [human being] profile (3502).
- add_fact uses select_object to locate the correct entity and asks for confirmation of the fact to be added (screen 3506). After source selection and confirmation of the fact being added the user opens the profile again (3508). This time the knowledge is in the knowledge base and the profile correctly shows her place of birth.
- each principal class has certain core facts associated with it. This is knowledge which varies between members of the class and which is considered important.
- the preferred embodiment will also prompt the user for the core facts associated with the principal class of the object. For example, with the principal class [human being] , the preferred embodiment will prompt for the sex of the object (person) added and the date of birth.
- core facts are associated with any class and instead of prompting for the core facts associated with the principal class of the object, a search is made for the most specific class which has core fact information associated with it.
- Figure 36 shows the steps involved in adding core facts about an object. First of all, it is necessary to establish whether or not any core facts are associated with the object's principal class (step 3602). If no core facts are so associated, the process ends. Otherwise, an array of the core facts is created (step 3604), and a loop is entered between step 3606 (which requests the answer to each core fact in turn) and step 3612 (which checks to see whether any core fact questions remain to be asked).
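- The core-fact loop, together with the variant that searches for the most specific class carrying core fact information, might look like the sketch below. The class hierarchy, the question lists and the `ask` callback standing in for the user prompt are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical data: each class's parent, and the core-fact questions some classes carry.
PARENT = {"[actor]": "[human being]", "[human being]": "[animal]", "[animal]": None}
CORE_FACTS = {"[human being]": ["sex", "date of birth"]}


def core_fact_class(cls: Optional[str]) -> Optional[str]:
    """Walk up from cls to the most specific class that has core facts."""
    while cls is not None:
        if cls in CORE_FACTS:
            return cls
        cls = PARENT.get(cls)
    return None


def add_core_facts(cls: str, ask: Callable[[str], str]) -> dict:
    """Ask each core-fact question in turn and return the collected answers."""
    owner = core_fact_class(cls)
    if owner is None:                      # no core facts associated: the process ends
        return {}
    answers = {}
    for question in CORE_FACTS[owner]:     # loop until no questions remain
        answers[question] = ask(question)
    return answers
```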
- Figure 37 illustrates the process in the preferred embodiment of selecting a time period.
- the user is first asked whether the fact is true now (step 3702), and is then asked for the earliest time when the fact was true (step 3704).
- the "select_object" process is initiated with the string entered by the user and the class [timepoint] as parameters (step 3706 - "select_object" is described in 2.9.6).
- "select_object" returns a [timepoint] object.
- If the user has said that the fact is true now (step 3708), the second timepoint will be [iafter] (step 3710), but if the fact is not true now, the user is asked for the latest time when the fact is true (step 3712). As before, the "select_object" process is initiated with the string entered by the user and the class [timepoint] as parameters (step 3714).
- After the second [timepoint] object has been established, a check is made (step 3716) to see that the timepoints make a reasonable time period (the second must be later than the first). A problem encountered at step 3716 results in the user being shown an explanatory message and a request to enter the initial timepoint again (step 3718).
- the user is asked whether the fact might have been true before the starting timepoint (step 3722). The user's response is tested (step 3724), and if he/she is confident that the fact is not true, the process creates a prior [timeperiod] object from [time zero] to the starting timepoint (step 3726) which can be used to assert the inverse of the fact.
- Just as a prior [timeperiod] object might be created, so a check is made to see whether the second timepoint is [iafter] (step 3728), and if it is not, a [timeperiod] object for the period after the fact ceased to be true - from the second timepoint to [iafter] - is created (step 3730). Finally the [timeperiod] object is created (step 3732) and the process terminates.
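- The time period logic of Figure 37 can be sketched as below. Timepoints are modelled as plain numbers (for example years), with infinities standing in for [time zero] and [iafter]; that representation and the function name are assumptions made only for this illustration.

```python
from typing import Optional

TIME_ZERO = float("-inf")   # stands in for [time zero]
IAFTER = float("inf")       # stands in for [iafter]


def select_timeperiods(start: float, true_now: bool, end: Optional[float] = None,
                       maybe_true_before: bool = True):
    """Return (prior, main, after) periods as (from, to) tuples; prior/after may be None."""
    second = IAFTER if true_now else end
    if second is None:
        raise ValueError("an end timepoint is needed when the fact is not true now")
    if second <= start:
        # Corresponds to the check that the second timepoint is later than the first.
        raise ValueError("the second timepoint must be later than the first")
    # Prior period: only when the user is confident the fact was not true before.
    prior = None if maybe_true_before else (TIME_ZERO, start)
    # Following period: only when the fact has ceased to be true.
    after = None if second == IAFTER else (second, IAFTER)
    return prior, (start, second), after
```

The prior and following periods are the ones that can be used to assert the inverse of the fact.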
- The behaviour will depend on whether or not the user has supplied a URL as the source, so the user's input is tested initially (step 3802). If the source specified by the user is not a URL, the "select_object" process is initiated in order to identify, or, if necessary, add the source as an object (step 3804 - described in section 2.9.6). A check is then made on the source (step 3806) to establish whether it is animate (a person or an organisation) or inanimate (e.g. a book).
- If the source is inanimate, then an attempt is made to find an animate source behind the specified source (if, for example, the source is a single-author book, then this animate source would be the author).
- a check is made to see whether the knowledge base already knows the animate source associated with the source specified by the user (step 3808). If the source specified by the user is a URL, the user is shown the content of the page at that URL and asked to confirm that that page is the intended source document (step 3810). If it is, a copy of the content is stored locally and associated with an ID, unless the same page is already held (step 3812), and a check is made to see whether an animate creator source — in most cases this will be the site's webmaster - is already known for the document (step 3808).
- The user is asked whether he/she knows of an animate source, and, if so, whether this animate source is responsible for all knowledge obtained from the specified source or just this particular piece of knowledge (step 3814). The user's response is tested (step 3816). If the user does know of an animate source, the "select_object" process is initiated in order for the user to specify that animate source (step 3818 - described in section 2.9.6). If the user has said that the animate source is responsible for all information in the original source, then this fact should be asserted (step 3820), so that steps 3814 to 3820 can be omitted by future users who give the same source.
- a source ID is returned: of the animate source if one has been established (step 3822), or, failing that, of the inanimate source (step 3824).
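- The decision about which source ID to return can be summarised in a short sketch. The dictionary of already known animate sources and the parameter names are hypothetical, and the URL confirmation and local copy steps are left out.

```python
from typing import Optional

# Hypothetical knowledge: inanimate source -> animate source already known to be behind it.
KNOWN_ANIMATE = {"[a single-author book]": "[its author]"}


def source_id(source: str, is_animate: bool,
              user_supplied_animate: Optional[str] = None,
              responsible_for_all: bool = False) -> str:
    """Return the animate source if one can be established, else the source itself."""
    if is_animate:
        return source
    if source in KNOWN_ANIMATE:                 # step 3808: already known
        return KNOWN_ANIMATE[source]
    if user_supplied_animate is not None:       # steps 3814 to 3818: the user knows one
        if responsible_for_all:                 # step 3820: remember it for future users
            KNOWN_ANIMATE[source] = user_supplied_animate
        return user_supplied_animate
    return source                               # step 3824: fall back to the inanimate source
```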
- dumb generators can be added to the system via a web-based editing page allowing the generator to be added to the list and tested by the user.
- the user who has created the generator is associated with the generator and prior to editor approval the generator will be ignored by the query answering system for all users other than the user who has submitted the generator. In this way, any mistakes or errors with the generator will only affect the user who is testing it.
- After editor approval the generator will be used by the query answering system for all queries.
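- One simple way to realise this visibility rule is to filter the generator list per user, as in the sketch below. The `Generator` record and its field names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    submitted_by: str
    approved: bool = False


def generators_for(user_id: str, generators: list) -> list:
    """Generators the query answering system may use for this user's queries."""
    # Unapproved generators are visible only to the user who submitted them.
    return [g for g in generators if g.approved or g.submitted_by == user_id]
```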
- the system also needs to permit the addition and testing of tools.
- the preferred embodiment achieves this by allowing users to add tools in an interpreted language which can run on the server but without having access to any sensitive files.
- the server would also terminate any script that had been running for longer than a short timeout, to prevent scripts which loop. Access to the network is also controlled.
- Python http://www.python.org/
- Python is a suitable scripting language used by the preferred embodiment.
- the interpreter is widely available, freely licensed and information about how to incorporate it into a server is widely available.
- the Python script that implements the tool can again be edited and tested by the user prior to approval by an editor. On approval the tool is then available to be used in generators. Prior to approval it will only be used in queries run by the user who submitted the tool so that it can be tested.
- 2.9.15 Adding Profile Templates
- Creation of the profile can be achieved by a web-based editor or the template can be created offline and uploaded to the system.
- the templates, pattern and generators are added via a web-based editor and initially only used in response to translations by the user who added it to allow testing.
- a web-based command allows the user to submit the template for editor approval.
- the translation template is used for all translations by the complete user base extending the functionality of the system for everyone.
- Various embodiments can draw attention to existing translation templates and thus educate users in adding them by producing an explanation of how questions were translated when a translation is successful (containing at least a link to the template used to do the translation).
- the fall-back strategy when a question was not understood can also provide a link to the add_translation process with instructions thus providing the user with a mechanism to correct and improve the problem for all users.
- user assessment is the facility for users of an embodiment of the invention to provide information on the veracity of knowledge already present in the system.
- User assessment is an optional but desirable feature of various embodiments as it enables users to draw attention to facts which are incorrect and/or to increase the confidence in facts which are true.
- users can both endorse and contradict facts. When doing so they use the same source of knowledge methodology as is used when asserting new facts. (See section 2.9.13.)
- When a user adds a fact that is already in the static knowledge base, the preferred embodiment simply considers this a user endorsement of the fact and doesn't create a new fact in the static knowledge base.
- the initial assertion of the fact also counts as an endorsement of the fact by the asserting user.
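- A minimal sketch of this behaviour, assuming the knowledge base is modelled as a dictionary from a fact tuple to the set of users who have endorsed it (a hypothetical structure, not the one used by the embodiment):

```python
def assert_fact(store: dict, fact: tuple, user_id: str) -> bool:
    """Record user_id as an endorser of fact; return True if the fact was new."""
    is_new = fact not in store
    # Whether new or already present, the asserting user counts as an endorser.
    store.setdefault(fact, set()).add(user_id)
    return is_new
```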
- the preferred embodiment also enables users who are contradicting a fact to label the original fact as probably asserted abusively. By distinguishing between facts which were asserted in good faith but are wrong in error and facts which were probably asserted to be mischievous and/or abusive, a number of options become available. These include taking sanctions against the user entity reporting the fact abusively, having a lower threshold for suppression of other facts asserted by this user and suppressing the abusively asserted fact faster than would otherwise have been the case.
- the preferred embodiment also enables users of sufficiently high rank to label their assessment as final. Once done, the status of the fact (true or false) is locked down and cannot be changed by further assessments from users of lower rank. This facility enables a highly ranked user such as a staff member to resolve an issue with a fact immediately. For example, a staff member can make an obviously abusively asserted fact immediately invisible.
- user assessment is implemented by maintaining an assessments database table which records each endorsement and contradiction and includes the following information: the fact being user assessed; whether it is an endorsement or contradiction; the date and time of the action; the reporter (i.e. the id of the user who is performing the assessment); the source of the information (which may also be the user); optionally the id of the document which this assessment is based on (if there is one). (If a document is present, the source is the entity responsible for the document); whether the assessment has been labelled as abusive; whether the assessment has been labelled as final; any text explanation entered by the user at the same time (this can be used to explain the assessment further if the user wishes and appears on the fact profile).
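- The fields listed above could be represented by a record such as the following. The class and field names are illustrative assumptions; the text specifies only what information is stored, not how it is named.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Assessment:
    fact_id: str                       # the fact being user assessed
    is_endorsement: bool               # False means a contradiction
    timestamp: datetime                # date and time of the action
    reporter_id: str                   # id of the user performing the assessment
    source_id: str                     # source of the information (may be the reporter)
    document_id: Optional[str] = None  # document the assessment is based on, if any
    labelled_abusive: bool = False
    labelled_final: bool = False
    explanation: str = ""              # optional text shown on the fact profile
```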
- the user assessments of a fact are combined together to get an overall picture of the veracity of the fact.
- the fact is also closed for further user assessment. This gives some stability to the system as facts for which there is an overwhelming certainty of them being true or false cannot be changed. This is especially important for certain core facts used frequently by the system in numerous situations such as properties of common relations. Should a fact be locked down in an incorrect state, various embodiments would however, allow a user to draw this issue to the attention of staff for correction.
- user assessment information is combined together by attaching a positive score to each endorsement of a fact and a negative score to each contradiction and setting the truth and visibility of the fact based on the sum.
- the magnitude of the score for each endorsement and contradiction is determined by the track record of the user making the assessment. For example, a new user could be given a score of 10 while an experienced user who had been using the system for many months with a track record of accurate assessment could be given a score of 200.
- This embodiment does not allow repeated endorsements by the same user to increase the sum but users can be permitted to change their endorsement by contradicting a fact they have previously endorsed etc.
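- A sketch of this scoring scheme follows, using the example weights from the text (10 for a new user, 200 for an experienced one). Keeping only each user's most recent assessment is one assumed way of ignoring repeats while still letting users change their mind; the function and argument names are hypothetical.

```python
def assessment_sum(assessments, user_weight):
    """Combine assessments into a signed sum.

    assessments: iterable of (user_id, is_endorsement) in chronological order.
    user_weight: mapping of user_id to the score magnitude from their track record.
    """
    latest = {}
    for user_id, is_endorsement in assessments:
        latest[user_id] = is_endorsement    # a later assessment replaces an earlier one
    return sum(user_weight[u] if endorsed else -user_weight[u]
               for u, endorsed in latest.items())
```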
- each fact is labelled as true/false and visible/invisible in the table in which it is stored. True/false is the veracity of the fact: whether the system believes it is true. Visible/invisible is whether the fact is normally visible to the query answering system. Untrue facts are always invisible. Other embodiments could remove untrue facts from the knowledge base.
- Various embodiments also take into account fact exposure information in the assessment of the veracity of the fact from user assessments.
- Fact exposure information is information about the events when the fact was shown to users and the users were given an opportunity to apply a user assessment. For example, if a user has asked a question and the summary explanation has been displayed showing the fact and giving the user a chance to contradict it, that would be an exposure of the fact to the user.
- By combining exposure information with user assessments the system can obtain a superior understanding of the likely veracity of the fact. For example, a fact which has been exposed one thousand times and received five user contradictions is more likely to be true than a similar fact which has also received five contradictions but has been exposed far fewer times.
- One example embodiment of how the system can incorporate fact exposure information into a scheme for assessing the fact is to consider each exposure of a fact without a user assessment action as a form of tacit endorsement of the fact and to count these in a similar way to actual endorsements by the user but with a much smaller weight.
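- The tacit endorsement idea might be sketched as follows. The weight of 0.1 per silent exposure is an arbitrary assumption chosen only to be much smaller than an explicit endorsement, and the function and parameter names are hypothetical.

```python
def score_with_exposure(assessment_sum: float, exposures: int,
                        assessed_exposures: int, tacit_weight: float = 0.1) -> float:
    """Add a small tacit endorsement for every exposure that drew no assessment."""
    silent = max(exposures - assessed_exposures, 0)
    return assessment_sum + tacit_weight * silent
```

With five contradictions of weight 10 each (a sum of -50), a fact exposed one thousand times ends up with a positive score under these assumed weights, while the same fact exposed only twenty times stays negative.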
- true-now assertions including taking into account the date that the endorsement or contradiction took place.
- the assertion of the negative version of the true-now fact suggests a point when the fact may have ceased to be true.
- true-now facts are thus always challengeable.
- the true-now fact is a temporal partner closed with the [iafter] object the closing timepoint of overlapping similar facts provides candidate change points.
- Figure 39 illustrates the preferred embodiment user assessment process of endorsing or contradicting a fact in the knowledge base.
- the process is always initiated with parameters for the fact to be assessed and the type of assessment (endorsement or contradiction).
- a check is made as to whether assessment of the fact is allowed (step 3901). Certain facts are marked as being unchallengeable, while others are suppressed for knowledge addition. If assessment is not possible the process terminates and the user is given an explanation.
- It is necessary to check whether the user is currently logged in (step 3902), and if not, he/she is required to log in (step 3904 - the "authenticate" process illustrated in Figure 19 and described in section 2.8).
- In step 3906 the system determines whether the fact is transient (or is itself a temporal partner to a transient fact). If it is not, the user can be taken directly to the step where a source is requested (3922). If the fact is transient (or a temporal partner), it will be necessary to show the user all the other facts associated with the fact, and find out exactly what it is that the user wishes to endorse or contradict (for example, if a user follows a link to contradict the fact that two people are married, it is not clear whether he/she is contradicting the fact that they are married now, or the fact that they have ever been married).
- the basic "subject" fact associated with the fact being assessed is found, and a "time history" for that fact is constructed (step 3908), indicating periods when it is true, when it is false, and when its veracity is unknown.
- the user is shown a schematic representation of this time history (step 3910), and given various options (to endorse or contradict particular periods, to contradict the basic fact in its entirety, or to make changes to the time history). If the user has chosen to contradict the basic fact (step 3912) - for example, saying that two people were never married, rather than just not married now - then he/she is taken straight to step 3922 (specifying a source).
- The process continues by checking whether the user has asked to change any of the timepoints associated with the fact (step 3914). If there are no timepoints to change, the process checks that the user has endorsed or contradicted at least one of the periods (step 3916), and if not, he/she is taken back to the page at step 3910 with a message requesting at least one endorsement, contradiction, or alteration. If there are timepoints to change, the user is asked for them one by one (step 3918), and they are checked for validity. Next (step 3920), a new "time history" is constructed, based on what the user has said.
- The user is then asked for the source of his/her knowledge about the fact(s) (step 3922), and a check is made on whether that source is the user him/herself or a secondary source (step 3924). If the user is the source, then he/she will be recorded as the animate source behind whatever assessments and assertions are made (step 3926). If the user is not the source of the fact, then the specified source must be identified and an attempt made to establish an associated animate source (step 3928 - illustrated in detail in Figure 38 and described in section 2.9.13).
- In step 3930 the information given by the user is examined, and all assessments and assertions that follow from it (whether directly or by inference) are made.
- system assessment is the automated analysis of a fact to determine its veracity using at least whether the fact is semantically contradicted by other knowledge in (or known to) the system.
- the preferred embodiment also determines whether a fact is superfluous: i.e. whether it can be generated by the system anyway.
- “interactivity information” is data about how the fact interacts semantically with other facts in the system: whether a fact is contradicted or rendered superfluous by other facts in the knowledge base.
- a fact which is contradicted is in semantic conflict with one or more other believed-true facts in the system.
- a fact which is superfluous can already be produced by the system.
- a fact which is "uninfluenced” is neither contradicted nor superfluous and thus adds to the total knowledge of the system.
- System assessment is a useful (but optional) component found in the preferred embodiment. It helps to keep the facts in the static knowledge base consistent with each other and is also another weapon to counter abusive or accidental assertion of untrue facts by users. Embodiments making use of user assessment data but not including system assessment will need an automated process to combine the user assessment data in determining the veracity of the fact (as described above). However, in the preferred embodiment user assessment data is used in combination with interactivity information in assessing a fact. To generate interactivity information for a single fact in the preferred embodiment, the system assessment component creates a truth query in full mode corresponding to the fact.
- If the fact being assessed is already in the static knowledge base, it also tells the query answering system to ignore it when answering this query. Alternatively, the fact can be temporarily suppressed or removed from the static knowledge base while it is being system assessed.
- the query is then executed. If the result is "no", the fact is contradicted. If the result is "yes” the fact is superfluous. If the result is "unknown” the fact is uninfluenced.
- a variant of this is to create an inverse query corresponding to the negative of the fact. If this query returns "yes", the fact is contradicted. This variant may be useful in embodiments where "no" answers to truth queries are not supported. (See section 2.5)
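- The mapping from truth query results to interactivity, including the inverse-query variant, is small enough to sketch directly. The function names are hypothetical, and the query results are assumed to arrive as the strings "yes", "no" and "unknown".

```python
def interactivity(truth_query_result: str) -> str:
    """Classify a fact from the result of the truth query run with the fact masked."""
    mapping = {"no": "contradicted", "yes": "superfluous", "unknown": "uninfluenced"}
    return mapping[truth_query_result]


def interactivity_without_no(fact_result: str, inverse_result: str) -> str:
    """Variant for embodiments whose truth queries never answer "no"."""
    if inverse_result == "yes":      # the negative of the fact can be shown
        return "contradicted"
    if fact_result == "yes":         # the fact itself can already be produced
        return "superfluous"
    return "uninfluenced"
```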
- the static facts used in answering the query together are the ones which render the fact contradicted or superfluous and are termed "influencing facts" herein.
- the first thing done is to scan the record of user assessments for this fact (endorsements and contradictions) to create a weighted sum (step 4002).
- the sum initially starts at a small positive amount, endorsements add to this sum and contradictions subtract from it.
- the amount added or subtracted for each assessment is a pre-determined amount based on the track record of the user making the assessment.
- the initial assertion of the fact is considered as an endorsement. Multiple endorsements or contradictions by the same user are ignored.
- the sum is then used to set provisional values for the veracity of the fact and its challengeability (step 4004). For example, a score above zero would set the veracity to true (i.e. the fact is believed true), and below zero to false (believed false). Challengeability is set based on the sum being above or below a much higher threshold, e.g. a sum less than -1000 or greater than +1000 would make the fact unchallengeable.
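- Steps 4002 and 4004 might be sketched as below. The starting value of 1 for the "small positive amount" and the treatment of a sum of exactly zero as false are assumptions; the threshold of 1000 is the example figure from the text.

```python
INITIAL_SUM = 1          # assumed value for the "small positive amount"
LOCK_THRESHOLD = 1000    # example threshold from the text


def provisional_values(assessment_scores):
    """Return (veracity, challengeable) from signed assessment scores.

    assessment_scores: positive numbers for endorsements, negative for contradictions.
    """
    total = INITIAL_SUM + sum(assessment_scores)
    veracity = total > 0                           # above zero: believed true
    challengeable = abs(total) <= LOCK_THRESHOLD   # beyond the threshold: locked
    return veracity, challengeable
```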
- Step 4006 creates a simple truth query of just the fact itself (without fact id) and no query variables.
- In step 4008 the query is executed in full mode with explanation. The fact itself is temporarily masked while the query is being run, e.g. by passing the fact id to the process_query routine and asking for the static search routine to ignore it. (Some embodiments may perform system assessment on a fact before it is added to the static knowledge base making this masking step unnecessary.)
- The return result of the query is then examined (step 4010). If the query returned "no" (i.e. the static fact is contradicted by what would be in the system without it), veracity is set to false (i.e. the fact is believed untrue) and the interactivity is set to "contradicted" (step 4012).
- step 4020 is done to record the results of this system assessment in the static knowledge base, including the values for veracity, challengeability, interactivity and visibility.
- the visibility is always set to false if the fact is believed untrue, and in some embodiments it will be set to invisible if the fact is superfluous.
- the date and time when this system assessment was done is also recorded for use by the system selecting facts for periodic reassessment. Some embodiments may choose to remove untrue facts from the knowledge base rather than just making them invisible.
- a scan of the related_facts table is made, finding facts which are influenced by the one just assessed (whose veracity has changed), and each of these facts is recursively system assessed (step 4024). For example, if a true fact was being contradicted by the fact just reassessed and that fact is now false, this would resuscitate the wrongly suppressed fact immediately.
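The masked truth query and the recursive reassessment of influenced facts might be sketched as below. Everything here is a toy under stated assumptions: `Fact`, `system_assess`, `toy_truth_query`, the `conflicts` map and the `influences` map (standing in for the related_facts table) are hypothetical names, the truth query is reduced to a lookup of conflicting facts, and the `visited` set is an added guard against cycles that the text does not mention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Fact:
    fact_id: str
    veracity: bool = True
    interactivity: str = "uninfluenced"
    visible: bool = True


def system_assess(
    fact_id: str,
    facts: Dict[str, Fact],
    provisional: Dict[str, bool],            # veracity from the user-assessment sum
    truth_query: Callable[[str, str], str],  # (fact to test, fact id to mask) -> answer
    influences: Dict[str, List[str]],        # stand-in for the related_facts table
    visited: Optional[Set[str]] = None,
) -> None:
    visited = set() if visited is None else visited
    if fact_id in visited:  # assumed guard: assess each fact once per cascade
        return
    visited.add(fact_id)

    fact = facts[fact_id]
    previous = fact.veracity
    fact.veracity = provisional.get(fact_id, True)

    # Run the truth query with the fact itself masked out.
    result = truth_query(fact_id, fact_id)
    if result == "no":
        fact.veracity = False
        fact.interactivity = "contradicted"
    elif result == "yes":
        fact.interactivity = "superfluous"
    else:
        fact.interactivity = "uninfluenced"

    fact.visible = fact.veracity  # untrue facts are never visible

    # If the veracity changed, reassess every fact this one influences.
    if fact.veracity != previous:
        for other_id in influences.get(fact_id, []):
            system_assess(other_id, facts, provisional, truth_query, influences, visited)


# Toy scenario: "good" was wrongly suppressed because "bad" contradicted it.
facts = {
    "bad": Fact("bad"),
    "good": Fact("good", veracity=False, interactivity="contradicted", visible=False),
}
conflicts = {"good": ["bad"], "bad": ["good"]}


def toy_truth_query(target_id: str, masked_id: str) -> str:
    """Answer "no" if any unmasked, believed-true fact conflicts with the target."""
    for other_id in conflicts.get(target_id, []):
        if other_id != masked_id and facts[other_id].veracity:
            return "no"
    return "unknown"


# Users have since contradicted "bad", so its provisional veracity is now false.
system_assess("bad", facts, {"bad": False, "good": True}, toy_truth_query, {"bad": ["good"]})
```

After the call, "bad" is believed false and hidden, and the cascade has restored "good" to true and visible, which mirrors the resuscitation example in the text.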
- Periodic reassessment of facts. Various embodiments will periodically repeat the system assessment of each static fact in the knowledge base. In the preferred embodiment, this is achieved by having a field in the database table containing the static facts which gives a date and time when the fact was last system assessed. Periodic reassessment is then achieved by calculating the timepoint corresponding to a threshold time period before the current time (e.g. one week) and doing a SQL SELECT statement which gathers the ids of all facts which have not been reassessed for this period, ordered by last reassessment time (earliest first). The program then reassesses each fact in order, timing out after a pre-determined period (e.g. twenty minutes). A cron job is set up to periodically (e.g. every hour) call this function so facts are continuously reassessed. Some embodiments may prioritise certain types of fact for faster/higher priority reassessment.
- users can additionally reassess a fact at any time. This is accomplished by a "reassess this fact's properties" button on the fact profile (an example is 1409 on Figure 14). Clicking this button immediately results in a system assessment being done on the fact and the results displayed to the user.
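The periodic reassessment described above (a cutoff timepoint, an ordered SELECT and a time-limited run) can be sketched with Python's built-in `sqlite3` module. The table name `static_facts`, the column names `id` and `last_assessed`, the ISO-8601 text timestamps and the function names are all assumptions made for this sketch; the one-week threshold and twenty-minute budget are the example values from the text.

```python
import sqlite3
import time
from datetime import datetime, timedelta
from typing import Callable, List


def facts_due(conn: sqlite3.Connection, now: datetime,
              max_age: timedelta = timedelta(weeks=1)) -> List[str]:
    """Ids of facts not assessed since the cutoff, earliest assessment first."""
    cutoff = (now - max_age).isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT id FROM static_facts WHERE last_assessed < ? "
        "ORDER BY last_assessed ASC",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


def reassess_due_facts(conn: sqlite3.Connection,
                       reassess: Callable[[str], None],
                       now: datetime,
                       budget_seconds: float = 20 * 60) -> List[str]:
    """Reassess due facts in order, stopping once the time budget is spent."""
    deadline = time.monotonic() + budget_seconds
    done = []
    for fact_id in facts_due(conn, now):
        if time.monotonic() >= deadline:
            break  # time out; remaining facts wait for the next run
        reassess(fact_id)
        conn.execute(
            "UPDATE static_facts SET last_assessed = ? WHERE id = ?",
            (now.isoformat(timespec="seconds"), fact_id),
        )
        done.append(fact_id)
    conn.commit()
    return done
```

An hourly cron entry (schedule field `0 * * * *`) invoking a script that calls `reassess_due_facts` would give the continuous reassessment the text describes. Because the oldest assessments are selected first, a run that times out simply leaves the newer ones for the next invocation.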
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US78151706P | 2006-03-08 | 2006-03-08 | |
| PCT/GB2006/050222 WO2007101973A2 (en) | 2006-03-08 | 2006-07-26 | Knowledge repository |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP1999692A2 (en) | 2008-12-10 |
Family
ID=38330239
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP06765371A Ceased EP1999692A2 (en) | 2006-03-08 | 2006-07-26 | Knowledge repository |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP1999692A2 (en) |
| IL (1) | IL193913A (en) |
| WO (1) | WO2007101973A2 (en) |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7013308B1 (en) | 2000-11-28 | 2006-03-14 | Semscript Ltd. | Knowledge storage and retrieval system and method |
| US8666928B2 (en) | 2005-08-01 | 2014-03-04 | Evi Technologies Limited | Knowledge repository |
| US8838659B2 (en) | 2007-10-04 | 2014-09-16 | Amazon Technologies, Inc. | Enhanced knowledge repository |
| US9805089B2 (en) | 2009-02-10 | 2017-10-31 | Amazon Technologies, Inc. | Local business and product search system and method |
| EP2290914A1 (en) * | 2009-08-31 | 2011-03-02 | Nederlandse Organisatie voor toegepast -natuurwetenschappelijk onderzoek TNO | Support for network routing selection |
| EP2290913A1 (en) | 2009-08-31 | 2011-03-02 | Nederlandse Organisatie voor toegepast -natuurwetenschappelijk onderzoek TNO | Support for network routing selection |
| US9110882B2 (en) | 2010-05-14 | 2015-08-18 | Amazon Technologies, Inc. | Extracting structured knowledge from unstructured text |
| US9361293B2 (en) | 2013-09-18 | 2016-06-07 | International Business Machines Corporation | Using renaming directives to bootstrap industry-specific knowledge and lexical resources |
| CN108932261B (en) * | 2017-05-25 | 2023-08-15 | 株式会社日立制作所 | Method and device for updating business data processing information table of knowledge base |
| CN114064632B (en) * | 2020-07-31 | 2025-08-26 | 中移(苏州)软件技术有限公司 | Data kinship management method, system and storage medium |
| IL279406B2 (en) | 2020-12-13 | 2025-01-01 | Google Llc | Privacy-preserving techniques for content selection and distribution |
| CN115525669B (en) * | 2021-06-24 | 2026-01-27 | 中移(苏州)软件技术有限公司 | Data processing method, device and storage medium |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7013308B1 (en) * | 2000-11-28 | 2006-03-14 | Semscript Ltd. | Knowledge storage and retrieval system and method |
| US8666928B2 (en) * | 2005-08-01 | 2014-03-04 | Evi Technologies Limited | Knowledge repository |
| WO2007083079A1 (en) * | 2006-01-20 | 2007-07-26 | Semscript Limited | Knowledge storage and retrieval system and method |
2006
- 2006-07-26 EP EP06765371A patent/EP1999692A2/en not_active Ceased
- 2006-07-26 WO PCT/GB2006/050222 patent/WO2007101973A2/en not_active Ceased
2008
- 2008-09-04 IL IL193913A patent/IL193913A/en active IP Right Grant
Non-Patent Citations (1)
| Title |
|---|
| None * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2007101973A2 (en) | 2007-09-13 |
| IL193913A (en) | 2015-07-30 |
| WO2007101973A3 (en) | 2009-07-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9098492B2 (en) | | Knowledge repository |
| US9519681B2 (en) | | Enhanced knowledge repository |
| US11132610B2 (en) | | Extracting structured knowledge from unstructured text |
| US12430504B2 (en) | | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
| US11977854B2 (en) | | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
| US11989527B2 (en) | | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
| US12067362B2 (en) | | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
| US12073180B2 (en) | | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
| Hogan | | Web of data |
| US7013308B1 (en) | | Knowledge storage and retrieval system and method |
| US20260030457A1 (en) | | Computer implemented methods for the automated analysis or use of data, and related systems |
| EP1999692A2 (en) | | Knowledge repository |
| EP1974316B1 (en) | | Knowledge storage and retrieval system and method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| 17P | Request for examination filed |
Effective date: 20080919 |
|
| AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
| AX | Request for extension of the european patent |
Extension state: AL BA HR MK RS |
|
| RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: TRUE KNOWLEDGE LIMITED |
|
| R17D | Deferred search report published (corrected) |
Effective date: 20090730 |
|
| 17Q | First examination report despatched |
Effective date: 20091112 |
|
| DAX | Request for extension of the european patent (deleted) | ||
| RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: EVI TECHNOLOGIES LIMITED |
|
| RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: AMAZON EUROPE HOLDING TECHNOLOGIES SCS |
|
| REG | Reference to a national code |
Ref country code: DE Ref legal event code: R003 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
|
| 18R | Application refused |
Effective date: 20200606 |