EP2387756A2 - Retrieving and displaying information from an unstructured electronic document collection - Google Patents

Retrieving and displaying information from an unstructured electronic document collection

Info

Publication number
EP2387756A2
Authority
EP
European Patent Office
Prior art keywords
instance
collection
attribute
instances
structured presentation
Prior art date
Legal status
Withdrawn
Application number
EP10732191A
Other languages
German (de)
French (fr)
Other versions
EP2387756A4 (en
Inventor
Daniel N. Quine
Daniel Loreto
Bogdan Caprita
Antonella Pavese
Jeffrey C. Reynar
Andrew William Hogue
Anthony J. Aiuto
John Alexander Komoroske
Current Assignee
Google LLC
Original Assignee
Google LLC
Priority date
Filing date
Publication date
Priority claimed from US12/355,459 (US8412749B2)
Priority claimed from US12/355,607 (US8615707B2)
Priority claimed from US12/355,554 (US8452791B2)
Priority claimed from US12/355,228 (US20100185651A1)
Priority claimed from US12/355,103 (US8977645B2)
Application filed by Google LLC
Publication of EP2387756A2
Publication of EP2387756A4
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor

Definitions

  • This specification relates to retrieving and displaying information from an unstructured electronic document collection.
  • An electronic document is a collection of machine-readable data.
  • Electronic documents are generally individual files and are formatted in accordance with a defined format (e.g., PDF, TIFF, HTML, MS Word, PCL, PostScript, or the like).
  • Electronic documents can be electronically stored and disseminated.
  • electronic documents include audio content, visual content, and other information, as well as text and links to other electronic documents.
  • Electronic documents can be collected into electronic document collections.
  • Electronic document collections can either be unstructured or structured.
  • the formatting of the documents in an unstructured electronic document collection is not constrained to conform with a predetermined structure and can evolve in often unforeseen ways. In other words, the formatting of individual documents in an unstructured electronic document collection is neither restrictive nor permanent across the entire document collection. Further, in an unstructured electronic document collection, there are no mechanisms for ensuring that new documents adhere to a format or that changes to a format are applied to previously existing documents. Thus, the documents in an unstructured electronic document collection cannot be expected to share a common structure that can be exploited in the extraction of information. Examples of unstructured electronic document collections include the documents available on the Internet, collections of resumes, collections of journal articles, and collections of news articles. Documents in some unstructured electronic document collections are not prohibited from including links to other documents inside and outside of the collection.
  • the documents in structured electronic document collections generally conform with formats that can be both restrictive and permanent.
  • the formats imposed on documents in structured electronic document collections can be restrictive in that common formats are applied to all of the documents in the collections, even when the applied formats are not completely appropriate.
  • the formats can be permanent in that an upfront commitment to a particular format by the party who assembles the structured electronic document collection is generally required.
  • users of the collections (in particular, programs that use the documents in the collection) rely on the documents having the expected format. As a result, format changes can be difficult to implement. Structured electronic document collections are best suited to applications where the information content lends itself to simple and stable categorizations.
  • structured electronic document collections include databases that are organized and viewed through a database management system (DBMS) in accordance with hierarchical and relational data models, as well as collections of electronic documents that are created by a single entity for presenting information consistently.
  • a collection of web pages that are provided by an online bookseller to present information about individual books can form a structured electronic document collection.
  • a collection of web pages that is created by server-side scripts and viewed through an application server can form a structured electronic document collection.
  • one or more structured electronic document collections can each be a subset of an unstructured electronic document collection.
  • This specification describes technologies relating to retrieval and display of information from an unstructured electronic document collection, for example, the electronic documents available on the Internet.
  • an electronic document collection may be unstructured
  • the information content of the unstructured electronic document collection can be displayed in a structured presentation.
  • the information content of an unstructured electronic document collection can be used not only to determine the values of attributes but also to identify, select, and name attributes and instances in a structured presentation.
  • Such structured presentations can present information in a coherent manner to a user despite the diversity in sources. Examples of structured presentations include tables and other collections of records.
  • one aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving a machine-readable search query from a user and responding to the search query with instructions for presenting the user with a structured presentation of instances relevant to the search query.
  • a visual presentation of the structured presentation denotes associations between the instances and values that characterize attributes of the instances by virtue of an arrangement of identifiers of the instances and the values.
  • the identifiers of the instances and the values are drawn from two or more documents in an unstructured collection of electronic documents.
  • the electronic document collection being unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
  • Responding to the search query can include identifying a first collection of electronic documents in the unstructured collection that relate to the instances, extracting values of the attributes of the instances from the first collection of electronic documents, and populating the structured presentation with the values extracted from two or more electronic documents.
  • Responding to the search query can include extracting a first value of a first attribute of a first instance from a first electronic document, extracting a second value of a second attribute of the first instance from a second electronic document, and associating the first value and the second value with the first instance in a single record in the structured presentation.
  • the first attribute can differ from the second attribute and the first electronic document can differ from the second electronic document.
  • Responding to the search query can include extracting a first value of an attribute of a first instance from a first electronic document, extracting a second value of an attribute of a second instance from the first electronic document, associating the first value with the first instance in a first record, and associating the second value with the second instance in a second record.
  • the first instance can differ from the second instance.
  • the structured presentation can include a table and the records can include rows or columns of the table.
  • the structured presentation can include a collection of cards and the records can be individual cards in the collection.
  • the method can also include receiving a trigger for the addition of a new instance to the structured presentation and suggesting new instances for addition to the structured presentation in response to the trigger.
  • the method can also include receiving a specification of a constraint from a user, and suggesting new instances can comprise suggesting new instances that satisfy the user-specified constraint.
  • the method can include receiving a trigger for the addition of a new attribute to the structured presentation and adding a new attribute to the structured presentation in response to the trigger.
  • the method can also include receiving a user specification of a trait of the new attribute and populating the structured presentation with values of the attribute based on the user-specified trait.
  • the unstructured electronic document collection can include electronic documents available on the Internet.
  • the structured presentation can be physically presented on a display screen, including physically transforming one or more elements of the display screen.
  • Other embodiments of this aspect include corresponding systems, apparatus, and computer programs recorded on computer storage devices, each configured to perform the operations of the methods.
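The population step of this aspect can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes that values have already been extracted from each document into (document, instance, attribute, value) tuples. The names `Extraction`, `populate`, and `as_table`, the first-value-wins rule, and the sample data are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Extraction(NamedTuple):
    """One value drawn from one electronic document (hypothetical shape)."""
    source: str     # identifier of the electronic document
    instance: str   # identifier of the instance
    attribute: str  # attribute being characterized
    value: str      # extracted value


def populate(extractions: List[Extraction]) -> Dict[str, Dict[str, str]]:
    """Group extracted values into one record per instance.

    A record may combine values from different documents, and one document
    may contribute to several records. Keeping the first value seen for an
    (instance, attribute) pair is an assumption made only for this sketch.
    """
    records: Dict[str, Dict[str, str]] = {}
    for e in extractions:
        record = records.setdefault(e.instance, {})
        record.setdefault(e.attribute, e.value)
    return records


def as_table(records: Dict[str, Dict[str, str]],
             attributes: List[str]) -> List[List[str]]:
    """Arrange the records as rows of a table, one row per instance."""
    rows = [["instance", *attributes]]
    for instance, record in records.items():
        rows.append([instance, *(record.get(a, "") for a in attributes)])
    return rows


# Made-up sample data: two documents, two instances, two attributes.
sample = [
    Extraction("doc1", "Widget A", "color", "red"),
    Extraction("doc2", "Widget A", "weight", "2 kg"),
    Extraction("doc1", "Widget B", "color", "blue"),
]
table = as_table(populate(sample), ["color", "weight"])
```

In the sample, the record for "Widget A" combines values from two documents, while "doc1" contributes to two different records. A missing value is left as an empty cell.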
  • Another aspect of the subject matter described in this specification can be embodied in an apparatus that includes one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations.
  • the operations can include receiving description data describing a preexisting structured presentation, drawing an identifier of a first instance from a first web site, drawing a first value of a first attribute of the first instance from a second web site, adding the identifier of a first instance and the new value to the preexisting structured presentation to form a new record in a new structured presentation, and outputting instructions for visually presenting the new structured presentation.
  • a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • Drawing the identifier of the first instance from the first web site can include comparing characteristics of the preexisting structured presentation with content of the preexisting structured presentation.
  • the operations can also include receiving an identifier of a second instance from the user.
  • the new structured presentation can include a second new record that presents the second instance in association with a second value of the first attribute of the second instance.
  • the operations can include receiving the second value from the user.
  • a collection of candidate values can be presented to the user and a selection of a second value can be received from the user.
  • the collection of candidate values can include the second value.
  • a collection of candidate values of the first attribute of the second instance can be identified and, for each of the candidate values, a confidence that the candidate value is correct can be determined.
  • the operations can include suggesting a collection of new instances to be added to the structured presentation.
  • the collection of new instances can be suggested by comparing characteristics of the preexisting structured presentation with content of the first web site and the second web site and/or by comparing a machine-readable search query with content of the first web site and the second web site.
  • Drawing the first value from the second web site can include identifying that the second web site includes a review, extracting the identifier directly from the first web site, or extracting the identifier from a machine-readable database that includes information extracted from the first web site.
  • the preexisting structured presentation can include a table and the records can include rows or columns of the table.
  • the preexisting structured presentation can include a collection of cards and the records can be individual cards in the collection.
  • the operations can include visually displaying the new structured presentation on a display screen, including physically transforming one or more elements of the display screen.
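The candidate-value operations in this aspect can be sketched as follows. The text says only that a confidence is determined for each candidate value. The scoring rule used here (the share of observations that agree on a value after trimming and lowercasing) is an assumption chosen for illustration, and `rank_candidates` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def rank_candidates(observed: List[str]) -> List[Tuple[str, float]]:
    """Return (candidate value, confidence) pairs, most confident first.

    Confidence is the fraction of observations that agree on the value.
    This scoring rule is an assumption, not taken from the specification.
    """
    if not observed:
        return []
    # Normalize lightly so that "Blue" and "blue " count as the same value.
    counts = Counter(v.strip().lower() for v in observed)
    total = sum(counts.values())
    return [(value, n / total) for value, n in counts.most_common()]


# Made-up observations of one attribute of one instance.
candidates = rank_candidates(["Blue", "blue", "green", "blue "])
```

The ranked list could then be presented to the user, who selects one of the candidates as the value for the record.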
  • In another aspect, a system includes a client device and one or more computers programmed to interact with the client device and to perform operations.
  • the operations include receiving description data describing a preexisting structured presentation, drawing an identifier of a first instance from a first web site, drawing a first value of a first attribute of the first instance from a second web site, adding the identifier of a first instance and the new value to the preexisting structured presentation to form a new record in a new structured presentation, and outputting to the client device instructions for visually presenting the new structured presentation.
  • a visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design.
  • the structured presentation including a collection of records, each of which denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • the one or more computers can include a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client.
  • In another aspect, a system includes a client device and one or more computers programmed to interact with the client device and to perform operations.
  • the operations include receiving a machine-readable search query from the client device and responding to the search query by sending to the client device instructions for presenting a structured presentation of instances relevant to the search query.
  • a visual presentation of the structured presentation denotes associations between the instances and values that characterize attributes of the instances by virtue of an arrangement of identifiers of the instances and the values.
  • the identifiers of the instances and the values are drawn from two or more documents in an unstructured collection of electronic documents.
  • the electronic document collection being unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
  • the one or more computers can include a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client.
  • Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation, adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation.
  • a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • Adding the identifier of the new instance can include formulating a collection of instance suggestions, providing the instance suggestion collection to a user, and receiving a user selection of the new instance, wherein the new instance is in the collection of instance suggestions.
  • Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
  • Formulating the collection of instance suggestions can include identifying a first document in the electronic document collection that includes an identifier of an instance identified in the preexisting structured presentation and that is arranged in accordance with a template, identifying a second document that is arranged in accordance with the template but relevant to a second instance, and including the second instance in the instance suggestion collection.
  • Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include one or more of the following: identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation, locating the new instance in a stored collection of associations of instances with attributes, comparing the characteristics of the preexisting structured presentation with the attributes characterized in the preexisting structured presentation, comparing the attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents, comparing the value of attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents, and comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents.
  • the collection of electronic documents can include the electronic documents available on the Internet.
  • the electronic documents can include web pages.
  • the expanded structured presentation can include a table or a collection of cards.
  • the method can include visually displaying the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
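The template-based way of formulating instance suggestions described in this aspect can be sketched as follows. The sketch assumes that each document has already been reduced to a layout fingerprint and the instance it is about. How templates are detected is not covered here, and `Page`, `template`, and `suggest_by_template` are hypothetical names.

```python
from typing import List, NamedTuple, Set


class Page(NamedTuple):
    """A document reduced to the fields this sketch needs (hypothetical)."""
    url: str
    template: str  # fingerprint of the page layout, assumed precomputed
    instance: str  # instance that the page is about


def suggest_by_template(pages: List[Page], known: Set[str]) -> List[str]:
    """Suggest instances from pages that share a template with a known one.

    `known` holds the instances already in the structured presentation.
    """
    # Templates used by at least one page about a known instance.
    templates = {p.template for p in pages if p.instance in known}
    suggestions: List[str] = []
    for p in pages:
        if (p.template in templates
                and p.instance not in known
                and p.instance not in suggestions):
            suggestions.append(p.instance)
    return suggestions


# Made-up pages: two share template "T1" with the known instance.
pages = [
    Page("https://example.com/a", "T1", "Widget A"),
    Page("https://example.com/b", "T1", "Widget B"),
    Page("https://example.org/c", "T2", "Widget C"),
    Page("https://example.com/b2", "T1", "Widget B"),
]
suggested = suggest_by_template(pages, {"Widget A"})
```

Only "Widget B" is suggested, because its pages follow the same template as a page about "Widget A". The page about "Widget C" uses a different template and is ignored.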
  • apparatuses that include one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations.
  • the operations include formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, providing the instance suggestion collection to a user, receiving a user selection of a first instance in the collection of instance suggestions, and adding an identifier of the first instance suggestion to a structured presentation.
  • the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
  • a visual presentation of the structured presentation visually presents information in an organized arrangement.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation.
  • Formulating the collection of instance suggestions can include one or more of the following: comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection; identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation; identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template, identifying a second document that is arranged in accordance with the template but relevant to a second instance, and including the second instance in the instance suggestion collection; identifying documents in the electronic document collection that include identifiers of one or more instances in the preexisting structured presentation, identifying additional attributes used to characterize instances in the preexisting structured presentation; comparing values of attributes used to characterize instances in the preexisting structured presentation with values of the instance suggestions; identifying a category of instances that includes instances in the preexisting structured presentation and formulating the collection of instance suggestions using instances in the category of instances; identifying the instance
  • a system that includes a client device and one or more computers programmed to interact with the client device and to perform operations.
  • the operations include receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation, adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation on a display device coupled in data communication with the client device.
  • a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design.
  • the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • Another aspect of the subject matter described in this specification can be embodied in a system that includes a client device and one or more computers programmed to interact with the client device and to perform operations.
  • the operations include formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, providing the instance suggestion collection to a user using the client device, receiving a user selection of a first instance in the collection of instance suggestions, and adding an identifier of the first instance suggestion to a structured presentation presented on a display device coupled in data communication with the client device, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement.
  • the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation.
  • the one or more computers can include a server operable to interact with the client device through a data communication network.
  • the client device can be operable to interact with the server as a client.
  • the client device can include a personal computer running a web browser.
  • the personal computer can include the display device.
  • Other embodiments of this aspect include corresponding computer program products, apparatus, and methods.
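One of the options listed for this aspect is to formulate instance suggestions from a category that includes instances already in the structured presentation. A minimal sketch of that option follows. It assumes a stored mapping from category names to member instances is available. The mapping, the rule of picking the category with the largest overlap, and the name `suggest_by_category` are hypothetical.

```python
from typing import Dict, List, Optional, Set


def suggest_by_category(categories: Dict[str, Set[str]],
                        known: Set[str]) -> List[str]:
    """Suggest the other members of the best-matching category.

    `categories` maps a category name to its member instances, and `known`
    holds the instances already presented. Choosing the category with the
    largest overlap is an assumption made only for this sketch.
    """
    best: Optional[str] = None
    overlap = 0
    for name in sorted(categories):  # sorted for a deterministic tie-break
        n = len(categories[name] & known)
        if n > overlap:
            best, overlap = name, n
    if best is None:
        return []
    return sorted(categories[best] - known)


# Made-up categories; "apple" is deliberately ambiguous.
categories = {
    "fruit": {"apple", "pear", "plum"},
    "tech": {"apple", "pixel"},
}
more = suggest_by_category(categories, {"apple", "pear"})
```

Because two known instances fall in "fruit" and only one in "tech", the remaining member of "fruit" is suggested.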
  • Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation, adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation.
  • a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • Adding the identifier of the new attribute can include formulating a collection of attribute suggestions, providing the attribute suggestion collection to a user, and receiving a user selection of the new attribute.
  • the new attribute can be in the collection of attribute suggestions.
  • Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
  • Formulating the attribute suggestion collection can include identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template and adding an attribute used in the first document to characterize the instance in the attribute suggestion collection.
  • Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include one or more of the following: identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation; identifying the new attribute in a stored collection of associations of instances with attributes; comparing the instances characterized in the preexisting structured presentation with the content of the electronic documents; identifying additional instances related to the instances identified in the preexisting structured presentation; comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents; and comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents.
  • the collection of electronic documents can include the electronic documents available on the Internet and the electronic documents can include web pages.
  • the expanded structured presentation can include a table or a collection of cards.
  • the method can include visually presenting the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
  • the operations can include formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, providing the attribute suggestion collection to a user, receiving a user selection of a first attribute in the collection of attribute suggestions, and adding an identifier of the first attribute suggestion to a structured presentation.
  • the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
  • a visual presentation of the structured presentation visually presents information in an organized arrangement.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation.
  • Formulating the collection of attribute suggestions can include one or more of the following: comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection; identifying documents in the electronic document collection that include structured components that characterize instances identified in the preexisting structured presentation; identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template and including an attribute used to characterize the instance in the attribute suggestion collection; and identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation.
  • Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include one or more of the following: comparing instances identified in the preexisting structured presentation with the content of the electronic documents; and comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents.
  • Formulating the collection of attribute suggestions can include identifying a category of instances that includes instances in the preexisting structured presentation and formulating the collection of attribute suggestions from attributes used to characterize instances in the category of instances.
  • the collection of attribute suggestions can also be formulated by identifying the attribute suggestions in a stored collection of associations of instances with attributes.
  • the collection of electronic documents can include electronic documents available on the Internet and the electronic documents can include web pages.
  • the structured presentation can include a table or a collection of cards.
  • the operations can also include visually presenting the structured presentation on a display screen, including physically transforming one or more elements of the display screen.
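  • As a rough illustration of formulating attribute suggestions from a stored collection of associations of instances with attributes, the following Python sketch counts how often each not-yet-displayed attribute characterizes the instances already in a structured presentation. The names `INSTANCE_ATTRIBUTES` and `suggest_attributes`, the sample data, and the frequency-based ranking are assumptions made for illustration; the specification does not prescribe them.

```python
from collections import Counter

# Hypothetical stored associations of instances with attributes (illustrative data only).
INSTANCE_ATTRIBUTES = {
    "camera a": {"price", "megapixels", "weight"},
    "camera b": {"price", "megapixels", "zoom"},
    "camera c": {"price", "zoom"},
}


def suggest_attributes(instances, existing_attributes, associations, limit=3):
    """Rank attributes that characterize the given instances but are not yet displayed."""
    counts = Counter()
    for instance in instances:
        for attribute in associations.get(instance.lower(), ()):
            if attribute not in existing_attributes:
                counts[attribute] += 1
    # Attributes shared by more instances come first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [attribute for attribute, _ in ranked[:limit]]


# The presentation currently shows only a "price" column.
columns = ["price"]
suggestions = suggest_attributes(
    ["Camera A", "Camera B", "Camera C"], set(columns), INSTANCE_ATTRIBUTES
)

# Simulate the user selecting the first suggestion; its identifier becomes a new column.
columns.append(suggestions[0])
```

  • In this sketch a user selection is simulated by taking the first suggestion; an actual implementation would provide the suggestions to the user and wait for a selection.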
  • a system that includes a client device comprising a display screen, and one or more computers programmed to interact with the client device and to perform operations.
  • the operations include receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation, adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation on the display screen.
  • a visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • a system that includes a client device comprising a display screen, and one or more computers programmed to interact with the client device and to perform operations.
  • the operations include formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, providing the attribute suggestion collection to the client device, receiving a selection of a first attribute in the collection of attribute suggestions from the client device, and adding an identifier of the first attribute suggestion to a structured presentation presented on the display screen.
  • the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
  • a visual presentation of the structured presentation visually presents information in an organized arrangement.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation.
  • Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation, adding the new value to the preexisting structured presentation to form a new structured presentation, and outputting instructions for visually presenting the new structured presentation.
  • a visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • An identifier of a first instance that appears in the structured presentation in a first electronic document can be located and the new value can be extracted from the first electronic document.
  • the adding of the new value can include identifying a collection of values of a first attribute of a first instance and establishing a subset of one or more of the identified values as suitably characterizing the first attribute of the first instance.
  • Establishing the subset of values as suitable can include one or more of the following: grouping the values in the collection into groups; selecting the subset based at least in part on a count of values in the subset; selecting the subset based at least in part on values in the subset meeting a user-specified constraint; selecting the subset based at least in part on a value in the subset being drawn from a high quality document; selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another instance in the preexisting structured presentation; and selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another attribute in the preexisting structured presentation.
  • the collection of electronic documents can be the Internet and the electronic documents can be web pages.
  • the preexisting structured presentation can include a table or a collection of cards.
  • the method can include visually presenting the new structured presentation on a display screen, including physically transforming one or more elements of the display screen.
  • the operations can include receiving description data describing a first instance, a second instance, and a first attribute, extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection, extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection, establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance, establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance, and generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset.
  • the structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values.
  • the first subset of values can be established as suitable by grouping the values in the first collection into groups, wherein each group includes a subset of the first collection of values.
  • the first subset of values can be established as suitable by selecting the first subset based at least in part on a count of values in the first subset.
  • the first subset of values can be established as suitable by comparing the values in the first subset with a user-specified constraint on the values.
  • the first subset of values can be established as suitable by determining that a value in the first subset is drawn from a high quality document.
  • the first subset of values can be established as suitable by determining that a value in the first subset is drawn from a document relevant to the second instance.
  • the first subset of values can be established as suitable by determining that a value in the first subset is drawn from a document relevant to another attribute that characterizes both the first instance and the second instance.
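  • The techniques listed above for establishing a subset of values as suitable (grouping, counting, a user-specified constraint, and document quality) can be combined in many ways. The following Python sketch shows one possible combination. The function names, the normalization rule, and the idea of expressing document quality as a numeric weight are all assumptions made for illustration and are not taken from the specification.

```python
from collections import defaultdict


def normalize(value):
    """Collapse case and whitespace so equivalent strings fall into the same group."""
    return " ".join(value.split()).lower()


def select_value(candidates, constraint=None, quality=None):
    """Pick a representative value from (value, source_document) candidates.

    constraint: optional predicate standing in for a user-specified constraint.
    quality: optional mapping from source document to a weight (default 1.0),
             standing in for a judgment that a document is of high quality.
    """
    quality = quality or {}
    groups = defaultdict(list)
    for value, source in candidates:
        if constraint is None or constraint(value):
            groups[normalize(value)].append((value, source))
    if not groups:
        return None

    def score(members):
        # With default weights this is simply the count of values in the group.
        return sum(quality.get(source, 1.0) for _, source in members)

    best = max(groups.values(), key=score)
    return best[0][0]


# Illustrative candidates extracted from four hypothetical documents.
candidates = [
    ("1,250 ft", "doc1"),
    ("1250 ft", "doc2"),
    ("1,250 ft", "doc3"),
    ("1,250  FT", "doc4"),
]
by_count = select_value(candidates)
by_quality = select_value(candidates, quality={"doc2": 5.0})
by_constraint = select_value(candidates, constraint=lambda v: "," not in v)
```

  • Treating the group score as a weighted count keeps the count-based and quality-based criteria in one place; other weightings, such as the relevance of a document to a second instance or attribute, could be folded into the same score.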
  • the description of the first instance can include an identifier of the first instance that appears in a preexisting structured presentation.
  • the description of the second instance can include an identifier of the second instance that appears in the preexisting structured presentation.
  • the description of the first attribute can include a description of a new attribute that is to be added to a preexisting structured presentation.
  • the unstructured electronic document collection can include electronic documents available on the Internet.
  • the structured presentation can be a table or a collection of cards.
  • the structured presentation can be visually presented on a display screen, including physically transforming one or more elements of the display screen. Other embodiments of this aspect include corresponding systems, apparatus, and methods.
  • Another aspect of the subject matter described in this specification can be embodied in a system that includes a device and one or more computers programmed to interact with the device and to perform operations.
  • the operations include receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation, adding the new value to the preexisting structured presentation to form a new structured presentation, and outputting instructions for visually presenting the new structured presentation to the device.
  • a visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • Other embodiments of this aspect include corresponding computer program products, apparatus, and methods.
  • a system that includes a device and one or more computers programmed to interact with the device and to perform operations.
  • the operations include receiving description data describing a first instance, a second instance, and a first attribute, extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection, extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection, establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance, establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance, generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset, wherein the structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values, and sending the machine-readable
  • Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of displaying a structured presentation on a display device, receiving data characterizing a user interaction with the displayed structured presentation, the data including a specification of a first instance and a first attribute of the structured presentation, and displaying a formerly concealed search interface on the display device in response to receiving the data.
  • the structured presentation visually presents information in a systematic and structured arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • the search interface includes information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
  • Receiving the data characterizing the user interaction with the displayed structured presentation can include receiving a manual user specification of the first instance and the first attribute that are associated with a cell in the structured presentation or receiving data characterizing a user interaction with a cell in the structured presentation.
  • the cell can be associated with the first instance and the first attribute by virtue of the arrangement of the cell relative to identifiers of the first instance and the first attribute in the structured presentation.
  • Receiving data characterizing the user interaction with the cell can also include receiving data characterizing the user interaction with an empty cell.
  • Displaying the formerly concealed search interface can include one or more of the following: displaying an interactive element that can be selected by a user to trigger a search of the electronic document collection to locate the first value; displaying an interactive value entry element that can be selected by a user to specify a value characterizing the first attribute of the first instance; displaying a snippet characterizing a context of the first value in a first document of the electronic document collection; and displaying a result of a prior search of the electronic document collection to locate the first value.
  • the first value can appear in the structured presentation as a value characterizing the first attribute of the first instance.
  • Displaying the formerly concealed search interface can also include displaying an identifier of a first electronic document in the electronic document collection, wherein the first value is drawn from the first electronic document.
  • the method can also include determining that the first electronic document is inoperative to provide the first value and displaying a visual indication of the inoperativeness of the first document.
  • the user can be presented with an option to select the first value consistently from a first document regardless of changes in relevancy of the first document to the first instance and the first attribute or with an option to select the first value from a first document that is most relevant to the first instance and the first attribute.
  • the method can also include searching an unstructured collection of electronic documents to locate the first value in response to a user interaction with the search interface and adding the first value to the structured presentation.
  • Receiving the specification of the first instance and the first attribute can include receiving a specification of a collection of attributes or a collection of instances.
  • the method can also include updating the display of the structured presentation in response to the passage of time.
  • Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
  • Another aspect of the subject matter described in this specification can be embodied in a system that includes one or more computers programmed to interact with client devices and to perform operations.
  • the operations include receiving data characterizing user interaction specifying a first cell of a structured presentation displayed on a display device, determining that a prior search has been conducted to populate the first cell with a first value, and, in response to determining that a prior search was conducted, displaying information characterizing the prior search on the display device.
  • the structured presentation visually presents information in a systematic and structured arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of the values in cells.
  • the data characterizing user interaction specifying the first cell can include a manual user specification of the first instance and the first attribute that are associated with the first cell.
  • the information characterizing the prior search can include information identifying an electronic document from which the first value is drawn.
  • the information characterizing the prior search can include one or more of the following: a collection of electronic documents from which the first value could have been drawn; information identifying a first electronic document in the electronic document collection from which the first value is drawn; and a snippet characterizing a context of the first value in a first document of the electronic document collection.
  • the information characterizing the prior search can be displayed, e.g., in a display element of a formerly concealed search interface.
  • the operations can also include determining that the first electronic document is inoperable to provide the first value and displaying a visual indication of the inoperability of the first document.
  • the operations can also include updating a display of a value in the first cell of the structured presentation in response to the user interaction.
  • the collection of electronic documents can include electronic documents available on the Internet.
  • the electronic documents can include web pages.
  • the structured presentation can be a collection of cards.
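  • To make the cell-level behavior described above more concrete, the following Python sketch models a cell together with information characterizing a prior search, and decides what a formerly concealed search interface might display when the user interacts with the cell. The class names, field names, and returned dictionary keys are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PriorSearch:
    """Information characterizing a search that previously populated a cell."""
    source_document: str           # identifier of the document the value was drawn from
    snippet: str                   # context of the value within that document
    source_operable: bool = True   # False if the document can no longer provide the value


@dataclass
class Cell:
    """A cell associated with one instance and one attribute."""
    instance: str
    attribute: str
    value: Optional[str] = None
    prior_search: Optional[PriorSearch] = None


def search_interface_contents(cell):
    """Return what the search interface could show for the selected cell."""
    if cell.prior_search is None:
        # No prior search: offer an element that triggers a search for the value.
        return {"action": "offer_search", "instance": cell.instance,
                "attribute": cell.attribute}
    contents = {
        "action": "show_prior_search",
        "value": cell.value,
        "source": cell.prior_search.source_document,
        "snippet": cell.prior_search.snippet,
    }
    if not cell.prior_search.source_operable:
        # Visual indication that the source document is inoperable.
        contents["warning"] = "source document can no longer provide this value"
    return contents


empty = Cell("car x", "mpg")
filled = Cell("car x", "price", "20,000",
              PriorSearch("doc7", "priced at 20,000", source_operable=False))
empty_view = search_interface_contents(empty)
filled_view = search_interface_contents(filled)
```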
  • a system that includes one or more computers programmed to interact with a client device comprising a display device and to perform operations.
  • the operations include displaying a structured presentation on the display device, receiving data characterizing a user interaction with the displayed structured presentation, and displaying a formerly concealed search interface on the display device in response to receiving the data.
  • the structured presentation visually presents information in a systematic and structured arrangement that conforms with a structured design.
  • the structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
  • the data includes a specification of a first instance and a first attribute of the structured presentation.
  • the search interface includes information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
  • FIG. 1 is a schematic representation of a system in which information from an electronic document collection is presented to a user in a structured presentation.
  • FIG. 2 is a schematic representation of an implementation of another system in which information from an electronic document collection is presented to a user in a structured presentation.
  • FIGS. 3, 4, and 5 are schematic representations of example structured presentations.
  • FIG. 6 is a flow chart of an example process for presenting information from an electronic document collection to a user in a structured presentation.
  • FIGS. 7 and 8 are flow charts of example processes for identifying two or more relevant documents in an electronic document collection.
  • FIG. 9 is a flow chart of a process for suggesting and/or adding new instances to a structured presentation.
  • FIG. 10 is a schematic representation of a user interface component for receiving user input specifying modifications of a structured presentation.
  • FIG. 11 is a schematic representation of a user interface component for receiving user input specifying a technique for adding new instances to a structured presentation.
  • FIG. 12 is a schematic representation of a user interface component for receiving user input specifying constraints that are to be used in the user-specified constraint option for adding new instances to a structured presentation.
  • FIG. 13 is a flow chart of an example process for adding new attributes to a structured presentation.
  • FIG. 14 is a schematic representation of a user interface component for adding new attributes to a structured presentation.
  • FIG. 15 is a flow chart of an example process for adding new attribute values to a structured presentation.
  • FIG. 16 is a flow chart of an example process for adding new attribute values to a structured presentation.
  • FIG. 17 is a schematic representation of a user interface component for selecting a candidate value to be added to a structured presentation.
  • FIG. 18 is a schematic representation of a structured presentation that includes highlights of deficiencies in the attribute values presented therein.
  • FIG. 19 is a schematic representation of a user interface component for selecting a candidate attribute to be added to a structured presentation.
  • FIG. 20 is a schematic representation of a user interface component for selecting a candidate instance to be added to a structured presentation.
  • FIG. 21 is a schematic representation of a process by which new instances can be added to expand a preexisting structured presentation.
  • FIG. 22 is a flow chart of an example process for adding instances to a structured presentation based on the content of documents in an electronic document collection.
  • FIG. 23 is a flow chart of an example process for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 24 is a representation of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 25 is a flow chart of an example process for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 26 is a representation of a portion of a hypertext markup language template that is used as a pattern for descriptions of a movie.
  • FIG. 27 is a schematic representation of a process by which a collection of new instance suggestions can be formulated based on information in a preexisting structured presentation.
  • FIG. 28 is a schematic representation of a table that associates attributes and instances in an electronic document collection.
  • FIG. 29 is a flow chart of a process for formulating instance suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation.
  • FIG. 30 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
  • FIG. 31 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
  • FIG. 32 is a schematic representation of a table that associates attributes, instances, and their values in a data collection.
  • FIG. 33 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
  • FIG. 34 is a representation of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 35 is a schematic representation of a collection of processes that can be used to formulate a collection of new instance suggestions based on information in a preexisting structured presentation.
  • FIG. 36 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
  • FIG. 37 is a schematic representation of a process by which new attributes can be added to expand a preexisting structured presentation.
  • FIG. 38 is a flow chart of an example process for adding attributes to a structured presentation based on the content of documents in an electronic document collection.
  • FIG. 39 is a flow chart of an example process for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 40 is a representation of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 41 is a flow chart of an example process for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 42 is a representation of a portion of a hypertext markup language (HTML) template that is used as a pattern for descriptions of a movie.
  • FIG. 43 is a schematic representation of a process by which a collection of new attribute suggestions can be formulated based on information in a preexisting structured presentation.
  • FIG. 44 is a schematic representation of a table that associates attributes and instances in an electronic document collection.
  • FIG. 45 is a flow chart of a process for formulating attribute suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation.
  • FIG. 46 is a flow chart of a process for formulating a collection of new attribute suggestions based on information in a preexisting structured presentation.
  • FIG. 47 is a flow chart of a process for identifying related instances for use in formulating attribute suggestions based on information in a preexisting structured presentation.
  • FIG. 48 is a flow chart of a process for formulating a collection of new attribute suggestions based on information in a preexisting structured presentation.
  • FIG. 49 is a representation of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • FIG. 50 is a schematic representation of a collection of processes that can be used to formulate a collection of new attribute suggestions based on information in a preexisting structured presentation.
  • FIG. 51 is a flow chart of a process for formulating a collection of new attribute suggestions based on information in a preexisting structured presentation.
  • FIG. 52 is a schematic representation of a system in which attribute values drawn from two or more electronic documents in an electronic document collection are presented to a user in a structured presentation.
  • FIG. 53 is a schematic representation of an implementation of a system in which attribute values drawn from two or more electronic documents in an electronic document collection are presented to a user in a structured presentation.
  • FIG. 54 is a schematic representation of a table that can associate attributes, values, and instances in an electronic document collection.
  • FIG. 55 is a flow chart of an example process for presenting attribute values drawn from two or more electronic documents in an electronic document collection to a user in a structured presentation.
  • FIG. 56 is a flow chart of a process for selecting one or more values for presentation in a structured presentation.
  • FIG. 57 is a flow chart of a process for selecting one or more values for presentation in a structured presentation.
  • FIG. 58 is a flow chart of an example process for selecting one or more values for presentation in a structured presentation.
  • FIG. 59 is a schematic representation of a circumstance in which attribute values drawn from electronic documents in an electronic document collection are presented to a user in a structured presentation.
  • FIG. 60 is a schematic representation of a process in which both attributes and attribute values are drawn from electronic documents in an electronic document collection and presented to a user in a structured presentation.
  • FIG. 61 is a flow chart of a process for adding values to a structured presentation based on the content of documents in an electronic document collection.
  • FIGS. 62, 63, and 64 are schematic representations of structured presentations in which a search interface is concealed.
  • FIGS. 65, 66, 67, 68, 69, and 70 illustrate display elements in which formerly concealed search interfaces are presented.
  • FIG. 71 is a flow chart of a process for adding values to a structured presentation by drawing the values from the content of documents in an electronic document collection.
  • FIGS. 72, 73, and 74 illustrate display elements in which formerly concealed search interfaces are presented.
  • FIG. 75 is a flow chart of a process for adding values to a structured presentation based on the content of documents in an electronic document collection.
  • FIG. 1 is a schematic representation of a system 100 in which information from an unstructured electronic document collection 102 is presented to a user in a structured presentation 106.
  • system 100 includes a display screen 104 and a data communication infrastructure 108.
  • system 100 extracts information from the unstructured collection of electronic documents 102 and presents the extracted information in a structured presentation 106 on display screen 104.
  • Electronic document collection 102 is unstructured in that the organization of information within individual documents in electronic document collection 102 need not conform with a predetermined structure that can be exploited in the extraction of information.
  • electronic documents 110, 112, 114 were added to collection 102 by three different users who organize the content of their respective electronic documents differently. The users need not collaborate to ensure that information within documents 110, 112, 114 is in a particular format.
  • the user can do so without regard for the format of the documents added by the other users. There is no need for the user to inform the other users of the change.
  • documents can be added to collection 102 by entities who not only fail to collaborate but who are also competitors who are adverse to one another, such as three different car manufacturers or three different sellers of digital cameras.
  • structured presentation 106 is structured and presents information drawn from documents in collection 102 in an organized, systematic arrangement.
  • the grouping, segmentation, and arrangement of information in structured presentation 106 conforms with a structured design even when the information therein is drawn from different contexts in a diverse set of documents in collection 102. Further, changes to one aspect of the design of structured presentation 106 can be propagated throughout structured presentation 106.
  • structured presentations include spreadsheet tables, collections of cards or other records, and other structured presentation formats. Such structured presentations can conform with rules that specify the spatial arrangement of information in the displays, the positioning and identification of various organizational and informational aspects (e.g., column headers, row headers, unit identifiers, and the like) of the structured presentations, the graphical representation of values, and other characteristics.
  • the structuring of information in structured presentations generally facilitates the understanding of the information by a viewer. For example, a viewer can discern the nature of the information contained within the structured presentation by reading headers. A viewer can easily identify and compare values described in the structured presentation based on the arrangement and positioning of those values in the display. For example, a user can easily ascertain that certain values in a structured presentation all relate to attributes (i.e., characteristics) of different cars and can easily compare those values.
  • System 100 is not limited to merely populating structured presentation 106 with values drawn from documents in collection 102.
  • system 100 can determine entities (i.e., "instances") that are to be described in structured presentation 106, values that characterize the attributes of those instances, as well as an appropriate structuring of structured presentation 106.
  • Such determinations can be based on information drawn from different documents in collection 102 that are not restricted to having a specific format, a permanent format, or both.
  • the attributes that appear in structured presentation 106 can be based on the attributes used in documents in collection 102 to characterize certain instances, as discussed further below.
  • the units of the values (e.g., meters, feet, inches, miles) that appear in structured presentation 106 can be based on the units of the values that appear in documents in collection 102.
  • the instances that appear in structured presentation 106 can be determined based on collections of instances that appear in documents in collection 102.
  • such information can be drawn from previously unspecified documents in collection 102.
  • a search query can be used to identify documents in collection 102 and the information can be drawn from these documents.
  • the identified documents need not be limited to being associated with the account of a particular individual or originating from a particular retailer. Instead, the information can be drawn from previously unspecified documents.
  • System 100 can thus exploit the diverse information content of documents in collection 102 in a variety of different ways to present a structured presentation to a user.
  • the amount of information that can be exploited can be very large. Moreover, in many cases, this can be done automatically or with a relatively small amount of human interaction, as discussed further below.
  • FIG. 2 is a schematic representation of an implementation of a system 200 in which information from an unstructured electronic document collection 102 is presented to a user in a structured presentation 106.
  • the data communication infrastructure 108 interconnects electronic document collection 102, display screen 104, and a collection of data storage and processing elements, including a search engine 202, a crawler 204, a data center 208, and document compressing, indexing and ranking modules 210.
  • Search engine 202 is programmed with one or more sets of machine-readable instructions for searching unstructured electronic document collection 102.
  • Search engine 202 can be implemented on one or more computers deployed at one or more geographical locations.
  • Crawler 204 is programmed with one or more sets of machine-readable instructions for crawling unstructured electronic document collection 102.
  • Crawler 204 can be implemented on one or more computers deployed at one or more geographical locations.
  • Compressing, indexing, and ranking modules 210 are programmed with one or more sets of machine-readable instructions for compressing, indexing, and ranking documents in collection 102.
  • Compressing, indexing, and ranking modules 210 can be implemented on one or more computers deployed at one or more geographical locations.
  • the data center 208 stores information characterizing electronic documents in electronic document collection 102.
  • the information characterizing such electronic documents can be stored in the form of an indexed database that includes indexed keywords and the locations of documents in collection 102 where the keywords can be found.
  • the indexed database can be formed, e.g., by crawler 204.
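  • As a minimal sketch of such an indexed database, the following Python builds a mapping from keywords to the locations of the documents that contain them. The function name, the tokenization rule, and the example locations and texts are illustrative assumptions, not details taken from this description.

```python
import re
from collections import defaultdict


def build_keyword_index(documents):
    """Map each lowercase keyword to the set of document locations containing it.

    `documents` maps a document location (hypothetical URLs here) to its text.
    Splitting on runs of letters and digits is an assumed, simplistic tokenizer.
    """
    index = defaultdict(set)
    for location, text in documents.items():
        for keyword in re.findall(r"[a-z0-9]+", text.lower()):
            index[keyword].add(location)
    return index


# Hypothetical crawled documents, for illustration only.
documents = {
    "https://example.com/doc110": "Hybrid vehicles combine a gasoline engine with a motor",
    "https://example.com/doc112": "Digital cameras and hybrid cameras compared",
}
index = build_keyword_index(documents)
print(sorted(index["hybrid"]))
```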
  • the information stored in data center 208 can itself be organized to facilitate presentation of structured presentation 106 to a user.
  • information can be organized by crawler 204 and compressing, indexing and ranking modules 210 in anticipation of the need to present structured presentations 106 that are relevant to certain topics.
  • the structure of information in data center 208 can facilitate the grouping, segmentation, and arrangement of information in structured presentations 106. This organization can be based on a variety of different factors. For example, an ontology can be used to organize information stored in data center 208. As another example, a historical record of previous structured presentations 106 can be used to organize information stored in data center 208. As another example, the data tables described herein can be used to organize information stored in data center 208.
  • system 200 includes multiple display screens 104 that can present structured presentations in accordance with machine-readable instructions.
  • Display screens 104 can include, e.g., cathode ray tubes (CRT's), light emitting diode (LED) screens, liquid crystal displays (LCD's), gas-plasma displays, and the like.
  • Display screens 104 can be an integral part of a self-contained data processing system, e.g., a personal data assistant (PDA) 215, a desktop computer 217, or a mobile telephone.
  • instructions for presenting structured presentations are modified to the particularities of a display screen 104 after receipt by such a self-contained data processing system. However, this is not always the case.
  • display screens 104 can also be part of more disperse systems where the processing of instructions for presenting a structured presentation is completed before the instructions are received at display screen 104.
  • display screens 104 can be incorporated into "dumb" devices, e.g., television sets or computer monitors, that receive instructions for presenting structured presentation 106 from a local or remote source.
  • system 200 can transform the unstructured information in collection 102 into structured presentation 106 that is presented to a viewer. Such transformations can be performed in the context of web search in which a search engine receives and responds to information requests based on information extracted from the electronic documents in collection 102.
  • desktop computer 217 can interact with a user and thereby receive a search query, e.g., by way of a web browser application.
  • a description 212 of the query can be transmitted over a wireless data link 219 and/or a wired data link 221 to search engine 202.
  • search engine 202 can use query description 212 to identify information in data center 208 that can be used in presenting structured presentation 106 on display screen 104.
  • the identified information can be drawn from two or more unspecified electronic documents in unstructured electronic document collection 102.
  • query description 212 can include search terms that are used by search engine 202 to retrieve information for presenting a structured presentation 106 to a user.
  • search terms in query description 212 can be used to identify, in data center 208, a collection of related instances, attributes that characterize such instances, values that characterize the individual instances, and/or other aspects of structured presentation 106.
  • the search engine 202 can also generate a response 214 to query description 212.
  • the response 214 can be used to present structured presentation 106 for a user.
  • response 214 includes machine-readable instructions that can be interpreted by a data processing device in systems 215, 217 to present structured presentation 106.
  • response 214 can be coded in HTML to specify the characteristics and content of structured presentation 106.
  • response 214 can include text snippets or other information from data center 208 that is used in presenting structured presentation 106.
  • response 214 can include a collection of values, the name of a new attribute, or an estimate of the likelihood that a value to be displayed in structured presentation 106 is correct, as discussed further below.
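  • To illustrate a response coded in HTML, the sketch below renders a small set of records as an HTML table. The function name, the record layout, and the "Maker" value are hypothetical; the description does not prescribe any particular markup.

```python
from html import escape


def render_table_html(attribute_ids, records):
    """Render (instance identifier, {attribute: value}) pairs as an HTML table.

    Missing values become empty cells; all text is HTML-escaped.
    """
    header = "".join(f"<th>{escape(a)}</th>" for a in ["Instance", *attribute_ids])
    rows = []
    for instance_id, values in records:
        cells = [f"<td>{escape(instance_id)}</td>"]
        cells += [f"<td>{escape(str(values.get(a, '')))}</td>" for a in attribute_ids]
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table><tr>" + header + "</tr>" + "".join(rows) + "</table>"


# Hypothetical record, for illustration only.
response_html = render_table_html(["Maker"], [("Wrangler", {"Maker": "Jeep"})])
print(response_html)
```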
  • system 200 uses the information stored in data center 208 to identify the location of one or more documents that are relevant to the query described in query description 212.
  • search engine 202 can compare the keywords in query description 212 to an index of keywords stored in data center 208. The comparison can be used to identify documents in collection 102 that are relevant to query description 212. The locations of such identified documents can be included in responses 214, e.g., as a hyperlink to the documents that are responsive to the described query.
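  • The keyword comparison described above might look like the following sketch, which looks up each query keyword in a keyword index and returns hyperlinks to the matching documents. Requiring every keyword to match is an assumption made for the example, as are the function name and the example locations.

```python
def find_responsive_documents(query, index):
    """Return the locations of documents that contain every keyword in the query.

    `index` maps a lowercase keyword to a set of document locations.
    Matching on all keywords (set intersection) is an assumed policy.
    """
    keywords = query.lower().split()
    if not keywords:
        return set()
    postings = [index.get(keyword, set()) for keyword in keywords]
    return set.intersection(*postings)


# Hypothetical keyword index, for illustration only.
index = {
    "hybrid": {"https://example.com/doc110", "https://example.com/doc112"},
    "vehicles": {"https://example.com/doc110", "https://example.com/doc114"},
}
hits = find_responsive_documents("hybrid vehicles", index)
# Locations can be returned to the client as hyperlinks.
links = [f'<a href="{location}">{location}</a>' for location in sorted(hits)]
print(links)
```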
  • the system 200 can store attributes and/or their respective values in a manner that facilitates the grouping, segmentation, and arrangement of information in structured presentations 106.
  • collections of instances, their attributes, and their values can be stored in data center 208 as structured presentations 106 are amended and changed by users interacting with client systems such as systems 215, 217.
  • instances, attributes, and values in one structured presentation 106 presented to a first viewer can be stored in the data center 208 and used in providing subsequent structured presentations 106 to other viewers.
  • FIG. 3 is a schematic representation of an example structured presentation 106, namely, one that includes a table 300.
  • Table 300 is an organized, systematic arrangement of one or more identifiers of instances, as well as the values of particular attributes of those instances. Instances are individually identifiable entities and generally share at least some common attributes.
  • An attribute is a property, feature, or characteristic of an entity. For example, Tom, Dick, and Harry are instances of individuals. Each such individual has attributes such as a name, a height, a weight, and the like. As another example, city instances each have a geographic location, a mayor, and a population. As yet another example, a product instance can have a model name, a maker, and a year.
  • the attributes of an instance can be characterized by values.
  • the values of a particular attribute of a particular instance thus characterize that particular instance.
  • the name of an individual can have the value "Tom"
  • the population of a city can have the value "4 million"
  • the model name of a product can have the value "Wrangler."
  • structured presentations such as table 300 can also include identifiers of attributes, as well as identifiers of the units in which values are expressed.
  • the grouping, segmentation, and arrangement of information in table 300 can be selected to facilitate understanding of the information by a user.
  • table 300 includes a collection of rows 302.
  • Each row 302 includes an instance identifier 306 and a collection of associated attribute values 307.
  • the arrangement and positioning of attribute values 307 and instance identifiers 306 in rows 302 thus graphically represents the associations between them. For example, a user can discern the association between attribute values 307 and the instance identifier 306 that is found in the same row 302.
  • Table 300 also includes a collection of columns 304. Each column 304 includes an attribute identifier 308 and a collection of associated attribute values 307.
  • the arrangement and positioning of attribute values 307 and attribute identifier 308 in columns 304 thus graphically represent the associations between them. For example, a user can discern the association between attribute values 307 and the attribute identifier 308 that is found in the same column 304 based on their alignment.
  • Each row 302 is a structured record 310 in that each row 302 associates a single instance identifier 306 with a collection of associated attribute values 307. Further, the arrangement and positioning used to denote these associations in one structured record 310 is reproduced in other structured records 310 (i.e., in other rows 302). Indeed, in many cases, all of the structured records 310 in a structured presentation 106 are restricted to having the same arrangement and positioning of information. For example, values 307 of the attribute "ATTR_2" are restricted to appearing in the same column 304 in all rows 302. As another example, attribute identifiers 308 all bear the same spatial relationship to the values 307 appearing in the same column 304.
  • changes to the arrangement and positioning of information in one structured record 310 are generally propagated to other structured records 310 in the structured presentation 106. For example, if a new attribute value 307 that characterizes a new attribute (e.g., "ATTR_2¾") is added to one structured record 310, then a new column 304 is added to structured presentation 106 so that the values of attribute "ATTR_2¾" of all instances can be added to structured presentation 106.
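  • The propagation of a new attribute to every structured record can be sketched as follows. The class and method names are hypothetical, and representing each record as a dictionary keyed by attribute identifier is an assumption made for illustration.

```python
class StructuredTable:
    """Rows are structured records; every record shares the same columns."""

    def __init__(self, attribute_ids):
        self.attribute_ids = list(attribute_ids)
        self.records = {}  # instance identifier -> {attribute identifier: value}

    def add_instance(self, instance_id, values=None):
        # A new record starts with an empty cell for every existing column.
        self.records[instance_id] = {a: None for a in self.attribute_ids}
        for attribute_id, value in (values or {}).items():
            self.set_value(instance_id, attribute_id, value)

    def set_value(self, instance_id, attribute_id, value):
        # A value for an unknown attribute adds a column to every record.
        if attribute_id not in self.attribute_ids:
            self.attribute_ids.append(attribute_id)
            for record in self.records.values():
                record.setdefault(attribute_id, None)
        self.records[instance_id][attribute_id] = value


table = StructuredTable(["ATTR_1", "ATTR_2"])
table.add_instance("INSTANCE_1", {"ATTR_1": "value_1_1"})
table.add_instance("INSTANCE_2")
table.set_value("INSTANCE_1", "ATTR_3", "value_3_1")
print(table.attribute_ids)
print(table.records["INSTANCE_2"])
```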
  • values 307 in table 300 can be presented in certain units of measure. Examples of units of measure include feet, yards, inches, miles, seconds, gallons, liters, degrees Celsius, and the like. In some instances, the units of measure in which values 307 are presented are indicated by unit identifiers 309. Unit identifiers 309 can appear, e.g., beside values 307 and/or beside relevant attribute identifiers 308. The association between unit identifiers 309 and the values 307 whose units of measure are indicated is indicated to a viewer by such positioning. In many cases, all of the values 307 associated with a single attribute (e.g., all of the values 307 in a single column 304) are restricted to being presented in the same unit of measure. The information extracted from electronic document collection 102 by systems 100,
  • the information extracted from electronic document collection 102 can be used to determine values 307 for populating table 300.
  • the information extracted from electronic document collection 102 can be used to suggest new attributes and/or new instances for addition to table 300.
  • instance identifiers 306 can be selected based on one or more search strings. For example, if the search string "hybrid vehicles" is received from a user by search engine 202, systems such as system 200 can generate and populate table 300 based on information extracted from electronic document collection 102 using the search string. For example, system 200 can access data center 208, identify instance identifiers 306 in the electronic documents that are relevant to the search string, and determine a set of common attributes for the identified instances, as well as identifiers 308 of those attributes and values 307 for those attributes. In effect, system 200 can determine instance identifiers 306, attribute identifiers 308, and the associated values 307 based on the received search string.
  • one or more attribute identifiers 308, instance identifiers 306, and/or values 307 can be received from a user for whom table 300 is to be displayed.
  • systems such as system 200 can generate and populate table 300 based on information extracted from electronic document collection 102 using one or more received attribute identifiers 308, instance identifiers 306, and/or values 307.
  • system 200 can formulate new instance identifiers 306, attribute identifiers 308, as well as the associated values 307 based on the received attribute identifiers 308, instance identifiers 306, and/or values 307.
  • FIG. 4 is a schematic representation of another implementation of a structured presentation, namely, one that includes a table 400.
  • In addition to including attribute identifiers 308, instance identifiers 306, values 307, and unit identifiers 309 organized into rows 302 and columns 304, table 400 also includes a number of interactive elements for interacting with a user.
  • table 400 includes a collection of instance selection widgets 405, a collection of action triggers 410, a collection of column action trigger widgets 415, and a notes column 420.
  • Instance selection widgets 405 are user interface components that allow a user to select structured records 310 in table 400.
  • instance selection widgets 405 can be a collection of clickable checkboxes that are associated with a particular structured record 310 by virtue of arrangement and positioning relative to that structured record 310.
  • Instance selection widgets 405 are "clickable" in that a user can interact with widgets 405 using a mouse (e.g., hovering over the component and clicking a particular mouse button), a stylus (e.g., pressing a user interface component displayed on a touch screen with the stylus), a keyboard, or other input device to invoke the functionality provided by that component.
  • Action triggers 410 are user interface components that allow a user to trigger the performance of an action on one or more structured records 310 in table 400 selected using instance selection widgets 405.
  • action triggers 410 can be clickable text phrases, each of which can be used by a user to trigger an action described in the phrase.
  • a "keep and remove others" action trigger 410 triggers the removal of structured records 310 that are not selected using instance selection widgets 405 from the display of table 400.
  • a "remove selected" action trigger 410 triggers the removal of structured records 310 that are selected using instance selection widgets 405 from the display of table 400.
  • a "show on map" action trigger 410 triggers display of the position of structured records 310 that are selected using instance selection widgets 405 on a geographic map. For example, if a selected instance is a car, locations of car dealerships that sell the selected car can be displayed on a map. As another example, if the selected instances are spring break destinations, these destinations can be displayed on a map.
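  • The "keep and remove others" and "remove selected" actions amount to complementary filters over the displayed records. A minimal sketch, assuming each record is a dictionary with an "instance" key and that the selection is a set of instance identifiers (both hypothetical representations):

```python
def keep_and_remove_others(records, selected):
    """Keep only the records whose instance identifier is selected."""
    return [r for r in records if r["instance"] in selected]


def remove_selected(records, selected):
    """Drop the records whose instance identifier is selected."""
    return [r for r in records if r["instance"] not in selected]


# Hypothetical displayed records and checkbox selection.
records = [{"instance": "INSTANCE_1"}, {"instance": "INSTANCE_2"}, {"instance": "INSTANCE_3"}]
selected = {"INSTANCE_1", "INSTANCE_3"}
kept = keep_and_remove_others(records, selected)
remaining = remove_selected(records, selected)
print([r["instance"] for r in kept], [r["instance"] for r in remaining])
```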
  • Column action trigger widgets 415 are user interface components that allow a user to apply an action to all of the cells within a single column 304.
  • a further user interface component is displayed which offers to the user a set of possible actions to be performed.
  • the actions in this set can include, e.g., removing the entire column 304 from table 400 or searching for values for all the cells in column 304 that are currently blank.
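  • The two column actions mentioned above can be sketched as follows. The row layout (dictionaries with an "instance" key) and the function names are assumptions; the second function only finds the blank cells for which a search would then be run.

```python
def remove_column(rows, attribute_id):
    """Return the rows without the given attribute column."""
    return [{k: v for k, v in row.items() if k != attribute_id} for row in rows]


def instances_with_blank_cell(rows, attribute_id):
    """Return the instance identifiers whose cell in the column is blank."""
    return [row["instance"] for row in rows if row.get(attribute_id) in (None, "")]


# Hypothetical table rows, for illustration only.
rows = [
    {"instance": "INSTANCE_1", "ATTR_1": "value_1_1", "ATTR_2": ""},
    {"instance": "INSTANCE_2", "ATTR_1": "value_1_2", "ATTR_2": "value_2_2"},
]
to_search = instances_with_blank_cell(rows, "ATTR_2")
trimmed = remove_column(rows, "ATTR_2")
print(to_search, trimmed)
```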
  • notes column 420 is a user interface component that allows a user to associate information with an instance identifier 306.
  • notes column 420 includes one or more notes 425 that are each associated with a structured record 310 by virtue of arrangement and positioning relative to that structured record 310.
  • the information content of notes 425 is unrestricted in that, unlike columns 304, notes 425 are not alleged to be values of any particular attribute. Instead, the information in notes 425 can characterize unrelated aspects of the instance identified in structured record 310.
  • table 400 can include additional information other than values of any particular attribute.
  • table 400 can include a collection of images 430 that are associated with the instance identified in a structured record 310 by virtue of arrangement and positioning relative to that structured record 310.
  • table 400 can include one or more hypertext links 440 to individual electronic documents in collection 102.
  • the linked documents can be highly ranked results in searches conducted using instance identifiers 306 as a search string.
  • the linked documents can be the source of a value 307 that was extracted to populate table 400.
  • interaction with hypertext link 440 can trigger navigation to the source electronic document based on information embedded in hypertext link 440 (e.g., a web site address).
  • FIG. 5 is a schematic representation of another implementation of a structured presentation, namely, a collection of cards 500.
  • Card collection 500 is an organized, systematic arrangement of one or more identifiers of instances, as well as the values of particular attributes of those instances. The attributes of an instance can be specified by values.
  • card collection 500 generally includes identifiers of attributes, as well as identifiers of the units in which values are expressed, where appropriate.
  • card collection 500 includes a collection of cards 502.
  • Each card 502 includes an instance identifier 306 and a collection of associated attribute values 307.
  • the arrangement and positioning of attribute values 307 and instance identifiers 306 in cards 502 thus graphically represents the associations between them. For example, a user can discern the association between attribute values 307 and the instance identifier 306 that is found on the same card 502.
  • cards 502 in card collection 500 also include a collection of attribute identifiers 308. Attribute identifiers 308 are organized in a column 504 and attribute values 307 are organized in a column 506. Columns 504, 506 are positioned adjacent one another and aligned so that individual attribute identifiers 308 are positioned next to the attribute value 307 that characterizes that identified attribute. This positioning and arrangement allows a viewer to discern the association between attribute identifiers 308 and the attribute values 307 that characterize those attributes.
  • Each card 502 is a structured record 310 in that each card 502 associates a single instance identifier 306 with a collection of associated attribute values 307. Further, the arrangement and positioning used to denote these associations in one card 502 is reproduced in other cards 502. Indeed, in many cases, all of the cards 502 are restricted to having the same arrangement and positioning of information. For example, the value 307 that characterizes the attribute "ATTR_1" is restricted to bearing the same spatial relationship to instance identifiers 306 in all cards 502. As another example, the order and positioning of attribute identifiers 308 in all of the cards 502 is the same.
  • changes to the arrangement and positioning of information in one card 502 are generally propagated to other cards 502 in card collection 500. For example, if a new attribute value 307 that characterizes a new attribute (e.g., "ATTR_1¾") is inserted between the attribute values "value_1_1" and "value_2_1" in one card 502, then the positioning of the corresponding attribute values 307 in other cards 502 is likewise changed.
  • cards 502 in card collection 500 can include other features.
  • cards 502 can include interactive elements for interacting with a user, e.g., instance selection widgets, action triggers, attribute selection widgets, a notes entry, and the like.
  • cards 502 in card collection 500 can include additional information other than values of any particular attribute, e.g., images and/or text snippets that are associated with an identified instance.
  • cards 502 in card collection 500 can include one or more hypertext links to individual electronic documents in collection 102.
  • Such features can be associated with particular instances by virtue of appearing on a card 502 that includes an instance identifier 306 that identifies that instance.
  • a viewer can interact with the system presenting card collection 500 to change the display of one or more cards 502. For example, a viewer can trigger the side-by-side display of two or more of the cards 502 so that a comparison of the particular instances identified on those cards is facilitated. As another example, a viewer can trigger a reordering of cards 502, an end to the display of a particular card 502, or the like. As another example, a viewer can trigger the selection, change, addition, and/or deletion of attributes and/or instances displayed in cards 502. As yet another example, a viewer can trigger a sorting of cards into multiple piles according to, e.g., the attribute values 307 in the cards.
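  • Sorting cards into piles by the value of one attribute is essentially a group-by operation. A minimal sketch, assuming each card is a dictionary with an "instance" identifier and a "values" mapping (a hypothetical representation):

```python
from collections import defaultdict


def sort_into_piles(cards, attribute_id):
    """Group instance identifiers by the card's value for one attribute.

    Cards that lack the attribute end up in the pile keyed by None.
    """
    piles = defaultdict(list)
    for card in cards:
        piles[card["values"].get(attribute_id)].append(card["instance"])
    return dict(piles)


# Hypothetical cards, for illustration only.
cards = [
    {"instance": "INSTANCE_1", "values": {"ATTR_1": "red"}},
    {"instance": "INSTANCE_2", "values": {"ATTR_1": "blue"}},
    {"instance": "INSTANCE_3", "values": {"ATTR_1": "red"}},
    {"instance": "INSTANCE_4", "values": {}},
]
piles = sort_into_piles(cards, "ATTR_1")
print(piles)
```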
  • cards 502 will be displayed with two "sides."
  • a first side can include a graphic representation of the instance identified by instance identifier 306, while a second side can include instance identifier 306 and values 307. This can be useful, for example, if the user is searching for a particular card in the collection of cards 500, allowing the user to identify the particular card with a cursory review of the graphical representations on the first side of the cards 502.
  • FIG. 6 is a flow chart of an example process 600 for presenting information from an electronic document collection to a user in a structured presentation.
  • Process 600 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • process 600 can be performed by the search engine 202 in system 200.
  • process 600 can be performed in response to the receipt of a trigger, e.g., a user request, to create or change a structured presentation.
  • the system performing process 600 can identify two or more responsive electronic documents in the electronic document collection (step 605).
  • the responsive documents can be identified in a number of different ways. In some instances, documents are identified based on "new" information (e.g., a new search query) received from a viewer.
  • the system can compare a newly received search query with the content of the electronic documents in the electronic document collection using string comparisons.
  • the system can access a data center such as data center 208 and compare the terms in a search query with an index of keywords to identify the location of responsive electronic documents.
  • documents are identified based on "old" information that is already found in a structured presentation.
  • examples of information found in a structured presentation are the identities of instances, attributes, values, and the units in which the values are represented.
  • the system performing process 600 can use this old information to identify responsive electronic documents in the electronic document collection. For example, documents that include instances already found in a structured presentation can be identified as responsive. As another example, documents that characterize instances using attributes already found in a structured presentation can be identified as responsive. Additional examples of such identifications are discussed further below.
  • the system performing process 600 can also gather information from the identified electronic documents (step 610).
  • the gathered information can regard one or more instances, attributes, and/or values.
  • the system performing process 600 can gather this information directly from the documents in an electronic document collection or from previously assembled collections of information that characterize the electronic documents in an electronic document collection.
  • the system performing process 600 can locate documents in collection 102, access the located documents, and extract the information directly from the original documents in collection 102.
  • the system performing process 600 can access a collection of information in data center 208 and gather the information from, e.g., a database that includes an index of keywords and the location of documents that include those keywords, an ontology, and/or a historical record of previous structured presentations that were presented using information extracted from documents in collection 102.
  • the system performing process 600 can use the gathered information to provide instructions for presenting structured presentations based on the gathered information (step 615). For example, the system performing process 600 can generate machine-readable instructions for presenting a structured presentation, e.g., tables 300, 400 or collection of cards 500.
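  • The three steps of process 600 (identify at step 605, gather at step 610, provide instructions at step 615) can be sketched end to end as below. All function names, the shape of the keyword index, the pre-extracted (instance, attribute, value) facts, and the example data are assumptions made for illustration; real extraction from documents is far more involved.

```python
def identify_documents(query, index):
    """Step 605: find documents that match any query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)


def gather_information(locations, facts):
    """Step 610: merge (instance, attribute, value) facts from each document."""
    gathered = {}
    for location in locations:
        for instance, attribute, value in facts.get(location, []):
            gathered.setdefault(instance, {})[attribute] = value
    return gathered


def presentation_instructions(gathered):
    """Step 615: describe a table with one row per instance."""
    attributes = sorted({a for values in gathered.values() for a in values})
    rows = [[instance] + [values.get(a, "") for a in attributes]
            for instance, values in sorted(gathered.items())]
    return {"columns": ["instance"] + attributes, "rows": rows}


# Hypothetical index and extracted facts.
index = {"hybrid": {"doc_a", "doc_b"}}
facts = {
    "doc_a": [("Car A", "maker", "Maker X")],
    "doc_b": [("Car A", "year", "2009"), ("Car B", "maker", "Maker Y")],
}
locations = identify_documents("hybrid vehicles", index)
instructions = presentation_instructions(gather_information(locations, facts))
print(instructions)
```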
  • FIG. 7 is a flow chart of an example process 700 for identifying responsive documents in an electronic document collection.
  • Process 700 can be performed in isolation or in conjunction with other data processing activities.
  • process 700 can be performed at step 605 in process 600 (FIG. 6).
  • the system performing process 700 receives a search query (step 705).
  • the system can receive one or more search strings (e.g., "hybrid vehicles") from a user.
  • the system can receive a search string from another process or system.
  • the search string is received through an application programming interface (API), a common gateway interface (CGI) script, or other programming interfaces.
  • the search string is received through a web portal, a web page, or web site, or the like.
  • the system performing process 700 identifies two or more documents that contain instances, attributes, and/or values that are responsive to the search query (step 710).
  • the documents can be identified by classifying the role that terms in the search query are to play in a structured presentation. For example, the terms in a search query can be classified as a categorization of the instances that are to appear in a structured presentation based on, e.g., the particular terms in the search query, an express indication by the user as to how search query terms are to be classified, and/or the context of the search.
  • the terms in a search query "cities in California" can be classified as a categorization of instances such as "San Diego," "Los Angeles," and "Bakersfield" due to the plural term "cities" being characterized by an attribute, namely, being "in California."
  • the terms in a search query "Ivy League schools" can be classified as a categorization of instances (such as "Cornell," "Columbia," and "Brown") due to the plural term "schools" being characterized by an attribute "Ivy League."
  • the search query "Ivy League" can reasonably be taken as a categorization of school instances or as an example instance of the category "athletic conferences," which includes instances such as "Atlantic Coast Conference" and "PAC-10."
  • the terms can be classified, e.g., based on an express indication by the user as to how they are to be classified or based on the context of the terms in a search session. For example, if a user had previously entered the phrases "Atlantic Coast Conference" and "PAC-10" as search queries, the search query "Ivy League" can be taken as an example instance that is to appear in a structured presentation alongside those other instances.
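  • The session-context disambiguation described above can be sketched as a simple rule: a query is treated as an example instance when it and at least one earlier query in the session belong to the same known category, and as a categorization otherwise. The function name, the rule itself, and the small category table are assumptions for illustration.

```python
def classify_query(query, known_categories, previous_queries):
    """Classify a query as an example instance or as a categorization.

    `known_categories` maps a category name to a set of instance names
    (a stand-in for an ontology, which is an assumption of this sketch).
    """
    for category, instances in known_categories.items():
        shares_context = any(p in instances for p in previous_queries)
        if query in instances and shares_context:
            return ("instance", category)
    return ("category", query)


known_categories = {
    "athletic conferences": {"Ivy League", "Atlantic Coast Conference", "PAC-10"},
}
with_context = classify_query(
    "Ivy League", known_categories, ["Atlantic Coast Conference", "PAC-10"]
)
without_context = classify_query("Ivy League", known_categories, [])
print(with_context, without_context)
```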
  • FIG. 8 is a flow chart of another example process 800 for identifying two or more responsive documents in an electronic document collection.
  • Process 800 can be performed in isolation or in conjunction with other data processing activities. For example, process 800 can be performed at step 605 in process 600 (FIG. 6). As another example, process 800 can be performed in conjunction with process 700 at step 605 in process 600 (FIG. 6).
  • processes 700, 800 can be part of an iterative, interactive process in which a search query is received and used to identify a first collection of responsive documents, a first structured presentation that includes content drawn from the identified documents is presented to a user, user modifications are received, and a description of the modified structured presentation is used to identify a second collection of relevant documents.
  • process 800 can be performed several times.
  • process 800 can be performed without user input, e.g., by crawler 206 in system 200 (FIG. 2).
  • the system performing process 800 receives a description of existing content of a structured presentation (step 805).
  • the system can receive a description of the instances, the attributes, the values, and/or the units in which values are presented in an existing structured presentation.
  • the description can include, e.g., identifiers of the instances and the attributes and/or ranges of the values of the attributes.
  • the description can also include a categorization of the instances and/or attributes. Such a categorization can be determined, e.g., using an ontology or based on a categorization assigned by a viewer to a structured presentation. For example, if a user entitles a structured presentation "Ivy League Schools,” then this title can be taken as a categorization of the instances in that structured presentation.
  • the system performing process 800 can identify one or more documents that contain instances, attributes, and/or values that are relevant to the existing content (step 810). For example, the system can compare the identifiers of instances and/or attributes to indexed keywords to determine if particular documents contain one or more of the instances and/or attributes that already appear in the existing content of a structured presentation. As another example, the system can identify new instances, their attributes, and the values of such attributes from such documents, compare these values to values that already appear in the existing content of a structured presentation, and determine whether the new instances are potentially relevant to the existing content of the structured presentation.
  • the documents can be identified either directly in electronic document collection 102 or using identifying information in electronic data center 208.
  • identifying information can include, e.g., the memory location where the document was found the last time it was crawled.
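  • The comparison of instance and attribute identifiers to indexed keywords at step 810 might look like the following sketch. The inverted-index layout, the function names `build_index` and `relevant_documents`, and the "all words of an identifier must occur in the document" matching rule are assumptions made for the example; the text above does not prescribe a particular index or matching technique.

```python
from collections import defaultdict


def build_index(documents):
    """Build a toy inverted index: lower-cased word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?\"'()")].add(doc_id)
    return index


def relevant_documents(index, identifiers):
    """Return ids of documents containing every word of at least one identifier."""
    hits = set()
    for identifier in identifiers:
        words = identifier.lower().split()
        if not words:
            continue
        postings = [index.get(word, set()) for word in words]
        hits |= set.intersection(*postings)
    return hits


if __name__ == "__main__":
    docs = {
        "d1": "Cornell is in Ithaca.",
        "d2": "Brown University is in Providence.",
        "d3": "Weather report",
    }
    print(sorted(relevant_documents(build_index(docs), ["Cornell", "Brown University"])))
```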
  • FIG. 9 is a flow chart of a process 900 for suggesting and/or adding new instances to a structured presentation.
  • Process 900 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. These digital data processing devices can interact with a user over input and output devices, such as keyboards, mice, touchscreens, display screens, and the like.
  • user interaction in process 900 can be performed at clients such as PDA 215 or desktop computer 217.
  • Process 900 can be performed alone or in conjunction with other data processing activities. For example, as discussed further below, process 900 can be performed in conjunction with various processes for formulating instance suggestions for addition to a preexisting structured presentation. Examples of such formulation processes are described in FIGS. 21-26 and the associated text. In general, process 900 will be performed by multiple digital data processing devices. For example, in the context of system 200 (FIG. 2), activities for formulating instance suggestions can be performed at search engine 202 while user interaction can occur at clients such as PDA 215 or desktop computer 217 (FIG. 2).
  • the system performing process 900 can receive a new instance trigger (step 905).
  • a new instance is an instance that is not currently displayed in a structured presentation, e.g., structured presentation 106 (FIG. 1).
  • a new instance trigger is an event that activates the processes for adding a new instance to a structured presentation.
  • a new instance can be triggered by user input received over a mouse, stylus, keyboard, or the like.
  • a new instance can be triggered by another process or system.
  • a new instance trigger can be received through inter-process communication or an application's message handler, to name two examples.
  • the system performing process 900 can present, to a user, options for adding new instances to a structured presentation (step 910). Options are alternative approaches for adding new instances.
  • Example options include fully automatic options, automatic options with user-specified constraints, and manual options. These options are discussed in further detail below.
  • the system performing process 900 can present options to a user using a user interface device, e.g., a display screen.
  • the display screen that presents the options can be the same display screen that presents the structured presentation to which the instances are to be added. For example, options can be presented to a user using a display screen 104 (FIG. 1).
  • the system performing process 900 can receive user selection of an option (step 915).
  • the user selection can be received using one or more input devices, e.g., a keyboard, touchpad, or touchscreen.
  • the system can also determine the nature of the option selected by the user (step 920).
  • If the system performing process 900 determines that the user has selected an "automatic option," then the system can suggest and/or add additional instances to the structured presentation automatically, without interaction with a user.
  • the new instances can be suggested and/or added based on the characteristics of the structured presentation (step 925). Examples of such characteristics include the nature of the instances already specified in the structured presentation, categorizations of those instances, and the attributes of those instances. Approaches for formulating new instances based on such characteristics are described in FIGS. 21-26 and the associated text. For example, as described therein, search queries can be constructed using attribute identifiers drawn from the preexisting structured presentation, attribute values drawn from the preexisting structured presentation, and/or combinations thereof. These search queries can be used to identify instances for addition to the structured presentation using string comparisons or other matching techniques.
  • the system performing process 900 can suggest and/or add additional instances to the structured presentation automatically based on user-specified constraints on the nature of the additional instances.
  • the constraints can be expressed as one or more parameters that characterize the suggested and/or added instances.
  • the constraints can be expressed as the acceptable value of an attribute of the instances or as a range of acceptable values of an attribute.
  • the system performing process 900 presents a user with options for constraining values of attributes of new instances (step 930). For example, the system can display a list of attributes that characterize the instances in a structured presentation as well as input fields that allow a user to input constraints on the values of those attributes.
  • the attributes in such a list also appear in the structured presentation to which the new instances are to be added.
  • the attributes in such a list can be formulated based on the attributes used to characterize the instances elsewhere, such as in the documents of an electronic document collection. Example approaches for formulating such attributes are described in FIGS. 37-51 and the associated text.
  • the system performing process 900 can also receive a user specification of one or more constraints on the values of attributes of the new instances (step 935). As discussed above, the constraints can limit the values of one or more attributes to a specific value or to a range of values.
  • one attribute that characterizes cars is "number of cylinders.”
  • a user-specified constraint on the values of this attribute can limit the number of cylinders of new car instances to a specific value (e.g., "six") or to a range of values (e.g., "six to eight" or "more than six").
  • the system performing process 900 can also suggest and/or add new instances based on the user-specified constraints and on characteristics of the structured presentation (step 940).
  • characteristics of a structured presentation include the nature of the instances already specified in the structured presentation, categorizations of those instances, and the attributes of those instances. Approaches for formulating new instances based on such characteristics are described in FIGS. 37-51 and the associated text.
  • search queries can be constructed using attribute identifiers drawn from the preexisting structured presentation, attribute values drawn from the preexisting structured presentation, and/or combinations thereof, as well as the constraints specified by a user. These search queries can be used to identify instances using string comparisons or other matching techniques. The identified instances can then be suggested and/or added to the structured presentation. If the system performing process 900 determines that the user has selected a "manual option," then the system can add additional instances to the structured presentation under the direction of a user.
  • the system performing process 900 can receive a new instance from the user (step 945). For example, the user can input an instance name using a keyboard or other user input device.
  • the system performing process 900 can add the new instance to the structured presentation (step 950).
  • the name of a new instance can be added directly to the structured presentation as instance identifier 306 in a new structured record 310.
  • the new structured record 310 can be a new row 302 (FIGS. 3, 4) or a new card 502 (FIG. 5).
  • the system performing process 900 can also perform additional operations based on the received new instance. For example, the system can use a new instance to refine the set of suggested instances or a set of suggested attributes.
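  • The three branches of process 900 (automatic, automatic with user-specified constraints, and manual) can be sketched as a single dispatch function. This is a simplified illustration, not the disclosed implementation: the row layout (a list of dicts with a "name" key), the option strings, and the helper names `satisfies` and `add_instances` are hypothetical, and the candidate instances are passed in directly rather than formulated by search queries.

```python
def satisfies(instance, constraints):
    """True if each constrained attribute equals the allowed value or lies in range.

    A constraint is either a single acceptable value or a (low, high) tuple.
    """
    for attribute, rule in constraints.items():
        value = instance.get(attribute)
        if value is None:
            return False
        if isinstance(rule, tuple):
            low, high = rule
            if not (low <= value <= high):
                return False
        elif value != rule:
            return False
    return True


def add_instances(presentation, option, candidates=(), constraints=None, manual_name=None):
    """Add new instances to the presentation according to the selected option."""
    existing = {row["name"] for row in presentation}
    if option == "manual":
        picked = [{"name": manual_name}]          # steps 945 and 950
    elif option == "constrained":
        picked = [c for c in candidates if satisfies(c, constraints or {})]  # step 940
    elif option == "automatic":
        picked = list(candidates)                 # step 925
    else:
        raise ValueError(f"unknown option: {option!r}")
    for row in picked:
        if row["name"] not in existing:           # a new instance is not yet displayed
            presentation.append(row)
            existing.add(row["name"])
    return presentation


if __name__ == "__main__":
    table = [{"name": "Car A", "cylinders": 4}]
    pool = [{"name": "Car B", "cylinders": 6}, {"name": "Car C", "cylinders": 12}]
    add_instances(table, "constrained", pool, {"cylinders": (6, 8)})
    print([row["name"] for row in table])
```

  • In the usage shown, the "six to eight" cylinder range admits only the six-cylinder candidate, mirroring the "number of cylinders" example in the text.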
  • FIG. 10 is a schematic representation of a user interface component 1000 for receiving user input specifying modifications of a structured presentation.
  • user interface component 1000 can be used to receive a new instance trigger at step 905 in process 900 (FIG. 9).
  • User interface component 1000 includes an attribute modification region 1005 and an instance modification region 1010.
  • Attribute modification region 1005 includes a header 1015, a collection 1020 of attribute identifiers 1025, each of which is associated with an attribute identifier selection widget 1030, and a new attribute addition trigger 1035.
  • Header 1015 includes text or other information that identifies that user interaction with attribute modification region 1005 will indeed allow the user to modify attributes.
  • Attribute identifiers 1025 are text or other information that identifies attributes to be included in a structured presentation.
  • attribute identifiers 1025 can be the same text that appears as attribute identifiers 308 in structured presentations 300, 400, 500 (FIGS. 3, 4, 5).
  • Attribute identifier selection widget 1030 is an interactive display element that allows users to select and deselect attributes for display in structured presentations. For example, in collection 1020, each attribute identifier selection widget 1030 is associated with a single attribute identifier 1025 by virtue of their arrangement and positioning adjacent one another.
  • Attribute identifier selection widgets 1030 can indicate whether an attribute identifier 1025 is selected or deselected for display using one or more graphical indicia, e.g., the checks and coloring shown. For example, if a user interacts with the checked attribute identifier selection widget 1030 associated with attribute identifier 1025 "Attribute_1," the color and checked status in attribute identifier selection widget 1030 are changed and the removal of an attribute identifier associated with "Attribute_1" (as well as the values corresponding to "Attribute_1") from a structured presentation is triggered.
  • New attribute addition trigger 1035 is an interactive display element by which a user can trigger the addition of a new attribute to a structured presentation.
  • the formulation of new attributes for addition is described in FIGS. 37-51 and the associated text.
  • the addition of new attributes is also described in more detail below, e.g., in FIGS. 13-15.
  • Instance modification region 1010 includes a new instance addition trigger 1040 and an instance filter trigger 1045.
  • New instance addition trigger 1040 is an interactive display element by which a user can trigger the addition of a new instance to a structured presentation.
  • new instance addition trigger 1040 can be used at step 905 in process 900 (FIG. 9).
  • Instance filter trigger 1045 is an interactive display element by which a user can trigger the filtering of instances in a structured presentation.
  • Filtering instances yields a collection of instances that satisfy one or more criteria. For example, filtering can yield a collection of instances that have certain values, or values within a designated range. Filtering can thus reduce the number of instances to be included in a structured presentation.
  • the filtering triggered by instance filter trigger 1045 can include the presentation of a user interface component that allows a user to specify one or more filtering criteria and modifying a structured presentation so that instances which fail to meet the criteria are not displayed.
  • user interface component 1000 can respond dynamically to modifications made by a user using user interface component 1000 or otherwise. For example, if the user triggers and adds a new attribute to a structured presentation, an identifier of that new attribute can be added to collection 1020 and presented in user interface component 1000. For example, if the user adds "Attribute_9" to the structured presentation, the attribute identifier "Attribute_9" can be added to user interface component 1000 with an associated attribute identifier selection widget 1030.
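  • The select/deselect behavior of attribute modification region 1005 and its dynamic update when an attribute is added can be modeled with a small state object. The class name `AttributeModificationRegion`, its methods, and the `render` helper are hypothetical names chosen for this sketch; the disclosure describes the user interface behavior, not a data model.

```python
class AttributeModificationRegion:
    """Tracks which attribute identifiers are selected for display."""

    def __init__(self, attribute_ids):
        # dict preserves insertion order, so columns keep their original order
        self.selected = {attribute_id: True for attribute_id in attribute_ids}

    def toggle(self, attribute_id):
        """Flip the checked status, as user interaction with a selection widget would."""
        self.selected[attribute_id] = not self.selected[attribute_id]

    def add_attribute(self, attribute_id):
        """Register a newly added attribute; it starts out selected."""
        self.selected.setdefault(attribute_id, True)

    def visible_columns(self):
        return [a for a, checked in self.selected.items() if checked]


def render(rows, region):
    """Project each row onto the currently selected attributes."""
    columns = region.visible_columns()
    return [{column: row.get(column) for column in columns} for row in rows]


if __name__ == "__main__":
    region = AttributeModificationRegion(["Attribute_2", "Attribute_3"])
    region.toggle("Attribute_2")        # deselect: column is removed from display
    region.add_attribute("Attribute_9")
    print(region.visible_columns())
```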
  • FIG. 11 is a schematic representation of a user interface component 1100 for receiving user input specifying a technique for adding new instances to a structured presentation.
  • user interface component 1100 can be used to present options for adding new instances to a structured presentation at step 910 and to receive a user selection of an option at step 915 in process 900 (FIG. 9).
  • User interface component 1100 includes a header 1105, a prompt 1110, a collection of descriptions of techniques for adding new instances to a structured presentation 1115, 1120, 1125, each of which is associated with a selection widget 1130, 1135, 1140.
  • Header 1105 includes text or other information that identifies that user interaction with user interface component 1100 will indeed allow the user to specify a technique for adding new instances.
  • Prompt 1110 prompts a user to interact with user interface component 1100 to specify a technique for adding new instances.
  • Description 1115 describes that user specification of this technique will result in new instances being added by an automatic option.
  • User interaction with selection widget 1130 allows a user to specify the automatic option described by description 1115.
  • Description 1120 describes that user specification of this technique will result in new instances being added by a user-specified constraint option.
  • Description 1120 includes a constraint addition widget 1145 and a constraint clear widget 1150.
  • User interaction with constraint addition widget 1145 triggers the addition of a new constraint that is to be used in the user-specified constraint option.
  • User interaction with constraint clear widget 1150 clears all current constraints.
  • User interaction with selection widget 1135 allows a user to specify the user-specified constraint option described by description 1120.
  • Description 1125 describes that user specification of this technique will result in new instances being added by a manual option.
  • Description 1125 includes a new instance identifier input field 1155.
  • User interaction with new instance identifier input field 1155 allows a user to identify a new instance, e.g., by name.
  • User interaction with selection widget 1140 allows a user to specify the manual option described by description 1125.
  • FIG. 12 is a schematic representation of a user interface component 1200 for receiving user input specifying constraints that are to be used in the user-specified constraint option for adding new instances to a structured presentation.
  • User interface component 1200 can be used in isolation (e.g., on a dedicated window or portal) or in conjunction with other user interface components.
  • user interface component 1200 can be inserted into user interface component 1100 immediately below technique description 1120 (FIG. 11).
  • user interface component 1200 can be used to present options for specifying values of attributes of new instances that are to be added to a structured presentation at step 930 and to receive a user specification of such values of attributes at step 935 in process 900 (FIG. 9).
  • User interface component 1200 includes a collection of one or more attribute selection widgets 1205, 1210, each of which is associated with a value specification region 1215, 1220.
  • Attribute selection widgets 1205, 1210 are interactive display elements that allow a user to select an attribute whose values are to be constrained.
  • each attribute selection widget 1205, 1210 is a drop-down box widget that lists identifiers of attributes.
  • the listed attribute identifiers can be identical to the attribute identifiers 308 in a structured presentation to which the new instance is to be added.
  • Value specification regions 1215, 1220 are interactive display elements that allow a user to specify one or more constraints on the value of the attribute identified in the respective one of attribute selection widgets 1205, 1210.
  • value specification region 1215 includes a pair of text entry fields 1225 that allow a user to specify an acceptable range of values of the attribute identified in attribute selection widget 1205.
  • Value specification region 1220 includes a collection of interactive check boxes 1230 that allow a user to specify an acceptable value of the attribute identified in attribute selection widget 1210.
  • user selection of a particular attribute identifier using an attribute selection widget 1205, 1210 can trigger a change in the associated value specification region 1215, 1220.
  • the nature of any interactive elements and the values and/or ranges that can be specified in the associated value specification region 1215, 1220 can be changed.
  • these changes can be based on the distribution of values of such attributes in the structured presentation to which the new instance is to be added. For example, if only four values of the attribute "maker" appear in the structured presentation, these same four values can be presented for specification in the associated value specification region.
  • the changes to the associated value specification region 1215, 1220 can be based on the values of the attribute that characterize similar instances in an electronic document collection 102. For example, the attribute "maker" of instances of cars may be characterized in documents in electronic document collection 102 using a wider variety of values. These values can be identified and presented for specification in the associated value specification region.
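  • One way a value specification region could adapt to the selected attribute is sketched below: numeric attributes get a range (like text entry fields 1225) and attributes with a small set of distinct values get check boxes (like check boxes 1230). The function name `value_specification`, the returned dict layout, the `max_checkboxes` threshold, and the free-text fallback are assumptions for illustration; the text above says only that the region can change based on the distribution of values.

```python
def value_specification(attribute, rows, max_checkboxes=5):
    """Choose a widget description for constraining `attribute`.

    `rows` are the structured records already in the presentation.
    """
    values = [row[attribute] for row in rows if attribute in row]
    numeric = values and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        # Offer a range bounded by the values that already appear.
        return {"widget": "range", "min": min(values), "max": max(values)}
    distinct = sorted(set(map(str, values)))
    if len(distinct) <= max_checkboxes:
        # Few distinct values: present exactly those values as check boxes.
        return {"widget": "checkboxes", "options": distinct}
    # Assumed fallback when there are too many values to list.
    return {"widget": "text", "options": []}


if __name__ == "__main__":
    rows = [
        {"maker": "Maker A", "cylinders": 4},
        {"maker": "Maker B", "cylinders": 6},
        {"maker": "Maker C", "cylinders": 8},
        {"maker": "Maker A", "cylinders": 6},
    ]
    print(value_specification("maker", rows))
    print(value_specification("cylinders", rows))
```

  • To draw options from an electronic document collection instead, as the last bullet above describes, the same function could be called with rows extracted from those documents rather than from the presentation.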
  • FIG. 13 is a flow chart of an example process 1300 for adding new attributes to a structured presentation.
  • Process 1300 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. These digital data processing devices can interact with a user over input and output devices, e.g., keyboards, mice, touchscreens, display screens, and the like.
  • user interaction in process 1300 can be performed at clients such and
  • Process 1300 can be performed alone or in conjunction with other data processing activities. For example, as discussed further below, process 1300 can be performed in conjunction with various processes for formulating attribute suggestions for addition to a preexisting structured presentation. Examples of such formulation processes are described in FIGS. 37-51 and the associated text and in FIGS. 21-26 and the associated text. In general, process 1300 will be performed by multiple digital data processing devices. For example, in the context of system 200 (FIG. 2), activities for formulating attribute suggestions can be performed at search engine 202 while user interaction can occur at clients such as PDA 215 or desktop computer 217 (FIG. 2). The system performing process 1300 can receive a new attribute trigger (step 1305).
  • a new attribute is an attribute that is not currently displayed in a structured presentation, e.g., structured presentation 106 (FIG. 1).
  • a new attribute trigger is an event that activates the processes for adding a new attribute to a structured presentation.
  • a new attribute can be triggered by user input received over a mouse, stylus, keyboard, or the like.
  • a new attribute can be triggered by another process or system.
  • a new attribute trigger can be received through inter-process communication or an application's message handler, to name two examples.
  • the system can receive a new attribute trigger from the user interface component 1000 through user selection of new attribute addition trigger 1035 (FIG. 10).
  • the system performing process 1300 can present options for specifying new attributes (step 1310).
  • the system can display a list of new attributes that are used to characterize the instances in a structured presentation as well as interactive display elements that allow a user to select one or more of those attributes.
  • the attributes in such a list can be formulated based on the attributes used to characterize the instances elsewhere, such as in the documents of an electronic document collection. Example approaches for formulating such attributes are described in FIGS. 37-51 and the associated text.
  • the system performing process 1300 can receive a specification of a new attribute from a user (step 1315).
  • the specification of an attribute can characterize traits or characteristics of the new attribute, including, e.g., the name or other identifier of the new attribute, keywords associated with the new attribute, trustworthy sources of information regarding the new attribute, and the like.
  • the specification of an attribute can be received from the user over one or more input devices, e.g., a keyboard, touchpad, or touchscreen.
  • the system performing process 1300 can add the specified new attributes to a structured presentation (step 1320). For example, the system performing process 1300 can add a new attribute identifier 308 and column 304 to tables 300, 400 (FIGS. 3, 4).
  • the system can add a new attribute identifier 308 into column 504, along with a corresponding attribute value 307 in column 506 of card collection 500 (FIG. 5).
  • the system performing process 1300 can also add the new attribute not only to a structured presentation but also to a user interface component for receiving user input specifying modifications of a structured presentation. For example, the system can add the new attribute to attribute modification region 1005 of user interface component 1000 (FIG. 10).
  • the system performing process 1300 can populate the attribute values based at least in part on the user specification (step 1325).
  • the system can populate the attribute values using various techniques, as described in further detail below.
  • FIG. 14 is a schematic representation of a user interface component 1400 for adding new attributes to a structured presentation.
  • User interface component 1400 can interact with a user for the specification of one or more traits or characteristics of the new attribute. These traits or characteristics can be used, e.g., in adding new attributes and attribute values to a structured presentation.
  • user interface component 1400 can be used to present options for adding a new attribute class to a structured presentation at step 1310 and to receive a user specification of a new attribute at step 1315 in process 1300 (FIG. 13).
  • User interface component 1400 includes a header 1405 and a collection of trait identifiers 1410, 1415, 1420, 1425 that identify traits that characterize the new attribute. Each trait identifier 1410, 1415, 1420, 1425 is associated with a trait specification widget 1430, 1435, 1440, 1445 and identifies the trait that can be specified by user interaction with that widget. Header 1405 includes text or other information that identifies that user interaction with user interface component 1400 will indeed allow the user to add a new attribute to a structured presentation.
  • Trait identifier 1410 identifies that a user can specify a class of the attribute to be added to a structured presentation by interacting with trait specification widget 1430.
  • the class of an attribute indicates how the attribute and its values are to be identified.
  • an attribute class can specify a technique by which the attribute and its values are to be identified in an electronic document collection.
  • Example attribute classes include "auto-find values,” “search results,” “review,” and “note” classes. Details regarding these attribute classes are discussed further below.
  • Trait specification widget 1430 is an interactive display element that allows a user to specify the class of the attribute to be added to a structured presentation.
  • trait specification widget 1430 is a dropdown box widget.
  • Trait identifier 1415 identifies that a user can specify a name or other identifier of the new attribute by interacting with trait specification widget 1435.
  • Trait specification widget 1435 is an interactive display element that allows a user to specify the name or other identifier of the new attribute to be added to a structured presentation.
  • trait specification widget 1435 includes a text entry field.
  • the attribute identifier identified in trait identifier 1415 can be added directly into a structured presentation as an attribute identifier 308.
  • Trait identifier 1420 identifies that a user can specify keywords that characterize the new attribute by interacting with trait specification widget 1440.
  • Trait specification widget 1440 is an interactive display element that allows a user to specify one or more keywords that characterize the attribute to be added to a structured presentation.
  • trait specification widget 1440 includes a text entry field into which one or more keywords can be entered.
  • the keywords can include, e.g., synonyms of the attribute identifier or terms that characterize the context of the attribute identifier. For example, if the attribute identifier is "bank,” the keywords identified in trait specification widget 1440 can include "NASCAR" and "speedway” to indicate that the attribute refers to the "bank” of a racetrack as opposed to a financial institution.
  • the keywords specified in trait specification widget 1440 can be used to identify instances, attributes, and/or attribute values in searches of electronic document collections.
  • the keywords can be used when formulating new attributes and/or new instances, as described in FIGS. 21-26 and the associated text and in FIGS. 37-51 and the associated text.
  • Trait identifier 1425 identifies that a user can specify "favorite sites” that characterize the new attribute by interacting with trait specification widget 1445.
  • "Favorite sites” are documents in an electronic document collection. User specification of a document as a "favorite site” indicates that the user considers the content of the document to be both relevant to the new attribute and likely to be true. The content of a "favorite site” can thus be assigned a high confidence value, e.g., in formulating new instances and new attributes for addition to a preexisting structured presentation (as discussed further below). User specification of a document as a "favorite site” can also be used as an indication that the content of the document is a trustworthy source of attribute values for populating a structured presentation.
  • Trait specification widget 1445 is an interactive display element that allows a user to specify one or more documents in an electronic document collection as "favorite sites.”
  • trait specification widget 1445 includes a text entry field into which, e.g., one or more domain names or other electronic document locations can be entered.
  • a trait "de-specification" widget allows a user to identify that one or more documents in an electronic document collection are "disfavored" sites.
  • User specification of a document as a "disfavored site” indicates that the user does not trust the document as a source of attribute values.
  • Such a trait de-specification widget can include a text entry field into which, e.g., one or more domain names or other electronic document locations can be entered.
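  • The traits collected by user interface component 1400 (class, name, keywords, "favorite sites," and "disfavored" sites) can be gathered into one record, and the favorite/disfavored designations can be mapped to a confidence value for a source document. The `AttributeSpec` dataclass, the `source_confidence` function, matching by host name, and the numeric values 1.0, 0.5, and 0.0 are all assumptions for this sketch; the text above states only that favorite sites can be assigned a high confidence value and that disfavored sites are not trusted.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class AttributeSpec:
    """User specification of a new attribute (traits 1410 through 1425)."""
    attribute_class: str
    name: str
    keywords: list = field(default_factory=list)
    favorite_sites: set = field(default_factory=set)
    disfavored_sites: set = field(default_factory=set)


def source_confidence(spec, url, default=0.5):
    """Return an illustrative confidence for values extracted from `url`."""
    host = urlparse(url).hostname or ""
    if host in spec.disfavored_sites:
        return 0.0      # user does not trust this source
    if host in spec.favorite_sites:
        return 1.0      # user considers this source relevant and likely true
    return default      # no user signal either way


if __name__ == "__main__":
    spec = AttributeSpec(
        attribute_class="search results",
        name="bank",
        keywords=["NASCAR", "speedway"],
        favorite_sites={"example.com"},
        disfavored_sites={"example.org"},
    )
    print(source_confidence(spec, "http://example.com/tracks"))
```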
  • FIG. 15 is a flow chart of an example process 1500 for adding new attribute values to a structured presentation.
  • Process 1500 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • Process 1500 can be performed alone or in conjunction with other data processing activities.
  • process 1500 can be performed in conjunction with various processes for adding new attributes to a structured presentation, e.g., process 1300 (FIG. 13).
  • the system performing process 1500 can receive user specification of the class of a new attribute (step 1505).
  • the class of an attribute indicates how the attribute and its values are to be identified.
  • the receipt of the class of a new attribute can be part of the receipt of a specification of a new attribute at step 1315 in process 1300 (FIG. 13).
  • the user specification of the class of a new attribute can be received over trait specification widget 1430 in user interface component 1400 (FIG. 14).
  • The system performing process 1500 can determine which class is specified for the new attribute (step 1510). Based on the class specified, the system can determine which of various subprocesses for adding new attribute values to the structured presentation is to be performed. For example, the system can determine to add attribute values in accordance with a subprocess associated with a "note" class, a subprocess associated with a "reviews" class, a subprocess associated with a "search results" class, or a subprocess associated with an "already found" class.
  • If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "note" class, the system can populate attribute values with notes received from the user (step 1515). For example, in the context of FIG. 4, values in the notes column 420 in table 400 can be received from a user and used to populate the values of a new attribute.
  • If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "reviews" class, the system can search for and identify electronic documents that include reviews (step 1520).
  • Reviews are critical evaluations of one or more instances characterized by the new attribute.
  • Reviews can be authored by someone with expertise in evaluating instances, such as a critic.
  • Reviews can be identified, e.g., based on a label or other text that identifies them as reviews. For example, certain domain names (e.g., http://www.google.com/prdhp, http://www.epinions.com/, http://www.amazon.com/) can be used to identify electronic documents that include reviews.
  • The electronic documents that include reviews can be found in an electronic document collection, e.g., collection 102.
  • The system performing process 1500 can populate attribute values using content from the identified reviews (step 1525).
  • The system can extract values from the reviews using one or more text- or table-based extraction patterns and present those extracted values in the structured presentation. These extraction patterns may preferentially select segments of the review documents that are "sentiment focused." Sentiment-focused segments are identified as voicing strong sentiments, either positive or negative, about certain subject matter. For example, a review of a restaurant could include sentiment-focused segments such as "the food is exceptionally good" and "the service was very poor indeed."
  • The presentation of those extracted values in the structured presentation can be part of a population of a structured presentation at step 1325 in process 1300 (FIG. 13).
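  • As a rough illustration of how "sentiment focused" segments might be preferred during extraction, the following Python sketch keeps only sentences that contain several strongly evaluative words. The word list, the threshold, and the function name are assumptions made for illustration; the description above does not specify how sentiment is detected.

```python
import re
from typing import List

# Hypothetical, deliberately tiny lexicon of evaluative words (an assumption).
STRONG_TERMS = {"exceptionally", "very", "extremely", "excellent",
                "terrible", "good", "poor"}


def sentiment_focused_segments(review: str, min_hits: int = 2) -> List[str]:
    """Return sentences of `review` containing at least `min_hits` lexicon words."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    selected = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        hits = sum(1 for word in words if word in STRONG_TERMS)
        if hits >= min_hits:
            selected.append(sentence)
    return selected
```

  • Applied to a review containing the two example segments quoted above plus a neutral sentence, this sketch would keep the two evaluative sentences and drop the neutral one.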
  • If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "search results" class, the system can generate a collection of search results from an electronic document collection, e.g., collection 102 (step 1530).
  • The search can yield a result set that is not limited to reviews but rather can include a variety of electronic documents.
  • The electronic documents can be found in an electronic document collection, e.g., collection 102.
  • The search results can be generated by searching based on an identifier of the new attribute, as well as the identifiers of instances characterized by that attribute.
  • Additional keywords that are associated with the new attribute can be used to refine search results, e.g., the keywords received from the user over trait specification widget 1440 of user interface component 1400 (FIG. 14).
  • The system performing process 1500 can populate attribute values in the structured presentation with content from the search result set (step 1535).
  • The system can extract one or more values from the search result set using one or more text- or table-based extraction patterns and present those extracted values in the structured presentation.
  • The population of those attribute values with the content of the search result set can be part of a population of a structured presentation at step 1325 in process 1300 (FIG. 13).
  • If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "already found" class, the system can identify values that have already been found and extracted from an electronic document collection, e.g., electronic document collection 102 (step 1540).
  • The "already found" values can be stored, e.g., in a collection of information that characterizes the electronic documents, e.g., data center 208 in system 200 (FIG. 2). In some implementations, such a collection of information can include a historical record of previous structured presentations.
  • The system performing process 1500 can populate attribute values of a structured presentation with the previously extracted values (step 1545).
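  • The class-based branching of process 1500 can be pictured as a simple dispatch table. The Python sketch below is illustrative only: the function names, the dictionary-based dispatch, and the stubbed subprocess bodies are assumptions, and a real system would replace each stub with the corresponding search and extraction logic.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for the four subprocesses; each returns placeholder text.
def from_notes(instance: str, attribute: str) -> List[str]:
    return [f"user note for {instance}/{attribute}"]            # step 1515


def from_reviews(instance: str, attribute: str) -> List[str]:
    return [f"review excerpt for {instance}/{attribute}"]       # steps 1520-1525


def from_search_results(instance: str, attribute: str) -> List[str]:
    return [f"search result value for {instance}/{attribute}"]  # steps 1530-1535


def from_already_found(instance: str, attribute: str) -> List[str]:
    return [f"stored value for {instance}/{attribute}"]         # steps 1540-1545


SUBPROCESSES: Dict[str, Callable[[str, str], List[str]]] = {
    "note": from_notes,
    "reviews": from_reviews,
    "search results": from_search_results,
    "already found": from_already_found,
}


def add_attribute_values(attr_class: str, instance: str, attribute: str) -> List[str]:
    """Step 1510: pick the subprocess that matches the user-specified class."""
    try:
        subprocess = SUBPROCESSES[attr_class]
    except KeyError:
        raise ValueError(f"unknown attribute class: {attr_class!r}")
    return subprocess(instance, attribute)
```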
  • FIG. 16 is a flow chart of an example process 1600 for adding new attribute values to a structured presentation.
  • Process 1600 is concerned with selecting attribute values to be used in populating the attribute values of a structured presentation.
  • Process 1600 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • Process 1600 can be performed alone or in conjunction with other data processing activities. For example, process 1600 can be performed at step 1325 in process 1300 (FIG. 13), at step 1525 in process 1500 (FIG. 15), at step 1535 in process 1500 (FIG. 15), and/or at step 1545 in process 1500 (FIG. 15).
  • The system performing process 1600 can identify candidate attribute values (step 1605).
  • The candidate attribute values can be, e.g., extracted directly from content (such as reviews or other documents in an electronic document collection) or identified from a collection of previously-extracted attribute values.
  • For example, the system can access data center 208 and extract one or more stored attribute values.
  • The system performing process 1600 can determine a confidence in the identified candidate values (step 1610).
  • The confidence in a candidate value should characterize the degree of assurance that the candidate value correctly characterizes the attribute of an instance.
  • The confidence in the correctness of a value can be determined based on, e.g., the number of times that the value is used to characterize an attribute of an instance, the quality of the documents in which the value is used to characterize an attribute of an instance, and the like.
  • The system performing process 1600 can determine whether the confidence in certain of the candidate values is low, medium, or high (step 1615).
  • A low confidence in an attribute value indicates that it is unlikely that the candidate value correctly characterizes the attribute of an instance.
  • A high confidence in an attribute value indicates that it is likely that the candidate value correctly characterizes the attribute of an instance.
  • If the system performing process 1600 determines that the confidence in certain of the candidate values is high, then the system can populate attribute values in the structured presentation with the extracted values (step 1545). This can be done automatically, i.e., without input from a user.
  • If the system performing process 1600 determines that the confidence in certain of the candidate values is medium, then the system can provide the candidate values to the user (step 1625). For example, the system can generate a user interface component that presents candidate values in association with identifiers of the instances and the attributes potentially characterized by those candidate values.
  • The system performing process 1600 can receive user selections of certain of the presented values (step 1630). The user selection can be received as one or more user inputs. For example, a user interface component that presents candidate values can include one or more selection widgets that allow the user to select candidate values for populating a structured presentation. The selection can be received from a user using a mouse, keyboard, or other user input device.
  • The system performing process 1600 can populate the attribute value with the selected values (step 1635). For example, the system performing process 1600 can present the selected value in the structured presentation.
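  • The confidence-based routing of process 1600 can be sketched as follows. Everything numeric here is an assumption: the text does not say how document counts and document quality are combined, so the sketch uses a "noisy-OR" combination of per-document quality scores, and the low/medium/high thresholds (0.4 and 0.8) are arbitrary. The text also does not state what happens to low-confidence candidates; the sketch simply leaves them out.

```python
from typing import Dict, List, Tuple


def confidence(document_qualities: List[float]) -> float:
    """Combine per-document quality scores in [0, 1] into one confidence.

    More supporting documents and higher-quality documents both raise the score.
    """
    remaining_doubt = 1.0
    for quality in document_qualities:
        remaining_doubt *= (1.0 - quality)
    return 1.0 - remaining_doubt


def bucket(score: float, low: float = 0.4, high: float = 0.8) -> str:
    """Step 1615: map a numeric confidence to low, medium, or high."""
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


def triage(candidates: Dict[str, List[float]]) -> Tuple[List[str], List[str]]:
    """Split candidates into (populate automatically, present to the user)."""
    auto, ask_user = [], []
    for value, support in candidates.items():
        level = bucket(confidence(support))
        if level == "high":
            auto.append(value)
        elif level == "medium":
            ask_user.append(value)      # shown to the user as in step 1625
        # Low-confidence candidates are dropped in this sketch (an assumption).
    return auto, ask_user
```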
  • The selected attribute values can be used to further refine the attributes, values, and/or instances presented in the structured presentation. For example, if a user specifies that the value of an attribute of an instance is several thousand dollars, the magnitude of the value can be used to exclude values of significantly different magnitude from the structured presentation. As another example, if a user specifies that the value of an attribute of an instance is several thousand dollars, the magnitude of the value can be used to exclude instances that have values of that attribute that are significantly different in magnitude.
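  • One simple way to read "significantly different magnitude" is as a difference in order of magnitude. The following Python sketch filters numeric values by comparing base-10 logarithms against a user-selected reference value; the one-order-of-magnitude tolerance and the function names are assumptions, not something the description prescribes.

```python
import math
from typing import List


def same_magnitude(value: float, reference: float, max_orders: float = 1.0) -> bool:
    """True if `value` is within `max_orders` powers of ten of `reference`."""
    if value <= 0 or reference <= 0:
        return False  # logarithms are undefined here; treat as a mismatch
    return abs(math.log10(value) - math.log10(reference)) <= max_orders


def filter_by_magnitude(values: List[float], reference: float,
                        max_orders: float = 1.0) -> List[float]:
    """Keep only values whose magnitude is comparable to the reference."""
    return [v for v in values if same_magnitude(v, reference, max_orders)]
```

  • With a user-selected reference of 3000 (several thousand dollars), candidate values of 12 and 3,000,000 would be excluded, while 2500 and 4800 would be kept.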
  • FIG. 17 is a schematic representation of a user interface component 1700 for selecting a candidate value to be added to a structured presentation.
  • User interface component 1700 can interact with a user for the selection of a value that is to characterize a new attribute in the structured presentation. For example, user interface component 1700 can be presented to a user at step 1625 and receive a user selection at step 1630 of process 1600 (FIG. 16).
  • The user interface component 1700 includes a header 1705 and a table 1710.
  • Header 1705 includes text or other information that identifies that user interaction with user interface component 1700 will allow the user to select a value of an attribute of an instance for display in a structured presentation.
  • Table 1710 includes a collection of candidate value information organized into columns 1715, 1720, 1725, as well as a collection of row selection widgets 1730.
  • Column 1715 includes a column header 1735 as well as a collection of candidate value identifiers.
  • The candidate value identifiers can have been extracted directly from the electronic document collection 102 or indirectly via data center 208.
  • The values may also include unit identifiers 309 that specify the unit of measure for the particular value 307.
  • Column header 1735 identifies that candidate value identifiers are found in column 1715.
  • Column 1720 includes a column header 1740 as well as a collection of confidence values.
  • The confidence values indicate the likelihoods that the candidate values identified in column 1715 are correct.
  • The confidence values can be expressed in numerical or word terms. For example, the confidence values can be presented as the percentage chance that a value is correct or on a numeric scale.
  • Column header 1740 identifies that confidence values are found in column 1720.
  • Column 1725 includes a column header 1745 as well as a collection of source identifiers.
  • The source identifiers identify one or more sources of the candidate values identified in column 1715.
  • The sources can be identified using, e.g., the title of an electronic document, a domain name, the author's name, or the like.
  • The source identifiers can include text snippets that include the candidate values identified in column 1715.
  • Column header 1745 identifies that source identifiers are found in column 1725.
  • Selection widget collection 1730 includes one or more user interactive elements for receiving input from a user.
  • The user input can identify that a candidate value identified in column 1715 is to be added to a structured presentation.
  • User interface component 1700 can present candidate values in an order that is based on confidence values. For example, the candidate value with the highest confidence value can be presented at the top of column 1715 and the candidate value with the lowest confidence value can be presented at the bottom of column 1715.
  • User interface component 1700 can also include snippets of text surrounding attributes and values in a particular source identified in column 1725. Such snippets can allow a user to see the value in context.
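  • The table of candidate values in user interface component 1700 can be modeled as a list of rows sorted by confidence. The Python sketch below is a minimal text rendering under stated assumptions: the `CandidateRow` class, its field names, and the plain-text layout are hypothetical, and selection widgets are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateRow:
    value: str         # candidate value identifier (column 1715)
    confidence: float  # likelihood that the value is correct (column 1720)
    source: str        # source identifier, e.g., a domain name (column 1725)


def order_rows(rows: List[CandidateRow]) -> List[CandidateRow]:
    """Highest-confidence candidate first, lowest last."""
    return sorted(rows, key=lambda row: row.confidence, reverse=True)


def render(rows: List[CandidateRow]) -> str:
    """Render a header line followed by one line per candidate."""
    lines = [f"{'Value':<14}{'Confidence':<12}Source"]
    for row in order_rows(rows):
        lines.append(f"{row.value:<14}{row.confidence:<12.0%}{row.source}")
    return "\n".join(lines)
```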
  • FIG. 18 is a schematic representation of a structured presentation 1800 that includes highlights 1802 of deficiencies in the attribute values presented therein.
  • The confidence in the values that are candidates for characterizing the attributes "ATTR_1" and "ATTRIBUTE_N" of instance "INSTANCE_1" is low, as is the confidence in the values that are candidates for characterizing the attribute "ATTR_2" of instance
  • User interaction with a cell in which a deficiency is highlighted can trigger a search directed to remedying the deficiency.
  • For example, user interaction with empty cell 1804 can trigger a search.
  • The search can use a customizable query that is based on, e.g., a category of the instances in the display, an identifier of the instance that is to be characterized by the new value, and/or an identifier of the attribute that is to be characterized by the new value.
  • A system can receive further interaction that specifies the value that remedies the deficiency.
  • The returned set of search results can include attribute-specific highlighting in text snippets that demarcates potential values.
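  • A query for remedying a highlighted deficiency could be assembled from the pieces listed above. The Python sketch below is one possible construction: the function name, the choice to quote the instance identifier, and the `extra_terms` parameter (standing in for user customization) are all assumptions.

```python
from typing import Iterable, Optional


def build_deficiency_query(instance: str, attribute: str,
                           category: Optional[str] = None,
                           extra_terms: Iterable[str] = ()) -> str:
    """Build a search query string for a cell that lacks a reliable value."""
    parts = []
    if category:
        parts.append(category)           # category of the displayed instances
    parts.append(f'"{instance}"')        # instance to be characterized, as a phrase
    parts.append(attribute)              # attribute to be characterized
    parts.extend(extra_terms)            # user-supplied refinements
    return " ".join(parts)
```

  • For instance, `build_deficiency_query("Philadelphia", "rating", category="films")` would produce the query `films "Philadelphia" rating`.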
  • FIG. 19 is a schematic representation of a user interface component 1900 for selecting a candidate attribute to be added to a structured presentation.
  • User interface component 1900 can interact with a user for the selection of an attribute that is to characterize an instance in the structured presentation.
  • For example, user interface component 1900 can be presented to a user to select which attribute is to be added to a structured display at step 1320 of process 1300 (FIG. 13).
  • The user interface component 1900 includes a header 1905 and a table 1910.
  • Header 1905 includes text or other information that identifies that user interaction with user interface component 1900 will allow the user to select an attribute of an instance for display in a structured presentation.
  • Table 1910 includes a collection of candidate attribute information organized into columns 1915, 1920, 1925, as well as a collection of row selection widgets 1930.
  • Column 1915 includes a column header 1935 as well as a collection of candidate attribute identifiers.
  • The candidate attribute identifiers can have been extracted directly from the electronic document collection 102 or indirectly via data center 208.
  • The attributes may also include unit identifiers 309 that specify the units of measure in which values of the candidate attributes are to be cast.
  • Column header 1935 identifies that candidate attribute identifiers are found in column 1915.
  • Column 1920 includes a column header 1940 as well as a collection of confidence values.
  • The confidence values indicate the likelihoods that the candidate attributes identified in column 1915 are correct.
  • The confidence values can be expressed in numerical or word terms.
  • For example, the confidence values can be presented as the percentage chance that an attribute is correct or on a numeric scale.
  • Column header 1940 identifies that confidence values are found in column 1920.
  • Column 1925 includes a column header 1945 as well as a collection of source identifiers.
  • The source identifiers identify one or more sources of the candidate attributes identified in column 1915.
  • The sources can be identified using, e.g., the title of an electronic document, a domain name, the author's name, or the like.
  • The source identifiers can include text snippets that include the candidate attributes identified in column 1915.
  • Column header 1945 identifies that source identifiers are found in column 1925.
  • Selection widget collection 1930 includes one or more user interactive elements for receiving input from a user.
  • The user input can identify that a candidate attribute identified in column 1915 is to be added to a structured presentation.
  • User interface component 1900 can present candidate attributes in an order that is based on confidence values. For example, the candidate attribute with the highest confidence value can be presented at the top of column 1915 and the candidate attribute with the lowest confidence value can be presented at the bottom of column 1915.
  • User interface component 1900 can also include snippets of text surrounding instances and attributes in a particular source identified in column 1925. Such snippets can allow a user to see the attributes in context.
  • FIG. 20 is a schematic representation of a user interface component 2000 for selecting candidate instances to be added to a structured presentation.
  • User interface component 2000 can interact with a user for the selection of an instance that is to be added to a structured presentation.
  • For example, user interface component 2000 can be presented to a user to select which instance is to be added to a structured display at steps 925, 940 of process 900 (FIG. 9).
  • The user interface component 2000 includes a header 2005 and a table 2010.
  • Header 2005 includes text or other information that identifies that user interaction with user interface component 2000 will allow the user to select an instance for display in a structured presentation.
  • Table 2010 includes a collection of candidate instance information organized into columns 2015, 2020, 2025, as well as a collection of row selection widgets 2030.
  • Column 2015 includes a column header 2035 as well as a collection of candidate instance identifiers.
  • The candidate instance identifiers can have been extracted directly from the electronic document collection 102 or indirectly via data center 208.
  • Column header 2035 identifies that candidate instance identifiers are found in column 2015.
  • Column 2020 includes a column header 2040 as well as a collection of confidence values.
  • The confidence values indicate the likelihoods that the candidate instances identified in column 2015 are to be added.
  • The confidence values can be expressed in numerical or word terms. For example, the confidence values can be presented as the percentage chance that an instance meets user-specified constraints.
  • Column header 2040 identifies that confidence values are found in column 2020.
  • Column 2025 includes a column header 2045 as well as a collection of source identifiers.
  • The source identifiers identify one or more sources of the candidate instances identified in column 2015.
  • The sources can be identified using, e.g., the title of an electronic document, a domain name, the author's name, or the like.
  • The source identifiers can include text snippets that include identifiers of the candidate instances in column 2015.
  • Column header 2045 identifies that source identifiers are found in column 2025.
  • Selection widget collection 2030 includes one or more user interactive elements for receiving input from a user. The user input can identify that a candidate instance identified in column 2015 is to be added to a structured presentation.
  • User interface component 2000 can present candidate instances in an order that is based on confidence values. For example, the candidate instance with the highest confidence value can be presented at the top of column 2015 and the candidate instance with the lowest confidence value can be presented at the bottom of column 2015.
  • User interface component 2000 can also include snippets of text surrounding instance identifiers in a particular source identified in column 2025. Such snippets can allow a user to see the instances in context.
  • Process 800 (FIG. 8) can be repeated several times. Because the scope of existing content increases, the additional instances, attributes, and/or values that are identified are likely to be of increased confidence.
  • FIG. 21 is a schematic representation of a process 2100 by which new instances can be added to expand a preexisting structured presentation.
  • Process 2100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., system 200 (FIG. 2).
  • Process 2100 includes an extraction operation 2105 and a merge operation 2110 that add new instances to a preexisting structured presentation based on information drawn from documents in electronic document collection 102.
  • Process 2100 suggests one or more new instances based on information presented in the preexisting structured presentation.
  • For example, the system 200 can suggest additional instances of movies according to information drawn from the electronic document collection. That is, the system 200 can identify and suggest additional instances according to similarities of the attribute identifiers, the units of measurement of the attribute values, the attribute values themselves, or combinations thereof. For example, the system 200 may suggest movies that have similar show times, theaters, or run times.
  • Extraction operation 2105 uses the characteristics of a preexisting structured presentation 106 to extract a collection of new instance suggestions from electronic document collection 102.
  • Example characteristics include the instances in the preexisting structured presentation, the attributes in the preexisting structured presentation, and the values of the attributes in the preexisting structured presentation.
  • The characteristics of the preexisting structured presentation 106 can be expressed as a collection of machine-readable information and can be received by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • For example, the characteristics of the preexisting structured presentation 106 can be received by a search engine 202 (FIG. 2).
  • One or more new instance suggestions can be formulated based on the content of documents in electronic document collection 102 and the characteristics of preexisting structured presentation 106. A variety of different techniques for formulating new instance suggestions can be used, as discussed further below.
  • Some or all of the new instance suggestions can be merged with the preexisting structured presentation 102 in merge operation 2110 to form an expanded structured presentation 106.
  • The expanded structured presentation can be displayed for a viewer, e.g., at a display device such as display screen 106. Not all of the new instance suggestions formulated during extraction operation 2105 need be merged with the preexisting structured presentation 102 and displayed for a viewer.
  • In some implementations, a collection of new instance suggestions can be presented to a viewer along with an interactive element that allows the viewer to select one or more instances that are to be added.
  • In other implementations, the new instance suggestions can be added automatically, without user interaction, and without winnowing of the new instance suggestions before display. More details regarding the merger can be found, e.g., in FIGS. 9-20 and the associated text.
  • FIG. 22 is a flow chart of an example process 2200 for adding instances to a structured presentation based on the content of documents in an electronic document collection.
  • Process 2200 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • For example, process 2200 can be performed by the search engine 202 in system 200.
  • Process 2200 can be performed in response to receiving input, e.g., from a user or from another system or process, that triggers an update of the structured presentation 226.
  • For example, process 2200 can be performed in response to a user request that one or more new instances be added to a structured presentation 226.
  • As another example, process 2200 may be performed by a search engine, e.g., search engine 202 (FIG. 2), in response to receipt of a search query.
  • The system performing process 2200 can receive one or more characteristics of a preexisting structured presentation (step 2205). For example, the system can receive one or more attribute identifiers of the preexisting structured presentation.
  • As another example, the system can receive one or more instance identifiers that appear in the preexisting structured presentation.
  • The system performing process 2200 can formulate one or more instance suggestions from documents in an electronic document collection based on one or more characteristics of the preexisting structured presentation (step 2210).
  • Instance suggestions can be formulated based on these characteristics in a number of different ways.
  • For example, the system can formulate instance suggestions from documents in an electronic document collection 102 by constructing search queries using attribute identifiers drawn from the preexisting structured presentation. These search queries can be used to identify instances that may share similar attributes using string comparisons or other matching techniques. Examples of other approaches are discussed further below.
  • The system performing process 2200 can provide one or more instance suggestions to a user (step 2215). For example, a list of instance suggestions can be displayed for the user on the same display screen that displays the preexisting structured presentation.
  • The system performing process 2200 can receive user selection of one or more instance suggestions (step 2220).
  • For example, a user interface component can interact with a user to receive one or more user inputs (e.g., mouse clicks, key strokes, or other user input) that select one or more instance suggestions.
  • The system performing process 2200 can add the selected instance suggestions to a structured presentation as new structured records (step 2225).
  • For example, if the structured presentation is a table such as table 300 (FIG. 3), the system can add new rows 302.
  • If the structured presentation is a collection of cards such as collection of cards 500 (FIG. 5), the system can add new cards 500.
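  • Adding selected instance suggestions as new structured records (step 2225) can be sketched for the table case as appending one row per selected instance. In the Python sketch below, the representation of a table as a list of dictionaries, the `"instance"` key, and the use of `None` for values not yet populated are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def add_instances(table: List[Row], attributes: List[str],
                  selected: List[str], key: str = "instance") -> List[Row]:
    """Return a copy of `table` with one new row per newly selected instance."""
    existing = {row[key] for row in table}
    expanded = list(table)
    for name in selected:
        if name in existing:
            continue                     # already present; do not duplicate
        row: Row = {key: name}
        # Attribute values are unknown until they are populated later.
        row.update({attribute: None for attribute in attributes})
        expanded.append(row)
        existing.add(name)
    return expanded
```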
  • FIG. 23 is a flow chart of an example process 2300 for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • Process 2300 can be performed alone or in conjunction with other activities. For example, process 2300 can be performed at step 2210 in process 2200 (FIG. 22).
  • Process 2300 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • For example, process 2300 can be performed by search engine 202 in system 200 (FIG. 2).
  • The system performing process 2300 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation (step 2305).
  • Structured components are portions or regions of an electronic document that are structured.
  • Example structured components include tables, lists, records, collections of attribute-value pairs, and the like. Structured components can thus organize attribute values and instance identifiers in conformity with a defined structure, much like a structured presentation.
  • An electronic document that includes a structured component need not itself be structured.
  • For example, an electronic document can include a table between two paragraphs of unstructured text.
  • Structured components in different documents need not have the same format or conform to a predetermined or persistent structure.
  • The organization of information in one structured component generally can be changed without regard to the organization of information in structured components that appear in other documents.
  • For example, if a structured list of schools in one person's resume is changed to delete the year of graduation, there is no need to ensure that other structured lists of schools in other resumes are similarly changed.
  • the system performing process 2300 can identify documents that include structured components in a variety of ways. For example, tables and other structured components can be identified using metadata labels, e.g., HTML tags, found in the documents themselves. As another example, structured components can be identified by identifying repetitive elements (e.g., a series of comma or tab delineations) in a document.
  • Structured components relate to instances specified in a preexisting structured presentation when they include information that is relevant to the specified instances. For example, a structured component that characterizes one or more of the specified instances with one or more attribute values can be considered relevant to the instances specified in a preexisting structured presentation. As another example, a structured component that characterizes one or more of the same attributes of instances that differ from instances specified in a preexisting structured presentation can be considered relevant to the specified instances. In many implementations, the instance and/or attribute identifiers need not be the same. Rather, conceptually related instances and attributes can be used to identify documents that include structured components.
  • The system performing process 2300 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation by identifying documents that include the same or related instance identifiers as found in the preexisting structured presentation and/or the same or related attribute identifiers as found in the preexisting structured presentation.
  • The system performing process 2300 can select one or more instance suggestions from the structured components (step 2310). This selection process can winnow down the number of instances that are to be suggested to a user.
  • The selection of instance suggestions can be performed in a number of ways. For example, the system can select instance suggestions based on a category of the instances in the structured components, the attributes of the instances in the structured components, and/or the values of the attributes of the instances in the structured components, as discussed further below.
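  • The two ways of spotting structured components mentioned in connection with step 2305 (metadata labels such as HTML tags, and repetitive delimiters) can be sketched with Python's standard library. The particular tag set, the delimiter list, and the three-line minimum are assumptions; they are heuristics for illustration rather than the detection method of the described system.

```python
from html.parser import HTMLParser
from typing import Iterable

STRUCTURAL_TAGS = {"table", "ul", "ol", "dl"}  # assumed set of structural tags


class _StructureDetector(HTMLParser):
    """Records whether any structural tag is opened in the parsed markup."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            self.found = True


def has_markup_structure(document: str) -> bool:
    """True if the document contains a table or list tag."""
    detector = _StructureDetector()
    detector.feed(document)
    return detector.found


def has_delimited_structure(document: str,
                            delimiters: Iterable[str] = (",", "\t"),
                            min_rows: int = 3) -> bool:
    """True if `min_rows` consecutive lines share the same nonzero delimiter count."""
    for delimiter in delimiters:
        run, previous = 0, 0
        for line in document.splitlines():
            count = line.count(delimiter)
            if count > 0 and count == previous:
                run += 1
            else:
                run = 1 if count > 0 else 0
            previous = count
            if run >= min_rows:
                return True
    return False
```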
  • FIG. 24 is a representation 2400 of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. In particular, representation 2400 illustrates a formulation of instance suggestions using one implementation of process 2300 (FIG. 23).
  • A preexisting structured presentation specifies a collection of instances 2405 (i.e., the instances "Philadelphia" and "Chicago").
  • Different documents in an electronic document collection include different structured components 2410, 2415, 2420.
  • Structured components 2410, 2415, 2420 can be identified as relevant to specified instances 2405 based on, e.g., the same instance identifiers "Philadelphia" and "Chicago" appearing therein.
  • Structured components 2410, 2415, 2420 include a wide variety of different potential instance suggestions based on different contexts.
  • In structured component 2410, the instances "Philadelphia" and "Chicago" are part of a tabular component that represents the properties of various cities.
  • In structured component 2415, the instances "Philadelphia" and "Chicago" are part of a structured component that represents part of the standings in the National League East sometime in the 1970s.
  • In structured component 2420, the instances "Philadelphia" and "Chicago" are part of a tabular component that represents the properties of various films.
  • Instance suggestions can be selected from components 2410, 2415, 2420 based on the attributes used to characterize those instances.
  • Preexisting structured presentation 106 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "year," "rating," and "box office receipts."
  • Structured component 2410 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "population" and "area."
  • Structured component 2415 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "wins," "losses," and "GB" (i.e., games behind).
  • Structured component 2420 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "year," "runtime," and "rating."
  • A system can select from the instances in structured components 2410, 2415, 2420 based on these characterized attributes.
  • For example, the system can identify the correspondence between the attribute identifiers "year" and "rating" in preexisting structured presentation 106 and the attribute identifiers "year" and "rating" in structured component 2420 to select the instances "Peter Pan" and "Star Wars" as suggestions for addition to the preexisting structured presentation 106.
  • structured component 2420 includes an attribute identifier "runtime." Such a system can thus suggest the attribute identifier "runtime" with or without the corresponding attribute values.
  • In some implementations, even if instances drawn from structured components 2410, 2415 are not suggested in a particular formulation, such instances can be stored for use during future information requests. For example, even though the cities represented in structured component 2410 are not selected as instance suggestions, these cities can be stored along with their respective attribute identifiers (e.g., "population" and "area") and attribute values in a data collection (such as, e.g., data center 208). When a subsequent user requests information regarding one or more cities, such a system can access this stored information and provide additional information to the user.
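  • The attribute-based selection among structured components 2410, 2415, 2420 described above can be illustrated with a minimal sketch. The dictionary layout and the function name `best_component` are assumptions made for illustration; only the attribute identifiers are taken from the description. The sketch picks the component that shares the most attribute identifiers with the preexisting structured presentation and reports which of its identifiers are new.

```python
# Attribute identifiers come from the FIG. 24 discussion; the data layout
# and function name are hypothetical.
presentation_attrs = {"year", "rating", "box office receipts"}

components = {
    "2410": {"attributes": {"population", "area"}},
    "2415": {"attributes": {"wins", "losses", "GB"}},
    "2420": {"attributes": {"year", "runtime", "rating"}},
}


def best_component(presentation_attrs, components):
    """Return (component id, shared identifiers, new identifiers) for the
    component sharing the most attribute identifiers, or None if none share any."""
    best = None
    for comp_id, comp in components.items():
        shared = presentation_attrs & comp["attributes"]
        if shared and (best is None or len(shared) > len(best[1])):
            new = comp["attributes"] - presentation_attrs
            best = (comp_id, shared, new)
    return best


match = best_component(presentation_attrs, components)
```

  • Under these assumptions, component 2420 is chosen because it shares "year" and "rating", and "runtime" is surfaced as a possible attribute suggestion.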
  • FIG. 25 is a flow chart of an example process 2500 for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • Process 2500 can be performed alone or in conjunction with other activities.
  • process 2500 can be performed at step 2210 in process 2200 (FIG. 22).
  • Process 2500 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions.
  • process 2500 can be performed by search engine 202 in system 200 (FIG. 2).
  • the system performing process 2500 can identify one or more documents relevant to one or more specified instances (step 2505). For example, the system performing process 2500 can use string comparisons to match one or more of the specified instances and their attributes and/or values with documents in an electronic document collection such as electronic document collection 102. As another example, the system performing process 2500 can access stored information (such as information in data center 208) to identify electronic documents that are relevant to the specified instances.
  • the system performing process 2500 can extract a template of one or more of the identified documents (step 2510).
  • a document template serves as a pattern for the arrangement of the content of individual documents in a subcollection of documents in an electronic document collection.
  • the documents in a subcollection generally originate from a single source, such as a single commercial entity.
  • a bookseller can use a single document template as a pattern for the arrangement of content describing different books.
  • a furniture retailer can use a single document template as a pattern for the arrangement of the content of flyers for different sofas.
  • the template of an electronic flyer for a sofa can specify the arrangement, on the flyer, of the brand name of the sofa, a picture of the sofa, an interactive element that allows the user to select the color in which the sofa is shown, a description of the sofa in text format, and a table that characterizes the sofa's dimensions, availability, and price.
  • Document templates can thus organize information regarding an instance in conformity with a defined structure, much like a structured presentation.
  • a document template can serve as a pattern for the entire content of an electronic document and, as discussed above, can even specify the arrangement of a structured component in a document.
  • Because document templates only specify the arrangement of the content of a subcollection of documents in an unstructured electronic document collection, the electronic document collection itself remains unstructured.
  • For example, even if AMAZON.COM uses one template as a pattern for the arrangement of a description of every book that AMAZON.COM sells, BARNESANDNOBLE.COM and other booksellers do not necessarily use that same template as a pattern for the arrangement of descriptions of books that they sell.
  • a document template can be changed without that change necessarily being propagated throughout the entire collection, or even a subcollection, of electronic documents.
  • FIG. 26 is a representation of a portion of a hypertext markup language (HTML) template 2600 that is used as a pattern for descriptions of a movie (i.e., the movie "Philadelphia").
  • the hypertext markup language (HTML) code of template portion 2600 is both machine-readable and human-readable.
  • HTML code of template portion 2600 can be used by a browser to generate a web page.
  • template portion 2600 is split into two subsections 2605, 2610.
  • Subsection 2605 serves as a pattern for the arrangement of text that identifies the movie "Philadelphia.”
  • Subsection 2610 serves as a pattern for the arrangement of various attribute identifiers and their values. In general, the patterns in subsections 2605, 2610 are repeated a number of times in a particular subcollection of documents in an electronic document collection to describe different movies.
  • An HTML parser can be used to extract the formatting from template portion 2600 so the formatting can be used to identify documents having the same template. For example, the HTML tags <title>, <div>, other HTML tags, and their relative position to each other can be identified by an HTML parser. Such an HTML parser can determine that the HTML tag <title> appears before the HTML tag <div>. Thus, an HTML parser can extract the formatting of template portion 2600 from content that is arranged in accordance with the template.
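  • As one possible illustration of extracting tag order with an HTML parser, the sketch below uses the `HTMLParser` class from Python's standard library. The class name `TagOrderParser`, the function `tag_signature`, and the sample markup are assumptions; the actual content of template portion 2600 is not reproduced here.

```python
from html.parser import HTMLParser


class TagOrderParser(HTMLParser):
    """Collects opening tag names in document order and ignores text content."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_signature(html):
    """Return the sequence of opening tags as a tuple (a crude template signature)."""
    parser = TagOrderParser()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


# Hypothetical pages that follow the same pattern but describe different movies.
page_a = ("<html><head><title>Philadelphia (1993)</title></head>"
          "<body><div>Year: 1993</div></body></html>")
page_b = ("<html><head><title>Another Movie</title></head>"
          "<body><div>Year: unknown</div></body></html>")

signature_a = tag_signature(page_a)
signature_b = tag_signature(page_b)
```

  • Because the signature records only tags and their relative order, two pages with different text but the same arrangement yield the same signature, and the position of <title> before <div> can be read directly from it.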
  • the system performing process 2500 can identify one or more documents that have the same template (step 2515). For example, the system can compare the template of documents in the electronic document collection with the extracted template.
  • the system performing process 2500 can also formulate one or more instance suggestions from the documents identified as having the same template (step 2520).
  • the system can use the repetition of the template within a subcollection of documents to infer that the documents in the subcollection include the same kind of content regarding the same category of instances.
  • the system can infer that the context of two documents is the same since the same template serves as a pattern for the different documents.
  • the templates themselves can be used to formulate the instance suggestions. For example, HTML tags in template portion 2600 (FIG. 26) identify that the title of the film described in that document is "Philadelphia (1993)." By searching for similarly-tagged text in documents that share the same template, the system can identify the titles of other films.
  • additional content in a document template can be used in formulating instance suggestions. For example, the identification of a certain value (e.g., George Lucas) as a "director" can be used to select particular instance suggestions from a subcollection of documents.
  • subsection 2610 of template portion 2600 can be parsed or otherwise analyzed to determine if any of the attributes have similar values, identifiers, or other characteristics. In such situations, the instance identifier can be extracted from subsection 2605.
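  • Steps 2510 to 2520 can be sketched as follows, assuming that "same template" is approximated by an identical sequence of opening tags and that the instance identifier sits in the <title> element. Both assumptions, the names `TemplateParser`, `parse`, and `suggest_titles`, and the sample markup are illustrative only.

```python
from html.parser import HTMLParser


class TemplateParser(HTMLParser):
    """Records the sequence of opening tags and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse(html):
    """Return (tag signature, title text) for one document."""
    parser = TemplateParser()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags), parser.title.strip()


def suggest_titles(seed_html, candidate_htmls):
    """Suggest titles from candidates whose tag signature matches the seed's."""
    seed_signature, seed_title = parse(seed_html)
    suggestions = []
    for html in candidate_htmls:
        signature, title = parse(html)
        if signature == seed_signature and title and title != seed_title:
            suggestions.append(title)
    return suggestions


# Hypothetical documents: the first candidate shares the seed's template.
seed = ("<html><head><title>Philadelphia (1993)</title></head>"
        "<body><div>Director</div></body></html>")
candidates = [
    "<html><head><title>Star Wars (1977)</title></head>"
    "<body><div>Director</div></body></html>",
    "<html><head><title>Contact us</title></head>"
    "<body><p>Email</p></body></html>",
]
titles = suggest_titles(seed, candidates)
```

  • A production system would likely tolerate small differences between templates rather than require an exact tag-sequence match; exact matching is used here only to keep the sketch short.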
  • FIG. 27 is a schematic representation of a process 2700 by which a collection of new instance suggestions 2115 can be formulated based on information in a preexisting structured presentation 106.
  • Process 2700 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions (FIG. 2).
  • Process 2700 performs an extraction operation 2705 on an instance/attribute collection 2710 based on the information in the preexisting structured presentation 106.
  • Instance/attribute collection 2710 is a collection of information that associates instances with their attributes and, in some implementations, with the values of those attributes as well.
  • the information in collection 2710 can be extracted from documents in an electronic document collection 102 either in response to receipt of a trigger (e.g., a search query) or in anticipation of receipt of a trigger, e.g., as part of a process of indexing electronic document collection 102.
  • information in collection 2710 can include the content of previous structured presentations that were presented to the current user or to other users.
  • the instance suggestions are provided to a user who selects instance suggestions to be added to a structured presentation, e.g., as described in steps 2215, 2220, 2225 (FIG. 22).
  • FIG. 28 is a schematic representation of a table 2800 that associates attributes and instances in collection 2710.
  • Table 2800 includes a collection of records 2802, 2804, 2806, 2808, 2810, 2812, 2814, each of which associates an identifier of an instance with descriptions of a document location and the attributes that characterize the identified instances in those documents.
  • the information in records 2802, 2804, 2806, 2808, 2810, 2812, 2814 can be organized in a collection of columns 2815, 2820, 2825, 2830, 2835, 2840.
  • column 2815 can include instance identifiers.
  • Column 2820 can include a description of the location of an electronic document that includes the instance identified in column 2815.
  • Columns 2825, 2830, 2835, 2840 can identify attributes that characterize the instances identified in column 2815 in the document whose location is described in column 2820.
  • different electronic documents can include different categories and amounts of information characterizing the same instance.
  • the document whose location is identified in column 2820 of record 2804 includes two attributes of an instance "INSTANCE_2”
  • the document whose location is identified in column 2820 of record 2810 includes three attributes of an instance “INSTANCE_2.”
  • the attributes in record 2804 i.e., attribute "ATTR_5" and attribute "ATTR_6”
  • Data collections 2710 that associate attributes and instances can be formed in a number of different ways. For example, documents that include internal, structured components can be identified. Examples of such internal, structured components include tables and lists that appear in HTML documents. The relationships between attributes and instances in these internal structured components can be copied to form data collections 2710. As another example, collection 2710 can be formed from the content of previous structured presentations that were presented to the current user or to other users.
  • the template of that document can be used to extract attributes and instances from other documents that include the same template. For example, if a stereo retailer uses the same document template to describe different stereos that are offered for sale, the arrangement of information in a first electronic document regarding a first stereo can be used to extract information from other electronic documents that regard other stereos.
  • techniques such as natural language parsing can be used to identify instances and attributes. For example, electronic documents can be parsed to identify phrases such as "[Instance] has a/an [attribute]" in electronic documents.
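  • A very rough approximation of matching phrases such as "[Instance] has a/an [attribute]" is sketched below with a regular expression. This is a deliberately naive stand-in for natural language parsing, not a description of any particular parser; the pattern, the function name `extract_pairs`, and the sample sentences are assumptions.

```python
import re

# Naive pattern: a capitalized name, "has a" or "has an", then a lowercase
# attribute phrase that ends before " of" or before punctuation.
PATTERN = re.compile(r"\b([A-Z][\w ]*?) has an? ([a-z][a-z ]*?)(?= of\b|[.,;])")


def extract_pairs(text):
    """Return (instance, attribute) pairs found in the text."""
    return [(m.group(1).strip(), m.group(2).strip()) for m in PATTERN.finditer(text)]


# Illustrative sentences only.
sample = ("Philadelphia has a population of over one million. "
          "Chicago has an area of more than 200 square miles.")
pairs = extract_pairs(sample)
```

  • Each extracted pair could then be stored as an association between an instance and an attribute in a collection such as collection 2710.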
  • data collection 2710 can categorize instances and their attributes. For example, instances and attributes can be categorized as North American cities, National League East teams, or popular movies. The storage of information in data collection 2710 can be based on such categorizations. For example, different categories can be stored in different files, records, or the like.
  • process 2700 suggests one or more new instances based on information presented in the preexisting structured presentation 106.
  • the system 200 can suggest additional instances of movies according to information drawn from data collections 2710. That is, the system 200 can identify and suggest additional instances according to similarities of the attribute identifiers. For example, the system 200 may suggest movies that have similar show times, theaters, or run times.
  • FIG. 29 is a flow chart of a process 2900 for formulating instance suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation.
  • Process 2900 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions.
  • process 2900 can be performed by the search engine 202 in system 200 (FIG. 2).
  • Process 2900 can be performed alone or in conjunction with other activities.
  • process 2900 can be performed during process 2700 (FIG. 27).
  • process 2900 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or both of processes 2300, 2500 (FIGS. 23, 25).
  • the system performing process 2900 can access a collection of instances and their attributes (step 2905). For example, the system performing process 2900 can access instance and attribute collection 2710 (FIG. 27) stored in data center 208 (FIG. 2).
  • the system performing process 2900 can identify one or more relevant instances based on characteristics of instance attributes specified in the preexisting structured presentation (step 2910). For example, the system can compare instance attributes of instances stored in the instance and attribute collection 2710 (FIG. 27) with instances specified in the structured presentation 106. The system can use the comparison to determine which, if any, of the stored instances share attribute identifiers, or related attributes, with the attributes specified in the preexisting structured presentation 106. For example, suppose that a preexisting structured presentation 106 uses the attributes "ATTR_3" and "ATTR_5" to characterize a collection of instances. Upon review of an instance and attribute collection
  • FIG. 30 is a flow chart of a process 3000 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106.
  • Process 3000 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 3000 can be performed alone or in conjunction with other activities. For example, process 3000 can be performed during process 2700 (FIG. 27).
  • process 3000 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or more of processes 2300, 2500, 2900 (FIGS. 23, 25, 29).
  • process 3000 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process).
  • process 3000 can be performed by search engine 202 in response to receiving a search string.
  • the system performing process 3000 can identify one or more authoritative sources regarding one or more specified instances (step 3005).
  • the system can access a collection of authoritative sources of documents in electronic document collection 102 that has been assembled, e.g., by a programmer.
  • the system can receive user-specific input identifying one or more authoritative sources of documents in electronic document collection 102 as "authoritative" in the view of that user.
  • a display screen 104 that displays a preexisting structured presentation 106 can include a GUI component that allows a viewer to specify authoritative sources of documents.
  • the identification of an authoritative source can be received in conjunction with receipt of a search query.
  • a viewer can identify JD POWER AND ASSOCIATES, AMAZON.COM, and MAJOR LEAGUE BASEBALL as authoritative sources of the documents found at http://www.jdpower.com/, http://www.amazon.com/, and http://www.mlb.com/, respectively.
  • the user-specific input can identify the subject matter on which a source is authoritative.
  • MAJOR LEAGUE BASEBALL may be identified as an authoritative source for baseball statistics, but may not be considered an authoritative source for information regarding drug testing.
  • the system performing process 3000 can analyze a collection of user-specific input identifying authoritative sources from multiple users to assemble a "generic" collection of authoritative sources. For example, a large number of users may identify the AMERICAN AUTOMOBILE ASSOCIATION (AAA) as authoritative. Based on a statistical analysis of these identifications, the AAA can then be added to a collection of authoritative sources.
  • the system performing process 3000 can determine additional attributes from the authoritative sources for instances that are specified in the preexisting structured presentation (step 3010). For example, the system can access documents provided by an authoritative source and identify one or more documents that characterize specified instances using one or more attributes. The system can extract attribute identifiers from these documents using a parser or other string comparison techniques.
  • the system can access a data collection that associates attributes and instances, such as table 2800 (FIG. 28).
  • the system can filter records such as records 2802, 2804, 2806, 2808, 2810, 2812, 2814 based on both the instances identified in the preexisting structured presentation and whether or not the documents whose location is identified in records 2802, 2804, 2806, 2808, 2810, 2812, 2814 originated from an authoritative source. For example, if AMAZON.COM is an authoritative source, a collection that associates attributes and instances can be scanned to identify documents with the http://www.amazon.com/ domain.
  • the system performing process 3000 can compare these additional attributes with attributes in an instance and attribute collection such as table 2800 (FIG. 28) (step 3015). For example, the system can use string comparisons, or other comparison techniques, to compare the additional attributes with attributes stored in the instance and attribute collection.
  • the system performing process 3000 can identify an instance in the instance and attribute collection based on the results of these comparisons (step 3020). For example, the system can determine the number of attributes that are used to characterize instances in documents from an authoritative source and the attributes that are associated with other instances in the instance and attribute collection.
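  • Steps 3010 to 3020 can be sketched as below, assuming records shaped loosely like those of table 2800 (instance identifier, document location, attribute identifiers). The record contents, the example.org URLs, and the function names are hypothetical; host names are read with `urllib.parse.urlparse` from the standard library, and candidates are scored by the number of attributes they share with those found in authoritative documents.

```python
from urllib.parse import urlparse

# Hypothetical records in the spirit of table 2800.
records = [
    {"instance": "INSTANCE_1", "url": "http://www.amazon.com/item1",
     "attributes": {"ATTR_1", "ATTR_2"}},
    {"instance": "INSTANCE_1", "url": "http://example.org/page",
     "attributes": {"ATTR_9"}},
    {"instance": "INSTANCE_2", "url": "http://example.org/other",
     "attributes": {"ATTR_1", "ATTR_2", "ATTR_5"}},
    {"instance": "INSTANCE_3", "url": "http://example.org/third",
     "attributes": {"ATTR_7"}},
]
authoritative_hosts = {"www.amazon.com"}


def authoritative_attributes(records, specified, hosts):
    """Attributes that authoritative documents use for the specified instances."""
    attrs = set()
    for record in records:
        host = urlparse(record["url"]).hostname
        if record["instance"] in specified and host in hosts:
            attrs |= record["attributes"]
    return attrs


def rank_candidates(records, specified, hosts):
    """Score other instances by how many authoritative attributes they share."""
    wanted = authoritative_attributes(records, specified, hosts)
    scores = {}
    for record in records:
        if record["instance"] in specified:
            continue
        overlap = len(wanted & record["attributes"])
        if overlap:
            name = record["instance"]
            scores[name] = max(overlap, scores.get(name, 0))
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_candidates(records, {"INSTANCE_1"}, authoritative_hosts)
```

  • Exact set intersection stands in here for the string comparisons mentioned above; a looser comparison of attribute identifiers could be substituted without changing the overall flow.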
  • FIG. 31 is a flow chart of a process 3100 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106.
  • Process 3100 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 3100 can be performed alone or in conjunction with other activities.
  • process 3100 can be performed during process 2700 (FIG. 27).
  • process 3100 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or more of processes 2300, 2500, 2900, 3000 (FIGS. 23, 25, 29, 30).
  • process 3100 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process).
  • process 3100 can be performed by search engine 202 in response to receiving a search string.
  • the system performing process 3100 can identify one or more relevant instances based on attribute values of the instances specified in a preexisting structured presentation. For example, the system can identify relevant instances by comparing attribute values of specified instances with attribute values of other instances to determine if the other instances are suitable suggestions. Such comparisons can require, e.g., that the attribute values be identical or that the attribute values fall within a certain range.
  • Such a range can be determined, e.g., based on the range of attribute values that are specified by a user over an interactive element or that already characterize instances in a preexisting structured presentation.
  • the system performing process 3100 can convert attribute values into a common unit of measurement prior to comparing the attribute values. For example, if the specified unit of measurement is in feet, but one or more extracted attribute values is in meters, the system can convert the values in meters into feet using conventional techniques. A schematic representation of one such approach is described in more detail below.
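  • The conversion and range comparison can be sketched as follows, assuming that values arrive as (number, unit) pairs and that feet is the specified unit. The table `TO_FEET` and the function names are illustrative; the factor of 0.3048 meters per foot is the standard definition.

```python
# Multipliers that convert a value in the given unit into feet.
TO_FEET = {"ft": 1.0, "m": 1.0 / 0.3048}


def to_feet(value, unit):
    """Convert a value expressed in a known unit into feet."""
    return value * TO_FEET[unit]


def within_range(candidate, specified_values, tolerance=0.0):
    """Check whether a candidate value falls inside the range spanned by the
    values of the specified instances, after converting everything to feet."""
    converted = [to_feet(value, unit) for value, unit in specified_values]
    low, high = min(converted), max(converted)
    x = to_feet(*candidate)
    return low - tolerance <= x <= high + tolerance


# Hypothetical attribute values of the specified instances.
specified = [(10.0, "ft"), (20.0, "ft")]
is_match = within_range((5.0, "m"), specified)       # 5 m is about 16.4 ft
is_not_match = within_range((10.0, "m"), specified)  # 10 m is about 32.8 ft
```

  • Setting `tolerance` to a positive number corresponds to widening the acceptable range beyond the values already present in the structured presentation.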
  • FIG. 32 is a schematic representation of a table 3200 that associates attributes, instances, and their values in a data collection. Since table 3200 associates attributes and instances, table 3200 can also serve as instance/attribute collection 2710 (FIG. 27).
  • Table 3200 can be generated based on information drawn from a collection of electronic documents, e.g., electronic document collection 102. Table 3200 can be generated, e.g., during a crawling process and stored, e.g., in data center 208 for subsequent use.
  • Table 3200 includes a collection of records 3202, 3204, 3206, 3208, 3210, 3212, 3214, each of which associates an identifier of an instance with descriptions of a document location, attributes that characterize the identified instances in those documents, and values that characterize those attributes in those documents.
  • the information in records 3202, 3204, 3206, 3208, 3210, 3212, 3214 can be organized in a collection of columns 3215, 3220, 3225, 3230, 3235, 3240.
  • column 3215 can include instance identifiers.
  • Column 3220 can include a description of the location of an electronic document that includes the instance identified in column 3215.
  • Columns 3225, 3235 can identify attributes that characterize the instances identified in column 3215 in the document whose location is described in column 3220.
  • Columns 3230, 3240 can include values that characterize the attributes identified in columns 3225, 3235.
  • each record 3202, 3204, 3206, 3208, 3210, 3212, 3214 relates to a different instance (e.g., INSTANCE_10 to INSTANCE_N).
  • Each of the instances is characterized in at least one document by attribute identifiers ATTR_3, ATTR_6.
  • If instance suggestions were formulated based solely on the attributes that could be used to characterize INSTANCE_10 to INSTANCE_N, every one of INSTANCE_10 to INSTANCE_N could be suggested to a user. In many circumstances, this is unacceptable.
  • a system such as search engine 202 can identify that the instances identified in records 3202, 3206 (i.e., "INSTANCE_10” and "INSTANCE_12") can be suggested to a user based on their common values (albeit in different units).
  • a system can convert the values in cells 3245, 3250 and 3255, 3260 into a common unit of measurement and compare those values to determine that they are similar.
  • like instances can be selected even if the units in which those values are expressed are different.
  • Although the instance identified in record 3208 shares a common value of attribute "ATTR_3" with a specified instance in a structured presentation, the instance identified in record 3208 need not be suggested to a user.
  • the value that characterizes attribute "ATTR_6" of this instance is value "VAL_8,” which differs from the value which characterizes this attribute of a specified instance in a structured presentation.
  • the instance identified in record 3208 can be excluded from a list of suggested instances.
  • Different criteria for including and excluding instances from a list of suggested instances can be used. For example, the number of attribute values that must be similar can differ. As another example, in some implementations, a user can specify the number and/or the nature of the attribute values that are considered in formulating a list of suggested instances. As yet another example, instances can be ranked based on the correspondence between their attribute values and the attribute values of one or more specified instances in a preexisting structured presentation. As yet another example, a range of values can be determined based on the values that characterize the attributes of one or more instances specified in a structured presentation, and this range can be used to identify relevant instances for inclusion in a list of suggested instances.
  • a system can select from among a collection of different values based on criteria that reflect the likelihood that a value is appropriate. Examples of such criteria include user-specified ranges, the number of documents that characterize an attribute with a certain value, and/or the quality of the documents that characterize an attribute with a certain value.
  • FIG. 33 is a flow chart of a process 3300 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106.
  • Process 3300 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 3300 can be performed alone or in conjunction with other activities.
  • process 3300 can be performed during process 2700 (FIG. 27).
  • process 3300 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or more of processes 2300, 2500, 2900, 3000, 3100 (FIGS. 23, 25, 29, 30, 31).
  • process 3300 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process).
  • process 3300 can be performed by search engine 202 in response to receiving a search string.
  • the system performing process 3300 can access categorized collections of instances and attributes (step 3305).
  • the system can access the instance and attribute collection 2710 to access one or more categorized collections of instances and attributes generated during previous searches.
  • the system performing process 3300 can identify a category that includes the specified instances (step 3310).
  • the system can identify the category that includes the instances based on similar attributes, similar attribute values, combinations of these characteristics, and/or other techniques.
  • the system performing process 3300 can select one or more instance suggestions from the identified category (step 3315). For example, in some implementations, instance suggestions can be selected from the identified category based on the similarity between attribute values of the specified instances and attribute values of the instances in the category.
  • FIG. 34 is a representation 3400 of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • representation 3400 illustrates a formulation of instance suggestions using one implementation of process 3300 (FIG. 33).
  • a preexisting structured presentation specifies a collection of instances 2405 (i.e., the instances "Philadelphia" and "Chicago").
  • instances drawn from different documents in an electronic document collection e.g., collection 102
  • Categorized instance collections 3410, 3415, 3420 can be identified as relevant to specified instances 2405 based on, e.g., the same instance identifiers "Philadelphia” and "Chicago” appearing therein.
  • categorized instance collections 3410, 3415, 3420 have been categorized in a variety of different ways.
  • categorized instance collection 3410 has been categorized as a collection of "North American Cities.”
  • Categorized instance collection 3415 has been categorized as a collection of "National League East Teams.”
  • Categorized instance collection 3420 has been categorized as a collection of "Popular Movies.”
  • categorized instance collections 3410, 3415, 3420 can be stored in the data center 208 (FIG. 2). That is, the system 200 can generate one or more categories of instances based on previously received search strings.
  • search engine 202 can categorize the results and store them in data center 208.
  • a categorized instance collection that includes the instances specified in a preexisting structured presentation can be identified, e.g., based on a similarity between the attributes that characterize the specified instances and the attributes that characterize the instances in the different categories. For example, the common use of the attributes "year" and "rating" in the preexisting structured presentation and categorized instance collection 3420 can be used to identify that categorized instance collection 3420 includes instances 2405.
  • a subset of the instances in a categorized instance collection can be selected as instance suggestions based on the values that characterize the instances in a category. For example, the instance "Star Wars" can be included on a list of instance suggestions based on the value characterizing the "rating" attribute of "Star Wars" being similar to the value characterizing the "rating" attribute of "Philadelphia" and "Chicago." As another example, the instance "Peter Pan" can be excluded from a list of instance suggestions based on the value characterizing the "rating" attribute of "Peter Pan" being different from the value characterizing the "rating" attribute of "Philadelphia" and "Chicago."
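  • The FIG. 34 example can be sketched as a two-stage selection: identify the category (step 3310), then select instances by value (step 3315). The category names and attribute identifiers follow the description, but the data layout, the function names, and the placeholder rating values RATING_A and RATING_B are assumptions; no actual ratings are implied.

```python
# Hypothetical categorized collections in the spirit of 3410, 3415, 3420.
categories = {
    "North American Cities": {
        "attributes": {"population", "area"},
        "instances": {"Philadelphia": {}, "Chicago": {}},
    },
    "National League East Teams": {
        "attributes": {"wins", "losses", "GB"},
        "instances": {"Philadelphia": {}, "Chicago": {}},
    },
    "Popular Movies": {
        "attributes": {"year", "runtime", "rating"},
        "instances": {
            "Philadelphia": {"rating": "RATING_A"},
            "Chicago": {"rating": "RATING_A"},
            "Star Wars": {"rating": "RATING_A"},
            "Peter Pan": {"rating": "RATING_B"},
        },
    },
}


def identify_category(categories, specified, presentation_attrs):
    """Pick the category that contains all specified instances and shares the
    most attribute identifiers with the structured presentation."""
    best_name, best_overlap = None, 0
    for name, category in categories.items():
        if not specified <= set(category["instances"]):
            continue
        overlap = len(presentation_attrs & category["attributes"])
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


def suggest(categories, specified, presentation_attrs, value_attr):
    """Suggest instances whose value for value_attr matches a specified instance."""
    name = identify_category(categories, specified, presentation_attrs)
    if name is None:
        return []
    instances = categories[name]["instances"]
    wanted = {instances[s].get(value_attr) for s in specified}
    wanted.discard(None)
    return sorted(
        candidate for candidate, values in instances.items()
        if candidate not in specified and values.get(value_attr) in wanted
    )


specified = {"Philadelphia", "Chicago"}
presentation_attrs = {"year", "rating", "box office receipts"}
category = identify_category(categories, specified, presentation_attrs)
suggestions = suggest(categories, specified, presentation_attrs, "rating")
```

  • With these placeholder values, "Star Wars" is suggested and "Peter Pan" is excluded, mirroring the outcome described above. Equality is used as the similarity test only for brevity.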
  • FIG. 35 is a schematic representation of a collection 3500 of processes that can be used to formulate a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106.
  • Process collection 3500 can be thought of as filters that are applied in succession to a large collection of potential instances 3505 to yield a smaller collection 3510 of instance suggestions.
  • Process collection 3500 includes a category filter 3515, a related attribute filter 3520, and a related value filter 3525.
  • Category filter 3515 can include, e.g., aspects of process 3300 (FIG. 33).
  • Related attribute filter 3520 can include, e.g., aspects of process 2300 (FIG. 23), aspects of process 2500 (FIG. 25), process 2900 (FIG. 29), and/or process 3000 (FIG. 30).
  • Related value filter 3525 can include, e.g., aspects of process 2300 (FIG. 23), aspects of process 2500 (FIG. 25), process 3100 (FIG.
  • Each filter can exclude potential instances 3505 from an instance suggestion collection 3510 that can be presented to a user or added directly to a structured presentation.
  • Filters 3515, 3520, 3525 can be applied in any order. However, in general, filters
  • any of filters 3515, 3520, 3525 can be omitted from collection 3500 and/or additional filters added to collection 3500.
  • a user-specified filter that can filter the potential instances 3505 according to input provided by the user can be added to collection 3500.
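  • The succession of filters in process collection 3500 can be sketched as a list of predicates applied one after another. The candidate records, the placeholder identifiers, and the factory function names below are assumptions; each factory loosely corresponds to one of filters 3515, 3520, 3525.

```python
def category_filter(category):
    """Keep candidates that belong to the given category."""
    return lambda c: c["category"] == category


def related_attribute_filter(required_attrs):
    """Keep candidates characterized by all of the required attributes."""
    return lambda c: required_attrs <= set(c["values"])


def related_value_filter(attr, low, high):
    """Keep candidates whose value for attr lies within [low, high]."""
    def keep(c):
        value = c["values"].get(attr)
        return value is not None and low <= value <= high
    return keep


def apply_filters(candidates, filters):
    """Apply each filter in succession, shrinking the candidate collection."""
    for keep in filters:
        candidates = [c for c in candidates if keep(c)]
    return candidates


# Hypothetical potential instances.
potential = [
    {"name": "CANDIDATE_1", "category": "CAT_A", "values": {"ATTR_3": 10, "ATTR_6": 5}},
    {"name": "CANDIDATE_2", "category": "CAT_A", "values": {"ATTR_3": 50, "ATTR_6": 5}},
    {"name": "CANDIDATE_3", "category": "CAT_B", "values": {"ATTR_3": 10, "ATTR_6": 5}},
    {"name": "CANDIDATE_4", "category": "CAT_A", "values": {"ATTR_3": 12}},
]
filters = [
    category_filter("CAT_A"),
    related_attribute_filter({"ATTR_3", "ATTR_6"}),
    related_value_filter("ATTR_3", 5, 20),
]
suggestions = [c["name"] for c in apply_filters(potential, filters)]
```

  • Because each predicate looks only at a single candidate, the filters give the same result in any order, and a user-specified filter can be added simply by appending another predicate to the list.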
  • FIG. 36 is a flow chart of a process 3600 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106.
  • Process 3600 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 3600 can be performed alone or in conjunction with other activities.
  • process 3600 can be performed iteratively in conjunction with one or more of the processes in process collection 3500.
  • process 3600 can be performed in response to receipt of a search string.
  • the system performing process 3600 can make an initial match between the instances specified in a preexisting structured presentation 106 and instances drawn from a document collection (step 3605).
  • the initial match can be based on one or more of the filtering processes in process collection 3500.
  • the system performing process 3600 can determine whether the number of matches is too high, too low, or appropriate (step 3610). If the number of matches is too low, the system can broaden names of specified instances (step 3615). For example, the system performing process 3600 can use alternate spellings, abbreviations, synonyms, alternative names, nicknames, and/or other keywords for the specified instances in one or more of the processes in process collection 3500.
  • the system performing process 3600 can also broaden one or more ranges of attribute values used in any related value filtering 3525 (FIG. 35) (step 3618).
  • the range can be broadened based on input received from a user or automatically, without user input.
  • the system can broaden a range based on the distribution of attribute values for a selected group of instances to, e.g., include a certain percentage of the instances or a predetermined number of instances.
  • the system performing process 3600 can also reduce the number of attributes and/or instances used in any related attribute filtering 3520 (FIG. 35) (step 3620).
  • the number of attributes and/or instances can be reduced based on, e.g., the number of potential instances excluded by a particular attribute and/or instance.
  • For example, if an attribute used to characterize potential instances excludes all of the potential instances, then this attribute can be omitted from any related attribute filtering.
  • the attributes and/or instances to be removed can be determined, e.g., automatically, without user input, or based on input received from a user.
  • the system can again seek to make a match between the instances specified in a preexisting structured presentation 106 and instances drawn from a document collection, but this time using the changed parameters (step 3622). This match can also be made using one or more of the filtering processes in process collection 3500.
  • the system performing process 3600 can narrow one or more ranges of attribute values used in any related value filtering 3525 (FIG. 35) (step 3625).
  • the range can be narrowed based on input received from a user or automatically, without user input. For example, in some implementations, the system can narrow a range based on the distribution of attribute values for a selected group of instances to, e.g., exclude a certain percentage of the instances or a predetermined number of instances.
  • the system performing process 3600 can also increase the number of attributes and/or instances used in any related attribute filtering 3520 (FIG. 35) (step 3628).
  • the number of attributes and/or instances can be increased based on, e.g., the number of potential instances excluded by a particular attribute and/or instance.
  • the attributes and/or instances to be added can be determined, e.g., automatically, without user input, or based on input received from a user.
  • the system performing process 3600 can winnow the matches based on the changed parameters (step 3630).
  • the narrowed ranges and/or increased numbers of attributes and/or instances can be used in any related value filtering 3525 (FIG. 35).
  • the system performing process 3600 can suggest the matched instances to a user (step 3635). For example, the system performing process 3600 can present one or more instance suggestions in a GUI on a display screen, e.g., display screen 104.
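  • The feedback loop of steps 3610, 3618, and 3625 can be illustrated with a minimal sketch. It assumes a single numeric attribute and a symmetric range around a center value; the function name tune_value_range and all of its parameters are hypothetical and are used only for illustration.

```python
from typing import List, Sequence, Tuple


def tune_value_range(
    values: Sequence[float],
    center: float,
    min_matches: int,
    max_matches: int,
    width: float = 1.0,
    factor: float = 2.0,
    max_rounds: int = 20,
) -> Tuple[Tuple[float, float], List[float]]:
    """Broaden or narrow a range around `center` until the number of
    matching values falls between min_matches and max_matches.

    Hypothetical helper: the doubling/halving policy is an assumption,
    not something the description prescribes.
    """
    lo, hi, matches = center, center, []
    for _ in range(max_rounds):
        lo, hi = center - width, center + width
        matches = [v for v in values if lo <= v <= hi]
        if len(matches) < min_matches:
            width *= factor  # too few matches: broaden the range
        elif len(matches) > max_matches:
            width /= factor  # too many matches: narrow the range
        else:
            break  # number of matches is appropriate
    # The returned range is the one that produced the returned matches.
    return (lo, hi), matches
```

  • The bound on the number of rounds keeps the sketch from oscillating indefinitely when no range width yields an appropriate number of matches.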
  • FIG. 37 is a schematic representation of a process 3700 by which new attributes can be added to expand a preexisting structured presentation.
  • Process 3700 can be performed by a system that includes one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 3700 includes an extraction operation 3705 and a merge operation 3710 that add new attributes to a preexisting structured presentation based on information drawn from documents in electronic document collection 102.
  • process 3700 suggests one or more new attributes based on information presented in the preexisting structured presentation 106.
  • the system 200 can suggest additional movie attributes according to information drawn from the electronic document collection. That is, the system 200 can identify and suggest additional attributes according to similarities of the instance identifiers, the category of the instances, values of the attributes, or combinations thereof.
  • extraction operation 3705 uses the characteristics of a preexisting structured presentation 106 to extract a collection of new attribute suggestions from electronic document collection 102.
  • Example characteristics include the instances in the preexisting structured presentation, the category of the instances in the preexisting structured presentation, and the values of the attributes in the preexisting structured presentation.
  • the characteristics of the preexisting structured presentation 106 can be expressed as a collection of machine-readable information and can be received by one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • the characteristics of the preexisting structured presentation 106 can be received by a search engine 202 (FIG. 2).
  • one or more new attribute suggestions can be formulated based on the content of documents in electronic document collection 102 and the characteristics of preexisting structured presentation 106.
  • a variety of different techniques for formulating new attribute suggestions can be used, as discussed further below.
  • Some or all of the new attribute suggestions can be merged with the preexisting structured presentation 106 in merge operation 3710 to form an expanded structured presentation 106.
  • the expanded structured presentation can be displayed for a viewer, e.g., at a display device such as display screen 104.
  • Not all of the new attribute suggestions formulated during extraction operation 3705 need be merged with the preexisting structured presentation 106 and displayed for a viewer.
  • a collection of new attribute suggestions can be presented to a viewer along with an interactive element that allows the viewer to select one or more attributes that are to be added.
  • the new attribute suggestions can be added automatically, without user interaction, and without winnowing of the new attribute suggestions before display. More details regarding the merger can be found, e.g., in FIGS. 9-20 and the associated text.
  • FIG. 38 is a flow chart of an example process 3800 for adding attributes to a structured presentation based on the content of documents in an electronic document collection.
  • Process 3800 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • process 3800 can be performed by the search engine 202 in system 200.
  • process 3800 can be performed in response to receiving input, e.g., from a user or from another system or process that triggers an update of the structured presentation 106.
  • process 3800 can be performed in response to a user request that one or more new attributes be added to a structured presentation 106.
  • process 3800 may be performed by a search engine, e.g., search engine 202 (FIG. 2), in response to receipt of a search query.
  • the system performing process 3800 can receive one or more characteristics of a preexisting structured presentation (step 3805). For example, the system can receive one or more instance identifiers that appear in the preexisting structured presentation.
  • the system can receive a description of a category that includes the instances identified in the preexisting structured presentation.
  • the system performing process 3800 can formulate one or more attribute suggestions from documents in an electronic document collection based on one or more characteristics of the preexisting structured presentation (step 3810). Attribute suggestions can be formulated based on these characteristics in a number of different ways. For example, in one implementation, the system can formulate attribute suggestions from documents in an electronic document collection 102 by constructing search queries using instance identifiers drawn from the preexisting structured presentation. These search queries can be used to identify attributes that may characterize the same or similar instances using string comparisons or other matching techniques. Examples of other approaches are discussed further below.
  • the system performing process 3800 can provide one or more attribute suggestions to a user (step 3815). For example, a list of attribute suggestions can be displayed for the user on the same display screen that displays the preexisting structured presentation.
  • the system performing process 3800 can receive user selection of one or more attribute suggestions (step 3820).
  • a user interface component can interact with a user to receive one or more user inputs (e.g., mouse clicks, key strokes, or other user input) that select one or more attribute suggestions.
  • the system performing process 3800 can add the selected attribute suggestions to a structured presentation (step 3825).
  • the selected attribute suggestions can be used to expand the existing structured records in the structured presentation.
  • the system can add new columns 304.
  • the system can add new attribute identifiers 308 to cards 500.
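  • Step 3825 can be sketched as follows, assuming that a structured presentation is held in memory as a list of records keyed by attribute identifier. This representation and the function name add_attribute_columns are hypothetical; the description does not prescribe a storage format.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def add_attribute_columns(rows: List[Record], selected: Iterable[str]) -> List[Record]:
    """Return copies of the records with each selected attribute present.

    Existing values are preserved; newly added attributes start out empty
    (None) until values are extracted for them.
    """
    selected = list(selected)
    return [{**row, **{attr: row.get(attr) for attr in selected}} for row in rows]
```

  • Returning new records rather than modifying the existing ones leaves the preexisting structured presentation intact if the user later declines the expansion.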
  • FIG. 39 is a flow chart of an example process 3900 for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • Process 3900 can be performed alone or in conjunction with other activities.
  • process 3900 can be performed at step 3810 in process 3800 (FIG. 38).
  • Process 3900 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • process 3900 can be performed by search engine 202 in system 200 (FIG. 2).
  • the system performing process 3900 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation (step 3905).
  • Structured components are portions or regions of an electronic document that are structured.
  • Example structured components include tables, lists, records, collections of attribute-value pairs, and the like. Structured components can thus organize attribute values and instance identifiers in conformity with a defined structure, much like a structured presentation.
  • an electronic document can include a table between two paragraphs of unstructured text.
  • structured components in different documents need not have the same format or conform with a predetermined or persistent structure.
  • the organization of information in one structured component generally can be changed without regard to the organization of information in structured components that appear in other documents.
  • the system performing process 3900 can identify documents that include structured components in a variety of ways. For example, tables and other structured components can be identified using metadata labels, such as HTML tags, found in the documents themselves.
  • structured components can be identified by identifying repetitive elements (e.g., a series of comma or tab delineations) in a document.
  • Structured components relate to instances specified in a preexisting structured presentation when they include information that is relevant to the specified instances. For example, a structured component that characterizes one or more of the specified instances with one or more attribute values can be considered relevant to the instances specified in a preexisting structured presentation. As another example, a structured component that characterizes one or more of the same attributes of instances that differ from instances specified in a preexisting structured presentation can be considered relevant to the specified instances.
  • the instance and/or attribute identifiers need not be the same. Rather, conceptually related instances and attributes can be used to identify documents that include structured components.
  • the system performing process 3900 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation by identifying documents that include the same or related instance identifiers as found in the preexisting structured presentation and/or the same or related attribute identifiers as found in the preexisting structured presentation.
  • the system performing process 3900 can select one or more attribute suggestions from the structured components (step 3910). This selection process can winnow down the number of attributes that are to be suggested to a user.
  • the selection of attribute suggestions can be performed in a number of ways. For example, the system can select attribute suggestions based on a category of the instances in the structured components and/or the values of the attributes of the instances in the structured components, as discussed further below.
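  • The two identification approaches mentioned for step 3905, namely metadata labels such as HTML tags and repetitive delimiters, can be sketched as follows. The set of tags treated as structured, the threshold of three rows, and the names find_structured_tags and looks_delimited are assumptions made for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List

# Assumption: these tags are treated as markers of structured components.
STRUCTURED_TAGS = {"table", "ul", "ol", "dl"}


class StructuredTagFinder(HTMLParser):
    """Records every start tag that marks a structured component."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURED_TAGS:
            self.found.append(tag)


def find_structured_tags(html: str) -> List[str]:
    finder = StructuredTagFinder()
    finder.feed(html)
    return finder.found


def looks_delimited(text: str, delimiter: str = ",", min_rows: int = 3) -> bool:
    """True if at least min_rows lines contain the same number of delimiters."""
    counts = Counter(
        line.count(delimiter) for line in text.splitlines() if delimiter in line
    )
    return bool(counts) and max(counts.values()) >= min_rows
```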
  • FIG. 40 is a representation 4000 of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • representation 4000 illustrates a formulation of attribute suggestions using one implementation of process 3900 (FIG. 39).
  • a preexisting structured presentation specifies a collection of instances 4005 (i.e., the instances “Philadelphia” and “Chicago”).
  • different documents in an electronic document collection include different structured components 4010, 4015, 4020.
  • Structured components 4010, 4015, 4020 can be identified as relevant to specified instances 4005 based on, e.g., the same instance identifiers “Philadelphia” and “Chicago” appearing therein.
  • structured components 4010, 4015, 4020 include a wide variety of different potential attribute suggestions based on different contexts.
  • the instances “Philadelphia” and “Chicago” are part of a tabular component that represents the properties of various cities.
  • the instances “Philadelphia” and “Chicago” are part of a structured component that represents part of the standings in the National League East sometime in the 1970s.
  • the instances “Philadelphia” and “Chicago” are part of a tabular component that represents the properties of various films.
  • attribute suggestions can be selected from components 4010, 4015, 4020 based on the attributes used to characterize those instances.
  • preexisting structured presentation 106 characterizes the instances “Philadelphia” and “Chicago” using values of the attributes “year,” “rating,” and “box office receipts.”
  • Structured component 4010 characterizes the instances “Philadelphia” and “Chicago” using values of the attributes “population” and “area.”
  • Structured component 4015 characterizes the instances “Philadelphia” and “Chicago” using values of the attributes “wins,” “losses,” and “GB” (i.e., games behind).
  • Structured component 4020 characterizes the instances “Philadelphia” and “Chicago” using values of the attributes “year,” “runtime,” and “rating.”
  • a system can select from the attributes in structured components 4010, 4015, 4020 based on these characterized attributes. For example, the system can identify the correspondence between the attribute identifiers “year” and “rating” in preexisting structured presentation 106 and the attribute identifiers “year” and “rating” in structured component 4020 to select the attributes “director” and “runtime” as suggestions for addition to the preexisting structured presentation 106. As discussed in FIGS. 21-36 and the associated text, in some implementations, a system can also suggest or add additional instance identifiers. For example, structured component 4020 includes the instance identifiers “Peter Pan” and “Star Wars.” Such a system can thus suggest these instance identifiers for inclusion in the structured presentation.
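  • The selection illustrated in representation 4000 can be sketched as an overlap test between attribute identifiers. The function name suggest_attributes and the min_shared threshold are hypothetical. The attribute lists follow the example above, with “director” included for component 4020 because the example names it as a suggestion drawn from that component.

```python
from typing import Dict, List, Sequence


def suggest_attributes(
    existing: Sequence[str],
    components: Dict[str, Sequence[str]],
    min_shared: int = 1,
) -> List[str]:
    """Suggest attributes from components that share attribute identifiers
    with the preexisting structured presentation.

    Exact string equality is an assumption; the description also allows
    related (not only identical) identifiers.
    """
    existing_set = set(existing)
    suggestions: List[str] = []
    for attrs in components.values():
        if len(existing_set & set(attrs)) >= min_shared:
            for attr in attrs:
                if attr not in existing_set and attr not in suggestions:
                    suggestions.append(attr)
    return suggestions


presentation_attrs = ["year", "rating", "box office receipts"]
components = {
    "4010": ["population", "area"],
    "4015": ["wins", "losses", "GB"],
    "4020": ["year", "director", "runtime", "rating"],
}
print(suggest_attributes(presentation_attrs, components))
```

  • Components 4010 and 4015 share no attribute identifiers with the preexisting structured presentation, so only the attributes of component 4020 contribute suggestions.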
  • attributes drawn from structured components 4010, 4015 can be stored for use during future information requests. For example, even though the cities represented in structured component 4010 are not selected as attribute suggestions, these cities can be stored along with their respective attribute identifiers (e.g., "population" and "area") and attribute values in a data collection (such as, e.g., data center 208). When a subsequent user requests information regarding one or more cities, such a system can access this stored information and provide additional information to the user.
  • FIG. 41 is a flow chart of an example process 4100 for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • Process 4100 can be performed alone or in conjunction with other activities.
  • process 4100 can be performed at step 3810 in process 3800 (FIG. 38).
  • Process 4100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • process 4100 can be performed by search engine 202 in system 200 (FIG. 2).
  • the system performing process 4100 can identify one or more documents relevant to one or more specified instances (step 4105).
  • the system performing process 4100 can use string comparisons to match one or more of the specified instances and their attributes and/or values with documents in an electronic document collection such as electronic document collection 102.
  • the system performing process 4100 can access stored information (e.g., information in data center 208) to identify electronic documents that are relevant to the specified instances.
  • the system performing process 4100 can extract a template of one or more of the identified documents (step 4110).
  • a document template serves as a pattern for the arrangement of the content of individual documents in a subcollection of documents in an electronic document collection.
  • the documents in a subcollection generally originate from a single source, e.g., a single commercial entity.
  • a bookseller can use a single document template as a pattern for the arrangement of content describing different books.
  • a furniture retailer can use a single document template as a pattern for the arrangement of the content of flyers for different sofas.
  • the template of an electronic flyer for a sofa can specify the arrangement, on the flyer, of the brand name of the sofa, a picture of the sofa, an interactive element that allows the user to select the color in which the sofa is shown, a description of the sofa in text format, and a table that characterizes the sofa's dimensions, availability, and price.
  • Document templates can thus organize information regarding an instance in conformity with a defined structure, much like a structured presentation.
  • a document template can serve as a pattern for the entire content of an electronic document and, as discussed above, can even specify the arrangement of a structured component in a document.
  • Because document templates only specify the arrangement of the content of a subcollection of documents in an unstructured electronic document collection, the electronic document collection itself remains unstructured.
  • AMAZON.COM uses one template as a pattern for the arrangement of a description of every book that AMAZON.COM sells.
  • BARNESANDNOBLE.COM and other booksellers do not necessarily use that same template as a pattern for the arrangement of descriptions of books that they sell.
  • a document template can be changed without that change necessarily being propagated throughout the entire collection, or even a subcollection, of electronic documents.
  • FIG. 42 is a representation of a portion of a hypertext markup language (HTML) template 4200 that is used as a pattern for descriptions of a movie (i.e., the movie “Philadelphia”).
  • the hypertext markup language (HTML) code of template portion 4200 is both machine-readable and human-readable.
  • the HTML code of template portion 4200 can be used by a browser to generate a web page.
  • template portion 4200 is split into two subsections 4205, 4210.
  • Subsection 4205 serves as a pattern for the arrangement of text that identifies the movie “Philadelphia.”
  • Subsection 4210 serves as a pattern for the arrangement of various attribute identifiers and their values. In general, the patterns in subsections 4205, 4210 are repeated a number of times in a particular subcollection of documents in an electronic document collection to describe different movies.
  • An HTML parser can be used to extract the formatting from template portion 4200 so the formatting can be used to identify documents having the same template. For example, the HTML tags <title>, <div>, other HTML tags, and their relative position to each other can be identified by an HTML parser. Such an HTML parser can determine that the HTML tag <title> appears before the HTML tag <div>. Thus, an HTML parser can extract the formatting of template portion 4200 from content that is arranged in accordance with the template.
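  • One way to sketch this use of an HTML parser is to reduce each document to the ordered sequence of its start tags and to treat two documents as sharing a template when the sequences are equal. The exact-equality test and the names template_signature and same_template are assumptions; an actual implementation could tolerate variation between documents.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TagSequence(HTMLParser):
    """Collects start tags in document order, ignoring text content."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def template_signature(html: str) -> Tuple[str, ...]:
    """Return the ordered start tags of a document as its signature."""
    parser = TagSequence()
    parser.feed(html)
    return tuple(parser.tags)


def same_template(doc_a: str, doc_b: str) -> bool:
    """Assumption: identical tag order means the same template."""
    return template_signature(doc_a) == template_signature(doc_b)
```

  • Because text content is ignored, documents that describe different movies but follow the same arrangement produce the same signature.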
  • the system performing process 4100 can identify one or more new attributes using the template (step 4115). For example, the system can identify the arrangement of attributes drawn from the preexisting structured display within the template. This arrangement can be used to infer other attributes.
  • the system performing process 4100 can also formulate one or more attribute suggestions from the attributes identified using the template (step 4120).
  • the templates themselves can thus be used to formulate the attribute suggestions. For example, HTML tags in template portion 4200 (FIG. 42) identify that the film entitled “Philadelphia (1993)" is characterized by the attributes "Director,” "Writer,” and
  • additional content in a document template can be used in formulating attribute suggestions.
  • the value of an attribute can be used in formulating attribute suggestions. For example, if the value of a "year" attribute is, e.g., 1976, the attribute "start time" can be excluded from a collection of attribute suggestions for characterizing films.
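  • Steps 4115 and 4120 can be sketched under the assumption that a template renders each attribute identifier as a text label ending in a colon (e.g., "Director:"). The content of template portion 4200 is not reproduced here, so this labeling convention and the names LabelCollector and suggest_from_template are hypothetical.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class LabelCollector(HTMLParser):
    """Collects text nodes that look like attribute labels ("Name:")."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: List[str] = []

    def handle_data(self, data):
        text = data.strip()
        if len(text) > 1 and text.endswith(":"):
            self.labels.append(text[:-1].strip())


def suggest_from_template(html: str, existing: Iterable[str]) -> List[str]:
    """Return labels found in the document that are not already attributes
    of the preexisting structured presentation (case-insensitive)."""
    collector = LabelCollector()
    collector.feed(html)
    known = {attr.lower() for attr in existing}
    suggestions: List[str] = []
    for label in collector.labels:
        if label.lower() not in known and label not in suggestions:
            suggestions.append(label)
    return suggestions
```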
  • FIG. 43 is a schematic representation of a process 4300 by which a collection of new attribute suggestions 915 can be formulated based on information in a preexisting structured presentation 106.
  • Process 4300 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 4300 performs an extraction operation 4305 on an instance/attribute collection 4310 based on the information in the preexisting structured presentation 106.
  • Instance/attribute collection 4310 is a collection of information that associates instances with their attributes and, in some implementations, with the values of those attributes as well.
  • the information in collection 4310 can be extracted from documents in an electronic document collection 102 either in response to receipt of a trigger (e.g., a search query) or in anticipation of receipt of a trigger, e.g., as part of a process of indexing electronic document collection 102.
  • information in collection 4310 can include the content of previous structured presentations that were presented to the current user or to other users.
  • the attribute suggestions are provided to a user who selects attribute suggestions to be added to a structured presentation, such as described in steps 3815, 3820, 3825 (FIG. 38).
  • FIG. 44 is a schematic representation of a table 4400 that associates attributes and instances in collection 4310.
  • Table 4400 includes a collection of records 4402, 4404, 4406, 4408, 4410, 4412, 4414, each of which associates an identifier of an instance with descriptions of a document location and the attributes that characterize the identified instances in those documents.
  • the information in records 4402, 4404, 4406, 4408, 4410, 4412, 4414 can be organized in a collection of columns 4415, 4420, 4425, 4430, 4435, 4440.
  • column 4415 can include instance identifiers.
  • Column 4420 can include a description of the location of an electronic document that includes the instance identified in column 4415.
  • Columns 4425, 4430, 4435, 4440 can identify attributes that characterize the instances identified in column 4415 in the document whose location is described in column 4420.
  • different electronic documents can include different categories and amounts of information characterizing the same instance.
  • the document whose location is identified in column 4420 of record 4404 includes two attributes of an instance "INSTANCE_2”
  • the document whose location is identified in column 4420 of record 4410 includes three attributes of an instance “INSTANCE_2.”
  • the attributes in record 4404 i.e., attribute "ATTR_5" and attribute "ATTR_6”
  • Data collections 4310 that associate attributes and instances can be formed in a number of different ways. For example, documents that include internal, structured components can be identified. Examples of such internal, structured components include tables and lists that appear in HTML documents. The relationships between attributes and instances in these internal structured components can be copied to form data collections 4310.
  • collection 4310 can be formed from the content of previous structured presentations that were presented to the current user or to other users.
  • the template of that document can be used to extract attributes and instances from other documents that include the same template. For example, if a stereo retailer uses the same document template to describe different stereos that are offered for sale, the arrangement of information in a first electronic document regarding a first stereo can be used to extract information from other electronic documents that regard other stereos.
  • techniques such as natural language parsing can be used to identify instances and attributes. For example, electronic documents can be parsed to identify phrases such as "[Instance] has a/an [attribute]" in electronic documents.
  • data collection 4310 can categorize instances and their attributes. For example, instances and attributes can be categorized as North American cities, National League East teams, or popular movies. The storage of information in data collection 4310 can be based on such categorizations. For example, different categories can be stored in different files, records, or the like.
  • process 4300 suggests one or more new attributes based on information presented in the preexisting structured presentation 106.
  • the system 200 can suggest additional attributes of movies according to information drawn from data collections 4310. That is, the system 200 can identify and suggest additional attributes based on the attributes being used to characterize the same instances. For example, the system 200 may suggest other attributes that are commonly used to characterize movies, such as show times, theaters, or run times.
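  • The phrase pattern "[Instance] has a/an [attribute]" mentioned above as one way of forming data collection 4310 can be sketched with a regular expression. The pattern below is a simplification that assumes instance names begin with a capital letter, and the names PATTERN, extract_pairs, and build_collection are hypothetical. It is not a substitute for natural language parsing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# "[Instance] has a/an [attribute]", stopping the attribute at " of",
# at punctuation, or at the end of the text.
PATTERN = re.compile(r"([A-Z][\w ]*?) has an? ([\w ]+?)(?= of\b|[.,;]|$)")


def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (instance, attribute) pairs found in the text."""
    return PATTERN.findall(text)


def build_collection(texts: Iterable[str]) -> Dict[str, Set[str]]:
    """Group extracted attributes by instance identifier."""
    collection: Dict[str, Set[str]] = defaultdict(set)
    for text in texts:
        for instance, attribute in extract_pairs(text):
            collection[instance].add(attribute)
    return dict(collection)
```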
  • FIG. 45 is a flow chart of a process 4500 for formulating attribute suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation.
  • Process 4500 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • process 4500 can be performed by the search engine 202 in system 200 (FIG. 2).
  • Process 4500 can be performed alone or in conjunction with other activities.
  • process 4500 can be performed during process 4300 (FIG. 43).
  • process 4500 can be performed at step 3810 in process 3800 (FIG. 38), alone or in conjunction with one or both of processes 3900, 4100 (FIGS. 39, 41).
  • the system performing process 4500 can access a collection of instances and their attributes (step 4505).
  • the system performing process 4500 can access instance and attribute collection 4310 (FIG. 43) stored in data center 208 (FIG. 2).
  • the system performing process 4500 can identify one or more relevant attributes based on characteristics of instance attributes specified in the preexisting structured presentation (step 4510). For example, the system can compare instance attributes of instances stored in the instance and attribute collection 4310 (FIG. 43) with instances specified in the structured presentation 106. The system can use the comparison to determine which, if any, of the stored instances share attribute identifiers, or related attributes, with the attributes specified in the preexisting structured presentation 106. For example, suppose that a preexisting structured presentation 106 uses the attributes "ATTR_3" and "ATTR_5" to characterize a collection of instances. Upon review of an instance and attribute collection 4310 such as table 4400 (FIG.
  • the system can suggest the attributes "ATTR_7” and “ATTR_7” based on their use in conjunction with “ATTR_3" and “ATTR_5" in characterizing instances “INSTANCE_1” and “INSTANCE_2” in records 4402, 4406.
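  • Step 4510 can be sketched as a co-occurrence count over records of an instance and attribute collection. The records below are hypothetical and do not reproduce the content of table 4400; the function name suggest_from_records and the ranking by frequency are likewise assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def suggest_from_records(
    records: Sequence[Dict[str, object]],
    existing: Iterable[str],
    top_n: int = 3,
) -> List[str]:
    """Suggest attributes that co-occur with the existing attributes.

    A record contributes if it shares at least one attribute with the
    preexisting structured presentation. Ties are broken alphabetically
    so that the result is deterministic.
    """
    existing_set = set(existing)
    counts: Counter = Counter()
    for record in records:
        attrs = set(record["attributes"])
        if attrs & existing_set:
            counts.update(attrs - existing_set)
    ranked = sorted(counts, key=lambda attr: (-counts[attr], attr))
    return ranked[:top_n]


# Hypothetical records in the spirit of table 4400.
records = [
    {"instance": "INSTANCE_1", "attributes": ["ATTR_1", "ATTR_3", "ATTR_7"]},
    {"instance": "INSTANCE_2", "attributes": ["ATTR_5", "ATTR_6"]},
    {"instance": "INSTANCE_2", "attributes": ["ATTR_3", "ATTR_5", "ATTR_7"]},
    {"instance": "INSTANCE_3", "attributes": ["ATTR_2", "ATTR_4"]},
]
```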
  • FIG. 46 is a flow chart of a process 4600 for formulating a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106.
  • Process 4600 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 4600 can be performed alone or in conjunction with other activities.
  • process 4600 can be performed during process 4300 (FIG. 43).
  • process 4600 can be performed at step 3810 in process 3800 (FIG. 38), alone or in conjunction with one or more of processes 3900, 4100, 4500 (FIGS. 39, 41, 45).
  • process 4600 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process).
  • process 4600 can be performed by search engine 202 in response to receiving a search string.
  • the system performing process 4600 can identify one or more authoritative sources regarding one or more specified instances (step 4605). For example, the system can access a collection of authoritative sources of documents in electronic document collection 102 that has been assembled, e.g., by a programmer.
  • the system can receive user-specific input identifying one or more authoritative sources of documents in electronic document collection 102 as "authoritative" in the view of that user.
  • a display screen 104 that displays a preexisting structured presentation 106 can include a GUI component that allows a viewer to specify authoritative sources of documents.
  • the identification of an authoritative source can be received in conjunction with a search query.
  • a viewer can identify JD POWER AND ASSOCIATES, AMAZON.COM, and MAJOR LEAGUE BASEBALL as authoritative sources of the documents found at http://www.jdpower.com/, http://www.amazon.com/, and http://www.mlb.com/, respectively.
  • the user-specific input can identify the subject matter on which a source is authoritative.
  • MAJOR LEAGUE BASEBALL may be identified as an authoritative source for baseball statistics, but may not be considered an authoritative source for information regarding drug testing.
  • the system performing process 4600 can analyze a collection of user-specific input identifying authoritative sources from multiple users to assemble a "generic" collection of authoritative sources. For example, a large number of users may identify the AMERICAN AUTOMOBILE ASSOCIATION (AAA) as authoritative. Based on a statistical analysis of these identifications, the AAA can then be added to a collection of authoritative sources.
  • the system performing process 4600 can determine additional attributes from the authoritative sources for instances that are specified in the preexisting structured presentation (step 4610). For example, the system can access documents provided by an authoritative source and identify one or more documents that characterize specified instances using one or more attributes. The system can extract attribute identifiers from these documents using a parser or other string comparison techniques.
  • the system can access a data collection that associates attributes and instances, such as table 4400 (FIG. 44).
  • the system can filter records such as records 4402, 4404, 4406, 4408, 4410, 4412, 4414 based on both the instances identified in the preexisting structured presentation and whether or not the documents whose location is identified in records 4402, 4404, 4406, 4408, 4410, 4412, 4414 originated from an authoritative source. For example, if AMAZON.COM is an authoritative source, a collection that associates attributes and instances can be scanned to identify documents with the http://www.amazon.com/ domain.
  • the system performing process 4600 can compare these additional instances with attributes in an instance and attribute collection such as table 4400 (FIG. 44) (step 4615).
  • the system can use string comparisons, or other comparison techniques, to compare the additional instances with instances stored in the instance and attribute collection.
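  • As a minimal sketch of the filtering described for step 4610, the following Python assumes a hypothetical in-memory list of records loosely modeled on table 4400. The record layout, the document paths, the attribute names, and the function name are illustrative assumptions and are not taken from the described system.

```python
from urllib.parse import urlparse

# Hypothetical records: each associates an instance with a document location
# and the attributes that the document uses to characterize the instance.
RECORDS = [
    {"instance": "Philadelphia", "url": "http://www.amazon.com/guide/philadelphia",
     "attributes": ["population", "area"]},
    {"instance": "Philadelphia", "url": "http://example.org/blog/philly",
     "attributes": ["nickname"]},
    {"instance": "Chicago", "url": "http://www.mlb.com/cubs",
     "attributes": ["team", "stadium"]},
    {"instance": "Boston", "url": "http://www.amazon.com/guide/boston",
     "attributes": ["founded"]},
]


def attributes_from_authoritative_sources(records, instances, authoritative_hosts):
    """Collect attributes for the specified instances, keeping only records
    whose document location is hosted by an authoritative source."""
    wanted = set(instances)
    hosts = {host.lower() for host in authoritative_hosts}
    found = set()
    for record in records:
        host = (urlparse(record["url"]).hostname or "").lower()
        if record["instance"] in wanted and host in hosts:
            found.update(record["attributes"])
    return sorted(found)


suggestions = attributes_from_authoritative_sources(
    RECORDS, ["Philadelphia", "Chicago"], ["www.amazon.com", "www.mlb.com"]
)
```

  • In this sketch the blog record is dropped because its host is not authoritative, and the Boston record is dropped because Boston is not a specified instance.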
  • FIG. 47 is a flow chart of a process 4700 for identifying related instances for use in formulating attribute suggestions based on information in a preexisting structured presentation 106.
  • Process 4700 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 4700 can be performed alone or in conjunction with other activities. For example, process 4700 can be performed during process 1500 (FIG. 15). As another example, process 4700 can be performed at step 3810 in process 3800 (FIG.
  • process 4700 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process).
  • process 4700 can be performed by search engine 202 in response to receiving a search string.
  • the system performing process 4700 can identify one or more related instances based on attributes and/or attribute values that characterize the instances specified in a preexisting structured presentation. For example, the system can identify related instances by comparing attribute values of specified instances with attribute values of other instances to determine if the other instances are related.
  • FIG. 48 is a flow chart of a process 4800 for formulating a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106.
  • Process 4800 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 4800 can be performed alone or in conjunction with other activities. For example, process 4800 can be performed during process 1500 (FIG. 15). As another example, process 4800 can be performed at step 3810 in process 3800 (FIG. 38), alone or in conjunction with one or more of processes 3900, 4100, 4500, 4600, 4700 (FIGS. 39, 41, 45, 46, 47). In some implementations, process 4800 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process). For example, process 4800 can be performed by search engine 202 in response to receiving a search string.
  • the system performing process 4800 can access categorized collections of instances and attributes (step 4805). For example, the system can access the instance and attribute collection 1510 to access one or more categorized collections of instances and attributes generated during previous searches.
  • the system performing process 4800 can identify a category that includes the specified instances (step 4810). In some implementations, the system can identify the category that includes the instances based on similar attributes, similar attribute values, combinations of these characteristics, and/or other techniques.
  • the system performing process 4800 can select one or more attribute suggestions from the identified category (step 4815). For example, in some implementations, attribute suggestions can be selected from the identified category based on the number of times that the attributes are used to characterize the instances in the category.
  • FIG. 49 is a representation 4900 of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
  • representation 4900 illustrates a formulation of attribute suggestions using one implementation of process 4800 (FIG. 48).
  • a preexisting structured presentation specifies a collection of instances 4005 (i.e., the instances "Philadelphia" and "Chicago").
  • instances drawn from different documents in an electronic document collection e.g., collection 102
  • Categorized instance collections 4910, 4915, 4920 can be identified as relevant to specified instances 4005 based on, e.g., the same instance identifiers "Philadelphia” and “Chicago” appearing therein.
  • categorized instance collections 4910, 4915, 4920 have been categorized in a variety of different ways.
  • categorized instance collection 4910 has been categorized as a collection of "North American Cities.”
  • Categorized instance collection 4915 has been categorized as a collection of "National League East Teams.”
  • Categorized instance collection 4920 has been categorized as a collection of "Popular Movies.”
  • categorized instance collections 4910, 4915, 4920 can be stored in the data center 208 (FIG. 2). That is, the system 200 can generate one or more categories of instances based on previously received search strings.
  • search engine 202 can categorize the results and store them in data center 208. These categorized results can be accessed and analyzed during subsequent searches to generate attribute suggestions.
  • a categorized instance collection that includes the instances specified in a preexisting structured presentation can be identified, e.g., based on a similarity between the attributes that characterize the specified instances and the attributes that characterize the instances in the different categories. For example, the common use of the attributes "year" and "rating" in the preexisting structured presentation and in categorized instance collection 4920 can be used to identify that categorized instance collection 4920 includes instances 4005.
  • a subset of the attributes in a categorized instance collection can be selected as attribute suggestions based on the attributes that characterize the instances in a category. For example, the use of the attribute "Start time" to characterize movie instances can be taken as an indication that only information about currently playing movies is to be included in a structured display. Thus, attributes such as “playing at” and “coupons available” can be included in a list of attribute suggestions. As another example, the attribute "year” can be excluded from a list of attribute suggestions based on the use of the attribute "Start time” to characterize movie instances in a preexisting structured display.
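  • The category identification and frequency-based selection described for process 4800 and representation 4900 can be sketched as follows. This is an illustrative sketch only: the category contents, the attribute names other than "year" and "rating", and both function names are assumptions, and the overlap count is just one possible similarity measure.

```python
from collections import Counter

# Hypothetical categorized collections: category -> {instance: attributes}.
CATEGORIES = {
    "North American Cities": {
        "Philadelphia": ["population", "mayor", "area"],
        "Chicago": ["population", "mayor"],
        "Toronto": ["population", "area"],
    },
    "Popular Movies": {
        "Philadelphia": ["year", "rating", "director"],
        "Chicago": ["year", "rating"],
        "Casablanca": ["year", "director"],
    },
}


def identify_category(categories, instances, known_attributes):
    """Pick the category that contains every specified instance and shares
    the most attributes with the preexisting structured presentation."""
    known = set(known_attributes)
    best, best_overlap = None, -1
    for name, members in categories.items():
        if not all(instance in members for instance in instances):
            continue
        used = set()
        for instance in instances:
            used.update(members[instance])
        overlap = len(used & known)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


def select_suggestions(categories, category, known_attributes, limit=3):
    """Rank the attributes of a category by how often they characterize its
    instances, skipping attributes that are already in the presentation."""
    counts = Counter(a for attrs in categories[category].values() for a in attrs)
    known = set(known_attributes)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [attr for attr, _ in ranked if attr not in known][:limit]


category = identify_category(CATEGORIES, ["Philadelphia", "Chicago"], ["year", "rating"])
suggested = select_suggestions(CATEGORIES, category, ["year", "rating"])
```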
  • FIG. 50 is a schematic representation of a collection 5000 of processes that can be used to formulate a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106.
  • the processes in collection 5000 can be thought of as filters that are applied in succession to a large collection of potential attributes 5005 to yield a smaller collection 5010 of attribute suggestions.
  • Each filter can exclude potential attributes 5005 from an attribute suggestion collection 5010 that can be presented to a user or added directly to a structured presentation.
  • Filters 5015, 5020, 5025 can be applied in any order. However, in general, filters 5015, 5020, 5025 are applied in order of granularity. In particular, the one of filters 5015, 5020, 5025 that reduces the number of potential attributes by the greatest amount is applied first and the one that reduces the number of potential attributes by the smallest amount is applied last.
  • any of filters 5015, 5020, 5025 can be omitted from collection 5000 and/or additional filters added to collection 5000.
  • a user-specified filter that can filter the potential attributes 5005 according to input provided by the user can be added to collection 5000.
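  • The successive filtering of collection 5000 can be sketched as a chain of predicates. The example below is a sketch under stated assumptions: the attribute records, the two predicate factories, and the function names are hypothetical, and the ordering is measured by applying each filter to the full collection, whereas a practical system would more likely estimate it.

```python
# Hypothetical potential attributes, each listing the instances it characterizes.
POTENTIAL = [
    {"name": "population", "instances": {"Philadelphia", "Chicago"}},
    {"name": "mayor", "instances": {"Philadelphia", "Chicago"}},
    {"name": "director", "instances": {"Casablanca"}},
    {"name": "area", "instances": {"Chicago"}},
]


def characterizes_all(instances):
    """Filter factory: keep attributes that characterize every given instance."""
    required = set(instances)
    return lambda attribute: required <= attribute["instances"]


def not_already_shown(shown_names):
    """Filter factory: keep attributes not yet in the structured presentation."""
    shown = set(shown_names)
    return lambda attribute: attribute["name"] not in shown


def apply_filters(potential_attributes, filters):
    """Apply the filters in succession, starting with the filter that keeps
    the fewest candidates (i.e., excludes the greatest number)."""
    candidates = list(potential_attributes)
    ordered = sorted(filters, key=lambda keep: sum(1 for a in candidates if keep(a)))
    for keep in ordered:
        candidates = [a for a in candidates if keep(a)]
    return candidates


remaining = apply_filters(
    POTENTIAL,
    [not_already_shown({"population"}), characterizes_all(["Philadelphia", "Chicago"])],
)
```

  • Because every filter only excludes candidates, the final collection is the same for any ordering; the ordering affects only how much work the later filters perform.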
  • FIG. 51 is a flow chart of a process 5100 for formulating a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106.
  • Process 5100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
  • Process 5100 can be performed alone or in conjunction with other activities.
  • process 5100 can be performed iteratively in conjunction with one or more of the processes in process collection 5000.
  • process 5100 can be performed in response to receipt of a search string.
  • the system performing process 5100 can make an initial match between the instances specified in a preexisting structured presentation 106 and attributes drawn from a document collection (step 5105).
  • the initial match can be based on one or more of the filtering processes in process collection 5000.
  • the system performing process 5100 can determine whether the number of matches is too high, too low, or appropriate (step 5110). If the number of matches is too low, the system can broaden names of specified instances (step 5115). For example, the system performing process 5100 can use alternate spellings, abbreviations, synonyms, alternative names, nicknames, and/or other keywords for the specified instances in one or more of the processes in process collection 5000.
  • the system performing process 5100 can also reduce the number of attributes and/or instances used in any related instance and/or attribute filtering 5020 (FIG. 50) (step 5120).
  • the number of attributes and/or instances can be reduced based on, e.g., the number of potential instances excluded by a particular attribute and/or instance. For example, if the requirement that a specific instance be characterized by an attribute excludes all of the potential attributes, then this instance can be omitted from any related instance and/or attribute filtering.
  • the attributes and/or instances to be removed can be determined, e.g., automatically, without user input, or based on input received from a user.
  • the system can again seek to make a match between the instances specified in a preexisting structured presentation 106 and instances drawn from a document collection, but this time using the changed parameters (step 5122). This match can also be made using one or more of the filtering processes in process collection 5000.
  • the system performing process 5100 can increase the number of attributes and/or instances used in any related attribute and/or instance filtering 5020 (FIG. 50) (step 5128).
  • the number of attributes and/or instances can be increased based on, e.g., the number of potential attributes excluded by a particular attribute and/or instance.
  • the attributes and/or instances to be added can be determined, e.g., automatically, without user input, or based on input received from a user. For example, instances to be added can be determined using process 4800 (FIG. 48).
  • the system performing process 5100 can winnow the matches based on the changed parameters (step 5130).
  • the narrowed ranges and/or increased numbers of instances can be used in any related value filtering 5025 (FIG. 50).
  • the system performing process 5100 can suggest the matched attributes to a user (step 5135). For example, the system performing process 5100 can present one or more attribute suggestions in a GUI on a display screen, e.g., display screen 104.
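  • The adjust-and-rematch flow of process 5100 can be illustrated with a simplified sketch. It performs a single adjustment rather than an open-ended iteration, and the thresholds `low` and `high`, the alias table, the observation pairs, and the function names are all assumptions made for illustration.

```python
def match_attributes(observations, name_sets, min_instances):
    """Return the attributes that characterize at least `min_instances` of the
    specified instances. `observations` is a list of (instance_name, attribute)
    pairs; `name_sets` holds one set of acceptable names per specified instance."""
    support = {}
    for name, attribute in observations:
        for index, names in enumerate(name_sets):
            if name in names:
                support.setdefault(attribute, set()).add(index)
    return sorted(a for a, hits in support.items() if len(hits) >= min_instances)


def suggest_with_adjustment(observations, instances, aliases, low=2, high=3):
    """Make an initial match, then broaden or winnow once if the number of
    matches falls outside the range [low, high]."""
    name_sets = [{name} for name in instances]
    matches = match_attributes(observations, name_sets, 1)
    if len(matches) < low:
        # Too few matches: broaden the instance names with known aliases.
        name_sets = [{name, *aliases.get(name, ())} for name in instances]
        matches = match_attributes(observations, name_sets, 1)
    elif len(matches) > high:
        # Too many matches: require the attribute to characterize every instance.
        matches = match_attributes(observations, name_sets, len(instances))
    return matches


sparse = [("Philly", "mayor"), ("Philadelphia", "population"), ("Chicago", "population")]
dense = [("Philadelphia", a) for a in ("area", "mayor", "population", "zoo")]
dense += [("Chicago", "mayor"), ("Chicago", "population")]

broadened = suggest_with_adjustment(sparse, ["Philadelphia", "Chicago"], {"Philadelphia": ["Philly"]})
winnowed = suggest_with_adjustment(dense, ["Philadelphia", "Chicago"], {})
```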
  • FIG. 52 is a schematic representation of a system 5200 in which attribute values 307 drawn from two or more electronic documents in an electronic document collection are presented to a user in a structured presentation.
  • system 5200 includes structured data 5205 and a merge module 5210. In operation, system 5200 extracts attribute values from an unstructured collection of electronic documents in electronic document collection 102 and merges that information with information drawn from structured data 5205 to populate structured presentation 106.
  • System 5200 can populate all or only a fraction of structured presentation 106 with attribute values. There are many circumstances in which only a fraction of a structured presentation may be populated with attribute values. For example, the population may be part of the addition of new instances (and hence new structured records) to structured presentation 106. As another example, the population may be part of the addition of new attributes to structured presentation 106. As yet another example, the population may be part of the refinement of a fraction of the existing attribute values in structured presentation 106. For example, some fraction of the original attribute values can be checked for accuracy or to ensure that the characterized instances haven't changed.
  • Structured data 5205 is a structured collection of information. The information in structured data 5205 can be organized in accordance with a defined data model.
  • structured data 5205 can be organized in accordance with a hierarchical or a relational data model and stored in a data storage device. In some instances, all or a portion of structured data 5205 can be presented to a user in a structured presentation.
  • structured data 5205 can be a pre-existing structured presentation 106 that is presented to a user on the same display screen 104 on which the structured presentation 106 that is populated with new attribute values drawn from collection 102 is to be presented.
  • Merge module 5210 is a collection of one or more sets of machine-readable instructions deployed on one or more data processing devices. Merge module 5210 can include functionality for identifying new attribute values as well as their disposition within the structured presentation 106 that is to be populated therewith.
  • FIG. 53 is a schematic representation of an implementation of system 5300 in which attribute values drawn from two or more electronic documents in electronic document collection 102 are presented to a user in a structured presentation.
  • system 5300 also includes an attribute/value/instance collection 5305 at data center 208.
  • Attribute/value/instance collection 5305 is a collection of information that associates instances with their attributes, as well as the values of those attributes.
  • the information in collection 5305 can be extracted from electronic documents in collection 102 either in response to receipt of a trigger (e.g., a search query) or in anticipation of receipt of a trigger, e.g., as part of a process of indexing electronic document collection 102.
  • FIG. 54 is a schematic representation of a table 5400 that can associate attributes, values, and instances in collection 5305 (FIG. 53).
  • Table 5400 includes a collection of records 5402, 5404, 5406, 5408, 5410, each of which associates an identifier of an instance with descriptions of a document location and the attributes and values that characterize the identified instances.
  • the information in records 5402, 5404, 5406, 5408, 5410 can be organized in a collection of columns 5415, 5420, 5425, 5430, 5435, 5440, 5445, 5450.
  • column 5415 can include instance identifiers.
  • Column 5420 can include a description of the location of an electronic document that includes the instance identified in column 5415.
  • Columns 5425, 5435, 5445 can include descriptions of attributes that both characterize the instances identified in column 5415 and that are themselves characterized by a value in the document whose location is described in column 5420.
  • Columns 5430, 5440, 5450 can include descriptions of the values that characterize the attributes described in columns 5425, 5435, 5445 of the instances identified in column 5415 in the documents whose location is described in column 5420.
  • different electronic documents can include different categories and amounts of information characterizing the same instance.
  • the document whose location is identified in column 5420 of record 5404 includes information characterizing three attributes of an instance "INSTANCE_1,” whereas the document whose location is identified in column 5420 of record 5406 includes information characterizing two attributes of an instance "INSTANCE_1.”
  • the attributes characterized in record 5404 (i.e., attribute "ATTR_5," attribute "ATTR_6," attribute "ATTR_7") differ from the attributes characterized in record 5406 (i.e., attribute "ATTR_3," attribute "ATTR_4").
  • the values used to characterize even the same attribute of the same entity can differ in different electronic documents.
  • the document whose location is identified in column 5420 of record 5402 includes a value "VALUE_3A” characterizing the attribute "ATTR_3" of instance "INSTANCE_1,”
  • the document whose location is identified in column 5420 of record 5406 includes a value "VALUE_3B” characterizing the same attribute "ATTR_3" of the same instance "INSTANCE_1.”
  • a document can include false information that mischaracterizes the attributes of an entity.
  • the values of an attribute may change over time.
  • Examples of this include, e.g., the value of the "height" attribute of a high school basketball player instance, the value of a "list price” attribute of a house instance, or the value of the "mayor" attribute of a city instance.
  • Some documents may be updated with the correct value whereas other documents may retain the original — but now incorrect — value.
  • even completely accurate documents can characterize the same attribute of the same entity in different ways. For example, different documents can use different units to express the same value. As another example, different documents can express the same value with different precision (e.g., "about a two hour drive to Phoenix" versus "a 130 minute drive to Phoenix at the posted speed limits").
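  • One way to picture the associations in table 5400, and the conflicting values that can result, is the following sketch. The dictionary layout, the example.com locations, and the placeholder values other than "VALUE_3A" and "VALUE_3B" are assumptions introduced for illustration.

```python
# Hypothetical in-memory analogue of records 5402, 5404, and 5406: each record
# associates an instance identifier with a document location and with the
# attribute/value pairs found in that document.
RECORDS = [
    {"record": 5402, "instance": "INSTANCE_1",
     "location": "http://example.com/doc-a",
     "pairs": {"ATTR_3": "VALUE_3A"}},
    {"record": 5404, "instance": "INSTANCE_1",
     "location": "http://example.com/doc-b",
     "pairs": {"ATTR_5": "VALUE_5", "ATTR_6": "VALUE_6", "ATTR_7": "VALUE_7"}},
    {"record": 5406, "instance": "INSTANCE_1",
     "location": "http://example.com/doc-c",
     "pairs": {"ATTR_3": "VALUE_3B", "ATTR_4": "VALUE_4"}},
]


def values_for(records, instance, attribute):
    """List every (value, document location) found for one attribute of one
    instance, so that differing values from different documents are visible."""
    return [
        (record["pairs"][attribute], record["location"])
        for record in records
        if record["instance"] == instance and attribute in record["pairs"]
    ]


conflicting = values_for(RECORDS, "INSTANCE_1", "ATTR_3")
```

  • Here the lookup for "ATTR_3" returns both "VALUE_3A" and "VALUE_3B", which is the situation that the value selection processes described below are meant to resolve.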
  • Data collections 5305 that associate attributes, values, and instances can be formed in a number of different ways. For example, documents that include internal, structured components can be identified. Examples of such internal, structured components include tables and lists that appear in HTML documents, and the like. The relationships between attributes, values, and instances in these internal structured components can be copied to form data collections 5305.
  • the template of that document can be used to extract attributes, values, and instances from other documents that include the same template. For example, if a stereo retailer uses the same document template to describe different stereos that are offered for sale, the arrangement of information in a first electronic document regarding a first stereo can be used to extract information from other electronic documents that regard other stereos.
  • the template of a single document can be used to extract attributes, values, and/or instances from that document.
  • the template can specify an arrangement of several attribute and values that characterize those attributes relative to an identifier of an instance. If some of those attributes and/or values are known, then the arrangement of those known attributes and/or values can be identified and used to identify other attributes and/or values.
  • the template of a single webpage may specify the arrangement of the attribute/value pairs "Director: Orson Welles,” “Writer: Orson Welles, Herman J. Mankiewicz,” and “Release Date: May 1, 1941” relative to an identifier of the movie instance "Citizen Kane.” If the attributes and values "Director: Orson Welles” and “Release Date: May 1, 1941” were already known, the arrangement of those attributes and values relative to the movie instance identifier "Citizen Kane" can be used to extrapolate the attribute/value pair "Writer: Orson Welles, Herman J. Mankiewicz.”
  • techniques such as natural language parsing can be used to identify instances, attributes, and their values.
  • electronic documents can be parsed to identify phrases such as "[Instance] has a/an [attribute] of [value]" in electronic documents.
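  • A very rough stand-in for such phrase identification is a regular expression, shown below as a sketch. A regular expression is far simpler than natural language parsing and is used here only to illustrate the phrase template; the pattern, the sample sentences, and the function name are assumptions.

```python
import re

# Matches phrases of the form "[Instance] has a/an [attribute] of [value]".
# The instance is assumed to start with a capital letter, and the value is
# assumed to end at a period or semicolon followed by whitespace or end of text.
PHRASE = re.compile(
    r"(?P<instance>[A-Z][\w .'-]*?) has an? (?P<attribute>[\w ]+?) of "
    r"(?P<value>.+?)(?=[.;](?:\s|$))"
)


def extract_triples(text):
    """Return (instance, attribute, value) triples found in the text."""
    return [
        (m["instance"].strip(), m["attribute"].strip(), m["value"].strip())
        for m in PHRASE.finditer(text)
    ]


triples = extract_triples(
    "Chicago has a population of 2.7 million. "
    "Philadelphia has an area of 142 square miles."
)
```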
  • FIG. 55 is a flow chart of an example process 5500 for presenting attribute values drawn from two or more electronic documents in an electronic document collection to a user in a structured presentation.
  • Process 5500 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • Process 5500 can be performed in isolation or in conjunction with other data processing activities.
  • process 5500 can be performed as part of process 600 (FIG. 6).
  • the system performing process 5500 can receive an instance identifier and an attribute identifier (step 5505).
  • the system performing process 5500 can receive the instance identifier and the attribute identifier directly from a user (e.g., in the form of a search query) or indirectly (e.g., as part of a structured data collection 905 (FIG. 9)).
  • the system performing process 5500 can identify electronic documents relevant to the received instance that include values of the attribute (step 5510).
  • the system can access an attribute/value/instance collection 5305 in a data center 208 (FIG. 53) to identify the relevant electronic documents.
  • a search engine (e.g., search engine 202) can perform keyword searches using the instance and attribute identifier to identify relevant documents. In some cases, such keyword searches can be supplemented with language parsing or other techniques that facilitate the identification of values.
  • the system performing process 5500 can establish a subset of the values for the identified attribute of the identified instance for presentation in a structured presentation (step 5515). The subset of the values can include one or more values that are thought to be appropriate, or likely to be appropriate, for populating the structured presentation.
  • the subset of the value(s) can be considered to characterize the identified attribute of the identified instance both accurately and consistently with the desires of the viewer of the structured presentation.
  • the desires of the viewer of the structured presentation can be ascertained, e.g., based on a selection of a value received from the viewer or based on the characterization of the same or other attributes of the same or other instances in a preexisting structured collection of information such as, e.g., structured data 905 (FIG. 9).
  • the system performing process 5500 can provide instructions for displaying a structured presentation populated by the subset of values (step 5520).
  • a structured presentation can be presented based on information gathered from a collection of electronic documents (i.e., the subset of values gathered from an electronic document collection) (step 615, FIG. 6).
  • process 5500 can be performed a number of times, e.g., for a number of instance identifiers and/or attribute identifiers.
  • FIG. 13 is a flow chart of a process 1300 for establishing one or more values for presentation in a structured presentation.
  • Process 1300 can be performed in isolation or in conjunction with other activities. For example, process 1300 can be performed at step 5515 in process 5500.
  • the system performing process 1300 can group values of an attribute from two or more documents in an electronic document collection into two or more groups (step 1305).
  • the grouped values can be drawn directly from the electronic documents or drawn from a description of the content of the electronic documents, such as an association of attributes, values, and instances like table 5400 (FIG. 54).
  • the system can group values using one or more different standards for determining when values are to be grouped. For example, one standard can require that grouped values be identical. For example, two values "4" can be grouped. Another standard can require that numerical values be within a certain range of being identical. For example, the values "3.14" and "3.14159" can be grouped. Another standard can require that term values be identical or have an identical meaning. For example, the terms "czar," "tzar," and "tsar" can all be grouped. Another standard can require that term values express the same concept in an ontology of concepts. For example, the terms "pink" and "mauve" can be grouped. Another standard allows values written in different formats to be grouped.
  • the dates "July 25, 1982” and “7/25/1982” can be grouped.
  • Another standard allows values written in different units to be grouped.
  • the units of measure "1m" and "100cm" can be grouped.
  • Another standard allows values written in different, but approximately equal, units to be grouped.
  • the units of measure "1m" and "39 inches" can be grouped.
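  • Several of these grouping standards can be sketched by mapping each value to a canonical key and grouping values that share a key. The following is a minimal sketch, not the described implementation: the two date formats, the unit table, the synonym table, the rounding to the nearest 0.1 meter, and the function names are all assumptions. Parsing month names with "%B" also depends on the active locale.

```python
import re
from collections import defaultdict
from datetime import datetime

DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y")  # e.g., "July 25, 1982" and "7/25/1982"
UNIT_TO_METERS = {"m": 1.0, "cm": 0.01, "inches": 0.0254}
SYNONYMS = {"tsar": "czar", "tzar": "czar"}  # hypothetical synonym table
QUANTITY = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def canonical_key(value):
    """Map a raw value to a key shared by all values that should be grouped."""
    for fmt in DATE_FORMATS:
        try:
            return ("date", datetime.strptime(value.strip(), fmt).date().isoformat())
        except ValueError:
            continue
    match = QUANTITY.fullmatch(value)
    if match and match.group(2) in UNIT_TO_METERS:
        meters = float(match.group(1)) * UNIT_TO_METERS[match.group(2)]
        # Rounding groups approximately equal lengths, e.g., 39 inches and 1 m.
        return ("length_m", round(meters, 1))
    term = value.strip().lower()
    return ("term", SYNONYMS.get(term, term))


def group_values(values):
    """Group raw values by canonical key, preserving input order in each group."""
    groups = defaultdict(list)
    for value in values:
        groups[canonical_key(value)].append(value)
    return dict(groups)


groups = group_values(
    ["1m", "100cm", "39 inches", "July 25, 1982", "7/25/1982", "czar", "tsar"]
)
```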
  • the system performing process 1300 can perform one or more of the following subprocesses in any order to select one group, and hence select a subset of the values from a collection of attribute values.
  • the system performing process 1300 can select the group with the highest "value” for presentation in a structured presentation (step 1310).
  • the "value" of a group reflects the count of values in that group.
  • the system performing process 5500 can select values with high frequencies in the electronic document collection. In effect, this approach allows the documents in an electronic document collection to "vote" on the values of an attribute of an instance.
  • the "value" of a group not only reflects the count of values but also weights or scores individual counts by parameters that reflect a characteristic of the document from which the values are drawn. For example, a count can be weighted based on, e.g., a page rank of the document from which the values are drawn, a weighting factor for that document provided by a user, the number of other values that have been drawn from that document, or the "age" of the document. For example, documents that have been created more recently can be considered to more accurately characterize the attributes of certain instances.
  • This subprocess is effective at eliminating inadvertent mischaracterizations of attributes, e.g., when the value on one electronic document is a typographic error.
  • this approach can under certain circumstances select inappropriate values. For example, even though a large number of documents characterizes a volume attribute in liters, the viewer may be interested in having that attribute characterized in gallons in a structured presentation.
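  • The "voting" described for step 1310, with and without document-based weighting, can be sketched as follows. The observation pairs, the `rank` field, and the function name are hypothetical; the weighting function stands in for whichever document characteristic (e.g., page rank or age) is chosen.

```python
from collections import defaultdict


def select_group(observations, weight=lambda document: 1.0):
    """Return the group key with the highest total weight.

    `observations` is a list of (group_key, document) pairs, one per value
    drawn from a document. With the default weight every document casts one
    vote, so the total is simply the count of values in the group."""
    totals = defaultdict(float)
    for key, document in observations:
        totals[key] += weight(document)
    return max(totals, key=totals.get)


observations = [
    ("6 ft 6 in", {"rank": 0.2}),
    ("6 ft 6 in", {"rank": 0.3}),
    ("6 ft 9 in", {"rank": 0.9}),
]

by_count = select_group(observations)
by_rank = select_group(observations, weight=lambda document: document["rank"])
```

  • In this sketch the unweighted vote favors the more frequent value, whereas weighting by the hypothetical `rank` favors the value from the single higher-ranked document.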
  • the system performing process 1300 can receive a user specification of a constraint on, e.g., a range of an acceptable value or a unit of an acceptable value (step 1315).
  • the system can provide a GUI component at a display screen, e.g., display 104 (FIG. 1) that allows the user to select a range of values or a unit of measurement constraint.
  • the constraint can be open-ended (e.g., "a value >1") or closed (e.g., "a value between 1 and 10").
  • the system performing process 1300 can select the group meeting the received constraint for presentation in a structured presentation (step
  • the system performing process 1300 can select one or more groups of values that are expressed in meters.
  • the approach of this subprocess is effective at ensuring that the values presented in a structured presentation are presented in an organized, systematic arrangement.
  • the units of measure of the value used to characterize e.g., Michael Jordan's height can be constrained to be identical to the units of measure of the value used to characterize Magic Johnson's height.
  • Such an organized, systematic arrangement allows a user to compare values of the same attribute of different instances easily, without concern as to units in which the values are presented.
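  • Selecting groups against a user-specified constraint can be sketched as a simple predicate over group descriptions. The group fields, the inclusive treatment of the bounds, and the function name are assumptions made for this sketch.

```python
def select_groups(groups, unit=None, minimum=None, maximum=None):
    """Keep the groups that satisfy every constraint that was supplied.

    Leaving `minimum` or `maximum` as None gives an open-ended range;
    supplying both gives a closed range. Bounds are treated as inclusive."""
    selected = []
    for group in groups:
        if unit is not None and group["unit"] != unit:
            continue
        if minimum is not None and group["value"] < minimum:
            continue
        if maximum is not None and group["value"] > maximum:
            continue
        selected.append(group)
    return selected


height_groups = [
    {"value": 1.98, "unit": "m"},
    {"value": 6.5, "unit": "ft"},
    {"value": 198, "unit": "cm"},
]

in_meters = select_groups(height_groups, unit="m")
in_range = select_groups(height_groups, minimum=5, maximum=10)
```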
  • the system performing process 1300 can determine a "quality" of the documents from which the attribute values in each group were drawn (step 1325). The "quality" of a document can reflect the likelihood that the information in the document is accurate and does not mischaracterize a value of an attribute.
  • the "quality" of information provided by a commercial supplier can be considered higher than the "quality” of information provided by an individual.
  • bias can be considered in determining the quality of the documents from which the information is drawn. For example, information drawn from an allegedly independent source (such as, e.g., the Congressional Budget Office) can be considered to be higher quality than information drawn from a political party.
  • the quality of a document can be based on a specification of the quality of a source of the document, or the document itself, by a user. For example, a user can indicate that automobile reliability information drawn from Consumers Union (the publisher of Consumer Reports) is high quality but that automobile reliability information drawn from Road and Track magazine is not.
  • the system performing process 1300 can also select the group that includes values drawn from the highest quality document(s) (step 1330).
  • process 1300 can provide one or more values from a remaining group that are free from mischaracterizations, with consistent units of measurement, and drawn from sources that the viewer prefers.
  • FIG. 57 is a flow chart of a process 5700 for selecting one or more values for presentation in a structured presentation.
  • Process 5700 can be performed in isolation or in conjunction with other activities.
  • process 5700 can be performed at step 5515 in process 5500 (FIG. 55), alone or in conjunction with one or more of the subprocesses of process 5600 (FIG. 56).
  • the system performing process 5700 can group values of an attribute extracted from two or more documents in an electronic document collection into two or more groups (step 5605).
  • the system performing process 5700 can present descriptions of the groups of values to the user (step 5705).
  • the system can display the most common value in each group, or a list of some of the values in each group, to the user at a display, e.g., at display screen 104 (FIG. 1).
  • the descriptions of the groups of values can include additional information that characterizes the groups. For example, a count of the values in each group can be displayed, a percentage that reflects the percent of the extracted values that are found in each group can be displayed, and/or a description of the units of measure in the group can be displayed. As another example, an estimate of the quality of the electronic documents from which the values in each group were extracted can be displayed. As yet another example, the identity, location, and/or snippets or other excerpts of documents from which the values in each group were extracted can be displayed.
  • the descriptions of the groups of values are sorted in a confidence-based ordering. That is, the descriptions of the groups of values are ordered according to how confident the system performing process 5700 is as to the accuracy of the value(s) in each group.
  • the confidence in the accuracy of the value(s) in each group can be determined based on, e.g., the number of values in each group, the quality of the documents from which the values were extracted, and the like.
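  • A confidence-based ordering of group descriptions can be sketched as follows. The confidence formula (share of extracted values multiplied by a document quality estimate between 0 and 1) is an assumption chosen for illustration, as are the field names and the sample data.

```python
from collections import Counter


def describe_groups(groups):
    """Build one description per group and sort by descending confidence."""
    total = sum(len(group["values"]) for group in groups)
    descriptions = []
    for group in groups:
        count = len(group["values"])
        representative = Counter(group["values"]).most_common(1)[0][0]
        descriptions.append({
            "representative": representative,  # most common value in the group
            "count": count,
            "percent": 100 * count / total,
            # Assumed formula: share of values weighted by document quality.
            "confidence": (count / total) * group["quality"],
        })
    return sorted(descriptions, key=lambda d: d["confidence"], reverse=True)


descriptions = describe_groups([
    {"values": ["6 ft 6 in"], "quality": 0.9},
    {"values": ["1.98 m", "1.98 m", "1.99 m"], "quality": 0.5},
])
```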
  • the system performing process 5700 can receive user selection of a desired group of values (step 5710).
  • the system can receive user interaction that identifies a selection of a desired value group.
  • the system performing process 5700 can also change other aspects of the structured presentation based on the user selection (step 5715). For example, if a user selects a group of values with a unit of measurement in meters, and there are other values that characterize the same attribute of other instances but that are presented with units of measurement in feet, such values can be converted in the structured presentation 106 to be presented in meters.
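  • The unit harmonization described for step 5715 can be sketched as follows. The `(magnitude, unit)` tuple representation and the unit labels `"m"` and `"ft"` are assumptions made for illustration; only the feet/meters pair named in the text is handled.

```python
FEET_PER_METER = 3.28084  # standard conversion factor


def convert_column(values, target_unit):
    """Convert (magnitude, unit) pairs in a column to target_unit where possible."""
    converted = []
    for magnitude, unit in values:
        if unit == target_unit:
            converted.append((magnitude, unit))
        elif unit == "ft" and target_unit == "m":
            converted.append((round(magnitude / FEET_PER_METER, 2), "m"))
        elif unit == "m" and target_unit == "ft":
            converted.append((round(magnitude * FEET_PER_METER, 2), "ft"))
        else:
            # Unknown unit pairs are left untouched rather than guessed.
            converted.append((magnitude, unit))
    return converted
```

  • If a user selects a group expressed in meters, calling `convert_column(column, "m")` on the other values of the same attribute would present them in meters as well.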
  • FIG. 58 is a flow chart of an example process 5800 for selecting one or more values for presentation in a structured presentation.
  • Process 5800 can be performed in isolation or in conjunction with other activities.
  • process 5800 can be performed at step 5515 in process 5500 (FIG. 55), alone or in conjunction with one or more of the subprocesses of process 5600 (FIG. 56) and/or process 5700 (FIG. 57).
  • the system performing process 5800 can identify electronic documents in the electronic document collection that are relevant to the instances and other attributes in a structured data collection, e.g., structured data collection 905 (step 5805).
  • structured data collection 905 can be a version of a structured presentation 106.
  • Documents that are relevant to the instances and other attributes in a structured data collection can be identified in a variety of ways.
  • the system performing process 5800 can access a data collection that associates instances, their attributes, and values characterizing those attributes, e.g., attribute/value/instance collection 5305 (FIG. 53).
  • Documents that include information relevant to the instances and other attributes in a structured data collection can be identified therein, e.g., by comparing the identifiers of the instances and the attributes in both data collections.
  • the system performing process 5800 can use the identifiers of the instances and the attributes as search terms in one or more search queries. Such search queries, alone or in conjunction with other extraction techniques such as language parsing and string comparisons, can be used to identify relevant documents in an electronic document collection.
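  • Both ways of identifying relevant documents in step 5805 can be sketched as follows. The row layout (dictionaries with `instance`, `attribute`, `value`, and `doc` keys), the case-insensitive comparison, and the quoted-phrase query syntax are assumptions, not details given in the text.

```python
def relevant_documents(instances, attributes, avi_rows):
    """Compare identifiers in a structured data collection with rows of an
    attribute/value/instance collection and return the matching documents."""
    wanted_instances = {name.lower() for name in instances}
    wanted_attributes = {name.lower() for name in attributes}
    docs = []
    for row in avi_rows:
        if (row["instance"].lower() in wanted_instances
                and row["attribute"].lower() in wanted_attributes
                and row["doc"] not in docs):
            docs.append(row["doc"])
    return docs


def build_queries(instances, attributes):
    """Use instance and attribute identifiers as search terms (assumed syntax)."""
    return [f'"{i}" "{a}"' for i in instances for a in attributes]
```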
  • the system performing process 5800 can also select one or more values for presentation in a structured presentation from the identified documents (step 5810).
  • FIG. 59 is a schematic representation of a circumstance in which attribute values drawn from electronic documents in electronic document collection 102 are presented to a user in a structured presentation 106.
  • a system such as system 900 (FIG. 9) draws attribute values from a table 5400 that associates attributes, values, and instances drawn from electronic documents in electronic document collection 102.
  • the system also merges those attribute values with an initial structured presentation 106 to form a final structured presentation 106.
  • the initial structured presentation 106 is thus acting as structured data 905 (FIG. 9).
  • the initial structured presentation has been modified to associate values of a new attribute (i.e., the attribute "AIRPORT") with instances identified in the structured presentation.
  • a new column 5905 has been added to the initial structured presentation.
  • Column 5905 is headed by an attribute identifier 5910 that identifies the new attribute using the term "AIRPORT.”
  • the addition of values of a new attribute to the structured presentation can be triggered, e.g., based on interaction with a user or automatically, as discussed further in FIGS. 9-20 and the associated text.
  • a system such as search engine 202 can access a data collection that associates attributes, values, and instances drawn from electronic documents in electronic document collection 102 (such as table 5400). Using such a data collection, the system can select one or more values that characterize the new attribute of one or more of the instances in the initial structured presentation. For example, in the illustrated circumstance, value 5915 (i.e., the value "value_ai") characterizes the attribute "AIRPORT" of the instance "NEW YORK" in the document "DOC_3.” If necessary, the system can select one or more values of the new attribute for display, e.g., using one or more of processes 5600, 5700, 5800 (FIGS. 56, 57, 58).
  • a final structured presentation 106 can be presented to a viewer.
  • the final structured presentation 106 can include the selected values that characterize the new attribute of one or more of the instances in the structured presentation. For example, as shown, value 5915 can be presented in final structured presentation 106 to a viewer.
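  • The merge illustrated in FIG. 59 can be sketched as follows. The dictionary-of-dictionaries layout of the presentation and the row layout of the attribute/value/instance table are assumptions. Where several candidate values exist, this sketch simply takes the first one; a real system could instead choose among them using processes 5600, 5700, 5800.

```python
def add_attribute_column(presentation, attribute, avi_rows):
    """Return a copy of the presentation with a new column for `attribute`.

    presentation: {instance identifier: {attribute identifier: value}}
    avi_rows: iterable of dicts with "instance", "attribute", "value", "doc" keys
    """
    merged = {instance: dict(cells) for instance, cells in presentation.items()}
    for instance, cells in merged.items():
        candidates = [
            row for row in avi_rows
            if row["instance"] == instance and row["attribute"] == attribute
        ]
        # Leave the cell empty (None) when no document supplies a value.
        cells[attribute] = candidates[0]["value"] if candidates else None
    return merged
```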
  • FIG. 60 is a schematic representation of a process 6000 in which both attributes and attribute values are drawn from electronic documents in an electronic document collection and presented to a user in a structured presentation.
  • an initial structured data collection 905 can include a preexisting structured presentation 6005.
  • the preexisting structured presentation 6005 can characterize instances using one or more attribute values, e.g., the attribute values in column 6010.
  • New attributes that characterize the instances in preexisting structured presentation 6005 can be formulated based on the content of electronic documents in electronic document collection 102, as described in FIGS. 37-51 and the associated text.
  • the new attributes can be added at step 6015 to preexisting structured presentation 6005 and appear as part of a structured presentation 6020.
  • New values of such attributes can be formulated based on the content of electronic documents in electronic document collection 102, as described herein.
  • the new values can be added at step 6025 to preexisting structured presentation 6005 and appear as part of a structured presentation 6020.
  • a new column 6030 can include a new attribute identifier 308 (namely, attribute identifier 6035) that identifies the new attribute and a new collection of attribute values 307 (namely, attribute values 6040, 6045) that characterize the new attribute.
  • FIG. 61 is a flow chart of a process 6100 for adding values to a structured presentation based on the content of documents in an electronic document collection.
  • Process 6100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • process 6100 can be performed by the search engine 202 in system 200.
  • Process 6100 can be performed in isolation or in conjunction with other activities.
  • process 6100 can be performed as part of one or more of processes 600, 700, 800 (FIGS. 6, 7, 8).
  • process 6100 can be performed in response to receiving input, e.g., from a user or from another system or process that triggers the creation of a new structured presentation or an update of the structured presentation.
  • process 6100 can be performed in response to a user request that one or more new attributes be added to a structured presentation 106.
  • process 6100 may be performed by a search engine, e.g., search engine 202 (FIG. 2), in response to receipt of a search query.
  • the system performing process 6100 can receive a specification of an instance and an attribute in a structured presentation (step 6105).
  • the structured presentation can be a new or a preexisting structured presentation.
  • the system can receive a search query specifying instances, or a category of instances, that are to be characterized in a structured presentation.
  • a user can interact with a preexisting structured presentation to specify an instance, an attribute, or both.
  • User interaction with a preexisting structured presentation can specify an instance and/or an attribute inherently or manually.
  • Inherent specification draws upon the systematic arrangement of instance and attribute identifiers in a structured display so that user interaction with a cell specifies an instance and an attribute associated with that cell.
  • In manual specification, a user identifies which cells include the identifiers of instances and attributes that are associated with a cell.
  • a user can enter a search query into a cell that specifies the arrangement of an instance identifier, an attribute identifier, or both within the structured presentation.
  • a search query that includes the formula "(CELL_1, CELL_2)" can specify that this cell is associated with the attribute identified in cell "CELL_2" of the instance identified in cell "CELL_1" and that a search for this attribute of this instance is to be conducted.
  • Such manual specification of instance and attribute identifiers is particularly useful in structured presentations such as spreadsheet tables, where the position of instance and attribute identifiers may be apparent to a user but unknown to a data processing device that presents a structured presentation.
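  • Manual specification with a formula such as "(CELL_1, CELL_2)" can be sketched as follows. The formula grammar (two cell names in parentheses, separated by a comma) is inferred from the single example in the text, and the `cells` lookup table is a hypothetical stand-in for the spreadsheet contents.

```python
import re

# Assumed grammar: "(<instance cell>, <attribute cell>)".
FORMULA = re.compile(r"^\(\s*(\w+)\s*,\s*(\w+)\s*\)$")


def resolve_formula(formula, cells):
    """Return the (instance, attribute) identifiers named by a cell formula."""
    match = FORMULA.match(formula.strip())
    if match is None:
        raise ValueError(f"not an instance/attribute formula: {formula!r}")
    instance_cell, attribute_cell = match.groups()
    return cells[instance_cell], cells[attribute_cell]
```

  • The resolved pair can then drive a search for that attribute of that instance, even when the device presenting the spreadsheet has no other knowledge of where the identifiers sit.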
  • the system performing process 6100 can formulate one or more value suggestions from documents in an electronic document collection for the specified attribute of the instance (step 6110).
  • Value suggestions can be formulated for the specified attribute in a number of different ways.
  • the system can formulate value suggestions from documents in an electronic document collection 102 by conducting a search using a search query that is constructed using the specified instance and attribute.
  • value suggestions can be formulated by, e.g., locating documents that include structured components related to the specified instance and attribute as discussed in FIGS. 52-60 and the associated text.
  • a search query can require that identifiers of the specified instance and attribute be found in a linguistic pattern indicating that a value characterizing the attribute of the instance is likely to appear. Examples of such patterns include "the <attribute> of <entity> is," "<entity> with an <attribute> of," "<entity> has an <attribute> of," and "<entity>'s <attribute> is."
  • Such patterns can be used to extract value suggestions from textual content in electronic documents.
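  • Pattern-based extraction of value suggestions can be sketched with regular expressions. The four templates mirror the linguistic patterns listed above; the rule that a value runs until the next period, comma, or semicolon is an assumption made to keep the sketch short, and it would truncate values that contain those characters (e.g., decimals).

```python
import re

# Templates follow the patterns in the text; "an?" also accepts "a".
PATTERNS = [
    "the {attribute} of {entity} is (?P<value>[^.,;]+)",
    "{entity} with an? {attribute} of (?P<value>[^.,;]+)",
    "{entity} has an? {attribute} of (?P<value>[^.,;]+)",
    "{entity}'s {attribute} is (?P<value>[^.,;]+)",
]


def extract_value_suggestions(text, entity, attribute):
    """Return distinct candidate values for `attribute` of `entity` found in text."""
    suggestions = []
    for template in PATTERNS:
        pattern = template.format(
            entity=re.escape(entity), attribute=re.escape(attribute)
        )
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            value = match.group("value").strip()
            if value not in suggestions:
                suggestions.append(value)
    return suggestions
```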
  • the system performing process 6100 can provide one or more value suggestions to a user (step 6115). For example, a list of value suggestions can be displayed for the user on the same display screen that displays a preexisting structured presentation. The display of a list of value suggestions can be done before a value is selected for addition to the preexisting structured presentation.
  • the value suggestions can be concealed, along with search information and interactive elements, in a structured presentation. Examples of such implementations are discussed further below.
  • the system performing process 6100 can receive a user selection of a value suggestion that is to be presented in a structured display (step 6120).
  • an interactive element can interact with a user to receive one or more user inputs (e.g., mouse clicks, key strokes, or other user input) that select a value suggestion.
  • the interactive element can be concealed in a structured presentation, as discussed further below.
  • the system performing process 6100 can also add the selected value to a structured presentation (step 6125) to display the selected value in the structured presentation.
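  • Steps 6105 through 6125 of process 6100 can be sketched end to end as follows. The callables `formulate` and `choose` are hypothetical stand-ins for the suggestion step and the user's selection, and the dictionary layout of the presentation is an assumption.

```python
def run_process_6100(presentation, instance, attribute, formulate, choose):
    """Add a user-selected value for (instance, attribute) to a presentation.

    formulate(instance, attribute) -> list of value suggestions (step 6110)
    choose(suggestions) -> the suggestion picked by the user (steps 6115, 6120)
    """
    # Step 6105: the instance and attribute arrive as arguments.
    suggestions = formulate(instance, attribute)
    if not suggestions:
        return presentation  # nothing to suggest, presentation is unchanged
    selected = choose(suggestions)
    # Step 6125: add the selected value to a copy of the presentation.
    updated = {name: dict(cells) for name, cells in presentation.items()}
    updated.setdefault(instance, {})[attribute] = selected
    return updated
```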
  • FIG. 62 is a schematic representation of a structured presentation in which a search interface is concealed, namely, a structured presentation 6200.
  • a search interface can include search information, one or more search interactive elements, or a combination thereof.
  • Interactive elements are components of a graphical user interface that can interact with a user, e.g., to receive input instructions. Search interactive elements and search information are relevant to a search.
  • a search is the process of locating information in an electronic document collection.
  • a search interface can include, e.g., information indicating the availability of a search to populate a structured presentation with values, an interactive element that allows a user to indicate that such a search is to be conducted, a display identifying electronic documents located during a search, an interactive element that allows a user to select from among electronic documents for populating a structured presentation with values, or combinations of these and other features.
  • Structured presentation 6200 can be any form of structured presentation, including any of the structured presentations discussed above.
  • structured presentation 6200 can be a data table displayed in a spreadsheet framework, as shown.
  • the data table of structured presentation 6200 includes a collection of rows 302 and columns 304.
  • Each row 302 includes a respective instance identifier 306 and each column 304 includes a respective attribute identifier 308.
  • the arrangement and positioning of instance identifiers 306 and attribute identifiers 308 in rows 302 and columns 304 associates each cell of the spreadsheet framework in which structured presentation 6200 is displayed with an instance and an attribute.
  • a cell 6205 in structured presentation 6200 is associated with the instance identified as “Tesla Roadster” and the attribute identified as “mpg.”
  • a cell 6210 in structured presentation 6200 is associated with the instance identified as “Chevy Volt” and the attribute identified as “range.”
  • a cell 6215 in structured presentation 6200 is associated with the instance identified as “Myers NmG” and the attribute identified as “top speed.”
  • a cell 6220 in structured presentation 6200 is associated with the instance identified as "Myers NmG” and the attribute identified as "mpg.”
  • the associations between instances, attributes, and cells such as cells 6205, 6210, 6215, 6220 can be used to receive a specification of an instance and an attribute from a user.
  • receipt of user interaction selecting cell 6220 can be taken as input specifying the instance identified as "Myers NmG" and the attribute identified as "mpg."
  • User interaction selecting a cell can include, e.g., receipt of input positioning a cursor 6225 over the cell, the user clicking on the cell, or the like.
  • the selection of a cell can be denoted by positioning a visual indicium such as a perimetrical highlight 6230 in or around the cell.
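  • Inherent specification, in which selecting a cell implies an instance and an attribute, can be sketched as follows. The list-of-lists table layout (attribute identifiers in the first row, instance identifiers in the first column) is an assumption about how a structured presentation like 6200 might be stored.

```python
def specify_from_cell(table, row, col):
    """Return the (instance, attribute) pair implied by selecting table[row][col].

    table[0] holds attribute identifiers and column 0 holds instance identifiers.
    """
    if row < 1 or col < 1:
        raise ValueError("identifier cells do not specify an instance/attribute pair")
    return table[row][0], table[0][col]
```

  • With a table whose header row is `["", "mpg", "range", "top speed"]` and whose fourth row begins with "Myers NmG", selecting the cell at row 3, column 1 yields `("Myers NmG", "mpg")`; the column order used here is illustrative only.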
  • selected cell 6220 does not include a value 307 at the time of selection.
  • structured presentation 6200 can be a new structured presentation that has not yet been populated with values.
  • structured presentation 6200 can be a preexisting structured presentation from which a value has been deleted.
  • structured presentation 6200 can be a preexisting structured presentation that drew a former value from a source document which, for some reason, is no longer operable as a source of a value.
  • FIG. 63 is a schematic representation of another structured presentation 6300 in which a search interface is concealed.
  • structured presentation 6300 includes a value 307 in selected cell 6220.
  • cell 6220 can have been populated with value 307 automatically, e.g., in response to receipt of a search query.
  • cell 6220 can have been populated by a user manually interacting with cell 6220 to enter a value.
  • cell 6220 can have been populated with value 307 in response to a user specifying, either inherently or manually, an instance, an attribute, or both that are associated with cell 6220.
  • selection of cell 6220 specifies the instance identified as "Myers NmG" and the attribute identified as "mpg" that are associated therewith.
  • FIG. 64 is a schematic representation of another structured presentation 6400 in which a search interface is concealed.
  • Structured presentation 6400 includes visual indicia 6405.
  • Visual indicia 6405 visually indicate that concealed search information or interactive elements are accessible from structured presentation 6400.
  • each visual indicium 6405 is found in a separate cell, such as cells 6205, 6210, 6215, 6220.
  • the positioning and arrangement of visual indicia 6405 in cells and concomitantly the positioning and arrangement of visual indicia 6405 relative to instance identifiers 306 and attribute identifiers 308 in rows 302 and columns
  • a search interface can include search information, search interactive elements, or both.
  • a search interface can be concealed in a structured presentation in that the search information and interactive elements need not always be discernible in the structured presentation. Rather, a concealed search interface can be concealed wholly or partially from view while a structured presentation is in certain states. For example, in states where a viewer is likely to be reviewing the other information content of a structured presentation, a concealed search interface can be concealed. Such concealment can increase the portion of the structured presentation that is available for the presentation of the other information and reduce visual clutter to improve the readability of the structured presentation.
  • FIG. 65 illustrates a display element 6500 in which a formerly concealed search interface is presented.
  • display element 6500 can be presented in response to user interaction with the structured presentation itself.
  • Display element 6500 can "pop-up" in front of a structured presentation (such as structured presentations 6200, 6300, 6400) to present a search interactive element 6505 in a window 6510 in response to user interaction with cell 6220.
  • search interactive element 6505 and window 6510 can be presented in response to a user clicking on cell 6220 using a mouse.
  • Search interactive element 6505 is a hyperlink that includes text indicating that "more options" may be available for populating cell 6220.
  • FIG. 66 illustrates a display element 6600 in which a formerly concealed search interface is presented.
  • display element 6600 can be presented in response to user interaction with the structured presentation itself.
  • display element 6600 presents a source identifier 6605 in window 6510 in response to user interaction with cell 6220.
  • Source identifier 6605 includes text or other information that identifies an electronic document that is a source of the value 307 populating cell 6220.
  • the source document identified by source identifier 6605 can be a document that was located as a result of a prior search.
  • source identifier 6605 can also include a hyperlink to the source document.
  • FIG. 67 illustrates a display element 6700 in which a formerly concealed search interface is presented.
  • display element 6700 can be presented in response to user interaction with the structured presentation itself.
  • In addition to search interactive element 6505 and source identifier 6605, display element 6700 presents a snippet 6705 in window 6510 in response to user interaction with cell 6220.
  • Snippet 6705 is text or other information that describes the context of value 307 in an electronic document that is a source of the value 307 populating cell 6220.
  • FIG. 68 illustrates a display element 6800 in which a formerly concealed search interface is presented.
  • display element 6800 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 6505.
  • Display element 6800 includes a header 6802, a relevant source selection region 6805, and a consistent source selection region 6810.
  • Header 6802 can include text or other information that identifies a cell to which a value is to be added.
  • cell 6230 is identified by the attribute and value (i.e., Myers NmG: mpg) that are characterized by the value 307 in cell
  • Relevant source selection region 6805 can include information and interactive elements that allow a user to specify that the relevancy of a source electronic document to a specified instance and attribute is to be used in selecting a value that is to populate a structured presentation.
  • the user can specify that a single "most relevant" document is to be the sole source of a value that is to populate a structured presentation.
  • the relevancy of a document can characterize how closely the document matches, e.g., an attribute and an instance that define a search.
  • relevant source selection region 6805 includes a header 6815, a selection widget 6820, a value identifier 6825, and a source identifier 6830.
  • Header 6815 includes text or other information that identifies that relevant source selection region 6805 allows a user to specify that the most relevant electronic document is to be used as the source of the value populating the cell identified in header 6802.
  • Selection widget 6820 allows a user to select the use of the most relevant document as the source of the value populating the structured presentation.
  • Value identifier 6825 includes text or other information that identifies the value drawn from the currently most relevant document.
  • Source identifier 6830 includes text or other information that identifies the currently most relevant electronic document.
  • source identifier 6830 can also include a hyperlink to the currently most relevant document. Since the most relevant document can change over time, the value identified by value identifier 6825 and the document identified by source identifier 6830 can also change over time.
  • Consistent source selection region 6810 can include information and interactive elements that allow a user to specify that a source electronic document is to be used consistently in selecting a value that is to populate a structured presentation.
  • the user can select from among three candidate documents to specify the document that is to be consistently used as the source of a value that is to populate a structured presentation.
  • consistent source selection region 6810 includes a header 6835, a collection of selection widgets 6840, a collection of value identifiers 6845, and a collection of source identifiers 6850.
  • Header 6835 includes text or other information that identifies that consistent source selection region 6810 allows a user to specify that a source electronic document is to be used consistently in selecting a value.
  • Selection widgets 6840 allow a user to select the document that is to consistently be used. In the illustrated implementation, the user can select from among three different documents.
  • Value identifiers 6845 include text or other information that identifies the current values that can be drawn from particular documents to populate a structured presentation.
  • Source identifiers 6850 include text or other information that identifies the electronic documents from which the values identified by value identifiers 6845 are drawn. In some implementations, source identifiers 6850 can also include hyperlinks to the electronic documents from which the values identified by value identifiers 6845 are drawn.
  • Both the relevancy of an electronic document and the value in an electronic document can change over time.
  • the person who adds an electronic document to an electronic document collection can change the content of the electronic document so that the relevancy of that document to an instance and attribute changes.
  • the person who adds an electronic document to an electronic document collection can change the value that is used to characterize an attribute of an instance.
  • Headers 6815, 6835 can include text or other information identifying the nature of the changes that can occur.
  • header 6815 includes text identifying that both the most relevant document and the value of an attribute can change when the user specifies that the relevancy of a source electronic document to a specified instance and attribute is to be used in selecting a value that is to populate a structured presentation.
  • header 6835 includes text stating that the value of an attribute can change when the user specifies that a source electronic document is consistently to be used in selecting a value that is to populate a structured presentation.
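  • The two source selection policies offered by regions 6805 and 6810 can be sketched as follows. The `Candidate` record, the numeric relevance score, and the policy names `"most_relevant"` and `"consistent"` are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str       # hypothetical source document identifier
    relevance: float  # hypothetical relevance of the document to the query
    value: str        # value currently found in the document


def select_value(candidates, policy, pinned_doc=None):
    """Return (value, doc_id) under the chosen source selection policy."""
    if policy == "most_relevant":
        # Both the source document and the value can change over time.
        best = max(candidates, key=lambda c: c.relevance)
        return best.value, best.doc_id
    if policy == "consistent":
        # The source stays fixed; only its value can change over time.
        for candidate in candidates:
            if candidate.doc_id == pinned_doc:
                return candidate.value, candidate.doc_id
        return None, pinned_doc  # pinned document no longer supplies a value
    raise ValueError(f"unknown policy: {policy!r}")
```

  • Re-running `select_value` after the document collection changes shows the difference noted in headers 6815 and 6835: under the first policy the chosen document itself may change, while under the second only the value can.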
  • FIG. 69 illustrates a display element 6900 in which a formerly concealed search interface is presented.
  • display element 6900 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 6505.
  • display element 6900 includes a collection of snippets 6905 and a collection of search interactive elements 6910.
  • Each snippet 6905 is text or other information that describes the context of the respective values identified by value identifiers 6825, 6845 in an electronic document that is a source of the identified value.
  • Search interactive elements 6910 are hyperlinks that allow a user to navigate to the respective electronic document that is the source of the value identified by the respective value identifier 6845.
  • FIG. 70 illustrates a display element 7000 in which a formerly concealed search interface is presented.
  • display element 7000 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 6505.
  • display element 7000 includes a search trigger 7005.
  • Search trigger 7005 is an interactive element that triggers a search of an electronic document collection.
  • search trigger 7005 can allow a user to indicate dissatisfaction with the results of the previous searches.
  • the search triggered by search trigger 7005 can be a "full search" that is conducted using a general purpose search engine such as the Google™ search engine.
  • the search engine can be presented with a query that is automatically generated using the instance and attribute specified by previous user interaction.
  • the nature of the user interaction that triggers the display of formerly concealed search information and interactive elements can determine the category of the search information and interactive elements that are displayed. For example, user interaction specifying a single cell in a structured presentation can trigger presentation of search information and interactive elements that are relevant to populating that same cell with values.
  • user interaction with a column, a row, or other collection of cells can trigger presentation of search information and interactive elements that are relevant to populating that collection of cells with values.
  • user interaction with a column can allow a user to specify that the values populating that column are to be consistently drawn from a single source document or family of source documents.
  • user interaction with a row can allow a user to specify that the values populating that row are to be drawn from the source document that is most relevant to an instance and the attributes of that row.
  • FIG. 71 is a flow chart of a process 7100 for adding values to a structured presentation by drawing the values from the content of documents in an electronic document collection.
  • Process 7100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • process 7100 can be performed by the search engine 202 in system 200.
  • Process 7100 can be performed in isolation or in conjunction with other activities.
  • process 7100 can be performed as part of steps 6105, 6115, and 6120 of process 6100 (FIG. 61).
  • the system performing process 7100 can receive data characterizing a user interaction specifying one or more cells of a structured presentation (step 7105).
  • the structured presentation can be a new or a preexisting structured presentation.
  • the interaction with one or more cells can concomitantly specify one or more attributes and instances, as discussed above.
  • the system performing process 7100 can determine whether or not one or more values populating the one or more cells resulted from a prior search of an electronic document collection (step 7110). Such a determination can be made by accessing a data storage device that stores information characterizing not only the information that is visibly displayed in a structured presentation but also information characterizing any prior search conducted to populate the structured presentation.
  • the stored information characterizing the prior search can include, e.g., an indication that a search was indeed conducted, URLs of source documents in the result set of the prior search, and snippets characterizing the context of the values in the source documents. If the system performing process 7100 determines that a value resulted from a previous search, the system can present search information characterizing the prior search (step 7115).
  • Such information can include, e.g., information identifying a source document in the result set from which a value was drawn, a snippet characterizing the context of the value in a source document, and a hyperlink to the source document.
  • the system can present search information characterizing a single source document in presentations such as display elements 6600, 6700 (FIGS. 66, 67).
  • the system can present search information regarding multiple source documents — including source documents having values different from those visibly populating a structured presentation — in presentations such as display elements 6800, 6900, 7000 (FIGS. 68, 69, 70).
  • the system performing process 7100 can transition between presentation of search information regarding a single source document and search information regarding multiple source documents in response to interaction with a user.
  • the system can receive user interaction with a search interactive element such as search interactive element 6505 and transition between display elements 6600, 6700 and display elements 6800, 6900, 7000 (FIGS. 65-70).
  • the system performing process 7100 can also conduct a new search and provide information characterizing one or more electronic documents in the result set yielded by the new search (step 7120).
  • the characterizing information can include, e.g., names and URLs of the electronic documents, snippets of the electronic documents, summaries of the electronic documents, or the like.
  • the result set can characterize a single source document in presentations such as display elements 6600, 6700 (FIGS. 66, 67) or multiple source documents in presentations such as display elements 6800, 6900, 7000 (FIGS. 68, 69, 70).
  • the system can transition between presentation of search information regarding a single source document and search information regarding multiple source documents in response to interaction with a user. For example, the system can receive user interaction with a search interactive element such as search interactive element 6505 and transition between display elements 6600, 6700 and display elements 6800, 6900, 7000 (FIGS. 65-70).
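  • The branch between steps 7110, 7115, and 7120 of process 7100 can be sketched as follows. The `search_store` mapping, keyed by `(instance, attribute)` and standing in for the data storage device that records prior searches, and the `run_search` callable are both hypothetical.

```python
def handle_cell_interaction(cell_key, search_store, run_search):
    """Decide what search information to present for a selected cell.

    cell_key: (instance, attribute) pair specified by the user interaction
    search_store: {cell_key: {"sources": [...]}} describing prior searches
    run_search(instance, attribute) -> list of source descriptions
    """
    prior = search_store.get(cell_key)  # step 7110: did a prior search populate it?
    if prior is not None:
        # Step 7115: present information characterizing the prior search.
        return {"origin": "prior_search", "sources": prior["sources"]}
    # Step 7120: conduct a new search and characterize its result set.
    instance, attribute = cell_key
    return {"origin": "new_search", "sources": run_search(instance, attribute)}
```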
  • FIG. 72 illustrates a display element 7200 in which a formerly concealed search interface is presented.
  • display element 7200 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 7005.
  • Display element 7200 can receive a value that results from a search, e.g., a search conducted using a general purpose search engine.
  • the value received by display element 7200 can be added into a structured presentation to characterize an attribute of an instance.
  • Display element 7200 includes a header 7205, an instance identifier 7210, an attribute identifier 7215, a value entry element 7220, a value use trigger element 7225, and a presentation close element 7230.
  • Header 7205 is text or other information that describes that display element 7200 can receive a value of an attribute of an instance. Header 7205 can also prompt the user to enter a value resulting from a search. For example, header 7205 can be text asking if a search was successful.
  • Instance identifier 7210 is text or other information that identifies an instance, or a category of instances, that is to be characterized by the value entered using display element 7200.
  • In the illustrated implementation, instance identifier 7210 is text identifying the instance "China."
  • Attribute identifier 7215 is text or other information that identifies an attribute of the instance identified by instance identifier 7210. The attribute identified by attribute identifier 7215 can be characterized by the value received by display element 7200.
  • attribute identifier 7215 is text identifying the attribute
  • Value entry element 7220 is an interactive element that allows a user to specify a value characterizing the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210.
  • Value entry element 7220 can be, e.g., a text entry field.
  • Value use trigger element 7225 is an interactive element that allows a user to trigger the use of a value entered in value entry element 7220 to characterize the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210 in a structured presentation.
  • Value use trigger element 7225 can be, e.g., a button that includes text identifying that user interaction with value use trigger element 7225 will result in the value entered in value entry element 7220 being used in a structured presentation.
  • Presentation close element 7230 is an interactive element that allows a user to close display element 7200.
  • Display element 7200 can be closed regardless of whether the value entered in value entry element 7220 is used, in a structured presentation, to characterize the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210.
  • Presentation close element 7230 can be, e.g., a button that includes text identifying that user interaction with presentation close element 7230 will close display element 7200.
  • FIG. 73 illustrates a display element 7300 in which a formerly concealed search interface is presented.
  • Display element 7300 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 7005.
  • Display element 7300 can receive a value of an attribute of an instance to be added into a structured presentation.
  • In addition to instance identifier 7210, attribute identifier 7215, value entry element 7220, value use trigger element 7225, and presentation close element 7230, display element 7300 includes a source entry element 7305 and a source entry element identifier 7310.
  • Source entry element 7305 is an interactive element that allows a user to specify a source of a value characterizing the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210.
  • Source entry element 7305 can be, e.g., a text entry field.
  • Source entry element identifier 7310 is text or other information that describes that source entry element 7305 can be used to specify a source of the value.
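  • The elements of display elements 7200, 7300 can be modeled, as a minimal sketch, by a small form object whose fields mirror the reference numerals above. Everything below is an illustrative assumption rather than the described implementation: the class and method names, the dictionary-backed table, the rejection of blank values, and the attribute name "capital" and the URL used in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class CellValue:
    """A value stored in a structured presentation, with an optional source."""
    value: str
    source: Optional[str] = None


@dataclass
class StructuredPresentation:
    """Hypothetical table keyed by (instance, attribute)."""
    cells: Dict[Tuple[str, str], CellValue] = field(default_factory=dict)

    def set_value(self, instance: str, attribute: str,
                  value: str, source: Optional[str] = None) -> None:
        self.cells[(instance, attribute)] = CellValue(value, source)


@dataclass
class ValueEntryDialog:
    """Hypothetical model of display elements 7200 and 7300."""
    instance_identifier: str             # instance identifier 7210
    attribute_identifier: str            # attribute identifier 7215
    value_entry: str = ""                # value entry element 7220
    source_entry: Optional[str] = None   # source entry element 7305 (7300 only)
    is_open: bool = True

    def use_value(self, presentation: StructuredPresentation) -> bool:
        """Value use trigger element 7225: copy the entered value into the table.

        Rejecting a blank value is an assumption of this sketch.
        """
        if not self.value_entry.strip():
            return False
        presentation.set_value(self.instance_identifier,
                               self.attribute_identifier,
                               self.value_entry.strip(),
                               self.source_entry)
        return True

    def close(self) -> None:
        """Presentation close element 7230: close whether or not a value was used."""
        self.is_open = False
```

  • For example, a dialog created as ValueEntryDialog("China", "capital") with value_entry set to "Beijing" and source_entry set to "https://example.org/china" would, on use_value, store that value and its source under the key ("China", "capital"). Calling close without use_value leaves the table unchanged.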
  • Display elements 7200, 7300 can be displayed for a user on a display screen after an unsuccessful search. For example, display elements 7200, 7300 can be displayed in response to receipt of an indication from a user that the user is dissatisfied with the results of a previous search. For example, the display of display elements 7200,
  • Display elements 7200, 7300 can be displayed after an automatic search for values of an attribute of an instance has provided unsatisfactory results.
  • An attribute, an instance, or both may be improperly specified, e.g., due to a misspelling or other error.
  • An attribute or an instance can be specified without error but relative to an unknown or indefinite value.
  • The instance "suitable for Jim and Diane" is specified relative to indefinite values, namely, the identities of Jim and Diane, as well as the nature of what is "suitable" for them.
  • The instance "my car" is specified relative to an indefinite value, namely, the identity of the person whose car is to be characterized.
  • A search for values can also provide unsatisfactory results because an electronic document that resulted from a prior search is inoperative to provide a value for the structured presentation.
  • A source document from which a value is to be drawn can become unavailable.
  • A source document can become unavailable, e.g., when the party who had added the source document withdraws it from an electronic document collection.
  • Such a source document can remain available but the value itself can become unavailable in the source document.
  • A value can become unavailable, e.g., when the party who added a source document to an electronic document collection changes the content of the source document.
  • FIG. 74 illustrates a display element 7400 in which a formerly concealed search interface is presented. Display element 7400 can be presented in response to user interaction or automatically in response to a triggering event. For example, display element 7400 can be presented automatically in response to a prior search becoming inoperative.
  • Display element 7400 includes a search interactive element 6505, a source identifier 6605, and an error message 7405 in a window 6510.
  • Search interactive element 6505 is a hyperlink that includes anchor text indicating that "more options" are available for searching for values to populate cell 6220.
  • Source identifier 6605 is a collection of text that identifies an electronic document that is to be a source of value 307 populating cell 6220.
  • Error message 7405 can include text or other information indicating that the results of a prior search have been rendered inoperative. For example, error message 7405 can indicate that value 307 has become unavailable in the source document identified by source identifier 6605. Error message 7405 can include information describing the nature of the inoperativeness or simply indicating that an error has occurred. For example, in the illustrated implementation, error message 7405 indicates that the value is no longer available within an electronic document that itself remains available.
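  • The two kinds of inoperativeness distinguished above (a source document that has been withdrawn, and a source document that remains available but no longer contains the value) can be told apart with a check along the following lines. This is only a sketch under stated assumptions: the names SearchStatus, check_prior_search, and error_message are hypothetical, the fetch callable stands in for whatever mechanism retrieves a document, and a plain substring test is used in place of real value extraction.

```python
from enum import Enum
from typing import Callable, Optional


class SearchStatus(Enum):
    OPERATIVE = "operative"
    DOCUMENT_UNAVAILABLE = "document unavailable"
    VALUE_UNAVAILABLE = "value unavailable"


def check_prior_search(source_id: str, value: str,
                       fetch: Callable[[str], Optional[str]]) -> SearchStatus:
    """Classify whether a prior search can still supply `value`.

    `fetch` returns the document text, or None if the document was withdrawn.
    The substring test is a simplifying assumption of this sketch.
    """
    content = fetch(source_id)
    if content is None:
        return SearchStatus.DOCUMENT_UNAVAILABLE
    if value not in content:
        return SearchStatus.VALUE_UNAVAILABLE
    return SearchStatus.OPERATIVE


def error_message(status: SearchStatus, source_id: str) -> Optional[str]:
    """Build text in the spirit of error message 7405, or None if nothing is wrong."""
    if status is SearchStatus.DOCUMENT_UNAVAILABLE:
        return f"The source document {source_id} is no longer available."
    if status is SearchStatus.VALUE_UNAVAILABLE:
        return f"The value is no longer available in {source_id}."
    return None
```

  • The second message corresponds to the illustrated case, in which the electronic document itself remains available but the value does not.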
  • FIG. 75 is a flow chart of a process 7500 for adding values to a structured presentation based on the content of documents in an electronic document collection.
  • Process 7500 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions.
  • Process 7500 can be performed by the search engine 202 in system 200.
  • Process 7500 can be performed in isolation or in conjunction with other activities.
  • Process 7500 can be performed as part of process 6100 (FIG. 61).
  • The system performing process 7500 can receive an update trigger triggering an update of the one or more values of one or more cells of a preexisting structured presentation (step 7505).
  • The update trigger can be, e.g., generated automatically in response to the passage of a period of time since a previous update, manually in response to user interaction, or the like.
  • User interaction with a cell 6220 can trigger the update of that cell, as discussed above.
  • The update trigger can trigger the update of the value of a single cell, the values of a collection of cells, or the values of all the cells in a structured presentation.
  • The update trigger can concomitantly specify one or more attributes of one or more instances, as discussed above.
  • The system performing process 7500 can determine whether or not one or more prior searches for populating the structured presentation with values have become inoperative (step 7510). Such a determination can be made by seeking to access documents from which the values populating the structured presentation are to be drawn.
  • If the system performing process 7500 determines that a prior search has not become inoperative, the system can update a structured presentation with the content of one or more source documents identified in the prior search (step 7515).
  • A new value used to update the structured presentation need not be identical to a value previously used to populate the structured presentation. Rather, the updated structured presentation can include a value provided by the source electronic document with its current content.
  • If the system performing process 7500 determines that a prior search has become inoperative, the system can inform the user of the inoperability of the prior search (step
  • A display element such as display element 7400 can be used to inform the user of the inoperability and provide the user with the opportunity to conduct a new search to populate the structured presentation with values.
  • The system can also conduct a new search and provide information characterizing one or more electronic documents in the result set yielded by the new search, as described in reference to step 7120 of process 7100 (FIG. 71).
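  • The flow of process 7500 can be sketched roughly as follows. This is an illustration under assumptions, not the claimed implementation: the names PriorSearch, UpdateReport, and run_update are hypothetical, the table is a plain dictionary keyed by (instance, attribute), each prior search is reduced to a source document identifier plus an extraction function, and a prior search is treated as inoperative when the document cannot be fetched or the extractor finds no value in it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[str, str]  # hypothetical cell key: (instance, attribute)


@dataclass
class PriorSearch:
    source_id: str                            # identifies the source document
    extract: Callable[[str], Optional[str]]   # returns the value, or None if absent


@dataclass
class UpdateReport:
    updated: Dict[Cell, str]   # cells refreshed from their source documents
    inoperative: List[Cell]    # cells whose prior search no longer works


def run_update(cells: List[Cell],
               prior_searches: Dict[Cell, PriorSearch],
               fetch: Callable[[str], Optional[str]],
               table: Dict[Cell, str]) -> UpdateReport:
    """Refresh `cells` in `table` and report which prior searches are inoperative."""
    updated: Dict[Cell, str] = {}
    inoperative: List[Cell] = []
    for cell in cells:  # cells named by the update trigger (step 7505)
        search = prior_searches[cell]
        # Step 7510: seek to access the document the value is drawn from.
        content = fetch(search.source_id)
        value = search.extract(content) if content is not None else None
        if value is None:
            # Prior search is inoperative: leave the cell alone and report it,
            # so that the user can be informed and offered a new search.
            inoperative.append(cell)
        else:
            # Step 7515: use the document's current content, which need not
            # match the value previously shown.
            table[cell] = value
            updated[cell] = value
    return UpdateReport(updated, inoperative)
```

  • A caller could pass the cells in the inoperative list to a display element such as display element 7400. Stale values are deliberately left in place here, which is a design choice of this sketch rather than something the description specifies.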
  • Embodiment 1 A machine-implemented method comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation; adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on a display device.
  • Embodiment 2 The method of embodiment 1, wherein adding the identifier of the new instance comprises: formulating a collection of instance suggestions; providing the instance suggestion collection to a user; and receiving a user selection of the new instance, wherein the new instance is in the collection of instance suggestions.
  • Embodiment 3 The method of embodiment 2, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
  • Embodiment 4 The method of embodiment 2, wherein formulating the collection of instance suggestions comprises: identifying a first document in the electronic document collection that includes an identifier of an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; identifying a second document that is arranged in accordance with the template but relevant to a second instance; and including the second instance in the instance suggestion collection.
  • Embodiment 5 The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation.
  • Embodiment 6 The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises locating the new instance in a stored collection of associations of instances with attributes.
  • Embodiment 7 The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the characteristics of the preexisting structured presentation with the attributes characterized in the preexisting structured presentation.
  • Embodiment 8 The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 9 The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the value of attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 10 The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 11 The method of embodiment 1, wherein: the collection of electronic documents comprises the electronic documents available on the Internet; and the electronic documents comprise web pages.
  • Embodiment 12 The method of embodiment 1, wherein the expanded structured presentation comprises a table.
  • Embodiment 13 The method of embodiment 1, wherein the expanded structured presentation comprises a collection of cards.
  • Embodiment 14 The method of embodiment 1, further comprising visually presenting the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
  • Embodiment 15 An apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations, the operations comprising: formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the instance suggestion collection to a user; receiving a user selection of a first instance in the collection of instance suggestions; and adding an identifier of the first instance suggestion to a structured presentation presented on a display device, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation.
  • Embodiment 16 The apparatus of embodiment 15, wherein formulating the collection of instance suggestions comprises comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection.
  • Embodiment 17 The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
  • Embodiment 18 The apparatus of embodiment 16, wherein formulating the instance suggestion collection comprises: identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; identifying a second document that is arranged in accordance with the template but relevant to a second instance; and including the second instance in the instance suggestion collection.
  • Embodiment 19 The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises identifying documents in the electronic document collection that include identifiers of one or more instances in the preexisting structured presentation.
  • Embodiment 20 The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises identifying additional attributes used to characterize instances in the preexisting structured presentation.
  • Embodiment 21 The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises comparing values of attributes used to characterize instances in the preexisting structured presentation with values of the instance suggestions.
  • Embodiment 22 The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises: identifying a category of instances that includes instances in the preexisting structured presentation; and formulating the collection of instance suggestions using instances in the category of instances.
  • Embodiment 23 The apparatus of embodiment 15, wherein formulating the collection of instance suggestions comprises identifying the instance suggestions in a stored collection of associations of instances with attributes.
  • Embodiment 24 The apparatus of embodiment 15, wherein formulating the collection of instance suggestions comprises comparing the attributes characterized in the preexisting structured presentation with the content of the documents in the unstructured electronic document collection.
  • Embodiment 25 The apparatus of embodiment 15, wherein: the collection of electronic documents comprises the documents available on the Internet; and the electronic documents comprise web pages.
  • Embodiment 26 The apparatus of embodiment 15, wherein the structured presentation comprises a table.
  • Embodiment 27 The apparatus of embodiment 15, wherein the structured presentation comprises a collection of cards.
  • Embodiment 28 A system comprising: a client device; and one or more computers programmed to interact with the client device and to perform operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation; adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on a display device coupled in data communication with the client device.
  • Embodiment 29 A system comprising: a client device; and one or more computers programmed to interact with the client device and to perform operations comprising: formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the instance suggestion collection to a user using the client device; receiving a user selection of a first instance in the collection of instance suggestions; and adding an identifier of the first instance suggestion to a structured presentation presented on a display device coupled in data communication with the client device, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation.
  • Embodiment 30 The system of embodiment 29, wherein: the one or more computers comprise a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client; the client device comprises a personal computer running a web browser; and the personal computer comprises the display device.
  • Embodiment 31 A machine-implemented method comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation; adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on a display screen.
  • Embodiment 32 The method of embodiment 31, wherein adding the identifier of the new attribute comprises: formulating a collection of attribute suggestions; providing the attribute suggestion collection to a user; and receiving a user selection of the new attribute, wherein the new attribute is in the collection of attribute suggestions.
  • Embodiment 33 The method of embodiment 32, wherein formulating the attribute suggestion collection comprises: identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; and adding an attribute used in the first document to characterize the instance in the attribute suggestion collection.
  • Embodiment 34 The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
  • Embodiment 35 The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation.
  • Embodiment 36 The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying the new attribute in a stored collection of associations of instances with attributes.
  • Embodiment 37 The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the instances characterized in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 38 The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying additional instances related to the instances identified in the preexisting structured presentation.
  • Embodiment 39 The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 40 The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 41 The method of embodiment 31, wherein: the collection of electronic documents comprises the electronic documents available on the Internet; and the electronic documents comprise web pages.
  • Embodiment 42 The method of embodiment 31, wherein the expanded structured presentation comprises a table.
  • Embodiment 43 The method of embodiment 31, wherein the expanded structured presentation comprises a collection of cards.
  • Embodiment 44 The method of embodiment 31, further comprising visually presenting the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
  • Embodiment 45 An apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations, the operations comprising: formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the attribute suggestion collection to a user; receiving a user selection of a first attribute in the collection of attribute suggestions; and adding an identifier of the first attribute suggestion to a structured presentation presented on a display screen, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation.
  • Embodiment 46 The apparatus of embodiment 45, wherein formulating the collection of attribute suggestions comprises comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection.
  • Embodiment 47 The apparatus of embodiment 46, wherein formulating the collection of attribute suggestions comprises identifying documents in the electronic document collection that include structured components that characterize instances identified in the preexisting structured presentation.
  • Embodiment 48 The apparatus of embodiment 46, wherein formulating the attribute suggestion collection comprises: identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; and including an attribute used to characterize the instance in the attribute suggestion collection.
  • Embodiment 49 The apparatus of embodiment 46, wherein formulating the collection of attribute suggestions comprises identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation.
  • Embodiment 50 The apparatus of embodiment 46, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing instances identified in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 51 The apparatus of embodiment 46, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents.
  • Embodiment 52 The apparatus of embodiment 46, wherein formulating the collection of attribute suggestions comprises: identifying a category of instances that includes instances in the preexisting structured presentation; and formulating the collection of attribute suggestions from attributes used to characterize instances in the category of instances.
  • Embodiment 53 The apparatus of embodiment 45, wherein formulating the collection of attribute suggestions comprises identifying the attribute suggestions in a stored collection of associations of instances with attributes.
  • Embodiment 54 The apparatus of embodiment 45, wherein: the collection of electronic documents comprises electronic documents available on the Internet; and the electronic documents comprise web pages.
  • Embodiment 55 The apparatus of embodiment 45, wherein the structured presentation comprises a table.
  • Embodiment 56 The apparatus of embodiment 45, wherein the structured presentation comprises a collection of cards.
  • Embodiment 57 A system comprising: a client device comprising a display screen; and one or more computers programmed to interact with the client device and to perform operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation; adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on the display screen.
  • Embodiment 58 A system comprising: a client device comprising a display screen; and one or more computers programmed to interact with the client device and to perform operations comprising: formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the attribute suggestion collection to the client device; receiving a selection of a first attribute in the collection of attribute suggestions from the client device; and adding an identifier of the first attribute suggestion to a structured presentation presented on the display screen, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation.
  • Embodiment 59 A machine-implemented method comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation; adding the new value to the preexisting structured presentation to form a new structured presentation; and outputting instructions for visually presenting the new structured presentation.
  • Embodiment 60 The method of claim 59, wherein: comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises locating an identifier of a first instance that appears in the structured presentation in a first electronic document; and the method further comprises extracting the new value from the first electronic document.
  • Embodiment 61 The method of claim 59, wherein adding the new value comprises: identifying a collection of values of a first attribute of a first instance; and establishing a subset of one or more of the identified values as suitably characterizing the first attribute of the first instance.
  • Embodiment 62 The method of claim 61, wherein establishing the subset of values as suitable comprises grouping the values in the collection into groups.
  • Embodiment 63 The method of claim 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a count of values in the subset.
  • Embodiment 64 The method of claim 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on values in the subset meeting a user-specified constraint.
  • Embodiment 65 The method of claim 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a value in the subset being drawn from a high quality document.
  • Embodiment 66 The method of claim 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another instance in the preexisting structured presentation.
  • Embodiment 67 The method of claim 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another attribute in the preexisting structured presentation.
  • Embodiment 68 The method of claim 59, wherein: the collection of electronic documents comprises the electronic documents available on the Internet; and the electronic documents comprise web pages.
  • Embodiment 69 The method of claim 59, wherein the preexisting structured presentation comprises a table.
  • Embodiment 70 The method of claim 59, wherein the preexisting structured presentation comprises a collection of cards.
  • Embodiment 71 The method of claim 59, further comprising visually presenting the new structured presentation on a display screen, including physically transforming one or more elements of the display screen.
  • Embodiment 72 An apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations, the operations comprising: receiving description data describing a first instance, a second instance, and a first attribute; extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection; extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection; establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance; establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance; and generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset, wherein the structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values.
  • Embodiment 74 The apparatus of claim 72, wherein establishing the first subset of values as suitable comprises selecting the first subset based at least in part on a count of values in the first subset.
  • Embodiment 75 The apparatus of claim 72, wherein establishing the first subset of values as suitable comprises comparing the values in the first subset with a user-specified constraint on the values.
  • Embodiment 76 The apparatus of claim 72, wherein establishing the first subset of values as suitable comprises determining that a value in the first subset is drawn from a high quality document.
  • Embodiment 77 The apparatus of claim 72, wherein establishing the first subset of values as suitable comprises determining that a value in the first subset is drawn from a document relevant to the second instance.
  • Embodiment 78 The apparatus of claim 72, wherein establishing the first subset of values as suitable comprises determining that a value in the first subset is drawn from a document relevant to another attribute that characterizes both the first instance and the second instance.
  • Embodiment 79 The apparatus of claim 72, wherein: the description of the first instance comprises an identifier of the first instance that appears in a preexisting structured presentation; and the description of the second instance comprises an identifier of the second instance that appears in the preexisting structured presentation.
  • Embodiment 80 The apparatus of claim 72, wherein the description of the first attribute comprises a description of a new attribute that is to be added to a preexisting structured presentation.
  • Embodiment 81 The apparatus of claim 72, wherein the unstructured electronic document collection comprises electronic documents available on the Internet.
  • Embodiment 83 The apparatus of claim 72, wherein the structured presentation comprises a collection of cards.
  • Embodiment 84 The apparatus of claim 72, further comprising visually presenting the structured presentation on a display screen, including physically transforming one or more elements of the display screen.
  • Embodiment 85 A system comprising: a device; and one or more computers programmed to interact with the device and to perform operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation; adding the new value to the preexisting structured presentation to form a new structured presentation; and outputting instructions for visually presenting the new structured presentation to the device.
  • Embodiment 86 A system comprising: a device; and one or more computers programmed to interact with the device and to perform operations comprising: receiving description data describing a first instance, a second instance, and a first attribute; extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection; extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection; establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance; establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance; generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset, wherein the structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values; and sending the machine-readable instructions to the device.
  • Embodiment 87 A machine-implemented method comprising: displaying a structured presentation on a display device, the structured presentation visually presenting information in a systematic and structured arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; receiving data characterizing a user interaction with the displayed structured presentation, the data including a specification of a first instance and a first attribute of the structured presentation; and displaying a formerly concealed search interface on the display device in response to receiving the data, the search interface including information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
  • Embodiment 88 The method of embodiment 87, wherein receiving the data characterizing the user interaction with the displayed structured presentation comprises receiving a manual user specification of the first instance and the first attribute that are associated with a cell in the structured presentation.
  • Embodiment 89 The method of embodiment 87, wherein receiving data characterizing the user interaction comprises receiving data characterizing the user interaction with a cell in the structured presentation, the cell being associated with the first instance and the first attribute by virtue of the arrangement of the cell relative to identifiers of the first instance and the first attribute in the structured presentation.
  • Embodiment 90 The method of embodiment 89, wherein receiving data characterizing the user interaction with the cell comprises receiving data characterizing the user interaction with an empty cell.
  • Embodiment 91 The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying an interactive element that can be selected by a user to trigger a search of the electronic document collection to locate the first value.
  • Embodiment 92 The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying an interactive value entry element that can be selected by a user to specify a value characterizing the first attribute of the first instance.
  • Embodiment 93 The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying a snippet characterizing a context of the first value in a first document of the electronic document collection.
  • Embodiment 94 The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying a result of a prior search of the electronic document collection to locate the first value.
  • Embodiment 95 The method of embodiment 87, wherein: the first value appears in the structured presentation as a value characterizing the first attribute of the first instance; and displaying the formerly concealed search interface comprises displaying an identifier of a first electronic document in the electronic document collection, wherein the first value is drawn from the first electronic document.
  • Embodiment 96 The method of embodiment 95, further comprising: determining that the first electronic document is inoperative to provide the first value; and displaying a visual indication of the inoperativeness of the first document.
  • Embodiment 97 The method of embodiment 87, wherein displaying the formerly concealed search interface comprises presenting the user with an option to select the first value consistently from a first document regardless of changes in relevancy of the first document to the first instance and the first attribute.
  • Embodiment 98 The method of embodiment 87, wherein displaying the formerly concealed search interface comprises presenting the user with an option to select the first value from a first document that is most relevant to the first instance and the first attribute.
  • Embodiment 99 The method of embodiment 87, further comprising: searching an unstructured collection of electronic documents to locate the first value in response to a user interaction with the search interface; and adding the first value to the structured presentation.
  • Embodiment 100 The method of embodiment 87, wherein receiving the specification of the first instance and the first attribute comprises receiving a specification of a collection of attributes or a collection of instances.
  • Embodiment 101 The method of embodiment 87, further comprising updating the display of the structured presentation in response to a passage of time.
  • Embodiment 102 A system comprising: one or more computers programmed to interact with client devices and to perform operations comprising: receiving data characterizing user interaction specifying a first cell of a structured presentation displayed on a display device, the structured presentation visually presenting information in a systematic and structured arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of the values in cells; determining that a prior search has been conducted to populate the first cell with a first value; and in response to determining that a prior search was conducted, displaying information characterizing the prior search on the display device.
  • Embodiment 103 The system of embodiment 102, wherein receiving the data characterizing user interaction specifying the first cell comprises receiving data characterizing a manual user specification of the first instance and the first attribute that are associated with the first cell.
  • Embodiment 104 The system of embodiment 102, wherein displaying information characterizing the prior search comprises displaying information identifying an electronic document from which the first value is drawn.
  • Embodiment 105 The system of embodiment 102, wherein displaying information characterizing the prior search comprises displaying information identifying a collection of electronic documents from which the first value could have been drawn.
  • Embodiment 106 The system of embodiment 102, wherein displaying the information characterizing the prior search comprises displaying the information in a display element of a formerly concealed search interface.
  • Embodiment 107 The system of embodiment 102, wherein displaying information characterizing the prior search comprises displaying information identifying a first electronic document in the electronic document collection from which the first value is drawn.
  • Embodiment 108 The system of embodiment 107, wherein the operations further comprise: determining that the first electronic document is inoperable to provide the first value; and displaying a visual indication of the inoperability of the first document.
  • Embodiment 109 The system of embodiment 102, wherein the operations further comprise updating a display of a value in the first cell of the structured presentation in response to the user interaction.
  • Embodiment 110 The system of embodiment 102, wherein displaying the information characterizing the prior search comprises displaying a snippet characterizing a context of the first value in a first document of the electronic document collection.
  • Embodiment 111 The system of embodiment 110, wherein: the collection of electronic documents comprises electronic documents available on the Internet; and the electronic documents comprise web pages.
  • Embodiment 112 The system of embodiment 102, wherein the structured presentation comprises a collection of cards.
  • Embodiment 113 A system comprising: one or more computers programmed to interact with a client device comprising a display device and to perform operations comprising: displaying a structured presentation on the display device, the structured presentation visually presenting information in a systematic and structured arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; receiving data characterizing a user interaction with the displayed structured presentation, the data including a specification of a first instance and a first attribute of the structured presentation; and displaying a formerly concealed search interface on the display device in response to receiving the data, the search interface including information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
  • Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus.
  • the program instructions can be encoded on a propagated signal that is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by data processing apparatus.
  • the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • data processing apparatus encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • the apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it may be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • a program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows may also be performed by, and apparatus may also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.
  • a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user may provide input to the computer.
  • Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Methods, systems, and apparatus, including computer programs stored on computer storage media, for retrieval and display of information from an unstructured electronic document collection. One aspect can be embodied in machine-implemented methods that include the actions of receiving a machine-readable search query from a user and responding to the search query with instructions for presenting the user with a structured presentation of instances relevant to the search query. A visual presentation of the structured presentation denotes associations between the instances and values that characterize attributes of the instances by virtue of an arrangement of identifiers of the instances and the values.

Description

RETRIEVING AND DISPLAYING INFORMATION FROM AN UNSTRUCTURED ELECTRONIC DOCUMENT COLLECTION
BACKGROUND
This specification relates to retrieving and displaying information from an unstructured electronic document collection.
An electronic document is a collection of machine-readable data. Electronic documents are generally individual files and are formatted in accordance with a defined format (e.g., PDF, TIFF, HTML, MS Word, PCL, PostScript, or the like). Electronic documents can be electronically stored and disseminated. In some cases, electronic documents include audio content, visual content, and other information, as well as text and links to other electronic documents.
Electronic documents can be collected into electronic document collections. Electronic document collections can either be unstructured or structured. The formatting of the documents in an unstructured electronic document collection is not constrained to conform with a predetermined structure and can evolve in often unforeseen ways. In other words, the formatting of individual documents in an unstructured electronic document collection is neither restrictive nor permanent across the entire document collection. Further, in an unstructured electronic document collection, there are no mechanisms for ensuring that new documents adhere to a format or that changes to a format are applied to previously existing documents. Thus, the documents in an unstructured electronic document collection cannot be expected to share a common structure that can be exploited in the extraction of information. Examples of unstructured electronic document collections include the documents available on the Internet, collections of resumes, collections of journal articles, and collections of news articles. Documents in some unstructured electronic document collections are not prohibited from including links to other documents inside and outside of the collection.
In contrast, the documents in structured electronic document collections generally conform with formats that can be both restrictive and permanent. The formats imposed on documents in structured electronic document collections can be restrictive in that common formats are applied to all of the documents in the collections, even when the applied formats are not completely appropriate. The formats can be permanent in that an upfront commitment to a particular format by the party who assembles the structured electronic document collection is generally required. Further, users of the collections (in particular, programs that use the documents in the collection) rely on the documents having the expected format. As a result, format changes can be difficult to implement. Structured electronic document collections are best suited to applications where the information content lends itself to simple and stable categorizations. Thus, the documents in a structured electronic document collection generally share a common structure that can be exploited in the extraction of information. Examples of structured electronic document collections include databases that are organized and viewed through a database management system (DBMS) in accordance with hierarchical and relational data models, as well as collections of electronic documents that are created by a single entity for presenting information consistently. For example, a collection of web pages that are provided by an online bookseller to present information about individual books can form a structured electronic document collection. As another example, a collection of web pages that is created by server-side scripts and viewed through an application server can form a structured electronic document collection. Thus, one or more structured electronic document collections can each be a subset of an unstructured electronic document collection.
SUMMARY
This specification describes technologies relating to retrieval and display of information from an unstructured electronic document collection, for example, the electronic documents available on the Internet. Although an electronic document collection may be unstructured, the information content of the unstructured electronic document collection can be displayed in a structured presentation. In particular, the information content of an unstructured electronic document collection can be used not only to determine the values of attributes but also to identify, select, and name attributes and instances in a structured presentation. Such structured presentations can present information in a coherent manner to a user despite the diversity in sources. Examples of structured presentations include tables and other collections of records.
In general, one aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving a machine-readable search query from a user and responding to the search query with instructions for presenting the user with a structured presentation of instances relevant to the search query. A visual presentation of the structured presentation denotes associations between the instances and values that characterize attributes of the instances by virtue of an arrangement of identifiers of the instances and the values. The identifiers of the instances and the values are drawn from two or more documents in an unstructured collection of electronic documents. The electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
This and other aspects can include one or more of the following features. Responding to the search query can include identifying a first collection of electronic documents in the unstructured collection that relate to the instances, extracting values of the attributes of the instances from the first collection of electronic documents, and populating the structured presentation with the values extracted from two or more electronic documents. Responding to the search query can include extracting a first value of a first attribute of a first instance from a first electronic document, extracting a second value of a second attribute of the first instance from a second electronic document, and associating the first value and the second value with the first instance in a single record in the structured presentation. The first attribute can differ from the second attribute and the first electronic document can differ from the second electronic document. Responding to the search query can include extracting a first value of an attribute of a first instance from a first electronic document, extracting a second value of an attribute of a second instance from the first electronic document, associating the first value with the first instance in a first record, and associating the second value with the second instance in a second record. The first instance can differ from the second instance. The structured presentation can include a table and the records can include rows or columns of the table. The structured presentation can include a collection of cards and the records can be individual cards in the collection. The method can also include receiving a trigger for the addition of a new instance to the structured presentation and suggesting new instances for addition to the structured presentation in response to the trigger.
The method can also include receiving a specification of a constraint from a user, and suggesting new instances can include suggesting new instances that satisfy the user-specified constraint. The method can include receiving a trigger for the addition of a new attribute to the structured presentation and adding a new attribute to the structured presentation in response to the trigger. The method can also include receiving a user specification of a trait of the new attribute and populating the structured presentation with values of the attribute based on the user-specified trait. The unstructured electronic document collection can include electronic documents available on the Internet. The structured presentation can be physically presented on a display screen, including physically transforming one or more elements of the display screen. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs recorded on computer storage devices, each configured to perform the operations of the methods.
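The population step described above can be illustrated with a minimal sketch. It assumes that extraction has already produced (document, instance, attribute, value) tuples; the tuple layout, the sample data, and the function names `build_records` and `render_table` are hypothetical and are not taken from this specification.

```python
from collections import defaultdict

# Hypothetical extraction output: (document, instance, attribute, value).
# In practice these would be drawn from documents in an unstructured
# collection; they are hard-coded here purely for illustration.
EXTRACTED = [
    ("doc_a", "France", "capital", "Paris"),
    ("doc_b", "France", "currency", "Euro"),
    ("doc_a", "Japan", "capital", "Tokyo"),
]


def build_records(facts):
    """Group extracted values by instance so each instance forms one record."""
    records = defaultdict(dict)
    for _doc, instance, attribute, value in facts:
        records[instance][attribute] = value
    return dict(records)


def render_table(records, attributes):
    """Arrange identifiers and values as rows, one row per instance."""
    rows = [["instance"] + list(attributes)]
    for instance, values in records.items():
        # A missing value is left blank rather than guessed.
        rows.append([instance] + [values.get(a, "") for a in attributes])
    return rows


records = build_records(EXTRACTED)
table = render_table(records, ["capital", "currency"])
for row in table:
    print(" | ".join(row))
```

In this sketch the record for "France" combines values drawn from two different documents, while "doc_a" contributes values to two different records, mirroring the two extraction cases described above.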
Another aspect of the subject matter described in this specification can be embodied in an apparatus that includes one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations. The operations can include receiving description data describing a preexisting structured presentation, drawing an identifier of a first instance from a first web site, drawing a first value of a first attribute of the first instance from a second web site, adding the identifier of the first instance and the first value to the preexisting structured presentation to form a new record in a new structured presentation, and outputting instructions for visually presenting the new structured presentation. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
This and other aspects can include one or more of the following features. Drawing the identifier of the first instance from the first web site can include comparing characteristics of the preexisting structured presentation with content of the preexisting structured presentation. The operations can also include receiving an identifier of a second instance from the user. The new structured presentation can include a second new record that presents the second instance in association with a second value of the first attribute of the second instance. The operations can include receiving the second value from the user. A collection of candidate values can be presented to the user and a selection of a second value can be received from the user. The collection of candidate values can include the second value. A collection of candidate values of the first attribute of the second instance can be identified and, for each of the candidate values, a confidence that the candidate value is correct can be determined. The operations can include suggesting a collection of new instances to be added to the structured presentation. The collection of new instances can be suggested by comparing characteristics of the preexisting structured presentation with content of the first web site and the second web site and/or by comparing a machine-readable search query with content of the first web site and the second web site. Drawing the first value from the second web site can include identifying that the second web site includes a review, extracting the identifier directly from the first web site, or extracting the identifier from a machine-readable database that includes information extracted from the first web site. The preexisting structured presentation can include a table and the records can include rows or columns of the table. The preexisting structured presentation can include a collection of cards and the records can be individual cards in the collection.
The operations can include visually displaying the new structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Other embodiments of this aspect include corresponding systems, apparatus, and methods.
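The determination of a confidence for each candidate value, mentioned among the features above, is not tied to any particular scoring rule in this passage. The following sketch assumes one simple rule, namely the fraction of observations that agree on a value; that rule, the function name `candidate_confidences`, and the sample observations are all assumptions made for illustration.

```python
from collections import Counter


def candidate_confidences(observations):
    """Return (value, confidence) pairs sorted by descending confidence.

    Confidence is the fraction of observations that agree on a value.
    This scoring rule is an assumption, not a rule stated in the text.
    """
    if not observations:
        return []
    counts = Counter(observations)
    total = len(observations)
    scored = [(value, count / total) for value, count in counts.items()]
    # Sort by confidence, then by value so that ties are deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical values of one attribute of one instance, one per source.
observed = ["red", "blue", "red", "red"]
candidates = candidate_confidences(observed)
for value, confidence in candidates:
    print(f"{value}: {confidence:.2f}")
```

A list scored this way could be presented to the user as the collection of candidate values, with the user's selection received as the second value.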
In another aspect, a system includes a client device and one or more computers programmed to interact with the client device and to perform operations. The operations include receiving description data describing a preexisting structured presentation, drawing an identifier of a first instance from a first web site, drawing a first value of a first attribute of the first instance from a second web site, adding the identifier of the first instance and the first value to the preexisting structured presentation to form a new record in a new structured presentation, and outputting to the client device instructions for visually presenting the new structured presentation. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation includes a collection of records, each of which denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
This and other aspects can include one or more of the following features. The one or more computers can include a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client.
Other embodiments of this aspect include corresponding systems, apparatus, and methods.
In another aspect, a system includes a client device and one or more computers programmed to interact with the client device and to perform operations. The operations include receiving a machine-readable search query from the client device and responding to the search query by sending to the client device instructions for presenting a structured presentation of instances relevant to the search query. A visual presentation of the structured presentation denotes associations between the instances and values that characterize attributes of the instances by virtue of an arrangement of identifiers of the instances and the values. The identifiers of the instances and the values are drawn from two or more documents in an unstructured collection of electronic documents. The electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent. This and other aspects can include one or more of the following features. The one or more computers can include a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client.
Other embodiments of this aspect include corresponding systems, apparatus, and methods.
Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation, adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
This and other aspects can include one or more of the following features. Adding the identifier of the new instance can include formulating a collection of instance suggestions, providing the instance suggestion collection to a user, and receiving a user selection of the new instance, wherein the new instance is in the collection of instance suggestions. Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation. Formulating the collection of instance suggestions can include identifying a first document in the electronic document collection that includes an identifier of an instance identified in the preexisting structured presentation and that is arranged in accordance with a template, identifying a second document that is arranged in accordance with the template but relevant to a second instance, and including the second instance in the instance suggestion collection. 
Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include one or more of the following: identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation, locating the new instance in a stored collection of associations of instances with attributes, comparing the characteristics of the preexisting structured presentation with the attributes characterized in the preexisting structured presentation, comparing the attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents, comparing the value of attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents, and comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents. The collection of electronic documents can include the electronic documents available on the Internet. The electronic documents can include web pages. The expanded structured presentation can include a table or a collection of cards. The method can include visually displaying the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Other embodiments of this aspect include corresponding systems, apparatus, and computer programs recorded on computer storage devices, each configured to perform the operations of the methods.
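The template-based formulation of instance suggestions described above can be sketched as follows. The sketch assumes that template detection and instance recognition have already been performed upstream, so each document is represented by a small dictionary with hypothetical `template` and `instance` keys; the function name `suggest_instances` and the sample documents are likewise illustrative.

```python
def suggest_instances(existing_instances, documents):
    """Suggest instances from documents that share a template with a
    document about an instance already in the structured presentation."""
    existing = set(existing_instances)
    # Templates used by documents that identify an existing instance.
    templates = {d["template"] for d in documents if d["instance"] in existing}
    suggestions = []
    for d in documents:
        instance = d["instance"]
        if (
            d["template"] in templates
            and instance not in existing
            and instance not in suggestions
        ):
            suggestions.append(instance)
    return suggestions


# Hypothetical pre-analyzed documents; template labels are assumed inputs.
DOCS = [
    {"url": "example.com/a", "template": "t1", "instance": "Alpha"},
    {"url": "example.com/b", "template": "t1", "instance": "Beta"},
    {"url": "example.org/c", "template": "t2", "instance": "Gamma"},
]

suggested = suggest_instances(["Alpha"], DOCS)
print(suggested)
```

Here "Beta" is suggested because its document follows the same template as the document about "Alpha", whereas "Gamma" is not, since its document follows a different template.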
Another aspect of the subject matter described in this specification can be embodied in apparatuses that include one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations. The operations include formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, providing the instance suggestion collection to a user, receiving a user selection of a first instance in the collection of instance suggestions, and adding an identifier of the first instance suggestion to a structured presentation. The electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent. A visual presentation of the structured presentation visually presents information in an organized arrangement. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation.
This and other aspects can include one or more of the following features. Formulating the collection of instance suggestions can include one or more of the following: comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection; identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation; identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template, identifying a second document that is arranged in accordance with the template but relevant to a second instance, and including the second instance in the instance suggestion collection; identifying documents in the electronic document collection that include identifiers of one or more instances in the preexisting structured presentation; identifying additional attributes used to characterize instances in the preexisting structured presentation; comparing values of attributes used to characterize instances in the preexisting structured presentation with values of the instance suggestions; identifying a category of instances that includes instances in the preexisting structured presentation and formulating the collection of instance suggestions using instances in the category of instances; identifying the instance suggestions in a stored collection of associations of instances with attributes; and comparing the attributes characterized in the preexisting structured presentation with the content of the documents in the unstructured electronic document collection. The collection of electronic documents can include the documents available on the Internet. The electronic documents can include web pages. The structured presentation can include a table or a collection of cards.
Other embodiments of this aspect include corresponding systems, apparatus, and methods.
Another aspect of the subject matter described in this specification can be embodied in a system that includes a client device and one or more computers programmed to interact with the client device and to perform operations. The operations include receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation, adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation on a display device coupled in data communication with the client device. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
Other embodiments of this aspect include corresponding computer program products, apparatus, and methods. Another aspect of the subject matter described in this specification can be embodied in a system that includes a client device and one or more computers programmed to interact with the client device and to perform operations. The operations include formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, providing the instance suggestion collection to a user using the client device, receiving a user selection of a first instance in the collection of instance suggestions, and adding an identifier of the first instance suggestion to a structured presentation presented on a display device coupled in data communication with the client device, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement. The electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation. This and other aspects can include one or more of the following features. The one or more computers can include a server operable to interact with the client device through a data communication network. The client device can be operable to interact with the server as a client. The client device can include a personal computer running a web browser. The personal computer can include the display device. Other embodiments of this aspect include corresponding computer program products, apparatus, and methods.
Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation, adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
This and other aspects can include one or more of the following features. Adding the identifier of the new attribute can include formulating a collection of attribute suggestions, providing the attribute suggestion collection to a user, and receiving a user selection of the new attribute. The new attribute can be in the collection of attribute suggestions. Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation. Formulating the attribute suggestion collection can include identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template and adding an attribute used in the first document to characterize the instance to the attribute suggestion collection.
Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include one or more of the following: identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation; identifying the new attribute in a stored collection of associations of instances with attributes; comparing the instances characterized in the preexisting structured presentation with the content of the electronic documents; identifying additional instances related to the instances identified in the preexisting structured presentation; comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents; and comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents. The collection of electronic documents can include the electronic documents available on the Internet and the electronic documents can include web pages. The expanded structured presentation can include a table or a collection of cards. The method can include visually presenting the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
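One of the options listed above, identifying a new attribute in a stored collection of associations of instances with attributes, can be sketched as follows. The in-memory dictionary standing in for the stored collection, the ranking by how many instances an attribute characterizes, and the function name `suggest_attributes` are assumptions made for illustration only.

```python
from collections import Counter


def suggest_attributes(instances, current_attributes, associations):
    """Rank attributes not yet in the presentation by how many of the
    presentation's instances they characterize in the stored collection."""
    present = set(current_attributes)
    counts = Counter()
    for instance in instances:
        for attribute in associations.get(instance, {}):
            if attribute not in present:
                counts[attribute] += 1
    # Most widely shared attributes first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [attribute for attribute, _count in ranked]


# Hypothetical stored associations of instances with attributes and values.
ASSOCIATIONS = {
    "Alpha": {"height": "10", "weight": "3", "color": "red"},
    "Beta": {"height": "12", "color": "blue"},
}

attribute_suggestions = suggest_attributes(
    ["Alpha", "Beta"], ["height"], ASSOCIATIONS
)
print(attribute_suggestions)
```

Ranking by coverage is only one plausible design choice; it favors attributes for which values are likely to be available for most records already in the presentation.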
Another aspect of the subject matter described in this specification can be embodied in apparatus that include one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations. The operations can include formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, providing the attribute suggestion collection to a user, receiving a user selection of a first attribute in the collection of attribute suggestions, and adding an identifier of the first attribute suggestion to a structured presentation. The electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent. A visual presentation of the structured presentation visually presents information in an organized arrangement. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation. This and other aspects can include one or more of the following features.
Formulating the collection of attribute suggestions can include one or more of the following: comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection; identifying documents in the electronic document collection that include structured components that characterize instances identified in the preexisting structured presentation; identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template and including an attribute used to characterize the instance in the attribute suggestion collection; and identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation. Comparing the characteristics of the preexisting structured presentation with the content of the electronic documents can include one or more of the following: comparing instances identified in the preexisting structured presentation with the content of the electronic documents; and comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents. Formulating the collection of attribute suggestions can include identifying a category of instances that includes instances in the preexisting structured presentation and formulating the collection of attribute suggestions from attributes used to characterize instances in the category of instances. The collection of attribute suggestions can also be formulated by identifying the attribute suggestions in a stored collection of associations of instances with attributes. The collection of electronic documents can include electronic documents available on the Internet and the electronic documents can include web pages. 
The structured presentation can include a table or a collection of cards. The operations can also include visually presenting the structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
Another aspect of the subject matter described in this specification can be embodied in a system that includes a client device comprising a display screen, and one or more computers programmed to interact with the client device and to perform operations. The operations include receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation, adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation, and outputting instructions for presenting the expanded structured presentation on the display screen. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
Other embodiments of this aspect include corresponding computer program products, apparatus, and methods.
Another aspect of the subject matter described in this specification can be embodied in a system that includes a client device comprising a display screen, and one or more computers programmed to interact with the client device and to perform operations. The operations include formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, providing the attribute suggestion collection to the client device, receiving a selection of a first attribute in the collection of attribute suggestions from the client device, and adding an identifier of the first attribute suggestion to a structured presentation presented on the display screen. The electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent. A visual presentation of the structured presentation visually presents information in an organized arrangement. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation.
Other embodiments of this aspect include corresponding computer program products, apparatus, and methods.
Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation, adding the new value to the preexisting structured presentation to form a new structured presentation, and outputting instructions for visually presenting the new structured presentation. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation.
This and other aspects can include one or more of the following features. An identifier of a first instance that appears in the structured presentation can be located in a first electronic document, and the new value can be extracted from the first electronic document. The adding of the new value can include identifying a collection of values of a first attribute of a first instance and establishing a subset of one or more of the identified values as suitably characterizing the first attribute of the first instance. Establishing the subset of values as suitable can include one or more of the following: grouping the values in the collection into groups; selecting the subset based at least in part on a count of values in the subset; selecting the subset based at least in part on values in the subset meeting a user-specified constraint; selecting the subset based at least in part on a value in the subset being drawn from a high quality document; selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another instance in the preexisting structured presentation; and selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another attribute in the preexisting structured presentation. The collection of electronic documents can be the Internet and the electronic documents can be web pages. The preexisting structured presentation can include a table or a collection of cards. The method can include visually presenting the new structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
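Two of the options listed above, grouping the values in the collection and selecting a subset based on a count of values, can be combined in a short sketch. The normalization used to decide that two extracted values belong to the same group is a hypothetical choice, as are the function names `normalize` and `select_values` and the sample values.

```python
from collections import defaultdict


def normalize(value):
    """Hypothetical grouping key: case-fold and keep only letters and digits."""
    return "".join(ch for ch in value.casefold() if ch.isalnum())


def select_values(values):
    """Group extracted values and return the largest group as the subset
    taken to suitably characterize the attribute."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize(value)].append(value)
    if not groups:
        return []
    # max() returns the first largest group in insertion order on ties.
    return max(groups.values(), key=len)


# Hypothetical values of one attribute extracted from several documents.
chosen = select_values(["New York", "new york", "NYC", "New-York"])
print(chosen)
```

The three spellings that normalize to the same key form the largest group and are selected, while the single outlying value is left out.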
Another aspect of the subject matter described in this specification can be embodied in an apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations. The operations can include receiving description data describing a first instance, a second instance, and a first attribute, extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection, extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection, establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance, establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance, and generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset. The structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values.
This and other aspects can include one or more of the following features. The first subset of values can be established as suitable by grouping the values in the first collection into groups, wherein each group includes a subset of the first collection of values. The first subset of values can be established as suitable by selecting the first subset based at least in part on a count of values in the first subset. The first subset of values can be established as suitable by comparing the values in the first subset with a user-specified constraint on the values. The first subset of values can be established as suitable by determining that a value in the first subset is drawn from a high quality document. The first subset of values can be established as suitable by determining that a value in the first subset is drawn from a document relevant to the second instance. The first subset of values can be established as suitable by determining that a value in the first subset is drawn from a document relevant to another attribute that characterizes both the first instance and the second instance. The description of the first instance can include an identifier of the first instance that appears in a preexisting structured presentation. The description of the second instance can include an identifier of the second instance that appears in the preexisting structured presentation. The description of the first attribute can include a description of a new attribute that is to be added to a preexisting structured presentation. The unstructured electronic document collection can include electronic documents available on the Internet. The structured presentation can be a table or a collection of cards. The structured presentation can be visually presented on a display screen, including physically transforming one or more elements of the display screen. Other embodiments of this aspect include corresponding systems, apparatus, and methods.
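The use of a user-specified constraint and of document quality when establishing a subset of values can be sketched as a pair of filters. The numeric quality score in the range 0 to 1, the threshold parameter `min_quality`, the function name `establish_subset`, and the sample candidates are hypothetical; this passage does not define how document quality is measured.

```python
def establish_subset(candidates, constraint=None, min_quality=0.0):
    """Keep values that satisfy an optional user-specified constraint and
    that were drawn from a document meeting a quality threshold.

    `candidates` is a list of (value, quality) pairs, where quality is an
    assumed score in [0, 1] for the document the value was drawn from.
    """
    kept = []
    for value, quality in candidates:
        if constraint is not None and not constraint(value):
            continue
        if quality < min_quality:
            continue
        kept.append(value)
    return kept


# Hypothetical candidate values paired with source-document quality scores.
CANDIDATES = [(1999, 0.9), (2050, 0.95), (1998, 0.2), (1999, 0.6)]

subset = establish_subset(
    CANDIDATES,
    constraint=lambda year: year <= 2024,  # e.g. "not in the future"
    min_quality=0.5,
)
print(subset)
```

The value 2050 is removed by the constraint and 1998 by the quality threshold, leaving the two agreeing observations of 1999.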
Another aspect of the subject matter described in this specification can be embodied in a system that includes a device and one or more computers programmed to interact with the device and to perform operations. The operations include receiving description data describing a preexisting structured presentation, comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation, adding the new value to the preexisting structured presentation to form a new structured presentation, and outputting instructions for visually presenting the new structured presentation to the device. A visual presentation of the preexisting structured presentation visually presents information in a systematic arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation. Other embodiments of this aspect include corresponding computer program products, apparatus, and methods.
Another aspect of the subject matter described in this specification can be embodied in a system that includes a device and one or more computers programmed to interact with the device and to perform operations. The operations include receiving description data describing a first instance, a second instance, and a first attribute, extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection, extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection, establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance, establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance, generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset, wherein the structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values, and sending the machine-readable instructions to the device.
Other embodiments of this aspect include corresponding computer program products, apparatus, and methods.
Another aspect of the subject matter described in this specification can be embodied in machine-implemented methods that include the actions of displaying a structured presentation on a display device, receiving data characterizing a user interaction with the displayed structured presentation, the data including a specification of a first instance and a first attribute of the structured presentation, and displaying a formerly concealed search interface on the display device in response to receiving the data. The structured presentation visually presents information in a systematic and structured arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation. The search interface includes information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
This and other aspects include one or more of the following features. Receiving the data characterizing the user interaction with the displayed structured presentation can include receiving a manual user specification of the first instance and the first attribute that are associated with a cell in the structured presentation or receiving data characterizing a user interaction with a cell in the structured presentation. The cell can be associated with the first instance and the first attribute by virtue of the arrangement of the cell relative to identifiers of the first instance and the first attribute in the structured presentation. Receiving data characterizing the user interaction with the cell can also include receiving data characterizing the user interaction with an empty cell. Displaying the formerly concealed search interface can include one or more of the following: displaying an interactive element that can be selected by a user to trigger a search of the electronic document collection to locate the first value; displaying an interactive value entry element that can be selected by a user to specify a value characterizing the first attribute of the first instance; displaying a snippet characterizing a context of the first value in a first document of the electronic document collection; and displaying a result of a prior search of the electronic document collection to locate the first value. The first value can appear in the structured presentation as a value characterizing the first attribute of the first instance. Displaying the formerly concealed search interface can also include displaying an identifier of a first electronic document in the electronic document collection, wherein the first value is drawn from the first electronic document. The method can also include determining that the first electronic document is inoperative to provide the first value and displaying a visual indication of the inoperativeness of the first document. 
The user can be presented with an option to select the first value consistently from a first document regardless of changes in relevancy of the first document to the first instance and the first attribute or with an option to select the first value from a first document that is most relevant to the first instance and the first attribute. The method can also include searching an unstructured collection of electronic documents to locate the first value in response to a user interaction with the search interface and adding the first value to the structured presentation. Receiving the specification of the first instance and the first attribute can include receiving a specification of a collection of attributes or a collection of instances. The method can also include updating the display of the structured presentation in response to a passage of time. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
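The two selection options described above, drawing the value consistently from one document or drawing it from whichever document is currently most relevant, can be sketched as follows. This is an illustrative assumption rather than the described implementation. The Candidate record, its relevance score, and the choose_value function are hypothetical names, and returning None stands in for the case where the chosen document is inoperative to provide the value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    document: str     # hypothetical identifier of the source document
    value: str        # value of the attribute found in that document
    relevance: float  # hypothetical relevance score for the instance and attribute


def choose_value(candidates: List[Candidate], pinned_document: Optional[str] = None) -> Optional[str]:
    """Pick a value either from a pinned document or from the most relevant one."""
    if pinned_document is not None:
        # Option 1: always use the same document, regardless of its relevance.
        for candidate in candidates:
            if candidate.document == pinned_document:
                return candidate.value
        return None  # the pinned document no longer provides the value
    if not candidates:
        return None
    # Option 2: use whichever document currently scores as most relevant.
    return max(candidates, key=lambda c: c.relevance).value


candidates = [
    Candidate("doc-a", "5.1 m", 0.4),
    Candidate("doc-b", "5.0 m", 0.9),
]
pinned = choose_value(candidates, pinned_document="doc-a")
most_relevant = choose_value(candidates)
```

A None result in the pinned case is where a visual indication of the inoperativeness of the document could be displayed.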
Another aspect of the subject matter described in this specification can be embodied in a system that includes one or more computers programmed to interact with client devices and to perform operations. The operations include receiving data characterizing user interaction specifying a first cell of a structured presentation displayed on a display device, determining that a prior search has been conducted to populate the first cell with a first value, and, in response to determining that a prior search was conducted, displaying information characterizing the prior search on the display device. The structured presentation visually presents information in a systematic and structured arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of the values in cells.
This and other aspects include one or more of the following features. The data characterizing user interaction specifying the first cell can include a manual user specification of the first instance and the first attribute that are associated with the first cell. The information characterizing the prior search can include information identifying an electronic document from which the first value is drawn. The information characterizing the prior search can include one or more of the following: a collection of electronic documents from which the first value could have been drawn; information identifying a first electronic document in the electronic document collection from which the first value is drawn; and a snippet characterizing a context of the first value in a first document of the electronic document collection. The information characterizing the prior search can be displayed, e.g., in a display element of a formerly concealed search interface. The operations can also include determining that the first electronic document is inoperable to provide the first value and displaying a visual indication of the inoperability of the first document. The operations can also include updating a display of a value in the first cell of the structured presentation in response to the user interaction. The collection of electronic documents can include electronic documents available on the Internet. The electronic documents can include web pages. The structured presentation can be a collection of cards.
Other embodiments of this aspect include corresponding methods, apparatus, and computer program products.
Another aspect of the subject matter described in this specification can be embodied in a system that includes one or more computers programmed to interact with a client device comprising a display device and to perform operations. The operations include displaying a structured presentation on the display device, receiving data characterizing a user interaction with the displayed structured presentation, and displaying a formerly concealed search interface on the display device in response to receiving the data. The structured presentation visually presents information in a systematic and structured arrangement that conforms with a structured design. The structured presentation denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation. The data includes a specification of a first instance and a first attribute of the structured presentation.
The search interface includes information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
Other embodiments of this aspect include corresponding methods, apparatus, and computer program products. The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic representation of a system in which information from an electronic document collection is presented to a user in a structured presentation.
FIG. 2 is a schematic representation of an implementation of another system in which information from an electronic document collection is presented to a user in a structured presentation.
FIGS. 3, 4, and 5 are schematic representations of example structured presentations.
FIG. 6 is a flow chart of an example process for presenting information from an electronic document collection to a user in a structured presentation.
FIGS. 7 and 8 are flow charts of example processes for identifying two or more relevant documents in an electronic document collection.
FIG. 9 is a flow chart of a process for suggesting and/or adding new instances to a structured presentation.
FIG. 10 is a schematic representation of a user interface component for receiving user input specifying modifications of a structured presentation.
FIG. 11 is a schematic representation of a user interface component for receiving user input specifying a technique for adding new instances to a structured presentation.
FIG. 12 is a schematic representation of a user interface component for receiving user input specifying constraints that are to be used in the user-specified constraint option for adding new instances to a structured presentation.
FIG. 13 is a flow chart of an example process for adding new attributes to a structured presentation.
FIG. 14 is a schematic representation of a user interface component for adding new attributes to a structured presentation.
FIG. 15 is a flow chart of an example process for adding new attribute values to a structured presentation.
FIG. 16 is a flow chart of an example process for adding new attribute values to a structured presentation.
FIG. 17 is a schematic representation of a user interface component for selecting a candidate value to be added to a structured presentation.
FIG. 18 is a schematic representation of a structured presentation that includes highlights of deficiencies in the attribute values presented therein.
FIG. 19 is a schematic representation of a user interface component for selecting a candidate attribute to be added to a structured presentation.
FIG. 20 is a schematic representation of a user interface component for selecting a candidate instance to be added to a structured presentation.
FIG. 21 is a schematic representation of a process by which new instances can be added to expand a preexisting structured presentation.
FIG. 22 is a flow chart of an example process for adding instances to a structured presentation based on the content of documents in an electronic document collection.
FIG. 23 is a flow chart of an example process for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 24 is a representation of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 25 is a flow chart of an example process for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 26 is a representation of a portion of a hypertext markup language template that is used as a pattern for descriptions of a movie.
FIG. 27 is a schematic representation of a process by which a collection of new instance suggestions can be formulated based on information in a preexisting structured presentation.
FIG. 28 is a schematic representation of a table that associates attributes and instances in an electronic document collection.
FIG. 29 is a flow chart of a process for formulating instance suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation.
FIG. 30 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
FIG. 31 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
FIG. 32 is a schematic representation of a table that associates attributes, instances, and their values in a data collection.
FIG. 33 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
FIG. 34 is a representation of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 35 is a schematic representation of a collection of processes that can be used to formulate a collection of new instance suggestions based on information in a preexisting structured presentation.
FIG. 36 is a flow chart of a process for formulating a collection of new instance suggestions based on information in a preexisting structured presentation.
FIG. 37 is a schematic representation of a process by which new attributes can be added to expand a preexisting structured presentation.
FIG. 38 is a flow chart of an example process for adding attributes to a structured presentation based on the content of documents in an electronic document collection.
FIG. 39 is a flow chart of an example process for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 40 is a representation of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 41 is a flow chart of an example process for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 42 is a representation of a portion of a hypertext markup language (HTML) template that is used as a pattern for descriptions of a movie.
FIG. 43 is a schematic representation of a process by which a collection of new attribute suggestions can be formulated based on information in a preexisting structured presentation.
FIG. 44 is a schematic representation of a table that associates attributes and instances in an electronic document collection.
FIG. 45 is a flow chart of a process for formulating attribute suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation.
FIG. 46 is a flow chart of a process for formulating a collection of new attribute suggestions based on information in a preexisting structured presentation.
FIG. 47 is a flow chart of a process for identifying related instances for use in formulating attribute suggestions based on information in a preexisting structured presentation.
FIG. 48 is a flow chart of a process for formulating a collection of new attribute suggestions based on information in a preexisting structured presentation.
FIG. 49 is a representation of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation.
FIG. 50 is a schematic representation of a collection of processes that can be used to formulate a collection of new attribute suggestions based on information in a preexisting structured presentation.
FIG. 51 is a flow chart of a process for formulating a collection of new attribute suggestions based on information in a preexisting structured presentation.
FIG. 52 is a schematic representation of a system in which attribute values drawn from two or more electronic documents in an electronic document collection are presented to a user in a structured presentation.
FIG. 53 is a schematic representation of an implementation of a system in which attribute values drawn from two or more electronic documents in an electronic document collection are presented to a user in a structured presentation.
FIG. 54 is a schematic representation of a table that can associate attributes, values, and instances in an electronic document collection.
FIG. 55 is a flow chart of an example process for presenting attribute values drawn from two or more electronic documents in an electronic document collection to a user in a structured presentation.
FIG. 56 is a flow chart of a process for selecting one or more values for presentation in a structured presentation.
FIG. 57 is a flow chart of a process for selecting one or more values for presentation in a structured presentation.
FIG. 58 is a flow chart of an example process for selecting one or more values for presentation in a structured presentation.
FIG. 59 is a schematic representation of a circumstance in which attribute values drawn from electronic documents in an electronic document collection are presented to a user in a structured presentation.
FIG. 60 is a schematic representation of a process in which both attributes and attribute values are drawn from electronic documents in an electronic document collection and presented to a user in a structured presentation.
FIG. 61 is a flow chart of a process for adding values to a structured presentation based on the content of documents in an electronic document collection.
FIGS. 62, 63, and 64 are schematic representations of structured presentations in which a search interface is concealed.
FIGS. 65, 66, 67, 68, 69, and 70 illustrate display elements in which formerly concealed search interfaces are presented.
FIG. 71 is a flow chart of a process for adding values to a structured presentation by drawing the values from the content of documents in an electronic document collection.
FIGS. 72, 73, and 74 illustrate display elements in which formerly concealed search interfaces are presented.
FIG. 75 is a flow chart of a process for adding values to a structured presentation based on the content of documents in an electronic document collection.
Like reference symbols in the various drawings indicate like elements.
DETAILED DESCRIPTION
FIG. 1 is a schematic representation of a system 100 in which information from an unstructured electronic document collection 102 is presented to a user in a structured presentation 106. In addition to electronic document collection 102, system 100 includes a display screen 104 and a data communication infrastructure 108. In operation, system 100 extracts information from unstructured collection of electronic documents 102 and presents the extracted information in a structured presentation 106 on display screen 104.
Electronic document collection 102 is unstructured in that the organization of information within individual documents in electronic document collection 102 need not conform with a predetermined structure that can be exploited in the extraction of information. For example, consider three electronic documents in electronic document collection 102, namely, electronic documents 110, 112, 114. Documents 110, 112, 114 were added to collection 102 by three different users who organize the content of their respective electronic documents differently. The users need not collaborate to ensure that information within documents 110, 112, 114 is in a particular format. Moreover, if one user wishes to change the format of document 110, the user can do so without regard for the format of the documents added by the other users. There is no need for the user to inform the other users of the change. Indeed, in some cases, documents can be added to collection 102 by entities who not only fail to collaborate but who are also competitors who are adverse to one another, such as three different car manufacturers or three different sellers of digital cameras.
Regardless of the particular alignment of the entities who add documents to collection 102, there is no formal mechanism for ensuring that the information in documents is similarly organized within the documents. Further, there is no formal mechanism for ensuring that the organization of information in each document in collection 102 remains unchanged. In contrast, structured presentation 106 is structured and presents information drawn from documents in collection 102 in an organized, systematic arrangement. Thus, the grouping, segmentation, and arrangement of information in structured presentation 106 conforms with a structured design even when the information therein is drawn from different contexts in a diverse set of documents in collection 102. Further, changes to one aspect of the design of structured presentation 106 can be propagated throughout structured presentation 106.
Examples of structured presentations include spreadsheet tables, collections of cards or other records, and other structured presentation formats. Such structured presentations can conform with rules that specify the spatial arrangement of information in the displays, the positioning and identification of various organizational and informational aspects (e.g., column headers, row headers, unit identifiers, and the like) of the structured presentations, the graphical representation of values, and other characteristics.
The structuring of information in structured presentations generally facilitates the understanding of the information by a viewer. For example, a viewer can discern the nature of the information contained within the structured presentation by reading headers. A viewer can easily identify and compare values described in the structured presentation based on the arrangement and positioning of those values in the display. For example, a user can easily ascertain that certain values in a structured presentation all relate to attributes (i.e., characteristics) of different cars and can easily compare those values.
System 100 is not limited to merely populating structured presentation 106 with values drawn from documents in collection 102. Instead, in many implementations, system 100 can determine entities (i.e., "instances") that are to be described in structured presentation 106, values that characterize the attributes of those instances, as well as an appropriate structuring of structured presentation 106. Such determinations can be based on information drawn from different documents in collection 102 that are not restricted to having a specific format, a permanent format, or both. For example, the attributes that appear in structured presentation 106 can be based on the attributes used in documents in collection 102 to characterize certain instances, as discussed further below. As another example, the units of the values (e.g., meters, feet, inches, miles) that appear in structured presentation 106 can be based on the units of the values that appear in documents in collection 102. As another example, the instances that appear in structured presentation 106 can be determined based on collections of instances that appear in documents in collection 102.
Further, in many implementations, such information can be drawn from previously unspecified documents in collection 102. For example, a search query can be used to identify documents in collection 102 and the information can be drawn from these documents. There need not be preexisting limits on the identity or type of documents from which information can be drawn. For example, the identified documents need not be limited to being associated with the account of a particular individual or originating from a particular retailer. Instead, the information can be drawn from previously unspecified documents.
System 100 can thus exploit the diverse information content of documents in collection 102 in a variety of different ways to present a structured presentation to a user. In cases where electronic document collection 102 includes a large number of documents, the amount of information that can be exploited can be very large. Moreover, in many cases, this can be done automatically or with a relatively small amount of human interaction, as discussed further below.
FIG. 2 is a schematic representation of an implementation of a system 200 in which information from an unstructured electronic document collection 102 is presented to a user in a structured presentation 106. In system 200, the data communication infrastructure 108 interconnects electronic document collection 102, display screen 104, and a collection of data storage and processing elements, including a search engine 202, a crawler 204, a data center 208, and document compressing, indexing and ranking modules 210.
Search engine 202 is programmed with one or more sets of machine-readable instructions for searching unstructured electronic document collection 102. Search engine 202 can be implemented on one or more computers deployed at one or more geographical locations.
Crawler 204 is programmed with one or more sets of machine-readable instructions for crawling unstructured electronic document collection 102. Crawler 204 can be implemented on one or more computers deployed at one or more geographical locations.
Compressing, indexing, and ranking modules 210 are programmed with one or more sets of machine-readable instructions for compressing, indexing, and ranking documents in collection 102. Compressing, indexing, and ranking modules 210 can be implemented on one or more computers deployed at one or more geographical locations.
The data center 208 stores information characterizing electronic documents in electronic document collection 102. The information characterizing such electronic documents can be stored in the form of an indexed database that includes indexed keywords and the locations of documents in collection 102 where the keywords can be found. The indexed database can be formed, e.g., by crawler 204. In some implementations, the information stored in data center 208 can itself be organized to facilitate presentation of structured presentation 106 to a user. For example, information can be organized by crawler 204 and compressing, indexing and ranking modules 210 in anticipation of the need to present structured presentations 106 that are relevant to certain topics. The structure of information in data center 208 can facilitate the grouping, segmentation, and arrangement of information in structured presentations 106. This organization can be based on a variety of different factors. For example, an ontology can be used to organize information stored in data center 208. As another example, a historical record of previous structured presentations 106 can be used to organize information stored in data center 208. As another example, the data tables described herein can be used to organize information stored in data center 208.
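An indexed database of keywords and document locations, as described above, is commonly realized as an inverted index. The following is a minimal sketch under that assumption; it is not drawn from the specification. The function name build_keyword_index, the whitespace tokenization, and the example locations and contents are hypothetical.

```python
from collections import defaultdict


def build_keyword_index(documents):
    """Map each keyword to the set of document locations where it can be found.

    documents: dict of location -> text. Tokenization here is a simple
    lowercase whitespace split with punctuation trimmed, an assumption
    made only for illustration.
    """
    index = defaultdict(set)
    for location, text in documents.items():
        for token in text.lower().split():
            keyword = token.strip(".,;:!?()\"'")
            if keyword:
                index[keyword].add(location)
    return dict(index)


# Hypothetical document locations and contents.
documents = {
    "http://example.com/a": "Digital camera with optical zoom.",
    "http://example.com/b": "Compact camera, no zoom",
}
index = build_keyword_index(documents)
```

Each entry of the resulting index lists the locations of documents that contain a given keyword, which is the information a search engine needs to locate documents for a query.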
As shown, system 200 includes multiple display screens 104 that can present structured presentations in accordance with machine-readable instructions. Display screens 104 can include, e.g., cathode ray tubes (CRT's), light emitting diode (LED) screens, liquid crystal displays (LCD's), gas-plasma displays, and the like. Display screens 104 can be an integral part of a self-contained data processing system, e.g., a personal data assistant (PDA) 215, a desktop computer 217, or a mobile telephone. In general, instructions for presenting structured presentations are modified to the particularities of a display screen 104 after receipt by such a self-contained data processing system. However, this is not always the case. For example, display screens 104 can also be part of more dispersed systems where the processing of instructions for presenting a structured presentation is completed before the instructions are received at display screen 104. For example, display screens 104 can be incorporated into "dumb" devices, e.g., television sets or computer monitors, that receive instructions for presenting structured presentation 106 from a local or remote source.
In operation, system 200 can transform the unstructured information in collection 102 into structured presentation 106 that is presented to a viewer. Such transformations can be performed in the context of web search in which a search engine receives and responds to information requests based on information extracted from the electronic documents in collection 102. For example, personal data assistant (PDA) 215 or desktop computer 217 can interact with a user and thereby receive a search query, e.g., by way of a web browser application. A description 212 of the query can be transmitted over a wireless data link 219 and/or a wired data link 221 to search engine 202.
In response, search engine 202 can use query description 212 to identify information in data center 208 that can be used in presenting structured presentation 106 on display screen 104. The identified information can be drawn from two or more unspecified electronic documents in unstructured electronic document collection 102. In some instances, query description 212 can include search terms that are used by search engine 202 to retrieve information for presenting a structured presentation 106 to a user. For example, search terms in query description 212 can be used to identify, in data center 208, a collection of related instances, attributes that characterize such instances, values that characterize the individual instances, and/or other aspects of structured presentation 106. The search engine 202 can also generate a response 214 to query description 212. The response 214 can be used to present structured presentation 106 for a user. In general, response 214 includes machine-readable instructions that can be interpreted by a data processing device in systems 215, 217 to present structured presentation 106. For example, response 214 can be coded in HTML to specify the characteristics and content of structured presentation 106. In other implementations, response 214 can include text snippets or other information from data center 208 that is used in presenting structured presentation 106. For example, response 214 can include a collection of values, the name of a new attribute, or an estimate of the likelihood that a value to be displayed in structured presentation 106 is correct, as discussed further below.
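As one illustration of a response coded in HTML, the sketch below renders a set of instances, attributes, and values as an HTML table. It is an assumption about one possible encoding, not the format used by search engine 202. The function render_table and the sample data are hypothetical; html.escape is part of the Python standard library.

```python
from html import escape


def render_table(attributes, rows):
    """Render instances (rows) and attributes (columns) as an HTML table string.

    attributes: list of attribute names, used as column headers.
    rows: dict of instance identifier -> dict of attribute -> value.
    Missing values are rendered as empty cells.
    """
    header = "".join(f"<th>{escape(name)}</th>" for name in ["Instance", *attributes])
    body = []
    for instance, values in rows.items():
        cells = "".join(f"<td>{escape(values.get(name, ''))}</td>" for name in attributes)
        body.append(f"<tr><th>{escape(instance)}</th>{cells}</tr>")
    return f"<table><tr>{header}</tr>{''.join(body)}</table>"


# Hypothetical instance and attribute data.
markup = render_table(["Maker", "Year"], {"Model X": {"Maker": "Acme"}})
```

Escaping each identifier and value keeps text drawn from arbitrary documents from being interpreted as markup by the receiving device.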
In many cases, system 200 uses the information stored in data center 208 to identify the location of one or more documents that are relevant to the query described in query description 212. For example, search engine 202 can compare the keywords in query description 212 to an index of keywords stored in data center 208. The comparison can be used to identify documents in collection 102 that are relevant to query description 212. The locations of such identified documents can be included in responses 214, e.g., as a hyperlink to the documents that are responsive to the described query. In some implementations, the system 200 can store attributes and/or their respective values in a manner that facilitates the grouping, segmentation, and arrangement of information in structured presentations 106. For example, collections of instances, their attributes, and their values can be stored in data center 208 as structured presentations 106 are amended and changed by users interacting with client systems such as systems 215, 217. For example, instances, attributes, and values in one structured presentation 106 presented to a first viewer can be stored in the data center 208 and used in providing subsequent structured presentations 106 to other viewers.
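The keyword comparison described above can be sketched as a lookup in an inverted index. The example below is an assumption-laden illustration: the function names, the whitespace tokenization, the rule that a document must contain every query keyword, and the example.com locations are all hypothetical rather than taken from the description.

```python
from collections import defaultdict

def build_keyword_index(documents):
    """Map each lowercased keyword to the set of document locations containing it."""
    index = defaultdict(set)
    for location, text in documents.items():
        for word in text.lower().split():
            index[word].add(location)
    return index

def find_responsive(index, query):
    """Return locations of documents that contain every keyword in the query."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    hits = [index.get(k, set()) for k in keywords]
    return set.intersection(*hits)

# Hypothetical document collection keyed by location.
docs = {
    "http://example.com/a": "hybrid vehicles use batteries",
    "http://example.com/b": "diesel vehicles",
}
keyword_index = build_keyword_index(docs)
responsive = find_responsive(keyword_index, "hybrid vehicles")
```

The returned locations are what a response could embed as hyperlinks to the responsive documents.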
FIG. 3 is a schematic representation of an example structured presentation 106, namely, one that includes a table 300. Table 300 is an organized, systematic arrangement of one or more identifiers of instances, as well as the values of particular attributes of those instances. Instances are individually identifiable entities and generally share at least some common attributes. An attribute is a property, feature, or characteristic of an entity. For example, Tom, Dick, and Harry are instances of individuals. Each such individual has attributes such as a name, a height, a weight, and the like. As another example, city instances each have a geographic location, a mayor, and a population. As yet another example, a product instance can have a model name, a maker, and a year.
The attributes of an instance can be characterized by values. The values of a particular attribute of a particular instance thus characterize that particular instance. For example, the name of an individual can have the value "Tom," the population of a city can have the value "4 million," and the model name of a product can have the value "Wrangler." In some implementations, structured presentations such as table 300 can also include identifiers of attributes, as well as identifiers of the units in which values are expressed. The grouping, segmentation, and arrangement of information in table 300 can be selected to facilitate understanding of the information by a user. In this regard, table 300 includes a collection of rows 302. Each row 302 includes an instance identifier 306 and a collection of associated attribute values 307. The arrangement and positioning of attribute values 307 and instance identifiers 306 in rows 302 thus graphically represents the associations between them. For example, a user can discern the association between attribute values 307 and the instance identifier 306 that is found in the same row 302. Table 300 also includes a collection of columns 304. Each column 304 includes an attribute identifier 308 and a collection of associated attribute values 307. The arrangement and positioning of attribute values 307 and attribute identifier 308 in columns 304 thus graphically represent the associations between them. For example, a user can discern the association between attribute values 307 and the attribute identifier 308 that is found in the same column 304 based on their alignment.
Each row 302 is a structured record 310 in that each row 302 associates a single instance identifier 306 with a collection of associated attribute values 307. Further, the arrangement and positioning used to denote these associations in one structured record 310 is reproduced in other structured records 310 (i.e., in other rows 302). Indeed, in many cases, all of the structured records 310 in a structured presentation 106 are restricted to having the same arrangement and positioning of information. For example, values 307 of the attribute "ATTR_2" are restricted to appearing in the same column 304 in all rows 302. As another example, attribute identifiers 308 all bear the same spatial relationship to the values 307 appearing in the same column 304. Moreover, changes to the arrangement and positioning of information in one structured record 310 are generally propagated to other structured records 310 in the structured presentation 106. For example, if a new attribute value 307 that characterizes a new attribute (e.g., "ATTR_2¾") is added to one structured record 310, then a new column 304 is added to structured presentation 106 so that the values of attribute "ATTR_2¾" of all instances can be added to structured presentation 106.
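The propagation of a new attribute to every structured record can be sketched with a small data structure. The class `StructuredTable`, its method names, and the attribute name "ATTR_NEW" are hypothetical; this is one possible arrangement under the assumption that all rows share a single ordered list of attributes.

```python
class StructuredTable:
    """Rows that all share the same ordered list of attribute columns."""

    def __init__(self, attributes):
        self.attributes = list(attributes)  # column order shared by all rows
        self.rows = {}                      # instance identifier -> list of values

    def add_instance(self, instance_id):
        # A new record starts with an empty cell for every existing column.
        self.rows[instance_id] = [None] * len(self.attributes)

    def set_value(self, instance_id, attribute, value):
        if attribute not in self.attributes:
            # New attribute: add the column to every record, not just this one.
            self.attributes.append(attribute)
            for values in self.rows.values():
                values.append(None)
        self.rows[instance_id][self.attributes.index(attribute)] = value

table = StructuredTable(["ATTR_1", "ATTR_2"])
table.add_instance("Tom")
table.add_instance("Dick")
table.set_value("Tom", "ATTR_NEW", "x")  # also adds an empty cell to Dick's row
```

Because values are stored by column position, every record keeps the same arrangement after the new column is added.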
In some implementations, values 307 in table 300 can be presented in certain units of measure. Examples of units of measure include feet, yards, inches, miles, seconds, gallons, liters, degrees Celsius, and the like. In some instances, the units of measure in which values 307 are presented are indicated by unit identifiers 309. Unit identifiers 309 can appear, e.g., beside values 307 and/or beside relevant attribute identifiers 308. The association between unit identifiers 309 and the values 307 whose units of measure are indicated is indicated to a viewer by such positioning. In many cases, all of the values 307 associated with a single attribute (e.g., all of the values 307 in a single column 304) are restricted to being presented in the same unit of measure.
The information extracted from electronic document collection 102 by systems 100, 200 can impact the presentation of table 300 to a user in a variety of different ways. For example, the information extracted from electronic document collection 102 can be used to determine values 307 for populating table 300. As another example, the information extracted from electronic document collection 102 can be used to suggest new attributes and/or new instances for addition to table 300.
In some implementations, instance identifiers 306 can be selected based on one or more search strings. For example, if the search string "hybrid vehicles" is received from a user by search engine 202, systems such as system 200 can generate and populate table 300 based on information extracted from electronic document collection 102 using the search string. For example, system 200 can access data center 208, identify instance identifiers 306 in the electronic documents that are relevant to the search string, and determine a set of common attributes for the identified instances, as well as identifiers 308 of those attributes and values 307 for those attributes. In effect, system 200 can determine instance identifiers 306, attribute identifiers 308, as well as the associated values 307 based on the received search string.
In some implementations, one or more attribute identifiers 308, instance identifiers 306, and/or values 307 can be received from a user for whom table 300 is to be displayed. As discussed further below, systems such as system 200 can generate and populate table 300 based on information extracted from electronic document collection 102 using one or more received attribute identifiers 308, instance identifiers 306, and/or values 307. In effect, system 200 can formulate new instance identifiers 306, attribute identifiers 308, as well as the associated values 307 based on the received attribute identifiers 308, instance identifiers 306, and/or values 307.
FIG. 4 is a schematic representation of another implementation of a structured presentation, namely, one that includes a table 400. In addition to including attribute identifiers 308, instance identifiers 306, values 307, and unit identifiers 309 organized into rows 302 and columns 304, table 400 also includes a number of interactive elements for interacting with a user. In particular, table 400 includes a collection of instance selection widgets 405, a collection of action triggers 410, a collection of column action trigger widgets 415, and a notes column 420.
Instance selection widgets 405 are user interface components that allow a user to select structured records 310 in table 400. For example, instance selection widgets 405 can be a collection of clickable checkboxes that are associated with a particular structured record 310 by virtue of arrangement and positioning relative to that structured record 310. Instance selection widgets 405 are "clickable" in that a user can interact with widgets 405 using a mouse (e.g., hovering over the component and clicking a particular mouse button), a stylus (e.g., pressing a user interface component displayed on a touch screen with the stylus), a keyboard, or other input device to invoke the functionality provided by that component. Action triggers 410 are user interface components that allow a user to trigger the performance of an action on one or more structured records 310 in table 400 selected using instance selection widgets 405. For example, action triggers 410 can be clickable text phrases, each of which can be used by a user to trigger an action described in the phrase. For example, a "keep and remove others" action trigger 410 triggers the removal of structured records 310 that are not selected using instance selection widgets 405 from the display of table 400. As another example, a "remove selected" action trigger 410 triggers the removal of structured records 310 that are selected using instance selection widgets 405 from the display of table 400. As yet another example, a "show on map" action trigger 410 triggers display of the position of structured records 310 that are selected using instance selection widgets 405 on a geographic map. For example, if a selected instance is a car, locations of car dealerships that sell the selected car can be displayed on a map. As another example, if the selected instances are spring break destinations, these destinations can be displayed on a map.
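The "keep and remove others" and "remove selected" action triggers can be sketched as two filters over the displayed records. The function names, the `ACTION_TRIGGERS` dispatch table, and the use of instance names as stand-ins for structured records are assumptions made for this illustration.

```python
def keep_and_remove_others(records, selected):
    """Keep only the selected structured records, preserving display order."""
    return [r for r in records if r in selected]

def remove_selected(records, selected):
    """Remove the selected structured records, preserving display order."""
    return [r for r in records if r not in selected]

# Hypothetical mapping from the clickable phrase to the action it triggers.
ACTION_TRIGGERS = {
    "keep and remove others": keep_and_remove_others,
    "remove selected": remove_selected,
}

def trigger(action, records, selected):
    """Apply the named action to the records chosen with the selection widgets."""
    return ACTION_TRIGGERS[action](records, set(selected))

rows = ["Tom", "Dick", "Harry"]
kept = trigger("keep and remove others", rows, ["Dick"])
remaining = trigger("remove selected", rows, ["Dick"])
```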
Column action trigger widgets 415 are user interface components that allow a user to apply an action to all of the cells within a single column 304. When a user interacts with the clickable '+' sign, a further user interface component is displayed which offers to the user a set of possible actions to be performed. The actions in this set can include, e.g., removing the entire column 304 from the structured presentation 400 or a search to find values for all the cells in column 304 which are currently blank.
Notes column 420 is a user interface component that allows a user to associate information with an instance identifier 306. In particular, notes column 420 includes one or more notes 425 that are each associated with a structured record 310 by virtue of arrangement and positioning relative to that structured record 310. The information content of notes 425 is unrestricted in that, unlike columns 304, notes 425 are not alleged to be values of any particular attribute. Instead, the information in notes 425 can characterize unrelated aspects of the instance identified in structured record 310.
In some implementations, table 400 can include additional information other than values of any particular attribute. For example, table 400 can include a collection of images 430 that are associated with the instance identified in a structured record 310 by virtue of arrangement and positioning relative to that structured record 310. As another example, table 400 can include a collection of text snippets 435 extracted from electronic documents in collection 102. The sources of the snippets can be highly ranked results in searches conducted using instance identifiers 306 as a search string. Text snippets 435 are associated with the instance identified in a structured record 310 by virtue of arrangement and positioning relative to that structured record 310. As another example, table 400 can include one or more hypertext links 440 to individual electronic documents in collection 102. For example, the linked documents can be highly ranked results in searches conducted using instance identifiers 306 as a search string. As another example, the linked documents can be the source of a value 307 that was extracted to populate table 400. In some instances, interaction with hypertext link 440 can trigger navigation to the source electronic document based on information embedded in hypertext link 440 (e.g., a web site address).
FIG. 5 is a schematic representation of another implementation of a structured presentation, namely, a collection of cards 500. Card collection 500 is an organized, systematic arrangement of one or more identifiers of instances, as well as the values of particular attributes of those instances. The attributes of an instance can be specified by values. Moreover, card collection 500 generally includes identifiers of attributes, as well as identifiers of the units in which values are expressed, where appropriate.
The grouping, segmentation, and arrangement of information in card collection 500 can be selected to facilitate an understanding of the information by a user. In this regard, card collection 500 includes a collection of cards 502. Each card 502 includes an instance identifier 306 and a collection of associated attribute values 307. The arrangement and positioning of attribute values 307 and instance identifiers 306 in cards 502 thus graphically represents the associations between them. For example, a user can discern the association between attribute values 307 and the instance identifier 306 that is found on the same card 502.
In the illustrated implementation, cards 502 in card collection 500 also include a collection of attribute identifiers 308. Attribute identifiers 308 are organized in a column 504 and attribute values 307 are organized in a column 506. Columns 504, 506 are positioned adjacent one another and aligned so that individual attribute identifiers 308 are positioned next to the attribute value 307 that characterizes that identified attribute. This positioning and arrangement allows a viewer to discern the association between attribute identifiers 308 and the attribute values 307 that characterize those attributes.
Each card 502 is a structured record 310 in that each card 502 associates a single instance identifier 306 with a collection of associated attribute values 307. Further, the arrangement and positioning used to denote these associations in one card 502 is reproduced in other cards 502. Indeed, in many cases, all of the cards 502 are restricted to having the same arrangement and positioning of information. For example, the value 307 that characterizes the attribute "ATTR_1" is restricted to bearing the same spatial relationship to instance identifiers 306 in all cards 502. As another example, the order and positioning of attribute identifiers 308 in all of the cards 502 is the same.
Moreover, changes to the arrangement and positioning of information in one card 502 are generally propagated to other cards 502 in card collection 500. For example, if a new attribute value 307 that characterizes a new attribute (e.g., "ATTR_1¾") is inserted between the attribute values "value_1_1" and "value_2_1" in one card 502, then the positioning of the corresponding attribute values 307 in other cards 502 is likewise changed.
In some implementations, cards 502 in card collection 500 can include other features. For example, cards 502 can include interactive elements for interacting with a user, e.g., instance selection widgets, action triggers, attribute selection widgets, a notes entry, and the like. As another example, cards 502 in card collection 500 can include additional information other than values of any particular attribute, e.g., images and/or text snippets that are associated with an identified instance. As another example, cards 502 in card collection 500 can include one or more hypertext links to individual electronic documents in collection 102. Such features can be associated with particular instances by virtue of appearing on a card 502 that includes an instance identifier 306 that identifies that instance.
During operation, a viewer can interact with the system presenting card collection 500 to change the display of one or more cards 502. For example, a viewer can trigger the side-by-side display of two or more of the cards 502 so that a comparison of the particular instances identified on those cards is facilitated. As another example, a viewer can trigger a reordering of cards 502, an end to the display of a particular card 502, or the like. As another example, a viewer can trigger the selection, change, addition, and/or deletion of attributes and/or instances displayed in cards 502. As yet another example, a viewer can trigger a sorting of cards into multiple piles according to, e.g., the values 307 of an attribute in the cards. In some implementations, cards 502 will be displayed with two "sides." For example, a first side can include a graphic representation of the instance identified by instance identifier 306, while a second side can include instance identifier 306 and values 307. This can be useful, for example, if the user is searching for a particular card in the collection of cards 500, allowing the user to identify the particular card with a cursory review of the graphical representations on the first side of the cards 502.
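Sorting cards into piles according to the values of an attribute can be sketched as a simple grouping. The function name `sort_into_piles`, the dictionary layout of a card, and the sample product data are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

def sort_into_piles(cards, attribute):
    """Group instance identifiers into piles keyed by the value of one attribute.

    Cards that have no value for the attribute land in the pile keyed by None.
    """
    piles = defaultdict(list)
    for card in cards:
        piles[card["values"].get(attribute)].append(card["instance"])
    return dict(piles)

# Illustrative cards for product instances (made-up names and values).
cards = [
    {"instance": "Model A", "values": {"maker": "Maker X", "year": 2008}},
    {"instance": "Model B", "values": {"maker": "Maker Y", "year": 2008}},
    {"instance": "Model C", "values": {"maker": "Maker X"}},
]
piles_by_maker = sort_into_piles(cards, "maker")
piles_by_year = sort_into_piles(cards, "year")
```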
FIG. 6 is a flow chart of an example process 600 for presenting information from an electronic document collection to a user in a structured presentation. Process 600 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 600 can be performed by the search engine 202 in system 200. In some implementations, process 600 can be performed in response to the receipt of a trigger, e.g., a user request, to create or change a structured presentation.
The system performing process 600 can identify two or more responsive electronic documents in the electronic document collection (step 605). The responsive documents can be identified in a number of different ways. In some instances, documents are identified based on "new" information (e.g., a new search query) received from a viewer. For example, the system can compare a newly received search query with the content of the electronic documents in the electronic document collection using string comparisons. As another example, the system can access a data center such as data center 208 and compare the terms in a search query with an index of keywords to identify the location of responsive electronic documents.
In some instances, documents are identified based on "old" information that is already found in a structured presentation. Among the information found in a structured presentation are the identities of instances, attributes, values, and the units in which the values are represented. The system performing process 600 can use this old information to identify responsive electronic documents in the electronic document collection. For example, documents that include instances already found in a structured presentation can be identified as responsive. As another example, documents that characterize instances using attributes already found in a structured presentation can be identified as responsive. Additional examples of such identifications are discussed further below.
The system performing process 600 can also gather information from the identified electronic documents (step 610). The gathered information can regard one or more instances, attributes, and/or values. The system performing process 600 can gather this information directly from the documents in an electronic document collection or from previously assembled collections of information that characterize the electronic documents in an electronic document collection. For example, in the context of system 200 (FIG. 2), the system performing process 600 can locate documents in collection 102, access the located documents, and extract the information directly from the original documents in collection 102. As another example in the context of system 200 (FIG. 2), the system performing process 600 can access a collection of information in data center 208 and gather the information from, e.g., a database that includes an index of keywords and the location of documents that include those keywords, an ontology, and/or a historical record of previous structured presentations that were presented using information extracted from documents in collection 102.
The system performing process 600 can use the gathered information to provide instructions for presenting structured presentations based on the gathered information (step 615). For example, the system performing process 600 can generate machine-readable instructions for presenting a structured presentation, e.g., tables 300, 400 or collection of cards 500.
FIG. 7 is a flow chart of an example process 700 for identifying responsive documents in an electronic document collection. Process 700 can be performed in isolation or in conjunction with other data processing activities. For example, process 700 can be performed at step 605 in process 600 (FIG. 6).
The system performing process 700 receives a search query (step 705). For example, the system can receive one or more search strings (e.g., "hybrid vehicles") from a user. As another example, the system can receive a search string from another process or system. In some implementations, the search string is received through an application programming interface (API), a common gateway interface (CGI) script, or other programming interfaces. In other implementations, the search string is received through a web portal, a web page, or web site, or the like.
In response, the system performing process 700 identifies two or more documents that contain instances, attributes, and/or values that are responsive to the search query (step 710). The documents can be identified by classifying the role that terms in the search query are to play in a structured presentation. For example, the terms in a search query can be classified as a categorization of the instances that are to appear in a structured presentation based on, e.g., the particular terms in the search query, an express indication by the user as to how search query terms are to be classified, and/or the context of the search. By way of example, the terms in a search query "cities in California" can be classified as a categorization of instances such as "San Diego," "Los Angeles," and "Bakersfield" due to the plural term "cities" being characterized by an attribute, namely, being "in California." As another example, the terms in a search query "Ivy League schools" can be classified as a categorization of instances (such as "Cornell," "Columbia," and "Brown") due to the plural term "schools" being characterized by an attribute "Ivy League."
In some cases, additional information must be used to classify the terms in a search query. For example, the search query "Ivy League" can reasonably be taken as a categorization of school instances or as an example instance of the category "athletic conferences" which includes instances such as "Atlantic Coast Conference" and "PAC-10." In such cases, the terms can be classified, e.g., based on an express indication by the user as to how they are to be classified or based on the context of the terms in a search session. For example, if a user had previously entered the phrases "Atlantic Coast Conference" and "PAC-10" as search queries, the search query "Ivy League" can be taken as an example instance that is to appear in a structured presentation alongside those other instances.
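The use of session context to resolve an ambiguous query can be sketched as follows. This is a deliberately simple illustration: the function `classify_query`, the `known_categories` mapping (standing in for an ontology), and the rule that one earlier sibling query is enough to treat the term as an instance are all assumptions, not details given in the description.

```python
def classify_query(query, session_queries, known_categories):
    """Classify a query as an example instance or as a categorization.

    `known_categories` maps a category name to the set of its instance names.
    The query is treated as an instance only if an earlier query in the same
    session names another instance of the same category.
    """
    for category, instances in known_categories.items():
        siblings = instances - {query}
        if query in instances and any(q in siblings for q in session_queries):
            return ("instance", category)
    return ("categorization", query)

# Hypothetical ontology fragment built from the examples in the text.
categories = {
    "athletic conferences": {"Ivy League", "Atlantic Coast Conference", "PAC-10"},
}
with_context = classify_query(
    "Ivy League", ["Atlantic Coast Conference", "PAC-10"], categories
)
without_context = classify_query("Ivy League", [], categories)
```

An express indication from the user, which the text also mentions, could simply override the result of such a function.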
The documents can be identified either directly in electronic document collection 102 or indirectly based on information in electronic data center 208. Such identifying information can include, e.g., the URL where the document was found the last time it was crawled.
FIG. 8 is a flow chart of another example process 800 for identifying two or more responsive documents in an electronic document collection. Process 800 can be performed in isolation or in conjunction with other data processing activities. For example, process 800 can be performed at step 605 in process 600 (FIG. 6). As another example, process 800 can be performed in conjunction with process 700 at step 605 in process 600 (FIG. 6). For example, processes 700, 800 can be part of an iterative, interactive process in which a search query is received and used to identify a first collection of responsive documents, a first structured presentation that includes content drawn from the identified documents is presented to a user, user modifications are received, and a description of the modified structured presentation is used to identify a second collection of relevant documents. In some implementations, process 800 can be performed several times. In some implementations, process 800 can be performed without user input, e.g., by crawler 206 in system 200 (FIG. 2).
The system performing process 800 receives a description of existing content of a structured presentation (step 805). In particular, the system can receive a description of the instances, the attributes, the values, and/or the units in which values are presented in an existing structured presentation. The description can include, e.g., identifiers of the instances and the attributes and/or ranges of the values of the attributes. The description can also include a categorization of the instances and/or attributes. Such a categorization can be determined, e.g., using an ontology or based on a categorization assigned by a viewer to a structured presentation. For example, if a user entitles a structured presentation "Ivy League Schools," then this title can be taken as a categorization of the instances in that structured presentation.
In response, the system performing process 800 can identify one or more documents that contain instances, attributes, and/or values that are relevant to the existing content (step 810). For example, the system can compare the identifiers of instances and/or attributes to indexed keywords to determine if particular documents contain one or more of the instances and/or attributes that already appear in the existing content of a structured presentation. As another example, the system can identify new instances, their attributes, and the values of such attributes from such documents, compare these values to values that already appear in the existing content of a structured presentation, and determine whether the new instances are potentially relevant to the existing content of the structured presentation.
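The comparison of existing identifiers to indexed keywords can be sketched as a count of how many identifiers each document contains. The function `relevant_documents`, the `min_matches` threshold, the index layout (lowercased identifier mapped to a set of document locations), and the sample entries are hypothetical.

```python
def relevant_documents(index, instance_ids, attribute_ids, min_matches=1):
    """Return locations of documents containing at least `min_matches` identifiers.

    `index` maps a lowercased keyword or phrase to the set of locations of
    documents in which it appears.
    """
    counts = {}
    for identifier in list(instance_ids) + list(attribute_ids):
        for location in index.get(identifier.lower(), ()):
            counts[location] = counts.get(location, 0) + 1
    return sorted(loc for loc, n in counts.items() if n >= min_matches)

# Made-up index entries for illustration.
existing_index = {
    "cornell": {"doc1", "doc2"},
    "brown": {"doc2"},
    "mascot": {"doc3"},
}
any_match = relevant_documents(existing_index, ["Cornell", "Brown"], ["mascot"])
two_matches = relevant_documents(
    existing_index, ["Cornell", "Brown"], ["mascot"], min_matches=2
)
```

Raising the threshold is one way to favor documents that discuss several of the instances already shown, which may make them better sources of related new instances.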
The documents can be identified either directly in electronic document collection 102 or using identifying information in electronic data center 208. Such identifying information can include, e.g., the memory location where the document was found the last time it was crawled.
FIG. 9 is a flow chart of a process 900 for suggesting and/or adding new instances to a structured presentation. Process 900 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. These digital data processing devices can interact with a user over input and output devices, such as keyboards, mice, touchscreens, display screens, and the like. For example, in the context of system 200 (FIG. 2), user interaction in process 900 can be performed at clients such as PDA 215 or desktop computer 217.
Process 900 can be performed alone or in conjunction with other data processing activities. For example, as discussed further below, process 900 can be performed in conjunction with various processes for formulating instance suggestions for addition to a preexisting structured presentation. Examples of such formulation processes are described in FIGS. 21-26 and the associated text. In general, process 900 will be performed by multiple digital data processing devices. For example, in the context of system 200 (FIG. 2), activities for formulating instance suggestions can be performed at search engine 202 while user interaction can occur at clients such as PDA 215 or desktop computer 217 (FIG. 2).
The system performing process 900 can receive a new instance trigger (step 905). A new instance is an instance that is not currently displayed in a structured presentation, e.g., structured presentation 106 (FIG. 1). A new instance trigger is an event that activates the processes for adding a new instance to a structured presentation. For example, a new instance can be triggered by user input received over a mouse, stylus, keyboard, or the like. In other implementations, a new instance can be triggered by another process or system. A new instance trigger can be received through inter-process communication or an application's message handler, to name two examples. The system performing process 900 can present, to a user, options for adding new instances to a structured presentation (step 910). Options are alternative approaches for adding new instances. Example options include fully automatic options, automatic options with user-specified constraints, and manual options. These options are discussed in further detail below. The system performing process 900 can present options to a user using a user interface device, e.g., a display screen. In many cases, the display screen that presents the options can be the same display screen that presents the structured presentation to which the instances are to be added. For example, options can be presented to a user using a display screen 104 (FIG. 1). The system performing process 900 can receive user selection of an option (step 915).
The user selection can be received using one or more input devices, e.g., a keyboard, touchpad, or touchscreen. The system can also determine the nature of the option selected by the user (step 920).
If the system performing process 900 determines that the user has selected an "automatic option," then the system can suggest and/or add additional instances to the structured presentation automatically, without interaction with a user.
In one implementation of a user-specified automatic option, the new instances can be suggested and/or added based on the characteristics of the structured presentation (step 925). Examples of such characteristics include the nature of the instances already specified in the structured presentation, categorizations of those instances, and the attributes of those instances. Approaches for formulating new instances based on such characteristics are described in FIGS. 21-26 and the associated text. For example, as described therein, search queries can be constructed using attribute identifiers drawn from the preexisting structured presentation, attribute values drawn from the preexisting structured presentation, and/or combinations thereof. These search queries can be used to identify instances for addition to the structured presentation using string comparisons or other matching techniques.
If the system performing process 900 determines that the user has selected a "user-specified constraint" option, then the system can suggest and/or add additional instances to the structured presentation automatically based on user-specified constraints on the nature of the additional instances. The constraints can be expressed as one or more parameters that characterize the suggested and/or added instances. For example, the constraints can be expressed as the acceptable value of an attribute of the instances or as a range of acceptable values of an attribute.
In one implementation of a user-specified constraint option, the system performing process 900 presents a user with options for constraining values of attributes of new instances (step 930). For example, the system can display a list of attributes that characterize the instances in a structured presentation as well as input fields that allow a user to input constraints on the values of those attributes. Often, the attributes in such a list also appear in the structured presentation to which the new instances are to be added. However, in some implementations, the attributes in such a list can be formulated based on the attributes used to characterize the instances elsewhere, such as in the documents of an electronic document collection. Example approaches for formulating such attributes are described in FIGS. 37-51 and the associated text.
The system performing process 900 can also receive a user specification of one or more constraints on the values of attributes of the new instances (step 935). As discussed above, the constraints can limit the values of one or more attributes to a specific value or to a range of values. For example, one attribute that characterizes cars is "number of cylinders." A user-specified constraint on the values of this attribute can limit the number of cylinders of new car instances to a specific value (e.g., "six") or to a range of values (e.g., "six to eight" or "more than six").
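As an illustration of how such a constraint might be represented and applied, the following is a minimal sketch in Python. The `Constraint` class, the `satisfying` function, and the sample car records are hypothetical and are not part of the described system; the range bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraint:
    """Hypothetical constraint on one attribute: an exact value or a numeric range."""
    attribute: str
    equals: Optional[object] = None
    minimum: Optional[float] = None  # inclusive lower bound (assumption)
    maximum: Optional[float] = None  # inclusive upper bound (assumption)

    def accepts(self, instance: dict) -> bool:
        value = instance.get(self.attribute)
        if value is None:
            return False  # an unknown value cannot satisfy a constraint
        if self.equals is not None:
            return value == self.equals
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


def satisfying(candidates, constraints):
    """Keep only the candidate instances that meet every constraint."""
    return [c for c in candidates if all(k.accepts(c) for k in constraints)]


# Made-up candidate instances for illustration only.
cars = [
    {"name": "Car A", "number of cylinders": 4},
    {"name": "Car B", "number of cylinders": 6},
    {"name": "Car C", "number of cylinders": 8},
    {"name": "Car D", "number of cylinders": 12},
]
six_to_eight = Constraint("number of cylinders", minimum=6, maximum=8)
print([c["name"] for c in satisfying(cars, [six_to_eight])])  # ['Car B', 'Car C']
```

A constraint such as "more than six" would need an exclusive lower bound, which this sketch does not model.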
The system performing process 900 can also suggest and/or add new instances based on the user-specified constraints and on characteristics of the structured presentation (step 940). Examples of characteristics of a structured presentation include the nature of the instances already specified in the structured presentation, categorizations of those instances, and the attributes of those instances. Approaches for formulating new instances based on such characteristics are described in FIGS. 37-51 and the associated text. As another example, search queries can be constructed using attribute identifiers drawn from the preexisting structured presentation, attribute values drawn from the preexisting structured presentation, and/or combinations thereof, as well as the constraints specified by a user. These search queries can be used to identify instances using string comparisons or other matching techniques. The identified instances can then be suggested and/or added to the structured presentation.
If the system performing process 900 determines that the user has selected a "manual option," then the system can add additional instances to the structured presentation under the direction of a user.
In one implementation of a manual option, the system performing process 900 can receive a new instance from the user (step 945). For example, the user can input an instance name using a keyboard or other user input device. The system performing process 900 can add the new instance to the structured presentation (step 950). In general, the name of a new instance can be added directly to the structured presentation as instance identifier 306 in a new structured record 310. In some implementations, the new structured record 310 can be a new row 302 (FIGS. 3, 4) or a new card 502 (FIG. 5).
In some implementations, the system performing process 900 can also perform additional operations based on the received new instance. For example, the system can use a new instance to refine the set of suggested instances or a set of suggested attributes.
FIG. 10 is a schematic representation of a user interface component 1000 for receiving user input specifying modifications of a structured presentation. For example, user interface component 1000 can be used to receive a new instance trigger at step 905 in process 900 (FIG. 9).
User interface component 1000 includes an attribute modification region 1005 and an instance modification region 1010. Attribute modification region 1005 includes a header 1015, a collection 1020 of attribute identifiers 1025, each of which is associated with an attribute identifier selection widget 1030, and a new attribute addition trigger 1035.
Header 1015 includes text or other information that identifies that user interaction with attribute modification region 1005 will indeed allow the user to modify attributes. Attribute identifiers 1025 are text or other information that identifies attributes to be included in a structured presentation. For example, attribute identifiers 1025 can be the same text that appears as attribute identifiers 308 in structured presentations 300, 400, 500 (FIGS. 3, 4, 5). Attribute identifier selection widget 1030 is an interactive display element that allows users to select and deselect attributes for display in structured presentations. For example, in collection 1020, each attribute identifier selection widget 1030 is associated with a single attribute identifier 1025 by virtue of their arrangement and positioning adjacent one another. Attribute identifier selection widgets 1030 can indicate whether an attribute identifier 1025 is selected or deselected for display using one or more graphical indicia, e.g., the checks and coloring shown. For example, if a user interacts with the checked attribute identifier selection widget 1030 associated with attribute identifier 1025 "Attribute_1," the color and checked status in attribute identifier selection widget 1030 are changed and the removal of an attribute identifier associated with "Attribute_1" (as well as the values corresponding to "Attribute_1") from a structured presentation is triggered.
New attribute addition trigger 1035 is an interactive display element by which a user can trigger the addition of a new attribute to a structured presentation. The formulation of new attributes for addition is described in FIGS. 37-51 and the associated text. The addition of new attributes is also described in more detail below, e.g., in FIGS. 13-15.
Instance modification region 1010 includes a new instance addition trigger 1040 and an instance filter trigger 1045. New instance addition trigger 1040 is an interactive display element by which a user can trigger the addition of a new instance to a structured presentation. For example, new instance addition trigger 1040 can be used at step 905 in process 900 (FIG. 9).
Instance filter trigger 1045 is an interactive display element by which a user can trigger the filtering of instances in a structured presentation. Filtering instances yields a collection of instances that satisfy one or more criteria. For example, filtering can yield a collection of instances that have certain values, or values within a designated range. Filtering can thus reduce the number of instances to be included in a structured presentation.
The filtering triggered by instance filter trigger 1045 can include the presentation of a user interface component that allows a user to specify one or more filtering criteria and modifying a structured presentation so that instances which fail to meet the criteria are not displayed.
In some implementations, user interface component 1000 can respond dynamically to modifications made by a user using user interface component 1000 or otherwise. For example, if the user triggers and adds a new attribute to a structured presentation, an identifier of that new attribute can be added to collection 1020 and presented in user interface component 1000. For example, if the user adds "Attribute_9" to the structured presentation, the attribute identifier "Attribute_9" can be added to user interface component 1000 with an associated action trigger 1030.
FIG. 11 is a schematic representation of a user interface component 1100 for receiving user input specifying a technique for adding new instances to a structured presentation. For example, user interface component 1100 can be used to present options for adding new instances to a structured presentation at step 910 and to receive a user selection of an option at step 915 in process 900 (FIG. 9). User interface component 1100 includes a header 1105, a prompt 1110, and a collection of descriptions 1115, 1120, 1125 of techniques for adding new instances to a structured presentation, each of which is associated with a selection widget 1130, 1135, 1140.
Header 1105 includes text or other information that identifies that user interaction with user interface component 1100 will indeed allow the user to specify a technique for adding new instances. Prompt 1110 prompts a user to interact with user interface component 1100 to specify a technique for adding new instances.
Description 1115 describes that user specification of this technique will result in new instances being added by a user-specified automatic option. User interaction with selection widget 1130 allows a user to specify the user-specified automatic option described by description 1115.
Description 1120 describes that user specification of this technique will result in new instances being added by a user-specified constraint option. Description 1120 includes a constraint addition widget 1145 and a constraint clear widget 1150. User interaction with constraint addition widget 1145 triggers the addition of a new constraint that is to be used in the user-specified constraint option. User interaction with constraint clear widget 1150 clears all current constraints. User interaction with selection widget 1135 allows a user to specify the user-specified constraint option described by description 1120.
Description 1125 describes that user specification of this technique will result in new instances being added by a manual option. Description 1125 includes a new instance identifier input field 1155. User interaction with new instance identifier input field 1155 allows a user to identify a new instance, e.g., by name. User interaction with selection widget 1140 allows a user to specify the manual option described by description 1125.
FIG. 12 is a schematic representation of a user interface component 1200 for receiving user input specifying constraints that are to be used in the user-specified constraint option for adding new instances to a structured presentation. User interface component 1200 can be used in isolation (e.g., on a dedicated window or portal) or in conjunction with other user interface components. For example, user interface component 1200 can be inserted into user interface component 1100 immediately below technique description 1120 (FIG. 11). For example, user interface component 1200 can be used to present options for specifying values of attributes of new instances that are to be added to a structured presentation at step 930 and to receive a user specification of such values of attributes at step 935 in process 900 (FIG. 9).
User interface component 1200 includes a collection of one or more attribute selection widgets 1205, 1210, each of which is associated with a value specification region 1215, 1220. Attribute selection widgets 1205, 1210 are interactive display elements that allow a user to select an attribute whose values are to be constrained. In the illustrated implementation, each attribute selection widget 1205, 1210 is a drop-down box widget that lists identifiers of attributes. In some implementations, the listed attribute identifiers can be identical to the attribute identifiers 308 in a structured presentation to which the new instance is to be added.
Value specification regions 1215, 1220 are interactive display elements that allow a user to specify one or more constraints on the value of the attribute identified in the respective one of attribute selection widgets 1205, 1210. In the illustrated implementation, value specification region 1215 includes a pair of text entry fields 1225 that allow a user to specify an acceptable range of values of the attribute identified in attribute selection widget 1205. Value specification region 1220 includes a collection of interactive check boxes 1230 that allow a user to specify an acceptable value of the attribute identified in attribute selection widget 1210.
In operation, user selection of a particular attribute identifier using an attribute selection widget 1205, 1210 can trigger a change in the associated value specification region 1215, 1220. For example, the nature of any interactive elements and the values and/or ranges that can be specified in the associated value specification region 1215, 1220 can be changed. In some implementations, these changes can be based on the distribution of values of such attributes in the structured presentation to which the new instance is to be added. For example, if only four values of the attribute "maker" appear in the structured presentation, these same four values can be presented for specification in the associated value specification region. In other implementations, the changes to the associated value specification region 1215, 1220 can be based on the values of the attribute that characterize similar instances in an electronic document collection 102. For example, the attribute "maker" of instances of cars may be characterized in documents in electronic document collection 102 using a wider variety of values. These values can be identified and presented for specification in the associated value specification region.
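One way to picture how a value specification region could adapt to the selected attribute is sketched below in Python. The `value_options` function, the returned dictionary layout, and the sample records are hypothetical; the sketch assumes that numeric attributes are offered as a range (as with text entry fields 1225) and all other attributes as a list of the distinct values already present (as with check boxes 1230).

```python
def value_options(records, attribute):
    """Derive the choices to offer for `attribute` from values already in the presentation."""
    values = [r[attribute] for r in records if r.get(attribute) is not None]
    is_numeric = bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        # Offer a range bounded by the smallest and largest values seen so far.
        return {"widget": "range", "low": min(values), "high": max(values)}
    # Otherwise offer one check box per distinct value.
    return {"widget": "checkboxes", "choices": sorted({str(v) for v in values})}


# Made-up structured records for illustration only.
presentation = [
    {"maker": "Maker W", "number of cylinders": 4},
    {"maker": "Maker X", "number of cylinders": 6},
    {"maker": "Maker Y", "number of cylinders": 8},
    {"maker": "Maker Z", "number of cylinders": 6},
    {"maker": "Maker X", "number of cylinders": 8},
]
print(value_options(presentation, "maker"))
print(value_options(presentation, "number of cylinders"))
```

Drawing the values from an electronic document collection rather than from the presentation itself would only change where `records` comes from.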
FIG. 13 is a flow chart of an example process 1300 for adding new attributes to a structured presentation. Process 1300 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. These digital data processing devices can interact with a user over input and output devices, e.g., keyboards, mice, touchscreens, display screens, and the like. For example, in the context of system 200 (FIG. 2), user interaction in process 1300 can be performed at clients such as PDA 215 or desktop computer 217. Process 1300 can be performed alone or in conjunction with other data processing activities. For example, as discussed further below, process 1300 can be performed in conjunction with various processes for formulating attribute suggestions for addition to a preexisting structured presentation. Examples of such formulation processes are described in FIGS. 37-51 and the associated text and in FIGS. 21-26 and the associated text. In general, process 1300 will be performed by multiple digital data processing devices. For example, in the context of system 200 (FIG. 2), activities for formulating attribute suggestions can be performed at search engine 202 while user interaction can occur at clients such as PDA 215 or desktop computer 217 (FIG. 2).
The system performing process 1300 can receive a new attribute trigger (step 1305).
A new attribute is an attribute that is not currently displayed in a structured presentation, e.g., structured presentation 106 (FIG. 1). A new attribute trigger is an event that activates the processes for adding a new attribute to a structured presentation. For example, a new attribute can be triggered by user input received over a mouse, stylus, keyboard, or the like. In other implementations, a new attribute can be triggered by another process or system. A new attribute trigger can be received through inter-process communication or an application's message handler, to name two examples. For example, in some implementations, the system can receive a new attribute trigger from the user interface component 1000 through user selection of new attribute addition trigger 1035 (FIG. 10).
The system performing process 1300 can present options for specifying new attributes (step 1310). For example, the system can display a list of new attributes that are used to characterize the instances in a structured presentation as well as interactive display elements that allow a user to select one or more of those attributes. In some implementations, the attributes in such a list can be formulated based on the attributes used to characterize the instances elsewhere, such as in the documents of an electronic document collection. Example approaches for formulating such attributes are described in FIGS. 37-51 and the associated text.
The system performing process 1300 can receive a specification of a new attribute from a user (step 1315). The specification of an attribute can characterize traits or characteristics of the new attribute, including, e.g., the name or other identifier of the new attribute, keywords associated with the new attribute, trustworthy sources of information regarding the new attribute, and the like. The specification of an attribute can be received from the user over one or more input devices, e.g., a keyboard, touchpad, or touchscreen. The system performing process 1300 can add the specified new attributes to a structured presentation (step 1320). For example, the system performing process 1300 can add a new attribute identifier 308 and column 304 to tables 300, 400 (FIGS. 3, 4). As another example, the system can add a new attribute identifier 308 into column 504, along with a corresponding attribute value 307 in column 506 of card collection 500 (FIG. 5). In some implementations, the system performing process 1300 can also add the new attribute not only to a structured presentation but also to a user interface component for receiving user input specifying modifications of a structured presentation. For example, the system can add the new attribute to attribute modification region 1005 of user interface component 1000 (FIG. 10).
The system performing process 1300 can populate the attribute values based at least in part on the user specification (step 1325). The system can populate the attribute values using various techniques, as described in further detail below.
FIG. 14 is a schematic representation of a user interface component 1400 for adding new attributes to a structured presentation. User interface component 1400 can interact with a user for the specification of one or more traits or characteristics of the new attribute. These traits or characteristics can be used, e.g., in adding new attributes and attribute values to a structured presentation. For example, user interface component 1400 can be used to present options for adding a new attribute class to a structured presentation at step 1310 and to receive a user specification of a new attribute at step 1315 in process 1300 (FIG. 13).
User interface component 1400 includes a header 1405 and a collection of trait identifiers 1410, 1415, 1420, 1425 that identify traits that characterize the new attribute. Each trait identifier 1410, 1415, 1420, 1425 is associated with a trait specification widget 1430, 1435, 1440, 1445 and identifies the trait that can be specified by user interaction with that widget. Header 1405 includes text or other information that identifies that user interaction with user interface component 1400 will indeed allow the user to add a new attribute to a structured presentation.
Trait identifier 1410 identifies that a user can specify a class of the attribute to be added to a structured presentation by interacting with trait specification widget 1430. The class of an attribute indicates how the attribute and its values are to be identified. For example, an attribute class can specify a technique by which the attribute and its values are to be identified in an electronic document collection. Example attribute classes include "auto-find values," "search results," "review," and "note" classes. Details regarding these attribute classes are discussed further below. Trait specification widget 1430 is an interactive display element that allows a user to specify the class of the attribute to be added to a structured presentation. In the illustrated implementation, trait specification widget 1430 is a drop-down box widget.
Trait identifier 1415 identifies that a user can specify a name or other identifier of the new attribute by interacting with trait specification widget 1435. Trait specification widget 1435 is an interactive display element that allows a user to specify the name or other identifier of the new attribute to be added to a structured presentation. In the illustrated implementation, trait specification widget 1435 includes a text entry field. In general, the attribute identifier identified in trait identifier 1415 can be added directly into a structured presentation as an attribute identifier 308.
Trait identifier 1420 identifies that a user can specify keywords that characterize the new attribute by interacting with trait specification widget 1440. Trait specification widget 1440 is an interactive display element that allows a user to specify one or more keywords that characterize the attribute to be added to a structured presentation. In the illustrated implementation, trait specification widget 1440 includes a text entry field into which one or more keywords can be entered. The keywords can include, e.g., synonyms of the attribute identifier or terms that characterize the context of the attribute identifier. For example, if the attribute identifier is "bank," the keywords identified in trait specification widget 1440 can include "NASCAR" and "speedway" to indicate that the attribute refers to the "bank" of a racetrack as opposed to a financial institution.
In operation, the keywords specified in trait specification widget 1440 can be used to identify instances, attributes, and/or attribute values in searches of electronic document collections. For example, the keywords can be used when formulating new attributes and/or new instances, as described in FIGS. 21-26 and the associated text and in FIGS. 37-51 and the associated text.
Trait identifier 1425 identifies that a user can specify "favorite sites" that characterize the new attribute by interacting with trait specification widget 1445. "Favorite sites" are documents in an electronic document collection. User specification of a document as a "favorite site" indicates that the user considers the content of the document to be both relevant to the new attribute and likely to be true. The content of a "favorite site" can thus be assigned a high confidence value, e.g., in formulating new instances and new attributes for addition to a preexisting structured presentation (as discussed further below). User specification of a document as a "favorite site" can also be used as an indication that the content of the document is a trustworthy source of attribute values for populating a structured presentation.
Trait specification widget 1445 is an interactive display element that allows a user to specify one or more documents in an electronic document collection as "favorite sites." In the illustrated implementation, trait specification widget 1445 includes a text entry field into which, e.g., one or more domain names or other electronic document locations can be entered.
In some implementations, a trait "de-specification" widget allows a user to identify that one or more documents in an electronic document collection are "disfavored sites." User specification of a document as a "disfavored site" indicates that the user does not trust the document as a source of attribute values. Such a trait de-specification widget can include a text entry field into which, e.g., one or more domain names or other electronic document locations can be entered.
FIG. 15 is a flow chart of an example process 1500 for adding new attribute values to a structured presentation. Process 1500 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. Process 1500 can be performed alone or in conjunction with other data processing activities. For example, as discussed further below, process 1500 can be performed in conjunction with various processes for adding new attributes to a structured presentation, e.g., process 1300 (FIG. 13).
The system performing process 1500 can receive user specification of the class of a new attribute (step 1505). As discussed above, the class of an attribute indicates how the attribute and its values are to be identified. The receipt of the class of a new attribute can be part of the receipt of a specification of a new attribute at step 1315 in process 1300 (FIG. 13). In some implementations, the user specification of the class of a new attribute can be received over trait specification widget 1430 in user interface component 1400 (FIG. 14).
The system performing process 1500 can determine which class is specified for the new attribute (step 1510). Based on the class specified, the system performing process 1500 can determine which of various subprocesses for adding new attribute values to the structured presentation is to be performed. For example, the system can determine to add attribute values in accordance with a subprocess associated with a "note" class, a subprocess associated with a "reviews" class, a subprocess associated with a "search results" class, or a subprocess associated with an "already found" class. If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "note" class, the system can populate attribute values with notes received from the user (step 1515). For example, in the context of FIG. 4, values in the notes column 420 in table 400 can be received from a user and used to populate the values of a new attribute.
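The class-based branching of process 1500 can be pictured as a lookup from the specified class to the steps of the associated subprocess. The following Python sketch is illustrative only: the step numbers and class names come from the description of process 1500, while the table layout and the `plan_for_class` function are assumptions.

```python
# Step numbers follow the description of process 1500 (FIG. 15); the data
# structure and function name are hypothetical.
SUBPROCESS_STEPS = {
    "note": [
        (1515, "populate attribute values with notes received from the user"),
    ],
    "reviews": [
        (1520, "search for and identify electronic documents that include reviews"),
        (1525, "populate attribute values using content from the identified reviews"),
    ],
    "search results": [
        (1530, "generate a collection of search results"),
        (1535, "populate attribute values with content from the search result set"),
    ],
    "already found": [
        (1540, "identify values that have already been found and extracted"),
        (1545, "populate attribute values with the previously extracted values"),
    ],
}


def plan_for_class(attribute_class):
    """Return the (step, description) pairs for the subprocess tied to a class."""
    try:
        return SUBPROCESS_STEPS[attribute_class]
    except KeyError:
        raise ValueError(f"unknown attribute class: {attribute_class!r}") from None


for step, description in plan_for_class("reviews"):
    print(step, description)
```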
If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "reviews" class, the system can search for and identify electronic documents that include reviews (step 1520). Reviews are critical evaluations of one or more instances characterized by the new attribute. In some cases, reviews can be authored by someone with expertise in evaluating instances, such as a critic. Reviews can be identified, e.g., based on a label or other text that identifies them as reviews. For example, certain domain names (e.g., http://www.google.com/prdhp, http://www.epinions.com/, http://www.amazon.com/) can be used to identify electronic documents that include reviews. The electronic documents that include reviews can be found in an electronic document collection, e.g., collection 102.
The system performing process 1500 can populate attribute values using content from the identified reviews (step 1525). For example, the system can extract values from the review using one or more text- or table-based extraction patterns and present those extracted values in the structured presentation. These extraction patterns may preferentially select segments of the review documents that are "sentiment focused." Sentiment focused segments are identified as voicing strong sentiments, either positive or negative, about certain subject matter. For example, a review of a restaurant could include sentiment focused segments such as "the food is exceptionally good" and "the service was very poor indeed." The presentation of those extracted values in the structured presentation can be part of a population of a structured presentation at step 1325 in process 1300 (FIG. 13).
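A very rough way to select sentiment focused segments is to keep the sentences of a review that contain strongly worded terms. The Python sketch below assumes a tiny hand-picked term list and sentence splitting on terminal punctuation; both are stand-ins chosen for illustration and are not the extraction patterns of the described system.

```python
import re

# Hypothetical list of intensifiers and strongly polar words.
STRONG_TERMS = {"exceptionally", "very", "terrible", "excellent", "awful", "superb"}


def sentiment_focused(review, min_hits=1):
    """Return the sentences of `review` containing at least `min_hits` strong terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    picked = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if len(words & STRONG_TERMS) >= min_hits:
            picked.append(sentence)
    return picked


review = (
    "We arrived at seven. The food is exceptionally good. "
    "The service was very poor indeed. Parking is behind the building."
)
for segment in sentiment_focused(review):
    print(segment)
```

With the sample review, the two strongly worded sentences are kept and the two neutral ones are dropped.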
If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "search results" class, the system can generate a collection of search results from an electronic document collection, e.g., collection 102 (step 1530). The search can yield a result set that is not limited to reviews but rather can include a variety of electronic documents. The electronic documents can be found in an electronic document collection, e.g., collection 102.
The search results can be generated by searching based on an identifier of the new attribute, as well as the identifiers of instances characterized by that attribute. In some implementations, additional keywords that are associated with the new attribute can be used to refine search results, e.g., the keywords received from the user over trait specification widget 1440 of user interface component 1400 (FIG. 14).
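The query formulation described here can be sketched as string assembly. In the Python sketch below, the `build_queries` function, the quoting of identifiers as exact phrases, and the instance name "Track A" are assumptions for illustration; the attribute identifier "bank" and the keywords "NASCAR" and "speedway" echo the example given for trait specification widget 1440.

```python
def build_queries(attribute_id, instance_ids, keywords=()):
    """Build one query string per instance from the attribute identifier,
    the instance identifier, and any user-supplied keywords."""
    queries = []
    for instance_id in instance_ids:
        # Identifiers are quoted as phrases; keywords are appended as plain terms.
        terms = [f'"{instance_id}"', f'"{attribute_id}"', *keywords]
        queries.append(" ".join(terms))
    return queries


print(build_queries("bank", ["Track A"], ["NASCAR", "speedway"]))
# ['"Track A" "bank" NASCAR speedway']
```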
The system performing process 1500 can populate attribute values in the structured presentation with content from the search result set (step 1535). For example, the system can extract one or more values from the search result set using one or more text- or table-based extraction patterns and present those extracted values in the structured presentation. The population of those attribute values with the content of the search result set can be part of a population of a structured presentation at step 1325 in process 1300 (FIG. 13).
If the system performing process 1500 determines to add new attribute values using a subprocess associated with the "already found" class, the system can identify values that have already been found and extracted from an electronic document collection, e.g., electronic document collection 102 (step 1540). The "already found" values can be stored, e.g., in a collection of information that characterizes the electronic documents, e.g., data center 208 in system 200 (FIG. 2). In some implementations, such a collection of information can include a historical record of previous structured presentations.
The system performing process 1500 can populate attribute values of a structured presentation with the previously extracted values (step 1545). The population of those attribute values with the content of the search result set can be part of a population of a structured presentation at step 1325 in process 1300 (FIG. 13).
FIG. 16 is a flow chart of an example process 1600 for adding new attribute values to a structured presentation. In particular, process 1600 is concerned with selecting attribute values to be used in populating the attribute values of a structured presentation. Process 1600 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. Process 1600 can be performed alone or in conjunction with other data processing activities. For example, process 1600 can be performed at step 1325 in process 1300 (FIG. 13), at step 1525 in process 1500 (FIG. 15), at step 1535 in process 1500 (FIG. 15), and/or at step 1545 in process 1500 (FIG. 15).
The system performing process 1600 can identify candidate attribute values (step 1605). The candidate attribute values can be, e.g., extracted directly from content (such as reviews or other documents in an electronic document collection) or identified from a collection of previously-extracted attribute values. For example, in the context of FIG. 2, the system can access data center 208 and extract one or more stored attribute values.
The system performing process 1600 can determine a confidence in the identified candidate values (step 1610). The confidence in a candidate value should characterize the degree of assurance that the candidate value correctly characterizes the attribute of an instance. The confidence in the correctness of a value can be determined based on, e.g., the number of times that the value is used to characterize an attribute of an instance, the quality of the documents in which the value is used to characterize an attribute of an instance, and the like.
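One simple way to combine the two signals named here, how often a value appears and the quality of the documents in which it appears, is a quality-weighted vote share. The Python sketch below is an assumption made for illustration: the `confidence_scores` function, the quality scale, and the sample observations are hypothetical, and the described system is not stated to use this formula.

```python
from collections import defaultdict


def confidence_scores(observations):
    """Score each candidate value by its share of quality-weighted occurrences.

    `observations` is an iterable of (value, document_quality) pairs, where
    document_quality is assumed to lie in (0, 1].
    """
    weight = defaultdict(float)
    for value, quality in observations:
        weight[value] += quality  # each occurrence counts in proportion to document quality
    total = sum(weight.values())
    if not total:
        return {}
    return {value: w / total for value, w in weight.items()}


# Made-up observations of "number of cylinders" for one car instance.
observed = [("6", 1.0), ("6", 0.5), ("8", 0.5)]
print(confidence_scores(observed))  # {'6': 0.75, '8': 0.25}
```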
The system performing process 1600 can determine whether the confidence in certain of the candidate values is low, medium, or high (step 1615). A low confidence in an attribute value indicates that it is unlikely that the candidate value correctly characterizes the attribute of an instance. A high confidence in an attribute value indicates that it is likely that the candidate value correctly characterizes the attribute of an instance.
If the system performing process 1600 determines that the confidence in certain of the candidate values is high, then the system can populate attribute values in the structured presentation with the extracted values (step 1620). This can be done automatically, i.e., without input from a user.
If the system performing process 1600 determines that the confidence in certain of the candidate values is medium, then the system can provide the candidate values to the user (step 1625). For example, the system can generate a user interface component that presents candidate values in association with identifiers of the instances and the attributes potentially characterized by those candidate values.
The system performing process 1600 can receive user selections of certain of the presented values (step 1630). The user selection can be received as one or more user inputs. For example, a user interface component that presents candidate values can include one or more selection widgets that allow the user to select candidate values for populating a structured presentation. The selection can be received from a user using a mouse, keyboard or other user input device.
The system performing process 1600 can populate the attribute value with the selected values (step 1635). For example, the system performing process 1600 can present the selected value in the structured presentation.
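The confidence-based handling in process 1600 amounts to a three-way routing of candidate values: high-confidence values are populated automatically, medium-confidence values are presented to the user for selection (steps 1625-1635), and low-confidence values are flagged in the presentation (step 1640). The Python sketch below assumes numeric confidences between 0 and 1 and two cut-off values; the thresholds and the names `route` and `triage` are hypothetical.

```python
HIGH_THRESHOLD = 0.8  # hypothetical cut-off for "high" confidence
LOW_THRESHOLD = 0.3   # hypothetical cut-off below which confidence is "low"


def route(confidence):
    """Map a confidence score to the action taken for a candidate value."""
    if confidence >= HIGH_THRESHOLD:
        return "populate"   # add to the presentation without user input
    if confidence >= LOW_THRESHOLD:
        return "ask_user"   # present the candidate for user selection
    return "highlight"      # mark the entry as deficient


def triage(candidates):
    """Group candidate values (value -> confidence) by the action to take."""
    groups = {"populate": [], "ask_user": [], "highlight": []}
    for value, confidence in candidates.items():
        groups[route(confidence)].append(value)
    return groups


print(triage({"6": 0.9, "8": 0.5, "12": 0.1}))
# {'populate': ['6'], 'ask_user': ['8'], 'highlight': ['12']}
```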
In some implementations, the selected attribute values can be used to further refine the attributes, values, and/or instances presented in the structured presentation. For example, if a user specifies that the value of an attribute of an instance is several thousand dollars, the magnitude of the value can be used to exclude values of significantly different magnitude from the structured presentation. As another example, if a user specifies that the value of an attribute of an instance is several thousand dollars, the magnitude of the value can be used to exclude instances that have values of that attribute that are significantly different in magnitude.
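A magnitude check of this kind can be sketched by comparing base-10 logarithms. In the Python sketch below, the `same_magnitude` function and the tolerance of one order of magnitude are assumptions chosen for illustration, as are the sample prices.

```python
import math


def same_magnitude(reference, value, tolerance=1.0):
    """Return True if `value` is within `tolerance` orders of magnitude of `reference`."""
    if reference <= 0 or value <= 0:
        return False  # logarithms are undefined for non-positive amounts
    return abs(math.log10(value) - math.log10(reference)) <= tolerance


user_value = 3000  # "several thousand dollars"
candidates = [2500, 4800, 35, 900000]
print([v for v in candidates if same_magnitude(user_value, v)])  # [2500, 4800]
```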
If the system performing process 1600 determines that the confidence in certain of the candidate values is low, then the system performing process 1600 can highlight such deficiencies in the structured presentation (step 1640). The deficiencies can be highlighted, e.g., by leaving an open entry or by highlighting the low confidence values using colored or other indicia. The system may also be able to receive candidate values that remedy these deficiencies from a user who interacts with an interactive element, e.g., a text field in the open entry or a notes cell adjacent to the deficient entry.
FIG. 17 is a schematic representation of a user interface component 1700 for selecting a candidate value to be added to a structured presentation. User interface component 1700 can interact with a user for the selection of a value that is to characterize a new attribute in the structured presentation. For example, user interface component 1700 can be presented to a user at step 1625 and receive a user selection at step 1630 of process 1600 (FIG. 16).
The user interface component 1700 includes a header 1705 and a table 1710. Header 1705 includes text or other information that identifies that user interaction with user interface component 1700 will allow the user to select a value of an attribute of an instance for display in a structured presentation. Table 1710 includes a collection of candidate value information organized into columns 1715, 1720, 1725, as well as a collection of row selection widgets 1730.
In particular, column 1715 includes a column header 1735 as well as a collection of candidate value identifiers. The candidate value identifiers can have been extracted directly from the electronic document collection 102 or indirectly via data center 208. In some implementations, the values may also include unit identifiers 309 that specify the unit of measure for the particular value 307. Column header 1735 identifies that candidate value identifiers are found in column 1715.
Column 1720 includes a column header 1740 as well as a collection of confidence values. The confidence values indicate the likelihoods that the candidate values identified in column 1715 are correct. The confidence values can be expressed in numerical or word terms. For example, the confidence values can be presented as, e.g., the percentage chance that a value is correct or on a numeric scale. Column header 1740 identifies that confidence values are found in column 1720.
Column 1725 includes a column header 1745 as well as a collection of source identifiers. The source identifiers identify one or more sources of the candidate values identified in column 1715. The sources can be identified using, e.g., the title of an electronic document, a domain name, the author's name, or the like. In some implementations, the source identifiers can include text snippets that include the candidate values identified in column 1715. Column header 1745 identifies that source identifiers are found in column 1725.
Selection widget collection 1730 includes one or more user interactive elements for receiving input from a user. The user input can identify that a candidate value identified in column 1715 is to be added to a structured presentation.
In some implementations, user interface component 1700 can present candidate values in an order that is based on confidence values. For example, a candidate value with the highest confidence value can be presented on the top of column 1715 and the candidate value with the lowest confidence value can be presented on the bottom of column 1715.
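The row layout of table 1710 and the confidence-based ordering can be sketched as follows. The `CandidateRow` type and `order_rows` function are hypothetical names used only for illustration; the fields correspond to columns 1715, 1720, and 1725.

```python
from typing import NamedTuple


class CandidateRow(NamedTuple):
    value: str         # candidate value identifier (column 1715)
    confidence: float  # confidence value (column 1720)
    source: str        # source identifier (column 1725)


def order_rows(rows):
    """Order rows so the highest-confidence candidate appears at the top."""
    # sorted() is stable, so rows with equal confidence keep their input order.
    return sorted(rows, key=lambda row: row.confidence, reverse=True)
```

The same ordering applies unchanged to tables of candidate attributes or candidate instances, since only the confidence column is consulted.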
In some implementations, user interface component 1700 can also include snippets of text surrounding attributes and values in a particular source identified in column 1725. Such snippets can allow a user to see the value in context.
FIG. 18 is a schematic representation of a structured presentation 1800 that includes highlights 1802 of deficiencies in the attribute values presented therein. In the illustrated example, the confidence in the values that are candidates for characterizing the attributes "ATTR_1" and "ATTRIBUTE_N" of instance "INSTANCE_1" is low, as is the confidence in the values that are candidates for characterizing the attribute "ATTR_2" of instance "INSTANCE_2." In the case of attribute "ATTR_1" of instance "INSTANCE_1," this lack of confidence is highlighted by an empty cell 1804. In the cases of attribute "ATTRIBUTE_N" of instance "INSTANCE_1" and attribute "ATTR_2" of instance "INSTANCE_2," this lack of confidence is highlighted by a color indicium 1806. Such highlights provide an intuitive form of feedback regarding the nature of particular attribute values. That is, the user can view the table 300 and immediately determine which values may be of questionable correctness. The system can receive user input that remedies one or more of the highlighted deficiencies. For example, the system may receive manually entered attribute values, additional constraints, or other user input described in this specification that the system can use to confidently identify additional attribute values.
In some implementations, user interaction with a cell in which a deficiency is highlighted can trigger a search directed to remedying the deficiency. For example, user interaction with empty cell 1804 can trigger a search. The search can use a customizable query that is based on, e.g., a category of the instances in the display, an identifier of the instance that is to be characterized by the new value, and/or an identifier of the attribute that is to be characterized by the new value. After returning a set of search results, a system can receive further interaction that specifies the value that remedies the deficiency. In some implementations, the returned set of search results can include attribute-specific highlighting in text snippets that demarcate potential values.
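A query of this kind could be assembled from whichever of the three components are available. The sketch below is an assumption about the query format: the `build_query` function and the convention of quoting multi-word terms are hypothetical and are not specified in the text.

```python
def build_query(category=None, instance=None, attribute=None):
    """Assemble a search query string from the available components.

    Any of the three components may be omitted. Multi-word terms are
    wrapped in double quotes, which is an illustrative convention only.
    """
    parts = [part for part in (category, instance, attribute) if part]
    return " ".join(f'"{part}"' if " " in part else part for part in parts)
```

For example, `build_query("movie", "Star Wars", "runtime")` produces the string `movie "Star Wars" runtime`.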
FIG. 19 is a schematic representation of a user interface component 1900 for selecting a candidate attribute to be added to a structured presentation. User interface component 1900 can interact with a user for the selection of an attribute that is to characterize an instance in the structured presentation. For example, user interface component 1900 can be presented to a user to select which attribute is to be added to a structured display at step 1320 of process 1300 (FIG. 13).
The user interface component 1900 includes a header 1905 and a table 1910. Header 1905 includes text or other information that identifies that user interaction with user interface component 1900 will allow the user to select an attribute of an instance for display in a structured presentation. Table 1910 includes a collection of candidate attribute information organized into columns 1915, 1920, 1925, as well as a collection of row selection widgets 1930.
In particular, column 1915 includes a column header 1935 as well as a collection of candidate attribute identifiers. The candidate attribute identifiers can have been extracted directly from the electronic document collection 102 or indirectly via data center 208. In some implementations, the attributes may also include unit identifiers 309 that specify the units of measure in which values of the candidate attributes are to be cast. Column header 1935 identifies that candidate attribute identifiers are found in column 1915.
Column 1920 includes a column header 1940 as well as a collection of confidence values. The confidence values indicate the likelihoods that the candidate attributes identified in column 1915 are correct. The confidence values can be expressed in numerical or word terms. For example, the confidence values can be presented as, e.g., the percentage chance that an attribute is correct or on a numeric scale. Column header 1940 identifies that confidence values are found in column 1920.
Column 1925 includes a column header 1945 as well as a collection of source identifiers. The source identifiers identify one or more sources of the candidate attributes identified in column 1915. The sources can be identified using, e.g., the title of an electronic document, a domain name, the author's name, or the like. In some implementations, the source identifiers can include text snippets that include the candidate attributes identified in column 1915. Column header 1945 identifies that source identifiers are found in column 1925.
Selection widget collection 1930 includes one or more user interactive elements for receiving input from a user. The user input can identify that a candidate attribute identified in column 1915 is to be added to a structured presentation.
In some implementations, user interface component 1900 can present candidate attributes in an order that is based on confidence values. For example, a candidate attribute with the highest confidence value can be presented on the top of column 1915 and the candidate attribute with the lowest confidence value can be presented on the bottom of column 1915.
In some implementations, user interface component 1900 can also include snippets of text surrounding instances and attributes in a particular source identified in column 1925. Such snippets can allow a user to see the attributes in context.
FIG. 20 is a schematic representation of a user interface component 2000 for selecting a candidate instance to be added to a structured presentation. User interface component 2000 can interact with a user for the selection of an instance that is to be added to a structured presentation. For example, user interface component 2000 can be presented to a user to select which instance is to be added to a structured display at steps 925, 940 of process 900 (FIG. 9).
The user interface component 2000 includes a header 2005 and a table 2010. Header 2005 includes text or other information that identifies that user interaction with user interface component 2000 will allow the user to select an instance for display in a structured presentation. Table 2010 includes a collection of candidate instance information organized into columns 2015, 2020, 2025, as well as a collection of row selection widgets 2030.
In particular, column 2015 includes a column header 2035 as well as a collection of candidate instance identifiers. The candidate instance identifiers can have been extracted directly from the electronic document collection 102 or indirectly via data center 208. Column header 2035 identifies that candidate instance identifiers are found in column 2015.
Column 2020 includes a column header 2040 as well as a collection of confidence values. The confidence values indicate the likelihoods that the candidate instances identified in column 2015 are to be added. The confidence values can be expressed in numerical or word terms. For example, the confidence values can be presented as, e.g., the percentage chance that an instance meets user-specified constraints. Column header 2040 identifies that confidence values are found in column 2020.
Column 2025 includes a column header 2045 as well as a collection of source identifiers. The source identifiers identify one or more sources of the candidate instances identified in column 2015. The sources can be identified using, e.g., the title of an electronic document, a domain name, the author's name, or the like. In some implementations, the source identifiers can include text snippets that include identifiers of the candidate instances in column 2015. Column header 2045 identifies that source identifiers are found in column 2025.
Selection widget collection 2030 includes one or more user interactive elements for receiving input from a user. The user input can identify that a candidate instance identified in column 2015 is to be added to a structured presentation.
In some implementations, user interface component 2000 can present candidate instances in an order that is based on confidence values. For example, a candidate instance with the highest confidence value can be presented on the top of column 2015 and the candidate instance with the lowest confidence value can be presented on the bottom of column 2015.
In some implementations, user interface component 2000 can also include snippets of text surrounding instance identifiers in a particular source identified in column 2025. Such snippets can allow a user to see the instances in context.
The changes made to a structured presentation using the systems and processes described herein can be part of an iterative process in which these changes are used to identify additional instances, attributes, and/or values. For example, process 800 (FIG. 8) can be repeated several times. Since the scope of existing content increases, the additional instances, attributes, and/or values that are identified are likely to be of increased confidence.
FIG. 21 is a schematic representation of a process 2100 by which new instances can be added to expand a preexisting structured presentation. Process 2100 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
Process 2100 includes an extraction operation 2105 and a merge operation 2110 that add new instances to a preexisting structured presentation based on information drawn from documents in electronic document collection 102. In particular, process 2100 suggests one or more new instances based on information presented in the preexisting structured presentation 106. For example, if the structured presentation includes a number of instances corresponding to certain movies, the system 200 can suggest additional instances of movies according to information drawn from the electronic document collection. That is, the system 200 can identify and suggest additional instances according to similarities of the attribute identifiers, units of measurement of the attribute values, values of the attribute values, or combinations thereof. For example, the system 200 may suggest movies that have similar show times, theaters, or run times.
As shown in FIG. 21, extraction operation 2105 uses the characteristics of a preexisting structured presentation 106 to extract a collection of new instance suggestions from electronic document collection 102. Example characteristics include the instances in the preexisting structured presentation, the attributes in the preexisting structured presentation, and the values of the attributes in the preexisting structured presentation. The characteristics of the preexisting structured presentation 106 can be expressed as a collection of machine-readable information and can be received by a system of one or more computers that perform operations by one or more sets of machine-readable instructions. For example, the characteristics of the preexisting structured presentation 106 can be received by a search engine 202 (FIG. 2).
During extraction operation 2105, one or more new instance suggestions can be formulated based on the content of documents in electronic document collection 102 and the characteristics of preexisting structured presentation 106. A variety of different techniques for formulating new instance suggestions can be used, as discussed further below.
Some or all of the new instance suggestions can be merged with the preexisting structured presentation 106 in merge operation 2110 to form an expanded structured presentation 106. The expanded structured presentation can be displayed for a viewer, e.g., at a display device such as display screen 106. Not all of the new instance suggestions formulated during extraction operation 2105 need be merged with the preexisting structured presentation 106 and displayed for a viewer. For example, in some implementations, a collection of new instance suggestions can be presented to a viewer along with an interactive element that allows the viewer to select one or more instances that are to be added. However, in other implementations, the new instance suggestions can be added automatically, without user interaction, and without winnowing of the new instance suggestions before display. More details regarding the merger can be found, e.g., in FIGS. 9-20 and the associated text.
FIG. 22 is a flow chart of an example process 2200 for adding instances to a structured presentation based on the content of documents in an electronic document collection. Process 2200 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions. For example, process 2200 can be performed by the search engine 202 in system 200.
In some implementations, process 2200 can be performed in response to receiving input, e.g., from a user or from another system or process that triggers an update of the structured presentation 226. For example, process 2200 can be performed in response to a user request that one or more new instances be added to a structured presentation 226. As another example, process 2200 may be performed by a search engine, e.g., search engine 202 (FIG. 2), in response to receipt of a search query.
The system performing process 2200 can receive one or more characteristics of a preexisting structured presentation (step 2205). For example, the system can receive one or more attribute identifiers of the preexisting structured presentation. As another example, the system can receive one or more instance identifiers that appear in the preexisting structured presentation.
The system performing process 2200 can formulate one or more instance suggestions from documents in an electronic document collection based on one or more characteristics of the preexisting structured presentation (step 2210). Instance suggestions can be formulated based on these characteristics in a number of different ways. For example, in one implementation, the system can formulate instance suggestions from documents in an electronic document collection 102 by constructing search queries using attribute identifiers drawn from the preexisting structured presentation. These search queries can be used to identify instances that may share similar attributes using string comparisons or other matching techniques. Examples of other approaches are discussed further below.
The system performing process 2200 can provide one or more instance suggestions to a user (step 2215). For example, a list of instance suggestions can be displayed for the user on the same display screen that displays the preexisting structured presentation.
The system performing process 2200 can receive user selection of one or more instance suggestions (step 2220). For example, a user interface component can interact with a user to receive one or more user inputs (e.g., mouse clicks, key strokes, or other user input) that select one or more instance suggestions.
The system performing process 2200 can add the selected instance suggestions to a structured presentation as new structured records (step 2225). For example, when the structured presentation is a table such as table 300 (FIG. 3), the system can add new rows 302. As another example, when the structured presentation is a collection of cards such as collection of cards 500 (FIG. 5), the system can add new cards 500.
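For the table case, step 2225 can be sketched as appending one row per selected instance. The dictionary layout of the table, the `add_instances` function, and the choice to skip instances already present are all assumptions made for illustration.

```python
def add_instances(table, selected):
    """Append a new row for each selected instance suggestion.

    `table` is a hypothetical in-memory layout:
    {"attributes": [...], "rows": [{"instance": name, attr: value, ...}]}.
    Attribute values of a new row start out empty (None).
    """
    existing = {row["instance"] for row in table["rows"]}
    for name in selected:
        if name in existing:
            continue  # assumed behavior: do not add the same instance twice
        row = {"instance": name}
        row.update({attribute: None for attribute in table["attributes"]})
        table["rows"].append(row)
        existing.add(name)
    return table
```

The empty cells of each new row could then be filled by a value-extraction process, or highlighted as deficiencies until values are found.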
FIG. 23 is a flow chart of an example process 2300 for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. Process 2300 can be performed alone or in conjunction with other activities. For example, process 2300 can be performed at step 2210 in process 2200 (FIG. 22).
Process 2300 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions. For example, process 2300 can be performed by search engine 202 in system 200 (FIG. 2).
The system performing process 2300 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation (step 2305). Structured components are portions or regions of an electronic document that are structured. Example structured components include tables, lists, records, collections of attribute-value pairs, and the like. Structured components can thus organize attribute values and instance identifiers in conformity with a defined structure, much like a structured presentation.
The entirety of an electronic document that includes a structured component need not be structured. For example, an electronic document can include a table between two paragraphs of unstructured text. Moreover, structured components in different documents need not have the same format or conform with a predetermined or persistent structure. Indeed, the organization of information in one structured component generally can be changed without regard to the organization of information in structured components that appear in other documents. By way of example, if a structured list of schools in one person's resume is changed to delete the year of graduation, there is no need to ensure that other structured lists of schools in other resumes are similarly changed.
The system performing process 2300 can identify documents that include structured components in a variety of ways. For example, tables and other structured components can be identified using metadata labels, e.g., HTML tags, found in the documents themselves. As another example, structured components can be identified by identifying repetitive elements (e.g., a series of comma or tab delineations) in a document.
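Both identification approaches can be sketched with the Python standard library. The set of tags treated as structured, the function names, and the rule of three consecutive lines with equal delimiter counts are assumptions chosen for illustration, not details from the specification.

```python
from html.parser import HTMLParser

# Assumed set of HTML tags that mark a structured component.
STRUCTURED_TAGS = {"table", "ul", "ol", "dl"}


class StructuredTagFinder(HTMLParser):
    """Records every structured start tag encountered in a document."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURED_TAGS:
            self.found.append(tag)


def find_structured_tags(html):
    """Return the structured tags found in the HTML, in document order."""
    finder = StructuredTagFinder()
    finder.feed(html)
    finder.close()
    return finder.found


def has_repetitive_delimiters(text, delimiter=",", min_lines=3):
    """Detect min_lines consecutive lines with the same nonzero delimiter count."""
    run, previous = 0, None
    for line in text.splitlines():
        count = line.count(delimiter)
        if count > 0 and count == previous:
            run += 1
        else:
            run = 1 if count > 0 else 0
        previous = count
        if run >= min_lines:
            return True
    return False
```

The delimiter heuristic is deliberately crude: it flags comma- or tab-separated blocks but ignores quoting and escaped delimiters.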
Structured components relate to instances specified in a preexisting structured presentation when they include information that is relevant to the specified instances. For example, a structured component that characterizes one or more of the specified instances with one or more attribute values can be considered relevant to the instances specified in a preexisting structured presentation. As another example, a structured component that characterizes one or more of the same attributes of instances that differ from instances specified in a preexisting structured presentation can be considered relevant to the specified instances. In many implementations, the instance and/or attribute identifiers need not be the same. Rather, conceptually related instances and attributes can be used to identify documents that include structured components.
Thus, in some implementations, the system performing process 2300 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation by identifying documents that include the same or related instance identifiers as found in the preexisting structured presentation and/or the same or related attribute identifiers as found in the preexisting structured presentation.
The system performing process 2300 can select one or more instance suggestions from the structured components (step 2310). This selection process can winnow down the number of instances that are to be suggested to a user. The selection of instance suggestions can be performed in a number of ways. For example, the system can select instance suggestions based on a category of the instances in the structured components, the attributes of the instances in the structured components, and/or the values of the attributes of the instances in the structured components, as discussed further below.
FIG. 24 is a representation 2400 of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. In particular, representation 2400 illustrates a formulation of instance suggestions using one implementation of process 2300 (FIG. 23). As shown, a preexisting structured presentation specifies a collection of instances 2405 (i.e., the instances "Philadelphia" and "Chicago"). Furthermore, different documents in an electronic document collection (e.g., collection 102) include different structured components 2410, 2415, 2420. Structured components 2410, 2415, 2420 can be identified as relevant to specified instances 2405 based on, e.g., the same instance identifiers "Philadelphia" and "Chicago" appearing therein.
As shown, structured components 2410, 2415, 2420 include a wide variety of different potential instance suggestions based on different contexts. In particular, in the context of structured component 2410, the instances "Philadelphia" and "Chicago" are part of a tabular component that represents the properties of various cities. In the context of structured component 2415, the instances "Philadelphia" and "Chicago" are part of a structured component that represents part of the standings in the National League East sometime in the 1970s. In the context of structured component 2420, the instances "Philadelphia" and "Chicago" are part of a tabular component that represents the properties of various films.
Rather than suggesting all the various instances found in structured components 2410, 2415, 2420 to a user, instance suggestions can be selected from components 2410, 2415, 2420 based on the attributes used to characterize those instances. In particular, as shown, preexisting structured presentation 106 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "year," "rating," and "box office receipts." Structured component 2410 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "population" and "area." Structured component 2415 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "wins," "losses," and "GB (i.e., games behind)." Structured component 2420 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "year," "runtime," and "rating." A system can select from the instances in structured components 2410, 2415, 2420 based on these characterized attributes. For example, the system can identify the correspondence between the attribute identifiers "year" and "rating" in preexisting structured presentation 106 and the attribute identifiers "year" and "rating" in structured component 2420 to select the instances "Peter Pan" and "Star Wars" as suggestions for addition to the preexisting structured presentation 106.
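The attribute-correspondence selection in the FIG. 24 example can be sketched as a set intersection. The `suggest_instances` function, the dictionary layout of a structured component, and the requirement of at least two shared attribute identifiers are assumptions for illustration; the specification does not state how much overlap is required.

```python
def suggest_instances(known_instances, known_attributes, components, min_shared=2):
    """Suggest instances from components whose attributes overlap the known ones.

    Each component is a hypothetical dict with "attributes" and "instances"
    lists. Attribute identifiers are compared case-insensitively.
    """
    known = {attribute.lower() for attribute in known_attributes}
    suggestions = []
    for component in components:
        shared = known & {attribute.lower() for attribute in component["attributes"]}
        if len(shared) < min_shared:
            continue  # different context, e.g., cities or baseball standings
        for name in component["instances"]:
            if name not in known_instances and name not in suggestions:
                suggestions.append(name)
    return suggestions


# Data from the FIG. 24 example; rows not named in the text are omitted.
components = [
    {"attributes": ["population", "area"],
     "instances": ["Philadelphia", "Chicago"]},                  # component 2410
    {"attributes": ["wins", "losses", "GB"],
     "instances": ["Philadelphia", "Chicago"]},                  # component 2415
    {"attributes": ["year", "runtime", "rating"],
     "instances": ["Philadelphia", "Chicago", "Peter Pan", "Star Wars"]},  # 2420
]
suggestions = suggest_instances(
    ["Philadelphia", "Chicago"],
    ["year", "rating", "box office receipts"],
    components,
)
```

Only component 2420 shares "year" and "rating" with the preexisting presentation, so `suggestions` contains "Peter Pan" and "Star Wars".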
As discussed in FIGS. 37-51 and the associated text, a system can also suggest or add additional attribute identifiers. For example, structured component 2420 includes an attribute identifier "runtime." Such a system can thus suggest the attribute identifier "runtime" with or without the corresponding attribute values.
In some implementations, even if instances drawn from structured components 2410, 2415 are not suggested in a particular formulation, such instances can be stored for use during future information requests. For example, even though the cities represented in structured component 2410 are not selected as instance suggestions, these cities can be stored along with their respective attribute identifiers (e.g., "population" and "area") and attribute values in a data collection (such as, e.g., data center 208). When a subsequent user requests information regarding one or more cities, such a system can access this stored information and provide additional information to the user.
FIG. 25 is a flow chart of an example process 2500 for formulating instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. Process 2500 can be performed alone or in conjunction with other activities. For example, process 2500 can be performed at step 2210 in process 2200 (FIG. 22).
Process 2500 can be performed by a system of one or more computers that perform operations by one or more sets of machine-readable instructions. For example, process 2500 can be performed by search engine 202 in system 200 (FIG. 2).
The system performing process 2500 can identify one or more documents relevant to one or more specified instances (step 2505). For example, the system performing process 2500 can use string comparisons to match one or more of the specified instances and their attributes and/or values with documents in an electronic document collection such as electronic document collection 102. As another example, the system performing process 2500 can access stored information (such as information in data center 208) to identify electronic documents that are relevant to the specified instances.
The system performing process 2500 can extract a template of one or more of the identified documents (step 2510). A document template serves as a pattern for the arrangement of the content of individual documents in a subcollection of documents in an electronic document collection. The documents in a subcollection generally originate from a single source, such as a single commercial entity. For example, a bookseller can use a single document template as a pattern for the arrangement of content describing different books. As another example, a furniture retailer can use a single document template as a pattern for the arrangement of the content of flyers for different sofas. For example, the template of an electronic flyer for a sofa can specify the arrangement, on the flyer, of the brand name of the sofa, a picture of the sofa, an interactive element that allows the user to select the color in which the sofa is shown, a description of the sofa in text format, and a table that characterizes the sofa's dimensions, availability, and price. Document templates can thus organize information regarding an instance in conformity with a defined structure, much like a structured presentation.
In general, a document template can serve as a pattern for the entire content of an electronic document and, as discussed above, can even specify the arrangement of a structured component in a document. However, because document templates only specify the arrangement of the content of a subcollection of documents in an unstructured electronic document collection, the electronic document collection itself remains unstructured. For example, even if AMAZON.COM uses one template as a pattern for the arrangement of a description of every book that AMAZON.COM sells, BARNESANDNOBLE.COM and other booksellers do not necessarily use that same template as a pattern for the arrangement of descriptions of books that they sell. Moreover, a document template can be changed without that change necessarily being propagated throughout the entire collection, or even a subcollection, of electronic documents.
FIG. 26 is a representation of a portion of a hypertext markup language (HTML) template 2600 that is used as a pattern for descriptions of a movie (i.e., the movie "Philadelphia"). The HTML code of template portion 2600 is both machine-readable and human-readable. For example, the HTML code of template portion 2600 can be used by a browser to generate a web page.
In the depicted example, template portion 2600 is split into two subsections 2605, 2610. Subsection 2605 serves as a pattern for the arrangement of text that identifies the movie "Philadelphia." Subsection 2610 serves as a pattern for the arrangement of various attribute identifiers and their values. In general, the patterns in subsections 2605, 2610 are repeated a number of times in a particular subcollection of documents in an electronic document collection to describe different movies.
An HTML parser can be used to extract the formatting from template portion 2600 so the formatting can be used to identify documents having the same template. For example, the HTML tags <title>, <div>, other HTML tags, and their relative position to each other can be identified by an HTML parser. Such an HTML parser can determine that the HTML tag <title> appears before the HTML tag <div>. Thus, an HTML parser can extract the formatting from template portion 2600 from content that is arranged in accordance with the template.
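As a minimal sketch of this idea, the ordered sequence of start tags can serve as a crude fingerprint of a document's formatting. The `TagSequence` class and `template_signature` function are hypothetical names, and the sample markup is invented for illustration rather than copied from FIG. 26.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects start tags in document order, ignoring the text content."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def template_signature(html):
    """Return the ordered tuple of start tags as a simple formatting fingerprint."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


# Invented sample markup; only the tag arrangement matters here.
page = ("<html><head><title>Philadelphia (1993)</title></head>"
        "<body><div>Year: 1993</div></body></html>")
signature = template_signature(page)
```

Because the signature discards text, two pages that describe different movies with the same markup yield the same tuple, and the relative position of `<title>` and `<div>` can be read from their indices.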
Returning to FIG. 25, after extracting a template, the system performing process 2500 can identify one or more documents that have the same template (step 2515). For example, the system can compare the template of documents in the electronic document collection with the extracted template.
The system performing process 2500 can also formulate one or more instance suggestions from the documents identified as having the same template (step 2520). In particular, the system can use the repetition of the template within a subcollection of documents to infer that the documents in the subcollection include the same kind of content regarding the same category of instances. In other words, the system can infer that the context of two documents is the same since the same template serves as a pattern for the different documents. Once documents of similar contexts have been identified, the templates themselves can be used to formulate the instance suggestions. For example, HTML tags in template portion 2600 (FIG. 26) identify that the title of the film described in that document is "Philadelphia (1993)." By searching for similarly-tagged text in documents that share the same template, the system can identify the titles of other films.
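The search for similarly-tagged text can be sketched as follows. This is a minimal example under assumptions: the names `TitleAndTags`, `parse`, and `suggest_instances` are hypothetical, the sample pages are invented, and the example assumes that the instance identifier sits inside the `<title>` element of each page.

```python
from html.parser import HTMLParser


class TitleAndTags(HTMLParser):
    """Collects the opening-tag sequence and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def parse(html):
    """Return (tag sequence, title text) for one document."""
    parser = TitleAndTags()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags), "".join(parser.title_parts).strip()


def suggest_instances(seed_html, candidate_htmls):
    """Return the titles of candidates whose tag sequence matches the seed's."""
    seed_tags, _ = parse(seed_html)
    suggestions = []
    for html in candidate_htmls:
        tags, title = parse(html)
        if tags == seed_tags and title:
            suggestions.append(title)
    return suggestions


# Hypothetical documents; only the first candidate shares the seed's template.
seed = ("<html><head><title>Philadelphia (1993)</title></head>"
        "<body><div>Director</div></body></html>")
candidates = [
    "<html><head><title>Star Wars (1977)</title></head>"
    "<body><div>Director</div></body></html>",
    "<html><body><p>An unrelated page</p></body></html>",
]
suggestions = suggest_instances(seed, candidates)
```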
Moreover, in some implementations, additional content in a document template can be used in formulating instance suggestions. For example, the identification of a certain value (e.g., George Lucas) as a "director" can be used to select particular instance suggestions from a subcollection of documents. In other words, subsection 2610 of template portion 2600 can be parsed or otherwise analyzed to determine if any of the attributes have similar values, identifiers, or other characteristics. In such situations, the instance identifier can be extracted from subsection 2605.
FIG. 27 is a schematic representation of a process 2700 by which a collection of new instance suggestions 2115 can be formulated based on information in a preexisting structured presentation 106. Process 2700 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions (FIG. 2).
Process 2700 performs an extraction operation 2705 on an instance/attribute collection 2710 based on the information in the preexisting structured presentation 106. Instance/attribute collection 2710 is a collection of information that associates instances with their attributes and, in some implementations, with the values of those attributes as well. The information in collection 2710 can be extracted from documents in an electronic document collection 102 either in response to receipt of a trigger (e.g., a search query) or in anticipation of receipt of a trigger, e.g., as part of a process of indexing electronic document collection 102. In some implementations, information in collection 2710 can include the content of previous structured presentations that were presented to the current user or to other users. In general, the instance suggestions are provided to a user who selects instance suggestions to be added to a structured presentation, e.g., as described in steps 2215, 2220, 2225 (FIG. 22).
The association between instances and their attributes can be established in collection 2710 by structuring the information storage within collection 2710. FIG. 28 is a schematic representation of a table 2800 that associates attributes and instances in collection 2710.
Table 2800 includes a collection of records 2802, 2804, 2806, 2808, 2810, 2812, 2814, each of which associates an identifier of an instance with descriptions of a document location and the attributes that characterize the identified instances in those documents. The information in records 2802, 2804, 2806, 2808, 2810, 2812, 2814 can be organized in a collection of columns 2815, 2820, 2825, 2830, 2835, 2840. In particular, column 2815 can include instance identifiers. Column 2820 can include a description of the location of an electronic document that includes the instance identified in column 2815. Columns 2825, 2830, 2835, 2840 can identify attributes that characterize the instances identified in column 2815 in the document whose location is described in column 2820.
As shown, different electronic documents can include different categories and amounts of information characterizing the same instance. For example, the document whose location is identified in column 2820 of record 2804 includes two attributes of an instance "INSTANCE_2," whereas the document whose location is identified in column 2820 of record 2810 includes three attributes of an instance "INSTANCE_2." Moreover, the attributes in record 2804 (i.e., attribute "ATTR_5" and attribute "ATTR_6") differ in part from the attributes in record 2810 (i.e., attribute "ATTR_5," attribute "ATTR_8," and attribute "ATTR_9.")
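One way to picture such records is as small immutable objects, one per document. The sketch below is illustrative only: the `Record` class and the placeholder URLs are assumptions, and only the attribute identifiers of "INSTANCE_2" in records 2804 and 2810 are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row of a table such as table 2800 (hypothetical layout)."""
    instance_id: str       # corresponds to column 2815
    location: str          # corresponds to column 2820
    attributes: frozenset  # corresponds to columns 2825, 2830, 2835, 2840


# The URLs are placeholders; the attribute identifiers follow the text.
record_2804 = Record("INSTANCE_2", "http://example.com/a",
                     frozenset({"ATTR_5", "ATTR_6"}))
record_2810 = Record("INSTANCE_2", "http://example.org/b",
                     frozenset({"ATTR_5", "ATTR_8", "ATTR_9"}))

# Attributes that both documents use for the same instance.
shared = record_2804.attributes & record_2810.attributes
# Every attribute that any document uses for the instance.
all_attributes = record_2804.attributes | record_2810.attributes
```

Keeping one record per document, rather than merging the attributes of an instance, preserves the document location that later filtering steps can rely on.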
Data collections 2710 that associate attributes and instances (such as table 2800) can be formed in a number of different ways. For example, documents that include internal, structured components can be identified. Examples of such internal, structured components include tables and lists that appear in HTML documents. The relationships between attributes and instances in these internal structured components can be copied to form data collections 2710. As another example, collection 2710 can be formed from the content of previous structured presentations that were presented to the current user or to other users.
As yet another example, once a first document has been identified as including attributes and instances, the template of that document can be used to extract attributes and instances from other documents that include the same template. For example, if a stereo retailer uses the same document template to describe different stereos that are offered for sale, the arrangement of information in a first electronic document regarding a first stereo can be used to extract information from other electronic documents that regard other stereos.
In some implementations, techniques such as natural language parsing can be used to identify instances and attributes. For example, electronic documents can be parsed to identify phrases such as "[Instance] has a/an [attribute]" in electronic documents.
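A very small stand-in for such phrase matching is a regular expression, sketched below. This is not natural language parsing in any full sense; the pattern, the function `extract_pairs`, and the sample sentences are assumptions made for illustration, and the pattern only handles capitalized instance names followed by "has a" or "has an".

```python
import re

# Matches phrases of the form "[Instance] has a/an [attribute]".
# The attribute ends before " of", before punctuation, or at the end of the text.
PATTERN = re.compile(
    r"\b(?P<instance>[A-Z][\w ]*?)\s+has\s+an?\s+"
    r"(?P<attribute>[a-z][a-z ]*?)(?=\s+of\b|[.,;]|$)"
)


def extract_pairs(text):
    """Return (instance, attribute) pairs found in the text."""
    return [(m.group("instance"), m.group("attribute"))
            for m in PATTERN.finditer(text)]


# Hypothetical sentences.
pairs_city = extract_pairs("Philadelphia has a population of 1.5 million.")
pairs_tower = extract_pairs("The Eiffel Tower has a height of 300 meters.")
```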
In some implementations, data collection 2710 can categorize instances and their attributes. For example, instances and attributes can be categorized as North American cities, National League East teams, or popular movies. The storage of information in data collection 2710 can be based on such categorizations. For example, different categories can be stored in different files, records, or the like.
Returning to FIG. 27, process 2700 suggests one or more new instances based on information presented in the preexisting structured presentation 106. For example, if the structured presentation includes a number of instances corresponding to certain movies, the system 200 can suggest additional instances of movies according to information drawn from data collections 2710. That is, the system 200 can identify and suggest additional instances according to similarities of the attribute identifiers. For example, the system 200 may suggest movies that have similar show times, theaters, or run times.
FIG. 29 is a flow chart of a process 2900 for formulating instance suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation. Process 2900 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 2900 can be performed by the search engine 202 in system 200 (FIG. 2). Process 2900 can be performed alone or in conjunction with other activities. For example, process 2900 can be performed during process 2700 (FIG. 27). As another example, process 2900 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or both of processes 2300, 2500 (FIGS. 23, 25).
The system performing process 2900 can access a collection of instances and their attributes (step 2905). For example, the system performing process 2900 can access instance and attribute collection 2710 (FIG. 27) stored in data center 208 (FIG. 2).
The system performing process 2900 can identify one or more relevant instances based on characteristics of instance attributes specified in the preexisting structured presentation (step 2910). For example, the system can compare instance attributes of instances stored in the instance and attribute collection 2710 (FIG. 27) with instances specified in the structured presentation 106. The system can use the comparison to determine which, if any, of the stored instances share attribute identifiers, or related attributes, with the attributes specified in the preexisting structured presentation 106. For example, suppose that a preexisting structured presentation 106 uses the attributes "ATTR_3" and "ATTR_5" to characterize a collection of instances. Upon review of an instance and attribute collection 2710 such as table 2800 (FIG. 28), the system can suggest the instances "INSTANCE_1" and "INSTANCE_2" based on the same attributes "ATTR_3" and "ATTR_5" being used to characterize those instances in records 2802, 2806.
FIG. 30 is a flow chart of a process 3000 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106. Process 3000 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2). Process 3000 can be performed alone or in conjunction with other activities. For example, process 3000 can be performed during process 2700 (FIG. 27). As another example, process 3000 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or more of processes 2300, 2500, 2900 (FIGS. 23, 25, 29). In some implementations, process 3000 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process). For example, process 3000 can be performed by search engine 202 in response to receiving a search string.
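The attribute-identifier comparison of step 2910 in process 2900 (FIG. 29), described above for "ATTR_3" and "ATTR_5," can be sketched as a subset test. The function `suggest_by_attributes` and the record contents are hypothetical: FIG. 28 is not reproduced here, so the rows below merely stand in for records such as 2802 and 2806, and requiring every attribute to be present is one possible matching rule among several.

```python
def suggest_by_attributes(records, presentation_attributes, already_present=()):
    """Suggest instances characterized by every attribute in the presentation.

    `records` is an iterable of (instance_id, attribute_set) pairs.
    """
    wanted = set(presentation_attributes)
    suggestions = []
    for instance_id, attributes in records:
        if instance_id in already_present or instance_id in suggestions:
            continue
        if wanted <= set(attributes):
            suggestions.append(instance_id)
    return suggestions


# Hypothetical rows standing in for an instance and attribute collection.
records = [
    ("INSTANCE_1", {"ATTR_3", "ATTR_5"}),
    ("INSTANCE_2", {"ATTR_5", "ATTR_6"}),
    ("INSTANCE_2", {"ATTR_3", "ATTR_5"}),
    ("INSTANCE_3", {"ATTR_8"}),
]
suggestions = suggest_by_attributes(records, ["ATTR_3", "ATTR_5"])
```

A looser rule, such as requiring only some minimum number of shared attributes, would also fit the description of "related attributes."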
The system performing process 3000 can identify one or more authoritative sources regarding one or more specified instances (step 3005). For example, the system can access a collection of authoritative sources of documents in electronic document collection 102 that has been assembled, e.g., by a programmer.
As another example, the system can receive user-specific input identifying one or more authoritative sources of documents in electronic document collection 102 as "authoritative" in the view of that user. For example, a display screen 104 that displays a preexisting structured presentation 106 can include a GUI component that allows a viewer to specify authoritative sources of documents. The identification of an authoritative source can be received in conjunction with receipt of a search query. For example, a viewer can identify JD POWER AND ASSOCIATES, AMAZON.COM, and MAJOR LEAGUE BASEBALL as authoritative sources of the documents found at http://www.jdpower.com/, http://www.amazon.com/, and http://www.mlb.com/, respectively. In some implementations, the user-specific input can identify the subject matter on which a source is authoritative. For example, MAJOR LEAGUE BASEBALL may be identified as an authoritative source for baseball statistics, but may not be considered an authoritative source for information regarding drug testing.
As yet another example, the system performing process 3000 can analyze a collection of user-specific input identifying authoritative sources from multiple users to assemble a "generic" collection of authoritative sources. For example, a large number of users may identify the AMERICAN AUTOMOBILE ASSOCIATION (AAA) as authoritative. Based on a statistical analysis of these identifications, the AAA can then be added to a collection of authoritative sources.
The system performing process 3000 can determine additional attributes from the authoritative sources for instances that are specified in the preexisting structured presentation (step 3010). For example, the system can access documents provided by an authoritative source and identify one or more documents that characterize specified instances using one or more attributes. The system can extract attribute identifiers from these documents using a parser or other string comparison techniques.
As another example, the system can access a data collection that associates attributes and instances, such as table 2800 (FIG. 28). The system can filter records such as records 2802, 2804, 2806, 2808, 2810, 2812, 2814 based on both the instances identified in the preexisting structured presentation and whether or not the documents whose location is identified in records 2802, 2804, 2806, 2808, 2810, 2812, 2814 originated from an authoritative source. For example, if AMAZON.COM is an authoritative source, a collection that associates attributes and instances can be scanned to identify documents with the http://www.amazon.com/ domain.
The system performing process 3000 can compare these additional attributes with attributes in an instance and attribute collection such as table 2800 (FIG. 28) (step 3015). For example, the system can use string comparisons, or other comparison techniques, to compare the additional attributes with attributes stored in the instance and attribute collection.
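The record filtering described above, by specified instance and by authoritative source, might look like the following sketch. The record layout, the item URLs, and the helper `from_authoritative_source` are assumptions; matching on the exact host name is a simplification, since a real system might also accept subdomains or paths.

```python
from urllib.parse import urlparse


def from_authoritative_source(location, authoritative_hosts):
    """Return True if the document location belongs to an authoritative host."""
    host = urlparse(location).hostname or ""
    return host in authoritative_hosts


# Hypothetical records; only the amazon.com domain comes from the text.
records = [
    {"instance": "INSTANCE_1", "location": "http://www.amazon.com/item/1",
     "attributes": ["ATTR_3"]},
    {"instance": "INSTANCE_1", "location": "http://www.example.com/page",
     "attributes": ["ATTR_4"]},
    {"instance": "INSTANCE_7", "location": "http://www.amazon.com/item/7",
     "attributes": ["ATTR_5"]},
]
authoritative_hosts = {"www.amazon.com"}
specified_instances = {"INSTANCE_1"}

# Keep records that describe a specified instance in an authoritative document.
filtered = [
    r for r in records
    if r["instance"] in specified_instances
    and from_authoritative_source(r["location"], authoritative_hosts)
]
# The additional attributes are those used by the surviving records.
additional_attributes = sorted({a for r in filtered for a in r["attributes"]})
```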
The system performing process 3000 can identify an instance in the instance and attribute collection based on the results of these comparisons (step 3020). For example, the system can determine how many of the attributes that are used to characterize instances in documents from an authoritative source are also associated with other instances in the instance and attribute collection.
FIG. 31 is a flow chart of a process 3100 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106. Process 3100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2). Process 3100 can be performed alone or in conjunction with other activities. For example, process 3100 can be performed during process 2700 (FIG. 27). As another example, process 3100 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or more of processes 2300, 2500, 2900, 3000 (FIGS. 23, 25, 29, 30). In some implementations, process 3100 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process). For example, process 3100 can be performed by search engine 202 in response to receiving a search string.
The system performing process 3100 can identify one or more relevant instances based on attribute values of the instances specified in a preexisting structured presentation. For example, the system can identify relevant instances by comparing attribute values of specified instances with attribute values of other instances to determine if the other instances are suitable suggestions. Such comparisons can require, e.g., that the attribute values be identical or that the attribute values fall within a certain range. Such a range can be determined, e.g., based on the range of attribute values that are specified by a user over an interactive element or that already characterize instances in a preexisting structured presentation. In some implementations, the system performing process 3100 can convert attribute values into a common unit of measurement prior to comparing the attribute values.
For example, if the specified unit of measurement is in feet, but one or more extracted attribute values is in meters, the system can convert the values in meters into feet using conventional techniques. A schematic representation of one such approach is described in more detail below.
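The meters-to-feet case can be sketched with a small conversion table. The names `FEET_PER_UNIT`, `to_feet`, and `within_range` are hypothetical, and the range bounds in the example are invented; the only fixed fact used is that one foot is defined as 0.3048 meters.

```python
# Conversion factors into feet; one foot is exactly 0.3048 meters.
FEET_PER_UNIT = {"feet": 1.0, "meters": 1.0 / 0.3048}


def to_feet(value, unit):
    """Convert a value in a supported unit into feet."""
    return value * FEET_PER_UNIT[unit]


def within_range(value, unit, low_feet, high_feet):
    """Check a value against a range expressed in feet, converting first."""
    return low_feet <= to_feet(value, unit) <= high_feet


# A value extracted in meters is compared against a range specified in feet.
tall_enough = within_range(300, "meters", 900, 1000)
# The same number read as feet falls outside that range.
too_short = within_range(300, "feet", 900, 1000)
```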
FIG. 32 is a schematic representation of a table 3200 that associates attributes, instances, and their values in a data collection. Since table 3200 associates attributes and instances, table 3200 can also serve as instance/attribute collection 2710 (FIG. 27). Table 3200 can be generated based on information drawn from a collection of electronic documents, e.g., electronic document collection 102. Table 3200 can be generated, e.g., during a crawling process and stored, e.g., in data center 328 for subsequent use.
Table 3200 includes a collection of records 3202, 3204, 3206, 3208, 3210, 3212, 3214, each of which associates an identifier of an instance with descriptions of a document location, attributes that characterize the identified instances in those documents, and values that characterize those attributes in those documents. The information in records 3202, 3204, 3206, 3208, 3210, 3212, 3214 can be organized in a collection of columns 3215, 3220, 3225, 3230, 3235, 3240. In particular, column 3215 can include instance identifiers. Column 3220 can include a description of the location of an electronic document that includes the instance identified in column 3215. Columns 3225, 3235 can identify attributes that characterize the instances identified in column 3215 in the document whose location is described in column 3220. Columns 3230, 3240 can include values that characterize the attributes identified in columns 3225, 3235.
In the illustrated example, each record 3202, 3204, 3206, 3208, 3210, 3212, 3214 relates to a different instance (e.g., INSTANCE_10 to INSTANCE_N). Each of the instances is characterized in at least one document by attribute identifiers ATTR_3, ATTR_6. As such, if instance suggestions were formulated based solely on the attributes that could be used to characterize INSTANCE_10 to INSTANCE_N, every INSTANCE_10 to INSTANCE_N could be suggested to a user. In many circumstances, this is unacceptable. For example, it is likely that many of the same attributes (e.g., number of students, student/teacher ratio, location, etc.) of every college or university in the world are characterized in some electronic document available on the Internet. However, a list of suggestions that includes every college or university is of little assistance to a student who is looking for a school to attend.
Accordingly, as discussed above in process 3100 (FIG. 31), relevant instances can be identified by comparing the attribute values of the instances specified in a preexisting structured presentation with the attribute values of other instances. For example, if a specified instance in a structured presentation characterizes attribute "ATTR_3" of an instance with value "VAL_3" in units "unit_a" and attribute "ATTR_6" of an instance with value "VAL_6" in units "unit_c," then a system such as search engine 202 can identify that the instances identified in records 3202, 3206 (i.e., "INSTANCE_10" and "INSTANCE_12") can be suggested to a user based on their common values (albeit in different units). Thus, a system can convert the values in cells 3245, 3250 and 3255, 3260 into a common unit of measurement and compare those values to determine that they are similar. Thus, like instances can be selected even if the units in which those values are expressed are different.
Furthermore, although the instance identified in record 3208 (i.e., INSTANCE_13) shares a common value of attribute "ATTR_3" with a specified instance in a structured presentation, the instance identified in record 3208 need not be suggested to a user. In particular, the value that characterizes attribute "ATTR_6" of this instance is value "VAL_8," which differs from the value which characterizes this attribute of a specified instance in a structured presentation. On the basis of this difference, the instance identified in record 3208 can be excluded from a list of suggested instances.
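The inclusion of "INSTANCE_10" and "INSTANCE_12" and the exclusion of "INSTANCE_13" can be illustrated with invented numbers. Everything numeric below is an assumption: the description gives only symbolic values such as "VAL_3," so the conversion factors in `TO_BASE`, the units "unit_b" and "unit_d," the values, and the one percent tolerance are hypothetical choices made for the sketch.

```python
import math

# Hypothetical factors that convert each unit into a common base unit.
TO_BASE = {"unit_a": 1.0, "unit_b": 0.5, "unit_c": 1.0, "unit_d": 10.0}


def normalize(value, unit):
    """Express a value in the common base unit."""
    return value * TO_BASE[unit]


def matches(specified, candidate, rel_tol=0.01):
    """Return True if every specified attribute has a similar value in the candidate."""
    for attr, (value, unit) in specified.items():
        if attr not in candidate:
            return False
        c_value, c_unit = candidate[attr]
        if not math.isclose(normalize(value, unit),
                            normalize(c_value, c_unit), rel_tol=rel_tol):
            return False
    return True


# Attribute values of a specified instance, as (value, unit) pairs.
specified = {"ATTR_3": (30.0, "unit_a"), "ATTR_6": (200.0, "unit_c")}
candidates = {
    "INSTANCE_10": {"ATTR_3": (30.0, "unit_a"), "ATTR_6": (20.0, "unit_d")},
    "INSTANCE_12": {"ATTR_3": (60.0, "unit_b"), "ATTR_6": (200.0, "unit_c")},
    # Same ATTR_3 value, but a different ATTR_6 value, so it is excluded.
    "INSTANCE_13": {"ATTR_3": (30.0, "unit_a"), "ATTR_6": (350.0, "unit_c")},
}
suggested = [name for name, attrs in candidates.items()
             if matches(specified, attrs)]
```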
Different criteria for including and excluding instances from a list of suggested instances can be used. For example, the number of attribute values that must be similar can differ. As another example, in some implementations, a user can specify the number and/or the nature of the attribute values that are considered in formulating a list of suggested instances. As yet another example, instances can be ranked based on the correspondence between their attribute values and the attribute values of one or more specified instances in a preexisting structured presentation. As yet another example, a range of values can be determined based on the values that characterize the attributes of one or more instances specified in a structured presentation, and this range can be used to identify relevant instances for inclusion in a list of suggested instances.
As discussed in FIGS. 9-20 and the associated text, in some implementations, a system can select from among a collection of different values based on criteria that reflect the likelihood that a value is appropriate. Examples of such criteria include user-specified ranges, the number of documents that characterize an attribute with a certain value, and/or the quality of the documents that characterize an attribute with a certain value.
FIG. 33 is a flow chart of a process 3300 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106. Process 3300 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2). Process 3300 can be performed alone or in conjunction with other activities. For example, process 3300 can be performed during process 2700 (FIG. 27). As another example, process 3300 can be performed at step 2210 in process 2200 (FIG. 22), alone or in conjunction with one or more of processes 2300, 2500, 2900, 3000, 3100 (FIGS. 23, 25, 29, 30, 31). In some implementations, process 3300 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process). For example, process 3300 can be performed by search engine 202 in response to receiving a search string.
The system performing process 3300 can access categorized collections of instances and attributes (step 3305). For example, the system can access the instance and attribute collection 2710 to access one or more categorized collections of instances and attributes generated during previous searches.
The system performing process 3300 can identify a category that includes the specified instances (step 3310). In some implementations, the system can identify the category that includes the instances based on similar attributes, similar attribute values, combinations of these characteristics, and/or other techniques.
The system performing process 3300 can select one or more instance suggestions from the identified category (step 3315). For example, in some implementations, instance suggestions can be selected from the identified category based on the similarity between attribute values of the specified instances and attribute values of the instances in the category.
FIG. 34 is a representation 3400 of a formulation of instance suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. In particular, representation 3400 illustrates a formulation of instance suggestions using one implementation of process 3300 (FIG. 33). As shown, a preexisting structured presentation specifies a collection of instances 2405 (i.e., the instances "Philadelphia" and "Chicago.") Furthermore, instances drawn from different documents in an electronic document collection (e.g., collection 102) have been categorized into different collections 3410, 3415, 3420. Categorized instance collections 3410, 3415, 3420 can be identified as relevant to specified instances 2405 based on, e.g., the same instance identifiers "Philadelphia" and "Chicago" appearing therein.
As shown, categorized instance collections 3410, 3415, 3420 have been categorized in a variety of different ways. In particular, categorized instance collection 3410 has been categorized as a collection of "North American Cities." Categorized instance collection 3415 has been categorized as a collection of "National League East Teams." Categorized instance collection 3420 has been categorized as a collection of "Popular Movies." In the context of system 200, categorized instance collections 3410, 3415, 3420 can be stored in the data center 208 (FIG. 2). That is, the system 200 can generate one or more categories of instances based on previously received search strings. Thus, after search engine 202 conducts a search based on a search string, search engine 202 can categorize the results and store them in data center 208. These categorized results can be accessed and analyzed during subsequent searches to generate instance suggestions.
A categorized instance collection that includes the instances specified in a preexisting structured presentation can be identified, e.g., based on a similarity between the attributes that characterize the specified instances and the attributes that characterize the instances in the different categories. For example, the common use of the attributes "year" and "rating" in the preexisting structured presentation and categorized instance collection 3420 can be used to identify that categorized instance collection 3420 includes instances 2405.
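Identifying the category by shared attributes can be sketched as picking the collection with the largest attribute overlap. The function `identify_category` is hypothetical, as are the attributes listed for the two non-movie categories and most of the instance lists; only the category names, the attributes "year" and "rating," and the instances "Philadelphia" and "Chicago" come from the description.

```python
def identify_category(presentation_attributes, categories):
    """Return the name of the category sharing the most attributes, or None."""
    wanted = set(presentation_attributes)
    best_name, best_overlap = None, 0
    for name, category in categories.items():
        overlap = len(wanted & set(category["attributes"]))
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


# Hypothetical contents for categorized collections such as 3410, 3415, 3420.
categories = {
    "North American Cities": {
        "attributes": ["population", "area"],
        "instances": ["Philadelphia", "Chicago", "Toronto"],
    },
    "National League East Teams": {
        "attributes": ["wins", "losses"],
        "instances": ["Philadelphia", "Atlanta", "New York"],
    },
    "Popular Movies": {
        "attributes": ["year", "rating"],
        "instances": ["Philadelphia", "Chicago", "Star Wars", "Peter Pan"],
    },
}

specified = ["Philadelphia", "Chicago"]
category = identify_category(["year", "rating"], categories)
# Candidate suggestions are the category members not already specified.
candidates = [i for i in categories[category]["instances"] if i not in specified]
```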
In some implementations, a subset of the instances in a categorized instance collection can be selected as instance suggestions based on the values that characterize the instances in a category. For example, the instance "Star Wars" can be included on a list of instance suggestions based on the value characterizing the "rating" attribute of "Star Wars" being similar to the value characterizing the "rating" attribute of "Philadelphia" and "Chicago." As another example, the instance "Peter Pan" can be excluded from a list of instance suggestions based on the value characterizing the "rating" attribute of "Peter Pan" being different from the value characterizing the "rating" attribute of "Philadelphia" and "Chicago."
FIG. 35 is a schematic representation of a collection 3500 of processes that can be used to formulate a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106.
The processes in collection 3500 can be thought of as filters that are applied in succession to a large collection of potential instances 3505 to yield a smaller collection 3510 of instance suggestions. Process collection 3500 includes a category filter 3515, a related attribute filter 3520, and a related value filter 3525. Category filter 3515 can include, e.g., aspects of process 3300 (FIG. 33). Related attribute filter 3520 can include, e.g., aspects of process 2300 (FIG. 23), aspects of process 2500 (FIG. 25), process 2900 (FIG. 29), and/or process 3000 (FIG. 30). Related value filter 3525 can include, e.g., aspects of process 2300 (FIG. 23), aspects of process 2500 (FIG. 25), process 3100 (FIG. 31), and/or aspects of process 3300 (FIG. 33). Each filter can exclude potential instances 3505 from an instance suggestion collection 3510 that can be presented to a user or added directly to a structured presentation. Filters 3515, 3520, 3525 can be applied in any order. However, in general, filters 3515, 3520, 3525 are applied in order of granularity. In particular, the filter 3515, 3520, 3525 that reduces the number of potential instances by the greatest amount is applied first and the filter 3515, 3520, 3525 that reduces the number of potential instances by the smallest amount is applied last. In some implementations, any of filters 3515, 3520, 3525 can be omitted from collection 3500 and/or additional filters added to collection 3500. For example, a user-specified filter that can filter the potential instances 3505 according to input provided by the user can be added to collection 3500.
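The succession of filters can be sketched as a list of predicates applied in order of how many candidates each removes. This is an illustrative sketch, not the described system: `apply_filters`, the three predicate functions, and the candidate data (including the rating numbers) are hypothetical, and measuring each filter's reduction by running it over the full candidate list is a simplification chosen for brevity.

```python
def apply_filters(potential, filters):
    """Apply predicates in succession, the most restrictive one first."""
    # A filter that keeps fewer candidates removes more, so it sorts earlier.
    ordered = sorted(filters, key=lambda keep: sum(1 for c in potential if keep(c)))
    remaining = list(potential)
    for keep in ordered:
        remaining = [c for c in remaining if keep(c)]
    return remaining


def category_filter(candidate):
    """Stand-in for category filter 3515."""
    return candidate["category"] == "movie"


def attribute_filter(candidate):
    """Stand-in for related attribute filter 3520."""
    return {"year", "rating"} <= candidate["attributes"]


def value_filter(candidate):
    """Stand-in for related value filter 3525, using an invented rating range."""
    rating = candidate["rating"]
    return rating is not None and 7.0 <= rating <= 9.5


# Hypothetical potential instances.
potential = [
    {"name": "Star Wars", "category": "movie",
     "attributes": {"year", "rating"}, "rating": 8.6},
    {"name": "Peter Pan", "category": "movie",
     "attributes": {"year", "rating"}, "rating": 5.0},
    {"name": "Toronto", "category": "city",
     "attributes": {"population"}, "rating": None},
]
suggestions = apply_filters(potential, [category_filter, attribute_filter, value_filter])
suggested_names = [c["name"] for c in suggestions]
```

Because each filter only removes candidates, the final collection is the same in any order; the ordering affects how much work the later filters have to do.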
FIG. 36 is a flow chart of a process 3600 for formulating a collection of new instance suggestions 2115 based on information in a preexisting structured presentation 106. Process 3600 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2). Process 3600 can be performed alone or in conjunction with other activities. For example, process 3600 can be performed iteratively in conjunction with one or more of the processes in process collection 3500. In some implementations, process 3600 can be performed in response to receipt of a search string.
The system performing process 3600 can make an initial match between the instances specified in a preexisting structured presentation 106 and instances drawn from a document collection (step 3605). The initial match can be based on one or more of the filtering processes in process collection 3500.
The system performing process 3600 can determine whether the number of matches is too high, too low, or appropriate (step 3610). If the number of matches is too low, the system can broaden names of specified instances (step 3615). For example, the system performing process 3600 can use alternate spellings, abbreviations, synonyms, alternative names, nicknames, and/or other keywords for the specified instances in one or more of the processes in process collection 3500.
The system performing process 3600 can also broaden one or more ranges of attribute values used in any related value filtering 3525 (FIG. 35) (step 3618). The range can be broadened based on input received from a user or automatically, without user input. For example, in some implementations, the system can broaden a range based on the distribution of attribute values for a selected group of instances to, e.g., include a certain percentage of the instances or a predetermined number of instances.
The system performing process 3600 can also reduce the number of attributes and/or instances used in any related attribute filtering 3520 (FIG. 35) (step 3620). The number of attributes and/or instances can be reduced based on, e.g., the number of potential instances excluded by a particular attribute and/or instance. For example, if the requirement that a specific attribute be used to characterize potential instances excludes all of the potential instances, then this attribute can be omitted from any related attribute filtering. The attributes and/or instances to be removed can be determined, e.g., automatically, without user input, or based on input received from a user.
The system can again seek to make a match between the instances specified in a preexisting structured presentation 106 and instances drawn from a document collection, but this time using the changed parameters (step 3622). This match can also be made using one or more of the filtering processes in process collection 3500.
If the number of matches is determined to be too high (step 3610), the system performing process 3600 can narrow one or more ranges of attribute values used in any related value filtering 3525 (FIG. 35) (step 3625). The range can be narrowed based on input received from a user or automatically, without user input. For example, in some implementations, the system can narrow a range based on the distribution of attribute values for a selected group of instances to, e.g., exclude a certain percentage of the instances or a predetermined number of instances.
The system performing process 3600 can also increase the number of attributes and/or instances used in any related attribute filtering 3520 (FIG. 35) (step 3628). The number of attributes and/or instances can be increased based on, e.g., the number of potential instances excluded by a particular attribute and/or instance. The attributes and/or instances to be added can be determined, e.g., automatically, without user input, or based on input received from a user.
The system performing process 3600 can winnow the matches based on the changed parameters (step 3630). In particular, the narrowed ranges and/or increased numbers of attributes and/or instances can be used in any related value filtering 3525 (FIG. 35).
If the number of matches is determined to be acceptable (step 3610), the system performing process 3600 can suggest the matched instances to a user (step 3635). For example, the system performing process 3600 can present one or more instance suggestions in a GUI on a display screen, e.g., display screen 104.
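By way of illustration only, the broadening and narrowing of a value range in process 3600 can be sketched in Python as follows. The function names, the fixed step size, and the round limit are assumptions made for this sketch and are not part of the described system, which may instead adjust ranges based on user input or on the distribution of attribute values.

```python
def count_in_range(values, low, high):
    """Count attribute values that fall within the inclusive range."""
    return sum(1 for v in values if low <= v <= high)


def adjust_range(values, low, high, min_matches, max_matches, step=1, max_rounds=100):
    """Broaden or narrow [low, high] until the match count is acceptable.

    Hypothetical helper: 'step' and 'max_rounds' are illustrative assumptions.
    """
    for _ in range(max_rounds):
        n = count_in_range(values, low, high)
        if n < min_matches:
            # Too few matches: broaden the range on both ends.
            low, high = low - step, high + step
        elif n > max_matches and high - low > 2 * step:
            # Too many matches: narrow the range on both ends.
            low, high = low + step, high - step
        else:
            # Acceptable count, or the range cannot be narrowed further.
            break
    return low, high


years = [1990, 1993, 1995, 2002, 2010]
broadened = adjust_range(years, 1994, 1996, min_matches=3, max_matches=4, step=1)
narrowed = adjust_range(years, 1980, 2020, min_matches=1, max_matches=3, step=5)
```

The round limit is included because a symmetric step can, in principle, alternate between too few and too many matches without settling.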
FIG. 37 is a schematic representation of a process 3700 by which new attributes can be added to expand a preexisting structured presentation. Process 3700 can be performed by a system that includes one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
Process 3700 includes an extraction operation 3705 and a merge operation 3710 that add new attributes to a preexisting structured presentation based on information drawn from documents in electronic document collection 102. In particular, process 3700 suggests one or more new attributes based on information presented in the preexisting structured presentation 106. For example, if the structured presentation includes a number of instances corresponding to certain movies, the system 200 can suggest additional movie attributes according to information drawn from the electronic document collection. That is, the system 200 can identify and suggest additional attributes according to similarities of the instance identifiers, the category of the instances, values of the attributes, or combinations thereof.
As shown in FIG. 37, extraction operation 3705 uses the characteristics of a preexisting structured presentation 106 to extract a collection of new attribute suggestions from electronic document collection 102. Example characteristics include the instances in the preexisting structured presentation, the category of the instances in the preexisting structured presentation, and the values of the attributes in the preexisting structured presentation. The characteristics of the preexisting structured presentation 106 can be expressed as a collection of machine-readable information and can be received by one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, the characteristics of the preexisting structured presentation 106 can be received by a search engine 202 (FIG. 2).
During extraction operation 3705, one or more new attribute suggestions can be formulated based on the content of documents in electronic document collection 102 and the characteristics of preexisting structured presentation 106. A variety of different techniques for formulating new attribute suggestions can be used, as discussed further below.
Some or all of the new attribute suggestions can be merged with the preexisting structured presentation 106 in merge operation 3710 to form an expanded structured presentation 106. The expanded structured presentation can be displayed for a viewer, e.g., at a display device such as display screen 104.
Not all of the new attribute suggestions formulated during extraction operation 3705 need be merged with the preexisting structured presentation 106 and displayed for a viewer. For example, in some implementations, a collection of new attribute suggestions can be presented to a viewer along with an interactive element that allows the viewer to select one or more attributes that are to be added. However, in other implementations, the new attribute suggestions can be added automatically, without user interaction, and without winnowing of the new attribute suggestions before display. More details regarding the merger can be found, e.g., in FIGS. 9-20 and the associated text.
FIG. 38 is a flow chart of an example process 3800 for adding attributes to a structured presentation based on the content of documents in an electronic document collection. Process 3800 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 3800 can be performed by the search engine 202 in system 200.
In some implementations, process 3800 can be performed in response to receiving input, e.g., from a user or from another system or process that triggers an update of the structured presentation 106. For example, process 3800 can be performed in response to a user request that one or more new attributes be added to a structured presentation 106. As another example, process 3800 may be performed by a search engine, e.g., search engine 202 (FIG. 2), in response to receipt of a search query.
The system performing process 3800 can receive one or more characteristics of a preexisting structured presentation (step 3805). For example, the system can receive one or more instance identifiers that appear in the preexisting structured presentation. As another example, the system can receive a description of a category that includes the instances identified in the preexisting structured presentation.
The system performing process 3800 can formulate one or more attribute suggestions from documents in an electronic document collection based on one or more characteristics of the preexisting structured presentation (step 3810). Attribute suggestions can be formulated based on these characteristics in a number of different ways. For example, in one implementation, the system can formulate attribute suggestions from documents in an electronic document collection 102 by constructing search queries using instance identifiers drawn from the preexisting structured presentation. These search queries can be used to identify attributes that may characterize the same or similar instances using string comparisons or other matching techniques. Examples of other approaches are discussed further below.
The system performing process 3800 can provide one or more attribute suggestions to a user (step 3815). For example, a list of attribute suggestions can be displayed for the user on the same display screen that displays the preexisting structured presentation.
The system performing process 3800 can receive user selection of one or more attribute suggestions (step 3820). For example, a user interface component can interact with a user to receive one or more user inputs (e.g., mouse clicks, key strokes, or other user input) that select one or more attribute suggestions.
The system performing process 3800 can add the selected attribute suggestions to a structured presentation (step 3825). In particular, the selected attribute suggestions can be used to expand the existing structured records in the structured presentation. For example, when the structured presentation is a table such as table 300 (FIG. 3), the system can add new columns 304. As another example, when the structured presentation is a collection of cards such as collection of cards 500 (FIG. 5), the system can add new attribute identifiers 308 to cards 500.
FIG. 39 is a flow chart of an example process 3900 for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. Process 3900 can be performed alone or in conjunction with other activities. For example, process 3900 can be performed at step 3810 in process 3800 (FIG. 38). Process 3900 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 3900 can be performed by search engine 202 in system 200 (FIG. 2).
The system performing process 3900 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation (step 3905). Structured components are portions or regions of an electronic document that are structured. Example structured components include tables, lists, records, collections of attribute-value pairs, and the like. Structured components can thus organize attribute values and instance identifiers in conformity with a defined structure, much like a structured presentation.
The entirety of an electronic document that includes a structured component need not be structured. For example, an electronic document can include a table between two paragraphs of unstructured text. Moreover, structured components in different documents need not have the same format or conform with a predetermined or persistent structure. Indeed, the organization of information in one structured component generally can be changed without regard to the organization of information in structured components that appear in other documents. By way of example, if a structured list of schools in one person's resume is changed to delete the year of graduation, there is no need to ensure that other structured lists of schools in other resumes are similarly changed.
The system performing process 3900 can identify documents that include structured components in a variety of ways. For example, tables and other structured components can be identified using metadata labels, such as HTML tags, found in the documents themselves. As another example, structured components can be identified by identifying repetitive elements (e.g., a series of comma or tab delimiters) in a document.
Structured components relate to instances specified in a preexisting structured presentation when they include information that is relevant to the specified instances. For example, a structured component that characterizes one or more of the specified instances with one or more attribute values can be considered relevant to the instances specified in a preexisting structured presentation. As another example, a structured component that characterizes one or more of the same attributes of instances that differ from instances specified in a preexisting structured presentation can be considered relevant to the specified instances. In many implementations, the instance and/or attribute identifiers need not be the same. Rather, conceptually related instances and attributes can be used to identify documents that include structured components. Thus, in some implementations, the system performing process 3900 can identify one or more documents that include structured components related to instances that are specified in a preexisting structured presentation by identifying documents that include the same or related instance identifiers as found in the preexisting structured presentation and/or the same or related attribute identifiers as found in the preexisting structured presentation.
The system performing process 3900 can select one or more attribute suggestions from the structured components (step 3910). This selection process can winnow down the number of attributes that are to be suggested to a user. The selection of attribute suggestions can be performed in a number of ways. For example, the system can select attribute suggestions based on a category of the instances in the structured components and/or the values of the attributes of the instances in the structured components, as discussed further below.
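As a minimal sketch only, the two identification approaches mentioned for step 3905 (metadata labels such as HTML tags, and repetitive delimiters) might be expressed in Python with the standard-library HTML parser. The set of tags treated as structured, the class and function names, and the delimiter heuristic are assumptions made for illustration; the described system is not limited to them.

```python
from html.parser import HTMLParser

# Assumed set of tags that mark a structured component (illustrative only).
STRUCTURED_TAGS = {"table", "ul", "ol", "dl"}


class StructuredComponentFinder(HTMLParser):
    """Records each opening tag that suggests a structured component."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURED_TAGS:
            self.found.append(tag)


def find_structured_tags(html):
    finder = StructuredComponentFinder()
    finder.feed(html)
    return finder.found


def looks_delimited(text, delimiter=",", min_rows=2):
    """Heuristic: every non-empty line has the same nonzero delimiter count."""
    counts = [line.count(delimiter) for line in text.splitlines() if line.strip()]
    return len(counts) >= min_rows and counts[0] > 0 and len(set(counts)) == 1


page = "<p>intro</p><table><tr><td>Chicago</td></tr></table><p>end</p>"
tags = find_structured_tags(page)
```

A document with a table between two unstructured paragraphs, as in the example above, is reported as containing one structured component while the surrounding text is ignored.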
FIG. 40 is a representation 4000 of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. In particular, representation 4000 illustrates a formulation of attribute suggestions using one implementation of process 3900 (FIG. 39). As shown, a preexisting structured presentation specifies a collection of instances 4005 (i.e., the instances "Philadelphia" and "Chicago"). Furthermore, different documents in an electronic document collection (e.g., collection 102) include different structured components 4010, 4015, 4020. Structured components 4010, 4015, 4020 can be identified as relevant to specified instances 4005 based on, e.g., the same instance identifiers "Philadelphia" and "Chicago" appearing therein.
As shown, structured components 4010, 4015, 4020 include a wide variety of different potential attribute suggestions based on different contexts. In particular, in the context of structured component 4010, the instances "Philadelphia" and "Chicago" are part of a tabular component that represents the properties of various cities. In the context of structured component 4015, the instances "Philadelphia" and "Chicago" are part of a structured component that represents part of the standings in the National League East sometime in the 1970s. In the context of structured component 4020, the instances "Philadelphia" and "Chicago" are part of a tabular component that represents the properties of various films.
Rather than suggesting all the various attributes found in structured components 4010, 4015, 4020 to a user, attribute suggestions can be selected from components 4010, 4015, 4020 based on the attributes used to characterize those instances. In particular, as shown, preexisting structured presentation 106 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "year," "rating," and "box office receipts." Structured component 4010 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "population" and "area." Structured component 4015 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "wins," "losses," and "GB" (i.e., games behind). Structured component 4020 characterizes the instances "Philadelphia" and "Chicago" using values of the attributes "year," "runtime," and "rating."
A system can select from the attributes in structured components 4010, 4015, 4020 based on these characterized attributes. For example, the system can identify the correspondence between the attribute identifiers "year" and "rating" in preexisting structured presentation 106 and the attribute identifiers "year" and "rating" in structured component 4020 to select the attributes "director" and "runtime" as suggestions for addition to the preexisting structured presentation 106. As discussed in FIGS. 21-36 and the associated text, in some implementations, a system can also suggest or add additional instance identifiers. For example, structured component 4020 includes the instance identifiers "Peter Pan" and "Star Wars." Such a system can thus suggest these instance identifiers for inclusion in the structured presentation.
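The selection illustrated in FIG. 40 can be sketched, under stated assumptions, as a comparison of attribute identifiers. In the following Python sketch, the overlap threshold and the function name are hypothetical, attribute identifiers are compared by exact string match only, and "director" is included among the attributes of component 4020 because the text names it as a suggestion drawn from that component.

```python
# Attribute identifiers used by the preexisting structured presentation 106.
presentation_attrs = {"year", "rating", "box office receipts"}

# Attribute identifiers found in structured components 4010, 4015, 4020.
# "director" is assumed to appear in component 4020 (see lead-in).
components = {
    "4010": {"population", "area"},
    "4015": {"wins", "losses", "GB"},
    "4020": {"year", "director", "runtime", "rating"},
}


def suggest_attributes(existing, components, min_overlap=2):
    """Suggest attributes from components that share identifiers with 'existing'.

    'min_overlap' is an illustrative threshold, not a value from the source.
    """
    suggestions = set()
    for attrs in components.values():
        if len(existing & attrs) >= min_overlap:
            # Only attributes not already in the presentation are suggested.
            suggestions |= attrs - existing
    return sorted(suggestions)


suggested = suggest_attributes(presentation_attrs, components)
```

Components 4010 and 4015 share no attribute identifiers with the presentation, so only component 4020 contributes suggestions.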
In some implementations, even if attributes drawn from structured components 4010, 4015 are not suggested in a particular formulation, such attributes can be stored for use during future information requests. For example, even though the cities represented in structured component 4010 are not selected as attribute suggestions, these cities can be stored along with their respective attribute identifiers (e.g., "population" and "area") and attribute values in a data collection (such as, e.g., data center 208). When a subsequent user requests information regarding one or more cities, such a system can access this stored information and provide additional information to the user.
FIG. 41 is a flow chart of an example process 4100 for formulating attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. Process 4100 can be performed alone or in conjunction with other activities. For example, process 4100 can be performed at step 3810 in process 3800 (FIG. 38).
Process 4100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 4100 can be performed by search engine 202 in system 200 (FIG. 2).
The system performing process 4100 can identify one or more documents relevant to one or more specified instances (step 4105). For example, the system performing process 4100 can use string comparisons to match one or more of the specified instances and their attributes and/or values with documents in an electronic document collection such as electronic document collection 102. As another example, the system performing process 4100 can access stored information (e.g., information in data center 208) to identify electronic documents that are relevant to the specified instances.
The system performing process 4100 can extract a template of one or more of the identified documents (step 4110). A document template serves as a pattern for the arrangement of the content of individual documents in a subcollection of documents in an electronic document collection. The documents in a subcollection generally originate from a single source, e.g., a single commercial entity. For example, a bookseller can use a single document template as a pattern for the arrangement of content describing different books. As another example, a furniture retailer can use a single document template as a pattern for the arrangement of the content of flyers for different sofas. For example, the template of an electronic flyer for a sofa can specify the arrangement, on the flyer, of the brand name of the sofa, a picture of the sofa, an interactive element that allows the user to select the color in which the sofa is shown, a description of the sofa in text format, and a table that characterizes the sofa's dimensions, availability, and price. Document templates can thus organize information regarding an instance in conformity with a defined structure, much like a structured presentation.
In general, a document template can serve as a pattern for the entire content of an electronic document and, as discussed above, can even specify the arrangement of a structured component in a document. However, because document templates only specify the arrangement of the content of a subcollection of documents in an unstructured electronic document collection, the electronic document collection itself remains unstructured. For example, even if AMAZON.COM uses one template as a pattern for the arrangement of a description of every book that AMAZON.COM sells, BARNESANDNOBLE.COM and other booksellers do not necessarily use that same template as a pattern for the arrangement of descriptions of books that they sell.
Moreover, a document template can be changed without that change necessarily being propagated throughout the entire collection, or even a subcollection, of electronic documents.
FIG. 42 is a representation of a portion of a hypertext markup language (HTML) template 4200 that is used as a pattern for descriptions of a movie (i.e., the movie "Philadelphia"). The HTML code of template portion 4200 is both machine-readable and human-readable. For example, the HTML code of template portion 4200 can be used by a browser to generate a web page.
In the depicted example, template portion 4200 is split into two subsections 4205, 4210. Subsection 4205 serves as a pattern for the arrangement of text that identifies the movie "Philadelphia." Subsection 4210 serves as a pattern for the arrangement of various attribute identifiers and their values. In general, the patterns in subsections 4205, 4210 are repeated a number of times in a particular subcollection of documents in an electronic document collection to describe different movies.
An HTML parser can be used to extract the formatting from template portion 4200 so the formatting can be used to identify documents having the same template. For example, the HTML tags <title>, <div>, other HTML tags, and their relative position to each other can be identified by an HTML parser. Such an HTML parser can determine that the HTML tag <title> appears before the HTML tag <div>. Thus, an HTML parser can extract the formatting of template portion 4200 from content that is arranged in accordance with the template.
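By way of a hedged example, the extraction of formatting by an HTML parser can be sketched in Python using the standard-library parser. Here the "template" is approximated as the ordered sequence of opening tags, which is an assumption made for this sketch; the markup of the two sample pages is likewise hypothetical and is not the content of template portion 4200.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects opening tags in document order, ignoring the text content."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def template_signature(html):
    """Return the ordered tuple of opening tags as a simple template signature."""
    parser = TagSequence()
    parser.feed(html)
    return tuple(parser.tags)


def same_template(html_a, html_b):
    """Two documents are treated as sharing a template if their signatures match."""
    return template_signature(html_a) == template_signature(html_b)


# Hypothetical pages that differ in content but not in arrangement.
page_a = "<title>Philadelphia (1993)</title><div><b>Director</b> Name A</div>"
page_b = "<title>Another Film</title><div><b>Director</b> Name B</div>"
signature = template_signature(page_a)
```

Because only the tags and their relative positions are compared, pages describing different movies can be recognized as following the same pattern. A practical system would likely need a more tolerant comparison than exact equality.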
Returning to FIG. 41, after extracting a template, the system performing process 4100 can identify one or more new attributes using the template (step 4115). For example, the system can identify the arrangement of attributes drawn from the preexisting structured display within the template. This arrangement can be used to infer other attributes.
The system performing process 4100 can also formulate one or more attribute suggestions from the attributes identified using the template (step 4120). The templates themselves can thus be used to formulate the attribute suggestions. For example, HTML tags in template portion 4200 (FIG. 42) identify that the film entitled "Philadelphia (1993)" is characterized by the attributes "Director," "Writer," and "Release Date." Any of these attributes can be used to formulate one or more attribute suggestions.
Moreover, in some implementations, additional content in a document template can be used in formulating attribute suggestions. For example, the value of an attribute can be used in formulating attribute suggestions. For example, if the value of a "year" attribute is, e.g., 1976, the attribute "start time" can be excluded from a collection of attribute suggestions for characterizing films.
FIG. 43 is a schematic representation of a process 4300 by which a collection of new attribute suggestions 915 can be formulated based on information in a preexisting structured presentation 106. Process 4300 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
Process 4300 performs an extraction operation 4305 on an instance/attribute collection 4310 based on the information in the preexisting structured presentation 106. Instance/attribute collection 4310 is a collection of information that associates instances with their attributes and, in some implementations, with the values of those attributes as well. The information in collection 4310 can be extracted from documents in an electronic document collection 102 either in response to receipt of a trigger (e.g., a search query) or in anticipation of receipt of a trigger, e.g., as part of a process of indexing electronic document collection 102. In some implementations, information in collection 4310 can include the content of previous structured presentations that were presented to the current user or to other users. In general, the attribute suggestions are provided to a user who selects attribute suggestions to be added to a structured presentation, such as described in steps 3815, 3820, 3825 (FIG. 38).
The association between instances and their attributes can be established in collection 4310 by structuring the information storage within collection 4310. FIG. 44 is a schematic representation of a table 4400 that associates attributes and instances in collection 4310. Table 4400 includes a collection of records 4402, 4404, 4406, 4408, 4410, 4412, 4414, each of which associates an identifier of an instance with descriptions of a document location and the attributes that characterize the identified instances in those documents. The information in records 4402, 4404, 4406, 4408, 4410, 4412, 4414 can be organized in a collection of columns 4415, 4420, 4425, 4430, 4435, 4440. In particular, column 4415 can include instance identifiers. Column 4420 can include a description of the location of an electronic document that includes the instance identified in column 4415. Columns 4425, 4430, 4435, 4440 can identify attributes that characterize the instances identified in column 4415 in the document whose location is described in column 4420.
As shown, different electronic documents can include different categories and amounts of information characterizing the same instance. For example, the document whose location is identified in column 4420 of record 4404 includes two attributes of an instance "INSTANCE_2," whereas the document whose location is identified in column 4420 of record 4410 includes three attributes of an instance "INSTANCE_2." Moreover, the attributes in record 4404 (i.e., attribute "ATTR_5" and attribute "ATTR_6") differ in part from the attributes in record 4410 (i.e., attribute "ATTR_5," attribute "ATTR_8," and attribute "ATTR_9").
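For illustration, the records of a table such as table 4400 can be modeled as a simple data structure. In the Python sketch below, the class name, field names, and document locations are hypothetical; only the instance and attribute identifiers of records 4404 and 4410 are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceRecord:
    """One record: instance identifier, document location, attribute identifiers."""
    instance: str      # cf. column 4415
    location: str      # cf. column 4420 (locations below are placeholders)
    attributes: tuple  # cf. columns 4425-4440


records = [
    InstanceRecord("INSTANCE_2", "http://example.com/doc_a", ("ATTR_5", "ATTR_6")),
    InstanceRecord("INSTANCE_2", "http://example.com/doc_b", ("ATTR_5", "ATTR_8", "ATTR_9")),
]


def attributes_for(instance, records):
    """Collect, in first-seen order, every attribute used for an instance."""
    seen = []
    for record in records:
        if record.instance == instance:
            for attribute in record.attributes:
                if attribute not in seen:
                    seen.append(attribute)
    return seen


all_attrs = attributes_for("INSTANCE_2", records)
```

Keeping one record per document, rather than one per instance, preserves the fact that different documents characterize the same instance with different attributes.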
Data collections 4310 that associate attributes and instances (such as table 4400) can be formed in a number of different ways. For example, documents that include internal, structured components can be identified. Examples of such internal, structured components include tables and lists that appear in HTML documents. The relationships between attributes and instances in these internal structured components can be copied to form data collections 4310.
As another example, collection 4310 can be formed from the content of previous structured presentations that were presented to the current user or to other users. As yet another example, once a first document has been identified as including attributes and instances, the template of that document can be used to extract attributes and instances from other documents that include the same template. For example, if a stereo retailer uses the same document template to describe different stereos that are offered for sale, the arrangement of information in a first electronic document regarding a first stereo can be used to extract information from other electronic documents that regard other stereos.
In some implementations, techniques such as natural language parsing can be used to identify instances and attributes. For example, electronic documents can be parsed to identify phrases such as "[Instance] has a/an [attribute]" in electronic documents.
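As a rough sketch only, the phrase pattern "[Instance] has a/an [attribute]" can be approximated with a regular expression. Full natural language parsing is considerably more involved; the regular expression below is a swapped-in simplification, and it assumes that instances are capitalized words and that the attribute is a single word. The sample sentences are invented for the sketch.

```python
import re

# Capitalized word(s), then "has a" or "has an", then one attribute word.
INSTANCE_HAS_ATTRIBUTE = re.compile(r"([A-Z]\w*(?: [A-Z]\w*)*) has an? (\w+)")


def extract_pairs(text):
    """Return (instance, attribute) pairs matched by the simplified pattern."""
    return INSTANCE_HAS_ATTRIBUTE.findall(text)


sample = "Philadelphia has a population. Chicago has an area. Star Wars has a runtime."
pairs = extract_pairs(sample)
```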
In some implementations, data collection 4310 can categorize instances and their attributes. For example, instances and attributes can be categorized as North American cities, National League East teams, or popular movies. The storage of information in data collection 4310 can be based on such categorizations. For example, different categories can be stored in different files, records, or the like.
Returning to FIG. 43, process 4300 suggests one or more new attributes based on information presented in the preexisting structured presentation 106. For example, if the structured presentation includes a number of instances corresponding to certain movies, the system 200 can suggest additional attributes of movies according to information drawn from data collections 4310. That is, the system 200 can identify and suggest additional attributes based on the attributes being used to characterize the same instances. For example, the system 200 may suggest other attributes that are commonly used to characterize movies, such as show times, theaters, or run times.
FIG. 45 is a flow chart of a process 4500 for formulating attribute suggestions from a collection of instances and attributes based on characteristics of a preexisting structured presentation. Process 4500 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 4500 can be performed by the search engine 202 in system 200 (FIG. 2).
Process 4500 can be performed alone or in conjunction with other activities. For example, process 4500 can be performed during process 4300 (FIG. 43). As another example, process 4500 can be performed at step 3810 in process 3800 (FIG. 38), alone or in conjunction with one or both of processes 3900, 4100 (FIGS. 39, 41).
The system performing process 4500 can access a collection of instances and their attributes (step 4505). For example, the system performing process 4500 can access instance and attribute collection 4310 (FIG. 43) stored in data center 208 (FIG. 2).
The system performing process 4500 can identify one or more relevant attributes based on characteristics of instance attributes specified in the preexisting structured presentation (step 4510). For example, the system can compare instance attributes of instances stored in the instance and attribute collection 4310 (FIG. 43) with instances specified in the structured presentation 106. The system can use the comparison to determine which, if any, of the stored instances share attribute identifiers, or related attributes, with the attributes specified in the preexisting structured presentation 106. For example, suppose that a preexisting structured presentation 106 uses the attributes "ATTR_3" and "ATTR_5" to characterize a collection of instances. Upon review of an instance and attribute collection 4310 such as table 4400 (FIG. 44), the system can suggest the attributes "ATTR_7" and "ATTR_7" based on their use in conjunction with "ATTR_3" and "ATTR_5" in characterizing instances "INSTANCE_1" and "INSTANCE_2" in records 4402, 4406.
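One possible reading of step 4510, offered as a sketch under stated assumptions, is to count how often other attributes are used together with the attributes already in the preexisting structured presentation. In the Python sketch below, the function name, the ranking by co-occurrence count, and the sample attribute sets are all hypothetical; they do not reproduce the contents of table 4400.

```python
from collections import Counter


def suggest_by_cooccurrence(existing, attribute_sets, top_n=2):
    """Rank attributes by how often they appear alongside 'existing' attributes."""
    existing = set(existing)
    counts = Counter()
    for attrs in attribute_sets:
        attrs = set(attrs)
        if existing & attrs:
            # Count only attributes that are not already in the presentation.
            counts.update(attrs - existing)
    # Sort by descending count, then by name, so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical attribute sets, one per stored record.
stored = [
    ("ATTR_3", "ATTR_5", "ATTR_7"),
    ("ATTR_5", "ATTR_6"),
    ("ATTR_3", "ATTR_7", "ATTR_8"),
    ("ATTR_1", "ATTR_2"),
]
cooccurring = suggest_by_cooccurrence({"ATTR_3", "ATTR_5"}, stored)
```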
FIG. 46 is a flow chart of a process 4600 for formulating a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106. Process 4600 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
Process 4600 can be performed alone or in conjunction with other activities. For example, process 4600 can be performed during process 4300 (FIG. 43). As another example, process 4600 can be performed at step 3810 in process 3800 (FIG. 38), alone or in conjunction with one or more of processes 3900, 4100, 4500 (FIGS. 39, 41, 45). In some implementations, process 4600 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process). For example, process 4600 can be performed by search engine 202 in response to receiving a search string.
The system performing process 4600 can identify one or more authoritative sources regarding one or more specified instances (step 4605). For example, the system can access a collection of authoritative sources of documents in electronic document collection 102 that has been assembled, e.g., by a programmer.
As another example, the system can receive user-specific input identifying one or more sources of documents in electronic document collection 102 as "authoritative" in the view of that user. For example, a display screen 104 that displays a preexisting structured presentation 106 can include a GUI component that allows a viewer to specify authoritative sources of documents. The identification of an authoritative source can be received in conjunction with a search query. For example, a viewer can identify JD POWER AND ASSOCIATES, AMAZON.COM, and MAJOR LEAGUE BASEBALL as authoritative sources of the documents found at http://www.jdpower.com/, http://www.amazon.com/, and http://www.mlb.com/, respectively. In some implementations, the user-specific input can identify the subject matter on which a source is authoritative. For example, MAJOR LEAGUE BASEBALL may be identified as an authoritative source for baseball statistics, but may not be considered an authoritative source for information regarding drug testing.
As yet another example, the system performing process 4600 can analyze a collection of user-specific input identifying authoritative sources from multiple users to assemble a "generic" collection of authoritative sources. For example, a large number of users may identify the AMERICAN AUTOMOBILE ASSOCIATION (AAA) as authoritative. Based on a statistical analysis of these identifications, the AAA can then be added to a collection of authoritative sources.
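The statistical analysis is not specified further. As one simple assumption, a source could be added to the generic collection when at least a given share of users identify it as authoritative. The following Python sketch uses that assumption; the function name, the threshold, and the sample user lists are hypothetical.

```python
from collections import Counter


def generic_authorities(user_lists, min_share=0.5):
    """Return sources named as authoritative by at least 'min_share' of users."""
    counts = Counter()
    for sources in user_lists:
        # Each user counts at most once per source.
        counts.update(set(sources))
    total = len(user_lists)
    return sorted(s for s, n in counts.items() if total and n / total >= min_share)


# Hypothetical identifications from four users.
user_lists = [
    ["AAA", "MAJOR LEAGUE BASEBALL"],
    ["AAA"],
    ["AAA", "AMAZON.COM"],
    ["MAJOR LEAGUE BASEBALL"],
]
generic = generic_authorities(user_lists)
```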
The system performing process 4600 can determine additional attributes from the authoritative sources for instances that are specified in the preexisting structured presentation (step 4610). For example, the system can access documents provided by an authoritative source and identify one or more documents that characterize specified instances using one or more attributes. The system can extract attribute identifiers from these documents using a parser or other string comparison techniques.
As another example, the system can access a data collection that associates attributes and instances, such as table 4400 (FIG. 44). The system can filter records such as records 4402, 4404, 4406, 4408, 4410, 4412, 4414 based on both the instances identified in the preexisting structured presentation and whether or not the documents whose location is identified in records 4402, 4404, 4406, 4408, 4410, 4412, 4414 originated from an authoritative source. For example, if AMAZON.COM is an authoritative source, a collection that associates attributes and instances can be scanned to identify documents with the http://www.amazon.com/ domain.
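The record filtering described above can be sketched, under stated assumptions, as a filter on both the instance identifier and the host of the document location. The record layout, the field names, and the sample records are hypothetical and are not the layout of table 4400.

```python
from urllib.parse import urlparse


def from_authoritative_source(url, authoritative_hosts):
    """True if the URL's host is one of the authoritative hosts."""
    host = urlparse(url).hostname or ""
    return host.lower() in authoritative_hosts


def filter_records(records, specified_instances, authoritative_hosts):
    """Keep records for specified instances that come from authoritative hosts."""
    return [
        r for r in records
        if r["instance"] in specified_instances
        and from_authoritative_source(r["location"], authoritative_hosts)
    ]


# Hypothetical records associating instances, locations, and attributes.
records = [
    {"instance": "Camera X", "location": "http://www.amazon.com/item/1",
     "attributes": ["price", "weight"]},
    {"instance": "Camera X", "location": "http://example.org/review",
     "attributes": ["rating"]},
    {"instance": "Camera Y", "location": "http://www.amazon.com/item/2",
     "attributes": ["price"]},
]
kept = filter_records(records, {"Camera X"}, {"www.amazon.com"})
```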
The system performing process 4600 can compare these additional instances with attributes in an instance and attribute collection such as table 4400 (FIG. 44) (step 4615). For example, the system can use string comparisons, or other comparison techniques, to compare the additional instances with instances stored in the instance and attribute collection.
The system performing process 4600 can identify an attribute in the instance and attribute collection based on the results of these comparisons (step 4620).
FIG. 47 is a flow chart of a process 4700 for identifying related instances for use in formulating attribute suggestions based on information in a preexisting structured presentation 106. Process 4700 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2). Process 4700 can be performed alone or in conjunction with other activities. For example, process 4700 can be performed during process 1500 (FIG. 15). As another example, process 4700 can be performed at step 3810 in process 3800 (FIG. 38), alone or in conjunction with one or more of processes 3900, 4100, 4500, 4600 (FIGS. 39, 41, 45, 46). In some implementations, process 4700 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process). For example, process 4700 can be performed by search engine 202 in response to receiving a search string.
The system performing process 4700 can identify one or more related instances based on attributes and/or attribute values that characterize the instances specified in a preexisting structured presentation. For example, the system can identify related instances by comparing attribute values of specified instances with attribute values of other instances to determine if the other instances are related. Such comparisons can require, e.g., that the attribute values be identical or that the attribute values fall within a certain range. Such a range can be determined, e.g., based on the range of attribute values that are specified by a user over an interactive element or that already characterize instances in a preexisting structured presentation.
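As a minimal sketch of the range-based comparison, the following treats a candidate instance as related when its value for an attribute falls within the range spanned by the specified instances. The function name, the optional tolerance, and the numeric values are illustrative assumptions, not real data.

```python
def related_instances(specified, candidates, attribute, tolerance=0.0):
    """Return names of candidates whose value lies in the specified range."""
    values = [inst[attribute] for inst in specified if attribute in inst]
    if not values:
        return []
    low, high = min(values) - tolerance, max(values) + tolerance
    return [
        c["name"] for c in candidates
        if attribute in c and low <= c[attribute] <= high
    ]


# Made-up attribute values used only to exercise the comparison.
specified = [
    {"name": "Philadelphia", "population": 1_500_000},
    {"name": "Chicago", "population": 2_700_000},
]
candidates = [
    {"name": "City A", "population": 2_300_000},
    {"name": "City B", "population": 230_000},
    {"name": "City C", "population": 1_600_000},
]
related = related_instances(specified, candidates, "population")
```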
In some implementations, the system performing process 4700 can convert attribute values into a common unit of measurement prior to comparing the attribute values. For example, if the specified unit of measurement is feet, but one or more extracted attribute values are in meters, the system can convert the values in meters into feet using conventional techniques.
FIG. 48 is a flow chart of a process 4800 for formulating a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106. Process 4800 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2).
Process 4800 can be performed alone or in conjunction with other activities. For example, process 4800 can be performed during process 1500 (FIG. 15). As another example, process 4800 can be performed at step 3810 in process 3800 (FIG. 38), alone or in conjunction with one or more of processes 3900, 4100, 4500, 4600, 4700 (FIGS. 39, 41, 45, 46, 47). In some implementations, process 4800 can be performed in response to receiving input (e.g., from a user of the system 200 or from another system or process). For example, process 4800 can be performed by search engine 202 in response to receiving a search string.
The system performing process 4800 can access categorized collections of instances and attributes (step 4805). For example, the system can access the instance and attribute collection 1510 to access one or more categorized collections of instances and attributes generated during previous searches.
The system performing process 4800 can identify a category that includes the specified instances (step 4810). In some implementations, the system can identify the category that includes the instances based on similar attributes, similar attribute values, combinations of these characteristics, and/or other techniques.
The system performing process 4800 can select one or more attribute suggestions from the identified category (step 4815). For example, in some implementations, attribute suggestions can be selected from the identified category based on the number of times that the attributes are used to characterize the instances in the category.
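The count-based selection of step 4815 might be sketched as follows. The category contents, the exclusion of attributes already present in the structured presentation, and the alphabetical tie-break are assumptions made for this sketch.

```python
from collections import Counter


def suggest_attributes(category_instances, already_used, top_n=3):
    """Rank attributes by how many instances in the category they characterize."""
    counts = Counter()
    for attrs in category_instances.values():
        counts.update(set(attrs))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [a for a, _ in ranked if a not in already_used][:top_n]


# Hypothetical category of city instances and their attributes.
category = {
    "New York": ["population", "mayor", "area"],
    "Boston": ["population", "mayor"],
    "Toronto": ["population", "area", "mayor"],
    "Denver": ["mayor", "elevation"],
}
suggestions = suggest_attributes(category, already_used={"population"}, top_n=2)
```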
FIG. 49 is a representation 4900 of a formulation of attribute suggestions from electronic documents in an electronic document collection based on characteristics of a preexisting structured presentation. In particular, representation 4900 illustrates a formulation of attribute suggestions using one implementation of process 4800 (FIG. 48). As shown, a preexisting structured presentation specifies a collection of instances 4005 (i.e., the instances "Philadelphia" and "Chicago"). Furthermore, instances drawn from different documents in an electronic document collection (e.g., collection 102) have been categorized into different collections 4910, 4915, 4920. Categorized instance collections 4910, 4915, 4920 can be identified as relevant to specified instances 4005 based on, e.g., the same instance identifiers "Philadelphia" and "Chicago" appearing therein.
As shown, categorized instance collections 4910, 4915, 4920 have been categorized in a variety of different ways. In particular, categorized instance collection 4910 has been categorized as a collection of "North American Cities." Categorized instance collection 4915 has been categorized as a collection of "National League East Teams." Categorized instance collection 4920 has been categorized as a collection of "Popular Movies." In the context of system 200, categorized instance collections 4910, 4915, 4920 can be stored in the data center 208 (FIG. 2). That is, the system 200 can generate one or more categories of instances based on previously received search strings. Thus, after search engine 202 conducts a search based on a search string, search engine 202 can categorize the results and store them in data center 208. These categorized results can be accessed and analyzed during subsequent searches to generate attribute suggestions.
A categorized instance collection that includes the instances specified in a preexisting structured presentation can be identified, e.g., based on a similarity between the attributes that characterize the specified instances and the attributes that characterize the instances in the different categories. For example, the common use of the attributes "year" and "rating" in the preexisting structured presentation and in categorized instance collection 4920 can be used to identify that categorized instance collection 4920 includes instances 4005.
In some implementations, a subset of the attributes in a categorized instance collection can be selected as attribute suggestions based on the attributes that characterize the instances in a category. For example, the use of the attribute "Start time" to characterize movie instances can be taken as an indication that only information about currently playing movies is to be included in a structured display. Thus, attributes such as "playing at" and "coupons available" can be included in a list of attribute suggestions. As another example, the attribute "year" can be excluded from a list of attribute suggestions based on the use of the attribute "Start time" to characterize movie instances in a preexisting structured display.
FIG. 50 is a schematic representation of a collection 5000 of processes that can be used to formulate a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106.
The processes in collection 5000 can be thought of as filters that are applied in succession to a large collection of potential attributes 5005 to yield a smaller collection 5010 of attribute suggestions. Each filter can exclude potential attributes 5005 from an attribute suggestion collection 5010 that can be presented to a user or added directly to a structured presentation.
Filters 5015, 5020, 5025 can be applied in any order. However, in general, filters 5015, 5020, 5025 are applied in order of granularity. In particular, the one of filters 5015, 5020, 5025 that reduces the number of potential attributes by the greatest amount is applied first, and the one that reduces the number of potential attributes by the smallest amount is applied last.
In some implementations, any of filters 5015, 5020, 5025 can be omitted from collection 5000 and/or additional filters added to collection 5000. For example, a user-specified filter that can filter the potential attributes 5005 according to input provided by the user can be added to collection 5000.
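The successive application of filters, most reductive first, can be sketched as below. Each filter is modeled as a predicate over potential attributes, and the reduction of each filter is estimated against the full pool; both choices, along with the helper names and sample sets, are assumptions and not details given for filters 5015, 5020, 5025.

```python
def member_of(allowed):
    """Build a filter predicate that keeps attributes found in `allowed`."""
    return lambda attribute: attribute in allowed


def apply_filters(potential, filters):
    """Apply filters in succession, ordered by how many attributes each removes."""
    ranked = sorted(
        filters,
        key=lambda keep: sum(1 for a in potential if not keep(a)),
        reverse=True,  # greatest reduction first
    )
    remaining = list(potential)
    for keep in ranked:
        remaining = [a for a in remaining if keep(a)]
    return remaining


# Hypothetical potential attributes and three illustrative filters.
potential = ["population", "mayor", "area", "rating", "year", "zip code"]
filters = [
    member_of({"population", "mayor", "area", "rating"}),
    member_of({"population", "mayor", "area", "year"}),
    member_of({"population", "mayor"}),
]
suggested = apply_filters(potential, filters)
```

Because these predicates are independent, the order affects only how much work the later filters do, not the final result.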
FIG. 51 is a flow chart of a process 5100 for formulating a collection of new attribute suggestions 915 based on information in a preexisting structured presentation 106. Process 5100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions, e.g., a system 200 (FIG. 2). Process 5100 can be performed alone or in conjunction with other activities. For example, process 5100 can be performed iteratively in conjunction with one or more of the processes in process collection 5000. In some implementations, process 5100 can be performed in response to receipt of a search string.
The system performing process 5100 can make an initial match between the instances specified in a preexisting structured presentation 106 and attributes drawn from a document collection (step 5105). The initial match can be based on one or more of the filtering processes in process collection 5000. The system performing process 5100 can determine whether the number of matches is too high, too low, or appropriate (step 5110). If the number of matches is too low, the system can broaden names of specified instances (step 5115). For example, the system performing process 5100 can use alternate spellings, abbreviations, synonyms, alternative names, nicknames, and/or other keywords for the specified instances in one or more of the processes in process collection 5000.
The system performing process 5100 can also reduce the number of attributes and/or instances used in any related instance and/or attribute filtering 5020 (FIG. 50) (step 5120). The number of attributes and/or instances can be reduced based on, e.g., the number of potential instances excluded by a particular attribute and/or instance. For example, if the requirement that a specific instance be characterized by an attribute excludes all of the potential attributes, then this instance can be omitted from any related instance and/or attribute filtering. The attributes and/or instances to be removed can be determined, e.g., automatically, without user input, or based on input received from a user.
The system can again seek to make a match between the instances specified in a preexisting structured presentation 106 and instances drawn from a document collection, but this time using the changed parameters (step 5122). This match can also be made using one or more of the filtering processes in process collection 5000.
If the number of matches is determined to be too high (step 5110), the system performing process 5100 can increase the number of attributes and/or instances used in any related attribute and/or instance filtering 5020 (FIG. 50) (step 5128). The number of attributes and/or instances can be increased based on, e.g., the number of potential attributes excluded by a particular attribute and/or instance. The attributes and/or instances to be added can be determined, e.g., automatically, without user input, or based on input received from a user. For example, instances to be added can be determined using process 4800 (FIG. 48).
The system performing process 5100 can winnow the matches based on the changed parameters (step 5130). In particular, the narrowed ranges and/or increased numbers of instances can be used in any related value filtering 5025 (FIG. 50).
If the number of matches is determined to be acceptable (step 5110), the system performing process 5100 can suggest the matched attributes to a user (step 5135). For example, the system performing process 5100 can present one or more attribute suggestions in a GUI on a display screen, e.g., display screen 104.
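A minimal sketch of the "too low" branch of process 5100 follows: an initial match, a check of the match count, and a rematch with broadened instance names. The substring matching, the thresholds, the alias table, and the sample documents are all hypothetical; the "too high" branch (steps 5128, 5130) is omitted for brevity.

```python
def classify_match_count(n, low=1, high=5):
    """Step 5110 (sketch): label the number of matches."""
    if n < low:
        return "too_low"
    if n > high:
        return "too_high"
    return "ok"


def match_attributes(names, documents):
    """Collect attributes from documents that mention any of the names."""
    found = set()
    for doc in documents:
        text = doc["text"].lower()
        if any(name.lower() in text for name in names):
            found.update(doc["attributes"])
    return found


def suggest(instance, aliases, documents, low=1, high=5):
    names = [instance]
    matches = match_attributes(names, documents)               # step 5105
    if classify_match_count(len(matches), low, high) == "too_low":
        names += aliases.get(instance, [])                     # step 5115
        matches = match_attributes(names, documents)           # step 5122
    return sorted(matches)


# Hypothetical documents and nickname table.
documents = [
    {"text": "Philly has great food", "attributes": ["cuisine"]},
    {"text": "Chicago weather report", "attributes": ["climate"]},
]
result = suggest("Philadelphia", {"Philadelphia": ["Philly"]}, documents)
```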
FIG. 52 is a schematic representation of a system 5200 in which attribute values 307 drawn from two or more electronic documents in electronic document collection 102 are presented to a user in a structured presentation. In addition to electronic document collection 102, display screen 104, and data communication path 108, system 5200 includes structured data 5205 and a merge module 5210. In operation, system 5200 extracts attribute values from an unstructured collection of electronic documents in electronic document collection 102 and merges that information with information drawn from structured data 5205 to populate structured presentation 106.
System 5200 can populate all or only a fraction of structured presentation 106 with attribute values. There are many circumstances in which only a fraction of a structured presentation may be populated with attribute values. For example, the population may be part of the addition of new instances (and hence new structured records) to structured presentation 106. As another example, the population may be part of the addition of new attributes to structured presentation 106. As yet another example, the population may be part of the refinement of a fraction of the existing attribute values in structured presentation 106. For example, some fraction of the original attribute values can be checked for accuracy or to ensure that the characterized instances haven't changed.
Structured data 5205 is a structured collection of information. The information in structured data 5205 can be organized in accordance with a defined data model. For example, structured data 5205 can be organized in accordance with a hierarchical or a relational data model and stored in a data storage device. In some instances, all or a portion of structured data 5205 can be presented to a user in a structured presentation. For example, in some implementations, structured data 5205 can be a pre-existing structured presentation 106 that is presented to a user on the same display screen 104 on which the structured presentation 106 that is populated with new attribute values drawn from collection 102 is to be presented.
Merge module 5210 is a collection of one or more sets of machine-readable instructions deployed on one or more data processing devices. Merge module 5210 can include functionality for identifying new attribute values as well as their disposition within the structured presentation 106 that is to be populated therewith. The operations performed by merge module 5210 are described in further detail below.
FIG. 53 is a schematic representation of an implementation of system 5300 in which attribute values drawn from two or more electronic documents in electronic document collection 102 are presented to a user in a structured presentation. In addition to electronic document collection 102, display screen 104, data communication path 108, search engine 202, crawler 204, and compressing, indexing and ranking modules 210, system 5300 also includes an attribute/value/instance collection 5305 at data center 208.
Attribute/value/instance collection 5305 is a collection of information that associates instances with their attributes, as well as the values of those attributes. The information in collection 5305 can be extracted from electronic documents in collection 102 either in response to receipt of a trigger (e.g., a search query) or in anticipation of receipt of a trigger, e.g., as part of a process of indexing electronic document collection 102.
The association between instances, attributes, and their values can be established in collection 5305 by structuring the information storage within collection 5305. For example, FIG. 54 is a schematic representation of a table 5400 that can associate attributes, values, and instances in collection 5305 (FIG. 53). Table 5400 includes a collection of records 5402, 5404, 5406, 5408, 5410, each of which associates an identifier of an instance with descriptions of a document location and the attributes and values that characterize the identified instances. The information in records 5402, 5404, 5406, 5408, 5410 can be organized in a collection of columns 5415, 5420, 5425, 5430, 5435, 5440, 5445, 5450. In particular, column 5415 can include instance identifiers. Column 5420 can include a description of the location of an electronic document that includes the instance identified in column 5415. Columns 5425, 5435, 5445 can include descriptions of attributes that both characterize the instances identified in column 5415 and that are themselves characterized by a value in the document whose location is described in column 5420. Columns 5430, 5440, 5450 can include descriptions of the values that characterize the attributes described in columns 5425, 5435, 5445 of the instances identified in column 5415 in the documents whose location is described in column 5420.
As shown, different electronic documents can include different categories and amounts of information characterizing the same instance. For example, the document whose location is identified in column 5420 of record 5404 includes information characterizing three attributes of an instance "INSTANCE_1," whereas the document whose location is identified in column 5420 of record 5406 includes information characterizing two attributes of an instance "INSTANCE_1." Moreover, the attributes characterized in record 5404 (i.e., attribute "ATTR_5," attribute "ATTR_6," attribute "ATTR_7") differ from the attributes characterized in record 5406 (i.e., attribute "ATTR_3," attribute "ATTR_4").
Further, the values used to characterize even the same attribute of the same entity can differ in different electronic documents. For example, the document whose location is identified in column 5420 of record 5402 includes a value "VALUE_3A" characterizing the attribute "ATTR_3" of instance "INSTANCE_1," whereas the document whose location is identified in column 5420 of record 5406 includes a value "VALUE_3B" characterizing the same attribute "ATTR_3" of the same instance "INSTANCE_1."
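For illustration, records of the kind held in table 5400 can be modeled as one entry per instance and document, and scanned for attributes whose values differ between documents. The dictionary layout, the document locations "doc_a" and "doc_b", and the value "VALUE_4" are hypothetical placeholders rather than the contents of FIG. 54.

```python
from collections import defaultdict

# One record per (instance, document location), with attribute/value pairs.
records = [
    {"instance": "INSTANCE_1", "location": "doc_a",
     "pairs": {"ATTR_3": "VALUE_3A"}},
    {"instance": "INSTANCE_1", "location": "doc_b",
     "pairs": {"ATTR_3": "VALUE_3B", "ATTR_4": "VALUE_4"}},
]


def values_by_attribute(records):
    """Map (instance, attribute) to the set of values seen across documents."""
    table = defaultdict(set)
    for record in records:
        for attribute, value in record["pairs"].items():
            table[(record["instance"], attribute)].add(value)
    return table


# Attributes of an instance that are characterized by more than one value.
conflicts = {
    key: values
    for key, values in values_by_attribute(records).items()
    if len(values) > 1
}
```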
There are a number of different potential sources of such discrepancies between the values that characterize the same attribute of the same entity in different documents. For example, a document can include false information that mischaracterizes the attributes of an entity. In addition to inadvertent errors, the values of an attribute may change over time.
Examples of this include, e.g., the value of the "height" attribute of a high school basketball player instance, the value of a "list price" attribute of a house instance, or the value of the "mayor" attribute of a city instance. Some documents may be updated with the correct value whereas other documents may retain the original — but now incorrect — value. Moreover, even completely accurate documents can characterize the same attribute of the same entity in different ways. For example, different documents can use different units to express the same value. As another example, different documents can express the same value with different precision (e.g., "about a two hour drive to Phoenix" versus "a 130 minute drive to Phoenix at the posted speed limits"). Such discrepancies are especially endemic in the context of an unstructured electronic document collection, e.g., document collection 102. In this regard, as discussed above, different documents can be added to collection 102 by different users who present information differently. The users who add the documents do not collaborate to ensure that information is presented in a consistent manner, nor is there a formal mechanism for ensuring that the presentation of information in different documents remains unchanged.
Data collections 5305 that associate attributes, values, and instances (e.g., table 5400) can be formed in a number of different ways. For example, documents that include internal, structured components can be identified. Examples of such internal, structured components include tables and lists that appear in HTML documents, and the like. The relationships between attributes, values, and instances in these internal structured components can be copied to form data collections 5305.
As another example, once a first document has been identified as including attributes, values, and/or instances, the template of that document can be used to extract attributes, values, and instances from other documents that include the same template. For example, if a stereo retailer uses the same document template to describe different stereos that are offered for sale, the arrangement of information in a first electronic document regarding a first stereo can be used to extract information from other electronic documents that regard other stereos. As still another example, the template of a single document can be used to extract attributes, values, and/or instances from that document. For example, the template can specify an arrangement of several attributes, and the values that characterize those attributes, relative to an identifier of an instance. If some of those attributes and/or values are known, then the arrangement of those known attributes and/or values can be identified and used to identify other attributes and/or values.
For example, the template of a single webpage may specify the arrangement of the attribute/value pairs "Director: Orson Welles," "Writer: Orson Welles, Herman J. Mankiewicz," and "Release Date: May 1, 1941" relative to an identifier of the movie instance "Citizen Kane." If the attributes and values "Director: Orson Welles" and "Release Date: May 1, 1941" were already known, the arrangement of those attributes and values relative to the movie instance identifier "Citizen Kane" can be used to extrapolate the attribute/value pair "Writer: Orson Welles, Herman J. Mankiewicz."
In some implementations, techniques such as natural language parsing can be used to identify instances, attributes, and their values. For example, electronic documents can be parsed to identify phrases such as "[Instance] has a/an [attribute] of [value]" in electronic documents.
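As a rough sketch only, the phrase pattern "[Instance] has a/an [attribute] of [value]" can be approximated with a regular expression. A regular expression is a much weaker technique than natural language parsing and is swapped in here purely for illustration; the pattern, the capitalization heuristic for instances, and the sample sentences are assumptions.

```python
import re

# Instance: capitalized phrase; attribute: words; value: text up to the period.
PATTERN = re.compile(
    r"(?P<instance>[A-Z][\w ]*?) has an? (?P<attribute>[\w ]+?) of (?P<value>[^.]+)\."
)


def extract_triples(text):
    """Return (instance, attribute, value) triples matched in the text."""
    return [
        (m["instance"].strip(), m["attribute"].strip(), m["value"].strip())
        for m in PATTERN.finditer(text)
    ]


# Made-up sentences; the middle one should not match the pattern.
sample = (
    "Mount Example has an elevation of 1,234 meters. "
    "The weather was mild. "
    "Lake Sample has a depth of 40 feet."
)
triples = extract_triples(sample)
```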
FIG. 55 is a flow chart of an example process 5500 for presenting attribute values drawn from two or more electronic documents in an electronic document collection to a user in a structured presentation. Process 5500 can be performed by one or more computers that perform operations by executing one or more sets of machine-readable instructions. Process 5500 can be performed in isolation or in conjunction with other data processing activities. For example, process 5500 can be performed as part of process 600 (FIG. 6).
The system performing process 5500 can receive an instance identifier and an attribute identifier (step 5505). The system performing process 5500 can receive the instance identifier and the attribute identifier directly from a user (e.g., in the form of a search query) or indirectly (e.g., as part of a structured data collection 905 (FIG. 9)).
The system performing process 5500 can identify electronic documents relevant to the received instance that include values of the attribute (step 5510). For example, the system can access an attribute/value/instance collection 5305 in a data center 208 (FIG. 53) to identify the relevant electronic documents. As another example, a search engine (e.g., search engine 202) can perform keyword searches using the instance and attribute identifier to identify relevant documents. In some cases, such keyword searches can be supplemented with language parsing or other techniques that facilitate the identification of values.
The system performing process 5500 can establish a subset of the values for the identified attribute of the identified instance for presentation in a structured presentation (step 5515). The subset of the values can include one or more values that are thought to be appropriate, or likely to be appropriate, for populating the structured presentation. In particular, the subset of the value(s) can be considered to characterize the identified attribute of the identified instance both accurately and consistently with the desires of the viewer of the structured presentation. As discussed further below, the desires of the viewer of the structured presentation can be ascertained, e.g., based on a selection of a value received from the viewer or based on the characterization of the same or other attributes of the same or other instances in a preexisting structured collection of information such as, e.g., structured data 905 (FIG. 9).
The system performing process 5500 can provide instructions for displaying a structured presentation populated by the subset of values (step 5520). Thus, a structured presentation can be presented based on information gathered from a collection of electronic documents (i.e., the subset of values gathered from an electronic document collection) (step 615, FIG. 6).
In some implementations, process 5500 can be performed a number of times, e.g., for a number of instance identifiers and/or attribute identifiers.
FIG. 13 is a flow chart of a process 1300 for establishing one or more values for presentation in a structured presentation. Process 1300 can be performed in isolation or in conjunction with other activities. For example, process 1300 can be performed at step 5515 in process 5500.
The system performing process 1300 can group values of an attribute from two or more documents in an electronic document collection into two or more groups (step 1305). The grouped values can be drawn directly from the electronic documents or drawn from a description of the content of the electronic documents, such as an association of attributes, values, and instances like table 5400 (FIG. 54).
The system can group values using one or more different standards for determining when values are to be grouped. For example, one standard can require that grouped values be identical. For example, two values "4" can be grouped. Another standard can require that numerical values be within a certain range of being identical. For example, the values "3.14" and "3.14159" can be grouped. Another standard can require that term values be identical or have an identical meaning. For example, the terms "czar" and "tsar" can be grouped. Another standard can require that term values express the same concept in an ontology of concepts. For example, the terms "pink" and "mauve" can be grouped. Another standard allows values written in different formats to be grouped. For example, the dates "July 25, 1982" and "7/25/1982" can be grouped. Another standard allows values written in different units to be grouped. For example, the units of measure "1m" and "100cm" can be grouped. Another standard allows values written in different, but approximately equal, units to be grouped. For example, the units of measure "1m" and "39 inches" can be grouped.
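Two of these grouping standards, different date formats and different (or approximately equal) units, can be sketched by normalizing each value to a canonical key before grouping. The two date formats, the unit table, and the rounding used to approximate "approximately equal" are assumptions; rounding into buckets is a simplification of a true tolerance comparison.

```python
import re
from collections import defaultdict
from datetime import datetime

TO_METERS = {"m": 1.0, "cm": 0.01, "inches": 0.0254}  # assumed unit table


def normalize_date(text):
    """Parse a date written in either of two assumed formats."""
    for fmt in ("%B %d, %Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def normalize_length(text, precision=1):
    """Convert a length such as '100cm' to meters, rounded for approximate grouping."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", text)
    if not m or m.group(2) not in TO_METERS:
        return None
    return round(float(m.group(1)) * TO_METERS[m.group(2)], precision)


def group_values(values, normalize):
    """Group raw values that share a normalized key; unparsed values stand alone."""
    groups = defaultdict(list)
    for value in values:
        key = normalize(value)
        groups[key if key is not None else value].append(value)
    return dict(groups)


date_groups = group_values(
    ["July 25, 1982", "7/25/1982", "July 26, 1982"], normalize_date
)
length_groups = group_values(
    ["1m", "100cm", "39 inches", "2m"], normalize_length
)
```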
Using the grouping(s), the system performing process 1300 can perform one or more of the following subprocesses in any order to select one group, and hence select a subset of the values from a collection of attribute values.
In a first subprocess, the system performing process 1300 can select the group with the highest "value" for presentation in a structured presentation (step 1310). In some implementations, the "value" of a group reflects the count of values in that group. In statistical terminology, the system performing process 5500 can select values with high frequencies in the electronic document collection. In effect, this approach allows the documents in an electronic document collection to "vote" on the values of an attribute of an instance.
In other implementations, the "value" of a group not only reflects the count of values but also weights or scores individual counts by parameters that reflect a characteristic of the document from which the values are drawn. For example, a count can be weighted based on, e.g., a page rank of the document from which the values are drawn, a weighting factor for that document provided by a user, the number of other values that have been drawn from that document, or the "age" of the document. For example, documents that have been created more recently can be considered to more accurately characterize the attributes of certain instances.
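The counted and weighted forms of this "vote" can be sketched with a single scoring function, where the default weight of 1.0 per document reduces to a plain count. The `rank` field and its numbers are hypothetical stand-ins for a document characteristic such as page rank or age.

```python
def select_group(groups, weight=lambda doc: 1.0):
    """Score each group by the summed weights of its documents; return the winner."""
    scores = {
        value: sum(weight(doc) for doc in docs)
        for value, docs in groups.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical groups: each value maps to the documents it was drawn from.
groups = {
    "VALUE_3A": [{"rank": 0.9}],
    "VALUE_3B": [{"rank": 0.2}, {"rank": 0.3}],
}

by_count, _ = select_group(groups)                              # plain vote
by_weight, _ = select_group(groups, weight=lambda d: d["rank"])  # weighted vote
```

In this sample the plain count favors the value found in two documents, while the weighted score favors the value found in the single higher-ranked document.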
The approach of this subprocess is effective at eliminating inadvertent mischaracterizations of attributes, e.g., when the value on one electronic document is a typographic error. However, in isolation, this approach can under certain circumstances select inappropriate values. For example, even though a large number of documents characterizes a volume attribute in liters, the viewer may be interested in having that attribute characterized in gallons in a structured presentation.
In a second subprocess, the system performing process 1300 can receive a user specification of a constraint on, e.g., a range of an acceptable value or a unit of an acceptable value (step 1315). For example, the system can provide a GUI component at a display screen, e.g., display 104 (FIG. 1) that allows the user to select a range of values or a unit of measurement constraint. The constraint can be open-ended (e.g., "a value >1") or closed (e.g., "a value between 1 and 10.").
After receiving the constraint, the system performing process 1300 can select the group meeting the received constraint for presentation in a structured presentation (step 1320). For example, if the user selects "meters" as the appropriate unit of measurement, the system performing process 1300 can select one or more groups of values that are expressed in meters.
The approach of this subprocess is effective at ensuring that the values presented in a structured presentation are presented in an organized, systematic arrangement. For example, the units of measure of the value used to characterize, e.g., Michael Jordan's height can be constrained to be identical to the units of measure of the value used to characterize Magic Johnson's height. Such an organized, systematic arrangement allows a user to compare values of the same attribute of different instances easily, without concern as to units in which the values are presented.
In a third subprocess, the system performing process 1300 can determine a "quality" of the documents from which the attribute values in each group were drawn (step 1325). The "quality" of a document can reflect the likelihood that the information in the document is accurate and does not mischaracterize a value of an attribute. For example, commercial suppliers of goods generally provide accurate information regarding those goods. Hence, the "quality" of information provided by a commercial supplier can be considered higher than the "quality" of information provided by an individual. As another example, bias can be considered in determining the quality of the documents from which the information is drawn. For example, information drawn from an allegedly independent source (such as, e.g., the Congressional Budget Office) can be considered to be higher quality than information drawn from a political party.
As another example, the quality of a document can be based on a specification of the quality of a source of the document, or the document itself, by a user. For example, a user can indicate that automobile reliability information drawn from the Consumer Union (the makers of Consumer Reports) is high quality but that automobile reliability information drawn from Road and Track magazine is not. The system performing process 1300 can also select the group that includes values drawn from the highest quality document(s) (step 1330).
The approach of this subprocess is particularly effective in circumstances where the value of an attribute is the subject of debate. For example, there are disagreements over a variety of values, such as the true height of a collegiate point guard, the best cheesesteak in Philadelphia, and the number of stars awarded to a particular hotel. By allowing a viewer to specify the preferred "high quality" providers of values that characterize such attributes, a structured presentation can be tailored to the expectations of the viewer.
As discussed above, combinations of these and other activities can be performed in order to select one or more values for presentation in a structured presentation. For example, all the groups of values that represent some percentage (e.g., >10%) of the total number of values can be selected in a first screening (step 1310), and these groups can subsequently be further winnowed based on a unit specification (step 1320). The remaining groups can further be winnowed based on the quality of the documents from which the attribute values in each group were drawn (step 1330). Thus, in some implementations, process 1300 can provide one or more values from a remaining group that are free from mischaracterizations, with consistent units of measurement, and drawn from sources that the viewer prefers.
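As a non-limiting sketch, the combination of a percentage screening, a unit specification, and a document-quality selection can be expressed in Python as shown below. The `winnow` function, the dictionary keys, the numeric quality scores, and the sample groups are hypothetical and serve only to illustrate the order of the screenings.

```python
def winnow(groups, min_share=0.10, unit=None):
    """Screen groups by share of all values, then by unit, then pick the best-quality group.

    Each group is a dict with hypothetical keys "values", "unit", and "quality".
    Returns the surviving group with the highest quality, or None if none survive.
    """
    total = sum(len(g["values"]) for g in groups)
    # First screening: keep groups holding more than min_share of all values.
    kept = [g for g in groups if total and len(g["values"]) / total > min_share]
    # Second screening: keep groups expressed in the requested unit, if any.
    if unit is not None:
        kept = [g for g in kept if g["unit"] == unit]
    # Final selection: prefer the group drawn from the highest quality documents.
    if not kept:
        return None
    return max(kept, key=lambda g: g["quality"])


# Made-up groups, for illustration only.
groups = [
    {"values": ["1.98"] * 6, "unit": "meters", "quality": 0.9},
    {"values": ["1.96"] * 3, "unit": "meters", "quality": 0.4},
    {"values": ["6.5"] * 10, "unit": "feet", "quality": 0.8},
    {"values": ["19.8"] * 1, "unit": "meters", "quality": 0.95},
]
best = winnow(groups, min_share=0.10, unit="meters")
```

In this example the single-value group is removed by the first screening even though its hypothetical quality score is the highest, which mirrors the use of the first screening to discard likely mischaracterizations.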
FIG. 57 is a flow chart of a process 5700 for selecting one or more values for presentation in a structured presentation. Process 5700 can be performed in isolation or in conjunction with other activities. For example, process 5700 can be performed at step 5515 in process 5500 (FIG. 55), alone or in conjunction with one or more of the subprocesses of process 5600 (FIG. 56).
The system performing process 5700 can group values of an attribute extracted from two or more documents in an electronic document collection into two or more groups (step 5605). The system performing process 5700 can present descriptions of the groups of values to the user (step 5705). For example, the system can display the most common value in each group, or a list of some of the values in each group, to the user at a display, e.g., at display screen 104 (FIG. 1). In some implementations, the descriptions of the groups of values can include additional information that characterizes the groups. For example, a count of the number of values in each group can be displayed, a percentage that reflects the percent of the extracted values that are found in each group can be displayed, and/or a description of the units of measure in the group can be displayed. As another example, an estimate of the quality of the electronic documents from which the values in each group were extracted can be displayed. As yet another example, the identity, location, and/or snippets or other excerpts of documents from which the values in each group were extracted can be displayed.
In some implementations, the descriptions of the groups of values are sorted in a confidence-based ordering. That is, the descriptions of the groups of values are ordered according to how confident the system performing process 5700 is as to the accuracy of the value(s) in each group. The confidence in the accuracy of the value(s) in each group can be determined based on, e.g., the number of values in each group, the quality of the documents from which the values were extracted, and the like.
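One possible confidence-based ordering of group descriptions is sketched below in Python. The scoring formula (share of all values multiplied by mean document quality), the function names, the dictionary keys, and the sample data are assumptions made for this sketch; the text above does not prescribe a particular formula.

```python
from collections import Counter


def confidence(group, total_values):
    """Hypothetical score: share of all values in the group times mean document quality."""
    share = len(group["values"]) / total_values
    mean_quality = sum(group["doc_quality"]) / len(group["doc_quality"])
    return share * mean_quality


def describe_groups(groups):
    """Return one description per group, most confident group first."""
    total = sum(len(g["values"]) for g in groups)
    ranked = sorted(groups, key=lambda g: confidence(g, total), reverse=True)
    return [
        {
            "most_common": Counter(g["values"]).most_common(1)[0][0],
            "count": len(g["values"]),
            "percent": round(100 * len(g["values"]) / total),
            "unit": g["unit"],
        }
        for g in ranked
    ]


# Made-up groups, for illustration only.
groups = [
    {"values": ["1.98"], "unit": "meters", "doc_quality": [0.95]},
    {"values": ["6.5", "6.5", "6.6"], "unit": "feet", "doc_quality": [0.9, 0.8, 0.7]},
]
descriptions = describe_groups(groups)
```

Each description carries the most common value, the count, the percentage, and the unit of measure, corresponding to the additional information that can accompany a group description.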
The system performing process 5700 can receive user selection of a desired group of values (step 5710). For example, the system can receive user interaction that identifies a selection of a desired value group.
In some implementations, the system performing process 5700 can also change other aspects of the structured presentation based on the user selection (step 5715). For example, if a user selects a group of values with a unit of measurement in meters, and there are other values that characterize the same attribute of other instances but that are presented with units of measurement in feet, such values can be converted in the structured presentation 106 to be presented in meters.
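The conversion of other values to the selected unit of measurement can be sketched in Python as follows, purely as an example. The `TO_METERS` table, the function names, and the sample column are hypothetical; the conversion factors for feet and inches are the standard definitions.

```python
# Length of one unit expressed in meters (standard definitions).
TO_METERS = {"meters": 1.0, "feet": 0.3048, "inches": 0.0254}


def convert(value, from_unit, to_unit):
    """Convert a length between any two units listed in TO_METERS."""
    return value * TO_METERS[from_unit] / TO_METERS[to_unit]


def harmonize_column(cells, target_unit):
    """Re-express every (value, unit) cell of a column in the target unit."""
    return [(round(convert(v, u, target_unit), 2), target_unit) for v, u in cells]


# Made-up column of values for one attribute of several instances.
column = [(1.98, "meters"), (6.75, "feet")]
harmonized = harmonize_column(column, "meters")
```

After the call, every cell in the column is expressed in meters, so values of the same attribute of different instances can be compared directly.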
FIG. 58 is a flow chart of an example process 5800 for selecting one or more values for presentation in a structured presentation. Process 5800 can be performed in isolation or in conjunction with other activities. For example, process 5800 can be performed at step 5515 in process 5500 (FIG. 55), alone or in conjunction with one or more of the subprocesses of process 5600 (FIG. 56) and/or process 5700 (FIG. 57).
The system performing process 5800 can identify electronic documents in the electronic document collection that are relevant to the instances and other attributes in a structured data collection, e.g., structured data collection 905 (step 5805). As discussed above, structured data collection 905 can be a version of a structured presentation 106.
Documents that are relevant to the instances and other attributes in a structured data collection can be identified in a variety of ways. For example, the system performing process 5800 can access a data collection that associates instances, their attributes, and values characterizing those attributes, e.g., attribute/value/instance collection 5305 (FIG. 53). Documents that include information relevant to the instances and other attributes in a structured data collection can be identified therein, e.g., by comparing the identifiers of the instances and the attributes in both data collections. As another example, the system performing process 5800 can use the identifiers of the instances and the attributes as search terms in one or more search queries. Such search queries, alone or in conjunction with other extraction techniques such as language parsing and string comparisons, can be used to identify relevant documents in an electronic document collection.
The system performing process 5800 can also select one or more values for presentation in a structured presentation from the identified documents (step 5810). FIG. 59 is a schematic representation of a circumstance in which attribute values drawn from electronic documents in electronic document collection 102 are presented to a user in a structured presentation 106. In particular, a system such as system 900 (FIG. 9) draws attribute values from a table 5400 that associates attributes, values, and instances drawn from electronic documents in electronic document collection 102. The system also merges those attribute values with an initial structured presentation 106 to form a final structured presentation 106. The initial structured presentation 106 is thus acting as structured data 905 (FIG. 9).
In the illustrated example, the initial structured presentation has been modified to associate values of a new attribute (i.e., the attribute "AIRPORT") with instances identified in the structured presentation. In particular, a new column 5905 has been added to the initial structured presentation. Column 5905 is headed by an attribute identifier 5910 that identifies the new attribute using the term "AIRPORT." The addition of values of a new attribute to the structured presentation can be triggered, e.g., based on interaction with a user or automatically, as discussed further in FIGS. 9-20 and the associated text.
A system such as search engine 202 can access a data collection that associates attributes, values, and instances drawn from electronic documents in electronic document collection 102 (such as table 5400). Using such a data collection, the system can select one or more values that characterize the new attribute of one or more of the instances in the initial structured presentation. For example, in the illustrated circumstance, value 5915 (i.e., the value "value_ai") characterizes the attribute "AIRPORT" of the instance "NEW YORK" in the document "DOC_3." If necessary, the system can select one or more values of the new attribute for display, e.g., using one or more of processes 5600, 5700, 5800 (FIGS. 56, 57, 58).
After a value that characterizes the new attribute of one or more of the instances in the initial structured presentation has been selected, a final structured presentation 106 can be presented to a viewer. The final structured presentation 106 can include the selected values that characterize the new attribute of one or more of the instances in the structured presentations. For example, as shown, value 5915 can be presented in final structured presentation 106 to a viewer.
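The merging of a new attribute column into an initial structured presentation can be sketched in Python as shown below. The record layout, the `add_attribute_column` function, the "INSTANCE" key, and the placeholder instance "CITY_2" are assumptions made for illustration; only the "AIRPORT" attribute, the "NEW YORK" instance, the value "value_ai", and the document "DOC_3" come from the illustrated example.

```python
def add_attribute_column(presentation, records, attribute):
    """Return a copy of the presentation with one new attribute column.

    `records` is a hypothetical stand-in for a collection that associates
    instances, attributes, values, and source documents. The first matching
    value for each instance is used; instances without a match get None.
    """
    lookup = {}
    for rec in records:
        if rec["attribute"] == attribute:
            lookup.setdefault(rec["instance"], rec["value"])
    return [{**row, attribute: lookup.get(row["INSTANCE"])} for row in presentation]


records = [
    {"instance": "NEW YORK", "attribute": "AIRPORT", "value": "value_ai", "document": "DOC_3"},
]
initial = [{"INSTANCE": "NEW YORK"}, {"INSTANCE": "CITY_2"}]  # "CITY_2" is a placeholder
final = add_attribute_column(initial, records, "AIRPORT")
```

The initial presentation is left unmodified, and a cell for which no value was found remains empty in the final presentation.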
FIG. 60 is a schematic representation of a process 6000 in which both attributes and attribute values are drawn from electronic documents in an electronic document collection and presented to a user in a structured presentation. In process 6000, an initial structured data collection 905 can include a preexisting structured presentation 6005. The preexisting structured presentation 6005 can characterize instances using one or more attribute values, e.g., the attribute values in column 6010. New attributes that characterize the instances in preexisting structured presentation 6005 can be formulated based on the content of electronic documents in electronic document collection 102, as described in FIGS. 37-51 and the associated text. The new attributes can be added at step 6015 to preexisting structured presentation 6005 and appear as part of a structured presentation 6020. New values of such attributes can be formulated based on the content of electronic documents in electronic document collection 102, as described herein. The new values can be added at step 6025 to preexisting structured presentation 6005 and appear as part of a structured presentation 6020. In particular, in the illustrated example, a new column 6030 can include a new attribute identifier 308 (namely, attribute identifier 6035) that identifies the new attribute and a new collection of attribute values 307 (namely, attribute values 6040, 6045) that characterize the new attribute. In effect, the contents of preexisting structured presentation 6005 have been merged with content drawn from electronic document collection 102.
FIG. 61 is a flow chart of a process 6100 for adding values to a structured presentation based on the content of documents in an electronic document collection. Process 6100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 6100 can be performed by the search engine 202 in system 200. Process 6100 can be performed in isolation or in conjunction with other activities. For example, process 6100 can be performed as part of one or more of processes 600, 700, 800 (FIGS. 6, 7, 8).
In some implementations, process 6100 can be performed in response to receiving input, e.g., from a user or from another system or process that triggers the creation of a new structured presentation or an update of the structured presentation. For example, process 6100 can be performed in response to a user request that one or more new attributes be added to a structured presentation 106. As another example, process 6100 may be performed by a search engine, e.g., search engine 202 (FIG. 2), in response to receipt of a search query.
The system performing process 6100 can receive a specification of an instance and an attribute in a structured presentation (step 6105). The structured presentation can be a new or a preexisting structured presentation. For example, the system can receive a search query specifying instances, or a category of instances, that are to be characterized in a structured presentation. As another example, a user can interact with a preexisting structured presentation to specify an instance, an attribute, or both. User interaction with a preexisting structured presentation can specify an instance and/or an attribute inherently or manually. Inherent specification draws upon the systematic arrangement of instance and attribute identifiers in a structured display so that user interaction with a cell specifies an instance and an attribute associated with that cell.
In contrast, in manual specification, a user manually identifies which cells include the identifiers of instances and attributes that are associated with a cell. For example, a user can enter a search query into a cell that specifies the arrangement of an instance identifier, an attribute identifier, or both within the structured presentation. For example, a search query that includes the formula "(CELL_1, CELL_2)" can specify that this cell is associated with the attribute identified in cell "CELL_2" of the instance identified in cell "CELL_1" and that a search for this attribute of this instance is to be conducted. Such manual specification of instance and attribute identifiers is particularly useful in structured presentations such as spreadsheet tables, where the position of instance and attribute identifiers may be apparent to a user but unknown to a data processing device that presents a structured presentation.
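A minimal Python sketch of resolving such a manual specification is given below. The regular expression, the `resolve_formula` function, and the representation of the spreadsheet as a dictionary keyed by cell name are assumptions made for this sketch; only the "(CELL_1, CELL_2)" form of the formula comes from the example above.

```python
import re

# Matches a formula of the form "(CELL_1, CELL_2)".
FORMULA = re.compile(r"^\(\s*(\w+)\s*,\s*(\w+)\s*\)$")


def resolve_formula(formula, sheet):
    """Return the (instance, attribute) pair named by a manual specification.

    The first cell reference holds the instance identifier and the second
    holds the attribute identifier. `sheet` maps cell names to cell contents.
    """
    match = FORMULA.match(formula.strip())
    if match is None:
        raise ValueError(f"not a cell-pair formula: {formula!r}")
    instance_cell, attribute_cell = match.groups()
    return sheet[instance_cell], sheet[attribute_cell]


# Hypothetical sheet contents, for illustration only.
sheet = {"CELL_1": "Myers NmG", "CELL_2": "mpg"}
instance, attribute = resolve_formula("(CELL_1, CELL_2)", sheet)
```

The resolved pair can then be used to conduct the search for the attribute of the instance.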
Further examples of user interaction with a preexisting structured presentation to specify one or more instances and attributes are discussed in detail below.
The system performing process 6100 can formulate one or more value suggestions from documents in an electronic document collection for the specified attribute of the instance (step 6110). Value suggestions can be formulated for the specified attribute in a number of different ways. For example, in one implementation, the system can formulate value suggestions from documents in an electronic document collection 102 by conducting a search using a search query that is constructed using the specified instance and attribute. For example, value suggestions can be formulated by, e.g., locating documents that include structured components related to the specified instance and attribute, as discussed in FIGS. 52-60 and the associated text.
As another example, a search query can require that identifiers of the specified instance and attribute be found in a linguistic pattern indicating that a value characterizing the attribute of the instance is likely to appear. Examples of such patterns include "the <attribute> of <entity> is," "<entity> with an <attribute> of," "<entity> has an <attribute> of," "<entity>'s <attribute> is," and the like. Such patterns can be used to extract value suggestions from textual content in electronic documents.
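As one hedged illustration, the linguistic patterns listed above can be turned into regular expressions in Python. The `TEMPLATES` list, the `extract_values` function, the rule that a value ends at the next sentence punctuation, and the sample text are assumptions made for this sketch and are not drawn from any particular document.

```python
import re

# The four example patterns; "an?" also accepts the article "a".
TEMPLATES = [
    "the {attribute} of {entity} is",
    "{entity} with an? {attribute} of",
    "{entity} has an? {attribute} of",
    "{entity}'s {attribute} is",
]


def extract_values(text, entity, attribute):
    """Return the text fragments that follow any of the patterns in `text`."""
    found = []
    for template in TEMPLATES:
        prefix = template.format(
            entity=re.escape(entity), attribute=re.escape(attribute)
        )
        # Capture lazily up to punctuation that is followed by whitespace or
        # by the end of the text, so that decimals such as "3.9" stay intact.
        pattern = re.compile(
            prefix + r"\s+(.+?)(?=[.;,](?:\s|$)|$)", re.IGNORECASE
        )
        found.extend(m.group(1).strip() for m in pattern.finditer(text))
    return found


# Invented sample text, for illustration only.
text = (
    "The top speed of Tesla Roadster is 125 mph. "
    "Chevy Volt has a range of 40 miles, according to one review."
)
speed_suggestions = extract_values(text, "Tesla Roadster", "top speed")
range_suggestions = extract_values(text, "Chevy Volt", "range")
```

Fragments extracted this way are only candidates; they can be grouped and winnowed as described for the value selection processes before any value is presented.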
The system performing process 6100 can provide one or more value suggestions to a user (step 6115). For example, a list of value suggestions can be displayed for the user on the same display screen that displays a preexisting structured presentation. The display of a list of value suggestions can be done before a value is selected for addition to the preexisting structured presentation.
As another example, in some implementations, the value suggestions can be concealed, along with search information and interactive elements, in a structured presentation. Examples of such implementations are discussed further below.
The system performing process 6100 can receive a user selection of a value suggestion that is to be presented in a structured display (step 6120). For example, an interactive element can interact with a user to receive one or more user inputs (e.g., mouse clicks, key strokes, or other user input) that select a value suggestion. In some implementations, the interactive element can be concealed in a structured presentation, as discussed further below. The system performing process 6100 can also add the selected value to a structured presentation (step 6125) to display the selected value in the structured presentation.
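The overall flow of steps 6105 through 6125 can be summarized, by way of example only, in the following Python sketch. The `add_value` function and its `suggest` and `choose` callables are hypothetical stand-ins for the search-based formulation of value suggestions and for the user's selection, and the placeholder values carry no meaning.

```python
def add_value(presentation, instance, attribute, suggest, choose):
    """Return a copy of the presentation with one user-selected value added.

    `presentation` maps instance identifiers to {attribute: value} dicts.
    `suggest(instance, attribute)` returns a list of value suggestions and
    `choose(suggestions)` returns the suggestion selected by the user.
    """
    # Step 6105 corresponds to receiving `instance` and `attribute`.
    suggestions = suggest(instance, attribute)          # step 6110
    if not suggestions:
        return presentation
    selected = choose(suggestions)                      # steps 6115 and 6120
    updated = {name: dict(cells) for name, cells in presentation.items()}
    updated.setdefault(instance, {})[attribute] = selected  # step 6125
    return updated


# Placeholder data and stand-in callables, for illustration only.
presentation = {"Myers NmG": {"top speed": "value_x"}}
updated = add_value(
    presentation,
    "Myers NmG",
    "mpg",
    suggest=lambda instance, attribute: ["value_1", "value_2"],
    choose=lambda suggestions: suggestions[0],
)
```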
FIG. 62 is a schematic representation of a structured presentation in which a search interface is concealed, namely, a structured presentation 6200. A search interface can include search information, one or more search interactive elements, or a combination thereof. Interactive elements are components of a graphical user interface that can interact with a user, e.g., to receive input instructions. Search interactive elements and search information are relevant to a search. A search is the process of locating information in an electronic document collection. A search interface can include, e.g., information indicating the availability of a search to populate a structured presentation with values, an interactive element that allows a user to indicate that such a search is to be conducted, a display identifying electronic documents located during a search, an interactive element that allows a user to select from among electronic documents for populating a structured presentation with values, or combinations of these and other features.
Structured presentation 6200 can be any form of structured presentation, including any of the structured presentations discussed above. For example, structured presentation 6200 can be a data table displayed in a spreadsheet framework, as shown. The data table of structured presentation 6200 includes a collection of rows 302 and columns 304. Each row 302 includes a respective instance identifier 306 and each column 304 includes a respective attribute identifier 308. The arrangement and positioning of instance identifiers 306 and attribute identifiers 308 in rows 302 and columns 304 associates each cell of the spreadsheet framework in which structured presentation 6200 is displayed with an instance and an attribute. For example, a cell 6205 in structured presentation 6200 is associated with the instance identified as "Tesla Roadster" and the attribute identified as "mpg." A cell 6210 in structured presentation 6200 is associated with the instance identified as "Chevy Volt" and the attribute identified as "range." A cell 6215 in structured presentation 6200 is associated with the instance identified as "Myers NmG" and the attribute identified as "top speed." A cell 6220 in structured presentation 6200 is associated with the instance identified as "Myers NmG" and the attribute identified as "mpg."
The associations between instances, attributes, and cells such as cells 6205, 6210, 6215, 6220 can be used to receive a specification of an instance and an attribute from a user. For example, receipt of user interaction selecting cell 6220 can be taken as input specifying the instance identified as "Myers NmG" and the attribute identified as "mpg." User interaction selecting a cell can include, e.g., receipt of input positioning a cursor 6225 over the cell, the user clicking on the cell, or the like. In some implementations, the selection of a cell can be denoted by positioning a visual indicium such as a perimetrical highlight 6230 in or around the cell.
In the illustrated implementation, selected cell 6220 does not include a value 307 at the time of selection. There can be several reasons for this. For example, structured presentation 6200 can be a new structured presentation that has not yet been populated with values. As another example, structured presentation 6200 can be a preexisting structured presentation from which a value has been deleted. As yet another example, structured presentation 6200 can be a preexisting structured presentation that drew a former value from a source document which, for some reason, is no longer operable as a source of a value.
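The association between a selected cell and its instance and attribute can be sketched in Python as follows. The layout assumed here, with instance identifiers in the first column and attribute identifiers in the first row, and the `specification_for_cell` function are assumptions made for illustration; the instance and attribute identifiers are those of structured presentation 6200.

```python
def specification_for_cell(table, row, col):
    """Return the (instance, attribute, value) specified by selecting a data cell.

    Assumes row 0 holds attribute identifiers and column 0 holds instance
    identifiers, so data cells have row >= 1 and col >= 1.
    """
    if row < 1 or col < 1:
        raise ValueError("header cells do not specify an instance and an attribute")
    instance = table[row][0]
    attribute = table[0][col]
    return instance, attribute, table[row][col]


# An unpopulated table; None marks a cell that does not yet include a value.
table = [
    ["", "mpg", "range", "top speed"],
    ["Tesla Roadster", None, None, None],
    ["Chevy Volt", None, None, None],
    ["Myers NmG", None, None, None],
]
selected = specification_for_cell(table, row=3, col=1)
```

A returned value of `None` corresponds to the case in which the selected cell does not include a value at the time of selection.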
FIG. 63 is a schematic representation of another structured presentation 6300 in which a search interface is concealed. In contrast with structured presentation 6200, structured presentation 6300 includes a value 307 in selected cell 6220. There can be several reasons for this. For example, cell 6220 can have been populated with value 307 automatically, e.g., in response to receipt of a search query. As another example, cell 6220 can have been populated by a user manually interacting with cell 6220 to enter a value. As yet another example, cell 6220 can have been populated with value 307 in response to a user specifying, either inherently or manually, an instance, an attribute, or both that are associated with cell 6220. In any case, selection of cell 6220 specifies the instance identified as "Myers NmG" and the attribute identified as "mpg" that are associated therewith.
FIG. 64 is a schematic representation of another structured presentation 6400 in which a search interface is concealed. Structured presentation 6400 includes visual indicia 6405. Visual indicia 6405 visually indicate that concealed search information or interactive elements are accessible from structured presentation 6400.
In the illustrated implementation, each visual indicium 6405 is found in a separate cell, such as cells 6205, 6210, 6215, 6220. The positioning and arrangement of visual indicia 6405 in cells, and concomitantly the positioning and arrangement of visual indicia 6405 relative to instance identifiers 306 and attribute identifiers 308 in rows 302 and columns 304, can visually indicate the relevance and function of concealed search information and interactive elements, as discussed further below.
In operation, user interaction with structured presentations 6200, 6300, 6400 can trigger the presentation of a concealed search interface. As discussed above, a search interface can include search information, search interactive elements, or both. A search interface can be concealed in a structured presentation in that the search information and interactive elements need not always be discernible in the structured presentation. Rather, a concealed search interface can be concealed wholly or partially from view while a structured presentation is in certain states. For example, in states where a viewer is likely to be reviewing the other information content of a structured presentation, a concealed search interface can be concealed. Such concealment can increase the portion of the structured presentation that is available for the presentation of the other information and reduce visual clutter to improve the readability of the structured presentation.
FIG. 65 illustrates a display element 6500 in which a formerly concealed search interface is presented. In some implementations, display element 6500 can be presented in response to user interaction with the structured presentation itself. Display element 6500 can "pop-up" in front of a structured presentation (such as structured presentations 6200, 6300, 6400) to present a search interactive element 6505 in a window 6510 in response to user interaction with cell 6220. For example, search interactive element 6505 and window 6510 can be presented in response to a user clicking on cell 6220 using a mouse. Search interactive element 6505 is a hyperlink that includes text indicating that "more options..." may be available for populating cell 6220.
FIG. 66 illustrates a display element 6600 in which a formerly concealed search interface is presented. In some implementations, display element 6600 can be presented in response to user interaction with the structured presentation itself.
In addition to search interactive element 6505, display element 6600 presents a source identifier 6605 in window 6510 in response to user interaction with cell 6220. Source identifier 6605 includes text or other information that identifies an electronic document that is a source of the value 307 populating cell 6220. The source document identified by source identifier 6605 can be a document that was located as a result of a prior search. In some implementations, source identifier 6605 can also include a hyperlink to the source document.
FIG. 67 illustrates a display element 6700 in which a formerly concealed search interface is presented. In some implementations, display element 6700 can be presented in response to user interaction with the structured presentation itself. In addition to search interactive element 6505 and source identifier 6605, display element 6700 presents a snippet 6705 in window 6510 in response to user interaction with cell 6220. Snippet 6705 is text or other information that describes the context of value 307 in an electronic document that is a source of the value 307 populating cell 6220.
FIG. 68 illustrates a display element 6800 in which a formerly concealed search interface is presented. In some implementations, display element 6800 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 6505. Display element 6800 includes a header 6802, a relevant source selection region 6805, and a consistent source selection region 6810. Header 6802 can include text or other information that identifies a cell to which a value is to be added. In the illustrated implementation, cell 6230 is identified by the instance and attribute (i.e., Myers NmG: mpg) that are characterized by the value 307 in cell 6230.
Relevant source selection region 6805 can include information and interactive elements that allow a user to specify that the relevancy of a source electronic document to a specified instance and attribute is to be used in selecting a value that is to populate a structured presentation. In the illustrated implementation, the user can specify that a single "most relevant" document is to be the sole source of a value that is to populate a structured presentation. The relevancy of a document can characterize how closely the document matches, e.g., an attribute and an instance that define a search.
In the illustrated implementation, relevant source selection region 6805 includes a header 6815, a selection widget 6820, a value identifier 6825, and a source identifier 6830. Header 6815 includes text or other information that identifies that relevant source selection region 6805 allows a user to specify that the most relevant electronic document is to be used as the source of the value populating the cell identified in header 6802. Selection widget 6820 allows a user to select the use of the most relevant document as the source of the value populating the structured presentation. Value identifier 6825 includes text or other information that identifies the value drawn from the currently most relevant document.
Source identifier 6830 includes text or other information that identifies the currently most relevant electronic document. In some implementations, source identifier 6830 can also include a hyperlink to the currently most relevant document. Since the most relevant document can change over time, the value identified by value identifier 6825 and the document identified by source identifier 6830 can also change over time.
Consistent source selection region 6810 can include information and interactive elements that allow a user to specify that a source electronic document is to be used consistently in selecting a value that is to populate a structured presentation. In the illustrated implementation, the user can select from among three candidate documents to specify the document that is to be consistently used as the source of a value that is to populate a structured presentation.
In the illustrated implementation, consistent source selection region 6810 includes a header 6835, a collection of selection widgets 6840, a collection of value identifiers 6845, and a collection of source identifiers 6850. Header 6835 includes text or other information that identifies that consistent source selection region 6810 allows a user to specify that a source electronic document is to be used consistently in selecting a value. Selection widgets 6840 allow a user to select the document that is to consistently be used. In the illustrated implementation, the user can select from among three different documents. Value identifiers 6845 include text or other information that identifies the current values that can be drawn from particular documents to populate a structured presentation. Source identifiers 6850 include text or other information that identifies the electronic documents from which the values identified by value identifiers 6845 are drawn. In some implementations, source identifiers 6850 can also include hyperlinks to the electronic documents from which the values identified by value identifiers 6845 are drawn.
Both the relevancy of an electronic document and the value in an electronic document can change over time. For example, the person who adds an electronic document to an electronic document collection can change the content of the electronic document so that the relevancy of that document to an instance and attribute changes. As another example, the person who adds an electronic document to an electronic document collection can change the value that is used to characterize an attribute of an instance. Headers 6815, 6835 can include text or other information identifying the nature of the changes that can occur. For example, in the illustrated implementation, header 6815 includes text identifying that both the most relevant document and the value of an attribute can change when the user specifies that the relevancy of a source electronic document to a specified instance and attribute is to be used in selecting a value that is to populate a structured presentation. As another example, in the illustrated implementation, header 6835 includes text stating that the value of an attribute can change when the user specifies that a source electronic document is consistently to be used in selecting a value that is to populate a structured presentation.
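The difference between selecting a value from the most relevant document and selecting a value consistently from one document can be sketched in Python as shown below. The `Candidate` class, the `resolve` function, the numeric relevance scores, and the document names and values are hypothetical and are used only to show how each policy behaves when relevancy and values change over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A value found in one document, with a hypothetical relevance score."""
    document: str
    value: str
    relevance: float


def resolve(candidates, pinned_document=None):
    """Return the (value, document) pair selected under the chosen policy.

    With no pinned document, the currently most relevant document is used.
    With a pinned document, that document is used consistently, whatever
    its current relevance. Returns None if no suitable candidate exists.
    """
    if pinned_document is None:
        best = max(candidates, key=lambda c: c.relevance, default=None)
        return (best.value, best.document) if best else None
    for candidate in candidates:
        if candidate.document == pinned_document:
            return (candidate.value, candidate.document)
    return None


# Made-up snapshots of the same two documents at two points in time.
earlier = [Candidate("doc_a", "48", 0.9), Candidate("doc_b", "50", 0.7)]
later = [Candidate("doc_a", "48", 0.6), Candidate("doc_b", "52", 0.8)]
```

Under the relevancy policy both the source document and the value can change between the two snapshots, whereas under the consistent policy only the value drawn from the pinned document can change.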
FIG. 69 illustrates a display element 6900 in which a formerly concealed search interface is presented. In some implementations, display element 6900 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 6505. In addition to headers 6802, 6815, 6835, selection widgets 6820, 6840, value identifiers 6825, 6845, and source identifiers 6830, 6850, display element 6900 includes a collection of snippets 6905 and a collection of search interactive elements 6910. Each snippet 6905 is text or other information that describes the context of the respective values identified by value identifiers 6825, 6845 in an electronic document that is a source of the identified value. Search interactive elements 6910 are hyperlinks that allow a user to navigate to the respective electronic document that is the source of the value identified by the respective value identifier 6845.
FIG. 70 illustrates a display element 7000 in which a formerly concealed search interface is presented. In some implementations, display element 7000 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 6505. In addition to headers 6802, 6815, 6835, selection widgets 6820, 6840, value identifiers 6825, 6845, source identifiers 6830, 6850, snippets 6905, and search interactive elements 6910, display element 7000 includes a search trigger 7005. Search trigger 7005 is an interactive element that triggers a search of an electronic document collection. When displayed alongside descriptions of the results of previous searches (such as value identifiers 6825, 6845, source identifiers 6830, 6850, snippets 6905, and search interactive elements 6910), search trigger 7005 can allow a user to indicate dissatisfaction with the results of the previous searches. In some implementations, the search triggered by search trigger 7005 can be a "full search" that is conducted using a general purpose search engine such as the Google™ search engine. In some implementations, the search engine can be presented with a query that is automatically generated using the instance and attribute specified by previous user interaction.
As shown in FIGS. 65-70, the nature of the user interaction that triggers the display of formerly concealed search information and interactive elements can determine the category of the search information and interactive elements that are displayed. For example, user interaction specifying a single cell in a structured presentation can trigger presentation of search information and interactive elements that are relevant to populating that same cell with values. In other implementations, user interaction with a column, a row, or other collection of cells can trigger presentation of search information and interactive elements that are relevant to populating that collection of cells with values. For example, user interaction with a column can allow a user to specify that the values populating that column are to be consistently drawn from a single source document or family of source documents. As another example, user interaction with a row can allow a user to specify that the values populating that row are to be drawn from the source document that is most relevant to an instance and the attributes of that row.
FIG. 71 is a flow chart of a process 7100 for adding values to a structured presentation by drawing the values from the content of documents in an electronic document collection. Process 7100 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 7100 can be performed by the search engine 202 in system 200. Process 7100 can be performed in isolation or in conjunction with other activities. For example, process 7100 can be performed as part of steps 6105, 6115, and 6120 of process 6100 (FIG. 61). The system performing process 7100 can receive data characterizing a user interaction specifying one or more cells of a structured presentation (step 7105). The structured presentation can be a new or a preexisting structured presentation. The interaction with one or more cells can concomitantly specify one or more attributes and instances, as discussed above.
The system performing process 7100 can determine whether or not one or more values populating the one or more cells resulted from a prior search of an electronic document collection (step 7110). Such a determination can be made by accessing a data storage device that stores information characterizing not only the information that is visibly displayed in a structured presentation but also information characterizing any prior search conducted to populate the structured presentation. The stored information characterizing the prior search can include, e.g., an indication that a search was indeed conducted, URLs of source documents in the result set of the prior search, and snippets characterizing the context of the values in the source documents. If the system performing process 7100 determines that a value resulted from a previous search, the system can present search information characterizing the prior search (step 7115). Such information can include, e.g., information identifying a source document in the result set from which a value was drawn, a snippet characterizing the context of the value in a source document, and a hyperlink to the source document. For example, the system can present search information characterizing a single source document in presentations such as display elements 6600, 6700 (FIGS. 66, 67). As another example, the system can present search information regarding multiple source documents, including source documents having values different from those visibly populating a structured presentation, in presentations such as display elements 6800, 6900, 7000 (FIGS. 68, 69, 70).
In some implementations, the system performing process 7100 can transition between presentation of search information regarding a single source document and search information regarding multiple source documents in response to interaction with a user. For example, the system can receive user interaction with a search interactive element such as search interactive element 6505 and transition between display elements 6600, 6700 and display elements 6800, 6900, 7000 (FIGS. 65-70).
In some implementations, the system performing process 7100 can also conduct a new search and provide information characterizing one or more electronic documents in the result set yielded by the new search (step 7120). The characterizing information can include, e.g., names and URLs of the electronic documents, snippets of the electronic documents, summaries of the electronic documents, or the like. The result set can characterize a single source document in presentations such as display elements 6600, 6700 (FIGS. 66, 67) or multiple source documents in presentations such as display elements 6800, 6900, 7000 (FIGS. 68, 69, 70). In some implementations, the system can transition between presentation of search information regarding a single source document and search information regarding multiple source documents in response to interaction with a user. For example, the system can receive user interaction with a search interactive element such as search interactive element 6505 and transition between display elements 6600, 6700 and display elements 6800, 6900, 7000 (FIGS. 65-70).
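By way of illustration only, steps 7105 through 7120 of process 7100 might be sketched as follows. The types SourceInfo and SearchRecord, the function process_7100, the dictionary used as the data storage device, and the example.com URL and sample values are all assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[str, str]  # (instance, attribute), e.g. ("China", "Population")


@dataclass
class SourceInfo:
    """One source document in a result set (hypothetical shape)."""
    url: str
    snippet: str
    value: str


@dataclass
class SearchRecord:
    """Stored information characterizing the prior search for one cell."""
    sources: List[SourceInfo] = field(default_factory=list)


def process_7100(
    cell: Cell,
    store: Dict[Cell, SearchRecord],
    new_search: Optional[Callable[[str, str], List[SourceInfo]]] = None,
) -> Dict[str, List[SourceInfo]]:
    """Gather search information for the cell a user interacted with."""
    # Step 7105: `cell` stands in for the data characterizing the interaction.
    info: Dict[str, List[SourceInfo]] = {"prior": [], "new": []}

    # Step 7110: a stored record means the value came from a prior search.
    record = store.get(cell)
    if record is not None:
        # Step 7115: collect the sources, snippets, and links to present.
        info["prior"] = list(record.sources)

    # Step 7120 (optional): conduct a new search and describe its result set.
    if new_search is not None:
        info["new"] = new_search(cell[0], cell[1])
    return info


# Made-up sample data; the URL, snippet, and value are placeholders.
store = {
    ("China", "Population"): SearchRecord(
        [SourceInfo("https://example.com/china", "sample snippet", "sample value")]
    )
}
info = process_7100(("China", "Population"), store)
print(len(info["prior"]), len(info["new"]))
```

Whether one source or several sources are shown, as in display elements 6600 through 7000, would be a presentation decision made on the returned lists.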
FIG. 72 illustrates a display element 7200 in which a formerly concealed search interface is presented. In some implementations, display element 7200 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 7005. Display element 7200 can receive a value that results from a search, e.g., a search conducted using a general purpose search engine. The value received by display element 7200 can be added into a structured presentation to characterize an attribute of an instance. Display element 7200 includes a header 7205, an instance identifier 7210, an attribute identifier 7215, a value entry element 7220, a value use trigger element 7225, and a presentation close element 7230. Header 7205 is text or other information that describes that display element 7200 can receive a value of an attribute of an instance. Header 7205 can also prompt the user to enter a value resulting from a search. For example, header 7205 can be text asking if a search was successful.
Instance identifier 7210 is text or other information that identifies an instance, or a category of instances, that is to be characterized by the value entered using presentation 7200. In the illustrated implementation, instance identifier 7210 is text identifying the instance "China." Attribute identifier 7215 is text or other information that identifies an attribute of the instance identified by instance identifier 7210. The attribute identified by attribute identifier 7215 can be characterized by the value received by presentation 7200. In the illustrated implementation, attribute identifier 7215 is text identifying the attribute "Population." Together, instance identifier 7210 and attribute identifier 7215 identify that the population of China is to be characterized by a value received using presentation 7200.
Value entry element 7220 is an interactive element that allows a user to specify a value characterizing the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210. Value entry element 7220 can be, e.g., a text entry field.
Value use trigger element 7225 is an interactive element that allows a user to trigger the use of a value entered in value entry element 7220 to characterize the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210 in a structured presentation. Value use trigger element 7225 can be, e.g., a button that includes text identifying that user interaction with value use trigger element 7225 will result in the value entered in value entry element 7220 being used in a structured presentation.
Presentation close element 7230 is an interactive element that allows a user to close display element 7200. In response to user interaction with presentation close element 7230, display element 7200 can be closed regardless of whether the value entered in value entry element 7220 is used, in a structured presentation, to characterize the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210. Presentation close element 7230 can be, e.g., a button that includes text identifying that user interaction with presentation close element 7230 will close display element 7200.
FIG. 73 illustrates a display element 7300 in which a formerly concealed search interface is presented. In some implementations, display element 7300 can be presented in response to user interaction with the structured presentation itself or in response to user interaction with a formerly concealed search interactive element 7005. Display element 7300 can receive a value of an attribute of an instance to be added into a structured presentation. In addition to header 7205, instance identifier 7210, attribute identifier 7215, value entry element 7220, value use trigger element 7225, and presentation close element 7230, display element 7300 includes a source entry element 7305 and a source entry element identifier 7310. Source entry element 7305 is an interactive element that allows a user to specify a source of a value characterizing the attribute identified by attribute identifier 7215 of the instance identified by instance identifier 7210. Source entry element 7305 can be, e.g., a text entry field. Source entry element identifier 7310 is text or other information that describes that source entry element 7305 can be used to specify a source of the value.
In some implementations, display elements 7200, 7300 can be displayed for a user on a display screen after an unsuccessful search. For example, display elements 7200, 7300 can be displayed in response to receipt of an indication from a user that the user is dissatisfied with the results of a previous search. For example, the display of display elements 7200, 7300 can be triggered by user interaction with search trigger 7005 (FIG. 70). As another example, display elements 7200, 7300 can be displayed after an automatic search for values of an attribute of an instance has provided unsatisfactory results.
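By way of illustration only, the handling of a value received through display elements 7200, 7300 might be sketched as follows. The type CellEntry, the function use_value, the dictionary standing in for the structured presentation, and the rule that an empty entry is ignored are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Cell = Tuple[str, str]  # (instance, attribute)


@dataclass
class CellEntry:
    """A user-supplied value and, optionally, the source the user named."""
    value: str
    source: Optional[str] = None


def use_value(
    table: Dict[Cell, CellEntry],
    instance: str,
    attribute: str,
    value: str,
    source: Optional[str] = None,
) -> bool:
    """Model interaction with value use trigger element 7225."""
    value = value.strip()
    if not value:
        # Assumption: an empty value entry element leaves the table unchanged.
        return False
    # A blank source entry element (or its absence, as in FIG. 72) is
    # stored as None.
    table[(instance, attribute)] = CellEntry(value, source or None)
    return True


table: Dict[Cell, CellEntry] = {}
# Placeholder value and source; neither comes from the specification.
accepted = use_value(table, "China", "Population", " sample value ",
                     "https://example.com/source")
print(accepted, table[("China", "Population")].value)
```

Closing the display element through presentation close element 7230 would correspond to never calling use_value at all.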
There are many reasons why a search for values can provide unsatisfactory results. For example, an attribute, an instance, or both may be improperly specified, e.g., due to a misspelling or other error. As another example, an attribute or an instance can be specified without error but relative to an unknown or indefinite value. For example, the instance "suitable for Jim and Diane" is specified relative to indefinite values, namely, the identity of Jim and Diane, as well as the nature of what is "suitable" for them. As another example, the instance "my car" is specified relative to an indefinite value, namely, the identity of the person whose car is to be characterized.
A search for values can also provide unsatisfactory results because an electronic document that resulted from a prior search is inoperative to provide a value for the structured presentation. For example, a source document from which a value is to be drawn can become unavailable. A source document can become unavailable, e.g., when the party who had added the source document withdraws it from an electronic document collection. As yet another example, such a source document can remain available but the value itself can become unavailable in the source document. A value can become unavailable, e.g., when the party who added a source document to an electronic document collection changes the content of the source document.
FIG. 74 illustrates a display element 7400 in which a formerly concealed search interface is presented. Display element 7400 can be presented in response to user interaction or automatically in response to a triggering event. For example, display element 7400 can be presented automatically in response to a prior search becoming inoperative.
Display element 7400 includes a search interactive element 6505, a source identifier 6605, and an error message 7405 in a window 6510. Search interactive element 6505 is a hyperlink that includes anchor text indicating that "more options..." are available for searching for values to populate cell 6220. Source identifier 6605 is a collection of text that identifies an electronic document that is to be a source of value 307 populating cell 6220.
Error message 7405 can include text or other information indicating that the results of a prior search have been rendered inoperative. For example, error message 7405 can indicate that value 307 has become unavailable in the source document identified by source identifier 6605. Error message 7405 can include information describing the nature of the inoperativeness or simply indicating that an error has occurred. For example, in the illustrated implementation, error message 7405 indicates that the value is no longer available within an electronic document that itself remains available.
FIG. 75 is a flow chart of a process 7500 for adding values to a structured presentation based on the content of documents in an electronic document collection. Process 7500 can be performed by a system of one or more computers that perform operations by executing one or more sets of machine-readable instructions. For example, process 7500 can be performed by the search engine 202 in system 200. Process 7500 can be performed in isolation or in conjunction with other activities. For example, process 7500 can be performed as part of process 6100 (FIG. 61). The system performing process 7500 can receive an update trigger triggering an update of the one or more values of one or more cells of a preexisting structured presentation (step 7505). The update trigger can be, e.g., generated automatically in response to the passage of a period of time since a previous update, manually in response to user interaction, or the like. For example, user interaction with a cell 6220 (FIGS. 62, 63, 64) can trigger the update of that cell, as discussed above. The update trigger can trigger the update of the value of a single cell, the values of a collection of cells, or the values of all the cells in a structured presentation. The update trigger can concomitantly specify one or more attributes of one or more instances, as discussed above.
The system performing process 7500 can determine whether or not one or more prior searches for populating the structured presentation with values have become inoperative (step 7510). Such a determination can be made by seeking to access documents from which the values populating the structured presentation are to be drawn.
If the system performing process 7500 determines that a prior search has not become inoperative, the system can update a structured presentation with the content of one or more source documents identified in the prior search (step 7515). A new value used to update the structured presentation need not be identical to a value previously used to populate the structured presentation. Rather, the updated structured presentation can include a value provided by the source electronic document with its current content.
If the system performing process 7500 determines that a prior search has become inoperative, the system can inform the user of the inoperability of the prior search (step 7520). For example, a display element such as display element 7400 can be used to inform the user of the inoperability and provide the user with the opportunity to conduct a new search to populate the structured presentation with values. In some implementations, the system can also conduct a new search and provide information characterizing one or more electronic documents in the result set yielded by the new search, as described in reference to step 7120 of process 7100 (FIG. 71).
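By way of illustration only, steps 7505 through 7520 of process 7500 might be sketched as follows. The function process_7500, the fetch_value callback, the convention that an unavailable document raises LookupError while a missing value returns None, and the sample cells and example.com URLs are all assumptions made for this sketch.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Cell = Tuple[str, str]  # (instance, attribute)


def process_7500(
    cells: Iterable[Cell],
    sources: Dict[Cell, str],
    fetch_value: Callable[[str, Cell], Optional[str]],
    table: Dict[Cell, str],
) -> Dict[Cell, str]:
    """Update `table` in place; return an error message per inoperative cell."""
    errors: Dict[Cell, str] = {}
    for cell in cells:  # step 7505: the cells named by the update trigger
        url = sources.get(cell)
        if url is None:
            continue  # no prior search is recorded for this cell
        try:
            # Step 7510: seek to access the source document of the prior search.
            value = fetch_value(url, cell)
        except LookupError:
            errors[cell] = "source document is no longer available"
            continue
        if value is None:
            errors[cell] = "value is no longer available in the source document"
            continue
        # Step 7515: the new value need not equal the previous one.
        table[cell] = value
    # Step 7520: the caller can show these messages, e.g. as error message 7405.
    return errors


def fake_fetch(url: str, cell: Cell) -> Optional[str]:
    """Stand-in for document access, backed by made-up pages."""
    pages = {"https://example.com/a": {("A", "size"): "2"},
             "https://example.com/b": {}}
    if url not in pages:
        raise LookupError(url)
    return pages[url].get(cell)


table = {("A", "size"): "1", ("B", "size"): "5", ("C", "size"): "7"}
sources = {("A", "size"): "https://example.com/a",
           ("B", "size"): "https://example.com/b",
           ("C", "size"): "https://example.com/gone"}
errors = process_7500(list(table), sources, fake_fetch, table)
print(table[("A", "size")], sorted(errors))
```

Cells whose prior search has become inoperative keep their old value in this sketch; an implementation could instead clear them or conduct a new search as in step 7120.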
Thus, the teachings and subject matter of this specification can be realized in numerous inventive embodiments, including, for example, the following enumerated embodiments:
Embodiment 1: A machine-implemented method comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation; adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on a display device.
Embodiment 2: The method of embodiment 1, wherein adding the identifier of the new instance comprises: formulating a collection of instance suggestions; providing the instance suggestion collection to a user; and receiving a user selection of the new instance, wherein the new instance is in the collection of instance suggestions.
Embodiment 3: The method of embodiment 2, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
Embodiment 4: The method of embodiment 2, wherein formulating the collection of instance suggestions comprises: identifying a first document in the electronic document collection that includes an identifier of an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; identifying a second document that is arranged in accordance with the template but relevant to a second instance; and including the second instance in the instance suggestion collection.
Embodiment 5: The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation.
Embodiment 6: The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises locating the new instance in a stored collection of associations of instances with attributes.
Embodiment 7: The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the characteristics of the preexisting structured presentation with the attributes characterized in the preexisting structured presentation.
Embodiment 8: The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents.
Embodiment 9: The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the value of attributes used to characterize instances in the preexisting structured presentation with the content of the electronic documents.
Embodiment 10: The method of embodiment 1, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents.
Embodiment 11: The method of embodiment 1, wherein: the collection of electronic documents comprises the electronic documents available on the Internet; and the electronic documents comprise web pages.
Embodiment 12: The method of embodiment 1, wherein the expanded structured presentation comprises a table.
Embodiment 13: The method of embodiment 1, wherein the expanded structured presentation comprises a collection of cards.
Embodiment 14: The method of embodiment 1, further comprising visually presenting the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Embodiment 15: An apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations, the operations comprising: formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the instance suggestion collection to a user; receiving a user selection of a first instance in the collection of instance suggestions; and adding an identifier of the first instance suggestion to a structured presentation presented on a display device, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation.
Embodiment 16: The apparatus of embodiment 15, wherein formulating the collection of instance suggestions comprises comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection.
Embodiment 17: The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
Embodiment 18: The apparatus of embodiment 16, wherein formulating the instance suggestion collection comprises: identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; identifying a second document that is arranged in accordance with the template but relevant to a second instance; and including the second instance in the instance suggestion collection.
Embodiment 19: The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises identifying documents in the electronic document collection that include identifiers of one or more instances in the preexisting structured presentation.
Embodiment 20: The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises identifying additional attributes used to characterize instances in the preexisting structured presentation.
Embodiment 21: The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises comparing values of attributes used to characterize instances in the preexisting structured presentation with values of the instance suggestions.
Embodiment 22: The apparatus of embodiment 16, wherein formulating the collection of instance suggestions comprises: identifying a category of instances that includes instances in the preexisting structured presentation; and formulating the collection of instance suggestions using instances in the category of instances.
Embodiment 23: The apparatus of embodiment 15, wherein formulating the collection of instance suggestions comprises identifying the instance suggestions in a stored collection of associations of instances with attributes.
Embodiment 24: The apparatus of embodiment 15, wherein formulating the collection of instance suggestions comprises comparing the attributes characterized in the preexisting structured presentation with the content of the documents in the unstructured electronic document collection.
Embodiment 25: The apparatus of embodiment 15, wherein: the collection of electronic documents comprises the documents available on the Internet; and the electronic documents comprise web pages.
Embodiment 26: The apparatus of embodiment 15, wherein the structured presentation comprises a table.
Embodiment 27: The apparatus of embodiment 15, wherein the structured presentation comprises a collection of cards.
Embodiment 28: A system comprising: a client device; and one or more computers programmed to interact with the client device and to perform operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new instance that is relevant to the preexisting structured presentation; adding an identifier of the new instance to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on a display device coupled in data communication with the client device.
Embodiment 29: A system comprising: a client device; and one or more computers programmed to interact with the client device and to perform operations comprising: formulating a collection of instance suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the instance suggestion collection to a user using the client device; receiving a user selection of a first instance in the collection of instance suggestions; and adding an identifier of the first instance suggestion to a structured presentation presented on a display device coupled in data communication with the client device, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the visual presentation of the structured presentation.
Embodiment 30: The system of embodiment 29, wherein: the one or more computers comprise a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client; the client device comprises a personal computer running a web browser; and the personal computer comprises the display device.
Embodiment 31: A machine-implemented method comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation; adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on a display screen.
Embodiment 32: The method of embodiment 31, wherein adding the identifier of the new attribute comprises: formulating a collection of attribute suggestions; providing the attribute suggestion collection to a user; and receiving a user selection of the new attribute, wherein the new attribute is in the collection of attribute suggestions.
Embodiment 33: The method of embodiment 32, wherein formulating the attribute suggestion collection comprises: identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; and adding an attribute used in the first document to characterize the instance in the attribute suggestion collection.
Embodiment 34: The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include structured components related to instances identified in the preexisting structured presentation.
Embodiment 35: The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation.
Embodiment 36: The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying the new attribute in a stored collection of associations of instances with attributes.
Embodiment 37: The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing the instances characterized in the preexisting structured presentation with the content of the electronic documents.
Embodiment 38: The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises identifying additional instances related to the instances identified in the preexisting structured presentation.
Embodiment 39: The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents.
Embodiment 40: The method of embodiment 31, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing a category of instances that includes instances in the preexisting structured presentation with the content of the electronic documents.
Embodiment 41: The method of embodiment 31, wherein: the collection of electronic documents comprises the electronic documents available on the Internet; and the electronic documents comprise web pages.
Embodiment 42: The method of embodiment 31, wherein the expanded structured presentation comprises a table.
Embodiment 43: The method of embodiment 31, wherein the expanded structured presentation comprises a collection of cards.
Embodiment 44: The method of embodiment 31, further comprising visually presenting the expanded structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Embodiment 45: An apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations, the operations comprising: formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the attribute suggestion collection to a user; receiving a user selection of a first attribute in the collection of attribute suggestions; and adding an identifier of the first attribute suggestion to a structured presentation presented on a display screen, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation.
Embodiment 46: The apparatus of embodiment 45, wherein formulating the collection of attribute suggestions comprises comparing characteristics of a preexisting structured presentation with content of electronic documents in the electronic document collection.
Embodiment 47: The apparatus of embodiment 46, wherein formulating the collection of attribute suggestions comprises identifying documents in the electronic document collection that include structured components that characterize instances identified in the preexisting structured presentation.
Embodiment 48: The apparatus of embodiment 46, wherein formulating the attribute suggestion collection comprises: identifying a first document in the electronic document collection that is relevant to an instance identified in the preexisting structured presentation and that is arranged in accordance with a template; and including an attribute used to characterize the instance in the attribute suggestion collection.
Embodiment 49: The apparatus of embodiment 46, wherein formulating the collection of attribute suggestions comprises identifying documents in the electronic document collection that include information regarding one or more instances in the preexisting structured presentation.
Embodiment 50: The apparatus of embodiment 46, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing instances identified in the preexisting structured presentation with the content of the electronic documents.
Embodiment 51: The apparatus of embodiment 46, wherein comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises comparing an attribute or a value of an attribute used to characterize an instance in the preexisting structured presentation with the content of the electronic documents.
Embodiment 52: The apparatus of embodiment 46, wherein formulating the collection of attribute suggestions comprises: identifying a category of instances that includes instances in the preexisting structured presentation; and formulating the collection of attribute suggestions from attributes used to characterize instances in the category of instances.
Embodiment 53: The apparatus of embodiment 45, wherein formulating the collection of attribute suggestions comprises identifying the attribute suggestions in a stored collection of associations of instances with attributes.
Embodiment 54: The apparatus of embodiment 45, wherein: the collection of electronic documents comprises electronic documents available on the Internet; and the electronic documents comprise web pages.
Embodiment 55: The apparatus of embodiment 45, wherein the structured presentation comprises a table.
Embodiment 56: The apparatus of embodiment 45, wherein the structured presentation comprises a collection of cards.
Embodiment 57: A system comprising: a client device comprising a display screen; and one or more computers programmed to interact with the client device and to perform operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new attribute that is relevant to the preexisting structured presentation; adding an identifier of the new attribute to the preexisting structured presentation to form an expanded structured presentation; and outputting instructions for presenting the expanded structured presentation on the display screen.
Embodiment 58: A system comprising: a client device comprising a display screen; and one or more computers programmed to interact with the client device and to perform operations comprising: formulating a collection of attribute suggestions based on content of two or more documents in an unstructured electronic document collection, wherein the electronic document collection is unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent; providing the attribute suggestion collection to the client device; receiving a selection of a first attribute in the collection of attribute suggestions from the client device; and adding an identifier of the first attribute suggestion to a structured presentation presented on the display screen, wherein a visual presentation of the structured presentation visually presents information in an organized arrangement, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in the presentation of the structured presentation.
Embodiment 59: A machine-implemented method comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation; adding the new value to the preexisting structured presentation to form a new structured presentation; and outputting instructions for visually presenting the new structured presentation.
Embodiment 60: The method of embodiment 59, wherein: comparing the characteristics of the preexisting structured presentation with the content of the electronic documents comprises locating an identifier of a first instance that appears in the structured presentation in a first electronic document; and the method further comprises extracting the new value from the first electronic document.
Embodiment 61: The method of embodiment 59, wherein adding the new value comprises: identifying a collection of values of a first attribute of a first instance; and establishing a subset of one or more of the identified values as suitably characterizing the first attribute of the first instance.
Embodiment 62: The method of embodiment 61, wherein establishing the subset of values as suitable comprises grouping the values in the collection into groups.
Embodiment 63: The method of embodiment 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a count of values in the subset.
Embodiment 64: The method of embodiment 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on values in the subset meeting a user-specified constraint.
Embodiment 65: The method of embodiment 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a value in the subset being drawn from a high-quality document.
Embodiment 66: The method of embodiment 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another instance in the preexisting structured presentation.
Embodiment 67: The method of embodiment 61, wherein establishing the subset of values as suitable comprises selecting the subset based at least in part on a value in the subset being drawn from a document relevant to another attribute in the preexisting structured presentation.
Embodiment 68: The method of embodiment 59, wherein: the collection of electronic documents comprises the electronic documents available on the Internet; and the electronic documents comprise web pages.
Embodiment 69: The method of embodiment 59, wherein the preexisting structured presentation comprises a table.
Embodiment 70: The method of embodiment 59, wherein the preexisting structured presentation comprises a collection of cards.
Embodiment 71: The method of embodiment 59, further comprising visually presenting the new structured presentation on a display screen, including physically transforming one or more elements of the display screen.
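By way of illustration only, establishing a subset of values as suitably characterizing an attribute (Embodiments 61 to 65) may be sketched as follows. The sketch assumes that candidate values have already been extracted, groups them under a normalized key, optionally applies a user-specified constraint, and prefers the largest group, breaking ties by document quality. The names `Candidate`, `normalize`, and `select_value`, the numeric `quality` score, and the normalization rule are hypothetical and are not drawn from this specification.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One value extracted for an (instance, attribute) pair (hypothetical type)."""
    value: str
    source: str     # identifier of the document the value was drawn from
    quality: float  # assumed document-quality score; higher is better


def normalize(value: str) -> str:
    """Assumed grouping key: ignore case and redundant whitespace."""
    return " ".join(value.lower().split())


def select_value(
    candidates: Iterable[Candidate],
    constraint: Optional[Callable[[str], bool]] = None,
) -> Optional[str]:
    """Return one value from the most suitable group, or None if none qualifies."""
    groups: Dict[str, List[Candidate]] = defaultdict(list)
    for cand in candidates:
        # Drop values that fail the user-specified constraint, if one is given.
        if constraint is not None and not constraint(cand.value):
            continue
        # Group values that agree after normalization.
        groups[normalize(cand.value)].append(cand)
    if not groups:
        return None
    # Prefer the group with the most values; break ties by best document quality.
    best = max(groups.values(), key=lambda g: (len(g), max(c.quality for c in g)))
    # Within the chosen group, report the value from the highest-quality document.
    return max(best, key=lambda c: c.quality).value
```

For example, if two documents report "1.88 m" and "1.88 M" while a single higher-quality document reports "1.91 m", the sketch selects the value from the two-member group; a constraint that admits only values beginning with "1.9" would select "1.91 m" instead. Other rankings, such as weighting quality above count, are equally possible.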
Embodiment 72: An apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations, the operations comprising: receiving description data describing a first instance, a second instance, and a first attribute; extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection; extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection; establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance; establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance; and generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset, wherein the structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values.
Embodiment 73: The apparatus of embodiment 72, wherein establishing the first subset of values as suitable comprises grouping the values in the first collection into groups, wherein each group includes a subset of the first collection of values.
Embodiment 74: The apparatus of embodiment 72, wherein establishing the first subset of values as suitable comprises selecting the first subset based at least in part on a count of values in the first subset.
Embodiment 75: The apparatus of embodiment 72, wherein establishing the first subset of values as suitable comprises comparing the values in the first subset with a user-specified constraint on the values.
Embodiment 76: The apparatus of embodiment 72, wherein establishing the first subset of values as suitable comprises determining that a value in the first subset is drawn from a high-quality document.
Embodiment 77: The apparatus of embodiment 72, wherein establishing the first subset of values as suitable comprises determining that a value in the first subset is drawn from a document relevant to the second instance.
Embodiment 78: The apparatus of embodiment 72, wherein establishing the first subset of values as suitable comprises determining that a value in the first subset is drawn from a document relevant to another attribute that characterizes both the first instance and the second instance.
Embodiment 79: The apparatus of embodiment 72, wherein: the description of the first instance comprises an identifier of the first instance that appears in a preexisting structured presentation; and the description of the second instance comprises an identifier of the second instance that appears in the preexisting structured presentation.
Embodiment 80: The apparatus of embodiment 72, wherein the description of the first attribute comprises a description of a new attribute that is to be added to a preexisting structured presentation.
Embodiment 81: The apparatus of embodiment 72, wherein the unstructured electronic document collection comprises electronic documents available on the Internet.
Embodiment 82: The apparatus of embodiment 72, wherein the structured presentation comprises a table.
Embodiment 83: The apparatus of embodiment 72, wherein the structured presentation comprises a collection of cards.
Embodiment 84: The apparatus of embodiment 72, further comprising visually presenting the structured presentation on a display screen, including physically transforming one or more elements of the display screen.
Embodiment 85: A system comprising: a device; and one or more computers programmed to interact with the device and to perform operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; comparing characteristics of the preexisting structured presentation with content of electronic documents in an unstructured collection of electronic documents to locate electronic documents that identify a new value that is relevant to the preexisting structured presentation; adding the new value to the preexisting structured presentation to form a new structured presentation; and outputting instructions for visually presenting the new structured presentation to the device.
Embodiment 86: A system comprising: a device; and one or more computers programmed to interact with the device and to perform operations comprising: receiving description data describing a first instance, a second instance, and a first attribute; extracting a first collection of values of the first attribute of the first instance from two or more documents of an unstructured electronic document collection; extracting a second collection of values of the first attribute of the second instance from two or more documents of the unstructured electronic document collection; establishing a first subset of the first collection of values as suitably characterizing the first attribute of the first instance; establishing a second subset of the second collection of values as suitably characterizing the first attribute of the second instance; generating machine-readable instructions for displaying a structured presentation including a first value of the first subset and a second value of the second subset, wherein the structured presentation denotes associations between instances and values that characterize attributes of the instances by virtue of an arrangement of an identifier of the instance and the values; and sending the machine-readable instructions to the device.
Embodiment 87: A machine-implemented method comprising: displaying a structured presentation on a display device, the structured presentation visually presenting information in a systematic and structured arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; receiving data characterizing a user interaction with the displayed structured presentation, the data including a specification of a first instance and a first attribute of the structured presentation; and displaying a formerly concealed search interface on the display device in response to receiving the data, the search interface including information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
Embodiment 88: The method of embodiment 87, wherein receiving the data characterizing the user interaction with the displayed structured presentation comprises receiving a manual user specification of the first instance and the first attribute that are associated with a cell in the structured presentation.
Embodiment 89: The method of embodiment 87, wherein receiving data characterizing the user interaction comprises receiving data characterizing the user interaction with a cell in the structured presentation, the cell being associated with the first instance and the first attribute by virtue of the arrangement of the cell relative to identifiers of the first instance and the first attribute in the structured presentation.
Embodiment 90: The method of embodiment 89, wherein receiving data characterizing the user interaction with the cell comprises receiving data characterizing the user interaction with an empty cell.
Embodiment 91: The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying an interactive element that can be selected by a user to trigger a search of the electronic document collection to locate the first value.
Embodiment 92: The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying an interactive value entry element that can be selected by a user to specify a value characterizing the first attribute of the first instance.
Embodiment 93: The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying a snippet characterizing a context of the first value in a first document of the electronic document collection.
Embodiment 94: The method of embodiment 87, wherein displaying the formerly concealed search interface comprises displaying a result of a prior search of the electronic document collection to locate the first value.
Embodiment 95: The method of embodiment 87, wherein: the first value appears in the structured presentation as a value characterizing the first attribute of the first instance; and displaying the formerly concealed search interface comprises displaying an identifier of a first electronic document in the electronic document collection, wherein the first value is drawn from the first electronic document.
Embodiment 96: The method of embodiment 95, further comprising: determining that the first electronic document is inoperative to provide the first value; and displaying a visual indication of the inoperativeness of the first document.
Embodiment 97: The method of embodiment 87, wherein displaying the formerly concealed search interface comprises presenting the user with an option to select the first value consistently from a first document regardless of changes in relevancy of the first document to the first instance and the first attribute.
Embodiment 98: The method of embodiment 87, wherein displaying the formerly concealed search interface comprises presenting the user with an option to select the first value from a first document that is most relevant to the first instance and the first attribute.
Embodiment 99: The method of embodiment 87, further comprising: searching an unstructured collection of electronic documents to locate the first value in response to a user interaction with the search interface; and adding the first value to the structured presentation.
Embodiment 100: The method of embodiment 87, wherein receiving the specification of the first instance and the first attribute comprises receiving a specification of a collection of attributes or a collection of instances.
Embodiment 101: The method of embodiment 87, further comprising updating the display of the structured presentation in response to a passage of time.
Embodiment 102: A system comprising: one or more computers programmed to interact with client devices and to perform operations comprising: receiving data characterizing user interaction specifying a first cell of a structured presentation displayed on a display device, the structured presentation visually presenting information in a systematic and structured arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of the values in cells; determining that a prior search has been conducted to populate the first cell with a first value; and in response to determining that a prior search was conducted, displaying information characterizing the prior search on the display device.
Embodiment 103: The system of embodiment 102, wherein receiving the data characterizing user interaction specifying the first cell comprises receiving data characterizing a manual user specification of the first instance and the first attribute that are associated with the first cell.
Embodiment 104: The system of embodiment 102, wherein displaying information characterizing the prior search comprises displaying information identifying an electronic document from which the first value is drawn.
Embodiment 105: The system of embodiment 102, wherein displaying information characterizing the prior search comprises displaying information identifying a collection of electronic documents from which the first value could have been drawn.
Embodiment 106: The system of embodiment 102, wherein displaying the information characterizing the prior search comprises displaying the information in a display element of a formerly concealed search interface.
Embodiment 107: The system of embodiment 102, wherein displaying information characterizing the prior search comprises displaying information identifying a first electronic document in the electronic document collection from which the first value is drawn.
Embodiment 108: The system of embodiment 107, wherein the operations further comprise: determining that the first electronic document is inoperable to provide the first value; and displaying a visual indication of the inoperability of the first document.
Embodiment 109: The system of embodiment 102, wherein the operations further comprise updating a display of a value in the first cell of the structured presentation in response to the user interaction.
Embodiment 110: The system of embodiment 102, wherein displaying the information characterizing the prior search comprises displaying a snippet characterizing a context of the first value in a first document of the electronic document collection.
Embodiment 111: The system of embodiment 110, wherein: the collection of electronic documents comprises electronic documents available on the Internet; and the electronic documents comprise web pages.
Embodiment 112: The system of embodiment 102, wherein the structured presentation comprises a collection of cards.
Embodiment 113: A system comprising: one or more computers programmed to interact with a client device comprising a display device and to perform operations comprising: displaying a structured presentation on the display device, the structured presentation visually presenting information in a systematic and structured arrangement that conforms with a structured design, the structured presentation denoting associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; receiving data characterizing a user interaction with the displayed structured presentation, the data including a specification of a first instance and a first attribute of the structured presentation; and displaying a formerly concealed search interface on the display device in response to receiving the data, the search interface including information or an interactive element identifying location of a first value characterizing the first attribute of the first instance in an electronic document collection.
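As a non-limiting sketch of the cell interaction recited in Embodiments 87 to 90 and 102 to 104, the following assumes a table whose rows identify instances and whose columns identify attributes. A selected cell is resolved to its instance and attribute by position, and the contents of the formerly concealed search interface depend on whether a prior search populated that cell. The names `Table`, `PriorSearch`, and `search_panel`, and the dictionary layout of the result, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PriorSearch:
    """Record of a search that populated a cell (hypothetical type)."""
    value: str
    source: str   # identifier of the document the value was drawn from
    snippet: str  # text giving the context of the value in that document


@dataclass
class Table:
    instances: List[str]   # row identifiers
    attributes: List[str]  # column identifiers
    provenance: Dict[Tuple[str, str], PriorSearch] = field(default_factory=dict)

    def resolve_cell(self, row: int, col: int) -> Tuple[str, str]:
        """Map a cell position to the (instance, attribute) pair it denotes."""
        return self.instances[row], self.attributes[col]

    def search_panel(self, row: int, col: int) -> Dict[str, str]:
        """Describe what the search interface would show for the selected cell."""
        instance, attribute = self.resolve_cell(row, col)
        panel = {"instance": instance, "attribute": attribute}
        prior = self.provenance.get((instance, attribute))
        if prior is None:
            # No prior search (e.g., an empty cell): offer a search trigger.
            panel["action"] = "search"
        else:
            # A prior search populated the cell: show where the value came from.
            panel.update(
                action="show_prior_search",
                value=prior.value,
                source=prior.source,
                snippet=prior.snippet,
            )
        return panel
```

In this sketch, selecting an empty cell yields a panel whose action is "search", while selecting a cell recorded in `provenance` yields the stored value together with its source document and snippet.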
These and other embodiments of the subject matter and the functional operations described in this specification may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on a propagated signal that is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows may also be performed by, and apparatus may also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification may be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter described in this specification have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims may be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results.
WHAT IS CLAIMED IS:

Claims

1. A machine-implemented method comprising: receiving a machine-readable search query from a user; and responding to the search query with instructions for presenting the user with a structured presentation of instances relevant to the search query, wherein a visual presentation of the structured presentation denotes associations between the instances and values that characterize attributes of the instances by virtue of an arrangement of identifiers of the instances and the values, wherein the identifiers of the instances and the values are drawn from two or more documents in an unstructured collection of electronic documents, the electronic document collection being unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
2. The method of claim 1, wherein responding to the search query comprises: identifying a first collection of electronic documents in the unstructured collection that relate to the instances; extracting values of the attributes of the instances from the first collection of electronic documents; and populating the structured presentation with the values extracted from two or more electronic documents.
3. The method of claim 1, wherein responding to the search query comprises: extracting a first value of a first attribute of a first instance from a first electronic document; extracting a second value of a second attribute of the first instance from a second electronic document; and associating the first value and the second value with the first instance in a single record in the structured presentation, wherein the first attribute differs from the second attribute and the first electronic document differs from the second electronic document.
4. The method of claim 1, wherein responding to the search query comprises: extracting a first value of an attribute of a first instance from a first electronic document; extracting a second value of an attribute of a second instance from the first electronic document; associating the first value with the first instance in a first record; and associating the second value with the second instance in a second record, wherein the first instance differs from the second instance.
5. The method of claim 1, wherein the structured presentation comprises a table.
6. The method of claim 1, wherein the structured presentation comprises a collection of cards.
7. The method of claim 1, further comprising: receiving a trigger for the addition of a new instance to the structured presentation; and suggesting new instances for addition to the structured presentation in response to the trigger.
8. The method of claim 7, wherein: the method further comprises receiving a specification of a constraint from a user; and suggesting new instances comprises suggesting new instances that satisfy the user-specified constraint.
9. The method of claim 1, further comprising: receiving a trigger for the addition of a new attribute to the structured presentation; and adding a new attribute to the structured presentation in response to the trigger.
10. The method of claim 1, further comprising: receiving a user specification of a trait of the new attribute; and populating the structured presentation with values of the attribute based on the user-specified trait.
11. The method of claim 1, wherein the unstructured electronic document collection comprises electronic documents available on the Internet.
12. The method of claim 1, further comprising visually presenting the structured presentation on a display screen, including physically transforming one or more elements of the display screen.
13. An apparatus comprising one or more machine-readable data storage media storing instructions operable to cause one or more data processing machines to perform operations, the operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation including a collection of records, each of which denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; drawing an identifier of a first instance from a first web site; drawing a first value of a first attribute of the first instance from a second web site; adding the identifier of a first instance and the new value to the preexisting structured presentation to form a new record in a new structured presentation; and outputting instructions for visually presenting the new structured presentation.
14. The apparatus of claim 13, wherein drawing the identifier of the first instance from the first web site comprises comparing characteristics of the preexisting structured presentation with content of the preexisting structured presentation.
15. The apparatus of claim 13, wherein: the operations further comprise receiving an identifier of a second instance from the user; and the new structured presentation includes a second new record that presents the second instance in association with a second value of the first attribute of the second instance.
16. The apparatus of claim 15, wherein the operations further comprise receiving the second value from the user.
17. The apparatus of claim 15, wherein the operations further comprise: presenting a collection of candidate values to the user, wherein the collection includes the second value; and receiving a selection of the second value from the user.
18. The apparatus of claim 15, wherein the operations further comprise: identifying a collection of candidate values of the first attribute of the second instance; and determining, for each of the candidate values, a confidence that the candidate value is correct.
19. The apparatus of claim 13, wherein the operations further comprise suggesting a collection of new instances to be added to the structured presentation.
20. The apparatus of claim 19, wherein suggesting the collection of new instances comprises comparing characteristics of the preexisting structured presentation with content of the first web site and the second web site.
21. The apparatus of claim 19, wherein suggesting the collection of new instances comprises comparing a machine-readable search query with content of the first web site and the second web site.
22. The apparatus of claim 13, wherein drawing the first value from the second web site comprises identifying that the second web site includes a review.
23. The apparatus of claim 13, wherein drawing the identifier from the first web site comprises extracting the identifier directly from the first web site.
24. The apparatus of claim 13, wherein drawing the identifier from the first web site comprises extracting the identifier from a machine-readable database that includes information extracted from the first web site.
25. The apparatus of claim 13, wherein: the preexisting structured presentation comprises a table; and the records comprise rows or columns of the table.
26. The apparatus of claim 13, wherein: the preexisting structured presentation comprises a collection of cards; and the records comprise individual cards in the collection.
27. The apparatus of claim 13, wherein the operations further comprise visually presenting the new structured presentation on a display screen, including physically transforming one or more elements of the display screen.
28. A system comprising: a client device; and one or more computers programmed to interact with the client device and to perform operations comprising: receiving description data describing a preexisting structured presentation, a visual presentation of the preexisting structured presentation visually presenting information in a systematic arrangement that conforms with a structured design, the structured presentation including a collection of records, each of which denotes associations between an instance and values that characterize attributes of the instance by virtue of an arrangement of an identifier of the instance and the values in a visual presentation of the structured presentation; drawing an identifier of a first instance from a first web site; drawing a first value of a first attribute of the first instance from a second web site; adding the identifier of a first instance and the new value to the preexisting structured presentation to form a new record in a new structured presentation; and outputting to the client device instructions for visually presenting the new structured presentation.
29. The system of claim 28, wherein the one or more computers comprise a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client.
30. A system comprising: a client device; and one or more computers programmed to interact with the client device and to perform operations comprising: receiving a machine-readable search query from the client device; and responding to the search query by sending to the client device instructions for presenting a structured presentation of instances relevant to the search query, wherein a visual presentation of the structured presentation denotes associations between the instances and values that characterize attributes of the instances by virtue of an arrangement of identifiers of the instances and the values, wherein the identifiers of the instances and the values are drawn from two or more documents in an unstructured collection of electronic documents, the electronic document collection being unstructured in that the format of the electronic documents in the electronic document collection is neither restrictive nor permanent.
31. The system of claim 30, wherein the one or more computers comprise a server operable to interact with the client device through a data communication network, and the client device is operable to interact with the server as a client.
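As an illustration only, the record-building steps recited in claims 2 to 4 and the per-candidate confidence of claim 18 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `build_records` and `to_table`, the document names, the sample values, and the confidence scores are all hypothetical, and choosing the highest-confidence candidate is one simple selection rule picked for the example.

```python
from collections import defaultdict

# Hypothetical extractor output: (document, instance, attribute, value, confidence).
# Document names, values, and confidence scores are made up for illustration.
EXTRACTIONS = [
    ("doc_a.html", "Mount Everest", "height_m", "8849", 0.9),
    ("doc_b.html", "Mount Everest", "first_ascent", "1953", 0.8),
    ("doc_a.html", "K2", "height_m", "8611", 0.9),
    ("doc_c.html", "K2", "height_m", "8610", 0.4),
]


def build_records(extractions):
    """Group extracted values into one record per instance.

    For each (instance, attribute) pair, keep the candidate value with the
    highest confidence and remember which document it came from.
    """
    best = defaultdict(dict)  # instance -> attribute -> (value, confidence, document)
    for document, instance, attribute, value, confidence in extractions:
        current = best[instance].get(attribute)
        if current is None or confidence > current[1]:
            best[instance][attribute] = (value, confidence, document)
    return best


def to_table(records, attributes):
    """Arrange records as rows: instance identifier, then one cell per attribute."""
    rows = []
    for instance in sorted(records):
        row = [instance]
        for attribute in attributes:
            cell = records[instance].get(attribute)
            row.append(cell[0] if cell else "")  # empty cell when no value was found
        rows.append(row)
    return rows


records = build_records(EXTRACTIONS)
table = to_table(records, ["height_m", "first_ascent"])
for row in table:
    print(row)
```

In this sketch a single record for one instance combines values taken from two different documents (as in claim 3), and one document contributes values to records for two different instances (as in claim 4).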
EP10732191.1A 2009-01-16 2010-01-16 Retrieving and displaying information from an unstructured electronic document collection Withdrawn EP2387756A4 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US12/355,459 US8412749B2 (en) 2009-01-16 2009-01-16 Populating a structured presentation with new values
US12/355,607 US8615707B2 (en) 2009-01-16 2009-01-16 Adding new attributes to a structured presentation
US12/355,554 US8452791B2 (en) 2009-01-16 2009-01-16 Adding new instances to a structured presentation
US12/355,228 US20100185651A1 (en) 2009-01-16 2009-01-16 Retrieving and displaying information from an unstructured electronic document collection
US12/355,103 US8977645B2 (en) 2009-01-16 2009-01-16 Accessing a search interface in a structured presentation
PCT/US2010/021290 WO2010083478A2 (en) 2009-01-16 2010-01-16 Retrieving and displaying information from an unstructured electronic document collection

Publications (2)

Publication Number Publication Date
EP2387756A2 EP2387756A2 (en) 2011-11-23
EP2387756A4 EP2387756A4 (en) 2013-06-12

Family

ID=42340312

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10732191.1A Withdrawn EP2387756A4 (en) 2009-01-16 2010-01-16 Retrieving and displaying information from an unstructured electronic document collection

Country Status (3)

Country Link
EP (1) EP2387756A4 (en)
JP (1) JP5581339B2 (en)
WO (1) WO2010083478A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6042974B2 (en) * 2013-04-09 2016-12-14 株式会社日立製作所 Data management apparatus, data management method, and non-temporary recording medium
JP6615456B2 (en) * 2014-07-28 2019-12-04 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Method and apparatus for providing search results

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US7181445B2 (en) * 2003-09-05 2007-02-20 Bellsouth Intellectual Property Corporation Aggregating, retrieving, and providing access to document visuals
US7293017B2 (en) * 2004-07-01 2007-11-06 Microsoft Corporation Presentation-level content filtering for a search result
US8386453B2 (en) * 2004-09-30 2013-02-26 Google Inc. Providing search information relating to a document
NO20054720L (en) * 2005-10-13 2007-04-16 Fast Search & Transfer Asa Information access with user-driven metadata feedback

Non-Patent Citations (3)

Title
EMBLEY D W ET AL: "ONTOLOGY-BASED EXTRACTION AND STRUCTURING OF INFORMATION FROM DATA-RICH UNSTRUCTURED DOCUMENTS", PROCEEDINGS OF THE 1998 ACM CIKM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT. CIKM '98. BETHESDA, MD, NOV. 3 - 7, 1998; [ACM CIKM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT], NEW YORK, NY : ACM, US, vol. CONF. 7, 3 November 1998 (1998-11-03) , pages 52-59, XP000895350, DOI: 10.1145/288627.288641 ISBN: 978-1-58113-061-4 *
See also references of WO2010083478A2 *
Zimzalabim: "Web Search Engine", Wikipedia, the free encyclopedia, 13 January 2009 (2009-01-13), XP055061817, Retrieved from the Internet: URL:http://en.wikipedia.org/w/index.php?title=Web_search_engine&oldid=263824777 [retrieved on 2013-05-03] *

Also Published As

Publication number Publication date
JP5581339B2 (en) 2014-08-27
WO2010083478A3 (en) 2010-10-28
WO2010083478A2 (en) 2010-07-22
EP2387756A4 (en) 2013-06-12
JP2012515407A (en) 2012-07-05

Similar Documents

Publication Publication Date Title
US8924436B1 (en) Populating a structured presentation with new values
US8615707B2 (en) Adding new attributes to a structured presentation
US8452791B2 (en) Adding new instances to a structured presentation
US20100185651A1 (en) Retrieving and displaying information from an unstructured electronic document collection
US8977645B2 (en) Accessing a search interface in a structured presentation
US20230205828A1 (en) Related entities
AU2010284506B2 (en) Semantic trading floor
Aletras et al. Evaluating topic representations for exploring document collections
US9916384B2 (en) Related entities
US20240338378A1 (en) Semantic Search Interface for Data Repositories
KR20120038418A (en) Searching methods and devices
Gretzel et al. Intelligent search support: Building search term associations for tourism-specific search engines
Kules III Supporting exploratory web search with meaningful and stable categorized overviews
JP5581339B2 (en) Retrieve and display information from unstructured electronic document collections
Wong Search Strategies for Online Sources
Schlötterer Supporting the Discovery of Long-tail Resources on the Web
An Ontology learning for the semantic deep web
Alli Result Page Generation for Web Searching: Emerging Research and Opportunities: Emerging Research and Opportunities
WO2024211835A1 (en) Semantic search interface for data repositories
Penev Search in personal spaces
Li Methodologies for infographics retrieval
Sood The role of relevance in frictionless information systems: building systems that delight and inform
Komninos et al. ListCreator: Entity Ranking on the Web
Bernnard Library Catalogs Revisited: An Annotated Bibliography
Markines Socially induced semantic networks and applications

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20110803

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20130514

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 17/30 20060101AFI20130507BHEP

Ipc: G06F 17/21 20060101ALI20130507BHEP

Ipc: G06F 3/14 20060101ALI20130507BHEP

Ipc: G06F 17/40 20060101ALI20130507BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20170118

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230519