WO2001027712A2 - A method and system for automatically structuring content from universal marked-up documents - Google Patents
A method and system for automatically structuring content from universal marked-up documents
- Publication number
- WO2001027712A2 (PCT/IL2000/000648)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- offers
- information
- offer
- database
- pages
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/81—Indexing, e.g. XML tags; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/221—Parsing markup language streams
Definitions
- the present invention relates to a system and method for automatically extracting, processing and structuring dynamic content from universal marked-up documents.
- universal marked-up documents refer to any type of document that has marked up properties, such as HTML, XML, Microsoft Word, PDF, WML, VML or any other current or future mark-up languages or document types.
- Mark-up refers to the sequence of characters or other symbols (called tags) that may be inserted at certain places in a file to delimit and describe document subcomponents as information objects. Tags provide metadata about the document's content that may be used for further processing of the document, such as displaying it or printing it. Mark-up is also used to describe the document's logical structure.
- the use of the word "automatically" implies the ability to structure information from such documents without having had previous exposure to a particular style, type or example of such a document.
- General Internet search engines have historically tried to access and organize the great abundance of information available, and to bring it to Internet users. They work on a very large scale and cover a huge number of web pages. They usually index the pages for all words occurring in the page title, keywords or full-text content. This indexing is usually done on a separate keyword basis, without any effort to understand the context or the grammar of the page. The only connection between different keywords in the same page is made in retrospect, when a user performs a Boolean search operation. There is no content understanding or structuring.
- An alternative solution for automatically aggregating content is content robots (for some content sources) that analyze relevant site information and make it easily accessible. These robots run on sites that generate their pages using automated tools (usually database-driven) that give a uniform structure to their pages (such as www.amazon.com). Due to this uniformity within a site (or parts of it), content robots can be created and programmed (for some of the sites) to analyze relevant site information and structure the information. However, as different sites have different structures, this is a site-specific adjustment task. In this category of solutions fall the various Internet comparative shopping engines, often based on scrapers (software tools that are programmed to extract data from specific page formats).
- the aggregator may receive the information in a variety of ways. For example, the aggregator might write a specific software interface to the content producer's information systems (e.g., to its database). An alternative might be that the aggregator receives periodic feeds in some agreed upon format from the content provider.
- the drawback is that the aggregator needs to make specific arrangements (business and technical) with each individual information source. This requires much time and technical effort for every single source of information.
- VDB virtual database
- a central Web shop signs contracts with some dozens of affiliate retail sites.
- the central shop writes special "software agents" to be able to extract and organize data from the affiliate's computerized catalog.
- the central shop issues a query to some of the distributed agents.
- Each agent searches the local catalog for the product. If the affiliate site has an offer for the product, the agent sends a price quote to the central shop. After a definitive time (usually 30 seconds), the central shop collects all the offers it has received and presents them to the consumer. The consumer can choose to buy according to one of these offers either through the central shop or through the affiliate shop.
- Scalability, which is the ability to cover an immense and constantly growing number of data sources in a way that is fast and cheap. To achieve such scalability the technology must operate without site-specific business or technical agreements or adjustments. iv. The ability to economically maintain and list updated offers from a multitude of data sources.
- There is thus a need for a system or method for enabling automatic aggregation and structuring of information on a global and local scale. This need is all the more acute when facing the emerging world of mobile Internet appliances, as experts predict that wireless consumers will be highly sensitive to data relevancy issues (and much less tolerant than wired Internet users).
- a structured search methodology that fits a wide variety of content and can be used by a variety of devices is therefore a major breakthrough in the consumer content search tools niche.
- the present invention provides such a method and system for enabling automatic structuring of information from universal marked-up documents.
- This invention can monitor, analyze, aggregate, compare and present information from varied, dynamic sources, in an organized, structured form.
- the present invention is driven by the trends discussed above, and it provides high quality aggregated content, i.e., multi-source, relevant, structured, and updated information, regarding topics that are in the scope of interest of users.
- the present system and method (unlike existing systems, methods and technologies) is able to satisfy these important requirements.
- the present invention further enables various applications for structuring information, based on its page processing technology. These include a local marketplace enabler, comparative shopping engine and site, geographic based searching, services comparisons and automatic aggregation of many types of content from Internet, Intranet, Extranet or any other network-based sources.
- the present invention is of a data processing system and method for automatic, knowledge-based processing and extracting of structured information from universal marked-up documents, i.e. documents that contain structural, presentational, and semantic information (e.g. tags) alongside content, such as SGML, XML, HTML, and Microsoft Word documents.
- A- The present invention consists of:
- a back-end system including communications means and a mark-up page processing algorithm
- a database system for storing structured data and processing requests from said back-end system.
- An optional front-end system for enabling user or third party interaction with said database.
- the back-end system automatically processes content from universal marked-up documents, independent of prior knowledge of content structure or type for particular sites.
- the back-end system comprises a page processing algorithm for automatically processing content from universal marked-up documents.
- D- The back-end system processes documents from any network-based source via any computerized communications means. This includes data found in any type of computerized information system, where the system is located on a network.
- E- Marked-up information sources include content existing in data formats such as SGML, HTML, XML, Microsoft Word, PDF, WML, VML, RTF, XHTML and SMIL.
- F- User interaction is executed using interactive devices such as PCs, cellular phones, pagers, handheld PCs, pocket PCs, mobile computers and interactive TVs.
- G- User interaction is executed with user interfaces such as graphic user interfaces, text based interfaces, voice-based interfaces, keyboards & pointing devices and any combination of these.
- H- Information offers processed by the system include product offers, such as consumer goods, auctions, classifieds, bartering, wholesale goods and B2B offers, and service offers, such as professional services, job offers, real estate offers, events, classifieds and job finding tools.
- I- Offers may be presented according to geographic preferences.
- the front-end system may be an e-commerce web site for comparison-shopping for products and services.
- the comparison shopping function is a geographically enabled localized shopping application, such that users can research product offers according to geographic preferences and/or online preferences.
- L- The present invention further comprises a method for automatically structuring network-based content, according to the following steps: i. Finding information pages for information offers; ii. Retrieving relevant content from said information pages; iii. Processing retrieved pages in order to identify information offers; and iv. Aggregating of said information offers in a central database.
- M- The method further comprises interfacing the central database in a front-end system for responding to user queries.
- the front-end system may be one or more Web, application or other servers for running an interactive web site.
- User queries may be geographic location based queries, and may be for the purpose of comparative shopping.
- Shopping may be researched at online or offline stores.
- O- Processing retrieved pages further comprises the execution of a generalized algorithm for web page processing, according to the following steps: i. pre-processing for Web pages, for filtering out all Web pages that are not relevant; ii. Web page processing, for parsing Web pages to build legitimate product offer records; iii. post processing for enriching the knowledge base.
- P- Web page processing includes the following operations: i. Updating the site's identified offers in the offers database, including saving historical information about offers in the history database; ii. Updating the site's information in the merchants database; iii. Adjusting the site's next revisit time in the site revisit queue, based on the amount of change in the site's processed data; iv.
- the present invention includes a method for structuring information from universal marked-up documents, comprising the execution of a page processing algorithm, according to the following steps: i. Receiving information documents; ii. Scanning said documents for offers; iii. Parsing said documents into a parsing tree; iv. Running an attribute identification program to find candidates for offer components; and v. Running a structure identification program in order to find structures in documents.
- the core technology of the data processing system and method is an offer processing system and method.
- the offer processing system collects, stores, processes, retrieves, and presents offers. It automatically aggregates offers from a very large number of information sources, such as Web merchants and service providers (potentially, most of such sites on the Web).
- Figure 1 is a block diagram of the main blocks of the present invention.
- Figure 2 is a block diagram decomposition of the offer processing module.
- Figure 3 is a block diagram decomposition of the lead processor module.
- Figure 4 is a block diagram depicting the site classification module.
- Figure 5 is a block diagram decomposition of the offer processor module.
- Figure 6 is a block diagram decomposition of the page processor module.
- Figure 7 is a part of an exemplary web page from a site that offers wines.
- Figure 8 is a graphical representation of the results of processing the exemplary Web page.
- Figure 9 is a graphical representation of part of the parsing tree of the exemplary web page.
- Figure 10 is a block diagram decomposition of the database system/module.
- Figure 11 is an exemplary screen shot of a geographic based comparative pricing search.
DESCRIPTION OF THE PREFERRED EMBODIMENT
- the present invention is of a system and method for automatic, knowledge-based processing and structuring of information from marked-up documents.
- universal marked-up documents refer to any type of documents that have marked up properties, such as SGML, HTML, XML, Microsoft Word, PDF, WML, VML, RTF, XHTML, SMIL and HDML, or any other current or future mark-up languages or document types.
- Mark-up refers to the sequence of characters or other symbols that may be inserted at certain places in a file, such as tags, to indicate how the file should perform when it is printed, displayed or otherwise used or processed. It is also used to describe the document's logical structure.
- the core technology of the present invention is a back-end system, which processes data sources and stores the results of the processing in a database.
- the components of this back-end system include:
- An additional, optional element of the present invention is a front-end system, which may or may not be utilized in any particular application of the system.
- This is an Information Presentation system for presenting the information elements to users in a structured form, or alternatively it may be used for data retrieval for further processing, such as by 3rd party's secondary servers.
- the present invention provides a system and method for extracting information elements, referred to hereinafter as information offers, from information sources.
- This Information processing system retrieves, processes, and stores offers. Offers may be defined as any relevant information elements from a document that can be grouped together for the purpose of describing an item or service.
- the system may define a real estate offer, and may attach information elements such as type, location, price, size, features, owners etc.
- the core, back-end technology of the present invention provides the means to create a database where processed information offers are stored.
- the back-end system is self-standing and can be operated independently as a supplier of information offers to third parties.
- the front-end technology of the present invention provides various means to present the information elements in response to client queries, or to prepare these elements for further processing by a third party.
- the front-end system is not a necessary component of the present invention.
- the present invention is a system and method for processing and structuring information offers retrieved by the system from Internet, intranet, extranet and other network-based marked-up pages. Every information offer pertains to a certain, pre-defined, offer category (e.g., the VCR category of all VCRs).
- the formal system definition for an offer is an information element uniquely identified by specific values for a set of attributes.
- An attribute is a feature of the element, which is being offered. That element may be a product, a service etc.
- An offer category is the set of all the elements that share the same set of attributes. Every offer category has its own set of offer category attributes (e.g., the VCR category attributes may include the brand that makes the VCR, the model of the VCR, the number of heads it has, etc).
- Associated with every offer category attribute there is a list of known grammatical templates and values. These grammatical templates and values provide the set of possible values that the attribute may assume.
- For a certain offer category there are one or more subsets of its offer category attributes which collectively and uniquely identify an offer in that category (each subset is used separately for the identification). For example, a brand and a model uniquely identify a certain VCR within the VCR offer category. An offer category attribute that belongs to one such subset is called an (offer category) key attribute. Such a subset is called a key attribute subset.
- the other offer category attributes may be either identification attributes or search attributes. Identification attributes are used to further discriminate offers which were found using the same key attribute subset but differ in other important attributes. For example, a computer may be identified by its brand and model, yet two offers for computers of the same brand and model may differ by the amount of RAM the computers have.
- a search attribute is an offer attribute, which is not required in order to generate an offer record, but still contains additional information about the offer.
- the color of the above mentioned computer might be a search attribute.
- Both the Identification attributes and the Search attributes are found by the page processor component of the present invention, but are used by the offer presentation system for the benefit of its users. Both attributes might have a default value in the knowledge base of the present invention, which is used if a value was not found in the offer itself.
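- The attribute roles described above can be illustrated with a minimal sketch. The class name `OfferCategory`, its field names and the sample values are assumptions made for illustration only; the source does not specify how the knowledge base represents a category.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OfferCategory:
    """Hypothetical in-memory form of one offer category definition."""
    name: str
    # Each subset listed here is sufficient, on its own, to identify an offer.
    key_attribute_subsets: list[frozenset[str]]
    identification_attributes: set[str] = field(default_factory=set)
    search_attributes: set[str] = field(default_factory=set)
    # Fallback values used when the page does not state the attribute.
    defaults: dict[str, str] = field(default_factory=dict)

    def matching_key_subset(self, found: dict[str, str]) -> Optional[frozenset[str]]:
        """Return the first key attribute subset fully covered by `found`."""
        for subset in self.key_attribute_subsets:
            if subset.issubset(found.keys()):
                return subset
        return None

    def build_offer(self, found: dict[str, str]) -> Optional[dict[str, str]]:
        """Build an offer record, or None if no key attribute subset is complete."""
        if self.matching_key_subset(found) is None:
            return None
        offer = dict(found)
        # Fill identification and search attributes from defaults where missing.
        for attr in self.identification_attributes | self.search_attributes:
            if attr not in offer and attr in self.defaults:
                offer[attr] = self.defaults[attr]
        return offer
```

- In this sketch a computer category would list {brand, model} as its only key attribute subset, RAM as an identification attribute and color as a search attribute with a default value.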
- the Offer processing system is based on a generic algorithm - the page processor algorithm, which processes documents and identifies offers.
- the page processor behavior is defined by the system knowledge base.
- the system's knowledge base contains the data about offer categories that the system has knowledge of (the known offer categories), the attributes, associated grammatical templates and values, key attributes, key attribute subsets of each known offer category, etc.
- the knowledge base for a category is prepared using proprietary definitions, based on domain-specific expertise. Before starting system identification of a new offer category, a system editor inserts the known grammatical templates for the new category. Since the system's learning (see description below) capability enlarges the known grammatical templates automatically, during system operation, it is sufficient to prepare an initial (non-comprehensive) set of known grammatical templates. This minimizes necessary work when preparing for a new offer category, and also significantly helps lower maintenance costs.
- the knowledge base is stored in the database (specifically, the offer categories database).
- the offer processing system of the present invention scans selected information sources and retrieves pages from them. It then processes these pages, using the system's knowledge base, in order to identify offers that belong to one of the known offer categories.
- An offer is identified by searching for known grammatical templates that match key attributes. Each identified offer is stored in an offer database.
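- As a rough illustration of identification by key attributes, the sketch below uses regular expressions as a stand-in for the "grammatical templates" of the text. The patterns, the brand names and the function names are hypothetical; the actual template formalism is not described in this section.

```python
import re

# Hypothetical knowledge-base fragment for a VCR offer category.
# Each attribute maps to patterns with one capturing group for the value.
VCR_TEMPLATES = {
    "brand": [r"\b(Sony|Panasonic|JVC)\b"],
    "model": [r"\bmodel\s+([A-Z]{2,4}-?\d{2,4})\b"],
    "heads": [r"\b(\d)[- ]head\b"],
}
VCR_KEY_SUBSETS = [{"brand", "model"}]


def find_attributes(text, templates):
    """Return the first value matched for each attribute, if any."""
    found = {}
    for attribute, patterns in templates.items():
        for pattern in patterns:
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                found[attribute] = match.group(1)
                break
    return found


def identify_offer(text, templates, key_subsets):
    """Return the found attributes if some key attribute subset is complete."""
    found = find_attributes(text, templates)
    if any(subset.issubset(found.keys()) for subset in key_subsets):
        return found
    return None
```

- Text that mentions only a brand yields no offer in this sketch, because the {brand, model} key attribute subset is incomplete.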
- the offer processing system regularly performs the scan, retrieve and identify procedure in order to find new offers as well as update the already found offers.
- the offer processing system applies a sophisticated learning algorithm in order to enrich its known template repertoire. This feature increases the number of identified offers. It also allows the system to start identifying offers using a smaller knowledge base, enlarging it over time.
- the offer processing system assumes a partial fulfillment requirement, i.e., it is a heuristic system that may miss offers and may also exhibit a certain number of singular errors. However, the system still provides substantial value and business benefits by its information structuring.
- the offer processing system as described has many generic aspects in its operation, which can be adjusted and tuned according to the specific application being applied.
- the present invention which enables automatic, knowledge-based data processing and structuring of information from marked-up documents, includes an innovative offer processing system. The principles and operations of the present invention may be better understood with reference to the attached drawings, and the accompanying descriptions, wherein:
- Figure 1 is a block diagram containing the main blocks of the present invention, referred to as the offers processing system.
- This system has 3 parts: a back-end (the offer processing 1000), a front-end (offer presentation 2000) and a database system 3000 that the other 2 parts use.
- the offer presentation 2000 system is not essential, and can be executed in various ways or by various third parties.
- 3rd parties are intended to include any business partners who have access to at least a part of the present invention for the purpose of further processing. This processing may be for presenting to consumers, corporations, and any other users, or alternatively to provide the processing means for any other purposes.
- the database system 3000 stores the offers and other data that the system needs for its operation.
- the purpose of the back-end is to build the offer database in the database system 3000. It performs this operation by accessing data sources (such as Internet sites, intranet files etc.), retrieving pages, and processing them.
- the purpose of the front-end is to allow prospective users to access the offer database and retrieve information from it.
- the front-end system includes one or more servers that may be accessed by different web-enabled means (e.g., web browsers, cellular phones, PDAs, digital TV, Internet appliances, voice activated user interfaces etc.). Secondary servers (servers of 3rd parties) could also access the front end. In this case, the secondary servers are "powered by" the offer processing system.
- the offer processing system has a scalable architecture, in that it may be operated across an unlimited number of Web, application, or other servers, according to need.
- the offer processing 1000 and offer presentation 2000 operate independently. Thus, even in case one of them fails, the other keeps functioning.
- the figures present the logical decomposition of the system into modules. Those skilled in the art will realize that the functionality of any of the modules can be distributed over a plurality of computers, wherein the databases and processors are housed in separate units or locations. Those skilled in the art will appreciate that an almost unlimited number of processors and / or storage units may be supported. This arrangement yields a scalable, high performance, dynamic, highly available, and flexible system that is secure in the face of catastrophic hardware failures affecting the entire system.
- FIG. 2 is a block diagram decomposition of the back-end offer processing system 1000.
- the Leads processor 100 traverses selected information sites that may contain offers. Each site is checked to see if it is likely to contain offers from known offer categories.
- the Leads processor 100 retrieves data about the known offer categories from the database system 3000. Sites that contain such offers are inserted into the site revisit queue 200.
- the site revisit queue 200 stores all the sites that probably contain offers. It regularly performs time-based scans of these sites.
- the Site Revisit Queue is a software means for managing continued interaction with information sources so that the offers are up to date.
- Each scanned site is processed in the offer processor 300.
- the offer processor 300 extracts offers of known offer categories, as described above.
- the identified offers are simply stored in the database system 3000.
- the site's identified offers are updated in the database system 3000.
- Sites containing offers that have been processed are subsequently stored in the Site revisit queue 200 for future monitoring and re-processing.
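- One possible shape for such a revisit queue is a priority queue ordered by next visit time. The sketch below is an assumption, not the disclosed implementation: the class name, the hour-based intervals and the halve/double rule tied to the amount of change are all hypothetical.

```python
import heapq


class SiteRevisitQueue:
    """Min-heap of (next_visit_time, site) pairs; times are in hours."""

    def __init__(self, base_interval=24.0, min_interval=1.0, max_interval=168.0):
        self._heap = []
        self._interval = {}
        self.base_interval = base_interval
        self.min_interval = min_interval
        self.max_interval = max_interval

    def add(self, site, now):
        """Register a new site and make it due immediately."""
        self._interval[site] = self.base_interval
        heapq.heappush(self._heap, (now, site))

    def pop_due(self, now):
        """Return the sites whose next visit time has arrived, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due

    def reschedule(self, site, now, change_ratio):
        """Requeue a processed site; change_ratio is the fraction of its offers that changed."""
        interval = self._interval[site]
        # Assumed heuristic: revisit volatile sites sooner, stable sites later.
        if change_ratio > 0.5:
            interval /= 2
        elif change_ratio < 0.1:
            interval *= 2
        interval = max(self.min_interval, min(self.max_interval, interval))
        self._interval[site] = interval
        heapq.heappush(self._heap, (now + interval, site))
        return interval
```

- A heap keeps the earliest due site at the front, so each scan only touches sites that are actually due.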
- FIG. 3 is a block diagram illustrating the decomposition of the leads processor 100.
- the leads processor 100 is the component of the backend system that enlarges the collection of sites that contain relevant information for the database system.
- the whole back-end system can still operate and keep virtually any number of constant sources up-to-date without activating the leads processor.
- the leads processor 100 increases the number of sites that it processes by employing automated tools to search the relevant network for potentially valuable sites and pages.
- the tools, or components, that execute this function from within the leads processor 100 are the following:
- the manual Leads 110 component that allows a system operator to input Leads manually.
- External knowledge bases such as the yellow pages, search engines, directories, etc 120, which represent different, focused Web sources for Leads.
- the Leads processor 100 queries these focused sources to receive result pages with lists of possible relevant lead addresses (i.e., URLs). The exact manner of the querying depends on the type of the source and the interface it supports for user queries (search engines are used in a different manner than yellow pages). It may also depend on the category of offers that are expected to be found in the site. For example, when searching for leads, search engines will be queried using the known grammatical templates and values contained in the system's knowledge base.
- the Leads processor 100 follows "next" links between the result pages, to retrieve multiple pages.
- the Leads processor 100 filters all site addresses from the results, to obtain individual leads for its operations.
- the e-mail registry 130, which is not integral to the invention, is a registry of merchant sites that were received via e-mail. Site owners that would like to be covered by the system send these e-mails.
- the DNS (Domain Name System) scan 140 is an automatic scan of domain names.
- the Leads database 3400 (Figure 10), which is a part of the database system 3000, used for policy-based, periodic checking and classification of leads.
- the Leads database 3400 also contains all the leads that have been processed in the past, together with their classification (as described below). The Leads manager 160 constantly retrieves leads from the Leads database 3400 using a priority policy. For example, manually fed addresses will be of top priority, while "brute force" global DNS searching will be of lower priority. Higher priority is assigned to leads that have a higher probability of containing offers in known offer categories.
- the site classification manager 150 is responsible for classifying sites into one of two categories: "interesting" (probably contains offers) or "not interesting" (probably doesn't contain offers). This classification is probabilistic.
- the general steps the Leads manager 160 performs are the following: 1) Retrieving a lead to be processed (according to priority). 2) Checking against the Leads database 3400 whether or not the lead has been processed before. 3) If it has, and it is classified as interesting, skipping it and going back to step 1 (retrieving a new lead). 4) Otherwise, classifying the lead using the site classification manager 150. 5) Storing the lead and its classification in the Leads database 3400. If the lead was classified as interesting, it is passed on to the Site Revisit Queue 200. Otherwise the Leads manager 160 lowers the lead's priority in the Leads database 3400. This enables the system to handle changes in sites' content and changes in the known offer categories, and avoids frequent scanning of sites with low probability of being interesting.
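- The five steps above can be sketched as a simple priority loop. Everything concrete here is an assumption: the function name, the dictionary standing in for the Leads database 3400, the list standing in for the Site Revisit Queue 200, and the rule that a demotion adds 1 to the priority number (lower numbers are served first).

```python
import heapq


def run_leads_manager(pending, leads_db, classify, revisit_queue):
    """Process leads in priority order (lower number = higher priority).

    pending:       list of (priority, url) tuples
    leads_db:      dict url -> {"interesting": bool, "priority": int}
    classify:      callable url -> bool, stand-in for site classification
    revisit_queue: list collecting urls handed on for regular revisits
    """
    heap = list(pending)
    heapq.heapify(heap)
    while heap:
        priority, url = heapq.heappop(heap)       # step 1: next lead by priority
        record = leads_db.get(url)                # step 2: processed before?
        if record is not None and record["interesting"]:
            continue                              # step 3: already known as interesting
        interesting = classify(url)               # step 4: classify the lead
        if interesting:
            revisit_queue.append(url)
        else:
            priority += 1                         # assumed demotion rule
        # step 5: store the lead and its classification
        leads_db[url] = {"interesting": interesting, "priority": priority}
```

- Leads previously stored as not interesting are classified again when they resurface, which is how the sketch reflects the stated goal of handling changes in site content.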
- the Site Classification is an integral component for determining the relevancy of the content of information sources, as provided by the Leads Processor 100. It is comprised of:
- a site classifier 151 which manages the classification.
- a Web Walker 152 which retrieves documents from the information sources.
- a Reduced page Processor 153 which processes retrieved documents and estimates the probability of existence of offers.
- the site classifier 151 manages the classification process. It receives a URL of a site, or an address of an information source, for classification, and uses the web walker 152 to fetch Web pages from the site (the web walker is described below). It instructs the web walker 152 to retrieve a pre-defined, configurable number of pages from the site. It also limits its walking depth within the site (see the description of the web walker for more detail). These parameters can be relatively small. For example, a depth of 2-3 should usually be sufficient for good classification, because relevant information such as offers is most likely accessible by following 2-3 hyperlinks. Retrieving a small number of documents while maintaining good classification results saves system resources and provides an effective balance between missing relevant information sources and using system resources dedicated to offer extraction.
- the reduced page processor 153 is similar to the page processor 330 (see description below, Figure 5). It has many of the fundamental capabilities and methods of the page processor 330. The difference is the focus of its operation so that it generates a different output (it doesn't extract offers).
- the page processor 330 identifies partial or complete offers in the document. The results of this process are levels of certainty of the existence of offers in the document and their corresponding identified offer categories.
- the reduced page processor searches the pages for known keywords and templates, from all offer categories.
- the reduced page processor counts occurrence frequencies by offer category.
- High frequency of templates or values from one offer category classifies the page for that category (for example, a page containing words or templates like "chateau XXX", "Champagne", "bottle", "1986", etc. is classified into offer category "Wines"). If the frequency exceeds a certain value, the page is classified as interesting.
- the site classifier 151 accumulates the results from the processed documents, uses thresholds and decides on the information source classification.
- the site classifier 151 stores this classification in the Leads Database 3400.
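- The frequency-based classification described above can be sketched as follows. The keyword sets, both threshold values and the rule that a site is interesting when at least half of its sampled pages are interesting are assumptions chosen for illustration; the source only states that frequencies are counted per category and compared with thresholds.

```python
from collections import Counter

# Hypothetical keyword lists standing in for the known templates and values.
CATEGORY_KEYWORDS = {
    "Wines": {"chateau", "champagne", "bottle", "vintage"},
    "VCRs": {"vcr", "heads", "vhs"},
}


def score_page(text, category_keywords=CATEGORY_KEYWORDS):
    """Count keyword occurrences per offer category in one page."""
    words = [word.strip('.,;:!?"()') for word in text.lower().split()]
    counts = Counter()
    for category, keywords in category_keywords.items():
        counts[category] = sum(1 for word in words if word in keywords)
    return counts


def classify_site(pages, page_threshold=3, site_threshold=0.5):
    """Return (interesting, category) for a list of page texts."""
    if not pages:
        return False, None
    interesting_pages = 0
    totals = Counter()
    for text in pages:
        counts = score_page(text)
        totals.update(counts)
        # A page is interesting when one category is frequent enough.
        if max(counts.values()) >= page_threshold:
            interesting_pages += 1
    interesting = interesting_pages / len(pages) >= site_threshold
    category = totals.most_common(1)[0][0]
    return interesting, (category if interesting else None)
```

- Accumulating counts across pages lets the site-level decision name the dominant offer category as well as the interesting or not interesting label.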
- the web walker 152 locates pages by repeatedly processing web or other pages, extracting links from them, and following these links to other pages. It operates in a similar way to a user that uses a graphical tool that was built for site navigation (e.g., a browser). The difference is that, compared to a human being, the web walker navigates the site in a much more orderly and exhaustive manner. It retrieves the documents from the site exactly as navigation tools do, by submitting client requests to web servers. Thus, the documents are retrieved without any need for active participation of the information source, apart from its general accessibility to its users.
- the web walker 152 starts from a certain page, e.g., the home page of some site.
- the web walker 152 processes the page and finds all the links on the page that link to other pages in the same site.
- the web walker 152 uses a commercial marked-up document parser. There are various means to specify links between pages, depending on the markup language.
- the web walker 152 identifies the language and uses appropriate tools and methods to obtain the links. For example, in HTML-based documents the web walker 152 obtains links that are specified using methods (and combinations of such methods) such as:
- Forms (usually used to query a database-based site that contains offers): It handles both GET & POST form submitting.
- the web walker 152 automatically attempts to create a sufficient form data set that it submits to the form-processing agent. The exact method is determined after analysis of the form. For example, for a form that has lists of options to select from, the web walker 152 may iterate over the options of one selection, while keeping all the other selections constant. This method usually covers all the possible results in the database that the form enables access to.
- Links that are partially or wholly generated with the aid of client-side scripts, e.g., JavaScript.
- the web walker 152 uses commercial tools that can interpret the script, execute it on demand, and support integration with the underlying object model of the parsed document. Using these tools, the web walker 152 is able to handle and obtain links of this type.
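The form-handling strategy described above (iterating over the options of one selection while keeping the others constant) can be sketched for a GET form as follows. The function names, the choice of the first option as the constant value, and the example URL are assumptions made for illustration.

```python
from urllib.parse import urlencode

def form_data_sets(selects, vary):
    """Yield one form data set per option of the `vary` selection.

    `selects` maps each selection name to its non-empty list of options.
    All other selections are held at their first option (an assumption).
    """
    fixed = {name: options[0] for name, options in selects.items() if name != vary}
    for option in selects[vary]:
        data = dict(fixed)
        data[vary] = option
        yield data

def get_urls(action, selects, vary):
    """Build the GET request URLs for every generated form data set."""
    return [f"{action}?{urlencode(data)}" for data in form_data_sets(selects, vary)]
```

For a POST form the same data sets would be sent in the request body instead of the query string.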
- the web walker 152 only follows links to documents that are later processed by the page processor 330. For example, it follows links to HTML or XML documents, but does not necessarily follow links to GIF or JPEG documents. It has a configurable list of document types with an indication of whether or not to follow links to documents of the corresponding type. Documents whose type is unknown or can't be reasonably assumed, are retrieved and processed. Such cases are reported and monitored by a system operator in order to update the system with regard to existing document types.
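A minimal sketch of such a configurable document-type list is shown below. It assumes the type is guessed from the URL's file extension, which is only one possible way to determine it; the table contents and the name `should_follow` are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical configuration: document type -> follow links to it or not.
FOLLOW_BY_TYPE = {
    ".html": True, ".htm": True, ".xml": True,
    ".gif": False, ".jpg": False, ".jpeg": False,
}

def should_follow(url, unknown_log):
    """Decide whether to retrieve `url`; unknown types are followed and reported."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in FOLLOW_BY_TYPE:
        return FOLLOW_BY_TYPE[suffix]
    unknown_log.append(url)  # reported so an operator can extend the table
    return True
```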
- the web walker 152 navigates a site using the well-known A* algorithm.
- the basic traversal method uses BFS (breadth first search - a special case of the A* algorithm). Consequently, when following relevant links in a document, the web walker follows the links one by one, in the order of their appearance. It retrieves the documents that are directly linked to the processed page (direct descendants that are "one click" away from the processed page) before retrieving documents that are indirectly linked. As it follows the links in the site, the web walker maintains a tree of the documents in the site, their links, and the relations between them. It uses this information to avoid loops. During traversal, the web walker employs heuristics that may result in a non-BFS traversal.
- the results may span more than one page.
- These pages are usually linked together using common methods such as a "next" button.
- the web walker tries to locate these special links (the "next" button). If the web walker decides (using certainty thresholds) that it has identified a special link, it gives it precedence over regular links, thus deviating from the simple BFS traversal method.
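The traversal described above can be sketched as a breadth-first walk with a visited set for loop avoidance and a simple precedence rule for "next" links. The site is modelled here as an in-memory dictionary, and the exact-match test for a "next" link stands in for the certainty thresholds mentioned in the text; both are simplifying assumptions.

```python
from collections import deque

def walk(site, start, max_depth=3, max_pages=100):
    """Return the order in which pages are visited.

    `site` maps a URL to a list of (link_text, target_url) pairs.
    """
    visited = {start}            # avoids loops
    queue = deque([(start, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue             # do not follow links beyond the depth limit
        for text, target in site.get(url, []):
            if target in visited:
                continue
            visited.add(target)
            if text.strip().lower() == "next":
                # A result-continuation link takes precedence over regular links.
                queue.appendleft((target, depth + 1))
            else:
                queue.append((target, depth + 1))
    return order
```

Without the precedence rule the second results page would be visited only after all pages at the same depth, which is the plain BFS order.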
- the web walker 152 uses, for every information source that it handles, several configurable operational parameters:
- the maximal walking depth. The walking depth of a document is the smallest number of successive links that need to be followed to retrieve it, relative to the document where the retrieval started.
- the web walker 152 stops retrieving documents from an information source when one of the following occurs:
- FIG. 5 is a block diagram depicting the site revisit module and a decomposition of the offer processor 300.
- the site revisit queue 200 manages the whole processing of the offers.
- the site revisit queue 200 usually receives sites to the queue from the leads processor 100. However, it also supports direct insertion of sites that haven't been processed and classified by the leads processor 100. The insertion could be manual (by a system operator) or computerized (e.g., by accessing some information system, possibly of a 3rd party).
- the site revisit queue 200 uses the site processor 310 to process single sites, and perform operations at the site level.
- the site processor 310 manages various operations while processing a site. These operations include:
- the site processor 310 uses the web walker 152 to fetch pages from the site.
- the site processor 310 hands the fetched pages to the page processor 330 to identify and update offers.
- the site processor 310 is also responsible for removing sites from the site revisit queue 200 when it reaches a decision (based on the processing history of the site) that the site isn't relevant anymore. This could happen, for example, when a site is classified as interesting by the leads processor 100 and inserted into the site revisit queue 200. After full processing of the site by the site processor 310, no offers are identified. In this case the initial classification was wrong and the site is put back into the leads database 3400 and marked as not interesting.
- Figure 6 is a block diagram decomposition of the page processor 330.
- the page processor 330 receives a single marked-up document and scans the document for offers. If the document contains offers, it will have certain structural properties, which enable the users of the document to assemble the offer components into offers.
- HTML structural properties are usually the result of the fact that a human being is supposed to view the document (in addition to the inherent structural properties that the syntax of the mark-up language provides).
- a human being should be able to easily understand the offers in the exemplary HTML document.
- this HTML document author will arrange the contents of the offers in such a way that they will be visually comprehensible to a human being.
- the author uses HTML tags for that purpose.
- Any mark-up document that is intended to be displayed to humans will have the visual structural properties, since the structural properties are derived from the capabilities of the document's user.
- Other types of documents, which are not used by humans, still have some pre-defined structural properties.
- XML may be used for inter-system communications, and so the structural properties may differ from those found in HTML documents.
- the knowledge base contains the structural properties per mark up language and possibly per application.
- the page processor 330 parses the marked-up document into a parsing tree, using a commercial mark up type designated parser (i.e. there is a different parser for each type of mark up document).
- Each mark up tag is a node, and the text content is in the leaves.
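A parsing tree of this shape can be sketched with Python's standard `html.parser` module, used here in place of the commercial parser the text refers to. The `Node` and `TreeBuilder` classes and the short list of void tags are assumptions made for illustration.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # elements with no content

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = None  # set only on text leaves

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            leaf = Node("#text", self.current)
            leaf.text = data.strip()
            self.current.children.append(leaf)

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def text_leaves(node):
    """Yield the text of every leaf in document order."""
    if node.text is not None:
        yield node.text
    for child in node.children:
        yield from text_leaves(child)
```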
- the page processor 330 then runs the attribute identification 332 in order to find candidates for offer components.
- the attribute identification 332 scans each text node and looks for a text token that matches one of the templates and values stored in the offer categories database 3300. If some of the mark-up tags contain attribute information, which is mark-up language-specific, it will be analyzed as well (for example, graphic images and the alt attributes of IMG elements in HTML documents).
- the result of the attribute identification 332 is that each node in the parsing tree has a list of all the recognized templates or values that match one or more attributes of some offer categories. For example, Sony is identified as a brand attribute of some consumer electronics offer categories (TVs, VCRs, DVDs, etc.).
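As an illustration of matching text tokens against stored templates and values, the sketch below uses regular expressions as a stand-in for the grammatical templates. The `TEMPLATES` table, its patterns, and the category names are hypothetical; the real knowledge base lives in the offer categories database 3300.

```python
import re

# Hypothetical templates: (attribute, applicable offer categories, pattern).
TEMPLATES = [
    ("brand", {"TV", "VCR", "DVD"}, re.compile(r"\b(?:Sony|JVC)\b", re.I)),
    ("price", {"TV", "VCR", "DVD", "Wines"}, re.compile(r"\$\s?\d+(?:\.\d{2})?")),
    ("vintage", {"Wines"}, re.compile(r"\b(?:19|20)\d{2}\b")),
]

def identify_attributes(text):
    """Return (attribute, matched value, categories) for every template hit in a text node."""
    matches = []
    for attribute, categories, pattern in TEMPLATES:
        for match in pattern.finditer(text):
            matches.append((attribute, match.group(0), categories))
    return matches
```

Running this over every text leaf of the parsing tree would give each node its list of candidate offer components.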
- the page processor 330 uses the Structure Identification 331 to identify the structural properties within the document.
- the Structure Identification 331 is specific per mark up document type, since different mark up types use different set of tags and may have different structural properties.
- the Structure Identification 331 identifies pre-defined structures within the document. For example, in the exemplary HTML-based documents, it handles the specific table, paragraph and line structures. As time passes, there may be changes in the way documents are structured. A possible cause is a change in the mark-up technology. System engineers and operators will regularly monitor such changes, aided by system reports and performance tracking. They will adjust structural knowledge in the knowledge base accordingly.
- the page processor 330 tries also to integrate text tokens spanning over more than one node into a value that matches a known template or value.
- the page processor 330 identifies an offer within the identified structures. There must be candidates for all the key attributes in at least one of the key attribute subsets. For example, if the key attribute subset of a job offering is the offering firm, the title, the required experience and the location, an offer might be Vineto ltd., Software engineer, 3 years experience in Tel-Aviv, Israel. An offer is identified only if values were found for all necessary attributes within an identified structure. The offer identification 333 stores the identified offer in the database. If not all the necessary attributes have a value, this is a partial offer.
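The key-attribute rule above can be sketched as a subset test. The attribute names and the `KEY_ATTRIBUTE_SUBSETS` table are illustrative assumptions modelled on the job-offer example.

```python
# Hypothetical key attribute subsets per offer category.
KEY_ATTRIBUTE_SUBSETS = {
    "Jobs": [{"firm", "title", "experience", "location"}],
}

def identify_offer(category, candidates):
    """Classify the candidates found within one structure.

    `candidates` maps attribute names to the values found (empty if none).
    Returns "complete", "partial" or "none".
    """
    found = {attribute for attribute, value in candidates.items() if value}
    for subset in KEY_ATTRIBUTE_SUBSETS[category]:
        if subset <= found:  # every key attribute of this subset has a value
            return "complete"
    return "partial" if found else "none"
```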
- the offer identification 333 uses the structural knowledge in order to assume where an offer component is to be expected. For example, structural symmetries are a good source for learning new grammatical templates. "Ordered" sites (sites that organize the data in their pages in a consistent manner) will usually exhibit such symmetry properties. If the values collected this way pass a confirmation test (automatic or manual) they will be entered into the database, and enlarge the knowledge source regarding the grammatical templates associated with attributes.
- This process will gradually enlarge the number of identified components and will improve the accuracy of the page processor 330 and its percentage of identified offers. For example, if we found 5 offers in the first 5 rows of a table and didn't find an offer in the 6th row, and the structure of the first 5 rows is similar to that of the 6th row (it doesn't have to be identical), then there is a high probability that the 6th row contains an offer as well.
- the page processor 330 deduces the location of the attributes of the 6th offer according to the location of the attributes of the preceding 5 offers.
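The row-symmetry deduction can be sketched as a per-column vote. The data layout (a column index mapped to an attribute name) and the function names are assumptions; the proposed values would still need to pass the confirmation test mentioned above before entering the knowledge base.

```python
from collections import Counter

def infer_columns(rows):
    """Learn which attribute each table column usually holds.

    `rows` is a list of dicts mapping column index -> identified attribute.
    """
    votes = {}
    for row in rows:
        for column, attribute in row.items():
            votes.setdefault(column, Counter())[attribute] += 1
    return {column: counter.most_common(1)[0][0] for column, counter in votes.items()}

def propose_missing(cells, identified, column_attributes):
    """Propose attribute -> cell text for cells that were not identified directly."""
    return {
        column_attributes[i]: cells[i]
        for i in range(len(cells))
        if i not in identified and i in column_attributes
    }
```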
- the page processor component of the back-end system is in itself a unique sub-system that can operate independently of all the other system modules. It needs the knowledge base for its operation. However, for example, it can extract offers from documents that were placed on a local disk and process them one by one, without necessarily requiring the leads processor component, web walker component, etc.
- Figure 7 and Figure 8 are examples of HTML web pages that contain offers. They demonstrate 2 different exemplary methods to organize offers that conform to the principles stated above.
- the first method, illustrated in Figure 7, is to present several offers in a table. Each offer is presented within a single table row. The human eye easily identifies the offers' borders. Since a row is a known HTML structure, the page processor will look into the rows. The attribute identification identifies the known templates and values. If a row contains values for all the necessary attributes, an offer is identified. If an offer was not found in one of the rows, the page processor still might find the offer attributes due to the structural similarity between the rows.
- the second method illustrated in figure 8, is to present each offer in a different table. Someone who is skilled in the art can easily see that what was described for the first method applies to the second as well.
- Figure 9 shows part of the parsing tree constructed from the exemplary web page of Figure 8.
- the parsing tree is the base of the structural properties analysis. Under each table block the offer's attributes can be seen. If all the identified values within the table block are collected, an offer can be deduced.
- Figure 10 is a block diagram decomposition of the database system 3000.
- the entire database system is based on a commercial database management system (DBMS, such as Oracle).
- the database management 3100 represents that software. It is responsible for the creation and maintenance of the database system 3000. It provides interfaces for all the database operations (insertion, removal, queries, etc).
- the database system 3000 is the repository of the present invention's entire system data.
- the other blocks in the diagram (3200 through 3600) represent the various individual logical repositories.
- the database management 3100 manages the access to these repositories.
- the offer database 3200 stores all the data about identified offers. This repository is mainly updated by the offer processor.
- An offer record includes, at least, the following information: 1) The site that offers the offer (a link to the site's record in the offer providers database 3600); 2) The offer category to which it belongs; 3) Values for the key attributes that identify the offer and other attributes of the offer. The specific fields that are present depend upon the offer's category. For example, if the offer is for a room in a motel, there may be information about the price of the room, whether it's a single or a double, etc. In a different example, say for a job offer, the data may be the salary offered (if it was published and identified), the company offering the position, etc.; 4) The link to the page that contains the offer; 5) The date when the offer was last updated.
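The offer record described above could be modelled roughly as follows. The field names and types are assumptions chosen for illustration; the text specifies only what information is stored, not how.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional

@dataclass
class OfferRecord:
    provider_id: int                      # link to the record in the offer providers database
    category: str                         # offer category the offer belongs to
    key_attributes: Dict[str, str]        # values that identify the offer
    other_attributes: Dict[str, str] = field(default_factory=dict)  # category-specific fields
    page_url: str = ""                    # page that contains the offer
    last_updated: Optional[date] = None   # date the offer was last updated
```

Keeping the category-specific fields in dictionaries reflects the statement that the fields present depend on the offer's category.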
- the offer categories database 3300 contains the data about the known offer categories. System editors update it manually.
- An offer category record includes, at least, the following fields: 1) The offer category attributes. 2) The key attribute subset or subsets. 3) The grammatical templates associated with the offer category attributes. Grammatical templates may be applicable to one offer category, or may be applicable to more than one. For example, the grammatical templates for a price attribute may be applicable to many offer categories.
- the Leads database 3400 contains the data about Leads.
- a lead record includes, at least, the following fields: 1) The lead's URL 2) The lead's classification (containing / not containing offers).
- the history database 3500 contains data gathered by the system over time, which may be used for various database products.
- the history database 3500 could contain the average price of used cars, according to model, age, etc, at different points in time. This could be used, for example, to derive a price trend graph for used cars.
- location based applications e.g., local commerce aggregation
- valuable reports could also be generated based on geographic location.
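As an example of a product derived from the history database, the sketch below computes an average price per model and period, which could feed a price trend graph. The observation format `(model, period, price)` and the function name are assumptions.

```python
from collections import defaultdict
from statistics import mean

def price_trend(observations):
    """Return {model: [(period, average price), ...]} sorted by period.

    `observations` is an iterable of (model, period, price) tuples, where
    `period` is any sortable label such as "2000-01".
    """
    buckets = defaultdict(list)
    for model, period, price in observations:
        buckets[(model, period)].append(price)
    trend = defaultdict(list)
    for model, period in sorted(buckets):
        trend[model].append((period, mean(buckets[(model, period)])))
    return dict(trend)
```

Grouping by a location field instead of the model would give the geographic reports mentioned above.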
- the offer providers' database 3600 contains the data about offer providers that the system recognized.
- An offer provider's record includes, at least, the following fields: 1) URL of web sites of the offer provider. 2) Information about brick and mortar stores that the merchant might have (an optional field - some offer providers have only web operations), such as the physical addresses of the stores.
- the offer processing system operates automatically, with minimal human intervention.
- the only mandatory manual operation is the preparation of an initial knowledge base, for every known offer category.
- the system contains auxiliary applications to be used by system operators, such as system administrators and editors.
- Such applications include:
- An application to maintain the offer categories database 3300 (e.g., insertion of the initial knowledge base about an offer category).
- An application to maintain the Leads database 3400 e.g., manually insert new Leads.
- An application to maintain the site revisit queue 200 (e.g., to change the scheduled revisit time of a site, to directly insert sites into the revisit queue).
- An application to aid in the maintenance of the learning process of the page processor 330 (e.g., acknowledging new proposed grammatical templates).
- An application to view various system error logs (e.g., reports about broken links in the web walker 152, SW warnings and errors).
- a preferred embodiment of the present invention is a global and/or local content unification platform for various vertical markets.
- This platform is a local commerce unification platform in the Application Service Provider model, whereby the back-end and database modules may be outsourced to 3rd parties, such as wireless service providers, portals, local media players, VAR (Value Added Resellers), ISPs, content syndicators, local sites, other content aggregators, etc.
- This platform aggregates content from various vertical markets, and may optionally be bundled with a front-end application, such as a web site, for the end users of the distributor. Alternatively, the aggregated content (or parts thereof) may be supplied on its own, depending on the distributor: different distributors may be interested in different combinations of content type / market and geographic locations.
- For example, a newspaper chain that has sites in New York and Los Angeles may like to receive unified content for all the vertical markets, whereas a national real-estate portal may wish to receive real-estate content only, covering all of the USA.
- This platform is beneficial for brands, merchants and aggregated businesses; for content distributors (scalable, fast and efficient aggregation of content in multiple markets and localities); and for end users such as Internet and wireless subscribers (an efficient, high-quality search tool).
- This platform enables content unification for markets that include retail, professional services (e.g., dry cleaning, plumbers, doctors, lawyers), job offers, real estate offers, events, accommodations, and job finding related sources (e.g., job offers, candidates' CVs), as well as auctions, classifieds, B2B processes, consumer goods and wholesale items.
- This platform leverages its automatic, structured, product/service-oriented search engine technology to aggregate information from an almost unlimited number of content sources (e.g., merchants or service providers), online as well as offline, in a cheap, fast and scalable method.
- This preferred embodiment uses the offer processing system of the present invention in a manner that is focused on finding, identifying and aggregating product or service offerings from these content sources (e.g., merchants or service providers).
- the present embodiment offers consumers a powerful product/service finding experience (including structured, feature based and location based searches) over the WWW using PC's, PDAs, mobile computers, cellular phones, Digital TV as well as any other Internet enabled devices or services.
- the Web site application of the present embodiment offers easy, structured location, price and product features based product/service offer search facilities, supportive editorial content and buying decision aids, user registration and personalization services, focused advertisements, bi-directional e-mail services, etc. Similar features are available for users accessing this embodiment with mobile devices, such as WAP phones and PDA's.
- the user interface on each of these devices is adapted according to the limitations and options of the specific device (which vary considerably).
- simple, WAP based interfaces in use by many current cellular phones have only several lines of textual display and a very inconvenient and limited "keyboard".
- the interface will be adapted to display very small amounts of data, and that data will be the most relevant to the user (aided by user personalization through the web site).
- the input from the user is based mainly on simple selections from short option lists that require a minimal use of keys.
- Another possible interaction channel is voice.
- the user uses a voice enabled Internet appliance to interact with the web site's servers. The user "surfs" by talking to an IVR-like system that uses commercial voice recognition technologies.
- the interface could be based solely on voice interaction, or it could be using combinations of voice, visual display, and keyboard inputs.
- the system could ask and direct the user using voice, the user provides her inputs by pressing on the device's keys, and the system will read the results of the search to the user.
- Another possibility is for the system to display the options on the device's display, and let the user select the options using her voice.
- the results could also be displayed on the device's display. The exact combination depends on many factors including the capabilities of the specific device and the details of the application at hand.
- the unique technology that is disclosed in the present embodiment has significant advantages over all known existing technologies and business practices of local or global content unification platforms. Among these advantages are:
- the page processor can automatically process offers from sites with new content structures and styles.
- Geographical segmentation capabilities enable localized service over all of the above mentioned devices and means.
- the present invention facilitates the structuring of information both from online and offline merchant stores and other businesses, enabling users to compare between an almost unlimited number of merchants and service providers. Furthermore the users can undertake local merchant or services searches, and receive product or service comparison information based on geographical specifications.
- Figure 11 illustrates a possible exemplary screen shot of the web site.
- the web site enables the style and layout discrimination of regular merchant product offerings from "Preferred business partners" (for example, merchants paying commissions) product offerings.
- the web site's user performs a hierarchical product or service search.
- the user drills down from a general product category into a specific product category (for example from "Food and beverages" into "Wine" into "Red wines" etc.).
- While searching for offers, the user can search products (or models) by choosing a combination of attributes like price range, specific vendor, and product category specific values like size, color, features etc.
- a user can limit the offers to products that are only sold through the web, and/or to products offered by vendors that operate brick-and-mortar outlets in the geographic vicinity of the user (calculated from the user's address/zip codes, see the "User registration" section).
- the results can be seen in Figure 11, illustrating a typical user interface reflecting the results of a product search.
- the user receives an answer to his or her query, ordered in a table (which may be divided into several pages).
- the table is ordered by some default criterion (like ascending price), which the user can simply change according to his or her preferences. Offers which are in "uncertain" status, or offers whose price is far too low for what is common in this product category, are listed at the end and are marked as "uncertain" (to prevent spam or error).
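The result ordering described above can be sketched as a two-part sort key. The rule used here to detect a price that is far too low (below a fixed fraction of the median price of the result set) is an assumption; the text does not say how that is determined.

```python
from statistics import median

def order_results(offers, low_price_ratio=0.2):
    """Sort offers by ascending price, listing uncertain offers at the end.

    `offers` is a list of dicts with "price" and "status" keys. An offer is
    treated as uncertain when its status says so or when its price is below
    low_price_ratio times the median price (hypothetical rule).
    """
    if not offers:
        return []
    typical = median(offer["price"] for offer in offers)

    def is_uncertain(offer):
        return offer.get("status") == "uncertain" or offer["price"] < low_price_ratio * typical

    # False sorts before True, so certain offers come first.
    return sorted(offers, key=lambda offer: (is_uncertain(offer), offer["price"]))
```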
- the user gets the main fields of data for each product, and can ask for additional information or follow a link to the original product's web page.
- a user can easily filter his or her query by using a combination of field constraints.
- the system informs the user how many different products and vendors offer products in each category, and how much this number is reduced after each filtering criterion is applied.
- a "compare tool" lets the user compare the "interesting" product offers with each other, attribute by attribute. If the user has undertaken a localized search (asked for brick-and-mortar outlets near his or her geographic location), she or he can get a web map with all the local stores offering this product and a summarized report listing those shops.
- the present invention supplies local as well as global content automation & infrastructure.
- the preferred embodiment's market includes Internet & wireless subscribers, wireless providers, Internet portals, Internet portal infrastructure integrators, merchants, and brands. While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU79416/00A AU7941600A (en) | 1999-10-12 | 2000-10-12 | A method and system for automatically structuring content from universal marked-up documents |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15885499P | 1999-10-12 | 1999-10-12 | |
US60/158,854 | 1999-10-12 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001027712A2 true WO2001027712A2 (en) | 2001-04-19 |
WO2001027712A3 WO2001027712A3 (en) | 2001-10-18 |
Family
ID=22570004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IL2000/000648 WO2001027712A2 (en) | 1999-10-12 | 2000-10-12 | A method and system for automatically structuring content from universal marked-up documents |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU7941600A (en) |
WO (1) | WO2001027712A2 (en) |
2000
- 2000-10-12 WO PCT/IL2000/000648 patent/WO2001027712A2/en active Application Filing
- 2000-10-12 AU AU79416/00A patent/AU7941600A/en not_active Abandoned
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5987454A (en) * | 1997-06-09 | 1999-11-16 | Hobbs; Allen | Method and apparatus for selectively augmenting retrieved text, numbers, maps, charts, still pictures and/or graphics, moving pictures and/or graphics and audio information from a network resource |
US6081814A (en) * | 1997-07-07 | 2000-06-27 | Novell, Inc. | Document reference environment manager |
US5933822A (en) * | 1997-07-22 | 1999-08-03 | Microsoft Corporation | Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision |
US6006225A (en) * | 1998-06-15 | 1999-12-21 | Amazon.Com | Refining search queries by the suggestion of correlated terms from prior searches |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6931397B1 (en) * | 2000-02-11 | 2005-08-16 | International Business Machines Corporation | System and method for automatic generation of dynamic search abstracts contain metadata by crawler |
EP1253530A2 (en) * | 2001-04-26 | 2002-10-30 | Siemens Aktiengesellschaft | System and method for automatically updating product data in an electronic catalog |
EP1253530A3 (en) * | 2001-04-26 | 2005-04-20 | Siemens Aktiengesellschaft | System and method for automatically updating product data in an electronic catalog |
EP1503306A2 (en) * | 2003-07-29 | 2005-02-02 | Matsushita Electric Industrial Co., Ltd. | Information display apparatus |
EP1503306A3 (en) * | 2003-07-29 | 2006-01-04 | Matsushita Electric Industrial Co., Ltd. | Information display apparatus |
US7257573B2 (en) | 2003-07-29 | 2007-08-14 | Matsushita Electric Industrial Co., Ltd. | Information display apparatus |
EP1566750A1 (en) * | 2004-02-20 | 2005-08-24 | Brother Kogyo Kabushiki Kaisha | Data processing unit and data processing program stored in computer readable medium |
US7529748B2 (en) | 2005-11-15 | 2009-05-05 | Ji-Rong Wen | Information classification paradigm |
WO2008067191A3 (en) * | 2006-11-27 | 2008-10-02 | Designin Corp | Systems, methods, and computer program products for home and landscape design |
WO2008067191A2 (en) * | 2006-11-27 | 2008-06-05 | Designin Corporation | Systems, methods, and computer program products for home and landscape design |
US8117558B2 (en) | 2006-11-27 | 2012-02-14 | Designin Corporation | Converting web content into two-dimensional CAD drawings and three-dimensional CAD models |
US8122370B2 (en) | 2006-11-27 | 2012-02-21 | Designin Corporation | Visual bookmarks for home and landscape design |
US8260581B2 (en) | 2006-11-27 | 2012-09-04 | Designin Corporation | Joining and disjoining individual rooms in a floor plan |
US9019266B2 (en) | 2006-11-27 | 2015-04-28 | Designin Corporation | Systems, methods, and computer program products for home and landscape design |
US8868598B2 (en) | 2012-08-15 | 2014-10-21 | Microsoft Corporation | Smart user-centric information aggregation |
CN111401986A (en) * | 2020-02-28 | 2020-07-10 | 周永东 | Commodity trading method and system of trading platform |
Also Published As
Publication number | Publication date |
---|---|
WO2001027712A3 (en) | 2001-10-18 |
AU7941600A (en) | 2001-04-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103150352B (en) | System to generate related search queries | |
US7260579B2 (en) | Method and apparatus for accessing data within an electronic system by an external system | |
CN100462972C (en) | Document-based information and uniform resource locator (URL) management method and device | |
US7080064B2 (en) | System and method for integrating on-line user ratings of businesses with search engines | |
US8380721B2 (en) | System and method for context-based knowledge search, tagging, collaboration, management, and advertisement | |
US7099859B2 (en) | System and method for integrating off-line ratings of businesses with search engines | |
KR100601578B1 (en) | Summarization and clustering to conceptually classify documents | |
US6256623B1 (en) | Network search access construct for accessing web-based search services | |
US20020107718A1 (en) | "Host vendor driven multi-vendor search system for dynamic market preference tracking" | |
US20080222105A1 (en) | Entity recommendation system using restricted information tagged to selected entities | |
WO2009009192A2 (en) | Adaptive archive data management | |
US8560518B2 (en) | Method and apparatus for building sales tools by mining data from websites | |
JP2009271911A (en) | Method and system for symbolical linkage and intelligent categorization of information | |
WO2001027712A2 (en) | A method and system for automatically structuring content from universal marked-up documents | |
KR20030051577A (en) | Display method for research result in internet site | |
Kale et al. | XML Retrieval with the help of Personalization in Search: A Review | |
JP2007018476A (en) | Stock brand retrieval system | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase in: | Ref country code: US; Ref document number: 2001 752552; Date of ref document: 20010103; Kind code of ref document: A; Format of ref document f/p: F |
AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
AK | Designated states | Kind code of ref document: A3; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
122 | EP: PCT application non-entry in European phase | |
NENP | Non-entry into the national phase in: | Ref country code: JP |