WO2011000144A1 - Aggregation method of information items set directory and system thereof - Google Patents

Aggregation method of information items set directory and system thereof

Info

Publication number
WO2011000144A1
Authority
WO
WIPO (PCT)
Prior art keywords
information item
directory
collection
item collection
source
Prior art date
Application number
PCT/CN2009/072520
Other languages
French (fr)
Chinese (zh)
Inventor
时文
Original Assignee
Shi Wen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shi Wen
Priority to PCT/CN2009/072520
Publication of WO2011000144A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures

Definitions

  • The present invention relates to a computer information classification and retrieval technique, and more particularly to an aggregation method and system for an information item collection directory.

Background art

  • A file item is the unique identifier of a file within the file system in which it is located. A file item set is a set composed of n different file items, where n is an integer greater than or equal to 0. An attribute records characteristic information about some aspect of a file item and can be assigned manually.
  • A file item can appear at most once in a given file item set, but it can belong to multiple different file item sets. A file item can be assigned n attributes, each of which must be given a corresponding attribute value, where n is an integer greater than or equal to 0. A file item set can be assigned n attributes, each of which must be given a corresponding attribute value range, where n is an integer greater than or equal to 0. If a file item set has a certain attribute and a corresponding attribute value range, every file item belonging to that set must have the attribute, with a value inside the range set by the file item set. A file item may, however, have attributes that the file item sets it belongs to do not have.
  • The set-based file item classification method proposed by that application establishes the membership relationship between a file item and an existing file item set by direct designation, by assigning an attribute and setting its attribute value, or by a combination of the two.
  • The retrieval method proposed by that application obtains a result file item set from the existing file item sets by a specified set operation, by a specified attribute value range, or by a combination of the two.
  • That technical solution mainly classifies and retrieves information items for a single, independent information source in order to establish an information classification directory. In real applications, however, there is a need to access information items from multiple information sources through a unified information classification directory, because a unified retrieval entry point can effectively save the user's retrieval time and effort.
  • RSS is a typical technical solution for accessing information items from multiple network information sources through a unified information directory. Its principle is to aggregate information items from different network information sources into a tree directory structure preset on the user's local computer.
  • Although RSS aggregates information published by different websites onto one personal computer terminal, its directory structure is still the traditional tree directory, so information retrieval efficiency is low.

Summary of the invention

  • The object of the present invention is to solve the above problems by providing an aggregation method for an information item collection directory that saves the user time and effort in information retrieval.
  • Another object of the present invention is to provide an aggregation system for an information item collection directory.
  • the technical solution of the present invention is:
  • The present invention proposes an aggregation method for an information item collection directory, which aggregates n source information item collection directories into one result information item collection directory, where n is an integer greater than or equal to 1. The aggregation method includes:
  • the current result information item collection directory at this time is the aggregated result information item collection directory
  • The information item set is a set consisting of n1 information items, where n1 is an integer greater than or equal to 0, and the information item collection directory is composed of n2 information item sets, where n2 is an integer greater than or equal to 0.
  • An aggregation system for an information item collection directory is also proposed, including:
  • an input module configured to input n source information item collection directories, where n is an integer greater than or equal to 1;
  • an initialization module, connected to the input module, which sets an initial result information item collection directory to an information item collection directory composed of 0 information item sets;
  • a priority order determining module, connected to the initialization module, which determines the processing priority order among the source information item collection directories;
  • a directory aggregation module, connected to the priority order determining module, which processes the n source information item collection directories one by one in priority order, each time aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;
  • an output module, connected to the directory aggregation module, which outputs the current result information item collection directory as the final result information item collection directory once all n source information item collection directories have been processed;
  • where the information item set is a set consisting of n1 information items, n1 being an integer greater than or equal to 0, and the information item collection directory is composed of n2 information item sets, n2 being an integer greater than or equal to 0.
  • The present invention addresses two problems: the application described in the background section handles only a single information source, and RSS-like multi-source aggregation schemes retrieve inefficiently because they rely on the traditional tree directory.
  • The invention aggregates multiple information item collection directories, each established for a single information source, into one unified information item collection directory. This yields a unified information classification directory through which information items from multiple information sources can be retrieved, saving the user's retrieval time and effort.
  • Figure 1 is an information item collection directory structure for music information item classification.
  • FIG. 2 is a flow chart showing an embodiment of an aggregation method of an information item collection directory of the present invention.
  • Figure 3 is a flow chart of processing the current source information item collection directory using the DAG topology sorting algorithm.
  • Figure 4 is a DAG description of a sample source information item collection directory structure for explaining the current source information item collection directory processing flow.
  • Figure 5 is a DAG description of the current result information item collection directory for the current source information item collection directory processing flow.
  • Fig. 6 is a DAG description of the result information item collection directory obtained by aggregating the directory described in Fig. 4, as the current source information item collection directory, with the directory described in Fig. 5, as the current result information item collection directory.
  • Figure 7 is the structure of an information item collection directory that classifies movie information items from the Y website.
  • Figure 8 is the structure of an information item collection directory that classifies news information items from the X website.
  • Fig. 9 is a structural plan for a result information item collection directory in which the information item set directories shown in Figs. 1, 7, and 8 are aggregated.
  • FIG. 10 is a first priority source information item set directory structure used to aggregate the information item set list shown in FIG. 1, FIG. 7, and FIG.
  • FIG. 11 is the structure of the result information item collection directory obtained by aggregating the information item collection directories shown in FIG. 1, FIG. 7, and FIG. 8 with the first-priority source information item collection directory shown in FIG. 10.
  • Figure 12 is a schematic diagram of an embodiment of an aggregation system of an information item collection directory of the present invention.
  • Figure 13 is a schematic diagram of a directory aggregation module in the system of the embodiment of Figure 12.
  • the information item in the present invention is defined as an information structure that can be processed and presented to a user on a computer system as a logical whole.
  • a file in a file system is the most typical example of an information item, but not only a file is an information item.
  • a record in a relational database is physically stored as part of a database file, but it can be logically processed and presented to the user as a whole, and thus can be considered a type of information item.
  • an e-mail in a mail delivery software such as Outlook is stored as part of a mail box file, but logically can be processed and displayed as a whole, and thus can be considered as a type of information item.
  • information items are often represented by a unique identifier.
  • For example, a file in the operating system is represented by a unique file path, and a web page on the Internet can be represented by a unique URL.
  • A simple way to keep the identifiers of different types of information items distinct is to add a type-specific prefix to each information item identifier. For example, when a browser loads a local file, the URL shown in the address bar is the operating-system file path with "file://" added in front as a prefix. How to create an information item collection directory for an information source, and how to retrieve information items within it, can follow the application mentioned in the background section.
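  • The prefixing idea above can be sketched as follows. This is a minimal illustration only: the helper name `make_identifier` and the "mail" and "db" prefixes are hypothetical and do not come from the patent; only "file://" is mentioned in the text.

```python
def make_identifier(kind: str, local_id: str) -> str:
    """Prefix a source-local identifier with its type (hypothetical helper)."""
    return f"{kind}://{local_id}"


# The same local identifier "42" from two kinds of sources stays distinct
# once each one carries its own type prefix ("mail" and "db" are assumed names).
items = {
    make_identifier("mail", "42"),
    make_identifier("db", "42"),
}

# A local file path prefixed in the style a browser uses for local files.
local_file = make_identifier("file", "/home/user/song.mp3")
```

  • With such prefixes, one information item set can hold files, mail messages, and database records side by side without identifier collisions.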
  • An information item set is a set in the mathematical sense consisting of n1 information items, where n1 is an integer greater than or equal to 0.
  • An information item set can contain information items of various types, as long as they can be effectively distinguished by the method described in the previous paragraph.
  • An information item collection directory is an information structure composed of n2 information item sets linked by parent-child relationships, where n2 is an integer greater than or equal to 0.
  • The concept of a parent set is defined as follows: if information item set A is designated as the parent set of information item set B, then all information items contained in B are also contained in A; B, in turn, is called a child set of A.
  • When n2 equals 0, the information item collection directory does not contain any information item set, so it is also called an empty directory.
  • When n2 is greater than or equal to 1, the directory must contain a unique root information item set (also referred to as the root set). The root set has no parent set, and every other information item set in the directory must have k parent sets belonging to the directory, where k is an integer greater than or equal to 1.
  • The items contained in an information item set fall into two categories: items directly assigned to the set, called direct affiliated items, and items contained in its child sets, called indirect affiliated items.
  • Figure 1 shows an example of an information item collection directory for classifying music files.
  • "Music" is the root information item set of this directory; "Author", "Region", and "Year" are its child sets, and so on.
  • West Life is a male group, so its corresponding "West Life" information item set in the directory has two parent sets, "Male Author" and "Group Combination". The same applies to the "Atomic Kitten" information item set. "You raise me up.mp3" is an mp3 file of a popular song that West Life sang in 2005; according to this information it is designated a direct affiliated item of "West Life", "European", "2005", and "Popular". Since "Male Author" and "Group Combination" are parent sets of "West Life", "You raise me up.mp3" is an indirect affiliated item of those two sets and, by the same reasoning, of "Author" and "Music". The same applies to "If you come to me.mp3".
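  • The distinction between direct and indirect affiliated items can be sketched with a small data structure. The class `ItemSet` and its method names are assumptions made for illustration, and only a fragment of the Figure 1 directory is modelled; placing "Male Author" and "Group Combination" directly under "Author" is likewise an assumption, since the figure itself is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare sets by identity, not by field values
class ItemSet:
    name: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    direct_items: set = field(default_factory=set)

    def add_child(self, child: "ItemSet") -> None:
        """Record a parent-child relationship in both directions."""
        self.children.append(child)
        child.parents.append(self)

    def all_items(self) -> set:
        """Direct items plus every item contained in any descendant set."""
        items = set(self.direct_items)
        for child in self.children:
            items |= child.all_items()
        return items

    def indirect_items(self) -> set:
        """Items that reach this set only through its child sets."""
        return self.all_items() - self.direct_items


# Assumed fragment of the music directory described in the text.
music = ItemSet("Music")
author = ItemSet("Author")
male = ItemSet("Male Author")
group = ItemSet("Group Combination")
west_life = ItemSet("West Life")
music.add_child(author)
author.add_child(male)
author.add_child(group)
male.add_child(west_life)   # "West Life" has two parent sets
group.add_child(west_life)
west_life.direct_items.add("You raise me up.mp3")
```

  • In this sketch the file is a direct affiliated item of "West Life" only, and `indirect_items()` reports it for "Male Author", "Group Combination", "Author", and "Music".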
  • FIG. 2 shows the aggregation method of the information item collection directory of this embodiment, which aggregates n source information item collection directories into one result information item collection directory, where n is an integer greater than or equal to 1.
  • Step S10: Set an initial result information item collection directory to an empty directory, that is, an information item collection directory composed of 0 information item sets.
  • Step S12: Set the priority order of the n source information item collection directories, and store the source information item collection directories in a queue data structure.
  • The processing priority order of the source information item collection directories affects the final result: the final directory structure is organized preferentially according to the structure of the source information item collection directories that come earlier in the processing order.
  • For example, suppose a user has established a directory structure on a local computer and uses this method to aggregate the directories of multiple network information sources onto that computer. If the local directory is processed as the first-priority source directory, the resulting directory structure is organized preferentially according to the local directory structure. In this way, users can retrieve information items from multiple sources using a directory structure they already know.
  • Step S14: Determine whether the processing queue is empty. If the queue is not empty, proceed to step S16; if the queue is empty, the process ends.
  • Step S16: Take the head element of the queue as the currently processed source information item collection directory and perform a dequeue operation.
  • The dequeue operation removes the head element from the queue and makes its successor the new head element.
  • Step S18: Aggregate the current source information item collection directory and the current result information item collection directory into a new result information item collection directory, and use the new result information item collection directory as the current result information item collection directory in the next loop iteration. Then return to step S14.
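  • The loop formed by steps S10 to S18 can be sketched as below. The function names `aggregate` and `toy_merge` are hypothetical, and `toy_merge` is only a stand-in for step S18: it treats a directory as a flat dictionary and lets entries already in the result win, which imitates the priority behaviour but not the real set mapping.

```python
from collections import deque


def aggregate(sources_in_priority_order, merge_one):
    """Run steps S10-S18 over the given source directories."""
    result = {}                                # S10: start from an empty directory
    queue = deque(sources_in_priority_order)   # S12: queue in priority order
    while queue:                               # S14: stop when the queue is empty
        current = queue.popleft()              # S16: take the head and dequeue it
        result = merge_one(current, result)    # S18: merge into a new result
    return result


def toy_merge(source, result):
    """Stand-in for S18: entries already in the result take precedence."""
    return {**source, **result}


local = {"music": "local", "photos": "local"}
site_y = {"music": "site Y", "movies": "site Y"}
merged = aggregate([local, site_y], toy_merge)
```

  • Because `local` is processed first, its "music" entry is kept, while "movies" from the later source fills a gap the local directory did not cover.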
  • When the first-priority source information item collection directory is processed, the current result information item collection directory is an empty directory, so the result information item collection directory obtained at that point is a copy of the first-priority source information item collection directory. The source information item collection directories processed later supplement the structure of the first-priority source information item collection directory, and the complete result information item collection directory is finally obtained. In a specific implementation, the first-priority source information item collection directory therefore generally serves as the blueprint for the main frame of the final result information item collection directory and, together with the predetermined mapping rules described later, can be used to adjust the structure of the final result information item collection directory.
  • Processing a current source information item collection directory composed of 0 information item sets is a no-op: the directory is treated as processed without any operation.
  • Processing a current source information item collection directory composed of one or more information item sets proceeds as follows: starting from the root information item set, the information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory. The root information item set is the information item set in a directory that has no parent set. An information item collection directory composed of one or more information item sets must have a unique root information item set.
  • The mapping mentioned here is a conceptual extension of the mathematical mapping, defined as $f_i: S_i \to D$, where $S_i$ is the source information item collection directory and $D$ is the result information item collection directory. For every information item set in $S_i$, there is a corresponding information item set in the result information item collection directory $D$.
  • The mapping relationship is established by finding, according to a predetermined mapping rule, the information item set in the result information item collection directory that corresponds to the currently processed source information item set; that set becomes its mapped information item set. If the predetermined mapping rule finds no corresponding information item set in the current result information item collection directory, a new information item set is created there as the mapped information item set, so that the definition of the mapping is satisfied.
  • the direct affiliated items in the source information item set are specified as direct affiliated items of the mapping information item set.
  • The mapping relationship can be established according to predetermined mapping rules.
  • For example, suppose each information item set is given a name that conveys its content: the information item set containing video information items is named "video", the information item set containing audio information items is named "audio", and so on. A same-name mapping rule can then be predefined: the information item set in the result information item collection directory that has the same name as the source information item set is taken as its mapped information item set.
  • The purpose is to aggregate the information item sets that contain similar information items in several source information item collection directories into one information item set in the result information item collection directory, so that similar information items from different sources can be retrieved as a whole in the result information item collection directory. The predetermined mapping rule in this embodiment is therefore generally established on the principle of mapping each source information item set to the information item set with similar content in the result information item collection directory.
  • When a new information item set is created in the current result information item collection directory as the mapped information item set, the parent sets of the new mapped information item set are specified as the mapped information item sets, in the result information item collection directory, of the source information item set's parent sets in the source information item collection directory.
  • A new information item set is created because the source information item set corresponds to no existing information item set in the result information item collection directory. Since the new information item set has no existing parent-child relationship in the result information item collection directory, specifying its parent sets according to the above principle transplants the corresponding parent-child structure of the source information item collection directory into the result information item collection directory.
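  • The mapping of a single source set, including the same-name rule and the transplanting of parent-child relationships, might look like the sketch below. The dictionary layout (set name mapped to "parents" and "items") and the function name `map_set` are assumptions for illustration, and the sketch relies on the parent sets having been mapped before their child sets.

```python
def map_set(name, source, result, mapping):
    """Map one source set into `result` under an assumed same-name rule."""
    src = source[name]
    if name not in result:
        # No same-named set exists yet: create one and transplant the
        # parent-child relationships via the already mapped parent sets.
        mapped_parents = {mapping[p] for p in src["parents"]}
        result[name] = {"parents": mapped_parents, "items": set()}
    mapping[name] = name  # same-name rule: the mapped set carries the same name
    # Direct affiliated items of the source set become direct items of the mapped set.
    result[name]["items"] |= src["items"]
    return mapping[name]


source = {
    "music": {"parents": set(), "items": set()},
    "author": {"parents": {"music"}, "items": {"a.mp3"}},
}
result = {"music": {"parents": set(), "items": {"b.mp3"}}}
mapping = {}
map_set("music", source, result, mapping)   # found by name, nothing created
map_set("author", source, result, mapping)  # created under the mapped "music"
```

  • Here "music" is matched to the existing set, while "author" is new and therefore inherits "music" as its parent in the result directory.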
  • The overall effect of processing the source information item collection directories in a given priority order is that the structures of lower-priority source information item collection directories fill the gaps in the structures of higher-priority ones. The main structure of the final result information item collection directory is organized according to the structure of the higher-priority source information item collection directories, while the branch structures those directories do not contain are organized according to the structure of the source information item collection directories that follow in the priority order.
  • The mapping in step S18 starts from the root information item set, but it can proceed in various orders as long as the parent set designation principle for new mapped information item sets is satisfied.
  • A better implementation treats the directory as a directed acyclic graph (DAG) and processes the information item sets in the order given by the DAG topological sorting algorithm.
  • The advantage of this algorithm is that its order guarantees that the mapped information item sets of a set's parents in the source information item collection directory already exist in the current result information item collection directory, so only the corresponding parent-child relationships need to be established.
  • FIG. 4 is the DAG converted from the directory of Figure 1.
  • The DAG topological sorting algorithm is used to aggregate the current source information item collection directory with the current result information item collection directory; see Figure 3. The specific steps are as follows.
  • Step S20: Add the root information item set of the current source information item collection directory to the pending set list, and initialize the mapping relationship table as an empty table.
  • Step S22: Determine whether the pending set list is empty. If it is empty, the processing flow ends; if it is not empty, proceed to step S24.
  • Step S24: Select from the pending set list any set whose in-degree is 0, that is, whose parent sets have all been mapped, take it out of the list, and make it the current set.
  • Step S26: Map the current set into the result information item collection directory, and record the mapping relationship in the mapping relationship table.
  • Step S28: Add the child sets of the current set to the pending set list (if they are not already in the list), decrement by one the in-degree of each child set of the current set in the list, and go to step S22.
  • The DAG topological sorting algorithm uses two additional data structures.
  • One is the pending set list, consisting of the information item sets waiting to be processed and their corresponding in-degrees.
  • The initial in-degree of an information item set is its number of parent sets. Whenever one of its parent sets completes the mapping process, its in-degree is reduced by 1. When its in-degree reaches 0, all of its parent sets have been mapped.
  • The other additional data structure is the mapping relationship table, which records, as the result of the processing procedure, the mapped information item set in the result information item collection directory for each source information item set.
  • At the start, the root information item set is added to the pending set list because it is the only information item set in the source information item collection directory that has no parent set, so its initial in-degree is 0, and every information item set in the source information item collection directory can be reached in turn from the root information item set.
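  • Steps S20 to S28 can be sketched as follows. The function name `topological_map`, the dictionary layout, and the callback `map_one` are assumptions for illustration; the sketch also assumes that step S24 picks a set whose in-degree has reached 0 and removes it from the pending list.

```python
def topological_map(source, root, map_one):
    """Map every set of `source` so that parents are handled before children."""
    pending = {root: 0}   # S20: pending list of sets with their in-degrees
    mapping = {}          # S20: empty mapping relationship table
    order = []            # processing order, kept only for inspection
    while pending:        # S22: finish when nothing is pending
        # S24: pick any pending set whose parent sets have all been mapped.
        current = next(name for name, degree in pending.items() if degree == 0)
        del pending[current]
        mapping[current] = map_one(current)   # S26: map and record
        order.append(current)
        for child in source[current]["children"]:   # S28
            if child not in pending:
                pending[child] = len(source[child]["parents"])
            pending[child] -= 1
    return order, mapping


source = {
    "Music": {"parents": set(), "children": ["Author"]},
    "Author": {"parents": {"Music"}, "children": ["Male Author", "Group Combination"]},
    "Male Author": {"parents": {"Author"}, "children": ["West Life"]},
    "Group Combination": {"parents": {"Author"}, "children": ["West Life"]},
    "West Life": {"parents": {"Male Author", "Group Combination"}, "children": []},
}
order, mapping = topological_map(source, "Music", lambda name: name)
```

  • In this run "West Life" enters the pending list with in-degree 2 and is processed only after both of its parent sets, which is the guarantee the topological order is meant to give.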
  • Besides the DAG topological sorting algorithm shown in Fig. 3, a directed-graph depth-first traversal algorithm can also be used to complete step S18 shown in Fig. 2.
  • The process of using the directed-graph depth-first traversal algorithm can be described by the following recursive function:
      void MapSet(NODE_TYPE* pSetNode, NODE_TYPE* pParent, NODE_TYPE* pDest, MAPLIST_TYPE* pMapList) {
          do_map(pSetNode, pParent, pDest, pMapList);
          pChilds = get_childs(pSetNode);
          for (each child set pChilds[i])
              MapSet(pChilds[i], pSetNode, pDest, pMapList);
      }
      MapSet(pSrcRoot, NULL, pDest, pMapList);
  • The MapSet function maps all information item set nodes of the directory, starting from pSetNode, into the result information item collection directory pDest in depth-first order and records the mapping correspondence in pMapList; pParent is the parent set of the pSetNode node.
  • The process only needs to call MapSet with the root information item set node pSrcRoot of the source information item collection directory as the starting node; recursion then completes the mapping of all information item sets in the entire source information item collection directory.
  • The do_map function maps a single information item set into the result information item collection directory and records the mapping relationship.
  • The get_childs function returns a list of all child nodes of an information item set node.
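  • A runnable counterpart of the depth-first recursion might look like the sketch below. It is written in Python rather than in the notation of the patent, the names `map_set_dfs`, `source`, and `result` are hypothetical, and a same-name rule is assumed. One point it makes explicit is that a set with several parent sets is reached once per parent, so the per-set mapping step has to tolerate repeated visits.

```python
def map_set_dfs(node, parent, source, result, mapping):
    """Depth-first mapping of `node` and everything below it."""
    if node not in mapping:
        # First visit: apply the assumed same-name rule and create the
        # mapped set if the result directory has no set with that name.
        mapping[node] = node
        result.setdefault(node, {"parents": set()})
    if parent is not None:
        # Every visit contributes one parent-child link, so a set reached
        # through two parents ends up with both of them.
        result[mapping[node]]["parents"].add(mapping[parent])
    for child in source[node]:
        map_set_dfs(child, node, source, result, mapping)


# Source directory as: set name -> list of child set names (assumed layout).
source = {
    "Music": ["Author"],
    "Author": ["Male Author", "Group Combination"],
    "Male Author": ["West Life"],
    "Group Combination": ["West Life"],
    "West Life": [],
}
result, mapping = {}, {}
map_set_dfs("Music", None, source, result, mapping)
```

  • Unlike the topological order, this traversal creates "West Life" when it is first reached through "Male Author" and adds the link to "Group Combination" on the second visit.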
  • Because "West Life" and "Atomic Kitten" are processed after their parent set nodes, once their mapped information item sets have been created in the result information item collection directory, it is only necessary to establish the parent-child relationships between those mapped sets and the mapped information item sets of all their parent sets.
  • The root information item set of a non-first-priority source information item collection directory is a special case. On the one hand, it has no parent set. On the other hand, the location of its mapped information item set in the result information item collection directory affects the whole structure generated by mapping that source information item collection directory into the result information item collection directory, so its mapping is generally adjusted in the implementation.
  • Figure 7 depicts the structure of an information item collection directory of movie information items from a website named Y, and Figure 8 depicts the structure of an information item collection directory of news information items from the X website. It is now assumed that the information item collection directories described in Figures 1, 7, and 8 need to be aggregated into a unified result information item collection directory.
  • Figure 1 and Figure 7 both have a directory structure divided by author. In the same way, a unified author subdirectory structure is constructed to obtain the final result information item collection directory, whose structure is roughly as shown in Figure 9. This structure removes the "news category", "music style", and "movie style" set nodes of the source information item collection directory structures to make the overall structure more streamlined and ease information retrieval, and it lets the "region" subdirectory structure serve the classification of news, movies, and music at the same time.
  • The final result information item collection directory structure contains the entire structure of the first-priority source information item collection directory, so in the worst case the designed final result information item collection directory structure could itself serve as the first-priority source information item collection directory; in practice, however, this is impossible to achieve. In practice, information directories from multiple websites are often aggregated onto a personal computer terminal, and the user of that computer has limited knowledge of the directories from any particular website. For example, for the directory described in FIG.
  • the source information item collection directories can yield the designed result information item collection directory structure through the aggregation process, so the first-priority source information item collection directory should be the smallest part of the designed result information item collection directory structure from which the final directory structure can still be obtained through the aggregation process. The structure of the first-priority source information item collection directory designed according to this principle is shown in Figure 10.
  • The predetermined mapping rules are then determined. They work together with the first-priority source information item collection directory to produce the designed final result information item collection directory. Based on the designed final result information item collection directory structure and the first-priority source information item collection directory, the following mapping rules can be designed:
  • Same-name mapping rule: an information item set in a source information item collection directory is mapped to the information item set with the same name in the result information item collection directory;
  • Once the processing order of the source information item collection directories and the predetermined mapping rules are determined, the aggregation operation can be performed by the method described in the previous embodiment to obtain the final result information item collection directory as shown in FIG.
  • Fig. 12 shows an embodiment of the aggregation system of such information item collection directory. Referring to Fig. 12, the following is a detailed description of the aggregation system of the information item collection directory of the present embodiment.
  • the aggregation system of the information item collection directory of this embodiment includes the following modules connected in sequence: an input module 10, an initialization module 12, a priority order determination module 14, a directory aggregation module 16, and an output module 18 input module 10 for inputting a source information item collection directory, where criz is an integer greater than or equal to 1.
  • The initialization module 12 sets an initial result information item collection directory. This initial value is an empty directory, that is, an information item collection directory composed of 0 information item sets.
  • The priority order determination module 14 determines the processing order among the n source information item collection directories.
  • The processing priority order of the source information item collection directories affects the final result: the final directory structure is organized preferentially according to the structure of the source directories that come earlier in the processing order.
  • Consider the following application scenario: the user has established a directory structure on a local computer and uses this method to aggregate the directories of multiple information sources on the network to the local computer. If the local directory is used as the first-priority source directory, the final aggregated directory structure will be organized preferentially according to the local directory structure. In this way, users can easily retrieve information items from multiple sources using a directory structure they are familiar with.
  • The directory aggregation module 16 processes the n source information item collection directories one by one according to the priority order determined by the priority order determination module 14, aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory.
  • To reflect the sequential processing, the directory aggregation module 16 stores the n source information item collection directories in a queue data structure. It will be readily understood by those skilled in the art that the queue is used only to process the source information item collection directories in order; other data structures and algorithms that process the source directories in order are therefore equivalent to this embodiment.
  • the directory aggregation module 16 is further subdivided into a first aggregation unit 160 and a second aggregation unit 162.
  • The current source information item collection directory entering the directory aggregation module 16 is processed by one of the two units. If the current source information item collection directory is an empty directory (that is, composed of 0 information item sets), the first aggregation unit 160 performs empty processing: without carrying out any operation, it treats the directory as an already processed source information item collection directory.
  • If the current source information item collection directory is composed of one or more information item sets, processing is performed by the second aggregation unit 162: starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory. The root information item set is defined as the information item set in an information item collection directory that has no parent set; an information item collection directory composed of one or more information item sets must have a unique root information item set.
  • The second aggregation unit 162 further includes an information item set mapping unit 1620, an information item set creating unit 1622, a parent set specifying unit 1624, and a direct membership item specifying unit 1626.
  • These units are connected as follows: the parent set specifying unit 1624 is connected to the information item set creating unit 1622, and the direct membership item specifying unit 1626 is connected to both the information item set mapping unit 1620 and the information item set creating unit 1622.
  • Internally, the second aggregation unit 162 works as follows. The source information item sets in the current source information item collection directory that enter the second aggregation unit 162 fall into two categories: those for which a corresponding information item set is found in the current result information item collection directory according to the predetermined mapping rule, and those for which no corresponding information item set is found in the current result information item collection directory according to the predetermined mapping rule.
  • For the first category, the information item set mapping unit 1620 finds, in the current result information item collection directory according to the predetermined mapping rule, the information item set corresponding to the currently processed source information item set; this is the mapping information item set of the currently processed source information item set in the current result information item collection directory.
  • the direct membership item in the source information item set is then designated by the direct membership item designation unit 1626 as a direct membership item of the mapping information item set in the result information item collection directory.
  • The information items contained in an information item set are divided into direct membership items and indirect membership items: a direct membership item is an information item directly assigned to the information item set, and an indirect membership item is an information item contained in a subset of the information item set.
  • An information item may be a direct membership item of two or more information item sets.
  • For the second category, the information item set creating unit 1622 creates a new information item set in the current result information item collection directory as the mapping information item set corresponding to the source information item set.
  • The parent set specifying unit 1624 is then started.
  • For a source information item set that has parent sets in the source information item collection directory, the parent set specifying unit 1624 specifies the parent sets of the new mapping information item set: they are the mapping information item sets, in the result information item collection directory, of the parent sets that the source information item set has in the source information item collection directory.
  • the direct membership item in the source information item set is then designated as a direct membership item of the mapping information item set in the result information item collection directory by the direct membership item specifying unit 1626.
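The cooperation of units 1620, 1622, 1624 and 1626 can be sketched as follows. This is a minimal illustration under stated assumptions: a directory is modelled as a dictionary from set name to a record with `parents` and `members`, the same-name rule stands in for the predetermined mapping rule, and the source sets are supplied parents-first. The function name `aggregate_directory` and the record layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Directory = Dict[str, Dict[str, Set[str]]]  # name -> {"parents": ..., "members": ...}
SourceSet = Tuple[str, Set[str], Set[str]]  # (name, parent names, direct membership items)


def aggregate_directory(source: List[SourceSet], result: Directory) -> Directory:
    """Merge one source directory into the result directory, in place."""
    mapping: Dict[str, str] = {}  # source set name -> mapped result set name
    for name, parents, members in source:  # assumed order: parents before children
        target = name  # same-name rule assumed as the predetermined mapping rule
        if target not in result:
            # Unit 1622 creates the new mapping set; unit 1624 gives it the
            # mapped sets of the source parents as its parents.
            result[target] = {"parents": {mapping[p] for p in parents}, "members": set()}
        # Otherwise unit 1620 has found an existing corresponding set.
        mapping[name] = target
        # Unit 1626: direct membership items carry over to the mapping set.
        result[target]["members"] |= members
    return result


result: Directory = {"Music": {"parents": set(), "members": set()}}
source: List[SourceSet] = [
    ("Music", set(), set()),
    ("Author", {"Music"}, set()),
    ("West Life", {"Author"}, {"You raise me up.mp3"}),
]
aggregate_directory(source, result)
print(sorted(result))
```

Because parents are processed first, the lookup `mapping[p]` always succeeds for a well-formed source directory; an unordered input would raise `KeyError` in this sketch.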
  • the predetermined mapping rule described in this embodiment is the same as the predetermined mapping rule in the foregoing method embodiment. For details, refer to the description of the predetermined mapping rule in the foregoing method embodiment.
  • The second aggregation unit 162 processes the source information item sets in the order given by the topological sorting algorithm of the directed acyclic graph: starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory.
  • a directed graph depth-first traversal algorithm can also be employed.
  • the topological sorting algorithm and the directed graph depth-first traversal algorithm of the directed acyclic graph in the present embodiment have been described in the foregoing method embodiments, and therefore will not be described herein.
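A parents-first processing order of the kind described above can be obtained with a topological sort. The sketch below uses Python's standard `graphlib` module as a stand-in for the algorithm of the method embodiment; the directory fragment and its set names are illustrative assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical fragment of a music directory: set name -> its parent sets.
parents = {
    "Music": set(),
    "Author": {"Music"},
    "Male Author": {"Author"},
    "Group": {"Author"},
    "West Life": {"Male Author", "Group"},
}

# TopologicalSorter takes node -> predecessors, so parents come out before children.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

A set with several parents, such as "West Life" here, appears only after all of its parents, which is what allows the parent sets of a newly created mapping set to be resolved at creation time.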

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An aggregation method for information item collection directories and a system thereof are disclosed. The aggregation method includes: setting an initial result information item collection directory as an information item collection directory composed of 0 information item sets; determining the processing priority order among the n source information item collection directories; processing the n source information item collection directories one by one according to the priority order, aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory; and, after the n source information item collection directories have been processed, taking the current result information item collection directory as the aggregated result information item collection directory.

Description

Aggregation method and system for information item collection directories

Technical Field

The present invention relates to computer information classification and retrieval technology, and more particularly to an aggregation method and system for information item collection directories.

Background Art

The applicant filed a patent application on July 20, 2007, published on April 9, 2008, with application number 200710009235.7 and publication number CN101158949, entitled "Method and System for Classification and Retrieval of File Items Based on Sets". In that application, the applicant proposed set-theory-based classification and retrieval of file items, which solved the following problems of existing tree directories: (1) multi-category classification conflicts; (2) branch explosion caused by pattern combination; (3) fixed classification and retrieval modes; (4) the inability to classify and retrieve information items according to attribute values.
To solve the above technical problems, the applicant introduced the following concepts: a file item is the unique identifier of a file in the file system where it resides; a file item set is a set composed of n different file items, where n is an integer greater than or equal to 0; an attribute records characteristic information about some aspect of a file item and can be assigned manually. The conditions they satisfy include: a file item can appear at most once in a file item set, but a file item can belong to multiple different file item sets; a file item can be assigned n attributes, and a corresponding attribute value must be set for each assigned attribute, where n is an integer greater than or equal to 0; a file item set can be assigned n attributes, and a corresponding attribute value range must be set for each assigned attribute, where n is an integer greater than or equal to 0; if a file item set has a certain attribute and a corresponding attribute value range, then a file item belonging to that file item set must have that attribute, and its attribute value must lie within the range set by the file item set, but a file item may have attributes that the file item sets it belongs to do not have.
The set-based classification method for file items proposed in that application establishes the membership relationship between a file item and an existing file item set by direct designation, by assigning an attribute and setting its attribute value, or by a combination of the two. The retrieval method proposed in that application obtains a result file item set from existing file item sets by specifying a set operation pattern, by specifying an attribute value range, or by a combination of the two.
However, that technical solution mainly addresses establishing an information classification directory for a single independent information source in order to classify and retrieve information items, whereas in practice there is a need to access information items from multiple information sources through one unified information classification directory, because a unified retrieval entry saves the time the user would otherwise spend visiting multiple information sources. For example, RSS is a typical technical solution for accessing information items from multiple network information sources through a unified information directory. Its principle is to aggregate information items from different network information sources into a tree directory structure preset on the user's local computer.
Although RSS aggregates information published by multiple websites onto one personal computer terminal, its directory structure is still a traditional tree directory, which makes information retrieval inefficient compared with the retrieval scheme in the above application.

Summary of the Invention

The object of the present invention is to solve the above problems by providing an aggregation method for information item collection directories that saves the user time and effort in information retrieval.
Another object of the present invention is to provide an aggregation system for information item collection directories.
The technical solution of the present invention is as follows. The present invention provides an aggregation method for information item collection directories, which aggregates n source information item collection directories into one result information item collection directory, where n is an integer greater than or equal to 1. The aggregation method includes:
(1) setting an initial result information item collection directory as an information item collection directory composed of 0 information item sets;
(2) determining the processing priority order among the n source information item collection directories;
(3) processing the n source information item collection directories one by one according to the priority order, aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;
(4) after the n source information item collection directories have been processed, taking the current result information item collection directory as the aggregated result information item collection directory;
wherein an information item set is a set composed of n1 information items, n1 being an integer greater than or equal to 0, and an information item collection directory is composed of n2 information item sets, n2 being an integer greater than or equal to 0.
The present invention also provides an aggregation system for information item collection directories, including:
an input module, for inputting n source information item collection directories, where n is an integer greater than or equal to 1;
an initialization module, connected to the input module, which sets an initial result information item collection directory as an information item collection directory composed of 0 information item sets;
a priority order determining module, connected to the initialization module, which determines the processing priority order among the n source information item collection directories;
a directory aggregation module, connected to the priority order determining module, which processes the n source information item collection directories one by one according to the priority order, aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;
an output module, connected to the directory aggregation module, which outputs the current result information item collection directory obtained after the n source information item collection directories have been processed as the final result information item collection directory;
wherein an information item set is a set composed of n1 information items, n1 being an integer greater than or equal to 0, and an information item collection directory is composed of n2 information item sets, n2 being an integer greater than or equal to 0.
Compared with the prior art, the present invention has the following beneficial effects. It addresses two problems: the application described in the Background Art section mainly retrieves from a single information source, and RSS-like multi-source aggregation schemes retrieve inefficiently because they use traditional tree directories. By aggregating multiple information item collection directories, each established for a single information source, into one unified information item collection directory, the invention provides a unified information classification directory for retrieving information items from multiple information sources, thereby saving the user time and effort in information retrieval.

Brief Description of the Drawings

Figure 1 is an information item collection directory structure for classifying music information items.
Figure 2 is a flowchart of an embodiment of the aggregation method for information item collection directories of the present invention.
Figure 3 is a flowchart of processing the current source information item collection directory using the DAG topological sorting algorithm.
Figure 4 is a DAG description of a sample source information item collection directory structure, used to illustrate the processing flow of the current source information item collection directory.
Figure 5 is a DAG description of the current result information item collection directory, used to illustrate the processing flow of the current source information item collection directory.
Figure 6 is a DAG description of the result information item collection directory obtained by aggregation, with the directory of Figure 4 as the current source information item collection directory and the directory of Figure 5 as the current result information item collection directory.
Figure 7 is an information item collection directory structure for classifying movie information items from website Y.
Figure 8 is an information item collection directory structure for classifying news information items from website X.
Figure 9 is the structural plan for the result information item collection directory into which the information item collection directories shown in Figures 1, 7 and 8 are to be aggregated.
Figure 10 is the first-priority source information item collection directory structure used for aggregating the information item collection directories shown in Figures 1, 7 and 8.
Figure 11 is the result information item collection directory structure obtained by aggregating the information item collection directories shown in Figures 1, 7 and 8 together with the first-priority source information item collection directory shown in Figure 10.
Figure 12 is a schematic diagram of an embodiment of the aggregation system for information item collection directories of the present invention.
Figure 13 is a schematic diagram of the directory aggregation module in the system of the embodiment of Figure 12.

Best Mode for Carrying Out the Invention

The invention is further described below with reference to the drawings and embodiments.
Before the embodiments are described, some terms used in the present invention are first defined and explained. An information item in the present invention is defined as an information structure that can be processed on a computer system and presented to the user as a logical whole. A file in a file system is the most typical instance of an information item, but files are not the only information items. For example, a record in a relational database is physically stored as part of a database file, but logically it can be processed and presented to the user as a whole, so it can be regarded as a type of information item. Similarly, an e-mail in mail software such as Outlook is stored as part of a mailbox file, but logically it can likewise be processed and presented as a whole, so it too can be regarded as a type of information item.
In practice, an information item is usually represented by a unique identifier: a file in an operating system is represented by its unique file path, and a web page on the Internet can be represented by its unique URL. If several kinds of information items need to be handled, a simple approach is to add a type-distinguishing prefix to each information item identifier; for example, when a browser loads a local file, the URL shown in the URL bar is the operating system file path prefixed with "file://". How to establish an information item collection directory for one information source, and how to retrieve information items in an information item collection directory, can therefore be found in the application mentioned in the Background Art section.
An information item set is a set, in the mathematical sense, composed of n1 information items, where n1 is an integer greater than or equal to 0. In practice, one information item set can contain several kinds of information items, as long as they can be effectively distinguished by a method such as the one described in the preceding paragraph.
An information item collection directory is an information structure composed of n2 information item sets linked by parent-child relationships, where n2 is an integer greater than or equal to 0. The parent set concept is defined as follows: if information item set A is designated as the parent set of information item set B, then all information items contained in B are also contained in A; conversely, B is called a subset of A. When n2 is 0, the information item collection directory contains no information item set and is therefore also called an empty directory. When n2 is greater than or equal to 1, the directory must contain a unique root information item set (root set for short), which has no parent set; every other information item set in the directory must have k parent sets belonging to the directory, where k is an integer greater than or equal to 1. According to this definition of the parent-child relationship, the items contained in an information item set fall into two categories: items directly assigned to the set, called direct membership items; and items contained in its subsets, called indirect membership items.
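The definitions above (unique root set, parent sets, direct and indirect membership items) can be sketched in Python as follows. This is a minimal illustration, assuming a directory is stored as a dictionary from set name to the names of its parent sets; the function names `root_of` and `all_items` and the sample set names are illustrative and not taken from the patent. The sketch assumes the directory is acyclic and does not check for cycles.

```python
from typing import Dict, Set


def root_of(parents: Dict[str, Set[str]]) -> str:
    """Return the unique root set (the only set without parents) or raise ValueError."""
    roots = [name for name, ps in parents.items() if not ps]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root set, found {len(roots)}")
    return roots[0]


def all_items(name: str, parents: Dict[str, Set[str]], direct: Dict[str, Set[str]]) -> Set[str]:
    """Direct membership items of a set plus the items of all of its subsets."""
    items = set(direct.get(name, set()))
    for child, ps in parents.items():
        if name in ps:  # child is a subset of this set
            items |= all_items(child, parents, direct)
    return items


# Illustrative directory: set name -> names of its parent sets.
parents = {
    "Music": set(),
    "Author": {"Music"},
    "Male Author": {"Author"},
    "Group": {"Author"},
    "West Life": {"Male Author", "Group"},
}
direct = {"West Life": {"You raise me up.mp3"}}

print(root_of(parents))
indirect = all_items("Author", parents, direct) - direct.get("Author", set())
print(indirect)
```

Storing only parent links and direct membership items keeps the stored data minimal; the indirect membership items are derived on demand, so an item added to a subset becomes visible in every ancestor set without further bookkeeping.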
Figure 1 shows an example of an information item collection directory for classifying music files. "Music" is the root information item set of the directory; "Author", "Region" and "Year" are its subsets, and so on. "West Life" is a male group, so its corresponding "West Life" information item set in the directory has two parent sets, "Male Author" and "Group"; the same reasoning applies to the "Atomic Kitten" information item set. "You raise me up.mp3" is the mp3 file of a pop song performed by West Life in 2005; according to its accompanying information, it is designated a direct membership item of "West Life", "Europe and America", "2005" and "Pop". Since "Male Author" and "Group" are parent sets of "West Life", "You raise me up.mp3" is an indirect membership item of both, and by the same reasoning it is also an indirect membership item of "Author" and "Music". The same principle applies to "If you come to me.mp3".
Having introduced the above terms and concepts, several embodiments of the present invention are described in detail below.

Embodiment of the aggregation method for information item collection directories

Figure 2 shows the aggregation method of this embodiment, which aggregates n source information item collection directories into one result information item collection directory, where n is an integer greater than or equal to 1. Referring to Figure 2, the steps of the aggregation method of this embodiment are described in detail below.
Step S10: Set the initial result information item collection directory to an empty directory, that is, an information item collection directory composed of 0 information item sets.
Step S12: Set the priority order of the n source information item collection directories, and store them in a queue data structure.
The processing priority order of the source information item collection directories affects the final result: the final directory structure is organized preferentially according to the structure of the source directories that come earlier in the processing order. Consider the following application scenario: the user has established a directory structure on a local computer and uses this method to aggregate the directories of multiple information sources on the network to the local computer. If the local directory is used as the first-priority source directory, the final aggregated directory structure will be organized preferentially according to the local directory structure. In this way, users can easily retrieve information items from multiple sources using a directory structure they are familiar with.
It will be readily understood by those of ordinary skill in the art that the queue is used only to process the source information item collection directories in order; other data structures and algorithms that process the source information item collection directories in order are therefore equivalent to this embodiment.
Step S14: Determine whether the processing queue is empty. If the queue is not empty, go to step S16; if the queue is empty, the process ends.
Step S16: Take the head element of the queue as the currently processed source information item collection directory and perform a dequeue operation. The dequeue operation removes the head element from the queue and makes its successor the new head element.
Step S18: Aggregate the current source information item collection directory and the current result information item collection directory into a new result information item collection directory, which serves as the current result information item collection directory in the next loop iteration. Then return to step S14.
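Steps S10 to S18 can be sketched as a simple queue-driven loop. The sketch below is an illustration under stated assumptions: a directory is simplified to a dictionary from set name to direct membership items, and `merge_by_name` is a hypothetical placeholder for the aggregation of step S18 that ignores parent-child relationships. Neither name comes from the patent.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Set

Directory = Dict[str, Set[str]]  # simplified: set name -> direct membership items


def merge_by_name(source: Directory, result: Directory) -> Directory:
    """Placeholder for step S18: merge sets that share a name (parents ignored)."""
    merged = {name: set(items) for name, items in result.items()}
    for name, items in source.items():
        merged.setdefault(name, set()).update(items)
    return merged


def aggregate_all(
    sources: Iterable[Directory],
    aggregate: Callable[[Directory, Directory], Directory] = merge_by_name,
) -> Directory:
    result: Directory = {}                    # S10: start from the empty directory
    queue: Deque[Directory] = deque(sources)  # S12: sources already in priority order
    while queue:                              # S14: stop when the queue is empty
        current = queue.popleft()             # S16: take the head and dequeue it
        result = aggregate(current, result)   # S18: new result feeds the next loop
    return result


local = {"Music": {"a.mp3"}}
remote = {"Music": {"b.mp3"}, "News": {"n1.html"}}
print(aggregate_all([local, remote]))
```

Passing the aggregation step as a parameter keeps the loop independent of the mapping rules, so a fuller implementation of step S18 could be substituted without changing the control flow.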
当处理顺序排在第一位的源信息项目集合目录(又称为第一优先源信息项目集 合目录)被处理时, 当前结果信息项目集合目录是一个空目录, 所以此时得到的结 果信息项目集合目录是第一优先源信息项目集合目录的拷贝, 后面被处理的源信息 项目集合目录则是通过对第一优先源信息项目集合目录的结构的补充, 最后得到完 整的结果信息项目集合目录。所以在具体实施中第一优先源信息项目集合目录一般 作为最终的结果信息项目集合目录的主框架的蓝图, 可以利用这种特殊性配合后面 描述到的预定映射规则调控最终的结果信息项目集合目录的结构。  When the source information item collection directory (also referred to as the first priority source information item collection directory) whose processing order is ranked first is processed, the current result information item collection directory is an empty directory, so the result information item obtained at this time is obtained. The collection directory is a copy of the first priority source information item collection directory, and the source information item collection directory to be processed later is supplemented by the structure of the first priority source information item collection directory, and finally the complete result information item collection directory is obtained. Therefore, in the specific implementation, the first priority source information item collection directory is generally used as the blueprint of the main frame of the final result information item collection directory, and the special result can be used to adjust the final result information item collection directory according to the predetermined mapping rules described later. Structure.
The aggregation process of step S18 is described in detail below. During aggregation, a current source information item set directory consisting of 0 information item sets receives empty processing: it is marked as processed without any operation being performed. A current source directory consisting of one or more information item sets is processed as follows: starting from the root information item set, the source information item sets in the current source directory are mapped one by one into the current result information item set directory. The root information item set is defined as the information item set in a directory that has no parent set. An information item set directory consisting of one or more information item sets must contain exactly one root information item set.
The concept of mapping used here extends the mathematical concept of a mapping and is defined as f_i: S_i → D, where S_i is a source information item set directory and D is the result information item set directory. Every information item set in S_i has a corresponding information item set in the result directory. A mapping is established by finding, according to the predetermined mapping rules, an information item set in the result directory that corresponds to the source information item set currently being processed; that set becomes its mapped information item set in the result directory. If the predetermined mapping rules yield no corresponding set in the current result directory, a new information item set is created in the result directory as the mapped set, so that the definition of the mapping is satisfied. To achieve aggregation, the direct member items of a source information item set are required to become direct member items of its mapped information item set.
In practice, mappings can be established according to predetermined mapping rules. For example, suppose every information item set has been given a name that reflects its content: a set containing video items is named "Video", a set containing audio items is named "Audio", and so on. A same-name mapping rule can then be defined: the information item set in the result directory that has the same name as the source information item set is taken as its mapped set. The effect is that information item sets containing similar items in multiple source directories converge into a single information item set in the result directory, so that similar items from different sources can be searched as a whole. The predetermined mapping rules in this embodiment are therefore generally drawn up on the principle of mapping each source information item set to an information item set with similar content in the result directory.
If the predetermined mapping rules find no mapped information item set for a source information item set in the current result directory, a new information item set is created in the current result directory as its mapped set. The parent sets of the new mapped set are the mapped sets, in the result directory, of the source set's parent sets in the source directory. In practice, failing to find a mapped set means that the content of the source set differs from that of every information item set in the current result directory, which is why a new set is created for it. Because the new set has no existing parent-child relationships in the result directory, assigning its parents by the above principle transplants the corresponding parent-child structure from the source directory into the result directory. Combined with processing the source directories in priority order, the overall effect is that the structure of lower-priority source directories fills the gaps in the structure of higher-priority ones: the main structure of the final result directory follows the higher-priority source directories, while branches that those directories do not contain follow the lower-priority source directories.
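The find-or-create behaviour described above can be sketched as follows, assuming C++17, sets keyed by name, and the same-name mapping rule. `InfoSet`, `Directory` and `mapOrCreate` are hypothetical names chosen for illustration; the caller is assumed to have already translated the source set's parents into their mapped sets.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

struct InfoSet {
    std::set<std::string> parents;  // names of parent sets
    std::set<std::string> items;    // direct member items
};
using Directory = std::map<std::string, InfoSet>;  // set name -> set

// Maps one source set into the result directory under the same-name rule.
// Returns true when no matching set existed and a new one was created.
bool mapOrCreate(const std::string& name, const InfoSet& source,
                 const std::set<std::string>& mappedParents, Directory& result) {
    assert(!name.empty());
    auto [pos, created] = result.try_emplace(name);
    if (created) {
        // A new set has no relationships yet: transplant the source structure.
        pos->second.parents = mappedParents;
    }
    // Direct member items of the source set become direct items of its image.
    pos->second.items.insert(source.items.begin(), source.items.end());
    return created;
}
```

An existing set keeps its own parent links and only gains items, whereas a newly created set inherits the mapped parents, which is how a lower-priority source fills gaps without rearranging the existing structure.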
As described above, step S18 starts from the root information item set but can proceed in various orders, provided the parent-assignment principle for new mapped sets is satisfied. A preferred implementation follows the order of a topological sort of a directed acyclic graph (DAG). The advantage of this order is that the mapped sets of a source set's parent sets are guaranteed to exist already in the current result directory, so the corresponding parent-child relationships can simply be established. With other orders, such as a depth-first traversal of the directed graph, only some of the parent-child relationships can be established when a new mapped set is created, because the mapped sets of some of the source set's parents do not yet exist in the current result directory; the missing relationships are added when those parent sets are mapped.
The parent-child relationships among information item sets are constrained so that a valid information item set directory contains no cycles, such as A being the parent of B, B the parent of C, and C in turn the parent of A. An information item set directory can therefore be described by a DAG; Figure 4 shows the DAG derived from the directory of Figure 1.
Figure 3 shows the flow for aggregating the current source information item set directory and the current result information item set directory using the DAG topological sort. The steps are as follows.
Step S20: Add the root information item set of the current source information item set directory to the pending set list, and initialize the mapping table as empty.
Step S22: Check whether the pending set list is empty. If it is, end the process; otherwise go to step S24.
Step S24: Select any set in the pending set list whose in-degree is 0 as the current set.
Step S26: Map the current set into the result information item set directory and record the mapping in the mapping table.
Step S28: Add the child sets of the current set to the pending set list (skipping any already in the list), decrement by 1 the in-degree of every child set of the current set in the list, and return to step S22.
The DAG topological sort uses two auxiliary data structures. The first is the pending set list, which holds the information item sets waiting to be processed together with their in-degrees. The initial in-degree of an information item set is the number of its parent sets; each time one of its parents has been mapped, its in-degree is decremented by 1, and an in-degree of 0 means that all of its parents have been mapped. The second is the mapping table, which records the mapped information item set in the result directory for each source information item set and serves as the result of the process.
Step S20 adds the root information item set to the pending set list because the root is the only set in the source directory without a parent, so its initial in-degree is 0, and because every information item set in the source directory can be reached from the root.
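Steps S20 to S28 and the two auxiliary data structures can be sketched as follows, assuming C++17 and the same-name mapping rule. All names (`InfoSet`, `Directory`, `MappingTable`, `aggregateTopological`) are hypothetical. Two details are assumptions rather than statements from the text: the selected set is removed from the pending list in step S24, and a child's in-degree is initialized to its parent count when it first enters the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct InfoSet {
    std::set<std::string> parents;  // names of parent sets (empty for the root)
    std::set<std::string> items;    // direct member items
};
using Directory = std::map<std::string, InfoSet>;         // set name -> set
using MappingTable = std::map<std::string, std::string>;  // source set -> mapped set

// Aggregates src into result in topological order; returns the processing order.
std::vector<std::string> aggregateTopological(const Directory& src, Directory& result,
                                              MappingTable& mapping) {
    std::vector<std::string> order;
    std::map<std::string, std::vector<std::string>> children;
    std::map<std::string, std::size_t> pending;  // pending set list: name -> in-degree
    for (const auto& [name, info] : src) {
        for (const auto& parent : info.parents) children[parent].push_back(name);
        if (info.parents.empty()) pending[name] = 0;       // S20: start from the root
    }
    while (!pending.empty()) {                             // S22
        auto it = std::find_if(pending.begin(), pending.end(),
                               [](const auto& entry) { return entry.second == 0; });
        if (it == pending.end()) break;                    // only if src has a cycle
        const std::string current = it->first;             // S24
        pending.erase(it);                                 // assumption: leave the list
        const InfoSet& source = src.at(current);
        auto [pos, created] = result.try_emplace(current); // S26: same-name rule
        if (created) {
            for (const auto& parent : source.parents) {
                assert(mapping.count(parent) == 1);        // parents were mapped earlier
                pos->second.parents.insert(mapping.at(parent));
            }
        }
        pos->second.items.insert(source.items.begin(), source.items.end());
        mapping[current] = current;
        order.push_back(current);
        for (const auto& child : children[current]) {      // S28
            auto entry = pending.try_emplace(child, src.at(child).parents.size()).first;
            --entry->second;                               // one more parent is mapped
        }
    }
    return order;
}
```

Because a set is selected only once its in-degree reaches 0, every parent's entry is already in the mapping table when a new set is created, so all parent links can be set in one pass.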
The above completes step S18 of Figure 2 using the DAG topological sort of Figure 3. As noted earlier, step S18 can also be completed with a depth-first traversal of the directed graph.
The flow of the depth-first traversal of the directed graph can be described by the following recursive functions:

    void MapSet(NODE_TYPE* pSetNode, NODE_TYPE* pParent,
                NODE_TYPE* pDest, MAPLIST_TYPE* pMapList)
    {
        /* Map this set into the result directory and record the mapping. */
        do_map(pSetNode, pParent, pDest, pMapList);
        /* Recurse into every child set, passing this set as the parent. */
        NODE_TYPE** pChilds = get_childs(pSetNode);
        for (int i = 0; i < count_of(pChilds); ++i)
            MapSet(pChilds[i], pSetNode, pDest, pMapList);
    }

    void Process()
    {
        MapSet(pSrcRoot, NULL, pDest, pMapList);
    }
MapSet maps every information item set node in the directory rooted at pSetNode into the result information item set directory pDest in depth-first order and records the mappings in pMapList; pParent is the parent set node of pSetNode. The top-level procedure Process only needs to call MapSet with pSrcRoot, the root information item set node of the source directory, as the starting node; recursion then maps every information item set in the source directory. do_map maps a single information item set into the result directory and records the mapping. get_childs returns the list of all child set nodes of the given information item set node.
Suppose now that the directory of Figure 4, as the current source information item set directory, is to be aggregated with the directory of Figure 5, described as a DAG, as the current result information item set directory, using the same-name mapping rule.
With the DAG topological sort of Figure 3, one possible order in which the information item sets of the source directory are processed is:
"Music", "Music Style", "Author", "Region", "Year", ..., "Male Author", "Female Author", "Group", ..., "West Life", ..., "Atomic Kitten", ..., and so on.
Here "West Life" and "Atomic Kitten" are both processed after their parent set nodes, so once their mapped sets have been created in the result directory, it only remains to establish the parent-child relationships between their mapped sets and the mapped sets of all their parents.
By contrast, with a depth-first traversal of the directed graph, one possible processing order is:
"Music", "Music Style", "Rock", "Classical", ..., "Author", "Male Author", "West Life", "Female Author", "Atomic Kitten", "Group", ..., and so on.
"West Life" is processed before "Group", at which point the result directory contains no mapped set for "Group". Only the parent-child relationship between the mapped set of "West Life" and the mapped set of "Male Author" can therefore be established at first. After "Group" has been mapped, the recursion reaches the child node "West Life" again; if the mapped set of "West Life" is found to have been created during this aggregation and to exist already in the result directory (which can be looked up in the mapping table), the parent-child relationship between the mapped set of "West Life" and the mapped set of "Group" is added to the result directory.
Figure 6 shows the DAG of the result information item set directory, which is the same for both orders. Whatever processing order is used, the resulting directory is the same as long as the principle for assigning parents to newly created mapped sets is followed.
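The depth-first order and the later patching of parent links can be sketched as follows, assuming the same-name mapping rule. The names `Children`, `Parents`, `DfsState` and `mapSet` are hypothetical, member items are left out for brevity, and skipping the recursion on a repeated visit is a choice made for this sketch rather than something the recursive function above prescribes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Children = std::map<std::string, std::vector<std::string>>;  // source: set -> child sets
using Parents = std::map<std::string, std::set<std::string>>;      // result: set -> parent sets

struct DfsState {
    std::set<std::string> visited;   // stands in for the mapping table
    std::set<std::string> created;   // mapped sets created during this aggregation
    std::vector<std::string> order;  // order of first visits
};

// Maps `node` (reached from `parent`, empty for the root) and then its children.
void mapSet(const std::string& node, const std::string& parent, const Children& src,
            Parents& result, DfsState& state) {
    assert(!node.empty());
    const bool firstVisit = state.visited.insert(node).second;
    if (firstVisit) {
        state.order.push_back(node);
        if (result.find(node) == result.end()) {
            result[node];                // create an empty mapped set
            state.created.insert(node);
        }
    }
    // Only sets created in this run receive parent links; a repeated visit
    // through another parent adds the link that was missing earlier.
    if (state.created.count(node) == 1 && !parent.empty()) result[node].insert(parent);
    if (!firstVisit) return;             // children were already mapped
    auto it = src.find(node);
    if (it == src.end()) return;         // no child sets
    for (const auto& child : it->second) mapSet(child, node, src, result, state);
}
```

For a source in which "West Life" is a child of both "Male Author" and "Group", the first visit links it to "Male Author" only, and the second visit, reached through "Group", adds the remaining link, so the final parent sets match those produced by the topological order.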
The root information item set of a source directory other than the first-priority one has no parent set, yet the position of its mapped set in the result directory determines the structure produced by mapping the whole source directory into the result directory. In practice, therefore, the structure of the first-priority source directory and the predetermined mapping rules are generally adjusted so that the mapped set of each non-first-priority root appears at a definite position in the final result directory that matches its content.

A specific example of the aggregation method for information item set directories
Figure 7 shows the structure of the movie information item set directory from a website called Y, and Figure 8 shows the structure of the news information item set directory from website X. Suppose the directories of Figures 1, 7 and 8 are to be aggregated into one unified result information item set directory. The first step is to design the approximate structure of the final result directory. The directories of Figures 1, 7 and 8 all contain subdirectories divided by region, so the regional structure can be extracted from each source directory into one unified regional subdirectory structure. Likewise, Figures 1 and 7 both contain a structure divided by author, from which a unified author subdirectory structure is built in the same way. The resulting final directory has roughly the structure shown in Figure 9. This structure omits the "News Category", "Music Style" and "Movie Style" set nodes of the source directories, which makes the overall structure more compact and easier to search, and lets the "Region" subdirectory structure suit the classification conventions of news, movies and music alike.
Realizing the directory structure of Figure 9 requires designing a suitable first-priority source information item set directory and predetermined mapping rules. The aggregation process implies that the final result directory structure contains the entire structure of the first-priority source directory, so the worst case would be to use the designed final structure itself as the first-priority source directory. In practice this is not feasible. Typically, information directories from several websites are aggregated on a personal computer, whose user has limited knowledge of the directory of any particular website. For the directory of Figure 7, for example, the user cannot know in advance that the information item set "Jet Li" (李连杰) exists, and can know only the parts that can be inferred from common sense. Using the full designed structure as the first-priority source directory is also unnecessary, because taking only its trunk as the first-priority source directory is enough for the aggregation process to produce the designed structure. The first-priority source directory is therefore the smallest part of the designed result structure from which the aggregation process can still produce the final structure. The first-priority source directory designed on this principle is shown in Figure 10.
The predetermined mapping rules are then determined; they work together with the first-priority source directory to produce the designed final result directory. From the designed final structure and the first-priority source directory, the following mapping rules can be derived:
1. Same-name mapping rule: an information item set in a source directory is mapped to the information item set with the same name in the result directory;
2. "X Site News" maps to "News";
3. "Y Site Movies" maps to "Movies";
4. "News Category" maps to "News";
5. "Movie Style" maps to "Movies";
6. "Music Style" maps to "Music";
7. "China" maps to "Domestic".
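The rule list above can be sketched as a lookup table with a same-name fallback. `RuleTable`, `exampleRules` and `resolveTarget` are hypothetical names, the set names are English renderings of the originals, and giving rules 2 to 7 precedence over rule 1 is an assumption: the text lists the rules without stating an order of application.

```cpp
#include <cassert>
#include <map>
#include <string>

using RuleTable = std::map<std::string, std::string>;  // source set name -> result set name

// Rules 2 to 7, with set names rendered in English.
RuleTable exampleRules() {
    return {
        {"X Site News", "News"},
        {"Y Site Movies", "Movies"},
        {"News Category", "News"},
        {"Movie Style", "Movies"},
        {"Music Style", "Music"},
        {"China", "Domestic"},
    };
}

// Returns the name of the result set that a source set should be mapped to.
std::string resolveTarget(const std::string& sourceName, const RuleTable& rules) {
    assert(!sourceName.empty());
    auto it = rules.find(sourceName);
    // Explicit rule if one exists; otherwise rule 1, the same-name rule.
    return it != rules.end() ? it->second : sourceName;
}
```

With these rules, both the root "X Site News" and its child "News Category" resolve to "News", which is how the intermediate category nodes disappear from the aggregated structure.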
With the processing order of the source directories and the predetermined mapping rules determined by the above steps, the aggregation can be carried out by the method described in the preceding embodiment, yielding the final result information item set directory shown in Figure 11.

Embodiment of the aggregation system for information item set directories
Based on the above embodiment of the aggregation method, the present invention correspondingly provides an aggregation system for information item set directories. Figure 12 shows one embodiment of this system, which is described in detail below.
The aggregation system of this embodiment comprises the following modules connected in sequence: an input module 10, an initialization module 12, a priority order determination module 14, a directory aggregation module 16 and an output module 18.
The input module 10 receives n source information item set directories, where n is an integer greater than or equal to 1. The initialization module 12 sets the initial result information item set directory, which is an empty directory, that is, a directory consisting of 0 information item sets.
The priority order determination module 14 determines the order in which the n source information item set directories are processed. This order affects the final result: the final directory structure follows, by preference, the structure of the source directories processed earlier. Consider a scenario in which a user has built a directory structure on a local computer and uses this method to aggregate the directories of several information sources on the network onto that computer. If the local directory is taken as the first-priority source directory, the aggregated structure is organized primarily according to the local directory structure, so the user can search information items from multiple sources using a familiar directory structure.
The directory aggregation module 16 processes the n source directories one by one in the order determined by the priority order determination module 14, aggregating the current source directory and the current result directory into a new result directory. To reflect sequential processing, module 16 stores the n source directories in a queue. Those of ordinary skill in the art will readily understand that the queue serves only to process the source directories in order; any other data structure and algorithm that processes the source directories in order is equivalent to this embodiment.
The directory aggregation module 16 is further divided into a first aggregation unit 160 and a second aggregation unit 162; a current source directory entering module 16 is processed by one of them. If the current source directory is empty (consists of 0 information item sets), the first aggregation unit 160 performs empty processing, marking it as processed without performing any operation. If the current source directory consists of one or more information item sets, the second aggregation unit 162 processes it: starting from the root information item set, the source information item sets in the current source directory are mapped one by one into the current result directory. The root information item set is defined as the information item set in a directory that has no parent set, and a directory consisting of one or more information item sets must contain exactly one root information item set.
The second aggregation unit 162 further comprises an information item set mapping unit 1620, an information item set creation unit 1622, a parent set assignment unit 1624 and a direct membership assignment unit 1626. They are connected as follows: the parent set assignment unit 1624 is connected to the information item set creation unit 1622, and the direct membership assignment unit 1626 is connected to both the information item set mapping unit 1620 and the information item set creation unit 1622.
The internal processing of the second aggregation unit 162 is as follows. The source information item sets in the current source information item collection directory entering the second aggregation unit 162 fall into two classes. For the first class, an information item set corresponding to the currently processed source information item set is found in the current result information item collection directory according to the predetermined mapping rule. For the second class, no corresponding information item set is found in the current result information item collection directory according to the predetermined mapping rule. For a source information item set of the first class, the information item set mapping unit 1620 takes the information item set found in the current result information item collection directory as the mapping information item set of the currently processed source information item set. The direct membership item designation unit 1626 then designates the direct membership items of the source information item set as direct membership items of its mapping information item set in the result information item collection directory. In this embodiment, information items are divided into direct membership items and indirect membership items. A direct membership item is an information item that is directly assigned to an information item set; an indirect membership item is an information item contained in a subset of the information item set. Within one information item collection directory, an information item may be a direct membership item of two or more information item sets.
For a source information item set of the second class, the information item set creation unit 1622 creates a new information item set in the current result information item collection directory to serve as the mapping information item set corresponding to the source information item set. After the information item set creation unit 1622 has run, the parent set designation unit 1624 is started. For a source information item set that has a parent set in the source information item collection directory, the parent set designation unit 1624 designates the parent set of the new mapping information item set to be the mapping information item set, in the result information item collection directory, of the source information item set's parent set. The direct membership item designation unit 1626 then designates the direct membership items of the source information item set as direct membership items of its mapping information item set in the result information item collection directory.
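The two cases handled by units 1620, 1622, 1624 and 1626 can be sketched in Python as follows. This is only an illustration under stated assumptions: the ItemSet class, the function names, and in particular the mapping rule (two sets correspond when their names are equal after case folding) are hypothetical stand-ins, since the embodiment defines the predetermined mapping rule elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ItemSet:
    """Hypothetical information item set: a name, a parent key, direct items."""
    name: str
    parent: Optional[str] = None
    direct_items: Set[str] = field(default_factory=set)


def rule_key(name: str) -> str:
    # Assumed stand-in for the predetermined mapping rule: names match
    # case-insensitively. The real rule is described in the method embodiment.
    return name.casefold()


def map_source_set(source: ItemSet,
                   result: Dict[str, ItemSet],
                   mapped: Dict[str, str]) -> ItemSet:
    """Map one source set into the result directory.

    `mapped` records, for source sets already processed in the current source
    directory, the key of their mapping set in the result directory. Parents
    must therefore be processed before their children.
    """
    key = rule_key(source.name)
    target = result.get(key)          # role of mapping unit 1620
    if target is None:
        target = ItemSet(name=source.name)   # role of creation unit 1622
        result[key] = target
        if source.parent is not None:        # role of parent unit 1624
            target.parent = mapped[source.parent]
    mapped[source.name] = key
    # Role of unit 1626: direct items of the source become direct items
    # of the mapping set.
    target.direct_items |= source.direct_items
    return target
```

Note that in this sketch the parent is assigned only when a new set is created, which mirrors the description: a set found by unit 1620 keeps the parent it already has in the result directory.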
The predetermined mapping rule in this embodiment is the same as in the method embodiment above; see the description of the predetermined mapping rule given there.
The second aggregation unit 162 processes the source information item sets in the following order: starting from the root information item set, it maps the source information item sets in the current source information item collection directory one by one into the current result information item collection directory, in the order given by a topological sorting algorithm for directed acyclic graphs. Besides topological sorting of a directed acyclic graph, a depth-first traversal algorithm for directed graphs may also be used. Both algorithms have already been described in the method embodiment above and are not repeated here.

The above embodiments are provided to enable a person of ordinary skill in the art to implement or use the present invention. A person of ordinary skill in the art may make various modifications or changes to the above embodiments without departing from the inventive concept of the present invention. The scope of protection of the present invention is therefore not limited by the above embodiments, but should be the widest scope consistent with the innovative features recited in the claims.
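The parents-before-children processing order used by the second aggregation unit 162 can be illustrated with a short Python sketch of topological sorting (Kahn's algorithm). The function name, the dictionary representation, and the possibility of a set having more than one parent are assumptions made to show the directed acyclic graph case; with at most one parent per set the structure reduces to a tree.

```python
from collections import deque
from typing import Dict, List, Set


def parents_first_order(parents: Dict[str, Set[str]]) -> List[str]:
    """Return set names so that every set appears after all of its parents.

    `parents` maps each set name to the names of its parent sets
    (an empty set for the root).
    """
    remaining = {name: len(ps) for name, ps in parents.items()}
    children: Dict[str, List[str]] = {name: [] for name in parents}
    for name, ps in parents.items():
        for parent in ps:
            children[parent].append(name)

    # Start from the sets without parents (the root information item set).
    ready = deque(sorted(n for n, count in remaining.items() if count == 0))
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(children[current]):
            remaining[child] -= 1
            if remaining[child] == 0:   # all parents already emitted
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("the parent relation contains a cycle")
    return order
```

Processing sets in this order guarantees that, when a new mapping set needs a parent, the parent's own mapping set already exists in the result directory.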

Claims

1. A method for aggregating information item collection directories, which aggregates n source information item collection directories into one result information item collection directory, n being an integer greater than or equal to 1, the method comprising:
(1) setting an initial result information item collection directory as an information item collection directory consisting of 0 information item sets;
(2) determining a processing priority order among the n source information item collection directories;
(3) processing the n source information item collection directories one by one in that priority order, aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;
(4) after the n source information item collection directories have been processed, taking the current result information item collection directory as the aggregated result information item collection directory;
wherein an information item set is a set consisting of n1 information items, n1 being an integer greater than or equal to 0, and an information item collection directory consists of n2 information item sets, n2 being an integer greater than or equal to 0.
2. The method for aggregating information item collection directories according to claim 1, wherein, during the aggregation processing, a current source information item collection directory consisting of 0 information item sets is handled by performing null processing.
3. The method for aggregating information item collection directories according to claim 1, wherein, during the aggregation processing of step (3), a current source information item collection directory consisting of one or more information item sets is processed as follows: starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory, wherein the root information item set is defined as the information item set in an information item collection directory that has no parent set, and an information item collection directory consisting of one or more information item sets must contain exactly one root information item set.
4. The method for aggregating information item collection directories according to claim 3, wherein the step of mapping a source information item set in the current source information item collection directory into the current result information item collection directory further comprises:
finding, according to a predetermined mapping rule, an information item set in the current result information item collection directory that corresponds to the currently processed source information item set, and taking it as the mapping information item set of the currently processed source information item set in the current result information item collection directory; if, according to the mapping rule, no corresponding information item set exists in the current result information item collection directory, creating a new information item set in the current result information item collection directory as the mapping information item set corresponding to the source information item set, wherein, if the source information item set has a parent set in the source information item collection directory, the parent set of the new mapping information item set is designated to be the mapping information item set, in the result information item collection directory, of that parent set.
5. The method for aggregating information item collection directories according to claim 4, wherein the direct membership items of a source information item set are direct membership items of its mapping information item set in the result information item collection directory, a direct membership item being an information item that is directly assigned to an information item set, an indirect membership item being an item contained in a subset of the information item set, and wherein, within one information item collection directory, an information item may be a direct membership item of two or more information item sets.
6. The method for aggregating information item collection directories according to claim 5, wherein the predetermined mapping rule maps a source information item set to an information item set of similar connotation in the result information item collection directory.
7. The method for aggregating information item collection directories according to claim 3, wherein, starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory in the order given by a topological sorting algorithm for directed acyclic graphs.
8. The method for aggregating information item collection directories according to any one of claims 1 to 7, wherein step (3) further comprises:
organizing the n source information item collection directories into a queue data structure;
determining whether the queue is empty; if the queue is empty, ending the processing; if the queue is not empty, taking the head element of the queue as the source information item collection directory to be processed and dequeuing the head element;
aggregating the currently processed source information item collection directory and the current result information item collection directory into a new result information item collection directory, which serves as the current result information item collection directory of the next iteration, and returning to the previous step.
9. A system for aggregating information item collection directories, comprising:
an input module, configured to input n source information item collection directories, n being an integer greater than or equal to 1;
an initialization module, connected to the input module, which sets an initial result information item collection directory as an information item collection directory consisting of 0 information item sets;
a priority order determination module, connected to the initialization module, which determines a processing priority order among the n source information item collection directories;
a directory aggregation module, connected to the priority order determination module, which processes the n source information item collection directories one by one in that priority order, aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;
an output module, connected to the directory aggregation module, which outputs the current result information item collection directory obtained after the n source information item collection directories have been processed as the final result information item collection directory;
wherein an information item set is a set consisting of n1 information items, n1 being an integer greater than or equal to 0, and an information item collection directory consists of n2 information item sets, n2 being an integer greater than or equal to 0.
10. The system for aggregating information item collection directories according to claim 9, wherein the directory aggregation module is provided with a first aggregation unit, and the first aggregation unit performs null processing on a current source information item collection directory consisting of 0 information item sets.
11. The system for aggregating information item collection directories according to claim 9, wherein the directory aggregation module is provided with a second aggregation unit, and the second aggregation unit processes a current source information item collection directory consisting of one or more information item sets as follows: starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory, wherein the root information item set is defined as the information item set in an information item collection directory that has no parent set, and an information item collection directory consisting of one or more information item sets must contain exactly one root information item set.
12. The system for aggregating information item collection directories according to claim 11, wherein the second aggregation unit further comprises:
an information item set mapping unit, which finds, according to a predetermined mapping rule, an information item set in the current result information item collection directory that corresponds to the currently processed source information item set, and takes it as the mapping information item set of the currently processed source information item set in the current result information item collection directory;
an information item set creation unit, which, for a source information item set that has no corresponding information item set in the current result information item collection directory according to the predetermined mapping rule, creates a new information item set in the current result information item collection directory as the mapping information item set corresponding to the source information item set;
a parent set designation unit, connected to the information item set creation unit, which, for a source information item set that has a parent set in the source information item collection directory, designates the parent set of the new mapping information item set to be the mapping information item set, in the result information item collection directory, of the source information item set's parent set.
13. The system for aggregating information item collection directories according to claim 12, wherein the second aggregation unit further comprises:
a direct membership item designation unit, connected to both the information item set mapping unit and the information item set creation unit, which designates the direct membership items of a source information item set as direct membership items of its mapping information item set in the result information item collection directory, a direct membership item being an information item that is directly assigned to an information item set, an indirect membership item being an item contained in a subset of the information item set, and wherein, within one information item collection directory, an information item may be a direct membership item of two or more information item sets.
14. The system for aggregating information item collection directories according to claim 13, wherein the predetermined mapping rule maps a source information item set to an information item set of similar connotation in the result information item collection directory.
15. The system for aggregating information item collection directories according to claim 11, wherein the second aggregation unit, starting from the root information item set, maps the source information item sets in the current source information item collection directory one by one into the current result information item collection directory in the order given by a topological sorting algorithm for directed acyclic graphs.
16. The system for aggregating information item collection directories according to claim 9, wherein the directory aggregation module stores the n source information item collection directories in a queue data structure.
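The queue-driven loop recited in claims 8 and 16 can be illustrated with the following Python sketch. It is a simplified illustration, not the claimed implementation: a directory is reduced to a dictionary from set name to direct items, the hierarchy is omitted, sets are matched by identical name, the priority order is taken to be the order in which the sources are supplied, and the names merge_into and aggregate are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical flat representation: set name -> direct membership items.
Directory = Dict[str, Set[str]]


def merge_into(result: Directory, source: Directory) -> Directory:
    """Aggregate one source directory with the current result directory."""
    merged = {name: set(items) for name, items in result.items()}
    for name, items in source.items():
        # Assumed mapping rule: identical names denote corresponding sets.
        merged.setdefault(name, set()).update(items)
    return merged


def aggregate(sources: Iterable[Directory]) -> Directory:
    result: Directory = {}      # initial result directory with 0 sets
    queue = deque(sources)      # assumed priority: order of the input
    while queue:                # an empty queue ends the processing
        current = queue.popleft()   # take the head element and dequeue it
        if not current:
            continue            # empty source directory: null processing
        result = merge_into(result, current)
    return result
```

For example, aggregating {"docs": {"a"}}, an empty directory, and {"docs": {"b"}, "pics": {"c"}} in that order yields a result directory in which "docs" holds items a and b and "pics" holds item c.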
PCT/CN2009/072520 2009-06-30 2009-06-30 Aggregation method of information items set directory and system thereof WO2011000144A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2009/072520 WO2011000144A1 (en) 2009-06-30 2009-06-30 Aggregation method of information items set directory and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2009/072520 WO2011000144A1 (en) 2009-06-30 2009-06-30 Aggregation method of information items set directory and system thereof

Publications (1)

Publication Number Publication Date
WO2011000144A1 true WO2011000144A1 (en) 2011-01-06

Family

ID=43410445

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/072520 WO2011000144A1 (en) 2009-06-30 2009-06-30 Aggregation method of information items set directory and system thereof

Country Status (1)

Country Link
WO (1) WO2011000144A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1897560A (en) * 2005-07-12 2007-01-17 中兴通讯股份有限公司 Method for improving routing list capacity
CN1897564A (en) * 2005-07-11 2007-01-17 中兴通讯股份有限公司 Strategic routing matching method based on recursive-flow category algorithm
CN101158949A (en) * 2007-07-20 2008-04-09 时文 Method and system for file items ranking and searching based on aggregation
CN101334793A (en) * 2008-08-01 2008-12-31 中国科学院软件研究所 A Method to Automatically Identify Requirement Dependency

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09846673

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09846673

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.06.12)
