WO2011000144A1

WO2011000144A1 - Aggregation method of information items set directory and system thereof

Info

Publication number: WO2011000144A1
Application number: PCT/CN2009/072520
Authority: WO
Inventors: 时文
Original assignee: Shi Wen
Priority date: 2009-06-30
Filing date: 2009-06-30
Publication date: 2011-01-06

Abstract

An aggregation method for information item collection directories and a system thereof are disclosed. The method aggregates n source information item collection directories into one result information item collection directory and includes: setting an initial result information item collection directory to a directory composed of 0 information item sets; determining the processing priority order among the n source information item collection directories; and processing the n source information item collection directories one by one in that priority order, each time aggregating the current source information item collection directory with the current result information item collection directory to form a new result information item collection directory. After all n source information item collection directories have been processed, the current result information item collection directory is the aggregated result information item collection directory.

Description

Aggregation method and system for information item collection catalogue

The present invention relates to computer information classification and retrieval techniques, and more particularly to an aggregation method and system for an information item collection directory.

Background technique

On July 20, 2007, the applicant filed a patent application entitled "Method and System for Classification and Retrieval of File Items Based on Sets" (application number 200710009235.7), which was published on April 9, 2008 as CN101158949. In that application, the applicant proposed a method of classifying and retrieving file items based on set theory, which solved the following problems of the existing tree directory: (1) conflicts when items are organized by multiple classifications; (2) branch explosion caused by combining classification modes; (3) fixed classification and retrieval methods; (4) the inability to classify and retrieve information items based on attribute values.

To solve these technical problems, that application introduced the following concepts. A file item is the unique identifier of a file in the file system in which it is located. A file item set is a set composed of n different file items, where n is an integer greater than or equal to 0. An attribute records characteristic information about a certain aspect of a file item and can be assigned manually. These concepts satisfy the following conditions: a file item can appear at most once in a file item set, but a file item can belong to multiple different file item sets; a file item can be assigned n attributes, each of which must be given a corresponding attribute value, where n is an integer greater than or equal to 0; a file item set can be assigned n attributes, each of which must be given a corresponding attribute value range, where n is an integer greater than or equal to 0; if a file item set has a certain attribute and a corresponding attribute value range, every file item belonging to that set must have the attribute, with an attribute value within the range set by the file item set, but a file item may have attributes that the file item sets it belongs to do not have. The classification method proposed in that application establishes the membership relationship between a file item and an existing file item set by direct designation, by assigning an attribute and setting its attribute value, or by a combination of the two. The retrieval method proposed in that application obtains a result file item set from the existing file item sets by a specified set operation, a specified attribute value range, or a combination of the two.

However, that technical solution mainly classifies and retrieves information items for a single independent information source in order to establish an information classification directory. In practical applications, there is a need to access information items from multiple information sources through a unified information classification directory, because a unified retrieval entry point effectively saves the time users would otherwise spend visiting multiple information sources to retrieve information. For example, RSS is a typical technical solution for accessing information items from multiple network information sources through a unified information directory. Its principle is to aggregate information items from different network information sources into a tree directory structure preset on the user's local computer.

Although RSS aggregates information published by different websites onto one personal computer terminal, its directory structure is still the traditional tree directory, so its information retrieval is less efficient than the retrieval scheme of the above application.

Summary of the invention

The object of the present invention is to solve the above problems by providing an aggregation method for information item collection directories that saves users time and effort in information retrieval.

Another object of the present invention is to provide an aggregation system for information item collection directories.

The technical solution of the present invention is as follows. The present invention proposes an aggregation method for information item collection directories, which aggregates n source information item collection directories into one result information item collection directory, where n is an integer greater than or equal to 1. The aggregation method includes:

(1) setting an initial result information item collection directory as an information item collection directory composed of 0 information item sets;

(2) determining the processing priority order among the n source information item collection directories;

(3) processing the n source information item collection directories one by one according to the priority order, each time aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;

(4) after the n source information item collection directories have been processed, taking the current result information item collection directory as the aggregated result information item collection directory.

Here, an information item set is a set consisting of n₁ information items, where n₁ is an integer greater than or equal to 0, and an information item collection directory is composed of n₂ information item sets, where n₂ is an integer greater than or equal to 0.

An aggregation system for information item collection directories is also proposed, including:

an input module, configured to input n source information item collection directories, where n is an integer greater than or equal to 1;

an initialization module, connected to the input module, which sets an initial result information item collection directory as an information item collection directory composed of 0 information item sets;

a priority order determining module, connected to the initialization module, which determines the processing priority order among the n source information item collection directories;

a directory aggregation module, connected to the priority order determining module, which processes the n source information item collection directories one by one according to the priority order, each time aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;

an output module, connected to the directory aggregation module, which outputs the current result information item collection directory obtained after the n source information item collection directories have been processed as the final result information item collection directory.

Here, an information item set is a set consisting of n₁ information items, where n₁ is an integer greater than or equal to 0, and an information item collection directory is composed of n₂ information item sets, where n₂ is an integer greater than or equal to 0.

Compared with the prior art, the present invention has the following beneficial effects. It addresses two limitations: the application described in the background section retrieves from only a single information source, and RSS-like multi-source aggregation schemes retrieve inefficiently because they rely on the traditional tree directory. By aggregating the information item collection directories established for individual information sources into one unified information item collection directory, the invention provides a unified information classification directory through which information items from multiple information sources can be retrieved, thereby saving users time and effort in information retrieval.

Brief description of the drawings

Figure 1 is an information item collection directory structure for music information item classification.

Figure 2 is a flow chart showing an embodiment of the aggregation method for an information item collection directory of the present invention.

Figure 3 is a flow chart of processing the current source information item collection directory using the DAG topological sorting algorithm.

Figure 4 is a DAG description of a sample source information item collection directory structure, used to explain the processing flow of the current source information item collection directory.

Figure 5 is a DAG description of the current result information item collection directory for the current source information item collection directory processing flow.

Fig. 6 is a DAG description of the directory of the result information item set obtained by aggregating the directory described in Fig. 4 as the current source information item set directory, the directory described in Fig. 5 as the current result information item set directory.

Figure 7 is a directory structure of an information item collection of a movie information item classification from the Y website.

Figure 8 is an information item collection directory structure of a news information item classification from the X website.

Fig. 9 is a structural plan for a result information item collection directory in which the information item set directories shown in Figs. 1, 7, and 8 are aggregated.

FIG. 10 is a first priority source information item set directory structure used when aggregating the information item set directories shown in FIG. 1, FIG. 7, and FIG. 8.

FIG. 11 is the result information item set directory structure obtained by aggregating the information item set directories shown in FIG. 1, FIG. 7, and FIG. 8 with the first priority source information item set directory shown in FIG. 10.

Figure 12 is a schematic diagram of an embodiment of an aggregation system of an information item collection directory of the present invention.

Figure 13 is a schematic diagram of the directory aggregation module in the system of the embodiment of Figure 12.

BEST MODE FOR CARRYING OUT THE INVENTION

The invention will now be further described with reference to the drawings and embodiments.

Before describing the embodiments of the present invention, some terms and concepts involved in the present invention are first defined and explained. An information item in the present invention is defined as an information structure that can be processed and presented to a user on a computer system as a logical whole. A file in a file system is the most typical example of an information item, but files are not the only information items. For example, a record in a relational database is physically stored as part of a database file, but it can be logically processed and presented to the user as a whole, and can thus be regarded as a type of information item. Similarly, an e-mail in mail software such as Outlook is stored as part of a mailbox file, but it can logically be processed and displayed as a whole, and can thus also be regarded as a type of information item.

In actual processing, an information item is usually represented by a unique identifier. For example, a file in an operating system is represented by a unique file path, and a web page on the Internet can be represented by a unique URL. If multiple types of information items need to be processed together, a simple approach is to add a type-specific prefix to each information item identifier. For example, when a browser loads a local file, the URL shown in the address bar is the operating system file path with "file://" added as a prefix. For how to create an information item collection directory for an information source and how to retrieve information items in an information item collection directory, refer to the application mentioned in the background section.
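The prefixing idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `make_identifier` and the "mail" type label are hypothetical and chosen for the example; only the "file://" prefix comes from the text.

```python
def make_identifier(source_type, local_id):
    """Build a distinguishable identifier by prefixing the item type.

    Type labels such as "file" or "mail" are illustrative assumptions.
    """
    return f"{source_type}://{local_id}"


# The same local identifier stays distinguishable across item types.
file_item = make_identifier("file", "/music/track.mp3")
mail_item = make_identifier("mail", "/music/track.mp3")
```

With such prefixes, items of different types can be placed in the same information item set without their identifiers colliding.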

An information item set is a set in the mathematical sense consisting of n₁ information items, where n₁ is an integer greater than or equal to 0. In practical applications, an information item set can contain information items of various types, as long as they can be effectively distinguished by the method described in the previous paragraph.

An information item collection directory is an information structure composed of n₂ information item sets organized by parent-child relationships, where n₂ is an integer greater than or equal to 0. The parent-child relationship is defined as follows: if information item set A is designated as a parent set of information item set B, then all information items contained in B are also contained in A; conversely, B is said to be a child set of A. When n₂ is 0, the directory contains no information item set and is therefore also called an empty directory. When n₂ is greater than or equal to 1, the directory must contain a unique root information item set (also called the root set). The root set has no parent set, and every other information item set in the directory must have k parent sets belonging to the directory, where k is an integer greater than or equal to 1. According to this definition, the items contained in an information item set fall into two categories: items directly assigned to the set, called direct membership items, and items contained in its child sets, called indirect membership items.
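A minimal sketch of this structure is given below. The class name `ItemSet` and its fields are hypothetical, and the set names are borrowed from the music directory of Figure 1; the exact parent-child edges used here are assumptions made for illustration.

```python
class ItemSet:
    """One information item set in a directory (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.parents = []    # parent sets in the same directory
        self.children = []   # child sets in the same directory
        self.direct = set()  # identifiers of direct membership items

    def add_child(self, child):
        """Record a parent-child relationship in both directions."""
        self.children.append(child)
        child.parents.append(self)

    def all_items(self):
        """Direct items plus the items of every descendant set."""
        items = set(self.direct)
        for child in self.children:
            items |= child.all_items()
        return items

    def indirect_items(self):
        """Items contained only through child sets."""
        return self.all_items() - self.direct


music = ItemSet("Music")            # root set: it has no parent set
author = ItemSet("Author")
male = ItemSet("Male Author")
group = ItemSet("Group Combination")
west_life = ItemSet("West Life")    # a set with two parent sets
music.add_child(author)
author.add_child(male)
author.add_child(group)
male.add_child(west_life)
group.add_child(west_life)
west_life.direct.add("You raise me up.mp3")
```

In this sketch the mp3 file is a direct membership item of "West Life" only, and an indirect membership item of every ancestor set up to the root.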

Figure 1 shows an example of an information item collection directory for classifying music files. "Music" is the root information item set of this directory; "Author", "Region", and "Year" are its child sets, and so on. "West Life" is a male group, so its corresponding "West Life" information item set has two parent sets in the directory, "Male Author" and "Group Combination". The same applies to the "Atomic Kitten" information item set. "You raise me up.mp3" is an mp3 file of a pop song sung by West Life in 2005, and according to this information it is designated as a direct membership item of "West Life", "European", "2005", and "Popular". Since "Male Author" and "Group Combination" are parent sets of "West Life", "You raise me up.mp3" is an indirect membership item of those two sets and, by the same reasoning, also an indirect membership item of "Author" and "Music". The same applies to "If you come to me.mp3".

Having described the above terms and concepts, various embodiments of the invention are described in detail below.

Embodiment of the aggregation method for an information item collection directory

FIG. 2 shows the aggregation method for an information item collection directory of this embodiment, which aggregates n source information item collection directories into one result information item collection directory, where n is an integer greater than or equal to 1. The steps of the aggregation method of this embodiment are described in detail below with reference to Fig. 2.

Step S10: Set an initial result information item collection directory, and set it to an empty directory, that is, an information item collection directory composed of 0 information item sets.

Step S12: Set the processing priority order of the n source information item collection directories, and store the source information item collection directories in a queue data structure in that order.

The processing priority order of the source information item collection directories affects the final result: the final directory structure is organized preferentially according to the structures of the source directories that come first in the processing order. Consider the following application scenario: a user has established a directory structure on a local computer and uses this method to aggregate the directories of multiple network information sources onto that computer. If the local directory is used as the first-priority source directory, the resulting directory structure is organized preferentially according to the local directory structure. In this way, users can retrieve information items from multiple sources using a directory structure they are already familiar with.

It will be readily understood by those skilled in the art that the queue data structure is used only to process the source information item collection directories sequentially. Other data structures and algorithms that process the source information item collection directories in sequence are therefore equivalent embodiments.

Step S14: It is judged whether the processing queue is empty. If the queue is not empty, the process proceeds to step S16, and if the queue is empty, the process ends.

Step S16: Take the head element of the queue as the source information item collection directory currently being processed, and perform a dequeue operation. The dequeue operation removes the head element from the queue and makes its successor the new head element.

Step S18: Aggregate the current source information item collection directory and the current result information item collection directory into a new result information item collection directory, and use the new result information item collection directory as the current result information item collection directory in the next loop iteration. Then return to step S14.
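The loop of steps S10 to S18 can be sketched as follows. This is a simplified illustration, not the embodiment itself: a directory is reduced to a flat dictionary from set name to direct membership items, parent-child relationships are ignored, and the names `aggregate_all` and `merge_by_name` are hypothetical. The pairwise step of S18 is passed in as a function so that any aggregation procedure can be plugged in.

```python
from collections import deque


def merge_by_name(source, result):
    """Stand-in for step S18: merge sets that share a name."""
    merged = {name: set(items) for name, items in result.items()}
    for name, items in source.items():
        merged.setdefault(name, set()).update(items)
    return merged


def aggregate_all(sources, aggregate_pair):
    """Aggregate source directories given in priority order."""
    result = {}                 # S10: empty directory (0 item sets)
    queue = deque(sources)      # S12: queue order is the priority order
    while queue:                # S14: stop when the queue is empty
        current = queue.popleft()                  # S16: dequeue the head
        result = aggregate_pair(current, result)   # S18: new current result
    return result


local = {"Music": {"a.mp3"}}
website = {"Music": {"b.mp3"}, "Movie": {"c.avi"}}
combined = aggregate_all([local, website], merge_by_name)
```

Because the result starts empty, aggregating a single source directory in this sketch simply yields a copy of it.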

When the source information item collection directory that ranks first in the processing order (also referred to as the first priority source information item collection directory) is processed, the current result information item collection directory is an empty directory, so the result information item collection directory obtained at that point is a copy of the first priority source information item collection directory. The source information item collection directories processed later supplement the structure of the first priority source information item collection directory, and the complete result information item collection directory is finally obtained. Therefore, in a specific implementation, the first priority source information item collection directory generally serves as the blueprint for the main framework of the final result information item collection directory, and it can be used specifically, together with the predetermined mapping rules described later, to adjust the structure of the final result information item collection directory.

The aggregation process of step S18 is described in detail below. If the current source information item collection directory is composed of 0 information item sets, its processing is empty: it is treated as already processed and no operation is performed. If the current source information item collection directory is composed of one or more information item sets, its processing is as follows: starting from the root information item set, the information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory. Here the root information item set is the information item set that has no parent set in the information item collection directory; a directory composed of one or more information item sets must have a unique root information item set.

The concept of mapping used here is an extension of the mathematical concept of a mapping, defined as f: S → D, where S is the source information item collection directory and D is the result information item collection directory. For every information item set in S, there is a corresponding information item set in D. A mapping relationship is established by finding, according to a predetermined mapping rule, the information item set in the result information item collection directory that corresponds to the source information item set currently being processed; that set is its mapping information item set in the result directory. If no corresponding information item set exists in the current result information item collection directory according to the predetermined mapping rule, a new information item set is created in the result information item collection directory as the corresponding mapping information item set, so that the definition of the mapping is satisfied. To achieve information aggregation, the direct membership items of the source information item set are designated as direct membership items of its mapping information item set.

In practical applications, the mapping relationship can be established according to predetermined mapping rules. For example, each information item set is usually given a name that represents its content: an information item set containing video information items is named "video", an information item set containing audio information items is named "audio", and so on. A same-name mapping rule can then be predefined, under which the information item set in the result information item collection directory that has the same name as the source information item set is taken as the corresponding mapping information item set. The purpose is to aggregate information item sets that contain similar information items in multiple source information item collection directories into one information item set in the result information item collection directory, so that similar information items from different information sources can be retrieved as a whole in the result directory. Accordingly, the predetermined mapping rules in this embodiment are generally established on the principle of mapping a source information item set to an information item set with similar content in the result information item collection directory.
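The mapping of a single information item set under the same-name rule can be sketched as below. The function names `find_by_same_name` and `map_item_set`, the flat dictionary representation of the result directory, and the sample identifiers are assumptions made for this illustration; parent-child relationships are left out here.

```python
def find_by_same_name(source_name, result_directory):
    """Same-name rule: return the matching result set name, or None."""
    return source_name if source_name in result_directory else None


def map_item_set(source_name, source_items, result_directory,
                 rule=find_by_same_name):
    """Map one source set into the result directory; return its target name."""
    target = rule(source_name, result_directory)
    if target is None:
        # No corresponding set exists yet, so a new one is created.
        target = source_name
        result_directory[target] = set()
    # Direct membership items of the source set become direct
    # membership items of its mapping set.
    result_directory[target] |= set(source_items)
    return target


result_directory = {"video": {"http://y.example/film"}}
found = map_item_set("video", {"file:///home/clip.avi"}, result_directory)
created = map_item_set("audio", {"file:///home/song.mp3"}, result_directory)
```

Passing the rule as a parameter reflects that the same-name rule is only one possible predetermined mapping rule.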

If no corresponding mapping information item set can be found for a source information item set in the current result information item collection directory according to the predetermined mapping rule, a new information item set is created in the current result information item collection directory as the corresponding mapping information item set. The parent sets of the new mapping information item set are designated as the mapping information item sets, in the result information item collection directory, of the parent sets that the source information item set has in the source information item collection directory. In practical applications, failing to find a corresponding mapping information item set indicates that the content of the source information item set differs from that of every information item set in the current result information item collection directory, which is why a new information item set is created. Since the new information item set has no existing parent-child relationships in the result information item collection directory, designating its parent sets according to the above principle transplants the corresponding parent-child structure of the source information item collection directory into the result information item collection directory. The overall effect of processing the source information item collection directories in a certain priority order is that the structures of lower-priority source directories fill in the gaps in the structures of higher-priority source directories. The main structure of the final result information item collection directory is organized according to the structures of the higher-priority source directories, while branches that those structures do not contain are organized according to the structures of the lower-priority source directories.

As can be seen from the above, the specific operation of step S18 starts from the root information item set, but the remaining sets can be processed in various orders as long as the parent set designation principle for new mapping information item sets is satisfied. A preferred implementation processes the sets in the order given by a topological sorting algorithm on a directed acyclic graph (DAG). The advantage of this order is that, when a source information item set is processed, the mapping information item sets corresponding to all of its parent sets in the source information item collection directory already exist in the current result information item collection directory, so the corresponding parent-child relationships can simply be established at once. In other orders, such as the order of a depth-first traversal of the directed graph, only part of the parent-child relationships can be established when a new mapping information item set is created, because the mapping information item sets corresponding to some parent sets of the source information item set do not yet exist in the current result information item collection directory. The remaining parent-child relationships are added only when those parent sets are mapped.

According to the definition of the parent-child relationship, a legal information item collection directory cannot contain a cycle, such as a structure in which A is a parent set of B, B is a parent set of C, and C is a parent set of A. An information item collection directory can therefore be described by a DAG. Figure 4 is the DAG converted from the directory of Figure 1. The process of aggregating the current source information item collection directory and the current result information item collection directory using the DAG topological sorting algorithm is shown in Figure 3, and proceeds as follows.

Step S20: Add the root information item set of the current source information item collection directory to the to-be-processed collection list, and initialize the mapping relationship table to be an empty table.

Step S22: Determine whether the to-be-processed collection list is empty. If it is empty, the processing flow is ended. If it is not empty, the process proceeds to step S24.

Step S24: Select from the to-be-processed collection list any one set whose in-degree is 0 as the current processing set, removing it from the list.

Step S26: Map the current processing set to the result information item collection directory, and record the mapping relationship in the mapping relationship table.

Step S28: Add the child sets of the current processing set to the to-be-processed collection list (if they are not already in the list), decrement by one the in-degree of every child set of the current processing set in the list, and return to step S22.

The DAG topological sorting algorithm uses two additional data structures. One is the to-be-processed collection list, which consists of the information item sets waiting to be processed and their corresponding in-degrees. The initial in-degree of an information item set is the number of its parent sets. Whenever one of its parent sets completes mapping processing, its in-degree is reduced by 1. When its in-degree reaches 0, all of its parent sets have completed mapping processing. The other data structure is the mapping relationship table, which records the mapping information item set in the result information item collection directory corresponding to each source information item set, and serves as the result of the processing procedure.

In step S20 above, the root information item set is added to the to-be-processed collection list because it is the only information item set in the source information item collection directory that has no parent set, so its initial in-degree is 0, and all information item sets in the source information item collection directory can be reached in turn from the root information item set.
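Steps S20 to S28 can be sketched as follows under the same-name mapping rule. The dictionary layout of the directories, the function name `aggregate_topological`, and the small sample directory (a fragment loosely modeled on Figure 1, with assumed edges) are all assumptions for illustration. Following the parent designation principle described earlier, parent sets are assigned only to mapping sets that are newly created.

```python
def aggregate_topological(source, root, result):
    """Map every set of `source` into `result` in topological order.

    source: {name: {"children": [names], "items": set of item ids}}
    result: {name: {"parents": set of names, "items": set of item ids}}
    Returns the mapping relationship table {source name: result name}.
    """
    # Initial in-degree of a set = number of its parent sets in the source.
    indegree = {name: 0 for name in source}
    parents = {name: [] for name in source}
    for name, node in source.items():
        for child in node["children"]:
            indegree[child] += 1
            parents[child].append(name)

    pending = [root]   # S20: the to-be-processed list starts with the root
    listed = {root}    # every set that has ever been put on the list
    mapping = {}       # S20: the mapping relationship table starts empty
    while pending:     # S22: stop when nothing is left to process
        # S24: take a set whose parents have all been mapped (in-degree 0).
        current = next(n for n in pending if indegree[n] == 0)
        pending.remove(current)
        # S26: same-name rule; create the set if the result has no match.
        if current not in result:
            result[current] = {
                "parents": {mapping[p] for p in parents[current]},
                "items": set(),
            }
        result[current]["items"] |= source[current]["items"]
        mapping[current] = current
        # S28: list the child sets and decrement their in-degrees.
        for child in source[current]["children"]:
            indegree[child] -= 1
            if child not in listed:
                listed.add(child)
                pending.append(child)
    return mapping


source = {
    "Music": {"children": ["Author"], "items": set()},
    "Author": {"children": ["Male Author", "Group Combination"], "items": set()},
    "Male Author": {"children": ["West Life"], "items": set()},
    "Group Combination": {"children": ["West Life"], "items": set()},
    "West Life": {"children": [], "items": {"You raise me up.mp3"}},
}
result = {"Music": {"parents": set(), "items": set()}}
mapping = aggregate_topological(source, "Music", result)
```

In this sketch "West Life" is processed only after both of its parent sets, so both parent-child relationships can be set at the moment its mapping set is created.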

The above describes how step S18 shown in Fig. 2 is completed with the DAG topological sorting algorithm shown in Fig. 3. As described earlier, step S18 can also be completed with a depth-first traversal algorithm on the directed graph instead of the DAG topological sorting algorithm.

The process of using the directed graph depth-first traversal algorithm can be described by the following recursive function:

void MapSet(NODE_TYPE* pSetNode, NODE_TYPE* pParent, NODE_TYPE* pDest, MAPLIST_TYPE* pMapList)
{
    /* Map this set and link it under the mapping of pParent. */
    do_map(pSetNode, pParent, pDest, pMapList);
    NODE_TYPE** pChilds = get_childs(pSetNode);
    for (int i = 0; i < count_of(pChilds); ++i)
        MapSet(pChilds[i], pSetNode, pDest, pMapList);  /* recurse into each child set */
}

void Process()
{
    MapSet(pSrcRoot, NULL, pDest, pMapList);
}

The MapSet function maps all information item set nodes in the directory that are reachable from pSetNode into the result information item collection directory pDest in depth-first order, records the mapping correspondences in pMapList, and takes pParent as the parent set of the pSetNode node from which it was reached. At the top level, Process only needs to call MapSet with the root information item set node pSrcRoot of the source information item collection directory as the starting node; the mapping of all information item sets in the entire source information item collection directory is then completed by recursion. The do_map function maps a single information item set to the result information item collection directory and records the mapping relationship. The get_childs function returns the list of all child nodes of an information item set node.
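A runnable sketch of the depth-first variant is given below, again under the same-name rule. It is an illustration rather than a transcription of MapSet: the dictionary layout, the name `aggregate_depth_first`, the `created` bookkeeping set, and the sample directory are assumptions, and the sketch recurses into a set's children only on the first visit. It shows how a parent-child relationship is added later when an already mapped set is reached again through another parent.

```python
def aggregate_depth_first(source, root, result):
    """Map every set of `source` into `result` in depth-first order.

    source: {name: {"children": [names], "items": set of item ids}}
    result: {name: {"parents": set of names, "items": set of item ids}}
    Returns the mapping relationship table {source name: result name}.
    """
    mapping = {}      # mapping relationship table
    created = set()   # result sets created during this pass

    def map_set(name, parent):
        first_visit = name not in mapping
        if first_visit:
            if name not in result:   # the same-name rule found no match
                result[name] = {"parents": set(), "items": set()}
                created.add(name)
            result[name]["items"] |= source[name]["items"]
            mapping[name] = name
        # Link a newly created set under the mapping of the parent through
        # which it was reached; on later visits this adds the relationships
        # that could not be established when the set was created.
        if parent is not None and mapping[name] in created:
            result[mapping[name]]["parents"].add(mapping[parent])
        if first_visit:
            for child in source[name]["children"]:
                map_set(child, name)

    map_set(root, None)
    return mapping


source = {
    "Music": {"children": ["Author"], "items": set()},
    "Author": {"children": ["Male Author", "Group Combination"], "items": set()},
    "Male Author": {"children": ["West Life"], "items": set()},
    "Group Combination": {"children": ["West Life"], "items": set()},
    "West Life": {"children": [], "items": {"You raise me up.mp3"}},
}
result = {}
mapping = aggregate_depth_first(source, "Music", result)
```

Here "West Life" is mapped before "Group Combination", so its second parent is attached only when the traversal reaches it again from "Group Combination".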

In the following example, the directory shown in Fig. 4 serves as the current source information item collection directory and the directory shown in Fig. 5, also described by a DAG, serves as the current result information item collection directory. The two are aggregated, and the mapping processing in the aggregation uses the same-name mapping rule.

If the DAG topological sorting algorithm shown in Fig. 3 is used, one possible order in which the information item sets in the source information item collection directory are processed is as follows:

"Music", "Music Style", "Author", "Region", "Yature", ..., "Male Author", "Female Author", "Group Combination", ..., "West Life", ..., "Atomic Kitten", etc.

"West Life" and "Atomic Kitten" are processed after all of their parent set nodes. Therefore, once their mapping information item sets have been created in the result information item collection directory, it is only necessary to establish, in the result information item collection directory, the parent-child relationships between those mapping information item sets and the mapping information item sets of all of their parent sets.
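The property relied on here, that every set is processed only after all of its parent sets, can be illustrated with a short C++ sketch. Kahn's algorithm is used as one standard topological sort; it may differ in detail from the algorithm of Fig. 3, which is not reproduced in this text. The `Edge` representation and the `topologicalOrder` function are assumptions made for the illustration.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: each (parent, child) pair is one edge of the
// DAG that describes the source information item collection directory.
using Edge = std::pair<std::string, std::string>;

// Returns the sets in an order in which every set follows all of its
// parent sets (Kahn's algorithm, starting from the root set).
std::vector<std::string> topologicalOrder(const std::string& root,
                                          const std::vector<Edge>& edges) {
    std::map<std::string, std::vector<std::string>> children;
    std::map<std::string, int> pendingParents;  // parents not yet processed
    pendingParents.emplace(root, 0);
    for (const Edge& e : edges) {
        children[e.first].push_back(e.second);
        pendingParents.emplace(e.first, 0);  // no effect if already present
        ++pendingParents[e.second];
    }
    assert(pendingParents[root] == 0 && "the root set must have no parent");

    std::vector<std::string> order;
    std::queue<std::string> ready;  // sets whose parents are all processed
    ready.push(root);
    while (!ready.empty()) {
        const std::string current = ready.front();
        ready.pop();
        order.push_back(current);
        for (const std::string& child : children[current])
            if (--pendingParents[child] == 0) ready.push(child);
    }
    return order;
}
```

With the edges "Author" to "Male Author", "Author" to "Group Combination", and both of those to "West Life", the set "West Life" is emitted only after both of its parents.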

For comparison with the DAG topological sorting order above, one possible processing order produced by the directed graph depth-first traversal algorithm is as follows:

"Music", "Music Style", "Rock", "Classical", ..., "Author", "Male Author", "West Life", "Female Author", "Atomic Kitten", "Group Combination", ..., etc.

Here "West Life" is processed before "Group Combination". At that point the result information item collection directory contains no mapping information item set corresponding to "Group Combination", so only the parent-child relationship between the mapping information item set of "West Life" and the mapping information item set of "Male Author" can be established. When the traversal of "Group Combination" later recurses to the child node "West Life" again, it finds (from the mapping relationship table) that the mapping information item set of "West Life" has already been created and exists in the result information item collection directory, so it adds the parent-child relationship between the mapping information item set of "West Life" and the mapping information item set of "Group Combination" to the result information item collection directory.

Fig. 6 is the DAG description of the identical result information item collection directory obtained with either of the above two algorithms. It can be seen that, whatever processing order is adopted, as long as the principle for designating the parent sets of a newly created mapping information item set in the result information item collection directory is satisfied, the resulting information item collection directory is the same.

The root information item set of a non-first-priority source information item collection directory has no parent set, yet the position of its mapping information item set in the result information item collection directory affects the structure that the entire source information item collection directory produces when mapped into the result information item collection directory. In practice, therefore, the structure of the first-priority source information item collection directory and the predetermined mapping rules are generally adjusted to ensure that the mapping information item set of the root information item set of each non-first-priority source information item collection directory appears in the final result information item collection directory at a position whose connotation exactly matches its own.

A specific example of an aggregation method for an information item collection directory

Figure 7 depicts the structure of a movie information item collection directory from a website named Y, and Figure 8 depicts the structure of a news information item collection directory from website X. Suppose that the information item collection directories described in Figures 1, 7, and 8 need to be aggregated into one unified result information item collection directory. The first step is to design the approximate structure of the final result information item collection directory. Since the directories in Figures 1, 7, and 8 all contain subdirectories divided by region, the region-based structure can be extracted from each source information item collection directory as one unified subdirectory structure. Likewise, Figures 1 and 7 both contain a structure divided by author, so a unified author subdirectory structure is built in the same way. The final result information item collection directory obtained in this way has roughly the structure shown in Figure 9. This structure removes the "News Category", "Music Style", and "Movie Style" set nodes found in the source information item collection directory structures, which makes the overall structure more streamlined, facilitates information retrieval, and lets the "Region" subdirectory structure serve the classification of news, movies, and music at the same time.

To obtain the directory structure of Fig. 9, a corresponding first-priority source information item collection directory and predetermined mapping rules must be designed. Over the whole aggregation process, the final result information item collection directory structure contains the entire structure of the first-priority source information item collection directory, so in the worst case the designed final result directory structure itself would be used as the structure of the first-priority source information item collection directory. In practice this is neither achievable nor necessary. It is not achievable because information directories from multiple websites are often aggregated onto a personal computer terminal, and the user of that computer has limited knowledge of the directory of any particular website. For example, for the directory described in Fig. 7, a personal computer user cannot know in advance that the information item set "Li Lianjie" exists and can only know the parts that can be inferred from common sense. It is not necessary because using only the main part of the designed final result directory structure as the first-priority source information item collection directory is enough to obtain the designed result directory structure through the aggregation process. The first-priority source information item collection directory is therefore the smallest part of the designed result directory structure from which the final directory structure can be obtained through aggregation. The first-priority source information item collection directory designed according to this principle is shown in Figure 10.

The predetermined mapping rules are then determined; together with the first-priority source information item collection directory, they produce the designed final result information item collection directory. Based on the designed final result information item collection directory structure and the first-priority source information item collection directory, the following mapping rules can be designed:

1. The same-name mapping rule: an information item set in the source information item collection directory is mapped to the information item set with the same name in the result information item collection directory;

2. "X Net News" maps to "News";

3. "Y Net Movie" maps to "Movie";

4. "News Category" maps to "News";

5. "Movie Style" maps to "Movie";

6. "Music Style" maps to "Music";

7. "China" maps to "Domestic".
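The rule list above could be represented as a simple lookup table, as in the following C++ sketch. The function names `explicitRules` and `mapSetName`, the use of `std::map`, and in particular the assumption that the explicit rules 2 to 7 take precedence over the same-name rule are illustrative choices; the original text does not state how rule conflicts are resolved.

```cpp
#include <cassert>
#include <map>
#include <string>

// Explicit rules 2 to 7 from the list above, keyed by source set name.
// (The std::map representation is an assumption made for illustration.)
const std::map<std::string, std::string>& explicitRules() {
    static const std::map<std::string, std::string> rules = {
        {"X Net News", "News"},
        {"Y Net Movie", "Movie"},
        {"News Category", "News"},
        {"Movie Style", "Movie"},
        {"Music Style", "Music"},
        {"China", "Domestic"},
    };
    return rules;
}

// Returns the name of the result-directory set that a source set maps to.
// Assumed precedence: an explicit rule wins; otherwise rule 1 (same name).
std::string mapSetName(const std::string& sourceName) {
    assert(!sourceName.empty());
    const std::map<std::string, std::string>& rules = explicitRules();
    const auto it = rules.find(sourceName);
    return it != rules.end() ? it->second : sourceName;
}
```

Under these assumptions, `mapSetName("Music Style")` yields "Music", so the sets below "Music Style" become children of the mapped "Music" set, while a set such as "Author" keeps its own name.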

With the processing order of the source information item collection directories and the predetermined mapping rules determined by the above steps, the aggregation operation can be performed using the method described in the previous embodiment to obtain the final result information item collection directory shown in the corresponding figure.

Embodiment of an aggregation system of an information item collection directory

Based on the above-described embodiment of the aggregation method of the information item collection directory, the present invention accordingly proposes an aggregation system of the information item collection directory, and Fig. 12 shows an embodiment of the aggregation system of such information item collection directory. Referring to Fig. 12, the following is a detailed description of the aggregation system of the information item collection directory of the present embodiment.

The aggregation system of the information item collection directory of this embodiment includes the following modules connected in sequence: an input module 10, an initialization module 12, a priority order determination module 14, a directory aggregation module 16, and an output module 18. The input module 10 is used to input n source information item collection directories, where n is an integer greater than or equal to 1. The initialization module 12 sets an initial result information item collection directory; this initial value is an empty directory, that is, an information item collection directory composed of 0 information item sets.

The priority order determination module 14 determines the processing order among the n source information item collection directories. The priority processing order affects the final result: the final directory structure is organized preferentially according to the structure of the source information item collection directory that comes first in the processing order. Consider the following application scenario: a user has established a directory structure on a local computer and uses this method to aggregate the directories of multiple information sources on the network onto the local computer. If the local directory is treated as the first-priority source directory, the final aggregated directory structure is organized preferentially according to the local directory structure. In this way, the user can conveniently retrieve information items from multiple sources through a familiar directory structure.

The directory aggregation module 16 processes the n source information item collection directories one by one in the priority order determined by the priority order determination module 14, aggregating the current source information item collection directory and the current result information item collection directory into a new result information item collection directory. To reflect the sequential processing, the directory aggregation module 16 stores the n source information item collection directories in a queue data structure. Those skilled in the art will readily understand that the queue serves only to process the source information item collection directories in sequence; other data structures and algorithms that process the source information item collection directories in sequence are therefore equivalent to this embodiment.
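The queue-driven loop can be sketched in C++ as follows. To keep the example short, a directory is reduced to a flat set of set names and the per-directory aggregation step applies only the same-name rule; `FlatDirectory`, `aggregateOne`, and `aggregateAll` are hypothetical names, and the hierarchy handling of the actual method is deliberately omitted.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Deliberately simplified stand-in: a directory is just the set of names of
// its information item sets; hierarchy and mapping rules are omitted.
using FlatDirectory = std::set<std::string>;

// Placeholder for aggregating one source directory into the result.
FlatDirectory aggregateOne(const FlatDirectory& source, FlatDirectory result) {
    if (source.empty()) return result;  // empty directory: empty processing
    result.insert(source.begin(), source.end());
    return result;
}

// Processes the n source directories in priority order using a queue.
FlatDirectory aggregateAll(
        const std::vector<FlatDirectory>& sourcesInPriorityOrder) {
    assert(!sourcesInPriorityOrder.empty() && "n must be at least 1");
    std::queue<FlatDirectory> pending;
    for (const FlatDirectory& d : sourcesInPriorityOrder) pending.push(d);

    FlatDirectory result;  // initial result: 0 information item sets
    while (!pending.empty()) {
        const FlatDirectory current = pending.front();  // head of the queue
        pending.pop();                                   // dequeue it
        result = aggregateOne(current, result);  // becomes the next "current"
    }
    return result;
}
```

Any container that yields the directories in priority order, such as iterating over the vector directly, would serve equally well, which is the equivalence the paragraph above points out.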

The directory aggregation module 16 is further subdivided into a first aggregation unit 160 and a second aggregation unit 162. A current source information item collection directory entering the directory aggregation module 16 is processed by one of the two units. If the current source information item collection directory is an empty directory (that is, composed of 0 information item sets), the first aggregation unit 160 performs empty processing: the directory is treated as a processed source information item collection directory without any operation being performed. If the current source information item collection directory is composed of one or more information item sets, it is processed by the second aggregation unit 162: starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory. The root information item set is defined as the information item set that has no parent set in an information item collection directory; an information item collection directory composed of one or more information item sets must have a unique root information item set.

The second aggregation unit 162 further includes an information item set mapping unit 1620, an information item set creating unit 1622, a parent set designation unit 1624, and a direct membership item designation unit 1626. These units are connected as follows: the parent set designation unit 1624 is connected to the information item set creating unit 1622, and the direct membership item designation unit 1626 is connected to both the information item set mapping unit 1620 and the information item set creating unit 1622.

The internal processing of the second aggregation unit 162 is as follows. The source information item sets in the current source information item collection directory that enter the second aggregation unit 162 fall into two categories: those for which a corresponding information item set is found in the current result information item collection directory according to the predetermined mapping rule, and those for which no corresponding information item set is found. For a source information item set of the first category, the information item set mapping unit 1620 finds, according to the predetermined mapping rule, the information item set in the current result information item collection directory that corresponds to the currently processed source information item set; that set is the mapping information item set of the currently processed source information item set in the current result information item collection directory. The direct membership item designation unit 1626 then designates the direct membership items of the source information item set as direct membership items of the mapping information item set in the result information item collection directory. In this embodiment, information items are divided into direct membership items and indirect membership items: a direct membership item is an information item directly assigned to an information item set, whereas an indirect membership item is an information item contained in a subset of the information item set. Within one information item collection directory, an information item may be a direct membership item of two or more information item sets.
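The distinction between direct and indirect membership items can be made concrete with a small C++ sketch. The `ItemSet` type and both functions are hypothetical, and reading "contained in a subset" as applying recursively to subsets of subsets is an interpretation rather than something the text states explicitly.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one information item set.
struct ItemSet {
    std::string name;
    std::set<std::string> directItems;    // items assigned directly to the set
    std::vector<const ItemSet*> subsets;  // child sets in the directory
};

// Collects the indirect membership items of a set: every item that belongs,
// directly or indirectly, to one of its subsets (recursion is an assumption).
std::set<std::string> indirectItems(const ItemSet& s) {
    std::set<std::string> items;
    for (const ItemSet* sub : s.subsets) {
        assert(sub != nullptr);
        items.insert(sub->directItems.begin(), sub->directItems.end());
        const std::set<std::string> deeper = indirectItems(*sub);
        items.insert(deeper.begin(), deeper.end());
    }
    return items;
}

// Designates the direct membership items of a source set as direct
// membership items of its mapped set; items already present are kept.
void designateDirectItems(const ItemSet& source, ItemSet& mapped) {
    mapped.directItems.insert(source.directItems.begin(),
                              source.directItems.end());
}
```

Only direct membership items are copied during aggregation; the indirect ones follow automatically from the parent-child links created in the result directory.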

For a source information item set of the second category, the information item set creating unit 1622 creates a new information item set in the current result information item collection directory as the mapping information item set corresponding to the source information item set. After the information item set creating unit 1622 has run, the parent set designation unit 1624 is started. For a source information item set that has parent sets in the source information item collection directory, the parent set designation unit 1624 designates the parent sets of the new mapping information item set as follows: they are the mapping information item sets, in the result information item collection directory, to which the parent sets of the source information item set in the source information item collection directory are mapped. The direct membership item designation unit 1626 then designates the direct membership items of the source information item set as direct membership items of the mapping information item set in the result information item collection directory.

The predetermined mapping rule described in this embodiment is the same as the predetermined mapping rule in the foregoing method embodiment. For details, refer to the description of the predetermined mapping rule in the foregoing method embodiment.

In the second aggregation unit 162, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory, starting from the root information item set, in the order given by the topological sorting algorithm of the directed acyclic graph. Besides the topological sorting algorithm of the directed acyclic graph, a directed graph depth-first traversal algorithm can also be employed. Both algorithms have been described in the foregoing method embodiment and are therefore not repeated here.

The above embodiments are provided to enable a person skilled in the art to implement or use the present invention, and those skilled in the art can make various modifications or changes to the above embodiments without departing from the inventive concept. The scope of protection of the invention is therefore not limited by the embodiments described above, but should be the broadest scope consistent with the innovative features set out in the claims.

Claims

1. An aggregation method of an information item collection directory, which aggregates n source information item collection directories into one result information item collection directory, n being an integer greater than or equal to 1, the aggregation method comprising:

wherein an information item set is a set consisting of n₁ information items, n₁ being an integer greater than or equal to 0, and an information item collection directory is composed of n₂ information item sets, n₂ being an integer greater than or equal to 0.

2. The aggregation method of an information item collection directory according to claim 1, wherein, in the aggregation processing, a current source information item collection directory composed of 0 information item sets is handled by performing empty processing.

3. The aggregation method of an information item collection directory according to claim 1, wherein, in the aggregation processing of step (3), a current source information item collection directory composed of one or more information item sets is processed as follows: starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory, wherein the root information item set is defined as the information item set that has no parent set in an information item collection directory, and an information item collection directory composed of one or more information item sets must have a unique root information item set.

4. The aggregation method of an information item collection directory according to claim 3, wherein the step of mapping a source information item set in the current source information item collection directory into the current result information item collection directory further comprises:

searching the current result information item collection directory, according to a predetermined mapping rule, for an information item set corresponding to the currently processed source information item set, the set found being the mapping information item set of the currently processed source information item set in the current result information item collection directory; and, if no corresponding information item set exists in the current result information item collection directory according to the mapping rule, creating a new information item set in the current result information item collection directory as the mapping information item set corresponding to the source information item set, wherein, if the source information item set has a parent set in the source information item collection directory, the parent set of the new mapping information item set is designated as the mapping information item set in the result information item collection directory to which that parent set is mapped.

5. The aggregation method of an information item collection directory according to claim 4, wherein the direct membership items of the source information item set are designated as direct membership items of the mapping information item set in the result information item collection directory, wherein a direct membership item is an information item directly assigned to an information item set and an indirect membership item is an information item contained in a subset of the information item set, and wherein, within one information item collection directory, one information item may be a direct membership item of two or more information item sets.

6. The aggregation method of an information item collection directory according to claim 5, wherein the predetermined mapping rule maps a source information item set to an information item set of similar connotation in the result information item collection directory.

7. The aggregation method of an information item collection directory according to claim 3, wherein, starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory in the order given by the topological sorting algorithm of a directed acyclic graph.

8. The aggregation method of an information item collection directory according to any one of claims 1 to 7, wherein step (3) further comprises:

organizing the n source information item collection directories into a queue data structure;

determining whether the queue is empty; if the queue is empty, ending the processing; if the queue is not empty, taking the head element of the queue as the source information item collection directory to be processed and dequeuing the head element;

aggregating the currently processed source information item collection directory and the current result information item collection directory into a new result information item collection directory, which serves as the current result information item collection directory of the next iteration, and returning to the previous step.

9. An aggregation system of an information item collection directory, comprising:

a priority order determination module, connected to the initialization module, which determines the processing priority order among the source information item collection directories;

a directory aggregation module, connected to the priority order determination module, which processes the n source information item collection directories one by one according to the priority order and aggregates the current source information item collection directory and the current result information item collection directory into a new result information item collection directory;

10. The aggregation system of an information item collection directory according to claim 9, wherein the directory aggregation module is provided with a first aggregation unit, and the first aggregation unit performs empty processing on a current source information item collection directory composed of 0 information item sets.

11. The aggregation system of an information item collection directory according to claim 9, wherein the directory aggregation module is provided with a second aggregation unit, and the second aggregation unit processes a current source information item collection directory composed of one or more information item sets as follows: starting from the root information item set, the source information item sets in the current source information item collection directory are mapped one by one into the current result information item collection directory, wherein the root information item set is defined as the information item set that has no parent set in an information item collection directory, and an information item collection directory composed of one or more information item sets must have a unique root information item set.

12. The aggregation system of an information item collection directory according to claim 11, wherein the second aggregation unit further comprises:

an information item set mapping unit, which searches the current result information item collection directory, according to a predetermined mapping rule, for the information item set corresponding to the currently processed source information item set, the set found being the mapping information item set of the currently processed source information item set in the current result information item collection directory;

an information item set creating unit, which, for a source information item set for which no corresponding information item set is found in the current result information item collection directory according to the predetermined mapping rule, creates a new information item set in the current result information item collection directory as the mapping information item set corresponding to the source information item set;

a parent set designation unit, connected to the information item set creating unit, which, for a source information item set that has a parent set in the source information item collection directory, designates the parent set of the new mapping information item set as the mapping information item set in the result information item collection directory to which the parent set of the source information item set in the source information item collection directory is mapped.

13. The aggregation system of an information item collection directory according to claim 12, wherein the second aggregation unit further comprises:

a direct membership item designation unit, connected to both the information item set mapping unit and the information item set creating unit, which designates the direct membership items of the source information item set as direct membership items of the mapping information item set in the result information item collection directory, wherein a direct membership item is an information item directly assigned to an information item set and an indirect membership item is an information item contained in a subset of the information item set, and wherein, within one information item collection directory, one information item may be a direct membership item of two or more information item sets.

14. The aggregation system of an information item collection directory according to claim 13, wherein the predetermined mapping rule maps a source information item set to an information item set of similar connotation in the result information item collection directory.

15. The aggregation system of an information item collection directory according to claim 11, wherein the second aggregation unit, starting from the root information item set, maps the source information item sets in the current source information item collection directory one by one into the current result information item collection directory in the order given by the topological sorting algorithm of a directed acyclic graph.

16. The aggregation system of an information item collection directory according to claim 9, wherein the directory aggregation module stores the n source information item collection directories in a queue data structure.