WO2016001991A1 - 検索方法 - Google Patents
検索方法 Download PDFInfo
- Publication number
- WO2016001991A1 WO2016001991A1 PCT/JP2014/067438 JP2014067438W WO2016001991A1 WO 2016001991 A1 WO2016001991 A1 WO 2016001991A1 JP 2014067438 W JP2014067438 W JP 2014067438W WO 2016001991 A1 WO2016001991 A1 WO 2016001991A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gram
- appearance
- search
- character
- character string
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 41
- 230000004044 response Effects 0.000 description 22
- 238000012545 processing Methods 0.000 description 18
- 238000010586 diagram Methods 0.000 description 11
- 238000012360 testing method Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 3
- 230000001133 acceleration Effects 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
Definitions
- the present invention relates to a search method.
- Acceleration of database search is important in order to acquire necessary data in a database in a short time. Further, for example, analysis using big data existing on the Internet is required to be able to search a very large amount of data at high speed.
- Patent Document 1 creates an index (descriptor table) of appearance character positions for all three characters such as “abc” and “bcd” from the search target text.
- index descriptor table
- a technique for searching for an adjacent character position of the index “abc” and the character position of the index “bcd” is disclosed (paragraphs 0002 to 0009). 2 and 4).
- Such a technique is known as an n-gram method (a 3-gram method in the case of three characters).
- Patent Document 1 If the technology disclosed in Patent Document 1 is used, the entire text can be searched faster than searching from the beginning one character at a time. However, no technology related to speeding up search across a plurality of indexes is disclosed.
- an object of the present invention is to speed up a search in a search across a plurality of indexes, that is, a search using a plurality of decomposed search keys.
- a representative search method is a computer character string search method for searching a document using a specified character string, wherein the specified character string is a first character or character string and a second character or character.
- the present invention it is possible to speed up the search in a search across a plurality of indexes, that is, a search using a plurality of decomposed search keys.
- a computer is a general computer having a CPU (processor) and a memory, and the CPU executes processing using data stored in the memory in accordance with a program stored in the memory. For this reason, in the description below that uses the computer as the subject, the subject may be replaced with a CPU.
- the configuration of the computer is not limited to the above, but an example of a specific configuration will be described later with reference to FIG.
- the computer can also be regarded as a search device for executing the search.
- FIG. 1 is a diagram showing an example of n-gram text search using an n-gram appearance list.
- the computer creates an n-gram node 11, an n-gram appearance list 12, 13 and n-gram indexes 14 to 17 in advance before starting the search.
- the n-gram node 11 is a table that associates n characters with a pointer (pt).
- n characters are 2 characters, and all 2 characters included in the search target text are listed.
- N-gram indexes 14 and 16 are tables for enumerating character positions where “FO” appears in the search target text.
- the position of the character used as a reference in “FO” may be, for example, the position “F” or the position “O”.
- the reference character position in “OT” is “O”.
- N-gram indexes 14, 16 include ID.
- IDs may be assigned to such items and paragraphs, and the text An ID may be given to a group of multiple lines.
- the character position is a relative position from the beginning of the text specified by the ID.
- n-gram indexes 15 and 17 are tables for enumerating character positions where “OT” appears in the search target text, and the configuration of the tables is the same as that of the n-gram indexes 14 and 16.
- the n-gram appearance list 12 is a table indicating whether “FO” exists in the ID range.
- Each pointer (pt) becomes a pointer to the n-gram index 14, 16 when “FO” exists in the corresponding ID range, and when “FO” does not exist in the corresponding ID range, Become.
- the predetermined value in the case where “FO” does not exist is a value that does not become a pointer to any n-gram index, and may be null or zero, for example, and is represented by diagonal lines in FIG.
- pointers corresponding to ID ranges 1 to 100 and 201 to 300 in the n-gram appearance list 12 are n.
- Pointer to -gram index 14 and n-gram index 16. “FO” does not exist in the ID range 101-200.
- the n-gram appearance list 13 is a table indicating whether or not “OT” exists in the ID range, and the configuration of the table is the same as the n-gram appearance list 13.
- each value of the ID range of the n-gram appearance list 12 and each value of the ID range of the n-gram appearance list 13 are all the same.
- 100 is a unit, which is 1 to 100, 101 to 200, 201 to 300, and the like. Since each value of the ID range is the same in different n-gram appearance lists, either “FO” or “OT” does not exist in both the ID ranges 101 to 200 and 201 to 300, and “FOOT” Can be determined not to exist.
- the computer sets each value of the ID range of all the n-gram appearance lists including the n-gram appearance lists such as “AB” (not shown) and the n-gram appearance lists 12 and 13 to be the same.
- the n-gram index is configured so that the pointer from the n-gram appearance list to the n-gram index corresponds to the ID value of the n-gram index.
- the n-gram index so that the pointer corresponding to the ID range 1 to 100 in the n-gram appearance list 12 includes 52, 73, and 89 whose ID values are 1 to 100. Make 14 a group.
- the table is described as a table.
- the data structure is not limited to the table, and any data structure other than the table may be used as long as it can be managed in the same way.
- the structure that appears to be corresponding or grouped in FIG. 1 is not limited to a structure in which memory addresses are adjacent to each other.
- the computer searches the n-gram, here, 2-gram, and decomposes it into “FO” and “OT” as the search key 19. Then, the computer searches each of “FO” and “OT” with n characters of the n-gram node 11. When each of “FO” and “OT” is found, the computer follows the pointers corresponding to “FO” and “OT”, and acquires the n-gram appearance list 12 and the n-gram appearance list 13.
- the calculator follows these pointers in both the n-gram appearance list 12 and the n-gram appearance list 13 to indicate that the pointers corresponding to ID ranges 1 to 100 are “FO” and “OT”.
- the computer collates IDs whose values match in both the n-gram index 14 and the n-gram index 15, and collates the adjacent character positions of the IDs found in the collation.
- the calculator finds that the ID value is 52 and matches, and the character position value is 37 and 39 and finds the adjacent character position.
- the character position of “F” is 37
- the character position of “O” of “FO” is 38
- the character position of “O” of “OT” is 39. Since the ID position value 52 and the matching character position values 122 and 125 are not adjacent to the character position, the computer advances the collation target next.
- the calculator finds that the ID value matches 73. Since the character position values 1 and 26 are not adjacent to the character position, the computer advances the object of collation to the next character position value of the n-gram index 14 having the smaller character position value, and the character position value is 24. And 26 find the adjacent character position. Since the remaining value of the character position whose ID value is 73 is not adjacent to the character position, the computer advances the collation target next. Since the ID values 89 and 95 do not match and there is no “OT” in the ID value 89, the computer continues until the ID value becomes 95 or higher in the n-gram index 14 where the ID value is small. Proceed with the ID to be matched.
- next ID value of the n-gram index 14 (not shown) is found to match at 95, the process proceeds to the collation of the character position value. If the next ID value is greater than 95, the ID value The collation target is advanced to the next ID in the n-gram index 15 until the above is reached. In this way, the computer sets all the IDs of both the n-gram index 14 and the n-gram index 15 as collation targets.
- the processing using two n-gram appearance lists has been described, when the search word 18 is 5 characters or more and the search key 19 is 3 or more, the same number n as the number of the search keys 19 Only when the pointers corresponding to the same ID range are present in all of the -gram appearance lists, the n-gram index may be set as a collation target.
- the calculator can find the adjacent character positions of “FO” and “OT” and perform text search for “FOOT”, and n-gram indexes 16, 17 based on the n-gram appearance lists 12 and 13 Can be reduced and the search processing including matching can be reduced, so that text search can be speeded up.
- FIG. 2 is a diagram showing an example of hierarchizing the n-gram appearance list based on the ID range.
- the search word 18, the search key 19, the n-gram node 11, and the n-gram appearance lists 12 and 13 are the same as those described with reference to FIG.
- the pointer of the n-gram appearance list is not an n-gram index but an n-gram subgroup appearance list.
- the data structure of the n-gram subgroup appearance list 21 is the same as that described using the n-gram appearance list 12, and is a table indicating whether “FO” exists in the ID range. -The value of ID range is different from the gram appearance list 12.
- the data structure of the n-gram subgroup appearance list 22 is the same as that described using the n-gram appearance list 13, and is a table indicating whether “OT” exists in the ID range. -The value of ID range is different from the gram appearance list 13.
- Each value in the ID range of the n-gram subgroup appearance list 21 and each value in the ID range of the n-gram subgroup appearance list 22 are all the same. In the example shown in FIG. 2, the ID range is 10 units, 1 to 10, 11 to 20, 21 to 30. . . 91-100.
- the n-gram index not shown in FIG. 2 includes the same information as described with reference to FIG. Since both the n-gram index 14 and the n-gram index 15 include the ID value 52, both the n-gram subgroup appearance list 21 and the n-gram subgroup appearance list 22 have ID range values 51 to 51 Pointers corresponding to 60 indicate the presence of “FO” and “OT”. On the other hand, “OT” does not exist when the ID range is 81 to 90, and the pointer corresponding to the ID range 81 to 90 in the n-gram subgroup appearance list 22 is a predetermined value when “OT” does not exist. It is.
- the computer indicates that the pointers corresponding to ID ranges 1 to 100 in both the n-gram appearance list 12 and the n-gram appearance list 13 indicate “FO” and “OT”.
- the pointer follows the pointer.
- an n-gram subgroup appearance list 21 and an n-gram subgroup appearance list 22 are acquired.
- the pointer corresponding to the ID range 1 to 10 indicates that neither “FO” nor “OT” exists.
- the verification target is advanced.
- the pointer corresponding to the ID range 51 to 60 indicates the presence of both “FO” and “OT”, so the computer follows the pointer and the n-gram index 14 Check n-gram index 15.
- the calculator sets the ID ranges of both the n-gram subgroup appearance list 21 and the n-gram subgroup appearance list 22 Advances the collation target to 61-70.
- the pointer corresponding to the ID range 81 to 90 in the n-gram subgroup appearance list 21 indicates the presence of “FO”, but the n-gram subgroup appearance list 22 Since the pointer corresponding to the ID range 81 to 90 indicates that “OT” does not exist, the computer advances the collation target to the next without following the pointer.
- the pointers corresponding to the ID ranges 101 to 200 and 201 to 300 in the n-gram appearance list 12 and the n-gram appearance list 13 indicate that either “FO” or “OT” does not exist.
- the calculator does not follow these pointers. For this reason, the computer does not acquire both the n-gram subgroup appearance list 23 and the n-gram subgroup appearance list 24, and does not process it.
- the n-gram appearance list and the n-gram subgroup appearance list have two hierarchies.
- the ID range may be grouped into more hierarchies.
- the computer does not target 81 to 90 including the 89 ID value of the n-gram index 14 based on the n-gram subgroup appearance lists 21 and 22 and can further reduce the search processing. This makes it possible to speed up text search.
- FIG. 3 is a diagram showing an example of scanning text instead of the n-gram index.
- reading such as search target text is managed in units of pages, and the CPU reads a specific page by specifying a page number.
- the capacity of one page is, for example, a capacity that can be read by one read, and is, for example, 8 KB (kilobytes).
- a computer OS operating system
- a file system collectively manages a predetermined capacity called a cluster size as a read unit
- the page capacity may be a cluster size.
- an application program such as a database manages a predetermined capacity collectively as a read unit
- the page capacity may be a capacity managed by the application program.
- the search word 18, the search key 19, and the n-gram node 11 shown in FIG. 3 are the same as those described with reference to FIG.
- the n-gram appearance list 31 includes page numbers and information on the presence / absence of “FO” on each page.
- the n-gram appearance list 32 includes page numbers and information on the presence / absence of “OT” on each page.
- the page numbers in the n-gram appearance list 31 correspond to the page numbers in the n-gram appearance list 32, and the capacity of the same page number is the same.
- the ID range is illustrated to explain the correspondence between the page numbers, and is not necessary as the contents of the n-gram appearance lists 31 and 32.
- Page 33 is a scan target that is read by designating page number 1, and includes scan targets with IDs of 01 to 15.
- the questionnaire answer is the text to be searched, and page 33 includes the ID, gender, age and questionnaire answer. Since the answer to the questionnaire with ID 03 includes both “FO” and “OT”, whether or not the n-gram appearance list 31 appears and the n-gram appearance list 32 appears on page number 1 including ID 03 Both are available.
- the page 34 is a scan target that is read by designating the page number 2 and has the same structure as the page 33. The answer to the questionnaire on page 34 does not include “OT”, and the answer to the questionnaire whose ID is 16 includes “FO”. The presence / absence of Listing 32 is none.
- Page 35 and page 36 are also scan targets to be read by designating page number 3 and page number 40, respectively, and have the same structure as page 33. Since the answer to the questionnaire on page 35 does not include “FO” and the answer to the questionnaire with ID 33 includes “OT”, the n-gram appearance list 31 does not appear on page 3 and the n-gram appears. The presence or absence of the appearance of the list 32 becomes yes. Since the answer to the questionnaire with ID 79 includes both “OT” and “FO”, whether or not the n-gram appearance list 31 appears and whether or not the n-gram appearance list 32 appears in the page number 40 including the ID 79 Both are available.
- the computer acquires the n-gram appearance list 31 and the n-gram appearance list 32 by the same process as described with reference to FIG. Since page number 1 has both occurrences of n-gram appearance list 31 and n-gram appearance list 32, the computer designates page number 1 and leads page 33 to answer the questionnaire. to scan. In order to scan, the computer obtains the ID, gender, and age of the ID of page 33 that has been read, and scans with the search term 18 “FOOT” in the questionnaire response text. This scanning process may be a process of determining whether or not each character matches.
- the computer finishes scanning the questionnaire response with ID 01, it advances the scan target to ID 02.
- the scan target further advances to ID 03, the computer finds “FOOT” in “FOOTBALL”.
- the computer determines whether page number 2 appears in both the n-gram appearance list 31 and the n-gram appearance list 32. In this example, since there is no appearance of the n-gram appearance list 32, the computer proceeds to the determination of the appearance of the page number 3 without setting the page 34 as a scan target. Since there is no appearance of the n-gram appearance list 31 at the page number 3, the computer proceeds to the determination of the appearance of the page number 4 without setting the page 35 as a scan target.
- the calculator does not scan pages 34 and 35 based on the n-gram appearance lists 31 and 32, and can reduce the search process, so it scans text instead of n-gram index. Can also speed up text search. Further, since the presence / absence of the occurrence of a character string that matches the search key is managed on the page that is the unit of read, the computer does not perform a read that does not include the character string that matches the individual search key.
- FIG. 4 is a diagram showing an example in which questionnaire responses are separated.
- the search word 18, the search key 19, and the n-gram node 11 are the same as those described with reference to FIG.
- the n-gram appearance list 41 has the same structure as the n-gram appearance list 31 described with reference to FIG. 3 and includes a page number and information on the presence / absence of “FO” on each page, but corresponds to one page. The ID range is different.
- the n-gram appearance list 42 also has the same structure as the n-gram appearance list 32, and includes the page number and information about the presence / absence of “OT” on each page, but the ID range corresponding to one page is different. .
- Page 43 is information that is read by designating page number 1 and includes information with IDs 01-20.
- Page 43 includes an ID, gender, age, and a pointer (pt) to the questionnaire response.
- This pointer may include the page number in which the questionnaire response is included and the character position in the page of the first character of each questionnaire response text.
- the correspondence between IDs and questionnaire responses is the same as that of pages 33 and 34 shown in FIG.
- each page can include the same amount of ID range information.
- the ID range of page number 1 is 15 IDs of 1 to 15, and the ID range of page number 2 is 6 IDs of 16 to 21.
- the ID range of page number 1 and the ID range of page number 2 are 20 IDs.
- Page 44 is a scan target that is read by designating page number 5, and includes the text of the questionnaire response. Since the page 44 does not include anything other than questionnaire responses, the page 44 can include more questionnaire responses than the page 33 even if the page 44 has the same capacity. Pages 45 to 48 have the same structure as pages 43 and 44. In the example shown in FIG. 4, the questionnaire response corresponding to the ID of page 43 is present on page 44, but the answer is not limited to such a state. Depending on the page capacity and text capacity, page 43 In some cases, part of the questionnaire response corresponding to the ID is divided into page 44 and page 46. In addition, a questionnaire response corresponding to a part of the IDs of the page 45 and a questionnaire response corresponding to a part of the IDs of the page 47 may be collected on one page 46.
- the computer reads the page 43, follows the pointer in the page 43, reads the page 44, and sets the questionnaire response as the scan target.
- FIG. 5 is a diagram showing an example in which an n-gram node has a tree structure.
- the search word 18, the search key 19, and the n-gram appearance lists 12 and 13 are the same as those described with reference to FIG.
- the example of the n-gram node shown in FIG. 5 is a tree structure, and the first layer n-gram node 51 serving as the root (root), the second layer n-gram node 52, and the third layer serving as the leaf (leaf). It consists of n-gram nodes 53 in the layer.
- the n-gram node 53 has the same contents as the n-gram node 11 and includes n characters (2 characters) and a pointer to an n-gram index corresponding to the n characters.
- the n-gram node 52 includes a pointer to the n-gram node 53.
- the pointer in which the n characters of the n-gram node 52 correspond to “AE” is the pointer to which the n characters of the n-gram node 53 are “AB”, “AC”, and “AE”.
- the pointers shown in FIG. 5 are represented by arrows to “AB” on behalf of “AB”, “AC”, and “AE”.
- a pointer whose n-gram node 52 has n characters corresponding to “BR” is a pointer to n-gram node 53 whose n characters are “BE”, “BI”, and “BR”.
- the pointer whose n character corresponds to “OT” is a pointer where the n character of the n-gram node 53 is “FO”, “GO”, or “OT”.
- the pointer in which the n character of the n-gram node 51 corresponds to “BR” is a pointer to the n character including “AE” and “BR” of the n-gram node 52, and the n character of the n-gram node 51 is “
- the pointer corresponding to “SV” is a pointer to n characters including “OT” and “SV” of the n-gram node 52.
- the n character positioned between “BR” and “SV” exists ahead of the pointer corresponding to “SV”, and the n character positioned between “AE” and “BR” is “BR”.
- each of the “FO” and “OT” is searched from the n-gram node 51.
- the calculator compares the search key 19 “FO” with the n-letter “BR” and compares it with the n-letter “SV” to determine that “FO” is located between “BR” and “SV”. .
- the computer follows the pointer corresponding to “SV” and obtains n characters including “OT” and “SV” of the n-gram node 52.
- the calculator compares “FO” of the search key 19 with “OT” of n characters of the n-gram node 52, and “FO” exists in the direction of “BR” rather than “OT”, that is, “FO” is “ Judged to be located between “OT” and “BR”.
- the computer follows the pointer corresponding to “OT”, finds “FO” of the n-gram node 53, follows the pointer corresponding to “FO”, and obtains the n-gram appearance list 12.
- the subsequent processing is the same as the processing already described with reference to FIG. Similarly, for the search key 19 “OT”, the pointer corresponding to “SV” of the n-gram node 51 is traced, the pointer corresponding to “OT” of the n-gram node 52 is traced, and the n-gram node 53 Find “OT” and follow the pointer corresponding to “OT” to obtain the n-gram appearance list 13.
- the computer uses the n-gram node 11 shown in FIG. 1 to search for the search keys 19 one by one in the order of “AB”, “AC”, “AE”, “BE”, etc.
- the number of comparisons is less when the tree-structured n-gram node shown in FIG. 5 is used than when compared and searched, and n characters matching the search key 19 can be found in a short time. Since the number of comparisons increases when the number of levels in the tree structure is large, the number of comparisons decreases when the number of levels is uniform and small. For this reason, a so-called balanced tree structure is preferred.
- the computer can acquire the n-gram appearance list in a short time from the start of the search by the tree-structured n-gram node.
- the n-gram appearance list can be used even if the n-gram node has a tree structure, the speed-up effect described with reference to FIG. 1 and the like can be obtained.
- the tree structure of n-gram nodes in which the same n characters are arranged in each layer that is, the n-gram nodes 51, 52, and 53 are described as a tree structure including “BR”.
- FIG. 6 is a diagram showing an example of search using B-tree.
- the appearance list described above can be used not only for text search but also for search using B-tree.
- the search target table 61 includes values for the four items of ID, C1, C2, and C3, and is a table in which ID values that match the values of the respective items of C1, C2, and C3 are searched.
- the computer expands the C1 B-tree node, the C2 B-tree node, and the C3 B-tree node (not shown) from the search target table 61 before starting the search.
- the B-tree node has a general B-tree structure, description of the structure itself is omitted.
- Each value of the B-tree node shown in Fig. 6 corresponds to the value of C1 or C3.
- the pointer from the B-tree node to the B-tree appearance lists 62 and 63 is information for specifying the ID value, but a plurality of ID values correspond to one value of C1 in the search target table 61. .
- C1 corresponds to a value of 12.
- the B-tree appearance lists 62 and 63 include an ID range and a pointer (pt). That is, the B-tree appearance lists 62 and 63 have pointers to the existing B-tree indexes 64 to 67 when IDs within the ID range exist. Further, the B-tree appearance lists 62 and 63 have, as pointer values, predetermined values indicating that they do not exist when there is no ID within the ID range.
- each value of the ID range of the B-tree appearance list 62 and each value of the ID range of the B-tree appearance list 63 are the same.
- the B-tree indexes 64 and 66 list the ID values of the search target table 61 when the value of C1 is 12, and the B-tree index 64 has an ID value in the range of 1 to 10.
- the B-tree index 66 includes IDs in the range of 21-30. Here, there is no ID in the range of 11-20.
- the B-tree indexes 65 and 67 list the ID values of the search target table 61 when the value of C3 is 32.
- the B-tree index 65 has an ID value in the range of 1 to 10.
- the B-tree index 67 includes ID values in the range of 11-20.
- the computer expands the search target table 61 to B-tree nodes, creates B-tree indexes 64 to 67, creates B-tree appearance lists 62 and 63, and connects them with pointers.
- the computer converts the search condition 68 into a search key 69.
- the search condition 68 is an AND condition that the value of C1 is 12 and that the value of C3 is 32. Therefore, the search key 69 is converted into 12 which is the value of C1 and 32 which is the value of C3. Then, an ID whose C1 value matches the value 12 of the search key 69 is searched, and an ID whose C3 value matches the value 32 of the search key 69 is searched.
- the computer Since the value 12 of the search key 69 is 50 or less in the B-tree node of C1 and matches the value 12, the computer acquires the B-tree appearance list 62. Further, since the value 32 of the search key 69 is the value 52 or less in the B-tree node of C3 and is the value 35 or less and matches the value 32, the computer acquires the B-tree appearance list 63. As described above, each value in the ID range of the B-tree appearance list 62 and each value in the ID range of the B-tree appearance list 63 are the same. The computer determines that the ID exists in both the B-tree appearance list 62 and the B-tree appearance list 63 in the ID range 1 to 10, and follows the pointers to obtain the B-tree index 64 and the B-tree index 65, respectively. .
- the computer first compares the first ID 1 of the B-tree index 64 with the first ID 3 of the B-tree index 65, and these ID values do not match and the ID value of the B-tree index 64 is small. Then, the collation target of the B-tree index 64, that is, the comparison target is advanced. The computer compares the ID 3 of the B-tree index 64 with the ID 3 of the B-tree index 65, and since these ID values match, it finds 3 as the ID value satisfying the search condition 68.
- the computer advances the collation target and compares ID 8 of B-tree index 64 with ID 5 of B-tree index 65, and these ID values do not match, and B-tree index 64 and B-tree Since it is the end of the index 65, the collation target in the B-tree appearance list 62 and the B-tree appearance list 63 is advanced to the ID range 11-20.
- the computer determines from the pointers corresponding to the ID ranges 11 to 20 in the B-tree appearance list 62 that there is no ID in the ID ranges 11 to 20, and the pointers corresponding to the ID ranges 11 to 20 in the B-tree appearance list 63 It is determined that the search condition 68 is not satisfied without using the previous B-tree index 67 as a comparison target. Further, the computer advances the collation target in the B-tree appearance list 62 and the B-tree appearance list 63 to the ID ranges 21 to 30.
- the computer determines from the pointers corresponding to the ID ranges 21 to 30 in the B-tree appearance list 63 that there is no ID in the ID ranges 21 to 30, and the pointers corresponding to the ID ranges 21 to 30 in the B-tree appearance list 62 It is determined that the search condition 68 is not satisfied without using the previous B-tree index 66 as a comparison target.
- the computer does not target B-tree indexes 66 and 67 based on the B-tree appearance list, and the search processing can be reduced. Speed can be increased.
- FIG. 7 is a diagram showing an example of a file system.
- Some computer OSs or file systems manage directories and files using i-nodes.
- the directory is also a file and is managed as a data block in the data area.
- a plurality of i-nodes are listed to form an i-node list.
- the i-node 76 includes a pointer to the root directory 710, and each directory and each file in the root directory 710 includes a pointer to the i-node.
- the directory name home of the root directory 710 has a pointer to the i node 77, and the i node 77 has a pointer to the / home directory 711.
- the directory name src of the / home directory 711 has a pointer to the i node 78, and the i node 78 has a pointer to the / home / src directory 712.
- the file name test.c in the / home / src directory 712 has a pointer to the i-node 79, and the i-node 79 has a pointer to the test.c file 713. In this way, the test.c file 713 can be reached from the root directory 710 and the test.c file 713 can be accessed.
- the i-node list is grouped, and each grouped i-node list is managed by a list number. Similar to the explanation using FIG. 3, the unit of the group in the i-node list is preferably a unit of one read. Then, the presence / absence of a search key is managed for each list number in the appearance list. Since a plurality of appearance lists have the same list number, the same grouped i-node list is managed.
- the search key “main” exists in the data block ahead of the pointer included in the two i-node lists, i-node list 73 managed by list number 1 and i-node list 74 managed by list number 2. Therefore, the column of presence / absence of the list numbers 1 and 2 in the appearance list 71 becomes null.
- the appearance of the list number 3 in the appearance list 71 The presence / absence column is “Yes”. The same applies to the appearance list 72 of “src”.
- the computer When searching for a data block such as a directory or a file including both “main” and “src” search keys, the computer first displays the presence / absence column of the list number 1 in the appearance list 71 and the list number 1 in the appearance list 72. It is determined that both the presence / absence column is absent. That is, since the computer can determine that there are neither “main” nor “src” ahead of the pointer included in the i-node list 73 of list number 1, the i-node list 73 is not targeted for search, and list number 2 Proceed to The computer determines that the presence / absence column of list number 2 in the appearance list 72 is present, but the presence / absence column of list number 2 in the appearance list 71 is absent. For this reason, the computer does not set the i-node list 74 as a search target and proceeds to the list 3.
- a data block such as a directory or a file including both “main” and “src” search keys
- the computer determines that the appearance presence / absence column of list number 3 in both the appearance list 71 and the appearance list 72 is present. At this time, it is unclear which i-node pointer of the i-node list 75 managed by the list number 3 has “main” and “src”. Therefore, the computer scans which i-node pointer in the i-node list 75 contains “main” and “src”. The computer obtains the / home / src directory 712 that is the pointer of the i-node 78 and scans “main” and “src”. Further, the computer obtains the test.c file 713 ahead of the pointer of the i-node 79 and scans “main” and “src”. In this example, the computer finds “main” and “open” in the test.c file 713.
- the i-node lists 73 and 74, the root directory 710, and the / home directory 711 are not scanned based on the appearance lists 71 and 72, and the search processing can be reduced. Can search for files faster.
- FIG. 8 is a diagram showing an example of the configuration of a computer.
- the computer has two CPUs, CPU-A81 and CPU-B82, and the two CPUs can execute processing independently.
- a dedicated cache A83 is connected to CPU-A81
- a dedicated cache B84 is connected to CPU-B82.
- the computer has a cache C85 shared by CPU-A81 and CPU-B82, and has a memory 86 and a storage device 87 such as an HDD (hard disk drive) or an SSD (solid state drive). These have different access performance and storage capacity, and the access performance of cache A83 and cache B84 is the highest and the storage capacity is the lowest.
- Cache C85 has the second highest access performance and the second smallest storage capacity.
- the access performance of the memory 86 is the third highest and the storage capacity is the third lowest.
- the access performance of the storage device 87 is the fourth highest and the storage capacity is the fourth lowest.
- Appearance list that is, n-gram appearance lists 12, 13, 21-24, 31, 32, 41, 42, B-tree appearance lists 62, 63, and appearance lists 71, 72 have search keys within a predetermined range
- information other than the appearance list may not be processed, but all information in the appearance list is processed. Therefore, storing the appearance list in the cache has the effect of speeding up the search process. Get higher.
- the text occupies a large storage capacity, so the page is stored in the storage device 87, and the page that is read based on the appearance list is reduced.
- the number of accesses to the device 87 is reduced, and the effect of speeding up the search process is enhanced.
- a plurality of appearance lists in the hierarchy may be processed in parallel by different CPUs.
- the n-gram subgroup appearance lists 21 and 22 shown in FIG. 2 are stored in the cache A83 and processed by the CPU-A81.
- the n-gram subgroup appearance list in which the pointer exists in the same ID range of both the n-gram appearance lists 12 and 13 (not shown) is stored in the cache B84 and processed by the CPU-B82.
- the two n-gram subgroup appearance lists can be processed in parallel by the two CPU-A81 and CPU-B82.
- different ID ranges in the appearance list are independent, different ID ranges may be processed in parallel by different CPUs.
- the ID ranges 1 to 100 of both the n-gram appearance lists 12 and 13 are stored in the cache A83 and processed by the CPU-A81.
- the ID ranges 101 to 200 of both the n-gram appearance lists 12 and 13 are stored in the cache B84 and processed by the CPU-B82.
- the different ID ranges of the n-gram subgroup appearance lists may be processed by different CPUs, and the different page numbers of the n-gram appearance lists 31 and 32 may be processed by different CPUs.
- the appearance list is stored in the cache C85 or the memory 86 and is shared by the CPU-A81 and CPU-B82 so that they can be processed in parallel by different CPUs. May be.
- the questionnaire response text on different pages may be scanned in parallel by different CPUs.
- CPU-A 81 may scan page 33 with page number 1 shown in FIG. 3, and CPU-A 82 may scan page 36 with page number 40.
- FIG. 8 shows an example in which the computer has two CPUs and three caches. However, the present invention is not limited to this, and the computer may have three or more CPUs. You may have.
- the computer has an input / output IF (interface) 88 that connects a display, a keyboard, and a mouse (not shown), and a network IF 89 that connects to a network (not shown).
- the computer may receive setting information for processing the search via the input / output IF 88, and the search result may be output from the input / output IF 88 and displayed on the display. Further, the computer may receive the setting information via the network IF 89 and output the search result. Further, search target data such as text and ID and a program to be executed for the search may be received by the network IF 89.
- FIG. 9 is a diagram showing an example of a processing flow for creating an appearance list.
- the appearance list may be created when creating an n-gram node, B-tree node, or i-node before starting the search, or an n-gram node and n-gram index or a B-tree node and B-tree index are created in advance.
- the search may be started and may be created at the time of the first search.
- an appearance list is created together with a node before the search is started will be described.
- the computer determines the group creation unit in step 91.
- 100 ID ranges are one group
- in FIG. 3 one page is one group
- 10 ID ranges are one group.
- a user input value may be used, or a read unit such as a page may be obtained from parameters of the OS or the file system.
- the computer proceeds to step 93 by setting the steps from step 92 to step 911 as the loop range and the number of groups as the number of loops.
- the computer acquires data of one group in step 93.
- one group is a group of units determined in step 91, and is changed to another group each time a loop is executed.
- the data is ID of the page 33, etc. in the text search of FIGS. 3 and 4, gender, age, questionnaire response (text), the search target table 61 in the ID search of FIG. 6, and the data in the file system search of FIG. It is a block.
- step 94 the computer acquires all IDs from the data acquired in step 93. 1 are all IDs included in, for example, 1 to 100 in FIG. 1 and are 52, 73, 89, 95, etc., for example, are 1 to 15 in FIG. 3, and are included in, for example, 1 to 10 in FIG. 1 to 3, 5, 8, etc.
- the computer sets the steps from step 95 to step 910 as the loop range, and proceeds to step 96 with the number of IDs acquired in step 94 as the number of loops.
- step 96 the computer obtains all n characters in the text corresponding to one ID or a value corresponding to one ID.
- one ID is sequentially changed to another ID every time the loop is executed.
- all n characters in the text corresponding to the ID 01 are all two characters such as “OU”, “UT”, and “TD” from the answer text of the questionnaire.
- the values corresponding to ID 1 are 12 of C1, aaa of C2, and 78 of C3.
- step 97 the computer registers all the n characters or values acquired in step 96 in the n-gram node or B-tree node, and creates the n-gram node or B-tree node.
- step 98 the computer registers the ID and all character positions as necessary in the index. That is, when the ID value is 52 in step 96, the computer registers the ID value 52 in the n-gram indexes 14 and 15 in FIG. 1, and the character positions 37 and 122 and the character position 39 are associated with this ID value. And 125 are registered. If the ID value is 1 in step 96, the computer registers the ID value 1 in the B-tree index 64 in FIG. 6, and C2 is the B-tree index corresponding to aaa and C3 is 78. ID value 1 is registered in the B-tree index corresponding to.
- step 99 when the computer registers a new n character or new value in step 98, the computer creates an appearance list for the registered new n character or new value, and creates the appearance list or the already generated appearance list. Register pointer to index or information about presence or absence.
- the computer calculates the number of groups including the ID range from 1 to 100 as well as 101 or later for the new n character “FO” whose ID value is 52 in FIG. Create an n-gram appearance list 12 with numbers.
- the computer registers the n-gram appearance list 12 to the pointer corresponding to the n character “FO” of the n-gram node 11, and the n-gram appearance list 12 has an ID range of 1 to 100 to the pointer corresponding to the n-gram appearance list 12.
- Register gram index 14 The computer sets a predetermined value when “FO” does not exist in the pointer whose ID range in the n-gram appearance list is 101 or later.
- the computer registers the n-gram index 16 to the pointer corresponding to the ID range 201 to 300 of the existing n-gram appearance list 12 for the non-new n character “FO” whose ID value is 203. . Since “FO” does not exist in the ID range 101 to 200, the ID range 101 to 200 in the n-gram appearance list 12 maintains a predetermined value when “FO” set at the time of creation does not exist.
- an n-gram node or B-tree node and an n-gram index or B-tree index can be created, and an n-gram appearance list or B-tree appearance list can be created. Since the text scan shown in FIGS. 3 and 4 and the file system shown in FIG. 7 do not have an index or the like, the computer need not execute unnecessary steps.
- FIG. 10 is a diagram showing an example of a search processing flow using the appearance list.
- the computer converts the search term 18 as a search condition into the search key 19, or converts the search condition 68 into the search key 69.
- the computer searches for an n-gram node or a B-tree node with a search key, acquires a pointer corresponding to the n character string or value found with the search key, and acquires an appearance list with the acquired pointer. That is, in FIG. 1, the computer finds the n character “FO” of the n-gram node 11 with respect to “FO” of the search key 19 and uses the pointer corresponding to the n character “FO” to display the n-gram appearance list 12. To get. In FIG. 6, the computer finds the value 12 of the B-tree node of C1 with respect to the value 12 of the search key 69, and acquires the B-tree appearance list 62 using the pointer corresponding to the value 12.
- the computer proceeds from step 103 to step 107 with the range of the loop as the loop range, and the number of groups in the appearance list acquired at step 102 as the number of loops to step 104.
- the number of groups is the number of items in the ID range of 1 to 100, 101 to 200, and 201 to 300 in the n-gram appearance list 12 of FIG. 1, and the n-gram appearance list of FIG.
- the number of items of 31 page numbers is 40.
- step 104 the computer obtains all occurrence / non-occurrence information of one group or a pointer indicating the occurrence / absence from the occurrence list.
- one group is changed to another group each time a loop is executed.
- All the appearance information of one group is, for example, appearance information corresponding to page number 1 in the n-gram appearance list 31 in FIG. 3 and appearance information corresponding to page number 1 in the n-gram appearance list 32 in FIG. That is, it is the presence / absence information of all the plurality of n-gram appearance lists of one group of page number 1.
- the pointers indicating the presence / absence of all occurrences of one group correspond to, for example, the pointer corresponding to the ID range 1 to 100 in the n-gram appearance list 12 and the ID range 1 to 100 in the n-gram appearance list 13 in FIG. Pointer. That is, it is a pointer for all the n-gram appearance lists in one group whose ID range is 1 to 100.
- step 105 the computer determines whether all the presence / absence information acquired in step 104 indicates presence, or whether the acquired pointers indicate all presence. If the computer determines that all are present, the process proceeds to step 106, and a detailed search is performed for one group corresponding to presence / absence information. For example, in FIG. 3, there is one group corresponding to the presence of page number 1, and the computer acquires page 33 with page number 1 and scans “FOOT” from the questionnaire response text included in page 33. .
- the computer acquires the n-gram index 14 from the pointer corresponding to the ID range 1 to 100 in the n-gram appearance list 12, and n-gram
- the n-gram index 15 is acquired from the pointer corresponding to the ID range 1 to 100 of the gram appearance list 13, and the ID match and the character position adjacency are collated as already described with reference to FIG.
- the computer collectively outputs the detailed search results in step 108.
- step 106 is executed after checking the presence / absence information of all the groups, and when there are many groups having all occurrence / non-occurrence information, that is, when the detailed search takes time. May prompt the user to review the search conditions.
- FIG. 11 is a diagram showing an example of inputting an appearance list creation parameter.
- the computer may output the display data of the window 111 from the input / output IF 88 to the display, and accept input of parameters by the keyboard and mouse by the input / output IF 88.
- another computer connected to the network IF 89 may perform output and input, and the computer may transmit and receive the display data to be output and the input parameters via the network IF 89.
- the appearance list creation input field 112 is a field for entering settings including a setting for not creating an appearance list, a setting for creating an appearance list before starting a search, and a setting for creating an appearance list at the first search. If a setting for not creating an appearance list is input in this field, the computer does not execute step 99 shown in FIG. 9 and registers the corresponding index in the node. If the setting for creating the appearance list at the first search is input, the computer does not execute step 99, registers the corresponding index to the node, and creates and registers the appearance list at the first search. . When the setting for creating the appearance list before starting the search is input, the computer processes as described with reference to FIG.
- the search target input column 113 is a column for specifying a search target data file, that is, a target file for generating an appearance list.
- the name of the text file that is the target of the text search may be specified, or the name of the database that is the target of the search may be specified.
- the computer obtains data from the file specified in the input field 113 to be searched.
- the ID unit input column 114 is a column in which the ID described with reference to FIGS.
- an ID is explicitly included in the file specified in the search target input field 113, for example, there is an ID item in the search target table 61 described with reference to FIG.
- the unit input field 114 is designated.
- the item or paragraph input field 114 may be specified as described with reference to FIG.
- a line of text may be specified as an ID unit.
- the ID unit input field 114 may be set so that no ID is specified.
- the group unit input column 115 is a column for specifying a group based on the ID specified in the ID unit input column 114, or specifying that a parameter is acquired from the OS or file system to be a group.
- a unit 100 is input in the group unit input field 115, and the B-tree appearance lists 62 and 63 described with reference to FIG.
- the unit 10 is entered in the group unit input field 115.
- a unit of one read is input from the OS or file system.
- the designation input in the group unit input field 115 is common to all the appearance lists. This makes it possible to determine whether or not an item found with the search keys 19 and 69 appears for each group.
- the number-of-hierarchies input field 116 is a field for designating the number of levels in the appearance list.
- “1” is input in the input column 116 for the number of layers.
- the computer may determine that the appearance list is one layer if nothing is input.
- the appearance list has two layers of the n-gram appearance lists 12 and 13 and the n-gram subgroup appearance lists 21 to 24, 2 is input in the input column 116 for the number of layers.
- an input field for the group unit Increase 115 to 2.
- 100 for the n-gram appearance list 12 and 10 for the n-gram subgroup appearance lists 21 and 23 are input.
- a character string may be input, selectable candidates may be listed and displayed, and may be selected by a so-called radio button, or may be selectable by a so-called pull-down menu.
- the search target data is divided into a plurality of groups, and the same group is managed by a plurality of appearance lists, so that the computer can determine a group that does not need to be searched based on the appearance list. Reduce group processing and speed up searches.
- n-gram node 12 n-gram appearance list 14-17: n-gram index 18: search word 19, 69: search key 21-24: n-gram subgroup appearance list 62, 63: B- Tree appearance list 64 to 67: B-tree index 68: Search condition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
12、13:n-gram出現リスト
14~17:n-gramインデックス
18 :検索語
19、69:検索キー
21~24:n-gramサブグループ出現リスト
62、63:B-tree出現リスト
64~67:B-treeインデックス
68 :検索条件
Claims (12)
- 指定された文字列で文献を検索する計算機の文字列検索方法において、
前記指定された文字列を第1の文字または文字列と第2の文字または文字列を含む複数の文字または文字列に分解する第1のステップと、
前記計算機の1回のリード処理で読み込める単位で複数の文献をグルーピングして複数の集合を作成する第2のステップと、
前記複数の集合それぞれにおいて前記第1の文字または文字列が出現するか否かを判定し、出現する集合それぞれに前記第1の文字または文字列の第1の出現情報を付与する第3のステップと、
前記複数の集合それぞれにおいて前記第2の文字または文字列が出現するか否かを判定し、出現する集合それぞれに前記第2の文字または文字列の第2の出現情報を付与する第4のステップと、
前記複数の集合の中の第1の集合において前記第1の出現情報と前記第2の出現情報が付与されている場合、前記第1の集合を詳細検索対象とする第6のステップと、
を有することを特徴とする文字列検索方法。 - 前記複数の集合の中の第2の集合において前記第1の出現情報あるいは前記第2の出現情報が付与されていない場合、前記第2の集合を詳細検索対象としない第7のステップをさらに有することを特徴とする請求項1に記載の文字列検索方法。
- 前記詳細検索対象を前記指定された文字列で検索する第8のステップをさらに有することを特徴とする請求項2に記載の文字列検索方法。
- 前記複数の集合それぞれにおいて、前記第1の文字または文字列および前記第2の文字または文字列以外の前記分解された文字または文字列それぞれが出現するか否かを判定し、出現する集合それぞれに出現情報を付与する第5のステップをさらに有し、
前記第6のステップであって、前記複数の集合の中の第3の集合において前記第1の出現情報と前記第2の出現情報を含むすべての出現情報が付与されている場合、前記第3の集合を詳細検索対象とする第6のステップを有することを特徴とする請求項1に記載の文字列検索方法。 - 前記複数の集合の中の第4の集合において前記第3のステップと前記第4のステップと前記第5のステップの少なくとも1つのステップにおいて出現情報が付与されていない場合、前記第4の集合を詳細検索対象としない第7のステップをさらに有することを特徴とする請求項4に記載の文字列検索方法。
- 前記詳細検索対象を前記指定された文字列で検索する第8のステップをさらに有することを特徴とする請求項5に記載の文字列検索方法。
- 指定された文字列で文献を検索する計算機の文字列検索方法において、
前記計算機の1回のリード処理で読み込める単位で複数の文献をグルーピングして複数の集合を作成する第1のステップと、
前記複数の集合それぞれにn個(nは自然数)の任意の文字が出現するか否かを判定し、出現する集合それぞれにn個の文字の出現情報を付与する第2のステップと、
前記指定された文字列を第1のn個の文字と第2のn個の文字を含む複数のn個の文字に分解する第3のステップと、
前記複数の集合の中の第1の集合において前記第1のn個の文字の出現情報が付与されるとともに前記第2のn個の文字の出現情報が付与されている場合、前記第1の集合を詳細検索対象とする第4のステップと、
を有することを特徴とする文字列検索方法。 - 前記複数の集合の中の第2の集合において前記第1のn個の文字の出現情報あるいは前記第2のn個の文字の出現情報のいずれかがが付与されていない場合、前記第2の集合を詳細検索対象としない第5ステップをさらに有することを特徴とする請求項7に記載の文字列検索方法。
- 前記詳細検索対象を前記指定された文字列で検索する第6ステップをさらに有することを特徴とする請求項8記載の文字列検索方法。
- 前記第4のステップであって、前記複数の集合の中の第3の集合において前記第1のn個の文字の出現情報と前記第2のn個の文字の出現情報を含む前記分解されたすべてのn個の文字の出現情報が付与されている場合、前記第3の集合を詳細検索対象とする第4のステップを有することを特徴とする請求項7に記載の文字列検索方法。
- 前記複数の集合の中の第4の集合において前記第1のn個の文字の出現情報と前記第2のn個の文字の出現情報を含む前記分解されたn個の文字の出現情報の中の少なくとも1つの出現情報が付与されていない場合、前記第4の集合を詳細検索対象としない第5のステップをさらに有することを特徴とする請求項10に記載の文字列検索方法。
- 前記詳細検索対象を前記指定された文字列で検索する第6のステップをさらに有することを特徴とする請求項11に記載の文字列検索方法。
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/122,780 US10394870B2 (en) | 2014-06-30 | 2014-06-30 | Search method |
JP2016530714A JP6212639B2 (ja) | 2014-06-30 | 2014-06-30 | 検索方法 |
PCT/JP2014/067438 WO2016001991A1 (ja) | 2014-06-30 | 2014-06-30 | 検索方法 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/067438 WO2016001991A1 (ja) | 2014-06-30 | 2014-06-30 | 検索方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016001991A1 true WO2016001991A1 (ja) | 2016-01-07 |
Family
ID=55018593
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/067438 WO2016001991A1 (ja) | 2014-06-30 | 2014-06-30 | 検索方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US10394870B2 (ja) |
JP (1) | JP6212639B2 (ja) |
WO (1) | WO2016001991A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018084907A (ja) * | 2016-11-22 | 2018-05-31 | 富士通株式会社 | ジョブ消費電力推定プログラム、並列処理装置およびジョブ消費電力推定方法 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11868335B2 (en) * | 2019-05-22 | 2024-01-09 | Druva Inc. | Space-efficient change journal for a storage system |
US20230161774A1 (en) * | 2021-11-24 | 2023-05-25 | International Business Machines Corporation | Semantic annotation for tabular data |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05135102A (ja) * | 1991-11-11 | 1993-06-01 | Ricoh Co Ltd | 文書検索方式 |
JPH07282073A (ja) * | 1994-04-04 | 1995-10-27 | Toyota Motor Corp | データ検索装置及びその方法 |
JPH0869476A (ja) * | 1994-08-30 | 1996-03-12 | Hokkaido Nippon Denki Software Kk | 検索システム |
JPH08161357A (ja) * | 1994-06-02 | 1996-06-21 | Ricoh Co Ltd | 文書管理装置 |
JPH09223160A (ja) * | 1996-02-19 | 1997-08-26 | Hitachi Ltd | 文書検索方法 |
JP2009037359A (ja) * | 2007-07-31 | 2009-02-19 | Hitachi Ltd | データ登録検索方法、データ登録検索プログラムおよびデータベースシステム |
JP2009134627A (ja) * | 2007-11-30 | 2009-06-18 | Mitsubishi Electric Corp | N文字索引生成装置、文書検索装置、n文字索引生成方法、文書検索方法、n文字索引生成プログラムおよび文書検索プログラム |
WO2013179348A1 (ja) * | 2012-05-31 | 2013-12-05 | 富士通株式会社 | インデックス生成プログラム及び検索プログラム |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3459053B2 (ja) | 1995-01-12 | 2003-10-20 | 株式会社日立製作所 | 文書検索方法および装置 |
JP3620968B2 (ja) | 1998-08-05 | 2005-02-16 | 株式会社日立製作所 | 文書検索方法及びその実施装置並びにその処理プログラムを記録した媒体 |
JP4115048B2 (ja) * | 1999-08-17 | 2008-07-09 | 株式会社リコー | 文書検索システム |
US7054855B2 (en) * | 2001-07-03 | 2006-05-30 | International Business Machines Corporation | Method and system for performing a pattern match search for text strings |
JP2006185408A (ja) * | 2004-11-30 | 2006-07-13 | Matsushita Electric Ind Co Ltd | データベース構築装置及びデータベース検索装置及びデータベース装置 |
US8521751B2 (en) * | 2008-08-22 | 2013-08-27 | Nec Corporation | Search device, a search method and a program |
US20160292234A1 (en) * | 2014-12-12 | 2016-10-06 | Infosys Limited | Method and system for searching in a distributed database |
-
2014
- 2014-06-30 JP JP2016530714A patent/JP6212639B2/ja active Active
- 2014-06-30 WO PCT/JP2014/067438 patent/WO2016001991A1/ja active Application Filing
- 2014-06-30 US US15/122,780 patent/US10394870B2/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05135102A (ja) * | 1991-11-11 | 1993-06-01 | Ricoh Co Ltd | 文書検索方式 |
JPH07282073A (ja) * | 1994-04-04 | 1995-10-27 | Toyota Motor Corp | データ検索装置及びその方法 |
JPH08161357A (ja) * | 1994-06-02 | 1996-06-21 | Ricoh Co Ltd | 文書管理装置 |
JPH0869476A (ja) * | 1994-08-30 | 1996-03-12 | Hokkaido Nippon Denki Software Kk | 検索システム |
JPH09223160A (ja) * | 1996-02-19 | 1997-08-26 | Hitachi Ltd | 文書検索方法 |
JP2009037359A (ja) * | 2007-07-31 | 2009-02-19 | Hitachi Ltd | データ登録検索方法、データ登録検索プログラムおよびデータベースシステム |
JP2009134627A (ja) * | 2007-11-30 | 2009-06-18 | Mitsubishi Electric Corp | N文字索引生成装置、文書検索装置、n文字索引生成方法、文書検索方法、n文字索引生成プログラムおよび文書検索プログラム |
WO2013179348A1 (ja) * | 2012-05-31 | 2013-12-05 | 富士通株式会社 | インデックス生成プログラム及び検索プログラム |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018084907A (ja) * | 2016-11-22 | 2018-05-31 | 富士通株式会社 | ジョブ消費電力推定プログラム、並列処理装置およびジョブ消費電力推定方法 |
Also Published As
Publication number | Publication date |
---|---|
US20170075989A1 (en) | 2017-03-16 |
US10394870B2 (en) | 2019-08-27 |
JPWO2016001991A1 (ja) | 2017-04-27 |
JP6212639B2 (ja) | 2017-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11720633B2 (en) | Aggregating personalized suggestions from multiple sources | |
US9104979B2 (en) | Entity recognition using probabilities for out-of-collection data | |
US10579661B2 (en) | System and method for machine learning and classifying data | |
US8745061B2 (en) | Suffix array candidate selection and index data structure | |
JP5616444B2 (ja) | 文書インデックス化およびデータクエリングのための方法およびシステム | |
WO2017151194A1 (en) | Atomic updating of graph database index structures | |
US10613785B1 (en) | Scalable binning for big data deduplication | |
US10372718B2 (en) | Systems and methods for enterprise data search and analysis | |
KR101823463B1 (ko) | 연구자 검색 서비스 제공 장치 및 그 방법 | |
Welch et al. | Fast and accurate incremental entity resolution relative to an entity knowledge base | |
Krishnan et al. | A taxonomy of query auto completion modes | |
JP4237813B2 (ja) | 構造化文書管理システム | |
JP6212639B2 (ja) | 検索方法 | |
Qin et al. | Asymmetric signature schemes for efficient exact edit similarity query processing | |
Bedathur et al. | Interesting-phrase mining for ad-hoc text analytics | |
Ilic et al. | Inverted index search in data mining | |
Mostafa | A case study on B-Tree database indexing technique | |
Lu et al. | Boosting the quality of approximate string matching by synonyms | |
Lee et al. | Integrating code search into the development session | |
JP2795317B2 (ja) | 多段表処理方式 | |
JP6081609B2 (ja) | データ分析システム及びその方法 | |
WO2013069149A1 (ja) | データ検索装置、データの検索方法及びプログラム | |
JP2000163439A (ja) | 電子ファイル検索装置および電子ファイル検索方法 | |
JP2005301855A (ja) | 文書検索方法、文書検索プログラムおよびこれを実行する文書検索装置 | |
JP7639730B2 (ja) | 検索方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14896622 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016530714 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15122780 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14896622 Country of ref document: EP Kind code of ref document: A1 |