WO2012176374A1 - Numerical range search device, numerical range search method, and numerical range search program - Google Patents
- Publication number
- WO2012176374A1 (PCT/JP2012/003300)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
Definitions
- The present invention relates to a numerical range search device, a numerical range search method, and a numerical range search program.
- A general keyword search or full-text search method cannot comprehensively retrieve documents that meet a numerical condition. For example, when “10 g” is used as the condition, a document containing “10 g or more and 15 g or less” can be found. However, a document containing “5 g or more and 20 g or less” satisfies the numerical range condition but does not contain the character string “10 g”, so it cannot be found.
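The limitation described above can be illustrated with a minimal sketch. The function names `keyword_hit` and `range_hit` are hypothetical and are not part of the patent; the ranges are the ones quoted in the text.

```python
def keyword_hit(text: str, keyword: str) -> bool:
    """Plain substring match, as a keyword or full-text search would do."""
    return keyword in text


def range_hit(minimum: float, maximum: float, value: float) -> bool:
    """Numerical test: does the closed range [minimum, maximum] contain value?"""
    return minimum <= value <= maximum


# "10 g" appears literally in the first document but not in the second,
# even though the second range (5 g to 20 g) also contains 10 g.
doc_a = "10 g or more and 15 g or less"
doc_b = "5 g or more and 20 g or less"
found_a = keyword_hit(doc_a, "10 g")
found_b = keyword_hit(doc_b, "10 g")
conforms_b = range_hit(5, 20, 10)
```

Here `found_b` is false while `conforms_b` is true, which is the gap a numerical range search is meant to close.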
- Patent Document 1 discloses an index creation method that adds information for numerical range search to an index for full-text search. This method calculates the exponent part of each numerical value in a document and creates an index in which the exponent part is added to some of the elements that make up the index, so that numerical range search and full-text search are made possible by the same mechanism, without preparing special information for numerical range search (a separate, independent index).
- “380” and “760” are expressed as 2-gram character strings (strings of two consecutive characters) together with their appearance positions and, for the first 2-gram of each numerical value, the exponent part: (“38”, [18 (+2)]), (“80”, [19]) and (“76”, [22 (+2)]), (“60”, [23]). These are stored in the word index.
- The search condition “387.6” is divided into “38” and “7.6”.
- The respective exponents are +2 and −1. These are collated with the exponent parts in the word index, and entries whose exponent part is the same or larger (because the condition includes “or more”) are extracted as candidates. Because the search condition in this example is “or more”, a number whose value is larger than the number in the search condition (when the exponent is the same) matches.
- The connection determination means connects (“76”, [22 (+2)]) and (“60”, [23]) into “760”, and it is determined that “760 (m)” satisfies “387.6 m or more”.
- In this method, however, the numerical values and numerical ranges in the search statement are matched individually against the numerical ranges contained in the search target documents, so searching for documents that contain matching data takes time.
- An object of the present invention is to provide a numerical range search device, a numerical range search method, and a numerical range search program that efficiently match a numerical range described in a document with a search condition.
- The numerical range search device according to the present invention treats a numerical value defined by at least one of a minimum value and a maximum value as a numerical range, and searches for data containing a numerical range that satisfies a predetermined conformity condition with respect to an input numerical value or numerical range. It comprises: section index storage means for storing a section index in which each record includes at least a section filter (data representing the correspondence between a numerical range to be searched and the partial sections obtained by dividing a range that includes all the numerical ranges at predetermined boundary values) and reference information for referring to the numerical range to be searched, the records being grouped into units of records that share at least part of the section filter; section query generation means for generating a section query, which is data representing the correspondence between the input numerical value or numerical range and the partial sections; and candidate selection means for selecting, from among the records, those whose section filter has a logical product with the section query that is equal to the section query.
- The numerical range search method according to the present invention treats a numerical value defined by at least one of a minimum value and a maximum value as a numerical range, and searches for data containing a numerical range that satisfies a predetermined conformity condition with respect to an input numerical value or numerical range. The method stores a section index in which each record includes at least a section filter (data representing the correspondence between a numerical range to be searched and the partial sections obtained by dividing a range that includes all the numerical ranges at predetermined boundary values) and reference information for referring to the numerical range to be searched, the records being grouped into units of records that share at least part of the section filter; generates a section query, which is data representing the correspondence between the input numerical value or numerical range and the partial sections; and selects, from among the records, those whose section filter has a logical product with the section query that is equal to the section query.
- The numerical range search program according to the present invention treats a numerical value defined by at least one of a minimum value and a maximum value as a numerical range, and searches for data containing a numerical range that satisfies a predetermined conformity condition with respect to an input numerical value or numerical range. It causes a computer that stores the section index compiled as described above to execute: section query generation processing for generating a section query, which is data representing the correspondence between the input numerical value or numerical range and the partial sections; and candidate selection processing for selecting records whose section filter has a logical product with the section query that is equal to the section query.
- According to the present invention, a numerical range described in a document can be efficiently collated with the search condition.
- FIG. 1 is a block diagram showing a first embodiment of a numerical range search apparatus according to the present invention.
- The numerical range search device includes a section query generation means 1, a section index storage means 2, a candidate selection means 3, a numerical range table storage means 4, and a suitability determination means 5.
- The section query generation means 1 generates binary data (hereinafter referred to as a section query) representing the correspondence between an input numerical value or numerical range and each partial section, where the partial sections are defined by lower and upper limit values obtained by dividing a one-dimensional real number space at predetermined positions.
- The section index storage means 2 stores a section index composed of records, each of which includes: a numerical range ID that identifies the document in which a numerical range is described and its position in that document; binary data (hereinafter referred to as a section filter) representing the correspondence between that numerical range and each partial section; and the ID of the partial section with the smallest lower limit value among the partial sections corresponding to that numerical range. That is, the section index storage means 2 stores a section index in which records that include at least a section filter and reference information (for example, a numerical range ID) for referring to a numerical range to be searched are grouped into units of records that share at least part of the section filter.
- The candidate selection means 3 selects all records in the section index for which the logical product of the record's section filter and the section query is equal to the section query, and outputs the numerical range IDs in the selected records to the suitability determination means 5 in the subsequent stage.
- The numerical range table storage means 4 stores a numerical range table composed of records, each including the minimum and maximum values of the numerical range identified by a numerical range ID and a document ID identifying the document in which that numerical range is described.
- The suitability determination means 5 collates the minimum and maximum values in those records of the numerical range table whose numerical range ID was received from the candidate selection means 3 with the input numerical value or numerical range, and determines whether the numerical range in the document identified by the document ID in each record conforms to the input numerical value or numerical range.
- The suitability determination means 5 determines that a document conforms when, for example, the numerical range in the document includes the input numerical value or numerical range. It then outputs the document IDs in the records determined to conform. By separately preparing an index (not shown) that associates each document ID with a document name, a document file name, and the like, the documents describing numerical ranges that conform to the input numerical value or numerical range can be looked up from the document IDs output by the suitability determination means 5.
- The first embodiment of the numerical range search device can be realized by a hardware configuration similar to that of a general computer apparatus, as shown in FIG. 12.
- The numerical range search device A includes at least a CPU (Central Processing Unit) A1, a main storage unit A2, an output unit A3, an input unit A4, and an auxiliary storage unit A6. It may also include a communication unit A5.
- The main storage unit A2 is main memory such as RAM (Random Access Memory), and is used as a data work area or a temporary data save area.
- The output unit A3 is a display device such as a liquid crystal display device, or a printing device such as a printer, and has a function of outputting data.
- The input unit A4 is an input device such as a keyboard or a mouse, and has a function of inputting data. When data is input by reading a file, the input unit A4 may be an external recording medium reader.
- The auxiliary storage unit A6 is a ROM (Read Only Memory), a hard disk device, or the like. As shown in FIG. 12, the above-described components A1 to A6 are connected to each other via a system bus A7.
- The auxiliary storage unit A6 of the numerical range search device A stores various programs for searching for document IDs based on the numerical values or numerical ranges input through the input unit A4.
- The programs that realize the section query generation means 1, the candidate selection means 3, and the suitability determination means 5 shown in FIG. 1 are stored in the auxiliary storage unit A6.
- The auxiliary storage unit A6 can realize the section index storage means 2 and the numerical range table storage means 4 shown in FIG. 1 by storing the section index and the numerical range table.
- The numerical range search device may be realized in hardware by mounting a circuit component, such as an LSI (Large Scale Integration), that incorporates a program for realizing the functions shown in FIG. 1.
- The functions shown in FIG. 1 may also be realized in software by having the CPU A1 of the computer execute the program that provides them.
- That is, the CPU A1 loads the program stored in the auxiliary storage unit A6 into the main storage unit A2, executes it, and controls the operation of the numerical range search device A, thereby realizing the above-described functions in software.
- The communication unit A5 is connected to peripheral devices and has a function of transmitting and receiving data.
- The external storage device B shown in FIG. 12 may be connected to the numerical range search device A via a network through the communication unit A5, and the section index and the numerical range table may be stored in the external storage device B. The document IDs output by the suitability determination means 5 may also be stored in the external storage device B.
- The section index stored in the section index storage means 2 will be described in more detail with reference to FIG. 2 and FIG.
- In this example, a one-dimensional real number space ranging from −∞ (negative infinity) to +∞ (positive infinity) is divided into four partial sections (partial section 1, partial section 2, partial section 3, and partial section 4).
- The boundary values between adjacent partial sections are 0, 1, and 100, respectively.
- Each partial section includes its lower boundary value and excludes its upper boundary value.
- For example, partial section 2 includes its lower limit value 0 but not its upper limit value 1; that is, it covers values of 0 or more and less than 1.
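The half-open convention above (lower boundary included, upper boundary excluded) maps directly onto a binary search over the boundary values. The following is a minimal sketch under that reading; the names `BOUNDARIES` and `section_number` are hypothetical and not taken from the patent.

```python
from bisect import bisect_right

# Boundary values between adjacent partial sections, as given in the text.
BOUNDARIES = [0, 1, 100]


def section_number(value: float) -> int:
    """Return the 1-based partial section that contains value.

    bisect_right counts the boundaries that are <= value, so a value equal
    to a boundary is placed in the section that starts at that boundary.
    """
    return bisect_right(BOUNDARIES, value) + 1
```

For instance, `section_number(0.04)` falls in partial section 2, and a value exactly on a boundary, such as 1, belongs to the section above it (partial section 3).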
- One way to choose the boundary values between partial sections is to select them so that as many as possible of the numerical ranges described in the documents to be searched are separated from one another, that is, so that numerical ranges that do not overlap (have no common range) do not fall in the same partial section.
- With boundary values chosen in this way, the numerical range search according to the present invention can be executed at higher speed. As a specific procedure, first select the boundary value that separates the most numerical ranges, creating two partial sections; then, within each partial section, select the boundary value that separates the most numerical ranges, creating new partial sections. Repeating this operation until a predetermined condition is satisfied (for example, until there are 10 partial sections, or until no numerical range can be separated) determines the partial sections.
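The greedy procedure above can be sketched as follows. The text does not define exactly how "separates the most numerical ranges" is scored, so this sketch makes several assumptions: a boundary's score is the number of range pairs it puts on opposite sides, candidate boundaries are the minimum values of the ranges, one boundary is added per iteration (the best over all current sections), and ranges that straddle a chosen boundary are simply dropped from further consideration. The names `separated_pairs` and `choose_boundaries` are hypothetical.

```python
def separated_pairs(ranges, boundary):
    """Number of (left, right) range pairs split by boundary (assumed score)."""
    left = sum(1 for _, hi in ranges if hi < boundary)
    right = sum(1 for lo, _ in ranges if lo >= boundary)
    return left * right


def choose_boundaries(ranges, max_sections=10):
    """Greedily pick boundaries until max_sections is reached or nothing splits."""
    boundaries = []
    groups = [list(ranges)]  # ranges lying entirely inside one current section
    while len(boundaries) + 1 < max_sections:
        best = None  # (score, boundary, group index)
        for gi, group in enumerate(groups):
            for lo, _ in group:
                score = separated_pairs(group, lo)
                if score > 0 and (best is None or score > best[0]):
                    best = (score, lo, gi)
        if best is None:  # no boundary separates any pair of ranges
            break
        _, boundary, gi = best
        group = groups.pop(gi)
        boundaries.append(boundary)
        # Straddling ranges are dropped here (a simplification of this sketch).
        groups.append([r for r in group if r[1] < boundary])
        groups.append([r for r in group if r[0] >= boundary])
    return sorted(boundaries)


example = choose_boundaries([(-20, -10), (0.2, 0.6), (150, 200)])
```

With the three disjoint ranges in `example`, the sketch stops after two boundaries because no further split separates anything.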
- The numerical range described in each document is expressed as a line segment in the real number space.
- An independent single numerical value is regarded as a numerical range whose minimum and maximum values are the same, and is represented by a zero-length line segment.
- For example, for the description “−10 or less” in document 1, the numerical range is (−∞, −10), which is included in partial section 1, whose range is (−∞, 0), among the four partial sections above, and does not correspond to partial sections 2 to 4.
- Four-digit binary data in which a partial section corresponding to part or all of a numerical value or numerical range is set to 1 and a non-corresponding partial section is set to 0, with the digits arranged in order starting from the partial section closest to −∞, is referred to as a section filter. Since the numerical range “−10 or less” of document 1 corresponds only to partial section 1, its section filter is “1000”.
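A section filter can be computed from a range's minimum and maximum by locating the sections of both endpoints and setting every digit in between. This is a minimal sketch assuming the boundary values 0, 1, and 100 from the text; `section_filter` is a hypothetical name, and the same function also serves for building a section query from an input value or range.

```python
import math
from bisect import bisect_right

BOUNDARIES = [0, 1, 100]  # boundary values from the text


def section_filter(minimum: float, maximum: float) -> str:
    """Return the filter as a bit string, leftmost digit = partial section 1."""
    first = bisect_right(BOUNDARIES, minimum)  # 0-based section of the minimum
    last = bisect_right(BOUNDARIES, maximum)   # 0-based section of the maximum
    count = len(BOUNDARIES) + 1
    return "".join("1" if first <= i <= last else "0" for i in range(count))


filter_doc1 = section_filter(-math.inf, -10)  # "-10 or less"
query_single = section_filter(0.04, 0.04)     # a single value as a zero-length range
```

An unbounded side is represented here with `math.inf`, which is a choice of this sketch rather than something the text prescribes.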
- The numerical range table 100 shown in FIG. 2B is an example that lists, for each numerical range described in the documents, its ID, its minimum and maximum values, and the ID of the document in which it is described.
- A section index 102 shown in FIG. 4 is created from such a numerical range table 100 and stored in the section index storage means 2 in advance.
- The partial section definition 101 is a table that defines the lower limit value and the upper limit value of each of the four partial sections shown in FIG.
- Each partial section is identified by a partial section ID (Z01 to Z04).
- The section index 102 in FIG. 4 is an example in which, for each numerical range in the numerical range table 100 of FIG. 2B, a record is formed from the partial section ID of the corresponding partial section, the section filter, and the numerical range ID for referring to the actual numerical range, and the records are grouped by partial section ID. Since one numerical range may correspond to two or more partial sections, the section index 102 may include a plurality of records for the same numerical range.
- FIG. 5 is a flowchart for explaining an operation example of the first embodiment.
- The section query generation means 1 collates the input numerical value with the partial section definition 101 shown in FIG. to determine which partial section contains the numerical value.
- The numerical value “0.04” is included in partial section 2 (0 or more and less than 1) and in no other partial section. Therefore, the numerical value “0.04” can be represented by the binary data “0100”, a 4-digit binary number in which the four partial sections defined in the partial section definition 101 are arranged in ascending order of their lower limit values.
- Accordingly, the section query is “0100”.
- The section filters of the records in the section index 102 whose partial section ID is “Z02” are “0110”, “0100”, and “0110”, respectively.
- Their logical products with the section query are all equal to “0100”, that is, to the section query, so the numerical range IDs output to the suitability determination means 5 are “0002”, “0004”, and “0006”.
- The document ID of each document describing a conforming numerical range is output (step S3).
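The candidate selection step reduces to a bitwise AND followed by an equality test. The sketch below uses the three “Z02” records quoted above; the list layout and the name `select_candidates` are assumptions made for illustration.

```python
# (section filter, numerical range ID) pairs for partial section ID "Z02",
# using the values quoted in the text.
SECTION_INDEX_Z02 = [("0110", "0002"), ("0100", "0004"), ("0110", "0006")]


def select_candidates(records, section_query: str) -> list:
    """Keep records whose filter AND query equals the query."""
    query = int(section_query, 2)
    return [
        range_id
        for section_filter, range_id in records
        if (int(section_filter, 2) & query) == query
    ]


candidates_single = select_candidates(SECTION_INDEX_Z02, "0100")
candidates_range = select_candidates(SECTION_INDEX_Z02, "0110")
```

The test passes exactly when every section touched by the query is also touched by the stored range, so a range that covers fewer sections than the query is rejected without looking at its actual minimum and maximum.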
- FIG. 6 is an explanatory diagram illustrating a specific example of the operation of the first embodiment.
- The input numerical range is “0.05 to 2.25”, that is, the minimum value is 0.05 and the maximum value is 2.25.
- The partial sections corresponding to the input numerical range are partial section 2 and partial section 3. Accordingly, in this operation example, the section query generation means 1 generates the section query “0110”.
- The numerical range IDs of the extracted records are “0002” and “0006” (in FIG. 6, “○” marks an extracted numerical range ID and “X” marks a numerical range ID that was not extracted).
- The suitability determination means 5 looks up the two extracted numerical range IDs in the numerical range table 100 shown in FIG. 2B and compares the minimum and maximum values of the corresponding numerical ranges with the minimum and maximum values of the input numerical range.
- The minimum and maximum values of the input numerical range are “0.05” and “2.25”, respectively, and the pairs of minimum and maximum values to be compared are “0.02” and “1.89”, and “0” and “90”.
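The final comparison can be sketched as an inclusion test on the endpoints. The pairing of numerical range IDs with (minimum, maximum) values below is an assumption inferred from the order in which the text lists them, and `includes` and `RANGE_TABLE` are hypothetical names.

```python
# Assumed pairing of numerical range IDs with (minimum, maximum) values.
RANGE_TABLE = {"0002": (0.02, 1.89), "0006": (0, 90)}


def includes(doc_min, doc_max, in_min, in_max) -> bool:
    """True when the document's range fully contains the input range."""
    return doc_min <= in_min and in_max <= doc_max


input_min, input_max = 0.05, 2.25
conforming = [
    range_id
    for range_id, (lo, hi) in RANGE_TABLE.items()
    if includes(lo, hi, input_min, input_max)
]
```

Under this pairing only the range from 0 to 90 contains the whole input range, because 2.25 exceeds 1.89.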
- The section index may also collect numerical range IDs whose section filters have the same value into one record, as in the section index 103 shown in FIG. 7.
- By configuring the section index as shown in FIG. 7, the number of records in the section index can be reduced, so the capacity needed to store the section index can be reduced and the selection of records by the candidate selection means 3 can be sped up.
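Collecting IDs that share a filter value is a plain group-by. The sketch below assumes the section index is available as (section filter, numerical range ID) pairs; `group_by_filter` is a hypothetical name and the sample records are the ones quoted earlier in the text.

```python
from collections import defaultdict


def group_by_filter(records) -> dict:
    """Merge records with identical section filters into one entry each."""
    grouped = defaultdict(list)
    for section_filter, range_id in records:
        grouped[section_filter].append(range_id)
    return dict(grouped)


compact_index = group_by_filter(
    [("0110", "0002"), ("0100", "0004"), ("0110", "0006")]
)
```

Three records collapse into two, so the AND test in candidate selection runs once per distinct filter value rather than once per numerical range.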
- The inclusion relation between numerical ranges whose section filters have the same value may be determined in advance, and the ID of the including numerical range may be linked to the ID of the included numerical range.
- The numerical range represented by the numerical range ID “0002” is included in the numerical range represented by the numerical range ID “0006”. Therefore, by writing, for example, “0002 (0006)” in the numerical range ID field of the section index 103, the suitability determination means 5 can determine, once the input numerical range is found to be included in the numerical range with ID “0002”, that it is also included in the numerical range with ID “0006”, without comparing it with the actual numerical range. That is, by ordering the comparison of numerical ranges within each partial section according to the inclusion relation, once a certain numerical range is found to include the input numerical range, it is immediately apparent that every numerical range that includes that range also includes the input numerical range.
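The shortcut based on inclusion can be sketched by walking a chain of IDs ordered from the innermost range outward. The chain representation, the assumed (minimum, maximum) values, and the name `match_with_chain` are illustrative assumptions, not the patent's data layout.

```python
# Assumed (minimum, maximum) values; "0002" is included in "0006".
RANGE_TABLE = {"0002": (0.02, 1.89), "0006": (0, 90)}


def match_with_chain(chain, table, in_min, in_max) -> list:
    """chain lists IDs from innermost to outermost range.

    As soon as one range contains the input, every later (enclosing) range
    contains it too, so the remaining comparisons are skipped.
    """
    for position, range_id in enumerate(chain):
        lo, hi = table[range_id]
        if lo <= in_min and in_max <= hi:
            return chain[position:]
    return []


hits_single = match_with_chain(["0002", "0006"], RANGE_TABLE, 0.7, 0.7)
hits_range = match_with_chain(["0002", "0006"], RANGE_TABLE, 0.05, 2.25)
```

For the single value 0.7, one comparison against “0002” is enough to accept both IDs; for the range 0.05 to 2.25 the inner range fails and the outer one is then checked.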
- As described above, the numerical range search device first classifies the input numerical values and numerical ranges roughly according to their correspondence with the partial sections. It then narrows down the numerical ranges described in the documents to be searched to those whose minimum and maximum values belong to the same partial sections as the input, using a logical product operation between short binary data that a computer can process quickly, and obtains the search result by comparing only the narrowed-down numerical ranges. The numerical range search device can therefore search at high speed for an input numerical value or numerical range among a large number of documents or a very large number of numerical ranges.
- FIG. 8 is a block diagram showing a second embodiment of the numerical value range search apparatus according to the present invention.
- In the second embodiment, the section query generation means 1, the section index storage means 2, the candidate selection means 3, the numerical range table storage means 4, and the suitability determination means 5 function in the same manner as in the first embodiment shown in FIG. 1, so their description is omitted.
- However, the suitability determination means 5 outputs a numerical range ID instead of a document ID. Accordingly, the numerical range table stored by the numerical range table storage means 4 in the second embodiment need not include the document ID.
- A search sentence written in a natural language such as Japanese is input.
- The language analysis means 6 analyzes the input search sentence and identifies words and the dependency relationships between words.
- The quantity expression extraction means 7 extracts, from the set of identified words, the words representing a numerical value or a unit that belong to a quantity expression.
- The quantity type determination means 8 determines the type of quantity (hereinafter referred to as the quantity type), such as length or weight, that the extracted quantity expression represents.
- The target word extraction means 9 extracts a target word that specifies what the extracted quantity expression refers to.
- The target word standardization means 10 converts the extracted target word into a standard target word when there are a plurality of target words representing the same target.
- The document index storage means 11 stores a document index composed of records including at least a quantity type, a target word, the numerical range ID used in the first embodiment, and a document ID.
- The document search means 12 refers to the document index stored in the document index storage means 11 using the numerical range ID, the quantity type, and the standardized target word, and extracts at least the document ID from each record that satisfies a predetermined condition.
- The search result output means 13 outputs a search result including at least the extracted document ID.
- Existing language analysis tools can be used in the language analysis means 6 to analyze a search sentence written in natural language and to identify words and the dependency relationships between them.
- For example, MeCab (described at http://mecab.sourceforge.net/) can be used to identify words.
- CaboCha (described, for example, at http://chasen.org/~taku/software/cabocha/) can be used to identify the dependency relationships between words.
- Patent Document 3 Japanese Patent No. 3360617
- Patent Document 4 Japanese Patent Laid-Open No. 2006-350989
- The quantity type determination means 8 can be realized by preparing a dictionary (not shown) that associates “g”, “mg”, “pound”, “weight”, and the like with the type name “weight”, and “m”, “cm”, “feet”, “distance”, “width”, and the like with the type name “length”.
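Such a dictionary is a direct word-to-type mapping. The sketch below contains only the entries named in the text; the names `QUANTITY_TYPES` and `quantity_type` are hypothetical.

```python
# Entries taken from the examples in the text; a real dictionary would be larger.
QUANTITY_TYPES = {
    "g": "weight", "mg": "weight", "pound": "weight", "weight": "weight",
    "m": "length", "cm": "length", "feet": "length",
    "distance": "length", "width": "length",
}


def quantity_type(word: str):
    """Return the type name for a unit or quantity word, or None if unknown."""
    return QUANTITY_TYPES.get(word)


type_of_mg = quantity_type("mg")
type_of_width = quantity_type("width")
```

Returning `None` for unknown words is a choice of this sketch; the text does not say how unlisted units are handled.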
- The target word extraction means 9 can be realized by preparing a dictionary of target words (not shown) and searching the words identified by the language analysis means 6 for a word that matches a target word.
- The target word dictionary may be created, for example, by collecting nouns that appear near numerical values in the documents to be searched.
- The target word standardization means 10 can be realized by preparing a dictionary (not shown) indicating the correspondence between target words and standard target words, and replacing each target word with its standard target word.
- The above dictionary can be created by collecting synonyms and near-synonyms of nouns, using, for example, the following references, and designating one of them as the standard target word.
- The predetermined condition to be evaluated may require that a record match all of the numerical range ID, quantity type, and target word obtained from the input search sentence, or that it match the quantity type and target word obtained from the input search sentence while having a different numerical range ID. Other conditions may also be applied.
- The second embodiment of the present invention can be realized by a hardware configuration similar to that of the first embodiment shown in FIG. 12. That is, the language analysis means 6, the quantity expression extraction means 7, the quantity type determination means 8, the target word extraction means 9, the target word standardization means 10, the document search means 12, and the search result output means 13 are all stored as programs in the auxiliary storage unit A6 of FIG. 12, read into the main storage unit A2 when necessary, and executed by the CPU A1, whereby each function is realized in software.
- The document index storage means 11 can be realized by storing the document index in the auxiliary storage unit A6 of FIG. 12, or by storing it in the external storage device B and referring to it when necessary via the communication unit A5 and the network.
- The language analysis means 6 analyzes the input Japanese sentence and identifies the words “salt”, “to”, “700”, “mg”, and “added”. It also recognizes the dependency relationship in which “salt” and the pair “700” and “mg” each modify “added”.
- The quantity expression extraction means 7 extracts the words belonging to the quantity expression from among the identified words.
- The quantity expression extraction means 7 extracts “700” as the numerical value and “mg” as the unit. It then standardizes the unit “mg” to the standard unit “g” and performs the numerical conversion that this requires, replacing “700” with one thousandth of that value, “0.7”.
- The quantity type determination means 8 determines the quantity type from the extracted unit. In this example, it determines the quantity type to be “weight (g)” from the standardized unit “g”.
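The unit standardization in this example can be sketched with a conversion table. `Decimal` is used so that 700 mg becomes exactly 0.7 g rather than a binary floating-point approximation; the table contents beyond the “mg” to “g” case and the names `TO_STANDARD` and `standardize` are assumptions made for illustration.

```python
from decimal import Decimal

# unit -> (standard unit, multiplier); only the "mg" case comes from the text.
TO_STANDARD = {
    "mg": ("g", Decimal("0.001")),
    "g": ("g", Decimal("1")),
}


def standardize(value: str, unit: str):
    """Convert a numeric string and unit to the standard unit."""
    standard_unit, factor = TO_STANDARD[unit]
    return Decimal(value) * factor, standard_unit


amount, unit = standardize("700", "mg")
```

The standardized pair can then be passed to the quantity type lookup and to section query generation as a single-value range.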
- The target word extraction means 9 extracts the target word corresponding to the quantity expression “700” “mg” (“0.7” “g” after standardization) from the analysis result of the language analysis means 6.
- In this example, the target word extraction means 9 extracts “salt”, which modifies “added” together with the quantity expression “700” “mg”, as the target word.
- The target word standardization means 10 refers to a dictionary (not shown) to check whether a standard expression (standard target word) exists for the target word “salt” and, if one exists, replaces the target word with that expression. For example, if the dictionary lists a standard expression for “salt”, the target word standardization means 10 uses that standard expression as the target word in place of the extracted word.
- The predetermined condition described above is that the input numerical value or numerical range is included in the numerical range corresponding to the specified numerical range ID.
- The suitability determination means 5 outputs all the specified numerical range IDs together with a determination result indicating whether the predetermined condition is satisfied.
- In this example, the numerical ranges with numerical range IDs “0002” and “0006” include “0.7” and therefore satisfy the predetermined condition, but the numerical range with numerical range ID “0004” does not include “0.7”.
- The document search means 12 refers to the document index stored in the document index storage means 11 using the determination result obtained by the suitability determination means 5, the quantity type “weight (g)”, and the target word “salt”.
- the document index storage means 11 stores a document index 104 as shown in FIG.
- the document index 104 in addition to the numerical range ID, document ID, target word, and quantity type, the position where the numerical range in the document is described (page number and the number of characters from the top of the page to the first character corresponding to the numerical range) It is composed of records including
- the search result output means 13 may operate to output nothing with no search result using the input Japanese sentence as a search condition, or the document ID of a record whose target word and quantity type match. And the position may be output together with a message that the numerical range does not match.
- the actual document is looked up through a separately prepared table (not shown) that associates document IDs with actual documents, and the part of the document where the numerical range is described is located using the position value.
- the actual document content in which the target word and the quantity type match may be output as shown in the lower part of FIG.
- “... addition of salt should be 0.2 g or more and 0.6 g or less, ...” is output, so it can be confirmed that the input Japanese sentence “700 mg of salt added” deviates from the numerical range in the corresponding document.
- from a search sentence written in a natural language, documents, or descriptions at specific positions in documents, that match the numerical value or numerical range included in the search sentence, its quantity type, and the target word can be retrieved at high speed. Further, it can be determined at high speed whether a numerical range in a document that matches the target word and quantity type of the search sentence includes the numerical value or numerical range in the search sentence.
- a search sentence written in a natural language is used as input data.
- a string of mutually related words such as “salt 700 mg” may also be used as input data.
- Another embodiment of the present invention using such a word string as input data can be realized by a similar configuration in which the language analysis means 6 is omitted from the block diagram shown in FIG.
- like the section index 105 shown in FIG., the section index may consist of records that contain a section filter, a numerical range ID, a minimum value, a maximum value, a document ID, a target word, and a quantity type.
- in that case, the candidate selection means 3, the suitability determination means 5, and the document search means 12 all refer to the section index storage means 2, so the numerical value range search apparatus may be configured without providing the numerical range table storage means 4 and the document index storage means 11.
- FIG. 13 shows a numerical range search device that treats a range of numerical values defined using at least one of the minimum value and the maximum value as a numerical range and searches for data including a numerical range that satisfies a predetermined conformity condition for an input numerical value or numerical range.
- the numerical value range search device includes a section index storage unit 2, a section query generation unit 1, and a candidate selection unit 3 as minimum components.
- the section index storage means 2 stores a section index whose unit is a record including at least a section filter and reference information for referring to the numerical range to be searched. The section filter is data representing, for the numerical range to be searched, the correspondence between the input numerical value or numerical range and the partial sections obtained by dividing a value range including all numerical ranges by predetermined boundary values. The records are grouped into record units in which at least a part of the section filter is common.
- the section query generation means 1 generates a section query that is data representing the correspondence between the input numerical value or numerical range and the partial section.
- the candidate selection means 3 selects, from among the records, each record containing a section filter for which the logical product of the section filter and the section query is equal to the section query.
- with the numerical range search apparatus having the minimum configuration, a document that matches the input numerical value or numerical range can be searched for at high speed with a small number of numerical value references.
- the numerical value range search device treats a range of numerical values defined using at least one of the minimum value and the maximum value as a numerical range and searches for data including a numerical range that satisfies a predetermined conformity condition for the input numerical value or numerical range. It comprises: section index storage means (for example, realized by the section index storage means 2) for storing a section index whose unit is a record including at least a section filter, which is data representing the correspondence between the input numerical value or numerical range and the partial sections obtained by dividing a value range including all the numerical ranges to be searched by predetermined boundary values, and reference information for referring to the numerical range to be searched, the records being grouped into record units in which at least a part of the section filter is common; section query generation means (for example, realized by the section query generation means 1) for generating a section query, which is data representing the correspondence between the input numerical value or numerical range and the partial sections; and candidate selection means (for example, realized by the candidate selection means 3) for selecting, from among the records, a record containing a section filter for which the logical product of the section filter and the section query is equal to the section query.
- the numerical value range search device may be configured so that the predetermined conformity condition is that the numerical range to be searched includes the input numerical value or numerical range.
- the numerical value range search device may be configured to determine the predetermined boundary values as follows: within the section that includes all the numerical ranges to be searched, the value that separates the largest number of mutually non-overlapping numerical ranges is selected as a boundary value to create partial sections; then, within each partial section so created, the value that separates the largest number of mutually non-overlapping numerical ranges is selected as a boundary value to create further partial sections; this is repeated until a predetermined end condition is satisfied.
- the section index storage means may store a section index in which reference information whose section filters have the same value is included in one record and, among two or more pieces of reference information included in one record, pieces of reference information whose referenced numerical ranges have an inclusion relationship are included in a form representing that inclusion relationship.
- the numerical value range search device may be configured to include suitability determination means (for example, realized by the suitability determination means 5) that compares the input numerical value or numerical range with the numerical range included in the document specified by the reference information recorded in the selected record and outputs, as a search result, a document that satisfies the predetermined conformity condition.
- PLM Product Lifecycle Management
- PLM Process Lifecycle Management
- the present invention can be applied to a medical information retrieval system that can quickly retrieve information on a patient showing a specific test value, medical records, and the amount of medicine applied from a large amount of electronic medical records and medical-related papers.
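The section filter, section query, and candidate selection described above for the minimum configuration can be illustrated with a short sketch. This is not the implementation disclosed in the application: the boundary values, the bit layout (one bit per partial section), the range IDs `R1` to `R4`, and every function name are assumptions made for illustration only. The sketch also adds the suitability determination step, because the logical-product test is a necessary but not a sufficient condition for inclusion.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical boundary values; they split the value range into the partial
# sections (-inf, 0.2), [0.2, 0.5), [0.5, 1.0), [1.0, +inf).
BOUNDARIES = [0.2, 0.5, 1.0]


def section_bits(lo: Optional[float], hi: Optional[float]) -> int:
    """Return a bit pattern with bit i set when [lo, hi] touches section i.

    None stands for a missing minimum or maximum (an open-ended range).
    The same function builds section filters and section queries.
    """
    first = 0 if lo is None else bisect_right(BOUNDARIES, lo)
    last = len(BOUNDARIES) if hi is None else bisect_right(BOUNDARIES, hi)
    bits = 0
    for i in range(first, last + 1):
        bits |= 1 << i
    return bits


@dataclass
class NumRange:
    range_id: str
    lo: Optional[float]
    hi: Optional[float]


def build_section_index(ranges: List[NumRange]) -> Dict[int, List[str]]:
    """Group range IDs into one record per distinct section filter."""
    index: Dict[int, List[str]] = {}
    for r in ranges:
        index.setdefault(section_bits(r.lo, r.hi), []).append(r.range_id)
    return index


def select_candidates(index: Dict[int, List[str]], query: int) -> List[str]:
    """Keep records whose filter satisfies (filter AND query) == query."""
    out: List[str] = []
    for filt, ids in index.items():
        if filt & query == query:
            out.extend(ids)
    return out


def includes(r: NumRange, lo: float, hi: float) -> bool:
    """Suitability check: does the stored range include the input [lo, hi]?"""
    return (r.lo is None or r.lo <= lo) and (r.hi is None or hi <= r.hi)


# Illustrative data only (not taken from the figures).
RANGES = [
    NumRange("R1", 0.2, 0.6),   # shares a section with 0.7 but excludes it
    NumRange("R2", 0.5, 1.0),
    NumRange("R3", None, 1.0),  # "1.0 or less"
    NumRange("R4", 1.2, 2.0),
]
index = build_section_index(RANGES)
query = section_bits(0.7, 0.7)          # a single value is the range [v, v]
candidates = select_candidates(index, query)
hits = [r.range_id for r in RANGES
        if r.range_id in candidates and includes(r, 0.7, 0.7)]
```

In this sketch `R1` passes the bitwise test because its range and the input value fall in the same partial section, and it is rejected only when the actual minimum and maximum are compared. That is the reason a separate suitability determination step follows candidate selection.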
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
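First embodiment.
FIG. 1 is a block diagram showing a first embodiment of the numerical range search device according to the present invention. As shown in FIG. 1, the numerical range search device includes section query generation means 1, section index storage means 2, candidate selection means 3, numerical range table storage means 4, and suitability determination means 5.
Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 8 is a block diagram showing the second embodiment of the numerical range search device according to the present invention. Referring to FIG. 8, in the second embodiment the section query generation means 1, the section index storage means 2, the candidate selection means 3, the numerical range table storage means 4, and the suitability determination means 5 function in the same way as in the first embodiment shown in FIG. 1, so their description is omitted. In the second embodiment, however, the suitability determination means 5 outputs numerical range IDs rather than document IDs. Accordingly, the numerical range table stored by the numerical range table storage means 4 in the second embodiment need not contain document IDs.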
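The flow of the second embodiment, in which the suitability determination outputs numerical range IDs and a document index is then consulted with the target word and quantity type, might look like the following sketch. It is only an illustration under stated assumptions: the unit table, the bounds assigned to the range IDs “0002”, “0004”, and “0006”, the document IDs, and the positions are all hypothetical values chosen to agree with the example of “700 mg of salt”, and none of the function names come from the application.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional, Tuple

# Hypothetical unit table: unit -> (quantity type, factor to the base unit).
UNITS = {
    "mg": ("weight (g)", Decimal("0.001")),
    "g": ("weight (g)", Decimal("1")),
}

# Numerical range table without document IDs: range ID -> (minimum, maximum).
# None means the bound is absent. These bounds are assumed for illustration.
RANGE_TABLE: Dict[str, Tuple[Optional[Decimal], Optional[Decimal]]] = {
    "0002": (Decimal("0.5"), Decimal("1.0")),
    "0004": (Decimal("0.2"), Decimal("0.6")),
    "0006": (None, Decimal("1.0")),
}


@dataclass(frozen=True)
class IndexRecord:
    range_id: str
    doc_id: str
    target_word: str
    quantity_type: str
    page: int
    offset: int  # characters from the top of the page


# Hypothetical document index records.
DOCUMENT_INDEX = [
    IndexRecord("0002", "D01", "salt", "weight (g)", 3, 120),
    IndexRecord("0004", "D02", "salt", "weight (g)", 1, 45),
    IndexRecord("0006", "D03", "sugar", "weight (g)", 7, 10),
]


def normalize(amount: str, unit: str) -> Tuple[str, Decimal]:
    """Convert a quantity expression to its quantity type and base-unit value."""
    qtype, factor = UNITS[unit]
    return qtype, Decimal(amount) * factor


def judge(range_ids: List[str], value: Decimal) -> Dict[str, bool]:
    """Suitability determination: range ID -> whether the range includes value."""
    result = {}
    for rid in range_ids:
        lo, hi = RANGE_TABLE[rid]
        result[rid] = (lo is None or lo <= value) and (hi is None or value <= hi)
    return result


def search_documents(judged: Dict[str, bool], target_word: str, qtype: str):
    """Return (doc ID, page, offset, fits) for records matching word and type."""
    return [
        (rec.doc_id, rec.page, rec.offset, judged[rec.range_id])
        for rec in DOCUMENT_INDEX
        if rec.range_id in judged
        and rec.target_word == target_word
        and rec.quantity_type == qtype
    ]


qtype, value = normalize("700", "mg")              # 700 mg -> 0.7 g
judged = judge(["0002", "0004", "0006"], value)    # candidate range IDs
results = search_documents(judged, "salt", qtype)
```

`Decimal` is used here so that 700 mg converts to exactly 0.7 g; with binary floating point the product `700 * 0.001` is not exactly `0.7`, which could matter when a value sits on a range bound. Records whose range does not include the value are kept with a `False` flag so that a result such as "the numerical range does not match" can still be reported.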
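2 Section index storage means
3 Candidate selection means
4 Numerical range table storage means
5 Suitability determination means
6 Language analysis means
7 Quantity expression extraction means
8 Quantity type determination means
9 Target word extraction means
10 Target word standardization means
11 Document index storage means
12 Document search means
13 Search result output means
100 Numerical range table
101 Partial section definition
102 Section index
103 Section index
104 Document index
105 Section index
A Numerical range search device
A1 CPU
A2 Main storage unit
A3 Output unit
A4 Input unit
A5 Communication unit
A6 Auxiliary storage unit
A7 System bus
B External storage device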
Claims (7)
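- A numerical range search device that treats a range of numerical values defined using at least one of a minimum value and a maximum value as a numerical range and searches for data including a numerical range that satisfies a predetermined conformity condition for an input numerical value or numerical range, the device comprising:
section index storage means for storing a section index whose unit is a record including at least a section filter and reference information for referring to a numerical range to be searched, the section filter being data that represents, for the numerical range to be searched, a correspondence between the input numerical value or numerical range and partial sections obtained by dividing a value range including all such numerical ranges by predetermined boundary values, the records being grouped into record units in which at least a part of the section filter is common;
section query generation means for generating a section query, which is data representing a correspondence between the input numerical value or numerical range and the partial sections; and
candidate selection means for selecting, from among the records, a record including a section filter for which the logical product of the section filter and the section query is equal to the section query.
- The numerical range search device according to claim 1, wherein the predetermined conformity condition is that the numerical range to be searched includes the input numerical value or numerical range.
- The numerical range search device according to claim 1 or claim 2, wherein the predetermined boundary values are determined by selecting as a boundary value, within a section that includes all of the numerical ranges to be searched, the value that separates the largest number of mutually non-overlapping numerical ranges to create partial sections, and by repeating, until a predetermined end condition is satisfied, the creation of further partial sections by selecting as a boundary value, within each of the created partial sections, the value that separates the largest number of mutually non-overlapping numerical ranges.
- The numerical range search device according to claims 1 to 3, wherein the section index storage means stores a section index in which reference information whose section filters have the same value is included in one record and in which, among two or more pieces of the reference information included in one record, pieces of reference information whose referenced numerical ranges have an inclusion relationship are included in a form representing the inclusion relationship.
- The numerical range search device according to claims 1 to 4, further comprising suitability determination means for comparing the input numerical value or numerical range with the numerical range included in the document specified by the reference information recorded in the selected record and outputting, as a search result, a document that satisfies the predetermined conformity condition.
- A numerical range search method that treats a range of numerical values defined using at least one of a minimum value and a maximum value as a numerical range and searches for data including a numerical range that satisfies a predetermined conformity condition for an input numerical value or numerical range, the method comprising:
storing a section index whose unit is a record including at least a section filter and reference information for referring to a numerical range to be searched, the section filter being data that represents, for the numerical range to be searched, a correspondence between the input numerical value or numerical range and partial sections obtained by dividing a value range including all such numerical ranges by predetermined values, the records being grouped into record units in which at least a part of the section filter is common;
generating a section query, which is data representing a correspondence between the input numerical value or numerical range and the partial sections; and
selecting, from among the records, a record including a section filter for which the logical product of the section filter and the section query is equal to the section query.
- A numerical range search program for treating a range of numerical values defined using at least one of a minimum value and a maximum value as a numerical range and searching for data including a numerical range that satisfies a predetermined conformity condition for an input numerical value or numerical range, the program causing a computer, which stores in a section index storage unit a section index whose unit is a record including at least a section filter and reference information for referring to a numerical range to be searched, the section filter being data that represents, for the numerical range to be searched, a correspondence between the input numerical value or numerical range and partial sections obtained by dividing a value range including all such numerical ranges by predetermined values, the records being grouped into record units in which at least a part of the section filter is common, to execute:
section query generation processing for generating a section query, which is data representing a correspondence between the input numerical value or numerical range and the partial sections; and
candidate selection processing for selecting, from among the records, a record including a section filter for which the logical product of the section filter and the section query is equal to the section query.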
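The boundary value determination recited in claim 3 can be read in more than one way, so the following is only one possible sketch. It assumes that candidate boundary values are the minimum values of the stored ranges, that a boundary "separates" a pair of ranges when one lies entirely below it and the other starts at or above it, that the score of a boundary is the number of separated pairs, and that the end condition is a recursion depth limit or the absence of any separating value. The function name `choose_boundaries` and the sample ranges are hypothetical.

```python
import math
from typing import List, Tuple

Range = Tuple[float, float]  # closed range (minimum, maximum)


def choose_boundaries(
    ranges: List[Range],
    lo: float = -math.inf,
    hi: float = math.inf,
    max_depth: int = 3,
) -> List[float]:
    """Recursively pick boundary values inside the section [lo, hi).

    At each level the candidate that separates the most pairs of
    non-overlapping ranges is chosen; the section is then split there and
    the same procedure is applied to both halves.
    """
    if max_depth == 0:
        return []
    # Assumed candidate set: range minimums strictly inside the section.
    candidates = sorted({r[0] for r in ranges if lo < r[0] < hi})
    best, best_score = None, 0
    for c in candidates:
        left = sum(1 for r in ranges if r[1] < c)    # entirely below c
        right = sum(1 for r in ranges if r[0] >= c)  # starts at or above c
        if left * right > best_score:                # ties keep the smaller c
            best, best_score = c, left * right
    if best is None:
        return []  # nothing can be separated: end condition reached
    left_ranges = [r for r in ranges if r[0] < best]    # overlap [lo, best)
    right_ranges = [r for r in ranges if r[1] >= best]  # overlap [best, hi)
    return (
        choose_boundaries(left_ranges, lo, best, max_depth - 1)
        + [best]
        + choose_boundaries(right_ranges, best, hi, max_depth - 1)
    )


# Illustrative ranges only.
SAMPLE = [(0.1, 0.2), (0.3, 0.4), (0.35, 0.6), (0.7, 0.9)]
boundaries = choose_boundaries(SAMPLE)
shallow = choose_boundaries(SAMPLE, max_depth=1)
```

With the sample data the first split falls at 0.3, which separates one range from three, and the right-hand section is split again at 0.7. A different scoring rule, such as the smaller of the two counts, would be an equally plausible reading of the claim and may select different boundaries.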
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013521416A JP5924339B2 (ja) | 2011-06-21 | 2012-05-21 | Numeric range search device, numeric range search method, and numeric range search program |
US14/124,778 US9465838B2 (en) | 2011-06-21 | 2012-05-21 | Numeric range search device, numeric range search method, and numeric range search program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011-137663 | 2011-06-21 | ||
JP2011137663 | 2011-06-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012176374A1 true WO2012176374A1 (ja) | 2012-12-27 |
Family
ID=47422240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/003300 WO2012176374A1 (ja) | Numeric range search device, numeric range search method, and numeric range search program | 2011-06-21 | 2012-05-21 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9465838B2 (ja) |
JP (1) | JP5924339B2 (ja) |
WO (1) | WO2012176374A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015029258A1 (ja) * | 2013-09-02 | 2015-03-05 | 富士通株式会社 | 情報検索処理プログラム、装置、および方法 |
WO2018198192A1 (ja) * | 2017-04-25 | 2018-11-01 | 三菱電機株式会社 | 検索装置、検索システム、検索方法及び検索プログラム |
JP2019028933A (ja) * | 2017-08-03 | 2019-02-21 | 株式会社日立製作所 | 多次元データ管理システム及び多次元データ管理方法 |
US10320579B2 (en) | 2016-10-06 | 2019-06-11 | Fujitsu Limited | Computer-readable recording medium, index generating apparatus, index generating method, computer-readable recording medium, retrieving apparatus, and retrieving method |
US10872060B2 (en) | 2016-10-05 | 2020-12-22 | Fujitsu Limited | Search method and search apparatus |
WO2021039175A1 (ja) * | 2019-08-23 | 2021-03-04 | パナソニック株式会社 | 支援装置、生成装置、分析装置、支援方法、生成方法、分析方法、およびプログラム |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140149132A1 (en) * | 2012-11-27 | 2014-05-29 | Jan DeHaan | Adaptive medical documentation and document management |
US10424403B2 (en) | 2013-01-28 | 2019-09-24 | Siemens Aktiengesellschaft | Adaptive medical documentation system |
JP5842902B2 (ja) * | 2013-12-16 | 2016-01-13 | コニカミノルタ株式会社 | 画像処理システム及び画像処理プログラム並びに画像処理方法 |
US10394786B2 (en) * | 2015-04-20 | 2019-08-27 | Futurewei Technologies, Inc. | Serialization scheme for storing data and lightweight indices on devices with append-only bands |
CN106874318B (zh) * | 2016-06-08 | 2020-01-14 | 阿里巴巴集团控股有限公司 | 一种信息查询的方法及装置 |
CN114741215A (zh) * | 2022-04-21 | 2022-07-12 | 中国农业银行股份有限公司 | 一种消息分发方法及装置 |
CN116860828A (zh) * | 2023-06-16 | 2023-10-10 | 深圳市世强元件网络有限公司 | 一种区间数值检索方法、存储介质及计算机 |
CN116521713B (zh) * | 2023-06-30 | 2023-09-12 | 北京奥星贝斯科技有限公司 | 一种数据查询的方法、装置、设备及存储介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH056398A (ja) * | 1991-06-28 | 1993-01-14 | Ricoh Co Ltd | 文書登録装置及び文書検索装置 |
JP2001297095A (ja) * | 2000-04-12 | 2001-10-26 | K-Tai Net:Kk | 施設検索装置 |
JP2009048352A (ja) * | 2007-08-17 | 2009-03-05 | Nippon Telegr & Teleph Corp <Ntt> | 情報検索装置、情報検索方法および情報検索プログラム |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6038561A (en) * | 1996-10-15 | 2000-03-14 | Manning & Napier Information Services | Management and analysis of document information text |
JP3547069B2 (ja) * | 1997-05-22 | 2004-07-28 | 日本電信電話株式会社 | 情報関連づけ装置およびその方法 |
JP3360617B2 (ja) | 1998-08-18 | 2002-12-24 | 日本電気株式会社 | 数値情報抽出装置および数値情報検索装置並びに数値情報抽出プログラムを記憶した記憶媒体、数値情報検索プログラムを記憶した記憶媒体 |
US20030014414A1 (en) * | 2000-12-07 | 2003-01-16 | Newman Bruce D. | Personcast - customized end-user briefing |
JP2006163995A (ja) | 2004-12-09 | 2006-06-22 | Matsushita Electric Ind Co Ltd | 索引作成装置及び文書検索装置 |
JP4618045B2 (ja) | 2005-05-18 | 2011-01-26 | 沖電気工業株式会社 | 範囲情報抽出装置、範囲情報抽出方法及び範囲情報抽出プログラム |
US7680789B2 (en) * | 2006-01-18 | 2010-03-16 | Microsoft Corporation | Indexing and searching numeric ranges |
JP5154832B2 (ja) | 2007-04-27 | 2013-02-27 | 株式会社日立製作所 | 文書検索システム及び文書検索方法 |
US8396871B2 (en) * | 2011-01-26 | 2013-03-12 | DiscoverReady LLC | Document classification and characterization |
US9436726B2 (en) * | 2011-06-23 | 2016-09-06 | BCM International Regulatory Analytics LLC | System, method and computer program product for a behavioral database providing quantitative analysis of cross border policy process and related search capabilities |
WO2013005505A1 (ja) * | 2011-07-05 | 2013-01-10 | 日本電気株式会社 | 暗号化装置、暗号文比較システム、暗号文比較方法、および暗号文比較プログラム |
US8832427B2 (en) * | 2012-03-30 | 2014-09-09 | Microsoft Corporation | Range-based queries for searchable symmetric encryption |
JP2014142822A (ja) * | 2013-01-24 | 2014-08-07 | Azbil Corp | データ作成装置および方法 |
2012
- 2012-05-21 WO PCT/JP2012/003300 patent/WO2012176374A1/ja active Application Filing
- 2012-05-21 JP JP2013521416A patent/JP5924339B2/ja not_active Expired - Fee Related
- 2012-05-21 US US14/124,778 patent/US9465838B2/en active Active
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015029258A1 (ja) * | 2013-09-02 | 2015-03-05 | 富士通株式会社 | 情報検索処理プログラム、装置、および方法 |
JPWO2015029258A1 (ja) * | 2013-09-02 | 2017-03-02 | 富士通株式会社 | 情報検索処理プログラム、装置、および方法 |
US10872060B2 (en) | 2016-10-05 | 2020-12-22 | Fujitsu Limited | Search method and search apparatus |
US10320579B2 (en) | 2016-10-06 | 2019-06-11 | Fujitsu Limited | Computer-readable recording medium, index generating apparatus, index generating method, computer-readable recording medium, retrieving apparatus, and retrieving method |
WO2018198192A1 (ja) * | 2017-04-25 | 2018-11-01 | 三菱電機株式会社 | 検索装置、検索システム、検索方法及び検索プログラム |
JPWO2018198192A1 (ja) * | 2017-04-25 | 2019-11-07 | 三菱電機株式会社 | 検索装置、検索システム、検索方法及び検索プログラム |
JP2019028933A (ja) * | 2017-08-03 | 2019-02-21 | 株式会社日立製作所 | 多次元データ管理システム及び多次元データ管理方法 |
WO2021039175A1 (ja) * | 2019-08-23 | 2021-03-04 | パナソニック株式会社 | 支援装置、生成装置、分析装置、支援方法、生成方法、分析方法、およびプログラム |
JP7565930B2 (ja) | 2019-08-23 | 2024-10-11 | パナソニックホールディングス株式会社 | 支援装置、分析装置、支援方法、分析方法、およびプログラム |
US12147462B2 (en) | 2019-08-23 | 2024-11-19 | Panasonic Holdings Corporation | Support apparatus, generation apparatus, analysis apparatus, support method, generation method, analysis method, and non-transitory computer-readable recording medium |
Also Published As
Publication number | Publication date |
---|---|
US9465838B2 (en) | 2016-10-11 |
US20140156670A1 (en) | 2014-06-05 |
JPWO2012176374A1 (ja) | 2015-02-23 |
JP5924339B2 (ja) | 2016-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5924339B2 (ja) | Numeric range search device, numeric range search method, and numeric range search program | |
US20240028651A1 (en) | System and method for processing documents | |
US8583419B2 (en) | Latent metonymical analysis and indexing (LMAI) | |
US8244769B2 (en) | System and method for judging properties of an ontology and updating same | |
US20170300565A1 (en) | System and method for entity extraction from semi-structured text documents | |
KR101511656B1 (ko) | 퍼스널 아이덴티티를 기술하는 데이터에 대한 액셔너블 속성의 애스클라이빙 | |
US20070088743A1 (en) | Information processing device and information processing method | |
US20210216578A1 (en) | Interactive patent visualization systems and methods | |
CN102640145A (zh) | 可信查询系统和方法 | |
JP6621514B1 (ja) | 要約作成装置、要約作成方法、及びプログラム | |
JP2011059935A (ja) | 設計チェック知識構築方法及びシステム | |
US20210240334A1 (en) | Interactive patent visualization systems and methods | |
Shigarov | Table understanding: Problem overview | |
Kavuluru et al. | Unsupervised extraction of diagnosis codes from EMRs using knowledge-based and extractive text summarization techniques | |
WO2014190246A1 (en) | Systems and methods for extracting specified data from narrative text | |
WO2016067396A1 (ja) | 文の並び替え方法および計算機 | |
JP2010250439A (ja) | 検索システム、データ生成方法、プログラムおよびプログラムを記録した記録媒体 | |
Torrisi et al. | Automated bundle pagination using machine learning | |
Harber et al. | Feasibility and utility of lexical analysis for occupational health text | |
JP5679400B2 (ja) | カテゴリ主題語句抽出装置及び階層的タグ付与装置及び方法及びプログラム及びコンピュータ読み取り可能な記録媒体 | |
JP4362492B2 (ja) | 文書インデキシング装置、文書検索装置、文書分類装置、並びにその方法及びプログラム | |
JP2009271772A (ja) | テキストマイニング方法、テキストマイニング装置、及びテキストマイニングプログラム | |
KR101078966B1 (ko) | 문서 분석 시스템 | |
JP2009217406A (ja) | 文書検索装置及び方法、並びに、プログラム | |
KR101116452B1 (ko) | 한정된 범위의 문서 공간의 텍스트 어노테이션을 위한 문서 순위화 방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12802816; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2013521416; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | WIPO information: entry into national phase | Ref document number: 14124778; Country of ref document: US |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 12802816; Country of ref document: EP; Kind code of ref document: A1 |