CN104424350B - knowledge processing device and method - Google Patents

Knowledge processing device and method

Info

Publication number
CN104424350B
Authority
CN
China
Prior art keywords
character string
attribute
candidate
amendment
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410346227.1A
Other languages
Chinese (zh)
Other versions
CN104424350A (en)
Inventor
吉田笃弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Toshiba Digital Solutions Corp
Original Assignee
Toshiba Corp
Toshiba Solutions Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp, Toshiba Solutions Corp
Publication of CN104424350A
Application granted
Publication of CN104424350B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0236 Character input methods using selection techniques to select from displayed items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V30/268 Lexical context

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments relate to a knowledge processing device and method that correct character strings using a knowledge dictionary. The knowledge processing device (10) of an embodiment includes a selection unit (104), a generation unit (106), and a correction unit (107). The selection unit (104) selects a correction target character string from file data. The generation unit (106) generates a candidate acquisition condition based on a condition-generating character string, which is a character string in the file data whose attribute differs from that of the correction target character string. The correction unit (107) corrects the correction target character string using candidates for a replacement character string that are obtained from a knowledge dictionary (N) according to the candidate acquisition condition.

Description

Knowledge processing device and method
This application claims the benefit of priority from Japanese Patent Application No. 2013-185634, filed on September 6, 2013, and incorporates the entire contents of that earlier application.
Technical field
Embodiments relate to a knowledge processing device, method, and program that correct character strings using a knowledge dictionary.
Background art
Knowledge processing is known as a technique for correcting a character string obtained by character recognition, for example by OCR (Optical Character Recognition/Reader), so that it approaches the correct answer. In knowledge processing, the character string to be corrected (hereinafter, the correction target character string) is compared with a knowledge dictionary (word dictionary) prepared in advance and, where necessary, replaced with a character string (word) stored in the knowledge dictionary, thereby correcting the correction target character string. For example, if the correction target character string represents the surname of a person's name, it is compared with a knowledge dictionary that stores many words used as surnames, and if a matching word is found, the correction target character string is replaced with that word.
In conventional knowledge processing, however, the character string that should replace the correction target character string is often not appropriately narrowed down from the knowledge dictionary, so sufficient correction accuracy is not obtained, and improved accuracy is desired.
Summary of the invention
Embodiments disclose a knowledge processing device and method capable of accurately correcting character strings using a knowledge dictionary.
The knowledge processing device of an embodiment corrects character strings using a knowledge dictionary and includes a selection unit, a generation unit, and a correction unit. The selection unit selects a correction target character string from file data that contains multiple character strings, each of which is given an attribute. The generation unit generates, based on another character string in the file data whose attribute differs from that of the correction target character string, a condition for obtaining candidates for a replacement character string, which is used to replace the correction target character string. The correction unit corrects the correction target character string using the candidates for the replacement character string obtained from the knowledge dictionary according to the condition.
Brief description of the drawings
Fig. 1 is a block diagram showing an example hardware configuration of the knowledge processing device of the embodiment.
Fig. 2 is a block diagram showing an example functional configuration of the knowledge processing device of the embodiment.
Fig. 3 is a diagram showing an example of file data.
Fig. 4 is a diagram showing an example of the knowledge dictionary.
Fig. 5 is a diagram schematically showing how a replacement character string is determined.
Fig. 6 is a diagram illustrating an example in which the replacement character string cannot be uniquely determined.
Fig. 7 is a diagram illustrating another example in which the replacement character string cannot be uniquely determined.
Fig. 8 is a diagram illustrating an example in which a candidate acquisition condition, used to obtain candidates for the replacement character string that replaces a correction target character string with the "surname" attribute, is generated based on a condition-generating character string with the "residence" attribute.
Fig. 9 is a diagram illustrating an example in which a candidate acquisition condition, used to obtain candidates for the replacement character string that replaces a correction target character string with the "name" attribute, is generated based on a condition-generating character string with the "birthdate" attribute.
Fig. 10 is a diagram schematically showing how replacement character strings are screened using the candidates obtained according to the candidate acquisition condition.
Fig. 11 is a diagram schematically showing how replacement character strings are screened using the candidates obtained according to the candidate acquisition condition.
Fig. 12 is a diagram showing an example of presenting candidates for the replacement character string to the user.
Fig. 13 is a diagram showing an example of presenting candidates for the replacement character string to the user.
Fig. 14 is a flowchart showing an example of the processing procedure of the knowledge processing device of the embodiment.
Fig. 15 is a diagram schematically showing how the correction target character string is corrected by preferentially using the candidates for the replacement character string obtained according to a candidate acquisition condition with high priority.
Fig. 16 is a flowchart showing an example of a procedure for screening replacement character strings according to priority.
Fig. 17 is a flowchart showing another example of a procedure for screening replacement character strings according to priority.
Fig. 18 is a diagram schematically showing an example in which candidates for the replacement character string are obtained using, as the condition-generating character string, the character string with the "name" attribute adjacent to the correction target character string with the "surname" attribute.
Fig. 19 is a diagram illustrating gender differences in given names.
Fig. 20 is a diagram illustrating an example of screening replacement character strings using gender differences in given names.
Detailed description of the embodiments
Hereinafter, a knowledge processing device and method according to an embodiment are described in detail with reference to the drawings. The embodiment described below assumes correction of character strings obtained by character recognition using OCR. However, the character strings corrected by the knowledge processing device of the embodiment are not limited to those obtained by OCR. The knowledge processing device of the embodiment is widely applicable to any situation in which character strings are corrected using a knowledge dictionary.
Fig. 1 is a block diagram showing an example hardware configuration of the knowledge processing device of the embodiment. As shown in Fig. 1, the knowledge processing device 10 of the embodiment can use the hardware configuration of an ordinary computer. That is, the knowledge processing device 10 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, an auxiliary storage device 14 such as a hard disk drive, a CD (Compact Disc) drive, a DVD (Digital Versatile Disc) drive, or flash memory, and a bus 15 that connects them. In addition, a display device 16 such as a liquid crystal display and an input device 17 such as a keyboard and/or mouse are connected to the knowledge processing device 10 by wire or wirelessly.
Fig. 2 is a block diagram showing an example functional configuration of the knowledge processing device 10 of the embodiment. In the knowledge processing device 10, for example, the CPU 11 executes a program stored in the ROM 12 or the auxiliary storage device 14 while using the RAM 13 as a work area, thereby realizing the functional components shown in Fig. 2: an input unit 101, a presentation unit 102, a reception unit 103, a selection unit 104, a determination unit 105, a generation unit 106, a correction unit 107, and an output unit 108.
The input unit 101 inputs file data D. The file data D contains multiple character strings together with the attribute given to each character string. A character string is a set of characters (a word or the like) that has a meaning as a whole. An attribute is a category of the meaning of a character string; examples include the "surname" attribute and the "name" attribute of a person's name, the "residence" attribute, and the "birthdate" attribute. Besides the character strings and their attributes, the file data D may also contain other information associated with the character strings. In this embodiment, file data containing character strings obtained by character recognition using OCR is used as the file data D. In this case, the other information contained in the file data D includes, for example, the candidate character group obtained as the character recognition result for each character of a character string.
Fig. 3 is a diagram showing an example of the file data D. The file data D shown in Fig. 3 contains character strings such as "Bell wood", "Taro", "June 15, 1970", and "city in East Kyoto Prefectures". The character string "Bell wood" has "surname" as its attribute, "Taro" has "name", "June 15, 1970" has "birthdate", and "city in East Kyoto Prefectures" has "residence". As other information associated with each character string, the file data D also contains, for example, the candidate character group for each character of the character string.
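The layout of file data D described above can be sketched as a small data structure. This is only an illustrative sketch: the class names `AttributedString` and `FileData`, the `by_attribute` helper, and the attribute labels are assumptions made for this example and do not appear in the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributedString:
    """One recognized character string plus the attribute given to it."""
    text: str
    attribute: str
    # Hypothetical layout: one list of candidate characters per character
    # position, as produced by character recognition.
    candidates: List[List[str]] = field(default_factory=list)


@dataclass
class FileData:
    """File data D: character strings, each with an attribute."""
    strings: List[AttributedString]

    def by_attribute(self, attribute: str) -> List[AttributedString]:
        # Return every character string carrying the given attribute.
        return [s for s in self.strings if s.attribute == attribute]


# Example values taken from the description of Fig. 3.
file_data = FileData([
    AttributedString("Bell wood", "surname"),
    AttributedString("Taro", "name"),
    AttributedString("June 15, 1970", "birthdate"),
    AttributedString("city in East Kyoto Prefectures", "residence"),
])
```

Keeping the candidate character group next to each string means later steps can look up both the attribute and the recognition alternatives from one record.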
In this embodiment, file data D in which an attribute has been given to each character string in advance is input, but the attribute of each character string contained in the file data D may instead be assigned inside the knowledge processing device 10. For example, semantic analysis by natural language processing can be performed inside the knowledge processing device 10 to determine the attribute of each character string contained in the file data D.
The presentation unit 102 presents various information to the user using the display device 16. For example, the presentation unit 102 can display the file data D input by the input unit 101 on the display device 16 to present it to the user. In this case, while referring to the presented file data D, the user can perform operations such as designating a correction target character string from among the character strings contained in the file data D, or designating the character string used to generate the candidate acquisition condition described later (hereinafter, the condition-generating character string). As described later, the presentation unit 102 can also use the display device 16 to present to the user the candidates for the replacement character string that replaces the correction target character string.
The reception unit 103 accepts input operations (user operations) performed by the user using the input device 17. For example, when the user uses the input device 17 to designate an arbitrary character string contained in the file data D as the correction target character string, the reception unit 103 accepts the user operation and passes the designation of the correction target character string to the selection unit 104. Likewise, when the user uses the input device 17 to designate an arbitrary character string contained in the file data D as the condition-generating character string, the reception unit 103 accepts the user operation and passes the designation of the condition-generating character string to the generation unit 106.
The selection unit 104 selects a correction target character string from the file data D input by the input unit 101. For example, when the reception unit 103 has accepted a user operation designating a correction target character string, the selection unit 104 selects the character string designated by the user as the correction target character string. Alternatively, the selection unit 104 may select the correction target character string from the file data D according to a predefined rule rather than a user designation. Possible methods include selecting character strings with predefined attributes one after another as the correction target character string, and selecting every character string contained in the file data D one after another as the correction target character string.
The determination unit 105 uses the knowledge dictionary N to perform processing that determines the replacement character string for replacing the correction target character string selected by the selection unit 104.
Fig. 4 is a diagram showing an example of the knowledge dictionary N. The knowledge dictionary N stores a large amount of information accumulated as knowledge. The information contained in the knowledge dictionary N is classified into multiple databases (DBs). Each database basically corresponds to an attribute given to the character strings contained in the file data D. For example, the knowledge dictionary N shown in Fig. 4 contains a surname DB corresponding to the "surname" attribute, a name DB corresponding to the "name" attribute, a residence DB corresponding to the "residence" attribute, and so on. The knowledge dictionary N is stored in advance in, for example, the auxiliary storage device 14. Alternatively, a knowledge dictionary N outside the knowledge processing device 10 may be used.
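One way to picture a knowledge dictionary organized as one database per attribute is a mapping from attribute to a set of known words. The following is a minimal sketch under that assumption; the class `KnowledgeDictionary`, its methods, and the sample entries are hypothetical and are not taken from the patent.

```python
from typing import Dict, Iterable, Set


class KnowledgeDictionary:
    """Knowledge dictionary N modeled as one word set (DB) per attribute."""

    def __init__(self, databases: Dict[str, Iterable[str]]) -> None:
        # Copy each word list into a set for fast membership checks.
        self._databases: Dict[str, Set[str]] = {
            attribute: set(words) for attribute, words in databases.items()
        }

    def database_for(self, attribute: str) -> Set[str]:
        # Call the DB that corresponds to an attribute (empty if none exists).
        return self._databases.get(attribute, set())

    def contains(self, attribute: str, word: str) -> bool:
        return word in self.database_for(attribute)


# Hypothetical sample entries, for illustration only.
knowledge = KnowledgeDictionary({
    "surname": ["Sato", "Suzuki"],
    "name": ["Taro", "Hanako"],
    "residence": ["Tokyo", "Okinawa"],
})
```

Whether the dictionary lives in local storage or in an external service, the same two operations (call a DB by attribute, test membership) would cover the lookups described here.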
The determination unit 105 calls, from the knowledge dictionary N, the database corresponding to the attribute of the correction target character string selected by the selection unit 104, compares the correction target character string with that database, and attempts to determine the replacement character string. For example, when the attribute of the correction target character string is "surname", the determination unit 105 calls the surname DB from the knowledge dictionary N. The determination unit 105 then forms the combinations of characters drawn from each character of the correction target character string and its candidate character group (the group of characters obtained as recognition candidates by character recognition such as pattern matching). If exactly one surname in the surname DB matches one of these combinations, that character string is determined to be the replacement character string. The candidate character group for each character of the correction target character string is given, for example, as ordered information sorted in descending order of similarity to the corresponding character (the character contained in the correction target character string), where the similarity is a value indicating how likely each candidate of the character recognition result is to be the correct character, for example a Euclidean distance.
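The ordering of a candidate character group by similarity can be sketched as follows, assuming that each character is represented by a numeric feature vector and that a smaller Euclidean distance means a higher similarity. The function name `rank_candidates` and the feature vectors are hypothetical; the patent does not specify how the recognition features are computed.

```python
import math
from typing import Dict, List, Sequence


def rank_candidates(observed: Sequence[float],
                    templates: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate characters from most to least similar.

    Similarity is modeled here as Euclidean distance between the feature
    vector of the observed glyph and a template vector per character, so
    the closest template comes first.
    """
    return sorted(templates, key=lambda ch: math.dist(observed, templates[ch]))


# Hypothetical two-dimensional features, for illustration only.
ranked = rank_candidates(
    (0.0, 0.0),
    {"a": (3.0, 4.0), "b": (1.0, 0.0), "c": (0.0, 2.0)},
)
```

In this toy input the distances are 5, 1, and 2, so `ranked` lists "b" first, then "c", then "a".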
Fig. 5 is a diagram schematically showing how the determination unit 105 determines a replacement character string. In the example of Fig. 5, the character string "assistant is thin" with the "surname" attribute has been selected as the correction target character string. Suppose that "left" and "low" are given as the candidate character group for the first character "assistant", and that "rattan" and "Sa" are given as the candidate character group for the second character "thin". In this case, the determination unit 105 calls the surname DB from the knowledge dictionary N, forms the combinations of "assistant", "left", and "low" for the first character with "thin", "rattan", and "Sa" for the second character, and judges whether the character string of each combination exists in the surname DB. In the example of Fig. 5, only "assistant rattan" among the combinations exists in the surname DB. The determination unit 105 can therefore uniquely determine the character string "assistant rattan" as the replacement character string for the correction target character string "assistant is thin".
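The combination check described for Fig. 5 can be sketched in a few lines. This is a simplified illustration that uses Latin letters in place of the original characters; the function names `matching_combinations` and `determine_replacement` and the sample data are assumptions, not part of the patent.

```python
from itertools import product
from typing import List, Optional, Set


def matching_combinations(target: str,
                          candidate_groups: List[List[str]],
                          database: Set[str]) -> List[str]:
    """Return every combination of per-position characters found in the DB."""
    per_position = []
    for recognized, group in zip(target, candidate_groups):
        # Each position may keep the recognized character or take a candidate.
        per_position.append([recognized] + [c for c in group if c != recognized])
    matches: List[str] = []
    for combo in product(*per_position):
        word = "".join(combo)
        if word in database and word not in matches:
            matches.append(word)
    return matches


def determine_replacement(target: str,
                          candidate_groups: List[List[str]],
                          database: Set[str]) -> Optional[str]:
    """Return the replacement only when exactly one combination matches."""
    matches = matching_combinations(target, candidate_groups, database)
    return matches[0] if len(matches) == 1 else None


# Hypothetical data: "cot" was recognized, with candidates per character.
groups = [["e"], ["a", "u"], ["f"]]
unique = determine_replacement("cot", groups, {"cat", "dog"})
ambiguous = determine_replacement("cot", groups, {"cat", "cut"})
```

Here `unique` is "cat" because only one combination exists in the database, while `ambiguous` is `None` because both "cat" and "cut" match, which corresponds to the case where the replacement cannot be uniquely determined.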
Figs. 6 and 7 are diagrams illustrating examples in which the determination unit 105 cannot uniquely determine the replacement character string. In the example of Fig. 6, the character string "Pu" with the "surname" attribute has been selected as the correction target character string. Suppose that "Lotus", "thin", and "Shave" are given as the candidate character group for the first character "Pu", and that "pond", "", and "he" are given as the candidate character group for the second character "". In this case, the determination unit 105 calls the surname DB from the knowledge dictionary N, forms the combinations of "Pu", "Lotus", "thin", and "Shave" for the first character with "", "pond", "", and "he" for the second character, and judges whether the character string of each combination exists in the surname DB. In the example of Fig. 6, three of the combinations, "Pu Chi", "Pu", and "Lotus ponds", exist in the surname DB. In this case, the determination unit 105 cannot uniquely determine the replacement character string for the correction target character string "Pu".
In the example of Fig. 7, the character string "Fu Zi" with the "name" attribute has been selected as the correction target character string. Suppose that "Holy", "place", and "snow" are given as the candidate character group for the first character "richness". In this case, the determination unit 105 calls the name DB from the knowledge dictionary N, forms the combinations of "richness", "Holy", "place", and "snow" for the first character with "son" for the second character, and judges whether the character string of each combination exists in the name DB. In the example of Fig. 7, three of the combinations, "Fu Zi", "Holy", and "snow", exist in the name DB. In this case, the determination unit 105 cannot uniquely determine the replacement character string for the correction target character string "Fu Zi".
When the determination unit 105 can uniquely determine the replacement character string, it passes the determined replacement character string to the correction unit 107. In this case, the correction unit 107 corrects the correction target character string by replacing the correction target character string selected by the selection unit 104 with the replacement character string determined by the determination unit 105.
On the other hand, when the replacement character string for the correction target character string cannot be uniquely determined, the determination unit 105 notifies the generation unit 106 that the replacement character string could not be determined.
The processing of the determination unit 105 described above is only an example, and the method by which the determination unit 105 determines the replacement character string is not limited to it. The determination unit 105 can determine the replacement character string for the correction target character string using any of the various methods used in conventional knowledge processing.
For example, when the determination unit 105 cannot uniquely determine the replacement character string for the correction target character string, the generation unit 106 generates, based on the condition-generating character string, a condition for obtaining candidates for the replacement character string (hereinafter, the candidate acquisition condition). The condition-generating character string is another character string contained in the file data D whose attribute differs from that of the correction target character string. As described above, the condition-generating character string may be a character string designated by a user operation, or it may be a character string with another attribute that is predefined for the attribute of the correction target character string. For example, rules can be defined in advance so that a character string with the "residence" attribute is used as the condition-generating character string when the attribute of the correction target character string is "surname", and a character string with the "birthdate" attribute is used when the attribute is "name"; the condition-generating character string can then be determined according to these rules.
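The predefined rules for choosing the condition-generating character string can be sketched as a lookup table keyed by the attribute of the correction target. The table contents follow the two rules stated above; the names `CONDITION_ATTRIBUTE_RULES` and `pick_condition_string`, the tuple layout, and the sample residence value are assumptions for this illustration.

```python
from typing import List, Optional, Tuple

# Rules from the description: surname -> residence, name -> birthdate.
CONDITION_ATTRIBUTE_RULES = {
    "surname": "residence",
    "name": "birthdate",
}


def pick_condition_string(strings: List[Tuple[str, str]],
                          target_attribute: str,
                          user_choice: Optional[str] = None) -> Optional[str]:
    """Choose the condition-generating character string.

    `strings` holds (text, attribute) pairs from the file data. A string
    designated by the user takes precedence over the predefined rules.
    """
    if user_choice is not None:
        return user_choice
    wanted = CONDITION_ATTRIBUTE_RULES.get(target_attribute)
    for text, attribute in strings:
        if attribute == wanted:
            return text
    return None


# Hypothetical file data content, for illustration only.
strings = [
    ("Bell wood", "surname"),
    ("Taro", "name"),
    ("June 15, 1970", "birthdate"),
    ("Naha City, Okinawa", "residence"),
]
```

With this data, a "surname" target yields the residence string and a "name" target yields the birthdate string; a target attribute with no rule yields `None`.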
Fig. 8 is a diagram illustrating an example in which the generation unit 106 generates, based on a condition-generating character string with the "residence" attribute, the candidate acquisition condition used to obtain candidates for the replacement character string for a correction target character string with the "surname" attribute. When the attribute of the condition-generating character string is "residence" and the attribute of the correction target character string is "surname", the generation unit 106 can, for example, analyze the condition-generating character string to identify the region indicated by the residence and generate a candidate acquisition condition for obtaining a list of surnames characteristic of that region. The example of Fig. 8 shows the generation unit 106 identifying "Okinawa" from the condition-generating character string with the "residence" attribute and generating a candidate acquisition condition for obtaining a list of surnames characteristic of "Okinawa". In this example, the character strings contained in the list of surnames characteristic of "Okinawa", obtained from the knowledge dictionary N according to the candidate acquisition condition, become the candidates for the replacement character string.
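Generating a candidate acquisition condition from a residence string can be sketched as a region lookup followed by the construction of a small condition record. This sketch assumes a simple substring match against a list of known regions; the names `CandidateCondition`, `KNOWN_REGIONS`, `condition_from_residence`, and the database label `surnames_by_region` are hypothetical.

```python
from typing import NamedTuple, Optional


class CandidateCondition(NamedTuple):
    """A candidate acquisition condition: which list to fetch, keyed by what."""
    database: str
    key: str


# Hypothetical list of regions the analyzer can recognize.
KNOWN_REGIONS = ("Okinawa", "Hokkaido")


def condition_from_residence(residence: str) -> Optional[CandidateCondition]:
    """Identify the region in a residence string and build the condition."""
    for region in KNOWN_REGIONS:
        if region in residence:
            # Ask for the list of surnames characteristic of this region.
            return CandidateCondition("surnames_by_region", region)
    return None


condition = condition_from_residence("Naha City, Okinawa")
```

A real analyzer would likely need address parsing rather than substring matching, but the output has the same shape: a description of which list to fetch from the knowledge dictionary.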
Fig. 9 is a diagram illustrating an example in which the generation unit 106 generates, based on a condition-generating character string with the "birthdate" attribute, the candidate acquisition condition used to obtain candidates for the replacement character string for a correction target character string with the "name" attribute. When the attribute of the condition-generating character string is "birthdate" and the attribute of the correction target character string is "name", the generation unit 106 can, for example, analyze the condition-generating character string to identify the birth year and generate a candidate acquisition condition for obtaining a list of names that were popular in that birth year. The example of Fig. 9 shows the generation unit 106 identifying "1980" as the birth year from the condition-generating character string with the "birthdate" attribute and generating a candidate acquisition condition for obtaining a list of names popular in "1980". In this example, the character strings contained in the list of names popular in "1980", obtained from the knowledge dictionary N according to the candidate acquisition condition, become the candidates for the replacement character string.
Similarly, when the attribute of the condition-generating character string is "birthdate" and the attribute of the correction target character string is "name", the generation unit 106 can also, for example, analyze the condition-generating character string to identify the zodiac sign (Heavenly Stems and Earthly Branches) of the birth year and generate a candidate acquisition condition for obtaining a list of names related to that zodiac sign. The example of Fig. 9 shows the generation unit 106 identifying "occasion" as the zodiac sign of the birth year from the condition-generating character string with the "birthdate" attribute and generating a candidate acquisition condition for obtaining a list of names related to "occasion". In this example, the character strings contained in the list of names related to "occasion", obtained from the knowledge dictionary N according to the candidate acquisition condition, become the candidates for the replacement character string.
Similarly, when the attribute of the condition-generating character string is "birthdate" and the attribute of the correction target character string is "name", the generation unit 106 can also, for example, analyze the condition-generating character string to identify the season and generate a candidate acquisition condition for obtaining a list of names related to that season. The example of Fig. 9 shows the generation unit 106 identifying "winter" as the season from the condition-generating character string with the "birthdate" attribute and generating a candidate acquisition condition for obtaining a list of names related to "winter". In this example, the character strings contained in the list of names related to "winter", obtained from the knowledge dictionary N according to the candidate acquisition condition, become the candidates for the replacement character string.
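The three kinds of condition derived from a birthdate (birth year, zodiac sign, season) can be sketched together. This sketch assumes the birthdate string has already been parsed into a `date`, that the zodiac sign changes on January 1 of each calendar year, and that seasons follow a simple month grouping; these simplifications, the English sign names, and the database labels are all assumptions for illustration.

```python
from datetime import date
from typing import List, Tuple

# Twelve-sign cycle; index 0 corresponds to years where (year - 4) % 12 == 0.
ZODIAC = ["rat", "ox", "tiger", "rabbit", "dragon", "snake",
          "horse", "goat", "monkey", "rooster", "dog", "pig"]


def season_of(month: int) -> str:
    """Map a month to a season using a simple Northern Hemisphere grouping."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


def conditions_from_birthdate(born: date) -> List[Tuple[str, str]]:
    """Build (database, key) conditions for year, zodiac sign, and season."""
    return [
        ("popular_names_by_year", str(born.year)),
        ("names_by_zodiac", ZODIAC[(born.year - 4) % 12]),
        ("names_by_season", season_of(born.month)),
    ]


conditions = conditions_from_birthdate(date(1980, 1, 20))
```

One birthdate therefore yields several conditions at once, and the caller can use all of them or only the ones it selects.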
In addition, as Fig. 9 example, it can be based on a condition generation character string in generating unit 106 and generate more In the case of individual candidate acquirement condition, the time of SUB substitute character string can be obtained using the whole of this multiple candidates acquirement condition Mend, specified candidate acquirement condition for example can also be operated by user to obtain displacement using in multiple candidate acquirement conditions The candidate of character string.
In addition, above-mentioned candidate obtains condition only one, this is not limited to.Generating unit 106 can be based on attribute The condition generation character string different from amendment Object Character string, generate and line replacement is entered to amendment Object Character string for obtaining The various candidates of the candidate of SUB substitute character string obtain condition.
Correction portion 107 corrects the correction target character string selected by selector 104. For example, when determining section 105 has uniquely determined, as described above, the replacement character string that is to replace the correction target character string, correction portion 107 corrects the correction target character string by replacing it with the determined replacement character string.
When the replacement character string has not been uniquely determined and generating unit 106 has generated a candidate acquisition condition, correction portion 107 acquires candidates (a list) for the replacement character string from knowledge dictionary N according to that candidate acquisition condition. Correction portion 107 then corrects the correction target character string using the acquired candidates (list). For example, correction portion 107 narrows down the replacement character string using the candidates (list) acquired according to the candidate acquisition condition, and replaces the correction target character string with the replacement character string so narrowed down.
The candidates (list) for the replacement character string corresponding to a candidate acquisition condition may be acquired from the database in knowledge dictionary N that corresponds to the attribute of the correction target character string, or a dedicated database may be provided separately in knowledge dictionary N and the candidates acquired from it. Examples of dedicated databases include a database that stores region-specific surnames in association with regions, a database that stores popular names by year of birth, a database that stores names associated with each zodiac sign, and a database that stores names associated with each season. When the candidates (list) are acquired from the database corresponding to the attribute of the correction target character string, it suffices to prepare the database for each attribute in advance in the form of a relational database from which information can be extracted according to the candidate acquisition condition.
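As a rough sketch of a dedicated database queried by a candidate acquisition condition, the following uses an in-memory SQLite table that associates surnames with regions. The table name, column names, and all rows are hypothetical placeholders chosen for illustration; the patent does not specify a schema.

```python
import sqlite3


def build_demo_dictionary():
    """Create a tiny in-memory stand-in for a dedicated database in knowledge dictionary N."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE regional_surnames (region TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO regional_surnames VALUES (?, ?)",
        [("north", "Aoki"), ("north", "Baba"), ("south", "Chiba")],  # made-up rows
    )
    return conn


def fetch_candidates(conn, region):
    """Return the list of surnames stored for the region named by the condition."""
    rows = conn.execute(
        "SELECT surname FROM regional_surnames WHERE region = ? ORDER BY surname",
        (region,),
    )
    return [row[0] for row in rows]
```

Here the candidate acquisition condition is reduced to a single region key; a real condition could carry several keys and select among several tables.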
Figs. 10 and 11 schematically show how correction portion 107 narrows down the replacement character string using the candidates (list) acquired according to the candidate acquisition condition. The example of Fig. 10 corresponds to the example shown in Fig. 6, and the example of Fig. 11 corresponds to the example shown in Fig. 7.
In the example of Fig. 10, for "Pu", the correction target character string of the "surname" attribute, a candidate acquisition condition is generated based on the condition generation character string of the "address" attribute, and a list of surnames specific to the region indicated by the condition generation character string is acquired as the candidates for the replacement character string. In the example shown in Fig. 6, as described above, the three character combinations "Pu Chi", "Pu", and "Lotus ponds" that include characters from the candidate character group are all present in the surname DB, so determining section 105 cannot uniquely determine the replacement character string. However, if the list of region-specific surnames acquired according to the candidate acquisition condition contains "Pu Chi" but contains neither "Pu" nor "Lotus ponds", the replacement character string can be narrowed down to "Pu Chi". In this case, correction portion 107 can correct the correction target character string by replacing "Pu" with the narrowed-down replacement character string "Pu Chi".
In the example of Fig. 11, for "Fu Zi", the correction target character string of the "name" attribute, a candidate acquisition condition is generated based on the condition generation character string of the "birthdate" attribute, and a list of names associated with the season indicated by the condition generation character string is acquired as the candidates for the replacement character string. In the example shown in Fig. 7, as described above, the three character combinations "Fu Zi", "Holy", and "snow" that include characters from the candidate character group are all present in the name DB, so determining section 105 cannot uniquely determine the replacement character string. However, if the list of season-related names acquired according to the candidate acquisition condition contains "snow" but contains neither "Fu Zi" nor "Holy", the replacement character string can be narrowed down to "snow". In this case, correction portion 107 can correct the correction target character string by replacing "Fu Zi" with the narrowed-down replacement character string "snow".
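The narrowing-down shown in Figs. 10 and 11 amounts to intersecting the character combinations that survive the dictionary check with the list acquired under the candidate acquisition condition. A minimal sketch follows, assuming that candidate character groups are given as one list of alternatives per character position; the function name and the sample strings are hypothetical.

```python
from itertools import product


def screen(candidate_groups, attribute_db, conditioned_list):
    """Return the single replacement string, or None if it stays ambiguous."""
    # Every combination of one character per position.
    combos = {"".join(chars) for chars in product(*candidate_groups)}
    # Combinations that exist in the database for this attribute.
    in_db = combos & set(attribute_db)
    if len(in_db) == 1:
        return next(iter(in_db))  # already unique, no condition needed
    # Otherwise narrow down with the list acquired under the condition.
    narrowed = in_db & set(conditioned_list)
    return next(iter(narrowed)) if len(narrowed) == 1 else None
```

For example, with the groups `[["T"], ["o", "i", "a"], ["m"]]`, a name database containing "Tom", "Tim", and "Tam", and a conditioned list containing only "Tim" among them, the function returns "Tim".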
Alternatively, instead of replacing the correction target character string directly with the narrowed-down replacement character string, correction portion 107 may have prompting part 102 present the candidates for the replacement character string to the user and, when receiving unit 103 accepts a user operation that selects one of the presented candidates, replace the correction target character string with the selected candidate.
Figs. 12 and 13 show examples of how prompting part 102 presents the candidates for the replacement character string to the user. The example of Fig. 12 corresponds to the example shown in Fig. 10, and the example of Fig. 13 corresponds to the example shown in Fig. 11. As shown in Figs. 12 and 13, prompting part 102 can, for example, display the candidates for the replacement character string on display device 16 together with the correction target character string. In doing so, it is desirable that, among the presented candidates, those acquired from knowledge dictionary N according to the candidate acquisition condition generated by generating unit 106 be displayed at the top or highlighted, so that the user can select them easily.
The examples shown in Figs. 12 and 13 are merely examples; prompting part 102 is not limited to them and can present the candidates for the replacement character string to the user by various methods.
Output section 108 outputs file data D', in which the correction target character strings have been corrected by correction portion 107. The output form of file data D' is arbitrary; for example, it may be displayed on display device 16 or output as a text file. File data D' may also be output containing only the character strings, with the attributes assigned to each character string and other information removed.
Next, the operation of knowledge processing device 10 of the embodiment is described. Fig. 14 is a flow chart showing an example of the processing procedure of knowledge processing device 10. Knowledge processing device 10 operates, for example, according to the series of processing steps shown in the flow chart of Fig. 14.
When knowledge processing device 10 starts operating, input unit 101 first inputs file data D (step S101). Next, selector 104 selects a correction target character string from the file data D input in step S101 (step S102).
Next, determining section 105 compares the correction target character string selected in step S102 with knowledge dictionary N (step S103). It is then checked whether, as a result of the comparison with knowledge dictionary N, the replacement character string that is to replace the correction target character string has been uniquely determined (step S104). If the replacement character string has been uniquely determined (Yes in step S104), correction portion 107 replaces the correction target character string with the determined replacement character string (step S105).
On the other hand, if the replacement character string has not been uniquely determined (No in step S104), generating unit 106 generates a candidate acquisition condition based on a condition generation character string in the file data input in step S101 whose attribute differs from that of the correction target character string (step S106).
Correction portion 107 then acquires candidates (a list) for the replacement character string from knowledge dictionary N according to the candidate acquisition condition generated in step S106 (step S107), and narrows down the replacement character string using the acquired candidates (list) (step S108). After that, correction portion 107 replaces the correction target character string with the replacement character string narrowed down in step S108 (step S109).
Next, it is checked whether correction of the file data D input in step S101 is complete (step S110). If correction is not complete (No in step S110), processing returns to step S102 and the subsequent steps are repeated. If correction of file data D is complete (Yes in step S110), output section 108 outputs the corrected file data D' (step S111), and the series of processing ends.
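The procedure of Fig. 14 can be sketched as a single loop over the correction targets. This is a simplified illustration, not the patented implementation: the three callables stand in for determining section 105, generating unit 106, and the dictionary lookup of correction portion 107, and their names and signatures are assumptions. Leaving a string unchanged when no unique replacement emerges is also an assumption of this sketch.

```python
def correct_record(record, targets, lookup, generate_condition, fetch_list):
    """Correct the attributes named in `targets` within one record (a dict).

    lookup(attr, text)               -> set of replacement candidates (S103)
    generate_condition(record, attr) -> condition from a different attribute (S106)
    fetch_list(condition)            -> iterable of strings from the dictionary (S107)
    """
    corrected = dict(record)                                # S101: input data
    for attr in targets:                                    # S102, repeated via S110
        matches = set(lookup(attr, record[attr]))           # S103
        if len(matches) > 1:                                # S104: not unique
            condition = generate_condition(record, attr)    # S106
            matches &= set(fetch_list(condition))           # S107, S108
        if len(matches) == 1:
            corrected[attr] = next(iter(matches))           # S105 or S109
    return corrected                                        # S111: output D'
```

A caller would supply, for instance, a `lookup` that expands candidate character groups and checks them against the attribute database, and a `fetch_list` backed by a dedicated database.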
As described in detail above with specific examples, knowledge processing device 10 of the embodiment generates a candidate acquisition condition, for acquiring candidates for the replacement character string that replaces the correction target character string, based on a condition generation character string in file data D whose attribute differs from that of the correction target character string. It then acquires candidates for the replacement character string from knowledge dictionary N according to the generated candidate acquisition condition, and corrects the correction target character string using the acquired candidates. Character strings can therefore be corrected more accurately than when the correction target character string is simply compared with knowledge dictionary N and corrected.
As a method of correcting a correction target character string using a character string other than the correction target character string, a method of correcting an address character string using a postal code, for example, is known. However, that method uses information that corresponds one-to-one to the correction target character string, and therefore cannot be applied to character strings for which no such one-to-one information exists. In contrast, knowledge processing device 10 of this embodiment generates a candidate acquisition condition based on a condition generation character string in file data D, and corrects the correction target character string using the candidates for the replacement character string acquired from knowledge dictionary N according to that condition. It can therefore perform highly accurate correction on a wide variety of character strings.
In knowledge processing device 10 of this embodiment, when determining section 105 cannot uniquely determine the replacement character string that is to replace the correction target character string, generating unit 106 generates a candidate acquisition condition, and correction portion 107 corrects the correction target character string using the candidates acquired from knowledge dictionary N according to that condition. With this configuration, more accurate correction of character strings can be performed efficiently.
In knowledge processing device 10 of this embodiment, the candidates for the replacement character string are presented to the user and the correction target character string is replaced with the candidate selected by the user. With this configuration, character strings can be corrected accurately.
In knowledge processing device 10 of this embodiment, the user is allowed to specify the correction target character string and the condition generation character string. With this configuration, character strings can be corrected efficiently in a manner that suits the user's purpose.
(variation 1)
Knowledge processing device 10 of the embodiment may also be configured without determining section 105. That is, knowledge processing device 10 may correct the correction target character string using only the candidates for the replacement character string acquired from knowledge dictionary N according to the candidate acquisition condition, without performing the processing that determines the replacement character string by comparing the correction target character string with knowledge dictionary N. In this case, for example, the similarity between each candidate acquired from knowledge dictionary N according to the candidate acquisition condition and the correction target character string is calculated, and the replacement character string is narrowed down by that similarity. The replacement character string can thereby be narrowed down appropriately, and the correction target character string can be corrected accurately.
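The similarity-based narrowing of this variation might look like the sketch below. The patent does not say which similarity measure is used; the ratio from Python's `difflib.SequenceMatcher` is substituted here purely as an assumption, and the function name and `min_ratio` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def pick_most_similar(target, candidates, min_ratio=0.0):
    """Return the candidate most similar to `target`, or None if none exceeds min_ratio."""
    best, best_ratio = None, min_ratio
    for cand in candidates:
        # ratio() is 2 * matches / total length, in the range 0.0 to 1.0.
        ratio = SequenceMatcher(None, target, cand).ratio()
        if ratio > best_ratio:
            best, best_ratio = cand, ratio
    return best
```

A nonzero `min_ratio` would let the caller refuse to replace the target when even the best candidate is a poor match.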
(variation 2)
Knowledge processing device 10 of the embodiment may also be configured as follows: when generating unit 106 generates multiple candidate acquisition conditions, prompting part 102 presents the generated conditions to the user, and receiving unit 103 accepts a user operation that specifies a priority for each of the conditions. In this case, among the candidates for the replacement character string acquired from knowledge dictionary N according to each of the multiple candidate acquisition conditions, correction portion 107 preferentially uses the candidates acquired according to a higher-priority condition to correct the correction target character string.
The multiple candidate acquisition conditions may be conditions generated by generating unit 106 from a single condition generation character string, or conditions generated by generating unit 106 from multiple condition generation character strings. A configuration may also be adopted in which the user specifies, together with the priorities, the number and content of the candidate acquisition conditions that generating unit 106 generates.
Fig. 15 schematically shows how correction portion 107 corrects the correction target character string by preferentially using the candidates for the replacement character string acquired according to a higher-priority candidate acquisition condition. In the example of Fig. 15, the character string "great Play" of the "name" attribute is selected as the correction target character string, and "Hui" and "Trees" are provided as the candidate character group corresponding to the second character "Play". Multiple name lists are acquired from knowledge dictionary N according to the multiple candidate acquisition conditions. The list acquired according to the highest-priority condition is treated as the priority-1 list, and the list acquired according to the next-highest-priority condition is treated as the priority-2 list. The priority-1 list contains the character string "great Hui", and the priority-2 list contains the character string "big Trees".
In the example of Fig. 15, among the character combinations that include the candidate character group of the correction target character string, "great Hui" and "big Trees" are both candidates for the replacement character string. Correction portion 107 gives "great Hui", which is contained in the priority-1 list, precedence over "big Trees", which is contained in the priority-2 list, and can thus correct the correction target character string by replacing "great Play" with "great Hui".
Fig. 16 is a flow chart showing an example of the procedure for narrowing down the replacement character string according to priority. When multiple lists have been acquired from knowledge dictionary N according to multiple candidate acquisition conditions, correction portion 107 can narrow down the replacement character string according to, for example, the procedure shown in the flow chart of Fig. 16.
Correction portion 107 first sets priority X = 1 (step S201) and compares the correction target character string with the list of priority X (step S202). Priority X corresponds to the priority that the user specified for the candidate acquisition condition used to acquire the list.
Next, correction portion 107 determines whether a candidate that fits the correction target character string is contained in the list of priority X. Specifically, for example, it determines whether any of the character combinations that include the candidate character group of the correction target character string is contained in the list of priority X (step S203). If a fitting candidate is contained in the list of priority X (Yes in step S203), correction portion 107 uses that candidate as the replacement character string, replaces the correction target character string with it (step S204), and ends the series of processing.
When multiple candidates that fit the correction target character string are obtained from a single list, it suffices, for example, to select from among them, as the replacement character string, the candidate whose per-character similarities to the correction target character string have the highest total (for a character that matches a character contained in the correction target character string, the per-character similarity takes its maximum value), and to correct the correction target character string with it.
On the other hand, if no candidate that fits the correction target character string is contained in the list of priority X (No in step S203), correction portion 107 increments the value of priority X (step S205) and determines whether the value of priority X is greater than the number of lists acquired from knowledge dictionary N (the list count) (step S206). If the value of priority X is equal to or less than the list count (No in step S206), processing returns to step S202 and the subsequent steps are repeated. If the value of priority X is greater than the list count (Yes in step S206), the series of processing ends.
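The first-match procedure of Fig. 16 can be sketched as follows, assuming the lists are passed in priority order (index 0 holds the priority-1 list) and that the fitting character combinations have already been computed. The function name is hypothetical. When one list yields several hits, this sketch simply takes the first, whereas the text above describes choosing by total per-character similarity.

```python
def first_match_by_priority(combos, lists_by_priority):
    """Return the first fitting candidate, scanning lists from priority 1 downward."""
    x = 1                                                   # S201
    while x <= len(lists_by_priority):                      # S206: stop once X exceeds list count
        hits = [c for c in lists_by_priority[x - 1] if c in combos]  # S202, S203
        if hits:
            return hits[0]                                  # S204: use as replacement string
        x += 1                                              # S205
    return None                                             # no list contained a fitting candidate
```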
In the example above, the correction target character string is compared with the lists in descending order of priority, the comparison ends as soon as a candidate that fits the correction target character string is found, and the candidate so obtained is used as the replacement character string to correct the correction target character string. However, instead of ending the comparison as soon as a fitting candidate is found, a score (a value indicating the "likelihood of being the correct answer") may be calculated for each candidate obtained from each list using the priority of that list, and the candidate finally given the highest score may be selected as the replacement character string and used to correct the correction target character string.
Fig. 17 is a flow chart showing another example of the procedure for narrowing down the replacement character string according to priority, in which a score is assigned to each candidate obtained from the lists. Correction portion 107 may also narrow down the replacement character string according to the procedure shown in the flow chart of Fig. 17.
Correction portion 107 first sets priority X = 1 (step S301) and compares the correction target character string with the list of priority X (step S302). Priority X corresponds to the priority that the user specified for the candidate acquisition condition used to acquire the list.
Next, correction portion 107 determines whether a candidate that fits the correction target character string is contained in the list of priority X. Specifically, for example, it determines whether any of the character combinations that include the candidate character group of the correction target character string is contained in the list of priority X (step S303). If a fitting candidate is contained in the list of priority X (Yes in step S303), correction portion 107 calculates a score for that candidate (step S304).
The score for a candidate can be, for example, the value obtained by multiplying a weight, which takes a larger value the higher the priority of the list containing the candidate, by the similarity of the candidate to the correction target character string. The score may instead be, for example, the value obtained by multiplying the above weight by the rank of the candidate within the list. In that case, the rank of a candidate within a list is determined, for example, according to how well the candidate fits the candidate acquisition condition corresponding to the list. The above weight alone may also be used as the score for a candidate. When the same candidate is found in multiple lists, the scores calculated for that candidate in each list can be added together to obtain its final score.
On the other hand, if no candidate that fits the correction target character string is contained in the list of priority X (No in step S303), correction portion 107 skips the score calculation of step S304 and proceeds to step S305.
Next, correction portion 107 increments the value of priority X (step S305) and determines whether the value of priority X is greater than the number of lists acquired from knowledge dictionary N (the list count) (step S306). If the value of priority X is equal to or less than the list count (No in step S306), processing returns to step S302 and the subsequent steps are repeated. If the value of priority X is greater than the list count (Yes in step S306), correction portion 107 uses the candidate with the highest score among the candidates obtained by the above processing as the replacement character string, replaces the correction target character string with it (step S307), and ends the series of processing.
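The scoring procedure of Fig. 17 might be sketched as below, using the first scoring option described above (priority weight multiplied by similarity) and summing scores for a candidate that appears in several lists. The concrete weight (list count minus priority plus one) and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def best_by_score(target, combos, lists_by_priority):
    """Score every fitting candidate across all lists and return the top one."""
    n = len(lists_by_priority)
    scores = {}
    for x, names in enumerate(lists_by_priority, start=1):  # S301, S305, S306
        weight = n - x + 1      # assumed weight: larger for higher priority
        for cand in names:
            if cand in combos:                               # S302, S303
                sim = SequenceMatcher(None, target, cand).ratio()
                # S304: accumulate, so repeated hits across lists add up.
                scores[cand] = scores.get(cand, 0.0) + weight * sim
    # S307: the highest-scoring candidate becomes the replacement string.
    return max(scores, key=scores.get) if scores else None
```

Unlike the first-match procedure, a candidate from a lower-priority list can win here if it is sufficiently more similar to the target or appears in several lists.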
The example above describes the case in which all of the lists acquired from knowledge dictionary N according to the candidate acquisition conditions are used. However, a limit may be placed on the number of lists used; for example, the processing shown in Fig. 17 may be performed on only the top Y lists in descending order of priority. In that case, step S306 above becomes processing that determines whether the value of priority X is greater than Y.
A threshold may also be set for the priority X of the lists used, and the processing shown in Fig. 17 may be performed only on lists whose priority X value is smaller than the threshold (that is, lists that take precedence over a list whose priority X equals the threshold). The threshold may also be changed dynamically according to the scores of the candidates obtained. For example, when the similarity of the first-ranked candidate to the correction target character string is 800 points or more and exceeds that of the second-ranked candidate by 100 points or more, so that the need for correction is considered low, the threshold for the priority X of the lists may be set to 4 so that only the more reliable lists are used, and lists whose priority X value is 4 or greater are excluded from processing. In this case, step S306 above becomes processing that determines whether the value of priority X is equal to or greater than Y, where Y is the threshold.
As described above, knowledge processing device 10 of this variation corrects the correction target character string by preferentially using the candidates for the replacement character string acquired according to a higher-priority candidate acquisition condition. The replacement character string can therefore be narrowed down more appropriately, and the correction target character string can be corrected accurately.
(variation 3)
Knowledge processing device 10 of the embodiment may also be configured so that, when the attribute of the correction target character string is "surname", the candidate acquisition condition is generated using the adjacent character string of the "name" attribute as the condition generation character string, and when the attribute of the correction target character string is "name", the candidate acquisition condition is generated using the adjacent character string of the "surname" attribute as the condition generation character string.
A character string of the "surname" attribute and the adjacent character string of the "name" attribute represent the same person, and the two are often correlated depending on the person's nationality, gender, and so on. For example, if the character string of the "surname" attribute represents a surname specific to a particular country, the character string of the "name" attribute can be expected to represent a name specific to that country as well. Likewise, if the character string of the "surname" attribute represents a surname form specific to women, the character string of the "name" attribute can be expected to represent a name specific to women. Therefore, when the attribute of the correction target character string is "surname", the adjacent character string of the "name" attribute can be useful information for narrowing down the candidates for the replacement character string. Similarly, when the attribute of the correction target character string is "name", the adjacent character string of the "surname" attribute can be useful information for narrowing down the candidates for the replacement character string.
Fig. 18 schematically shows an example in which candidates for the replacement character string are acquired using, as the condition generation character string, the character string of the "name" attribute adjacent to a correction target character string of the "surname" attribute. In the example of Fig. 18, the character string "Kavfman" of the "surname" attribute is selected as the correction target character string. In this case, generating unit 106 uses the character string "Jacob" of the "name" attribute, which is adjacent to "Kavfman", as the condition generation character string.
Generating unit 106 first compares "Jacob", the condition generation character string, with knowledge dictionary N and searches for a matching character string. Here, knowledge dictionary N is assumed to contain lists that collect the names specific to various countries and peoples and lists that collect the corresponding surnames. If "Jacob" is contained in the "Jewish name list", which collects names characteristic of Jewish people, generating unit 106 generates, for example, a candidate acquisition condition for acquiring the list of Jewish surnames. In this case, correction portion 107 acquires the "Jewish surname list" from knowledge dictionary N according to the candidate acquisition condition generated by generating unit 106 and uses it to correct the correction target character string "Kavfman". In the example of Fig. 18, the correction target character string "Kavfman" is corrected by replacing it with "Kaufman", which is contained in the "Jewish surname list" acquired from knowledge dictionary N.
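The lookup of Fig. 18 can be sketched in two steps: find which name list contains the adjacent given name, then search the matching surname list for a close match to the correction target. The data layout (dicts keyed by a group label), the function name, and the use of `difflib.get_close_matches` with a 0.6 cutoff are assumptions for illustration; the list contents below are placeholders.

```python
from difflib import get_close_matches


def correct_surname(surname, given_name, given_name_lists, surname_lists):
    """Correct `surname` using the surname list tied to the adjacent given name."""
    for group, names in given_name_lists.items():
        if given_name in names:
            # The candidate acquisition condition: "surname list of this group".
            candidates = surname_lists.get(group, [])
            matches = get_close_matches(surname, candidates, n=1, cutoff=0.6)
            return matches[0] if matches else surname
    return surname  # given name found in no list: leave the surname unchanged


# Hypothetical dictionary contents mirroring the Fig. 18 example.
GIVEN = {"jewish": {"Jacob", "David"}}
SURNAMES = {"jewish": ["Kaufman", "Cohen"]}
```

With these placeholder lists, `correct_surname("Kavfman", "Jacob", GIVEN, SURNAMES)` yields "Kaufman".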
Fig. 19 illustrates gender differences in names, using Russian names as an example. As shown in Fig. 19, the endings of Russian names (especially the endings of surnames) differ between men and women.
Fig. 20 illustrates an example in which the replacement character string is narrowed down using gender differences in names. In the example of Fig. 20, the character string "Yulii" of the "name" attribute is selected as the correction target character string, and "j", "l", "f", and "a" are provided as the candidate character group corresponding to the fifth character "i". In this case, generating unit 106 uses the character string "Ivanova" of the "surname" attribute, which is adjacent to "Yulii", as the condition generation character string.
The generating unit 106 determines from "Ivanova", the condition-generation character string, that the person denoted by "Ivanova" is female, and generates a candidate acquisition condition of obtaining the list of female given names. In this case, the correction portion 107 obtains the "list of female names" from the knowledge dictionary N according to the candidate acquisition condition generated by the generating unit 106, and uses this "list of female names" to narrow the candidate for the fifth character of the correction target character string "Yulii" to "a". The replacement character string is thereby narrowed down to "Yulia", and the correction target character string "Yulii" is replaced with "Yulia".
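The narrowing described for Figure 20 can be sketched as follows. This is a hedged illustration only: the suffix test used to judge that a surname is in the feminine form, the contents of FEMALE_NAMES, and the function names are assumptions introduced here. The text does not specify how the generating unit 106 decides that "Ivanova" denotes a woman.

```python
# Hypothetical excerpt of the female given-name list in knowledge dictionary N.
FEMALE_NAMES = {"Yulia", "Anna", "Olga"}

# Assumed heuristic: endings that typically mark the feminine form of a Russian surname.
FEMININE_SURNAME_ENDINGS = ("ova", "eva", "ina", "aya")


def is_female_surname(surname):
    """Judge from the ending whether the surname is in the feminine form (assumption)."""
    return surname.lower().endswith(FEMININE_SURNAME_ENDINGS)


def narrow_candidates(target, position, candidate_chars, surname):
    """Build one variant per candidate character and keep those the condition allows.

    position is the zero-based index of the doubtful character in target.
    """
    variants = [target[:position] + ch + target[position + 1:]
                for ch in candidate_chars]
    if is_female_surname(surname):
        # Condition: only strings found in the female given-name list remain.
        return [v for v in variants if v in FEMALE_NAMES]
    # No condition could be derived, so every variant stays a candidate.
    return variants


print(narrow_candidates("Yulii", 4, ["j", "l", "f", "a"], "Ivanova"))
```

With the adjacent surname "Ivanova", only the variant ending in "a" survives, which mirrors the narrowing to "Yulia" described above.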
As described above, in the knowledge processing device 10 of this variation, when one of an adjoining "surname"-attribute character string and "name"-attribute character string is selected as the correction target character string, the other is used as the condition-generation character string to generate a candidate acquisition condition, and the correction target character string is corrected using the candidates for the replacement character string obtained from the knowledge dictionary N according to that candidate acquisition condition. The replacement character strings can therefore be narrowed down more appropriately, and the correction target character string can be corrected accurately.
Each functional component of the knowledge processing device 10 of the embodiment described above can be realized, for example when a computer is used as the hardware of the knowledge processing device 10, by having that computer execute a predetermined program. The program executed by the computer serving as the knowledge processing device 10 is, for example, recorded as a file in an installable or executable format on a computer-readable recording medium such as a CD-ROM (Compact Disk Read Only Memory), a floppy disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disc), and is provided as a computer program product.
Alternatively, the program executed by the computer serving as the knowledge processing device 10 may be stored on another computer connected to a network such as the Internet and provided by being downloaded via the network. The program executed by the computer serving as the knowledge processing device 10 may also be provided or distributed via a network such as the Internet. Furthermore, the program executed by the computer serving as the knowledge processing device 10 may be provided by being incorporated in advance in, for example, the ROM 12 inside the computer.
The program executed by the computer serving as the knowledge processing device 10 has a module configuration that includes the functional components of the knowledge processing device 10 (input unit 101, prompting part 102, receiving unit 103, selector 104, determining section 105, generating unit 106, correction portion 107, and output section 108). As actual hardware, for example, the CPU 11 (processor) reads the program from the recording medium described above and executes it, whereby each of the above components is loaded onto a main storage unit such as the RAM 13 and is generated on the main storage unit. Some or all of the functional components of the knowledge processing device 10 may also be realized using dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).
As described above, the knowledge processing device 10 of the embodiment includes: the selector 104, which selects a processing target character string from the file data D; the generating unit 106, which generates a candidate acquisition condition based on a condition-generation character string whose attribute in the file data D differs from that of the processing target character string; and the correction portion 107, which corrects the correction target character string using the candidates for the replacement character string obtained from the knowledge dictionary N according to the candidate acquisition condition. Character strings can therefore be corrected accurately using the knowledge dictionary N.
Embodiments of the present invention have been described above, but these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

Claims (11)

1. A knowledge processing device that corrects character strings using a knowledge dictionary, the knowledge processing device comprising:
a selector that selects a correction target character string from file data that contains a plurality of character strings and in which each character string is given an attribute of that character string;
a generating unit that generates, based on another character string in the file data whose attribute differs from that of the correction target character string, a condition for obtaining candidates for a replacement character string used to replace the correction target character string; and
a correction portion that corrects the correction target character string using the candidates for the replacement character string obtained from the knowledge dictionary according to the condition,
wherein the attributes include a surname attribute and a name attribute, the surname attribute indicating that a character string is the surname of a person's name and the name attribute indicating that a character string is the given name of a person's name, and
when the attribute of the correction target character string is the surname attribute, the generating unit generates the condition based on another character string that adjoins the correction target character string and has the name attribute, and when the attribute of the correction target character string is the name attribute, the generating unit generates the condition based on another character string that adjoins the correction target character string and has the surname attribute.
2. The knowledge processing device according to claim 1,
further comprising a determining section that determines the replacement character string independently of the condition,
wherein the generating unit generates the condition when the determining section cannot determine the replacement character string, and
when the determining section can determine the replacement character string, the correction portion replaces the correction target character string with the determined replacement character string, and when the determining section cannot determine the replacement character string, the correction portion corrects the correction target character string using the candidates for the replacement character string obtained according to the condition.
3. The knowledge processing device according to claim 1, further comprising:
a prompting part that presents the candidates for the replacement character string to a user; and
a receiving unit that accepts a user operation selecting one of the presented candidates for the replacement character string,
wherein the correction portion replaces the correction target character string with the selected candidate for the replacement character string.
4. The knowledge processing device according to claim 1, further comprising:
a prompting part that presents the file data to a user; and
a receiving unit that accepts a user operation designating an arbitrary character string in the presented file data,
wherein the selector selects the character string designated by the user operation as the correction target character string.
5. The knowledge processing device according to claim 1, further comprising:
a prompting part that presents the file data to a user; and
a receiving unit that accepts a user operation designating an arbitrary character string in the presented file data,
wherein the generating unit generates the condition based on another character string that is designated by the user operation and whose attribute differs from that of the correction target character string.
6. The knowledge processing device according to claim 1,
wherein, when the attribute of the correction target character string is an attribute indicating that the character string is the surname of a person's name and the attribute of the other character string is a residence attribute indicating that the character string is a residence, the generating unit generates the condition so as to obtain, as the candidates for the replacement character string, surnames characteristic of the region indicated by the other character string.
7. The knowledge processing device according to claim 1,
wherein, when the attribute of the correction target character string is an attribute indicating that the character string is the given name of a person's name and the attribute of the other character string is a birthdate attribute indicating that the character string is a date of birth, the generating unit generates the condition so as to obtain, as the candidates for the replacement character string, given names that were popular in the year indicated by the other character string.
8. The knowledge processing device according to claim 1,
wherein, when the attribute of the correction target character string is an attribute indicating that the character string is the given name of a person's name and the attribute of the other character string is a birthdate attribute indicating that the character string is a date of birth, the generating unit generates the condition so as to obtain, as the candidates for the replacement character string, given names related to the Heavenly Stems and Earthly Branches of the year indicated by the other character string.
9. The knowledge processing device according to claim 1,
wherein, when the attribute of the correction target character string is an attribute indicating that the character string is the given name of a person's name and the attribute of the other character string is a birthdate attribute indicating that the character string is a date of birth, the generating unit generates the condition so as to obtain, as the candidates for the replacement character string, given names related to the season indicated by the other character string.
10. The knowledge processing device according to claim 1,
wherein the generating unit generates a plurality of the conditions,
the knowledge processing device further comprises:
a prompting part that presents the generated plurality of conditions; and
a receiving unit that accepts a user operation designating a priority for each of the presented plurality of conditions, and
the correction portion corrects the correction target character string by preferentially using, among the candidates for the replacement character string obtained according to the respective conditions, the candidates for the replacement character string obtained according to a condition with a high priority.
11. A knowledge processing method executed in a knowledge processing device that corrects character strings using a knowledge dictionary, the knowledge processing method comprising:
a step in which the knowledge processing device selects a correction target character string from file data that contains a plurality of character strings and in which each character string is given an attribute of that character string;
a step in which the knowledge processing device generates, based on another character string in the file data whose attribute differs from that of the correction target character string, a condition for obtaining candidates for a replacement character string that replaces the correction target character string; and
a step in which the knowledge processing device corrects the correction target character string using the candidates for the replacement character string obtained from the knowledge dictionary according to the condition,
wherein the attributes include a surname attribute and a name attribute, the surname attribute indicating that a character string is the surname of a person's name and the name attribute indicating that a character string is the given name of a person's name, and
in the step of generating the condition, when the attribute of the correction target character string is the surname attribute, the condition is generated based on another character string that adjoins the correction target character string and has the name attribute, and when the attribute of the correction target character string is the name attribute, the condition is generated based on another character string that adjoins the correction target character string and has the surname attribute.
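The priority handling recited in claim 10 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function name correct_with_priorities, the condition labels, the convention that a larger number means a higher priority, and the rule of taking the first candidate of the preferred condition are all hypothetical.

```python
def correct_with_priorities(target, candidates_by_condition, priorities):
    """Pick a replacement from the highest-priority condition that yields candidates.

    candidates_by_condition: condition label -> list of replacement candidates
    priorities: condition label -> user-specified priority (larger = preferred)
    """
    ordered = sorted(candidates_by_condition,
                     key=lambda condition: priorities.get(condition, 0),
                     reverse=True)
    for condition in ordered:
        candidates = candidates_by_condition[condition]
        if candidates:
            # Assumption: the first candidate of the preferred condition is used.
            return candidates[0]
    # No condition produced a candidate, so the target is left unchanged.
    return target


# Hypothetical example: two conditions, with the adjacent-name condition preferred.
candidates = {"region": ["Kaufmann"], "adjacent_name": ["Kaufman"]}
priorities = {"region": 1, "adjacent_name": 2}
print(correct_with_priorities("Kavfman", candidates, priorities))
```

Sorting the conditions once by the user-specified priority keeps the preference explicit and lets lower-priority conditions act as fallbacks when a preferred condition returns no candidates.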
CN201410346227.1A 2013-09-06 2014-07-21 knowledge processing device and method Active CN104424350B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013-185634 2013-09-06
JP2013185634A JP6304979B2 (en) 2013-09-06 2013-09-06 Knowledge processing apparatus, method and program

Publications (2)

Publication Number Publication Date
CN104424350A CN104424350A (en) 2015-03-18
CN104424350B true CN104424350B (en) 2017-12-01

Family

ID=52701916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410346227.1A Active CN104424350B (en) 2013-09-06 2014-07-21 knowledge processing device and method

Country Status (2)

Country Link
JP (1) JP6304979B2 (en)
CN (1) CN104424350B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6679350B2 (en) * 2016-03-09 2020-04-15 キヤノン株式会社 Information processing apparatus, program, and information processing method
CN113095325B (en) * 2021-05-11 2021-11-09 浙江华是科技股份有限公司 Ship identification method and device and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1215201A (en) * 1997-10-16 1999-04-28 富士通株式会社 Character Recognition/Correction Method
CN102147854A (en) * 2010-02-08 2011-08-10 冲电气工业株式会社 Bill processing system, log in terminal and bill data processing method
CN103186524A (en) * 2011-12-30 2013-07-03 高德软件有限公司 Address name identification method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS59229683A (en) * 1983-06-10 1984-12-24 Toshiba Corp Recognition processor
JPH10232906A (en) * 1997-02-19 1998-09-02 Sharp Corp Character recognition method
JP2000148912A (en) * 1998-11-09 2000-05-30 Canon Inc Name recognition device, name recognition method and storage medium
JP2000311170A (en) * 1999-04-27 2000-11-07 Hitachi Ltd Text information extraction method
JP2004086619A (en) * 2002-08-27 2004-03-18 Toshiba Corp Full name chinese character retrieval system

Also Published As

Publication number Publication date
CN104424350A (en) 2015-03-18
JP2015052933A (en) 2015-03-19
JP6304979B2 (en) 2018-04-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant