CN108197470A - Fast signature scan - Google Patents
Fast signature scan Download PDFInfo
- Publication number
- CN108197470A CN108197470A CN201711339378.4A CN201711339378A CN108197470A CN 108197470 A CN108197470 A CN 108197470A CN 201711339378 A CN201711339378 A CN 201711339378A CN 108197470 A CN108197470 A CN 108197470A
- Authority
- CN
- China
- Prior art keywords
- condition code
- fingerprint
- code
- scanning
- condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 claims abstract description 71
- 230000015654 memory Effects 0.000 claims description 103
- 238000007781 pre-processing Methods 0.000 claims description 24
- 101150060512 SPATA6 gene Proteins 0.000 description 80
- 239000002131 composite material Substances 0.000 description 24
- 230000015572 biosynthetic process Effects 0.000 description 20
- 238000003786 synthesis reaction Methods 0.000 description 20
- 230000006870 function Effects 0.000 description 18
- 230000008569 process Effects 0.000 description 16
- 230000033228 biological regulation Effects 0.000 description 11
- 238000004590 computer program Methods 0.000 description 11
- 239000000047 product Substances 0.000 description 11
- 239000003550 marker Substances 0.000 description 10
- 239000000203 mixture Substances 0.000 description 9
- 238000012795 verification Methods 0.000 description 9
- 238000012545 processing Methods 0.000 description 8
- 230000006978 adaptation Effects 0.000 description 7
- 238000004891 communication Methods 0.000 description 6
- 230000006837 decompression Effects 0.000 description 6
- 230000002829 reductive effect Effects 0.000 description 5
- 238000012217 deletion Methods 0.000 description 4
- 230000037430 deletion Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 239000004744 fabric Substances 0.000 description 4
- 238000007689 inspection Methods 0.000 description 4
- 108090000623 proteins and genes Proteins 0.000 description 4
- 238000004422 calculation algorithm Methods 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 239000012528 membrane Substances 0.000 description 3
- 238000010606 normalization Methods 0.000 description 3
- 230000003068 static effect Effects 0.000 description 3
- 238000006467 substitution reaction Methods 0.000 description 3
- 230000002194 synthesizing effect Effects 0.000 description 3
- 241001269238 Data Species 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 230000000712 assembly Effects 0.000 description 2
- 238000000429 assembly Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000001914 filtration Methods 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 230000036961 partial effect Effects 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 239000004576 sand Substances 0.000 description 2
- 241000555293 Bassariscus astutus Species 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 230000001010 compromised effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 201000010099 disease Diseases 0.000 description 1
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 235000013399 edible fruits Nutrition 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008450 motivation Effects 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000000243 solution Substances 0.000 description 1
- 238000010408 sweeping Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
- G06Q30/0225—Avoiding frauds
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Collating Specific Patterns (AREA)
Abstract
The method and system of scanning feature code on String field.In one embodiment, the present invention provides a kind of signature scan methods.The method includes one or more features code is processed into one or more forms, the form includes one or more fingerprints of each fixed length condition code or feature subcode and one or more follow-up searching data structures, so that the number of the fingerprint of each fixed length condition code or feature subcode is equal to the step-length of signature scan operation, and the specific fixed length condition code or feature subcode can be identified on any position in any scanned String field, receive specific character string field, identify any condition code included by the specific character string field, it is included in each using scanning step to scan the fingerprint on the position of spacing, with there are one or the position of multiple matched fingerprints on search the follow-up searching data structure, with any identified condition code of output.
Description
Technical field
The present invention relates to the condition codes in scanning String field.
Background technology
The object (such as file, program, webpage, Email, internet data packet or digital picture) of digital content can be with
Include one or more String fields.One String field is a data value for typically representing word or executable code
String.For example, an internet data packet can include network address, host name, Hypertext Transfer Protocol (HTTP) header, hypertext biography
Defeated agreement message, e-mail attachment, Email Header and Email content.The size of one String field can be from several
A byte is to millions of with last byte.One character string condition code can be a string of specific data values indicated completely or
The expression formula (such as specific regular expression) of specific data value, the purpose is to be used for identifying a character string object (such as spy
Fixed computer virus or specific gene order).Condition code can be stored in a sign code database.One word levies code
Database can include multiple condition codes.The size of one character string condition code can be from several bytes to thousands of a bytes.
Character string condition code and String field are all the bit strings for including many basic units.One basic unit
It is the minimum semantic unit that has, therefore scanning element is used as usually in signature scan technology.One basic unit it is big
It is small by application depending on.For example, the base unit of English character string is typically 8 bits (i.e. a byte), and a computer disease
The base unit of malicious condition code is typically a byte or nybble.
The basic unit of each condition code can be designated as being equal or different to some particular value or specific at some
In the range of (such as in the digital scope 0 to 9 or in English alphabet range a to z).Each basic unit can be case-insensitive
It is or case sensitive.Each basic unit can support simple logical operation (such as " non-").In addition, each condition code can wrap
Include asterisk wildcard, for example, " * " (a random length asterisk wildcard) or "" (a fixed length asterisk wildcard), wherein " * " represents zero or appoints
Multiple arbitrary basic units of anticipating and "" represent an arbitrary basic unit.It, can be into one for each random length feature code sign
Step indicates its random length range.When a condition code includes random length character, the indefinite length of condition code.An if feature
Code does not include random length character, and length is fixed.
One typical signature scan process may include on all possible position in a String field, compare
Corresponding condition code in the String field and condition code database.Sweep speed is usually by the size of condition code and complexity
Property limitation.In addition, sweep speed also by condition code is newer one by one can power limit.
Invention content
An embodiment of the present invention provides the method and systems of the scanning feature code on String field.In general, this hair
The embodiment of bright one side provides character string signature scan method, and the method includes at one or more features code
One or more forms are managed into, the form includes each fixed length feature subcode of each fixed length condition code or random length condition code
One or more fingerprints and one or more follow-up searching data structures, it is special that one or more of fingerprints include specific fixed length
Code or the j-th fingerprint of feature subcode are levied, the first basic unit of the j-th fingerprint is in the specific fixed length condition code or spy
The remainder for levying the step-length of position in a scanning direction divided by the signature scan operation in subcode is equal to J, so that described
The number of fingerprint is equal to the step-length of signature scan, and so that the specific fixed length condition code or feature subcode are swept any
It can be identified on any position in the String field retouched, wherein each fingerprint includes specific fixed length condition code or spy
One or more segments of subcode are levied, one or more of segments have in the specific fixed length condition code or feature subcode
From anywhere in specific position, receive a specific character string field being made of data value, identify the specific character string
Any condition code included by field is included in each using scanning step on the position of spacing, to scan the specific character string
Field, with the one or more fingerprints of searching one or more features code and there are one or multiple matched fingers
On the position of line, the specific character string field is searched, to search described in one or more follow-up searching data structures and output
Any identified condition code in specific character string field.The other embodiments of the aspect of the present invention include the method
Corresponding system, device and computer software product.
These and other embodiment optionally includes one or more following characteristics.Each fixed length condition code or feature subcode
Have multiple fingerprints and it is described scanning be included in it is each using scanning step on the position of spacing, to scan the String field, with
Multiple fingerprints of one or more features code are searched, including two or more fingerprints of parallel search.The one of special characteristic code
Each fingerprint in a or multiple fingerprints is to indicate completely in former space or after one or more shadow spaces are projected to
, the shadow space is space more wider array of than the original Space format, and the shadow space is arrived by introducing some ambiguities
The original space, so that a fingerprint shadow in specific shadow space corresponds to one or more in the former space
Fingerprint.
The character string signature scan method can further comprise on each position using scanning step as spacing,
Specific character string field described in former spacescan, to search one or more fingerprints and first each using scanning step as spacing
Position on, in each shadow space in one or more shadow spaces of one or more fingerprints, scanning it is described specific
String field, to search one or more fingerprints, then again at least one of one or more shadow spaces shadow
There are one in subspace or on the position of multiple identified fingerprints, one in identified fingerprint is verified in the former space
It is a or two.Some ambiguities are introduced the upper case and lower case letter in former space is further comprised to become one to the former space
All numbers from 0 to 9 in former space are become an identical number and the sky former space by not case sensitive letter
Lattice and "-" become one or more of space or "-".
Scanning for one or more of fingerprints of one or more of condition codes can further comprise using one
One or more of a or multiple hash tables and grand (Bloom) filter of one or more cloth are scanned.It scans for
One or more of fingerprints of one or more of condition codes can further comprise using hashed value demultiplexer and
One or more of one fingerprint length demultiplexer is scanned.The number of the different fingerprint length of multiple condition codes is comparable
The number of the different characteristic code length of the multiple condition code is few, and the scanning can further comprise each with scanning step
On the position of spacing, to scan the specific character string field, to search multiple fingerprints of the multiple condition code, including parallel
Search the identical fingerprint of two or more length.One or more of fingerprints can be chosen, so that the fingerprint
Length be restricted to the length of one group of one or more length being covered in one or more length ranges, to provide more points
The finger scan of resolution.The length of each fingerprint can be the integral multiple of the step-length of signature scan operation.The character string
Signature scan method can further comprise limited certainly using one or more content adressable memories (CAM) and one or more
One or more of motivation (FA) is scanned, to search one or more of fingers of one or more of condition codes
Line.
The character string signature scan method can further comprise each fingerprint of multiple fingerprints multiple condition codes
One or more fingerprint sections are decomposed into, so that the number of the different fingerprint segment length of described document information is than described document information
The number of different fingerprint length is few, on each position using scanning step as spacing, scans the specific character string field, with
The multiple fingerprint section is searched, is synthesized including two or more fingerprint sections of parallel search and identified fingerprint section
Any fingerprint matching.All fingerprint sections can have there are one identical length, and the scanning specific character string field is to look into
Look for the step-length of the signature scan operation of integral multiple that multiple fingerprint sections can be the fingerprint segment length with one.One illustrates spy
Determining the fingerprint section figure of one or more possible positions of the fingerprint section in any fingerprint can be stored in the particular fingerprint section
Together, for identified fingerprint section is synthesized any fingerprint matching.Illustrate that the fingerprint of one or more possible length is long
It is any for identified fingerprint section is synthesized together with degree information can be stored with the first segment of each fingerprint or each section
Fingerprint matching.One or more finite automatas (FA) can be used to identified fingerprint section to synthesize any fingerprint matching.
The character string signature scan method can further comprise storing the matched probability of false positive of each fingerprint,
There are one or the position of multiple matched fingerprints on, check the corresponding matched probability of false positive and when one or more
When a matched probability of false positive in a matched probability of false positive is insufficient to low, the specific character string field is searched,
To search one or more of follow-up searching data structures.The method can further comprise more corresponding to a fingerprint
The different basic unit of the one or more of a fixed length condition code or feature subcode builds a difference searching data structure and looks into
The specific character string field is looked for, to search one or more of follow-up searching data structures, one has been searched including difference
Multiple fixed length condition codes or feature subcode corresponding to the fingerprint of identification.The method can further comprise each fixed length spy
The one or more code film bits of encoded of code or feature subcode are levied, one or more of yards of film bits include illustrating one
Or one or more code film bits of multiple following matching conditions:Without matching, if case sensitive, logic NOT predefines
Range, logical operation and any range, one or more of yards of film bits include one or more following code film bits:One
The code film bit of a or multiple basic units or sub- basic unit, the code film bit of one or more features subcode section and one
Or the code film bit of multiple fixed length condition codes or feature subcode, search the specific character string field, with search it is one or
Multiple follow-up searching data structures, including described by the fixed length condition code of code film bits of encoded or feature subcode to search.It is described
Method can further comprise the specific character string field to standardize, including encoded specific character string field decoding,
The specific character string field decompression compressed and one or more of unwanted string data deletion process.
In general, the embodiment of one aspect of the present invention provides character string signature scan method, the method packet
It includes and each condition code in multiple condition codes is resolved into one or more features code section, receive a spy being made of data value
Determine String field, scan the specific character string field, to search the multiple condition code section of the multiple condition code, packet
Include parallel search two or more condition code sections, identified condition code section synthesize any condition code matching and
Export any identified condition code in the specific character string field.The other embodiments of the aspect of the present invention include
System corresponding to the method, device and computer software product.
Specific implementation may include one or more following characteristics.Scanning for the multiple condition code section can further wrap
Include use:One or more hash tables, one or more Bloom filters, one or more have hashed value multiplexing or length multiplexing
Or both hash table and it is one or more have hashed value be multiplexed or the Bloom filter of length multiplexing or both in one or
It is multiple to be scanned.One condition code section for illustrating one or more possible positions of the special characteristic code section in any condition code
Bitmap can be stored together with the special characteristic code section, identified condition code section to be used for synthesize any condition code
Matching.Illustrate one or more possible feature code lengths condition code length information can further with each condition code first
Section or each section of storage together, match for identified condition code section is synthesized any condition code.It is one or more
Finite automata (FA) can be used to identified condition code section to synthesize any condition code matching.
In general, the embodiment of one aspect of the present invention provides character string signature scan method, the method packet
It includes an one or more features code and is processed into one or more forms, including each random length in one or more features code
Condition code is decomposed into multiple fixed length feature subcodes and one or more random length feature subcodes, what reception one was made of data value
Specific character string field identifies any condition code included by the specific character string field, including scanning the specific character
String field, with search multiple fixed length condition codes or feature subcode and there are one or multiple fixed length feature subcode quilts
On the position of identification, the identified fixed length feature subcode is synthesized any random length condition code and output is described specific
Any identified condition code in String field.Handling one or more features code can be further into one or more forms
Including storing the location information of each fixed length feature subcode to static nature code composition rule database, the location information includes
One order and one to next fixed length feature subcode distance range and including or do not include to each pair of fixed length feature subcode it
Between random length feature subcode description and described identified fixed length feature subcode is synthesized any random length condition code
Matching further comprise checking the location information of each identified fixed length feature subcode and verify or do not verify it is each pair of adjacent fixed
One behavioral characteristics code synthetic state table of the random length feature subcode and update between long feature subcode.It is one or more
Finite automata (FA) can be used to identified fixed length feature to synthesize any random length condition code.The institute of the present invention
The other embodiments for stating aspect include the system corresponding to the method, device and computer software product.
In general, the embodiment of one aspect of the present invention provides character string signature scan method, the method packet
It includes and selects multiple fixed length condition codes, the specific character string object for each character string object in one or more character string objects
Multiple fixed length condition codes of part include j-th fixed length condition code, and the first basic unit of the j-th fixed length condition code is described
The remainder of the step-length of position in a scanning direction divided by signature scan operation in specific character string object is equal to J, so as to
So that the number of the fixed length condition code of the specific character string object is equal to the step-length of signature scan operation, and cause described
Specific character string object can be identified on any position in any scanned String field, receive one by data
It is worth the specific character string field of composition, identifies any character string object included by the specific character string field, is included in every
It is a using scanning step on the position of spacing, to scan the specific character string field, to search one or more of character strings
The multiple fixed length condition code of object, including described in two or more fixed length condition codes of parallel scan and output
Any identified character string object in specific character string field.The method can further comprise for each character string object
Multiple random length condition codes based on multigroup nonoverlapping orderly fixed length condition code are selected, in the multiple random length condition code
Each random length condition code include a fixed length condition code in every group of fixed length condition code and one a pair of adjacent fixed length
The random length condition code section that condition code connects, so that the number of the condition code of each character string object is equal to Sn,
Middle S is the number of the condition code of scanning step or every group of fixed length condition code and n is the group number of fixed length condition code and identification institute
It states any character string object in specific character string field to further comprise scanning the specific character string field, to search one
Or multiple random length condition codes of multiple character string objects multiple fixed length feature subcodes and there are one or it is multiple matched fixed
On the position of long feature subcode, identified fixed length feature subcode is synthesized any random length condition code.
In general, the embodiment of one aspect of the present invention provides character string signature scan method, the method packet
It includes and selects one or more fixed length condition codes for each character string object in one or more character string objects, one
Or one or more of fixed length condition codes of multiple character string objects are processed into one or more forms, the form includes every
One or more fingerprints of a fixed length condition code and one or more follow-up searching data structures, the specific character string object
Multiple fingerprints include j-th fingerprint, and the first basic unit of fingerprint described in the j-th is in the specific character string object
Position divided by the remainder of the step-length of signature scan operation in a scanning direction is equal to J, so that the specific character string
The fingerprint number of object is equal to the step-length of signature scan operation, and causes the specific character string object any scanned
It can be identified on any position in String field, wherein each fingerprint includes the institute of the specific character string object
State one or more segments of the specific fixed length condition code in one or more fixed length condition codes, one or more of segment tools
There is the specific position from anywhere in the specific fixed length condition code, receive a specific character string being made of data value
Field identifies any character string object in the specific character string field, is included in the position that each scanning step is spacing
On, the specific character string field is scanned, to search multiple fingerprints of one or more of character string objects, wherein wrapping
Include two or more described fingerprints of parallel search and there are one or the position of multiple matched fingerprints on, search
The specific character string field, to search appointing in the follow-up searching data structure and the output specific character string field
The character string object of what identification.The method can further comprise not being overlapped based on multigroup for the selection of each character string object is multiple
Orderly fixed length condition code random length condition code, each random length condition code in the multiple random length condition code includes
A fixed length condition code and a random length that a pair of adjacent fixed length condition code is connected in every group of fixed length condition code
Condition code section fingerprint, so that the number of the random length condition code of each character string object is equal to determining for each group fixed length condition code
Any character string object included by the product of the number of long condition code and the identification specific character string field includes sweeping
The specific character string field is retouched, to search multiple fingerprints of one or more character string objects, including with parallel search two
Or more than two fingerprints and there are one or the position of multiple matched fingerprints on, the specific character string field is searched, to look into
Look for the follow-up searching data structure.The other embodiments of the aspect of the present invention include the system corresponding to the method,
Device and computer software product.
In general, one aspect of the invention provides a character string signature scan system, and the system comprises one
A machine readable storage device including computer program product and one or more executable computer program products and behaviour
Make the processor for including following one or more modules:One can be processed into one or more features code including each fixed length spy
Levy one or more fingerprints of each fixed length feature subcode of code or random length condition code and one or more follow-up searching datas
The condition code preprocessing module of one or more forms of structure, one or more of fingerprints include specific fixed length condition code or
The j-th fingerprint of feature subcode, the first basic unit of the j-th fingerprint is in the fixed length condition code or feature subcode
Position divided by the remainder of the step-length of signature scan operation in a scanning direction is equal to J, so that the number of the fingerprint
Equal to the step-length of signature scan, and cause the fixed length condition code or feature subcode in any scanned String field
In any position on can be identified, wherein each fingerprint include one of specific fixed length condition code or feature subcode or
Multiple segments, one or more of segments have the spy from anywhere in the specific fixed length condition code or feature subcode
Positioning is put, and an input String field processing that one can be made of data value is the form needed for one or more scan
Scanning pre-processing engine and one can identify one or more of one or more features code on the input String field
The finger scan engine of a fingerprint, the identification are included on the position that each scanning step is spacing, scan the input word
String field is accorded with, to search one or more of fingerprints of one or more of condition codes.The system can further comprise
The fixed length of the fixed length feature subcode of one fixed length condition code that can recognize that corresponding to identified fingerprint or random length condition code is special
Levy code Lookup engine.The system can further comprise a fixed length feature subcode included identified random length condition code
Synthesize the random length condition code Lookup engine of the recognizable random length condition code of any random length condition code.The present invention's is described
The other embodiments of aspect include the method corresponding to the system, device and computer software product.
Specific implementation may include one or more following characteristics.One or more may be selected in described document information preprocessing module
An one or more fingerprints are simultaneously projected to one or more of shadow spaces and go to scan by shadow space.Condition code preprocessing module
Each fingerprint in one or more fingerprints can be decomposed into one or more fingerprint sections of one or more length, and each
The fingerprint composite signal of fingerprint section stores to a fingerprint database and the finger scan engine and can recognize that the input character
Multiple fingerprints of one or more of string field condition code, the identification are included in each position using scanning step as spacing
On, scan the input String field, with search multiple fingerprint sections and there are one or multiple identified fingerprint sections position
It puts, identified fingerprint section is synthesized any fingerprint matching.
Described document information preprocessing module can use one or more code film bits, special to the one or more of a condition code
Sign code section is encoded, and the one or more features code section of one or more of yards of film bits and described document information is stored
Together.Described document information preprocessing module can build a difference with the different basic unit of the one or more of multiple condition codes
Divide searching data structure.Described document information preprocessing module can build one and include a fingerprint database, a fixed length feature
Code database and a condition code regular data when described document information scanning system has at least one random length condition code
The condition code database in library.
The scanning pre-processing engine can further comprise a scanning conveyer, a projector, a character string word
Segment memory and a shadow field Memory.The scanning pre-processing engine can be handled one or more character string by block
Field, the processing include conveying, and decoding standardizes and converts one or more, one or more described character string
Each string chunk in the block includes one for the finger scan region that finger scan and condition code are searched, and one is used for feature
The preceding condition code before finger scan region that code is searched search region and one for condition code search in finger scan
Rear condition code after region searches region.Trizonal each region of one string chunk can be stored to one or more
In the identical memory block of a size, all trizonal all memory blocks individually or with one or more add-in memories blocks
May make up one searches the ring that first memory block in region starts by current preceding condition code, to reduce data in memory
It is mobile.
The finger scan engine can use one or more hash tables and one or more Bloom filters one of those
Or multiple detect one or more fingerprints.The finger scan engine can further comprise a finger scan controller, one
A fingerprint hash calculator, a fingerprint finder, a fingerprint synthesizer and a fingerprint database.The fingerprint hash
Calculator can in order calculate multiple with a sequence hash function in the prefix segment of multiple hashed keys not overlapped each other
Multiple hashed values of hashed key.The finger scan engine may include one fingerprint section bit map of a use and fingerprint length letter
One or more of breath concurrently or sequentially synthesizes multiple fingerprint sections the matched fingerprint synthesizer of any fingerprint.It is described
Finger scan engine may include a fingerprint synthesizer for further comprising one or more finite automatas (FA).
One or more fingerprints of one or more length can be broken down into the fingerprint section of multiple same sizes, and by one
Or multiple finger scan engines for having same scan step-length scan, each fingerprint of one or more of finger scan engines is swept
Retouch the position of the one or more nonoverlapping input String fields staggeredly of engine covering so that it is one or
Total scanning step of multiple finger scan engines is equal to the number of one or more of finger scan engines and single fingerprint is swept
It retouches the product of the former scanning step of engine or covers one or more partly overlapping input String fields staggeredly
Position, so that total scanning step of one or more of finger scan engines is swept between the original of single finger scan engine
Retouch the number of step-length and one or more of finger scan engines and the product of the former scanning step of single finger scan engine
Between.The number of the smaller finger scan engine of the product of scanning step and memory speed can be more than scanning step and memory speed
The larger finger scan engine of product number.
Covering one or more memories used in one or more finger scan engines of a shorter fingerprint section can wait
In or be faster than covering one longer fingerprint section one or more finger scan engines used in one or more memories and cover
One used in one or more fixed length condition code Lookup engines corresponding to the shorter fingerprint of the one or more average lengths of lid
Or multiple memories can be equal or faster than one or more fixed length corresponding to the longer fingerprint of the one or more average lengths of covering
One or more memories used in condition code Lookup engine.One or more covering is shorter than the finger scan of the fingerprint of specific length
Engine can individually or together with the first part of corresponding fixed length condition code Lookup engine use one or more in the scanning
Most fast memory in system.Scanning multiple finger scan engines of identical one or more fingerprints can share in a multiport
It deposits.The finger scan engine can further comprise one or more content adressable memories (CAM).
The fixed length condition code Lookup engine can further comprise a condition code search engine, and a condition code verifies device,
With a fixed length condition code database.Described document information search engine and described document information, which verify device, can use a condition code unit ratio
Compared with the segment for the condition code that device and a condition code section comparator carry out the one or more tape code films of comparison, with identification one or
Multiple fixed length condition codes.Described document information search engine can the one or more fixed length condition codes of difference lookup or feature subcode.It is described
Random length condition code Lookup engine can further comprise a condition code composition rule finder, a condition code state verification
Device, a condition code composition rule database and a condition code state table.The random length condition code Lookup engine can wrap
Include a finite automata (FA).One or more engines may include one or more content adressable memories (CAM) and one
Or multiple finite automatas (FA) one or more.
Specific embodiment described in this specification can be used to realize following one or more advantages.The present invention provides one to sweep
Retouch the character string scanning system of the condition code in a condition code library.The character string scanning system is flexible and is easily updated.
Even if a character string signature scan engine is when a large amount of condition code (such as hundreds of thousands condition code) of scanning, complicated condition code
(as up to several kilobytes or with asterisk wildcard " * " and "", range, case-insensitive, logic NOT) and a dynamic feature
During code library, the sweep speed (such as 100Gbps) being exceedingly fast is stilld provide.The sweep speed and condition code of the character string scanning system
The size in library and complexity scalability.In addition, the smaller memory of the character string scanning system demand and memory bandwidth.It is described
Character string scanning system can be implemented with software or field programmable gate array (FPGA) or application-specific integrated circuit (ASIC).This
Outside, the cost-effectiveness of the character string scanning system is high, is not only suitable for high-end product, is also applied for low cost products.
One or more embodiments of the invention sees below the description and the appended drawings.Other features and advantages of the present invention by
Specification, attached drawing and claims are apparent.
Description of the drawings
Figure 1A shows the structure chart of an exemplary quick character string signature scan system;
Figure 1B shows the exemplary process diagram of a structure character string condition code database;
Fig. 1 C show the exemplary process diagram of a character string signature scan;
Fig. 2A -2C show the example data structure of a fingerprint database;
Fig. 2 D-E show the example data structure of Hash-entry block and the embodiment of a fingerprint synthesizer;
Fig. 2 F-2G show the example data structure of Hash-entry block and the reality of a corresponding fingerprint synthesizer
Apply example;
Fig. 2 H-I show the example data structure and a corresponding parallel fingerprint synthesizer of a Hash-entry block
Embodiment;
Fig. 3 A-B show the example data of the feature code group chained list searched for fixed length condition code and condition code chained list
Structure;
Fig. 4 A-B show that one of the predefined global unit range of support one of a String field is exemplary
The structure chart of condition code unit comparator and a condition code section comparator;
Fig. 4 C show the structure chart of an example feature code unit comparator for supporting local condition code unit range;
Fig. 5 A-C show one for the selecting unit tree of fixed length condition code lookup and the example of condition code family chained list
Property data structure;
Fig. 6 shows the example data structure of a condition code regulation linked for the lookup of random length condition code;
Fig. 7 shows the example data structure of the condition code state-chain-table of a specific character string field;
Fig. 8 shows an example by a condition code state Bloom filter or the Hash-entry block of hash table meaning
Property data structure;
Fig. 9 shows an exemplary computer system.
The similar reference numeral element similar with mark expression in the drawings.
Specific embodiment
General survey
The present invention is the method being scanned for a character string condition code database to a String field and is
System.In one embodiment, the scan method of one " dividing and rule " is used to be scanned with multiple flow line stages.Each
Random length condition code is broken down into multiple fixed length feature subcodes to scan first, and each fixed length condition code or each random length are special
Each fixed length feature subcode of sign code is further broken down into multiple condition code sections to scan.In one embodiment, it is first " thick
The method of " close scanning " is used to be scanned with multiple pipeline sweep phase afterwards for scanning ", to search character string condition code.Every
In one scan position, one or more fingerprints of fixed length condition code or fixed length feature subcode are scanned first.Further inspection
Look into only need there are one or the position of multiple fingerprint matchings on carry out.In addition, finger scan can be first in each scan position
Upper scanning fingerprint shadow (shadow and the relevant space of shadow will be described in detail below).Only there are one or it is multiple matched
On the position of fingerprint shadow, just the fingerprint need to comprehensively be checked.
The fingerprint shadow further can be first segmented to be scanned on each of the scanning positions, then again by only checking fingerprint shadow
Possible position and possible fingerprint length of the subsegment in any fingerprint synthesize fingerprint shadow section.Fingerprint shadow it is complete
Face inspection only need there are one or multiple matched synthesis fingerprint shadow position on carry out.In addition, fingerprint shadow section
Scanning its hashed value need to be only scanned in each scan position.It is further to check that fingerprint shadow section only have
It is carried out on the position of one or more matched hashed values.In one embodiment, elder generation's handle is used to scan character before the scan
The character string condition code pretreatment of string field, and pretreated condition code is stored into condition code database, to be supplied to
The scanning of multiple pipeline stage uses.
Figure 1A shows a quick character string signature scan engine 100.The scanning engine includes a condition code
Preprocessing module 90, a scanning pre-processing engine 120, a finger scan engine 140, a fixed length condition code Lookup engine
160 and a random length condition code Lookup engine 180.Quick character string signature scan engine 100 is for one or more words
Symbol string condition code database, is scanned String field, and may send the number 190 of matched condition code back to and in institute
The position in String field is stated, to identify special characteristic code.In one embodiment, condition code database includes a fingerprint
Database 148, a fixed length condition code database 166 and a condition code rule database 186.
Figure 1B shows the program 91 of an each character string condition code of pretreatment.In one embodiment, first by one
A random length condition code is decomposed into multiple fixed length feature subcodes and random length feature subcode, and the pass between fixed length feature subcode
The information of system is stored to (step 92) in condition code rule database 186.
In order to quickly scan, step 92 export a fixed length condition code or feature subcode can be further broken down into it is more
A segment (step 94) that can be examined by optimum ordered.In one embodiment, first or preceding several condition code sections are especially heavy
Will, it can be as the fingerprint of fixed length condition code or feature subcode.The fingerprint of one character string condition code both can rapidly be swept
It retouches, and false negative or the matched probability of false positive can be reduced.In one embodiment, the matched probability of false negative is zero.
In one embodiment, the numbers of the different fingerprint length of multiple condition codes is than the number of the different characteristic code length of multiple condition codes
Lack, so as to accelerate the speed for the scan method for requiring individually to scan the pattern of different length.In one embodiment,
When scanning step is more than a basic unit, multiple fingerprints can be used for a character string condition code, wherein each fingerprint
First basic unit displacement of relatively previous fingerprint is one or more basic in a scanning direction for first basic unit
Unit.Scanning direction is the direction that a scan operation inputs the movement of scan position in String field at one.
The fingerprint of one fixed length condition code or feature subcode can be further broken into fingerprint section, as needed can be into one
Step projects to one or more shadow spaces, is then added to fingerprint database 148 and thereby to (step in condition code database
It is rapid 96).Since the fingerprint of different length may be by independent scan, fingerprint can further decompose into multiple fingerprint sections and carry out parallel
Or serial scan.In some embodiments, fingerprint is broken down into multiple fingerprint sections, so that the number of different fingerprint segment length
It is one or few more than the number of different fingerprint length, so as to accelerate the scanning for requiring individually to scan the pattern of different length
The speed of method.The scanning result of fingerprint section can be then synthesized to together, to detect fingerprint.
In one embodiment, in order to further improve scan efficiency and scan complex characteristic code ability, fingerprint and its
Its condition code section can first be projected to one or more shadow spaces and be scanned, and then be verified again in luv space.
Shadow space can select that scanning process can either be simplified and speeded up, while can also cover being possible to for fingerprint or fingerprint section
Form.One shadow space can cover multiple fingerprints or the form of condition code section.For example, in order to support to distinguish size simultaneously
The single character with case-insensitive is write, shadow space can be only small letter or the only space of capitalization.As a spy
Example, shadow space can be exactly luv space.
The fingerprint indicated completely or any other condition code section are still to indicate completely in the shadow of any shadow space.Cause
This, the fingerprint indicated completely can always scan in addition in luv space in any shadow space.In one embodiment, it is
The number in the space scanned needed for reducing, one all in one or more shadow spaces of all fingerprints indicated completely
Shadow space scans so as to be scanned without fingerprint in luv space.In another embodiment, condition code database
There are one shadow spaces, and all fingerprints of described document information database are all scanned in unique shadow space.
In one embodiment, fingerprint database includes one or more Bloom filters or hash table.When former hashed key
It is too long or too expensive when being compared during finger scan, in order to be further reduced the general of false positive matching and Hash collision
Rate, in another embodiment, fingerprint database has additional hashed value bit including one or more or a fingerprint is long
The modified Bloom filter of degree or both or hash table.
Finally, the segment of the fixed length feature subcode of all fixed length condition codes or random length condition code is encoded and is stored in
In fixed length condition code database 166, to search fixed length condition code or feature subcode (step 98).In one embodiment, it is described
Segment can be encoded with unit membrane or subunit film, with support the matched specified conditions of character string condition code (as " and do not have to
With ", " equal ", " unequal ", " case-insensitive ", " case sensitive ", " within the scope of one ", " in a range
Outside ").In one embodiment, the segment of tape code film can be then compiled into a chained list or any other lookup structure (such as
Tree etc.).In another embodiment, piece segment encode film or fixed length condition code or feature subcode code film can be compiled into lookup knot
Structure, to save memory space.In another embodiment, one group of character string condition code can further utilize character string condition code
Between different units carry out differential encoding, to form the differential data structure (such as difference tree) that can quickly search.
Fig. 1 C demonstrate the scanning imaging system 101 of a character string condition code.One String field to be scanned first have to by
It decodes and is converted to (as used scanning pre-processing engine 120) required form (step of one or more follow up scan stages
102).The symbol string field passes through the shadow to its shadow and the fingerprint of one or more character string condition codes in shadow space first
Son is compared and is scanned (as used finger scan engine 140), then again in original fingerprint space to any identified finger
Schlieren son (step 104) is verified.Step 106 checks whether that there are one fingerprint matchings.
After the scanning, it no matching or generates one respectively there are one matching and represents the matched output of no condition code or one
It is a to indicate the matched output of a few condition code.In one embodiment, finger scan engine 140 can provide zero false negative
Matching and a sufficiently small false positive matching probability.If without fingerprint matching, the scanning of present scanning position has been completed, can
To be moved to next scan position (step 108).If there is fingerprint matching, a few matched condition code will be made into one
The lookup (as used fixed length condition code Lookup engine 160) of step, to be more clearly identified as fixed length condition code or random length feature
The fixed length feature subcode (step 110) of code.
Step 112 checks whether that there are one fixed length condition codes or feature subcode to match.If do not matched, Current Scan position
The scanning put has been completed, and can be moved into next scan position (step 108).If there is one or more fixed length condition codes
Match, export the number of each matched fixed length condition code, and terminate the scanning (step 118) of present scanning position.If there is making
The fixed length feature subcode of a part for one or more random length condition codes is identified, and matched feature subcode will be by dynamic
Ground synthesis (as used random length condition code Lookup engine 180) is with the one or more random length condition code (steps 114) of detection.Step
116 check whether that there are one random length condition codes to match.If do not matched, the scanning of present scanning position has been completed, can be with
It is moved to next scan position (step 108).If there is one or more matches, each matched random length condition code is exported
Number, and terminate present scanning position scanning (step 118).
In the preprocessing process 91 of condition code, in one embodiment, the matched probability quilt of false positive of each fingerprint
It is stored in fingerprint database 148.If step 106 has a fingerprint matching, the false positive of the matched fingerprint is matched general
Rate will be checked.If the matched probability of false positive is sufficiently low (as less than the threshold value specified), present scanning position is swept
It retouches process to have completed, next scan position (step 108) can be moved to.In another embodiment, in all databases
Fingerprint the matched probability of false positive it is all sufficiently low, therefore be not required to store and check the matched probability of false positive of fingerprint.When
The scanning process of preceding scan position has been completed after finger scan, can be moved to next scan position (step 108).
In a step 102, scanning pre-processing engine 120 first decodes the String field, standardizes, and be transformed to
The form identical with the condition code in condition code database.In one embodiment, character string signature scan is in entire word
It is carried out on symbol string field.However, in another embodiment, limitation or low latency due to the memory space of some systems will
It asks, storing entire String field can not possibly.Therefore, in step 102, the String field can be broken down into several pre-
Determine the string chunk of size.Character string signature scan is carried out on the character string block of each predefined size.
After a string chunk is loaded, the string chunk will be decoded, normalization, and sweep phase institute after being transformed to
The different-format needed.In one embodiment, decoding and normalisation process can support different compressed format (such as LZS,
PKZip and gzip), different coding standard (such as UU is encoded, MIME codings, HTML and XML) and deletion are random " counter-scanning "
Junk data.
In one embodiment, decoded String field will further be projected to one or more features code data
The shadow space of library requirement, to support complicated condition code.For example, decoded String field is converted into full small letter
(a such as shadow space), to support the character string signature scan of case-insensitive.Character string signature scan can be first
It has decoded in full small letter and has been carried out on String field, then again with the former case sensitive String field of decoding and full small letter
String field has been decoded to be verified.
At step 104, finger scan can identify that its shadow is the fingerprint indicated completely first.It is a large amount of in order to quickly scan
And complicated condition code, in one embodiment, finger scan engine 140 can use one or more hash tables or the grand filtering of cloth
Device scans multiple basic units simultaneously.In one embodiment, finger scan engine 140 can be multiplexed and fingerprint length with hashed value
The service efficiency and the probability of the matching of reduction false positive and fingerprint collision for being multiplexed to improve reservoir.Hashed value is multiplexed and fingerprint is long
Degree multiplexing can further reduce false positive matching while zero false negative matching (missing a condition code matching) is ensured
The probability of (i.e. wrong condition code matching).
Step 110 scans fixed length condition code.The fixed length feature subcode of fixed length condition code or random length condition code can be described
The fixed length signature scan stage identifies.Fixed length signature scan only need to be when at least one fingerprint matching during finger scan
Just perform.The fixed length condition code corresponding to matched fingerprint or feature subcode can be matched or one by one with other lookups
Structure (such as setting) is matched.In one embodiment, the tape code film of basic unit or sub- basic unit can relatively be used for
The matched specified conditions of support character string condition code (such as " do not have to matching ", " equal ", " unequal ", " case-insensitive ",
" case sensitive ", " within the scope of one ", " outside a range ").
Step 114 scans random length condition code.In one embodiment, random length signature scan only need to have one in scanning
It is just performed during the random length condition code of a or multiple random length characters.The fixed length feature subcode of random length condition code is in fixed length feature
Code sweep phase is identified, and identified fixed length feature subcode can be dynamically connected to one in the random length signature scan stage
It rises, to synthesize one or more former random length condition codes.The synthesis can be dynamic with a static monitor rule list and one
State synthetic state table is realized.The composition rule table indicates the rule that fixed length feature subcode is synthesized to random length condition code,
The synthetic state table then safeguards current synthetic state according to the composition rule table.
The pretreatment of condition code database
In one embodiment, it is special before the String field is scanned in order to improve sweep speed and memory efficient
Sign code preprocessing module 90 pre-processes condition code.Before condition code database is stored in, condition code preprocessing module 90
First condition code can be decomposed, convert and be encoded into one or more forms.In one embodiment, condition code pretreatment mould
Block 90 can build and safeguard a fingerprint database 148, a fixed length condition code database 166 and a condition code rule
Database 186.
When condition code database there are one or multiple condition codes contain one or more random length subcodes and (such as represent more than zero
" * " of arbitrary basic unit or represent " (bc) { 3-6 } " that " bc " repeats 3 to 6 times), each such random length condition code
Can first multiple fixed length feature subcodes be broken down into random length feature subcode.If for example, a condition code is " subcode 1*
Code 2* subcodes 3 ", wherein subcode 1, subcode 2 and subcode 3 are all free from the fixed length subcode of random length character, and described document information can be with
It is broken down into subcode 1, subcode 2, subcode 3.Each " * " in subcode 1* subcode 2* subcodes 3 can be by a random length subcode
Substitution.In one embodiment, each fixed length feature subcode can be first scanned independently, and then be synthesized original again
Random length condition code.
In one embodiment, condition code rule database 186 can be with the location information of each fixed length feature subcode (as follows
Sequence, last subcode mark, to the distance or distance range of next fixed length subcode) it builds, for synthesis fixed length feature
Code.In one embodiment, it is described when the random length subcode between two continuous fixed length feature subcodes is not " do not have to matching "
Condition code rule database 186 can be further containing the description to the random length subcode, for synthesizing fixed length feature subcode.
In another embodiment, the fixed length feature subcode and the one or more finite automatas of random length feature subcode structure can be used
(FA), for synthesizing fixed length feature subcode, wherein each fixed length feature subcode is as a generally incoming symbol.
In one embodiment, a fixed length condition code or feature subcode will be further broken down into multiple can use most
The segment that good sequence is searched (these segments include the fingerprint of described document information or feature subcode).The multiple
Section can have different sizes or identical size.False negative matches or does not miss any one condition code in order to prevent, owns
The set of the segment is equal to original long condition code.During signature scan, with the increasing of the number of matched segment
Add, false positive matching value will reduce (i.e. matched confidence level increases).The process of scanning is terminated at first unmatched
Section or the last one matched segment.In one embodiment, the selection of the segment of condition code, which can allow, terminates or is known without matching
The other matched condition code matching of zero false positive occurs as early as possible.
In one embodiment, the fingerprint includes multiple segments.In another embodiment, there are one the fingerprints
Segment, and triple { segment, length, dislocation } is encoded as, wherein segment is first of character string condition code scanned
The fingerprint of segment, that is, described document information, length are the length of the fingerprint, misplace as the fingerprint in a fixed length condition code or
Dislocation in feature subcode.Particular fingerprint is the particular patch of the fixed length feature subcode of a fixed length condition code or random length condition code
Section.
In one embodiment, the shadow space can be chosen simplify and accelerate the mistake of described document information scanning
Journey, while the form of a variety of fingerprints or feature chip segment can be covered.In the ideal case, the phantom value can be direct
As a hashed key.For example, in order to support case sensitive and case-insensitive unit simultaneously, the shadow space can
With the space for being full small letter or capitalizing entirely.For example, in order to scan the drivers license number for being followed by 7 numbers by a letter and being formed,
Wherein each letter and number can further specify that the range for arbitrary letter and number, and shadow space can use a code
Or any one alphabetical (such as " a ") replaces owning instead of all letters, and with another code or any one number (such as " 0 ")
Number.For example, in order to scan the Social Security Number formed by three groups by three bit digitals that space or "-" separate
(SSN), wherein each number can further be designated as arbitrary digital scope, shadow space can use a code or any
One number replaces all numbers, and replace space and "-" with another code or space or "-" (such as " 0 ").As a spy
Example, the shadow space can be exactly former space.
In one embodiment, for fingerprint after shadow space has been scanned, the verification of fingerprint can detect fingerprint
It is carried out at once in former space after shadow.In another embodiment, the verification can formerly verify the part of other segments
Or it is carried out again after whole.If the fingerprint is covered completely by other segments, it is not required to verify.
The selection of fingerprint should accelerate the speed of the finger scan, while also provide minimum after finger scan
The matched probability of false positive.In one embodiment, the fingerprint can be arbitrary dimension and appointing in described document information
It anticipates on position.In another embodiment, in order to meet the requirement of system, the size of the fingerprint or in described document information
Position is restricted.For example, in order to meet the delay requirement of system, the dislocation of the fingerprint is no more than certain particular value.
In one embodiment, fingerprint can select to next or multiple conditions:
1) shadow of the fingerprint does not have asterisk wildcard or range, with by compared with short scan,
2) the probability very little that the fingerprint occurs in the String field to be scanned,
3) number for the fingerprint that multiple condition codes share it is small as much as possible and
4) number there are one the fingerprint of identical fingerprint section is small as much as possible.
The condition of other selections can increase according to the requirement of system.In common network application and non-network application
In, usually all or most numeric string condition code is all at least considerably long containing one section, is projecting to a selected shadow
Without asterisk wildcard or the segment of range after in space.In one embodiment, first alternative condition is a necessary condition.
In another embodiment, first alternative condition can be further limited to each fingerprint of all fingerprints at least in a shadow
It is to indicate completely in subspace.Condition code without the fingerprint for meeting alternative condition, can be extended to it is multiple containing meet choosing
The different scanning mode selected the condition code of the fingerprint of condition or condition code extension is not required to one scans.
The fingerprint can be selected by checking the segment of all condition codes for meeting first choice condition.It is other
The parameter of fingerprint can also be contemplated when fingerprint is selected.According to the second alternative condition, select described to be scanned
String field occur probability very little condition code section as fingerprint, it is possible to reduce the probability of false positive fingerprint matching.This
Outside, the number of the fingerprint of identical fingerprint section is small as much as possible there are one allowing, and can be further reduced false positive fingerprint matching
Probability.Although the length of the fingerprint can be less than 8 basic units or more than 32 basic units, it is that typically in 8 to 32
Between basic unit.
Due to described document information can very long (such as hundreds of or thousands of a basic units), the length of fingerprint may be also very much.
However, in one embodiment, the fingerprint of different length will be scanned individually, so as to which sweep speed is slower.In one embodiment,
In order to reduce the complexity of scanning, can be required to limit the number of fingerprint length with system architecture according to particular system, it is such as few
In 16.In one embodiment, the length of fingerprint can be selected from a scheduled length table.In addition, the length of fingerprint can
With according to system requirements and system architecture, by exponential increasing relationship (such as 2,4,8,16 and 32), linear increment relationship (such as 4 times
Number:32) or another relationship (such as 2,3,5,8,13,21,34) selects 4,8,12,16,20,24,28 and.
In one embodiment, the fingerprint of a condition code can be selected with an algorithm.For example, following algorithm can be with
For selecting the fingerprint of a fixed length condition code or feature subcode, (assuming that scanning step is equal to 1, the length of fingerprint is from being most as short as most
Length is fixed as l0, l1, l2..., lm-1, lm, finger scan, which is segmented, to carry out, and the shadow space of finger scan has given):
1. it is the condition code subsegment indicated completely in shadow space to find all.
2. couple each l for being longer thanmThe subsegment, find all length equal to lmSubsegment.
3. pair each length is equal to lmSubsegment, write down the times N that the subsegment has been chosen as fingerprint in all condition codesc
With the times N for having the first identical fingerprint section with other fingerprintssAnd it is based on N with onec, NsAnd lmCost function calculate into
This value.
4. it is equal to l with length respectivelym-1..., l2, l1And l0Subsegment repeat step 2 and 3.
5. from step 2 to 4, the subsegment for finding out minimum cost is a fingerprint.
Above-mentioned steps are related with the processing sequence of condition code.Several random process sequences can be used for finding not as needed
Same fingerprint.In one embodiment, value at cost is equal to (m-i), Nc, and NsIt is linked up from highest order to lowest order, wherein
I=0,1,2 ..., m and i are the length of fingerprint.In one embodiment, it if having found the fingerprint of specific length, does not need to again
All shorter subsegments are selected.
In another embodiment, each mobile scanning step of finger scan engine 140.In each scan position
On, finger scan engine 140 serially or parallelly scans the fingerprint of different length.Therefore, sweep speed (connects with scanning step
Continue the number of the basic unit between two scan positions) it is directly proportional.In one embodiment, in order to improve sweep speed, refer to
Line scanning can scan multiple basic units rather than a basic unit simultaneously.In order to ensure zero false negative matches, each word
Symbol string condition code is equal to the scanning step with the number of multiple fingerprints and the fingerprint.In other words, a fixed length condition code
Or condition code database can be put into multiple fingerprints for feature subcode and the number of the fingerprint is equal to scanning step.Specific spy
First basic unit for levying the j-th fingerprint of code is (J+k*S) of the special characteristic code in scanning direction a substantially single
Member, wherein S be scanning step, k be a nonnegative integer and J=0,1,2 ..., S-1.Described document information can be described
It finds any position in string segments to be scanned.For example, if special characteristic code is " [Rr] [Ee] [Aa] [Dd] [Mm]
[Ee] 123.exe ", the scanning step are 4, and the fingerprint length includes 4,8 and 12, can select following four fingerprint:
" [Rr] [Ee] [Aa] [Dd] [Mm] [Ee] 123.ex ", " [Ee] [Aa] [Dd] [Mm] [Ee] 123.exe ", " [Aa] [Dd] [Mm]
[Ee] 123. " and " [Dd] [Mm] [Ee] 123.e ", wherein [Rr], [Ee], [Aa], [Dd] and [Mm] are respectively not differentiate between greatly
English alphabet r, e, a, d and the m of small letter." [Rr] [Ee] [Aa] [Dd] [Mm] [Ee] 123.exe " is put into four fingerprints
To fingerprint database four times.When scanning step is 1, only needs a fingerprint and need to only be put into primary.
The scanning of the multiple fingerprint can be from the preceding S unit of input String field (i.e. in the first scanning step
It is interior) any position start.For example, in one embodiment, the scanning will scan (k*S) position since the 0th position
It puts, wherein k is a nonnegative integer.In any input String field, (k*S) position is covered by the 0th fingerprint, the
(k*S+1) position is covered by (S-1) a fingerprint, (k*S+2) position by (S-2) a fingerprint covering ..., with (k*S+
S-1) position is covered by the 1st fingerprint.In another embodiment, it scans since (S-1) position, (k*S+ will be scanned
S-1) position, wherein k are a nonnegative integer.In any input String field, (k*S) position is by (S-1) a finger
Line covers, and (k*S+1) position is covered by (S-2) a fingerprint, (k*S+2) position by (S-3) a fingerprint covering ...,
(k*S+S-2) position is covered by the 1st fingerprint and (k*S+S-1) position is covered by the 0th fingerprint.
In order to which the scanning step is increased to S basic unit, in one embodiment, using scanning step of being taken in as 1
When be that each fixed length condition code or feature subcode select the algorithm of a fingerprint that can make following modification, it is each to be used for selecting
The S fingerprint of fixed length condition code or feature subcode:Step 1 to 4 and in the past it is just the same, step 5 be then revised as from own
The dislocation in scanning direction of fixed length condition code find in step 2, described or feature subcode for (J+k*S) subsegment in,
The subsegment for finding out minimum cost value is the j-th fingerprint in the S fingerprint, and wherein J=0,1,2 ..., S-1 and k are one
Nonnegative integer.
Only there are one condition codes for usually each character string object, and to support the scanning step of S basic units, each fixed length is special
The fixed length feature subcode of sign code or random length condition code needs S fingerprint.In another embodiment, to support S basic units
Scanning step, S fixed length condition code can be used for identifying a character string object, so that each fixed length condition code is being swept
It is (J+k*S) that first basic unit in direction, which is retouched, in the dislocation of the character string object, wherein J=0,1,2 ..., (S-1)
It is a nonnegative integer with k.The specific character string object can be on any position of any String field to be scanned
It is identified.In one embodiment, the scanning of S fixed length condition code can not have to fingerprint.In another embodiment, determine for S
Each condition code in long condition code can carry out signature scan with a fingerprint.
In one embodiment, multiple random length condition codes based on multigroup nonoverlapping orderly fixed length condition code can be with
Further it is selected to one character string object of identification.Every group of fixed length condition code has S fixed length condition code, and wherein j-th is determined
Long condition code scanning direction first basic unit the character string object dislocation for (J+k*S), wherein J=0,1,
2 ..., (S-1) and k be a nonnegative integer, so as to form SnA random length condition code, wherein n are that nonoverlapping orderly have
The group number of the code character of S fixed length condition code.A fixed length condition code is selected from every group of fixed length condition code, then again by these
Fixed length condition code and one or more random length string segments are combined into SnA random length spy in a random length condition code
Code is levied, so that the specific character string object can be identified on any position of any String field to be scanned.
Each original long condition code in the n groups S fixed length condition codes becomes the random length condition code after multiple synthesis in post synthesis
One fixed length feature subcode.In one embodiment, the scanning of the random length condition code can not have to fingerprint.In another reality
It applies in example, each fixed length condition code in the random length condition code can select a fingerprint to scan.
In one embodiment, the scanning step for support S basic units, specific character string object can select P to determine
Long condition code, each fixed length condition code can further select one or more fingerprints, so that each character string object is total
Fingerprint number is equal to S.J-th fingerprint in S fingerprint of the specific character string object is substantially single at first of scanning direction
Dislocation of the member in the specific character string object is (J+k*S), wherein J=0,1,2 ..., (S-1) and k be one non-negative whole
Number, so that the character string object can be identified on any one position of any String field to be scanned.
In another embodiment, multiple random length condition codes based on multigroup nonoverlapping orderly fixed length condition code can
To be further selected to one character string object of identification.I-th group is not overlapped and orderly fixed length condition code has PiA fixed length
Condition code, wherein each fixed length condition code further there are one or multiple fingerprints so that total finger of i-th group of fixed length condition code
Line number is not to be overlapped and the group number of orderly fixed length feature code character equal to S, wherein i=0,1,2 ..., n-1 and n.Every group of fixed length
J-th fingerprint in S fingerprint of condition code scanning direction first basic unit in the specific character string object
Dislocation for (J+k*S), wherein J=0,1,2 ..., (S-1) and k be a nonnegative integer.
A fixed length condition code is selected from every group of fixed length condition code, then again by these fixed length condition codes and one or
Multiple random length character string combinations are into P0*P1*P2…*Pn-1A random length condition code in a random length condition code, so as to make
Obtaining the specific character string object can be identified on any position of any string segments to be scanned.The n groups fixed length is special
Each original long condition code in sign code becomes a fixed length feature subcode of the random length condition code after multiple synthesis.
In the condition code of each character string object of a scanning system, after fingerprint and shadow space determine, a finger
The shadow of line can be scanned as an entirety.The fingerprint shadow of different length can be scanned concurrently or sequentially.At one
In embodiment, the shadow of the fingerprint of different length can be used as an entirety to carry out serial scan.In one embodiment, one
A character string condition code is incorporated into described document information database and can be used to lower pseudocode:
Wherein S is scanning step, hiIt is the length of the fingerprint of i-th of character string condition code, IV is original Hash value and hash
() is a sequence hash function.Fingerprint selects the best fingerprint that () selects for each shift position, and condition code is incorporated into () spy
Sign code is programmed into the database of the fixed length condition code Lookup engine 160 and random length condition code Lookup engine 180 and refers to
Line is incorporated into () and fingerprint is programmed into the fingerprint database 148.When scanning step is more than 1, the fingerprint of each condition code
Fingerprint will be called to be incorporated into () primary.But because all fingerprints of a condition code all point to same condition code and search
Data structure, as long as all fingerprints of a condition code call altogether condition code to be incorporated into () once.
In one embodiment, following pseudo- generation can be used by deleting a character string condition code from described document information database
Code:
Wherein S is scanning step, hiIt is the length of the fingerprint of i-th of character string condition code, IV is original Hash value, hash
() is a sequence hash function, and fingerprint deletion () deletes a fingerprint from the fingerprint database 148 and condition code is deleted
Except () by the condition code searching data structure of described document information from the fixed length condition code Lookup engine 160 and random length feature
It is deleted in the database of code Lookup engine 180.It, be from spy because a condition code has multiple fingerprints when scanning step is more than 1
It levies and a condition code is deleted in code database, fingerprint deletes () and to be called repeatedly.But all fingerprints of special characteristic code
It is primary that as long as condition code is called to delete () altogether.
Because number of the number of different fingerprint segment length usually more than different fingerprint length is few, in one embodiment,
In order to improve scan efficiency, a fingerprint can be further broken down into multiple fingerprint sections.All fingerprint sections of one fingerprint can
First concurrently or sequentially to scan, the scanning result of the fingerprint section concurrently or sequentially is synthesized again, to detect fingerprint.It is described
The length of fingerprint section can be identical or different, the length depending on the fingerprint that specific scanning engine is supported.
In one embodiment, the number of fingerprint section and length can be according to its of the length of fingerprint and specific scanning engine
Depending on its sweep parameter.In another embodiment, between fingerprint length for linear relationship and all fingerprint sections length all
It is identical.In general, the length of a fingerprint section is equal to one or more scanning steps.In another embodiment, scanning step is
The integral multiple of the length of one fingerprint section and multiple fingerprint sections will be synthesized concurrently.
In one embodiment, fingerprint database can include one or more Bloom filters and one or more hash
One or more of table.The size and bandwidth of memory should be weighed usually using Bloom filter or hash table.Work as condition code
Number can all be stored in the on-chip memory of sufficiently large bandwidth, Bloom filter or may be more even more than hash table
It is good.On the contrary, the yardage when condition code is more, thus when must use chip external memory, the bandwidth of memory just becomes main limitation,
The hash table rather than the Bloom filter may be more preferable.
In one embodiment, the added bit of hashed value is stored in a Bloom filter or a hash table,
To realize that hashed value is multiplexed.In another embodiment, fingerprint length is stored in a Bloom filter or a hash table
In, to realize that fingerprint length is multiplexed.When former hashed key is too long or too expensive when being compared in scanning fingerprint, hashed value is multiplexed
And the multiplexing of fingerprint length can further reduce the probability of false positive matching and fingerprint collision.
In one embodiment, work as shadow space, fingerprint, after fingerprint section and finger print data structure are determined to, one fixed
The fixed length feature subcode of long condition code or random length condition code can be used as an entirety or segmentation to be programmed into shadow space
In the fingerprint database 148 and condition code database.
In one embodiment, the fingerprint of a fixed length feature subcode of a fixed length condition code or random length condition code with
Outer other segments can be encoded and stored fixed length condition code database 166, entire to be used for scanning after fingerprint matching
Fixed length condition code or feature subcode.In one embodiment, feature of a fixed length condition code or random length condition code
One or more segments of code can be encoded into multiple tape codes with the code film of one or more basic units or sub- basic unit
The segment of film, to support the matched specified conditions of character string condition code (such as " not having to matching ", " equal ", " unequal ", " not area
Divide capital and small letter ", " case sensitive ", " within the scope of one ", " outside a range ").In one embodiment, in order to improve
Storage efficiency, one or more features chip segment encode film or one or more features code or feature subcode code film can individually make
It is used together with or with one or more basic units or subunit this element number film.
In one embodiment, the condition code section of tape code film can in order or out of order be linked together with chained list,
Or pretreatment (is such as set) into other lookup structures.In one embodiment, the length of the condition code section of all tape code films can be complete
It is equal or not congruent.The length of the condition code section of the tape code film can most preferably be selected according to the framework of particular memory.
In another embodiment, it is allowed in order to which the false positive matching of character string condition code is allowed to converge to zero-sum as quickly as possible
Possible matched condition code converges to zero or one as quickly as possible, and one group of character string condition code can further carry out difference
Coding, to build the differential data structure (such as difference tree) that can quickly search.Difference tree can utilize the difference between condition code
Unit is built.
In one embodiment, when the one or more new condition codes of addition or as the one or more existing spies of deletion
When levying code, condition code database preprocessing can only be carried out in the initial establishment of condition code database.In another embodiment,
The database of condition code can dynamically update in the scanning process of condition code.
The scanning pretreatment of String field
The scanning pre-processing engine 120 is different String field pretreatment according to character string condition code database
Form, to simplify and speed up scanning flow line stage thereafter.In one embodiment, the condition code in condition code database
It is stored with uncoded form.Therefore, scanning pre-processing engine 120 decodes encoded String field, with the spy
The codec format for levying the condition code in code database is consistent.As shown in Figure 1A, it is defeated to include a scanning for scanning pre-processing engine 120
Send device 122, a String field memory 124, a format decoder 126, a decoding field Memory 128, one
Projector 130 and a shadow field Memory 132.The scanning conveyer 122 is first by data to be scanned from the character string
Field Memory 124 is loaded into the format decoder 126, and the format decoder 126 again will be decoded original field,
Parsing and decompression, wherein may include that MIME is decoded, the decoding of UU, foreign language decoding removes and includes skimble-skamble character string (such as
Additional white space) and counter-scanning garbage character string data (such as the counter-scanning junk data of injection), HTML parsings, XML solutions
Analysis, deflate decompressions, LZS decompressions, PKZip decompressions and gzip decompression.The format decoder 126 can be with root
Normalized character string field is needed according to particular system.Hereafter, the format decoder 126 is by the character after decoding and normalization
In string field deposit decoding field Memory 128.
Decoded field is projected to one or more shadow spaces, and shadow data are stored by the projector 130
Into shadow field Memory 132.For example, a condition code database can include the condition code of case-insensitive.For
Support the condition code of case-insensitive, the projector 130 helps the data conversion decoded in field Memory 128 small
It writes, and in the storage to shadow field Memory 132 of full lowercase character string field.Full lowercase character string field can be swept by fingerprint
Engine 140 is retouched to be used for carrying out finger scan.Case sensitive condition code and the condition code of case-insensitive can be simultaneously
It is scanned.The matching of case sensitive condition code can be after the matching of its shadow, then with case sensitive decoded
Field is verified.
In one embodiment, quick character string signature scan engine 100 includes to support to entire String field
Carry out the computing resource or the network equipment of character string signature scan.However, in another embodiment, quick character string feature
Code scanning engine 100 includes that the computing resource or the network equipment of entire String field can not be stored, due to for example by system
Memory space limits and low time delay requirement.Therefore, the String field can be broken down into the character string of multiple predefined sizes
Block.The scanning of character string condition code carries out on the string chunk of each predefined size.
In one embodiment, the size of a string chunk depends on longest character string condition code.The string chunk
It can be further separated into three regions:One finger scan region is to cover all signature scan positions, in fingerprint before one
The region of scanning area is to provide the reference data before or during fingerprint, and one after the region in finger scan region to provide
After fingerprint or among reference data.The set in all finger scan regions of specific character string field is covered described
All possible fingerprint initial position in String field.The trizonal size can be identical or different.At one
In embodiment, the preceding minimum dimension in the region in finger scan region is characterized the maximum fingerprint of all condition codes in yard database
Dislocation, the minimum dimension after the region in finger scan region are characterized the length of all condition codes and dislocation in yard database
Maximum difference, the minimum dimension in finger scan region is scanning step.
In one embodiment, it is described scanning string chunk size can according to the parameter of the scanning system, for example,
The length of longest condition code, internal storage structure and sweep speed select.When longest condition code is longer, character string is scanned
Block can be several times of the length of longest condition code, for example, 2 to 4 times.When longest condition code is shorter, string chunk is scanned
It can be the more times of the length of longest condition code.
In one embodiment, the trizonal size for scanning string chunk is identical, is equal to longest condition code
Length.Three regions can be stored in the ring formed with three pieces of memories, to reduce the movement of data in memory.Another
Described trizonal of different sizes in one embodiment, finger scan region is smaller than other two regions, other two regions
Size can be identical or different.
In one embodiment, trizonal each region of a scanning block can be stored in one or more
In a an equal amount of memory block, all memory blocks in all three regions form a past in the region in finger scan region
First memory BOB(beginning of block), to the ring of the last one memory block end after the region in finger scan region, to reduce number
According to movement in memory.After first memory block in finger scan region is scanned through, first memory block of the ring will
The ring is exited, can be used for loading new data.After new data are loaded, the memory block will be used as the last one memory block
Add in the ring.In one embodiment, one or more add-in memories blocks can add in the ring tail of the ring, new for loading
Data.
In one embodiment, the String field memory 124 only includes the last one memory block.Therefore, it is described
The size of String field memory 124 is equal to the size of a memory block.The decoding field Memory 128 and the shadow
Field Memory 132 includes all memory blocks in all three regions, and size is the sum of all memory block sizes.
Due to boundary condition, described first piece there are specific conditions with last block.In one embodiment, length is most
" impossible " basic unit of the length of long condition code can be filled into the preceding reference zone in the finger scan region
Before first piece and after last block of the rear reference zone in the finger scan region." impossible " basic unit is included not
There is the basic unit that condition code is started or ended up with it, therefore, the region of the filling can not possibly be as the feature of a reality
A part for code.If finger scan engine 140, fixed length condition code Lookup engine 160 and random length condition code Lookup engine
Have scanning range checking mechanism in 180, the filling there is no need to.Scanning range checking mechanism can prevent scanning beyond character
The boundary of string field.
Finger scan
In one embodiment, the finger scan engine 140 can include one or more content adressable memories
(CAM) and one or more of one or more finite automata (FA).In another embodiment, when original fingerprint or refer to
For line when the shadow of a shadow space is to indicate completely, the finger scan engine 140 may be based on the engine of hash.Such as figure
Shown in 1A, in one embodiment, the finger scan engine 140 includes a finger scan controller 142, and a fingerprint dissipates
Column counter 144, a fingerprint finder 146, a fingerprint database 148 and a fingerprint synthesizer 150.In a reality
It applies in example, a fingerprint is integral to scan, and therefore, the fingerprint synthesizer 150 is optional component.
In another embodiment, each fingerprint is broken down into multistage by independent scan.All fingerprint sections can first quilt
Then scanning synthesizes (as using fingerprint synthesizer 150) to generate the scanning result of fingerprint by serial or parallel again.In a reality
It applies in example, the finger scan controller 142 controls entire scanning process.
140 exportable one, the finger scan engine without occurrence or with a matched knot in fingerprint database
Fruit.The occurrence, which corresponds to one or more, then to be looked by fixed length condition code Lookup engine 160 and random length condition code
The character string condition code that engine 180 is looked for search.If without occurrence, scanning process is just completed.
In one embodiment, the fingerprint hash calculator 144 includes multiple independent universal hash functions, h0,
h1..., hI, to support Bloom filter.For example, when memory size rather than memory Bandwidth-Constrained when, the grand mistake of cloth can be used
Filter.For example, the condition code database when scanning system is sufficiently small, when can all be stored in on-chip memory, memory it is big
It is small to be restricted.In another embodiment, the fingerprint hash calculator 144 includes multiple independent universal hash functions, h0,
h1..., hI, to support multiple hash tables.For example, when false positive matching needs very low (such as 10-3Or smaller), the bandwidth of memory
When sufficiently wide and size is sufficiently large, multiple hash tables can be used.For example, when the slower chip external memories of DRAM or other are used for
The data structure in follow up scan stage is stored, and when on-chip memory stores multiple hash tables enough, it is desirable to which false positive matching is non-
It is often low.
In another embodiment, the fingerprint hash calculator 144 only includes a hash function h0.For example, when interior
When the bandwidth deposited rather than the limited size of memory, a hash table can be used.For example, when the feature yardage of scanning system
It is sufficiently large according to library, thus when must use chip external memory, the Bandwidth-Constrained system of memory.
The fingerprint hash calculator 144 extracts the data of n byte from the shadow field Memory 132, and calculates
Its hashed value.The data can be used for calculating working as all hash functions individually or together with random starting values or preceding hashed value
Preceding hashed value.In one embodiment, one of hash function is (for example, first hash function h0) hashed value bit
Digit is more than the number of bits of the hashed value of other hash functions, and the number of bits phase of the hashed value of other hash functions
Together.
The hashed value of hash function output can be used for searching fingerprint database 148 by fingerprint finder 146.In a reality
It applies in example, the fingerprint finder 146 includes a Bloom filter and a hashed value demultiplexer.The Bloom filter
Check the effective marker pointed by the hashed value.If all effective markers are all effective, the further root of hashed value demultiplexer
A corresponding Hash block is found out according to the additional bit of hashed value.The hashed value demultiplexing is exactly to check particular Hash value
Additional bit, hashed value demultiplexing can individually carry out or with check fingerprint length and it is other with the relevant information of fingerprint simultaneously into
Row.Hash demultiplexing can further reduce the matched result of false positive condition code.In one embodiment, the grand filtering of the cloth
Device can be condensed to a hash table.In another embodiment, the Bloom filter can expand as multiple hash tables.
The fingerprint of different length can be by parallel scan or serial scan.In one embodiment, multiple fingerprint finders
146 can be used for carrying out parallel scan simultaneously.For example, a fingerprint finder 146 can be used for scanning each different length
Fingerprint.
In another embodiment, a fingerprint finder 146 can carry out serial scan to the fingerprint of different length.Example
Such as, the length of fingerprint can be the integral multiple of scanning step.Each quick character string signature scan engine 100 is supported one group and is referred to
Line length, { S, 2*S, 3*S ..., m*S }, wherein S is scanning step and m*S is the length of longest fingerprint.In each scan position
On, the fingerprint of different length is by serial scan.The sequence hash function of one input S basic unit can be used for described
Scanning.In one embodiment, the serial finger scan of the finger scan engine 140 can be described with following pseudocode:
Wherein S is scanning step, and m*S is longest fingerprint length, and t is the total length of String field to be scanned, IV
It is original Hash value and hash function () is a sequence hash function.Fingerprint searches () with fingerprint finder 146 and fingerprint
Synthesizer 150 realizes that condition code searches () with fixed length condition code Lookup engine 160 and random length condition code Lookup engine 180
To realize.
Fig. 2A-C demonstrate a fingerprint database 148 when the fingerprint of different length is scanned as an entirety
Data structure, including a Bloom filter table 200, a fingerprint hash block chained list 250 and a Hash-entry block
256 pieces.The one group of hashed value given with fingerprint hash calculator 144, { hashed value0A, hashed value1..., hashed valueI, fingerprint
Finder 146 goes to read the Bloom filter table 200.Each of the Bloom filter table 200 includes an effective marker 202
With a Hash block chain table pointer 204.Effective marker 202 is that there are the marks set during at least one condition code when input.
In one embodiment, the only hashed value of Hash block chain table pointer 2040AMeaning, and it is directed toward fingerprint hash block chained list
250 first term.If the effective marker 202 of all hashed value meanings is all set, further finger scan can be in Hash block chain
It is carried out on the signified fingerprint hash block chained list 250 of list index 204.In another embodiment, Hash block chain table pointer 204 is equal to
Hashed value0A, can be omitted, to reduce Bloom filter table or hash table, cost is to increase the data of sweep phase thereafter
Structure.
In one embodiment, hashed value0AFingerprint is used to be added in Bloom filter table 200, and it is other scattered
Train value can be used to reduce false positive matching.In addition, in order not to accidentally delete character string condition code, each single item of Bloom filter table 200
The number of its condition code is tracked with a counter.In another embodiment, the Bloom filter table 200 can be condensed to
One hash table only needs a hashed value (such as hashed value0A).In another embodiment, the Bloom filter table 200 can be with
Expand as multiple hash tables.
In one embodiment, the fingerprint hash block chained list 250 is a chained list.Fingerprint hash block chained list 250 it is every
One includes a last block 252, a next block pointer 254 and a Hash-entry block 256.Next block pointer 254 is directed toward institute
State the next item down of fingerprint hash block chained list 250.When one of the fingerprint hash block chained list 250 being tail item, last block 252 will be by
Setting.Because tail item can also be known by checking whether next block pointer 254 is null pointer, last block 252 be one be used for it is fast
Speed detects the optional domain of tail item.Each Hash-entry block 256 includes most n fingerprints, wherein n be it is any be more than zero it is whole
Number.In one embodiment, a best n can be selected according to the memory architecture of scanning system.For example, if interior save as
SRAM, n can be equal to 1;If inside saving as DRAM, n>1.
In one embodiment, as shown in Figure 2 C, each fingerprint item of the Hash-entry block 256 includes a hashed value
(hashed value0B) 260, a fingerprint length 262, a type 264, a feature code group pointer 266 or condition code pointer 268,
With a dislocation 270.Hashed value0B260 and fingerprint length 262 be respectively intended to realize hashed value multiplexing and fingerprint length multiplexing.When
To be that a feature code group exports a feature code group pointer 266 when type 264 is zero;Otherwise, will be that a condition code exports
One condition code pointer 268.Dislocation 270 for fingerprint first basic unit to next subsegment to be compared first base
The dislocation of this unit.If without next data structure, it is just not required to dislocation 270.When being not required to dislocation 270, dislocation 270 can be with
It is set as zero.In another embodiment, work as n>When 1, each fingerprint item can increase an effective marker.There is criterion when described
When will is set, hashed value may compare0B260 and fingerprint length 262.
In one embodiment, each fingerprint item of Hash-entry block 256 stores a fingerprint.However, in another reality
It applies in example, because fingerprint can be very long, size is again different, and because has the lookup of other feature code after finger print data library lookup
Stage, original fingerprint can not be stored in each fingerprint item.Therefore, each matched fingerprint item may be due to false positive
Matching is corresponding to zero fingerprint or since fingerprint collision corresponds to multiple fingerprints.When with a simple hash table rather than one
A Bloom filter table scans fingerprint and (k/2m)<<When 1, the order of magnitude of the matched probability of false positive is (k/2m) and fingerprint touch
The order of magnitude of the probability hit is (k2/22m), it in total fingerprint number and m is to include hashed value that wherein k, which is,0AAnd hashed value0BHashed value0
Total bit.
When with a Bloom filter table, two above probability will decline to a great extent.When with multiple hash tables, described two
A probability will be further compromised.It is searched to reduce rear phase characteristic code to the greatest extent, the probability of false positive matching and fingerprint collision can
To reduce to close to zero.In one embodiment, the number of sufficiently large m and hash function can be used for reducing false positive matching
With the probability of fingerprint collision.
In order to reduce memory space, in one embodiment, multiple hashed values can be multiplexed into a fingerprint item, to reduce
The probability of null term.Hashed value m each can be divided into two parts:The hashed value of m11The hashed value of (m-m1) position2.It dissipates
Train value1It is used as the address of fingerprint database, and hashed value2(such as hashed value0B260) it can be used to solve Hash collision and vacation sun
Property matching.M1 is smaller, and required memory space is fewer, but fingerprint hash block chained list 250 is longer.In one embodiment, when
2m1When being more than twice or more times of k, the average length of fingerprint hash block list is less than 1.
In order to save the complexity of the management of memory space and the reduction table, in another embodiment, all differences
The fingerprint of length can be multiplexed into a fingerprint database.Fingerprint length 262 can be further reduced false positive matching and fingerprint
The probability of collision.
Searching data structure as shown in figures 2 a-c can be implemented with several different modes.Particular implementation can be according to spy
Levy the size of code table, the size and type of free memory, depending on piece SRAM, on piece DRAM, the outer SRAM of piece and the outer DRAM of piece.
For example, in one embodiment, if feature number of codes is 128K, the effective marker 202 can be stored in an on piece
SRAM is to realize quick access, and Hash block chain table pointer 204 is then stored in the outer SRAM of a piece.Effective marker 202 can be used
All hashed values access, and Hash block chain table pointer 204 can only use hashed value0AIt accesses.Only when the effective marker of all hashed values
202 all for it is effective when, need to just access Hash block chain table pointer 204.Last block 252 and next block pointer 254 can be stored in one
In the outer SRAM of piece, hashed value0B260 and fingerprint length 262 can be stored in the outer SRAM or DRAM of a piece, type 264, feature
Code group pointer 266 or condition code pointer 268 and dislocation 270 can be stored in the outer DRAM of a piece.Only work as hashed value0B260 Hes
When fingerprint length 262 matches, access type 264, feature code group pointer 266 or condition code pointer 268 and dislocation 270 are just needed.
In one embodiment, in order to improve scan efficiency, the fingerprint of a character string condition code can resolve into multiple
Fingerprint section.First these fingerprint sections can concurrently or sequentially be scanned, then use fingerprint synthesizer 150 again by matched fingerprint
Section synthesizes the scanning result of fingerprint.Because the number of different fingerprint segment length is generally much smaller than the number of different fingerprint length,
Fingerprint is divided into fingerprint section can accelerate finger scan to scan.Fingerprint is divided into fingerprint section to scan, makes support is longer to contain
The fingerprint of more fingerprint sections is possibly realized, so as to reduce false positive matching.
In one embodiment, the synthesis of fingerprint section is accurate or complete, non-false positive matching.In another implementation
In example, in order to accelerate finger scan, the synthesis of fingerprint section is " coarse " or partial, there is false positive matching.In order to reduce vacation
The information of the probability of positive match, the possible position of one or more of each fingerprint section and the possibility length of fingerprint can be stored up
It deposits and be used to fingerprint section concurrently or sequentially synthesizing any fingerprint matching.
When multiple fingerprint section serial scans, Hash-entry block fingerprint corresponding with its of " at least one fingerprint matching " closes
The different implementations grown up to be a useful person are for example shown in Fig. 2 D-E.In one embodiment, it is reported as long as at least one fingerprint matching
One matching, but the information of the length in relation to how many fingerprint matching and matched fingerprint cannot be provided.
As shown in Figure 2 D, in one embodiment, each of Hash-entry block 257 includes a hash of particular fingerprint section
Value0B260, a fingerprint section Figure 27 2, an at least fingerprint composite signal 274, a type 264, a feature code group refer to
Needle 266 or condition code pointer 268 and a dislocation 270.Fingerprint section Figure 27 2 is an effective marker bitmap array, bit
Digit is identical with the number of fingerprint section that the fingerprint synthesizer 151 is supported.When i-th of fingerprint of Duan Weiyi fingerprint of fingerprint
Section, the i-th bit of fingerprint section Figure 27 2 will be set.It is all in all fingerprints that fingerprint section Figure 27 2 provides the fingerprint section
Possible position.At least a fingerprint composite signal 274 provides the number of minimum steps of the fingerprint containing the fingerprint section, at least one
The matched fingerprint synthesis of fingerprint section.In one embodiment, an at least fingerprint composite signal 274 is stored in the of the fingerprint
In one fingerprint section.In another embodiment, an at least fingerprint composite signal 274 or any other fingerprint length information are deposited
Storage is in each fingerprint section of the fingerprint.In another embodiment, at least a fingerprint composite signal 274 is omitted.
In one embodiment, hashed value0BHashed value in 260 and Fig. 2 C0B260 is identical.Type 264, feature code group refer to
Needle 266 or condition code pointer 268 and dislocation 270 are also identical with the respective value in Fig. 2 C, are stored in first fingerprint of fingerprint
Duan Zhong, and used after " fingerprint hop count subtracts 1 " a clock cycle is postponed.In one embodiment, due to type 264, condition code
Group's pointer 266 or condition code pointer 268 and dislocation 270 are only stored in first fingerprint section of fingerprint, all first fingerprints
The fingerprint of Duan Xiangtong is stored in together.In one embodiment, in the selection course of fingerprint, to allow has identical first
The probability of multiple fingerprints of fingerprint section is minimum.
Fig. 2 E demonstrate one when at least a fingerprint composite signal 274 is only stored in first fingerprint section of each fingerprint
One embodiment of fingerprint synthesizer 151.In one embodiment, the size of fingerprint section is selected as identical with scanning step.For example,
The size and scanning step of fingerprint section are all 4, so that the length of fingerprint is 4,8,12 and 16.The fingerprint synthesizer 151
Including 12 d type flip flops, 280,32 inputs and door 282 and 14 multiple selector 284 inputted.Fingerprint synthesizer 151
Input be 2 and at least one fingerprint composite signals 274 of fingerprint section Figure 27.If finding a fingerprint, fingerprint synthesizer 151 will
One matching 290 of output.The matching 290 is only when first fingerprint section of a synthesis fingerprint is first finger of a fingerprint
It is effective during line section.
It in one embodiment, can be 5 defeated by one in order to verify the multiple selector 284 of 290,4 input of matching
The multiple selector substitution entered, wherein the input terminal newly added is connected to zero and is chosen when fingerprint section is not first fingerprint section.
In one embodiment, by increasing logic gate in the different time delay stages, fingerprint synthesizer 151 can be extended, to support one
The fingerprint length information of the fingerprint section in addition to the first fingerprint section of a fingerprint.In another embodiment, when an at least fingerprint
When composite signal 274 is not stored in any fingerprint section, fingerprint synthesizer 151 can be by deleting multiple selector 284
Simplify with the logic gates of all shortest fingerprints being not used in condition code database.In another embodiment, fingerprint
Synthesizer 151 can be easily revised as with other scanning steps, the number of fingerprint section and fingerprint length.In an implementation
In example, because multiple fingerprint matchings are merged into the matching of one " at least one fingerprint matching ", fingerprint synthesizer 151 can be
Each clock cycle exports the result of a finger scan.
When multiple fingerprint section serial scans, the Hash-entry block for providing all matched " all fingerprint matchings " is right with its
Another implementation for the fingerprint synthesizer answered is for example shown in Fig. 2 F-G.When the fingerprint of multiple and different length is as a condition code
During with being matched, it may be necessary to which the stage is scanned after one or more.
As shown in Figure 2 F, each single item of the Hash-entry block 258 includes a hashed value of particular fingerprint section0B260, one
A fingerprint section Figure 27 2, all fingerprint composite signal 276, a type 264, a feature code group pointer 266 or feature
Code pointer 268 and a dislocation 270.All fingerprint composite signals 276 are all matched fingerprint composite signals, digit and
The number of fingerprint section that fingerprint synthesizer 152 is supported is identical.When a fingerprint section is have i fingerprint section one section of fingerprint
When, the i-th bit of all fingerprint composite signals 276 will be set.Corresponding domain phase in the other domains and Fig. 2 D of Hash-entry block 258
Together.In one embodiment, all fingerprint composite signals 276 are merely stored in first fingerprint section of fingerprint.Another
In a embodiment, all fingerprint composite signals 276 or any other fingerprint length information are stored in each of fingerprint
In a fingerprint section.In another embodiment, all fingerprint composite signals 276 are omitted.
Fig. 2 G demonstrate one when all fingerprint composite signals 276 are merely stored in first fingerprint section of each fingerprint
There is one embodiment of the fingerprint synthesizer 152 of multiple fingerprint length.The fingerprint synthesizer 152 scanning identical with Fig. 2 E
Step-length, fingerprint section size and fingerprint length.The fingerprint synthesizer 152 includes 16 d type flip flops, 280,52 inputs and door
282 and 13 input with door 286.Described one matching 0292 of the output of fingerprint synthesizer 152, a matching 1294, one
With 2296 and one matching 3298, respectively with one, two, the matching of the fingerprint of three and four fingerprint segment lengths.
In one embodiment, by increase in different time delay stage logic gate (as with door) or increase existing and door
Input, the fingerprint synthesizer 152 can be extended, to support the attached of the fingerprint section in addition to the first fingerprint section of a fingerprint
Adding fingerprint length information.In another embodiment, when all fingerprint composite signals 276 are not stored in any fingerprint section
When, it can be deleted for input of all fingerprint composite signals 276 with door and all logic gates related with fingerprint length,
With the simplification fingerprint synthesizer 152.In one embodiment, the Hash-entry block 258 in Fig. 2 F can be extended, including more
Set type 264, feature code group pointer 266 or condition code pointer 268 and dislocation 270;Every group of domain is used for each fingerprint length.This
Outside, in one embodiment, the information of the exact length in relation to fingerprint can also be used for the follow up scan stage.
Fingerprint synthesizer 152 can scan the fingerprint of all different lengths simultaneously.But due to type 264, feature code group
Pointer 266 or condition code pointer 268 and dislocation 270 are merely stored in a fingerprint section, multiple fingerprints for sharing the fingerprint section
It is stored together.The appropriate selection of the fingerprint of condition code can be preferably minimized the influence.However, in order to eliminate the shadow
It rings, the type 264 of the fingerprint of all matched fingerprint sections, feature code group pointer 266 or condition code pointer 268 and dislocation 270 can
To be stored in another by the table of matched fingerprint section meaning.All addresses to matched fingerprint section can be used for
Search an entry in the table.
In another embodiment, in addition to an at least fingerprint composite signal 274 or all fingerprint composite signals 276, refer to
Line section Figure 27 2 can be further omitted, and be stored in any fingerprint section from without any fingerprint composite signal.Fingerprint
Section can be synthesized according to the possible form of all fingerprints in described document information database.If multiple fingerprint sections meet all
Any form of the form of fingerprint is considered as a fingerprint matching.For a kind of special circumstances, if multiple fingerprint sections expire
The minimum requirements of all fingerprint formats of foot, is considered as a fingerprint matching.
When multiple fingerprint section parallel scans, the Hash-entry of " an at least fingerprint fingerprint matching " and " all fingerprint matchings "
Block implementation different with one of its corresponding fingerprint synthesizer is for example shown in Fig. 2 H-I.As illustrated in figure 2h, the Hash-entry block
Each of 259 includes a hashed value0B260, a fingerprint section matching 278, a type 264, a feature code group pointer 266
Or condition code pointer 268 and one dislocation 270.The fingerprint section matching 278 is the match bit of fingerprint section, other domains and figure
Corresponding domain is identical in 2D.The fingerprint section matches some spy of 278 pairs of parallel fingerprint synthesizers as shown in figure 2i
Fixed section adaptation has specific meaning (such as section adaptation 0, section adaptation 1, section adaptation 2 or section adaptation 3).Work as fingerprint
Section is i-th of fingerprint section of any fingerprint, and the fingerprint section matching 278 of i-th section of adaptation will be set.In one embodiment
In, i-th section of adaptation only stores i-th of fingerprint section of fingerprint, so as to which fingerprint section matching 278 is always set, it is convenient to omit.
In one embodiment, in order to further reduce false positive synthesis, a similar at least fingerprint composite signal 274 or all fingerprints synthesize
The fingerprint length information of information 276 can be stored in first fingerprint section of a fingerprint or all fingerprint sections.
Fig. 2 I demonstrate the one embodiment without the parallel fingerprint synthesizer 153 of fingerprint length information.The fingerprint
Synthesizer 153 include one 2 input with door 282, one 3 input with door 286, one 4 input with door 288 and one 4
Input or door 285.The fingerprint synthesizer 153 exports a matching 0292 for all fingerprint matchings, and one matches 1294, one
It is a matching 2296 and one matching 3298 and be an at least fingerprint matching export one match 290.In one embodiment, one
A totality fingerprint length filters can use matching 0292, matching 1294, matching 2296, matching 3298 and matching 290, with
Filter out impossible fingerprint length.In one embodiment, by increasing logic gate, the fingerprint synthesizer 153 can be by
Extension, to support the fingerprint length information stored in the first fingerprint section or all fingerprint sections of each fingerprint.In a reality
It applies in example, the long scan step-length that parallel fingerprint section scanning can be equal to fingerprint segment length several times with one, to accelerate sweep speed.
In general, in one embodiment, each fixed length condition code can be broken down into multiple segments to carry out independence
It scans on ground.As scan with synthesis fingerprint section as, all segments of a condition code can be first scanned, then again by serially or
It is parallel to close.In general, the number of different characteristic code segment length, also more much smaller than the number of different characteristic code length even if not being one,
So as to accelerate the speed for the scan method for needing individually to be scanned for each modal length.In one embodiment,
The identical condition code section of two or more length is by parallel scan.
In another embodiment, one or more of one or more hash tables and one or more Bloom filters
It is used to scan through all referring to bright condition code section.In one embodiment, when at least one condition code matches, the fingerprint
Synthesizer 150 can be used to that identified condition code section is serially or parallelly synthesized any condition code matching.Shown in Fig. 2 D-I
Data structure and embodiment can be used for composite character code section.In another embodiment, one or more finite automatas
(FA) and one or more of one or more content adressable memory (CAM) is used to matched condition code section to synthesize
It is matched for any condition code.
Fixed length signature scan
As shown in Figure 1, the fixed length condition code Lookup engine 160 includes a condition code search engine 162, a condition code
Verify device 164 and a fixed length condition code database 166.Condition code search engine 162 can identify a possible fixed length feature
Code, the fixed length feature subcode of possible random length condition code or one include multiple possible fixed length condition codes or fixed length
The condition code family of feature subcode.The possible fixed length condition code or feature subcode identified by condition code search engine 162 will be by
Condition code is verified device 164 and is verified.Fixed length condition code database 166 is then characterized yard search engine 162 and condition code verifies device 164
Database.
Fixed length condition code database 166 can be implemented with plurality of data structures.In one embodiment, such as Fig. 3 A-B institutes
Show, fixed length condition code database 166 is one and multiple condition code chained lists 350 are chained up with constitutive characteristic code group chained list 300
Two-dimensional chain table.The each single item of feature code group chained list 300 includes a next yard of pointer 302, a dislocation 304 and a spy
Levy code pointer 306.Next yard of pointer 302 is the pointer of the next item down for being directed toward feature code group chained list 300, condition code pointer
306 be a pointer for being directed toward special characteristic code chained list 350.Dislocation 304 is to refer to from the first unit of particular fingerprint to condition code
The dislocation of the first unit of the signified specific character string condition code of needle 306.
Each fixed length condition code or feature subcode can be broken down into multiple condition code sections 352, to form condition code chained list
350.Described document information section 352 can be suitable by scanning since the first basic unit of a fixed length condition code or feature subcode
Sequence connects.In one embodiment, the size of described document information section is different.In another embodiment, all spies
The size for levying code section is identical, and size can most preferably be selected according to system architecture.The each single item of condition code chained list 350 includes one
A condition code section 352, a condition code section film 354, a latter end 356, a next segment pointer 358,360 and of type
One number 362.Next segment pointer 358 is a pointer for being directed toward next section, and latter end 356 is latter end mark.When type 360 is
When 0, number 362 is fixed length feature subcode number 364;Otherwise, number 362 is characterized code number 366.Condition code section film 354 is used
In the specific matching condition for specifying each basic unit or even sub- basic unit, including " not having to matching ", " equal ", " not phase
Deng ", " within the scope of one ", " outside a range ", " case-insensitive " and " case sensitive ".The matching condition
It can be realized by the input source and output source of selecting unit comparator.If a fixed length condition code or feature subcode are not
The integral multiple of the size of condition code section 352 can be filled out in the tail portion of the fixed length condition code or feature subcode with for up to " feature
The size of code section 352 subtracts one " a " 0 " or other values, and the film of corresponding fills unit is set as " not having to matching ".
In one embodiment, the condition code section film 354 of each basic unit is 3 bits." case-insensitive "
When, the first bit is set as 0;When " case sensitive ", the first bit is set as 1.Latter two bit is set when " equal "
0 is set to, latter two bit is set as 1 when " unequal ", latter two bit is set as 2 when " not having to matching ", latter two ratio
Special position is retained for 3.More film bits can be used for realizing other matching conditions as needed, for example, pre-defined
Range (numerical character or alphabetic character), symbol class or any range.In another embodiment, in order to improve storage efficiency,
The code film of one or more features code section or fixed length condition code or feature subcode can be used alone or with basic unit or subunit
Used in the code film cooperation of this unit.
In one embodiment, described document information search engine 162 can search feature code group chained list 300 until the chained list
Tail item, i.e. next yard of pointer 302 is null pointer.The each single item of feature code group chained list 300 all returns to one feature of a direction
The condition code pointer 306 of code chained list 350.
Described document information, which verifies device 164, can be verified as each progress of condition code chained list 350 condition code verification.Condition code core
Real device 164 is since the first segment of condition code chained list 350, by each condition code section of scanning sequency paragraph by paragraph examination.If mismatched,
Condition code, which verifies device 164, will stop hypomere inspection;Otherwise, condition code verify device 164 will check entire condition code chained list 350 until
Its tail, i.e. latter end 356 are 1.If finding a matching, when type is 0, the fixed length subcode for random length condition code is matched,
Condition code, which verifies device 164, will export a fixed length feature subcode number 364;When type is 1, match as fixed length condition code, spy
Sign code, which verifies device 164, will export a fixed length condition code number 366.
In another embodiment, in order to which false positive matching is made to drop to zero with the rapid development of hop count, condition code section 352
It can be linked together by an optimal sequence.The dislocation for representing current signature code section to next condition code section will be added to condition code chain
In each single item of table 350.Although the length of condition code section can differ, the length of a fixed condition code section can be chosen
It selects.
Fig. 4 A-B demonstrate an implementation of a condition code unit comparator 400 and a condition code section comparator 450
The block diagram of example.Described document information unit comparator 400 carries out unit comparison in the lookup of fixed length condition code.Condition code unit ratio
Include a code film decoder 402 compared with device 400, one 2 input multiple selector 404, an equality comparator 406, two non-
Door 408, one 4 input multiple selector 410, one 2 input or door 412 and a range comparator 414.Code film decoder
402 can be decoded as code film bit equality comparator 406 and the control signal output and input of range comparator 414.One
In a embodiment, range comparator 414 is optionally used to support the global scope or each of predefined each String field
The global scope of condition code.In one embodiment, m range comparator 414 can be used for supporting m predefined global models
It encloses.In another embodiment, units match 416 can be with " not having to match " bit logic or being exported later.
Multiple condition code unit comparators 400 and a multi input can be used to construction feature code section comparator with door 452
450.The data cell of condition code section comparator 450 is typically a byte, but is alternatively four bits or any other size.
In another embodiment, condition code unit comparator 400 can be by a condition code unit comparator 480 as shown in Figure 4 C
Substitution, with supported feature code unit local scope.The unit of each tape code film in described document information section 352 can be extended to
The unit range of one tape code film or the upper bound of given condition code unit of tape code film and the unit pair of lower bound.
In one embodiment, fixed length condition code Lookup engine 160 shown in figure 1A may require to look up multiple condition codes
Chained list.However, it is necessary to the probability for searching multiple condition code chained lists is usually very low.When requiring to look up multiple condition code chained lists,
The differential encoding that multiple condition codes in a group condition code are mutually encoded can be used.
For example, in one embodiment, as shown in Figure 5A select string elements tree 500 can be as shown in figure 1A
The searching data structure of condition code search engine 162.String elements tree 500 is selected to include burl 520a-520e.Select character string
There are two branches for each burl 520 of cell tree 500, and a matching branch is by pointer1530 is signified, another mismatches branch
By pointer2532 is signified.As shown in Figure 5 B, there are two types of different burls 520 for selection string elements tree 500:Type 528 is 1
Leaf and the non-leaf that type 528 is 0.Matched non-leaf always points towards another burl 520, and unmatched non-leaf is directed toward tree
On another burl 520 or empty section.Matched leaf always points towards a condition code family chained list 550 as shown in Figure 5 C, no
Matched leaf is directed toward another burl 520 or empty section on tree.
In one embodiment, as shown in Figure 5 B, each burl 520 of selection string elements tree 500 includes a mistake
Position 522, a unit 524, a unit membrane 526, a type 528, a pointer1530 and a pointer2532.Type
528 include leaf as described above and non-leaf.Selected unit 524 can be in any position of condition code, when burl 520 is not
During tree root, position provided by the dislocation 522 of previous burl 520 (for example, the front nodal point of node 520b is node 520a, but
It is because node 520a is the tree root for selecting string elements tree 500,520a does not have front nodal point), when burl 520 is tree root
When (for example, node 520a), position is provided by the dislocation 270 of the occurrence in fingerprint hash block chained list 250 (see Fig. 2 B).Often
A burl include one with the condition code section film 354 in condition code chained list 350 (see Fig. 3 B) corresponding unit membrane 526.
At least one basic unit of any two character string condition code is different, as long as one of character string condition code is not
It is the subsegment of another character string condition code.Therefore, a unit 524 can be chosen to distinguish at least two character string features
Code, so that after the unit 524, at least one character string condition code can be eliminated.By to selection
The lookup of string elements tree 500, the different condition code of at least one basic unit can be distinguished.In one embodiment
In, shown selects string elements tree 500 as binary tree.In another embodiment, corresponding selection character can be built
K points of trees of string location.K basic unit of the k condition code on same basic unit position in one feature code group can be made
The burl set for k points, although each condition code can also be contributed in multiple basic units to each burl of k points of trees.
In one embodiment, a character string condition code can be a part for another character string condition code.These
The condition code for having mother-child relationship (MCR) cannot be chosen the differentiation of string elements tree 500.In one embodiment, as long as finding therein
Any one condition code is just not required to further search for, it is not necessary that distinguishing has the condition code of mother-child relationship (MCR).Therefore, it only needs to scan
Wherein shortest condition code.But in another embodiment, it needs to distinguish all condition codes or needs to identify longest spy
Levy code.
In order to support the condition code family of mother-child relationship (MCR), in one embodiment, a condition code as shown in Figure 5 C
Family's chained list 550 can be characterized the searching data structure that code verifies device 164 (see Figure 1A).Condition code family chained list 550 it is every
One includes a type 552, a dislocation 554, a condition code section 556, a condition code section film 558, a next item down
Pointer 560, a numbering type 562 and a number 564.In one embodiment, for supported feature code family, feature
There are two kind items for code family chained list 550:The result items that type 552 is 0 lookup item and type 552 is 1.In inspection, each is looked into
When looking for, condition code section 556 will be compared according to condition code section film 558.Condition code section 556 and condition code section film 558 and feature
Corresponding domain in code chained list 350 is identical (see Fig. 3 B).But result items are not required to any condition code section and compare.
In one embodiment, all matched condition codes of the system searching, and export each matched condition code
Number 564.However, the lookup of condition code will proceed to the tail portion of condition code family chained list 550 or the next item down pointer 560 always
For null pointer, to find all female condition codes.There are two types of numbers 564:Feature subcode number 566 and condition code number 568.It compiles
Depending on numbers 564 type is by numbering type 562.When numbering type 562 is 0, number 564 is feature subcode number 566, is represented
The fixed length feature subcode of one random length condition code;Otherwise, number 564 is condition code number 568, represents a fixed length feature
Code.
In one embodiment, condition code family chained list 550 since a minimus generation or shortest subcode, always
It is connected to an oldest generation or longest subcode.Dislocation 554 is specified from the first unit of existing condition code section 556 to lower condition code section
The dislocation of 556 first unit.If existing condition code section mismatches, the lookup of condition code family chained list 550 can stop.It is fixed to provide
Arbitrary dislocation 554 between feature subcode makes to terminate into possibility in advance based on unmatched.
In one embodiment, every generation of condition code family chained list 550 can only support a condition code.If one is specific
, there are multiple condition codes in generation, can use multiple condition code families chained list 550, each condition code in the specific generation needs a spy
Zheng Ma families chained list 550.Multiple condition code families chained list 550 can be distinguished with selection string elements tree 500.
In another embodiment, the condition code of tape code film can use one or more memories content addressed including one
Memory (CAM) is stored and is scanned.
Random length signature scan
As shown in Figure 1, the fixed length condition code Lookup engine 160 exports all matched random length features by scanning sequency
The number of the fixed length feature subcode of code, size, the position in the String field.With identified fixed length feature subcode
Identified fixed length feature subcode is synthesized any random length condition code by information, random length condition code Lookup engine 180.One
In a embodiment, one or more finite automatas (FA) are used to synthesis fixed length feature subcode.In another embodiment, no
Fixed length condition code Lookup engine 180 include a condition code rule searching device 182, a condition code state verification device 184, one
Condition code composition rule database 186 and a condition code state table 188.Condition code rule database 186, which provides, how will
The fixed length feature subcode of random length condition code synthesizes the static rule of random length condition code.188 dynamic of condition code state table
State of a process of the ground for an input String field storage synthesis.
Condition code rule searching device 182 is found out and matched fixed length feature from condition code rule database 186
Code numbers relevant rule, and the relevant condition code rule is supplied to condition code state verification device 184.Condition code state
Matched fixed length feature subcode is synthesized random length condition code, and update spy by validator 184 according to condition code composition rule
Levy code state table 188.
Condition code rule database 186 and condition code state table 188 can have a variety of different data structures.In a reality
It applies in example, condition code rule database 186 can be condition code regulation linked 600 as shown in Figure 6.Condition code regulation linked
Pointed by 600 numbers of feature subcode found by fixed length condition code Lookup engine 160.Multiple random length condition codes may include phase
Same fixed length feature subcode.Condition code regulation linked 600 can be special by all fixed length representative comprising special characteristic subcode number
The random length condition code of sign subcode links together.
In one embodiment, each single item of condition code regulation linked 600 corresponds to a random length condition code.Each single item packet
Containing a condition code number 602, a sequence 604, a last code 606, a next yard of pointer 608 and a distance range
Information 610.Condition code number 602 represents specific random length condition code.Sequentially 604 is determine representated by feature subcode number
Sequence of the long feature subcode in all fixed length feature subcodes of the random length condition code.Last code 606 indicates that the fixed length is special
Sign subcode whether be random length condition code the last one fixed length feature subcode.Last code 606 indicates random length signature scan
Terminate.Next yard of pointer 608 is directed toward 600 the next item down.Distance range information 610 is indicates the fixed length feature subcode under
The optional domain of the distance range (i.e. minimum and most unit number between two fixed length feature subcodes) of fixed length feature subcode.For example,
When the range or longest distance or the shortest distance are previously given or when being infinitely great, distance range information 610 can be saved
Omit or be reduced to the shortest distance or longest distance.
In another embodiment, each single item of condition code regulation linked 600 can include one or more additional fields, with
The one or more random length feature subcodes between two or more fixed length feature subcodes of description.For example, " a mould can be used
One " pattern " of repetition that one is used for filling up distance range information 610 is added to feature by formula " or one " pointer of pattern "
In each single item of code regulation linked 600, to describe random length feature subcode.
As shown in fig. 7, in one embodiment, condition code state table 188 can use one or more features code state
Chained list 700 is implemented.Each condition code state-chain-table 700 can be a String field of particular link, and dynamic memory is
The condition code state of all fixed length feature subcodes of all random length condition codes of identification.Condition code state-chain-table 700 it is each
Item includes a condition code number 702, previous subcode sequence (lorder) 704, a next subcode position 706
(nloc) and a next yard of pointer 708 (nptr).Condition code number 702 is the number of specific random length condition code.Previous son
Code sequence 704 be the specific random length condition code previous fixed length feature subcode number to specific fixed length feature son
The sequence of code.Next subcode position 706 be next fixed length feature subcode of the specific random length condition code number to
A fixed length feature subcode active position range.
In one embodiment, there are one condition code state-chain-tables 700 for each String field of each line.In general,
At each moment, each line is only scanned there are one String field, and only there are one condition code state-chain-tables 700.Condition code
State-chain-table 700 may include all matched subcodes of all random length condition codes of a String field of particular link
Entire effective history.
Condition code state-chain-table 700 can be dynamic.In one embodiment, if the volume of the fixed length feature subcode
The first fixed length feature subcode of number signified fixed length feature subcode for a random length condition code, i.e. its sequence 604 are 1, and
There are no the item of the specific random length condition code, a new item can be added into the spy of a String field of particular link
It levies in code state-chain-table 700.If the position of first unit of existing matched fixed length feature subcode is not in next subcode bits
Put 706 to effective range in or time-out occur, one of described document information state-chain-table 700 can be deleted.It is described
One of condition code state-chain-table 700 can also be deleted after based on the item matched random length condition code is found
It removes.In one embodiment, all items of the condition code state-chain-table 700 of a String field of particular link can be one
The end of a String field is deleted.
As shown in figures 1 to 6, in one embodiment, described document information rule searching device 182 is from the fixed length condition code
Lookup engine 160 receives the number of a fixed length feature subcode.Described document information rule searching device 182 searches the entire fixed length
The condition code regulation linked 600 of the number meaning of feature subcode, and each single item of described document information regulation linked 600 is given
Information (such as { condition code number 602, sequence 604, last code 606, distance range information 610 }) and fixed length condition code lookup are drawn
Hold up 160 to information (as the position of the first unit of fixed length feature subcode, the length of fixed length feature subcode, line number,
String field is numbered }) it is sent to successively together in condition code state verification device 184, reach up to described document information rule chain
The tail portion of table 600 (i.e. next yard of pointer 608 is null pointer).
Described document information state verification device 184 is looked into the information of each single item that described document information rule searching device 182 is given
The condition code state-chain-table 700 for looking for the line number signified.To each single item of condition code state-chain-table 700, if condition code
Number 602 and condition code number 702 are identical, and sequence 604 is equal to previous subcode sequence 704 plus one, and the of fixed length feature subcode
As soon as in the range of the effective marker that the position of basic unit is given in next subcode position 706 is put, for a matching.To feature
Each occurrence of code state-chain-table 700, if last code 606 is 1, output described document information number 602, and by the Xiang Congte
It is deleted in sign code state-chain-table 700;Otherwise, the item is updated, and the value 704 of previous subcode sequence is allowed to be equal to the value of sequence 604,
The value of next subcode position 706 be equal to fixed length feature subcode first unit position, the length of fixed length feature subcode and away from
The summation of value from range information 610.It is not required to do anything to unmatched item.
In one embodiment, when condition code state-chain-table 700 is short and particular link only needs scanning one in particular moment
String field can be scanned with described document information state-chain-table 700.But if condition code state-chain-table 700 is grown or spy
Multiple String fields need to be scanned in particular moment by determining line, and other searching data structures can be used for described document information state table
188.In one embodiment, described document information state table 188 can be analogous to one of the data structure in Fig. 2A-C
Condition code state Bloom filter or a condition code state hash table.Condition code state Bloom filter or condition code state dissipate
The hashed key of list can be 3 tuples { line is numbered, String field number, condition code number }.In one embodiment, when
Each line only has a String field in particular moment and is scanned, and is not required to be numbered with String field.
As shown in figure 8, in one embodiment, the Hash-entry block 256 in Fig. 2A-C can be hashed item 856 and take
Generation.The each single item of Hash-entry block 856 includes special sign code number 860, a line number 862, a String field
Number 864, a previous subcode sequence 866 and a next subcode position 868.Previous subcode sequence 866 and next subcode bits
The definition for putting 868 and previous subcode 704, the hashed keys 3 yuan identical with the definition of next subcode position 706 of sequence in described Fig. 7
Group { condition code number 860, line number 862, String field number 864 } can be stored and be made to solve Hash collision.
To each single item of Hash-entry block 856, if former hashed key is identical, sequence 604 is equal to previous subcode sequence 866 plus one, and fixed
It it is just one within the scope of the active position that the position of the first unit of long feature subcode is given in next subcode position 868
Match.To each occurrence of Hash-entry block 856, if last code 606 is 1, output described document information number 602, and by described in
Item is deleted from Hash-entry block 856;If last code 606 is 0, the item of the Hash-entry block 856 is updated, and allows previous subcode
Sequentially 866 are equal to sequence 604, and next subcode position 868 is equal to the position of the first unit of fixed length feature subcode, fixed length feature
The length of code and the summation of distance range information 610.Xiang Ze is mismatched to remain unchanged.
In one embodiment, when the sequence of two continuous fixed length feature subcodes and distance range meet condition code rule
It is one or more in chained list 600, condition code state verification device 184 will further verify whether described two fixed length feature subcodes
Between string segments and condition code regulation linked 600 one or more meanings one or more random length feature subcodes
Match.When the character string in the specific character string field between described two continuous fixed length feature subcodes and the feature
During the random length feature subcode matching of one or more meanings of code regulation linked 600, described document information state-chain-table 700
It will be updated with the new fixed length feature subcode.
The design and performance of scanning system
In one embodiment, the speed of quick character string signature scan engine 100 is by the speed of finger scan engine 140
Degree limitation, for example, when the false positive matches the sufficiently small and follow up scan stage and properly designed.When fingerprint is whole as one
When body is scanned by length one by one, the speed of finger scan engine 140 depends on scanning step, the number and clock of fingerprint length
Velocity composition.In one embodiment, the speed of the scanning engine 100 is (s/m) * R, and wherein s is scanning step, and m is refers to
The number and R of line length are clock speed.If for example, scanning step be 8 bytes, 4,8,16 and 32 byte of fingerprint length, when
Clock frequency is 500MHz, and the speed of the scanning engine 100 is 8* (8/4) * 500MB/s=8Gbits/s.
In another embodiment, if fingerprint is first segmented carry out parallel scan, then " an at least fingerprint matching " is used again
Serial synthesis, and fingerprint segment length is identical with scanning step, the speed of an individual scanning engine 100 is s*R.In a reality
It applies in example, when s and R takes the value identical with precedent, the scanning engine 100 can be with one character string of 32Gbps velocity scannings
Field.In addition, in another embodiment, fingerprint can first be segmented and be scanned, then parallel synthesis again, scanning step is so as to sweep
Can be further improved by retouching speed by n times, and wherein n is the number of the fingerprint section of parallel scan and synthesis.When fingerprint segment length and R take
During identical with precedent value, if n is 4, i.e. for 4 fingerprint sections by parallel scan and synthesis, scanning step is 32 bytes, described to sweep
Retouching engine 100 can be with one String field of 128Gbps velocity scannings.
Sweep speed discussed above is the speed of a single signature scan engine.In one embodiment, when
When carrying out parallel scan with several signature scan machines, signature scan speed can further improve several times.
In one embodiment, the selection of the overall structure and parameter of a signature scan system can according to one or
Multiple following factors:The fixed length feature subcode of the sweep speed of character string condition code, fixed length condition code or random length condition code
Size, the size of similitude and condition code database between multiple condition codes or feature subcode select, to ensure that fingerprint is swept
Engine 140 is retouched, fixed length condition code Lookup engine 160 and random length condition code Lookup engine 180 can meet specific scanning system
Requirement.For example, scanning step can be selected according to system requirements.As shown in table 1, scanning step is longer, quick character string
The speed of signature scan engine 100 is faster.But the minimum dimension of fixed length condition code or feature subcode just need it is bigger, with
And the number for being inserted into and deleting is more.The step-length that exposes thoroughly also limits the selection of the fingerprint of each condition code, and increases fingerprint and touch
It hits and the matched probability of fingerprint false positive.
Table 1:The selection of scanning step
In addition, scanning step can especially be limited so as to sweep speed by the size of minimum fixed length condition code or feature subcode.
In one embodiment, in order to avoid short condition code is separately scanned, scanning step can be by table 1 according to fixed length condition code or spy
The minimum dimension of subcode is levied to select.
Table 1 assumes that each basic unit of all fixed length condition codes and feature subcode can be with for fingerprint.In a reality
It applies in example, all fixed length condition codes and feature subcode at least indicate completely in a shadow space.In addition, at another
In embodiment, scanning step is so as to which sweep speed is further by the shadow indicated completely of all fixed length condition codes and feature subcode
Minimum dimension limitation.Therefore, can make into the third column heading of table 1 " all fixed length condition code or feature subcode it is complete
All referring to the minimum dimension of bright shadow ".
In another embodiment, in order to improve sweep speed, larger scanning step can be selected.Shorter than described scanning
The scannable condition code of step-length can be scanned separately, for example, with above-mentioned scan method or any other scan method.When only few
Number fixed length condition code or feature subcode are shorter, and it is very effective to increase scanning step.
In another embodiment, the engine number of different scanning flow line stage can be different.Engine can be according to institute
The requirement of particular system is stated to select.For example, for particular system, a scanning pre-processing engine 120, four fingerprints can be used
Scanning engine 140, a fixed length condition code Lookup engine 160 and two random length condition code Lookup engines 180.
In one embodiment, multiple finger scan engines 140 can be used, so that each finger scan engine 140
One group of fingerprint length is covered, to provide multiresolution finger scan.In one embodiment, all fingerprints are all broken down into phase
It is scanned with the fingerprint section of length and the identical scanning steps of all fingerprint Duan Douyong.For scanning all fingerprint length groups
The number of the finger scan engine of every group of fingerprint length can be all identical.
In another embodiment, fingerprint is broken down into the fingerprint section of different length, and the fingerprint section of different length is according to one
The average length of group fingerprint length is scanned with the scanning step of different length, so that the finger that one group of average length is longer
Fingerprint longer fingerprint section and the larger scanning step of line length, the fingerprint of the shorter fingerprint length of one group of average length are used
Shorter fingerprint section and smaller scanning step.For example, the fingerprint section and scanning step of 8 basic units can be used for length is
The fingerprint of 8,16,24,32 and 40 basic units, and the fingerprint section of 2 basic units and scanning step can be used for length is
The fingerprint of 2,4 and 6 basic units.
In one embodiment, drawn to balance with the finger scan of the fingerprint section of different scanning step scan different length
140 speed is held up, the number of the finger scan engine 140 of the shorter fingerprint section of the shorter scanning step scanning of use can be more than is swept with longer
Retouch the number of the finger scan engine 140 of the longer fingerprint section of step scan.In another embodiment, it is not synchronized in order to balance use
The speed of the finger scan engine 140 of the memory of degree, the number of the finger scan engine 140 of the slower memory of use can be more than with very fast
The number of the finger scan engine 140 of memory.In general, in another embodiment, the number of finger scan engine 140 can
Depending on by the product of scanning step and memory speed.The finger scan engine of the product of smaller scanning step and memory speed
140 number can be more than the number of the finger scan engine 140 of the product of larger scanning step and memory speed.
In one embodiment, the finger scan engine 140 of the identical same group of fingerprint of scanning of multiple scanning steps covers
Multiple nonoverlapping positions staggeredly in one input String field, so that the multiple finger scan engine is always swept
Retouch the number and the product of the scanning step of former single finger scan engine that step-length is finger scan engine.For example, in order to provide
Identical sweep speed, it can be scanning step be 8 bases that scanning step, which is the number of the finger scan engine of 2 basic units,
4 times of the number of the finger scan engine of this unit.In another embodiment, same group of the identical scanning of multiple scanning steps
The position staggeredly of multiple portions overlapping in the one input String field of the covering of finger scan engine 140 of fingerprint, so as to make
The total scanning step for obtaining the multiple finger scan engine is more than the scanning step of former single finger scan engine, but is swept than fingerprint
The product for retouching the number of engine and the scanning step of former single finger scan engine is small.
In one embodiment, the fingerprint database 148 of different fingerprint segment length is storable in the memory of friction speed,
So that the memory for shorter fingerprint section is faster than the memory for being used for longer fingerprint section.In one embodiment, it is different
Fixed length condition code database 166 corresponding to fingerprint length group is storable in the memory of friction speed, so that for one
The memory of the shorter fingerprint of group average length is faster than the memory for being used for one group of longer fingerprint of average length.
In one embodiment, the fingerprint database 148 of the shorter than fingerprint of specific length (for example, 9 basic units) can
It is stored in one of most fast memory (for example, caching of on-chip memory or CPU) of scanning system.In one embodiment,
The all or part of fixed length condition code database 166 corresponding to the fingerprint of shorter than described specific length also can and fingerprint database
148 are collectively stored in one of most fast memory of scanning system.In another embodiment, multiple fingerprints of same fingerprint group
Scanning engine can share a fingerprint database being stored in the memory of a multiport 148.
In one embodiment, one or more engines discussed above in different flow line stages can be any
Other scan methods are replaced.For example, in one embodiment, a content adressable memory (CAM) can be used as finger scan
Engine 140 is used to the shadow of scanning fingerprint, and fixed length condition code Lookup engine 160 and random length condition code Lookup engine 180
It still can be used to further scanning feature code.In another embodiment, CAM can be used as finger scan engine 140 by with
Come in former spacescan one or more fingerprint.In one embodiment, one determine or non-deterministic finite automaton (DFA or
NFA) synthesis fingerprint section can be used to as fingerprint synthesizer 150.In another embodiment, a DFA or NFA can make
A fixed length feature subcode, which is used to, for random length condition code Lookup engine 180 synthesizes random length condition code.
Other embodiments of the invention can scan other string datas.For example, in biosystem, a hereditary generation
Code sequence can be used as a String field.Describing the condition code of specific gene can be used for forming from one by genetic data
String field in identify the specific gene sequence.For example, specific gene can be with the scanning machine by specific special
Code is levied to identify.
The present invention and the described all feature operations of this specification, including the described structural approach of this specification or phase
When structural approach, or both combination, can use and Fundamental Digital Circuit or be implemented with computer software, firmware or hardware.
The present invention can be implemented on one or more computer program products, i.e., one or more be stored in one such as one it is machine readable
Storage device or a transmitting signal information carrier in, a such as programmable processor, one or multiple stage computers
Digital processing device is performed or computer program that it is controlled to operate.One computer program (an also referred to as program, it is soft
Part, application software or code) any programming language form for including compiling or interpretative code is can be written as, and can be deployed as
Including as a stand-alone program or being used as a module, component, subprogram or other units suitable for computing environment
Any form.One computer program might not correspond to a file.One program can be with other programs or data together
Deposit in one file, can be a Single-issue program file, can be multiple coordinations file (for example, storage one
Or multiple modules, multiple files of subprogram or partial code).One computer program can be deployed to a computer,
The multiple stage computers in same place or be distributed in the multiple stage computers interconnected with communication network in multiple places perform.
Processing routine described by this specification and logic flow include the method and step of the present invention, can use and perform one
Or one or more programmable processors of multiple computer programs perform, with by operation input data and generate output come
Implement the function of the present invention.The logic circuit of specific use, such as field programmable gate array (FPGA) or application-specific integrated circuit
(ASIC), it can also be used to perform the processing procedure and logic flow and implement the device of the invention.
The processor for being adapted for carrying out computer program includes, such as the microprocessor of general and special purposes and any
Any one or more processors of the digital computer of type.In general, processor will from read-only memory or with
Machine access memory or both receives instruction and data.The basic module of one computer is a processing for being used for execute instruction
Device and the memory of one or more store instructions and data.In general, a computer is further included or is efficiently couple to
One or more is used for storing the mass-memory unit (e.g., disk, magneto-optic disk or CD) of data, to receive, sends or receives
Send out data.Include the non-volatility memorizer of all forms suitable for the information carrier of storage computer program instructions and data,
Including such as EPROM, EEPROM, the semiconductor memory of quick flashing (flash) memory;Such as internal hard drive or the magnetic of moveable magnetic disc
Disk;Magneto-optic disk;With CD-ROM and DVD-ROM CDs etc..Processor and memory can be mended by the logic circuit of specific use
Fill or be included into the logic circuit of specific use.
In order to provide the interaction with user, the present invention can have display equipment at one, and a keyboard and a pointer are set
Implement on standby computer.Show equipment, such as a cathode-ray tube (CRT) or liquid crystal display (LCD) for user for showing
Show information;Keyboard and pointing device such as mouse or trace ball, then help user to provide input directly in computer.It is other types of
Equipment can also be used for providing the interaction of user and computer;For example, it can be any form that computer, which is supplied to the feedback of user,
Sense feedback, for example, visual feedback, audio feedback or touch feedback;The form that user is input to computer may be to appoint
What form, including acoustics, voice or sense of touch.
The present invention can be implemented in a computer system, after one such as one data server
End component such as the intermediate module (Middleware) of application server or has comprising one such as one comprising one
The client computer of one graphic user interface or by its user can be interactive with the implementation of the present invention network (Web) it is clear
Look at device front end assemblies or any this kind of rear end, intermediate or front end assemblies combinations.The component of the system can be by such as
Any form of communication network or the digital data communications of medium are connected with each other.The example of communication network includes local network (LAN)
With wide area network (WAN), such as internet.
This computing system can include client and server.Client and server is typically remote from other side, usually passes through
One communication network interaction.The relationship of client and server result from run on respective computer respectively have client and
The calculation procedure of service relation.
Fig. 9 demonstrates an example of such computer, and one can be used for being practiced or carried out the device of the invention
Or the block diagram of the programmable processing system (abbreviation system) 910 of method.The system 910 include a processor 920, one
A random access memory (RAM) 921, (the writeable of such as one such as flash ROM read-only is deposited for read-only memory 922
Reservoir (ROM)), a hdd controller 923, the control of an image controller 931 and an input/output (I/O)
Device 924 processed, and coupled by processor (CPU) bus 925.The system 910 can be programmed such as in ROM, can also be used another
One program source (such as floppy disk, CD-ROM or another computer) loads a program so as to be programmed (and being reprogrammed).
Hdd controller 923 and hard disk 930 are coupled available for storing executable computer program.
The controller 924 of input/output is to be connected to an input/output interface by an input/output bus 926
927.Input/output interface 927 is in serial link, LAN, on wireless connection and parallel link etc. communication link, receive and
Transmit analog or digital data (such as one group of stage photo, picture, film and animation).
One display 928 and a keyboard 929 are also connected on input/output bus 926.In addition, different line
(different buses) can be used for connecting input/output interface 927, display 928 and keyboard 929.
The present invention is described in a manner of specific embodiment.Other embodiments the scope of the appended claims it
It is interior.For example, the step of invention, can perform, and remain to reach desired result in differing order.
Claims (10)
1. a kind of character string signature scan method, the method scans one or more features code with larger scanning step,
The shorter than described scannable condition code of scanning step is separately scanned with different scanning method.
2. a kind of character string signature scan method, the method scans one or more features code with larger scanning step,
The shorter than described scannable condition code of scanning step is separately scanned with same scan method but different scanning step-length.
3. a kind of character string signature scan system, the system comprises scan one or more features with larger scanning step
Code, the device that the shorter than described scannable condition code of scanning step is separately scanned with different scanning method.
4. a kind of character string signature scan system, the system comprises scan one or more features with larger scanning step
Code, the dress that shorter than described scanning step scannable condition code same scan method but different scanning step-length are separately scanned
It puts.
5. a kind of character string signature scan method, the method includes:
Multiple condition codes are divided into multiple feature code characters, wherein every group of condition code includes one or more features code;
Any condition code in a String field is identified, including being looked into multigroup scanning engine scanning String field
Multigroup condition code is looked for, wherein every group of scanning engine includes one or more identical scannings for being used to search one set of feature code
Engine to cover the position of one or more groups of String fields staggeredly, use by the wherein scanning engine of different scanning engine group
The number of the scanning engine of the shorter scanning engine group of different scanning steps and scanning step is longer compared with scanning step to sweep
The number for retouching the scanning engine of engine group is more;With
Export any matched condition code in the String field.
6. a kind of character string signature scan system, the system comprises:
One condition code preprocessing module that multiple condition codes can be divided into multiple feature code characters, wherein every group of condition code includes one
A or multiple condition codes;With
One signature scan module that can recognize that any condition code in a String field, described document information scan module
The String field is scanned to search the scanning engine of multigroup condition code including multigroup, wherein every group of scanning engine packet
It includes one or more identical for searching the scanning engine of one set of feature code to cover one or more groups of characters staggeredly
The position of string field, the wherein scanning engine of different scanning engine group are swept with different scanning steps and scanning step is shorter
The number for retouching the scanning engine of engine group is more compared with the number of the scanning engine of long scanning engine group compared with scanning step.
7. a kind of character string signature scan method, the method includes:
Multiple condition codes are divided into multiple feature code characters, wherein every group of condition code includes one or more features code;
Any condition code in a String field is identified, including being looked into multiple scanning engines scanning String field
Look for multigroup condition code, wherein each scanning engine is for searching one set of feature code, wherein different scanning engine at one or
The memory of friction speed is used in multiple scanning flow line stages;With
Export any matched condition code in the String field.
8. a kind of character string signature scan system, the system comprises:
One condition code preprocessing module that multiple condition codes can be divided into multiple feature code characters, wherein every group of condition code includes one
A or multiple condition codes;With
One signature scan module that can recognize that any condition code in a String field, described document information scan module
The String field is scanned to search the scanning engine of multigroup condition code including multiple, wherein each scanning engine is used
In searching one set of feature code, wherein different scanning engine is used in one or more scans flow line stage in friction speed
It deposits.
9. a kind of character string signature scan method, the method includes:
Multiple condition codes are divided into multiple feature code characters, wherein every group of condition code includes one or more features code;
Any condition code in a String field is identified, including being looked into multigroup scanning engine scanning String field
Multigroup condition code is looked for, wherein every group of scanning engine includes one or more identical scannings for being used to search one set of feature code
Engine to cover the position of one or more groups of String fields staggeredly, use by the wherein scanning engine of different scanning engine group
Different scanning step uses the memory of friction speed in flow line stages in one or more scan;With
Export any matched condition code in the String field.
10. a kind of character string signature scan system, the system comprises:
One condition code preprocessing module that multiple condition codes can be divided into multiple feature code characters, wherein every group of condition code includes one
A or multiple condition codes;With
One signature scan module that can recognize that any condition code in a String field, described document information scan module
The String field is scanned to search the scanning engine of multigroup condition code including multigroup, wherein every group of scanning engine packet
It includes one or more identical for searching the scanning engine of one set of feature code to cover one or more groups of characters staggeredly
The position of string field, the wherein scanning engine of different scanning engine group are with different scanning steps or in one or more scanning stream
The memory of friction speed is used in last pipeline stages.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711339378.4A CN108197470A (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410055830.4A CN103793522B (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
CN201711339378.4A CN108197470A (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
CN200880127748.0A CN101960469B (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
PCT/US2008/080457 WO2010047683A1 (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200880127748.0A Division CN101960469B (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108197470A true CN108197470A (en) | 2018-06-22 |
Family
ID=42119542
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711339378.4A Pending CN108197470A (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
CN200880127748.0A Expired - Fee Related CN101960469B (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200880127748.0A Expired - Fee Related CN101960469B (en) | 2008-10-20 | 2008-10-20 | Fast signature scan |
Country Status (2)
Country | Link |
---|---|
CN (2) | CN108197470A (en) |
WO (1) | WO2010047683A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5970546B2 (en) * | 2012-05-31 | 2016-08-17 | 株式会社オプトエレクトロニクス | Reading apparatus, reading result output method, and program |
EP3091450B1 (en) * | 2015-05-06 | 2017-04-05 | Örjan Vestgöte | Method and system for performing binary searches |
CN105095367B (en) * | 2015-06-26 | 2018-12-28 | 北京奇虎科技有限公司 | A method and device for collecting client data |
WO2020051895A1 (en) * | 2018-09-14 | 2020-03-19 | 西门子股份公司 | Data compression method, data restoration method and device |
US12159139B2 (en) * | 2021-09-28 | 2024-12-03 | Rakuten Mobile, Inc. | Logic-gate based non-deterministic finite automata tree structure for managing communication networks |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04306785A (en) * | 1991-04-03 | 1992-10-29 | Mitsubishi Electric Corp | Pattern recognition system |
CN1459761A (en) * | 2002-05-24 | 2003-12-03 | 清华大学 | Character identification technique based on Gabor filter set |
US20050086520A1 (en) * | 2003-08-14 | 2005-04-21 | Sarang Dharmapurikar | Method and apparatus for detecting predefined signatures in packet payload using bloom filters |
CN1625121A (en) * | 2003-12-05 | 2005-06-08 | 中国科学技术大学 | A Layered Cooperative Network Virus and Malicious Code Identification Method |
CN1648901A (en) * | 2005-02-03 | 2005-08-03 | 中国科学院计算技术研究所 | Method and system for large-scale keyword matching |
CN1997011A (en) * | 2006-07-26 | 2007-07-11 | 白杰 | Data partition method and data partition device |
US20080010278A1 (en) * | 2006-07-06 | 2008-01-10 | Lukas Kencl | Substring detection system and method |
CN101165681A (en) * | 2006-10-17 | 2008-04-23 | 中兴通讯股份有限公司 | Character string matching information processing method in communication system |
CN101194256A (en) * | 2004-11-12 | 2008-06-04 | 谷歌公司 | Method and system for autocompletion for languages having ideographs and phonetic characters |
CN101243441A (en) * | 2005-06-21 | 2008-08-13 | 国际字符股份有限公司 | Method and apparatus for processing character streams |
US20080201140A1 (en) * | 2001-07-20 | 2008-08-21 | Gracenote, Inc. | Automatic identification of sound recordings |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ATE338988T1 (en) * | 1994-07-26 | 2006-09-15 | Siemens Energy & Automation | METHODS AND SYSTEMS FOR GENERATING AND AUTHENTICATION OF UNCHANGABLE SELF-VERIFICATION ITEMS |
WO2002091145A1 (en) * | 2001-05-08 | 2002-11-14 | Ip.Com, Inc. | Method and apparatus for collecting electronic signatures |
CN1567174A (en) * | 2003-06-09 | 2005-01-19 | 吴胜远 | Method for expressing and processing object and apparatus thereof |
CN1972292B (en) * | 2005-10-17 | 2012-09-26 | 飞塔公司 | Systems and methods for processing electronic data |
US20080005578A1 (en) * | 2006-06-29 | 2008-01-03 | Innovya Research & Development Ltd. | System and method for traceless biometric identification |
-
2008
- 2008-10-20 CN CN201711339378.4A patent/CN108197470A/en active Pending
- 2008-10-20 CN CN200880127748.0A patent/CN101960469B/en not_active Expired - Fee Related
- 2008-10-20 WO PCT/US2008/080457 patent/WO2010047683A1/en active Application Filing
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04306785A (en) * | 1991-04-03 | 1992-10-29 | Mitsubishi Electric Corp | Pattern recognition system |
US20080201140A1 (en) * | 2001-07-20 | 2008-08-21 | Gracenote, Inc. | Automatic identification of sound recordings |
CN1459761A (en) * | 2002-05-24 | 2003-12-03 | 清华大学 | Character identification technique based on Gabor filter set |
US20050086520A1 (en) * | 2003-08-14 | 2005-04-21 | Sarang Dharmapurikar | Method and apparatus for detecting predefined signatures in packet payload using bloom filters |
CN1625121A (en) * | 2003-12-05 | 2005-06-08 | 中国科学技术大学 | A Layered Cooperative Network Virus and Malicious Code Identification Method |
CN101194256A (en) * | 2004-11-12 | 2008-06-04 | 谷歌公司 | Method and system for autocompletion for languages having ideographs and phonetic characters |
CN1648901A (en) * | 2005-02-03 | 2005-08-03 | 中国科学院计算技术研究所 | Method and system for large-scale keyword matching |
CN101243441A (en) * | 2005-06-21 | 2008-08-13 | 国际字符股份有限公司 | Method and apparatus for processing character streams |
US20080010278A1 (en) * | 2006-07-06 | 2008-01-10 | Lukas Kencl | Substring detection system and method |
CN1997011A (en) * | 2006-07-26 | 2007-07-11 | 白杰 | Data partition method and data partition device |
CN101165681A (en) * | 2006-10-17 | 2008-04-23 | 中兴通讯股份有限公司 | Character string matching information processing method in communication system |
Non-Patent Citations (2)
Title |
---|
N. MEKUZ 等: "Adaptive Step Size Window Matching for Detection", 《 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR"06)》 * |
宋明秋 等: "入侵检测多模式匹配算法", 《计算机工程》 * |
Also Published As
Publication number | Publication date |
---|---|
CN101960469B (en) | 2014-03-26 |
WO2010047683A1 (en) | 2010-04-29 |
CN101960469A (en) | 2011-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10949641B2 (en) | Fast signature scan | |
US6892207B2 (en) | Method of updating data in a compressed data structure | |
US10169425B2 (en) | Fast identification of complex strings in a data stream | |
US9619585B2 (en) | Fast, scalable dictionary construction and maintenance | |
CN111460311A (en) | Search processing method, device and equipment based on dictionary tree and storage medium | |
US20070198566A1 (en) | Method and apparatus for efficient storage of hierarchical signal names | |
Yang et al. | Pase: Postgresql ultra-high-dimensional approximate nearest neighbor search extension | |
US20070255748A1 (en) | Method of structuring and compressing labeled trees of arbitrary degree and shape | |
US20080319987A1 (en) | System, method and program for creating index for database | |
WO2003065175A2 (en) | A system and method for real time interface translation | |
US20020093522A1 (en) | Methods of encoding and combining integer lists in a computer system, and computer software product for implementing such methods | |
CN107111623A (en) | Parallel historical search and coding for the compression based on dictionary | |
EP1352344A2 (en) | Efficient searching techniques | |
US10810258B1 (en) | Efficient graph tree based address autocomplete and autocorrection | |
US10275486B2 (en) | Multi-system segmented search processing | |
CN108197470A (en) | Fast signature scan | |
CN103793522B (en) | Fast signature scan | |
CN111813744A (en) | File searching method, device, equipment and storage medium | |
US10949465B1 (en) | Efficient graph tree based address autocomplete and autocorrection | |
US20080306948A1 (en) | String and binary data sorting | |
JP3728264B2 (en) | Index creation apparatus, search system, and control method | |
JP3062119B2 (en) | Character string search table, method for creating the same, and character string search method | |
CN104484381B (en) | For searching for the method and system of multiple strings | |
CN118245640A (en) | Multimodal data verifiable query method and system based on blockchain | |
Dorfman et al. | Black Belt Hashigana |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20180622 |