US8773909B2 - CAM NAND with or function and full chip search capability - Google Patents
- Publication number: US8773909B2 (application US 13/957,219)
- Authority
- US
- United States
- Prior art keywords
- data
- word lines
- nand
- search
- memory
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C15/00—Digital stores in which information comprising one or more characteristic parts is written into the store and in which information is read-out by searching for one or more of these characteristic parts, i.e. associative or content-addressed stores
- G11C15/04—Digital stores in which information comprising one or more characteristic parts is written into the store and in which information is read-out by searching for one or more of these characteristic parts, i.e. associative or content-addressed stores using semiconductor elements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/56—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency
- G11C11/5621—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using storage elements with more than two stable states represented by steps, e.g. of voltage, current, phase, frequency using charge storage in a floating gate
- G11C11/5642—Sensing or reading circuits; Data output circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C15/00—Digital stores in which information comprising one or more characteristic parts is written into the store and in which information is read-out by searching for one or more of these characteristic parts, i.e. associative or content-addressed stores
- G11C15/04—Digital stores in which information comprising one or more characteristic parts is written into the store and in which information is read-out by searching for one or more of these characteristic parts, i.e. associative or content-addressed stores using semiconductor elements
- G11C15/046—Digital stores in which information comprising one or more characteristic parts is written into the store and in which information is read-out by searching for one or more of these characteristic parts, i.e. associative or content-addressed stores using semiconductor elements using non-volatile storage elements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/04—Erasable programmable read-only memories electrically programmable using variable threshold transistors, e.g. FAMOS
- G11C16/0483—Erasable programmable read-only memories electrically programmable using variable threshold transistors, e.g. FAMOS comprising cells having several storage transistors connected in series
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
- G11C16/22—Safety or protection circuits preventing unauthorised or accidental access to memory cells
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
- G11C16/34—Determination of programming status, e.g. threshold voltage, overprogramming or underprogramming, retention
- G11C16/3436—Arrangements for verifying correct programming or erasure
- G11C16/3454—Arrangements for verifying correct programming or for detecting overprogrammed cells
- G11C16/3459—Circuits or methods to verify correct programming of nonvolatile memory cells
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C2213/00—Indexing scheme relating to G11C13/00 for features not covered by this group
- G11C2213/70—Resistive array aspects
- G11C2213/75—Array having a NAND structure comprising, for example, memory cells in series or memory elements in series, a memory element being a memory cell in parallel with an access transistor
Definitions
- This invention relates generally to non-volatile memories and, more specifically, to using non-volatile memory of a NAND-type architecture in operations related to content addressable memory (CAM).
- CAM: content addressable memory
- Content addressable memories, also known as associative memories, differ from standard memories in the way that data is addressed and retrieved.
- In a conventional memory, an address is supplied and the data located at this specified address is retrieved.
- In a content addressable memory (CAM), data is written as a key-data pair.
- A search key is supplied and all the keys in the memory are searched for a match. If a match is found, the corresponding data is retrieved.
- CAMs can be implemented in several ways.
- In one approach, a CAM is implemented using a conventional memory and an associated CPU that searches through the memory to find a matching key.
- The keys in the memory may be sorted, in which case a binary search can be used; or they can be unsorted, in which case they are usually hashed into buckets and each bucket is searched linearly.
- A CAM can also be implemented as a semiconductor memory, where every memory location contains an n-bit comparator. When an n-bit key is provided, each entry in the CAM compares the search key with the entry's key and signals a match if the two are equal.
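The two software implementation styles described above can be sketched in a few lines of Python. This is an illustrative model only, not part of the patent; the function names and the modulo hash are assumptions for the example. The linear version mirrors the semiconductor CAM, where every entry compares itself against the search key (in hardware, in parallel; here, in a loop); the hashed version mirrors the CPU-plus-conventional-memory approach.

```python
def cam_lookup_linear(entries, search_key):
    """Hardware-CAM style: compare the search key against every stored
    (key, data) pair. In a semiconductor CAM these comparisons happen in
    parallel; here they are modeled sequentially."""
    return [data for key, data in entries if key == search_key]

def cam_lookup_hashed(buckets, search_key, hash_fn):
    """CPU-plus-memory style: hash the key into a bucket, then search
    only that bucket linearly."""
    bucket = buckets.get(hash_fn(search_key), [])
    return [data for key, data in bucket if key == search_key]
```

For the hashed variant the buckets would be built up front, e.g. `buckets.setdefault(key % 4, []).append((key, data))` for a toy 4-bucket table.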
- a memory circuit has an array of non-volatile memory cells arranged along a plurality of word lines and a plurality of M bit lines into a NAND type of architecture, the array formed of multiple blocks, each of the blocks including a plurality of M NAND strings connected along a corresponding one of the M bit lines and each having a plurality of N memory cells connected in series with a plurality of N word lines spanning the M NAND strings, each of the N word lines connected to a corresponding one of the N memory cells of the NAND strings of the block.
- The circuit also includes word line driving circuitry connectable to the word lines, whereby one or more word lines in a plurality of blocks can concurrently and individually be set to one of a plurality of data dependent read values corresponding to a respective data pattern for each of the plurality of blocks.
- Sensing circuitry is connectable to the M bit lines to individually determine those of the M bit lines where at least one of the NAND strings connected there along is conducting in response to the word line driving circuitry applying said respective data patterns to the corresponding plurality of blocks.
- Additional aspects relate to a method of operating a memory system, where the memory system includes a memory circuit having an array of non-volatile memory cells arranged into a plurality of blocks, each of the blocks including a first plurality of NAND strings and a plurality of word lines spanning the NAND strings, each of the word lines connected to a corresponding one of the memory cells thereof, where a first plurality of bit lines span the blocks with each of the blocks having one of the NAND strings thereof connected along a corresponding one of the bit lines.
- the method includes receiving for each of a first plurality of the blocks a corresponding first search data pattern from a host device to which the memory system is connected.
- One or more word lines of the first plurality of blocks are concurrently biased according to the corresponding first search data patterns.
- The method concurrently determines those of the bit lines that conduct in response to the one or more word lines of the first plurality of blocks being concurrently biased according to the corresponding first search data patterns.
- the performing of a first determination includes: biasing the corresponding first set of the word lines according to the first sub-pattern; and concurrently determining those of the NAND strings that conduct in response to the corresponding first set of the word lines biased according to the first sub-pattern being applied thereto.
- The performing of a second determination includes: biasing the corresponding second set of the word lines according to the second sub-pattern; and concurrently determining those of the NAND strings that conduct in response to the corresponding second set of the word lines biased according to the second sub-pattern being applied thereto.
- the method subsequently combines the results of the first and second determinations to determine those of the NAND strings that conduct in response to the word lines biased according to the first and second sub-patterns being applied thereto.
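The split-determination method above can be modeled in a short sketch, under stated assumptions: cells are represented by their physical states (1 = erased, which conducts at either read voltage; 0 = programmed, which conducts only at the high read voltage, bias 1), a NAND string conducts only if every addressed cell conducts, and the two partial results are combined by set intersection. Names and the data layout are illustrative, not from the patent.

```python
def conducts(cell_bits, wl_bias_bits):
    """An erased cell (1) conducts at either read voltage; a programmed
    cell (0) conducts only when its word line bias is high (1)."""
    return all(c == 1 or b == 1 for c, b in zip(cell_bits, wl_bias_bits))

def match_by_halves(columns, first_wls, second_wls, sub1, sub2):
    """columns: per-NAND-string cell states, indexed by word line.
    Sense the first word line set under sub-pattern sub1, then the
    second set under sub2, and combine the two determinations: a string
    matches the full pattern only if it conducted in both sensings."""
    first = {i for i, col in enumerate(columns)
             if conducts([col[w] for w in first_wls], sub1)}
    second = {i for i, col in enumerate(columns)
              if conducts([col[w] for w in second_wls], sub2)}
    return first & second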
- Aspects of a particular set of embodiments concern a method of operating a memory system, the memory system including a memory circuit having an array of non-volatile memory cells arranged into a plurality of blocks, each of the blocks including a first plurality of NAND strings and a plurality of word lines spanning the NAND strings, each of the word lines connected to a corresponding one of the memory cells thereof, where a first plurality of bit lines span the blocks with each of the blocks having one of the NAND strings thereof connected along a corresponding one of the bit lines.
- The method includes storing a plurality of data patterns, each along a bit line of the memory array, where each of the data patterns is stored in inverted and non-inverted forms along the same bit line, the bits of the non-inverted form of the data patterns being stored along a first set of word lines and the bits of the inverted form of the data patterns being stored along a second set of word lines.
- the memory system receives a search pattern and performs a determination of whether any of the stored data patterns match the search pattern.
- the determination includes applying the search pattern to the first set of word lines and applying the search pattern in inverted form to the second set of word lines, wherein the search pattern and inverted form of the search pattern are applied in a plurality of sensing operations in which a plurality of non-adjacent word lines for one or more blocks are concurrently biased to a data dependent voltage level based upon the search pattern.
- Aspects of another set of embodiments concern a method of operating a memory system, the memory system including a memory circuit having an array of non-volatile memory cells arranged into a plurality of blocks, each of the blocks including a first plurality of NAND strings and a plurality of word lines spanning the NAND strings, each of the word lines connected to a corresponding one of the memory cells thereof, where a first plurality of bit lines span the blocks with each of the blocks having one of the NAND strings thereof connected along a corresponding one of the bit lines.
- The method includes storing a plurality of data patterns, each along a bit line of the memory array, where for each of the data patterns a first copy and a second copy are stored along the same bit line, the bits of the first copy being stored along a first set of word lines and the bits of the second copy being stored along a second set of word lines.
- the memory system receives a data search query and performs a search operation on the stored data patterns based upon the data search query.
- the search operation includes concurrently biasing first and second word lines respectively from the first and second sets of word lines to a voltage level based upon the data search query, where the first and second word lines correspond to the same bit of the data patterns respectively from the first and second copies thereof.
- A method is presented of operating a memory system including a memory circuit having an array of non-volatile memory cells formed according to a NAND type of architecture, where each of a first plurality of data patterns is stored along one of a corresponding first plurality of NAND strings of the memory array.
- A first bloom filter corresponding to the first plurality of data patterns can also be stored on the memory circuit.
- An ECC protected copy of each of the first plurality of data patterns is also stored on the memory circuit.
- a search operation is then performed, where the search operation includes: receiving a search pattern; performing a comparison of the search pattern against the first bloom filter; in response to a positive result from the comparison of the search pattern against the first bloom filter, performing a determination of which of the first plurality of NAND strings conduct in response to the memory cells thereof being biased according to the search pattern; and in response to a negative result from the determination, searching the copies of the first plurality of data patterns for a match with the search pattern.
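The three-stage search flow just described (Bloom filter pre-check, parallel NAND match, ECC-protected fallback) can be sketched as follows. This is a software model under assumptions: the Bloom filter parameters, the SHA-256-based hash choices, and the use of simple equality to stand in for the NAND sensing are all illustrative, not from the patent.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions in an m-bit array."""
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: bytes):
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits >> p & 1 for p in self._positions(key))

def search(patterns, bloom, ecc_copies, search_key: bytes):
    if not bloom.might_contain(search_key):
        return None                      # definite miss: skip the sensing
    # Models the parallel NAND determination of conducting strings:
    hits = [i for i, p in enumerate(patterns) if p == search_key]
    if hits:
        return hits
    # Sensing found nothing (e.g. bit errors in the stored patterns):
    # fall back to searching the ECC-protected copies.
    return [i for i, p in enumerate(ecc_copies) if p == search_key] or None
```

The Bloom filter cheaply screens out most misses before any sensing, while the ECC copies guard against false negatives from bit errors in the CAM-searched patterns.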
- Other aspects concern a method of operating a memory circuit including an array of non-volatile memory cells having a NAND type of architecture and include receiving one or more indices and writing the one or more indices along a corresponding one or more NAND strings of a first block of the memory array. Subsequently, a plurality of word lines of the array are individually read back to determine the value of the corresponding bit stored there along for each of the indices. For each of the word lines read back, a comparison is performed of the value of the corresponding bit stored there along with the value of the corresponding bit as received for each of the indices and, based upon the comparisons, the method determines whether any of the bits as read back contain error. The error determined based upon the comparisons for the plurality of word lines is individually accumulated for each NAND string to determine whether the bits of the index stored along the corresponding NAND string contain one or more errors.
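The read-back verification above — reading each word line in turn, comparing against the indices as received, and accumulating errors per NAND string — can be sketched directly. The function name and the bit-list data layout are assumptions for illustration.

```python
def accumulate_errors(written_indices, read_back_rows):
    """written_indices: one bit-list per NAND string (column), as received.
    read_back_rows: one list per word line, each entry being the bit read
    on that word line for the corresponding string. Returns the error
    count accumulated individually for each NAND string."""
    n_strings = len(written_indices)
    errors = [0] * n_strings
    for wl, row in enumerate(read_back_rows):
        for col in range(n_strings):
            # Compare the bit as read back against the bit as received.
            if row[col] != written_indices[col][wl]:
                errors[col] += 1
    return errors  # strings with errors[col] > 0 hold a corrupted index
```

A string whose accumulated count is nonzero contains one or more errors and can, for example, be rewritten elsewhere.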
- FIG. 1 is a schematic representation of a NAND array used as a CAM memory.
- FIG. 2 is a schematic illustration of the network of some of the elements to supply the word line in a NAND array for conventional operation.
- FIG. 3 is a schematic illustration of the network of some of the elements to supply the word line in a NAND array for CAM operation.
- FIG. 4 shows one embodiment for how keys can be written along bit lines of a NAND array and searched.
- FIG. 5 gives some detail on how a key/inverse pair from FIG. 4 is programmed into a pair of NAND strings.
- FIGS. 6A-C show another embodiment for how keys can be written along bit lines of a NAND array and searched.
- FIG. 7 shows an exemplary encoding of 2 bits per cell for four-state memory cell operation.
- FIG. 8 shows how the data states and the complementary data used for the inverted keys correspond in the 2-bit per cell example.
- FIG. 9 shows an example of how a key would be encoded onto a 4 cell NAND string on bit line BL and its inverse on bit line BLB.
- FIG. 10 illustrates the process of matching content in the word line direction.
- FIG. 11 illustrates how the position of a conducting bit line can be used as an index into another table that can be used to retrieve data associated with the target key.
- FIG. 12 schematically illustrates how a key-value pair is stored in a NAND based CAM and how the value is accessed using the key.
- FIG. 13 illustrates a memory arrangement for transposing the data keys.
- FIG. 14 represents a first hardware embodiment for transposing data using a FIFO-type structure.
- FIG. 15 represents another hardware embodiment for transposing data.
- FIG. 16 shows one embodiment of a memory system incorporating a CAM type NAND into a solid state drive (SSD) for performing data analytics within the memory system.
- FIG. 17 illustrates how data analytics with numerical range detection can be performed by exploiting an array's NAND structure.
- FIG. 18 is an example of data latch assignments for the process illustrated by FIG. 17 .
- FIGS. 19 and 20 illustrate some steps of two search processes.
- FIGS. 21 and 22 illustrate a maximum and a minimum search operation.
- FIGS. 23 and 24 respectively give a schematic representation of an on-chip arithmetical operation and a corresponding latch utilization.
- FIGS. 25A-C illustrate some detail of how arithmetic operations can be performed.
- FIGS. 26A and 26B show how more latches can be used to perform arithmetic operations involving more n
- FIGS. 27 and 28 illustrate an application to financial data analysis.
- FIGS. 29-31 show some examples of how a data set can be placed on more than one NAND string, and corresponding latch structures.
- FIGS. 32 and 33 respectively illustrate digital and analog counting techniques for analytics results.
- FIG. 34 gives an example of file mapping for performing analytics on large file systems.
- FIG. 35 illustrates a conventional architecture for server CPU based data analytics.
- FIG. 36 illustrates an architecture using computational NAND in data analytics.
- FIG. 37 shows a more detailed hardware architecture for a part of the data analytics system.
- FIGS. 38A-D illustrate various aspects related to performing a “price summary report” query.
- FIGS. 39A and 39B are a schematic representation of the division of tasks between the controller and NAND memory for the query of FIGS. 38A-D .
- FIGS. 40A-D illustrate various aspects related to performing a “minimum cost supplier” query.
- FIGS. 41A and 41B are a schematic representation of the division of tasks between the controller and NAND memory for the query of FIGS. 40A-D .
- FIG. 42 illustrates the data structure using column and row storage.
- FIG. 43 presents the relationship between the building blocks in the analytic system.
- FIG. 44 is a schematic representation of a server based map reduce data analysis and how this can be moved to a CAM NAND based analytic system.
- FIG. 45A illustrates a hardware architecture for a Hadoop arrangement and how CAM NAND can be incorporated.
- FIG. 45B compares performance versus storage capacity for NAND based computing with CPU based computing.
- FIG. 46 illustrates the use of CAM NAND memory based analytics for the sort and merge of unstructured data.
- FIG. 47 schematically illustrates an example of when the sort is partially done in the SSD ASIC.
- FIGS. 48-54 provide examples of analytic operations performed by the CAM NAND based data analytics system.
- FIG. 55 illustrates the OR function and how this can be used to add functionality to the CAM NAND structure and enable whole chip search functions.
- FIG. 56 illustrates an example of doing a full chip search simultaneously applying the same search key to all blocks.
- FIG. 57 schematically illustrates a binary search to determine the block to which a conducting NAND string belongs.
- FIG. 58 illustrates an example of using the inherent OR functionality between different searches.
- FIGS. 59A and 59B illustrate an example of doing a JOIN using AND function between data registers.
- FIG. 60 shows an example of a JOIN operation.
- FIG. 61 illustrates two blocks of NAND memory, with a set of data written in the first block and the inverse of the data written in the second block.
- FIG. 62 looks at the possible set of results from a sensing operation on the data arrangement of FIG. 61 .
- FIG. 63 illustrates a shift in the read levels that can be used for the data/data bar arrangement.
- FIG. 64 shows the writing of duplicated data into two different blocks.
- FIG. 65 illustrates a shift in the read levels that can be used for the duplicated data arrangement.
- FIG. 66 illustrates the conditions under which a memory is verified during a write operation.
- FIG. 67 looks at the word line to word line effect of simultaneously sensing multiple word lines.
- FIG. 68 looks at avoiding the upward distribution shift due to adjacent word line voltage bias.
- FIG. 69 illustrates some of the steps of data manipulation for a 64-bit index that is to be stored as data/data bar.
- FIG. 70 illustrates the effects of word line to word line coupling with duplicated data arranged in different block.
- FIG. 71 illustrates a multi-block sensing example.
- FIG. 72 is a table to illustrate the OR-ing of read results with duplicate data.
- FIG. 73 illustrates some of the steps of data manipulation for a 64-bit index that is to be stored as duplicated data.
- FIG. 74 summarizes some of the different cases of multiple word line/multiple block sensing.
- FIG. 75 is a schematic representation of including a bloom filter to improve error protection.
- FIGS. 76A and 76B illustrate index and meta-data programming flows.
- FIGS. 77A-D look at several examples of index-meta-data link cases.
- The following presents a Flash based NAND memory array used as a content addressable memory (CAM), which can be realized in both binary and ternary embodiments.
- keys can be programmed along the bit lines of a block.
- the search key is then input along the word lines of the blocks, so that a bit line on which a corresponding key has been programmed will be conducting. This allows for all the keys of a block to be checked at the same time.
- The typical way in which a NAND memory array is read is that data is read out a single word line (or portion of a word line) at a time, with the non-selected word lines along the NAND strings biased so that they are fully turned on regardless of their data state, removing the non-selected memory cells from affecting the read operation. In this way, the data content of the memory is read out a page (the unit of read) at a time.
- Here, in contrast, all of the word lines are set to a specific data dependent value, where the data is the key, and the memory determines which bit lines then conduct, thereby determining which particular bit lines correspond to the input key, rather than the data of individual cells.
- EEPROM based flash memory is used as the exemplary embodiment, but other memory devices having a NAND type of architecture, including 3D NAND (such as described in T. Maeda et al., “Multi-stacked 1G cell/layer Pipe-shaped BiCS flash memory”, 2009 Symposium on VLSI Circuits, pages 22-23), can also be used.
- In a write operation, each cell is either left in an erased state or has charge placed on its floating gate to put the cell in a programmed state; here these are respectively taken as the 1 and 0 states.
- When a low value of the read voltage is applied to its control gate, only a cell in the erased, or 1, state will conduct.
- For a cell in the programmed, or 0, state to conduct, a high value of the read voltage needs to be applied to the control gate.
- the keys will be arranged along bit lines of a block of the memory array. Since a cell in the 1 state will conduct for either read voltage, each key needs to be written twice, in inverted and non-inverted form.
- this can be done by writing the target key along one bit line and its inverse along another, or writing half the bit line with the (non-inverted) target key and the other half of the bit line with the inverted target key.
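The two layouts just described can be captured in a tiny helper. This is an illustrative sketch, not the patent's circuitry; the function name and the `layout` parameter are assumptions. It produces the key and its bitwise inverse, either as a pair (for separate BL/BLB bit lines) or concatenated (for the top/bottom halves of a single bit line).

```python
def encode_key(key_bits, layout="paired"):
    """Produce the data/data-bar pair for one key.
    'paired': BL holds the key, BLB its inverse (two bit lines).
    'same_bitline': key in the top half, inverse in the bottom half."""
    inverse = [1 - b for b in key_bits]
    if layout == "paired":
        return key_bits, inverse
    if layout == "same_bitline":
        return key_bits + inverse
    raise ValueError(layout)
```

Writing both polarities guarantees that, for every key bit, one of the two copies holds a programmed (0) cell, which is what makes mismatch detection possible when all word lines are biased at once.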
- More key information can be compressed into the NAND chain using multi-bit programming. For example, in a 2-3 bits per cell case, the key can be sorted in the controller RAM and the bits programmed as lower, (middle) or upper pages. The following discussion will mostly be given in terms of a binary embodiment, with some specifics of the multi-state case discussed later.
- Target keys Key 0, Key 1, . . . are programmed down bit lines BL0, BL1, . . . of a NAND block. Data is programmed in a separate location that can be indexed by the target key's column address number.
- The search key is broadcast on the block's word lines by setting each of the word lines to either the high or low read voltage according to the search key. (In addition to setting the word line voltages according to the key, the select gates at the end of the NAND string will also need to be turned on.)
- Each bit line effectively compares itself to the word line key pattern, for all of the bit lines in the block at the same time.
- If a bit line's key matches the search key, the whole of the bit line will conduct and a “1” will be read out.
- The key can be the hash code of a data page, leading to the right data page via the column address of the matched NAND chain.
- each 16 KB, say, of content can generate a corresponding hash code that can be stored along the NAND chain.
- The data page will then be compared with the comparison data along the word line to avoid hash collision cases.
- The content along the word line may not be a hash value, but characteristics of the data elements that can be searched as keys to data; or the bit lines themselves may be the elements of the data themselves, rather than a pointer to a database.
- NAND flash memory has relatively large latencies in its operation, but in many applications this would more than be offset by the number of keys (bit lines) that can be checked in parallel (128K, for example).
- The process can all be done on-chip and, as only the bit lines that meet the matching case conduct current, power consumption is relatively low; compared to toggling out all of the data from the memory and doing the compare in the controller, it is a process of relatively lower power and higher speed.
- an exemplary embodiment can be based on a flash memory where the indices are saved on the 128 Gb NAND chains.
- An all bit line (ABL) architecture is used where one sensing operation will perform a match operation on all of the indices in a block at the same time.
- Extra column redundancy is included to avoid any bad columns (more detail on such redundancy and the accessing of columns, as well as flash memory in general, can be found in the following US patent publication/application numbers: US-2005-0141387-A1; US-2008-0266957-A1; US-2011-0002169-A1; US-2010-0329007-A1; 13/463,422; and 13/420,961.)
- partial page programming can be used to write part of the keys, with more added later.
- Such partial page programming is typically more limited for multi-states implementations than in binary blocks.
- The data can be shifted onto the memory and the inverted data can be generated on the memory to save effort on the controller for these data manipulations, so that the data and data bar can be written without shifting in the data twice, with the data being written first and the generated inverse next.
- Both the keys and the data can be input into the memory system, or in some cases the keys could be generated on the memory system by the controller from the data, such as by generating hash values from the data to use as keys.
- If the keys are to be sorted before being written along the bit lines, this will typically be done on the controller due to the amount of data involved, such as multiple blocks' worth of data.
- the data could initially be written in a particular area, say die 0, plane 0, blocks 0-15, and then sorted and written into the blocks having been sorted to the block level.
- the keys could be assembled in RAM (either on the controller or on a separate chip) or cache NAND memory (such as described in U.S. provisional application No. 61/713,038) before sorting them to the desired level of granularity and writing them into a set of blocks.
- The data/data bar pairs can be written on two bit lines or on a single bit line.
- the pairs can be written next to each other or in other patterns, such as writing the data bit lines in one area and the inverted data bit lines in another zone.
- If both parts of the pair are written on the same bit line, as discussed below with respect to FIG. 6A , they can be written in a top/bottom format or interleaved.
- the matched index can then be linked to other data corresponding to the determined column address; for instance, the keys could be a hash value, such as from a Secure Hash Algorithm (SHA), used to point to the actual data that can also be stored elsewhere on the memory itself.
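A hash-keyed lookup of this kind can be sketched as follows. This is a software model under assumptions: the key width, the use of SHA-256 (the source mentions the SHA family generally), the helper names, and the equality test standing in for the parallel sensing are all illustrative. The column address of the matching key indexes a table of data locations.

```python
import hashlib

def make_key(page: bytes, key_bits: int = 32):
    """Derive a fixed-width key from a content page via SHA-256."""
    digest = hashlib.sha256(page).digest()
    v = int.from_bytes(digest[:4], "big")
    return [(v >> i) & 1 for i in range(key_bits)]

def lookup(keys_by_column, data_by_column, page: bytes):
    """Hash the query page, find the matching stored key (models the
    parallel NAND sensing), and use its column address to index the
    associated data."""
    key = make_key(page)
    for col, stored in enumerate(keys_by_column):
        if stored == key:
            return data_by_column[col]
    return None
```

In a real deployment the retrieved data page would then be compared against the query content to rule out hash collisions, as the description notes.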
- All the matching can be done inside of the NAND chip and, when the match is found, the column address can also be transferred out if needed or just the data, if also stored on the NAND chip, can be transferred out.
- SHA: Secure Hash Algorithm
- each word line of the block needs to be set to either the high or low read voltage according to the search key. This is in contrast to typical NAND operation, where only a single word line at a time is selected for a read voltage, with all of the other word lines receiving a pass voltage sufficient to remove them from influencing the sensing regardless of their data state.
- FIG. 2 is a schematic illustration of the network of some of the elements to supply the word line in a NAND array for conventional operation.
- At 201 is the cell array for a plane of a NAND chip, with two blocks explicitly marked out at 203 and 205 .
- Each block's word lines are fed by a word line select gate WLSW 213 or 215 as controlled from select circuitry at 217 .
- the bit lines are not indicated, but would run down to the sense amp block S/A 207 .
- the various control gate voltage CGI are then supplied to the select gates 213 and 215 from the drivers CG drivers 231 and UCG drivers 233 and 235 by way of switches 223 and 225 , respectively.
- a block is taken to have 132 word lines, where a pair of dummy word lines are included on both the drain and source sides of the NAND strings.
- The UCG Drivers 233 and 235 supply the pass voltages used on unselected word lines during program, (standard, non-CAM) read, or verify operations. As this level is used on the large majority of word lines, these can be lumped together under a single driver.
- The selected control gates are biased to VPGM at program and to the CGR voltage at read or verify.
- CGI<126:1> are the decoded global CG lines.
- CGI<0> and CGI<127> are here biased differently from the other 126 word lines due to edge word line effects.
- the dummy word line bias CGD0/1 is for the drain side dummy word lines and CGDS0/1 is for the source side ones.
- the array 301 , blocks 303 and 305 , select circuitry 317 , CG Drivers 331 , and switches 313 and 315 can be the same as in FIG. 2 .
- the additional word line drivers are shown at 343 and 345 and can supply the word lines through respective switches at 353 and 355 .
- the level shifter HVLSHIFT receives the voltage VREAD and a digital value DFF(0/1) for each word line. The level shifter then converts the digital values of 0, 1 for the broadcast key to the analog high and low word line levels.
- The other circuitry sketched out in FIG. 2 will still be present, though not shown in FIG. 3 to simplify the discussion. It may also be preferable to make some changes to the sensing circuitry S/A 307 to more efficiently perform the XOR operation described below between the pairs of bit lines holding a key and its inverse.
- FIG. 4 shows the encoding of the keys along bit lines, where the key is entered twice, in non-inverted and inverted form.
- bit lines are labeled BL for the non-inverted key and BLB for the inverted version.
- the pairs are shown as adjacent; this need not be the case, but adjacency will typically make XOR-ing and keeping track of the data easier. Also, this arrangement readily lends itself to NAND arrays using an odd/even BL arrangement.
- for defective bit lines, the bit line is either stuck at “0” or stuck at “1” regardless of the word line voltage bias.
- the XOR results between the two read results will always yield “1”.
- the BL and BLB data pattern thus prevents defective bit lines from mistakenly yielding match results. In this example, only seven word lines are used.
- a more interesting key of (1001101) is entered on BLn+1, with its inverted version at BLBn+1, as also illustrated in FIG. 5 .
- FIG. 5 shows the two corresponding NAND strings, where 0 is a programmed cell, 1 a cell left in its erased state, the cells being connected in series down the NAND strings to the common source line CELSRC.
- This key is encoded as low read voltage for the 0 entries and high read voltage for the 1s.
- the search key is shown at the left of the top of FIG. 5 .
- BLn+1 is conducting (and BLBn+1 is non-conducting), as shown by the “c” (and “nc”) in the sense 1 row.
- BL1 and BLBn are also both conducting, as a cell in the 1 state will conduct for either read value.
- the second sensing (these can be performed in either order) is then made with the search reversed.
- while BL1 and BLBn are still conducting, the result from the key actually sought has changed: BLn+1 is now non-conducting and BLBn+1 conducts.
- the sought key will give a 0 on the corresponding bit line and also on its inverted version. Consequently, by searching for the 00 pattern in the XOR data, the output column address can be found and the corresponding data block accessed.
- two reads are needed for the pattern match and internal pattern detection on the NAND device can judge if there is a match.
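As a sketch of how the two sensings combine, the following minimal Python simulation uses assumed conventions (an erased cell stores 1 and conducts at either level; a programmed cell stores 0 and conducts only at the high read level; a search bit of 0 maps to the high level). Under these assumptions, only the column whose BL flips from conducting to non-conducting while its BLB flips the opposite way holds the sought key; stuck bit lines and all-erased strings never flip, so they are screened out.

```python
HIGH, LOW = 1, 0  # word line read levels

def conducts(stored, voltages):
    """A NAND string conducts only if every cell on it conducts: an erased
    cell (stores 1) conducts at either level; a programmed cell (stores 0)
    conducts only under the high read voltage."""
    return all(b == 1 or v == HIGH for b, v in zip(stored, voltages))

def broadcast(search):
    """One sensing pass: search bit 0 -> high read voltage, 1 -> low."""
    return [HIGH if s == 0 else LOW for s in search]

def match_columns(keys, search):
    """Each key is stored on a bit line BL with its inverse on BLB.
    Two sensings are made, with the search key and then its inverse; a
    match is the column whose BL flips conducting -> non-conducting while
    its BLB flips the other way."""
    inv = lambda bits: [1 - b for b in bits]
    v1, v2 = broadcast(search), broadcast(inv(search))
    hits = []
    for col, k in enumerate(keys):
        bl1, blb1 = conducts(k, v1), conducts(inv(k), v1)
        bl2, blb2 = conducts(k, v2), conducts(inv(k), v2)
        if bl1 and not bl2 and blb2 and not blb1:
            hits.append(col)
    return hits
```

With the FIG. 5 key (1001101) stored alongside an all-1s (all-erased) key, only the column holding the sought key is reported; the all-erased string conducts in both passes and so is excluded, as in the text.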
- the BL/BLB pairs provide redundancy to help protect against bad bit lines, and a second pair can also be kept for further protection.
- a copy of the key can also be kept with any associated data and used to check the match, where this copy can be ECC protected. Additional protection can also be provided by each bit line including several (8, for example) parity bits for error detection and correction purposes, where these redundancy bits are preferably on the same word lines for all of the keys so that the parity bits can either be read or excluded from the comparisons by use of a “don't care” value applied to those word lines, as described below.
- this parity data can be read when checking the data, as part of a post-write read or other data integrity check, but ignored during CAM-type operations.
- a post-write read can be used to ensure that the keys have been successfully written into the NAND memory, as any error bits could prevent a NAND string from conducting and would give rise to “false negatives” when matching. If an error is found, the bad data can be rewritten. In the exemplary NAND flash example, the incorrectly written data can be rewritten to another data block and any key-data correspondences updated accordingly. More detail on post-write read operations can be found in U.S. patent application Ser. No. 13/332,780 and references cited therein.
- FIG. 6A illustrates a second embodiment for how the key can be stored along a bit line.
- both the key and its inverse are written onto the same bit line.
- FIG. 6A shows 14 different word lines, with the keys entered in the top half and the inverted versions of these same keys entered in the bottom half of the same bit lines.
- rows 1-7 hold a 7 bit key
- rows 8-14 the inverted version of the same key.
- the top and bottom halves represent 14 different word lines where the top-bottom division is the key/inverted key boundary, whereas in FIG. 4 , the top and bottom are the same seven word lines repeated twice for two different sensing operations.
- the keys shown in FIG. 6A are the same as in FIG. 4, with the bit line of D7 holding the sought-for key in the top half and its inverse in the bottom half, and D8 holding the inverted key so that these two halves are switched.
- the search pattern is then broadcast on the top half word lines and its inverse on the bottom half word lines. Any bit line with a matching key, in this case D7, will then conduct, as shown at bottom, where “nc” is non-conducting and “c” conducting. If redundancy is desired, the non-inverted version can also be programmed in, as at D8, and then detected by broadcasting the non-inverted search key, with the bit line reads searched for a “11” pattern, which can then be output as a data pointer. If further redundancy is wanted, the key or key/inverse pair can be written into the array a second time, and parity bits can also be included, much the same way as discussed for the embodiments based on FIG. 4.
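This single-sensing variant can be sketched in a small simulation. The conduction and voltage conventions here are assumptions for illustration (an erased cell stores 1 and always conducts; a programmed cell stores 0 and conducts only at the high level), chosen so that exactly a matching string conducts:

```python
HIGH, LOW = 1, 0  # word line read levels

def conducts(stored, voltages):
    # erased cell (1) always conducts; programmed cell (0) needs HIGH
    return all(b == 1 or v == HIGH for b, v in zip(stored, voltages))

def single_sense_search(keys, search):
    """Each bit line stores the key on the top-half word lines and its
    inverse on the bottom half.  One broadcast (one level pattern on the
    top half, the complementary pattern on the bottom half) makes exactly
    the matching strings conduct in a single sensing."""
    inv = lambda bits: [1 - b for b in bits]
    volts = [HIGH if s == 0 else LOW for s in search] + \
            [HIGH if s == 1 else LOW for s in search]
    return [i for i, k in enumerate(keys)
            if conducts(k + inv(k), volts)]
```

A stored bit that disagrees with the search key fails in one half or the other, so any mismatch blocks conduction while an exact match conducts.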
- defective bit lines should be isolated with the isolation latch and not used. If a defect shows up as a stuck “0”, it can potentially generate a “false” match. In this case, the data content should be compared in order to confirm whether this is a real or a false match. The other most common reliability issue is that some cells may lose some charge over time, which will also produce a “false” match; a content match check will then eliminate the “false” match error.
- the word line voltage bias can be budgeted a little higher to avoid “missing” a match, which is a very harmful error. A “false” match can be double checked with the content check.
- FIG. 6B schematically illustrates the key/inverse pairs along NAND strings.
- Two strings are shown (for bit lines BLn and BLm), each having a drain and source select gate (SGD, SGS) on either end, where the source ends are then connected along the source line CELSRC.
- the strings have the cell capacity to hold a 48 bit key, its 48 bit inverse, and some parity bits.
- each of the key bits can be followed by its inverse in the next word line since, when programming, this allows a page to be loaded in and written, after which the programming data can be inverted in the latches and written into the next word line.
- the parity bits can also be variously located along the NAND string, although having them grouped can lead to easier decoding when searching the keys.
- bit lines BLn and BLm show a portion of a key along four adjacent word lines and the corresponding four adjacent word lines holding the inverse.
- the word lines are then biased according to the search key, where the high sensing voltage is used to check for “0” values and the low sensing voltage to check for “1” values.
- the high value is here taken as VREAD and can be the same as that used in a typical NAND memory for non-selected word lines; the low sensing value is labeled V0.
- the select gates will also need to be on, and VREAD should also be applied to the word lines holding parity bits, as these are used for data integrity checks and are not meant to factor into key search operations.
- the memory can shift the sensing margins to favor “false” matches rather than misses.
- the programming parameters can be shifted relative to those typically used.
- the “false” matches can be examined by the data check later to help remove any false positives.
- a duplicated key can be used as a check to prevent errors, where these duplicates can be stored on other NAND strings, with the associated data, or at other locations in the system. Relative to a standard NAND memory, this arrangement will need to add extra circuitry, as described with respect to FIGS. 2 and 3.
- a partial key can be searched, allowing the full key/inverse matching to be done incrementally. This can allow for fewer independently settable word line levels, resulting in fewer circuitry changes relative to a standard NAND memory, but it can require some logic changes.
- the full key/inverse can be searched sequentially, where each subsequent sensing will be judged based on previous sensing results. For the example of FIG. 6B, a partial key check of, say, 24 bits at a time can be done: if no matches are found, the process can move on to any other blocks holding keys; if a match is found, a second partial key can be checked, and so on.
- the subsequent checks can either do all of the NAND string again and compare the results of the partial searches, or only check those which have conducted in the previous partial key matches.
- FIG. 6C illustrates such a partial key comparison, where only 24 bits of the 48 bits in the key are checked. The other bits of the key and its inverse are then set to the “don't care” value, as shown by the corresponding bits of the inverse that are set at VREAD.
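A rough simulation of such a partial, incremental search, under assumed conduction conventions (erased = 1 always conducts; programmed = 0 conducts only at the high level); unchecked bit positions get the high level on both the key and inverse word lines, so they never block conduction:

```python
HIGH, LOW = 1, 0
DC = None  # "don't care"

def volts_for(pattern):
    """Word line levels over the key half then the inverse half for a
    ternary pattern; a don't care puts HIGH on both halves, so those
    cells conduct regardless of their stored state."""
    top = [LOW if b == 1 else HIGH for b in pattern]
    bottom = [LOW if b == 0 else HIGH for b in pattern]
    return top + bottom

def conducts(stored, voltages):
    return all(b == 1 or v == HIGH for b, v in zip(stored, voltages))

def partial_search(keys, search, lo, hi):
    """Check only bits lo..hi-1 of the search key; everything else is
    masked to don't care.  Returns the surviving column indices, which a
    subsequent call with the next bit range can narrow further."""
    inv = lambda bits: [1 - b for b in bits]
    pattern = [s if lo <= i < hi else DC for i, s in enumerate(search)]
    return [i for i, k in enumerate(keys)
            if conducts(k + inv(k), volts_for(pattern))]
```

For a 48-bit key, `partial_search(keys, s, 0, 24)` would do the first 24-bit check and `partial_search(keys, s, 24, 48)` the second, with only the survivors of the first pass needing to be examined again.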
- a block with 128 word lines can hold 64 bit keys, while 128 bit keys would need blocks of 256 word lines.
- key/inverted keys are here shown as being written respectively into the top half/bottom half of the word lines. More generally, the keys and inverse pairs could be interleaved in any desired fashion, as long as it was consistent for all of the keys in the block; however, this would require keeping track of the arrangement.
- the interleaved pattern along the NAND chain may be preferred, since the data can be inversely programmed into another word line without loading the data again.
- the method uses the inherent “AND” functionality available in a NAND Flash memory to compare thousands of keys in a single sensing operation.
- keys can be stored in a CAM as either sorted, in which case a binary search can be used; or as unsorted, in which case a linear search is used.
- in a NAND based CAM, the keys need only be sorted to the granularity of the block or of the number of blocks that are sensed in parallel.
- the CAM allows for a binary search, albeit at the block level due to this parallelism. Even for linear searches, this degree of parallelism can make linear searching comparable to or even faster than binary searches for fairly large data sets. Again, for any of these arrangements, performance can also be improved by running multiple die in parallel.
- the keys can be sorted based on a given number of most (or least) significant bits.
- a sorting based on significant bits is generally most useful when the key or content being searched is not a hash value, but a set of characteristics or data itself. In this case, the sorted data in each block would all share a certain number of most significant bits for their keys.
- Content addressable memories exist in both binary form, where the search key consists of 0s and 1s as described above, and ternary form, where the search key can also include “don't care” values.
- when a high read value is broadcast along a word line, all of the cells along that word line will conduct regardless of their state. This property allows a “don't care” value to be implemented by setting the corresponding word line to the high read voltage for both the key and its inverse; that is, when sensing with the key and its inverse (in either the second read of FIG. 4, or the lower half of the word lines), the don't care values are set to the high read value for both the key and its inverse, while the other values of the key are inverted as before.
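The word line levels for a ternary broadcast in the two-sensing scheme of FIG. 4 can be sketched as follows; the mapping of search bits to high/low levels is an assumed convention for illustration:

```python
HIGH, LOW = 1, 0  # word line read levels

def sensing_levels(search, dont_care):
    """Word line levels for the two sensing passes of the FIG. 4 scheme.
    A don't-care position gets the high read level in *both* passes, so
    it never blocks conduction on either the key or the inverse string;
    all other positions are simply inverted between the passes."""
    pass1 = [HIGH if (i in dont_care or s == 0) else LOW
             for i, s in enumerate(search)]
    pass2 = [HIGH if (i in dont_care or s == 1) else LOW
             for i, s in enumerate(search)]
    return pass1, pass2
```

Note that a checked position is high in exactly one of the two passes, while a don't-care position is high in both, which is the property the text describes.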
- these properties of a NAND based CAM also make it particularly suited to a number of other uses. For instance, as large numbers of keys can be searched in parallel, all copies of the same key in the searched blocks can be determined in the process, improving the efficiency of de-duplication operations of the sort that are valuable in cleaning up databases.
- the NAND structure also makes the CAM useful as a Bloom filter, as an intersection of multiple search keys can be formed by setting any values that differ between the keys to the high read voltage in the combined search key, which can then be used to search the stored keys of one or more blocks in parallel. (More detail on using Bloom filters with this sort of memory can be found in a US patent application entitled “Data Search Using Bloom Filters and NAND Based Content Addressable Memory” by Steven Sprouse and Yan Li, filed on Mar. 14, 2013.)
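A small sketch of forming such a combined, Bloom-filter-style search key: positions where the sought keys differ become don't cares, so one broadcast matches any of them, at the cost of possible false positives that must be screened afterwards. The helper names are hypothetical:

```python
def combined_search_key(sought):
    """Intersect several sought keys into one ternary pattern: positions
    where all the keys agree keep that bit; positions where any differ
    become don't care (None, i.e. high read voltage on the array)."""
    n = len(sought[0])
    return [sought[0][i] if len({k[i] for k in sought}) == 1 else None
            for i in range(n)]

def matches(stored_key, pattern):
    """A stored key matches if it agrees on every non-don't-care bit."""
    return all(p is None or p == b for b, p in zip(stored_key, pattern))
```

For example, combining (1011) and (1110) yields the pattern (1, DC, 1, DC), which matches both sought keys but also the false positive (1010), Bloom-filter fashion.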
- Don't care values can also be used to perform a type of “iterative” search. This can be used when the keys may have, or possibly have, some number of bit errors.
- another case where a series of reduced search keys can be employed is where the content is itself a data set, as opposed to, say, a hash value.
- the content of the block could be searched to a desired number of significant bits by setting bits of lower significance to “don't care”. A similar arrangement could also be used for pattern matching of the content, or for cases where the keys are properties of the main data content.
- NAND based CAM can be used in many applications, such as data base searching, voice recognition, DNA matching/genome searches, cryptography and so on. It can lend itself to CAM based indexing and can be incorporated, for example into CAM indexed SSD systems.
- multi-state (MLC) memory can also be used; for example, in a mixed binary-MLC memory, the keys could be stored in binary memory for CAM use, while the data to which the keys point could be stored in MLC areas. It is also possible to use MLC NAND memory for the CAM itself, using 2 or 3 bits per cell, for example, in key matching, so that the NAND chain can store longer keys. In the sort of embodiment described with respect to FIG. 6A, a 128 cell NAND chain in binary operation can store 64 bit keys, while a 128 cell NAND chain with 2 bits per cell can store 128 bit keys. Similarly, 3 bits per cell operation can store 192 bit keys.
- FIG. 7 shows an exemplary encoding of 2 bits per cell for four-state memory cell operation.
- the erased state is encoded as 11, the first state up (or “a” state) is 10, followed by 00 (for the “b” state) and 01 (or “c” state).
- the various sensing levels are also shown.
- FIG. 8 shows how the data states and the complementary data used for the inverted keys correspond.
- FIG. 9 shows an example of how a key ( ⁇ 00111001 ⁇ ) would be encoded onto a 4 cell NAND string on bit line BL and its complement on bit line BLB.
- the system can use one or two word lines along the NAND chains to store the parity bits of each NAND chain in order to check on the integrity of the NAND chain.
- columns defective from manufacture can be isolated out, and more redundancy along the word lines (duplicated data) can further protect the keys' integrity.
- the complementary data can be shifted as illustrated in the figures to provide more sensing margin.
- keys were written down the bit lines of the array, with the search key broadcast along the word lines, allowing the keys along a block's bit lines to be searched in parallel.
- the arrangement can also be reversed, with the NAND array operated so that the content or key matching is in the word line direction.
- one or more keys would be written along each word line (allowing very long keys), an arrangement that can be useful in several different circumstances. Multiple short keys can be stored along the word line direction as well. If the keys are encoded so as to have significance as a 2D array of values, this would allow for content searching in both the bit line and word line directions, although the more typical situation would just be content matching in the word line direction.
- a word line based CAM allows for the use of longer keys. Also, as data is written in pages along word lines, it may be more convenient, at least initially, to write incoming key data along word lines. This would then allow the keys to be searched as written along the word lines. If desired, the keys could then be rewritten along bit lines, where they could be searched as described above.
- the process of matching content in the word line direction is illustrated with respect to FIG. 10.
- keys can be formed into pages of one or more keys and written into the memory array 901 along word lines.
- the system inputs the matching content of one or more search keys into a matching buffer or register 905 , which can then be used to look for duplication content along the word line.
- the data along a word line is read from memory array 901 into a buffer or register 903 .
- the memory can then perform internal match operations between the read data in buffer 903 and the search data in buffer 905, where some number of bits can be ignored, if desired.
- the ignored bits can be set to “don't care” values, for example because some read errors can occur on the read.
- the smallest length of key/content along the word line that can be compared is 1 KB, while the longest length of key/content that can be compared in one plane is 16 KB. If the key length is smaller than 1 KB, the key can be duplicated in chunked patterns to do the pattern matching with more parallelism. The matched case will then produce a group of “1”s and the un-matched case will produce around 50% “1”s. Circuitry can detect if a word is all “1”s to judge the match or miss. If there are some “0”s in a word, this word can be discarded as a miss.
- a majority voting circuit can be employed to choose the words with a majority of “1”s for matching. Some words can be masked out by marking the isolation latch as “ignored”. To simplify operations, it is typically preferable to write the beginning of a file aligned with certain columns. After finishing a compare on one word line, the next word line's content can be compared in a similar sequence.
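The chunk-duplicated, word-level all-“1”s detection can be sketched as follows, with an XNOR standing in for the internal match operation; the small word size here is purely illustrative (the text's minimum compare unit is 1 KB):

```python
def wordline_match(page, key, word_bits=8):
    """Compare a read word line page against a search key duplicated in
    chunks across the page.  XNOR makes agreeing bits 1; a word that is
    all 1s is an exact chunk match, and any 0 marks the word as a miss."""
    reps = -(-len(page) // len(key))          # ceiling division
    pattern = (key * reps)[:len(page)]        # duplicate key across page
    xnor = [1 if a == b else 0 for a, b in zip(page, pattern)]
    words = [xnor[i:i + word_bits] for i in range(0, len(xnor), word_bits)]
    return [i for i, w in enumerate(words) if all(w)]
```

A mismatching chunk yields roughly half 1s, so the all-1s test cleanly separates matches from misses, as described above.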
- Conventional storage drives, such as solid state drives or hard-disk drives (SSDs or HDDs), are accessed by logical block address (LBA). These employ logical to physical address translation tables to locate the data, where the address translation table is stored on flash, in DRAM, or on magnetic media and is updated on the basis of sectors, bytes, or pages.
- Typical sizes for such addresses are 32, 48, or 64 bits, whereas a CAM can use keys of hundreds or thousands of bits.
- a content addressable memory utilizing key-value pairs is used to index the elements stored in the device.
- In a content addressable memory, data is written as a key-data pair. To retrieve the data, a search key is supplied; all the keys in the memory are searched for a match. If a match is found, the corresponding data is retrieved.
- This section presents a storage drive using a Flash based NAND array as described in the preceding section as a content addressable memory that is addressed using key-value pairs instead of a logical block address.
- This drive can provide both Binary and Ternary search capability, meaning that bit patterns in the key can have the values 1 or 0 as well as “don't care” entries.
- This type of NAND based CAS drive can then be used to replace other implementations of CAM or CAS functionality, such as those employing a database, that would usually include a host CPU, DRAM, and storage media.
- this section applies the operation of a NAND flash memory as a pattern matching engine from the last section to a storage device that is indexed using key-value pairs instead of conventional logical block addresses.
- the device can use a standard transport protocol such as PCI-E, SAS, SATA, eMMC, SCSI, and so on.
- when used in a pattern matching mode, the NAND cells not only store values, but can also be used to compare their stored values with an input value.
- target patterns are stored along bit lines, although the word line based storage discussed above can also be used. In the bit line example, the pattern to be matched is broadcast down the word lines. If all the elements in a NAND chain match their target pattern, the NAND chain (bit line) will conduct. The position of the conducting bit line can be used as an index into another table that can be used to retrieve data that is associated with the target key. This is shown in FIG. 11, which expands upon FIG. 1.
- bit lines BL0, BL1, . . . run down the columns of the array and are written with corresponding keys Key 0, Key 1, as previously described.
- the word lines are then biased according to the search key (here Key 2) so that it is broadcast to all of the bit lines spanned by the word lines.
- the column address of the bit line is then input as an index to find the data set, also stored on the drive.
- the keys could be stored in binary or MLC arrays optimized for CAM use, while the data is stored in more standard MLC arrays.
- a drive using such a mechanism can then be used to search for key-value pairs in a large search space, perform general pattern matching (using bloom filters), or be used for determining set membership.
- Some of the advantages of a drive using such a scheme include low power usage and high bandwidth. As data does not need to be moved from the NAND array to a separate computational module for comparison, power consumed on IO operations is reduced. Furthermore, since only bit lines that match a given search pattern will conduct, the NAND comparison operation is also low power. With respect to bandwidth, a single NAND die is capable of doing, say, 256K 64-bit comparisons in under 50 us, working out to under 200 ps per comparison. Additionally, multiple die can be operated in parallel to increase bandwidth or to increase the effective key-length. Potentially 8 Gb ( ⁇ 8G keys) of 64-bit keys can be searched in ⁇ 100 ms in a single die based on current design.
- the host will write Key-Value pair (K, V) to the drive.
- the drive will store the Value V in a data store at some address in the Data table of FIG. 12, as illustrated at ( 1 ).
- the drive will store the key value K on a bit line “i” in a block of an array of the drive, as shown at ( 2 ) of FIG. 12 .
- the drive will make an entry in the block table at address i, with a pointer to the value V, as shown at ( 3 ).
- the column address which has the matching key can be output from the NAND memory via status bits.
- in NAND flash memory, data is written in word line based pages. Because of this, as previously discussed, the keys may be initially written along word lines, then rearranged to be written along bit lines, or first stored in RAM and then sorted into bit line oriented keys. (It would also be possible for the host to have already taken care of this bit line based orientation for the keys, although it will generally be preferable for this operation to be transparent as seen from outside of the drive, with the host just providing basic key-value pairs and not having to engage in such data manipulations.) The controller will take care of assigning the keys and values to physical addresses and of determining the needed addressing structures to translate a key into the corresponding data location.
- the key to value mapping tables can be maintained in much the same way as the usual logical to physical mapping tables, as far as storing and updating them goes, such as mappings using look-up tables or based on a correspondence formula.
- the column address can be mapped to metadata in the primary storage flash management layers.
- if the drive itself has a key generating ability, such as a hashing algorithm used by the controller, just the data set itself could be sent to the drive and the corresponding keys generated on the drive.
- the host would need to use the same key generating algorithm (such as from a Secure Hash Algorithm (SHA), for example) as being used by the drive.
- the host will send the drive a key (K) that is then used to search key blocks.
- the key blocks may be sorted, in which case a binary search can be used; or they can be unsorted, in which case a linear search is used.
- the drive will apply the key K to the word lines. If a matching key exists along a bit line in the block, NAND flash will register a “1” at the bit position “j” associated with the matching key. The value “j” can then be used as an index to the associated block table, as represented at ( 4 ) in FIG. 12 , to retrieve a pointer, ( 3 ), to the associated value V in the Data Table. If all key blocks are searched without finding a match, the drive can return an “element not found status” or error.
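The write flow of FIG. 12 (store V, store K on a bit line, point the block table entry at V) and this lookup flow can be sketched together. The Python lists below merely stand in for the data table, key block, and block table, and the loop models the parallel word line broadcast, which the real hardware performs in a single sensing:

```python
class CamStore:
    """Sketch of the key-value flow of FIG. 12: the key block stands in
    for bit-line-resident keys, searched associatively rather than by
    address."""

    def __init__(self):
        self.data_table = []   # (1) value store
        self.key_block = []    # (2) one key per bit line position
        self.block_table = []  # (3) bit line index -> pointer into data

    def write(self, key, value):
        ptr = len(self.data_table)
        self.data_table.append(value)   # (1) store V
        self.key_block.append(key)      # (2) store K on bit line i
        self.block_table.append(ptr)    # (3) block table entry -> V

    def lookup(self, key):
        # models the broadcast: all bit lines compared simultaneously
        for j, k in enumerate(self.key_block):
            if k == key:
                return self.data_table[self.block_table[j]]
        return None  # "element not found" status
```

The index `j` of the matching position plays the role of the bit line column address output from the NAND status bits.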
- the CAM NAND can be incorporated into the same memory system as that in which the associated data is stored, such as an SSD, in which case the data corresponding to the search key can be provided directly to the host.
- the CAM NAND could also be a separate device used to provide the sort of CAM-based operations described here, while the associated data is stored separately, in which case an address or other pointer to the corresponding data on the separate device would be provided.
- a storage drive of this type has several major advantages over traditional CPU- or semiconductor-based CAM memories. First, because the key comparison is done “on die”, there is no need to transfer the data out of the memory. This saves both time and IO Power. Furthermore the actual comparison operations use less power than conventional memories.
- this scheme has the advantage that write times can be shorter if data is searched in a linear mode.
- Most databases spend time and energy sorting and maintaining tables to enable fast, binary type, search capability for when data is read.
- the writes of data and keys can be done in a random fashion, making write times of O(1) complexity. Searches will use a linear search mechanism that is highly parallelized but is still of O(N) complexity. This is less efficient than the O(log N) of most binary searches and is a tradeoff between insertion time and lookup time.
- the high degree of parallelism in searching means that the sorting only needs to be done to the level at which the search is done, namely to the granularity of a block or of the number of blocks searchable in parallel.
- this sort of NAND flash based CAS drive can be applied to a number of applications, including those described in the previous section.
- One set of examples of these exemplary applications is for de-duplication using pattern matching (CAM) NAND to store the hash keys.
- Incoming data can be sent through the hash function to generate the content related fingerprints.
- the fingerprints can then be searched against the existing hash keys to see whether the data already exists in the data storage. If it does already exist, no write action is taken; but if the data does not yet exist, then the new data will be written into the storage.
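A minimal sketch of this de-duplication decision, with SHA-256 standing in for the drive's hash function and a Python set standing in for the CAM-searched fingerprint keys:

```python
import hashlib

def dedup_write(store, fingerprints, data):
    """Hash the incoming data to a fingerprint and search the existing
    fingerprints (modeling the parallel CAM key search); write the data
    only if it is not already present.  Returns True if written."""
    fp = hashlib.sha256(data).digest()[:8]  # 64-bit fingerprint
    if fp in fingerprints:                  # duplicate found: skip write
        return False
    fingerprints.add(fp)
    store.append(data)
    return True
```

The same decision can run at backup time, during garbage collection, or in-line on the incoming data stream, as the following text notes.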
- the de-duplication can be done when the data is being backed up, during garbage collection operations of the primary storage, or in-line as the data comes in from the host.
- Another application is for virtual memory management, which can be done similarly to de-duplication.
- the drive can also be applied to human genome searches, where the drive stores signatures in the CAM NAND so that any segment of the DNA sequence can be searched.
- the drive also lends itself to parallel computing, where a mathematical NAND function can be done inside of the NAND memory.
- the CAM NAND operation has the keys oriented along bit lines, whereas NAND memory is written along word lines. Consequently, as the keys come in from a host, they need to be accumulated in a buffer memory of some sort, transposed to a bit line orientation, formed into pages (including adding any inverse keys as needed), and transferred to the NAND device for writing. This is illustrated schematically in FIG. 13.
- a host 1301 can take the data files and generate the corresponding keys, such as using a Secure Hash Algorithm (SHA) to generate a 64 bit hash key, which can then be transferred over to a buffer memory 1303 on the memory system, where the keys can be accumulated.
- the transposing buffer memory 1303 is used to align the keys for writing in the NAND CAM memory 1305. Once a sufficient number of keys has accumulated, say 4 MB of keys for a NAND memory 1305 with 4 MB blocks, the data can be transferred over for programming as pages along the word lines.
- FIGS. 14 and 15 give some examples of hardware implementations for the transposing buffer memory, but this can be implemented in various other ways, such as by use of a field programmable gate array (FPGA).
- a block's worth of keys could be accumulated in an FPGA and then read out a word line at a time and transferred over to the CAM NAND for writing.
- FIG. 14 is a schematic illustration of a hardware implementation of the transposing memory in FIFO style.
- the data can come in as, say, 64 bit keys or indices and is saved in column-oriented 64 bit registers.
- the registers are chained into a FIFO arrangement so that when a new key comes in, the previous keys shift over by one column to the right.
- the pages are shifted over to the NAND for programming into the array there.
- the keys may be searched while still in the FIFO before being programmed, as the keys can be shifted out one at a time for comparison.
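The transposition itself can be sketched as follows; the interleaving option mirrors the idea, mentioned earlier, of following each key bit with its inverse on the next word line:

```python
def to_wordline_pages(keys, interleave_inverse=True):
    """keys: equal-length bit lists, one per incoming (row-oriented) key.
    Emit word line pages in which each key runs down a column (bit line);
    optionally follow each page with its bitwise inverse, so each key bit
    is trailed by its inverse on the next word line."""
    pages = []
    for w in range(len(keys[0])):
        page = [k[w] for k in keys]   # bit w of every key -> one page
        pages.append(page)
        if interleave_inverse:
            pages.append([1 - b for b in page])
    return pages
```

Each emitted page is what would be shifted over to the NAND and programmed along one word line.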
- FIG. 15 is a schematic illustration of another hardware implementation for transposing the data keys using more of a RAM style arrangement.
- the data can come in as, for example, 64 bit keys or indices and be saved in 64 bits registers, being accumulated in a relatively small, 16 ⁇ 64 array 1509 in latches.
- the 16 ⁇ 64 bits of the small array 1509 can then be shifted over a bus 1507 a word (16 bits) at a time into the RAM 1503 .
- the small array 1509 can then accumulate the next 16×64 bits. This process can continue until the RAM 1503 is full or it is otherwise desired to write in the keys, at which point they are programmed into the CAM NAND memory.
- under the arrangement of FIG. 15, if it is desired to search the keys before they are written into the CAM NAND, another RAM buffer storing the keys without the transpose can be kept for this search purpose.
- the sort of highly parallel operations using a memory device of a NAND structure as a content addressable memory described in the preceding sections can also be applied to performing data analytics.
- This allows for massively parallel computing to be applied to various analytic applications, where the computing is performed inside of the storage and remotely from the server.
- This arrangement can also allow processing to be done in real time, using inline processing, and also allows for the analytics to be executed without input/output transmission limitations. Consequently, these techniques and structures can be applied to many applications, from crunching large amounts of data in data warehousing applications to quantitative analysis of financial data and other data analysis intensive uses.
- a memory system 1601 is a computing solid state drive (SSD) that includes a main storage SSD section 1603, where the NAND devices used can be normal NAND devices as well as CAM-type NAND.
- the NAND portion 1605 is again taken as an EEPROM based flash memory when a specific concrete example is needed.
- a host 1611 such as a PC or even a network connection, provides data and any instructions for analytics to perform on the data to the memory system 1601 .
- the data can be supplied to the NAND section 1605 to be stored for analyzing and then to the main storage section 1603 , allowing for in-line analysis if desired, or stored directly in the main storage section 1603 and retrieved to NAND module 1605 when analysis is requested.
- the keys could be stored on the NAND 1605 and the associated data going to the main storage section 1603 , where the system can maintain a key-data correspondence as described in preceding sections.
- the CPU or GPU or SSD controller could be used to perform some initial manipulations (choosing data subsets, generating hash values, and so on) as needed before writing the data into the NAND structure of 1605 .
- the main data storage section 1603 need not be a SSD, but could be hard drives or other data storage.
- the NAND portion 1605 need not be incorporated into the same system as the main storage 1603 , but could instead be a separate system used in conjunction with a bulk data storage system.
- the NAND system can be used directly with the host for performing data analytics. For instance, a portable device incorporating the CAM NAND and some additional flash storage may be sufficient.
- FIGS. 17-20 illustrate how the NAND array can be used to perform analytics in parallel for all of the columns of the array when the data includes both categorical data (i.e., data that fits into multiple categories, such as (red, blue, green) or (yes, no)) as well as numerical range detection. Due to the CAM nature of the memory described here, multiple categories can be handled. In this example, categorical and numerical data can be stored along the same NAND strings, but the categorical data is saved in a binary format, while the numerical data can be saved as binary (D1), 2-bit per cell (D2), 3-bit per cell (D3) or other multi-state format.
- the categorical/numerical distinction is not necessarily hard and fast, as the techniques described here allow for the processing of numerical data to make it into categorical data for purposes of analysis as, in some cases, this can be faster than performing numerical comparisons.
- the more bits per cell the fewer the number of word lines that will be used to store the data, but with the increased complexity involved in such multi-state operations.
- the analytics will generate a match for the specific query, and the match results can be counted inside or outside of the NAND. As discussed further below with respect to FIGS. 30 and 31 , the counting can be done inside the NAND digitally, which is precise, or in an analog manner, which is faster but less accurate. When counting outside the NAND, the match results are transferred to the controller and the number of "1"s or "0"s is counted there.
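As a rough illustration (a sketch, not the patent's circuitry), counting outside the NAND amounts to transferring the per-bit-line match flags to the controller and tallying them digitally:

```python
def count_matches(match_bits, target=1):
    """Precise digital count of matching bit lines after the match
    flags have been toggled out of the NAND to the controller."""
    return sum(1 for b in match_bits if b == target)

match_bits = [1, 0, 1, 1, 0, 0, 1, 0]   # one 0/1 flag per bit line
print(count_matches(match_bits, 1))     # 4
print(count_matches(match_bits, 0))     # 4
```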
- FIG. 17 shows how two of the data sets of a block are written onto NAND strings along bit lines BLn and BLm.
- at top is categorical data in binary form, with some numerical data further down, where a 3-bit per cell format is shown, the bits arranged top to bottom as least to most significant.
- categorical data word lines can be searched first, with the analytics then performed on numerical data for the matched “category”.
- the categorical data can be “don't care” or not written with the numerical data at the same memory block.
- the numerical data can then be sequentially analyzed, here starting with the most significant bit, by reading one word line at a time, placing the appropriate read level (CGRV) on the MSB word line.
- Each of the bit lines has an associated set of latches that can be used to keep track of the results of the sequence of analytic operation, where an example of how the results are assigned to the latches is shown in FIG. 18 .
- the data latches here are labeled XDL, UDL, and, further down, LDL for transfer, upper, and lower data latch respectively, to correspond to the arrangement such as that described in U.S. Pat. Nos. 7,206,230 and 8,102,705, where more detail on such latch structures can be found, and also see FIGS. 28 and 29 below.
- the table of FIG. 19 shows an example of a comparison to see whether each of four data values is greater than 010011001.
- the search is here done from the most significant bit down towards the least.
- at MSB9, the top value is found to be greater than the search number; the latches are set accordingly, subsequent reads are ignored, and no further updates are made.
- for the other values, the results are indeterminate.
- when MSB8 is checked, the second data value is still indeterminate, but the lower two values are found to be less than the search value, so that their latches are set accordingly and no more updates are needed afterwards.
- the MSB7 result would again be indeterminate and is not shown, but the next MSB read establishes that the remaining value is greater than the search value, and the latches are set accordingly.
- the final search results for this data set are established, as shown in the rightmost column. If there were still indeterminate data, the process would continue on towards the least significant bit until the final search results were all established.
- the matches fitting the query can be counted later or saved to another word line for further analysis in combination with other query criteria.
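The word-line-at-a-time greater-than search of FIG. 19 can be sketched as follows; the latch behavior is modeled with a per-column state that locks once a result is established, and the values are illustrative rather than those of the figure:

```python
def greater_than_search(values, key, width):
    """Model of the FIG. 19 style search: one bit position ('word line')
    is read at a time from the MSB down, for all columns in parallel.
    Once a column's result is established its latches lock and later
    reads for it are ignored. None models 'indeterminate'."""
    state = [None] * len(values)
    for pos in range(width - 1, -1, -1):      # MSB -> LSB
        kbit = (key >> pos) & 1
        for i, v in enumerate(values):
            if state[i] is not None:          # result already latched
                continue
            vbit = (v >> pos) & 1
            if vbit != kbit:
                state[i] = vbit > kbit        # > or < now established
    # columns still indeterminate equal the key, hence not greater
    return [s is True for s in state]

# illustrative 4-bit values, not those of the figure
print(greater_than_search([11, 4, 6, 5], key=5, width=4))
# [True, False, True, False]
```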
- FIG. 20 is an example of another search to see which data values are between 123 and 231.
- the first digit of the data values is checked against the upper bound, which is found to have been exceeded by the first number, putting it out of the range so that any subsequent reads for it can be ignored.
- the second number is found to equal the MSB of the upper bound, with the bottom data under the MSB of the upper bound.
- the second digit of the second number is found to exceed the upper bound, so the latches are set and no further updates are needed.
- the second read finds this value below the lower bound's MSB values and, consequently, outside the range, so that the latches are again set and no further updates are needed.
- the second read for the third row data finds it to equal the MSB of the lower bound, so that the next digit is checked against the second digit of the search's upper value in the third read and the second digit of the search's lower value in the fourth read.
- the final search result is then shown in the far right column.
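A digit-serial range test in the style of FIG. 20 can be sketched as below; the digit values are hypothetical, chosen only to exercise the same cases the figure's narrative describes:

```python
def in_range_search(values, low, high):
    """Digit-serial, MSB-first inclusive range test modeling FIG. 20:
    each value is compared against both bounds one digit per 'read',
    locking in an out-of-range result as soon as a bound is decisively
    crossed. Values and bounds are equal-length digit tuples."""
    results = []
    for digits in values:
        result = None
        tight_low = tight_high = True   # still matching the bound prefix
        for vd, ld, hd in zip(digits, low, high):
            if result is not None:      # latches set; ignore later reads
                continue
            if tight_high:
                if vd > hd:
                    result = False      # above the upper bound
                elif vd < hd:
                    tight_high = False
            if result is None and tight_low:
                if vd < ld:
                    result = False      # below the lower bound
                elif vd > ld:
                    tight_low = False
            if result is None and not tight_low and not tight_high:
                result = True           # strictly inside the range
        results.append(True if result is None else result)
    return results

# hypothetical 3-digit values tested against the range [123, 231]
vals = [(3, 1, 0), (2, 5, 2), (2, 0, 3), (1, 3, 1)]
print(in_range_search(vals, low=(1, 2, 3), high=(2, 3, 1)))
# [False, False, True, True]
```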
- FIGS. 21 and 22 illustrate how to perform maximum and minimum value searches.
- the MSB row is searched and loaded into the UDL latch.
- the two leftmost NAND strings have a "1" for the most significant bit.
- the other columns can be ignored for the rest of the search.
- the process works its way up the rows, where the next two most significant bits are indeterminate.
- at the fourth row up, the two left columns differ, where the results as loaded into LDL show that the second column has the max value.
- FIG. 22 similarly illustrates a min search, again working its way up from the MSB. At left is shown the situation after working up to the fifth most significant bit, where the outermost columns have both had zeros up until that point, as reflected in UDL. The right of FIG. 22 shows the result two reads later as loaded into LDL, showing the leftmost column to be the minimum.
- Max and min searches can also be performed on file size. For a max, the memory can find the file sizes with the most digits along the NAND chains, then find the next largest files by eliminating the smaller numbers. For a min, the memory can find the file sizes with the fewest digits along the NAND chains, and then search for the next smallest files by eliminating the larger numbers. Parts of a file system can be stored in this manner.
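The bit-serial max and min searches of FIGS. 21 and 22 can be sketched as candidate elimination, one bit position per read; the values are illustrative:

```python
def max_search(values, width):
    """Bit-serial max as in FIG. 21: reading one bit position at a time
    from the MSB, any column reading 0 while another surviving column
    reads 1 is eliminated; the survivors hold the maximum."""
    survivors = list(range(len(values)))
    for pos in range(width - 1, -1, -1):
        ones = [i for i in survivors if (values[i] >> pos) & 1]
        if ones:
            survivors = ones
    return survivors

def min_search(values, width):
    """Bit-serial min as in FIG. 22: the mirror image, keeping the
    columns that read 0 whenever any surviving column reads 0."""
    survivors = list(range(len(values)))
    for pos in range(width - 1, -1, -1):
        zeros = [i for i in survivors if not (values[i] >> pos) & 1]
        if zeros:
            survivors = zeros
    return survivors

vals = [0b0110, 0b1011, 0b1001, 0b0111]
print(max_search(vals, 4))   # [1] -> second column holds the max (11)
print(min_search(vals, 4))   # [0] -> first column holds the min (6)
```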
- the array structure allows for the data analytics to be done one row at a time, as they can be done by reading one word line at a time.
- the array structure can also be used to perform arithmetical operations, such as addition, subtraction and multiplication, on the numerical data along the NAND strings.
- the process is schematically illustrated in FIG. 23 for a summing operation.
- the data sets of block N 2301 can be added to the corresponding data sets of block M 2303 on a row by row basis for each bit line.
- the result can then be written into a SUM block 2305 .
- block N has NAND strings N1 to N128K
- block M has NAND strings M1 to M128K
- the SUM block similarly has NAND strings SUM1 to SUM128K.
- the NAND array is organized so as to have a block structure, such as found in flash memory.
- the word lines can be any of the word lines of the array; and when the memory has a block structure, these need not be from different blocks, as, for example, when adding two different word lines of numerical data from the same data set.
- more than two data values can be processed and saved into the data latches before writing the result to a new word line. For example, with 3 data latches the memory can add 4 pages before writing to a word line, saving the carry in the data latches for the next bit addition. With 5 data latches, it can add 16 pages and then write to a different word line once, and so on.
- FIG. 24 illustrates an example of how the latches can be used in this process for addition.
- the latches associated with each bit line are here labeled as UDL, LDL, and XDL.
- FIG. 24 illustrates a single one of each of these latches with values read from different word lines holding different values as these are read sequentially for a 13 bit number from LSB to MSB. (That is, the arrow represents the sequence of reads in time or, equivalently, down word lines for a single UDL, LDL, XDL set of latches associated with a single bit line.)
- UDL contains the data set collected at time A from a first block (such as block N of FIG. 23 ).
- LDL contains the data set collected at time B from a second block (such as block M).
- the XDL latch can hold any carry bits.
- the two data sets can be added and stored back in LDL, with the summation then programmed back into another block.
- Other operations, such as multiplication, subtraction, division, etc., can be performed similarly.
- subtraction can be done by adding one data value to the complement of the other.
- floating point operations can similarly be performed by properly aligning the digital information so that the points align for the operands.
- FIGS. 25A-C give some more detail on the mechanics of some arithmetic operations as these are executed in the latches.
- FIG. 25A looks at addition, specifically 10+3, as noted at top.
- in binary, 10 is 1010 and 3 is 0011.
- the values for blocks A and B are listed in the left column, MSB to LSB written bottom to top.
- at times T0, T1, T2, and T3 these are read into the latches UDL and LDL, with the carry being held in the XDL latch, as shown underneath.
- the results are written back into Block C from the values shown latched there.
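The latch-based addition of FIGS. 24 and 25A can be sketched as a bit-serial add with the carry held between reads; a minimal model, using the figure's 10+3 example:

```python
def nand_string_add(a_bits, b_bits):
    """Bit-serial addition modeling FIG. 24/25A: at each 'word line
    read' (LSB first) UDL holds the bit from block A, LDL the bit from
    block B, and the XDL latch carries into the next read; each sum bit
    can then be programmed to a SUM block."""
    xdl = 0                                  # carry latch
    out = []
    for udl, ldl in zip(a_bits, b_bits):
        s = udl + ldl + xdl
        out.append(s & 1)                    # sum bit for this word line
        xdl = s >> 1                         # carry saved for next read
    out.append(xdl)                          # final carry-out
    return out

# 10 + 3 as in FIG. 25A, bits listed LSB first
a = [0, 1, 0, 1]    # 1010b = 10
b = [1, 1, 0, 0]    # 0011b = 3
print(nand_string_add(a, b))   # [1, 0, 1, 1, 0] = 01101b = 13
```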
- FIG. 25B illustrates how to perform subtraction of two numbers N1 and N2 to form the difference N1−N2. This is done by adding N1 to the 1's complement of N2 plus 1 (i.e., the 2's complement of N2).
- a specific example, here again using 10 and 3 to determine 10−3 in the latch structure, is shown: in the top row is the binary form of 10, in the second row the complement of 3 plus 1 (3c+1), and the result is shown at bottom. Any overflow bits need to be discarded, with the result being the binary form of 7.
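The FIG. 25B subtraction can be sketched the same way, with the "+1" seeding the carry latch and the overflow bit dropped:

```python
def nand_string_sub(a_bits, b_bits):
    """N1 - N2 computed as N1 + (complement of N2) + 1, per FIG. 25B.
    Bits are LSB first with equal widths; the overflow carry out of the
    top bit is discarded."""
    carry = 1                                # seeds the '+1'
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a + (1 - b) + carry              # (1 - b) complements N2's bit
        out.append(s & 1)
        carry = s >> 1
    return out                               # overflow carry discarded

a = [0, 1, 0, 1]    # 10, LSB first
b = [1, 1, 0, 0]    # 3, LSB first
print(nand_string_sub(a, b))   # [1, 1, 1, 0] = 0111b = 7
```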
- FIG. 25C shows how multiplication can be done using bit shift and addition, where 10 and 3 are again used as the inputs.
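The shift-and-add multiplication of FIG. 25C reduces to summing shifted copies of the multiplicand, one per set bit of the multiplier:

```python
def shift_add_multiply(a, b_bits):
    """Multiplication by bit shift and addition as in FIG. 25C: each set
    bit of the multiplier (LSB first) contributes the multiplicand
    shifted left by that bit's position."""
    total = 0
    for shift, bit in enumerate(b_bits):
        if bit:
            total += a << shift              # add shifted partial product
    return total

print(shift_add_multiply(10, [1, 1, 0, 0]))   # 10 * 3 = 30
```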
- FIGS. 26A and 26B look at examples of where, in addition to the XDL latches there are additional latches available on each bit line beyond UDL and LDL, such as is found in a multi-state memory device.
- FIG. 26A looks at the case of 3 data latches. As shown, data from four blocks (or, more generally, four pages) are written in. This allows for four numbers to be added or subtracted in a single write.
- FIG. 26B shows a 5 latch case, allowing for up to 16 numbers to be added or subtracted in one write.
- FIG. 27 illustrates an example of loading stock data for single-stock historical data analysis, where each of the 128K (in this example) bit lines can be assigned to a given stock or other financial instrument, with the data for each day written to a different block.
- the data for each stock can be lined up along the NAND strings. With each stock taking a bit line, for an array of 128K bit lines a total of 128,000 stocks can be evaluated simultaneously.
- the price per day can then take different blocks or different locations along the NAND chain. Using the blocks for different days, operations such as averages, linear regression, and so on can be performed using the data from the corresponding blocks, where the analyzed data can be saved in a new block.
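A toy model of the FIG. 27 layout (the prices are invented): with one stock per bit line and one day per block, a statistic such as an average is formed for all columns at once, block by block:

```python
def per_stock_average(day_blocks):
    """day_blocks[d][s] = price of stock s on day d; returns one average
    per bit line (stock), accumulated one block (day) at a time, as the
    result could be before being written to a new block."""
    n_days = len(day_blocks)
    sums = [0] * len(day_blocks[0])
    for day in day_blocks:                   # one block per day
        sums = [s + p for s, p in zip(sums, day)]
    return [s / n_days for s in sums]

days = [[10, 20, 30],    # day 1 prices for 3 stocks
        [12, 18, 33],    # day 2
        [14, 22, 30]]    # day 3
print(per_stock_average(days))   # [12.0, 20.0, 31.0]
```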
- the data can be arranged differently on the array.
- an example is shown in FIG. 28 , where the data sets are arranged to perform a correlation study.
- the data for up to 128K stocks on a given day are entered.
- the pre-processed data for a different stock B, from the same chip or a different chip, will line up with the pre-processed data for stock A on the same bit line.
- the correlation between stock A and B can be calculated accordingly.
- these operations can be performed on chip or with the help of the controller.
- the NAND can include fairly complex, but specific, operations.
- FIGS. 29-31 consider data arrangement for analytics in more detail. So far, the discussion of this and the preceding sections has largely considered the data sets or keys being analyzed on the NAND array as being on a single NAND string. However, more generally, as each bit line can have many NAND strings formed along it that are connectable to the same data latches, this allows for data (a schema) to be arranged in a few separate blocks along the same NAND bit line. This is illustrated schematically in FIG. 29 , where some numeric data of a data set, arranged from most to least significant bit, is stored on the same bit line, but in NAND strings from separate blocks.
- the bit line is then connectable to the transfer data latch XDL and, through XSW (a transistor switch), the corresponding sense amp (SA) and data latches UDL, LDL, where these can correspond to upper and lower page data latches in a more typical multi-state operation.
- Data can also be arranged inside a group of (typically adjacent) bit lines that share the same data bus to communicate with a set of data latches.
- a set of 8 or 16 adjacent bit lines with such a shared structure could store each data set on multiple ones of these bit line groups.
- FIG. 31 schematically illustrates a latch structure where 8 NAND bit lines can process data with a shared SBUS and DBUS through logic operations from the YBOX circuitry, so that a schema can be stored in up to 8 associated bit lines sharing the same data latch set.
- the results of the analytic operations can be computed according to various data counting methods. As illustrated schematically in FIG. 32 , the counting can be done digitally inside the CPU, toggling the data out to RAM for counting. Digital counting can also be performed inside the NAND device, such as by a binary search or a shooting chain. Analog counting can also be done inside the NAND, which, while less accurate, can be done more quickly.
- FIG. 33 shows some elements of such circuitry for counting quickly with analog wired-OR circuitry: here, the data is applied to the gates of a set of transistors connected in parallel, each connected in series with a transistor controlled by an analog bias. The transistors are fed by one leg of a current mirror, the other leg of which is connected to ground through a transistor acting to set a digitized current level.
- FIG. 34 illustrates how analytics can be applied to large file systems.
- complex queries include examples such as "How many files have been updated in the last 10 days?" and "Which are the top five largest files that belong to John?"
- the first is an example of an aggregate query, which provides a high-level summary of all or part of the file system, while the second is a top-k query, which locates the k files and/or directories that have the highest score according to a scoring function.
- the incorporation of the NAND section as described with respect to FIG. 16 provides a simple solution for performing such queries.
- FIG. 34 shows a file system.
- the basic file data includes owner, time stamp, file size, etc.
- the basic file data is saved into NAND in vertical NAND chains as shown at the right of FIG. 34 .
- Performing the analytics by the NAND SSD in this way saves the trouble needed to build the data tree structure as shown in the file structure on the left.
- aggregate queries can, for example, be performed by searching the time stamps, which can be located in a few word lines, against a certain date.
- Top-k queries can be done, for example, by identifying “John” and the file size. (As noted above, min and max searches can also be done for file sizes, allowing for portions of file systems to be stored on this basis.)
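A toy sketch of the FIG. 34 aggregate-query idea (the file records are invented): the time stamp comparison that the CAM NAND performs across all NAND strings at once is modeled here as a parallel comparison over a list:

```python
import datetime

def count_recently_updated(timestamps, today, days):
    """Aggregate query 'how many files updated in the last N days?':
    in the CAM NAND this is a greater-than search on the time stamp
    word lines; here it is one comparison per stored file."""
    cutoff = today - datetime.timedelta(days=days)
    return sum(1 for t in timestamps if t >= cutoff)

today = datetime.date(2013, 8, 1)
stamps = [datetime.date(2013, 7, 30),
          datetime.date(2013, 6, 1),
          datetime.date(2013, 7, 25)]
print(count_recently_updated(stamps, today, days=10))   # 2
```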
- FIG. 35 shows a conventional arrangement of a server 3501 used along with a data system 3503 of one or more SSD drives such as 3507 .
- the CPU of the server 3501 does all of the computational operations for any sort of data analytics, where the storage system 3503 is just used for storing data, so that data has to flow from the storage device to the server CPU to perform analytics on any data stored there.
- although an individual element 3507 , such as an SSD blade, of the storage system 3503 will have a controller, this is traditionally used for managing the data storage on the device, rather than any sort of analytics.
- the current section describes techniques whereby some or all of the data analytics are instead done on the memory system, which is now used as a data analytics system.
- FIG. 36 schematically illustrates an architecture using the sort of computational NAND described in the preceding section. This has several layers: the server; the solid state drives; and, within the drives, the memory controllers and their associated memory devices.
- the server 3601 issues software commands that can be in an existing database language. On the Data Analytic System 3603 , these commands are translated into firmware to be executed there. Various tasks can then be distributed between the flash memory controllers, such as 3611 , and their associated flash arrays, such as 3613 .
- the server itself can also be used in performing some of the tasks, such as any final or complicated computations, where the results could then be provided on a graphical display 3607 , for example.
- functions that can be performed on the flash controller can include sort, merge, addition, subtraction, multiplication, division, squaring, exponentiation, etc., while for the NAND memory the functions could include operations such as searching, matching, minimum, maximum, greater than, less than, addition, subtraction and so on, as discussed in more detail above.
- This arrangement can save on power, as data does not need to go out of the SSDs. It can also save on servers, as it can handle more data without a need to expand servers with large DRAM memory. Since this is a distributed system, with multiple controllers and associated flash arrays, the controllers and the associated sets of flash arrays do not need to be executing parts of the same job. Instead, multiple jobs can be executed in parallel, as indicated by the different job numbers above the flash control circuitry of FIG. 36 .
- This ability to run multiple jobs in parallel can be particularly useful for computational NAND, as in some cases a single user's data may not be large enough to compete with the performance of a server-based implementation; however, the system described here has the advantage that it can process multiple jobs in parallel, exploiting the degree of parallelism in the NAND based flash SSDs of the data analytics system.
- the software running on the server will issue the commands using a database language.
- This software, which can interface with a common existing database language, translates the commands into a firmware language and then into multiple small tasks.
- the software level can also optimize the analytics to choose the best way to arrange data or to distribute the tasks among the microprocessors, flash controllers, or NAND memory units.
- the small tasks will be executed on the SSD flash controllers or on NAND flash according to the task specifications.
- the mid-product from the NAND flash or the SSD controllers can be merged within each SSD blade and then, if needed, further merged on the top server level.
- the task of distributing the jobs and merging the results can be executed with proprietary software, or it can be interfaced with open source computing, such as the Hadoop MapReduce platform.
- FIG. 37 shows an example of the hardware architecture in more detail for one SSD unit.
- the SSD controller includes interface circuitry as well as components to parse and distribute commands and data across the 16 (in this example) flash controllers.
- the flash controllers perform the usual functions, such as logical to physical address conversion and so on, but also the analytic functions.
- the SSD controller can focus on analytic operations. Memory maintenance operations, such as garbage collection, can be delayed and not triggered until the analytics operations are finished, freeing the SSD controllers to execute analytics jobs.
- the SSD controllers, and the flash controllers within them, often have a fair amount of capability that can be directed towards analytic operations, particularly if maintenance and other non-analytic functions are suspended. Consequently, the larger change to the system is with respect to the firmware, as the SSD controllers have to be controlled with firmware, although the hardware can also be optimized for various analytics jobs.
- FIGS. 38A-D illustrate an example of a particular benchmark operation, a pricing summary report query (TPC-H Q1 benchmark), where FIG. 38A is some corresponding pseudo-code for the analytic operation. These operations can then be distributed between NAND operations and controller operations based upon which is most efficient: for example, addition can be done in NAND, where it is faster, while multiplication will be done more quickly in the controller.
- FIG. 38B illustrates how the data sets are arranged in the NAND memory for the analytic operations to be done there, where the NAND bit lines run up and down. As discussed in previous sections, this may require the data to be transposed from a word line orientation prior to being written into the arrays.
- FIG. 38C illustrates some of the parameters for determining “120 days Sum of Quantity”, where the operations on NAND are indicated by the bracket from the left, those in the controller by the bracket from the right.
- FIGS. 39A and 39B schematically illustrate examples of this division of tasks between the NAND memory and memory controller for the respective queries of FIGS. 38C and 38D .
- the CAM type NAND 3901 can provide the controller 3903 with the Quantity(i,j) results, which it then buffers in the RAM 3911 . These can then be summed up (represented at 3919 ) by the circuitry on the controller 3903 . The single number of the result, SUM(Quantity(i)), can then be sent back to the memory chips.
- the NAND 3901 now provides two sets of quantities, Price(i,j) and discount(i,j), to the controller 3903 that are respectively buffered in RAM1 3911 and RAM2 3913 . (Although shown as separate for illustrative purposes, in practice these can be stored in a single RAM memory used by the controller.) These quantities are then multiplied and summed by the controller's logic, schematically represented by 3915 and 3919 , to generate the result SUM(discount_price(i)), where this result can then be written back into the non-volatile memory.
- FIGS. 40A-D illustrate another example of a particular benchmark operation, a minimum cost supplier query (TPC-H Q2 benchmark), where FIG. 40A is some corresponding pseudo-code for the analytic operation.
- These operations can again be distributed between NAND operations and controller operations based upon which is most efficient: for example, search operations and minimum operations are done in NAND, with the results being merged in the controller.
- FIG. 40B illustrates how the data sets are arranged in the NAND memory for the analytic operations to be done there, where the NAND bit lines run up and down and the left-most column shows an example of how many digits can be assigned to the various line items.
- FIG. 40C lists steps of the operation and the amount of time typically involved for these, where steps 6 and 8 are controller operations and the rest are NAND operations.
- FIG. 40D illustrates some of the parameters for determining "minimum cost supplier" in this way.
- FIGS. 41A and 41B are respectively more detailed versions of portions of FIGS. 40B and 40C .
- FIG. 41A illustrates the first two steps in FIG. 41B of the steps 1 - 5 from FIG. 40C that are executed on the NAND array.
- the supplier and region are lined up along the NAND strings, where region is searched.
- the results of the search are then shown, followed by an AND operation for the search result with supplier.
- a similar search is then done for "part No1".
- Similar searches are then done for type and size, followed by a Min(Price). In doing this, only two functions (Search and Min) are used on the NAND array, where these are described in more detail in the last section.
- the controller then performs the “Merge Min from all die”, shown as step 6 in FIG. 40C .
- when data is transferred to the NAND devices, some data will be stored vertically along the NAND chains for search operations. Consequently, the data, whether being retrieved from elsewhere in the memory system or being received from a server, will likely need a rotation done inside the SSD controller.
- this transposing of selected data into a bit line arrangement can be done when the data is initially shifted in from other storage to the analytic storage. The transpose can also be done when certain specific search commands are issued and the needed data has to be re-arranged to accelerate the search.
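The transposing described above amounts to rotating row-oriented pages into columns; a minimal sketch of the re-arrangement (not the controller's actual hardware path):

```python
def transpose_pages(pages):
    """Rotate row-oriented pages into a bit line orientation: bit j of
    page i becomes bit i of column j, so each data set ends up running
    vertically down a NAND string."""
    return [[page[j] for page in pages] for j in range(len(pages[0]))]

pages = [[1, 0, 1],      # word line 0
         [0, 0, 1]]      # word line 1
print(transpose_pages(pages))   # [[1, 0], [0, 0], [1, 1]]
```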
- FIG. 42 shows portions of the data sets as stored along columns for the search operations shown in FIG. 41A .
- associated data is then stored along word lines in NAND pages.
- One such correspondence is shown, where the line item with (region1, part No1) is associated with the shown (quantity).
- Column addresses can be linked with a row address by an offset in a fixed mapping, a formula, or by something like a logical-to-physical mapping, but now the metadata would associate column addresses with associated row oriented data. This is similar to the sort of key-data associations described above for more general CAM NAND arrangements.
- a search can then be done in the Column (along bit line) Store (such as for (region1, part No1)) and the corresponding data (such as the associated (quantity)) can then be fetched from Row (along word line) Store.
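A sketch of the column-store/row-store link just described; the fixed offset, key strings, and row contents are all illustrative assumptions:

```python
COLUMN_TO_ROW_OFFSET = 0x1000    # assumed fixed-offset mapping

def row_address_for(column_address):
    """Fixed-offset link; a formula or a logical-to-physical style
    table could serve the same purpose."""
    return COLUMN_TO_ROW_OFFSET + column_address

def fetch_associated_data(column_keys, row_store, query):
    """Search the column (bit line) store for the query key, then fetch
    the row (word line) oriented data for each matching column."""
    return [row_store[row_address_for(col)]
            for col, key in enumerate(column_keys) if key == query]

column_keys = ["(region1, partNo2)", "(region1, partNo1)",
               "(region2, partNo1)"]
row_store = {0x1000: "qty=5", 0x1001: "qty=40", 0x1002: "qty=7"}
print(fetch_associated_data(column_keys, row_store, "(region1, partNo1)"))
# ['qty=40']
```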
- FIG. 43 considers the building blocks to convert a flash SSD to a computational SSD drive.
- the Software Layer running on the server takes a database language query and turns it into a set of commands to send to the data analytic system.
- the SSD controller compiles the commands into executable SSD firmware layers.
- the firmware can then direct the flash controllers to fetch the needed data and perform computations on it, where this can be done using hardware merge and sort operations, as these typically deal with small amounts of data.
- the flash controllers then issue the needed commands, such as for normal operations or searches, on to the NAND memories.
- the NAND memories can have the sort of enhanced search capabilities described in the last section above.
- a single analytic job can be divided up into many small jobs and distributed to each flash controller and its associated set of NAND flash memory for execution.
- the flash controllers can then issue parallel command sequences to execute the analytics jobs in parallel in multiple memory chips and controllers.
- the execution of some analytics in the SSD controllers can be done in parallel with the NAND analytics operations, where the two operations do not interfere with each other; for example, counting in the controller while doing a sensing read inside an associated NAND device for a search operation.
- the multiple flash controllers on the SSD drive are controlled by the interface controllers and can coordinate the merge and transfer of partial results out of the SSD drive to the server, which can merge the final results and display them to the customer.
- FIG. 44 can be used to contrast the techniques of this section with this sort of server based approach, here in the context of a Hadoop Distributed File System (HDFS) used for the first system. This is shown to the left.
- a MapReduce engine is used for the computing, forming <key, value> pairs, which then undergo the network intensive operation of a "shuffle", with the results then being assembled (a count).
- the data is split into HDFS blocks of 64 MB or 128 MB chunks depending on the server's DRAM size. Map tasks write the intermediate output to the local disk.
- FIG. 44 also illustrates how the analytics can be moved from the server to the SSD drive, where the CAM NAND can be used as part of a data analytics system.
- the key-value pairs in this example are <year, temperature>.
- the key can be mapped according to the column address. Only the value need be stored, where it can be rotated and stored during a Map phase.
- the Reduce (or computing) phase is done along the NAND strings.
- the split can be 2 MB/block times 4*16*8 blocks, or 1 GB, for one SSD blade, compared to the 64 MB split of the HDFS block.
- data can be duplicated twice for improved accuracy. If the data is spread across multiple die, such as the temperatures spread across two die in this example, a join operation will program the (here) two maxima to the same die and the same bit lines for comparison. The final maxima can also be sorted out by a microprocessor or the top level servers. To prevent errors, a few top max values from each NAND memory can be found first. These few max values will be fetched out from a row (word line) stored page with ECC correction and then sorted again on the top flash controller or microprocessor. By being able to use larger splits, the data analytic system will result in fewer merges and less data toggling relative to a server-based implementation.
- FIG. 45A illustrates the architecture for a server CPU-based implementation and FIG. 45B compares its performance with that of a data analytic system using NAND flash based computing.
- the SSD can be located inside of the data node, where the SSD preferably has the capability to perform the needed data rotation quickly. The server then provides a series of queries to trigger firmware execution to do the needed data re-arrangement and low level NAND commands.
- the computing SSD can be compatible with the top level language, with possible modifications of not cutting the jobs into small chunks of 64 MB and of skipping the replications by using RAID inside the SSD to protect data, so as to be open to systems using HDFS.
- NAND based computing can process 8 giga data sets at a time (1 block/plane/die), leading to the sort of relative performance shown in FIG. 45B .
- the sort and merge process of the data analytic system for un-structured data is looked at in more detail with respect to FIG. 46 .
- the year and temperature are saved along the NAND chains for two memory blocks of CAM NAND. This corresponds to the right side of FIG. 44 .
- the CAM NAND array is then used to search for the maximum(s) for each year within each block of data.
- the columns are then associated with the corresponding original data (at top center, stored with a word line orientation) based on the column location.
- the maximums from the blocks are collected into another block, where the original data can be fetched from normal storage with ECC protection.
- This new block (shown center, bottom) with merged maximums from the different blocks of the earlier max operation is linked to original data through the address map.
- the new block is then searched for the overall maximum for each year and referred back to the address table to provide the final results for outputting the ⁇ year, max> result.
- Another option is to get the intermediate results of maximums out to the controller or microprocessor to sort the maxima from high to low order. The values in this group far from the top group of maxima can be thrown out.
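The two-level sort and merge of maxima described above can be sketched in Python. This is an illustrative model only: the per-block max operation stands in for the CAM NAND search within each block, and the merge stands in for collecting the per-block maxima into another block before a final search.

```python
def block_max_per_year(block):
    """Model of the in-block CAM NAND search: find the maximum
    temperature for each year within one block of <year, temp> pairs."""
    maxima = {}
    for year, temp in block:
        if year not in maxima or temp > maxima[year]:
            maxima[year] = temp
    return maxima

def overall_max_per_year(blocks):
    """Merge the per-block maxima into a new set and search it again
    for the overall maximum per year."""
    merged = [pair for block in blocks
              for pair in block_max_per_year(block).items()]
    return block_max_per_year(merged)

blocks = [[(1949, 78), (1950, 0)], [(1949, 111), (1950, 22)]]
# overall_max_per_year(blocks) -> {1949: 111, 1950: 22}
```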
- the sort can also partially be done on the SSD controller ASIC. This can be illustrated with respect to FIG. 47 .
- the search for word frequency where the sort is done partially on the CAM NAND, using a greater than operation to find the top 100 frequencies.
- the words and their frequencies are stored in word line oriented pages and the frequencies are mapped vertically in CAM NAND.
- the first example is a Distributed Grep operation, where Grep is a command-line utility for searching plain-text data sets for lines matching a regular expression.
- the map function emits a line if it matches a supplied pattern.
- the reduce function is an identity function that copies the supplied intermediate data to the output. This is illustrated in FIG. 48 .
- the URL matches the search criteria, the corresponding source is fetched.
- the patterns are stored in the flash memory along bit lines. These are then searched for a matched pattern. When there is a match (the Reduce phase), the controller can go to the corresponding location of the row oriented data to read out the source content, where the searched key is linked to source content through column address.
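The Distributed Grep map and reduce phases described above can be sketched as follows (function names are illustrative; the reduce is the identity function, as stated):

```python
import re

def grep_map(lines, pattern):
    """Map phase: emit a line if it matches the supplied pattern."""
    return [line for line in lines if re.search(pattern, line)]

def grep_reduce(intermediate):
    """Reduce phase: identity function copying intermediate data to output."""
    return list(intermediate)

lines = ["error: disk full", "ok", "error: timeout"]
grep_reduce(grep_map(lines, r"^error"))
# -> ['error: disk full', 'error: timeout']
```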
- FIG. 49 illustrates this process for a CAM NAND implementation.
- the memory stores the patterns (URLs) along the NAND strings.
- the search is then made for the matched pattern (URL1 in this example).
- the matched “1”s can then be counted up on the NAND or on the controller for query result.
- more precise counting can be done by reading the matched results out from the corresponding ROW (word line) storage locations, where the controller can check the match of these roughly chosen data to be counted while throwing away the false results.
- the performance speed-up comes from the reduced amount of data to be counted by the top level controller or servers.
- FIG. 50 illustrates this process for a CAM NAND implementation.
- the memory stores the patterns (URLs) along bit lines in the flash memory. This is then searched for the matching pattern (here URL1).
- the memory then goes to the corresponding location of the row oriented memory and reads out the source content.
- the system then saves all of the source data related to a match to the same location (in this example, source 1, 3, 5, 6, 9), where the searched key is linked through to the source content through the column address as described above.
- FIG. 51 illustrates the example of a term-vector per host query.
- a term vector summarizes the most important words that occur in a document or set of documents as a list of ⁇ word, frequency> pairs.
- the map function emits a ⁇ hostname, term vector> pair for each input document, where the hostname is extracted from the URL of the document.
- the reduce function is then passed all per-document term vectors for a given host. It adds these term vectors together, where it can throw away infrequent terms, and then emits a final ⁇ hostname, term vector> pair.
- the memory stores word frequencies along the NAND strings in flash. The words are saved horizontally along the word lines, with frequency linked by column address versus word line address. The frequencies of occurrence in different documents are saved in different blocks or different sets of word lines. The frequencies of occurrence in the different documents are then added and stored in another block of word line addresses. The final result can be streamed out and the data can be rotated back for later access.
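The term-vector map and reduce phases described above can be sketched as follows. The function names, the use of `Counter` for term vectors, and the `min_count` threshold for dropping infrequent terms are illustrative assumptions:

```python
from collections import Counter
from urllib.parse import urlparse

def term_vector_map(url, document):
    """Map phase: emit a <hostname, term vector> pair for one document,
    extracting the hostname from the document's URL."""
    return urlparse(url).hostname, Counter(document.split())

def term_vector_reduce(hostname, vectors, min_count=2):
    """Reduce phase: add per-document term vectors for a given host,
    throw away infrequent terms, emit a final <hostname, term vector> pair."""
    total = Counter()
    for v in vectors:
        total += v
    return hostname, {w: c for w, c in total.items() if c >= min_count}
```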
- the next example is an inverted index.
- the map function parses each document and emits a sequence of ⁇ word, document ID> pairs.
- the reduce function accepts all pairs for a given word, sorts the corresponding document IDs, and emits a ⁇ word, list(document ID)> pair.
- the set of all output pairs forms a simple inverted index. It is easy to augment this computation to keep track of word positions.
- FIG. 52 illustrates this example for a CAM NAND implementation.
- words are stored along the NAND strings in flash.
- the document ID is saved along the word line direction, linked with the words by column address versus word line address.
- the words can be matched one at a time.
- the document ID associated with the matched word can be read out according to the matched column address.
- the associated document ID read out can be saved in another location.
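The inverted index map and reduce functions described in this example can be sketched as follows (illustrative function names; the reduce sorts the document IDs per word, as stated):

```python
def inverted_index_map(doc_id, text):
    """Map phase: parse a document and emit <word, document ID> pairs."""
    return [(word, doc_id) for word in set(text.split())]

def inverted_index_reduce(pairs):
    """Reduce phase: accept all pairs for each word, sort the
    corresponding document IDs, and emit <word, list(document ID)> pairs."""
    index = {}
    for word, doc_id in pairs:
        index.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}
```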
- the next example is a distributed sort, as illustrated by FIG. 53 .
- the map function extracts the key from each record and emits a ⁇ key, record> pair.
- the reduce function then emits all pairs unchanged.
- the memory arranges both key and record along the NAND strings in flash. Key-record pair can also be stored along the word line direction, linked to the column address.
- a search is then done for un-changed ⁇ key, record> pairs, which are then read out from the word line stored information.
- the record can be sorted, with all the keys related to record1 saved together, all the keys related to record2 saved together, and so on.
- the sort process can be broken into the following sub-operations: a) search the 1st record; b) get the keys for the 1st record; c) repeat a) and b) for the 2nd record, and so on.
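The sub-operations above can be sketched as a simple loop, one record at a time (the search per record stands in for the CAM NAND match operation; names are illustrative):

```python
def sort_by_record(pairs):
    """Group keys by record, mirroring sub-operations a)-c):
    search each record in turn and collect the keys related to it."""
    records = sorted({record for _, record in pairs})
    result = []
    for record in records:                            # a) search the record
        keys = [k for k, r in pairs if r == record]   # b) get its keys
        result.append((record, keys))                 # c) repeat for the next
    return result
```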
- a final example considers the joining of two tables.
- the first table matches Station IDs to Station Names.
- the second table associates the Station IDs with a Timestamp and a Temperature. As shown at the left of FIG. 54 , these are to be merged into a single table where the station IDs are matched with Station Names, Timestamp, and Temperature.
- the right of FIG. 54 illustrates how this can be implemented in CAM NAND.
- the data corresponding to the second table is entered in a page based manner. (Note that when entered in along the word lines, these different table entries need not line up on a 1 to 1 basis.)
- the station IDs, which are the common element between the original two tables, are arranged along the bit lines.
- the station ID data is stored in a column format. A search for each station ID is done first, and all the matched station IDs will be read out from the ROW store location for the corresponding station names. The station name will be loaded into the ROW store locations corresponding to the selected station ID in the second table. After each station ID is searched and its corresponding station name is inserted into the second table, the JOIN operation is finished.
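The JOIN of FIG. 54 can be sketched as follows, searching one station ID at a time and inserting its station name into the matched rows of the second table (data values and names are illustrative, not from the figure):

```python
def join_tables(names, readings):
    """Join a {station_id: name} table with (station_id, timestamp,
    temperature) rows, one station ID search at a time."""
    joined = []
    for station_id, name in names.items():      # search each station ID
        for sid, timestamp, temp in readings:   # matched rows in table 2
            if sid == station_id:
                joined.append((station_id, name, timestamp, temp))
    return joined
```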
- NAND memory has an inherent AND structure for the memory cells along a NAND string as a consequence of their being connected serially along the string.
- the word lines span the columns of the array, allowing for the parallelism exploited in the preceding sections. So far the discussion has only looked at operations on a single such block at a time. (This sort of block (a single NAND string on each column, for all of the columns of the array) is here taken to be the same as the usual erase block of the flash erase structure, but more generally these need not be the same; for example, an erase block could correspond to, say, two adjacent ones of the NAND strings along each column.) An array typically has a large number of such blocks, with their corresponding NAND strings connected along common columns or bit lines.
- FIG. 55 can be used to illustrate the OR function and how this can be used to add functionality to the CAM NAND structure and enable whole chip search functions.
- FIG. 55 shows a portion of a NAND array, specifically two NAND strings along each of two bit lines BL1 and BL2.
- a top NAND string is connected to the bit line at its top end and connected to the source line SRC 5511 at its other end. In between, four cells are shown connected between the select gates.
- the lower NAND string is similarly arranged, but upside down.
- the result can then be stored in one or more of the latches 5503 , 5505 , or 5507 .
- the multiple (here 3) latches can be used to store and perform logical/arithmetical operations between the results of different queries.
- FIG. 55 just showed a small portion of a single array.
- An exemplary system could store upwards of 128,000 index/key items per block, where each block could have 128 word lines for a maximum index length of 16 bytes. If data is stored twice or in complementary pairs on a string, this corresponds to a 64 bit index.
- an array can have on the order of 2000 blocks per plane, with 2-4 planes per chip, 8-16 chips per flash controller, and 128 chips for one SSD.
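The capacity implied by these exemplary figures can be worked out directly. All numbers below are the example values quoted above (taking the upper 4-plane case), not device guarantees:

```python
# Exemplary capacity arithmetic for the figures quoted above.
items_per_block = 128_000        # one index per NAND string / bit line
word_lines_per_block = 128       # 128 bits -> 16 byte maximum index length
blocks_per_plane = 2_000
planes_per_chip = 4              # upper end of the 2-4 plane example
chips_per_ssd = 128

indices_per_chip = items_per_block * blocks_per_plane * planes_per_chip
indices_per_ssd = indices_per_chip * chips_per_ssd
# indices_per_chip -> 1_024_000_000 (roughly a billion indices per chip)
```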
- a specific key can be searched across multiple dice in an SSD drive. In regular operations, only one block per plane will be searched at a time.
- An alternate mode can be implemented to search the same key throughout multiple or all blocks of a plane by enabling multiple or even all the blocks in one chip. All or part of the blocks in one chip can be searched with same pattern simultaneously.
- the matching NAND strings will turn on while the non-matching NAND strings will be off.
- One bit line may be connected to 2000 blocks.
- One match along any bit line will make the corresponding global bit line conduct, since all the search results in all the blocks are OR-ed together.
- a single match in the plane as a whole will make a bit line conducting.
- the sensed result can be stored in the massive register structures adjacent to the memory array.
- each NAND string within a block has a (N)AND function, and the NAND strings between blocks have OR functions, without performing the logic operations from the data registers.
- In FIG. 55 there are three memory registers: 0 5503 , 1 5505 , and 2 5507 .
- Each register can contain one search or query. Multiple search or queries can be stored into one register or multiple registers. Multiple queries can be combined in the 3 registers with functions such as (N)OR, (N)AND, or (N)XOR operations between the queries.
- FIG. 56 illustrates an example of doing a full chip search simultaneously applying the same search key to all blocks. This can determine whether any of the NAND strings within any of the blocks have the searched for pattern. In this example, only one search result is found in all the blocks simultaneously searched.
- the block to which a conducting NAND string belongs can be determined by a binary search, for example. This is illustrated with respect to FIG. 57 . Initially, as shown in the left-most column, blocks 0 to N can be sensed all in one sensing. If there is a match, a binary search can be performed to select 1/2 the number of blocks to see if the match is in this half. This is shown by the shading of the top half of the blocks in the second column. This can then be narrowed into 1/4, 1/8, 1/16, and so on until the matching block is found.
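The binary search over blocks can be sketched as follows, where `matches(lo, hi)` is a stand-in for sensing blocks `lo..hi-1` together and reporting whether any NAND string in that range conducts (names and interface are illustrative assumptions):

```python
def find_matching_block(num_blocks, matches):
    """Binary search for the single block containing a conducting
    NAND string; `matches(lo, hi)` models a combined sensing of
    blocks lo..hi-1."""
    lo, hi = 0, num_blocks
    if not matches(lo, hi):      # initial all-block sensing: no match
        return None
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if matches(lo, mid):     # match is in the lower half
            hi = mid
        else:                    # otherwise it must be in the upper half
            lo = mid
    return lo
```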
- FIG. 58 illustrates an example of using the inherent OR functionality between different searches.
- a first attribute (X) is loaded in a first block and a second (Y) attribute is loaded into a second block.
- the two blocks are then biased according to the respective search patterns.
- the “0”s and “1”s represent which of the individual NAND strings conduct in response to the applied pattern.
- the result as seen at the sense amp's registers is then the OR of these two queries, shown at the bottom row, and can be latched for use in additional operations, if desired.
- FIGS. 59A and 59B illustrate an example of doing a JOIN using AND function between data registers.
- FIG. 59A illustrates the data space, where X and Y are the location coordinates of a certain restaurant, for example.
- a large amount of data will go to the CPU to be filtered for the selected X range first.
- the result will still be a large amount of data.
- the data that falls within the selected X range will be transferred to the CPU again to search for the data falling in the Y range.
- the final result could be small, but a large amount of data has to be processed.
- the (X,Y) values serve as attributes in the same table: X and Y are the two columns of the same table.
- the CAM NAND will store the data in the corresponding bit lines, storing both X and Y attributes according to the table row and column address.
- a range search on Table X can yield a large number of elements, and the range search on Table Y can also yield a large number of elements.
- X, Y share same primary key and can be AND-ed together in CAM NAND in order to give the AND results to find, in this example, the restaurants.
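The AND of the two range searches can be sketched as follows: the X and Y attributes share the same primary key (the same bit line position), so the two range-search bitmaps can be AND-ed element-wise, as in the register operations described above (names are illustrative):

```python
def range_search_and(xs, ys, x_range, y_range):
    """AND two range-search result bitmaps that share the same
    primary key, modeling the in-register AND of the X and Y queries."""
    x_hits = [x_range[0] <= x <= x_range[1] for x in xs]
    y_hits = [y_range[0] <= y <= y_range[1] for y in ys]
    # positions (primary keys) where both range searches match
    return [i for i, (a, b) in enumerate(zip(x_hits, y_hits)) if a and b]
```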
- FIG. 60 looks at another JOIN example, here from a TPC-H Benchmark, where one table ( 6001 , top on left) is used to find a part key (PARTKEY) from the part's name (P_NAME) and another table ( 6003 , bottom on left) is used to find the supplier and cost associated with the part key.
- the map and reduce phases for the data structure are illustrated on the right for a prior art implementation.
- Performing the JOIN in the CAM NAND can begin by arranging the part names in the CAM NAND while the PARTKEY is arranged in the associated block in a fixed mapping to the column address of the P_NAME, as illustrated at top left.
- a search is performed on the part table 6001 to find the part name being sought. From the part name, the corresponding PARTKEY is allocated in the corresponding data store block.
- the other table, 6003 , from the supply key will arrange the PARTKEY in the CAM NAND oriented along the bit lines. Then a search is performed on this table on all of the PARTKEY results (found in the first table, 6001 ) to allocate all the SUPPKEY and SUPPLY_COST in the corresponding data store blocks. In this way, the CAM NAND JOIN operation is performed by the first table ( 6001 ) search results becoming the search objects for the second table ( 6003 ) search.
- FIG. 61 illustrates two blocks of NAND memory, with a set of data written in the first block (block N) and the inverse of the data written along the bit lines of a second block (block M).
- the sought for data pattern is illustrated on the bit line with D7, with other examples shown on some of the other bit lines.
- the SA line illustrates the result of a sensing operation for the block in response to an applied data pattern.
- the search for the NAND strings with possible matches can be done by first sensing data in block N and storing the result into register 1. This is shown in block N, where the high and low word line levels shown at left correspond to the search pattern.
- An operation is also done to sense the inverse data (data bar) in block M, with the results put into register 2.
- the possible sets of these results is shown in FIG. 62 , where possible register 1 (Data) and register 2 (Data Bar) values are respectively shown on the first and second lines.
- the memory can then perform a logic operation between these two registers and put the result SA(Data) AND SA(Data Bar) into another register for showing any absolute matches.
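A behavioral sketch of this data/data-bar match follows. The sensing model is a simplification (a string conducts only if every cell under a LOW word line holds an erased "1"); the AND of the data sensing and the data-bar sensing removes the one-sided false positives of each sensing alone, yielding absolute matches. All names and bias conventions here are illustrative assumptions, not the patent's circuit:

```python
HIGH, LOW = "Vread", "Vlow"

def sense(strings, levels):
    """Simplified multi-word-line sensing: a NAND string conducts only
    if every cell with a LOW word line bias stores an erased '1'."""
    return [all(bit == 1 or lvl == HIGH for bit, lvl in zip(s, levels))
            for s in strings]

def exact_match(data_block, data_bar_block, key):
    """SA(Data) AND SA(Data Bar): sense block N with the key, block M
    with the inverted key, and AND the two register results."""
    levels = [LOW if b else HIGH for b in key]       # data sensing bias
    levels_bar = [HIGH if b else LOW for b in key]   # data-bar sensing bias
    reg1 = sense(data_block, levels)                 # register 1
    reg2 = sense(data_bar_block, levels_bar)         # register 2
    return [a and b for a, b in zip(reg1, reg2)]     # absolute matches
```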
- the read levels can be shifted slightly. This is illustrated in FIG. 63 , where the distributions of the “1” and “0” states are shown. The read point is margined high. Although this may pick up some false positive results, any matches can be followed up by checking with copies stored with the metadata.
- duplicated data can be written into two different blocks, so that one will still be a good copy should the other one have an error.
- This is illustrated in FIG. 64 , where the same data pattern is stored along each bit line in both of blocks M and N.
- the pure pattern all “1” should be avoided, as such a bit line will conduct regardless of the search pattern that is applied.
- any unused bit lines should have some sort of dummy data written in, rather than be left in an erased state, or a specified bit could be programmed to indicate an unused and otherwise erased NAND string.
- the data in the two separate blocks with the same data can be sensed simultaneously.
- All the comparing and calculating judges the data based on whether one of the copies is conducting.
- the duplicated data in separate blocks is suitable for the range search case, where the search is done based on single word line sensing results.
- the sense margin for searching can be shifted low, to favor the “0”, or non-conducting, situation. This is illustrated in FIG. 65 .
- an erased cell “1” blocked by the sensing margin will have a chance to be conducting in the duplicate block, where the use of two blocks improves the likelihood that at least one of the blocks with a “1” will yield a “1” as a result.
- This principle can also be applied to multi-level memories with 2 or more bits per cell.
- multiple word lines are often set to data dependent levels, so that one or more word lines have their adjacent word lines biased differently than was used when the word lines were programmed. Consequently, the word line to word line coupling can lead to errors in the multiple word line sensing operation used in the various operations described above.
- FIGS. 66 and 67 can help to illustrate this situation.
- FIG. 66 illustrates the conditions under which a memory is verified during a write operation and under which it will read accurately without margin loss.
- the selected word line in the middle is being sensed with a “0” or low voltage word line level.
- the adjacent, non-selected word lines are both biased to HI or VREAD with data indicated by “X”; this is the normal sensing condition.
- FIG. 67 illustrates the situation when multiple word lines simultaneously have data dependent values applied.
- the left-most column corresponds to a search key of “1” and “0” values, which are then shown converted into correspondingly biased Low and Hi voltage levels for multiple word line sensing.
- a normal read margin will occur if both adjacent word lines are high, but if either adjacent word line is biased with low, the result will be shifted; and if both adjacent word lines are low, the shift will be compounded.
- This is shown in FIG. 67 , where almost all of the levels have SHIFT UP due to an adjacent Low word line, and for the Low a little over half-way down there is a SHIFT UP UP as both adjacent word lines are biased low.
- This problem can be dealt with in various ways, such as programming the memory with greater separation between states or using neighbor dependent bias conditions, but one way is to scatter the index data and avoid programming the index on adjacent neighbors, so that data dependent levels are avoided on adjacent word lines.
- the sensing in a block can be split in an even word line and an odd word line sensing, where the word lines of the non-selected half are biased to Vread, the passing voltage, uniformly.
- One set of indices can be programmed on even word lines and another (or another segment of) index can be programmed on the odd word lines.
- FIG. 68 looks at this avoiding of the distribution shift up due to adjacent word line voltage bias and re-arrange the inverted data.
- the 1st and 2nd columns are for two sets of word lines along the same set of NAND strings and correspond to the first and second sensing operations: word lines 0, 2, 4, . . . can be biased with data dependent levels from the corresponding portion of the search key, while word lines 1, 3, 5, . . . are set to the non-selected word line value; the roles are then swapped for the second sensing. Under such an arrangement, it may be preferable to rearrange how the data is stored along the bit line.
- the first sensing would use the search key on the even word lines and the second sensing would use another key on the odd word lines.
- the data could be arranged as shown in FIG. 68 , where the data bits are paired up in both the first and second subsets.
- FIG. 69 looks at some of the steps of data manipulations for a 64 bit index that is to be stored as data/data bar.
- the data comes in to the memory system and is rotated from a horizontal format to a vertical format in DRAM or other memory, as discussed above.
- the order can be arranged. For example, here the data is split into two halves, so that the even word lines hold the first 32 bits and the odd word lines hold the second half of the patterns, allowing programming of the memory in the physical order of the index.
- the memory device can generate the inverted data (data bar) and insert the inverted data internally during program.
- the memory device can hold a pair of alternating pages of data, such as index 0 and index 32, to be programmed into the array, then invert the data and program it into the flash again, so that the controller only needs to map half of the block space for data.
- the index data will be divided into a first half and a second half, or, more generally, into many segments. These segments can be searched in sequence.
- On the far right of FIG. 69 , the data is un-shaded and the data bar is shaded in gray.
- the segments shown are interleaved.
- while the first segment is searched, the second segment will all be biased to the data independent passing voltage Vread to avoid the word line to word line effects altering the results. If the first segment showed matching results, then the second segment can be sensed. When the second segment is searched with multiple word line sensing, the first segment will be biased with Vread.
- word line to word line coupling with data arranged in different block can be illustrated with respect to FIG. 70 , showing the word line bias when the memory has data in the different blocks.
- an adjacent low word line results in a shift of the word line's level.
- the word lines can be split into non-adjacent subsets and the corresponding partial keys or indices applied sequentially.
- the first segment data and second segment data are interleaved in block N.
- a multiple block sensing can OR all of the first segment index search results together. The first segment search result can be confirmed first before proceeding to the second segment multiple block sensing.
- FIG. 71 looks at a multiple block example. In a first sensing operation, the even word lines of multiple blocks are selected, followed by a second sensing where the odd word lines are selected. In both cases the non-selected word lines are biased with the by-pass voltage Vread.
- this looks at an example of a range search operation with duplicated data error protection, for identifying a number which is >010011001.
- the memory can use complementary data in the same block or the duplicated data in different block scheme.
- In the range search case, only one word line is sensed per block at the same time.
- the sensing needs to be margined as in FIG. 65 , with a low margining so that “0” can be accurately detected.
- the table of FIG. 72 illustrates the case where a “1” cell has an error in one block; since the duplicated data in the other block is very likely “1”, the two block sensing results will be OR-ed together to generate the correct result. This is shown in the first column, where one of the sensings may either yield a “1” or, due to error, a “0”, but the second sensing correctly gives a “1” and, when OR-ed together, a correct (C) result. When both sensings give a “0”, a not correct (NC) result occurs.
- the system can arrange the data of the paired word lines in adjacent blocks so that there will not be any word line to word line effects as described with respect to FIG. 70 .
- a 4 digit number has the MSB of 1000.
- the number 1000 can be arranged along the NAND string in block N and block M. This data arrangement will avoid the data shift due to a neighboring word line's voltage bias effect.
- the rest of word lines in the two blocks will be biased to the passing voltage Vread.
- the steps of data manipulations for, in this example, 64 bits of data in the duplicated data case can correspond to the data/data bar version of FIG. 69 , but with duplicate data instead of an inverted copy.
- the memory chip can hold two pages of data, such as index 0 and index 32, and program them into the array a first time and then program the same pair into the array again. Alternately, the interleaving can be skipped and the duplicate can be written to another block, with the duplication again carried out on the memory. The result is then as shown in FIG. 74 , where the data is rotated, transferred to the memory device, and then duplicated there.
- FIG. 75 summarizes some of the different cases.
- the same data can preferably be arranged in different blocks, each with one word line (1 WL) sensed.
- data and data bar should be arranged in the same block, with voltages biased for even multiple word line sensing or odd multiple word line sensing.
- the data can also be arranged in separate blocks, but the sensing has to be done in two separate steps.
- a bloom filter is formed from a group of data elements.
- a checking of a data element against the bloom filter can then be used to determine whether that element is a member, or likely to be a member, of the group.
- the structure of bloom filters allows for the possibility of false positives, but not false negatives, where the likelihood of false positives tends to decrease with the size of the filter and to increase with the number of elements from which the filter is formed. Consequently, in addition to checking for an exact match of a search key among stored keys, an initial check with the search key can be done on a bloom filter to determine if the object key is likely to be present. If a positive result is found, the search can continue for a match by applying the search key along the word lines and seeing whether any NAND strings conduct in response.
- each memory chip can have its own bloom filter section, where a set of bloom filters is formed from the indices stored along NAND strings on the chip.
- the bloom filters can either have a vertical orientation, as described further in U.S. patent application Ser. No. 13/827,609, or a horizontal orientation.
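The bloom filter behavior described above (false positives possible, false negatives not) can be sketched as follows. The class name, filter size, and hashing scheme are illustrative assumptions; the patent does not specify a particular hash construction:

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter sketch: m bits, k hash positions per key."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, key):
        # Derive k deterministic bit positions from the key.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # May return a false positive, but never a false negative.
        return all(self.bits[p] for p in self._positions(key))
```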
- each block of indices can have a corresponding bloom filter with ECC protection written along a word line of a block of bloom filters.
- FIG. 75 is a schematic representation of the arrangement.
- a host 7501 such as a PC or server, provides the data, which can include the indices or keys.
- the indices can be stored in a buffer memory and rotated to have vertical orientation before being sent out as pages for programming into a memory array. This part of the process is basically equivalent to that discussed above with respect to FIG. 13 .
- Bloom filters are also generated from the indices; for example, all the indices to be stored in a single block could be used to form a single bloom filter configured to be written onto a single word line, where the controller could also generate ECC for the bloom filter and form a corresponding write page. Copies of the keys, protected with ECC, can also be stored along with the metadata as sent over to the memory.
- the memory space on the memory chip can conceptually be split up to include the blocks in which the indices are stored along the NAND strings; a portion in which the bloom filters are stored, such as one or more blocks (depending on the number of such filters) set aside for this purpose; and an area for the metadata.
- a flow for searching the indices is then shown to the right of FIG. 76 .
- the search key can initially be checked against the bloom filter. A positive result indicates that the sought for index is in the memory, although there is the possibility of a false positive.
- a bloom filter positive for the search key will not give the location of the sought for index, but the positive result will correspond (for a non-false positive) to the set of indices from which it is generated.
- each bloom filter corresponds to a block of indices.
- only a block whose filter yields a positive result will need to be checked. Consequently, the bloom filter results can also be used to reduce the amount of searching, if desired.
- the system can go through the meta data (which in this example includes the index and pointer) one by one to find the matching index. If the bloom filter result was a false positive, then this search will also come up empty, while if the regular search was a false negative due to error, this provides a way to still extract the sought for index. As noted above, even if the regular search comes up with a result, it may be useful to double check the result against the metadata.
- the system can re-program the same word line for the update.
- a horizontal bloom filter is here preferred as it more easily allows for ECC protection, but the parity area is not readily updated in this way, in which case it may be preferable to re-compute the ECC parity on the controller and reprogram the result to another word line.
- Alternately, the bloom filter is not implemented, and only index blocks and meta-data blocks are arranged. To reduce index block error, the new program scheme shown below can be used. The index block can also be replicated in a few different blocks to reduce error. All the replicated index blocks can be linked to the same meta-block by address calculation.
- this section looks at an additional technique for increasing error protection on the programming side. As before, this can be combined with one or more of the other data protection techniques (bloom filter, storing a copy in meta-data, margining in sensing operations, storing multiple (2 or 3) copies of data, and so on) already presented.
- a post-write read of the data is performed.
- the process here is directed at finding those NAND strings that have one or more incorrectly programmed bits along them. More accurately, it is to locate those keys or indices that are incorrectly written along bit lines: if the keys/indices run over several NAND strings, these can be checked a block at a time; while if the indices are less than a NAND string in length (such as 32 bits long, so that two keys fit on each of 64 bit NAND strings), the indices can be checked once written in (at the half-block stage in the 32/64 bit example). In the following discussion, the length of the indices will be taken to be that of the NAND strings in a block.
- the memory reads back the word lines one by one. These are compared with the original data inside the NAND to see whether the memory has any mis-programmed bits. Mis-programmed bits will be accumulated (OR-ed along each column) among all or some of the word lines to determine which, if any, NAND strings will result in an error when sensed. These mis-programmed indices can be identified with their column address and IO address. The mis-programmed indices can then be appended to the next index block to be programmed in another block. Their associated meta data can also be appended to the next index block meta-data area.
- the meta data block can be programmed in a conventional way, where a read can also be performed after the program to check if the ECC protected page data can correctly be read out.
- FIG. 76A is an exemplary flow for the programming of indices and also their corresponding meta-data to help reduce failure rates.
- the process begins at 7601 with the start of programming of a block. As described above in earlier sections, the indices have been rotated to a vertical orientation and are transferred in to the memory in pages that are then programmed along word lines at 7603 . The process continues until the whole of the block is complete. (As noted above, for shorter indices, this can just be until a corresponding portion of the block on a smaller number of word lines is complete.) A post-write read operation is then performed at 7610 to find any indices that may have been incorrectly programmed.
- Since the indices have a vertical orientation while read and write pages are horizontally aligned, the process proceeds a word line at a time.
- The original data for a word line can be shifted into the memory at 7611.
- The data as written to the word line is then read back at 7613 and compared at 7615 to the original data to see whether any errors are present.
- The individual errors for the different word lines need not be maintained, only accumulated for each bit line, as can be done at 7615 by combining the errors in an OR process.
- The correct data can be shifted back into register 1 5505 for each column, the read-back data from the array can be stored into register 0 5503, and these can be XOR-ed on a column-by-column basis, with the results stored in register 2 5507, a "1" representing an error.
- The result can be OR-ed with the result from earlier word lines of the block in register 2 5507 to accumulate an error for the index on that column in the block.
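The three-register XOR/OR flow above can be sketched in software; a minimal model, using plain bit-lists in place of the data latches 5503/5505/5507 (the page data below is invented for illustration):

```python
# Hypothetical model of the per-column error accumulation: for every word
# line, XOR the original data (register 1) against the read-back data
# (register 0), then OR the result into an accumulator (register 2).

def post_write_check(original_pages, read_back_pages):
    """OR-accumulate, per bit line (column), the XOR of original vs.
    read-back data across the word lines of a block.

    Returns a list with a 1 in every column whose NAND string holds at
    least one mis-programmed bit."""
    num_columns = len(original_pages[0])
    reg2 = [0] * num_columns                  # accumulated column errors
    for reg1, reg0 in zip(original_pages, read_back_pages):
        for col in range(num_columns):
            err = reg1[col] ^ reg0[col]       # 1 marks a mis-programmed bit
            reg2[col] |= err                  # OR into the running result
    return reg2

# Two word lines, four columns; column 2 fails on the second word line.
original  = [[1, 0, 1, 1], [0, 1, 1, 0]]
read_back = [[1, 0, 1, 1], [0, 1, 0, 0]]
print(post_write_check(original, read_back))  # → [0, 0, 1, 0]
```

Only the accumulated vector is kept, matching the point above that per-word-line errors need not be maintained.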
- The data write can instead, or additionally, be checked by reading back the data written to each word line twice, where the read margins are shifted and the results of the two reads XOR-ed to determine any errors.
- This variation is shown in FIG. 76B where, relative to FIG. 76A, 7611, 7613 and 7615 are replaced by 7611′, 7613′, and 7615′.
- At 7611′ a page is read at a first margining level (here margined high). The page is then re-read with a different margining (here low) at 7613′.
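The two-read margin check can be illustrated with a simple threshold model; this is a sketch under the assumption that a cell reads as 1 when its threshold voltage is below the applied read level (all voltage values here are invented):

```python
# Illustrative sketch of the double-read margin check of FIG. 76B: a cell
# whose threshold voltage falls between the shifted high and low read
# levels reads back differently in the two reads, so XOR-ing the two
# results flags it as marginal.

def read_page(thresholds, read_level):
    # A cell conducts (reads as bit 1) when its threshold voltage is
    # below the applied read level.
    return [1 if vt < read_level else 0 for vt in thresholds]

def margin_check(thresholds, margin_low, margin_high):
    first  = read_page(thresholds, margin_high)    # 7611': margined high
    second = read_page(thresholds, margin_low)     # 7613': margined low
    return [a ^ b for a, b in zip(first, second)]  # 1 = marginal cell

# Four cells; the third sits between the two read levels and is flagged.
vts = [0.5, 3.0, 1.8, 3.2]
print(margin_check(vts, margin_low=1.5, margin_high=2.0))  # → [0, 0, 1, 0]
```

A cell well above or well below both levels reads the same both times and is not flagged; only poorly placed threshold voltages produce a XOR of 1.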
- The result of the post-write read for column verification can then be shifted out at 7617.
- The column address and IO address can be specified at 7619 and dealt with at 7621 by appending any such indices to the next block to be programmed.
- Indices found defective can then also be marked as obsolete, where, for example, one row could be set aside for this purpose and programmed to block out defective indices, similarly to what is described above.
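The handling of failed indices at 7619/7621 might be modeled as follows; the 8-bit IO width, the obsolete-row representation, and all function names are assumptions for illustration, not the patent's implementation:

```python
# Hypothetical sketch: translate the shifted-out column-verify result into
# column (byte) and IO addresses, queue the failed indices for the next
# index block, and set an obsolete-marker row to block them in searches of
# the current block.

IO_WIDTH = 8  # bits transferred per IO cycle (assumed)

def relocate_failed(error_vector, indices, next_block_queue, obsolete_row):
    """Queue every failed index (error bit = 1) for re-programming in the
    next block and mark it obsolete in place."""
    for col, bad in enumerate(error_vector):
        if bad:
            byte_addr = col // IO_WIDTH    # column (byte) address
            io_addr = col % IO_WIDTH       # position within the IO bus
            next_block_queue.append((byte_addr, io_addr, indices[col]))
            obsolete_row[col] = 1          # blocks this index in searches
    return next_block_queue, obsolete_row

queue, obsolete = relocate_failed([0, 0, 1, 0],
                                  ["idx0", "idx1", "idx2", "idx3"],
                                  [], [0, 0, 0, 0])
print(queue)     # → [(0, 2, 'idx2')]
print(obsolete)  # → [0, 0, 1, 0]
```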
- Meta-data can then be programmed at 7630, where a more typical write process can be used.
- Pages of meta-data can be written a page at a time onto word lines at 7633.
- Various post-write verifications can be used, either along the way or subsequent to completing a block, such as subsequently reading the block out a page at a time and checking it against its ECC, as shown at 7635.
- Index blocks can be checked periodically with margin-low and margin-high reads to see whether indices have gone, or are going, bad. Any corrupted indices can then be reprogrammed as described.
- FIGS. 77A-D consider several index-to-meta-data linking cases.
- FIG. 77A is the case of one index per block, where the index block and the meta-data block can have a 1-to-1 mapping. In the case of an index program error, the meta-data is also copied.
- FIG. 77B looks at the case of shorter indices, where multiple indices (two, in this example) can be stored along each NAND string.
- Each index can be associated with a dedicated meta-block.
- Program errors can then be accumulated according to the index length.
- FIG. 77C shows the case where one index runs across multiple (here, two) blocks, as would be the case when the index length exceeds the NAND string length.
- Program errors are then accumulated across the block boundary.
- When an index runs across multiple blocks, this can also be managed by dividing the index into several segments, each saved in one block. This is illustrated in FIG. 77D.
- Multiple columns can then correspond to one meta-data entry.
- Program errors can be accumulated across a few column IO addresses.
- The index and meta-data will be re-programmed in other locations.
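The way the OR-accumulation window tracks the index length in the cases of FIGS. 77A-B can be sketched as follows; the word-line counts and error patterns are invented for illustration, and the per-word-line error vectors are assumed to come from the XOR comparison described earlier:

```python
# Hypothetical sketch: errors are OR-ed only over the word lines that a
# given index actually occupies, so a full-string index yields one result
# per block while two half-string indices yield two independent results.

def accumulate(per_wordline_errors, wordlines_per_index):
    """per_wordline_errors: one error bit-vector per word line.
    Returns one accumulated error vector per index slot along the
    NAND strings."""
    results = []
    for start in range(0, len(per_wordline_errors), wordlines_per_index):
        group = per_wordline_errors[start:start + wordlines_per_index]
        acc = [0] * len(group[0])
        for vec in group:
            acc = [a | e for a, e in zip(acc, vec)]  # OR into this window
        results.append(acc)
    return results

errors = [[0, 1], [0, 0], [1, 0], [0, 0]]  # 4 word lines, 2 columns
# One index per 4 word lines (as in FIG. 77A): one accumulated result.
print(accumulate(errors, 4))   # → [[1, 1]]
# Two indices per string (as in FIG. 77B): each half accumulated alone.
print(accumulate(errors, 2))   # → [[0, 1], [1, 0]]
```

For the multi-block case of FIG. 77C the same idea applies with the window spanning the block boundary, i.e. the word-line error vectors of both blocks are fed into a single accumulation window.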
Landscapes
- Engineering & Computer Science (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Read Only Memory (AREA)
Abstract
Description
Claims (26)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/957,219 US8773909B2 (en) | 2012-11-09 | 2013-08-01 | CAM NAND with or function and full chip search capability |
PCT/US2013/068448 WO2014074496A2 (en) | 2012-11-09 | 2013-11-05 | Cam nand with or function and full chip search capability |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261724401P | 2012-11-09 | 2012-11-09 | |
US201261730884P | 2012-11-28 | 2012-11-28 | |
US13/749,361 US8634247B1 (en) | 2012-11-09 | 2013-01-24 | NAND flash based content addressable memory |
US13/756,076 US8811085B2 (en) | 2012-11-09 | 2013-01-31 | On-device data analytics using NAND flash based intelligent memory |
US13/827,407 US8792279B2 (en) | 2012-11-09 | 2013-03-14 | Architectures for data analytics using computational NAND memory |
US13/957,219 US8773909B2 (en) | 2012-11-09 | 2013-08-01 | CAM NAND with or function and full chip search capability |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/827,407 Continuation-In-Part US8792279B2 (en) | 2012-11-09 | 2013-03-14 | Architectures for data analytics using computational NAND memory |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140136763A1 US20140136763A1 (en) | 2014-05-15 |
US8773909B2 true US8773909B2 (en) | 2014-07-08 |
Family
ID=50682857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/957,219 Active US8773909B2 (en) | 2012-11-09 | 2013-08-01 | CAM NAND with or function and full chip search capability |
Country Status (1)
Country | Link |
---|---|
US (1) | US8773909B2 (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190163394A1 (en) * | 2017-11-28 | 2019-05-30 | Advanced Micro Devices, Inc. | Expandable buffer for memory transactions |
US10387303B2 (en) | 2016-08-16 | 2019-08-20 | Western Digital Technologies, Inc. | Non-volatile storage system with compute engine to accelerate big data applications |
- US10459644B2 (en) | 2016-10-28 | 2019-10-29 | Western Digital Technologies, Inc. | Non-volatile storage system with integrated compute engine and optimized use of local fast memory |
US10565123B2 (en) | 2017-04-10 | 2020-02-18 | Western Digital Technologies, Inc. | Hybrid logical to physical address translation for non-volatile storage devices with integrated compute module |
US10643119B2 (en) | 2018-07-24 | 2020-05-05 | Sandisk Technologies Llc | Differential non-volatile memory cell for artificial neural network |
US10643705B2 (en) | 2018-07-24 | 2020-05-05 | Sandisk Technologies Llc | Configurable precision neural network with differential binary non-volatile memory cell structure |
US10671370B2 (en) | 2018-05-30 | 2020-06-02 | Red Hat, Inc. | Distributing file system states |
US11170290B2 (en) | 2019-03-28 | 2021-11-09 | Sandisk Technologies Llc | Realization of neural networks with ternary inputs and binary weights in NAND memory arrays |
US11328204B2 (en) | 2018-07-24 | 2022-05-10 | Sandisk Technologies Llc | Realization of binary neural networks in NAND memory arrays |
US11397885B2 (en) | 2020-04-29 | 2022-07-26 | Sandisk Technologies Llc | Vertical mapping and computing for deep neural networks in non-volatile memory |
US11410727B1 (en) | 2021-03-15 | 2022-08-09 | Sandisk Technologies Llc | Scalable search system design with single level cell NAND-based binary and ternary valued content addressable memory cells |
US11544547B2 (en) | 2020-06-22 | 2023-01-03 | Western Digital Technologies, Inc. | Accelerating binary neural networks within latch structure of non-volatile memory devices |
US11568228B2 (en) | 2020-06-23 | 2023-01-31 | Sandisk Technologies Llc | Recurrent neural network inference engine with gated recurrent unit cell and non-volatile memory arrays |
US11568200B2 (en) | 2019-10-15 | 2023-01-31 | Sandisk Technologies Llc | Accelerating sparse matrix multiplication in storage class memory-based convolutional neural network inference |
US11625586B2 (en) | 2019-10-15 | 2023-04-11 | Sandisk Technologies Llc | Realization of neural networks with ternary inputs and ternary weights in NAND memory arrays |
US11657259B2 (en) | 2019-12-20 | 2023-05-23 | Sandisk Technologies Llc | Kernel transformation techniques to reduce power consumption of binary input, binary weight in-memory convolutional neural network inference engine |
US11663471B2 (en) | 2020-06-26 | 2023-05-30 | Sandisk Technologies Llc | Compute-in-memory deep neural network inference engine using low-rank approximation technique |
US11977915B2 (en) | 2020-12-15 | 2024-05-07 | Western Digital Technologies, Inc. | Non-volatile memory with intelligent compute task distribution |
US12079733B2 (en) | 2020-06-23 | 2024-09-03 | Sandisk Technologies Llc | Multi-precision digital compute-in-memory deep neural network engine for flexible and energy efficient inferencing |
US12205008B2 (en) | 2021-05-13 | 2025-01-21 | Sandisk Technologies Llc | Dropout in neutral networks using threshold switching selectors in non-volatile memories |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9298386B2 (en) * | 2013-08-23 | 2016-03-29 | Globalfoundries Inc. | System and method for improved placement of blocks in a deduplication-erasure code environment |
US9047978B2 (en) | 2013-08-26 | 2015-06-02 | Micron Technology, Inc. | Apparatuses and methods for selective row refreshes |
US9859005B2 (en) | 2014-01-12 | 2018-01-02 | Gsi Technology Inc. | Memory device |
KR20170090177A (en) * | 2016-01-28 | 2017-08-07 | 에스케이하이닉스 주식회사 | Memory system, semiconductor memory device and operating method thereof |
US10580475B2 (en) | 2018-01-22 | 2020-03-03 | Micron Technology, Inc. | Apparatuses and methods for calculating row hammer refresh addresses in a semiconductor device |
US11152050B2 (en) | 2018-06-19 | 2021-10-19 | Micron Technology, Inc. | Apparatuses and methods for multiple row hammer refresh address sequences |
US10770127B2 (en) | 2019-02-06 | 2020-09-08 | Micron Technology, Inc. | Apparatuses and methods for managing row access counts |
US11043254B2 (en) | 2019-03-19 | 2021-06-22 | Micron Technology, Inc. | Semiconductor device having cam that stores address signals |
US11264096B2 (en) | 2019-05-14 | 2022-03-01 | Micron Technology, Inc. | Apparatuses, systems, and methods for a content addressable memory cell with latch and comparator circuits |
US11158364B2 (en) | 2019-05-31 | 2021-10-26 | Micron Technology, Inc. | Apparatuses and methods for tracking victim rows |
US11158373B2 (en) * | 2019-06-11 | 2021-10-26 | Micron Technology, Inc. | Apparatuses, systems, and methods for determining extremum numerical values |
US10832789B1 (en) * | 2019-06-13 | 2020-11-10 | Western Digital Technologies, Inc. | System countermeasure for read operation during TLC program suspend causing ADL data reset with XDL data |
US10825526B1 (en) | 2019-06-24 | 2020-11-03 | Sandisk Technologies Llc | Non-volatile memory with reduced data cache buffer |
US10811082B1 (en) * | 2019-06-24 | 2020-10-20 | Sandisk Technologies Llc | Non-volatile memory with fast data cache transfer scheme |
US11139015B2 (en) | 2019-07-01 | 2021-10-05 | Micron Technology, Inc. | Apparatuses and methods for monitoring word line accesses |
US10832792B1 (en) | 2019-07-01 | 2020-11-10 | Micron Technology, Inc. | Apparatuses and methods for adjusting victim data |
US11386946B2 (en) | 2019-07-16 | 2022-07-12 | Micron Technology, Inc. | Apparatuses and methods for tracking row accesses |
US10943636B1 (en) | 2019-08-20 | 2021-03-09 | Micron Technology, Inc. | Apparatuses and methods for analog row access tracking |
US10964378B2 (en) | 2019-08-22 | 2021-03-30 | Micron Technology, Inc. | Apparatus and method including analog accumulator for determining row access rate and target row address used for refresh operation |
US11200942B2 (en) | 2019-08-23 | 2021-12-14 | Micron Technology, Inc. | Apparatuses and methods for lossy row access counting |
US11442658B1 (en) * | 2020-07-12 | 2022-09-13 | Lightbits Labs Ltd. | System and method for selecting a write unit size for a block storage device |
US11222682B1 (en) | 2020-08-31 | 2022-01-11 | Micron Technology, Inc. | Apparatuses and methods for providing refresh addresses |
US11462291B2 (en) | 2020-11-23 | 2022-10-04 | Micron Technology, Inc. | Apparatuses and methods for tracking word line accesses |
US11482275B2 (en) | 2021-01-20 | 2022-10-25 | Micron Technology, Inc. | Apparatuses and methods for dynamically allocated aggressor detection |
US11600314B2 (en) | 2021-03-15 | 2023-03-07 | Micron Technology, Inc. | Apparatuses and methods for sketch circuits for refresh binning |
US11664063B2 (en) | 2021-08-12 | 2023-05-30 | Micron Technology, Inc. | Apparatuses and methods for countering memory attacks |
US11688451B2 (en) | 2021-11-29 | 2023-06-27 | Micron Technology, Inc. | Apparatuses, systems, and methods for main sketch and slim sketch circuit for row address tracking |
US12165687B2 (en) | 2021-12-29 | 2024-12-10 | Micron Technology, Inc. | Apparatuses and methods for row hammer counter mat |
Citations (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- US5602789A (en) | 1991-03-12 | 1997-02-11 | Kabushiki Kaisha Toshiba | Electrically erasable and programmable non-volatile and multi-level memory system with write-verify controller |
US5642322A (en) | 1995-05-24 | 1997-06-24 | Kawasaki Steel Corporation | Layout of semiconductor memory and content-addressable memory |
US6157558A (en) | 1999-05-21 | 2000-12-05 | Sandisk Corporation | Content addressable memory cell and array architectures having low transistor counts |
US6166938A (en) | 1999-05-21 | 2000-12-26 | Sandisk Corporation | Data encoding for content addressable memories |
US20010010057A1 (en) | 1997-06-24 | 2001-07-26 | Matsushita Electronics Corporation | Semiconductor integrated circuit, computer system, data processor and data processing method |
US6317349B1 (en) | 1999-04-16 | 2001-11-13 | Sandisk Corporation | Non-volatile content addressable memory |
US20020171652A1 (en) | 2001-05-15 | 2002-11-21 | Perego Richard E. | Scalable unified memory architecture |
US20030007408A1 (en) | 2001-02-08 | 2003-01-09 | Integrated Device Technology, Inc. | Cam circuit with error correction |
US20030012063A1 (en) | 2001-06-21 | 2003-01-16 | Pien Chien | Content addressable memory cell |
US20030018868A1 (en) | 2001-07-19 | 2003-01-23 | Chung Shine C. | Method and apparatus for using smart memories in computing |
US20030117851A1 (en) | 2001-12-24 | 2003-06-26 | Samsung Electronics Co., Ltd. | NAND-type flash memory device with multi-page program, multi-page read, multi-block erase operations |
US20030163509A1 (en) | 2002-02-25 | 2003-08-28 | International Business Machines Corporation | Method and apparatus for cooperative distributed task management in a storage subsystem with multiple controllers using cache locking |
US20040124466A1 (en) * | 2002-12-31 | 2004-07-01 | Walker Andrew J. | Method for fabricating programmable memory array structures incorporating series-connected transistor strings |
US20040125629A1 (en) * | 2002-12-31 | 2004-07-01 | Scheuerlein Roy E. | Programmable memory array structure incorporating series-connected transistor strings and methods for fabrication and operation of same |
US20040240484A1 (en) | 2002-01-14 | 2004-12-02 | Argyres Dimitri C. | Transposing of bits in input data to form a comparand within a content addressable memory |
US20050078514A1 (en) | 2003-09-30 | 2005-04-14 | Scheuerlein Roy E. | Multiple twin cell non-volatile memory array and logic block structure and method therefor |
US20050141387A1 (en) | 2003-12-31 | 2005-06-30 | Raul-Adrian Cernea | Flexible and area efficient column redundancy for non-volatile memories |
US6970988B1 (en) | 2001-07-19 | 2005-11-29 | Chung Shine C | Algorithm mapping, specialized instructions and architecture features for smart memory computing |
EP1720168A1 (en) | 2005-04-27 | 2006-11-08 | Samsung Electronics Co., Ltd. | Integrated circuit device, flash memory array, nonvolatile memory device and operating method |
US20070047314A1 (en) * | 2005-08-31 | 2007-03-01 | Micron Technology, Inc. | Programming method for NAND EEPROM |
US20070058407A1 (en) | 2005-09-12 | 2007-03-15 | Renesas Technology Corp. | Semiconductor memory device |
US7206230B2 (en) | 2005-04-01 | 2007-04-17 | Sandisk Corporation | Use of data latches in cache operations of non-volatile memories |
US20070140012A1 (en) * | 2005-12-20 | 2007-06-21 | Micron Technology, Inc. | NAND architecture memory devices and operation |
US20070189073A1 (en) * | 2006-02-16 | 2007-08-16 | Micron Technology, Inc. | Programming method to reduce gate coupling interference for non-volatile memory |
US20070236990A1 (en) * | 2006-03-28 | 2007-10-11 | Micron Technology, Inc. | Programming method to reduce word line to word line breakdown for NAND flash |
US20070263462A1 (en) * | 2006-05-11 | 2007-11-15 | Micron Technology, Inc. | NAND architecture memory devices and operation |
US20070291542A1 (en) * | 2006-06-14 | 2007-12-20 | Micron Technology, Inc. | Programming method for NAND flash |
US20080031044A1 (en) * | 2006-08-04 | 2008-02-07 | Micron Technology, Inc. | Memory device architectures and operation |
US20080062763A1 (en) | 2006-09-13 | 2008-03-13 | Park Ki-Tae | Multi-bit flash memory device and memory cell array |
US20080158989A1 (en) | 2006-12-28 | 2008-07-03 | Jun Wan | Retention margin program verification |
US7403421B2 (en) * | 2002-01-18 | 2008-07-22 | Sandisk Corporation | Noise reduction technique for transistors and small devices utilizing an episodic agitation |
US20080266957A1 (en) | 2006-03-24 | 2008-10-30 | Farookh Moogat | Method for Column Redundancy Using Data Latches in Solid-State Memories |
EP1988474A1 (en) | 2007-05-04 | 2008-11-05 | Axalto SA | System and method of managing indexation of flash memory |
US20090141566A1 (en) | 2007-12-03 | 2009-06-04 | International Business Machines Corporation | Structure for implementing memory array device with built in computation capability |
US20090190404A1 (en) | 2008-01-25 | 2009-07-30 | Roohparvar Frankie F | Nand flash content addressable memory |
US20090254694A1 (en) | 2008-04-02 | 2009-10-08 | Zikbit Ltd. | Memory device with integrated parallel processing |
US20090303767A1 (en) | 2008-04-02 | 2009-12-10 | Avidan Akerib | System, method and apparatus for memory with embedded associative section for computations |
US20100329007A1 (en) | 2009-06-24 | 2010-12-30 | Hardwell Chibvongodze | Pointer Based Column Selection Techniques in Non-Volatile Memories |
US20110002169A1 (en) | 2009-07-06 | 2011-01-06 | Yan Li | Bad Column Management with Bit Information in Non-Volatile Memory Systems |
WO2011007304A1 (en) | 2009-07-16 | 2011-01-20 | Zikbit Ltd. | Using storage cells to perform computation |
US20110051485A1 (en) | 2009-08-28 | 2011-03-03 | International Business Machines Corporation | Content addressable memory array writing |
US20110096601A1 (en) | 2009-10-28 | 2011-04-28 | Gavens Lee M | Non-Volatile Memory And Method With Accelerated Post-Write Read To Manage Errors |
US20110096607A1 (en) | 2008-09-22 | 2011-04-28 | Micron Technology, Inc. | Programming a memory device to increase data reliability |
US20110103153A1 (en) | 2009-11-02 | 2011-05-05 | Kabushiki Kaisha Toshiba | Nonvolatile semiconductor memory device and method for driving same |
US20110134676A1 (en) | 2009-12-04 | 2011-06-09 | International Business Machines Corporation | Resistive memory devices having a not-and (nand) structure |
US20120005419A1 (en) | 2010-07-02 | 2012-01-05 | Futurewei Technologies, Inc. | System Architecture For Integrated Hierarchical Query Processing For Key/Value Stores |
US8102705B2 (en) | 2009-06-05 | 2012-01-24 | Sandisk Technologies Inc. | Structure and method for shuffling data within non-volatile memory devices |
US20120102298A1 (en) | 2010-10-20 | 2012-04-26 | Microsoft Corporation | Low RAM Space, High-Throughput Persistent Key-Value Store using Secondary Memory |
US20120250424A1 (en) | 2011-03-31 | 2012-10-04 | Kabushiki Kaisha Toshiba | Semiconductor memory device |
US20130028021A1 (en) | 2011-07-28 | 2013-01-31 | Eran Sharon | Simultaneous Sensing of Multiple Wordlines and Detection of NAND Failures |
US20130042055A1 (en) | 2011-08-08 | 2013-02-14 | Atsuhiro Kinoshita | Memory system including key-value store |
2013
- 2013-08-01 US US13/957,219 patent/US8773909B2/en active Active
Patent Citations (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- US5602789A (en) | 1991-03-12 | 1997-02-11 | Kabushiki Kaisha Toshiba | Electrically erasable and programmable non-volatile and multi-level memory system with write-verify controller |
US5642322A (en) | 1995-05-24 | 1997-06-24 | Kawasaki Steel Corporation | Layout of semiconductor memory and content-addressable memory |
US20010010057A1 (en) | 1997-06-24 | 2001-07-26 | Matsushita Electronics Corporation | Semiconductor integrated circuit, computer system, data processor and data processing method |
US6317349B1 (en) | 1999-04-16 | 2001-11-13 | Sandisk Corporation | Non-volatile content addressable memory |
US6157558A (en) | 1999-05-21 | 2000-12-05 | Sandisk Corporation | Content addressable memory cell and array architectures having low transistor counts |
US6166938A (en) | 1999-05-21 | 2000-12-26 | Sandisk Corporation | Data encoding for content addressable memories |
US20030007408A1 (en) | 2001-02-08 | 2003-01-09 | Integrated Device Technology, Inc. | Cam circuit with error correction |
US20020171652A1 (en) | 2001-05-15 | 2002-11-21 | Perego Richard E. | Scalable unified memory architecture |
US20030012063A1 (en) | 2001-06-21 | 2003-01-16 | Pien Chien | Content addressable memory cell |
US20030018868A1 (en) | 2001-07-19 | 2003-01-23 | Chung Shine C. | Method and apparatus for using smart memories in computing |
US6970988B1 (en) | 2001-07-19 | 2005-11-29 | Chung Shine C | Algorithm mapping, specialized instructions and architecture features for smart memory computing |
US20030117851A1 (en) | 2001-12-24 | 2003-06-26 | Samsung Electronics Co., Ltd. | NAND-type flash memory device with multi-page program, multi-page read, multi-block erase operations |
US7412561B2 (en) | 2002-01-14 | 2008-08-12 | Netlogic Microsystems, Inc. | Transposing of bits in input data to form a comparand within a content addressable memory |
US20040240484A1 (en) | 2002-01-14 | 2004-12-02 | Argyres Dimitri C. | Transposing of bits in input data to form a comparand within a content addressable memory |
US7237058B2 (en) | 2002-01-14 | 2007-06-26 | Netlogic Microsystems, Inc. | Input data selection for content addressable memory |
US7403421B2 (en) * | 2002-01-18 | 2008-07-22 | Sandisk Corporation | Noise reduction technique for transistors and small devices utilizing an episodic agitation |
US20030163509A1 (en) | 2002-02-25 | 2003-08-28 | International Business Machines Corporation | Method and apparatus for cooperative distributed task management in a storage subsystem with multiple controllers using cache locking |
US7005350B2 (en) * | 2002-12-31 | 2006-02-28 | Matrix Semiconductor, Inc. | Method for fabricating programmable memory array structures incorporating series-connected transistor strings |
US20040124466A1 (en) * | 2002-12-31 | 2004-07-01 | Walker Andrew J. | Method for fabricating programmable memory array structures incorporating series-connected transistor strings |
US7505321B2 (en) * | 2002-12-31 | 2009-03-17 | Sandisk 3D Llc | Programmable memory array structure incorporating series-connected transistor strings and methods for fabrication and operation of same |
US20040125629A1 (en) * | 2002-12-31 | 2004-07-01 | Scheuerlein Roy E. | Programmable memory array structure incorporating series-connected transistor strings and methods for fabrication and operation of same |
US20050078514A1 (en) | 2003-09-30 | 2005-04-14 | Scheuerlein Roy E. | Multiple twin cell non-volatile memory array and logic block structure and method therefor |
US20050141387A1 (en) | 2003-12-31 | 2005-06-30 | Raul-Adrian Cernea | Flexible and area efficient column redundancy for non-volatile memories |
US7206230B2 (en) | 2005-04-01 | 2007-04-17 | Sandisk Corporation | Use of data latches in cache operations of non-volatile memories |
EP1720168A1 (en) | 2005-04-27 | 2006-11-08 | Samsung Electronics Co., Ltd. | Integrated circuit device, flash memory array, nonvolatile memory device and operating method |
US7292476B2 (en) * | 2005-08-31 | 2007-11-06 | Micron Technology, Inc. | Programming method for NAND EEPROM |
US20070047314A1 (en) * | 2005-08-31 | 2007-03-01 | Micron Technology, Inc. | Programming method for NAND EEPROM |
US20070058407A1 (en) | 2005-09-12 | 2007-03-15 | Renesas Technology Corp. | Semiconductor memory device |
US7489546B2 (en) * | 2005-12-20 | 2009-02-10 | Micron Technology, Inc. | NAND architecture memory devices and operation |
US7746700B2 (en) * | 2005-12-20 | 2010-06-29 | Micron Technology, Inc. | NAND architecture memory devices and operation |
US20070140012A1 (en) * | 2005-12-20 | 2007-06-21 | Micron Technology, Inc. | NAND architecture memory devices and operation |
US7400532B2 (en) * | 2006-02-16 | 2008-07-15 | Micron Technology, Inc. | Programming method to reduce gate coupling interference for non-volatile memory |
US20070189073A1 (en) * | 2006-02-16 | 2007-08-16 | Micron Technology, Inc. | Programming method to reduce gate coupling interference for non-volatile memory |
US20080266957A1 (en) | 2006-03-24 | 2008-10-30 | Farookh Moogat | Method for Column Redundancy Using Data Latches in Solid-State Memories |
US20070236990A1 (en) * | 2006-03-28 | 2007-10-11 | Micron Technology, Inc. | Programming method to reduce word line to word line breakdown for NAND flash |
US7450422B2 (en) * | 2006-05-11 | 2008-11-11 | Micron Technology, Inc. | NAND architecture memory devices and operation |
US20070263462A1 (en) * | 2006-05-11 | 2007-11-15 | Micron Technology, Inc. | NAND architecture memory devices and operation |
US20070291542A1 (en) * | 2006-06-14 | 2007-12-20 | Micron Technology, Inc. | Programming method for NAND flash |
US20080031044A1 (en) * | 2006-08-04 | 2008-02-07 | Micron Technology, Inc. | Memory device architectures and operation |
US20080062763A1 (en) | 2006-09-13 | 2008-03-13 | Park Ki-Tae | Multi-bit flash memory device and memory cell array |
US20080158989A1 (en) | 2006-12-28 | 2008-07-03 | Jun Wan | Retention margin program verification |
EP1988474A1 (en) | 2007-05-04 | 2008-11-05 | Axalto SA | System and method of managing indexation of flash memory |
US20090141566A1 (en) | 2007-12-03 | 2009-06-04 | International Business Machines Corporation | Structure for implementing memory array device with built in computation capability |
US20090190404A1 (en) | 2008-01-25 | 2009-07-30 | Roohparvar Frankie F | Nand flash content addressable memory |
US20090254694A1 (en) | 2008-04-02 | 2009-10-08 | Zikbit Ltd. | Memory device with integrated parallel processing |
US20090303767A1 (en) | 2008-04-02 | 2009-12-10 | Avidan Akerib | System, method and apparatus for memory with embedded associative section for computations |
US20110096607A1 (en) | 2008-09-22 | 2011-04-28 | Micron Technology, Inc. | Programming a memory device to increase data reliability |
US8102705B2 (en) | 2009-06-05 | 2012-01-24 | Sandisk Technologies Inc. | Structure and method for shuffling data within non-volatile memory devices |
US20100329007A1 (en) | 2009-06-24 | 2010-12-30 | Hardwell Chibvongodze | Pointer Based Column Selection Techniques in Non-Volatile Memories |
US20110002169A1 (en) | 2009-07-06 | 2011-01-06 | Yan Li | Bad Column Management with Bit Information in Non-Volatile Memory Systems |
WO2011007304A1 (en) | 2009-07-16 | 2011-01-20 | Zikbit Ltd. | Using storage cells to perform computation |
US20110051485A1 (en) | 2009-08-28 | 2011-03-03 | International Business Machines Corporation | Content addressable memory array writing |
US20110096601A1 (en) | 2009-10-28 | 2011-04-28 | Gavens Lee M | Non-Volatile Memory And Method With Accelerated Post-Write Read To Manage Errors |
US20110103153A1 (en) | 2009-11-02 | 2011-05-05 | Kabushiki Kaisha Toshiba | Nonvolatile semiconductor memory device and method for driving same |
US20110134676A1 (en) | 2009-12-04 | 2011-06-09 | International Business Machines Corporation | Resistive memory devices having a not-and (nand) structure |
US20120005419A1 (en) | 2010-07-02 | 2012-01-05 | Futurewei Technologies, Inc. | System Architecture For Integrated Hierarchical Query Processing For Key/Value Stores |
US20120102298A1 (en) | 2010-10-20 | 2012-04-26 | Microsoft Corporation | Low RAM Space, High-Throughput Persistent Key-Value Store using Secondary Memory |
US20120250424A1 (en) | 2011-03-31 | 2012-10-04 | Kabushiki Kaisha Toshiba | Semiconductor memory device |
US20130028021A1 (en) | 2011-07-28 | 2013-01-31 | Eran Sharon | Simultaneous Sensing of Multiple Wordlines and Detection of NAND Failures |
US20130042055A1 (en) | 2011-08-08 | 2013-02-14 | Atsuhiro Kinoshita | Memory system including key-value store |
Non-Patent Citations (12)
Title |
---|
Black, Jr., et al., "A High Performance Low Power CMOS Channel Filter," IEEE Journal of Solid-State Circuits, vol. SC-15, No. 6, Dec. 1980, pp. 929-938. |
Communication Relating to the Results of the Partial International Search, Int'l Appl. No. PCT/US2013/068448 mailed Feb. 26, 2014, 1 page. |
Lu et al., Bloomstore: Bloom Filter Based Memory-Efficient Key-Value Store for Indexing of Data Deduplication on Flash, Mass Storage Systems and Technologies, Apr. 16, 2012, IEEE 28th Symposium, pp. 1-11. |
Maeda et al., "Multi-Stacked 1G Cell-Layer Pipe-Shaped BiCS Flash Memory," 2009 Symposium on VLSI Circuits, pp. 22-23. |
Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration for Int'l Appl. No. PCT/US2013/068448 mailed May 16, 2014, 19 pages. |
U.S. Appl. No. 13/420,961 entitled Techniques for Accessing Column Selecting Shift Register with Skipped Entries in Non-Volatile Memories, filed Mar. 15, 2012, 52 pages. |
U.S. Appl. No. 13/463,422, entitled Column Redundancy Circuitry for Non-Volatile Memory, filed May 3, 2012, 50 pages. |
U.S. Appl. No. 13/756,076 entitled "On-Device Data Analytics Using NAND Flash Based Intelligent Memory," filed Jan. 31, 2013, 67 pages. |
U.S. Appl. No. 13/794,398, entitled De-Duplication Techniques Using NAND Flash Based Content Addressable Memory, filed Mar. 11, 2013, 80 pages. |
U.S. Appl. No. 13/794,428 entitled "De-Duplication System Using NAND Flash Based Content Addressable Memory," filed Mar. 11, 2013, 80 pages. |
U.S. Appl. No. 61/713,038, entitled "Use of High Endurance Non-Volatile Memory for Read Accleration," filed Oct. 12, 2012, 93 pages. |
Wei et al., "DBA: A Dynamic Bloom Filter Array for Scalable Membership Representation of Variable Large Data Sets," Jul. 25-27, 2011, IEEE 19th Annual International Symposium of Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Mascots 2011), pp. 466-468. |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10387303B2 (en) | 2016-08-16 | 2019-08-20 | Western Digital Technologies, Inc. | Non-volatile storage system with compute engine to accelerate big data applications |
- US10459644B2 (en) | 2016-10-28 | 2019-10-29 | Western Digital Technologies, Inc. | Non-volatile storage system with integrated compute engine and optimized use of local fast memory |
US10565123B2 (en) | 2017-04-10 | 2020-02-18 | Western Digital Technologies, Inc. | Hybrid logical to physical address translation for non-volatile storage devices with integrated compute module |
US20190163394A1 (en) * | 2017-11-28 | 2019-05-30 | Advanced Micro Devices, Inc. | Expandable buffer for memory transactions |
US10740029B2 (en) * | 2017-11-28 | 2020-08-11 | Advanced Micro Devices, Inc. | Expandable buffer for memory transactions |
US10671370B2 (en) | 2018-05-30 | 2020-06-02 | Red Hat, Inc. | Distributing file system states |
US10643119B2 (en) | 2018-07-24 | 2020-05-05 | Sandisk Technologies Llc | Differential non-volatile memory cell for artificial neural network |
US10643705B2 (en) | 2018-07-24 | 2020-05-05 | Sandisk Technologies Llc | Configurable precision neural network with differential binary non-volatile memory cell structure |
US11328204B2 (en) | 2018-07-24 | 2022-05-10 | Sandisk Technologies Llc | Realization of binary neural networks in NAND memory arrays |
US11170290B2 (en) | 2019-03-28 | 2021-11-09 | Sandisk Technologies Llc | Realization of neural networks with ternary inputs and binary weights in NAND memory arrays |
US11568200B2 (en) | 2019-10-15 | 2023-01-31 | Sandisk Technologies Llc | Accelerating sparse matrix multiplication in storage class memory-based convolutional neural network inference |
US11625586B2 (en) | 2019-10-15 | 2023-04-11 | Sandisk Technologies Llc | Realization of neural networks with ternary inputs and ternary weights in NAND memory arrays |
US11657259B2 (en) | 2019-12-20 | 2023-05-23 | Sandisk Technologies Llc | Kernel transformation techniques to reduce power consumption of binary input, binary weight in-memory convolutional neural network inference engine |
US11397886B2 (en) | 2020-04-29 | 2022-07-26 | Sandisk Technologies Llc | Vertical mapping and computing for deep neural networks in non-volatile memory |
US11397885B2 (en) | 2020-04-29 | 2022-07-26 | Sandisk Technologies Llc | Vertical mapping and computing for deep neural networks in non-volatile memory |
US11544547B2 (en) | 2020-06-22 | 2023-01-03 | Western Digital Technologies, Inc. | Accelerating binary neural networks within latch structure of non-volatile memory devices |
US11568228B2 (en) | 2020-06-23 | 2023-01-31 | Sandisk Technologies Llc | Recurrent neural network inference engine with gated recurrent unit cell and non-volatile memory arrays |
US12079733B2 (en) | 2020-06-23 | 2024-09-03 | Sandisk Technologies Llc | Multi-precision digital compute-in-memory deep neural network engine for flexible and energy efficient inferencing |
US11663471B2 (en) | 2020-06-26 | 2023-05-30 | Sandisk Technologies Llc | Compute-in-memory deep neural network inference engine using low-rank approximation technique |
US11977915B2 (en) | 2020-12-15 | 2024-05-07 | Western Digital Technologies, Inc. | Non-volatile memory with intelligent compute task distribution |
US11410727B1 (en) | 2021-03-15 | 2022-08-09 | Sandisk Technologies Llc | Scalable search system design with single level cell NAND-based binary and ternary valued content addressable memory cells |
US12205008B2 (en) | 2021-05-13 | 2025-01-21 | Sandisk Technologies Llc | Dropout in neural networks using threshold switching selectors in non-volatile memories |
Also Published As
Publication number | Publication date |
---|---|
US20140136763A1 (en) | 2014-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8773909B2 (en) | CAM NAND with or function and full chip search capability | |
US8780635B2 (en) | Use of bloom filter and improved program algorithm for increased data protection in CAM NAND memory | |
US8780634B2 (en) | CAM NAND with OR function and full chip search capability | |
US8792279B2 (en) | Architectures for data analytics using computational NAND memory | |
US8634248B1 (en) | On-device data analytics using NAND flash based intelligent memory | |
US8811085B2 (en) | On-device data analytics using NAND flash based intelligent memory | |
US8780632B2 (en) | De-duplication techniques using NAND flash based content addressable memory | |
US8780633B2 (en) | De-duplication system using NAND flash based content addressable memory | |
US8817541B2 (en) | Data search using bloom filters and NAND based content addressable memory | |
US9098403B2 (en) | NAND flash based content addressable memory | |
US9548120B2 (en) | Content addressable memory | |
TW201308074A (en) | Non-volatile memory and method having block management with hot/cold data sorting | |
US12142318B2 (en) | Redundancy and majority voting in a key-value data storage system using content addressable memory | |
US11756619B2 (en) | Key storage for sorted string tables using content addressable memory | |
WO2014074496A2 (en) | Cam nand with or function and full chip search capability | |
US11955175B2 (en) | Copy redundancy in a key-value data storage system using content addressable memory | |
WO2014074483A2 (en) | On-device data analytics using nand flash based intelligent memory | |
TW201432692A (en) | De-duplication system and techniques using NAND flash based content addressable memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SANDISK TECHNOLOGIES INC., TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, YAN;SPROUSE, STEVEN T;REEL/FRAME:030978/0372
Effective date: 20130731 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: SANDISK TECHNOLOGIES LLC, TEXAS
Free format text: CHANGE OF NAME;ASSIGNOR:SANDISK TECHNOLOGIES INC;REEL/FRAME:038807/0948
Effective date: 20160516 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)
Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 8 |
|
AS | Assignment |
Owner name: SANDISK TECHNOLOGIES, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SANDISK TECHNOLOGIES LLC;REEL/FRAME:069796/0423
Effective date: 20241227 |
|
AS | Assignment |
Owner name: SANDISK TECHNOLOGIES, INC., CALIFORNIA
Free format text: PARTIAL RELEASE OF SECURITY INTERESTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A., AS AGENT;REEL/FRAME:071382/0001
Effective date: 20250424
Owner name: JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT, ILLINOIS
Free format text: SECURITY AGREEMENT;ASSIGNOR:SANDISK TECHNOLOGIES, INC.;REEL/FRAME:071050/0001
Effective date: 20250424 |