CN118351926B - Fault testing equipment and method for memory chip

Info

Publication number: CN118351926B
Application number: CN202410785619.1A
Authority: CN
Inventors: 夏俊杰; 林华胜; 顾红伟
Original assignee: Shenzhen Chaoying Intelligent Technology Co ltd
Current assignee: Shenzhen Chaoying Intelligent Technology Co ltd
Priority date: 2024-06-18
Filing date: 2024-06-18
Publication date: 2024-08-16
Anticipated expiration: 2044-06-18
Also published as: CN118351926A

Abstract

The invention provides a fault test device and method for a memory chip, relating to the technical field of chip testing. The device comprises a top-level controller, a multi-dimensional test data bucket, a fault self-diagnosis module, an address remapping repair module and a content addressable memory. The top-level controller is connected to the fault self-diagnosis module, the address remapping repair module and the content addressable memory respectively; the address remapping repair module is connected to the content addressable memory; the fault self-diagnosis module is connected to the multi-dimensional test data bucket and the content addressable memory respectively. The controller of the memory chip is connected to the top-level controller; the fault self-diagnosis module extracts test data from the multi-dimensional test data bucket according to a test instruction, tests the memory chip, and outputs the fault positions. The content addressable memory stores the fault positions. After the test is finished, the address remapping repair module retrieves the fault positions from the content addressable memory and repairs the faults of the memory chip.

Description

Fault testing equipment and method for memory chip

Technical Field

The invention relates to the technical field of chip testing, in particular to a fault testing device and method for a memory chip.

Background

A memory chip is an electronic device for storing data and is widely used in electronic products such as computers, mobile phones and digital cameras. These chips may be volatile, such as Dynamic Random Access Memory (DRAM) and Static Random Access Memory (SRAM), which lose stored information when powered down, or non-volatile, such as flash memory, which retains data even in the event of a power failure.

Fault testing of memory chips is an indispensable step in the manufacture of electronic equipment. It ensures that the chip can reliably store and process data, thereby improving overall product quality and user experience. These tests help to discover and repair potential defects, reduce later maintenance costs and warranty claims, and also help to optimize the design and manufacturing process.

However, existing memory chip test equipment and test flows often do not consider the format of the test data. The fault test is carried out by repeatedly injecting a large amount of repeated data into the memory chip, and the data are injected into all memory cells indiscriminately. During this repeated testing, the erase cycles of normal memory cells are consumed in large quantities, so the test efficiency is low and a large part of the memory chip's service life is used up.

Disclosure of Invention

In the prior art, memory chip test equipment and test flows often do not consider the format of the test data: the fault test is carried out by repeatedly injecting a large amount of repeated data into the memory chip, the data are injected into all memory cells indiscriminately, and the erase cycles of normal memory cells are consumed in large quantities, so the test efficiency is low and a large part of the memory chip's service life is used up. In order to solve these technical problems, the invention provides a fault testing device and method for a memory chip.

The technical scheme provided by the embodiment of the invention is as follows:

First aspect

The fault test equipment for a memory chip provided by the embodiment of the invention comprises:

a top-level controller, a multi-dimensional test data bucket, a fault self-diagnosis module, an address remapping repair module and a content addressable memory;

the top-level controller is respectively connected with the fault self-diagnosis module, the address remapping repair module and the content addressable memory;

the address remapping repair module is connected with the content addressable memory;

the fault self-diagnosis module is respectively connected with the multi-dimensional test data bucket and the content addressable memory;

the controller of the memory chip is connected with the top-level controller, the top-level controller sends a test instruction to the fault self-diagnosis module, and the fault self-diagnosis module extracts test data from the multi-dimensional test data bucket according to the test instruction to test the memory chip and outputs a fault position;

the content addressable memory is used for storing the fault location;

and after the test is finished, the address remapping repair module extracts the fault position in the content addressable memory to repair the fault of the memory chip.

In the fault test equipment for a memory chip, preferably, the fault test equipment further includes: a motherboard and a power module; the top-level controller, the multi-dimensional test data bucket, the fault self-diagnosis module, the address remapping repair module and the content addressable memory are all arranged on the motherboard; the power module is connected with the motherboard to supply power to the fault test equipment.

Second aspect

The fault testing method of the memory chip provided by the embodiment of the invention is applied to the fault testing equipment according to the first aspect, and comprises the following steps:

S1: obtaining redundant resources of the memory chip, wherein the redundant resources are non-fixed memory resources dynamically started by the memory chip;

S2: extracting target test data from the multi-dimensional test data bucket, wherein the target test data comprises a fault primary screening layer, an interference fault detection layer and a coupling fault detection layer;

S3: performing cyclic test on the memory chip by using the target test data, and outputting a fault position;

S4: establishing a fault repairing linear programming model by taking the redundant resources as constraints in combination with the fault positions, wherein the fault repairing linear programming model comprises the repairing quantity of fault units;

S5: taking the maximum number of fault unit repair as a target, and optimizing the fault repair linear programming model by combining a greedy algorithm and a local search strategy to output an optimal repair strategy;

S6: repairing the memory chip according to the optimal repairing strategy;

S7: and outputting unrepaired fault positions in the optimal repair strategy to finish the test of the memory chip.

In the fault testing method of a memory chip, preferably, the fault primary screening layer includes equal amounts of all-zero data and all-one data, the interference fault detection layer includes checkerboard data and inverted checkerboard data, and the coupling fault detection layer includes row-flip data and pseudo-random data.

In the fault test method of a memory chip, preferably, the step S3 specifically includes:

S301: acquiring an addressable storage unit of the storage chip;

S302: sequentially inputting all zero data and all one data in the fault primary screening layer to the storage chip through ATE equipment by taking covering the addressable storage unit as a target, and recording a primary test result, wherein the primary test result comprises a storage unit with failed reading and writing;

S303: performing cluster analysis based on K-means cluster calculation on the primary test result to obtain a fault area;

S304: sequentially inputting the data of the interference fault detection layer and the coupling fault detection layer into the fault area through the ATE equipment to carry out secondary screening, and recording a secondary test result, wherein the secondary test result comprises fault positions of each storage unit in the fault area, which have read-write faults under the same test data and different test data;

S305: and outputting the fault position.

In the fault test method of a memory chip, preferably, the step S303 specifically includes:

S303A: randomly selecting a plurality of storage units with failed reading and writing from the fault area as an initial clustering center;

S303B: calculating the Euclidean distance from each storage unit with failed reading and writing to each initial clustering center:

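$$d(p_u, \mu_i) = \lVert p_u - \mu_i \rVert_2 = \sqrt{\sum_{t}\left(p_{u,t} - \mu_{i,t}\right)^2}, \quad i = 1, 2, \ldots, K;$$

wherein $d$ represents the Euclidean distance, $p_u$ represents the physical address (expressed as a coordinate) of the memory cell $u$ with a read-write failure, $\mu_i$ represents the coordinates of the i-th initial cluster center, $i = 1, 2, \ldots, K$, and $K$ represents the total number of initial cluster centers;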

S303C: distributing the storage units with failed reading and writing to an initial cluster center with minimum Euclidean distance;

S303D: taking the average value of all coordinate points in the cluster, and taking the average value as a new initial cluster center;

S303E: the iteration number is increased by 1, and the step S303B is returned until the initial cluster center is not changed or the preset iteration number is reached.
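Steps S303A to S303E can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: it assumes each failed cell is given as a (row, column) coordinate, and the function name `kmeans_fault_clusters`, the `seed` parameter and the sample coordinates are hypothetical.

```python
import math
import random


def kmeans_fault_clusters(failed_cells, k, max_iter=100, seed=0):
    """Cluster failed cells, given as (row, col) coordinates, into k groups."""
    rng = random.Random(seed)
    # S303A: pick k failed cells at random as the initial cluster centers.
    centers = [tuple(map(float, c)) for c in rng.sample(failed_cells, k)]
    assignment = []
    for _ in range(max_iter):  # S303E: bounded number of iterations
        # S303B + S303C: assign each cell to the center with the smallest
        # Euclidean distance.
        assignment = [
            min(range(k), key=lambda i: math.dist(cell, centers[i]))
            for cell in failed_cells
        ]
        # S303D: the mean of each cluster becomes its new center.
        new_centers = []
        for i in range(k):
            members = [c for c, a in zip(failed_cells, assignment) if a == i]
            if members:
                new_centers.append(
                    (sum(r for r, _ in members) / len(members),
                     sum(c for _, c in members) / len(members))
                )
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        # S303E: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignment


# Hypothetical primary test result: two physically separate groups of failures.
failed = [(1, 1), (1, 2), (2, 1), (40, 40), (41, 40), (40, 42)]
centers, labels = kmeans_fault_clusters(failed, k=2)
```

For this sample the first three cells end up in one cluster and the last three in the other; which cluster receives label 0 depends on the random initial centers.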

In the fault test method of a memory chip, preferably, the fault repair linear programming model specifically includes:

$$\max M = \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, z_{ij}$$

$$\text{s.t.}\quad z_{ij} \le r_i + c_j,\qquad \sum_{i=1}^{m} r_i \le R,\qquad \sum_{j=1}^{n} c_j \le C,\qquad z_{ij},\, r_i,\, c_j \in \{0, 1\};$$

wherein $M$ represents the repair number of the fault units; $f_{ij}$ represents a binary indicator variable, with $f_{ij} = 1$ when the unit in the i-th row and j-th column of the memory chip has a fault and $f_{ij} = 0$ otherwise; $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, wherein $m$ and $n$ respectively represent the maximum number of rows and the maximum number of columns of the memory chip; $z_{ij}$ represents a binary auxiliary variable, with $z_{ij} = 1$ when the fault unit in the i-th row and j-th column is repaired and $z_{ij} = 0$ otherwise; $r_i$ is a binary decision variable indicating whether a spare row of the redundant resource is selected for the i-th row; $c_j$ is a binary decision variable indicating whether a spare column of the redundant resource is selected for the j-th column; $R$ represents the maximum number of rows of the redundant resource; and $C$ represents the maximum number of columns of the redundant resource.
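As a minimal sketch of what a model of this kind computes, the Python below enumerates every choice of at most R rows and at most C columns on a tiny array and counts the fault units covered. The coverage rule (a fault unit counts as repaired when its row or its column is replaced), the function name `best_repair` and the example fault map are assumptions made for illustration. Exhaustive enumeration is only practical for very small arrays and is not the optimization method of step S5.

```python
from itertools import combinations


def best_repair(faults, n_rows, n_cols, spare_rows, spare_cols):
    """Exhaustively maximise the number of repaired fault units.

    faults: set of (row, col) coordinates of the faulty cells.
    Returns (repaired_count, chosen_rows, chosen_cols).
    """
    best = (-1, (), ())
    for r in range(spare_rows + 1):
        for rows in combinations(range(n_rows), r):
            for c in range(spare_cols + 1):
                for cols in combinations(range(n_cols), c):
                    # A fault is repaired when its row or its column is replaced.
                    repaired = sum(1 for i, j in faults if i in rows or j in cols)
                    if repaired > best[0]:
                        best = (repaired, rows, cols)
    return best


# Hypothetical 4 x 4 array with six faulty cells, one spare row, one spare column.
faults = {(0, 0), (0, 2), (0, 3), (1, 1), (2, 1), (3, 3)}
m_value, rows, cols = best_repair(faults, 4, 4, spare_rows=1, spare_cols=1)
print(m_value, rows, cols)  # 5 (0,) (1,)
```

Replacing row 0 covers three faults and replacing column 1 covers two more, so five of the six faults are repaired and (3, 3) remains.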

In the fault test method of a memory chip, preferably, the step S5 specifically includes:

S501: counting the total number of faults of each row and each column in the memory chip;

S502: assigning priorities in direct proportion to the total number of faults to each fault unit;

S503: setting a greedy selection target of the greedy algorithm to repair the fault units one by one from the highest priority to the lowest priority, formulating a preliminary repair strategy, and obtaining the maximum value of the repair quantity of the fault units of the preliminary repair strategy:

$$\max M;$$

wherein max represents taking the maximum value;

S504: performing local change operation of the neighborhood structure on the preliminary repair strategy, wherein the change operation comprises an adding operation, a removing operation and a replacing operation;

S505: calculating the repair quantity of the fault units after the change operation:

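$$M' = \sum_{k \in S_r} F^{\mathrm{row}}_{k} + \sum_{l \in S_c} F^{\mathrm{col}}_{l} - O, \qquad O = \sum_{k \in S_r}\sum_{l \in S_c} \delta_{kl};$$

wherein $M'$ indicates the number of fault unit repairs after the change, $F^{\mathrm{row}}_{k}$ indicates the number of fault units in the changed k-th row, $S_r$ represents the changed row set, $F^{\mathrm{col}}_{l}$ indicates the number of fault units in the changed l-th column, $S_c$ represents the changed column set, $O$ represents the number of overlapping fault units, and $\delta_{kl}$ is the intersection binary indicator variable of the k-th row and the l-th column: $\delta_{kl} = 1$ if the intersection is a fault unit, and $\delta_{kl} = 0$ otherwise;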

S506: repeating S504-S505 until the rows and columns in the memory chip are covered, namely, the changed row set and the changed column set are executed completely;

S507: taking the maximum value of the repair quantity of the fault units after the change operations, and recording the corresponding changed repair strategy;

S508: taking the strategy corresponding to the maximum value of the repair quantity of the fault units as the optimal repair strategy:

$$\max\left(\max M,\ M'_{\max}\right);$$

wherein $M'_{\max}$ indicates the maximum value of the number of fault unit repairs after the change operations.
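The greedy part of these steps (S501 to S503) and the repair count of S505 can be sketched as follows. The sketch makes an assumption that goes beyond the text: priorities are assigned per row and per column rather than per fault unit, so that one spare line is spent on the row or column with the most faults. The names `greedy_repair` and `repaired_count` and the fault map are hypothetical.

```python
from collections import Counter


def greedy_repair(faults, spare_rows, spare_cols):
    """Spend spare lines on the rows and columns with the most faults."""
    # S501: total number of faults in each row and each column.
    row_count = Counter(i for i, _ in faults)
    col_count = Counter(j for _, j in faults)
    # S502: priority proportional to the fault total of the line.
    lines = [(n, "row", i) for i, n in row_count.items()]
    lines += [(n, "col", j) for j, n in col_count.items()]
    lines.sort(key=lambda t: (-t[0], t[1], t[2]))
    # S503: repair from the highest priority down while spares remain.
    rows, cols = set(), set()
    for _, kind, index in lines:
        if kind == "row" and len(rows) < spare_rows:
            rows.add(index)
        elif kind == "col" and len(cols) < spare_cols:
            cols.add(index)
    return rows, cols


def repaired_count(faults, rows, cols):
    """S505: row faults plus column faults minus the overlap O."""
    row_part = sum(1 for i, _ in faults if i in rows)
    col_part = sum(1 for _, j in faults if j in cols)
    overlap = sum(1 for i, j in faults if i in rows and j in cols)
    return row_part + col_part - overlap


# Hypothetical fault map: 11 faulty cells, 1 spare row, 2 spare columns.
faults = {(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (2, 0),
          (3, 1), (4, 1), (5, 4), (5, 5), (5, 6)}
rows, cols = greedy_repair(faults, spare_rows=1, spare_cols=2)
print(rows, cols, repaired_count(faults, rows, cols))  # {0} {0, 1} 8
```

Here the greedy choice repairs 8 of the 11 faults (4 + 6 - 2, the two overlapping cells being (0, 0) and (0, 1)). A different selection, row 5 with columns 0 and 1, would repair 9, which is the kind of gap the change operations of S504 to S508 are meant to close.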

In the fault test method of a memory chip, preferably, the step S6 specifically includes: and according to the optimal repair strategy, mapping the fault position corresponding to the fault unit to a spare row or a spare column of the redundant resource by modifying the address decoding logic of the memory chip, so as to finish the repair of the memory chip.
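The remapping in step S6 can be pictured as a small translation table in front of the array. The sketch below is an illustration under stated assumptions, not the chip's actual decoding logic: the class name `RemappingDecoder` is hypothetical, and spare rows and columns are assumed to sit at physical indices just past the regular array.

```python
class RemappingDecoder:
    """Translate logical (row, col) addresses, redirecting repaired lines to spares."""

    def __init__(self, n_rows, n_cols, spare_rows, spare_cols):
        # Assumption: spares are addressed directly after the regular rows/columns.
        self.free_spare_rows = [n_rows + k for k in range(spare_rows)]
        self.free_spare_cols = [n_cols + k for k in range(spare_cols)]
        self.row_map = {}
        self.col_map = {}

    def repair_row(self, row):
        # Raises IndexError when no spare row is left.
        self.row_map[row] = self.free_spare_rows.pop(0)

    def repair_col(self, col):
        # Raises IndexError when no spare column is left.
        self.col_map[col] = self.free_spare_cols.pop(0)

    def decode(self, row, col):
        """Return the physical address actually accessed."""
        return self.row_map.get(row, row), self.col_map.get(col, col)


decoder = RemappingDecoder(n_rows=4, n_cols=4, spare_rows=1, spare_cols=1)
decoder.repair_row(0)   # faulty row 0 -> spare row at physical index 4
decoder.repair_col(1)   # faulty column 1 -> spare column at physical index 4
print(decoder.decode(0, 2))  # (4, 2)
print(decoder.decode(2, 1))  # (2, 4)
print(decoder.decode(3, 3))  # (3, 3)
```

Addresses outside the repaired lines pass through unchanged, so normal cells are unaffected by the repair.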

In the fault test method of a memory chip, preferably, after S7, the fault test method further includes:

and sending out early warning under the condition that the number of faults corresponding to the unrepaired fault positions is larger than the preset number of faults.
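Step S7 and the early warning can be illustrated with a short check. The function names, the fault map and the threshold values are hypothetical; the sketch assumes a fault counts as repaired when its row or column was replaced.

```python
def unrepaired_faults(faults, repaired_rows, repaired_cols):
    """S7: fault positions not covered by any replaced row or column."""
    return sorted(
        (i, j) for i, j in faults
        if i not in repaired_rows and j not in repaired_cols
    )


def needs_warning(unrepaired, max_allowed):
    """Warn when more faults remain than the preset number allows."""
    return len(unrepaired) > max_allowed


faults = {(0, 0), (0, 2), (1, 1), (3, 3)}
left = unrepaired_faults(faults, repaired_rows={0}, repaired_cols={1})
print(left)                                # [(3, 3)]
print(needs_warning(left, max_allowed=0))  # True
print(needs_warning(left, max_allowed=1))  # False
```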

The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:

In the invention, a fault test device comprising a top-level controller, a multi-dimensional test data bucket, a fault self-diagnosis module, an address remapping repair module and a content addressable memory is constructed, and the fault primary screening layer, the interference fault detection layer and the coupling fault detection layer stored in the multi-dimensional test data bucket are used to perform an ordered, layered test. During the test, the redundant resources of the memory chip are fully utilized, and the layered test data gradually narrow the tested range of the memory chip. With the maximum number of fault unit repairs as the target, the fault repair linear programming model is optimized by combining a greedy algorithm and a local search strategy, and unrepaired fault positions are marked so that the availability of the memory chip can be evaluated. The content addressable memory and the address remapping repair module allow the test to be completed quickly and the optimal fault repair to be executed after the test is finished. Wear of normal memory cells during the test is thereby reduced as far as possible, the test efficiency is high, and the fault positions can be located automatically and accurately.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a failure test apparatus for a memory chip according to an embodiment of the present invention;

Fig. 2 is a flow chart of a fault testing method for a memory chip according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is described below with reference to the accompanying drawings.

In embodiments of the invention, words such as "exemplary," "such as" and the like are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of the word "example" is intended to present concepts in a concrete fashion. Furthermore, in embodiments of the present invention, "and/or" may mean both, or either one of the two.

In the embodiments of the present invention, "image" and "picture" may be sometimes used in combination, and it should be noted that the meaning of the expression is consistent when the distinction is not emphasized. "of", "corresponding (corresponding, relevant)" and "corresponding (corresponding)" are sometimes used in combination, and it should be noted that the meaning of the expression is consistent when the distinction is not emphasized.

In embodiments of the present invention, a subscripted form such as W₁ may sometimes be mistakenly written in a non-subscript form such as W1; the meaning of the expression is consistent when the distinction is not emphasized.

In order to make the technical problems, technical solutions and advantages to be solved more apparent, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.

Referring to fig. 1 of the specification, a schematic structural diagram of a fault test device for a memory chip according to an embodiment of the present invention is shown.

The embodiment of the invention provides a fault test device of a memory chip, which comprises:

a top-level controller, a multi-dimensional test data bucket, a fault self-diagnosis module, an address remapping repair module and a content addressable memory;

the content addressable memory is used for storing the fault location;

Specifically, the top-level controller manages and coordinates the operation of the fault test equipment and sends test instructions to the fault self-diagnosis module. The fault self-diagnosis module extracts the corresponding test data from the multi-dimensional test data bucket according to the instruction of the top-level controller, performs fault detection, and sends the fault position information to the content addressable memory. The address remapping repair module receives the fault position information stored in the content addressable memory and remaps the addresses of the fault units to spare rows or columns to repair them. The content addressable memory temporarily stores the fault detection results (fault position information), providing the data the address remapping repair module needs and helping to optimize the fault repair strategy. The multi-dimensional test data bucket stores the various kinds of test data (for example, the data of the fault primary screening layer, the interference fault detection layer and the coupling fault detection layer). The controller of the memory chip provides the model information of the memory chip; from it the top-level controller determines the type of the memory chip and an appropriate test strategy and sends the corresponding test instruction to the fault self-diagnosis module, which then calls the corresponding multi-dimensional test data to test each fault position of the memory chip.

It should be noted that, the existence of the top controller enables the whole testing process to be highly automated and centrally managed, and the testing strategy can be adjusted according to the specific model of the memory chip, so that the accurate matching of the testing data is ensured, and the accuracy of fault detection is improved. Secondly, the multidimensional test data bucket allows the system to detect various fault types from preliminary screening to deep detection, has wide coverage, and avoids missed detection and false detection. The fault self-diagnosis module can effectively and rapidly locate faults from test data, and the use of the content addressable memory ensures orderly recording and rapid access of fault information, thereby facilitating execution of subsequent repair steps. The address remapping repair module can accurately map the fault unit to the standby row or column by utilizing redundant resources in the memory, so that the repair efficiency is improved, and the performance degradation caused by the fault unit is greatly reduced.

The content addressable memory can be used for temporarily storing data, the stored data has traceable characteristics so as to test and optimize the memory chip in stages, and each fault position can be compared in parallel according to the stored data after the test is finished, so that limited redundant resources of the memory chip are fully utilized, and a better repairing scheme is formulated.
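The role of the content addressable memory can be sketched in software as a store that is searched by content (the fault address) rather than by index. This is a simplified model under stated assumptions: the class name `FaultCAM`, the capacity limit and the `stage` tag are hypothetical, and the loop in `match` stands in for a comparison that real CAM hardware performs on all entries in parallel.

```python
class FaultCAM:
    """Software model of a content addressable store for fault positions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # insertion order keeps the record traceable by stage

    def match(self, address):
        """Return the index of the entry holding this address, or None."""
        for index, (stored, _) in enumerate(self.entries):
            if stored == address:
                return index
        return None

    def store(self, address, stage):
        """Record a fault position once, tagged with the test stage that found it."""
        if self.match(address) is not None:
            return  # already recorded by an earlier stage
        if len(self.entries) >= self.capacity:
            raise OverflowError("CAM is full")
        self.entries.append((address, stage))

    def fault_locations(self):
        return [address for address, _ in self.entries]


cam = FaultCAM(capacity=4)
cam.store((0, 2), "primary")
cam.store((3, 3), "secondary")
cam.store((0, 2), "secondary")  # duplicate, ignored
print(cam.match((3, 3)))        # 1
print(cam.match((1, 1)))        # None
print(cam.fault_locations())    # [(0, 2), (3, 3)]
```

After the test, a repair module could read `fault_locations()` in one pass and plan the use of spare rows and columns over the complete fault set.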

In one possible embodiment, the fault testing device further comprises: a motherboard and a power module;

The top-level controller, the multi-dimensional test data bucket, the fault self-diagnosis module, the address remapping repair module and the content addressable memory are all distributed on the motherboard;

the power module is connected with the motherboard to supply power to the fault test equipment.

Wherein the motherboard provides physical support and circuit connections, ensuring that all modules can work cooperatively. The power module ensures that the test equipment is stably supplied with power and supports continuous test operation. The integrated design is light and efficient, expands application scenes, and improves the usability of fault test equipment.

Referring to fig. 2 of the specification, a flow chart of a fault testing method for a memory chip according to an embodiment of the invention is shown.

The invention also provides a fault test method of the memory chip, which is applied to the fault test equipment of the memory chip, and comprises the following steps:

S1: obtaining redundant resources of the memory chip.

The redundant resources are non-fixed memory resources dynamically enabled by the memory chip.

Specifically, the memory design reserves additional rows and columns, referred to as spare rows and columns. These spare resources do not participate in normal storage operations, but are dynamically enabled to replace failed units only when the original storage units fail.

The target test data are organized layer by layer: the fault range is screened first, and the storage units within that range are then subjected to fine-grained fault detection. This avoids the reduction in service life caused by large amounts of repeated data consuming the erase cycles of normal storage units. Testing with the target test data allows the set of faults to be found within a limited number of tests, overcomes the drawback of conventional testing, in which reading and writing large amounts of repeated data noticeably shortens the service life of the memory chip, and improves test speed, completeness and accuracy.

In one possible implementation, the fault primary screening layer includes equal amounts of all-zero data and all-one data, the interference fault detection layer includes checkerboard data and inverted checkerboard data, and the coupling fault detection layer includes row-flip data and pseudo-random data.

TABLE 1

Table 1 lists the data types of the fault primary screening layer, the interference fault detection layer and the coupling fault detection layer. The fault primary screening layer consists of all-zero data and all-one data and is used to quickly detect basic read-write function and electrical characteristic faults of the chip. The checkerboard data and inverted checkerboard data of the interference fault detection layer are used specifically to detect interference faults between adjacent units. The coupling fault detection layer is used to detect address decoding errors, complex multi-unit faults and finer coupling faults.

In particular, in the fault primary screening layer, testing with equal amounts of all-zero and all-one data covers all cells of the memory uniformly. During the test, a 0 and a 1 are written to each memory cell in turn, so that each bit is checked for whether it can correctly store and hold a given value. Such a test can quickly identify whether a memory cell has basic read-write errors. The main advantages of all-zero and all-one testing are simplicity and full coverage, which allow basic persistent faults in the memory to be located quickly. By inspecting every memory cell uniformly, this approach provides an efficient way to ensure the reliability of the memory cells at the basic functional level.
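A primary screening pass of this kind can be sketched against a toy memory model. Everything below is illustrative: `SimulatedMemory` is a hypothetical stand-in for the chip under test (a real test would go through the ATE equipment), and its stuck-at cells are invented to show which pass catches which fault.

```python
class SimulatedMemory:
    """Toy bit array with stuck-at cells, standing in for the chip under test."""

    def __init__(self, n_rows, n_cols, stuck_at=None):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.stuck_at = dict(stuck_at or {})  # {(row, col): forced bit}
        self.bits = [[0] * n_cols for _ in range(n_rows)]

    def write(self, row, col, bit):
        # A stuck-at cell ignores the written value.
        self.bits[row][col] = self.stuck_at.get((row, col), bit)

    def read(self, row, col):
        return self.bits[row][col]


def primary_screen(mem):
    """Write all-zero then all-one data and collect cells that read back wrong."""
    failed = set()
    for bit in (0, 1):
        for r in range(mem.n_rows):
            for c in range(mem.n_cols):
                mem.write(r, c, bit)
        for r in range(mem.n_rows):
            for c in range(mem.n_cols):
                if mem.read(r, c) != bit:
                    failed.add((r, c))
    return sorted(failed)


mem = SimulatedMemory(4, 4, stuck_at={(1, 2): 0, (3, 0): 1})
print(primary_screen(mem))  # [(1, 2), (3, 0)]
```

The cell stuck at 0 is caught by the all-one pass and the cell stuck at 1 by the all-zero pass, which is why both patterns are needed.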

At the interference fault detection layer, checkerboard and inverted checkerboard data are used to identify errors caused by electrical interference between memory cells. The test simulates the worst-case interference scenario by writing opposite data (0 and 1) alternately into adjacent cells and observing whether a state change in adjacent cells causes a data error, thereby detecting an interference fault. Checkerboard and inverted checkerboard testing is effective in revealing faults caused by adjacent cells interacting with each other, especially in high-density integrated circuits. This test mode reproduces the interference scenario as closely as possible and helps a designer understand the interference problems the memory may encounter in actual use, so that the circuit design can be optimized and the stability and reliability of the product improved.
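The two patterns of this layer are easy to generate. The sketch below is illustrative only: the function names are hypothetical and the read-back with one disturbed cell is simulated by hand.

```python
def checkerboard(n_rows, n_cols, inverted=False):
    """Return a 0/1 grid in which every horizontal and vertical neighbour differs."""
    offset = 1 if inverted else 0
    return [[(r + c + offset) % 2 for c in range(n_cols)] for r in range(n_rows)]


def mismatches(expected, observed):
    """List the (row, col) positions where the read-back data differ."""
    return [
        (r, c)
        for r, row in enumerate(expected)
        for c, bit in enumerate(row)
        if observed[r][c] != bit
    ]


pattern = checkerboard(3, 4)
inverse = checkerboard(3, 4, inverted=True)
print(pattern[0], inverse[0])  # [0, 1, 0, 1] [1, 0, 1, 0]

# Simulate a read-back in which cell (1, 2) was disturbed by its neighbours.
read_back = [row[:] for row in pattern]
read_back[1][2] ^= 1
print(mismatches(pattern, read_back))  # [(1, 2)]
```

Running both the pattern and its inverse ensures each cell is tested holding a 0 and a 1 while all four of its neighbours hold the opposite value.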

The coupling fault detection layer detects coupling faults that may occur in the memory by using row-flip data and pseudo-random data. The row-flip data test writes inverse data patterns into successive rows to see whether memory cells suffer errors when the state of adjacent rows changes. Pseudo-random data add complexity and randomness to the test in order to detect more subtle coupling effects. By probing the coupling interference that can arise from data state changes between rows, this test approach is particularly effective at discovering complex faults that only manifest themselves under certain data patterns and operating conditions. In addition, the pseudo-random data test simulates a variety of complex situations found in practical applications, which improves the effectiveness and comprehensiveness of the test. This helps to ensure the robustness of the memory design and reduces problems that may be encountered in actual use.
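Both pattern types can be generated in a few lines. The sketch assumes that "row-flip" means every row holds the inverse of the row before it, and it uses a seeded generator so that a pseudo-random pattern can be regenerated for the read-back comparison; the function names and the seed value are hypothetical.

```python
import random


def row_flip(n_rows, n_cols, first_bit=0):
    """Each row is filled with one bit, inverted from the row above."""
    return [[(r + first_bit) % 2] * n_cols for r in range(n_rows)]


def pseudo_random(n_rows, n_cols, seed):
    """Reproducible random 0/1 grid: the same seed yields the same pattern."""
    rng = random.Random(seed)
    return [[rng.getrandbits(1) for _ in range(n_cols)] for _ in range(n_rows)]


print(row_flip(3, 2))  # [[0, 0], [1, 1], [0, 0]]

written = pseudo_random(4, 4, seed=7)
expected = pseudo_random(4, 4, seed=7)  # regenerated later for comparison
print(written == expected)              # True
```

Storing only the seed instead of the full pattern keeps the test data compact while still allowing an exact comparison after read-back.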

It should be noted that, in each test process, the fault positions are stored in order in the content addressable memory and are processed together after the test is completed, so that the redundant resources are used reasonably.

In one possible implementation manner, the step S3 specifically includes:

S301: acquiring an addressable storage unit of the storage chip;

Here, ATE (automatic test equipment) is equipment for the automated testing of electronic devices. In the fault test of memory chips, ATE is particularly important because it can efficiently perform a large number of read and write operations and other electrical tests.

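In one possible implementation manner, in step S303 the Euclidean distance from each storage unit with a read-write failure to each initial cluster center is calculated as:

$$d(p_u, \mu_i) = \sqrt{\sum_{t}\left(p_{u,t} - \mu_{i,t}\right)^2}, \quad i = 1, 2, \ldots, K;$$

where $p_u$ is the coordinate of the failed storage unit $u$, $\mu_i$ is the coordinate of the i-th initial cluster center, and $K$ is the total number of initial cluster centers. The coordinates are obtained based on a coordinate system established on the memory chip, and each physical address corresponds to one coordinate.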

It should be noted that K-means clustering is a widely used unsupervised learning algorithm, and aims to divide a set of data points into K clusters, each cluster being represented by the mean (i.e., the cluster center) of the data points inside. The algorithm optimizes the clustering result by minimizing the sum of squares of the distances of each point to its cluster center. K clustering centers are randomly selected at the beginning of the algorithm, and then the following two steps are iteratively executed: 1) Assigning each data point to a cluster represented by the nearest cluster center; 2) Updating the cluster center of each cluster as the average value of all points in the cluster. The K-means clustering algorithm is used in fault detection of the memory chip, so that the accuracy and efficiency of fault positioning can be remarkably improved. Through the algorithm, the storage units which are physically similar and show similar fault modes can be automatically classified into the same fault cluster. The method enables the identification of the fault area to be more centralized and clear, thereby simplifying the repairing process and ensuring the pertinence and the effectiveness of repairing measures. In addition, iterative optimization in the clustering process helps to finely adjust fault partitions, optimizes diagnosis flow, reduces unnecessary testing and repairing cost, and improves overall operation and maintenance efficiency.
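The text does not specify how a cluster is turned into a fault area for secondary screening. One plausible reading, shown here purely as an assumption, is to take the bounding box of each cluster and widen it by a small margin; the function name `fault_areas`, the margin and the coordinates are hypothetical.

```python
def fault_areas(failed_cells, labels, margin, n_rows, n_cols):
    """Return {label: (row_min, row_max, col_min, col_max)} for each cluster."""
    boxes = {}
    for (r, c), label in zip(failed_cells, labels):
        r0, r1, c0, c1 = boxes.get(label, (r, r, c, c))
        boxes[label] = (min(r0, r), max(r1, r), min(c0, c), max(c1, c))
    # Widen each box by the margin, clipped to the array boundaries.
    return {
        label: (max(r0 - margin, 0), min(r1 + margin, n_rows - 1),
                max(c0 - margin, 0), min(c1 + margin, n_cols - 1))
        for label, (r0, r1, c0, c1) in boxes.items()
    }


failed = [(1, 1), (1, 2), (2, 1), (40, 40), (41, 40), (40, 42)]
labels = [0, 0, 0, 1, 1, 1]  # cluster label of each failed cell
areas = fault_areas(failed, labels, margin=1, n_rows=64, n_cols=64)
print(areas)  # {0: (0, 3, 0, 3), 1: (39, 42, 39, 43)}
```

Only the cells inside these boxes would then receive the checkerboard, row-flip and pseudo-random data of the secondary screening.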

S305: and outputting the fault position.

Specifically, through steps S301 to S305 the system proceeds from primary screening to in-depth analysis and gradually narrows the fault area; this step-by-step refinement improves both the accuracy and the efficiency of the test. Primary screening with all-zero and all-one data quickly identifies obvious read-write failures, and the subsequent K-means cluster analysis locates the fault clusters more precisely by statistical means. In this way the fault area is confined to the smallest possible range, so that the more complex test data can be applied to it in a targeted manner for secondary screening, the limited test resources are used to the greatest extent in each test, and unnecessary test overhead is reduced. In addition, the method supports executing the repair immediately after the test is finished, so that the functionality of the memory chip is quickly restored.
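The saving in write operations can be estimated with simple arithmetic. The sketch below counts one write per cell per pattern and assumes two primary patterns (all-zero, all-one) and four secondary patterns (checkerboard, inverted checkerboard, row-flip, pseudo-random); the array size and fault-area size are hypothetical figures chosen for illustration.

```python
def write_counts(n_rows, n_cols, area_cells,
                 primary_patterns=2, secondary_patterns=4):
    """Compare cell writes: every pattern everywhere vs. the layered flow."""
    total_cells = n_rows * n_cols
    flat = total_cells * (primary_patterns + secondary_patterns)
    layered = total_cells * primary_patterns + area_cells * secondary_patterns
    return flat, layered


flat, layered = write_counts(1024, 1024, area_cells=4096)
print(flat, layered)  # 6291456 2113536
```

Under these assumptions the layered flow performs roughly a third of the writes, and cells outside the fault areas are written only twice.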

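In one possible implementation, the fault repair linear programming model is specifically:

$$\max M = \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, z_{ij} \quad \text{s.t.}\quad z_{ij} \le r_i + c_j,\quad \sum_{i=1}^{m} r_i \le R,\quad \sum_{j=1}^{n} c_j \le C,\quad z_{ij},\, r_i,\, c_j \in \{0, 1\};$$

where $M$ is the repair number of the fault units, $f_{ij}$ indicates whether the unit in row $i$ and column $j$ has a fault, $z_{ij}$ indicates whether that fault unit is repaired, $r_i$ and $c_j$ indicate whether a spare row or a spare column is selected for row $i$ or column $j$, and $R$ and $C$ are the maximum numbers of spare rows and spare columns of the redundant resource.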

It should be noted that the model embodies the location of the faulty cell by defining binary indicator variables and optimizes the fault repair process in combination with the decision variables of the spare rows and columns. This approach allows for accurate calculation and maximization of the number of possible faulty cell repairs, ensuring that each repair most efficiently utilizes the available redundancy resources. Through explicit mathematical modeling, the scheme can ensure the maximum availability of the memory chip and simultaneously minimize the influence on the normal operation of the memory system. In addition, the application of the linear programming enables the restoration strategy not only to rely on visual judgment, but also to carry out optimization decision based on actual data and algorithm, thereby improving the accuracy and efficiency of restoration operation and reducing the risk of resource waste or secondary failure possibly caused by inaccurate restoration. In the whole, the method ensures the quality of fault repair, optimizes the allocation and the use of resources in the repair process, and is an efficient and economical fault treatment strategy.

Among these, greedy algorithms are an optimization technique that seeks optimal solutions in each selection step, hopefully ultimately achieving global optimization by selecting the current best local solution. However, greedy algorithms may not always result in a globally optimal solution because it does not have a backtracking function to adjust previous decisions. The local search strategy is also an optimization method that starts with a candidate solution and then explores in the neighborhood of the solution (i.e., the set of similar solutions) to find a better solution. The strategy iterates repeatedly, exploring new solutions by slightly varying the current solution until a satisfactory solution is found or other stopping conditions are reached. The greedy algorithm and the local search strategy are applied to the fault repairing process of the memory chip, and the advantages of the greedy algorithm and the local search strategy can be effectively combined, so that the efficiency and the effect of fault repairing are improved. The greedy algorithm provides a seemingly optimal solution for each repair step, such as preferentially repairing those rows or columns that fail most severely. The local search strategy can then further optimize and adjust the preliminary strategy by looking at the neighbor solutions of this solution, which helps to compensate for long-term or more complex failure modes that greedy algorithms may ignore. The combined method can more comprehensively utilize limited standby resources, realize the maximized repair of the fault unit, simultaneously reduce the problems of resource waste and excessive repair possibly occurring in the repair process, and ensure the efficient operation and the lasting stability of the storage system.
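How a local search can improve on a greedy starting point is easiest to see on a small case. The sketch below is a simplified hill climb that only uses the replacing operation (swapping one selected row or column for an unselected one), since the spare budget in the example is already fully used. The function names, the fault map and the starting strategy are hypothetical.

```python
def repaired_count(faults, rows, cols):
    """Number of fault units lying in a replaced row or column."""
    return sum(1 for i, j in faults if i in rows or j in cols)


def neighbours(rows, cols, n_rows, n_cols):
    """Yield strategies that differ by replacing one selected row or column."""
    for old in rows:
        for new in range(n_rows):
            if new not in rows:
                yield (rows - {old}) | {new}, cols
    for old in cols:
        for new in range(n_cols):
            if new not in cols:
                yield rows, (cols - {old}) | {new}


def local_search(faults, rows, cols, n_rows, n_cols):
    """Move to the best neighbouring strategy until none repairs more faults."""
    current = (frozenset(rows), frozenset(cols))
    best = repaired_count(faults, *current)
    while True:
        scored = [(repaired_count(faults, r, c), r, c)
                  for r, c in neighbours(*current, n_rows, n_cols)]
        if not scored:
            break
        score, r, c = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        best, current = score, (r, c)
    return best, set(current[0]), set(current[1])


faults = {(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (2, 0),
          (3, 1), (4, 1), (5, 4), (5, 5), (5, 6)}
# Starting point: row 0 and columns 0 and 1, which repairs 8 faults.
result = local_search(faults, rows={0}, cols={0, 1}, n_rows=6, n_cols=7)
print(result)  # (9, {5}, {0, 1})
```

Row 0 has the most faults, but two of them are also covered by columns 0 and 1. Swapping the spare row to row 5 gives up two faults and gains three, which a selection made purely by fault count would not find.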

In actual use, the conventional method maps fault positions to the redundant resources one by one. Because spare rows and columns are limited, direct one-by-one mapping exhausts them quickly, particularly in memory areas with dense faults. Optimizing the fault repair linear programming model with a greedy algorithm and a local search strategy instead repairs the most important and most heavily affected fault areas first, which maximizes the benefit obtained from the limited resources. One-by-one mapping also may not adequately take the overall distribution of faults into account. For example, if an entire row or column is faulty, replacing it with a spare row or column is more efficient than mapping each faulty cell individually. Furthermore, one-by-one mapping may make the memory map more complex and thereby reduce access efficiency, whereas the optimization strategy can maintain or improve the performance of the storage system by handling faults collectively. In short, although one-by-one mapping is an intuitive repair method, when resources are limited and faults are widely distributed or complex, a systematic optimization strategy uses the spare resources more effectively, improves repair coverage and efficiency, and maintains or improves the overall performance and reliability of the system. Such a strategy ensures that enough resources are available for the most critical problems in an emergency while leaving room for less urgent ones.

It should be noted that, by reasonably selecting which faulty cells to repair, as many faulty cells as possible can be repaired with the limited redundant resources. For example, if a plurality of faulty cells are located in the same row, a single spare row repairs the faults of the whole row. Such orderly repair can also improve the read-write performance of the memory chip.
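The effect of sharing one spare row among several faulty cells can be checked on a small array by exhaustive enumeration. The following Python sketch is illustrative only: the function names `repaired_count` and `best_repair`, the representation of faults as (row, column) pairs, and the example fault map are assumptions, and exhaustive search is practical only for very small arrays.

```python
from itertools import combinations


def repaired_count(faults, rows, cols):
    """Count faulty cells (i, j) whose row or column has been replaced."""
    return sum(1 for (i, j) in faults if i in rows or j in cols)


def best_repair(faults, m, n, R, C):
    """Try every choice of at most R rows and at most C columns.

    Returns (number of repaired cells, chosen rows, chosen columns).
    """
    best = (0, frozenset(), frozenset())
    for r in range(R + 1):
        for rows in combinations(range(m), r):
            for c in range(C + 1):
                for cols in combinations(range(n), c):
                    count = repaired_count(faults, set(rows), set(cols))
                    if count > best[0]:
                        best = (count, frozenset(rows), frozenset(cols))
    return best


# Hypothetical 4 x 4 fault map: three faults share row 1, one lies in row 3.
faults = {(1, 0), (1, 2), (1, 3), (3, 3)}
count, rows, cols = best_repair(faults, m=4, n=4, R=1, C=1)
print(count, sorted(rows), sorted(cols))
```

In this example a single spare row on row 1 repairs three faulty cells at once, and the one spare column on column 3 picks up the remaining fault, so all four faults are repaired with two spare lines.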

In one possible implementation manner, the step S5 specifically includes:


；

Wherein max represents taking the maximum value;

Here, the neighborhood structure takes one configuration of the preliminary repair strategy (e.g., which rows and columns have been selected for repair) and considers small changes to it. The add operation adds an unselected row or column to the current repair list. The remove operation removes a selected row or column from the repair list. The replace operation swaps a selected row or column for an unselected one. These operations may be performed simultaneously or separately.

；

It should be noted that faults located at the intersection of a selected row and a selected column are counted twice, once in the row total and once in the column total, so the number of faults in the intersection must be subtracted.

；

Specifically, the total number of faults in each row and column is first counted (S501), and each faulty cell is assigned a priority in proportion to these totals (S502), which focuses the repair strategy on the areas where faults are most severe. The greedy selection then repairs the faulty cells one by one in order from highest to lowest priority, producing a preliminary repair strategy and the number of faulty cells it repairs (S503). Next, local modifications of the neighborhood structure (add, remove, and replace operations) are applied to refine the repair strategy, and an adjustment is accepted only when it increases the number of repaired faulty cells (S504-S505). The process iterates until all rows and columns are covered or the number of repairs can no longer be increased (S506). Finally, the strategy with the largest number of repairs is selected as the optimal repair strategy (S507-S508), so that the most faulty cells are repaired with the available resources, improving repair efficiency and the usability of the chip. This process systematically reduces the faulty area and improves the stability and performance of the overall storage system.
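The flow of S501 to S508 can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the priority rule (row total plus column total), the tie-breaking, the first-improvement acceptance rule, and all function names are assumptions, and the example fault map is hypothetical.

```python
from collections import Counter


def covered(faults, rows, cols):
    """Repaired faults: row hits + column hits - intersection counted twice."""
    in_rows = sum(1 for i, _ in faults if i in rows)
    in_cols = sum(1 for _, j in faults if j in cols)
    in_both = sum(1 for i, j in faults if i in rows and j in cols)
    return in_rows + in_cols - in_both


def greedy(faults, R, C):
    """S501-S503: count faults per line, rank cells, repair in priority order."""
    row_tot = Counter(i for i, _ in faults)
    col_tot = Counter(j for _, j in faults)
    # Assumed priority: faults in the cell's row plus faults in its column.
    order = sorted(faults, key=lambda f: (-(row_tot[f[0]] + col_tot[f[1]]), f))
    rows, cols = set(), set()
    for i, j in order:
        if i in rows or j in cols:
            continue  # already repaired by an earlier choice
        if row_tot[i] >= col_tot[j] and len(rows) < R:
            rows.add(i)
        elif len(cols) < C:
            cols.add(j)
        elif len(rows) < R:
            rows.add(i)
    return rows, cols


def neighbours(rows, cols, m, n, R, C):
    """S504: selections reachable by one add, replace, or remove operation."""
    for i in range(m):
        if i not in rows:
            if len(rows) < R:
                yield rows | {i}, cols              # add a row
            for k in rows:
                yield (rows - {k}) | {i}, cols      # replace row k with row i
    for j in range(n):
        if j not in cols:
            if len(cols) < C:
                yield rows, cols | {j}              # add a column
            for k in cols:
                yield rows, (cols - {k}) | {j}      # replace column k with j
    for k in rows:
        yield rows - {k}, cols                      # remove a row
    for k in cols:
        yield rows, cols - {k}                      # remove a column


def local_search(faults, rows, cols, m, n, R, C):
    """S505-S508: accept improving moves until no neighbour repairs more."""
    best = covered(faults, rows, cols)
    improved = True
    while improved:
        improved = False
        for new_rows, new_cols in neighbours(rows, cols, m, n, R, C):
            score = covered(faults, new_rows, new_cols)
            if score > best:
                rows, cols, best, improved = new_rows, new_cols, score, True
                break
    return rows, cols, best


# Hypothetical 4 x 7 fault map with one spare row and one spare column.
faults = {(0, 0), (0, 1), (0, 2), (1, 3), (1, 4), (1, 5), (2, 6), (3, 6)}
g_rows, g_cols = greedy(faults, R=1, C=1)
print(g_rows, g_cols, covered(faults, g_rows, g_cols))
print(local_search(faults, g_rows, g_cols, m=4, n=7, R=1, C=1))
```

On this fault map the greedy step selects row 0 and column 3 and repairs four cells; the local search then replaces column 3 with column 6 and repairs five. Because only single operations are tried, the search can still stop at a local optimum, which is one reason the operations may also be applied in combination.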

S6: repairing the memory chip according to the optimal repairing strategy;

In one possible implementation manner, the step S6 is specifically:

According to the optimal repair strategy, the fault position of each faulty cell is mapped to a spare row or a spare column of the redundant resources by modifying the address decoding logic of the memory chip, thereby completing the repair of the memory chip.

The address decoding logic is a key part of the memory chip. It converts a logical address (the address used by a CPU or other device to specify a data location) into a physical address (the actual storage location of the data in the memory chip), which ensures that data is read from and written to the intended locations. In the context of fault repair, modifying the address decoding logic means adjusting it so that an address that would otherwise point to a faulty cell is redirected to a healthy spare row or column. This remapping allows the system to keep using the rest of a damaged memory chip, with data stored only in the undamaged portion, thereby maintaining memory integrity and functionality and reducing data loss and system downtime caused by faults. In this way, the lifetime of the memory chip can be extended and its reliability improved.
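The redirection performed by the modified address decoding logic can be modeled in software as a pair of lookup tables. The Python sketch below is a toy model under stated assumptions: the class name `RemapDecoder`, its methods, and the convention that spare rows and columns occupy physical indices just past the main array are all hypothetical, and real chips implement this in hardware.

```python
class RemapDecoder:
    """Toy address decoder that redirects repaired rows/columns to spares."""

    def __init__(self, m, n, spare_rows, spare_cols):
        # Assumed layout: spares sit at physical indices after the main array.
        self.free_rows = list(range(m, m + spare_rows))
        self.free_cols = list(range(n, n + spare_cols))
        self.row_map = {}  # logical row -> physical spare row
        self.col_map = {}  # logical column -> physical spare column

    def repair_row(self, row):
        """Assign the next free spare row (IndexError if none are left)."""
        self.row_map[row] = self.free_rows.pop(0)

    def repair_col(self, col):
        """Assign the next free spare column (IndexError if none are left)."""
        self.col_map[col] = self.free_cols.pop(0)

    def decode(self, row, col):
        """Translate a logical (row, col) address into a physical one."""
        return self.row_map.get(row, row), self.col_map.get(col, col)


decoder = RemapDecoder(m=4, n=7, spare_rows=1, spare_cols=1)
decoder.repair_row(0)   # row 0 is replaced by the spare row
decoder.repair_col(6)   # column 6 is replaced by the spare column
print(decoder.decode(0, 1), decoder.decode(2, 6), decoder.decode(1, 3))
```

Addresses in unrepaired rows and columns pass through unchanged, so only accesses that would have reached a replaced row or column are redirected.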

During testing, the multi-dimensional test data allows various types of chip faults to be detected, and testing and repair are integrated. The memory chip is tested and repaired with normal operation as the goal, and memory chips that cannot be used normally are accurately identified. In addition, the content addressable memory stores the fault positions persistently, so that fault records are retained for memory chips that operate normally but need repair. This optimizes the performance of the memory chip and completes a comprehensive test of the memory chip.

In one possible implementation manner, after the step S7, the method further includes:

It should be noted that completing the test automatically and automatically issuing an early warning for memory chips with many faults reduces manual intervention in the test process and increases the degree of automation of the test flow.

It should be appreciated that the processor in embodiments of the invention may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor or any conventional processor or the like.

It should also be appreciated that the memory in embodiments of the present invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).

The above embodiments may be implemented in whole or in part by software, hardware (e.g., circuitry), firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner or a wireless manner (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.

It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. In addition, the character "/" herein generally indicates that the associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which can be understood from the context.

In the present invention, "at least one" means one or more, and "a plurality" means two or more. "At least one of" the following items, or a similar expression, means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.

It should be understood that, in various embodiments of the present invention, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art falls within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

The following points need to be described:

(1) The drawings of the embodiments of the present invention relate only to the structures related to the embodiments of the present invention, and other structures may refer to the general designs.

(2) In the drawings for describing embodiments of the present invention, the thickness of layers or regions is exaggerated or reduced for clarity, i.e., the drawings are not drawn to actual scale. It will be understood that when an element such as a layer, film, region or substrate is referred to as being "on" or "under" another element, it can be "directly on" or "under" the other element or intervening elements may be present.

(3) The embodiments of the invention and the features of the embodiments can be combined with each other to give new embodiments without conflict.

The present invention is not limited to the above embodiments, but the scope of the invention is defined by the claims.

Claims

1. A method for testing a memory chip for faults, the method comprising:

S1: obtaining redundant resources of a memory chip, wherein the redundant resources are non-fixed memory resources dynamically started by the memory chip;

S2: extracting target test data from a multi-dimensional test data bucket, wherein the target test data comprises a fault primary screening layer, an interference fault detection layer and a coupling fault detection layer;

S6: repairing the memory chip according to the optimal repairing strategy;

S7: outputting unrepaired fault positions in the optimal repair strategy to finish the test of the memory chip;

In S4, the fault repair linear programming model is specifically:

；

wherein M represents the number of repaired faulty cells; the binary indicator variable equals 1 when the cell in the i-th row and j-th column of the memory chip is faulty, and 0 otherwise; m and n respectively represent the maximum number of rows and the maximum number of columns of the memory chip; the binary auxiliary variable equals 1 when the faulty cell in the i-th row and j-th column is repaired, and 0 otherwise; one binary decision variable indicates whether the i-th spare row in the redundant resource is selected, and another binary decision variable indicates whether the j-th spare column in the redundant resource is selected; R represents the maximum number of rows of the redundant resource, and C represents the maximum number of columns of the redundant resource;

The step S5 specifically comprises the following steps: S501: counting the total number of faults of each row and each column in the memory chip;

；

Wherein max represents taking the maximum value;

；

2. The method of claim 1, wherein the fault primary screening layer includes zero-one data in the same number, the interference fault detection layer includes checkerboard data and inverted checkerboard data, and the coupling fault detection layer includes row inversion data and pseudo-random data.

3. The method for testing a memory chip according to claim 1, wherein S3 specifically comprises:

S301: acquiring an addressable storage unit of the storage chip;

S305: and outputting the fault position.

4. The method for testing a memory chip according to claim 3, wherein S303 specifically comprises:

；

5. The method for testing a failure of a memory chip according to claim 1, wherein S6 is specifically:

6. The method for testing a failure of a memory chip according to claim 1, further comprising, after S7:

7. A fault testing device for a memory chip, to which the fault testing method according to any one of claims 1 to 6 is applied, the device comprising a top-level controller, a multi-dimensional test data bucket, a fault self-diagnosis module, an address remapping repair module and a content addressable memory;

the content addressable memory is used for storing the fault location;

8. The fault testing device of a memory chip of claim 7, wherein the fault testing device further comprises: a motherboard and a power module;