CN112036554B - Neural network model processing method and device, computer equipment and storage medium - Google Patents
- Publication number: CN112036554B
- Application number: CN202011213215.3A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06N3/045—Combinations of networks (G—Physics; G06—Computing; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06F9/5016—Allocation of resources, the resource being the memory (G—Physics; G06—Computing; G06F—Electric digital data processing; G06F9/00—Arrangements for program control; G06F9/46—Multiprogramming arrangements; G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU])
Abstract
The present application relates to a neural network model processing method and apparatus, a computer device, and a storage medium. The method includes: loading the neural network model in a memory; reading each data from the neural network model in the memory, determining data of a numerical value type from each data, and converting the data of the numerical value type into data of a character string type; and/or determining at least two structural layer data included in the neural network model in the memory, and compressing the at least two structural layer data. By adopting this method, the size of the memory space occupied by the neural network model can be reduced.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a neural network model, a computer device, and a storage medium.
Background
With the development of computer technology, artificial intelligence technology has emerged; it can help people perform various tasks such as speech recognition, image recognition, and natural language processing. In artificial intelligence technology, neural network models can be employed to handle these tasks. Neural Networks (NN) are complex network systems formed by a large number of simple processing units (called neurons) that are widely interconnected; they reflect many basic features of human brain function and are highly complex nonlinear dynamical learning systems.
However, a traditional neural network model occupies a large amount of memory space.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a processing method and apparatus for a neural network model, a computer device, and a storage medium, which can reduce the size of a memory space occupied by the neural network model.
A method of processing a neural network model, the method comprising:
loading the neural network model in a memory;
reading each data from the neural network model in the memory, determining data of a numerical value type from each data, and converting the data of the numerical value type into data of a character string type; and/or
And determining at least two structural layer data included by the neural network model in the memory, and compressing the at least two structural layer data.
In one embodiment, the structural layer data comprises convolutional layer data;
the determining at least two layers of structural layer data included in the neural network model in the memory, and compressing the at least two layers of structural layer data includes:
and determining at least two convolution layer data included in the neural network model in the memory, and multiplying the at least two convolution layer data to obtain a first series of operators.
In one embodiment, after the multiplying the at least two convolution layer data to obtain the first series of operators, the method further includes:
and inputting an input object into the compressed neural network model, multiplying the input object by the first series of operators through the compressed neural network model to obtain a result, and outputting the result.
In one embodiment, the structural layer data comprises pooling layer data;
the determining at least two layers of structural layer data included in the neural network model in the memory, and compressing the at least two layers of structural layer data includes:
determining at least two pooling layer data included in the neural network model in the memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer;
and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In one embodiment, said converting, for each of said pooling layers, said pooled layer data into corresponding intermediate convolutional layer data comprises:
and for each piece of pooling layer data, multiplying the pooling layer data by a preset matrix to obtain intermediate convolution layer data corresponding to the pooling layer data.
In one embodiment, the structural layer data comprises convolutional layer data and pooling layer data;
the determining at least two layers of structural layer data included in the neural network model in the memory, and compressing the at least two layers of structural layer data includes:
determining at least two convolution layer data included in the neural network model in the memory, and multiplying the at least two convolution layer data to obtain a first series of operators;
determining at least two pooling layer data included in the neural network model in the memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer;
and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In one embodiment, after converting the data of the numeric value type into the data of the character string type, the method further includes:
and if the required data is the data of the character string type, acquiring the data of the character string type, and performing deserialization on the data of the character string type to obtain the data of the numerical value type corresponding to the data of the character string type.
An apparatus for processing a neural network model, the apparatus comprising:
the loading module is used for loading the neural network model in the memory;
the conversion module is used for reading each data from the neural network model in the memory, determining data of a numerical value type from each data, and converting the data of the numerical value type into data of a character string type; and/or
And the compression module is used for determining at least two structural layer data included by the neural network model in the memory and compressing the at least two structural layer data.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the processing method and apparatus, the computer device, and the storage medium of the neural network model, the neural network model is loaded in the memory; each data is read from the neural network model in the memory, data of a numerical value type is determined from each data, and, because data of the numerical value type occupies a large memory space, it is converted into data of a character string type that occupies a small memory space, so that the memory space occupied by the neural network model can be reduced; and/or at least two structural layer data included in the neural network model in the memory are determined and compressed, so that the size of the memory space occupied by the neural network model can be further reduced.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method for processing a neural network model in one embodiment;
fig. 2 is a schematic flow chart illustrating a step of determining at least two layers of structure layer data included in a neural network model in a memory and compressing the at least two layers of structure layer data according to an embodiment;
fig. 3 is a schematic flow chart illustrating a step of determining at least two layers of structure layer data included in a neural network model in a memory and compressing the at least two layers of structure layer data according to another embodiment;
FIG. 4 is a block diagram of a processing device of a neural network model in one embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The processing method of the neural network model can be applied to computer equipment. The computer device may be a terminal or a server. The terminal can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and the server can be implemented by an independent server or a server cluster formed by a plurality of servers. The processing method of the neural network model can be further applied to a system comprising the terminal and the server and is achieved through interaction of the terminal and the server.
In one embodiment, as shown in fig. 1, a processing method of a neural network model is provided, and the method includes the following steps:

Step 102: load the neural network model in a memory.
Neural Networks (NN) are complex network systems formed by a large number of simple processing units (called neurons) that are widely interconnected; they reflect many basic features of human brain function and are highly complex nonlinear dynamical learning systems. Neural networks have the capabilities of large-scale parallelism, distributed storage and processing, self-organization, self-adaptation and self-learning, and are particularly suited to handling imprecise and fuzzy information-processing problems that require many factors and conditions to be considered simultaneously. The development of neural networks is related to neuroscience, mathematical science, cognitive science, computer science, artificial intelligence, information science, cybernetics, robotics, microelectronics, psychology, optical computing, molecular biology and so on; it is an emerging interdisciplinary field.
A neural network model is built on the mathematical model of the neuron; that is, a neural network model is a model constructed from a neural network. The neural network model may be one of a CNN (Convolutional Neural Network) model, a GAN (Generative Adversarial Network) model, an RNN (Recurrent Neural Network) model, and the like.
The memory, also called internal memory or main memory, temporarily stores the operation data of the CPU (Central Processing Unit) and the data exchanged with external storage such as a hard disk. While the computer device is running, its operating system transfers the data that needs to be operated on from the memory to the CPU for computation.
If the computer device needs to use a neural network model to process a task, it acquires the neural network model from the disk and loads the neural network model in the memory.

Specifically, if the computer device needs to use a neural network model to process a task, it obtains the identifier of the neural network model to be used, matches the identifier against the identifiers of the neural network models stored in the disk, obtains the matched neural network model, and loads the matched neural network model in the memory.
The identifier of the neural network model may be a unique numerical label, a character string, a user-defined name, or the like; this is not limited here.
The disk may store one or more neural network models and the identifier corresponding to each neural network model. The identifier of the neural network model to be used is matched against the identifiers of the neural network models stored in the disk to obtain the matching neural network model in the disk, that is, the neural network model to be used, and the matched neural network model is loaded in the memory.
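For illustration only, the following is a minimal sketch of the identifier-matching load described above, assuming the models are serialized files on disk whose file names serve as their identifiers; the directory path, the pickle format, and the function name load_model are assumptions for the sketch, not disclosed by this application:

```python
import os
import pickle

MODEL_DIR = "/data/models"  # hypothetical on-disk model store

def load_model(model_id: str):
    """Match model_id against the identifiers of the neural network models
    stored on disk, and load the matching model into memory."""
    for filename in os.listdir(MODEL_DIR):
        stored_id, _ext = os.path.splitext(filename)
        if stored_id == model_id:  # identifier matching
            with open(os.path.join(MODEL_DIR, filename), "rb") as f:
                return pickle.load(f)  # the matched model, now in memory
    raise KeyError(f"no stored neural network model matches identifier {model_id!r}")
```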
Step 104: read each data from the neural network model in the memory, determine data of a numerical value type from each data, and convert the data of the numerical value type into data of a character string type.
Data of the numerical value type refers to data of a type on which numerical operations can be performed. The numerical value type may be float, int, double32, or the like. The character string type may be string.

It can be understood that data of the numerical value type occupies many bytes, that is, it occupies a large memory space. For example, data of the float type occupies 8 bytes of memory space, data of the int type occupies 4 bytes, data of the double type occupies 16 bytes, and data of the double32 type occupies 32 bytes. Data of the character string type occupies few bytes, that is, it occupies a small memory space. For example, data of the string type occupies 1 byte of memory space.
The computer equipment reads the type information of each data from the neural network model in the memory, determines the data whose type information indicates the numerical value type, and converts the data of the numerical value type into data of the character string type.

In one embodiment, each data in the neural network model is marked with its own type information; the computer device can read the marked type information of each data and determine the data whose type information is the numerical value type.

In another embodiment, the computer device may identify each data in the neural network model and recognize the data of the numerical value type.
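As a rough sketch of the conversion in step 104, assuming the numerical data are plain Python ints and floats and that the character string encoding is a simple decimal rendering (the application does not fix a concrete encoding, so to_string_type and the repr-based encoding are illustrative assumptions):

```python
def to_string_type(value):
    """Convert data of the numerical value type into data of the character string type."""
    if isinstance(value, (int, float)):  # numerical value type
        return repr(value)               # e.g. 0.25 -> "0.25"
    return value                         # non-numerical data is left unchanged

weights = [0.25, -1.5, 3]
string_weights = [to_string_type(w) for w in weights]  # ["0.25", "-1.5", "3"]
```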
Step 106: determine at least two structural layer data included in the neural network model in the memory, and compress the at least two structural layer data.
The neural network model may include at least two structural layer data. The structural layer data may be convolution layer data, pooling layer data, or both convolution layer data and pooling layer data.
The computer equipment determines at least two structural layer data included by the neural network model in the memory, and compresses the at least two structural layer data, so that the size of the memory space occupied by the at least two structural layer data can be reduced, and the size of the memory space occupied by the neural network model is reduced.
In one embodiment, the computer device may compress the at least two structural layer data and store the compressed data in the memory. In another embodiment, the computer device may compress the at least two structural layer data, store the compressed data in the memory, and store the at least two structural layer data before compression in another storage device, such as a disk or a hard disk.
In this embodiment, the neural network model is loaded in the memory; each data is read from the neural network model in the memory, data of a numerical value type is determined from each data, and, because data of the numerical value type occupies a large memory space, it is converted into data of a character string type that occupies a small memory space, so that the memory space occupied by the neural network model can be reduced; at least two structural layer data included in the neural network model in the memory are determined and compressed, so that the size of the memory space occupied by the neural network model can be further reduced.
In another embodiment, a method for processing a neural network model is provided, the method comprising the steps of: loading the neural network model in a memory; reading each data from the neural network model in the memory, determining data of a numerical value type from each data, and converting the data of the numerical value type into data of a character string type.
In this embodiment, the neural network model is loaded in the memory; each data is read from the neural network model in the memory, data of a numerical value type is determined from each data, and, because data of the numerical value type occupies a large memory space, it is converted into data of a character string type that occupies a small memory space, so that the memory space occupied by the neural network model can be reduced.
In another embodiment, a method for processing a neural network model is provided, the method comprising the steps of: loading the neural network model in a memory; determining at least two structural layer data included in the neural network model in the memory, and compressing the at least two structural layer data.
In this embodiment, the neural network model is loaded in the memory; at least two structural layer data included in the neural network model in the memory are determined, and the at least two structural layer data are compressed, so that the memory space occupied by the neural network model can be reduced.
In one embodiment, after converting the data of the numeric type into the data of the character string type, the method further includes: and if the required data is the data of the character string type, acquiring the data of the character string type, and performing deserialization on the data of the character string type to obtain the data of the numerical value type corresponding to the data of the character string type.
Deserialization refers to a process of restoring a byte sequence to an object, that is, a process of converting data of a character string type into data of a numerical value type.
The computer device converts the data of the numeric type into data of the character string type, and then stores the data of the character string type in the memory. If the data required by the computer equipment is the data of the character string type, the data of the character string type is obtained from the memory, and the data of the character string type is deserialized to obtain the data of the numerical value type corresponding to the data of the character string type.
In this embodiment, if the required data is data of a character string type, the data of the character string type is obtained, and the data of the character string type is deserialized to obtain data of a numerical value type corresponding to the data of the character string type, so that the data of the numerical value type can be obtained to perform the operation.
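A matching sketch of the deserialization, under the same assumed decimal-string encoding (the helper name from_string_type is illustrative):

```python
from ast import literal_eval

def from_string_type(value):
    """Deserialize data of the character string type back into data of the numerical value type."""
    if isinstance(value, str):
        return literal_eval(value)  # "0.25" -> 0.25, "3" -> 3
    return value

restored = [from_string_type(s) for s in ["0.25", "-1.5", "3"]]
assert restored == [0.25, -1.5, 3]
```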
In one embodiment, the structural layer data comprises convolutional layer data; determining at least two structural layer data included in a neural network model in a memory, and compressing the at least two structural layer data, wherein the method comprises the following steps: determining at least two convolution layer data included in a neural network model in a memory, and multiplying the at least two convolution layer data to obtain a first series of operators.
Each convolutional layer (Convolutional layer) in a convolutional neural network is composed of a plurality of convolution units, and the parameters of each convolution unit are optimized through a back propagation algorithm. The purpose of the convolution operation is to extract different features of the input: the first convolution layer may extract only low-level features such as edges, lines and corners, while networks with more layers can iteratively extract more complex features from these low-level features.
The first series of operators refers to the result of multiplying at least two convolution layer data.
The computer equipment determines at least two convolution layer data included in the neural network model in the memory, and multiplies the at least two convolution layer data to obtain a first series of operators. For example, the neural network model in memory includes 4 convolutional layer data, A, B, C and D respectively, and the computer device may multiply A, B, C and D to obtain a result H, where H is the first series of operators.
In this embodiment, at least two convolution layer data included in the neural network model in the memory are determined, and the at least two convolution layer data are multiplied to obtain a first series of operators, so that the size of the memory space occupied by the neural network model can be reduced.
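To make the collapse concrete, the following numpy sketch represents each convolution layer as a matrix and assumes the layers are purely linear with no activation between them, an assumption this kind of fusion relies on; the shapes and random values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# four convolution layer data A, B, C, D, each represented as a matrix
A, B, C, D = (rng.standard_normal((8, 8)) for _ in range(4))

H = A @ B @ C @ D           # first series of operators: one matrix replaces four
Q = rng.standard_normal(8)  # input object Q (row vector)

E_layered = Q @ A @ B @ C @ D  # passing Q through each layer in turn
E_fused = Q @ H                # a single multiplication with the fused operator
assert np.allclose(E_layered, E_fused)  # E = Q x H
```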
In one embodiment, after multiplying at least two convolution layer data to obtain a first series of operators, the method further includes: and inputting the input object into the compressed neural network model, multiplying the input object by the first series of operators through the compressed neural network model to obtain a result, and outputting the result.
The input object refers to an object input into the neural network model. The input object may be data such as text, an image, video, or audio. Text is, for example, an article, a question, or a sentence.
The computer equipment inputs the input object into the compressed neural network model, multiplies the input object by the first series of operators through the compressed neural network model to obtain a result, and outputs the result.
For example, the neural network model in the memory includes 4 convolution layer data, which are A, B, C and D respectively. The computer device may multiply A, B, C and D to obtain the first series of operators H. Given an input object Q, the computer device inputs Q into the compressed neural network model, and the compressed neural network model multiplies Q by the first series of operators H to obtain the result E, that is, E = Q × H = Q × A × B × C × D.
In this embodiment, the input object is input into the compressed neural network model, and the compressed neural network model multiplies the input object by the first series of operators. The neural network model therefore only needs to perform one operation between the input object and the compressed first series of operators, instead of operating on the input object with each convolution layer data separately, which improves calculation efficiency so that the result can be obtained and output more quickly.
In one embodiment, the structural layer data includes pooling layer data. As shown in fig. 2, determining at least two structural layer data included in the neural network model in the memory and compressing the at least two structural layer data includes:

Step 202: determine at least two pooling layer data included in the neural network model in the memory, and, for each pooling layer, convert the pooling layer data into corresponding intermediate convolution layer data.
The pooling layer is also called a down-sampling layer. Its operation is basically the same as that of the convolution layer, except that the pooling layer's kernel only takes, for example, the maximum value or the average value of the corresponding positions (maximum pooling and average pooling); that is, the operation rules between the matrices differ, and the pooling matrices are not modified by back propagation.
Converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer includes: for each pooling layer data, multiplying the pooling layer data by a preset matrix to obtain the intermediate convolution layer data corresponding to the pooling layer data.

The preset matrix may be preset by a user. The intermediate convolution layer data refers to data converted from the pooling layer data.

For each pooling layer data, the computer device multiplies the pooling layer data by the preset matrix, which converts the pooling layer data into convolution layer data and yields the intermediate convolution layer data.
For example, multiplying given pooling layer data by the preset matrix converts the pooling layer data into convolution layer data.
Step 204: multiply the obtained at least two intermediate convolution layer data to obtain a second series of operators.
The second series of operators refers to the result of multiplying at least two intermediate convolution layer data.
In this embodiment, at least two pooling layer data included in the neural network model in the memory are determined, and for each pooling layer, the pooling layer data is converted into corresponding intermediate convolution layer data; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators, so that the size of the memory space occupied by the neural network model can be reduced.
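The application does not reproduce here the specific pooling layer data and preset matrix, so the sketch below uses generic arrays to show only the shape of the two-step computation (convert each pooling layer with the preset matrix, then multiply the intermediate convolution layer data together); all values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# two pooling layer data and a user-preset matrix (contents are assumptions)
P1, P2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
M = rng.standard_normal((4, 4))

# each pooling layer data is converted into intermediate convolution layer data
C1, C2 = P1 @ M, P2 @ M

# the intermediate convolution layer data are multiplied into the second series of operators
H2 = C1 @ C2
```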
In one embodiment, after the multiplying the obtained at least two intermediate convolution layer data to obtain the second series of operators, the method further includes: and inputting the input object into the compressed neural network model, multiplying the input object by the second series of operators through the compressed neural network model to obtain a result, and outputting the result.
The input object refers to an object input into the neural network model. The input object may be data such as text, an image, video, or audio. Text is, for example, an article, a question, or a sentence.
The computer equipment inputs the input object into the compressed neural network model, multiplies the input object by the second series of operators through the compressed neural network model to obtain a result, and outputs the result.
For example, the neural network model in the memory includes 4 pooling layer data, which are A1, B1, C1 and D1. The computer device multiplies each of A1, B1, C1 and D1 by the preset matrix to obtain the corresponding intermediate convolution layer data A2, B2, C2 and D2, and multiplies A2, B2, C2 and D2 to obtain the second series of operators H. Given an input object Q, the computer device inputs Q into the compressed neural network model, and the compressed neural network model multiplies Q by the second series of operators H to obtain the result E, that is, E = Q × H = Q × A2 × B2 × C2 × D2.
In this embodiment, the input object is input into the compressed neural network model, and the compressed neural network model multiplies the input object by the second series of operators. The neural network model therefore only needs to perform one operation between the input object and the compressed second series of operators, instead of operating on the input object with each pooling layer data separately, which improves calculation efficiency so that the result can be obtained and output more quickly.
In one embodiment, the structural layer data includes convolution layer data and pooling layer data. As shown in fig. 3, determining at least two structural layer data included in the neural network model in the memory and compressing the at least two structural layer data includes:

Step 302: determine at least two convolution layer data included in the neural network model in the memory, and multiply the at least two convolution layer data to obtain a first series of operators.
Each convolutional layer (Convolutional layer) in a convolutional neural network is composed of a plurality of convolution units, and the parameters of each convolution unit are optimized through a back propagation algorithm. The purpose of the convolution operation is to extract different features of the input: the first convolution layer may extract only low-level features such as edges, lines and corners, while networks with more layers can iteratively extract more complex features from these low-level features.
The first series of operators refers to the result of multiplying at least two convolution layer data.
The computer equipment determines at least two convolution layer data included in the neural network model in the memory, and multiplies the at least two convolution layer data to obtain a first series of operators. For example, the neural network model in memory includes 4 convolutional layer data, A, B, C and D respectively, and the computer device may multiply A, B, C and D to obtain a result H, where H is the first series of operators.
Step 304: determine at least two pooling layer data included in the neural network model in the memory, and, for each pooling layer, convert the pooling layer data into corresponding intermediate convolution layer data.
The pooling layer is also called a down-sampling layer. Its operation is basically the same as that of the convolution layer, except that the pooling layer's kernel only takes, for example, the maximum value or the average value of the corresponding positions (maximum pooling and average pooling); that is, the operation rules between the matrices differ, and the pooling matrices are not modified by back propagation.
Converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer includes: for each pooling layer data, multiplying the pooling layer data by a preset matrix to obtain the intermediate convolution layer data corresponding to the pooling layer data.

The preset matrix may be preset by a user. The intermediate convolution layer data refers to data converted from the pooling layer data.

For each pooling layer data, the computer device multiplies the pooling layer data by the preset matrix, which converts the pooling layer data into convolution layer data and yields the intermediate convolution layer data.
For example, multiplying given pooling layer data by the preset matrix converts the pooling layer data into convolution layer data.
Step 306: multiply the obtained at least two intermediate convolution layer data to obtain a second series of operators.
The second series of operators refers to the result of multiplying at least two intermediate convolution layer data.
In this embodiment, at least two convolution layer data included in a neural network model in a memory are determined, and the at least two convolution layer data are multiplied to obtain a first series of operators; determining at least two pooling layer data included in a neural network model in a memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators, so that the size of the memory space occupied by the neural network model can be reduced.
In one embodiment, after multiplying the obtained at least two intermediate convolution layer data to obtain the second series of operators, the method further includes: inputting an input object into the compressed neural network model, multiplying the input object by the first series of operators through the compressed neural network model to obtain a first result, multiplying the first result by the second series of operators to obtain a second result, and outputting the second result.
The input object refers to an object input into the neural network model. The input object may be data such as text, an image, video, or audio. Text is, for example, an article, a question, or a sentence.
In this embodiment, an input object is input into the compressed neural network model; the compressed neural network model multiplies the input object by the first series of operators to obtain a first result, and then multiplies the first result by the second series of operators. The neural network model therefore only needs to operate on the input object with the compressed first series of operators and second series of operators, instead of operating on the input object with each convolution layer data and each pooling layer data separately, which improves calculation efficiency so that the result can be obtained and output more quickly.
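Putting the two fused operators together, again under the linear-layer assumption, the two-stage application of this embodiment reduces to the following sketch (shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
H1 = rng.standard_normal((8, 8))  # first series of operators (fused convolution layers)
H2 = rng.standard_normal((8, 8))  # second series of operators (fused converted pooling layers)
Q = rng.standard_normal(8)        # input object

first_result = Q @ H1              # multiply the input object by the first series of operators
second_result = first_result @ H2  # multiply the first result by the second series of operators
# second_result is output as the model's result
```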
It should be understood that, although the steps in the flowcharts of fig. 1 to 3 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1 to 3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a processing apparatus 400 of a neural network model, including: a load module 402, a convert module 404, and a compress module 406, wherein:
a loading module 402, configured to load the neural network model in a memory.
A conversion module 404, configured to read each data from the neural network model in the memory, determine data of a numerical type from each data, and convert the data of the numerical type into data of a character string type. And/or
A compression module 406, configured to determine at least two layers of structural layer data included in the neural network model in the memory, and compress the at least two layers of structural layer data.
The above processing apparatus of the neural network model loads the neural network model in the memory; reads each data from the neural network model in the memory, determines data of a numerical value type from each data, and converts the data of the numerical value type, which occupies a large memory space, into data of a character string type that occupies a small memory space, so that the memory space occupied by the neural network model can be reduced; and/or determines at least two structural layer data included in the neural network model in the memory and compresses them, so that the size of the memory space occupied by the neural network model can be further reduced.
In one embodiment, the structural layer data comprises convolutional layer data; the compression module 406 is further configured to determine at least two convolution layer data included in the neural network model in the memory, and multiply the at least two convolution layer data to obtain a first series of operators.
In an embodiment, the processing apparatus of the neural network model further includes an operation module, configured to input the input object into the compressed neural network model, multiply the input object by the first series of operators through the compressed neural network model to obtain a result, and output the result.
In one embodiment, the structural layer data comprises pooling layer data; the compression module 406 is further configured to determine at least two pooling layer data included in the neural network model in the memory, and convert the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In one embodiment, the compression module 406 is further configured to multiply the pooled layer data by a preset matrix for each pooled layer data to obtain intermediate convolution layer data corresponding to the pooled layer data.
In one embodiment, the structural layer data includes convolution layer data and pooling layer data; the compression module 406 is further configured to determine at least two convolution layer data included in the neural network model in the memory, and multiply the at least two convolution layer data to obtain a first series of operators; determining at least two pooling layer data included in a neural network model in a memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In an embodiment, the processing apparatus of the neural network model further includes a deserialization module, configured to, if the required data is data of a character string type, obtain data of the character string type, and deserialize the data of the character string type to obtain data of a numerical type corresponding to the data of the character string type.
For specific definition of the processing device of the neural network model, reference may be made to the above definition of the processing method of the neural network model, and details are not described here. The respective modules in the processing device of the neural network model described above may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing data such as a neural network model, a preset matrix and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a processing method of a neural network model.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: loading the neural network model in a memory; reading each data from a neural network model in a memory, determining data of a numerical value type from each data, and converting the data of the numerical value type into data of a character string type; and/or determining at least two structural layer data included in the neural network model in the memory, and compressing the at least two structural layer data.
In one embodiment, the structural layer data comprises convolutional layer data; the processor, when executing the computer program, further performs the steps of: determining at least two convolution layer data included in a neural network model in a memory, and multiplying the at least two convolution layer data to obtain a first series of operators.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and inputting the input object into the compressed neural network model, multiplying the input object by the first series of operators through the compressed neural network model to obtain a result, and outputting the result.
In one embodiment, the structural layer data comprises pooling layer data; the processor, when executing the computer program, further performs the steps of: determining at least two pooling layer data included in a neural network model in a memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and for each piece of pooling layer data, multiplying the pooling layer data by a preset matrix to obtain middle convolution layer data corresponding to the pooling layer data.
In one embodiment, the structural layer data includes convolution layer data and pooling layer data; the processor, when executing the computer program, further performs the steps of: determining at least two convolution layer data included in a neural network model in a memory, and multiplying the at least two convolution layer data to obtain a first series of operators; determining at least two pooling layer data included in a neural network model in a memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and if the required data is the data of the character string type, acquiring the data of the character string type, and performing deserialization on the data of the character string type to obtain the data of the numerical value type corresponding to the data of the character string type.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: loading the neural network model in a memory; reading each data from a neural network model in a memory, determining data of a numerical value type from each data, and converting the data of the numerical value type into data of a character string type; and/or determining at least two structural layer data included in the neural network model in the memory, and compressing the at least two structural layer data.
In one embodiment, the structural layer data comprises convolutional layer data; the computer program when executed by the processor further realizes the steps of: determining at least two convolution layer data included in a neural network model in a memory, and multiplying the at least two convolution layer data to obtain a first series of operators.
In one embodiment, the computer program when executed by the processor further performs the steps of: and inputting the input object into the compressed neural network model, multiplying the input object by the first series of operators through the compressed neural network model to obtain a result, and outputting the result.
In one embodiment, the structural layer data comprises pooling layer data; the computer program when executed by the processor further realizes the steps of: determining at least two pooling layer data included in a neural network model in a memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In one embodiment, the computer program when executed by the processor further performs the steps of: and for each piece of pooling layer data, multiplying the pooling layer data by a preset matrix to obtain middle convolution layer data corresponding to the pooling layer data.
In one embodiment, the structural layer data includes convolution layer data and pooling layer data; the computer program when executed by the processor further realizes the steps of: determining at least two convolution layer data included in a neural network model in a memory, and multiplying the at least two convolution layer data to obtain a first series of operators; determining at least two pooling layer data included in a neural network model in a memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
In one embodiment, the computer program when executed by the processor further performs the steps of: and if the required data is the data of the character string type, acquiring the data of the character string type, and performing deserialization on the data of the character string type to obtain the data of the numerical value type corresponding to the data of the character string type.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (12)
1. A method of processing a neural network model, the method comprising:
loading the neural network model in a memory;
determining at least two structural layer data included in the neural network model in the memory, and compressing the at least two structural layer data;
the structural layer data comprises convolution layer data and pooling layer data; the determining at least two layers of structural layer data included in the neural network model in the memory, and compressing the at least two layers of structural layer data includes:
determining at least two convolution layer data included in the neural network model in the memory, and multiplying the at least two convolution layer data to obtain a first series of operators;
determining at least two pooling layer data included in the neural network model in the memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer;
and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
2. The method of claim 1, wherein after the multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators, the method further comprises:
inputting an input object into a compressed neural network model, and multiplying the input object by the first series of operators through the compressed neural network model to obtain a first result;
and multiplying the first result by the second series of operators to obtain a second result, and outputting the second result.
3. The method of claim 1, wherein converting the pooled layer data into corresponding intermediate convolution layer data for each of the pooled layers comprises:
and for each piece of pooling layer data, multiplying the pooling layer data by a preset matrix to obtain intermediate convolution layer data corresponding to the pooling layer data.
4. The method of claim 1, wherein after loading the neural network model in the memory, further comprising:
reading each data from the neural network model in the memory, determining data of a numerical value type from each data, and converting the data of the numerical value type into data of a character string type.
5. The method of claim 4, wherein after converting the numeric data into the string data, further comprising:
and if the required data is the data of the character string type, acquiring the data of the character string type, and performing deserialization on the data of the character string type to obtain the data of the numerical value type corresponding to the data of the character string type.
6. The method of claim 4, wherein the reading each data from the neural network model in the memory and determining data of a numerical value type from each data comprises:
and reading the type information marked by each data from the neural network model in the memory, and determining that the type information is data of a numerical value type.
7. The method of claim 1, wherein loading the neural network model in memory comprises:
and acquiring a required neural network model from a disk, and loading the required neural network model in a memory.
8. The method of claim 7, wherein obtaining the required neural network model from the disk and loading the required neural network model into the memory comprises:
acquiring the identification of the needed neural network model, matching the identification with the identification of each neural network model stored in a disk, acquiring the matched neural network model, and loading the matched neural network model in a memory.
9. An apparatus for processing a neural network model, the apparatus comprising:
the loading module is used for loading the neural network model in the memory;
the compression module is used for determining at least two structural layer data included in the neural network model in the memory and compressing the at least two structural layer data; the structural layer data comprises convolution layer data and pooling layer data; the determining at least two layers of structural layer data included in the neural network model in the memory, and compressing the at least two layers of structural layer data includes: determining at least two convolution layer data included in the neural network model in the memory, and multiplying the at least two convolution layer data to obtain a first series of operators; determining at least two pooling layer data included in the neural network model in the memory, and converting the pooling layer data into corresponding intermediate convolution layer data for each pooling layer; and multiplying the obtained at least two intermediate convolution layer data to obtain a second series of operators.
10. The apparatus of claim 9, further comprising a conversion module, wherein the conversion module is configured to read each data from the neural network model in the memory, determine data of a numerical value type from each data, and convert the data of the numerical value type into data of a character string type.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011213215.3A | 2020-11-04 | 2020-11-04 | Neural network model processing method and device, computer equipment and storage medium
Publications (2)

Publication Number | Publication Date
---|---
CN112036554A | 2020-12-04
CN112036554B | 2021-04-06
Family: ID=73573614

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202011213215.3A | Neural network model processing method and device, computer equipment and storage medium | 2020-11-04 | 2020-11-04

Country Status (1)

Country | Link
---|---
CN | CN112036554B

Families Citing this family (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN113033779B | 2021-03-18 | 2024-08-27 | Lenovo (Beijing) Ltd. | Model processing method based on equipment parameters and electronic equipment
Family Cites Families (4)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN106485316B | 2016-10-31 | 2019-04-02 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Neural network model compression method and device
CN107992329B | 2017-07-20 | 2021-05-11 | Shanghai Cambricon Information Technology Co., Ltd. | Calculation method and related product
CN111026748B | 2019-11-05 | 2020-11-17 | Guangzhou Xuanwu Wireless Technology Co., Ltd. | Data compression method, device and system for network access frequency management and control
CN110909801B | 2019-11-26 | 2020-10-09 | Shandong Normal University | Data classification method, system, medium and equipment based on convolutional neural network
Patent Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111047020A | 2018-10-12 | 2020-04-21 | Shanghai Cambricon Information Technology Co., Ltd. | Neural network operation device and method supporting compression and decompression
CN110490244A | 2019-08-14 | 2019-11-22 | Jilin University | A data processing method and device
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant