CN112346782B - A method, device, equipment and storage medium for processing data in a function - Google Patents
- Publication number
- CN112346782B CN201910726018.2A
- Authority
- CN
- China
- Prior art keywords
- elements
- vector
- data stream
- element set
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The application discloses a method, an apparatus, a device, and a storage medium for processing data in a function. The method comprises: determining a vector in an input objective function; determining the number of elements processed by the single instruction multiple data stream at a time, based on the number of bits of a single element in the vector and the number of data bits processed by the single instruction multiple data stream at a time; determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the single instruction multiple data stream at a time, wherein the ratio of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream at a time is a positive integer; processing the first element set based on the single instruction multiple data stream; and processing the second element set based on a single instruction single data stream. With the technical solution of the application, the processing efficiency of the data in the elements is improved, thereby increasing the operation speed of the function.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing data in a function.
Background
The ReLU function is a core function in deep learning, and its speed is critical. Currently, the ReLU functions known in the art are implemented using single instruction single data stream (Single Instruction Single Data, SISD) instructions. A SISD instruction processes one element of the input vector at a time, and is repeated until all elements of the vector have been processed.
A ReLU function based on SISD instructions is slow to compute. When the ReLU function is executed on a critical path, such as in a glance online service, a slower speed means greater delay, more timeout failures, and a worse user experience.
Therefore, it is necessary to provide a data processing method, apparatus, device and storage medium that can increase the speed of function operation.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for processing data in a function, which can improve the processing efficiency of the data in elements, thereby improving the operation speed of the function and improving the user experience.
In one aspect, the present application provides a method for processing data in a function, where the method includes:
determining a vector in an input objective function;
determining the number of elements processed by the single instruction multiple data stream at a time, based on the number of bits of a single element in the vector and the number of data bits processed by the single instruction multiple data stream at a time;
determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the single instruction multiple data stream at a time, wherein the ratio of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream at a time is a positive integer, and the second element set is the set of elements in the vector other than the first element set;
processing the first element set based on the single instruction multiple data stream;
processing the second element set based on a single instruction single data stream.
In another aspect, there is provided a data processing apparatus in a function, the apparatus comprising:
a vector determining module, configured to determine a vector in the input objective function;
a single-processing element number determining module, configured to determine the number of elements processed by the single instruction multiple data stream at a time, based on the number of bits of a single element in the vector and the number of data bits processed by the single instruction multiple data stream at a time;
an element set determining module, configured to determine a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the single instruction multiple data stream at a time, wherein the ratio of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream at a time is a positive integer, and the second element set is the set of elements in the vector other than the first element set;
a first element set processing module, configured to process the first element set based on the single instruction multiple data stream;
a second element set processing module, configured to process the second element set based on a single instruction single data stream.
In another aspect there is provided an in-function data processing apparatus comprising a processor and a memory having stored therein at least one instruction, at least one program, code set or instruction set, the at least one instruction, at least one program, code set or instruction set being loaded and executed by the processor to implement an in-function data processing method as described above.
Another aspect provides a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes or a set of instructions, the at least one instruction, the at least one program, the set of codes or the set of instructions being loaded and executed by a processor to implement a method of data processing in a function as described above.
The method, the device, the equipment and the storage medium for processing the data in the function have the following technical effects:
The application determines the number of elements that the single instruction multiple data stream can process at a time from the number of bits of a single element in the input vector of the function and the number of data bits processed by the single instruction multiple data stream at a time. Combined with the total number of elements in the vector, it then determines the number of elements allocated to the single instruction multiple data stream, and allocates the remaining elements to the single instruction single data stream for processing. Because the single instruction multiple data stream can process a plurality of elements at a time, the processing efficiency of the data in the elements is improved, thereby increasing the operation speed of the function.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system in a function according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for processing data in a function according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for determining the number of elements for single processing of a single instruction multiple data stream according to an embodiment of the present application;
FIG. 4 is a flow chart of a method for determining a first set of elements and a second set of elements in the vector according to an embodiment of the present application;
FIG. 5 is a flow chart of another method for determining a first set of elements and a second set of elements in the vector according to an embodiment of the present application;
FIG. 6 is a flow chart of a method for determining a first number of elements in the first set of elements provided by an embodiment of the present application;
FIG. 7 is a flow chart of a method for processing the first element set based on the single instruction multiple data stream according to an embodiment of the present application;
FIG. 8 is a flow chart of a method for determining multiple element groups provided by an embodiment of the present application;
FIG. 9 is a flowchart of a method for processing data in vectors according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing device in a function according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a structure of an element set determining module according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to FIG. 1, which is a schematic diagram of a data processing system in a function according to an embodiment of the present application, the data processing system in the function may include at least a server 01 and a client 02.
In particular, in the embodiment of the present disclosure, the server 01 may include a server that operates independently, or a distributed server, or a server cluster that is formed by a plurality of servers. The server 01 may include a network communication unit, a processor, a memory, and the like. Specifically, the server 01 may be configured to receive an information query request sent by the client 02, and perform data processing by using a function to obtain query information.
Specifically, in the embodiments of this specification, the client 02 may include a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, an intelligent wearable device, or another type of physical device, and may also include software running on such a device, for example a web page or an application that a service provider offers to users. Specifically, the client 02 may be configured to send an information query request to the server 01.
A method for processing data in a function according to an embodiment of the present application is described below. FIG. 2 is a schematic flow chart of the method. This specification provides the method steps as described in the embodiments or the flow chart, but more or fewer steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only one. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment). As shown in FIG. 2, the method may include:
S201, determining vectors in the input objective function.
In the embodiments of this specification, the objective function may be a linear rectification function (Rectified Linear Unit, ReLU), also called a rectified linear unit. It is an activation function commonly used in artificial neural networks, and generally refers to the nonlinear functions represented by the ramp function and its variants.
In the embodiments of this specification, the input to the objective function may include one vector or a plurality of vectors.
S203, determining the number of elements processed by the single-instruction multi-data stream at a time based on the number of bits of single elements in the vector and the number of single-processing data bits of the single-instruction multi-data stream.
In the embodiments of this specification, single instruction multiple data stream (Single Instruction Multiple Data, SIMD) refers to a class of instructions that can process multiple data elements simultaneously within a single instruction cycle, using data-level parallelism to improve running efficiency. Such instructions are suited to performing the same operation on many data items, so a typical application scenario is the multimedia field, especially its encoding and decoding flows.
With the single instruction multiple data stream, one instruction cycle is enough to batch-process multiple data items at once. Although the cycle of such an instruction may be longer than that of an ordinary instruction, the overall data processing efficiency is improved.
In the embodiments of this specification, as shown in FIG. 3, determining the number of elements processed by the single instruction multiple data stream at a time, based on the number of bits of a single element in the vector and the number of data bits processed by the single instruction multiple data stream at a time, may include:
S2031, determining the number of bits of a single element in the vector;
S2033, calculating the ratio of the number of data bits processed by the single instruction multiple data stream at a time to the number of bits of a single element in the vector, to obtain the number of elements processed by the single instruction multiple data stream at a time.
In the embodiments of this specification, all elements in the vector have the same type and therefore the same number of bits. The number of bits of a single element is its number of binary bits: for example, an element of single-precision floating point type has 32 bits, and an element of double-precision floating point type has 64 bits. The SIMD instruction may include an asm_max instruction, and the number of data bits that the asm_max instruction can process at a time may be 128, 256, or 512.
In some embodiments, the element type T of the vector is a single-precision (32-bit) or double-precision (64-bit) floating point number. Let b1 denote the number of binary bits of one element of type T, so b1 is 32 or 64. As shown in FIG. 9, the asm_max instruction can process b2 bits at a time, where b2 may be 128, 256, or 512. It follows that b = b2/b1 is the number of elements of type T that the asm_max instruction can process at a time, and b must be an integer power of 2. This b is the number of elements processed by the single instruction multiple data stream at a time.
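The ratio b = b2/b1 can be illustrated with a short Python sketch. The function name `elements_per_simd_op` is hypothetical and chosen only for this illustration; the widths are the ones listed in the text.

```python
def elements_per_simd_op(simd_width_bits: int, element_bits: int) -> int:
    """Return b = b2 / b1: how many elements one SIMD operation covers.

    simd_width_bits plays the role of b2 and element_bits the role of b1.
    """
    if element_bits <= 0 or simd_width_bits % element_bits != 0:
        raise ValueError("SIMD width must be a positive multiple of the element width")
    return simd_width_bits // element_bits


# Widths named in the text: 128-, 256-, or 512-bit operations
# on 32-bit (single-precision) or 64-bit (double-precision) elements.
for b2 in (128, 256, 512):
    for b1 in (32, 64):
        print(f"b2={b2} b1={b1} -> b={elements_per_simd_op(b2, b1)}")
```

For every combination listed, b comes out as a power of 2, which matches the statement above.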
S205, determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the single instruction multiple data stream at a time, wherein the ratio of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream at a time is a positive integer, and the second element set is the set of elements in the vector other than the first element set.
In the embodiments of this specification, as shown in FIG. 4, determining the first element set and the second element set in the vector based on the number of elements in the vector and the number of elements processed by the single instruction multiple data stream at a time may include:
S2051, dividing the number of elements in the vector by the number of elements processed by the single instruction multiple data stream at a time, to obtain a quotient and a remainder;
S2053, determining a first number of elements in the first element set based on the quotient and the number of elements processed by the single instruction multiple data stream at a time;
S2055, taking the remainder as a second number of elements in the second element set.
In the embodiments of this specification, the remainder is smaller than the number of elements processed by the single instruction multiple data stream at a time.
In the embodiments of this specification, the second number may also be determined from the first number: subtracting the first number from the total number of elements in the vector gives the second number.
S2057, determining the first element set and the second element set in the vector based on the first number and the second number.
In some embodiments, as shown in FIG. 5, determining the first element set and the second element set in the vector based on the first number and the second number may include:
S20571, forming the first element set from the first number of elements at the front of the vector;
S20573, forming the second element set from the second number of elements at the back of the vector.
In some embodiments, the second element set may also be determined from the first element set: the elements of the vector outside the first element set constitute the second element set.
In some embodiments, the first number of elements may be taken from any positions in the vector to form the first element set, with the elements at the remaining positions forming the second element set. For example, the first number of elements may be adjacent or spaced apart. As long as the number of elements in the first element set is guaranteed, their positions can be chosen according to the actual situation.
In the embodiments of this specification, the first element set may also be formed from the first number of elements at the back of the vector, in which case the second element set is formed from the second number of elements at the front of the vector.
In some embodiments, as shown in FIG. 6, determining the first number of elements in the first element set based on the quotient and the number of elements processed by the single instruction multiple data stream at a time may include:
S20531, calculating the product of the quotient and the number of elements processed by the single instruction multiple data stream at a time;
S20533, taking that product as the first number of elements in the first element set.
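Steps S2051 to S2057 can be sketched in Python as follows. This is a minimal illustration assuming the front-of-vector variant of FIG. 5; the function name `split_for_simd` is hypothetical, not taken from the patent.

```python
def split_for_simd(vector, b):
    """Split `vector` into (first_set, second_set) for SIMD/SISD processing.

    b is the number of elements one SIMD operation handles.
    """
    quotient, remainder = divmod(len(vector), b)   # S2051
    first_number = quotient * b                    # S2053 (S20531, S20533)
    second_number = remainder                      # S2055, always < b
    first_set = vector[:first_number]              # S20571: leading elements
    second_set = vector[first_number:]             # S20573: trailing elements
    assert len(second_set) == second_number
    return first_set, second_set


# Ten elements with b = 4: eight go to SIMD, two are left for SISD.
first, second = split_for_simd(list(range(10)), 4)
print(first, second)
```

When the vector length is an exact multiple of b the second set is empty, and when the vector is shorter than b the first set is empty, so every element then falls to the SISD path.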
S207, processing the first element set based on the single instruction multiple data stream.
In the embodiments of this specification, as shown in FIG. 7, the objective function is used to determine the larger of the two elements at each corresponding position in a first vector and a second vector of the same length, and processing the first element set based on the single instruction multiple data stream may include:
S2071, determining a plurality of element groups based on the first element set in the first vector and the first element set in the second vector;
S2073, based on the single instruction multiple data stream, forming a third element set from the larger element of each element group.
In a specific embodiment, as shown in FIG. 9, 03 is the data corresponding to the elements in the first vector and 04 is the data corresponding to the elements in the second vector. The first element sets of the two vectors are processed using the asm_max instruction. The asm_max instruction processes b2 bits of data at a time, and the number of data bits in the first element set is an integer multiple of b2. The elements of the vectors outside the first element set (the second element set) are all processed by the single instruction single data stream (Single Instruction Single Data, SISD). SISD is a class of CPU instructions in which one instruction processes one piece of data. The number of executions of the SISD instruction is determined by the number of elements remaining outside the first element set, that is, by the size of the second element set.
In the embodiments of this specification, when the remainder is 0, all elements in the vector can be processed by SIMD, and SISD does not need to process any data.
In the embodiments of this specification, SIMD and SISD are combined according to the number of bits of the data in the vector. Because SIMD can process a plurality of data items at a time, using SIMD inside the function improves data processing efficiency and speeds up the function.
In some embodiments, as shown in fig. 8, the determining the plurality of element groups based on the first element set in the first vector and the first element set in the second vector may include:
S20711, acquiring the two elements at the same position in the first element set of the first vector and the first element set of the second vector;
S20713, taking the two elements at the same position as one element group.
In the embodiments of this specification, SIMD may process the data of one or more element groups at a time.
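The grouping in S2071 and the selection in S2073 can be sketched as below. Pure Python has no SIMD instructions, so each batch of b element groups merely stands in for what one SIMD operation would handle; the function name `pairwise_max_in_batches` is hypothetical.

```python
def pairwise_max_in_batches(first_vec, second_vec, b):
    """Build the third element set from two equal-length first element sets.

    Each batch of b element groups models the work of one SIMD operation.
    """
    if len(first_vec) != len(second_vec):
        raise ValueError("vectors must have the same length")
    if len(first_vec) % b != 0:
        raise ValueError("first element set must hold a multiple of b elements")

    third_set = []
    for start in range(0, len(first_vec), b):
        # S20711/S20713: pair the elements found at the same position.
        groups = list(zip(first_vec[start:start + b], second_vec[start:start + b]))
        # S2073: keep the larger element of every group in this batch.
        third_set.extend(max(u, v) for u, v in groups)
    return third_set


print(pairwise_max_in_batches([1.0, -2.0, 3.0, -4.0], [0.0, 0.0, 0.0, 0.0], 2))
```

With an all-zero second vector, as in the call above, the result is the ReLU of the first vector.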
S209, processing the second element set based on the single instruction single data stream.
In the embodiments of this specification, the single instruction single data stream can process only one element at a time.
In some embodiments, the ReLU function may be y = max(x, 0), where:
the input is a vector x of type T and length n;
the output is a vector y of type T and length n.
The data processing method in the function can be implemented by adopting the following version one code.
Version one:
1. Initialize a vector y of length n.
2. Initialize a zero vector zero of length b.
3. i = 0 (where i is an integer variable)
4. n1 = n % b (where n1 is the remainder of n divided by b, n is the length of the vector x, n has the same meaning in the steps below, and n1 < b)
5. for (i = 0; i < n - n1; i = i + b) (where i = 0 is the loop start condition; i < n - n1 is the loop termination condition, and the loop exits when it is no longer met; i = i + b means that b is added to i on each iteration)
6. y[i : i+b] = asm_max(x[i : i+b], zero) (where asm_max is a SIMD instruction, and x[i : i+b] refers to the i-th through (i+b-1)-th elements of x)
7. endfor
8. for (; i < n; i = i + 1) (where i < n is the loop termination condition, and the loop exits when it is no longer met; i = i + 1 means that 1 is added to i on each iteration)
9. y[i] = asm_max(x[i], 0) (where asm_max is a SISD instruction here)
10. endfor
11. Return y.
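The steps of version one can be sketched as runnable Python. This is an emulation under stated assumptions: `simd_max` is a hypothetical stand-in for a single asm_max SIMD call, the built-in `max` stands in for the SISD call, and lengths are counted in elements.

```python
def simd_max(chunk, other):
    """Stand-in for one asm_max SIMD call: element-wise max over b lanes."""
    return [max(u, v) for u, v in zip(chunk, other)]


def relu_version_one(x, b):
    """ReLU over list x, using b-element 'SIMD' chunks plus a scalar tail."""
    n = len(x)
    y = [0.0] * n                 # step 1
    zero = [0.0] * b              # step 2
    i = 0                         # step 3
    n1 = n % b                    # step 4: elements left for the scalar loop
    while i < n - n1:             # step 5: SIMD loop over full chunks
        y[i:i + b] = simd_max(x[i:i + b], zero)   # step 6
        i += b
    while i < n:                  # step 8: scalar (SISD) loop over the tail
        y[i] = max(x[i], 0.0)     # step 9
        i += 1
    return y                      # step 11


print(relu_version_one([-1.0, 2.0, -3.0, 4.0, -5.0, 6.0], 4))
```

In the call above, the first four elements go through one emulated SIMD call and the last two through the scalar loop.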
In other embodiments, the loop in the data processing method can also be unrolled into k ways, where k is an integer power of 2. The value of k can be determined from the number of registers in the central processing unit (CPU): the more registers there are, the larger k can be, and the fewer registers there are, the smaller k must be, so that all registers are used. In this case the data processing method in the function can be implemented with the following version two code.
Version two:
1. Initialize a vector y of length n.
2. Initialize a zero vector zero of length b.
3. i = 0 (where i is an integer variable)
4. n1 = n % (b×k) (where n1 is the remainder of n divided by b×k, n is the length of the vector x, n has the same meaning in the steps below, and n1 < b×k)
5. for (i = 0; i < n - n1; i = i + b×k) (where i = 0 is the loop start condition; i < n - n1 is the loop termination condition, and the loop exits when it is no longer met; i = i + b×k means that b×k is added to i on each iteration)
6. y[i : i+b] = asm_max(x[i : i+b], zero) (where asm_max is a SIMD instruction, and x[i : i+b] refers to the i-th through (i+b-1)-th elements of x)
7. y[i+b : i+b×2] = asm_max(x[i+b : i+b×2], zero) (where asm_max is a SIMD instruction, and x[i+b : i+b×2] refers to the (i+b)-th through (i+b×2-1)-th elements of x)
8. y[i+b×2 : i+b×3] = asm_max(x[i+b×2 : i+b×3], zero) (where asm_max is a SIMD instruction, and x[i+b×2 : i+b×3] refers to the (i+b×2)-th through (i+b×3-1)-th elements of x)
9. ...
10. y[i+b×(k-1) : i+b×k] = asm_max(x[i+b×(k-1) : i+b×k], zero) (where asm_max is a SIMD instruction, and x[i+b×(k-1) : i+b×k] refers to the (i+b×(k-1))-th through (i+b×k-1)-th elements of x)
11. endfor
12. for (; i < n; i = i + 1) (where i < n is the loop termination condition, and the loop exits when it is no longer met; i = i + 1 means that 1 is added to i on each iteration)
13. y[i] = asm_max(x[i], 0) (where asm_max is a SISD instruction here)
14. endfor
15. Return y.
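Version two can be emulated in Python in the same way. In this sketch the inner `for j in range(k)` loop only models the k asm_max calls that real unrolled code would write out one after another (steps 6 to 10). `simd_max` is again a hypothetical stand-in for one SIMD call, and lengths are counted in elements.

```python
def simd_max(chunk, other):
    """Stand-in for one asm_max SIMD call: element-wise max over b lanes."""
    return [max(u, v) for u, v in zip(chunk, other)]


def relu_version_two(x, b, k):
    """ReLU over list x with the SIMD loop unrolled k ways."""
    n = len(x)
    y = [0.0] * n                     # step 1
    zero = [0.0] * b                  # step 2
    stride = b * k                    # elements consumed per loop iteration
    n1 = n % stride                   # step 4
    i = 0                             # step 3
    while i < n - n1:                 # step 5
        for j in range(k):            # steps 6-10: k back-to-back SIMD calls
            lo = i + b * j
            y[lo:lo + b] = simd_max(x[lo:lo + b], zero)
        i += stride
    while i < n:                      # step 12: scalar (SISD) tail
        y[i] = max(x[i], 0.0)         # step 13
        i += 1
    return y                          # step 15


data = [float(v) for v in range(-5, 6)]   # 11 elements
print(relu_version_two(data, b=2, k=2))
```

In this example b×k = 4, so two loop iterations cover eight elements and the remaining three go through the scalar tail. Under this scheme the tail can hold up to b×k - 1 elements.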
Version one issues one SIMD instruction per loop iteration, whereas version two issues k SIMD instructions per loop iteration. If step 5 of version one has to be executed m times, step 5 of version two has to be executed only about m/k times, so version two spends only 1/k as much time on loop control as version one. The number of SIMD instruction executions in the loop body (step 6 of version one and steps 6-10 of version two) is essentially the same, so version two is faster overall.
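The loop-control saving can be checked with a small counting sketch. The function `loop_counts` and its field names are hypothetical; it only counts iterations and calls and measures no time.

```python
def loop_counts(n, b, k):
    """Count loop iterations and instruction calls for both versions."""
    v1_iterations = n // b               # step 5 of version one
    v2_iterations = n // (b * k)         # step 5 of version two
    return {
        "v1_loop_iterations": v1_iterations,
        "v1_simd_calls": v1_iterations,          # one call per iteration
        "v1_scalar_calls": n % b,
        "v2_loop_iterations": v2_iterations,
        "v2_simd_calls": v2_iterations * k,      # k calls per iteration
        "v2_scalar_calls": n % (b * k),
    }


print(loop_counts(n=1000, b=4, k=4))
```

For n = 1000, b = 4, k = 4 this gives 250 loop iterations for version one and 62 for version two, roughly a 1/k ratio. The SIMD call counts are 250 and 248, and version two leaves 8 elements to the scalar loop. The counts match exactly only when n is a multiple of b×k.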
In a specific embodiment, the SIMD-based double-precision RELU function computes 3 to 8 times faster than the SISD-based RELU function, depending on the dimension of the input vector and the floating-point type. For example, in a recommendation service, the technical solution of the application reduces back-end latency by about 5 milliseconds compared with the prior art.
As can be seen from the technical solutions provided by the embodiments of this specification, the number of elements that the single instruction multiple data stream can process in a single pass is determined from the number of bits of a single element in the input vector of the function and the number of data bits that the single instruction multiple data stream processes in a single pass. Combined with the total number of elements in the vector, this determines the number of elements assigned to the single instruction multiple data stream, and the remaining elements are assigned to the single instruction single data stream for processing. Multiple elements can thus be processed at a time based on the single instruction multiple data stream.
The embodiment of the application also provides a device for processing data in a function, as shown in fig. 10, the device may include:
a vector determination module 1010, configured to determine a vector in the input objective function;
a single processing element number determining module 1020, configured to determine the number of elements that are single processed by the single instruction multiple data stream based on the number of bits of a single element in the vector and the number of single processing data bits of the single instruction multiple data stream;
An element set determining module 1030, configured to determine a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the single instruction multiple data stream in a single process, where a ratio of a sum of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream in a single process is a positive integer, and the second element set is a set of elements in the vector except for the first element set;
a first element set processing module 1040 for processing the first element set based on the single instruction multiple data stream;
A second element set processing module 1050, configured to process the second element set based on a single instruction single data stream.
In some embodiments, the single processing element number determination module may include:
an element bit number determining unit, configured to determine the number of bits of a single element in the vector;
an element number determining unit, configured to calculate the ratio of the number of single-processing data bits of the single instruction multiple data stream to the number of bits of a single element in the vector, to obtain the number of elements processed by the single instruction multiple data stream in a single pass.
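The ratio computed by these two units can be illustrated with a short sketch. The function name and the sample widths (a 128-bit or 256-bit SIMD width, 32-bit and 64-bit elements) are assumptions chosen for illustration only.

```python
def elements_per_simd_op(simd_bits, element_bits):
    """Number of elements one SIMD operation processes in a single pass."""
    if element_bits <= 0 or simd_bits % element_bits != 0:
        raise ValueError("SIMD width must be a positive multiple of the element width")
    return simd_bits // element_bits


single_precision_lanes = elements_per_simd_op(128, 32)   # 4 elements per pass
double_precision_lanes = elements_per_simd_op(256, 64)   # 4 elements per pass
```

A narrower element type at the same SIMD width yields more elements per pass, for example 8 elements for 32-bit values at a 256-bit width.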
In some embodiments, as shown in fig. 11, the element set determination module may include:
A quotient and remainder determining unit 1110, configured to divide the number of elements in the vector by the number of elements that are processed by the single instruction multiple data stream at a time, so as to obtain a quotient and remainder of the elements;
A first number determining unit 1120, configured to determine a first number of elements in the first element set based on a quotient of the elements and a number of elements that are processed by the single instruction multiple data stream at a single time;
A second number determining unit 1130, configured to use a remainder of the elements as a second number of elements in the second element set;
An element set determining unit 1140 is configured to determine a first element set and a second element set in the vector based on the first number and the second number.
In some embodiments, the element set determining unit may include:
A first element set determining subunit, configured to compose a first number of elements in the vector that are ranked first into the first element set;
A second element set determining subunit, configured to compose a second number of elements ordered later in the vector into the second element set.
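The quotient and remainder step and the resulting split can be sketched as follows. The function name `split_vector` and the use of a Python list for the vector are illustrative assumptions.

```python
def split_vector(x, lanes):
    """Split x into a SIMD-sized leading part and a scalar trailing part."""
    quotient, remainder = divmod(len(x), lanes)
    first_count = quotient * lanes     # first number: a multiple of lanes
    second_count = remainder           # second number: fewer than lanes
    first_set = x[:first_count]        # elements ranked first
    second_set = x[first_count:]       # elements ranked last
    assert len(second_set) == second_count
    return first_set, second_set
```

For a 10-element vector and 4 elements per SIMD pass, the quotient is 2 and the remainder is 2, so the first 8 elements form the first element set and the last 2 elements form the second element set.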
In some embodiments, the first number determining unit may include:
a product calculating subunit, configured to calculate a product of the quotient of the element and the number of elements that are processed by the single instruction multiple data stream at a single time;
a first number determining subunit, configured to take a product of a quotient of the element and a number of elements that are processed by the single instruction multiple data stream at a time as a first number of elements in the first element set.
In some embodiments, the objective function is configured to determine a larger value in a corresponding element in a first vector and a second vector, the first vector and the second vector being the same length, and the first element set processing module may include:
an element group determining unit configured to determine a plurality of element groups based on a first element set in the first vector and a first element set in the second vector;
And the third element set determining unit is used for forming elements with larger values in each element group into a third element set based on the single-instruction multi-data stream.
In some embodiments, the element group determination unit may include:
an element obtaining subunit, configured to obtain two elements located at the same position in the first element set of the first vector and the first element set of the second vector;
an element group determining subunit, configured to take the two elements at the same position as one element group.
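The element grouping for the two-vector maximum can be sketched as below. This is an illustrative emulation in plain Python: each pass over a chunk of `lanes` element groups stands in for one SIMD maximum, and the function name and default `lanes` value are assumptions.

```python
def elementwise_max(a, b, lanes=4):
    """Element-wise maximum of two equal-length vectors, SIMD part then scalar tail."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    first_count = (n // lanes) * lanes
    third_set = []
    for i in range(0, first_count, lanes):
        # pair the elements at the same positions into element groups
        groups = zip(a[i:i + lanes], b[i:i + lanes])
        # emulate one SIMD max over the chunk: keep the larger of each group
        third_set.extend(max(p, q) for p, q in groups)
    # remaining elements are handled one at a time (SISD)
    tail = [max(a[i], b[i]) for i in range(first_count, n)]
    return third_set + tail
```

Calling `elementwise_max(x, [0.0] * len(x))` reproduces the RELU case, where the second vector is all zeros.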
The device embodiments and the method embodiments described are based on the same inventive concept.
The embodiment of the application provides a data processing device in a function, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the data processing method in the function provided by the embodiment of the method.
Embodiments of the present application also provide a computer readable storage medium, which may be provided in a server to store at least one instruction, at least one program, a code set, or a set of instructions for implementing a method of data processing in a function in a method embodiment, where the at least one instruction, the at least one program, the code set, or the set of instructions are loaded and executed by the processor to implement the method of data processing in a function provided in the method embodiment.
Optionally, in the embodiments of this specification, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The memory according to the embodiments of the present disclosure may be used to store software programs and modules, and the processor executes the software programs and modules stored in the memory to perform various functional applications and data processing. The memory may mainly include a storage program area which may store an operating system, application programs required for functions, and the like, and a storage data area which may store data created according to the use of the device, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The method embodiments provided by the embodiments of the application can be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, fig. 12 is a block diagram of the hardware structure of a server for the method for processing data in a function according to an embodiment of the application. As shown in fig. 12, the server 1200 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1210 (the processor 1210 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1230 for storing data, and one or more storage media 1220 (e.g., one or more mass storage devices) for storing applications 1223 or data 1222. The memory 1230 and the storage medium 1220 may be transitory or persistent storage. The program stored on the storage medium 1220 may include one or more modules, each of which may include a series of instruction operations on the server. Further, the central processing unit 1210 may be configured to communicate with the storage medium 1220 and execute, on the server 1200, the series of instruction operations in the storage medium 1220. The server 1200 may also include one or more power supplies 1260, one or more wired or wireless network interfaces 1250, one or more input/output interfaces 1240, and/or one or more operating systems 1221, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input/output interface 1240 may be used to receive or transmit data via a network. A specific example of the network may include a wireless network provided by a communication provider of the server 1200. In one example, the input/output interface 1240 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the input/output interface 1240 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 12 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the server 1200 may also include more or fewer components than shown in fig. 12, or have a different configuration than shown in fig. 12.
As can be seen from the embodiments of the method, apparatus, device, and storage medium for processing data in a function provided by the application, the number of elements that the single instruction multiple data stream can process in a single pass is determined from the number of bits of a single element in the input vector of the function and the number of data bits that the single instruction multiple data stream processes in a single pass. Combined with the total number of elements in the vector, this determines the number of elements assigned to the single instruction multiple data stream, and the remaining elements are assigned to the single instruction single data stream for processing. Multiple elements can thus be processed at a time based on the single instruction multiple data stream.
It should be noted that the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner. Identical and similar parts of the embodiments may be referred to across embodiments, and each embodiment mainly describes its differences from the other embodiments. In particular, since the apparatus, device, and storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the application are intended to be included within the scope of the application.
Claims (14)
1. A method of processing data in a function, the method comprising:
determining a vector in the input objective function;
Determining the number of elements processed by the single-instruction multiple-data stream once based on the number of bits of single elements in the vector and the number of single-processing data bits of the single-instruction multiple-data stream;
Dividing the number of elements in the vector by the number of elements processed by the single instruction multiple data stream for one time to obtain quotient and remainder of the elements;
determining a first number of elements in a first element set based on a quotient of the elements and a number of elements processed in a single pass by the single instruction multiple data stream;
Taking the remainder of the elements as a second number of elements in a second element set;
Determining a first element set and a second element set in the vector based on the first quantity and the second quantity, wherein the ratio of the sum of the element numbers in the first element set to the element number of the single-instruction multiple-data stream single processing is a positive integer, and the second element set is a set of elements in the vector except the first element set;
processing the first set of elements based on the single instruction multiple data stream;
the second set of elements is processed based on a single instruction single data stream.
2. The method of claim 1, wherein determining the number of elements for a single processing of the single instruction multiple data stream based on the number of bits for a single element in the vector and the number of bits for a single processing of the single instruction multiple data stream comprises:
Determining the number of bits of a single element in the vector;
And calculating the ratio of the number of single-processing data bits of the single-instruction multi-data stream to the number of bits of single elements in the vector to obtain the number of elements processed by the single-instruction multi-data stream in a single time.
3. The method of claim 1, wherein the determining a first set of elements and a second set of elements in the vector based on the first number and the second number comprises:
Forming a first number of elements in the vector, which are ranked first, into the first element set;
and forming a second element set by a second number of elements which are ranked later in the vector.
4. The method of claim 1, wherein determining the first number of elements in the first set of elements based on the quotient of the elements and the number of elements that are processed in a single pass by the single instruction multiple data stream comprises:
Calculating the product of the quotient of the element and the number of the elements processed by the single instruction multiple data stream at a time;
And taking the product of the quotient of the element and the number of elements processed by the single instruction multiple data stream at a time as a first number of elements in the first element set.
5. The method of claim 1, wherein the objective function is used to determine a larger value in a co-located corresponding element in a first vector and a second vector, the first vector being the same length as the second vector, the processing the first set of elements based on the single instruction multiple data stream comprising:
determining a plurality of element groups based on a first element set in the first vector and a first element set in the second vector;
and based on the single-instruction multi-data stream, forming a third element set by the elements with larger values in each element group.
6. The method of claim 5, wherein the determining a plurality of element groups based on the first set of elements in the first vector and the first set of elements in the second vector comprises:
Acquiring two elements in the same position in a first element set in the first vector and a first element set in the second vector;
And taking the two elements at the same position as an element group.
7. A data processing apparatus in a function, the apparatus comprising:
a vector determination module for determining a vector in the input objective function;
The single processing element number determining module is used for determining the number of elements which are processed by the single instruction multiple data stream once based on the number of bits of single elements in the vector and the number of single processing data bits of the single instruction multiple data stream;
The element set determining module is used for determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the single-instruction multiple-data stream in a single mode, wherein the ratio of the sum of the number of elements in the first element set to the number of elements processed by the single-instruction multiple-data stream in the single mode is a positive integer, and the second element set is a set of elements except the first element set in the vector;
A first element set processing module, configured to process the first element set based on the single instruction multiple data stream;
The second element set processing module is used for processing the second element set based on a single instruction single data stream;
The element set determining module comprises a quotient and remainder determining unit, a first quantity determining unit and a second quantity determining unit, wherein the quotient and remainder determining unit is used for dividing the number of elements in the vector by the number of elements processed by the single-instruction multiple data stream for one time to obtain the quotient and remainder of the elements, the first quantity determining unit is used for determining the first quantity of the elements in the first element set based on the quotient of the elements and the number of the elements processed by the single-instruction multiple data stream for one time, the second quantity determining unit is used for taking the remainder of the elements as the second quantity of the elements in the second element set, and the element set determining unit is used for determining the first element set and the second element set in the vector based on the first quantity and the second quantity.
8. The apparatus of claim 7, wherein the single processing element number determining module comprises an element bit number determining unit configured to determine the number of bits of a single element in the vector, and an element number determining unit configured to calculate the ratio of the number of single-processing data bits of the single instruction multiple data stream to the number of bits of a single element in the vector, to obtain the number of elements processed by the single instruction multiple data stream in a single pass.
9. The apparatus according to claim 7, wherein the element set determining unit includes:
A first element set determining subunit, configured to compose a first number of elements in the vector that are ranked first into the first element set;
A second element set determining subunit, configured to compose a second number of elements ordered later in the vector into the second element set.
10. The apparatus of claim 7, wherein the first number determination unit comprises a product calculation subunit operable to calculate a product of the quotient of the element and the number of elements that are processed in a single pass by the single instruction multiple data stream, and a first number determination subunit operable to take the product of the quotient of the element and the number of elements that are processed in a single pass by the single instruction multiple data stream as the first number of elements in the first element set.
11. The apparatus of claim 7, wherein the objective function is configured to determine a larger value in a co-located corresponding element of a first vector and a second vector, the first vector and the second vector being the same length, the first element set processing module comprising:
an element group determining unit configured to determine a plurality of element groups based on a first element set in the first vector and a first element set in the second vector;
And the third element set determining unit is used for forming elements with larger values in each element group into a third element set based on the single-instruction multi-data stream.
12. The apparatus according to claim 11, wherein the element group determination unit includes:
an element obtaining subunit, configured to obtain two elements located at the same position in the first element set of the first vector and the first element set of the second vector;
and the element group determining subunit is used for taking the two elements at the same position as one element group.
13. A data processing apparatus in a function, characterized in that the apparatus comprises a processor and a memory in which at least one instruction, at least one program, a set of codes or a set of instructions is stored, which at least one instruction, at least one program, set of codes or set of instructions is loaded and executed by the processor to implement the data processing method in a function according to any one of claims 1-6.
14. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the method of data processing in a function of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910726018.2A CN112346782B (en) | 2019-08-07 | 2019-08-07 | A method, device, equipment and storage medium for processing data in a function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112346782A (en) | 2021-02-09 |
CN112346782B (en) | 2024-12-13 |
Family
ID=74366600
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910726018.2A Active CN112346782B (en) | 2019-08-07 | 2019-08-07 | A method, device, equipment and storage medium for processing data in a function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112346782B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TG01 | Patent term adjustment | ||