Disclosure of Invention
Technical problem to be solved
In order to reduce the time consumed by three-dimensional model slicing and improve slicing efficiency, the invention provides a multi-level parallel slicing method for processing a 3D printing model. A four-level parallelized slicing scheme is designed that exploits the parallelization potential of the slicing problem: the time consumed by slicing the three-dimensional model is reduced through parallel processing on all execution units, without lowering the original slicing precision.
Technical scheme
A multi-level parallel slicing method of a 3D printing model is characterized by comprising the following steps:
step 1: executing computing-node-level slice parallelization: performing task division among the computing nodes of a cluster with the patch as the basic unit, and dividing the original model file according to the division result to obtain a plurality of sub-model files;
step 2: executing multi-GPU-level slice parallelization: performing task division among the plurality of GPUs within each computing node with the patch as the basic unit, reading the corresponding sub-model file into memory in multiple passes according to the division result, constructing patch and vertex arrays, and importing the patch and vertex arrays required by the slicing from the computing node's main memory into GPU memory;
step 3: executing thread-block-level slice parallelization: dividing the patch array imported into GPU memory into a plurality of sub-patch arrays among the thread blocks;
step 4: executing thread-level slice parallelization: dividing the sub-patch array belonging to a thread block among its threads; each thread processes its allocated patches in turn, determines all cutting planes that intersect each patch, computes the tangent segments, and mutually exclusively inserts the tangent segments into the tangent segment array of the corresponding layer;
step 5: exporting the tangent segment arrays from each GPU memory to main memory, merging the computation results of the GPUs on the CPU side, and generating, for every layer, the sub-tangent-segment array corresponding to the current computing node;
step 6: performing task division among the computing nodes by layer, and collecting and merging the sub-tangent-segment arrays by layer according to the division result, so that each allocated layer obtains its complete tangent segment array;
step 7: each computing node processes, in turn, the tangent segment array of every layer residing on the current node and executes the tangent segment connection in parallel to generate the in-layer slice contour lines.
The task division among the computing nodes in step 1 means that, with the patch as the basic unit, the original model file is divided in a contiguous equal-division manner, and the remainder of the equal division is allocated to the last computing node; the original model file is split according to the division result into a plurality of sub-model files, with one sub-model file allocated to each computing node.
The task division among the GPUs in step 2 means that, with the patch as the basic unit, the sub-model file is divided in a contiguous equal-division manner, and the remainder of the equal division is allocated to the last GPU.
Reading in the sub-model file and constructing the arrays in step 2 means that the sub-model file allocated to a computing node is read in several passes according to the task division result, the number of passes being equal to the number of GPUs contained in the computing node. After reading, the main memory of the computing node contains as many groups of model data as there are GPUs, each group containing a patch array and a vertex array.
The division of patches among the thread blocks in step 3 means that the patches are first allocated in a contiguous equal-division manner, and the remainders of the equal division are then allocated to the thread blocks one at a time in sequence, so that the task amounts are balanced and each thread block receives a contiguous range of patches.
The division of the sub-patch array among the threads in step 4 adopts a round-robin manner, assigning patches to the threads one at a time in sequence, so that the task amounts are balanced and the memory cells read by adjacent threads are contiguous, which facilitates coalesced reads of GPU memory and improves memory access bandwidth.
The mutually exclusive insertion of tangent segments into an array in step 4 means that, when the threads participating in the parallel processing handle their respective triangular patches, the computed tangent segments may belong to the same layer, so the data must be inserted into the same array in a mutually exclusive manner. The mutually exclusive insertion works as follows: an array index is maintained that points to the next free position in the array; when a thread is about to insert data, it performs an atomic addition on the array index and takes the returned value, i.e. the index value before the increment, as the insertion position.
The merging of the computation results of the GPUs in step 5 means merging the tangent segment data by layer on the CPU side, that is, merging the tangent segments belonging to the same layer into the same array.
The collection and merging of tangent segment data among the computing nodes in step 6 specifically comprises the following steps:
1) with the layer as the basic unit, allocating the merging tasks among the computing nodes in a contiguous equal-division manner, and allocating the remainder of the equal division to the last computing node;
2) each computing node performs data merging for its allocated layers, collecting the tangent segment data of the same layer from all computing nodes and merging them into a single array; after the merging of the tangent segment data is completed, each computing node discards the data of the layers that do not belong to it, releasing the memory space.
Advantageous effects
The invention provides a multi-level parallel slicing method for a 3D printing model, which accelerates three-dimensional model slicing through multi-level parallel processing, specifically four levels of slicing parallelization: the computing node level, the multi-GPU level, the thread block level and the thread level. At each level, a corresponding task division and data interaction scheme is designed according to the address space distribution, access pattern and data structure characteristics of that level, so that the load on the parallel execution units is balanced and the amount of data communication is reduced. At the thread level, the data race during parallel insertion is resolved by an atomic addition on the array index. The method can effectively reduce the time consumed by three-dimensional model slicing without lowering the original slicing precision. In addition, having each computing node read its own sub-model file in parallel reduces hard disk I/O time, and since each computing node only processes its sub-model data, the memory footprint is reduced.
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
referring to fig. 1-2, the invention improves the slicing speed of the three-dimensional model by a multi-level parallel method comprising four levels of parallelization, which are, from top to bottom, the computing node level, the multi-GPU level, the thread block level and the thread level.
The method comprises the following steps:
Step one: executing computing-node-level slice parallelization, performing task division among the computing nodes of the cluster with the patch as the basic unit, dividing the original model file according to the division result, executing step two, and, after step two is finished, collecting and merging the tangent segment data among the computing nodes.
The task division among the computing nodes means that, with the patch as the basic unit, the original model file is divided in a contiguous equal-division manner, and the remainder of the equal division is allocated to the last computing node. The original model file is split according to the division result into a plurality of sub-model files, with one sub-model file allocated to each computing node.
Referring to fig. 3, if the total number of patches contained in the STL model file is N, the number of computing nodes in the cluster is n1, the number of patches allocated to each computing node by the equal division is count_uniform1, and the number of patches allocated to the last computing node is last_count1, then:
count_uniform1 = ⌊N / n1⌋
last_count1 = count_uniform1 + N % n1
wherein ⌊ ⌋ indicates rounding down and % indicates the remainder. According to the patch counts calculated above, the original STL model file is divided into n1 sub-model files.
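As an illustration only, the division rule above can be expressed as the following C++ sketch; the function and struct names are not part of the invention.

struct Range { long start; long count; };        // contiguous patch range [start, start + count)

// Contiguous equal division of N patches over n1 computing nodes;
// the remainder of the equal division goes to the last node (rank n1 - 1).
Range node_patch_range(long N, int n1, int rank) {
    long uniform = N / n1;                       // integer division rounds down
    Range r;
    r.start = (long)rank * uniform;
    r.count = (rank == n1 - 1) ? uniform + N % n1 : uniform;
    return r;
}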
The benefits of segmenting the original model file are:
1) the amount of data each computing node reads from the hard disk into memory is reduced, which in turn reduces I/O time;
2) the memory usage of each computing node is reduced, so that a larger model can be sliced for a fixed total amount of memory per computing node;
3) the computing nodes execute the slice computation in parallel, which effectively reduces the computation time.
The collection and merging of tangent segment data among the computing nodes specifically comprises the following steps:
1) with the layer as the basic unit, the merging tasks are allocated among the computing nodes in a contiguous equal-division manner, and the remainder of the equal division is allocated to the last computing node.
If the total number of slicing layers is denoted layer_count, the number of layers allocated to each computing node by the equal division is uniform_layer_count, and the number of layers allocated to the last computing node, which also receives the remainder of the equal division, is last_layer_count, then:
uniform_layer_count = ⌊layer_count / n1⌋
last_layer_count = uniform_layer_count + layer_count % n1
2) Each computing node performs data merging for its allocated layers, collecting the tangent segment data of the same layer from all computing nodes and merging them into a single array. In one embodiment, the merge operation on the tangent segment data is performed with the MPI_Gather() interface of MPI, which is called once per layer to be merged. After the merging of the tangent segment data is completed, each computing node discards the data of the layers that do not belong to it, releasing the memory space.
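As a hedged illustration of this per-layer collection: the embodiment calls MPI_Gather() once per layer, and because the number of tangent segments generally differs between nodes, a practical realization gathers the per-node byte counts first and then the segment data with MPI_Gatherv; the Segment layout and the function name below are assumptions, not part of the invention.

#include <mpi.h>
#include <vector>

struct Segment { float x1, y1, x2, y2; };        // illustrative layout of one in-layer tangent segment

// Collect the tangent segments of one layer from every computing node onto the
// node that owns the layer (rank == owner); returns the merged array on the
// owner and an empty vector elsewhere.
std::vector<Segment> gather_layer(const std::vector<Segment>& local, int owner, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int sendBytes = (int)(local.size() * sizeof(Segment));
    std::vector<int> counts(size), displs(size);
    MPI_Gather(&sendBytes, 1, MPI_INT, counts.data(), 1, MPI_INT, owner, comm);

    std::vector<Segment> merged;
    if (rank == owner) {
        int total = 0;
        for (int i = 0; i < size; ++i) { displs[i] = total; total += counts[i]; }
        merged.resize(total / sizeof(Segment));
    }
    MPI_Gatherv(local.data(), sendBytes, MPI_BYTE,
                merged.data(), counts.data(), displs.data(), MPI_BYTE, owner, comm);
    return merged;
}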
Step two: executing multi-GPU-level slice parallelization, performing task division among the plurality of GPUs within the computing node with the patch as the basic unit, reading the sub-model file into memory in multiple passes according to the division result, constructing the patch and vertex arrays, importing the data required by the slicing from the computing node's main memory into GPU memory, executing step three, and, after step three finishes, exporting the tangent segment data from each GPU memory to main memory and merging the computation results of the GPUs on the CPU side.
The task division among the GPUs means that, with the patch as the basic unit, the sub-model file is divided in a contiguous equal-division manner, and the remainder of the equal division is allocated to the last GPU.
Referring to fig. 4, if the number of GPUs in each computing node is n2, the number of patches contained in the sub-model file allocated to the current computing node is count1, the number of patches allocated to each GPU by the equal division is uniform_count2, and the number of patches allocated to the last GPU is last_count2, then:
uniform_count2 = ⌊count1 / n2⌋
last_count2 = uniform_count2 + count1 % n2
Reading in the sub-model file and constructing the arrays in step two means that the sub-model file allocated to the computing node is read in several passes according to the task division result, the number of passes being equal to the number of GPUs contained in the computing node. After reading, the main memory of the computing node contains n2 groups of data, each group containing a patch array Face[] and a vertex array Vertex[]. In the patch array, each element records the indices of the three vertices of one patch, an index being the subscript of the vertex in the Vertex[] array. In the vertex array, each element records the x, y, z coordinates of one vertex.
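A possible in-memory layout of the two arrays is sketched below, together with an illustrative helper that computes the z-extent of a patch, which determines the cutting planes that intersect it; the struct and field names are assumptions.

#include <algorithm>
#include <vector>

struct Vertex { float x, y, z; };                // one element of Vertex[]: the vertex coordinates
struct Face   { int v0, v1, v2; };               // one element of Face[]: subscripts into Vertex[]

// z-extent of patch i; a cutting plane z = h intersects the patch when zmin <= h <= zmax.
inline void patch_z_range(const std::vector<Face>& face, const std::vector<Vertex>& vertex,
                          int i, float& zmin, float& zmax) {
    const Vertex& a = vertex[face[i].v0];
    const Vertex& b = vertex[face[i].v1];
    const Vertex& c = vertex[face[i].v2];
    zmin = std::min(a.z, std::min(b.z, c.z));
    zmax = std::max(a.z, std::max(b.z, c.z));
}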
Furthermore, the merging of the computation results of the GPUs means merging the tangent segment data by layer on the CPU side, that is, merging the tangent segments belonging to the same layer into the same array.
In one embodiment, OpenMP is used to control the multiple GPUs in a computing node: the number of OpenMP threads created equals the number of GPUs in the computing node, and each OpenMP thread takes over one GPU computing card.
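A minimal sketch of this control pattern, assuming CUDA, is given below; the per-GPU work is only indicated by comments, and the function name is illustrative.

#include <omp.h>
#include <cuda_runtime.h>

void run_slicing_on_all_gpus() {
    int gpu_count = 0;
    cudaGetDeviceCount(&gpu_count);
    #pragma omp parallel num_threads(gpu_count)  // one OpenMP thread per GPU
    {
        int g = omp_get_thread_num();
        cudaSetDevice(g);                        // this thread takes over GPU g
        // ... copy the g-th group of patch and vertex data to GPU g, launch the
        //     slicing kernel, and copy the tangent segment arrays back (omitted) ...
    }
}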
Step three: executing thread-block-level slice parallelization, dividing the patch array imported into GPU memory among the thread blocks, and executing step four.
The division of patches among the thread blocks means that the patches are first allocated in a contiguous equal-division manner, and the remainders of the equal division are then allocated to the thread blocks one at a time in sequence, so that the task amounts are balanced and each thread block receives a contiguous range of patches.
Referring to fig. 5, if the number of thread blocks is n3, the number of patches in the Face[] array in GPU memory is count2, the remainder of the equal division is remain_count, the number of patches allocated to a thread block is face_count, and each thread block has a contiguous, unique number id, then:
remain_count = count2 % n3
face_count = ⌊count2 / n3⌋ + 1, if id < remain_count
face_count = ⌊count2 / n3⌋,     if id ≥ remain_count
in one embodiment, the thread blocks are organized into one dimension, and the number of thread blocks is reasonably set according to the number of streaming multiprocessors contained in the GPU.
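As an illustration, the block-level division rule above can be evaluated on the device as follows; the function name is illustrative, and count2 is passed in by the caller.

__device__ void block_patch_range(int count2, int* first, int* face_count) {
    int n3 = gridDim.x;                          // number of thread blocks
    int id = blockIdx.x;                         // contiguous, unique block number
    int uniform = count2 / n3;                   // integer division rounds down
    int remain_count = count2 % n3;
    *face_count = uniform + (id < remain_count ? 1 : 0);   // first remain_count blocks take one extra patch
    *first = id * uniform + min(id, remain_count);         // start of this block's contiguous sub-array
}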
Step four: executing thread-level slice parallelization, dividing the sub-patch array belonging to a thread block among its threads; each thread processes its allocated patches in turn, determines all cutting planes that intersect each patch, computes the tangent segments, and mutually exclusively inserts them into the tangent segment array of the corresponding layer.
Referring to fig. 6, with the number of threads being n4, the sub-patch array is divided among the threads in a round-robin manner, one patch at a time in sequence, so that the task amounts are balanced and the memory cells read by adjacent threads are contiguous, which facilitates coalesced reads of GPU memory and improves memory access bandwidth.
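A minimal device-side sketch of this round-robin assignment is shown below; the per-patch intersection work is omitted and only indicated by comments, and the kernel name is illustrative.

__global__ void slice_sub_patch_array(int face_count /* patches in this block's sub-array */) {
    for (int i = threadIdx.x; i < face_count; i += blockDim.x) {
        // Thread t handles patches t, t + blockDim.x, t + 2*blockDim.x, ...
        // Adjacent threads therefore read adjacent memory cells, and the loads coalesce.
        // ... intersect patch i with all cutting planes crossing its z-extent,
        //     compute the tangent segments, and insert them as described below ...
    }
}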
The mutually exclusive insertion of tangent segments into an array means that, when the threads participating in the parallel processing handle their respective triangular patches, the computed tangent segments may belong to the same layer, so the data must be inserted into the same array in a mutually exclusive manner. The mutually exclusive insertion works as follows: an array index is maintained that points to the next free position in the array; when a thread is about to insert data, it performs an atomic addition on the array index and takes the returned value, i.e. the index value before the increment, as the insertion position.
In one embodiment, the threads are organized in one dimension, the number of threads contained in a thread block is usually an integer multiple of 32, and the number of threads is chosen according to the hardware resources on the GPU, such as the number of registers.
Referring to fig. 7, in an embodiment, an index e records the current position to be inserted in the arr array. Thread x and thread y each compute the tangent segments s1 and s2 of the same cutting plane with their respective triangular patches and want to insert into the arr array at the same time; thread x executes atomicAdd(e) and Insert(s1), and thread y executes atomicAdd(e) and Insert(s2), so that correct, mutually exclusive insertion into the same array is achieved.
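A minimal CUDA sketch of this mutually exclusive insertion is given below; the segment layout is an assumption, while arr and e correspond to the array and index of fig. 7.

struct Seg { float x1, y1, x2, y2; };            // illustrative in-layer tangent segment

__device__ void insert_segment(Seg* arr, int* e, Seg s) {
    int pos = atomicAdd(e, 1);                   // reserve the next free position; returns the old index value
    arr[pos] = s;                                // no other thread can be assigned the same slot
}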
Step five: each computing node executes the tangent segment connection in parallel for its allocated layers to generate the in-layer slice contour lines.
The layers allocated to each computing node are those allocated when the merged tangent segment data was collected among the computing nodes in step one. During the data collection and merging, the tangent segment data is divided evenly by layer among the computing nodes of the cluster. After the division, the data of each layer is independent, so each computing node can execute the segment connection in parallel and generate the in-layer slice contour lines, completing the slicing of the model.
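Because the layers are independent, the per-node contour connection can be driven by a simple parallel loop; the sketch below assumes OpenMP, and connect_layer() is a hypothetical placeholder for the segment-linking routine, which is not detailed in this description.

#include <omp.h>
#include <vector>

struct Seg { float x1, y1, x2, y2; };
void connect_layer(std::vector<Seg>& layer);     // hypothetical: link tangent segments into closed contours

void connect_all_layers(std::vector<std::vector<Seg>>& layers_on_this_node) {
    #pragma omp parallel for
    for (long i = 0; i < (long)layers_on_this_node.size(); ++i)
        connect_layer(layers_on_this_node[i]);   // each layer is processed independently
}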
Referring to fig. 8, the slicing times of the original scheme and the present scheme are compared, where the original scheme is a serial slicing scheme and the present scheme is the multi-level parallel slicing scheme described above. The hardware used for the test is a GPU cluster of two computing nodes; the CPU model on the computing nodes is Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz, and each computing node is equipped with three GPU computing cards of model Tesla V100-PCIE-32GB. The test results show that the multi-level parallel slicing scheme effectively reduces slicing time, and the larger the model, the more obvious the acceleration effect.