
CN111186139B - Multi-level parallel slicing method for 3D printing model


Info

Publication number: CN111186139B
Application number: CN201911355386.7A
Authority: CN (China)
Other versions: CN111186139A
Legal status: Active (granted)
Inventors: 谷建华, 董旭伟, 赵天海, 王云岚, 侯正雄, 曹梓祥, 李超, 吴婕菲
Assignee: Northwestern Polytechnical University

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B29: WORKING OF PLASTICS; WORKING OF SUBSTANCES IN A PLASTIC STATE IN GENERAL
    • B29C: SHAPING OR JOINING OF PLASTICS; SHAPING OF MATERIAL IN A PLASTIC STATE, NOT OTHERWISE PROVIDED FOR; AFTER-TREATMENT OF THE SHAPED PRODUCTS, e.g. REPAIRING
    • B29C 64/00: Additive manufacturing, i.e. manufacturing of three-dimensional [3D] objects by additive deposition, additive agglomeration or additive layering, e.g. by 3D printing, stereolithography or selective laser sintering
    • B29C 64/30: Auxiliary operations or equipment
    • B29C 64/386: Data acquisition or data processing for additive manufacturing
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B33: ADDITIVE MANUFACTURING TECHNOLOGY
    • B33Y: ADDITIVE MANUFACTURING, i.e. MANUFACTURING OF THREE-DIMENSIONAL [3-D] OBJECTS BY ADDITIVE DEPOSITION, ADDITIVE AGGLOMERATION OR ADDITIVE LAYERING, e.g. BY 3-D PRINTING, STEREOLITHOGRAPHY OR SELECTIVE LASER SINTERING
    • B33Y 50/00: Data acquisition or data processing for additive manufacturing


Abstract

The invention relates to a multi-level parallel slicing method for a 3D printing model. The method accelerates the slicing of a three-dimensional model by means of multi-level parallel processing and specifically comprises four levels of slicing parallelization: the computing-node level, the multi-GPU level, the thread-block level and the thread level. At each level of parallelization, a corresponding task-division and data-interaction scheme is designed according to the address-space distribution, the memory-access pattern and the data-structure characteristics of that level, so that the load of the parallel execution units is balanced and the volume of data communication is reduced. In the thread-level parallelization, an atomic add on an array index is used to resolve the data race that arises during parallel insertion. The invention effectively reduces the time consumed by slicing a three-dimensional model without lowering the original slicing precision. In addition, parallel reading of the corresponding sub-model files by the computing nodes reduces hard-disk I/O time, and each computing node only needs to process its own sub-model data, which reduces memory usage.

Description

A Multi-level Parallel Slicing Method for 3D Printing Models

Technical Field

The invention belongs to the technical field of 3D printing and in particular relates to a multi-level parallel slicing method for a 3D printing model.

Background Art

3D printing refers to the process of digitally modeling and layering a three-dimensional entity and then manufacturing the object layer by layer, additively, from some material and with the relevant supporting technology. The layered slicing process bridges the three-dimensional model representation of an item stored in a computer and the finished product that is eventually produced: only after the three-dimensional digital model has been pre-processed and layered into a series of two-dimensional plane data can it be recognized by the 3D printing equipment and the actual item be printed.

STL is a data format that enables collaboration between CAD and printing equipment, and it is the de facto file format standard in the field of 3D printing. It approximates the original CAD model with a large number of triangular facets; an STL file records the vertex information and the unit normal vector of each triangular facet. As industrial demands keep rising, the three-dimensional model representations of the items to be printed become increasingly complex and the accuracy requirements for the finished products increasingly strict. These factors greatly increase the time consumed by slicing the three-dimensional model, which has gradually become a bottleneck restricting the overall production efficiency of 3D printing.

The document "Fast slicing algorithm for STL models based on layered adjacency sorting, Journal of Computer-Aided Design & Computer Graphics, 2011, Vol. 23(4), p. 600-606" discloses a fast slicing algorithm for STL models based on layered adjacency sorting. The method first obtains the maximum and minimum coordinate values of each triangular facet of the model projected in the slicing direction and, combining these with the coordinate values of the tangent planes, determines the tangent planes intersecting each triangular facet and computes the coordinates of the intersections between the facet and those tangent planes. In addition, for each slice layer, a linked list of intersection points is constructed according to the adjacency relations between the triangular facets, and the intersection points are stored in it. Compared with the traditional slicing methods based on topology-information extraction and rectangle grouping, the method achieves a performance improvement. However, it is still essentially a serial algorithm: the parallelization potential inherent in the slicing problem itself is neither analyzed nor exploited, so slicing a larger-scale three-dimensional model remains very time-consuming, which limits slicing efficiency.

Summary of the Invention

Technical Problem to Be Solved

In order to reduce the time consumed by the three-dimensional model slicing process and to improve slicing efficiency, the present invention proposes a multi-level parallel slicing method for 3D printing models. A four-level parallelized slicing scheme is designed that exploits the parallelization potential inherent in the slicing problem itself; relying on the parallel processing of the execution units at each level, it reduces the time consumed by slicing a three-dimensional model without lowering the original slicing precision.

Technical Solution

A multi-level parallel slicing method for a 3D printing model, characterized in that the steps are as follows:

Step 1: Execute computing-node-level slicing parallelization. Within the cluster, divide tasks among the computing nodes with the facet as the basic unit, and split the original model file according to the division result to obtain multiple sub-model files.

Step 2: Execute multi-GPU-level slicing parallelization. Within each computing node, divide tasks among the GPUs with the facet as the basic unit; according to the division result, read the corresponding sub-model file into memory in batches and construct the facet and vertex arrays, then import the facet and vertex arrays required for slicing from the computing node's main memory into GPU memory.

Step 3: Execute thread-block-level slicing parallelization: divide the facet array imported into GPU memory into multiple sub-facet arrays among the thread blocks.

Step 4: Execute thread-level slicing parallelization: divide the sub-facet array belonging to a thread block among its threads. Each thread processes its assigned facets in turn; for each facet it determines all tangent planes intersecting the facet and computes the tangent segments, and the thread inserts each tangent segment, under mutual exclusion, into the tangent-segment array of the corresponding layer.

Step 5: Export the tangent-segment arrays from the memory of each GPU to main memory and merge the computation results of the GPUs on the CPU side, producing for each layer one sub-tangent-segment array corresponding to the current computing node.

Step 6: Divide tasks among the computing nodes by layer; according to the division result, collect and merge the sub-tangent-segment arrays layer by layer, so that each assigned layer ends up with one corresponding tangent-segment array.

Step 7: Each computing node processes in turn the per-layer tangent-segment arrays present on that node and connects the tangent segments in parallel to generate the slice contour lines within each layer.

The task division among computing nodes in step 1 means that the original model file is divided in a continuous, even manner with the facet as the basic unit, and the remainder after even division is assigned to the last computing node; splitting the original model file according to the division result yields a number of sub-model files, one of which is assigned to each computing node.

The task division among multiple GPUs in step 2 means that the sub-model file is divided in a continuous, even manner with the facet as the basic unit, and the remainder after even division is assigned to the last GPU.

Reading the sub-model file and constructing the arrays in step 2 means that, according to the task-division result, the computing node reads its assigned sub-model file in several batches, the number of batches being equal to the number of GPUs contained in the computing node. After reading, the main memory of the computing node contains the same number of groups of model data, each group containing one facet array and one vertex array.

Dividing the facets among thread blocks in step 3 means that the facets are first distributed by continuous even division, and the remainder after even division is then handed out one at a time to the thread blocks in turn, so that the workload is balanced and each thread block receives a contiguous run of facets.

Dividing the sub-facet array among threads in step 4 is done by assigning facets one at a time to the threads in turn, so that the workload is balanced and the memory units read by adjacent threads are contiguous, enabling coalesced reads of GPU memory and thus improving memory-access bandwidth.

Inserting the tangent segments into the arrays under mutual exclusion in step 4 means that, when the parallel threads process their respective triangular facets, the tangent segments they compute may belong to the same layer, so data must be inserted into the same array in a mutually exclusive manner. The mutually exclusive insertion scheme is as follows: an array index is maintained that points to the next free position to be filled in the array; a thread inserting data must perform an atomic add on this array index and take the value returned by the atomic add as its insertion position.

Merging the computation results of the GPUs in step 5 means merging the tangent-segment data by layer on the CPU side, i.e. merging the tangent segments belonging to the same layer into the same array.

Collecting and merging the tangent-segment data among the computing nodes in step 6 specifically comprises the following steps:

1) With the layer as the basic unit, distribute the merging tasks among the computing nodes by continuous even division, the remainder after even division being assigned to the last computing node;

2) Each computing node performs data merging for its assigned layers, collecting the tangent-segment data located in the same layer from all computing nodes and merging it into one array; after the tangent-segment data has been merged, each computing node discards the data of the layers that do not belong to it, releasing memory.

Beneficial Effects

The multi-level parallel slicing method for 3D printing models proposed by the present invention accelerates the slicing of a three-dimensional model by means of multi-level parallel processing and specifically comprises four levels of slicing parallelization: the computing-node level, the multi-GPU level, the thread-block level and the thread level. At each level of parallelization, a corresponding task-division and data-interaction scheme is designed according to the address-space distribution, the memory-access pattern and the data-structure characteristics of that level, so that the load of the parallel execution units is balanced and the volume of data communication is reduced. In the thread-level parallelization, an atomic add on an array index is used to resolve the data race that arises during parallel insertion. The invention effectively reduces the time consumed by slicing a three-dimensional model without lowering the original slicing precision. In addition, parallel reading of the corresponding sub-model files by the computing nodes reduces hard-disk I/O time, and each computing node only needs to process its own sub-model data, which reduces memory usage.

Brief Description of the Drawings

Fig. 1 is a flow chart of each level of the four-level parallelized slicing method proposed by the present invention;

Fig. 2 is a structural diagram of the execution units at each level of the four-level parallelized slicing of the 3D printing model proposed by the present invention;

Fig. 3 is a schematic diagram of the computing-node-level parallelization task division in the present invention;

Fig. 4 is a schematic diagram of the multi-GPU-level parallelization task division in the present invention;

Fig. 5 is a schematic diagram of the thread-block-level parallelization task division in the present invention;

Fig. 6 is a schematic diagram of the thread-level parallelization task division in the present invention;

Fig. 7 is a schematic diagram of two threads inserting data into the tangent-segment array of the same layer under mutual exclusion, in one embodiment;

Fig. 8 is a comparison chart of the slicing times of the original scheme and of the present scheme.

Detailed Description of Embodiments

The present invention is now further described with reference to the embodiments and the accompanying drawings:

Referring to Figs. 1-2, the present invention increases the slicing speed for a three-dimensional model through a multi-level parallel method that specifically comprises four levels of parallelization. Viewed top-down, these are the computing-node level, the multi-GPU level, the thread-block level and the thread level.

Step 1: Execute computing-node-level slicing parallelization. Within the cluster, divide tasks among the computing nodes with the facet as the basic unit and split the original model file according to the division result; execute step 2, and afterwards collect and merge the tangent-segment data among the computing nodes.

The task division among computing nodes means that the original model file is divided in a continuous, even manner with the facet as the basic unit, and the remainder after even division is assigned to the last computing node. Splitting the original model file according to the division result yields a number of sub-model files, one of which is assigned to each computing node.

Referring to Fig. 3, let N be the total number of facets contained in the STL model file and n1 the number of computing nodes in the cluster; let unify_count1 be the number of facets each computing node receives under even division and last_count1 the number of facets the last process receives. Then:

    unify_count1 = ⌊N / n1⌋
    last_count1 = unify_count1 + N % n1

where ⌊ ⌋ denotes rounding down and % denotes taking the remainder. The original STL model file is split into n1 sub-model files according to the facet counts computed above.

The benefits of splitting the original model file are:

1) the amount of data each computing node reads from the hard disk into memory is reduced, which in turn reduces I/O time;

2) the memory usage of each computing node is reduced, so that, for a fixed total amount of memory per computing node, larger models can be sliced;

3) the computing nodes execute the slicing computation in parallel, which effectively reduces computation time.
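The continuous even division described above (each of the n1 nodes receives ⌊N/n1⌋ facets and the last node also takes the remainder) can be sketched as a small host-side C++ function; the function name and signature are illustrative and not taken from the patent's implementation.

```cpp
#include <cstddef>
#include <utility>

// Continuous even division of N facets among n1 computing nodes:
// every node receives unify_count = N / n1 facets, and the last
// node additionally receives the remainder N % n1.
// Returns the half-open facet index range [first, last) for `rank`.
std::pair<std::size_t, std::size_t>
node_facet_range(std::size_t N, std::size_t n1, std::size_t rank) {
    std::size_t unify_count = N / n1;   // floor division
    std::size_t first = rank * unify_count;
    std::size_t last  = first + unify_count;
    if (rank == n1 - 1)                 // last node takes the remainder
        last += N % n1;
    return {first, last};
}
```

For example, with N = 10 facets and n1 = 3 nodes, the nodes receive the ranges [0, 3), [3, 6) and [6, 10). The same rule applies at the multi-GPU level and to the per-layer division among nodes.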

Collecting and merging the tangent-segment data among the computing nodes specifically comprises the following steps:

1) With the layer as the basic unit, distribute the merging tasks among the computing nodes by continuous even division, the remainder after even division being assigned to the last computing node.

Let unify_layer_count be the number of layers each computing node receives under even division; the remainder is assigned to the last computing node, whose number of layers is last_layer_count. Then:

    unify_layer_count = ⌊(total number of layers) / n1⌋
    last_layer_count = unify_layer_count + (total number of layers) % n1

2) Each computing node performs data merging for its assigned layers, collecting the tangent-segment data located in the same layer from all computing nodes and merging it into one array. In one embodiment, the MPI_Gather() interface of MPI is used to merge the tangent-segment data, the interface being called once per layer to be merged. After the tangent-segment data has been merged, each computing node discards the data of the layers that do not belong to it, releasing memory.

Step 2: Execute multi-GPU-level slicing parallelization. Within the computing node, divide tasks among the GPUs with the facet as the basic unit; according to the division result, read the sub-model file into memory in batches and construct the facet and vertex arrays, then import the data required for slicing from the computing node's main memory into GPU memory. Execute step 3, and afterwards export the tangent-segment data from the memory of each GPU to main memory and merge the computation results of the GPUs on the CPU side.

The task division among multiple GPUs means that the sub-model file is divided in a continuous, even manner with the facet as the basic unit, and the remainder after even division is assigned to the last GPU.

Referring to Fig. 4, let n2 be the number of GPUs in each computing node, count1 the number of facets contained in the sub-model file assigned to the current computing node, unify_count2 the number of facets each GPU receives under even division and last_count2 the number of facets the last GPU receives. Then:

    unify_count2 = ⌊count1 / n2⌋
    last_count2 = unify_count2 + count1 % n2

Reading the sub-model file and constructing the arrays in step 2 means that, according to the task-division result, the computing node reads its assigned sub-model file in several batches, the number of batches being equal to the number of GPUs contained in the computing node. After reading, the main memory of the computing node contains the same number of groups of model data, each group containing one facet array and one vertex array.

Specifically, after reading, the main memory of the computing node contains n2 groups of data, each group comprising one facet array Face[] and one vertex array Vertex[]. In the facet array, each element records the indices of the three vertices contained in one facet, an index being the subscript of the vertex in the Vertex[] array. In the vertex array, each element records the x, y, z coordinates of a vertex in the three-dimensional coordinate system.
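The Face[] and Vertex[] arrays described above can be sketched as plain C++ structures; the type names and field layout are assumptions made for illustration, since the patent does not fix an exact in-memory representation.

```cpp
#include <cstdint>
#include <vector>

// One entry of Vertex[]: the x, y, z coordinates of a vertex in the
// three-dimensional coordinate system.
struct Vertex {
    float x, y, z;
};

// One entry of Face[]: the indices of the facet's three vertices,
// i.e. subscripts into the Vertex[] array (indexed-mesh layout).
struct Face {
    std::uint32_t v[3];
};

// One group of model data held in the computing node's main memory;
// one such group is prepared per GPU (n2 groups in total).
struct ModelGroup {
    std::vector<Face>   faces;     // Face[]
    std::vector<Vertex> vertices;  // Vertex[]
};
```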

Further, merging the computation results of the GPUs means merging the tangent-segment data by layer on the CPU side, i.e. merging the tangent segments belonging to the same layer into the same array.
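The per-layer merge on the CPU side can be sketched as follows; the Segment layout and the nested-vector container are assumptions made for illustration, not the patent's actual types.

```cpp
#include <cstddef>
#include <vector>

// A tangent segment: two 2D endpoints in the slice plane (illustrative layout).
struct Segment { float x1, y1, x2, y2; };

// per_gpu[g][layer] holds the tangent segments GPU g produced for `layer`.
// Merging appends, for every layer, the segments from all GPUs into one
// array, yielding a single sub-tangent-segment array per layer for the
// current computing node.
std::vector<std::vector<Segment>>
merge_by_layer(const std::vector<std::vector<std::vector<Segment>>>& per_gpu,
               std::size_t layer_count) {
    std::vector<std::vector<Segment>> merged(layer_count);
    for (const auto& gpu_layers : per_gpu)
        for (std::size_t l = 0; l < layer_count; ++l)
            merged[l].insert(merged[l].end(),
                             gpu_layers[l].begin(), gpu_layers[l].end());
    return merged;
}
```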

In one embodiment, OpenMP is used to control the multiple GPUs within a computing node: the number of OpenMP threads created equals the number of GPUs in the computing node, and each OpenMP thread takes charge of one GPU computing card.

Step 3: Execute thread-block-level slicing parallelization: divide the facet array imported into GPU memory among the thread blocks; execute step 4.

Dividing the facets among thread blocks means that the facets are first distributed by continuous even division, and the remainder after even division is then handed out one at a time to the thread blocks in turn, so that the workload is balanced and each thread block receives a contiguous run of facets.

Referring to Fig. 5, let n3 be the number of thread blocks, count2 the number of facets in the Face[] array in GPU memory, remain_count the remainder after even division and face_count the number of facets a thread block receives, each thread block having a consecutive, unique number id. Then:

    remain_count = count2 % n3
    face_count = ⌊count2 / n3⌋ + 1, if id < remain_count
    face_count = ⌊count2 / n3⌋,     if id ≥ remain_count

In one embodiment, the thread blocks are organized one-dimensionally, and the number of thread blocks is set appropriately according to the number of streaming multiprocessors contained in the GPU.
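The division rule above (even split, remainder handed out one facet per block) can be sketched as a small host-side function; the name block_face_count is illustrative, and in an actual kernel id would come from blockIdx.

```cpp
#include <cstddef>

// Number of facets assigned to thread block `id` when count2 facets are
// divided among n3 blocks: every block gets floor(count2 / n3), and the
// remainder remain_count = count2 % n3 is handed out one facet each to
// the first remain_count blocks, keeping the workload balanced.
std::size_t block_face_count(std::size_t count2, std::size_t n3,
                             std::size_t id) {
    std::size_t base = count2 / n3;          // floor division
    std::size_t remain_count = count2 % n3;  // leftover facets
    return base + (id < remain_count ? 1 : 0);
}
```

For example, with count2 = 10 facets and n3 = 4 blocks, the blocks receive 3, 3, 2 and 2 facets, which sum to 10.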

Step 4: Execute thread-level slicing parallelization: divide the sub-facet array belonging to a thread block among its threads. Each thread processes its assigned facets in turn; for each facet it determines all tangent planes intersecting the facet and computes the tangent segments, and the thread inserts each tangent segment, under mutual exclusion, into the tangent-segment array of the corresponding layer.

Referring to Fig. 6, let n4 be the number of threads. The sub-facet array is divided among the threads by assigning facets one at a time to the threads in turn, so that the workload is balanced and the memory units read by adjacent threads are contiguous, enabling coalesced reads of GPU memory and thus improving memory-access bandwidth.
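The one-at-a-time, round-robin division above corresponds to the common strided indexing pattern: thread t of n4 processes facet indices t, t + n4, t + 2*n4, and so on, so that in each pass adjacent threads touch adjacent facets. A CPU-side sketch (illustrative, not the patent's kernel code):

```cpp
#include <cstddef>
#include <vector>

// Facet indices (within a sub-facet array of `count` facets) processed by
// thread `t` out of n4 threads under round-robin division. In each pass,
// adjacent threads read adjacent array elements, so the reads of the facet
// array in GPU global memory can be coalesced.
std::vector<std::size_t> thread_facets(std::size_t count, std::size_t n4,
                                       std::size_t t) {
    std::vector<std::size_t> idx;
    for (std::size_t i = t; i < count; i += n4)
        idx.push_back(i);
    return idx;
}
```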

Inserting the tangent segments into the arrays under mutual exclusion means that, when the parallel threads process their respective triangular facets, the tangent segments they compute may belong to the same layer, so data must be inserted into the same array in a mutually exclusive manner. The mutually exclusive insertion scheme is as follows: an array index is maintained that points to the next free position to be filled in the array; a thread inserting data must perform an atomic add on this array index and take the value returned by the atomic add as its insertion position.

In one embodiment, the threads are organized one-dimensionally; the number of threads contained in a thread block is usually an integer multiple of 32, and the number of threads is chosen appropriately according to the amount of hardware resources, such as registers, on the GPU.

Referring to Fig. 7, in one embodiment an index e records the position in the array arr that is next to be filled. Thread x and thread y have computed the tangent segments s1 and s2 of the same tangent plane with their respective triangular facets and wish to insert into the arr array at the same time; thread x executes AtomAdd(e), Insert(s1), and thread y executes AtomAdd(e), Insert(s2), which achieves correct, mutually exclusive insertion into the same array.
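The AtomAdd/Insert scheme of Fig. 7 can be sketched on the CPU with std::atomic, whose fetch_add plays the role of the atomic add on the index (in a CUDA kernel this would be atomicAdd); the names and the single-float segment payload are simplifications made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Per-layer tangent-segment array with an atomically incremented index
// pointing to the next free slot. fetch_add returns the index's previous
// value, so each inserting thread claims a unique position, exactly as
// AtomAdd(e) followed by Insert(s) does in the embodiment.
struct LayerSegments {
    std::vector<float> slots;               // pre-sized storage; the real
                                            // payload would be a segment
    std::atomic<std::size_t> next_free{0};  // index e of the embodiment

    explicit LayerSegments(std::size_t capacity) : slots(capacity) {}

    void insert(float segment) {
        std::size_t pos = next_free.fetch_add(1);  // atomic add on the index
        slots[pos] = segment;                      // private slot, no race
    }
};

// Runs `nthreads` threads, each inserting `per_thread` segments into one
// shared layer array; returns how many slots were filled, which should
// equal nthreads * per_thread if the insertion is race-free.
std::size_t demo_parallel_insert(std::size_t nthreads,
                                 std::size_t per_thread) {
    LayerSegments layer(nthreads * per_thread);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < nthreads; ++t)
        pool.emplace_back([&layer, per_thread] {
            for (std::size_t i = 0; i < per_thread; ++i)
                layer.insert(1.0f);
        });
    for (auto& th : pool) th.join();
    return layer.next_free.load();
}
```

Because fetch_add hands every caller a distinct previous value, two threads inserting concurrently, like thread x and thread y in Fig. 7, can never write to the same slot.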

Step 5: Each computing node, according to its assigned layers, connects the tangent segments in parallel to generate the slice contour lines within each layer.

Here, the layers assigned to each computing node are the layers allocated in step 1 when the tangent-segment data is collected and merged among the computing nodes. During the data collection and merging, the tangent-segment data is divided evenly, by layer, among the computing nodes of the cluster. After the division, the data of the layers are independent of one another, so the computing nodes can connect the tangent segments in parallel to generate the slice contour lines within each layer, completing the slicing of the model.

Referring to Fig. 8, the slicing times of the original scheme and of the present scheme are compared, where the original scheme is the serial slicing scheme and the present scheme is the multi-level parallel slicing scheme described above. The hardware used for the test is a GPU cluster containing two computing nodes; the CPU model on the computing nodes is Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz, and each computing node has three GPU computing cards of model Tesla V100-PCIE-32GB. The test results show that the multi-level parallel slicing scheme can effectively reduce the slicing time, and the larger the model, the more pronounced the acceleration effect.

Claims (8)

1. A multi-level parallel slicing method of a 3D printing model is characterized by comprising the following steps:
step 1: executing parallelization of computing node level slices, performing task division among computing nodes by taking a patch as a basic unit in a cluster, and dividing an original model file according to a division result to obtain a plurality of sub-model files;
the task division among the computing nodes is performed as follows: the original model file is divided in a continuous equal-division manner with the patch as the basic unit, and the remainder after equal division is allocated to the last computing node; the original model file is then divided according to the division result into a plurality of sub-model files, each allocated to one computing node;
after the original model file is divided according to the division result, a plurality of sub-model files are formed: let N be the total number of patches contained in the STL model file and n1 the number of computing nodes in the cluster, let uniform_count1 be the number of patches allocated to each computing node and last_count1 the number of patches allocated to the last computing node; then:

uniform_count1 = ⌊N / n1⌋
last_count1 = uniform_count1 + N % n1

wherein ⌊ ⌋ indicates rounding down and % indicates the remainder; the original STL model file is divided into n1 sub-model files according to the calculated patch counts;
step 2: executing multi-GPU-level slice parallelization: within each computing node, performing task division among the GPUs with the patch as the basic unit, reading the corresponding sub-model file into memory in stages according to the division result, constructing the patch and vertex arrays, and importing the patch and vertex arrays required for slicing from the computing node's main memory into GPU memory;
step 3: executing thread-block-level slice parallelization: dividing the patch array imported into GPU memory into a plurality of sub-patch arrays among the thread blocks;
step 4: executing thread-level slice parallelization: dividing the sub-patch array belonging to a thread block among the threads; each thread sequentially processes its allocated patches, solves for all tangent planes intersecting each patch, computes the tangent segments, and mutually exclusively inserts the tangent segments into the tangent-segment arrays of the corresponding layers;
step 5: exporting the tangent-segment arrays from each GPU memory to the main memory, merging the computation results of the GPUs at the CPU side, and generating, for each layer, a sub-tangent-segment array corresponding to the current computing node;
step 6: performing task division among the computing nodes by layer, and collecting and merging the sub-tangent-segment arrays by layer according to the division result, so that each assigned layer obtains a corresponding tangent-segment array;
step 7: each computing node sequentially processes the tangent-segment array of each layer present at the current node and executes tangent-segment connection in parallel to generate the intra-layer slice contour lines.
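The equal-division rule of step 1 (uniform_count1 = ⌊N / n1⌋ patches per node, remainder to the last node) can be checked with a short, hypothetical Python helper, a sketch rather than the patented code:

```python
def partition_patches(n_patches, n_nodes):
    """Continuous equal division of patches; the remainder goes to the last node.

    Returns a list of (start, count) ranges, one per computing node.
    """
    uniform_count = n_patches // n_nodes                  # floor(N / n1)
    last_count = uniform_count + n_patches % n_nodes      # remainder appended
    ranges = [(i * uniform_count, uniform_count) for i in range(n_nodes - 1)]
    ranges.append(((n_nodes - 1) * uniform_count, last_count))
    return ranges

# 10 patches over 3 nodes: nodes receive 3, 3, and 3 + 1 = 4 patches.
print(partition_patches(10, 3))  # → [(0, 3), (3, 3), (6, 4)]
```

The same rule is reused at the GPU and thread-block levels (claims 2 and 4), only with a different number of workers.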
2. The method according to claim 1, wherein in step 2, task division is performed among the GPUs by dividing the sub-model file in a continuous equal-division manner with the patch as the basic unit, the remainder after equal division being allocated to the last GPU.
3. The method for multi-level parallel slicing of a 3D printing model according to claim 1, wherein reading in the sub-model file and constructing the arrays in step 2 means that the computing node reads its allocated sub-model file several times according to the task division result, the number of reads being equal to the number of GPUs contained in the computing node; after reading, the main memory of the computing node contains the same number of groups of model data, each group containing a patch array and a vertex array.
4. The method of claim 1, wherein in step 3 the patches are divided among the thread blocks in a continuous equal-division manner, the remainder after equal division being allocated to the thread blocks in turn, one at a time, so that task allocation is balanced and each thread block receives consecutive patches.
5. The method as claimed in claim 1, wherein in step 4 the sub-patch array is divided among the threads in an interleaved manner, the patches being allocated to the threads in turn, one at a time, so that task allocation is balanced and the memory units read by adjacent threads are contiguous, facilitating coalesced reads from GPU memory and improving memory-access bandwidth.
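The interleaved assignment of claim 5 can be illustrated with a hypothetical sketch: thread t of T threads processes patches t, t + T, t + 2T, and so on, so that at each step adjacent threads touch adjacent memory locations, which a GPU can coalesce into a single memory transaction.

```python
def interleaved_indices(thread_id, n_threads, n_patches):
    """Round-robin assignment: thread t handles patches t, t+T, t+2T, ..."""
    return list(range(thread_id, n_patches, n_threads))

# 4 threads over 10 patches: at step 0 the threads read patches 0,1,2,3
# (contiguous), at step 1 patches 4,5,6,7, and so on.
assignment = [interleaved_indices(t, 4, 10) for t in range(4)]
print(assignment)  # → [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Contrast this with the block-level division of claim 4, where each worker receives one contiguous run of patches.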
6. The method for multi-level parallel slicing of a 3D printing model according to claim 1, wherein the mutually exclusive insertion of tangent segments into the arrays in step 4 means that threads processing their respective triangular patches in parallel may compute tangent segments belonging to the same layer and must therefore insert data into the same array mutually exclusively; the exclusive insertion works as follows: an array index points to the next idle position to be inserted in the array, and a thread inserting data must perform an atomic add operation on this array index, obtaining the index's current value as its insertion position.
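The atomic-index insertion of claim 6 can be sketched on the CPU as follows. This is a hypothetical illustration, not the patented code: a lock stands in for the hardware atomic fetch-and-add (on a GPU this would be atomicAdd on the index), and each thread writes only to the unique slot it obtained.

```python
import threading

class LayerArray:
    """Fixed-capacity per-layer tangent-segment array with exclusive insertion."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.next_free = 0               # index of the next idle position
        self._lock = threading.Lock()    # stands in for hardware atomicity

    def insert(self, segment):
        with self._lock:                 # atomic fetch-and-add on the index
            slot = self.next_free        # current value is the insertion position
            self.next_free += 1
        self.data[slot] = segment        # each thread writes only its own slot
        return slot

layer = LayerArray(capacity=100)
workers = [threading.Thread(target=layer.insert, args=(("seg", i),))
           for i in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(layer.next_free)  # → 8
```

Because the fetch-and-add hands out each slot exactly once, concurrent inserts into the same layer array never overwrite one another.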
7. The method as claimed in claim 1, wherein merging the GPU results in step 5 means merging the slice data by layer at the CPU side, that is, merging the tangent segments belonging to the same layer into the same array.
8. The method for multi-level parallel slicing of a 3D printing model according to claim 1, wherein the collection and merging of tangent-segment data among the computing nodes in step 6 comprises the following steps:
1) with the layer as the basic unit, distributing the merging tasks among the computing nodes in a continuous equal-division manner, the remainder after equal division being allocated to the last computing node;
2) each computing node performs data merging for its assigned layers: the tangent-segment data located on the same layer is collected from all computing nodes and merged into a single array; after the merging of the tangent-segment data is completed, each computing node discards the data of the layers not belonging to it, releasing the memory space.
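The per-layer collection of claim 8 can be sketched as follows, as a hypothetical illustration (in a real cluster the exchange would go over message passing, e.g. MPI): each node keeps only its assigned layers and concatenates every node's sub-arrays for those layers.

```python
def merge_layers(per_node_segments, node_id, n_nodes, n_layers):
    """Collect, for the layers assigned to node_id, the tangent segments
    held by every node, merging them into one array per layer.

    per_node_segments[node][layer] is that node's sub-array for the layer.
    Layers are assigned by continuous equal division, remainder to the
    last node, mirroring step 1) of claim 8.
    """
    uniform = n_layers // n_nodes
    start = node_id * uniform
    count = uniform + (n_layers % n_nodes if node_id == n_nodes - 1 else 0)
    merged = {}
    for layer in range(start, start + count):
        merged[layer] = []
        for node in range(n_nodes):      # gather this layer from every node
            merged[layer].extend(per_node_segments[node].get(layer, []))
    return merged  # layers outside [start, start+count) are discarded

# 2 nodes, 3 layers; node 1 (the last) is assigned layers 1 and 2.
data = [{0: ["a0"], 1: ["a1"]}, {1: ["b1"], 2: ["b2"]}]
print(merge_layers(data, node_id=1, n_nodes=2, n_layers=3))
# → {1: ['a1', 'b1'], 2: ['b2']}
```

After this exchange the layers are fully independent, which is what allows the contour connection of step 7 to run with no further communication.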
CN201911355386.7A 2019-12-25 2019-12-25 Multi-level parallel slicing method for 3D printing model Active CN111186139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911355386.7A CN111186139B (en) 2019-12-25 2019-12-25 Multi-level parallel slicing method for 3D printing model

Publications (2)

Publication Number Publication Date
CN111186139A CN111186139A (en) 2020-05-22
CN111186139B true CN111186139B (en) 2022-03-15
