CN111186139B - Multi-level parallel slicing method for 3D printing model
- Publication number
- CN111186139B (application number CN201911355386.7A)
- Authority
- CN
- China
- Prior art keywords
- array
- sub
- computing node
- level
- tangent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B29—WORKING OF PLASTICS; WORKING OF SUBSTANCES IN A PLASTIC STATE IN GENERAL
- B29C—SHAPING OR JOINING OF PLASTICS; SHAPING OF MATERIAL IN A PLASTIC STATE, NOT OTHERWISE PROVIDED FOR; AFTER-TREATMENT OF THE SHAPED PRODUCTS, e.g. REPAIRING
- B29C64/00—Additive manufacturing, i.e. manufacturing of three-dimensional [3D] objects by additive deposition, additive agglomeration or additive layering, e.g. by 3D printing, stereolithography or selective laser sintering
- B29C64/30—Auxiliary operations or equipment
- B29C64/386—Data acquisition or data processing for additive manufacturing
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B33—ADDITIVE MANUFACTURING TECHNOLOGY
- B33Y—ADDITIVE MANUFACTURING, i.e. MANUFACTURING OF THREE-DIMENSIONAL [3-D] OBJECTS BY ADDITIVE DEPOSITION, ADDITIVE AGGLOMERATION OR ADDITIVE LAYERING, e.g. BY 3-D PRINTING, STEREOLITHOGRAPHY OR SELECTIVE LASER SINTERING
- B33Y50/00—Data acquisition or data processing for additive manufacturing
Abstract
The invention relates to a multi-level parallel slicing method for a 3D printing model. The method accelerates the slicing of a three-dimensional model through multi-level parallel processing and comprises four levels of slicing parallelization: the computing-node level, the multi-GPU level, the thread-block level, and the thread level. At each level, a task-division and data-interaction scheme is designed according to the address-space distribution, memory-access pattern, and data-structure characteristics of that level, so that the load of the parallel execution units is balanced and the volume of data communication is reduced. In the thread-level parallelization, atomic addition on an array index is used to resolve the data race that arises when segments are inserted in parallel. The invention effectively reduces the time consumed by three-dimensional model slicing without lowering the original slicing precision. At the same time, parallel reading of the corresponding sub-model files by the computing nodes reduces hard-disk I/O time, and each computing node only needs to process its own sub-model data, which reduces memory usage.
Description
Technical Field
The invention belongs to the technical field of 3D printing, and in particular relates to a multi-level parallel slicing method for a 3D printing model.
Background Art
3D printing refers to the process of digitally modeling and layering a three-dimensional object and then manufacturing it layer by layer from a given material with the associated supporting technologies. The layered slicing process bridges the three-dimensional model representation of an item stored in the computer and the finished product that is eventually fabricated: only after the three-dimensional digital model has been pre-processed and layered into a series of two-dimensional planar data can 3D printing equipment recognize it and print the physical item.
STL is a data format that allows CAD software and printing equipment to interoperate, and it is the de facto file format standard in the field of 3D printing. It approximates the original CAD model with a large number of triangular facets; an STL file records the vertex information and unit normal vector of each facet. As industrial demands keep rising, the three-dimensional models of items to be printed become increasingly complex and the required precision of the finished product keeps increasing. These factors greatly increase the time consumed by slicing the three-dimensional model, which has gradually become a bottleneck restricting the overall production efficiency of 3D printing.
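For orientation, the facet record of a binary STL file can be sketched as below; the patent does not state whether binary or ASCII STL is used, so the binary layout is an assumption and the type names are illustrative:

```cpp
// Sketch of the standard *binary* STL facet record (assumption: binary, not ASCII, STL).
#include <cstdint>

#pragma pack(push, 1)
struct StlVec3  { float x, y, z; };
struct StlFacet {                    // 50 bytes per facet in a binary STL file
    StlVec3  normal;                 // unit normal vector of the triangle
    StlVec3  vertex[3];              // the three triangle vertices
    uint16_t attributeByteCount;     // usually 0
};
#pragma pack(pop)
// File layout: an 80-byte header, a uint32 facet count, then 'count' StlFacet records.
```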
The document "A fast slicing algorithm for STL models based on layered adjacency sorting, Journal of Computer-Aided Design & Computer Graphics, 2011, Vol. 23(4), pp. 600-606" discloses a fast slicing algorithm for STL models based on layered adjacency sorting. The method first computes the maximum and minimum coordinate values of each triangular facet projected along the slicing direction and, combined with the coordinate values of the cutting planes, determines the cutting planes that intersect each facet and computes the coordinates of the intersections between the facet and those planes. In addition, for each slice layer, a linked list of intersection points is constructed according to the adjacency relations between the facets, and the intersection points are stored in it. Compared with traditional slicing methods based on topology-information extraction and rectangle grouping, this method achieves a performance improvement. However, it is still essentially a serial algorithm; it neither analyzes nor exploits the parallelization potential inherent in the slicing problem, so slicing larger three-dimensional models remains very time-consuming and limits slicing efficiency.
Summary of the Invention
Technical Problem to Be Solved
To reduce the time consumed by the slicing of three-dimensional models and improve slicing efficiency, the present invention proposes a multi-level parallel slicing method for 3D printing models. A four-level parallel slicing scheme is designed that exploits the parallelization potential inherent in the slicing problem; by having the execution units process data in parallel, the time consumed by three-dimensional model slicing is reduced without lowering the original slicing precision.
Technical Solution
A multi-level parallel slicing method for a 3D printing model, characterized by the following steps:
Step 1: perform computing-node-level slicing parallelization. Within the cluster, divide the task among the computing nodes with the facet as the basic unit, and split the original model file into multiple sub-model files according to the division result;
Step 2: perform multi-GPU-level slicing parallelization. Within each computing node, divide the task among the GPUs with the facet as the basic unit, read the corresponding sub-model file into memory in stages according to the division result and construct the facet and vertex arrays, and import the facet and vertex arrays required for slicing from the computing node's main memory into GPU memory;
Step 3: perform thread-block-level slicing parallelization, dividing the facet array imported into GPU memory into multiple sub-facet arrays among the thread blocks;
Step 4: perform thread-level slicing parallelization, dividing the sub-facet array belonging to a thread block among its threads. Each thread processes its assigned facets in turn; for every facet it finds all cutting planes that intersect the facet and computes the tangent segments, and the thread inserts each tangent segment, under mutual exclusion, into the tangent-segment array of the corresponding layer;
Step 5: export the tangent-segment arrays from each GPU's memory to main memory and merge the results of all GPUs on the CPU side, producing, for every layer, one sub-tangent-segment array corresponding to the current computing node;
Step 6: divide the task among the computing nodes by layer, collect and merge the sub-tangent-segment arrays layer by layer according to the division result, and obtain one tangent-segment array for each layer assigned to a node;
Step 7: each computing node processes in turn the per-layer tangent-segment arrays it holds, and the nodes connect tangent segments in parallel to generate the slice contour lines within each layer.
The task division among computing nodes in step 1 means that the original model file is divided with the facet as the basic unit into contiguous equal shares, and the remainder left after the equal division is assigned to the last computing node; splitting the original model file according to the division result yields several sub-model files, one for each computing node.
The task division among multiple GPUs in step 2 means that the sub-model file is divided with the facet as the basic unit into contiguous equal shares, and the remainder left after the equal division is assigned to the last GPU.
The reading of the sub-model file and construction of the arrays in step 2 means that, according to the task-division result, the computing node reads its assigned sub-model file in several passes, the number of passes being equal to the number of GPUs in the computing node. After reading, the computing node's main memory contains the same number of groups of model data, each group containing one facet array and one vertex array.
The division of facets among thread blocks in step 3 first assigns the facets in contiguous equal shares; the remainder left after the equal division is handed out one facet at a time to the thread blocks in turn, so that the workload is balanced and every thread block receives a contiguous range of facets.
The division of the sub-facet array among threads in step 4 assigns facets one at a time to the threads in turn, so that the workload is balanced and adjacent threads read contiguous memory locations, enabling coalesced reads of GPU memory and increasing memory bandwidth.
Inserting the tangent segments into the array under mutual exclusion in step 4 means that, when the parallel threads process their respective triangular facets, the computed tangent segments may belong to the same layer, so data must be inserted into the same array in a mutually exclusive manner. The mutual-exclusion scheme is as follows: an array index is maintained that points to the next free position in the array; when inserting data, a thread performs an atomic add on the array index and takes the current value of the index (the value before the increment) as its insertion position.
Merging the computation results of the GPUs in step 5 means merging the tangent-segment data layer by layer on the CPU side, i.e., tangent segments belonging to the same layer are merged into the same array.
Collecting and merging the tangent-segment data among computing nodes in step 6 specifically comprises the following steps:
1) With the layer as the basic unit, the merging task is distributed among the computing nodes in contiguous equal shares, and the remainder left after the equal division is assigned to the last computing node;
2) Each computing node performs data merging for its assigned layers, collecting the tangent-segment data of the same layer from all computing nodes and merging it into a single array; after the tangent-segment data has been merged, each computing node discards the data of layers that do not belong to it, freeing memory.
Beneficial Effects
The present invention provides a multi-level parallel slicing method for 3D printing models that accelerates the slicing of a three-dimensional model through multi-level parallel processing, comprising four levels of slicing parallelization: the computing-node level, the multi-GPU level, the thread-block level, and the thread level. At each level, a task-division and data-interaction scheme is designed according to the address-space distribution, memory-access pattern, and data-structure characteristics of that level, so that the load of the parallel execution units is balanced and the volume of data communication is reduced. In the thread-level parallelization, atomic addition on an array index is used to resolve the data race that arises when segments are inserted in parallel. The invention effectively reduces the time consumed by three-dimensional model slicing without lowering the original slicing precision. At the same time, parallel reading of the corresponding sub-model files by the computing nodes reduces hard-disk I/O time, and each computing node only needs to process its own sub-model data, which reduces memory usage.
Brief Description of the Drawings
Fig. 1 is a flow chart of the levels of the four-level parallel slicing method proposed by the present invention;
Fig. 2 is a diagram of the execution units at each level of the four-level parallel slicing of the 3D printing model proposed by the present invention;
Fig. 3 is a schematic diagram of computing-node-level task division in the present invention;
Fig. 4 is a schematic diagram of multi-GPU-level task division in the present invention;
Fig. 5 is a schematic diagram of thread-block-level task division in the present invention;
Fig. 6 is a schematic diagram of thread-level task division in the present invention;
Fig. 7 is a schematic diagram of two threads inserting data into the tangent-segment array of the same layer under mutual exclusion in an embodiment;
Fig. 8 is a comparison of slicing times between the original scheme and the present scheme.
Detailed Description of the Embodiments
The present invention is further described below with reference to the embodiments and the accompanying drawings.
Referring to Figs. 1-2, the present invention increases the slicing speed of three-dimensional models through a multi-level parallel method comprising four levels of parallelization. Viewed top-down, they are the computing-node level, the multi-GPU level, the thread-block level, and the thread level.
Step 1: perform computing-node-level slicing parallelization. Within the cluster, divide the task among the computing nodes with the facet as the basic unit and split the original model file according to the division result; then execute step 2, and afterwards collect and merge the tangent-segment data among the computing nodes.
The task division among computing nodes means that the original model file is divided with the facet as the basic unit into contiguous equal shares, with the remainder assigned to the last computing node. Splitting the original model file according to the division result yields several sub-model files, one for each computing node.
Referring to Fig. 3, let N be the total number of facets in the STL model file, n1 the number of computing nodes in the cluster, unify_count1 the number of facets assigned to each computing node, and last_count1 the number of facets assigned to the last node. Then:

unify_count1 = ⌊N / n1⌋
last_count1 = unify_count1 + N % n1

where ⌊ ⌋ denotes rounding down and % denotes the remainder. The original STL model file is split into n1 sub-model files according to the facet counts computed above.
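The division can be sketched as follows (illustrative names, not code from the patent):

```cpp
// Sketch: contiguous equal division of the N facets over the n1 computing nodes,
// with the remainder assigned to the last node, as in the formulas above.
#include <cstddef>

struct FacetRange { size_t first; size_t count; };

FacetRange nodeFacetRange(size_t N, size_t n1, size_t node /* 0 .. n1-1 */) {
    size_t unify_count1 = N / n1;                  // floor(N / n1)
    size_t last_count1  = unify_count1 + N % n1;   // last node also takes the remainder
    return { node * unify_count1,
             (node == n1 - 1) ? last_count1 : unify_count1 };
}
// Each sub-model file then contains the facets [first, first + count) of the original STL file.
```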
The benefits of splitting the original model file are:
1) the amount of data each computing node reads from the hard disk into memory is reduced, which in turn reduces I/O time;
2) the memory usage of each computing node is reduced, so that larger models can be sliced with a fixed amount of memory per node;
3) the computing nodes perform the slicing computation in parallel, which effectively reduces the computation time.
Collecting and merging the tangent-segment data among the computing nodes specifically comprises the following steps:
1) With the layer as the basic unit, the merging task is distributed among the computing nodes in contiguous equal shares, and the remainder left after the equal division is assigned to the last computing node.
Let the number of layers assigned to each computing node be unify_layer_count and the number of layers assigned to the last computing node, which also receives the remainder, be last_layer_count; with layer_count slice layers in total and n1 computing nodes:

unify_layer_count = ⌊layer_count / n1⌋
last_layer_count = unify_layer_count + layer_count % n1
2) Each computing node performs data merging for its assigned layers, collecting the tangent-segment data of the same layer from all computing nodes and merging it into a single array. In one embodiment, the MPI_Gather() interface of MPI is used to merge the tangent-segment data, with one call to the interface per layer of data merged. After the tangent-segment data has been merged, each computing node discards the data of layers that do not belong to it, freeing memory.
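A hedged sketch of the per-layer merge follows. The patent names MPI_Gather; because the number of tangent segments contributed per node generally differs, this sketch exchanges counts first and then uses MPI_Gatherv, and the Segment type and all names are illustrative:

```cpp
// Sketch: merge one layer's tangent segments onto the rank that owns that layer.
// Counts are gathered first, then variable-length data with MPI_Gatherv.
#include <mpi.h>
#include <vector>

struct Segment { float x1, y1, x2, y2; };  // assumed 2D tangent-segment record

std::vector<Segment> gatherLayer(const std::vector<Segment>& local, int ownerRank, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int localBytes = static_cast<int>(local.size() * sizeof(Segment));
    std::vector<int> counts(size), displs(size);
    MPI_Gather(&localBytes, 1, MPI_INT, counts.data(), 1, MPI_INT, ownerRank, comm);

    std::vector<Segment> merged;
    if (rank == ownerRank) {
        int total = 0;
        for (int i = 0; i < size; ++i) { displs[i] = total; total += counts[i]; }
        merged.resize(total / sizeof(Segment));
    }
    MPI_Gatherv(local.data(), localBytes, MPI_BYTE,
                merged.data(), counts.data(), displs.data(), MPI_BYTE, ownerRank, comm);
    return merged;  // non-empty only on the owner rank
}
```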
Step 2: perform multi-GPU-level slicing parallelization. Within the computing node, divide the task among the GPUs with the facet as the basic unit, read the sub-model file into memory in stages according to the division result and construct the facet and vertex arrays, import the data required for slicing from the computing node's main memory into GPU memory, and execute step 3; afterwards, export the tangent-segment data from each GPU's memory to main memory and merge the results of all GPUs on the CPU side.
The task division among multiple GPUs means that the sub-model file is divided with the facet as the basic unit into contiguous equal shares, with the remainder assigned to the last GPU.
Referring to Fig. 4, let n2 be the number of GPUs in each computing node, count1 the number of facets contained in the sub-model file assigned to the current computing node, unify_count2 the number of facets assigned to each GPU, and last_count2 the number of facets assigned to the last GPU. Then:

unify_count2 = ⌊count1 / n2⌋
last_count2 = unify_count2 + count1 % n2
The reading of the sub-model file and construction of the arrays in step 2 means that, according to the task-division result, the computing node reads its assigned sub-model file in several passes, the number of passes being equal to the number of GPUs in the computing node. After reading, the computing node's main memory contains the same number of groups of model data, each group containing one facet array and one vertex array.
Specifically, after reading, the computing node's main memory contains n2 groups of data, each group containing a facet array Face[] and a vertex array Vertex[]. In the facet array, an element records the indices of the three vertices of one facet, an index being the subscript of the vertex in the Vertex[] array. In the vertex array, an element records the x, y, z coordinates of a vertex in the three-dimensional coordinate system.
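These arrays might be declared as in the following sketch; the concrete layout is not prescribed by the patent and the names are illustrative:

```cpp
// Sketch of one group of model data held in node main memory (one group per GPU).
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };            // x, y, z coordinates of one vertex
struct Face   { uint32_t v0, v1, v2; };      // subscripts into the Vertex array

struct ModelGroup {
    std::vector<Face>   faces;               // facet array Face[]
    std::vector<Vertex> vertices;            // vertex array Vertex[]
};
```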
Further, merging the computation results of the GPUs means merging the tangent-segment data layer by layer on the CPU side, i.e., tangent segments belonging to the same layer are merged into the same array.
In one embodiment, OpenMP is used to control the multiple GPUs within a computing node; the number of OpenMP threads created equals the number of GPUs in the node, and each OpenMP thread takes over one GPU computing card.
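A minimal sketch of this control scheme, assuming the CUDA runtime API (device buffers and the kernel launch are elided; all names are illustrative):

```cpp
// Sketch: one OpenMP thread per GPU; each thread binds to its device, copies its
// facet/vertex share, launches the slicing kernel, and copies results back.
#include <omp.h>
#include <cuda_runtime.h>

void sliceOnAllGpus(int n2 /* number of GPUs in this node */) {
    #pragma omp parallel num_threads(n2)
    {
        int gpu = omp_get_thread_num();
        cudaSetDevice(gpu);                // this OpenMP thread takes over GPU 'gpu'
        // cudaMalloc / cudaMemcpy of this GPU's Face[] and Vertex[] share ...
        // sliceKernel<<<numBlocks, threadsPerBlock>>>(...);  // illustrative launch
        cudaDeviceSynchronize();
        // cudaMemcpy the tangent-segment arrays back to host memory ...
    }
}
```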
Step 3: perform thread-block-level slicing parallelization, dividing the facet array imported into GPU memory among the thread blocks, and execute step 4.
The division of facets among thread blocks first assigns the facets in contiguous equal shares; the remainder left after the equal division is handed out one facet at a time to the thread blocks in turn, so that the workload is balanced and every thread block receives a contiguous range of facets.
Referring to Fig. 5, let n3 be the number of thread blocks, count2 the number of facets in the Face[] array in GPU memory, remain_count the remainder after the equal division, and face_count the number of facets assigned to a thread block; each thread block has a consecutive and unique number id. Then:

remain_count = count2 % n3
face_count = ⌊count2 / n3⌋ + 1, for thread blocks with id < remain_count
face_count = ⌊count2 / n3⌋, otherwise
In one embodiment, the thread blocks are organized one-dimensionally, and the number of thread blocks is set appropriately according to the number of streaming multiprocessors in the GPU.
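For instance, the grid can be sized from the device's streaming-multiprocessor count, as in this hedged host-side sketch; the multiplier and block size are illustrative choices, not values given in the patent:

```cpp
// Sketch: size the 1-D grid from the number of streaming multiprocessors (SMs).
#include <cuda_runtime.h>

void chooseLaunchConfig(int device, int *numBlocks, int *threadsPerBlock) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    *numBlocks       = prop.multiProcessorCount * 4;  // a few blocks per SM (illustrative)
    *threadsPerBlock = 256;                           // an integer multiple of 32 (warp size)
}
```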
Step 4: perform thread-level slicing parallelization, dividing the sub-facet array belonging to a thread block among its threads. Each thread processes its assigned facets in turn; for every facet it finds all cutting planes that intersect the facet and computes the tangent segments, and the thread inserts each tangent segment, under mutual exclusion, into the tangent-segment array of the corresponding layer.
Referring to Fig. 6, let n4 be the number of threads. The sub-facet array is divided among the threads by assigning facets one at a time to the threads in turn, so that the workload is balanced and adjacent threads read contiguous memory locations, enabling coalesced reads of GPU memory and increasing memory bandwidth.
Inserting the tangent segments into the array under mutual exclusion means that, when the parallel threads process their respective triangular facets, the computed tangent segments may belong to the same layer, so data must be inserted into the same array in a mutually exclusive manner. The mutual-exclusion scheme is as follows: an array index is maintained that points to the next free position in the array; when inserting data, a thread performs an atomic add on the array index and takes the current value of the index (the value before the increment) as its insertion position.
In one embodiment, the threads are organized one-dimensionally; the number of threads in a thread block is usually an integer multiple of 32, and the thread count is chosen appropriately according to the amount of hardware resources, such as registers, on the GPU.
Referring to Fig. 7, in one embodiment an index e records the position in the arr array that is next to be filled. Thread x and thread y compute tangent segments s1 and s2 of the same cutting plane with their respective triangular facets and attempt to insert into the arr array at the same time; thread x executes AtomAdd(e), Insert(s1) and thread y executes AtomAdd(e), Insert(s2), which achieves correct, mutually exclusive insertion into the same array.
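Putting the thread-level pieces together, the following condensed CUDA sketch shows the cyclic facet assignment of Fig. 6 and the AtomAdd(e)/Insert(s) scheme of Fig. 7; the plane-facet intersection geometry is left as a placeholder, layer0 is assumed to lie at or below the model's minimum z, and all identifiers are illustrative:

```cpp
// Sketch: a thread block owns a contiguous facet range, its threads take facets cyclically
// (coalesced reads), and each tangent segment is inserted by reserving a slot with atomicAdd.
struct Vertex  { float x, y, z; };
struct Face    { unsigned v0, v1, v2; };
struct Segment { float x1, y1, x2, y2; };

// Layer index range [*first, *first + return value) of the cutting planes crossing facet f.
__device__ int layerRange(const Face &f, const Vertex *v, float layer0, float dz, int *first) {
    float zmin = fminf(fminf(v[f.v0].z, v[f.v1].z), v[f.v2].z);
    float zmax = fmaxf(fmaxf(v[f.v0].z, v[f.v1].z), v[f.v2].z);
    *first = (int)ceilf((zmin - layer0) / dz);
    int last = (int)floorf((zmax - layer0) / dz);
    return last - *first + 1;                          // may be <= 0 if no plane crosses the facet
}

// Placeholder: a real implementation interpolates along the two facet edges crossing plane z.
__device__ Segment cutFacet(const Face &, const Vertex *, float) {
    return Segment{0.f, 0.f, 0.f, 0.f};
}

__global__ void sliceKernel(const Face *faces, const Vertex *verts, int count2,
                            float layer0, float dz,
                            Segment **layerSegs,       // one tangent-segment array per layer
                            int *layerIdx) {           // one index e per layer (next free slot)
    // Contiguous facet range of this block: even share, remainder one per low-id block.
    int unify  = count2 / gridDim.x;
    int remain = count2 % gridDim.x;
    int begin  = blockIdx.x * unify + min((int)blockIdx.x, remain);
    int end    = begin + unify + ((int)blockIdx.x < remain ? 1 : 0);

    // Threads take facets cyclically: thread t handles begin+t, begin+t+blockDim.x, ...
    for (int i = begin + threadIdx.x; i < end; i += blockDim.x) {
        int first;
        int n = layerRange(faces[i], verts, layer0, dz, &first);
        for (int k = 0; k < n; ++k) {
            Segment s = cutFacet(faces[i], verts, layer0 + (first + k) * dz);
            int e = atomicAdd(&layerIdx[first + k], 1);  // AtomAdd(e): reserve the next slot
            layerSegs[first + k][e] = s;                 // Insert(s) at the reserved position
        }
    }
}
```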
Step 5: according to its assigned layers, each computing node connects tangent segments in parallel to generate the slice contour lines within each layer.
The layers assigned to a computing node are those assigned in step 1 when the tangent-segment data is collected and merged among the computing nodes. During that collection and merging, the tangent-segment data is divided evenly by layer among the computing nodes of the cluster. After the division, the data of the individual layers is mutually independent, so the computing nodes can connect tangent segments in parallel to generate the slice contour lines within each layer and complete the slicing of the model.
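As a sketch of this last stage, the layers held by a node can be processed with an OpenMP parallel loop; the contour-connection routine itself is not detailed in the patent and is shown only as a placeholder, and all names are illustrative:

```cpp
// Sketch: each node links the tangent segments of its assigned layers into contours,
// processing the (mutually independent) layers in parallel.
#include <omp.h>
#include <vector>

struct Segment { float x1, y1, x2, y2; };
struct Contour { std::vector<float> xy; };   // flattened polyline of one contour

// Placeholder body: a real implementation matches segment endpoints and chains them
// into closed polylines; omitted because the patent does not detail the procedure.
std::vector<Contour> linkSegmentsIntoContours(const std::vector<Segment> &segs) {
    (void)segs;
    return {};
}

void buildContours(const std::vector<std::vector<Segment>> &layers,      // layers owned by this node
                   std::vector<std::vector<Contour>> &contoursPerLayer) {
    contoursPerLayer.resize(layers.size());
    #pragma omp parallel for schedule(dynamic)   // layers are independent after the per-layer merge
    for (int L = 0; L < (int)layers.size(); ++L)
        contoursPerLayer[L] = linkSegmentsIntoContours(layers[L]);
}
```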
Referring to Fig. 8, the slicing times of the original scheme and the present scheme are compared, where the original scheme is the serial slicing scheme and the present scheme is the multi-level parallel slicing scheme described here. The hardware used for the test is a GPU cluster containing two computing nodes; the CPU on each node is an Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz, and each node has three GPU computing cards of type Tesla V100-PCIE-32GB. The test results show that the multi-level parallel slicing scheme effectively reduces slicing time, and the larger the model, the more pronounced the speedup.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911355386.7A CN111186139B (en) | 2019-12-25 | 2019-12-25 | Multi-level parallel slicing method for 3D printing model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111186139A (en) | 2020-05-22
CN111186139B (en) | 2022-03-15
Family
ID=70703318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911355386.7A Active CN111186139B (en) | 2019-12-25 | 2019-12-25 | Multi-level parallel slicing method for 3D printing model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111186139B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113297400B (en) * | 2021-05-31 | 2024-04-30 | 西北工业大学 | Metadata extraction method of 3D printing model |
CN113681897B (en) * | 2021-08-27 | 2023-03-03 | 深圳市纵维立方科技有限公司 | Slice processing method, printing method, system, device and storage medium |
CN114311682B (en) * | 2022-03-03 | 2022-08-02 | 深圳市创想三维科技股份有限公司 | Model generation method, apparatus, device and storage medium |
CN115972584B (en) * | 2022-12-12 | 2024-07-16 | 西北工业大学 | Method for parallelizing slicing of additive manufacturing model based on cooperation of CPU and GPU |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10606353B2 (en) * | 2012-09-14 | 2020-03-31 | Interaxon Inc. | Systems and methods for collecting, analyzing, and sharing bio-signal and non-bio-signal data |
CN104239133B (en) * | 2014-09-26 | 2018-03-30 | 北京国双科技有限公司 | A kind of log processing method, device and server |
US10474458B2 (en) * | 2017-04-28 | 2019-11-12 | Intel Corporation | Instructions and logic to perform floating-point and integer operations for machine learning |
CN108555301B (en) * | 2018-05-03 | 2020-09-29 | 温州职业技术学院 | A partitioned parallel three-dimensional printing forming method for large precision metal parts |
CN110349255B (en) * | 2019-07-15 | 2023-04-25 | 万东百胜(苏州)医疗科技有限公司 | Organ ultrasonic modeling 3D printing method |
- 2019-12-25: Application CN201911355386.7A filed in China; granted as CN111186139B (status: Active)
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101819675A (en) * | 2010-04-19 | 2010-09-01 | 浙江大学 | Method for quickly constructing bounding volume hierarchy (BVH) based on GPU |
CN101908087A (en) * | 2010-07-16 | 2010-12-08 | 清华大学 | Parallel Simulation Method of Integrated Circuit Power and Ground Network Based on GPU |
EP2750018A3 (en) * | 2012-12-27 | 2015-05-20 | LSI Corporation | Non-volatile memory program failure recovery via redundant arrays |
CN105630409A (en) * | 2014-11-25 | 2016-06-01 | Sap欧洲公司 | Dual data storage using an in-memory array and an on-disk page structure |
CN106202145A (en) * | 2016-06-17 | 2016-12-07 | 北京四维新世纪信息技术有限公司 | A kind of preprocessing of remote sensing images system of Data-intensive computing |
CN106686352A (en) * | 2016-12-23 | 2017-05-17 | 北京大学 | Real-time processing method of multi-channel video data on multi-GPU platform |
CN106846236A (en) * | 2016-12-26 | 2017-06-13 | 中国科学院计算技术研究所 | A kind of expansible distributed GPU accelerating method and devices |
CN107563955A (en) * | 2017-09-12 | 2018-01-09 | 武汉锐思图科技有限公司 | A kind of parallel map dicing method and system based on GPU |
CN109159425A (en) * | 2018-08-21 | 2019-01-08 | 东莞中国科学院云计算产业技术创新与育成中心 | Three-dimensional model slicing method and three-dimensional printing device |
CN109857543A (en) * | 2018-12-21 | 2019-06-07 | 中国地质大学(北京) | A kind of streamline simulation accelerated method calculated based on the more GPU of multinode |
US10379868B1 (en) * | 2019-02-04 | 2019-08-13 | Bell Integrator Inc. | Optimization method with parallel computations |
Non-Patent Citations (1)
Title |
---|
Li Yansheng, Shang Yitong, Yuan Yanping, Chen Jimin, Li Dongfang, Wang Ying, Liu Chunchun, Dou Yang. Data file formats in 3D printing technology. Journal of Beijing University of Technology, 2016-07-13, pp. 1009-1016.
Also Published As
Publication number | Publication date |
---|---|
CN111186139A (en) | 2020-05-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |