
CN104317751B - Data stream processing system on a GPU and data stream processing method thereof - Google Patents


Info

Publication number: CN104317751B
Application number: CN201410657243.2A
Authority: CN (China)
Prior art keywords: data, module, data stream, gpu, loading
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN104317751A (en)
Inventors: 卢晓伟, 沈铂, 周勇
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201410657243.2A; published as CN104317751A, later granted and published as CN104317751B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/22Handling requests for interconnection or transfer for access to input/output bus using successive scanning, e.g. polling
    • G06F13/225Handling requests for interconnection or transfer for access to input/output bus using successive scanning, e.g. polling with priority control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0016Inter-integrated circuit (I2C)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data stream processing system on a GPU and a data stream processing method thereof, belonging to the technical field of data stream processing on GPUs. In the data stream processing system, the data streams of the data sources pass through the system to the client; the system comprises a CPU host and a GPU device. The CPU host includes a CPU-side load engine module, a CPU-side buffer module, a data stream preprocessing module, a data stream load shedding module, and a visualization module. The GPU device includes a GPU-side load engine module, a GPU-side buffer module, a data stream synopsis extraction module, a data stream processing model library, and a data stream processing module. The load/store unit of the CPU-side load engine module interacts with the data source, with the load/store unit of the GPU-side load engine module, and with the client through an interconnection network. The invention has a significant speed advantage, meets the real-time demands of high-dimensional data streams well, and can be widely used as a general analysis method in high-dimensional data stream mining applications.

Description

A data stream processing system on a GPU and a data stream processing method thereof

Technical Field

The present invention relates to the technical field of data stream processing on GPUs, and in particular to a data stream processing system on a GPU and a data stream processing method thereof.

Background

GPU stands for Graphics Processing Unit. The GPU is the "heart" of the graphics card, playing a role analogous to that of the CPU in a computer. The GPU has very high memory bandwidth and a large number of execution units; it can take over some complex computation from the CPU, reducing the graphics card's dependence on the CPU.

Traditionally, GPUs were limited to graphics rendering tasks, which was undoubtedly a great waste of computing resources. As GPU programmability improved, research on using GPUs for general-purpose computing gradually became active. Using GPUs for computation outside graphics rendering is known as GPGPU (General-Purpose computing on Graphics Processing Units). GPGPU computing usually adopts a heterogeneous CPU+GPU mode: the CPU performs complex logic processing, transaction management, and other computations unsuited to data parallelism, while the GPU performs compute-intensive, large-scale data-parallel computation. This approach, which uses the GPU's powerful processing capability and high bandwidth to compensate for the CPU's limited performance, has significant cost and cost-effectiveness advantages in exploiting a computer's latent performance. However, traditional GPGPU was constrained by hardware programmability and development methods; its application fields were limited and development was very difficult.

In 2007, NVIDIA released CUDA (Compute Unified Device Architecture), a programming interface that makes up for the shortcomings of traditional GPGPU. With the CUDA programming interface, GPU resources can be invoked directly from C without mapping computations onto a graphics API, removing the obstacles to widespread non-graphics GPU programming.

In the CUDA model, the CPU acts as the host and the GPU as a co-processor or device, and the two work together. The CPU is responsible for logic-heavy transaction processing and serial computation, while the GPU focuses on executing highly threaded parallel tasks. The CPU and GPU each have an independent memory address space: host-side main memory and device-side video memory. Once the parallel computation functions (kernels) in a program are identified, that part of the computation is handed over to the GPU.

(Definition of a data stream) A data stream is a continuously moving sequence of elements, where each element is a collection of related data. Let t denote any timestamp and a_t the data arriving at that timestamp; the stream can then be written as {…, a_{t-1}, a_t, a_{t+1}, …}. Unlike the traditional application model, the stream data model has the following four common characteristics: (1) data arrive in real time; (2) the arrival order is independent and not controlled by the application system; (3) the data scale is enormous and its maximum cannot be predicted; (4) once data have been processed, they cannot be retrieved and processed again unless deliberately saved, or retrieving them again is prohibitively expensive.
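To make the one-pass constraint of this stream model concrete, the following is a minimal sketch (not part of the patent; the names and toy data are illustrative) of a consumer over a timestamped stream:

```python
from typing import Iterator, Tuple

def stream_source() -> Iterator[Tuple[int, dict]]:
    """Illustrative source: yields (timestamp, record) pairs.
    Arrival order and volume are outside the consumer's control."""
    for t, payload in enumerate([{"x": 1}, {"x": 5}, {"x": 3}]):
        yield (t, payload)

def one_pass_process(stream: Iterator[Tuple[int, dict]]) -> float:
    """Each element is seen exactly once; once consumed it cannot be
    re-read unless it was deliberately saved elsewhere."""
    running_max = float("-inf")
    for _t, record in stream:
        running_max = max(running_max, record["x"])
    return running_max

print(one_pass_process(stream_source()))  # → 5
```

A real stream would of course be unbounded; the fixed list here stands in for an endless feed.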

At the same time, a stream appears in a dual capacity: (1) as a program variable visible to software, and (2) as a management unit visible to hardware. In practice streams often carry many attributes; when a stream is mapped to hardware, these attributes are preserved, or change form and remain visible to the hardware.

In prior-art data mining, in order to remove erroneous data such as noise, null values, and outliers and thereby guarantee accurate results, a preprocessing step is usually performed before mining a static data set in a database. Data streams likewise inevitably contain erroneous data of various kinds, so preprocessing them is also necessary to improve the accuracy of mining results. However, data stream mining is generally performed online, so the data cannot be preprocessed before mining.

How can GPU parallel computing be applied to the field of data stream mining? And in an environment with limited computing resources, how can the real-time performance and generality of data stream processing be guaranteed?

Summary of the Invention

The technical task of the present invention is to provide a data stream processing system on a GPU and a data stream processing method thereof that have a significant speed advantage, meet the real-time requirements of high-dimensional data streams well, and can be widely used as a general analysis method in the field of high-dimensional data stream mining.

The technical task of the present invention is achieved in the following manner.

The data source outputs high-dimensional time-series data streams; after processing by the data stream processing system, what is output to the client are the frequent patterns or query results of the data streams.

A data stream processing system on a GPU: the data streams of the data sources pass through the data stream processing system to the client, and the data stream processing system includes a CPU host (CPU-Host) and a GPU device (GPU-Device).

The CPU host includes a CPU-side load engine module (CPU-Side Load Engine Area), a CPU-side buffer module (CPU-Side Buffer Area), a data stream preprocessing module (Data Stream Preprocessing Area), a data stream load shedding module (Data Stream Load Shedding Area), and a visualization module (Visual Area). The CPU-side load engine module is provided with a load/store unit (Load/Store Unit), and the CPU-side buffer module is provided with main memory (Main Memory, MM). The load/store unit of the CPU-side load engine module, the data stream preprocessing module, the data stream load shedding module, and the visualization module all interact with the main memory of the CPU-side buffer module, and the load/store unit of the CPU-side load engine module also interacts with the visualization module.

The GPU device includes a GPU-side load engine module (GPU-Side Load Engine Area), a GPU-side buffer module (GPU-Side Buffer Area), a data stream synopsis extraction module (Data Stream Synopsis Extraction Area), a data stream processing model library (Data Stream Processing Model Library), and a data stream processing module (Data Stream Processing Area). The GPU-side load engine module is provided with a load/store unit (Load/Store Unit), and the GPU-side buffer module is provided with device memory (Device Memory, DM). The data stream synopsis extraction module integrates synopsis extraction methods for the data stream processing module to call, and the data stream processing model library integrates data stream processing algorithms for the data stream processing module to call. The load/store unit of the GPU-side load engine module and the data stream processing module both interact with the device memory of the GPU-side buffer module; the data stream synopsis extraction module and the data stream processing model library are both connected to the load/store unit of the GPU-side load engine module; and a storage area in the device memory of the GPU-side buffer module is set aside as a sliding window.

The load/store unit of the CPU-side load engine module interacts with the data source, with the load/store unit of the GPU-side load engine module, and with the client through an interconnection network (Interconnection Network).

The CPU-side buffer module is also provided with a memory manager for managing main memory. The memory manager contains an input monitor, which monitors the unprocessed data streams temporarily stored in main memory. The CPU-side load engine module includes a speed regulator (Speed Regulator), a load/store unit (Load/Store Unit), and an initialization integrator (Initialization Integrator). The speed regulator adjusts the rate at which the data source's data streams flow into the load/store unit of the CPU-side load engine module according to the buffering state of main memory, and is equipped with a feedback mechanism (Feedback Mechanism). The initialization integrator integrates the initialization operations of the CPU host and the GPU device.
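As an illustration of such a buffer-occupancy feedback mechanism (a sketch under our own assumptions; the patent does not specify the thresholds or the control rule), the ingest rate can be throttled when the buffer nears capacity and restored as it drains:

```python
def regulate_speed(current_rate: float, buffer_used: int, buffer_capacity: int,
                   high_mark: float = 0.8, low_mark: float = 0.3) -> float:
    """Feedback rule sketch: halve the ingest rate when the buffer is nearly
    full, double it (capped) when the buffer drains. Thresholds illustrative."""
    occupancy = buffer_used / buffer_capacity
    if occupancy > high_mark:
        return current_rate * 0.5                # back off under pressure
    if occupancy < low_mark:
        return min(current_rate * 2.0, 1000.0)   # recover, with a ceiling
    return current_rate                          # steady state

print(regulate_speed(100.0, 90, 100))  # buffer 90% full → 50.0
```

Any hysteresis-style controller would serve here; the point is that the regulator reacts to the buffer state reported by the input monitor.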

A data stream processing method on a GPU: the data streams output by the data source are processed by the data stream processing system and the data results are transmitted to the client. The processing flow of a data stream is as follows:

(1) Load the data stream: the data stream from the data source flows into the load/store unit of the CPU-side load engine module, which stores the data stream into the main memory of the CPU-side buffer module;

(2) Data stream preprocessing: the data stream preprocessing module preprocesses the raw data stream in main memory and stores the preprocessed data stream back into main memory;

(3) Transmit the data stream: the preprocessed data stream travels from main memory to the load/store unit of the CPU-side load engine module, from there to the interconnection network, and through the interconnection network to the load/store unit of the GPU-side load engine module, which then loads it into the sliding window in device memory;

(4) Data stream synopsis extraction: the data stream processing module calls a synopsis extraction method from the data stream synopsis extraction module, extracts a synopsis from the data stream in the sliding window, and stores the resulting synopsis data structure in device memory;

(5) Data stream processing: the data stream processing module calls a data stream processing algorithm from the data stream processing model library to process the synopsis data and stores the processed data results in device memory;

(6) Transmit the data results: the data results travel from the device memory of the GPU-side buffer module to the load/store unit of the GPU-side load engine module, which sends them to the interconnection network; through the interconnection network they reach the load/store unit of the CPU-side load engine module, which either loads the data results into main memory or sends them to the visualization module;

(7) Result visualization: the visualization module normalizes the data results and sends them to the load/store unit of the CPU-side load engine module, which presents the data results to the client.
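The seven steps above can be sketched as a linear pipeline (purely illustrative; each toy stage only mirrors the role of the corresponding module, and the device transfers are stand-ins):

```python
def load(source): return list(source)                            # step (1): ingest
def preprocess(data): return [x for x in data if x is not None]  # step (2): drop nulls
def to_device(data): return data[:]                              # step (3): host → device copy
def synopsis(data, k=3): return data[-k:]                        # step (4): keep last k (sliding window)
def process(data): return sum(data)                              # step (5): example mining operation
def to_host(result): return result                               # step (6): device → host copy
def visualize(result): return f"result={result}"                 # step (7): normalize for display

out = visualize(to_host(process(synopsis(to_device(preprocess(load([1, None, 2, 3, 4])))))))
print(out)  # → result=9
```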

In a data stream processing method on a GPU, in step (2) the data stream preprocessing module preprocesses the raw data stream in main memory using preprocessing methods that include data cleaning, data integration, data transformation, and data reduction; one of these methods may be used alone, or several may be combined.

(I) Data cleaning: filling in missing values, smoothing noisy data, removing outliers, and resolving data inconsistencies. Data cleaning is a very important step in data preprocessing, but also the most time-consuming one; missing values, noise, and inconsistencies all lead to inaccurate data, and data cleaning effectively avoids this.

(II) Data integration: schema integration and object matching, removal of redundant data, and detection and handling of data value conflicts. Data integration means combining data originally stored in multiple data sources into a single source, stored centrally in a unified format to facilitate subsequent data processing.

(III) Data transformation: data smoothing, data aggregation, data generalization, data normalization, and attribute construction. Data transformation converts data into a form suitable for data mining; for example, the dimensionality of data items may be inconsistent, in which case the dimensionality of high-dimensional items must be reduced to lessen the differences between them and ease processing.

(IV) Data reduction: data cube aggregation, attribute subset selection, dimensionality reduction, numerosity reduction, discretization, and concept hierarchies. Data reduction is also known as data condensation.
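A minimal sketch of the data-cleaning step of method (I) (our own illustration; the patent does not prescribe particular formulas): drop outliers beyond a fixed threshold, then fill missing values with the mean of the surviving values:

```python
def clean(values, outlier_limit=100.0):
    """Data-cleaning sketch: remove outliers beyond outlier_limit,
    then fill None (missing) entries with the mean of the kept values."""
    kept = [v for v in values if v is None or abs(v) <= outlier_limit]
    observed = [v for v in kept if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in kept]

print(clean([1.0, None, 3.0, 999.0]))  # → [1.0, 2.0, 3.0]
```

The ordering matters: removing the outlier 999.0 first keeps it from distorting the mean used to fill the missing value.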

In a data stream processing method on a GPU, after step (2), when the data stream is overloaded, the data stream load shedding module sheds load from the data stream. The specific steps are:

I. The input monitor of the memory manager of the CPU-side buffer module monitors the unprocessed data streams temporarily stored in main memory and decides whether, within one time unit, the volume of newly arrived stream data exceeds the processing capacity of the GPU device's data stream processing module;

II. If the GPU device's data stream processing module can handle all of the data streams, the data streams are transmitted; if the volume of newly arrived stream data exceeds the processing capacity of the GPU device's data stream processing module, that is, the data stream is overloaded, the data streams are passed to the data stream load shedding module;

III. The data stream load shedding module sheds load from the data streams;

IV. The data stream load shedding module transfers the remaining data streams into main memory for the next step, transmitting the data streams.

In a data stream processing method on a GPU, the data stream load shedding module sheds load using one or a combination of the following strategies:

i. Data-based discarding: among the received but unprocessed data in the time-series data streams, find the longest data items and discard them;

ii. Attribute-based trimming: each data item in the stream has d attributes; the attribute with the lowest frequency is removed from the data items, thereby trimming the data's attributes;

iii. Priority-based discarding: each newly arrived data stream is assigned a priority; among the received but unprocessed stream data, the items with the lowest priority are selected and discarded. Each of the three strategies has its own purpose. The data-based discarding strategy discards the long data items that the system would spend the most time processing, trying to reduce system load as quickly as possible; it is efficiency-oriented. The attribute-based trimming strategy removes from the data the least frequent attributes, which have no significant influence on the processing results. The priority-based discarding strategy removes the data with the lowest priority. By comparison, the latter two strategies are accuracy-oriented, because they try to preserve high-accuracy mining results.
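The three strategies above can be sketched as follows (a simplified host-side illustration over an in-memory batch; the record representations and function names are our own assumptions):

```python
def shed_by_data(records, n):
    """Data-based discarding: drop the n longest records (efficiency-oriented)."""
    return sorted(records, key=len)[:max(len(records) - n, 0)]

def shed_by_attribute(records):
    """Attribute-based trimming: remove the globally least frequent attribute
    from every record (accuracy-oriented)."""
    freq = {}
    for rec in records:
        for attr in rec:
            freq[attr] = freq.get(attr, 0) + 1
    rarest = min(freq, key=freq.get)
    return [{k: v for k, v in rec.items() if k != rarest} for rec in records]

def shed_by_priority(records, priorities, n):
    """Priority-based discarding: drop the n records with the lowest priority
    (accuracy-oriented)."""
    ranked = sorted(zip(priorities, records), key=lambda pair: pair[0])
    return [rec for _, rec in ranked[n:]]

print(shed_by_data([[1], [1, 2, 3], [1, 2]], 1))  # drops the longest → [[1], [1, 2]]
```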

In a data stream processing method on a GPU, the device memory uses global memory (Global Memory) to store the various data (such as synopsis data, intermediate data, and result data). The sliding window (Sliding Window, SW) set aside in device memory holds the data streams arriving at the GPU device from the CPU host. (Because data streams are unbounded while storage space is limited, part of the device memory is set aside as a sliding window for temporarily storing data, in order to reduce data copying between main memory and device memory and obtain more effective mining results.) The sliding window is defined by tuple count, i.e., it is a sliding window of fixed size, used to hold the K most recently arrived stream data items.

The data streams in the sliding window are handled with a rewritable circular sliding-window method: on update, new data directly overwrites the data about to expire, and a layout transformation function is provided to maintain the logical layout state of the data in the sliding window.
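The rewritable circular sliding window can be sketched as follows (a host-side Python illustration of the idea; the patent places the window in device memory): a new arrival overwrites the slot of the expiring element in place, and a layout transformation reconstructs the logical oldest-to-newest order without moving any data:

```python
class CircularWindow:
    """Fixed-size (K-tuple) sliding window. Instead of shifting elements
    when the window is full, a new item overwrites the expiring slot."""
    def __init__(self, k):
        self.buf = [None] * k
        self.k = k
        self.count = 0  # total items ever inserted

    def push(self, item):
        self.buf[self.count % self.k] = item  # overwrite the expiring slot
        self.count += 1

    def logical_view(self):
        """Layout transformation: oldest-to-newest order, no data movement."""
        if self.count < self.k:
            return self.buf[:self.count]
        start = self.count % self.k
        return self.buf[start:] + self.buf[:start]

w = CircularWindow(3)
for x in [1, 2, 3, 4, 5]:
    w.push(x)
print(w.logical_view())  # → [3, 4, 5]
```

This is why the full window never pays for element moves: only one slot is written per arrival, and ordering is recovered arithmetically from the insertion count.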

In a data stream processing method on a GPU, in the data stream synopsis extraction of step (4), a synopsis data structure is obtained from the data stream by a synopsis extraction method; synopsis extraction methods include sampling (Sampling), wavelet (Wavelet), sketch (Sketch), and histogram (Histogram) methods. The data stream is compressed by constructing a data structure much smaller than the full stream that preserves the stream's main characteristics, called a synopsis data structure; the approximations obtained through the synopsis data structure are within a range acceptable to the user.
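As one concrete instance of the sampling family of synopsis methods (our illustrative choice; the patent does not fix a particular algorithm), reservoir sampling maintains a uniform fixed-size synopsis of an unbounded stream in a single pass:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown
    length, seen once (a classic synopsis data structure)."""
    rng = rng or random.Random(0)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 5)
print(len(sample))  # → 5
```

The synopsis stays at k items no matter how long the stream runs, which is exactly the property that makes it suitable for the bounded sliding-window memory described above.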

In a data stream processing method on a GPU, the data stream processing model library integrates the various algorithms used in data stream processing, including query processing algorithms, clustering algorithms, classification algorithms, frequent itemset mining algorithms, and correlation analysis algorithms across multiple data streams.

In a data stream processing method on a GPU, the task of the data stream processing module is to call the synopsis extraction methods of the data stream synopsis extraction module to extract synopses from the data streams, and to call the data stream processing algorithms in the data stream processing model library to compute on the synopsis data in parallel. The data stream processing module includes a data stream input assembler (Data Stream Input Assembler), a global thread block scheduler (Global Block Scheduler), and a compute array (Compute Array); the compute array contains shared memory and load/store units. The data stream input assembler reads data from device memory into the shared memory of the data stream processing module; the global thread block scheduler schedules, allocates, and manages the thread blocks, threads, and instructions in shared memory; and the compute array performs the thread computations. During computation, the load/store units in the compute array load data from device memory into shared memory.

The data stream processing system on a GPU and the data stream processing method thereof of the present invention have the following advantages:

1. Generality: previous GPU-accelerated data stream processing systems were limited to a single data stream processing task, whether clustering, classification, or something else. The data stream processing system of the present invention, by contrast, is suitable for multiple high-dimensional time-series data streams from various application fields; it covers data stream preprocessing, load shedding, synopsis extraction, mining, and other functions. The data stream processing model library contained in the GPU device integrates various data stream processing algorithms, such as query processing, clustering, classification, frequent itemset mining, and correlation analysis algorithms, and can accomplish multiple data stream processing tasks, thereby endowing the present invention with generality;

2. Efficiency: the synopsis extraction methods of the present invention and the parallel parts of all data stream processing algorithms are accelerated on the GPU, making full use of the GPU's powerful processing capability and pipelining characteristics and further improving execution efficiency;

3. Control of extra I/O overhead: the data stream processing system places the sliding window in device memory, which avoids frequent copying of data between main memory and device memory during data stream synopsis extraction; furthermore, both synopsis extraction and data stream processing use device memory, greatly reducing the number of reads from and writes to main memory;

4. Because the initial data flow is very large, sending the data streams directly to the GPU device for preprocessing would greatly increase the I/O overhead. The present invention therefore places the data stream preprocessing step in the data stream preprocessing module of the CPU host, which both performs preprocessing to eliminate erroneous data such as noise, null values, and outliers, and reduces I/O overhead;

5. In the prior art, while the sliding window is not yet full, new data directly fills the window; once the window is full, as the window slides, new data entering the window causes the other data already in the window to move forward, overwriting the earlier data. The rewritable circular sliding-window method adopted by the present invention, by contrast, does not need to move data once the window is full: new data directly overwrites (rewrites) the data about to expire, and a layout transformation function maintains the logical layout state of the data in the sliding window, saving a great deal of time.

Description of Drawings

The present invention is further described below in conjunction with the accompanying drawings.

Figure 1 is a block diagram of a data stream processing system on a GPU.

Detailed Description

A data stream processing system on a GPU and its data stream processing method according to the present invention are described in detail below with reference to the accompanying drawings and specific embodiments.

Embodiment 1:

In the data stream processing system on a GPU of the present invention, the data stream from the data sources passes through the data stream processing system to the client; the data stream processing system comprises a CPU host (CPU-Host) and a GPU device (GPU-Device);

The CPU host comprises a CPU-side load engine module (CPU-Side Load Engine Area), a CPU-side buffer module (CPU-Side Buffer Area), a data stream preprocessing module (Data Stream Preprocessing Area), a data stream load shedding module (Data Stream Load Shedding Area), and a visualization module (Visual Area). The CPU-side load engine module is provided with a load/store unit (Load/Store Unit), and the CPU-side buffer module is provided with main memory (Main Memory, MM). The load/store unit of the CPU-side load engine module, the data stream preprocessing module, the data stream load shedding module, and the visualization module all interact with the main memory of the CPU-side buffer module, and the load/store unit of the CPU-side load engine module also interacts with the visualization module;

The GPU device comprises a GPU-side load engine module (GPU-Side Load Engine Area), a GPU-side buffer module (GPU-Side Buffer Area), a data stream synopsis extraction module (Data Stream Synopsis Extraction Area), a data stream processing model library (Data Stream Processing Model Library), and a data stream processing module (Data Stream Processing Area). The GPU-side load engine module is provided with a load/store unit (Load/Store Unit), and the GPU-side buffer module is provided with device memory (Device Memory, DM). The data stream synopsis extraction module integrates synopsis extraction methods for the data stream processing module to call, and the data stream processing model library integrates data stream processing algorithms for the data stream processing module to call. The load/store unit of the GPU-side load engine module and the data stream processing module both interact with the device memory of the GPU-side buffer module; the data stream synopsis extraction module and the data stream processing model library are both connected to the load/store unit of the GPU-side load engine module; and a region of the device memory of the GPU-side buffer module is set aside as a sliding window;

The load/store unit of the CPU-side load engine module interacts with the data sources, with the load/store unit of the GPU-side load engine module, and with the client through an interconnection network (Interconnection Network).

The CPU-side buffer module is further provided with a memory manager for managing main memory, and the memory manager contains an input monitor that monitors the unprocessed data streams temporarily stored in main memory. The CPU-side load engine module comprises a speed regulator (Speed Regulator), a load/store unit (Load/Store Unit), and an initialization integrator (Initialization Integrator). The speed regulator adjusts the rate at which data streams from the data sources flow into the load/store unit of the CPU-side load engine module according to the buffering state of main memory, and contains a feedback mechanism (Feedback Mechanism); the initialization integrator integrates the initialization operations of the CPU host and the GPU device.
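The text does not specify how the speed regulator's feedback mechanism maps buffer state to an inflow rate. As a purely illustrative sketch (the thresholds and the `advise_rate` interface are assumptions, not part of the invention), a regulator that throttles intake as main memory fills might look like:

```cpp
#include <cstddef>

// Hypothetical sketch of a feedback-based speed regulator: the advised
// inflow rate shrinks as the buffer fills and recovers as it drains.
class SpeedRegulator {
public:
    SpeedRegulator(std::size_t capacity, double max_rate)
        : capacity_(capacity), max_rate_(max_rate) {}

    // Feedback step: given the occupancy reported by the memory manager's
    // input monitor, return the advised inflow rate (items per time unit).
    double advise_rate(std::size_t occupied) const {
        double load = static_cast<double>(occupied) / capacity_;
        if (load >= 0.9) return 0.0;              // nearly full: pause intake
        if (load >= 0.5)                          // throttle linearly above 50%
            return max_rate_ * (0.9 - load) / 0.4;
        return max_rate_;                         // light load: full speed
    }

private:
    std::size_t capacity_;  // buffer capacity in items
    double max_rate_;       // maximum inflow rate
};
```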

Embodiment 2:

In the data stream processing method on a GPU of the present invention, the data stream output by the data sources is processed by the data stream processing system and the results are transmitted to the client; the processing flow is as follows:

(1) Loading the data stream: the data stream from the data sources flows into the load/store unit of the CPU-side load engine module (data flow ① in Figure 1), and the load/store unit of the CPU-side load engine module stores the data stream in the main memory of the CPU-side buffer module (data flow ② in Figure 1);

(2) Data stream preprocessing: the data stream preprocessing module preprocesses the raw data stream in main memory (data flow ③ in Figure 1) and stores the preprocessed data stream back into main memory (data flow ④ in Figure 1);

(3) Transmitting the data stream: the preprocessed data stream passes from main memory to the load/store unit of the CPU-side load engine module (data flow ⑦ in Figure 1), from there to the interconnection network (data flow ⑧ in Figure 1), and through the interconnection network to the load/store unit of the GPU-side load engine module (data flow ⑨ in Figure 1), which loads it into the sliding window in device memory (data flow ⑩ in Figure 1);

(4) Data stream synopsis extraction: the data stream processing module calls the synopsis extraction method in the data stream synopsis extraction module to extract a synopsis of the data stream in the sliding window (data flow shown in Figure 1), and stores the resulting synopsis data structure in device memory (data flow shown in Figure 1);

(5) Data stream processing: the data stream processing module calls a data stream processing algorithm in the data stream processing model library to process the synopsis data (data flow shown in Figure 1), and stores the processed results in device memory (data flow shown in Figure 1);

(6) Transmitting the results: the results pass from the device memory of the GPU-side buffer module to the load/store unit of the GPU-side load engine module (data flow shown in Figure 1), which sends them to the interconnection network (data flow shown in Figure 1); they travel through the interconnection network to the load/store unit of the CPU-side load engine module (data flow shown in Figure 1), which then either loads the results into main memory (data flow shown in Figure 1) or sends them to the visualization module (data flow shown in Figure 1);

(7) Result visualization: the visualization module normalizes the results and sends them to the load/store unit of the CPU-side load engine module (data flow shown in Figure 1), which presents the results to the client (data flow ㉑ in Figure 1).

Embodiment 3:

In the data stream processing method on a GPU of the present invention, the data stream output by the data sources is processed by the data stream processing system and the results are transmitted to the client; the processing flow is as follows:

Steps (1) through (7) are identical to steps (1) through (7) of Embodiment 2.

In step (2), the data stream preprocessing module preprocesses the raw data stream in main memory using preprocessing methods, which include data cleaning, data integration, data transformation, and data reduction; one of these methods may be used alone, or several may be combined.

㈠ Data cleaning: filling in missing values, smoothing noisy data, removing outliers, and resolving data inconsistencies. Data cleaning is a very important part of data preprocessing, but also the most time-consuming; missing values, noise, and inconsistencies all lead to inaccurate data, which data cleaning effectively avoids;

㈡ Data integration: schema integration and object matching, removal of redundant data, and detection and resolution of data value conflicts. Data integration combines data originally stored in multiple data sources into a single source, stored centrally in a unified format to facilitate subsequent data processing;

㈢ Data transformation: data smoothing, data aggregation, data generalization, data normalization, and attribute construction. Data transformation converts data into a form suitable for data mining; for example, when the dimensions of data items are inconsistent, the dimensionality of high-dimensional items must be reduced to lessen the differences between them and ease processing;

㈣ Data reduction: data cube aggregation, attribute subset selection, dimensionality reduction, numerosity reduction, discretization, and concept hierarchies. Data reduction is also known as data elimination technology.
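As a minimal illustration of the data-cleaning step above (the concrete rules here, mean imputation for missing values and a fixed valid range for outliers, are assumptions for the sketch, not prescribed by the invention), a host-side cleaning pass might look like:

```cpp
#include <cmath>
#include <vector>

// Hypothetical data-cleaning sketch: replace missing values (encoded as NaN)
// with the mean of the valid values, and drop values outside [lo, hi].
std::vector<double> clean(const std::vector<double>& raw, double lo, double hi) {
    double sum = 0.0;
    std::size_t n = 0;
    for (double v : raw)
        if (!std::isnan(v) && v >= lo && v <= hi) { sum += v; ++n; }
    double mean = (n > 0) ? sum / n : 0.0;

    std::vector<double> out;
    for (double v : raw) {
        if (std::isnan(v)) out.push_back(mean);   // fill missing value
        else if (v < lo || v > hi) continue;      // remove outlier
        else out.push_back(v);                    // keep valid value
    }
    return out;
}
```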

After step (2), when the data stream is overloaded, the data stream load shedding module sheds load from the data stream; the specific steps are:

Ⅰ. The input monitor of the memory manager of the CPU-side buffer module monitors the unprocessed data streams temporarily stored in main memory and determines whether, within one time unit, the volume of newly arrived data exceeds the processing capacity of the data stream processing module of the GPU device;

Ⅱ. If the data stream processing module of the GPU device can process all of the data streams, the data stream is transmitted; if the volume of newly arrived data exceeds the processing capacity of the data stream processing module of the GPU device, that is, the data stream is overloaded, the data stream is passed to the data stream load shedding module (data flow ⑤ in Figure 1);

Ⅲ. The data stream load shedding module performs load shedding on the data stream;

Ⅳ. The data stream load shedding module transfers the remaining data back into main memory (data flow ⑥ in Figure 1) for the next step, transmitting the data stream.

The data stream load shedding module sheds load from the data stream using one of the following strategies or a combination of several:

ⅰ. Data-based dropping: among the received and unprocessed data in the time-series data stream, find the longest data items and discard them;

ⅱ. Attribute-based trimming: every data item in the data stream has attributes; the attribute with the lowest frequency is removed from the data, thereby trimming the data's attributes;

ⅲ. Priority-based dropping: each newly arrived data stream is assigned a priority; among the received and unprocessed data, the items with the lowest priority are selected and discarded. Each of the three strategies has its own purpose. The data-based dropping strategy discards the long data items that the system would spend much time processing, attempting to reduce system load as quickly as possible; it is efficiency-oriented. The attribute-based trimming strategy deletes from the data the least frequent attributes, those with no significant influence on the processing result. The priority-based dropping strategy deletes the data with the lowest priority. By comparison, the latter two strategies are accuracy-oriented, since they try to preserve high-accuracy mining results.
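Two of the shedding strategies above reduce to simple selection rules over buffered records. The sketch below illustrates data-based and priority-based dropping; the `Record` layout with `length` and `priority` fields is a hypothetical illustration, not the invention's actual data format:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record for illustrating the shedding strategies.
struct Record {
    int length;    // data length (data-based dropping targets the longest)
    int priority;  // stream priority (priority-based dropping targets the lowest)
};

// Priority-based dropping: remove the records with the lowest priority.
void shed_by_priority(std::vector<Record>& buf, std::size_t drop_count) {
    std::sort(buf.begin(), buf.end(),
              [](const Record& a, const Record& b) { return a.priority < b.priority; });
    buf.erase(buf.begin(), buf.begin() + std::min(drop_count, buf.size()));
}

// Data-based dropping: remove the longest records.
void shed_by_length(std::vector<Record>& buf, std::size_t drop_count) {
    std::sort(buf.begin(), buf.end(),
              [](const Record& a, const Record& b) { return a.length > b.length; });
    buf.erase(buf.begin(), buf.begin() + std::min(drop_count, buf.size()));
}
```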

Device memory uses global memory (Global Memory) to store the various kinds of data (such as synopsis data, intermediate data, and result data). The sliding window (Sliding Window, SW) opened in device memory saves the data streams arriving at the GPU device from the CPU host (because data streams are unbounded while storage space is limited, and in order to reduce data copying between main memory and device memory and to obtain more effective mining results, part of the device memory is set aside as a sliding window for temporarily storing data). The sliding window is defined by tuple count, that is, a sliding window of fixed size that holds the K most recently arrived data items;

The data stream in the sliding window is handled with the rewritable circular sliding window method: on update, new data directly overwrites the data about to expire, and a layout transformation function is provided to maintain the logical layout of the data in the sliding window.

In the data stream synopsis extraction of step (4), a synopsis data structure is obtained from the data stream by a synopsis extraction method; the synopsis extraction methods include the sampling (Sampling) method, the wavelet (Wavelet) method, the sketch (Sketch) method, and the histogram (Histogram) method. The data stream is compressed by constructing a data structure much smaller than the whole data stream that preserves the stream's main characteristics, called a synopsis data structure; the approximations obtained through the synopsis data structure are within a range acceptable to the user.

The data stream processing model library integrates the various data stream processing algorithms used in data stream processing, including query processing algorithms, clustering algorithms, classification algorithms, frequent itemset mining algorithms, and correlation analysis algorithms between multiple data streams.

The task of the data stream processing module is to call the synopsis extraction methods of the data stream synopsis extraction module to extract synopses of the data stream, and to call the data stream processing algorithms in the data stream processing model library to compute over the synopsis data in parallel. The data stream processing module comprises a data stream input assembler (Data Stream Input Assembler), a global thread block scheduler (Global Block Scheduler), and a compute array (Compute Array); the compute array is provided with shared memory and load/store units. The data stream input assembler reads data from device memory into the shared memory of the data stream processing module; the global thread block scheduler schedules, allocates, and manages the thread blocks, threads, and instructions in shared memory; the compute array performs the thread computations; during computation, the load/store units of the compute array load data from device memory into shared memory.

In the data stream processing system, the data stream is processed as follows; the processing steps on the CPU host side are:

1. Start CUDA (short for the data stream processing system);

2. Allocate MM (main memory) for the input data;

3. The CPU-side load engine module obtains the input data from the data sources and initializes it;

4. The data stream preprocessing module preprocesses the input data stream (data cleaning, integration, and so on);

5. When the system is overloaded, the data stream load shedding module sheds load from the data stream;

6. Allocate the sliding window in device memory on the GPU device, used to store the input data;

7. The initialization integrator initializes the synopsis extraction methods and the data stream processing algorithms;

8. Copy the data in main memory into the sliding window in device memory;

9. Allocate device memory on the GPU device, used to store the synopsis data extracted by the data stream processing module;

10. Call the parallel computation functions (kernels) of the data stream processing algorithms on the GPU device side to compute in parallel, obtain the synopsis data, and write it to the corresponding region of device memory;

11. Allocate device memory on the GPU device, used to store the output data to be returned;

12. Read the results in device memory back into main memory;

13. Use the visualization module for follow-up processing of the data, such as normalization and visualization;

14. Free the main memory and device memory;

15. Exit CUDA.

The processing steps on the GPU device side are as follows:

1. Allocate shared memory (Shared Memory);

2. Read the data in the global memory of the device into shared memory;

3. Perform the computation and write the results to shared memory;

4. Write the results in shared memory back to global memory.
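The four device-side steps above follow the standard tiling pattern: stage a piece of global memory into fast shared memory, compute on it, then write it back. A CPU-side sketch of the same pattern (the tile size and the squaring computation are placeholders, not the invention's actual kernel):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU-side model of the four GPU steps: "global memory" is the input/output
// vectors, "shared memory" is the small tile buffer staged per block.
void process_in_tiles(const std::vector<float>& global_in,
                      std::vector<float>& global_out,
                      std::size_t tile_size) {
    std::vector<float> shared(tile_size);               // step 1: allocate shared memory
    for (std::size_t base = 0; base < global_in.size(); base += tile_size) {
        std::size_t n = std::min(tile_size, global_in.size() - base);
        for (std::size_t i = 0; i < n; ++i)
            shared[i] = global_in[base + i];            // step 2: global -> shared
        for (std::size_t i = 0; i < n; ++i)
            shared[i] = shared[i] * shared[i];          // step 3: compute (placeholder: square)
        for (std::size_t i = 0; i < n; ++i)
            global_out[base + i] = shared[i];           // step 4: shared -> global
    }
}
```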

The code of each module in the data stream processing system is introduced below:

I. The function code of the initialization integrator:

First, start and exit the data stream processing system (CUDA) environment:

CUT_DEVICE_INIT(argc, argv);  // start CUDA

CUT_EXIT(argc, argv);  // exit CUDA

Next, allocate main memory and device memory:

Main memory is allocated on the CPU host; h_ denotes the CPU host side, i denotes input, o denotes output, and mem_size is the amount of memory allocated for the data:

float* h_idata = (float*)malloc(mem_size);

float* h_odata = (float*)malloc(mem_size);

Device memory is allocated on the GPU device; d_ denotes the GPU device side:

float* d_idata; CUDA_SAFE_CALL(cudaMalloc((void**)&d_idata, mem_size));

float* d_odata; CUDA_SAFE_CALL(cudaMalloc((void**)&d_odata, mem_size));

Next, task partitioning, i.e. the dimension design of the two-dimensional grid and block:

dim3 grid(gridDim.x, gridDim.y, 1);  // the third dimension is always 1

dim3 block(blockDim.x, blockDim.y, 1);  // the third dimension need not be 1, but since block here is two-dimensional it is set to 1

testKernel<<<grid, block>>>(d_idata, d_odata);  // call the kernel function to compute in parallel

Then, data copying between main memory and device memory:

Read the values in main memory into device memory:

CUDA_SAFE_CALL(cudaMemcpy(d_idata, h_idata, mem_size, cudaMemcpyHostToDevice));

Write the results from device memory back into main memory:

CUDA_SAFE_CALL(cudaMemcpy(h_odata, d_odata, mem_size, cudaMemcpyDeviceToHost));

The following code frees the main memory and device memory storage space:

free(h_idata);  // free main memory

free(h_odata);

CUDA_SAFE_CALL(cudaFree(d_idata));  // free device memory

CUDA_SAFE_CALL(cudaFree(d_odata));

Besides the functions mentioned above, the initialization integrator also selects the data stream processing algorithm (down to a particular algorithm within a particular class), selects the synopsis extraction method, and initializes the various data stream processing algorithms and synopsis extraction methods (for example, for the Haar wavelet this obtains the number of decomposition levels for a complete decomposition, and for k-means it initializes the cluster center points).

II. The sliding window is formally represented as: CircularSW = &lt;w, num, front, fun&gt;;

where w is the width of the sliding window (SW); num is the amount of data currently in the sliding window; front marks the end of the data in the sliding window, the position at which newly arrived data is placed; and fun is the layout transformation function of the sliding window, which determines how the layout of the data already in the window changes when new data arrives. fun is defined in Table 1 below:

Table 1. Layout transformation function of the rewritable circular sliding window

The novelty of the rewritable circular sliding window is that it directly computes the position of the expiring data (the data about to be evicted); newly arrived data is placed at that position, directly overwriting the old data in the window. In addition, the front value must be updated to point to the end of the data in the window, i.e. front always points to the newest data. Compared with previous sliding windows, the rewritable circular sliding window improves the efficiency of the data stream processing system: it uses the same storage space, avoids moving data within the window, and allows finer-grained concurrency control.
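A minimal CPU-side sketch of the rewritable circular window described above, using the w, num, and front fields of CircularSW (the float element type and the accessor methods are illustrative assumptions):

```cpp
#include <cstddef>
#include <vector>

// Rewritable circular sliding window: new data overwrites the expiring slot
// in place; front always marks the most recently written element.
class CircularSW {
public:
    explicit CircularSW(std::size_t w) : buf_(w), w_(w), num_(0), front_(0) {}

    // Insert a new element: while the window is not full it fills the next
    // slot; once full it overwrites the oldest (expiring) element directly,
    // so no data already inside the window ever has to move.
    void push(float x) {
        front_ = (num_ == 0) ? 0 : (front_ + 1) % w_;
        buf_[front_] = x;
        if (num_ < w_) ++num_;
    }

    std::size_t size() const { return num_; }
    float newest() const { return buf_[front_]; }
    // Oldest element according to the logical layout maintained by the
    // layout transformation: once the window is full, it is the slot just
    // after front.
    float oldest() const {
        return (num_ < w_) ? buf_[0] : buf_[(front_ + 1) % w_];
    }

private:
    std::vector<float> buf_;
    std::size_t w_, num_, front_;
};
```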

III. To improve processing efficiency, the synopsis extraction methods all require GPU acceleration, for example the wavelet decomposition in the wavelet method and the hash mapping in the sketch method. The Haar wavelet (Haar Wavelet) of the wavelet method is introduced in detail below as an example. The wavelet method is an important data compression method: by applying a wavelet transform to the original data set and keeping some of the important wavelet coefficients, the original data set can be approximately reconstructed. The Haar wavelet is the simplest wavelet; a two- or three-dimensional (2D or 3D) wavelet transform can be decomposed into two or three one-dimensional (1D) wavelet transforms. The one-dimensional Haar wavelet decomposition transforms a vector into wavelet coefficients. Table 2 demonstrates the Haar wavelet transform of the sequence.

Table 2. Haar wavelet transform of sequence X

The computation proceeds as follows: at the top level of the Resolution column, the Averages column holds the original sequence. The original sequence is split into adjacent pairs and the mean of each pair is computed, yielding the Averages of the next level down;

Obviously, some information from the original sequence is lost when computing the Averages; the original sequence cannot be reconstructed from the averages alone, since each average is only an approximation. Therefore, in order to reconstruct the original data, the detail coefficients must also be kept: for each pair, the difference between the pair's average and its second element is stored in the Detail Coefficients column;

This process is repeated level by level. The wavelet coefficients of the sequence consist of the level-0 average together with all of the detail coefficients.

Taking the one-dimensional Haar wavelet transform as an example, the core GPU device code is given as follows:

4. The k-means clustering algorithm among the data stream processing algorithms:

The k-means algorithm is the most representative clustering algorithm. Its main purpose is to partition sample data of the same data type into sets according to a shortest-distance rule, finally obtaining the equivalence classes. Here, distance expresses the similarity between data items. However, because of the particular nature of data streams, traditional clustering algorithms are difficult to run over them directly, which is why summary extraction is performed in the data stream processing model library, with the Haar wavelet taken as the example above. Based on such a wavelet summary, the approximate distance between a data stream and a cluster center can be computed quickly, which makes the k-means algorithm much easier to implement.

In clustering, the Euclidean distance has a very intuitive meaning.

The distance between a data item and a data set is defined as the minimum over all distances between that data item and the data items in the set.

This problem is a compute-intensive task (evaluating a large number of distances), so there is little room for algorithmic optimization. Working on increasing parallelism, however, undoubtedly yields a significant speedup, because the computations of different distances are completely independent of one another. This is a task setting where the GPU holds an absolute advantage.

The dimensionality of each data item in a high-dimensional data stream is usually very high, so a custom matrix type is used to store the data in the implementation. During clustering, the GPU must frequently perform extremely time-consuming operations such as matrix transposition and computing sums of squared distances between data items. The core device-side k-means code is given below, where each thread corresponds to one distance.

5. The thread grids (Grid), thread blocks (Block) and threads (Thread) of the compute array are loaded onto the streaming processor array (SPA), the streaming multiprocessors (SM) and the streaming processors (SP) of the compute array, respectively, for execution. Thread grids exchange data through the video memory; the thread blocks execute in parallel and cannot communicate with each other, sharing data only through the video memory; threads within the same thread block can communicate through shared memory (Shared Memory) and synchronization.

Because of the high-dimensional nature of the data stream, both the thread grid (Grid) and the thread blocks (Block) are designed to be two-dimensional when the task is partitioned. The following CPU host-side code sets the launch parameters, i.e. the shape of the thread grid and the shape of the thread blocks. Here gridDim, blockDim, blockIdx and threadIdx are built-in variables in CUDA C.

dim3 grid(gridDim.x, gridDim.y, 1); // the third dimension of a grid is always 1

dim3 block(blockDim.x, blockDim.y, 1); // the third dimension need not be 1, but the block here is two-dimensional, so it is set to 1

Here blockIdx.x ∈ [0, gridDim.x-1], blockIdx.y ∈ [0, gridDim.y-1], threadIdx.x ∈ [0, blockDim.x-1], threadIdx.y ∈ [0, blockDim.y-1].

Two levels of parallelism exist in the figure: parallelism among the thread blocks (Block) within the thread grid (Grid), and parallelism among the threads (Thread) within a thread block (Block). (N+1) denotes the total number of thread blocks, and (M+1) the total number of threads per block.

(N+1) = gridDim.x * gridDim.y ≤ 65535 * 65535, where gridDim.x ≤ 65535 and gridDim.y ≤ 65535.

(M+1) = blockDim.x * blockDim.y ≤ 1024, where blockDim.x ≤ 512 and blockDim.y ≤ 512.

Since the thread grid (Grid) and the thread blocks (Block) are two-dimensional, a two-dimensional index must also be used inside the kernel function. The following device-side code computes the thread index, i.e. it determines the position of a thread (Thread) within the whole thread grid (Grid).

unsigned int bid_in_grid = blockIdx.x + blockIdx.y * gridDim.x;

unsigned int tid_in_block = threadIdx.x + threadIdx.y * blockDim.x;

unsigned int tid_in_grid_x = threadIdx.x + blockIdx.x * blockDim.x; // x

unsigned int tid_in_grid_y = threadIdx.y + blockIdx.y * blockDim.y; // y

unsigned int tid_in_grid = tid_in_grid_x + tid_in_grid_y * blockDim.x * gridDim.x; // offset

In addition, to make effective use of the execution units, each block should be designed so that its thread count is an integer multiple of 32, preferably kept between 64 and 256. To fully exploit the GPU's resources and raise execution efficiency, particular attention must be paid to the dimensioning of the thread grid (Grid) and the thread blocks (Block) when the code is implemented.

The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims (10)

1. A data stream processing system on a GPU, characterized in that the data stream from a data source passes through the data stream processing system to a client, the data stream processing system comprising a CPU host and a GPU device;
the CPU host comprises a CPU-side load engine module, a CPU-side buffer module, a data stream preprocessing module, a data stream load-shedding module and a visualization module; the CPU-side load engine module is provided with a load/store unit, and the CPU-side buffer module is provided with memory; the load/store unit of the CPU-side load engine module, the data stream preprocessing module, the data stream load-shedding module and the visualization module all interact with the memory of the CPU-side buffer module, and the load/store unit of the CPU-side load engine module interacts with the visualization module;
the GPU device comprises a GPU-side load engine module, a GPU-side buffer module, a data stream summary extraction module, a data stream processing model library and a data stream processing module; the GPU-side load engine module is provided with a load/store unit, and the GPU-side buffer module is provided with video memory; the data stream summary extraction module integrates summary extraction methods for the data stream processing module to call, and the data stream processing model library integrates data stream processing algorithms for the data stream processing module to call; the load/store unit of the GPU-side load engine module and the data stream processing module both interact with the video memory of the GPU-side buffer module; the data stream summary extraction module and the data stream processing model library are both connected to the load/store unit of the GPU-side load engine module; a storage space is opened up in the video memory of the GPU-side buffer module as a sliding window;
the load/store unit of the CPU-side load engine module interacts with the data source, the load/store unit of the GPU-side load engine module and the client through an interconnection network;
the load/store unit of the CPU-side load engine module is used to store the data stream from the data source into the memory of the CPU-side buffer module, to send the preprocessed data stream returned from the memory to the load/store unit of the GPU-side load engine module over the interconnection network, to load the data results returned by the load/store unit of the GPU-side load engine module into the memory, to send the data results to the visualization module, and to present the data results normalized by the visualization module to the client;
the data stream preprocessing module is used to preprocess the raw data stream in the memory of the CPU-side buffer module and store the preprocessed data stream back into the memory;
the memory of the CPU-side buffer module is used to interact with the load/store unit of the CPU-side load engine module and with the data stream preprocessing module;
the load/store unit of the GPU-side load engine module is used to load the preprocessed data stream transmitted over the interconnection network by the load/store unit of the CPU-side load engine module into the sliding window in the video memory of the GPU-side buffer module, and to send the data results returned from the video memory to the load/store unit of the CPU-side load engine module over the interconnection network;
the data stream processing module is used to call the summary extraction methods in the data stream summary extraction module to extract a summary of the data stream in the sliding window and store the resulting summary data structure in the video memory, and to call the data stream processing algorithms in the data stream processing model library to process the summary data and store the processed data results in the video memory;
the video memory of the GPU-side buffer module is used to interact with the load/store unit of the GPU-side load engine module and with the data stream processing module;
the visualization module is used to normalize the data results and send them to the load/store unit of the CPU-side load engine module.

2. The data stream processing system on a GPU according to claim 1, characterized in that the CPU-side buffer module is further provided with a memory manager for managing the memory; the memory manager is provided with an input monitor for monitoring the unprocessed data streams temporarily stored in the memory; the CPU-side load engine module comprises a speed regulator, a load/store unit and an initialization integrator; the speed regulator is used to adjust, according to the buffering state of the memory, the rate at which the data stream from the data source flows into the load/store unit of the CPU-side load engine module, the speed regulator being provided with a feedback mechanism; the initialization integrator is used to integrate the initialization operations of the CPU host and the GPU device.

3. A data stream processing method on a GPU, characterized in that the data stream output by a data source is processed by the data stream processing system according to claim 1 or 2 and the data results are then transmitted to a client; the data stream is processed as follows:
(1) Loading the data stream: the data stream from the data source flows into the load/store unit of the CPU-side load engine module, which stores it into the memory of the CPU-side buffer module;
(2) Preprocessing the data stream: the data stream preprocessing module preprocesses the raw data stream in the memory and stores the preprocessed data stream back into the memory;
(3) Transmitting the data stream: the preprocessed data stream goes from the memory to the load/store unit of the CPU-side load engine module, then over the interconnection network to the load/store unit of the GPU-side load engine module, which loads it into the sliding window in the video memory;
(4) Extracting the data stream summary: the data stream processing module calls the summary extraction methods in the data stream summary extraction module to extract a summary of the data stream in the sliding window and stores the resulting summary data structure in the video memory;
(5) Processing the data stream: the data stream processing module calls the data stream processing algorithms in the data stream processing model library to process the summary data and stores the processed data results in the video memory;
(6) Transmitting the data results: the data results go from the video memory of the GPU-side buffer module to the load/store unit of the GPU-side load engine module, then over the interconnection network to the load/store unit of the CPU-side load engine module, which either loads the data results into the memory or sends them to the visualization module;
(7) Visualizing the results: the visualization module normalizes the data results and sends them to the load/store unit of the CPU-side load engine module, which presents the data results to the client.

4. The data stream processing method on a GPU according to claim 3, characterized in that in step (2) the data stream preprocessing module preprocesses the raw data stream in the memory using preprocessing methods comprising data cleaning, data integration, data transformation and data reduction; one of these methods may be used alone, or several may be used in combination.

5. The data stream processing method on a GPU according to claim 3, characterized in that after step (2), when the data stream is overloaded, the data stream load-shedding module sheds load from the data stream, with the following specific steps:
I. The input monitor of the memory manager of the CPU-side buffer module monitors the unprocessed data streams temporarily stored in the memory and determines whether, within one time unit, the volume of newly arrived data exceeds the processing capacity of the data stream processing module of the GPU device;
II. If the data stream processing module of the GPU device can process the entire data stream, the data stream is transmitted; if the volume of newly arrived data exceeds the processing capacity of the data stream processing module of the GPU device, i.e. the data stream is overloaded, the data stream is passed to the data stream load-shedding module;
III. The data stream load-shedding module sheds load from the data stream;
IV. The data stream load-shedding module transfers the remaining data stream into the memory for the next step, transmitting the data stream.

6. The data stream processing method on a GPU according to claim 5, characterized in that the data stream load-shedding module sheds load from the data stream using one or a combination of the following strategies:
i. Data-based discarding: among the data in the received and unprocessed time-series data streams, find the longest data items and discard them;
ii. Attribute-based trimming: each data item in the data stream has attributes; the attribute with the lowest frequency is removed from the data, thereby trimming the attributes of the data;
iii. Priority-based discarding: each newly arrived data stream is assigned a priority; among the data in the received and unprocessed data streams, select those with the lowest priority and discard them.

7. The data stream processing method on a GPU according to claim 3, characterized in that the video memory uses global memory to store the various data; the sliding window opened up in the video memory is used to save the data stream arriving at the GPU device from the CPU host; the sliding window is defined by the number of tuples, i.e. it is a sliding window of fixed size, used to save the most recently arrived data; the sliding window processes its data stream using the rewritable circular sliding window method, in which new data directly overwrites the data about to expire during an update, and a pattern transformation function is provided to maintain the logical pattern state of the data in the sliding window.

8. The data stream processing method on a GPU according to claim 3, characterized in that in the data stream summary extraction of step (4), a summary data structure is obtained from the data stream by a summary extraction method, the summary extraction methods comprising sampling, wavelet, sketch and histogram methods.

9. The data stream processing method on a GPU according to claim 3, characterized in that the data stream processing model library integrates the various data stream processing algorithms used in data stream processing, including query processing algorithms, clustering algorithms, classification algorithms, frequent itemset mining algorithms and correlation analysis algorithms among multiple data streams.

10. The data stream processing method on a GPU according to claim 3, 7, 8 or 9, characterized in that the task of the data stream processing module is to call the summary extraction method of the data stream summary extraction module to extract a summary of the data stream, and to call the data stream processing algorithms in the data stream processing model library to perform parallel computation on the summary data; the data stream processing module comprises a data stream input assembler, a global thread block scheduler and a compute array, the compute array being provided with shared memory and load/store units; the data stream input assembler is responsible for reading data from the video memory into the shared memory of the data stream processing module; the global thread block scheduler is responsible for scheduling, allocating and managing the thread blocks, threads and instructions in the shared memory; the compute array is used for thread computation; during computation, the load/store units in the compute array load data from the video memory into the shared memory.
CN201410657243.2A 2014-11-18 2014-11-18 Data flow processing system and its data flow processing method on a kind of GPU Active CN104317751B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410657243.2A CN104317751B (en) 2014-11-18 2014-11-18 Data flow processing system and its data flow processing method on a kind of GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410657243.2A CN104317751B (en) 2014-11-18 2014-11-18 Data flow processing system and its data flow processing method on a kind of GPU

Publications (2)

Publication Number Publication Date
CN104317751A CN104317751A (en) 2015-01-28
CN104317751B true CN104317751B (en) 2017-03-01

Family

ID=52372986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410657243.2A Active CN104317751B (en) 2014-11-18 2014-11-18 Data flow processing system and its data flow processing method on a kind of GPU

Country Status (1)

Country Link
CN (1) CN104317751B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426163A (en) * 2015-10-27 2016-03-23 浪潮(北京)电子信息产业有限公司 Single data stream quantile processing method based on MIC coprocessor
CN106911591A (en) * 2017-03-09 2017-06-30 广东顺德中山大学卡内基梅隆大学国际联合研究院 The sorting technique and system of network traffics
CN108874518B (en) * 2018-05-21 2021-05-11 福建省数字福建云计算运营有限公司 Task scheduling method and terminal
CN108874684B (en) * 2018-05-31 2021-05-28 北京领芯迅飞科技有限公司 NVDIMM interface data read-write device for splitting CACHE CACHE
CN109165119B (en) * 2018-08-07 2021-05-14 杭州金荔枝科技有限公司 Electronic commerce data processing method and system
CN109213793A (en) * 2018-08-07 2019-01-15 泾县麦蓝网络技术服务有限公司 A kind of stream data processing method and system
CN109656714B (en) * 2018-12-04 2022-10-28 成都雨云科技有限公司 GPU resource scheduling method of virtualized graphics card
CN110830322B (en) * 2019-09-16 2021-07-06 北京大学 A network traffic measurement method and system based on the probability measurement data structure Sketch
CN114218152B (en) * 2021-12-06 2023-08-15 海飞科(南京)信息技术有限公司 Stream processing method, processing circuit and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814039A (en) * 2010-02-02 2010-08-25 北京航空航天大学 GPU-based Cache simulator and spatial parallel acceleration simulation method thereof
US8374242B1 (en) * 2008-12-23 2013-02-12 Elemental Technologies Inc. Video encoder using GPU
CN103577161A (en) * 2013-10-17 2014-02-12 江苏科技大学 Big data frequency parallel-processing method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8374242B1 (en) * 2008-12-23 2013-02-12 Elemental Technologies Inc. Video encoder using GPU
CN101814039A (en) * 2010-02-02 2010-08-25 北京航空航天大学 GPU-based Cache simulator and spatial parallel acceleration simulation method thereof
CN103577161A (en) * 2013-10-17 2014-02-12 江苏科技大学 Big data frequency parallel-processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on GPU-based data stream processing methods (基于GPU的数据流处理方法研究); Lu Xiaowei (卢晓伟); China Master's Theses Full-text Database; 2011-05-15 (Issue 05); pp. I137-31 *

Also Published As

Publication number Publication date
CN104317751A (en) 2015-01-28

Similar Documents

Publication Publication Date Title
CN104317751B (en) Data flow processing system and its data flow processing method on a kind of GPU
US11995767B2 (en) Apparatus and method for compressing ray tracing acceleration structure build data
US12002145B2 (en) Apparatus and method for efficient graphics processing including ray tracing
EP3882859A1 (en) Apparatus and method for displaced mesh compression
US20240282053A1 (en) Apparatus and method for performing box queries in ray traversal hardware
US11915357B2 (en) Apparatus and method for throttling a ray tracing pipeline
US10262392B2 (en) Distributed and parallelized visualization framework
DE112020004167T5 (en) VIDEO PREDICTION USING ONE OR MORE NEURAL NETWORKS
IL224255A (en) System and method for the parallel execution of database queries over cpus and multi core processors
JP2021149932A (en) Apparatus and method for asynchronous ray tracing
DE102021115585A1 (en) RECOMMENDATION GENERATION USING ONE OR MORE NEURONAL NETWORKS
CN113449859A (en) Data processing method and device
DE112020003165T5 (en) Video interpolation using one or more neural networks
EP4246449A1 (en) Apparatus and method for accelerating bvh builds by merging bounding boxes
US20230298127A1 (en) Apparatus and method for biased bvh traversal path
US12045658B2 (en) Stack access throttling for synchronous ray tracing
CN106095588A (en) CDVS based on GPGPU platform extracts process accelerated method
US20230298126A1 (en) Node prefetching in a wide bvh traversal with a stack
EP4246448A1 (en) Apparatus and method for acceleration data structure re-braiding with camera position
CN115701613A (en) Multiresolution hash coding for neural networks
US20240020911A1 (en) Apparatus and Method for Routing Data from Ray Tracing Cache Banks
DE102022130536A1 (en) SELF-VOCING THREAD DISTRIBUTION POLICY
US20230350641A1 (en) Apparatus and method for generating a quasi-random sequence
US20240233238A1 (en) Apparatus and method for intra-bvh level-of-detail selection
CN115146757A (en) A kind of training method and device of neural network model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20170116

Address after: Room 1601, 16th Floor, No. 278 Xinyi Road, Zhengdong New District, Zhengzhou, Henan Province, 450000

Applicant after: Zhengzhou Yunhai Information Technology Co. Ltd.

Address before: No. 1036 Nga Road, High-tech Development Zone, Jinan, Shandong Province, 250101

Applicant before: Langchao Electronic Information Industry Co., Ltd.

GR01 Patent grant
GR01 Patent grant