Development of deep learning inference code using OpenCL kernel functions.

yester31/OpenCL_EX

OpenCL_EX

Environment

  • Windows 10 laptop
  • CPU 11th Gen Intel(R) Core(TM) i7-11375H @ 3.30GHz (cpu)
  • Intel(R) Iris(R) Xe Graphics (iGPU)
  • NVIDIA GeForce RTX 3060 Laptop GPU (gpu)

OpenCL Vector Add

  • vector sum
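
As a CPU reference for checking the kernel output, the vector sum can be sketched in plain Python. The function name `vector_add` is hypothetical and not taken from this repository; each loop iteration stands in for one OpenCL work-item.

```python
def vector_add(a, b):
    """Element-wise sum of two equal-length vectors (CPU reference sketch)."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    c = [0.0] * len(a)
    for gid in range(len(a)):  # gid plays the role of get_global_id(0)
        c[gid] = a[gid] + b[gid]
    return c
```

Comparing this result element by element with the buffer read back from the device is one simple way to validate the kernel.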

OpenCL Matrix Multiplication (in progress)

  • MatrixMultiplication project

    1. Naive Matrix Multiplication (Completed)

      • main.cpp with MatMul.cl
      • No limit on input size
    2. Matrix Multiplication Tiling in local memory (Completed)

      • main2.cpp with MatMul2.cl
      • No limit on input size
    3. Matrix Multiplication Tiling in local memory with register-level tiling (Verification Required)

      • main3.cpp with MatMul3.cl
    4. Register Blocking Matrix Multiplication (Verification Required)

      • main4.cpp with MatMul4.cl
    • Time check for A[1024, 1024] * B[1024, 1024] = C[1024, 1024] (wo: excluding data transfer time to and from the device)
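
The difference between the naive and tiled variants can be sketched on the CPU as follows. This is an illustration only, not the code in MatMul.cl or MatMul2.cl: the names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical. In an OpenCL kernel the sub-blocks visited by the inner loops would typically be staged in `__local` memory by a work-group; here they are simply read from the input lists. The `min(...)` bounds are what let the tiled version accept sizes that are not a multiple of the tile.

```python
def matmul_naive(A, B):
    """C = A * B with one accumulator per output element."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_tiled(A, B, tile=2):
    """Same product, computed tile by tile (tile size is an assumption)."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, tile):          # tile row of C
        for j0 in range(0, N, tile):      # tile column of C
            for k0 in range(0, K, tile):  # walk along the shared dimension
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

Both functions should agree for any compatible shapes, which makes the naive version a convenient oracle when verifying the tiled and register-blocked kernels.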

OpenCL Convolution (in progress)

  1. GEMM Convolution (Completed)

    • Process: im2col -> Matrix Multiplication -> col2im
    • The Matrix Multiplication step can be replaced with a faster kernel
  2. Naive Convolution (Completed)

  3. FFT Convolution (Planned)

  4. Winograd Convolution (Planned)

    • Time check (input[1, 3, 512, 512], kernel[3, 3, 3, 3], output[1, 3, 510, 510])
      • GEMM Conv2d (gpu) : 1.76237 [msec]
      • Naive Conv2d (gpu) : 0.22528 [msec]
      • Naive Conv2d (cpu) : 22.32400 [msec]
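
The two completed approaches can be compared with a small CPU sketch. It assumes a single image in CHW layout, stride 1, and no padding, which matches the 512 -> 510 output size above for a 3x3 kernel. All function names are hypothetical and not taken from the repository. The final step, which the process line above calls col2im, is a plain reshape of the matrix product back to [O, OH, OW] in this sketch.

```python
def conv2d_naive(x, w):
    """x: [C][H][W], w: [O][C][KH][KW] -> out: [O][OH][OW] (valid, stride 1)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    O, KH, KW = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(O)]
    for o in range(O):
        for r in range(OH):
            for q in range(OW):
                acc = 0.0
                for c in range(C):
                    for i in range(KH):
                        for j in range(KW):
                            acc += x[c][r + i][q + j] * w[o][c][i][j]
                out[o][r][q] = acc
    return out


def im2col(x, KH, KW):
    """Unfold x into a (C*KH*KW) x (OH*OW) matrix of input patches."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OH, OW = H - KH + 1, W - KW + 1
    return [[x[c][r + i][q + j] for r in range(OH) for q in range(OW)]
            for c in range(C) for i in range(KH) for j in range(KW)]


def conv2d_gemm(x, w):
    """im2col -> matrix multiplication -> reshape back to [O][OH][OW]."""
    O, C, KH, KW = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    OH, OW = len(x[0]) - KH + 1, len(x[0][0]) - KW + 1
    cols = im2col(x, KH, KW)
    # Flatten each filter into one row so the convolution becomes W_mat * cols.
    w_mat = [[w[o][c][i][j] for c in range(C) for i in range(KH)
              for j in range(KW)] for o in range(O)]
    flat = [[sum(row[k] * cols[k][p] for k in range(len(cols)))
             for p in range(OH * OW)] for row in w_mat]
    return [[f[r * OW:(r + 1) * OW] for r in range(OH)] for f in flat]
```

The GEMM route does the same arithmetic as the naive loop but first materializes the patch matrix, so its speed depends heavily on the matrix multiplication kernel used in the middle step.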

OpenCL Bicubic Interpolation

  • Bicubic Interpolation
  • Test code: validates results against PyTorch bicubic interpolation
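
A minimal CPU sketch of bicubic resizing for one 2D channel is given below. It makes several assumptions that are not stated in this README: the cubic convolution coefficient is a = -0.75, source coordinates use the half-pixel mapping that corresponds to `align_corners=False`, and out-of-range taps are clamped to the border. The names `cubic_weights` and `resize_bicubic` are hypothetical. Exact agreement with PyTorch depends on these choices matching the options used in the validation script.

```python
import math


def cubic_weights(t, a=-0.75):
    """Weights of the 4 taps at offsets -1, 0, 1, 2 for a fraction t in [0, 1)."""
    def near(d):  # kernel for |d| <= 1
        return ((a + 2.0) * d - (a + 3.0)) * d * d + 1.0

    def far(d):   # kernel for 1 < |d| < 2
        return ((a * d - 5.0 * a) * d + 8.0 * a) * d - 4.0 * a

    return [far(t + 1.0), near(t), near(1.0 - t), far(2.0 - t)]


def resize_bicubic(img, out_h, out_w):
    """Resize one channel img[H][W] to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])

    def clamp(v, hi):
        return max(0, min(hi, v))

    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        sy = (oy + 0.5) * in_h / out_h - 0.5   # half-pixel source coordinate
        iy = math.floor(sy)
        wy = cubic_weights(sy - iy)
        for ox in range(out_w):
            sx = (ox + 0.5) * in_w / out_w - 0.5
            ix = math.floor(sx)
            wx = cubic_weights(sx - ix)
            acc = 0.0
            for m in range(4):
                row = img[clamp(iy - 1 + m, in_h - 1)]
                for n in range(4):
                    acc += wy[m] * wx[n] * row[clamp(ix - 1 + n, in_w - 1)]
            out[oy][ox] = acc
    return out
```

Because the four weights always sum to 1, a constant image stays constant after resizing, which is a quick sanity check before comparing against a framework.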

OpenCL Nearest Neighbor Interpolation

  • Nearest Neighbor Interpolation
  • Test code: validates results against PyTorch nearest neighbor interpolation
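
Nearest neighbor resizing can be sketched for one 2D channel as shown here. The index rule `src = floor(dst * in_size / out_size)` is an assumption chosen to follow the common "nearest" convention; the kernel in this repository may round differently. The name `resize_nearest` is hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Resize one channel img[H][W] by copying the nearest source pixel."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        sy = min(oy * in_h // out_h, in_h - 1)  # integer floor of oy * in_h / out_h
        out.append([img[sy][min(ox * in_w // out_w, in_w - 1)]
                    for ox in range(out_w)])
    return out
```

For example, upscaling a 2x2 image to 4x4 with this rule repeats every source pixel in a 2x2 block. Integer arithmetic is used here, so results may differ from a floating-point implementation for scale factors that are not exactly representable.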

OpenCL BGR2YCbCr

  • Converts image data from BGR to YCbCr and changes the layout from NHWC to NCHW
  • Test code: validates results against a Python BGR2YCbCr implementation
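
The conversion and layout change can be sketched together for a single image. The coefficients below are the full-range BT.601 (JPEG/JFIF) values, which is an assumption: this README does not say which YCbCr variant the kernel uses. The name `bgr_to_ycbcr_chw` is hypothetical, and the batch dimension N is omitted; it would add one outer loop.

```python
def bgr_to_ycbcr_chw(img):
    """img[H][W][3] in B, G, R order -> [3][H][W] planes in Y, Cb, Cr order."""
    h, w = len(img), len(img[0])
    y_plane = [[0.0] * w for _ in range(h)]
    cb_plane = [[0.0] * w for _ in range(h)]
    cr_plane = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            b, g, red = img[r][c]  # interleaved HWC pixel, BGR order
            y_plane[r][c] = 0.299 * red + 0.587 * g + 0.114 * b
            cb_plane[r][c] = 128.0 - 0.168736 * red - 0.331264 * g + 0.5 * b
            cr_plane[r][c] = 128.0 + 0.5 * red - 0.418688 * g - 0.081312 * b
    return [y_plane, cb_plane, cr_plane]  # planar CHW layout
```

Writing each component into its own plane is what turns the interleaved HWC input into the planar CHW output in a single pass.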

OpenCL YCbCr2BGR

  • Converts image data from YCbCr to BGR and changes the layout from NCHW to NHWC
  • Test code: validates results against a Python YCbCr2BGR implementation
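
The reverse direction can be sketched the same way for a single image. As an assumption, it uses the inverse of the full-range BT.601 (JPEG/JFIF) transform; the variant used by the kernel is not stated in this README. The name `ycbcr_chw_to_bgr` is hypothetical, results are left as unclamped floats, and the batch dimension N is omitted.

```python
def ycbcr_chw_to_bgr(planes):
    """planes[3][H][W] in Y, Cb, Cr order -> img[H][W][3] in B, G, R order."""
    y_plane, cb_plane, cr_plane = planes
    h, w = len(y_plane), len(y_plane[0])
    img = []
    for r in range(h):
        row = []
        for c in range(w):
            y = y_plane[r][c]
            cb = cb_plane[r][c] - 128.0  # remove the chroma offset
            cr = cr_plane[r][c] - 128.0
            red = y + 1.402 * cr
            g = y - 0.344136 * cb - 0.714136 * cr
            b = y + 1.772 * cb
            row.append([b, g, red])      # interleaved HWC pixel, BGR order
        img.append(row)
    return img
```

A kernel that writes 8-bit output would additionally round and clamp each value to the range 0 to 255.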

OpenCL Concat

  • Concatenates two tensors along the channel axis
  • Test code: validates results against a Python concat implementation
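
Channel-axis concatenation on flat buffers can be sketched as below, assuming both inputs are stored contiguously in NCHW order and share N, H, and W. The function name and parameter names are hypothetical. Each loop iteration computes one output element from its flat index, which mirrors how a one-work-item-per-output kernel might be organized.

```python
def concat_channels(a, b, n, ca, cb, h, w):
    """a: flat [n, ca, h, w], b: flat [n, cb, h, w] -> flat [n, ca + cb, h, w]."""
    hw = h * w
    c_out = ca + cb
    if len(a) != n * ca * hw or len(b) != n * cb * hw:
        raise ValueError("buffer length does not match the given shape")
    out = [0.0] * (n * c_out * hw)
    for gid in range(len(out)):       # gid plays the role of get_global_id(0)
        p = gid % hw                  # position inside the H*W plane
        c = (gid // hw) % c_out       # output channel
        batch = gid // (hw * c_out)   # batch index
        if c < ca:
            out[gid] = a[(batch * ca + c) * hw + p]
        else:
            out[gid] = b[(batch * cb + (c - ca)) * hw + p]
    return out
```

The first `ca` output channels of every batch entry come from the first tensor and the remaining `cb` channels from the second, so the inputs may have different channel counts.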
