Calling CUDA from C# - No More Codes

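I recently started looking into CUDA to accelerate some algorithms, which is how this note came about. CUDA code is compiled by nvcc, so C# cannot call CUDA functions directly; inside a CUDA project, only C or C++ code can call them. The workaround is to add a C/C++ wrapper function to the CUDA project, build the project as a DLL with a C interface, and then use DllImport in C# to call the C function, which hands the work over to CUDA. The walkthrough below uses Visual Studio 2017 Community and the CUDA 9.2 SDK. The complete code is on GitHub (https://github.com/ghostyguo/CudaDotNet).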

Creating the CUDA/C++ DLL

First, create a blank Visual Studio solution named CudaDotNet.

Under the CudaDotNet solution, create a CUDA project named CudaKernel. It automatically generates a kernel.cu file.

(Screenshot: Solution Explorer after the project has been created.)

To check that the CUDA environment works, first edit main() in kernel.cu and add a getchar() call so that the program pauses when it finishes.

Build and run it. If the result is printed, the CUDA environment is set up correctly.

Later this CudaKernel project will be packaged as a DLL, so main() is no longer needed, and addWithCuda() cannot output to stderr when it runs inside the DLL. Comment out or delete main() and every fprintf(stderr, …) call inside addWithCuda().
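Removing the fprintf(stderr, …) calls also removes the description of which step failed. One possible way to keep that information is to store the message in a buffer and let the caller read it through another exported function. The following is only a minimal host-side sketch: g_lastError, recordError() and GetLastErrorText() are hypothetical names that do not appear in the original project, the buffer is not thread-safe, and the DLL export attribute is left out so the snippet stays portable. In the real project the message text could come from the CUDA runtime's cudaGetErrorString().

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical module-level buffer holding the most recent error text.
// Not thread-safe: concurrent callers would overwrite each other's message.
static char g_lastError[256] = "";

// Hypothetical replacement for fprintf(stderr, ...):
// store the message instead of printing it.
static void recordError(const char* step, int statusCode)
{
    std::snprintf(g_lastError, sizeof(g_lastError),
                  "%s failed (status %d)", step, statusCode);
}

// Hypothetical export that a caller could use to read the stored message.
// extern "C" keeps the symbol name unmangled.
extern "C" const char* GetLastErrorText()
{
    return g_lastError;
}
```

With something like this in place, each error branch in addWithCuda() could call recordError() before jumping to the cleanup label.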

Add a Visual C++ file named CudaKernel.cpp to the CudaKernel project.

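Using kernel.cu as a reference, enter the following code in CudaKernel.cpp. The AddVec() function is what the resulting DLL exposes to the C# program; it forwards the call to addWithCuda(), which runs the computation through CUDA: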

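#include <iostream>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <vector_types.h>
//#include <helper_cuda.h>

// Marks a function as exported from the DLL (MSVC syntax).
#define DLLEXPORT __declspec(dllexport)

// Defined in kernel.cu; extern "C" keeps the symbol name unmangled.
extern "C" DLLEXPORT cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size);

// Entry point called from C#: forwards to addWithCuda() and reports success or failure.
extern "C" DLLEXPORT bool AddVec(int* c, int* a, int* b, int size)
{
    cudaError_t cudaStatus = addWithCuda(c, a, b, size);
    return (cudaStatus == cudaSuccess);
}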

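AddVec() returns a C++ bool, which is one byte wide, whereas P/Invoke treats a bool return value as a 4-byte Win32 BOOL unless told otherwise. One way to sidestep that difference is to return an int status code instead. The sketch below illustrates the idea under stated assumptions: AddVecStatus() and addWithCudaStub() are hypothetical names, the stub is a CPU stand-in so the snippet builds without the CUDA toolkit, and the DLLEXPORT attribute is omitted for portability. It relies on the fact that cudaSuccess has the value 0.

```cpp
#include <cassert>

// Hypothetical CPU stand-in for addWithCuda(), so this sketch builds
// without the CUDA toolkit. Returns 0 on success, like cudaSuccess.
static int addWithCudaStub(int* c, const int* a, const int* b, unsigned int size)
{
    for (unsigned int i = 0; i < size; ++i) {
        c[i] = a[i] + b[i];
    }
    return 0;
}

// Hypothetical variant of AddVec() that returns an int status code:
//   0  = success
//  -1  = invalid argument (null pointer or non-positive size)
//  any other value = status code from the add routine
extern "C" int AddVecStatus(int* c, const int* a, const int* b, int size)
{
    if (c == nullptr || a == nullptr || b == nullptr || size <= 0) {
        return -1;
    }
    return addWithCudaStub(c, a, b, static_cast<unsigned int>(size));
}
```

On the C# side such a function would be declared with an int return type, and the caller would compare the result with 0.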

Edit kernel.cu and add extern "C" in front of the addWithCuda() declaration there as well.

The complete code is as follows:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

// extern "C" disables C++ name mangling so CudaKernel.cpp can link to this function by name.
extern "C" cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size);

// Each thread adds one pair of elements; the thread index selects the element.
__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

extern "C" cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size)
{
    int *dev_a = 0;
    int *dev_b = 0;
    int *dev_c = 0;
    cudaError_t cudaStatus;

    // Choose which GPU to run on, change this on a multi-GPU system.
    cudaStatus = cudaSetDevice(0);
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }

    // Allocate GPU buffers for three vectors (two input, one output).
    cudaStatus = cudaMalloc((void**)&dev_c, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }
    cudaStatus = cudaMalloc((void**)&dev_a, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }
    cudaStatus = cudaMalloc((void**)&dev_b, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }

    // Copy input vectors from host memory to GPU buffers.
    cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }
    cudaStatus = cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }

    // Launch a kernel on the GPU with one thread for each element.
    addKernel<<<1, size>>>(dev_c, dev_a, dev_b);

    // Check for any errors launching the kernel.
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }

    // cudaDeviceSynchronize waits for the kernel to finish, and returns
    // any errors encountered during the launch.
    cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }

    // Copy output vector from GPU buffer to host memory.
    cudaStatus = cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);
    if (cudaStatus != cudaSuccess) {
        goto Error;
    }

Error:
    // Release the device buffers on both the success and the failure path.
    cudaFree(dev_c);
    cudaFree(dev_a);
    cudaFree(dev_b);
    return cudaStatus;
}
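Because the kernel is launched as a single block and indexes with threadIdx.x alone, size is bounded by the per-block thread limit (1024 on GPUs of compute capability 2.0 and later). The usual way to handle longer vectors is to split the work into several blocks, compute a global index from the block and thread indices, and guard against the last block running past the end. The host-only sketch below emulates that indexing on the CPU so the arithmetic can be checked without a GPU; blocksFor() and addEmulated() are hypothetical names and are not part of the original project.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed so that blocks * threadsPerBlock >= n (ceiling division).
unsigned int blocksFor(unsigned int n, unsigned int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of a multi-block launch: the two loops stand in for the
// block index and the thread index that the GPU would supply.
// Assumes b has at least as many elements as a.
std::vector<int> addEmulated(const std::vector<int>& a,
                             const std::vector<int>& b,
                             unsigned int threadsPerBlock)
{
    const unsigned int n = static_cast<unsigned int>(a.size());
    std::vector<int> c(n, 0);
    const unsigned int blocks = blocksFor(n, threadsPerBlock);

    for (unsigned int blockIndex = 0; blockIndex < blocks; ++blockIndex) {
        for (unsigned int threadIndex = 0; threadIndex < threadsPerBlock; ++threadIndex) {
            // Global element index, as a kernel would compute it.
            const unsigned int i = blockIndex * threadsPerBlock + threadIndex;
            // Bounds check: the last block may cover indices past the end.
            if (i < n) {
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

In an actual kernel this would correspond to computing int i = blockIdx.x * blockDim.x + threadIdx.x, passing size as an extra kernel argument for the bounds check, and launching with addKernel<<<blocks, threadsPerBlock>>>(...).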

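In the CudaKernel project properties, set the Configuration Type to Dynamic Library (.dll) and enable CLR support; otherwise the build will not produce a library that .NET can use.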

Build the project to get the DLL file.

Creating the C# Project

Under the CudaDotNet solution, add a C# Windows Forms project named CudaUI.

Rename Form1 to MainForm, then add a TextBox named tbOutput and a button named btnRun to the form.

Add the CudaKernel.dll built earlier to the project's References.

At the top of the MainForm source file, add the following line:

using System.Runtime.InteropServices;

At the start of the MainForm class, add the DllImport declaration:

[DllImport("CudaKernel.dll", EntryPoint = "AddVec")]
[return: MarshalAs(UnmanagedType.I1)] // the native function returns a 1-byte C++ bool
private static extern bool AddVec(int[] c, int[] a, int[] b, int size);

Add the code for the btnRun Click event. The complete code is as follows:

using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace CudaUI
{
    public partial class MainForm : Form
    {
        // AddVec is the C function exported by CudaKernel.dll.
        [DllImport("CudaKernel.dll", EntryPoint = "AddVec")]
        [return: MarshalAs(UnmanagedType.I1)] // the native function returns a 1-byte C++ bool
        private static extern bool AddVec(int[] c, int[] a, int[] b, int size);

        public MainForm()
        {
            InitializeComponent();
        }

        private void btnRun_Click(object sender, EventArgs e)
        {
            int arraySize = 5;
            int[] a = new int[] { 1, 2, 3, 4, 5 };
            int[] b = new int[] { 10, 20, 30, 40, 50 };
            int[] c = new int[arraySize];

            // c is filled in by the DLL; result is false if any CUDA call failed.
            bool result = AddVec(c, a, b, arraySize);

            tbOutput.Text = "";
            if (!result)
            {
                tbOutput.Text = "AddVec failed";
                return;
            }
            for (int i = 0; i < arraySize; i++)
            {
                tbOutput.Text += c[i].ToString() + " ";
            }
        }
    }
}

Set CudaUI as the startup project and run it. If an error appears at this point, the cause is the build platform setting.

Open the CudaUI project properties and set the platform target to match CudaKernel (x64 here; the default is AnyCPU).

When everything works, tbOutput shows the element-wise sums: 11 22 33 44 55

Posted by ghostyguo on PIXNET.
