Allocating device (GPU) memory

Example code:

  #define N 10
  char *p = NULL;
  cudaMalloc((void **)&p, N * sizeof(char));

Why do we need a void**?

All CUDA API functions return an error code (or cudaSuccess if no error occurred). All other parameters are passed by reference. However, in plain C you cannot have references, that’s why you have to pass the address of the variable that you want the return information to be stored in. Since you are returning a pointer, you need to pass a double-pointer.

Another well-known function which operates on addresses for the same reason is the scanf function. How many times have you forgotten to write this & before the variable that you want to store the value to? ;)
The core point of the above, translated:
cudaMalloc returns an error code rather than the pointer itself, and its outputs are passed by reference. In plain C, if a function is to modify a value, you must pass that value's address; so for it to modify a pointer, you must pass the address of the pointer, i.e. a double pointer.
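
A minimal sketch of this pattern (the size 10 and the variable names are illustrative): cudaMalloc reports success or failure through its cudaError_t return value, and it delivers the device address by writing through the address you pass in.

  #include <stdio.h>
  #include <cuda_runtime.h>

  int main(void) {
      char *p = NULL;                        // will receive a device address

      // cudaMalloc writes the device address into p via &p
      // and returns an error code instead of the pointer itself.
      cudaError_t err = cudaMalloc((void **)&p, 10 * sizeof(char));
      if (err != cudaSuccess) {
          printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
          return 1;
      }

      cudaFree(p);                           // release the device memory
      return 0;
  }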

Using device memory on the GPU:

Example code:

Encrypt the characters on the GPU.
Once the operation is done, the CPU prints the result to the screen.

  __global__ void encrypt_p(char *s1, char *s2) {
      int idx = threadIdx.x;       // one thread per character
      s2[idx] = s1[idx] + 3;       // read the input from s1, write the shifted character to s2
  }
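
The kernel above indexes with threadIdx.x only, so it assumes the whole string fits in a single thread block (typically at most 1024 threads). A hedged variant for longer inputs, assuming an extra length parameter n that the original example does not have:

  __global__ void encrypt_p(char *s1, char *s2, int n) {
      int idx = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index across all blocks
      if (idx < n) {                                     // guard: surplus threads do nothing
          s2[idx] = s1[idx] + 3;
      }
  }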

Allocate device memory

  char *d_A = NULL;
  char *d_B = NULL;
  cudaMalloc((void **)&d_A, strlen(A) * sizeof(char));
  cudaMalloc((void **)&d_B, strlen(A) * sizeof(char));

Copy the string from host memory to the string in device memory

  cudaMemcpy(d_A, A, strlen(A) * sizeof(char), cudaMemcpyHostToDevice);

Operate on the string in device memory on the GPU

  encrypt_p<<<1, strlen(A)>>>(d_A, d_B);
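
A kernel launch returns to the host immediately and does not report errors by itself. A hedged addition (not in the original) is to check the launch and wait for completion explicitly; the cudaMemcpy below would also wait for the kernel, since both run on the default stream:

  cudaError_t err = cudaGetLastError();      // catches bad launch configurations
  if (err != cudaSuccess) {
      printf("kernel launch failed: %s\n", cudaGetErrorString(err));
  }
  cudaDeviceSynchronize();                   // block the host until the kernel finishes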

Copy the string in device memory back to the string in host memory

  cudaMemcpy(B, d_B, strlen(A) * sizeof(char), cudaMemcpyDeviceToHost);
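
Putting the steps together, here is a minimal end-to-end host sketch. The input string A, the host buffer B, and the trailing '\0' handling are assumptions added for illustration; they are not part of the original snippets.

  #include <stdio.h>
  #include <string.h>
  #include <cuda_runtime.h>

  __global__ void encrypt_p(char *s1, char *s2) {
      int idx = threadIdx.x;           // one thread per character
      s2[idx] = s1[idx] + 3;           // shift each character by 3
  }

  int main(void) {
      const char A[] = "hello cuda";   // host input (illustrative)
      char B[sizeof(A)];               // host output buffer, same size as A
      size_t len = strlen(A);

      char *d_A = NULL;
      char *d_B = NULL;
      cudaMalloc((void **)&d_A, len * sizeof(char));
      cudaMalloc((void **)&d_B, len * sizeof(char));

      cudaMemcpy(d_A, A, len * sizeof(char), cudaMemcpyHostToDevice);   // host -> device
      encrypt_p<<<1, (unsigned int)len>>>(d_A, d_B);                    // one block, len threads
      cudaMemcpy(B, d_B, len * sizeof(char), cudaMemcpyDeviceToHost);   // device -> host

      B[len] = '\0';                   // terminate so the CPU can print the result
      printf("%s\n", B);

      cudaFree(d_A);
      cudaFree(d_B);
      return 0;
  }

Compile with nvcc, e.g. nvcc encrypt.cu -o encrypt (the file name is assumed).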