Can the GPU use the main computer RAM (as an extension)?
Short answer: No, you can't.
Longer answer: The bandwidth, and more importantly the latency, between the GPU and RAM over the PCIe bus is an order of magnitude worse than between the GPU and VRAM, so if you were going to do that you might as well do the number crunching on the CPU.
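If you want to see the gap on your own machine, a minimal sketch (assuming CUDA; error handling omitted) that times the two copy paths with CUDA events might look like this. Note the device-to-device figure is copy throughput, so the raw VRAM traffic is about double it:

#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t n = 256 << 20;  // 256 MB test buffer
    float *host = nullptr, *dev_a = nullptr, *dev_b = nullptr;
    cudaMallocHost(&host, n);  // pinned, so the PCIe copy runs at full link speed
    cudaMalloc(&dev_a, n);
    cudaMalloc(&dev_b, n);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    float ms = 0.0f;

    // Host -> device: limited by the PCIe link.
    cudaEventRecord(t0);
    cudaMemcpy(dev_a, host, n, cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("PCIe copy: %.1f GB/s\n", n / ms / 1e6);

    // Device -> device: limited by VRAM bandwidth.
    cudaEventRecord(t0);
    cudaMemcpy(dev_b, dev_a, n, cudaMemcpyDeviceToDevice);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("VRAM copy: %.1f GB/s\n", n / ms / 1e6);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFreeHost(host);
}

On a typical desktop the first number lands in the low tens of GB/s while the second is an order of magnitude higher, which is the gap described above.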
The CPU can use part of the VRAM (the part mapped into the PCI aperture, usually 256 MB) directly as RAM, but it will be slower than regular RAM because PCIe is the bottleneck. Using it for something like swap might be feasible.
It used to be possible to increase the memory aperture size by changing the strap bits on the GPU BIOS, but I haven't tried this since the Nvidia Fermi (GeForce 4xx) generation. If it still works, your motherboard BIOS must also be up to the task of mapping apertures larger than standard (this is highly unlikely to have ever been tested on a laptop).
For example, a Xeon Phi compute card needs to map its entire RAM into the PCI aperture, so it needs a 64-bit capable BIOS in the host that knows how to map apertures above the traditional 4 GB (32-bit) boundary.
Yes. This is the "shared" memory between the CPU and GPU. There is always going to be a small amount required as buffers to transfer data to the GPU, but it can also be used as a slower "backing" to the graphics card, in much the same way as a pagefile is a slower backing store to your main memory.
You can find shared memory in use in the built-in Windows Task Manager by going to the Performance tab and clicking on your GPU.
Shared memory will be slower than your GPU memory, though probably faster than your disk. Shared memory is your CPU memory, which may operate at up to 30 GB/s on a reasonably new machine, whereas your GPU memory can probably do 256 GB/s or more. You will also be limited by the link between your GPU and CPU, the PCIe bridge. That may be your limiting factor, and you will need to know whether you have a Gen3 or Gen4 PCIe link and how many lanes (usually "x16") it is using to work out the total theoretical bandwidth between CPU and GPU memory.
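As a rough back-of-the-envelope, here is a minimal sketch of that calculation in plain C++ (the per-lane figures already account for PCIe's 128b/130b encoding overhead):

#include <cstdio>

int main() {
    // Approximate usable bandwidth per PCIe lane, in GB/s,
    // after 128b/130b encoding overhead:
    const double gen3_per_lane = 0.985;  // 8 GT/s per lane
    const double gen4_per_lane = 1.969;  // 16 GT/s per lane
    const int lanes = 16;                // a typical x16 GPU slot

    printf("PCIe Gen3 x%d: ~%.1f GB/s each way\n", lanes, gen3_per_lane * lanes);
    printf("PCIe Gen4 x%d: ~%.1f GB/s each way\n", lanes, gen4_per_lane * lanes);
    return 0;
}

That works out to roughly 15.8 GB/s for Gen3 x16 and 31.5 GB/s for Gen4 x16, both far below the 256 GB/s or more of on-board VRAM.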
As far as I know, you can share the host's RAM as long as it is page-locked (pinned) memory. In that case, data transfer will be much faster because you don't need to explicitly transfer data; you just need to make sure that you synchronize your work (with cudaDeviceSynchronize, for instance, if using CUDA).
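If the data already lives in an ordinary (pageable) allocation, it can also be pinned after the fact. A minimal sketch using cudaHostRegister (error handling omitted; the kernel launch is elided):

#include <cuda_runtime.h>
#include <vector>

int main() {
    std::vector<float> host_buf(1 << 20);  // an existing, ordinary host allocation

    // Page-lock the existing buffer and map it for direct device access.
    cudaHostRegister(host_buf.data(), host_buf.size() * sizeof(float),
                     cudaHostRegisterMapped);

    // Get a device-side pointer aliasing the pinned host buffer.
    float *dev_ptr = nullptr;
    cudaHostGetDevicePointer(&dev_ptr, host_buf.data(), 0);

    // ... launch kernels that read/write dev_ptr here ...
    cudaDeviceSynchronize();  // make sure the GPU is done before touching host_buf

    cudaHostUnregister(host_buf.data());
}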
Now, for this question:
I am just wondering, if the total memory (all the variables and all the arrays) in my kernel hits 6 GB of the GPU RAM, can I somehow use the CPU’s one?
I don't know if there is a way to "extend" the GPU memory. I don't think the GPU can use pinned memory that is bigger than its own, but I am not certain. What I think you could do in this case is to work in batches: can your work be split so that you only work on 6 GB at a time, save the result, and then work on the next 6 GB? If so, working in batches might be a solution.
For example, you could implement a simple batching scheme like this:
#include <cuda_runtime.h>

constexpr size_t ns = 128;  // elements per batch in this example

// Hypothetical helpers standing in for real staging code:
void populate_data(float *buf) { for (size_t i = 0; i < ns; ++i) buf[i] = (float)i; }
void save_data(const float *buf) { (void)buf; /* write results to another array in RAM */ }

// Hypothetical kernel: doubles each element in place.
__global__ void kernel(float *data) { data[threadIdx.x] *= 2.0f; }

int main() {
    float *hst_ptr = nullptr;
    float *dev_ptr = nullptr;
    size_t data_size = ns * sizeof(*hst_ptr);

    // Pinned ("page-locked") host allocation, mapped into the GPU address space.
    cudaHostAlloc((void**)&hst_ptr, data_size, cudaHostAllocMapped);
    cudaHostGetDevicePointer(&dev_ptr, hst_ptr, 0);

    // Say that we want to work on 4 batches of 128 elements.
    for (size_t cnt = 0; cnt < 4; ++cnt) {
        populate_data(hst_ptr);      // read the next batch from another array in RAM
        kernel<<<1, ns>>>(dev_ptr);  // the kernel accesses host memory over PCIe
        cudaDeviceSynchronize();     // wait until the GPU is done with this batch
        save_data(hst_ptr);          // write the result to another array in RAM
    }

    cudaFreeHost(hst_ptr);
}