How'd multi-GPU programming work with Vulkan?
Updated with more recent information, now that Vulkan exists.
There are two kinds of multi-GPU setups: where multiple GPUs are part of some SLI-style setup, and the kind where they are not. Vulkan supports both, and supports them both in the same computer. That is, you can have two NVIDIA GPUs that are SLI-ed together, and the Intel embedded GPU, and Vulkan can interact with them all.
Non-SLI setups
In Vulkan, there is something called the Vulkan instance. This represents the base Vulkan system itself; individual devices register themselves to the instance. The Vulkan instance system is, essentially, implemented by the Vulkan SDK.
Physical devices represent a specific piece of hardware that implements the interface to a GPU. Each piece of hardware that exposes a Vulkan implementation does so by registering its physical device with the instance system. You can query which physical devices are available, as well as some basic properties about them (their names, how much memory they offer, etc).
You then create logical devices for the physical devices you use. Logical devices are how you actually do stuff in Vulkan. They have queues, command buffers, etc. And each logical device is separate... mostly.
Now, you can bypass the whole "instance" thing and load devices manually. But you really shouldn't. At least, not unless you're at the end of development. Vulkan layers are far too critical for day-to-day debugging to just opt out of that.
There are mechanisms, core in Vulkan 1.1, that allow individual devices to be able to communicate some information to other devices. In 1.1, only certain kinds of information can be shared across physical devices (namely, fences and semaphores, and even then, only on Linux through sync files). While these APIs could provide a mechanism for sharing data between two physical devices, at present, the restriction on most forms of data sharing is that both physical devices must have matching UUIDs (and therefore are the same physical device).
SLI setups
Dealing with SLI is covered by two Vulkan 1.0 extensions: KHR_device_group
and KHR_device_group_creation
. The former is for dealing with "device groups" in Vulkan, while the latter is an instance extension for creating device-grouped devices. Both of these are core in Vulkan 1.1.
The idea with this is that the SLI aggregation is exposed as a single VkDevice
, which is created from a number of VkPhysicalDevice
s. Each internal physical device is a "sub-device". You can query sub-devices and some properties about them. Memory allocations are specific to a particular sub-device. Resource objects (buffers and images) are not specific to a sub-device, but they can be associated with different memory allocations on the different sub-devices.
Command buffers and queues are not specific to sub-devices; when you execute a CB on a queue, the driver figures out which sub-device(s) it will run on, and fills in the descriptors that use the images/buffers with the proper GPU pointers for the memory that those images/buffers have been bound to on those particular sub-devices.
Alternate-frame rendering is simply presenting images generated from one sub-device on one frame, then presenting images from a different sub-device on another frame. Split-frame rendering is handled by a more complex mechanism, where you define the memory for the destination image of a rendering command to be split among devices. You can even do this with presentable images.
In vulkan you need to enumerate the devices and select the one you want to work with. There will be nothing stopping you from trying to work with 2 different ones separately. Each vulkan call needs at least 1 parameter as context. The loader layer will then forward the call to the correct driver. Or you can load the functions for each device separately to avoid the loader's trampoline.
A generated frame will need to be forwarded to the card that is connected to the screen for display. So it's more likely that 1 GPU is responsible for graphics and the others are used for physics.Only a single device can be connected to a specific surface at a time so that device needs to get the rendered frame to copy it into the renderable image that gets pushed to the screen.