Intel iGPU+dGPU Multi-Adapter Tech Shows Promise Thanks to Its Realistic Goals

Intel is revisiting the concept of asymmetric multi-GPU introduced with DirectX 12. The company posted an elaborate technical slide deck that it had originally planned to present to game developers at the now-cancelled GDC 2020. The technology shows promise because the company isn't insulting developers' intelligence by proposing that the otherwise dormant iGPU shoulder the game's entire rendering pipeline for a single-digit percentage performance boost. Rather, it has come up with innovative changes to the rendering path so that only certain lightweight compute tasks are passed on to the iGPU's execution units, where they can make a more meaningful contribution to overall performance. To that end, Intel is working on an SDK that can be integrated with existing game engines.

Microsoft DirectX 12 introduced the holy grail of multi-GPU technology with its Explicit Multi-Adapter specification. This allows game engines to send rendering traffic to any combination or make of GPUs that support the API, to achieve a performance uplift over a single GPU. It was met with a lukewarm reception from AMD and NVIDIA, and far too few DirectX 12 games actually support it. Intel proposes a specialization of the explicit multi-adapter approach, in which the iGPU's execution units process various low-bandwidth tasks during both the rendering and post-processing stages, such as occlusion culling, AI, and game physics. Intel's method leverages cross-adapter shared resources that reside in system (main) memory, along with D3D12 asynchronous compute, which uses separate command queues for rendering and compute work.
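Why offload only lightweight work? A simplified frame-time model helps. The sketch below is an illustrative assumption, not Intel's methodology: it assumes a frame consists of dGPU rendering plus one compute pass, that the compute pass can be moved to the iGPU and run concurrently (pipelined one frame ahead of rendering), and that the offloaded pass pays a fixed cost for sharing its results through system memory. The class, function names and all timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameWork:
    """Hypothetical per-frame costs, in milliseconds."""
    render_ms: float        # graphics work that stays on the dGPU
    compute_dgpu_ms: float  # offloadable compute pass, if run on the dGPU
    compute_igpu_ms: float  # the same pass on the slower iGPU
    transfer_ms: float      # assumed cost of sharing results via system memory


def frame_time_single_gpu(w: FrameWork) -> float:
    # Everything runs on the dGPU, one pass after the other.
    return w.render_ms + w.compute_dgpu_ms


def frame_time_offloaded(w: FrameWork) -> float:
    # Pipelined: the iGPU computes the next frame's results while the dGPU
    # renders the current frame, so the slower stage sets the pace.
    # In this model the transfer cost is charged to the iGPU stage only.
    return max(w.render_ms, w.compute_igpu_ms + w.transfer_ms)


def speedup(w: FrameWork) -> float:
    # Values above 1.0 mean offloading helps; below 1.0 means it hurts.
    return frame_time_single_gpu(w) / frame_time_offloaded(w)


if __name__ == "__main__":
    light = FrameWork(render_ms=14.0, compute_dgpu_ms=2.0,
                      compute_igpu_ms=8.0, transfer_ms=1.0)
    heavy = FrameWork(render_ms=6.0, compute_dgpu_ms=10.0,
                      compute_igpu_ms=40.0, transfer_ms=4.0)
    for name, work in (("light", light), ("heavy", heavy)):
        print(name, frame_time_single_gpu(work), frame_time_offloaded(work))
```

With these made-up numbers, the light pass takes a 16 ms frame down to 14 ms because the iGPU's 9 ms of work hides behind 14 ms of rendering, while moving the heavy pass makes the frame take 44 ms instead of 16 ms. Note that pipelining one frame ahead also adds up to a frame of latency, which this model does not count.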

Intel has developed simple code that game engine developers can use to integrate the new tech, including code for creating cross-adapter shared heaps and resources. The presentation also includes examples of how to leverage async compute and get the lightweight rendering and compute paths to work with as little latency as possible. Intel also developed code for cross-adapter synchronization, called Intel Command Queue Throttle. This extension is meant to maintain performance and keep frame times low when the load is uneven between the iGPU and dGPU.

All current Intel Graphics drivers include support for the extension, and Intel has started giving out headers for it through its developer support. Intel notes that its method can be used for various kinds of async compute tasks such as shadows, AI, mesh deformation, and physics. Load on the system's PCIe and memory bandwidth is kept low because the iGPU isn't made to handle heavyweight work such as texture filtering.

Intel iGPUs are approaching the 1 TFLOPs compute power barrier with Gen11 and the upcoming Xe-based iGPU debuting with "Tiger Lake." That's a lot of compute power to leave unused. Intel's tech could prove particularly useful in notebooks with entry- through mid-range discrete GPUs, as all Intel mobile processors pack iGPUs and implement dynamic switching between the iGPU and dGPU.

The complete Intel presentation follows.
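The article does not describe how Intel Command Queue Throttle works internally, so what follows is only a generic illustration of the problem it targets. A common way to pace two queues that share data is a small ring of shared buffers, each guarded by a fence: the consumer (here the dGPU) waits until the producer (the iGPU) has signaled that a buffer is full, and the producer waits until the consumer has signaled that a slot is free. The Python model below simulates that arrangement with made-up timings; `simulate_cross_adapter`, `igpu_ms`, `dgpu_ms` and `max_in_flight` are hypothetical names, and no Direct3D 12 API is called.

```python
from typing import List, Tuple


def simulate_cross_adapter(
    igpu_ms: List[float],
    dgpu_ms: List[float],
    max_in_flight: int,
) -> Tuple[List[float], List[float]]:
    """Return (igpu_done, dgpu_done) completion times in ms for each frame.

    Hypothetical model: the iGPU writes frame n's compute result into slot
    n % max_in_flight of a shared ring buffer. The dGPU waits on the iGPU's
    fence for frame n before consuming it, and the iGPU waits on the dGPU's
    fence for frame n - max_in_flight before reusing that slot.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    if len(igpu_ms) != len(dgpu_ms):
        raise ValueError("need one iGPU and one dGPU duration per frame")

    igpu_done: List[float] = []
    dgpu_done: List[float] = []
    for n, (cost_i, cost_d) in enumerate(zip(igpu_ms, dgpu_ms)):
        # iGPU: wait for its own previous job, then for its slot to be freed.
        start_i = igpu_done[n - 1] if n > 0 else 0.0
        if n >= max_in_flight:
            start_i = max(start_i, dgpu_done[n - max_in_flight])
        igpu_done.append(start_i + cost_i)

        # dGPU: wait for its previous frame and for the iGPU fence of frame n.
        prev_d = dgpu_done[n - 1] if n > 0 else 0.0
        start_d = max(prev_d, igpu_done[n])
        dgpu_done.append(start_d + cost_d)
    return igpu_done, dgpu_done


if __name__ == "__main__":
    igpu = [4.0, 4.0, 12.0, 4.0, 4.0]  # one unusually slow iGPU frame
    dgpu = [10.0] * 5
    for depth in (1, 2, 3):
        _, done = simulate_cross_adapter(igpu, dgpu, depth)
        print(depth, done)
```

In this example, a single shared buffer serializes the two adapters and the fifth frame finishes at 78 ms. Two buffers shrink the iGPU's 12 ms spike to a 2 ms stall (56 ms), and three hide it entirely (54 ms). Deeper rings absorb uneven load at the cost of extra latency and memory, which is the kind of trade-off any cross-adapter throttling scheme has to manage.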