Portal: Prelude RTX is an impressive showcase of Nvidia's RTX Remix technology, which takes what was once a Source engine mod for Portal and gives it visual features and technology that rival, and even surpass, those of high-end AAA releases. It's truly spectacular, and hopefully the first of many remasters to come as the release of the RTX Remix modding tools nears.
More interestingly, though, Prelude is also the first game to support RTX IO, a GPU-accelerated decompression scheme running under Vulkan. It's essentially an Nvidia-branded counterpart to DirectStorage 1.2, which is also used in Ratchet and Clank: Rift Apart, launching on PC later this month. The goal of both is to shorten load times and deliver assets faster on PC, and RTX IO's inclusion here gives us a good excuse to see how the technology works.
Historically, loading meant transferring game data such as textures or models from a hard drive to system memory and then to the GPU, all under CPU control. This was a fairly high-latency, serial approach: the drive had to spin its platters, move its read head to locate the data, and then read it block by block, with data laid out on disk to minimize seeking.
This technique worked quite well for relatively small game assets loaded from hard drives. With games that are hundreds of gigabytes in size and packed with extremely detailed assets, however, all of that data needs to be compressed to make good use of available storage space and bandwidth. This means assets must be decompressed by the CPU before they can be used on the GPU, and the extra time and CPU load this imposes means the traditional approach is starting to break down.
Fortunately, the advent of fast, low-latency flash storage in SSDs means we no longer need to read data sequentially to minimize seek times, which opens the door to a new approach. First, we want to access data in parallel, massively reducing load times compared to the legacy Windows I/O stack. Second, we want to move data from storage to the GPU before it's decompressed. GPUs have many cores and handle massively parallel tasks like decompression better than CPUs, so this approach saves a lot of time. This is the model behind RTX IO and DirectStorage 1.2: it offers faster load times and, when used in-game for asset streaming, reduced CPU load, which can potentially improve performance.
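Decompression suits wide, parallel hardware because of independence: if a compressed stream is split into tiles that can each be inflated on their own, many workers can process them at once. The Python sketch below illustrates that idea on the CPU using standard zlib (Deflate) tiles and a thread pool. It is not GDeflate or RTX IO; the 64 KiB tile size and the names `compress_tiled` and `decompress_parallel` are hypothetical choices made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tile size for illustration; not taken from any real format spec.
TILE_SIZE = 64 * 1024


def compress_tiled(data: bytes, tile_size: int = TILE_SIZE) -> list[bytes]:
    """Compress each fixed-size tile independently, so any worker can
    inflate any tile without needing the tiles before it."""
    return [zlib.compress(data[i:i + tile_size])
            for i in range(0, len(data), tile_size)]


def decompress_parallel(tiles: list[bytes], workers: int = 8) -> bytes:
    """Inflate all tiles concurrently; Executor.map returns results in
    input order, so the original byte order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, tiles))


if __name__ == "__main__":
    asset = bytes(range(256)) * 4096  # 1 MiB of repetitive stand-in "texture" data
    tiles = compress_tiled(asset)
    assert decompress_parallel(tiles) == asset
    print(len(tiles), "tiles,", sum(map(len, tiles)), "compressed bytes")
```

The design choice that matters is that no tile depends on another; a GPU-oriented scheme would spread such independent work across far more threads than a CPU thread pool can.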
For RTX IO, as used in Portal: Prelude RTX, on-disk data is compressed using the GDeflate format; it is staged in system memory, copied to VRAM, and decompressed there by the GPU. GDeflate is an open GPU compression format from Nvidia that has been contributed to Microsoft and the Khronos Group, and it is the format I expect to see in DirectStorage 1.2 games using DirectX on PC, with GPUs from Nvidia, AMD and Intel all supported.
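To make the difference in data movement concrete, here is a simplified accounting model of the two load paths for a single asset of known compressed and uncompressed size. The `Transfer` type and the `cpu_path`/`gpu_path` functions are hypothetical; the model ignores staging-buffer sizes, PCIe overheads and any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Megabytes moved or produced at each step of loading one asset."""
    storage_to_ram: float  # read from the SSD into system memory
    ram_to_vram: float     # copied over PCIe into video memory
    cpu_inflated: float    # decompressed output the CPU must produce
    gpu_inflated: float    # decompressed output the GPU must produce


def cpu_path(compressed: float, uncompressed: float) -> Transfer:
    # Traditional path: the CPU inflates the asset in system memory,
    # then the larger uncompressed result is copied to VRAM.
    return Transfer(storage_to_ram=compressed, ram_to_vram=uncompressed,
                    cpu_inflated=uncompressed, gpu_inflated=0.0)


def gpu_path(compressed: float, uncompressed: float) -> Transfer:
    # GPU-decompression path: data stays compressed until it reaches VRAM,
    # where the GPU inflates it, so the CPU does no decompression work.
    return Transfer(storage_to_ram=compressed, ram_to_vram=compressed,
                    cpu_inflated=0.0, gpu_inflated=uncompressed)


# Example: a 100 MB compressed asset that is 60% larger uncompressed.
print(cpu_path(100, 160))
print(gpu_path(100, 160))
```

In this model both paths read the same bytes from storage; what changes is that the PCIe copy carries the smaller compressed payload and the decompression work moves off the CPU.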
In contrast, Portal: Prelude RTX uses the Vulkan graphics API, which has no agreed-upon, vendor-neutral standard calls for GPU decompression; as far as I know, only Nvidia-specific extensions are currently offered. These Nvidia extensions could potentially serve as the basis for the Khronos Group's Vulkan equivalent of DirectStorage. In the meantime, fast GPU decompression in Portal: Prelude RTX only works on drivers that support these specific extensions, i.e. on Nvidia RTX graphics cards.
However, Portal: Prelude RTX still uses a traditional loading paradigm, which means RTX IO does not increase frame rates. After all, RTX Remix doesn't replace the game engine or change the way levels are split and loaded; instead, it simply changes how rendering is done and how the assets that power that rendering are loaded. This is different from Ratchet and Clank: Rift Apart, which is expected to use GPU decompression during gameplay as well, to stream assets. Portal: Prelude RTX therefore benefits primarily in loading-screen times and in how quickly textures visibly load in.
To test the technology's effect, I ran the game with RTX IO disabled and enabled on a SATA SSD capped at 500MB/s. With RTX IO disabled, the game loads fairly quickly, but textures take a while to reach full quality: without RTX IO's GDeflate compression, the game's on-disk data is entirely uncompressed and around 60% larger. Bandwidth is taxed accordingly when moving textures into VRAM, and it takes over two seconds for the last texture to load. With RTX IO enabled, that same texture load on the SATA SSD completes in less than half the time.
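The size penalty can be put in numbers with a simple model that assumes read time is dominated by raw drive throughput: reading compressed data is equivalent to reading at the drive's rate multiplied by the compression ratio. The sketch below plugs in the 500MB/s cap and the roughly 60% size difference; the function names and the 800 MB batch of textures are hypothetical, and the model ignores seek latency, decompression cost and every other overhead.

```python
def effective_read_rate(raw_mb_s: float, compression_ratio: float) -> float:
    """Uncompressed-equivalent MB/s when the drive delivers compressed data."""
    return raw_mb_s * compression_ratio


def read_time_s(uncompressed_mb: float, raw_mb_s: float,
                compression_ratio: float = 1.0) -> float:
    """Seconds spent purely on reading an asset of the given uncompressed size."""
    return uncompressed_mb / effective_read_rate(raw_mb_s, compression_ratio)


SATA_MB_S = 500.0  # the capped SATA SSD used in the test
RATIO = 1.6        # the uncompressed install is ~60% larger

# Hypothetical 800 MB (uncompressed) batch of textures.
print(read_time_s(800, SATA_MB_S))         # stored uncompressed: prints 1.6
print(read_time_s(800, SATA_MB_S, RATIO))  # stored compressed: prints 1.0
```

Size alone predicts a 1.6x gap; the measured SATA texture-load gap (2.36s versus 1.16s) is larger, so this model captures only part of the picture.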
|Configuration|In-game load|Texture load|
|---|---|---|
|12900K + 500MB/s SATA SSD + RTX IO off|1.13s|2.36s|
|12900K + 500MB/s SATA SSD + RTX IO on|0.67s|1.16s|
|12900K + 3.5GB/s NVMe SSD + RTX IO off|0.57s|1.45s|
|12900K + 3.5GB/s NVMe SSD + RTX IO on|0.53s|1.07s|
It's not the biggest difference in the real world, since half a second passes quickly, but halving the time is still impressive. After testing a number of configurations, I have two interesting takeaways. First, for texture loading, a 500MB/s SATA drive with RTX IO enabled beats a 3.5GB/s NVMe drive with RTX IO disabled (1.16s versus 1.45s), which is pretty exceptional. Second, CPU and GPU hardware differences did not significantly affect load times: an RTX 2060 Super paired with a Core i9 12900K performed roughly the same as the same CPU with the flagship RTX 4090, and a system with an RTX 4070 and a Ryzen 5 3600 was also very close.
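These takeaways can be checked directly against the measured times. This short sketch recomputes the RTX IO speedups from the results table; the dictionary keys and the `speedup` helper are illustrative names, while the figures are the measurements taken on the Core i9 12900K system.

```python
# Measured times in seconds: (in-game load, texture load).
RESULTS = {
    ("SATA 500MB/s", "off"): (1.13, 2.36),
    ("SATA 500MB/s", "on"): (0.67, 1.16),
    ("NVMe 3.5GB/s", "off"): (0.57, 1.45),
    ("NVMe 3.5GB/s", "on"): (0.53, 1.07),
}


def speedup(drive: str) -> tuple[float, ...]:
    """RTX IO speedup (off time / on time) for each metric, rounded to 2 dp."""
    off, on = RESULTS[(drive, "off")], RESULTS[(drive, "on")]
    return tuple(round(o / n, 2) for o, n in zip(off, on))


for drive in ("SATA 500MB/s", "NVMe 3.5GB/s"):
    print(drive, speedup(drive))
# SATA 500MB/s (1.69, 2.03)
# NVMe 3.5GB/s (1.08, 1.36)
```

Only the SATA texture load truly halves (about 2.03x); on the NVMe drive the gains are smaller, at roughly 1.08x for in-game loading and 1.36x for textures.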
So Portal: Prelude RTX is a promising first outing for this technology on PC, but its impact is modest because it is applied to a game built around an old loading paradigm. Games that actively stream assets and have no loading screens at all, like Ratchet and Clank: Rift Apart and other upcoming titles, are where this technology should shine. We can't wait to cover Ratchet and Clank when it arrives on PC on July 26th, so stay tuned.