Quantum Error Correction with GPUs: Real-Time Fault Tolerance via Hybrid Control
Quantum error correction (QEC) is essential for building scalable quantum computers. But reaching fault tolerance requires more than clever codes. It depends on the entire hardware-software stack, from the fidelity of physical qubits to the performance of the classical control system that coordinates them.
Let’s explore the critical hurdles on the path to fault tolerance, from surpassing the error threshold to qubit drift and QEC latency, and how new hybrid control architectures are being engineered to overcome them.
Step Zero: Surpassing the Error Threshold
The first step toward fault tolerance is to achieve and maintain low physical error rates. For many leading QEC codes, such as the surface code, this means consistently keeping the physical gate errors of raw, uncorrected qubits below a threshold of approximately 0.1% (10⁻³). Staying below this threshold is critical, as the code can then correct errors faster than they accumulate. While demonstrating this on a single qubit pair is a milestone, a true quantum processor must sustain these error rates across many qubits for the full duration of a long computation.
This is challenging because quantum systems are not static: their physical parameters drift over time. To quantify this, we ran millions of simple sequences of reset → π pulse → readout on a transmon qubit at the Israeli Quantum Computing Center, with each sequence taking 2.5 µs. A time trace of the results, shown in Figure 1, reveals how the gate fidelity drifts, with both gradual changes and sudden jumps.
Figure 1 – Qubit drift over short and long timescales, as observed through repeated fidelity measurements.
Using Allan deviation analysis, we found that significant parameter drift begins on timescales of 10 to 100 milliseconds. Since quantum computations often span these timescales, drift cannot be ignored.
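For readers who want to try this kind of analysis on their own data, here is a minimal sketch of a non-overlapping Allan deviation in plain Python. It is not the analysis code behind Figure 1: the function names, the synthetic drifting-readout data, and every numeric setting other than the 2.5 µs shot time are assumptions made for illustration.

```python
import math
import random

SHOT_TIME_S = 2.5e-6  # one reset -> pi pulse -> readout sequence (from the text)


def allan_deviation(samples, m):
    """Non-overlapping Allan deviation for an averaging window of m samples."""
    n_bins = len(samples) // m
    if n_bins < 2:
        raise ValueError("need at least two full bins of m samples")
    # Average the series in consecutive, non-overlapping bins of m samples.
    means = [sum(samples[k * m:(k + 1) * m]) / m for k in range(n_bins)]
    # Half the mean squared difference between neighbouring bin averages.
    sq_diffs = sum((means[k + 1] - means[k]) ** 2 for k in range(n_bins - 1))
    return math.sqrt(sq_diffs / (2 * (n_bins - 1)))


def synthetic_shots(n_shots, seed=0):
    """Hypothetical data: 0/1 readout outcomes whose success probability
    performs a slow, bounded random walk (a toy model of drift)."""
    rng = random.Random(seed)
    p = 0.95
    shots = []
    for _ in range(n_shots):
        p = min(0.999, max(0.5, p + rng.gauss(0, 1e-4)))
        shots.append(1 if rng.random() < p else 0)
    return shots


if __name__ == "__main__":
    shots = synthetic_shots(40_000)
    for m in (10, 100, 1_000, 10_000):
        tau = m * SHOT_TIME_S  # averaging time for this window
        print(f"tau = {tau:.2e} s  ADEV = {allan_deviation(shots, m):.4f}")
```

On real data, the averaging time at which the curve stops falling and starts to rise marks where drift begins to dominate over shot noise.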
To accurately track and counteract these fluctuations, a feedback system must run significantly faster than the drift it’s meant to correct. This means calibration and correction cycles need to operate at kilohertz rates, that is, at least ten times faster than the roughly 10 ms onset of drift.
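The arithmetic behind the kilohertz figure is short enough to spell out. The sketch below takes the fast end of the drift range (10 ms), the factor of ten, and the 2.5 µs sequence time from the text; the variable names are illustrative, and integer nanoseconds are used only to keep the arithmetic exact.

```python
NS_PER_S = 1_000_000_000

drift_onset_ns = 10_000_000  # 10 ms, fast end of the 10 to 100 ms range in the text
speedup = 10                 # "at least ten times faster" than the drift onset
shot_ns = 2_500              # one reset -> pi pulse -> readout sequence (2.5 µs)

loop_period_ns = drift_onset_ns // speedup        # longest allowed feedback period
loop_rate_hz = NS_PER_S // loop_period_ns         # corresponding minimum loop rate
max_shots_per_cycle = loop_period_ns // shot_ns   # sequences that fit in one period

print(f"feedback period <= {loop_period_ns / 1e6:.1f} ms")
print(f"feedback rate   >= {loop_rate_hz} Hz")
print(f"at most {max_shots_per_cycle} shots per cycle")
```

The shot count is an upper bound: it ignores the time needed to communicate and compute the update, so a real calibration cycle has fewer measurements to work with.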
The Next Hurdle: Decoding and Real-Time Communication
The next major challenge in fault-tolerant quantum computing is executing the error correction code itself, a process that hinges on a demanding computational task called decoding. After performing a series of stabilizer measurements to detect potential errors, the system must quickly interpret the measurement outcomes, identify the most likely error, and determine the corresponding correction.
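To make the decoding step concrete, here is a toy example using the three-bit repetition code, simulated classically with bit flips only. This is a deliberately small stand-in: decoders for the surface code face the same task at far larger scale and use more sophisticated algorithms. All names in the sketch are illustrative.

```python
def measure_syndrome(bits):
    """Parities of neighbouring bits, mimicking the Z1Z2 and Z2Z3 stabilizers."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])


# Most likely single-bit error for each syndrome (None means "no error seen").
CORRECTION = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}


def decode_and_correct(bits):
    """Interpret the syndrome, pick the most likely error, and undo it."""
    flip = CORRECTION[measure_syndrome(bits)]
    fixed = list(bits)
    if flip is not None:
        fixed[flip] ^= 1
    return fixed


if __name__ == "__main__":
    noisy = [0, 1, 0]  # logical 0 with a flip on the middle bit
    print(measure_syndrome(noisy), decode_and_correct(noisy))
```

Any single flip is repaired, but two simultaneous flips are "corrected" into the wrong codeword, which is one way to see why low physical error rates come first.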
This decoding step is a race against time. It must happen fast enough to avoid a decoding bottleneck, where the stream of error data builds up faster than it can be processed. Sometimes multiple decodings must happen simultaneously and inform one another, in a complex web of classical computing tasks. For superconducting qubits, the allowed latency for decoding is around 10 µs [1]. If this latency is exceeded, errors accumulate faster than they can be corrected, defeating the entire purpose of QEC.
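The bottleneck can be illustrated with a simple queue model. The sketch below assumes a single decoder that processes syndrome rounds one at a time and in order, and a 1 µs round time; both are assumptions for illustration, as are the function names. Only the roughly 10 µs latency budget comes from the text.

```python
def decode_latencies_ns(n_rounds, round_ns, decode_ns):
    """Latency from each round's arrival to the end of its decoding, for a
    single decoder that handles rounds one at a time, in order."""
    latencies = []
    decoder_free_at = 0
    for k in range(n_rounds):
        arrival = (k + 1) * round_ns
        start = max(arrival, decoder_free_at)  # wait if the decoder is busy
        decoder_free_at = start + decode_ns
        latencies.append(decoder_free_at - arrival)
    return latencies


def first_round_over_budget(latencies, budget_ns):
    """Index of the first round whose latency exceeds the budget, or None."""
    for k, latency in enumerate(latencies):
        if latency > budget_ns:
            return k
    return None


if __name__ == "__main__":
    BUDGET_NS = 10_000  # ~10 µs allowed decoding latency [1]
    ROUND_NS = 1_000    # assumed syndrome round time
    for decode_ns in (900, 1_100):
        lat = decode_latencies_ns(1_000, ROUND_NS, decode_ns)
        print(decode_ns, lat[-1], first_round_over_budget(lat, BUDGET_NS))
```

In this model, a decoder that is even slightly slower than the round time falls further behind with every round, so the latency grows without bound; a decoder that keeps pace holds the latency constant.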
But meeting this tight latency window is about more than just stabilization. It is also a prerequisite for mid-circuit feedforward operations, where the outcome of a measurement influences the next steps in the pulse sequence. This is essential for executing advanced protocols, especially those involving non-Clifford operations like the T gate, which are required for universal quantum computing.
A Hybrid Architecture for a Hybrid Problem
To meet these demands, NVIDIA and Quantum Machines teamed up to develop NVIDIA DGX-Quantum, a hybrid quantum-classical architecture and the first reference architecture of its kind to integrate CPU, GPU, and QPU. It connects the OPX1000 quantum controller to an NVIDIA Grace Hopper Superchip via a high-speed PCIe interface (see Figure 2).
This setup allows computationally heavy tasks like QEC decoding to be offloaded to the powerful CPU or GPU on the server, while the OPX handles the real-time pulse sequencing and orchestration.
Figure 2 – NVIDIA DGX-Quantum reference architecture for fault tolerance.
Our benchmarks, which measure the full roundtrip from the OPX1000 to the Grace Hopper Superchip and back, demonstrate a breakthrough in low-latency integration:
- GPU roundtrip latency of ~3.5 µs
- CPU roundtrip latency under 3 µs
This sub-4-microsecond latency is a critical achievement. It shows that the architecture can execute a full round trip (sending measurement data, offloading calculations, and returning corrective actions) well within the real-time constraints of QEC. Communication takes well under the roughly 10 µs decoding budget, which leaves most of that window for the calculations themselves, from Bayesian estimation to large-scale decoding.
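As a back-of-the-envelope check, the time left for computation follows directly from the numbers above. The sketch assumes the roughly 10 µs figure acts as a single budget covering both communication and computation, and it uses 3 µs as an upper bound for the CPU round trip; the function and constant names are illustrative.

```python
def compute_window_ns(budget_ns, roundtrip_ns):
    """Time left for the decoder or estimator after the data round trip."""
    if roundtrip_ns > budget_ns:
        raise ValueError("round trip alone exceeds the latency budget")
    return budget_ns - roundtrip_ns


QEC_BUDGET_NS = 10_000    # ~10 µs allowed decoding latency [1]
GPU_ROUNDTRIP_NS = 3_500  # ~3.5 µs GPU round trip reported above
CPU_ROUNDTRIP_NS = 3_000  # upper bound: CPU round trip is "under 3 µs"

gpu_window_ns = compute_window_ns(QEC_BUDGET_NS, GPU_ROUNDTRIP_NS)
cpu_window_ns = compute_window_ns(QEC_BUDGET_NS, CPU_ROUNDTRIP_NS)

print(f"GPU path: about {gpu_window_ns / 1000:.1f} µs left for computation")
print(f"CPU path: at least {cpu_window_ns / 1000:.1f} µs left for computation")
```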
NVIDIA DGX Quantum isn’t only a reference architecture. It is already delivering results for customers today. For example, at Diraq in Sydney, engineers integrated silicon qubits with the NVIDIA Grace Hopper GPUs and Quantum Machines’ OPX1000. Within a week, they demonstrated real-time readout enhancement, ML-driven calibration, and accelerated state initialization, applications that previously required offline processing or hours of manual tuning. NVIDIA DGX Quantum is not a prototype: it is a working system that is already solving real bottlenecks in quantum computing.
Tackling Drift with Real-Time Reinforcement Learning
The low-latency feedback loop is powerful not just for QEC, but also for dynamic calibration during execution. At the Israeli Quantum Computing Center, we used it to perform real-time optimization of multiple qubit parameters to combat drift.
We deployed a reinforcement learning (RL) agent running on a GPU to optimize the preparation of a Greenberger–Horne–Zeilinger (GHZ) state, a highly entangled multi-qubit state. The agent proposed sets of control parameters, including two gate amplitudes and four single-qubit phases, used to generate the entangling pulse sequence.
These parameters were sent to the OPX1000, which updated the pulses accordingly on the fly, without any recompilation. After the sequence was executed, the resulting state was measured and a fidelity estimate was computed directly on the OPX1000. That fidelity metric was then sent back to the RL agent on the GPU, which updated its policy and returned a new set of six parameters. The loop ran continuously, enabling the system to adapt and improve with each iteration, as Figure 3 shows in detail.
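The structure of such a loop can be sketched in a few lines of plain Python. This is not the agent used in the experiment: the hardware is replaced by a hypothetical `mock_fidelity` function with a linearly drifting optimum, and the agent is a basic REINFORCE update on the mean of a fixed-width Gaussian policy. All names and numeric settings are assumptions; only the count of six parameters comes from the text.

```python
import math
import random

N_PARAMS = 6  # two gate amplitudes + four single-qubit phases, as in the text


def mock_fidelity(params, optimum):
    """Hypothetical stand-in for the hardware: fidelity falls off as a
    Gaussian of the distance between applied and optimal parameters."""
    dist_sq = sum((p - o) ** 2 for p, o in zip(params, optimum))
    return math.exp(-dist_sq)


def run_loop(n_steps=200, batch=32, sigma=0.1, lr=0.3, drift=0.002, seed=0):
    """Return (fidelity with tracked parameters, fidelity with static ones)."""
    rng = random.Random(seed)
    optimum = [0.3] * N_PARAMS  # assumed initial optimum
    static = list(optimum)      # one-off calibration, never updated
    mean = list(optimum)        # policy mean, updated on every iteration
    for _ in range(n_steps):
        # Toy drift: the optimum moves a little on every iteration.
        optimum = [o + drift for o in optimum]
        # Agent proposes a batch of parameter sets around the policy mean.
        actions = [[m + sigma * rng.gauss(0, 1) for m in mean]
                   for _ in range(batch)]
        # "Hardware" returns one fidelity estimate per proposal.
        rewards = [mock_fidelity(a, optimum) for a in actions]
        baseline = sum(rewards) / batch
        # REINFORCE step: move the mean toward above-average proposals.
        for j in range(N_PARAMS):
            grad = sum((r - baseline) * (a[j] - mean[j])
                       for a, r in zip(actions, rewards))
            mean[j] += lr * grad / (batch * sigma ** 2)
    return mock_fidelity(mean, optimum), mock_fidelity(static, optimum)


if __name__ == "__main__":
    tracked, static = run_loop()
    print(f"tracked: {tracked:.3f}  static: {static:.3f}")
```

In this toy model the continuously updated parameters follow the drifting optimum, while the statically calibrated set gradually loses fidelity. A real deployment would replace `mock_fidelity` with the measured estimate returned by the controller.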
Figure 3 – A reinforcement learning agent dynamically optimizes two gate amplitudes (A₁, A₂) and four single-qubit phases (φ₁–φ₄) in real time to maximize the estimated fidelity of a GHZ state.
The results were striking: the RL-optimized control not only reached a higher peak fidelity than what we achieved with statically calibrated parameters, but it also maintained this high fidelity over extended periods, actively compensating for system drift throughout the experiment.
Summary: The Building Blocks of Scalable Fault Tolerance
Achieving fault tolerance is not just about adding more qubits. It’s about building smarter, faster, and more integrated architectures that can:
- Consistently maintain physical error rates below the ~10⁻³ threshold required by leading QEC codes
- Run complex decoding algorithms and provide feedback within tight QEC timing constraints
- Continuously optimize system parameters during program execution, not just in offline calibration phases
This is the future we’re building: a truly hybrid, real-time quantum-classical system that is fast enough for error correction, flexible enough for dynamic calibration, and scalable enough for what’s next.
Want to hear more about NVIDIA DGX-Quantum and discuss what it could enable in your lab? Send us a message and let’s chat!
NVIDIA DGX Quantum Demo
Arthur Strauss (Quantum Machines) presents the NVIDIA DGX Quantum system – a collaborative architecture integrating NVIDIA Grace Hopper processors with OPX1000 quantum control hardware. This demonstration shows real-time quantum gate calibration achieving sub-4 microsecond round-trip latency.
[1] Kurman, Yaniv, et al. “Controller-decoder system requirements derived by implementing Shor’s algorithm with surface code.” arXiv:2412.00289 (2024)