Solver Guide
QKAN supports multiple solver backends for computing quantum variational activation functions. Choose the right solver based on your hardware, model size, and deployment target.
Solver Overview
Solver |
Device |
Use Case |
Install |
|---|---|---|---|
|
CPU / GPU |
Default. Pure PyTorch reference implementation. Best for debugging and small models. |
Included with |
|
GPU |
Triton fused kernels. Best speed/memory tradeoff for most GPU training. |
Included with |
|
GPU |
cuTile (NVIDIA Tile Language) fused kernels. BF16/FP8 mixed-precision with coalesced state layout. |
|
|
GPU / CPU |
cuQuantum tensor network contraction. Best for extremely large layers near OOM. |
GPU: |
|
CPU |
PennyLane quantum circuits. For demonstration, not optimized. |
|
|
IBM QPU |
IBM Quantum backends via Qiskit Runtime. For real quantum device inference. |
|
|
QPU / GPU |
NVIDIA CUDA-Q. Supports AWS Braket QPUs, GPU simulators, and CPU simulators. |
Ansatz Choice
``pz`` (default): Most reliable quality across tasks. Uses RZ-RY-RZ rotation layers with data re-uploading.
``rpz``: Reduced pz encoding with trainable preactivation. Fewer parameters per layer.
``real``: Real-valued ansatz (no complex arithmetic). Can be faster but may hurt accuracy on some tasks.
Recommendation: Start with pz. Only switch to real if you validate it on your task.
Mixed Precision
The flash and cutile solvers support BF16 and FP8 compute via the c_dtype parameter:
qkan = QKAN([10, 10], solver="flash", c_dtype=torch.bfloat16, device="cuda")
c_dtype: Compute dtype for quantum simulation kernels (state vectors, trig ops).p_dtype: Parameter storage dtype (theta, preacts). Keep atfloat32.
Performance (from #12):
BF16: 2.3–2.5x faster training, 45% less peak memory, identical convergence.
FP8 (
torch.float8_e4m3fn): Additional memory savings for state checkpoints via prescaled storage.All ansatzes (
pz,rpz,real) are supported.
Note
p_dtype=torch.float8_e4m3fn is not supported — PyTorch has no FP8 arithmetic kernels.
Use FP8 only for c_dtype.
Real Quantum Device Deployment
QKAN can run inference on real quantum hardware. The workflow:
Train locally with
exactorflashsolver on CPU/GPU.Transfer weights to a device-backed model using
initialize_from_another_model.Run inference on the QPU.
IBM Quantum (Qiskit)
from qiskit_ibm_runtime import QiskitRuntimeService
service = QiskitRuntimeService(channel="ibm_quantum_platform")
backend = service.least_busy(operational=True, simulator=False)
ibm_model = QKAN(
[1, 2, 1], solver="qiskit", fast_measure=False,
solver_kwargs={
"backend": backend,
"shots": 1000,
"optimization_level": 3,
"parallel_qubits": backend.num_qubits,
},
)
ibm_model.initialize_from_another_model(trained_model)
AWS Braket via CUDA-Q
qpu_model = QKAN(
[1, 2, 1], solver="cudaq", fast_measure=False,
solver_kwargs={
"target": "braket",
"machine": "arn:aws:braket:us-west-1::device/qpu/rigetti/Ankaa-3",
"shots": 1000,
"parallel_qubits": 20,
},
)
qpu_model.initialize_from_another_model(trained_model)
Key parameters:
fast_measure=False: Required for real devices. Uses|alpha|^2 - |beta|^2(Born rule) instead of the quantum-inspired|alpha| - |beta|shortcut.parallel_qubits: Packs N independent single-qubit circuits onto N qubits of one multi-qubit job, reducing QPU submissions by ~Nx.shots: Number of measurement samples per circuit. More shots = less statistical noise.
Error Mitigation
Real quantum hardware introduces gate errors, readout noise, and shot noise. QKAN provides
framework-level error mitigation via the mitigation key in solver_kwargs:
solver_kwargs={
"backend": backend,
"shots": 1000,
"parallel_qubits": 127,
"mitigation": {
"zne": {"scale_factors": [1, 3, 5]}, # Zero-Noise Extrapolation
"n_repeats": 3, # Multi-shot averaging
"clip_expvals": True, # Clamp <Z> to [-1, 1]
},
}
Zero-Noise Extrapolation (ZNE): Runs circuits at amplified noise levels (via gate folding)
and Richardson-extrapolates to the zero-noise limit. For Qiskit, you can alternatively use
resilience_level=2 for Qiskit-native ZNE.
Multi-shot averaging (n_repeats): Runs the entire batch N times and averages results.
Reduces variance from shot-to-shot fluctuations.
Expectation clipping (clip_expvals): Clamps values to [-1, 1] after mitigation.
Prevents catastrophic outliers from ZNE extrapolation.
Technique |
Circuit multiplier |
When to use |
|---|---|---|
|
1x |
Always (zero cost) |
|
3x |
When shot noise dominates |
ZNE |
3x |
When gate noise dominates |
ZNE + |
9x |
Maximum accuracy, inference only |
IBM-specific options (passed directly, not under mitigation):
solver_kwargs={
"resilience_level": 2, # Qiskit-native ZNE
"twirling": {"enable_gates": True, "enable_measure": True},
}