Summary
Background
Phase unwrapping arises in interferometric imaging applications such as InSAR, digital holography, and optical metrology, where the measured phase is wrapped in 2π discontinuities. Goldstein's branch-cut algorithm is a classical path-following approach that first detects residues, then places branch cuts connecting opposite-polarity residues or residues to the image boundary, and finally integrates/unwraps phase along paths that avoid those cuts.
The first stage is a good fit for GPU: each residue test depends only on a small neighborhood, enabling tiled shared-memory kernels with strong locality. Stages 2 and 3 are harder. Branch-cut placement is traditionally greedy and serial. Integration is effectively a graph traversal with queue dependence, irregular control flow, and poor SIMD behavior. Recent work suggests tile/block-based GPU processing and BFS-style frontier traversal as the two main directions for exposing parallelism in these irregular stages.
The Challenge
This project is challenging because Goldstein's algorithm mixes a regular image-processing stage with two highly irregular, path-dependent stages. CUDA GPUs are most effective for workloads with regular memory access, abundant thread-level parallelism, and limited synchronization — and Goldstein's stages 2 and 3 violate all of these assumptions.
Resources
Goals & Deliverables
We aim to match or exceed the ~61× speedup reported in the reference GPU phase unwrapping paper (CPU: AMD Ryzen 5900H vs GPU: NVIDIA RTX 3070-laptop).
Platform Choice
We chose NVIDIA GPU and CUDA because at least part of Goldstein is clearly data-parallel, and the project's core interest is in how to map the irregular stages onto SIMD hardware.
The residue detection stage is a pixel-wise parallel image kernel with strong locality — shared memory tiling can directly accelerate it. More importantly, branch-cut placement and integration have poor native SIMD compatibility, making them ideal case studies for learning how to reformulate branch-heavy, sparse algorithms into GPU-friendly forms.
This project is not merely about speeding up a dense numerical kernel. It is about understanding how branch divergence, sparse matching, atomics, and frontier expansion behave on a real parallel system — all of which are central themes of 15-418/618.
Schedule
This schedule will be updated each week based on actual progress.