Gaussian splatting explained

Gaussian splatting is a volume rendering technique that deals with the direct rendering of volume data without converting the data into surface or line primitives.[1] The technique was originally introduced as splatting by Lee Westover in the early 1990s.[2]

With advancements in computer graphics, newer methods such as 3D Gaussian splatting and 3D Temporal Gaussian splatting have been developed to offer real-time radiance field rendering and dynamic scene rendering respectively.

3D Gaussian splatting

3D Gaussian splatting is a technique used in the field of real-time radiance field rendering.[3] It enables high-quality, real-time rendering of novel views of a scene reconstructed from multiple photos or videos, addressing a significant challenge in the field.

The method represents scenes with 3D Gaussians that retain properties of continuous volumetric radiance fields, starting from the sparse points produced during camera calibration. It introduces an anisotropic representation using 3D Gaussians to model radiance fields, along with interleaved optimization and density control of the Gaussians. A fast visibility-aware rendering algorithm supporting anisotropic splatting is also proposed, tailored to GPU execution.
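As a rough illustration of what visibility-aware blending means for a single pixel, the sketch below sorts splat contributions by depth and composites them front to back. It is a simplified CPU version, not the authors' GPU rasterizer: the function name, the (depth, rgb, alpha) tuple layout, and the early-termination threshold are assumptions made for this example.

```python
def composite_front_to_back(splats, min_transmittance=1e-4):
    """Blend the splats covering one pixel, nearest first.

    `splats` is a list of (depth, rgb, alpha) tuples, where `alpha` is the
    opacity the splat contributes at this pixel. The tuple layout and the
    `min_transmittance` cutoff are illustrative assumptions.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for _, rgb, alpha in sorted(splats, key=lambda s: s[0]):
        weight = transmittance * alpha
        color = [c + weight * channel for c, channel in zip(color, rgb)]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # pixel is effectively opaque; farther splats are hidden
    return color, transmittance
```

A half-transparent red splat in front of an opaque blue one yields an even mix of the two, and the remaining transmittance drops to zero.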

Method

The method involves several key steps:

The method uses differentiable 3D Gaussian splatting, which is unstructured and explicit, allowing rapid rendering and projection to 2D splats. The covariance of the Gaussians can be thought of as configurations of an ellipsoid, which can be mathematically decomposed into a scaling matrix and a rotation matrix. The gradients for all parameters are derived explicitly to overcome any overhead due to autodiff.
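The decomposition described above can be sketched with NumPy. The example builds a covariance matrix as Σ = R S Sᵀ Rᵀ from a per-axis scale vector and a unit quaternion, which keeps the result symmetric and positive semi-definite by construction. The function names and the (w, x, y, z) quaternion ordering are assumptions for this illustration.

```python
import numpy as np

def quaternion_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any nonzero input is accepted.
    """
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance_from_scale_rotation(scale, q):
    """Build the ellipsoid covariance R S S^T R^T from scale and rotation."""
    R = quaternion_to_rotation(q)
    S = np.diag(np.asarray(scale, dtype=float))
    M = R @ S
    return M @ M.T  # symmetric, positive semi-definite

# An ellipsoid with axis lengths 1, 2, 3 rotated 90 degrees about the z axis.
half = np.sqrt(0.5)
sigma = covariance_from_scale_rotation([1.0, 2.0, 3.0], [half, 0.0, 0.0, half])
```

Optimizing the scale and rotation separately, rather than the covariance entries directly, means gradient updates cannot produce an invalid covariance matrix.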

The optimization creates a dense set of 3D Gaussians that represent the scene as accurately as possible. Each step of rendering is followed by a comparison to the training views available in the dataset.
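The comparison to training views is expressed as a photometric loss. In the original paper [3], this combines an L1 term with a D-SSIM term; the symbols below follow that paper, with λ = 0.2 being the value its authors report using:

```latex
\mathcal{L} = (1 - \lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}},
\qquad \lambda = 0.2
```

Here \mathcal{L}_1 is the mean absolute difference between the rendered image and the training image, and \mathcal{L}_{\text{D-SSIM}} is a structural dissimilarity term derived from SSIM.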

Results and evaluation

The authors tested their algorithm on 13 real scenes from previously published datasets and the synthetic Blender dataset. They compared their method against state-of-the-art techniques like Mip-NeRF360,[5] InstantNGP,[6] and Plenoxels.[4] Quantitative evaluation metrics used were PSNR, LPIPS, and SSIM.
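Of the three metrics, PSNR is the simplest to state: it is a logarithmic measure of the mean squared error between a rendered image and the reference. A minimal sketch, assuming images are given as flat sequences of pixel values in the range [0, max_value] (the function name and input layout are choices made for this example):

```python
import math

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in decibels between two images.

    Both images are flat sequences of pixel values in [0, max_value].
    Higher is better; identical images give infinity.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

SSIM and LPIPS are more involved (the latter relies on a pretrained network) and are usually taken from an existing library rather than reimplemented.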

Their fully converged model (30,000 iterations) achieves quality on par with or slightly better than Mip-NeRF360, but with significantly reduced training time (35–45 minutes vs. 48 hours) and faster rendering (real-time vs. 10 seconds per frame). At 7,000 iterations (5–10 minutes of training), their method achieves comparable quality to InstantNGP and Plenoxels.

For synthetic bounded scenes (Blender dataset), they achieved state-of-the-art results even with random initialization, starting from 100,000 uniformly random Gaussians.

Limitations

Some limitations of the method include:

The authors note that some of these limitations could potentially be addressed through future improvements like better culling approaches, antialiasing, regularization, and compression techniques.

3D Temporal Gaussian splatting

Extending 3D Gaussian splatting to dynamic scenes, 3D Temporal Gaussian splatting incorporates a time component, allowing for real-time rendering of dynamic scenes with high resolutions.[7] It represents and renders dynamic scenes by modeling complex motions while maintaining efficiency. The method uses a HexPlane to connect adjacent Gaussians, providing an accurate representation of position and shape deformations. By utilizing only a single set of canonical 3D Gaussians and predictive analytics, it models how they move over different timestamps.[8]

It is sometimes referred to as "4D Gaussian splatting"; however, this naming convention implies the use of 4D Gaussian primitives (parameterized by a 4-dimensional mean vector and a 4×4 covariance matrix). Most work in this area still employs 3D Gaussian primitives, applying temporal constraints as an extra parameter of optimization.
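To make the distinction concrete, the sketch below shows what a true 4D Gaussian primitive would involve: a Gaussian over (x, y, z, t) that is conditioned on a timestamp to obtain the 3D Gaussian visible at that instant. This uses the standard formula for conditioning a multivariate Gaussian and is only an illustration of the terminology, not the method of the cited work [7]; the function name is an assumption.

```python
import numpy as np

def slice_4d_gaussian_at_time(mean4, cov4, t):
    """Condition a Gaussian over (x, y, z, t) on a fixed time t.

    Returns the mean and covariance of the resulting 3D Gaussian, using the
    standard conditional-Gaussian formulas. `cov4` must have a positive
    time variance in its last diagonal entry.
    """
    mean4 = np.asarray(mean4, dtype=float)
    cov4 = np.asarray(cov4, dtype=float)
    mu_xyz, mu_t = mean4[:3], mean4[3]
    cov_xyz = cov4[:3, :3]   # spatial block
    cov_xyz_t = cov4[:3, 3]  # space-time coupling
    var_t = cov4[3, 3]       # variance along the time axis
    mean3 = mu_xyz + cov_xyz_t * (t - mu_t) / var_t
    cov3 = cov_xyz - np.outer(cov_xyz_t, cov_xyz_t) / var_t
    return mean3, cov3
```

The space-time coupling terms are what make the primitive move: with a nonzero coupling between x and t, the conditioned mean drifts along x as t changes. Approaches that keep 3D primitives instead model this motion with a separate deformation applied to a canonical set of Gaussians.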

Achievements of this technique include real-time rendering on dynamic scenes with high resolutions, while maintaining quality. It showcases potential applications for future developments in film and other media, although there are current limitations regarding the length of motion captured.[8]

Applications

3D Gaussian splatting has been adapted and extended across various computer vision and graphics applications, from dynamic scene rendering to autonomous driving simulations and 4D content creation:

See also

Notes and References

  1. Web site: Westover. Lee Alan. SPLATTING: A Parallel, Feed-Forward Volume Rendering Algorithm. October 18, 2023. July 1991.
  2. Web site: Huang. Jian. Splatting. 5 August 2011. PPT. Spring 2002.
  3. Bernhard Kerbl . Georgios Kopanas . Thomas Leimkühler . George Drettakis . 8 Aug 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. 2308.04079. cs.GR.
  4. Book: Fridovich-Keil . Sara . Yu . Alex . Tancik . Matthew . Chen . Qinhong . Recht . Benjamin . Kanazawa . Angjoo . Plenoxels: Radiance Fields without Neural Networks . June 2022 . 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . http://dx.doi.org/10.1109/cvpr52688.2022.00542 . 5491–5500 . IEEE . 10.1109/cvpr52688.2022.00542. 2112.05131 . 978-1-6654-6946-3 .
  5. Book: Barron . Jonathan T. . Mildenhall . Ben . Verbin . Dor . Srinivasan . Pratul P. . Hedman . Peter . Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields . June 2022 . 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . http://dx.doi.org/10.1109/cvpr52688.2022.00539 . 5460–5469 . IEEE . 10.1109/cvpr52688.2022.00539. 2111.12077 . 978-1-6654-6946-3 .
  6. Müller . Thomas . Evans . Alex . Schied . Christoph . Keller . Alexander . July 2022 . Instant neural graphics primitives with a multiresolution hash encoding . ACM Transactions on Graphics . 41 . 4 . 1–15 . 10.1145/3528223.3530127 . 0730-0301. 2201.05989 .
  7. Guanjun Wu . Taoran Yi . Jiemin Fang . Lingxi Xie . Xiaopeng Zhang . Wei Wei . Wenyu Liu . Qi Tian . Xinggang Wang . 12 Oct 2023. 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering. 2310.08528. cs.CV.
  8. Web site: Actors' worst fears come true? New 3D Temporal Gaussian Splatting method captures human motion. Franzen. Carl. October 18, 2023. venturebeat.com. VentureBeat.
  9. Book: Chen . Zilong . Wang . Feng . Wang . Yikai . Liu . Huaping . Text-to-3D using Gaussian Splatting . 2024-06-16 . 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . http://dx.doi.org/10.1109/cvpr52733.2024.02022 . IEEE . 21401–21412 . 10.1109/cvpr52733.2024.02022. 2309.16585 . 979-8-3503-5300-6 .
  10. Chen . Li . Wu . Penghao . Chitta . Kashyap . Jaeger . Bernhard . Geiger . Andreas . Li . Hongyang . 2024 . End-to-end Autonomous Driving: Challenges and Frontiers . IEEE Transactions on Pattern Analysis and Machine Intelligence . PP . 1–20 . 10.1109/tpami.2024.3435937 . 39078757 . 0162-8828. 2306.16927 .
  11. Book: Guédon . Antoine . Lepetit . Vincent . SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering . 2024-06-16 . 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . http://dx.doi.org/10.1109/cvpr52733.2024.00512 . IEEE . 5354–5363 . 10.1109/cvpr52733.2024.00512. 2311.12775 . 979-8-3503-5300-6 .
  12. Book: Keetha . Nikhil . Karhade . Jay . Jatavallabhula . Krishna Murthy . Yang . Gengshan . Scherer . Sebastian . Ramanan . Deva . Luiten . Jonathon . SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM . 2024-06-16 . 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . http://dx.doi.org/10.1109/cvpr52733.2024.02018 . IEEE . 21357–21366 . 10.1109/cvpr52733.2024.02018. 979-8-3503-5300-6 .
  13. Book: Ling . Huan . Kim . Seung Wook . Torralba . Antonio . Fidler . Sanja . Kreis . Karsten . Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models . 2024-06-16 . 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . http://dx.doi.org/10.1109/cvpr52733.2024.00819 . IEEE . 8576–8588 . 10.1109/cvpr52733.2024.00819. 2312.13763 . 979-8-3503-5300-6 .