View Synthesis: From NeRF to 3D Gaussian Splatting

Posted Jul 15, 2024

3D Gaussian Splatting reconstruction (right) from a series of input images (left)

By Rohit Dey

10 min read

Overview

Photorealistic 3D scene reconstruction from 2D images is one of the most challenging problems in computer vision. While traditional methods rely on hand-built explicit geometry (meshes, point clouds), modern neural rendering approaches have reshaped the field by optimizing scene representations directly from images. This project explores two such techniques: Neural Radiance Fields (NeRF), which learns an implicit volumetric field, and 3D Gaussian Splatting (3DGS), which optimizes a set of explicit Gaussian primitives. Both are implemented from scratch with a focus on practical deployment and real-time performance.

The goal was to build a complete pipeline—from multi-view image capture to interactive 3D visualization—that demonstrates the evolution of view synthesis technology and provides hands-on experience with state-of-the-art neural rendering techniques.

The Challenge: Novel View Synthesis

Given a sparse set of images from different viewpoints, how can we synthesize photorealistic images from arbitrary camera positions? This problem, known as novel view synthesis, requires understanding:

Scene Geometry: Where are objects located in 3D space?
Appearance Modeling: How do materials interact with light?
View-Dependent Effects: How do reflections and specularity change with viewpoint?
Computational Efficiency: Can we render in real-time?

Traditional approaches like Structure-from-Motion (SfM) + Multi-View Stereo (MVS) reconstruct explicit geometry but struggle with:

Fine detail capture (thin structures, hair, foliage)
View-dependent appearance (specularities, reflections)
Completeness (holes in reconstruction)

Neural methods address these limitations by learning continuous volumetric representations.

Approach 1: Neural Radiance Fields (NeRF)

Core Concept

NeRF represents scenes as continuous 5D functions that map:

Input: 3D position $(x, y, z)$ and viewing direction $(\theta, \phi)$
Output: Volume density $\sigma$ and view-dependent RGB color $(r, g, b)$

\[F_\Theta : (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma)\]

Where $\Theta$ represents the weights of a Multi-Layer Perceptron (MLP) neural network.
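The input list above gives the viewing direction as two angles $(\theta, \phi)$, while the formula uses a vector $\mathbf{d}$. A minimal sketch of the conversion, assuming $\theta$ is the polar angle measured from the +z axis and $\phi$ is the azimuth in the x-y plane (this convention and the function name are my own choices for illustration):

```python
import math


def direction_from_angles(theta, phi):
    """Convert a polar angle theta and azimuth phi (radians) to a unit 3D vector."""
    sin_theta = math.sin(theta)
    return (
        sin_theta * math.cos(phi),  # x
        sin_theta * math.sin(phi),  # y
        math.cos(theta),            # z
    )


# A direction in the x-y plane pointing along +x.
d = direction_from_angles(math.pi / 2, 0.0)
length = math.sqrt(sum(c * c for c in d))  # always 1 up to rounding
```

In practice most implementations skip the angles entirely and feed the normalized ray direction to the network.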

Volume Rendering with Ray Marching

For each pixel in the target image, we:

Cast a ray from the camera through the pixel
Sample points along the ray at intervals $t_i$
Query the MLP at each sample point
Accumulate color using volume rendering:

\[C(\mathbf{r}) = \sum_{i=1}^{N} T_i \cdot \alpha_i \cdot \mathbf{c}_i\]

Where:

$T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$ is the accumulated transmittance
$\alpha_i = 1 - \exp(-\sigma_i \delta_i)$ is the opacity contribution
$\delta_i = t_{i+1} - t_i$ is the distance between samples
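The sum above can be sketched in plain Python for a single ray. This is a readable reference, not the batched tensor code a real trainer would use; the function name and the choice to pass N+1 sample boundaries (so that every $\delta_i$ is defined) are assumptions made for this illustration.

```python
import math


def composite_ray(sigmas, colors, t_vals):
    """Accumulate color along one ray with the discrete volume rendering sum.

    sigmas: N densities, colors: N (r, g, b) tuples,
    t_vals: N + 1 sample boundaries, so delta_i = t_vals[i + 1] - t_vals[i].
    Returns the pixel color and the per-sample weights T_i * alpha_i.
    """
    pixel = [0.0, 0.0, 0.0]
    weights = []
    optical_depth = 0.0  # running sum of sigma_j * delta_j for j < i
    for i, (sigma, color) in enumerate(zip(sigmas, colors)):
        delta = t_vals[i + 1] - t_vals[i]
        transmittance = math.exp(-optical_depth)  # T_i
        alpha = 1.0 - math.exp(-sigma * delta)    # alpha_i
        weight = transmittance * alpha
        weights.append(weight)
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        optical_depth += sigma * delta
    return pixel, weights


# Empty space followed by a dense red surface: the pixel comes out red.
pixel, weights = composite_ray(
    sigmas=[0.0, 1000.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    t_vals=[0.0, 1.0, 2.0],
)
```

The returned weights are the same quantities the fine network later uses to decide where to place extra samples.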

Positional Encoding

An MLP fed raw 3D coordinates tends to learn only smooth, low-frequency functions, so fine detail is lost. We apply positional encoding to map each input coordinate $p$ to a higher-dimensional space:

\[\gamma(p) = \left[\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right]\]

This enables the network to learn fine geometric details and sharp textures.
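As a rough sketch, the encoding $\gamma(p)$ can be written directly from the formula. The function names are hypothetical, and the code is applied per coordinate in plain Python for readability rather than as a vectorized tensor op.

```python
import math


def positional_encoding(p, num_freqs):
    """Return [sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p)]."""
    encoded = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        encoded.append(math.sin(freq * p))
        encoded.append(math.cos(freq * p))
    return encoded


def encode_point(point, num_freqs):
    """Encode each coordinate of a point independently and concatenate the results."""
    return [value for p in point for value in positional_encoding(p, num_freqs)]


# With L = 10, a 3D position becomes 3 * 2 * 10 = 60 features.
features = encode_point((0.1, 0.2, 0.3), num_freqs=10)
```

Each extra frequency band doubles the finest spatial detail the network can represent, at the cost of two more input features per coordinate.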

Hierarchical Sampling Strategy

To improve efficiency, NeRF uses two networks:

Coarse network: Samples uniformly along the ray
Fine network: Focuses sampling on regions with high density (where objects exist)

This reduces wasted computation in empty space by 2-3x.
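One common way to focus the fine samples is inverse-transform sampling over the weights produced by the coarse pass. The sketch below assumes the coarse pass yields one weight per depth bin; the function name, the bin layout, and the uniform fallback for rays that hit nothing are illustrative assumptions.

```python
import bisect
import random


def sample_fine_depths(bin_edges, coarse_weights, n_samples, rng=random):
    """Draw depths so that bins with larger coarse weights receive more samples.

    bin_edges: N + 1 increasing depths, coarse_weights: N non-negative weights.
    """
    total = sum(coarse_weights)
    if total <= 0.0:
        # The ray hit nothing in the coarse pass: fall back to uniform sampling.
        return sorted(rng.uniform(bin_edges[0], bin_edges[-1]) for _ in range(n_samples))

    # Build the cumulative distribution over bins.
    cdf = []
    running = 0.0
    for w in coarse_weights:
        running += w / total
        cdf.append(running)

    samples = []
    for _ in range(n_samples):
        u = rng.random()
        # First bin whose cumulative mass exceeds u (clamped against rounding).
        i = min(bisect.bisect_right(cdf, u), len(coarse_weights) - 1)
        samples.append(rng.uniform(bin_edges[i], bin_edges[i + 1]))
    return sorted(samples)


# All coarse weight sits in the bin [2, 3], so every fine sample lands there.
fine = sample_fine_depths([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 0.0], 64)
```

The fine network is then evaluated on the union of coarse and fine depths, so empty regions cost only the few coarse queries.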

Figure 1: NeRF synthesizing novel views from a learned volumetric representation. Notice the smooth camera transitions and view-dependent lighting effects.

Approach 2: 3D Gaussian Splatting

Motivation: The Need for Speed

While NeRF produces stunning results, it’s prohibitively slow:

Training: 24-48 hours on high-end GPUs
Rendering: 10-30 seconds per frame

For interactive applications (VR, gaming, robotics), we need real-time rendering (30+ FPS).

Core Innovation: Explicit 3D Gaussians

Instead of an implicit neural field, 3DGS represents scenes as a collection of 3D Gaussian primitives. Each Gaussian is defined by:

Position: Center location $\mu \in \mathbb{R}^3$

Covariance: 3D shape defined by covariance matrix $\Sigma$:

\[\Sigma = R S S^T R^T\]

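Where $R$ is a rotation matrix (stored as a quaternion during optimization) and $S$ is a diagonal scaling matrix. This factorization keeps $\Sigma$ symmetric and positive semi-definite throughout training.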
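To make the factorization concrete, here is a small sketch that builds $\Sigma$ from a quaternion and three scale factors. The $(w, x, y, z)$ quaternion ordering and the function names are assumptions for this example; a real implementation does this on the GPU for all Gaussians at once.

```python
import math


def quaternion_to_rotation(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_from_params(q, scale):
    """Compute Sigma = R S S^T R^T for a quaternion q and per-axis scales."""
    rot = quaternion_to_rotation(q)
    # M = R S: column j of R multiplied by scale[j].
    m = [[rot[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, symmetric by construction.
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


# An unrotated Gaussian has a diagonal covariance holding the squared scales.
sigma = covariance_from_params((1.0, 0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
```

Optimizing $R$ and $S$ instead of the nine entries of $\Sigma$ means gradient steps can never produce an invalid covariance.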

Opacity: Value $\alpha \in [0, 1]$, where 0 is fully transparent and 1 is fully opaque

Spherical Harmonics: View-dependent color encoded as SH coefficients up to degree 3, capturing view-dependent effects efficiently.

Differentiable Rasterization

The rendering process is fully differentiable:

Project Gaussians to 2D screen space using camera parameters
Sort by depth for correct alpha blending (front-to-back, matching the blending equation below)
Rasterize using tile-based rendering:

For each pixel, blend overlapping Gaussians:

\[C = \sum_{i \in \mathcal{N}} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)\]

Where $\mathcal{N}$ are Gaussians affecting the pixel, sorted by depth.
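The blending sum composites from the nearest Gaussian outward: each term is weighted by how much light still passes through everything in front of it. A minimal per-pixel sketch, assuming each overlapping Gaussian has already been reduced to a depth, an effective $\alpha$ at this pixel, and a color (the tuple layout, function name, and early-exit threshold are assumptions for illustration):

```python
def blend_pixel(splats, min_transmittance=1e-4):
    """Front-to-back alpha blending of the Gaussians covering one pixel.

    splats: list of (depth, alpha, (r, g, b)) tuples in any order.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # product of (1 - alpha_j) over closer Gaussians
    for _, alpha, color in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # everything behind is effectively hidden
    return pixel


# A half-transparent red splat in front of an opaque blue one.
color = blend_pixel([
    (2.0, 1.0, (0.0, 0.0, 1.0)),
    (1.0, 0.5, (1.0, 0.0, 0.0)),
])
# color is [0.5, 0.0, 0.5]
```

Going front to back is what allows the early exit: once the accumulated transmittance is near zero, the remaining Gaussians cannot change the pixel.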

Key Advantage: This entire pipeline runs on GPU in CUDA, enabling real-time rendering at 30-100 FPS.

Adaptive Density Control

During training, Gaussians undergo densification and pruning:

Clone: Duplicate small Gaussians in under-reconstructed regions (high positional gradient)
Split: Divide large Gaussians in over-reconstructed regions, where a single Gaussian covers complex geometry
Prune: Remove Gaussians with low opacity (< 0.005)

This dynamic optimization balances quality and efficiency.
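The three rules can be summarized as a per-Gaussian decision. In this sketch only the 0.005 opacity threshold comes from the list above; the gradient threshold, the `percent_dense` size cutoff, and the function name are assumed values chosen for illustration.

```python
def densify_action(grad_norm, max_scale, opacity, scene_extent,
                   grad_threshold=2e-4, percent_dense=0.01, min_opacity=0.005):
    """Decide what to do with one Gaussian during adaptive density control."""
    if opacity < min_opacity:
        return "prune"  # nearly invisible: remove it
    if grad_norm < grad_threshold:
        return "keep"   # this region is already reconstructed well
    # High gradient: large Gaussians are split, small ones are cloned.
    if max_scale > percent_dense * scene_extent:
        return "split"
    return "clone"


action = densify_action(grad_norm=1e-3, max_scale=0.5, opacity=0.8, scene_extent=10.0)
# action is "split", since 0.5 exceeds the 0.1 size cutoff
```

Tying the size cutoff to the scene extent keeps the behavior comparable between a tabletop capture and a large outdoor scene.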

Implementation Pipeline

1. Data Preprocessing with COLMAP

Both methods require multi-view images with known camera poses. We use COLMAP, an SfM pipeline that:

Extracts SIFT features from images
Matches features across views
Estimates camera intrinsics and extrinsics
Generates sparse point cloud initialization

  
# Run full COLMAP pipeline
colmap feature_extractor --database_path database.db --image_path images/
colmap exhaustive_matcher --database_path database.db
# The mapper expects the output directory to exist
mkdir -p sparse
colmap mapper --database_path database.db --image_path images/ --output_path sparse/

2. Training Configuration

NeRF Training Hyperparameters:

MLP Architecture: 8 layers, 256 units per layer
Positional Encoding: L=10 for position, L=4 for direction
Learning Rate: 5e-4 with exponential decay
Batch Size: 1024 rays
Training Time: ~24 hours on RTX 2060
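For the NeRF learning rate, "exponential decay" can be sketched as follows. The initial value of 5e-4 is from the list above; the decay factor of 0.1 over 250K steps is an assumed schedule for illustration, not a setting confirmed by this project.

```python
def nerf_learning_rate(step, lr_init=5e-4, decay_rate=0.1, decay_steps=250_000):
    """Exponentially decayed learning rate: lr_init * decay_rate^(step / decay_steps)."""
    return lr_init * decay_rate ** (step / decay_steps)


lr_start = nerf_learning_rate(0)        # 5e-4
lr_mid = nerf_learning_rate(125_000)    # about 1.58e-4
lr_end = nerf_learning_rate(250_000)    # about 5e-5
```

The decay is smooth rather than stepped, which avoids sudden jumps in the loss late in training.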

3DGS Training Hyperparameters:

Initial Gaussians: ~100K from COLMAP sparse reconstruction
Optimization: Adam with custom learning rates per parameter
- Position: 1.6e-4
- Opacity: 0.05
- Scaling: 5e-3
- Rotation: 1e-3
Training Time: ~30 minutes (7K iterations) on RTX 2060

3. Loss Functions

Both methods optimize using photometric reconstruction loss:

\[\mathcal{L} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_{SSIM}\]

Where:

$\mathcal{L}_1 = \|C_{pred} - C_{gt}\|_1$ measures pixel-wise difference
$\mathcal{L}_{SSIM}$ penalizes structural dissimilarity (typically computed as $1 - \text{SSIM}$, so that higher similarity lowers the loss)

For 3DGS, we use: $\lambda_1 = 0.8, \lambda_2 = 0.2$
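A simplified sketch of the combined loss on a flattened grayscale image with values in [0, 1]. Two simplifications are assumed here and are not how a production trainer computes it: SSIM is evaluated once over the whole image instead of over sliding windows, and the SSIM term is taken as $1 - \text{SSIM}$ so that it behaves as a loss. The L1 term is averaged over pixels.

```python
def l1_loss(pred, gt):
    """Mean absolute pixel difference."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def global_ssim(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole image as a single window (simplification)."""
    n = len(pred)
    mu_p = sum(pred) / n
    mu_g = sum(gt) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_g = sum((g - mu_g) ** 2 for g in gt) / n
    cov = sum((p - mu_p) * (g - mu_g) for p, g in zip(pred, gt)) / n
    numerator = (2 * mu_p * mu_g + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)
    return numerator / denominator


def photometric_loss(pred, gt, lambda_1=0.8, lambda_2=0.2):
    """Weighted sum of the L1 term and the (1 - SSIM) term."""
    return lambda_1 * l1_loss(pred, gt) + lambda_2 * (1.0 - global_ssim(pred, gt))


image = [0.1, 0.4, 0.7, 0.9]
loss_same = photometric_loss(image, image)              # approximately 0
loss_diff = photometric_loss(image, list(reversed(image)))  # clearly positive
```

The L1 term keeps colors accurate per pixel, while the SSIM term rewards getting local structure such as edges right even when absolute intensities are slightly off.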

Experimental Results

Quantitative Comparison

Metric	NeRF	3D Gaussian Splatting
PSNR	28.5 dB	30.2 dB
SSIM	0.89	0.94
LPIPS	0.12	0.08
Training Time	24 hours	30 minutes
Rendering Speed	0.1 FPS	60 FPS
Memory (Training)	8 GB	12 GB
Final Model Size	5 MB	500 MB
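For reference, PSNR in the table is derived from the mean squared error between a rendered image and its ground truth. A minimal sketch, assuming pixel values normalized to [0, 1] and a flattened image (the function name is my own):

```python
import math


def psnr(pred, gt, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on every pixel gives an MSE of 0.01, i.e. about 20 dB.
value = psnr([0.6, 0.6, 0.6], [0.5, 0.5, 0.5])
```

Because the scale is logarithmic, the 1.7 dB gap between the two methods corresponds to roughly a one-third reduction in mean squared error.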

Key Observations:

3DGS achieves 48x faster training and 600x faster rendering
Quality: 3DGS produces sharper results with better high-frequency detail
Trade-off: Larger model size for 3DGS due to explicit Gaussian storage
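The speedups and the model-size trade-off follow from simple arithmetic on the table. The per-Gaussian storage estimate below assumes uncompressed float32 values, SH coefficients up to degree 3 for each color channel, and 1 MB = 10^6 bytes; the real file layout may differ.

```python
# Speedups from the table.
train_speedup = (24 * 60) / 30   # 24 hours vs. 30 minutes -> 48x
render_speedup = 60 / 0.1        # 60 FPS vs. 0.1 FPS -> about 600x

# Rough per-Gaussian storage (assumed float32, no compression).
sh_coeffs = (3 + 1) ** 2 * 3                 # 16 coefficients per channel, 3 channels = 48
floats_per_gaussian = 3 + 3 + 4 + 1 + sh_coeffs  # position, scale, quaternion, opacity, SH = 59
bytes_per_gaussian = floats_per_gaussian * 4     # 236 bytes

# A 500 MB model would then hold on the order of two million Gaussians.
approx_gaussians = 500e6 / bytes_per_gaussian
```

Most of the footprint comes from the 48 SH coefficients, which is why lowering the SH degree is a common first step when the model needs to shrink.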

Visual Quality Analysis

Strengths of NeRF:

Compact representation (small model size)
Smooth interpolation between views
No artifacts from discrete primitives

Strengths of 3DGS:

Crisp edges and fine details (hair, text, mesh patterns)
Accurate view-dependent effects (specularities, reflections)
Real-time performance enables interactive applications

Figure 2: 3D Gaussian Splatting rendering quality comparison. The explicit Gaussian representation captures fine details with significantly faster rendering times compared to NeRF.

Real-Time Visualization

Both implementations integrate with SIBR Viewers (System for Image-Based Rendering), providing:

Interactive camera navigation (WASD + mouse)
Real-time rendering at 30-60 FPS (for 3DGS)
Debug visualization modes (depth maps, normals, point clouds)

Controls:

W/A/S/D: Camera movement
Mouse: Look around
Q/E: Vertical movement
F: Toggle full-screen
Tab: Show/hide UI

Note: A real-time rendering demo video (gs3d_real_time_rendering.mp4) is available in the project repository, showcasing the interactive performance of the 3D Gaussian Splatting implementation at 60+ FPS on an RTX 2060.

Deployment Pipeline

Docker Containerization & Setup

To simplify the complex dependency stack (CUDA, PyTorch, COLMAP, custom CUDA kernels), I created complete Docker-based workflows for both implementations. The repository is available at: github.com/rohitDey23/view_synthesis

3D Gaussian Splatting Setup (Docker)

The 3DGS implementation uses a fully containerized environment with all dependencies pre-configured:

1. Clone and Build:

  
# Clone the repository and checkout the gaussian_splatting branch
git clone https://github.com/rohitDey23/view_synthesis.git
cd view_synthesis
git checkout gaussian_splatting

# Build Docker image (~10 minutes)
docker build -t view_synthesis .

2. Run Container:

  
# Navigate to model directory for bind mounting
cd model

# Launch container with GPU support
docker run --rm -it --name view_synth \
    --gpus all \
    -e DISPLAY=host.docker.internal:0 \
    -e LIBGL_ALWAYS_INDIRECT=0 \
    --mount type=bind,src="$(pwd)",dst=/home/user_dev/code_ws/model/ \
    --runtime=nvidia \
    view_synthesis bash

3. Train 3DGS:

  
# Inside container: activate conda environment
conda activate view_synthesis

# Download dataset (COLMAP format required)
cd data
wget https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/datasets/input/tandt_db.zip
unzip tandt_db.zip && rm tandt_db.zip

# Return to the workspace root so the relative src/ paths resolve
cd /home/user_dev/code_ws/

# Install custom CUDA submodules
pip3 install src/submodules/diff-gaussian-rasterization
pip3 install src/submodules/simple-knn

# Train the model
python3 src/train.py -s ./data/train_data -m ./model/

4. Render Results:

  
# Render test views
python3 src/render.py -s ./data -m ./model/

# Create GIF from renders (optional)
python3 src/create_gif.py <ground/truth/path/> <renders/path/> <output/path/filename.gif> --duration 4

NeRF Setup (Using UV)

The NeRF implementation uses uv (a modern Python package manager) for dependency management, providing faster and more reproducible installations:

1. Clone and Setup:

  
# Clone the repository and checkout the nerf branch
git clone https://github.com/rohitDey23/view_synthesis.git
cd view_synthesis
git checkout nerf

# Install dependencies with uv (assumes the branch ships a pyproject.toml;
# running `uv init` in an already-initialized project fails)
uv sync

# Activate the virtual environment
source .venv/bin/activate

2. Train NeRF:

  
# Training with default configuration
python train.py --config configs/lego.txt

# Training time: ~24 hours on RTX 2060
# Output: Saved in logs/ directory

Key Differences:

3DGS: Docker-based, ~30min training, real-time rendering
NeRF: UV-based, ~24hr training, slower rendering but compact model

Both implementations support COLMAP for camera pose estimation and include SIBR viewers for interactive visualization.

Technical Challenges & Solutions

Challenge 1: CUDA Memory Management

Problem: Training crashes with OOM errors on consumer GPUs (6-12 GB VRAM)

Solution:

Gradient checkpointing for NeRF (reduces activation memory by roughly 50% at the cost of extra compute)
Dynamic batch sizing based on available memory
Mixed precision training (FP16) with gradient scaling
Offload optimizer states to CPU when needed

Challenge 2: Gaussian Splatting Instabilities

Problem: Gaussians grow unbounded or collapse during training

Solution:

Adaptive learning rate scaling based on Gaussian size
Regularization: Limit maximum scale to 10% of scene extent
Opacity reset every 3000 iterations (forces re-evaluation)
Gradient clipping (norm < 2.0)
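Two of these safeguards are easy to sketch in isolation: capping each Gaussian's scale at 10% of the scene extent and clipping the gradient to a norm of 2.0. The function names are hypothetical and the code works on plain lists; in a real trainer both would be tensor operations inside the optimization loop.

```python
import math


def clamp_scales(scales, scene_extent, max_fraction=0.10):
    """Limit every scale to a fraction of the scene extent."""
    limit = max_fraction * scene_extent
    return [min(s, limit) for s in scales]


def clip_gradient(grad, max_norm=2.0):
    """Rescale a gradient vector so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    factor = max_norm / norm
    return [g * factor for g in grad]


scales = clamp_scales([0.5, 3.0], scene_extent=10.0)  # [0.5, 1.0]
grad = clip_gradient([3.0, 4.0])                      # norm 5 rescaled to norm 2
```

Clipping preserves the gradient direction and only shortens the step, so it damps exploding updates without biasing where the Gaussians move.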

Challenge 3: COLMAP Failure on Challenging Scenes

Problem: SfM fails on low-texture, repetitive, or reflective surfaces

Solution:

Lower the SIFT peak threshold (or raise the feature cap) so that more keypoints are detected on low-texture surfaces
Use sequential matching instead of exhaustive (for ordered captures)
Mask out problematic regions (mirrors, windows) manually
Provide approximate camera poses via ARKit/ARCore when available

Future Directions

Several exciting avenues remain unexplored:

Technical Extensions

Dynamic Scenes: Extend to video with temporal consistency (4D Gaussian Splatting)
Large-Scale Scenes: City-scale reconstruction using Block-NeRF concepts
Faster NeRF Variants: Integrate Instant-NGP for competitive speed
Semantic Understanding: Add semantic segmentation for object-level editing

Application Areas

VR/AR: Real-time rendering for immersive experiences
Robotics Navigation: Photorealistic simulation environments
Cultural Heritage: Digital preservation of historical sites
E-commerce: Interactive 3D product visualization

Lessons Learned

NeRF Insights

Hierarchical sampling is critical: Provides a 2-3x speedup with no visible quality loss
Positional encoding frequency matters: L=10 for position, L=4 for viewing direction
Convergence is slow but steady: Always train for 200K+ iterations

3D Gaussian Splatting Insights

Initialization quality is crucial: Poor COLMAP reconstruction → poor final result
Densification timing: Start at iteration 500, stop at 15K to avoid overfitting
Opacity reset prevents mode collapse: Essential for stable training
View-dependent effects need high SH degree: Degree 3 captures most specularities

General Best Practices

Always validate COLMAP results before starting expensive training
Use learning rate warmup to stabilize early training
Log intermediate renders every 1K iterations for debugging
Checkpoint frequently: Training failures are common with custom CUDA ops

Conclusion

This project demonstrates the rapid evolution of neural view synthesis, from the groundbreaking but slow NeRF to the real-time capable 3D Gaussian Splatting. While NeRF remains valuable for its compact representation and theoretical elegance, 3DGS has emerged as the practical choice for applications demanding interactivity.

The field is moving incredibly fast—techniques presented at SIGGRAPH 2023 are already being surpassed by newer methods in 2024. Yet the fundamental principles—differentiable rendering, volumetric scene representations, and multi-view consistency—remain constant and will continue to drive innovation in 3D computer vision.

For researchers and practitioners entering this space, I hope this implementation serves as both a learning resource and a practical starting point for building next-generation view synthesis systems.

Resources & Links

GitHub Repository: rohitDey23/view_synthesis
NeRF Branch: View NeRF Implementation
3DGS Branch: View Gaussian Splatting Implementation

Original Papers:

NeRF: Mildenhall et al., ECCV 2020
3D Gaussian Splatting: Kerbl et al., SIGGRAPH 2023

Acknowledgments: This work builds upon the excellent open-source implementations from GRAPHDECO Research Group at Inria and the broader neural rendering community. Special thanks to the authors for making their code publicly available.

Computer Vision, 3D Reconstruction

This post is licensed under CC BY 4.0 by the author.
