View Synthesis: From NeRF to 3D Gaussian Splatting
Overview
Photorealistic 3D scene reconstruction from 2D images represents one of the most challenging problems in computer vision. While traditional methods rely on explicit geometry (meshes, point clouds), modern neural approaches have revolutionized the field by learning implicit scene representations. This project explores two cutting-edge techniques: Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), implementing both from scratch with a focus on practical deployment and real-time performance.
The goal was to build a complete pipeline—from multi-view image capture to interactive 3D visualization—that demonstrates the evolution of view synthesis technology and provides hands-on experience with state-of-the-art neural rendering techniques.
The Challenge: Novel View Synthesis
Given a sparse set of images from different viewpoints, how can we synthesize photorealistic images from arbitrary camera positions? This problem, known as novel view synthesis, requires understanding:
- Scene Geometry: Where are objects located in 3D space?
- Appearance Modeling: How do materials interact with light?
- View-Dependent Effects: How do reflections and specularity change with viewpoint?
- Computational Efficiency: Can we render in real-time?
Traditional approaches like Structure-from-Motion (SfM) + Multi-View Stereo (MVS) reconstruct explicit geometry but struggle with:
- Fine detail capture (thin structures, hair, foliage)
- View-dependent appearance (specularities, reflections)
- Completeness (holes in reconstruction)
Neural methods address these limitations by learning continuous volumetric representations.
Approach 1: Neural Radiance Fields (NeRF)
Core Concept
NeRF represents scenes as continuous 5D functions that map:
- Input: 3D position $(x, y, z)$ and viewing direction $(\theta, \phi)$
- Output: Volume density $\sigma$ and view-dependent RGB color $(r, g, b)$
\[F_\Theta : (x, y, z, \theta, \phi) \rightarrow (r, g, b, \sigma)\]
Where $\Theta$ represents the weights of a Multi-Layer Perceptron (MLP) neural network.
Volume Rendering with Ray Marching
For each pixel in the target image, we:
- Cast a ray from the camera through the pixel
- Sample points along the ray at intervals $t_i$
- Query the MLP at each sample point
- Accumulate color using volume rendering:
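\[\hat{C} = \sum_{i=1}^{N} T_i \alpha_i c_i\]
Where $c_i$ is the RGB color the MLP predicts at sample $i$, $N$ is the number of samples along the ray, and: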
- $T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$ is the accumulated transmittance
- $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$ is the opacity contribution
- $\delta_i = t_{i+1} - t_i$ is the distance between samples
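The accumulation along a single ray can be sketched in plain Python. Each sample contributes its color weighted by $T_i \alpha_i$, using the definitions above. This is a minimal illustration, not the project's training code: the function name `composite_ray` and the `t_far` argument (used to close the last interval) are assumptions made for the example.

```python
import math

def composite_ray(sigmas, colors, t_vals, t_far):
    """Accumulate one pixel color from per-sample densities and RGB values.

    sigmas: density at each sample; colors: RGB triple at each sample;
    t_vals: sample depths along the ray; t_far: hypothetical far bound
    that closes the interval of the last sample.
    """
    edges = list(t_vals) + [t_far]
    pixel = [0.0, 0.0, 0.0]
    weights = []
    transmittance = 1.0  # T_1 = 1: nothing blocks the first sample
    for i in range(len(sigmas)):
        delta = edges[i + 1] - edges[i]             # delta_i
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # alpha_i
        weight = transmittance * alpha              # T_i * alpha_i
        for c in range(3):
            pixel[c] += weight * colors[i][c]
        weights.append(weight)
        transmittance *= 1.0 - alpha                # T_{i+1} = T_i * exp(-sigma_i * delta_i)
    return pixel, weights

# Empty space, then a nearly opaque green surface that hides the blue sample behind it
pixel, weights = composite_ray(
    sigmas=[0.0, 1e3, 5.0],
    colors=[(1, 0, 0), (0, 1, 0), (0, 0, 1)],
    t_vals=[1.0, 2.0, 3.0],
    t_far=4.0,
)
print(pixel, weights)
```

The returned weights are the same quantities a coarse pass would hand to a fine sampling stage.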
Positional Encoding
An MLP fed raw coordinates tends to learn only low-frequency functions, which blurs fine detail. We therefore apply a positional encoding that maps each input coordinate $p$ to a higher-dimensional space:
\[\gamma(p) = \left[\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right]\]
This enables the network to learn fine geometric details and sharp textures.
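A direct transcription of $\gamma(p)$ into Python might look like the sketch below. The function names are illustrative only. The encoding is applied to each coordinate independently and the results are concatenated, so a 3D position with $L=10$ yields 60 features under this formula.

```python
import math

def positional_encoding(p, num_freqs):
    """Return [sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p)]."""
    encoded = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * p
        encoded.append(math.sin(angle))
        encoded.append(math.cos(angle))
    return encoded

def encode_point(xyz, num_freqs=10):
    """Encode each coordinate separately and concatenate the results."""
    features = []
    for coord in xyz:
        features.extend(positional_encoding(coord, num_freqs))
    return features

print(len(encode_point((0.1, 0.2, 0.3))))  # 3 coordinates * 2 * 10 frequencies
```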
Hierarchical Sampling Strategy
To improve efficiency, NeRF uses two networks:
- Coarse network: Samples uniformly along the ray
- Fine network: Focuses sampling on regions with high density (where objects exist)
This reduces wasted computation in empty space by 2-3x.
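One common way to focus the fine samples is inverse transform sampling: the coarse weights (the $T_i \alpha_i$ terms) are treated as a piecewise-constant probability density over depth bins, and new depths are drawn from it. The sketch below assumes that approach. `sample_fine`, its arguments, and the small `eps` padding are choices made for this example, not details taken from the project code.

```python
import bisect
import random

def sample_fine(bin_edges, weights, num_samples, rng=None):
    """Draw depths from the piecewise-constant PDF defined by coarse weights.

    bin_edges has one more entry than weights: bin i spans
    bin_edges[i] .. bin_edges[i + 1].
    """
    rng = rng or random.Random(0)
    eps = 1e-5  # keeps every bin reachable and avoids division by zero
    padded = [w + eps for w in weights]
    total = sum(padded)
    cdf = [0.0]
    for w in padded:
        cdf.append(cdf[-1] + w / total)
    samples = []
    for _ in range(num_samples):
        u = rng.random()
        # Locate the bin whose CDF interval contains u
        idx = min(bisect.bisect_right(cdf, u) - 1, len(padded) - 1)
        frac = (u - cdf[idx]) / (cdf[idx + 1] - cdf[idx])
        samples.append(bin_edges[idx] + frac * (bin_edges[idx + 1] - bin_edges[idx]))
    return sorted(samples)

# Most of the coarse weight sits in the bin between depth 4 and 5
fine = sample_fine([2.0, 3.0, 4.0, 5.0, 6.0], [0.0, 0.0, 0.9, 0.1], num_samples=8)
print(fine)
```

Bins with near-zero weight receive almost no samples, which is where the savings in empty space come from.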
Figure 1: NeRF synthesizing novel views from a learned volumetric representation. Notice the smooth camera transitions and view-dependent lighting effects.
Approach 2: 3D Gaussian Splatting
Motivation: The Need for Speed
While NeRF produces stunning results, it’s prohibitively slow:
- Training: 24-48 hours on high-end GPUs
- Rendering: 10-30 seconds per frame
For interactive applications (VR, gaming, robotics), we need real-time rendering (30+ FPS).
Core Innovation: Explicit 3D Gaussians
Instead of an implicit neural field, 3DGS represents scenes as a collection of 3D Gaussian primitives. Each Gaussian is defined by:
- Position: Center location $\mu \in \mathbb{R}^3$
- Covariance: 3D shape and orientation, given by the covariance matrix $\Sigma$:
\[\Sigma = R S S^T R^T\]
Where $R$ is a rotation matrix (stored as a unit quaternion) and $S$ is a diagonal scaling matrix. This factorization keeps $\Sigma$ positive semi-definite throughout optimization.
- Opacity: $\alpha \in [0, 1]$, where 0 is fully transparent and 1 is fully opaque
- Spherical Harmonics: View-dependent color stored as SH coefficients up to degree 3 (16 coefficients per color channel)
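To make the covariance construction concrete, here is a small sketch that builds $\Sigma = R S S^T R^T$ from a quaternion and three scale factors. The `(w, x, y, z)` quaternion ordering and the function names are assumptions for illustration.

```python
import math

def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix, normalizing first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(q, scale):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotation(q)
    # M = R S: column j of R is multiplied by scale[j]
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, symmetric by construction
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

# A 90-degree rotation about z swaps the x and y extents of the ellipsoid
quarter_turn = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(covariance(quarter_turn, (1.0, 2.0, 3.0)))
```

Optimizing the quaternion and scales rather than the entries of $\Sigma$ directly means every gradient step still yields a valid covariance.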
Differentiable Rasterization
The rendering process is fully differentiable:
- Project Gaussians to 2D screen space using camera parameters
- Sort by depth (front-to-back) for correct alpha blending
- Rasterize using tile-based rendering:
For each pixel, blend overlapping Gaussians:
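\[C = \sum_{i \in \mathcal{N}} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)\]
Where $\mathcal{N}$ is the set of Gaussians overlapping the pixel, ordered by depth from nearest to farthest, $c_i$ is the color of the $i$-th Gaussian, and $\alpha_i$ is its opacity at that pixel.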
Key Advantage: This entire pipeline runs on GPU in CUDA, enabling real-time rendering at 30-100 FPS.
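The per-pixel blend can be sketched on the CPU as follows. This is an illustration of the compositing sum only, not the CUDA rasterizer. It assumes each splat's alpha has already been evaluated at the pixel, and the early-exit threshold `min_transmittance` is a hypothetical parameter.

```python
def blend_pixel(splats, min_transmittance=1e-4):
    """Composite (depth, alpha, rgb) splats covering one pixel, nearest first."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # product of (1 - alpha_j) over splats already blended
    for depth, alpha, rgb in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # pixel is effectively opaque; remaining splats are hidden
    return color

# A half-transparent red splat in front of an opaque blue one
print(blend_pixel([(5.0, 1.0, (0, 0, 1)), (1.0, 0.5, (1, 0, 0))]))
```

Stopping once the remaining transmittance is negligible is what makes nearest-first ordering attractive: occluded splats never need to be touched.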
Adaptive Density Control
During training, Gaussians undergo densification and pruning:
- Clone: Duplicate small Gaussians in under-reconstructed regions (high positional gradient)
- Split: Divide large Gaussians that cover complex geometry (over-reconstructed regions) into smaller ones
- Prune: Remove Gaussians with low opacity (< 0.005)
This dynamic optimization balances quality and efficiency.
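The decision logic for a single Gaussian can be sketched as below. Only the opacity cutoff of 0.005 comes from the list above. The gradient and scale thresholds, and the function name, are placeholder values chosen for illustration and would need tuning per scene.

```python
def density_action(opacity, grad_norm, max_scale,
                   grad_threshold=0.0002,   # hypothetical: view-space gradient cutoff
                   scale_threshold=0.01,    # hypothetical: boundary between small and large
                   min_opacity=0.005):      # pruning cutoff from the list above
    """Return 'prune', 'clone', 'split', or 'keep' for one Gaussian."""
    if opacity < min_opacity:
        return "prune"   # nearly invisible, contributes nothing
    if grad_norm <= grad_threshold:
        return "keep"    # region is already well reconstructed
    # High gradient: the region needs more capacity
    return "split" if max_scale > scale_threshold else "clone"

print(density_action(opacity=0.8, grad_norm=0.001, max_scale=0.5))    # large and struggling
print(density_action(opacity=0.8, grad_norm=0.001, max_scale=0.001))  # small and struggling
```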
Implementation Pipeline
1. Data Preprocessing with COLMAP
Both methods require multi-view images with known camera poses. We use COLMAP, an SfM pipeline that:
- Extracts SIFT features from images
- Matches features across views
- Estimates camera intrinsics and extrinsics
- Generates sparse point cloud initialization
# Run full COLMAP pipeline
colmap feature_extractor --database_path database.db --image_path images/
colmap exhaustive_matcher --database_path database.db
# The mapper expects the output directory to exist
mkdir -p sparse
colmap mapper --database_path database.db --image_path images/ --output_path sparse/
2. Training Configuration
NeRF Training Hyperparameters:
- MLP Architecture: 8 layers, 256 units per layer
- Positional Encoding: L=10 for position, L=4 for direction
- Learning Rate: 5e-4 with exponential decay
- Batch Size: 1024 rays
- Training Time: ~24 hours on RTX 2060
3DGS Training Hyperparameters:
- Initial Gaussians: ~100K from COLMAP sparse reconstruction
- Optimization: Adam with a separate learning rate per parameter group:
  - Position: 1.6e-4
  - Opacity: 0.05
  - Scaling: 5e-3
  - Rotation: 1e-3
- Training Time: ~30 minutes (7K iterations) on RTX 2060
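The two learning-rate setups can be written down as a short sketch. The initial rates come from the lists above. The NeRF decay factor and decay horizon (`decay_rate`, `decay_steps`) are assumed values for illustration, and `adam_param_groups` only builds the list-of-dicts layout that `torch.optim.Adam` accepts for per-group options, without importing PyTorch.

```python
NERF_LR_INIT = 5e-4  # from the NeRF hyperparameter list

def nerf_learning_rate(step, decay_rate=0.1, decay_steps=250_000):
    """Exponential decay: the rate shrinks by decay_rate every decay_steps (assumed values)."""
    return NERF_LR_INIT * decay_rate ** (step / decay_steps)

# Per-parameter rates from the 3DGS hyperparameter list
GAUSSIAN_LRS = {"position": 1.6e-4, "opacity": 0.05, "scaling": 5e-3, "rotation": 1e-3}

def adam_param_groups(params):
    """Pair each named parameter with its own learning rate."""
    return [
        {"params": [params[name]], "lr": lr, "name": name}
        for name, lr in GAUSSIAN_LRS.items()
    ]

print(nerf_learning_rate(0), nerf_learning_rate(125_000))
```

With real tensors, the returned list would be passed as the first argument to the optimizer, so that opacity can move quickly while positions are updated far more conservatively.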
3. Loss Functions
Both methods optimize using photometric reconstruction loss:
\[\mathcal{L} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_{SSIM}\]
Where:
- $\mathcal{L}_1 = \|C_{pred} - C_{gt}\|_1$ measures the pixel-wise difference
- $\mathcal{L}_{SSIM} = 1 - \text{SSIM}(C_{pred}, C_{gt})$ penalizes loss of structural similarity
For 3DGS, we use: $\lambda_1 = 0.8, \lambda_2 = 0.2$
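A compact sketch of the combined loss on flattened grayscale images with values in $[0, 1]$ follows. Two simplifications are assumed here and should not be read as the project's implementation: SSIM is computed once over the whole image rather than over sliding windows, and the L1 term is averaged over pixels. The SSIM term enters as $1 - \text{SSIM}$ so that a perfect match gives zero loss.

```python
def l1_loss(pred, target):
    """Mean absolute pixel difference."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def global_ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM over a single window covering the whole image (simplification)."""
    n = len(pred)
    mu_p = sum(pred) / n
    mu_t = sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return numerator / denominator

def photometric_loss(pred, target, lambda_1=0.8, lambda_2=0.2):
    """lambda_1 * L1 + lambda_2 * (1 - SSIM), with the 3DGS weights as defaults."""
    return lambda_1 * l1_loss(pred, target) + lambda_2 * (1.0 - global_ssim(pred, target))

ground_truth = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5]
blurry = [0.3, 0.4, 0.5, 0.6, 0.7, 0.5]
print(photometric_loss(blurry, ground_truth))
```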
Experimental Results
Quantitative Comparison
| Metric | NeRF | 3D Gaussian Splatting |
|---|---|---|
| PSNR | 28.5 dB | 30.2 dB |
| SSIM | 0.89 | 0.94 |
| LPIPS | 0.12 | 0.08 |
| Training Time | 24 hours | 30 minutes |
| Rendering Speed | 0.1 FPS | 60 FPS |
| Memory (Training) | 8 GB | 12 GB |
| Final Model Size | 5 MB | 500 MB |
Key Observations:
- 3DGS achieves 48x faster training and 600x faster rendering
- Quality: 3DGS produces sharper results with better high-frequency detail
- Trade-off: Larger model size for 3DGS due to explicit Gaussian storage
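The speedup figures follow directly from the table, and the model-size gap can be sanity-checked with a back-of-the-envelope estimate. The size estimate rests on assumptions not stated in the table: uncompressed float32 storage, 1 MB taken as $10^6$ bytes, and only position, scale, rotation, opacity, and degree-3 SH color stored per Gaussian.

```python
# Ratios taken from the comparison table
train_speedup = (24 * 60) / 30   # 24 hours vs 30 minutes
render_speedup = 60 / 0.1        # 60 FPS vs 0.1 FPS

# Floats per Gaussian: position (3) + scale (3) + quaternion (4) + opacity (1)
# + spherical harmonics: 3 color channels * (degree + 1)^2 coefficients
sh_degree = 3
floats_per_gaussian = 3 + 3 + 4 + 1 + 3 * (sh_degree + 1) ** 2
bytes_per_gaussian = 4 * floats_per_gaussian       # float32 (assumption)
approx_gaussians = 500e6 / bytes_per_gaussian      # 500 MB model (table value)

print(train_speedup, render_speedup)
print(floats_per_gaussian, bytes_per_gaussian, round(approx_gaussians))
```

Under these assumptions a 500 MB model corresponds to roughly two million Gaussians, most of whose storage is SH color coefficients.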
Visual Quality Analysis
Strengths of NeRF:
- Compact representation (small model size)
- Smooth interpolation between views
- No artifacts from discrete primitives
Strengths of 3DGS:
- Crisp edges and fine details (hair, text, mesh patterns)
- Accurate view-dependent effects (specularities, reflections)
- Real-time performance enables interactive applications
Figure 2: 3D Gaussian Splatting rendering quality comparison. The explicit Gaussian representation captures fine details with significantly faster rendering times compared to NeRF.
Real-Time Visualization
Both implementations integrate with SIBR Viewers (System for Image-Based Rendering), providing:
- Interactive camera navigation (WASD + mouse)
- Real-time rendering at 30-60 FPS (for 3DGS)
- Debug visualization modes (depth maps, normals, point clouds)
Controls:
- W/A/S/D: Camera movement
- Mouse: Look around
- Q/E: Vertical movement
- F: Toggle full-screen
- Tab: Show/hide UI
Note: A real-time rendering demo video (gs3d_real_time_rendering.mp4) is available in the project repository, showcasing the interactive performance of the 3D Gaussian Splatting implementation at 60+ FPS on an RTX 2060.
Deployment Pipeline
Docker Containerization & Setup
To simplify the complex dependency stack (CUDA, PyTorch, COLMAP, custom CUDA kernels), I created a Docker-based workflow for the 3DGS implementation and a uv-based setup for NeRF. The repository is available at: github.com/rohitDey23/view_synthesis
3D Gaussian Splatting Setup (Docker)
The 3DGS implementation uses a fully containerized environment with all dependencies pre-configured:
1. Clone and Build:
# Clone the repository and checkout the gaussian_splatting branch
git clone https://github.com/rohitDey23/view_synthesis.git
cd view_synthesis
git checkout gaussian_splatting
# Build Docker image (~10 minutes)
docker build -t view_synthesis .
2. Run Container:
# Navigate to model directory for bind mounting
cd model
# Launch container with GPU support
docker run --rm -it --name view_synth \
--gpus all \
-e DISPLAY=host.docker.internal:0 \
-e LIBGL_ALWAYS_INDIRECT=0 \
--mount type=bind,src=.,dst=/home/user_dev/code_ws/model/ \
--runtime=nvidia \
view_synthesis bash
3. Train 3DGS:
# Inside container: activate conda environment
conda activate view_synthesis
# Download dataset (COLMAP format required)
cd data
wget https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/datasets/input/tandt_db.zip
unzip tandt_db.zip && rm tandt_db.zip
# Return to the workspace root so the relative paths below resolve
cd /home/user_dev/code_ws/
# Install custom CUDA submodules
pip3 install src/submodules/diff-gaussian-rasterization
pip3 install src/submodules/simple-knn
# Train the model
python3 src/train.py -s ./data/train_data -m ./model/
4. Render Results:
# Render test views
python3 src/render.py -s ./data -m ./model/
# Create GIF from renders (optional)
python3 src/create_gif.py <ground/truth/path/> <renders/path/> <output/path/filename.gif> --duration 4
NeRF Setup (Using UV)
The NeRF implementation uses uv (a modern Python package manager) for dependency management, which provides faster and more reliable installations:
1. Clone and Setup:
# Clone the repository and checkout the nerf branch
git clone https://github.com/rohitDey23/view_synthesis.git
cd view_synthesis
git checkout nerf
# Initialize UV project (UV handles all dependencies)
uv init && uv sync
# Activate the virtual environment
source .venv/bin/activate
2. Train NeRF:
# Training with default configuration
python train.py --config configs/lego.txt
# Training time: ~24 hours on RTX 2060
# Output: Saved in logs/ directory
Key Differences:
- 3DGS: Docker-based, ~30min training, real-time rendering
- NeRF: UV-based, ~24hr training, slower rendering but compact model
Both implementations support COLMAP for camera pose estimation and include SIBR viewers for interactive visualization.
Technical Challenges & Solutions
Challenge 1: CUDA Memory Management
Problem: Training crashes with OOM errors on consumer GPUs (6-12 GB VRAM)
Solution:
- Gradient checkpointing for NeRF (reduces memory by about 50%)
- Dynamic batch sizing based on available memory
- Mixed precision training (FP16) with gradient scaling
- Offload optimizer states to CPU when needed
Challenge 2: Gaussian Splatting Instabilities
Problem: Gaussians grow unbounded or collapse during training
Solution:
- Adaptive learning rate scaling based on Gaussian size
- Regularization: Limit maximum scale to 10% of scene extent
- Opacity reset every 3000 iterations (forces re-evaluation)
- Gradient clipping (maximum norm of 2.0)
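Three of these safeguards are simple enough to sketch directly. The 10% scale limit, the 3000-iteration reset interval, and the clipping norm of 2.0 come from the list above. The function names and the opacity value used after a reset (`ceiling=0.01`) are assumptions for this example.

```python
import math

def clamp_scale(scale, scene_extent, max_fraction=0.10):
    """Limit each axis of a Gaussian to a fraction of the scene extent."""
    limit = max_fraction * scene_extent
    return [min(s, limit) for s in scale]

def should_reset_opacity(iteration, interval=3000):
    """True on every interval-th iteration, excluding iteration 0."""
    return iteration > 0 and iteration % interval == 0

def reset_opacity(opacities, ceiling=0.01):
    """Pull every opacity down to at most `ceiling` (assumed value) so it must be re-learned."""
    return [min(o, ceiling) for o in opacities]

def clip_gradient(grad, max_norm=2.0):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]

print(clamp_scale([0.5, 2.0, 0.01], scene_extent=10.0))
print(clip_gradient([3.0, 4.0]))
```

After a reset, Gaussians that the images do not support stay near-transparent and fall below the pruning threshold, while useful ones regain opacity within a few hundred iterations.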
Challenge 3: COLMAP Failure on Challenging Scenes
Problem: SfM fails on low-texture, repetitive, or reflective surfaces
Solution:
- Lower the SIFT peak threshold so that more features are detected on low-texture surfaces
- Use sequential matching instead of exhaustive (for ordered captures)
- Mask out problematic regions (mirrors, windows) manually
- Provide approximate camera poses via ARKit/ARCore when available
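The first two adjustments can be captured in a small helper that assembles the COLMAP command lines. This is a sketch: `colmap_commands` and its arguments are hypothetical, the commands are only built and not executed, and the `--SiftExtraction.peak_threshold` option name should be checked against `colmap feature_extractor -h` for the installed version.

```python
def colmap_commands(database="database.db", images="images/", output="sparse/",
                    ordered_capture=False, peak_threshold=None):
    """Build the three COLMAP invocations as argument lists."""
    extract = ["colmap", "feature_extractor",
               "--database_path", database, "--image_path", images]
    if peak_threshold is not None:
        # Adjusts how strong a keypoint response must be to count as a feature
        extract += ["--SiftExtraction.peak_threshold", str(peak_threshold)]
    # Sequential matching only compares neighboring frames of an ordered capture
    matcher = "sequential_matcher" if ordered_capture else "exhaustive_matcher"
    match = ["colmap", matcher, "--database_path", database]
    mapper = ["colmap", "mapper", "--database_path", database,
              "--image_path", images, "--output_path", output]
    return [extract, match, mapper]

for command in colmap_commands(ordered_capture=True, peak_threshold=0.004):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once the printed commands look right.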
Future Directions
Several exciting avenues remain unexplored:
Technical Extensions
- Dynamic Scenes: Extend to video with temporal consistency (4D Gaussian Splatting)
- Large-Scale Scenes: City-scale reconstruction using Block-NeRF concepts
- Faster NeRF Variants: Integrate Instant-NGP for competitive speed
- Semantic Understanding: Add semantic segmentation for object-level editing
Application Areas
- VR/AR: Real-time rendering for immersive experiences
- Robotics Navigation: Photorealistic simulation environments
- Cultural Heritage: Digital preservation of historical sites
- E-commerce: Interactive 3D product visualization
Lessons Learned
NeRF Insights
- Hierarchical sampling is critical: Provides a 2-3x speedup with no quality loss
- Positional encoding frequency matters: L=10 for geometry, L=4 for appearance
- Convergence is slow but steady: Always train for 200K+ iterations
3D Gaussian Splatting Insights
- Initialization quality is crucial: Poor COLMAP reconstruction → poor final result
- Densification timing: Start at iteration 500, stop at 15K to avoid overfitting
- Opacity reset prevents mode collapse: Essential for stable training
- View-dependent effects need high SH degree: Degree 3 captures most specularities
General Best Practices
- Always validate COLMAP results before starting expensive training
- Use learning rate warmup to stabilize early training
- Log intermediate renders every 1K iterations for debugging
- Checkpoint frequently: Training failures are common with custom CUDA ops
Conclusion
This project demonstrates the rapid evolution of neural view synthesis, from the groundbreaking but slow NeRF to the real-time capable 3D Gaussian Splatting. While NeRF remains valuable for its compact representation and theoretical elegance, 3DGS has emerged as the practical choice for applications demanding interactivity.
The field is moving incredibly fast—techniques presented at SIGGRAPH 2023 are already being surpassed by newer methods in 2024. Yet the fundamental principles—differentiable rendering, volumetric scene representations, and multi-view consistency—remain constant and will continue to drive innovation in 3D computer vision.
For researchers and practitioners entering this space, I hope this implementation serves as both a learning resource and a practical starting point for building next-generation view synthesis systems.
Resources & Links
- GitHub Repository: rohitDey23/view_synthesis
- NeRF Branch: View NeRF Implementation
- 3DGS Branch: View Gaussian Splatting Implementation
Original Papers:
- NeRF: Mildenhall et al., ECCV 2020
- 3D Gaussian Splatting: Kerbl et al., SIGGRAPH 2023
Acknowledgments: This work builds upon the excellent open-source implementations from GRAPHDECO Research Group at Inria and the broader neural rendering community. Special thanks to the authors for making their code publicly available.