[ECCV 24] Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering

¹University of Turin, ²LTCI, Télécom Paris, Institut Polytechnique de Paris

Abstract

Since the introduction of NeRFs, considerable attention has been focused on improving their training and inference times, leading to the development of Fast-NeRF models. Although these models demonstrate impressive rendering speed and quality, their rapid convergence makes it difficult to improve reconstruction quality further. Common strategies to improve rendering quality involve augmenting model parameters or increasing the number of sampled points. However, these computationally intensive approaches yield only limited quality gains. This study introduces a model-agnostic framework inspired by Sparsely-Gated Mixture of Experts to enhance rendering quality without escalating computational complexity. Our approach enables specialization in rendering different scene components by employing a mixture of experts with varying resolutions. We present a novel gate formulation designed to maximize expert capabilities and propose a resolution-based routing technique to effectively induce sparsity and decompose scenes. Our work significantly improves reconstruction quality while maintaining competitive performance.

Naive Methods are Limited and Inefficient

Figure panels: Increase Resolution; Increase MLP size; Sample more points along the ray.


Typical naive approaches to enhance the reconstruction quality of Fast-NeRF models include:

  • Increasing the parameters and resolution of the underlying data structures (e.g., voxel grid, hash grid, etc.).
  • Increasing the number of parameters in the neural network or the order of the spherical harmonics (SH).
  • Increasing the number of sampled points per ray.

However, the (marginal) gain in reconstruction quality comes with a significant increase in computational cost.
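To make the cost argument concrete, here is a back-of-the-envelope sketch. It assumes a dense voxel grid (hash grids scale differently) and uses hypothetical helper names and sizes that do not come from the paper; it only illustrates how storage and query counts grow.

```python
def dense_grid_params(resolution: int, channels: int) -> int:
    """Values stored by a dense voxel grid with `channels` features per voxel."""
    return resolution ** 3 * channels


def ray_queries(num_rays: int, samples_per_ray: int) -> int:
    """Number of point queries needed to render `num_rays` rays."""
    return num_rays * samples_per_ray


# Illustrative sizes only (assumptions, not values from the paper).
base = dense_grid_params(resolution=128, channels=12)
doubled = dense_grid_params(resolution=256, channels=12)
growth = doubled // base  # doubling the resolution multiplies storage by 2**3

queries_128 = ray_queries(num_rays=1024, samples_per_ray=128)
queries_256 = ray_queries(num_rays=1024, samples_per_ray=256)
```

Under these assumptions, doubling the grid resolution multiplies the stored values by eight, and doubling the samples per ray doubles the number of queries, which is why these strategies become expensive quickly.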


    Can we increase rendering quality without escalating computational costs?


    Contributions

    1. A model-agnostic framework using Sparse Mixture of Experts models at different resolutions, enhancing rendering quality while keeping training and inference times competitive.
    2. A novel Fast-NeRF-inspired gate formulation, which treats each model as a black box, maximizing MoE capabilities.
    3. A new resolution-based routing technique that encourages token assignment to low-resolution models, increasing sparsity in high-resolution models and decomposing scenes by frequency.
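The exact form of the resolution-based routing term is not given in this summary. The sketch below is one hypothetical way to penalize routing to high-resolution experts: each expert's average gate probability is weighted by its normalized resolution. The function name, the weighting scheme, and the example resolutions are all assumptions made for illustration.

```python
import numpy as np


def resolution_weighted_aux_loss(gate_probs, resolutions):
    """Hypothetical auxiliary loss that grows when points favor high-res experts.

    gate_probs:  (N, M) array, one row of routing probabilities per point.
    resolutions: (M,) array, grid resolution of each expert.
    """
    gate_probs = np.asarray(gate_probs, dtype=np.float64)
    res = np.asarray(resolutions, dtype=np.float64)
    weights = res / res.sum()              # assumed: cost proportional to resolution
    mean_usage = gate_probs.mean(axis=0)   # average routing probability per expert
    return float(np.dot(weights, mean_usage))


# Two experts at assumed resolutions 128 and 512.
loss_low_res = resolution_weighted_aux_loss([[0.9, 0.1]], [128, 512])
loss_high_res = resolution_weighted_aux_loss([[0.1, 0.9]], [128, 512])
```

With this weighting, a gate that sends most of its probability mass to the low-resolution expert incurs a smaller penalty, which matches the stated goal of encouraging assignment to low-resolution models.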

    Method


    Boost Your NeRF framework. We first train M models at different resolutions. From them, we distill a small density field, which is used to compute density values for sampled points along a ray and to discard points in areas with negligible density. For each remaining point, a gating function computes a probability score for each expert, indicating how likely the point is to be routed to it. We route each point only to the Top-K experts (with K equal to 1 or 2), which compute radiance and density. We then aggregate these values, weighting them by the corresponding gating probabilities, to obtain the final color and density of the point. Note that our framework is model-agnostic and treats models as black boxes: we give them an input (a point in space) and obtain density and color. The volume rendering equation then yields pixel colors, and the framework is optimized jointly with our resolution-weighted auxiliary loss, enabling high-quality and efficient rendering.
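The routing and aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gate logits are taken as given, the experts are hypothetical constant functions standing in for trained Fast-NeRF models, and the Top-K probabilities are used as-is without renormalization, since the text does not say whether they are renormalized.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def moe_query(points, gate_logits, experts, k=2):
    """Route each point to its top-k experts and blend their outputs.

    points:      (N, 3) sample positions.
    gate_logits: (N, M) gate scores, one column per expert.
    experts:     list of M black-box callables mapping (n, 3) points
                 to (density of shape (n,), color of shape (n, 3)).
    """
    n = gate_logits.shape[0]
    probs = softmax(gate_logits)
    topk = np.argsort(-probs, axis=-1)[:, :k]  # indices of the k best experts
    density = np.zeros(n)
    color = np.zeros((n, 3))
    for idx, expert in enumerate(experts):
        mask = (topk == idx).any(axis=-1)      # points routed to this expert
        if not mask.any():
            continue                           # expert is skipped entirely
        sigma, rgb = expert(points[mask])
        w = probs[mask, idx]                   # gating probability as weight
        density[mask] += w * sigma
        color[mask] += w[:, None] * rgb
    return density, color


def make_constant_expert(sigma, value):
    """Hypothetical stand-in expert returning constant density and color."""
    def expert(pts):
        return np.full(len(pts), sigma), np.full((len(pts), 3), value)
    return expert


experts = [make_constant_expert(1.0, 0.0), make_constant_expert(3.0, 1.0)]
points = np.zeros((2, 3))
logits = np.zeros((2, 2))                      # gate is undecided: 0.5 / 0.5
density, color = moe_query(points, logits, experts, k=2)
```

Setting `k` equal to the number of experts corresponds to querying every expert for every point, while `k=1` evaluates a single expert per point; each expert only ever sees the subset of points routed to it.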

    Results

    Our methods can overcome these limitations, while being model-agnostic by design. Although Top-1 already achieves state-of-the-art performance, Top-2 strikes an optimal balance between the efficiency of the Top-1 approach and the quality of the Ensemble method, where all experts are used for every input.


    Scene Decomposition

    Our method allows for a frequency-based scene decomposition, in which high-resolution models render the complex, high-frequency parts of the scene and low-resolution models render the low-frequency parts. The output of the gate module for each model is visualized on the left; on the right, the rendered part of the scene for each model is shown. Outputs are arranged in increasing order of resolution.


    High-Quality Rendering

    Our method ensures superior reconstruction quality compared to the baselines, effectively reproducing sharper details while reducing noise on texture-less spots.

    Ablations

    Gate Resolution and Gate Type

    Our novel gate design is lightweight (comparable to a linear gate) but achieves substantially higher quality. Furthermore, it achieves good results even with a very low-resolution grid.

    Why Top-2? Top-2 strikes a good trade-off between quality and computational costs.