With AI, VR is even more promising. I have been working on a Gaussian splat renderer for the Quest 3, and by having Claude and ChatGPT read state-of-the-art papers, I have been able to build a training and rendering pipeline that gets >50 fps for large indoor scenes on the Quest 3. I started with an (AI-driven) port of a desktop renderer that got less than 1 fps; since then I've integrated training and rendering techniques from recent research, plus a bunch of quality and performance fixes, and now it's actually usable. Applying research papers to a novel product is something that used to take weeks or months of a person's time and can now be measured in minutes and hours (and tokens).
You might be interested in a new experimental 3D scene learning and rendering approach called Radiant foam [1], which is supposed to be better suited for GPUs that don't have hardware ray tracing acceleration.
Cool! I'll definitely check it out. The great thing about LLMs is I can probably have a trainer and renderer using this technology up and running for my platform in a day or two, OR I can just pick and choose parts that would work well for my implementation and merge them in.
Sorry if this is a basic question, but what's your workflow for feeding the papers into the LLM and getting the implementation done? The coding agents that I've used are not able to read PDFs, so I've been wondering how to do it.
this is actually a great question - I just extract the text with PyPDF, but I did a brief search on the functionality I'd like to have (convert math equations to LaTeX, extract images, reformat in markdown, extract data from charts) and it looks like there are a couple of promising Python libs like Docling and Marker. I should really improve this part of my workflow.
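For anyone wondering what "just extract the text" amounts to, here's a minimal sketch assuming the pypdf package (the maintained successor of PyPDF2); the `clean_text` helper that rejoins hyphenated line breaks is my own illustrative addition, not part of any library:

```python
def clean_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    joined = raw.replace("-\n", "")   # "implemen-\ntation" -> "implementation"
    return " ".join(joined.split())   # collapse newlines and runs of spaces

def pdf_to_text(path: str) -> str:
    """Concatenate the extracted text of every page in the PDF."""
    from pypdf import PdfReader  # imported lazily so the helper above is usable alone
    reader = PdfReader(path)
    return clean_text("\n".join(page.extract_text() or "" for page in reader.pages))
```

This is the lossy baseline being described: equations come out mangled and figures vanish entirely, which is exactly the gap tools like Docling and Marker try to fill.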
after looking into it for a little while, Docling and Marker work pretty well but are very slow. I haven't found anything else that extracts math suitably. It takes 10+ minutes per PDF, so I'm going to run it on a batch of these papers overnight and create my own little Gaussian splatting RAG database. It's really too bad PDF is so terrible.
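The overnight batch job could look something like this sketch. The `DocumentConverter` call is Docling's documented entry point as of recent versions, but everything else - the paragraph-based chunking, the file naming, the directory layout - is my own illustrative choice, not a prescription for how to build the RAG store:

```python
from pathlib import Path

def chunk_markdown(md: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars for a RAG index."""
    chunks, current = [], ""
    for para in md.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def convert_batch(pdf_dir: str, out_dir: str) -> None:
    """Convert each PDF to Markdown with Docling, then write out the chunks."""
    from docling.document_converter import DocumentConverter  # heavy import, keep local
    converter = DocumentConverter()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        md = converter.convert(str(pdf)).document.export_to_markdown()
        for i, chunk in enumerate(chunk_markdown(md)):
            (out / f"{pdf.stem}_{i:03d}.md").write_text(chunk)
```

At 10+ minutes per paper the conversion loop dominates, so kicking it off before bed and embedding the chunks in the morning is the pragmatic move.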
My understanding is that those models create Gaussian splats from a text prompt, kinda like a 3D version of nano banana. I'm not doing that (yet); what I'm doing is creating splats from a set of photos - aka "splat training" - and then rendering the splat as a static scene (dynamic scenes are a work in progress) on the Quest headset. This is pretty well-worn territory with a lot of good implementations, but I have my own: a trainer in C++/CUDA (originally based on SpeedySplat, which was written in Python, but now completely rewritten with not much of SpeedySplat left) and a renderer in C++/OpenXR for the Quest (originally an LLM-made port of 3DGS.cpp to OpenXR, but 100% rewritten now), so I can easily integrate techniques from research.