With AI, VR is even more promising. I have been working on a Gaussian splat renderer for the Quest 3, and by having Claude and ChatGPT read state-of-the-art papers, I have been able to build a training and rendering pipeline that gets >50 fps for large indoor scenes on the Quest 3. I started with an (AI-driven) port of a desktop renderer that got less than 1 fps; since then I've integrated training and rendering techniques from recent research, plus a bunch of quality and performance fixes, and now it's actually usable. Applying research papers to a novel product is something that used to take weeks or months of a person's time and can now be measured in minutes and hours (and tokens).
You might be interested in a new experimental 3D scene learning and rendering approach called Radiant foam [1], which is supposed to be better suited for GPUs that don't have hardware ray tracing acceleration.
Cool! I'll definitely check it out. The great thing about LLMs is I can probably have a trainer and renderer using this technology up and running for my platform in a day or two, OR I can just pick and choose parts that would work well for my implementation and merge them in.
Sorry if this is a basic question, but what's your workflow for feeding the papers into the LLM and getting the implementation done? The coding agents that I've used are not able to read PDFs, so I've been wondering how to do it.
this is actually a great question - I just extract the text with PyPDF, but I did a brief search on the functionality I'd like to have (convert math equations to LaTeX, extract images, reformat in markdown, extract data from charts) and it looks like there are a couple of promising Python libs like Docling and Marker. I should really improve this part of my workflow.
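For anyone wondering what "just extract the text" amounts to, here's a minimal sketch assuming the pypdf package (the maintained successor of PyPDF2); the `clean_text` helper that rejoins hyphenated line breaks is my own illustrative addition, not part of any library:

```python
def clean_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    joined = raw.replace("-\n", "")   # "implemen-\ntation" -> "implementation"
    return " ".join(joined.split())   # collapse newlines and runs of spaces

def pdf_to_text(path: str) -> str:
    """Concatenate the extracted text of every page in the PDF."""
    from pypdf import PdfReader  # imported lazily so the helper above is usable alone
    reader = PdfReader(path)
    return clean_text("\n".join(page.extract_text() or "" for page in reader.pages))
```

This is the lossy baseline being described: equations come out mangled and figures vanish entirely, which is exactly the gap tools like Docling and Marker try to fill.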
after looking into it for a little while, Docling and Marker work pretty well but are very slow. I haven't found anything else that extracts math suitably. It takes 10+ minutes per PDF, so I'm going to run it on a batch of these papers overnight and create my own little Gaussian splatting RAG database. It's really too bad PDF is so terrible.
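The overnight batch job could look something like this sketch. The `DocumentConverter` call is Docling's documented entry point as of recent versions, but everything else - the paragraph-based chunking, the file naming, the directory layout - is my own illustrative choice, not a prescription for how to build the RAG store:

```python
from pathlib import Path

def chunk_markdown(md: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars for a RAG index."""
    chunks, current = [], ""
    for para in md.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def convert_batch(pdf_dir: str, out_dir: str) -> None:
    """Convert each PDF to Markdown with Docling, then write out the chunks."""
    from docling.document_converter import DocumentConverter  # heavy import, keep local
    converter = DocumentConverter()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        md = converter.convert(str(pdf)).document.export_to_markdown()
        for i, chunk in enumerate(chunk_markdown(md)):
            (out / f"{pdf.stem}_{i:03d}.md").write_text(chunk)
```

At 10+ minutes per paper the conversion loop dominates, so kicking it off before bed and embedding the chunks in the morning is the pragmatic move.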
My understanding is that those models create Gaussian splats from a text prompt, kinda like a 3D version of nano banana. I'm not doing that (yet); what I'm doing is creating splats from a set of photos - aka "splat training" - and then rendering the splat as a static scene (dynamic scenes are a work in progress) on the Quest headset. This is pretty well-worn territory with a lot of good implementations, but I have my own: a trainer in C++/CUDA (originally based on SpeedySplat, which was written in Python, but now completely rewritten with not much of SpeedySplat left) and a renderer in C++/OpenXR for the Quest (originally an LLM-made port of 3DGS.cpp to OpenXR, but 100% rewritten now), so I can easily integrate techniques from research.