I wish they would just allow us to push everything to the GPU as buffer pointers, like the `buffer_device_address` extension allows you to, and then reconstruct the data into your required format via shaders.
GPU programming seems to be super low level but also high level at the same time, because textures and descriptors need these ultra-specific data formats, and the way you construct and upload those formats is very complicated and changes all the time.
Is there really no way to simplify this?
Regular vertex data was supposed to be strictly pre-formatted in the pipeline too, until suddenly it wasn't, and now we can just give the shader a `device_address` extension memory pointer and construct the data from that.
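For concreteness, here's roughly what the host side looks like; a minimal sketch assuming the Rust `ash` bindings, Vulkan 1.2 (or `VK_KHR_buffer_device_address`) enabled, and a buffer created with `VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT`:

```rust
use ash::vk;

/// Fetch the raw 64-bit GPU address of a buffer. The returned value can be
/// handed to a shader (e.g. via a push constant) and dereferenced there
/// through GL_EXT_buffer_reference, with the shader defining whatever
/// layout it wants on top of the raw memory.
fn buffer_address(device: &ash::Device, buffer: vk::Buffer) -> vk::DeviceAddress {
    let info = vk::BufferDeviceAddressInfo {
        buffer,
        ..Default::default()
    };
    unsafe { device.get_buffer_device_address(&info) }
}
```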
I also want what you're describing. It seems like the ideal "data-in-out" pipeline for purely compute-based shaders.
I've brought it up several times when talking with folks who work down at the chip level optimizing these operations, and all I can say is: there are a lot of unforeseen complications to what we're suggesting.
It's not that we can't have a GPU that does these things; it's apparently more a combination of previous and current architectural decisions that work against it. For instance, an Nvidia GPU is focused on providing the hardware optimizations necessary to do either LLM compute or graphics acceleration, both essentially proprietary technologies.
The proprietariness isn't why it's obtuse, though; you can make a chip go super-duper fast for specific tasks, or more general for all kinds of tasks. Somewhere, folks are making a tradeoff between backwards compatibility and supporting new hardware-accelerated tasks.
Neither of these is a "general purpose compute and data flow" focus. As such, you get a GPU that is only sorta configurable for what you want to do, which in my opinion explains your "GPU programming seems to be both super low level, but also high level" comment.
That's been my experience. I still think what you're suggesting is a great idea: it would make GPUs a more open compute platform for a wider variety of tasks, while also simplifying things a lot.
This is true, but what the parent comment is getting at is that we really just want to be able to address graphics memory the way it's exposed in CUDA, for example, where you can have pointers to GPU memory in structures visible to the CPU, without this song and dance with descriptor set bindings.
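To make that concrete (the struct and field names below are made up for illustration): with buffer device addresses you can already approximate the CUDA model, where the CPU fills ordinary structs with raw GPU pointers:

```rust
/// A CPU-side struct embedding raw GPU pointers, CUDA-style. Each `u64`
/// holds a vk::DeviceAddress obtained from vkGetBufferDeviceAddress.
#[repr(C)]
struct DrawData {
    positions: u64, // device address of a vertex position buffer
    normals: u64,   // device address of a normal buffer
    count: u32,     // number of vertices
    _pad: u32,      // keep the layout explicit for std430/scalar rules
}
// Write a DrawData into any host-visible buffer; a shader declared with
// GL_EXT_buffer_reference can then chase `positions` and `normals`
// directly, with no descriptor set bindings for those buffers.
```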
If you got what you're asking for, you'd presumably lose access to any fixed-function hardware. Re: your example, knowing the data format permits automagic hardware-accelerated translations between image formats.
You're free to do what you're asking for by simply performing all operations manually in a compute shader. You can manually clip, transform, rasterize, and even sample textures. But you'll lose the implicit use of the various fixed-function hardware that you currently benefit from.
I am under the (potentially mistaken) impression that at minimum rasterization and texture filtering retain dedicated hardware on modern cards. There's also the issue of the format you output versus the format the display hardware works in natively.
That said, I'm not clear on the extent to which such dedicated functionality either already is or could be made accessible via the instruction set. And even then, I'm not sure how ergonomic it would be to use from a shader language.
I’m not watching Rust as closely as I once did, but it seems like buffer ownership is something it should be leaning on more fully.
There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.
It is structurally similar to double buffered video, but for any sort of data.
It seems like Rust would be good for proving the soundness. And it should be a library by now rather than a roll-your-own.
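A minimal std-only sketch of that handoff, just to show the shape of the pattern (not a polished library):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Channel carrying filled buffers to the consumer, and a second one
    // returning drained buffers to the producer.
    let (full_tx, full_rx) = mpsc::channel::<Vec<u8>>();
    let (empty_tx, empty_rx) = mpsc::channel::<Vec<u8>>();

    // Seed the return channel with the two buffers that will be swapped.
    empty_tx.send(vec![0u8; 1024]).unwrap();
    empty_tx.send(vec![0u8; 1024]).unwrap();

    let producer = thread::spawn(move || {
        for frame in 0..4u8 {
            let mut buf = empty_rx.recv().unwrap(); // take ownership of a free buffer
            buf.fill(frame);                        // "produce" into it
            full_tx.send(buf).unwrap();             // hand ownership to the consumer
        }
    });

    for _ in 0..4 {
        let buf = full_rx.recv().unwrap();          // consume a filled buffer
        println!("got buffer starting with {}", buf[0]);
        let _ = empty_tx.send(buf);                 // return it (producer may have exited)
    }
    producer.join().unwrap();
}
```

Ownership moves through the channels, so the compiler rejects any attempt to touch a buffer after it has been handed off.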
> There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.
Just yesterday I watched this video: https://m.youtube.com/watch?v=7bSzp-QildA. I'm not a graphics programmer, but from what I understood, I think he talks about doing what you're describing with Vulkan.