I wish they would just allow us to push everything to the GPU as buffer pointers, like the `buffer_device_address` extension allows you to, and then reconstruct the data into your required format via shaders.
GPU programming seems to be super low level but also high level at the same time, because textures and descriptors need these ultra-specific data formats, and the way you construct and upload those formats is very complicated and changes all the time.
Is there really no way to simplify this?
Regular vertex data was supposed to be strictly pre-formatted in the pipeline too, until suddenly it wasn't, and now we can just give the shader a `device_address` extension memory pointer and construct the data from that.
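For concreteness, here's roughly what the host side looks like; a minimal sketch assuming the Rust `ash` bindings, Vulkan 1.2 (or `VK_KHR_buffer_device_address`) enabled, and a buffer created with `VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT`:

```rust
use ash::vk;

/// Fetch the raw 64-bit GPU address of a buffer. The returned value can be
/// handed to a shader (e.g. via a push constant) and dereferenced there
/// through GL_EXT_buffer_reference, with the shader defining whatever
/// layout it wants on top of the raw memory.
fn buffer_address(device: &ash::Device, buffer: vk::Buffer) -> vk::DeviceAddress {
    let info = vk::BufferDeviceAddressInfo {
        buffer,
        ..Default::default()
    };
    unsafe { device.get_buffer_device_address(&info) }
}
```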
I also want what you're describing. It seems like the ideal "data-in-out" pipeline for purely compute-based shaders.
I've brought it up several times when talking with folks who work down at the chip level optimizing these operations, and all I can say is: there are a lot of unforeseen complications to what we're suggesting.
It's not that we can't have a GPU that does these things; it's apparently more a combination of previous and current architectural decisions that work against it. For instance, an Nvidia GPU is focused on providing the hardware optimizations necessary to do either LLM compute or graphics acceleration, both essentially proprietary technologies.
The proprietariness isn't why it's obtuse, though; you can make a chip go super-duper fast for specific tasks, or more general for all kinds of tasks. Somewhere, folks are making a tradeoff between backwards compatibility and supporting new hardware-accelerated tasks.
Neither of these is a "general purpose compute and data flow" focus. As such, you get a GPU that is only sorta configurable for what you want to do, which in my opinion explains your "GPU programming seems to be both super low level, but also high level" comment.
That's been my experience. I still think what you're suggesting is a great idea: it would make GPUs a more open compute platform for a wider variety of tasks, while also simplifying things a lot.
This is true, but what the parent comment is getting at is that we really just want to be able to address graphics memory the way it's exposed in CUDA, for example, where you can have pointers to GPU memory in structures visible to the CPU, without this song and dance with descriptor set bindings.
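To make that concrete (the struct and field names below are made up for illustration): with buffer device addresses you can already approximate the CUDA model, where the CPU fills ordinary structs with raw GPU pointers:

```rust
/// A CPU-side struct embedding raw GPU pointers, CUDA-style. Each `u64`
/// holds a vk::DeviceAddress obtained from vkGetBufferDeviceAddress.
#[repr(C)]
struct DrawData {
    positions: u64, // device address of a vertex position buffer
    normals: u64,   // device address of a normal buffer
    count: u32,     // number of vertices
    _pad: u32,      // keep the layout explicit for std430/scalar rules
}
// Write a DrawData into any host-visible buffer; a shader declared with
// GL_EXT_buffer_reference can then chase `positions` and `normals`
// directly, with no descriptor set bindings for those buffers.
```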
If you got what you're asking for, you'd presumably lose access to any fixed-function hardware. Re: your example, knowing the data format permits automagic hardware-accelerated translations between image formats.
You're free to do what you're asking for by simply performing all operations manually in a compute shader. You can manually clip, transform, rasterize, and even sample textures. But you'll lose the implicit use of the various fixed-function hardware that you currently benefit from.
I am under the (potentially mistaken) impression that at minimum rasterization and texture filtering retain dedicated hardware on modern cards. There's also the issue of the format you output versus the format the display hardware works in natively.
That said, I'm not clear on the extent to which such dedicated functionality either already is or could be made accessible via the instruction set. And even then, I'm not sure how ergonomic it would be to use from a shader language.
I’m not watching Rust as closely as I once did, but it seems like buffer ownership is something it should be leaning on more fully.
There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.
It is structurally similar to double buffered video, but for any sort of data.
It seems like Rust would be good for proving the soundness. And it should be a library by now rather than a roll-your-own.
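A minimal std-only sketch of that handoff, just to show the shape of the pattern (not a polished library):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Channel carrying filled buffers to the consumer, and a second one
    // returning drained buffers to the producer.
    let (full_tx, full_rx) = mpsc::channel::<Vec<u8>>();
    let (empty_tx, empty_rx) = mpsc::channel::<Vec<u8>>();

    // Seed the return channel with the two buffers that will be swapped.
    empty_tx.send(vec![0u8; 1024]).unwrap();
    empty_tx.send(vec![0u8; 1024]).unwrap();

    let producer = thread::spawn(move || {
        for frame in 0..4u8 {
            let mut buf = empty_rx.recv().unwrap(); // take ownership of a free buffer
            buf.fill(frame);                        // "produce" into it
            full_tx.send(buf).unwrap();             // hand ownership to the consumer
        }
    });

    for _ in 0..4 {
        let buf = full_rx.recv().unwrap();          // consume a filled buffer
        println!("got buffer starting with {}", buf[0]);
        let _ = empty_tx.send(buf);                 // return it (producer may have exited)
    }
    producer.join().unwrap();
}
```

Ownership moves through the channels, so the compiler rejects any attempt to touch a buffer after it has been handed off.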
> There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.
Just yesterday I watched this video: https://m.youtube.com/watch?v=7bSzp-QildA. I'm not a graphics programmer, but from what I understood, I think he talks about doing what you're describing with Vulkan.