Maybe. Certainly in the past, before the world was aware LLMs on the level of Ch...

sfriedr · on March 20, 2023

> OpenAI's chosen not to release any real details about GPT-4

Actually, they have release some details about it, in this 99-page technical report https://arxiv.org/abs/2303.08774 (which is actually two papers stitches together, once you read it; oddly enough using different fonts).

But I'm not sure if this content qualifies as "real details".

kmeisthax · on March 20, 2023

The intro to that paper specifically says:

> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar. We are committed to independent auditing of our technologies, and shared some initial steps and ideas in this area in the system card accompanying this release. We plan to make further technical details available to additional third parties who can advise us on how to weigh the competitive and safety considerations above against the scientific value of further transparency.

In other words, "Stable Diffusion wasn't supposed to happen, so we're making all our methodology trade secret[0], if you want to Do Science then agree to this massive NDA and have enough skin in the game for us to cut you."

[0] Presumably at some point OpenAI will have to 'relent' to independent discovery by patenting AI architectures and refusing to license them

generalizations · on March 20, 2023

I've been using the Chat GPT-4 model, and openAI has been putting warnings about max queries per N hours. Given the degree to which they're limiting access (up until they crashed today, they'd dropped to 25 queries / 3 hours), I suspect GPT-4 is actually much, much larger, and they just don't have the computational resources to support its use at the same level as GPT-3.5 or GPT-3.5 Turbo.

zamnos · on March 20, 2023

You could be right! I don't claim access to any private OpenAI information so any theories by me are based on what's known publicly, which isn't much for GPT-4. I do want to call attention to the difference between training runs and inference runs (post-training usage of the model). If each training run costs mid six-figures, CompetitorGPT is going to have to be well-funded and likely sponsored by AWS/GCP (eg Deepmind) just to train up the model, given that it's probably not a one-shot. If it's much lower due to optimizations in training, on top of only having to fine-tune the model on a company's codebase instead of training the whole model from scratch each time, then I could see a company selling the service of creating CompetitorGPT or CompetitorCoPilot seems like it could be a very worthwhile investment, by companies that are willing to invest in such services for their developers. (Eg companies that are willing to pay Splunk's exorbitant costs vs one that would rather burn time self-hosting a graphana setup. Not to impugn graphana, but it's very much a home-grown, open source self-hosted deployment. Managing a Splunk cluster is also far from free, it's just that not all companies are willing to bear the yearly licensing cost for it and would prefer to self-host graphana solely for cost reasons even if TCO including the opportunity cost makes it more expensive in the long run.)