We know why they work, but not how. SotA models are an empirical goldmine: we are learning a lot about how information and intelligence organize themselves under various constraints. This is why new papers are published every single day that further explore the capabilities and inner workings of these models.
Ok, but the art and science of understanding what we're even looking at is still actively being developed. What I said stands: we are still learning the how, including things like circuits, dependencies, grokking, etc.