Lololol show me an "MLIR" port. Do you mean a TensorFlow port, a JAX port, or a Torch port (one that uses torch-mlir)? Or do you really mean Llama implemented in linalg/tosa/tensor?
I wasn't talking about Llama specifically. I was thinking of the SHARK Stable Diffusion port (which uses MLIR/IREE), as it considerably outpaced ONNX Runtime.