Well the whole point of a compression scheme like JPEG is to use what we know about human vision to minimize file size while not visually changing anything.
But JPEG is a pretty crude algorithm designed mainly to be computationally cheap [to match the capabilities of computer hardware in 1992 when it was created]. Trying to make better encoders within the limits of what JPEG decoders can handle is quite limiting, and yields only marginal improvements.
It’s possible to do much better if we’re willing to accept more CPU time spent in encoding/decoding using fancier algorithms. Unfortunately it’s really hard to get traction for anything else, because the whole world already has JPEG decoders built into everything. For instance JPEG 2000, designed to be a general purpose replacement with many improvements over JPEG, is now 13 years old but used only in niche applications, such as archiving very high resolution images.
But for use cases where you control the full stack, better compression is quite viable. Video game engines for instance devote considerable attention to image compression formats.
IMHO the JPEG algorithm isn't crude. It's actually quite brilliant: simple and very closely tied to human perception. The core concept of DCT quantization hasn't been beaten yet; even the latest video codecs use it, just with tweaks on top like block prediction and better entropy coding.
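The DCT quantization idea can be sketched in a few lines. This is a minimal illustration, not real JPEG: the quantization table below is made up (real JPEG uses standardized perceptually tuned tables), but the structure — forward DCT, divide by a table that is coarser at high frequencies, round, invert — is the same.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

D = dct_matrix()
block = np.arange(64, dtype=float).reshape(8, 8) - 128  # toy level-shifted pixel block

coeffs = D @ block @ D.T                  # forward 2-D DCT
q = 1 + (np.arange(8)[:, None] + np.arange(8)[None, :]) * 4.0  # coarser at high freqs (illustrative)
quantized = np.round(coeffs / q)          # the lossy step: most high-freq coeffs become 0
restored = D.T @ (quantized * q) @ D      # dequantize + inverse DCT

print(np.count_nonzero(quantized), "nonzero coefficients out of 64")
```

The zeros left after quantization are what the entropy coder then compresses so effectively.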
Wavelet compressors like JPEG 2000 beat JPEG only in the lowest quality range, where JPEG doesn't even try to compete. Wavelets seem great because their blurring gives them high PSNR, but the lack of texture and softened edges make them lose in human judgement.
JPEG isn't crude, agreed. But your comments seem off to me.
The core trick isn't the DCT per se, it's transform coding, which covers both the DCT and wavelets, followed by (usually) quantization and then entropy coding.
Typical wavelets used are orthonormal transforms, no loss there. The "lack of texture and softened edges" are a choice of the model used, not a consequence of the transform. True also of DCT and blocking artifacts. This should be obvious since both approaches allow for lossless encoders.
Typically wavelets will beat or match JPEG in any situation, both in terms of PSNR and the like, and perceptually (though the latter is much more controversial and poorly defined; to be fair, I am not up to date on the literature here, but I would be surprised if that has changed in the last decade).
The real reason for the huge popularity of the DCT, first in still-image compression and later in video codecs, is that it is cheap to implement in hardware. And once it's there, it becomes very cheap to use.
JPEG 2000 is needlessly complex, at its core a wavelet codec is also very simple and elegant.
I have tried your JPEG compressor. While it does make red less dull at 2x2 chroma subsampling, it also blurs red (or orange) into (white) backgrounds. IMO the sum is negative, at least with the particular image I tried.
I'm particularly interested in this because lossy WebP only supports 2x2 chroma subsampling. So any optimisation for JPEG could maybe also be applied to WebP.
I once made a topic about this on the WebP group, which also includes the image I tried with your compressor.
With chroma subsampling there's a tradeoff: either bleed black into colored areas, or bleed color outside them. If you have any ideas or know of research on choosing which option is best in a given case, I'm all ears.
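The bleed in both directions is easy to see in a toy sketch. This assumes a simple box-filter downsample and nearest-neighbour upsample (real encoders and decoders may filter differently); only a single "redness" (Cr-like) channel is tracked, at a hard red/white vertical edge that falls inside a 2x2 block:

```python
import numpy as np

# Toy 2x2 chroma subsampling at a red/white vertical edge.
# Columns 0-2 are fully red (1.0), columns 3-7 are white (0.0),
# so the edge sits inside the 2x2 block covering columns 2-3.
redness = np.zeros((4, 8))
redness[:, :3] = 1.0

# 2x2 box-filter downsample, then nearest-neighbour upsample.
small = redness.reshape(2, 2, 4, 2).mean(axis=(1, 3))
restored = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

print(restored[0])  # [1.  1.  0.5 0.5 0.  0.  0.  0. ]
```

Column 2 (originally fully red) drops to 0.5, desaturating the red area, and column 3 (originally white) rises to 0.5, bleeding color outward. Any 2x2 scheme has to distribute this error somehow; where to put it is exactly the choice being asked about.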