Hacker News | msp26's comments

They don't have the compute to make Mythos generally available: that's all there is to it. The exclusivity is also nice from a marketing pov.

They don't have demand for the price it would require for inference.

They are definitely distilling it into a much smaller model that's ~98% as good, like everybody does.


Some people are speculating that Opus 4.7 is distilled from Mythos due to the new tokenizer (which would mean Opus 4.7 is a new base model, not just an improved Opus 4.6).

The new tokenizer is interesting, but it definitely is possible to adapt a base model to a new tokenizer without too much additional training, especially if you're distilling from a model that uses the new tokenizer. (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2).
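One common way to do this (in the spirit of the linked paper, though every detail below is invented for illustration, not any lab's actual code): initialize each new token's embedding from the old model's embeddings of the old tokens that spell it, then continue training. A toy stdlib sketch:

```python
# Sketch: seed a new tokenizer's embedding table from an old model's
# embeddings by mean-pooling the old tokens that compose each new token.
# All names and data here are illustrative assumptions.

def transplant_embeddings(new_vocab, old_tokenize, old_embeddings):
    """new_vocab: list of new-token strings
    old_tokenize: str -> list of old-token ids
    old_embeddings: dict of old-token id -> embedding vector"""
    new_embeddings = {}
    for token in new_vocab:
        vecs = [old_embeddings[i] for i in old_tokenize(token)]
        # mean-pool the constituent old-token embeddings
        new_embeddings[token] = [sum(xs) / len(vecs) for xs in zip(*vecs)]
    return new_embeddings

# toy demo: pretend the old tokenizer split on characters
old_emb = {0: [1.0, 0.0], 1: [0.0, 1.0]}
tok = lambda s: [ord(c) - ord("a") for c in s]  # 'a' -> 0, 'b' -> 1
out = transplant_embeddings(["ab"], tok, old_emb)
# "ab" starts as the mean of 'a' and 'b': [0.5, 0.5]
```

After an init like this, distilling from a teacher that already uses the new tokenizer gives the adapted model a dense training signal, which is why the two ideas pair well.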

Not impossible, but you have to be at least a little bit mad to deploy tokenizer replacement surgery at this scale.

They also changed the image encoder, so I'm thinking "new base model". Whatever base that was powering 4.5/4.6 didn't last long then.


Yes, I was thinking that. But it could just as well be the other way around: using the pretrained 4.7 (1T?) to speed up Mythos (10T?) pretraining by ~70%.

It's just speculative decoding but for training. If they did it at this scale, it's quite an achievement, because training is very fragile with these kinds of tricks.


Reverse distillation: using small models to bootstrap large models. Get a richer signal early in the run when gradients are hectic, and get the large model past early-training instability hell. Mad, but it does work somewhat.

Not really similar to speculative decoding?

I don't think that's what they've done here though. It's still black magic, I'm not sure if any lab does it for frontier runs, let alone 10T scale runs.
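For anyone unfamiliar: whether forward or reverse, "distillation" here just means training one model to match another's output distribution, typically by minimizing a KL divergence between the two. A minimal stdlib sketch with toy numbers (nothing here reflects any lab's setup):

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocab."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# teacher and student logits over a toy 3-token vocab; a higher
# temperature softens the distributions, a standard distillation trick
teacher = softmax([2.0, 1.0, 0.1], temperature=2.0)
student = softmax([1.5, 1.2, 0.3], temperature=2.0)
loss = kl_divergence(teacher, student)  # training drives this toward 0
```

In a real run the loss is computed per token position over the whole vocab and backpropagated through the student; the sketch only shows the objective.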


> They don't have demand for the price it would require for inference.

Citation needed. I find it hard to believe; I think there are more than enough people willing to spend $100/Mtok for frontier capabilities to justify dedicating a couple of racks or aisles.


I've read so many conflicting things about Mythos that it's become impossible to make any real assumptions about it. I don't think it's vaporware necessarily, but the whole "we can't release it for safety reasons" feels like the next level of "POC or STFU".

> First, Opus 4.7 uses an updated tokenizer that improves how the model processes text

wow, can I see it and run it locally please? Making API calls just to check token counts is ridiculous.


> Data extraction tasks are amongst the easiest to evaluate because there’s a known “right” answer.

Wrong. There can be a lot of subjectivity, and pretending that some golden answer exists does more harm than good and narrows the scope of what you can build.

My other main problem with data extraction tasks, and why I'm not satisfied with any of the existing eval tools, is that the schemas I write can change drastically as my understanding of the problem increases. Nothing really seems to handle that well; I mostly just resort to reading diffs of what happens when I change something and reading the input/output data very closely. Marimo is fantastic for anything visual like this btw.

Also there is a difference between: the problem in reality → the business model → your db/application schema → the schema you send to the LLM. And to actually improve your schema/prompt you have to be mindful of the entire problem stack, and of which parts are better handled through post-processing rather than by the LLM directly.
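To make that stack concrete, here's a toy sketch (all field names invented) of keeping the LLM-facing schema flat and forgiving, then mapping it to the application schema in code:

```python
# Sketch of separating the LLM-facing schema from the application schema,
# with post-processing in between. Everything here is illustrative.
from dataclasses import dataclass

@dataclass
class LLMExtraction:        # what you ask the model for: flat, forgiving
    company: str
    funding_amount: str     # keep as a string; models mangle numbers/currencies

@dataclass
class AppRecord:            # what your application actually stores
    company: str
    funding_usd: int

def postprocess(raw: LLMExtraction) -> AppRecord:
    # currency parsing handled deterministically in code, not by the LLM
    amount = raw.funding_amount.lower().replace("$", "").strip()
    multiplier = 1
    if amount.endswith("m"):
        multiplier, amount = 1_000_000, amount[:-1]
    return AppRecord(company=raw.company,
                     funding_usd=int(float(amount) * multiplier))

record = postprocess(LLMExtraction(company="Acme", funding_amount="$2.5M"))
# → AppRecord(company='Acme', funding_usd=2500000)
```

The point of the split: when your understanding of the problem changes, you can often absorb the change in `postprocess` without touching the prompt or re-running evals.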

> Abstract model calls. Make swapping GPT-4 for Claude a one-line change.

And in practice random limitations like structured output API schema limits between providers can make this non-trivial. God I hate the Gemini API.
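A concrete example of why the "one-line swap" leaks: providers accept different JSON Schema subsets for structured output, so an abstraction layer ends up normalizing schemas per provider. A hedged stdlib sketch (the set of rejected keywords below is illustrative, not an exact list for any provider):

```python
# Illustrative: strip JSON Schema keywords a hypothetical provider rejects
# before sending the schema with a structured-output request.

UNSUPPORTED = {"additionalProperties", "$schema", "default"}  # example set only

def strip_unsupported(schema):
    """Recursively remove unsupported keywords from a schema dict."""
    if isinstance(schema, dict):
        return {k: strip_unsupported(v) for k, v in schema.items()
                if k not in UNSUPPORTED}
    if isinstance(schema, list):
        return [strip_unsupported(v) for v in schema]
    return schema

schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {"name": {"type": "string", "default": ""}},
}
clean = strip_unsupported(schema)
# 'additionalProperties' and the nested 'default' are gone
```

Of course, silently dropping constraints changes what the model is allowed to emit, which is exactly why these swaps are non-trivial rather than one-line.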


This is very true! I could have been more careful/precise in how I worded this. I was really just trying to get across that it's in a sense easier than some tasks that can be much more open-ended.

I'll think about how to word this better, thanks for the feedback!


This is extremely true. In fact, from what we see, many (if not most) of the problems being solved with LLMs do not have ground-truth values; even hand-labeled data tends to be mostly subjective.


I think they're just saying that data extraction tasks are easy to evaluate because for a given input text/file you can specify the exact structured output you expect from it.
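In that narrow reading the eval really is simple: compare the model's JSON to a gold answer field by field. A toy sketch (invented fields) that also happens to show the grandparent's objection, since a trivial formatting difference already breaks "exact" match:

```python
# Sketch: evaluating extraction against a gold answer by exact field match.
# The document, fields, and values are invented for illustration.
import json

gold = {"invoice_number": "INV-001", "total": "99.50"}
model_output = json.loads('{"invoice_number": "INV-001", "total": "99.5"}')

per_field = {k: model_output.get(k) == v for k, v in gold.items()}
accuracy = sum(per_field.values()) / len(per_field)
# per_field → {'invoice_number': True, 'total': False}; accuracy → 0.5
```

"99.5" vs "99.50" is the same amount, so whether this counts as a miss is a normalization decision you have to make before the eval means anything.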


Man the lowest end pricing has been thoroughly hiked. It was convenient while it lasted.


I got Claude to reverse-engineer the extension and compare it to changedetection.io, and here's what it came up with. Apologies for the clanker slop, but I think it's in poor taste not to attribute the open-source tool the service is built on (one that's also funded by their own SaaS plan).

---

Summary: What Is Objectively Provable

- The extension stores its config under the key changedetection_config

- 16 API endpoints in the extension are 1:1 matches with changedetection.io's documented API

- 16 data model field names are exact matches with changedetection.io's Watch model (including obscure ones like time_between_check_use_default, history_n, notification_muted, fetch_backend)

- The authentication mechanism (x-api-key header) is identical

- The default port (5000) matches changedetection.io's default

- Custom endpoints (/auth/, /feature-flags, /email/, /generate_key, /pregate) do NOT exist in changedetection.io — these are proprietary additions

- The watch limit error format is completely different from changedetection.io's, adding billing-specific fields (current_plan, upgrade_required)

- The extension ships with error tracking that sends telemetry (including user emails on login) to the developer's GlitchTip server at 100% sample rate

The extension is provably a client for a modified/extended changedetection.io backend. The open question is only the degree of modification - whether it's a fork, a proxy wrapper, or a plugin system. But the underlying engine is unambiguously changedetection.io.
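For context on the telemetry point: GlitchTip speaks the Sentry protocol, and a 100% sample rate in a typical `sentry_sdk` setup looks like this (an illustrative config fragment, not the extension's actual code; the DSN is a placeholder):

```python
import sentry_sdk

sentry_sdk.init(
    dsn="https://<key>@glitchtip.example.com/1",  # placeholder DSN
    traces_sample_rate=1.0,   # 100%: every transaction is reported
    send_default_pii=True,    # would include user identifiers like email
)
```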


Fair point, and I should have been upfront about this earlier. The backend is a fork of changedetection.io. I've built on top of it — added the browser extension workflow, element picker, billing, auth, notifications, and other things — but the core detection engine comes from their project. That should have been clearly attributed from the start, and I'll add it to the docs and about page.

changedetection.io is a genuinely great project. What I'm trying to build on top of it is the browser-first UX layer and hosted product that makes it easier for non-technical users to get value from it without self-hosting, plus an AI-focused approach.

P.S. I've also added an acknowledgements page to the docs: https://docs.sitespy.app/docs/acknowledgements


have you adhered to the license? https://github.com/dgtlmoon/changedetection.io/blob/master/C... . if so, where can I get a copy of the source?


Yes — the project is Apache 2.0 licensed (https://github.com/dgtlmoon/changedetection.io/tree/master?t...), which permits forking and commercial use. There's also a COMMERCIAL_LICENCE.md in the repo for hosting/resale cases, and I've reached out to the maintainer directly about it. Attribution is here: https://docs.sitespy.app/docs/acknowledgements



Apologies but I will use this thread as an opportunity to report CC VSCode extension bugs because I don't think there's an official channel that actually gets read by humans.

> yeah they're shipping too fast and everything is buggy as shit

- fork conversation button doesn't even work anymore in vscode extension

- sometimes when I reconnect to my remote SSH in VSCode, previously loaded chats become inaccessible. The chats are still there in the .jsonl files but for some reason the CC extension becomes incapable of reading them.

-- this issue happens so frequently that I ended up making a skill to allow CC to dig up info from the bugged sessions
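The recovery itself is straightforward, which makes the bug more annoying. A hedged sketch of digging messages back out of a session `.jsonl` file (the record shape here is an assumption based on typical line-delimited session logs, not a documented format):

```python
# Sketch: pull readable history out of a session .jsonl the extension
# refuses to load. Record fields ("role"/"content") are assumptions.
import io
import json

def recover_messages(fp):
    messages = []
    for line in fp:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if "role" in record and "content" in record:
            messages.append((record["role"], record["content"]))
    return messages

# toy session log with one corrupt line in the middle
log = io.StringIO(
    '{"role": "user", "content": "hi"}\n'
    'not json at all\n'
    '{"role": "assistant", "content": "hello"}\n'
)
msgs = recover_messages(log)
# → [('user', 'hi'), ('assistant', 'hello')]
```

Tolerating corrupt lines matters here, since a crash mid-write is often exactly what bricked the session in the first place.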


many tasks don't need any reasoning


What the fuck is this price hike? It was such a nice low end, fast model. Who needs 10 years of reasoning on this model size??

I'm gonna switch some workflows to qwen3.5.

There are a lot of tasks that benefit from just having a mildly capable LLM, and 2.5 Flash Lite worked out of the box for cheap.

Can we get flash lite lite please?

Edit: Logan said: "I think open source models like Gemma might be the answer here"

Implying that they're not interested in serving lower end Gemini models?


Are there good open models out there that beat gemini 2.5 flash on price? I often run data extraction queries ("here is this article, tell me xyz") with structured output (pydantic) and wasn't aware of any feasible (= supports pydantic) cheap enough soln :/


You'll have to try out models on your use case. Openrouter makes that easy.


> every single product/feature I've used other than the Claude Code CLI has been terrible

yeah they're shipping too fast and everything is buggy as shit

- fork conversation button doesn't even work anymore in vscode extension

- sometimes when I reconnect to my remote SSH in VSCode, previously loaded chats become inaccessible. The chats are still there in the .jsonl files but for some reason the CC extension becomes incapable of reading them.

