
Not affiliated, but I'm a happy modal.com user. It has very fast cold starts for the few demos I run with it.


The coding paradigms that Modal imposes make it very hard to develop for compared to, say, Replicate or Runpod.


Founder of Modal here. We've spent a ton of time on this, including building our own distributed file system optimized for low-latency, high-throughput workloads. We don't use K8s or Docker; we built our own custom infrastructure instead.

Cold-starting containers quickly is a fascinating problem. We've come a long way, but there's still a lot more to do. For GPU-based inference, starting containers isn't enough: you also need to load the model onto the GPU quickly. We're working on a long list of things that will bring cold-start latency down even further.


Is Modal a good solution for running fine-tuned LLMs and Whisper models? If the cold-start time is low, we're more than willing to modify our code to use Modal's infra. Happy to follow up via email, but I didn't see one in your profile.



