>> how does your system work in the IRIS or other examples you have? It looks like it hides under the hood a few things compared to this implementation - type of the model (tensorflow) and the logic that fetches it from S3, right?
That's exactly right. Cortex has three runtimes:
The Predictor runtime, which is used in this post, can run arbitrary Python. There is an optional key in `cortex.yaml` for Predictors called `model`, which is an S3 path to an exported model (or directory). If provided, Cortex will download the file/directory at that path and make it available as an argument in the `init(model_path, metadata)` function in your Predictor implementation (see here for the Predictor docs: https://www.cortex.dev/deployments/predictor)
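To make the shape of that interface concrete, here's a minimal sketch of a Predictor module. The `init(model_path, metadata)` signature comes from the docs linked above; the `predict` signature and the pickled linear model are illustrative placeholders, not Cortex's actual example:

```python
# Sketch of a Predictor module. `init(model_path, metadata)` matches the
# interface described above; the model format here (a pickled dict of
# coefficients) is a placeholder for whatever you export.
import pickle

model = None  # loaded once at startup, reused across requests


def init(model_path, metadata):
    # If `model` is set in cortex.yaml, Cortex downloads the S3 object
    # and passes its local path here.
    global model
    with open(model_path, "rb") as f:
        model = pickle.load(f)


def predict(sample, metadata):
    # Illustrative: a linear score from the pickled coefficients.
    return sum(model["weights"][k] * v for k, v in sample.items()) + model["bias"]
```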
The TensorFlow and ONNX runtimes behave a little differently (and similar to each other): `model` is a required field in the API config, and Cortex handles downloading the model and running inference against it. You may define a `request_handler`, which can contain pre- and post-request handling (here are the TensorFlow docs: https://www.cortex.dev/deployments/tensorflow, and here are the ONNX docs: https://www.cortex.dev/deployments/onnx)
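As a rough sketch of what a `request_handler` can look like for the iris example: the hook names `pre_inference`/`post_inference` and the prediction payload shape here are assumptions on my part, so check the linked docs for the exact interface:

```python
# Sketch of a request_handler for the TensorFlow/ONNX runtimes.
# Hook names and the `class_ids` field are assumptions; the runtime
# itself handles downloading the model and running inference.
LABELS = ["setosa", "versicolor", "virginica"]


def pre_inference(sample, metadata):
    # Reshape the incoming JSON into the input the model expects.
    return [sample["sepal_length"], sample["sepal_width"],
            sample["petal_length"], sample["petal_width"]]


def post_inference(prediction, metadata):
    # Map the raw class index back to a human-readable label.
    return {"class": LABELS[int(prediction["class_ids"][0])]}
```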
>> Let's imagine we would want to make a DVC repo (just to store model versions to start) out of one of your examples instead of the DVC get started, how would we do that with the current implementation (through metadata + custom init)?
Yes, that is also exactly right - you'd use the Predictor runtime, since it lets you define how to download your model. You would specify `metadata` and leave out the `model` config field, similar to what is done in this post. In `init(model_path, metadata)`, you would use the metadata to download and load the model.
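A sketch of what that `init` could look like with DVC: the metadata keys (`repo`, `path`, `rev`) are hypothetical names I'm choosing here, and the `loader` parameter is just an illustration of keeping the DVC call separable; `dvc.api.open` itself is DVC's real file-access API:

```python
# Sketch: loading a specific model version from a DVC repo inside init(),
# using API metadata instead of Cortex's built-in `model` field.
# The metadata keys `repo`/`path`/`rev` are assumed names for this example.
import pickle

model = None


def load_from_dvc(metadata):
    # Deferred import so the module loads without DVC installed;
    # dvc.api.open streams the file for the given repo/path/revision.
    import dvc.api
    with dvc.api.open(metadata["path"], repo=metadata["repo"],
                      rev=metadata.get("rev"), mode="rb") as f:
        return pickle.load(f)


def init(model_path, metadata, loader=load_from_dvc):
    # model_path is unused here because `model` is omitted from cortex.yaml.
    global model
    model = loader(metadata)


def predict(sample, metadata):
    return model.predict(sample)
```

Pinning `rev` to a Git tag or commit is what gives you the "model versions" part - each deployed API config points at an exact version in the DVC repo.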
Cortex contributor here - you're right, I would say we can be compared to SageMaker model deployment. We are currently working on supporting spot instances for serving, and training is on our roadmap.
We have a lot of respect for the work that the KubeFlow team is doing. Their focus seems to be on helping you deploy a wide variety of open source ML tooling to Kubernetes. We use a more narrow stack and focus more on automating common workflows.
For example, we take a fully declarative approach; the “cortex deploy” command is a request to “make it so”, rather than “run this training job”. Cortex determines at runtime exactly what pipeline needs to be created to achieve the desired state, caching as aggressively as it can (e.g. if a hyperparameter to one model changes, only that model is re-trained and re-deployed, whereas if a transformer is updated, all transformed_columns which use that transformer are regenerated, all models which use those columns are re-trained, etc). We view it as an always-on ML application, rather than a one-off ML workload.
We use TensorFlow Serving (https://www.tensorflow.org/serving) to serve the trained models. We also run Flask to transform the incoming JSON to match the way the data was transformed at training time.
Pretty cool name (cortex), but also taken by this project - https://github.com/cortexproject/cortex - "A multitenant, horizontally scalable Prometheus as a Service"
The main thing we try to help with is orchestrating Spark, TensorFlow, TensorFlow Serving, and other workloads without requiring you to manage the infrastructure. You’re right that we have a thin layer around tf.estimator (by design) because our goal is to make it easy to create scalable and reproducible pipelines from building blocks that people are familiar with. We translate the YAML blocks into workloads that run as a DAG on Kubernetes behind the scenes.