We're pleased to announce the release of autoscaling for model serving and deployments on Paperspace Gradient.
Autoscaling is an essential MLOps tool for dynamically adjusting the compute resources or instances dedicated to a machine learning workload.
Autoscaling offers several benefits:
- Automatic rather than manual instance allocation
- Tighter cost control, since instances are provisioned and deprovisioned automatically
- High availability due to automatic instance failover
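To make "adjusting instances dynamically" concrete, here is a minimal sketch of target-tracking scaling, the rule documented for the Kubernetes Horizontal Pod Autoscaler: scale the replica count in proportion to how far an observed metric is from its target, then clamp to a configured range. We are assuming, not stating, that Gradient's autoscaler behaves similarly. The function and parameter names are hypothetical, and the sketch omits refinements such as tolerance bands and cooldown windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Hypothetical target-tracking rule (not Gradient's actual code).

    current_metric is the observed average per-replica value (for example
    CPU percent); target_metric is the level we want each replica to sit at.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    # Scale proportionally: twice the target load means twice the replicas.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Never go below the floor or above the ceiling.
    return max(min_replicas, min(max_replicas, raw))
```

For example, `desired_replicas(2, 90, 30)` asks for 6 replicas because each of the 2 replicas is running at three times its target, while `desired_replicas(4, 10, 30)` scales down to 2. Clamping is what keeps costs bounded: `desired_replicas(3, 95, 30, max_replicas=5)` stops at 5 no matter how high the load climbs.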
To autoscale a deployment on Gradient, follow these steps:
Step 0 - Create a deployment
Step 1 - Choose a model to deploy
Step 2 - Select a recommended container or a custom container
Step 3 - Select a cluster and a machine
Step 4 - Give the deployment a name
Step 5 - Enable autoscaling and provide target autoscaling parameters
Step 6 - Provide an entrypoint for the container (optional), then create your deployment!
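The target autoscaling parameters in Step 5 can be thought of as a small, validated config. The sketch below is purely illustrative: the class, its field names, and the metric name `"cpu"` are hypothetical and are not Gradient's deployment spec keys. It also assumes at least one replica always runs. It shows two sanity checks worth doing before submitting values, plus a quick capacity estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalingTarget:
    """Hypothetical model of the Step 5 autoscaling parameters.

    Field names are illustrative only, not Gradient's actual spec keys.
    """
    metric: str            # assumed metric name, e.g. "cpu" (percent utilization)
    target_value: float    # per-replica level the autoscaler tries to hold
    min_replicas: int = 1  # this sketch assumes at least one replica always runs
    max_replicas: int = 5

    def __post_init__(self) -> None:
        # Reject settings that could never be satisfied.
        if self.target_value <= 0:
            raise ValueError("target_value must be positive")
        if not 1 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 1 <= min_replicas <= max_replicas")

    def max_total_load(self) -> float:
        """Total load (in metric units) the deployment can absorb with every
        replica at its target; beyond this it is pinned at max_replicas and
        per-replica load rises above the target."""
        return self.max_replicas * self.target_value
```

With `AutoscalingTarget("cpu", 30.0)`, the default ceiling of 5 replicas absorbs a combined load of 150.0 percent-CPU at target. If your expected peak is higher, raise `max_replicas` or accept that replicas will run hotter than the target at peak.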
You should now see a brand new autoscaled deployment building in your deployments console. Congratulations!
For more information on autoscaling deployments and model serving, read the docs.
If you have any questions, feel free to drop us a note!