We're pleased to announce the release of autoscaling for model serving and deployments on Paperspace Gradient.

Autoscaling is an essential MLOps tool that dynamically adjusts the compute resources, or number of instances, dedicated to a machine learning workload as demand changes.

Autoscaling offers several benefits:

  • Automatic rather than manual instance allocation
  • Tight control of costs, since instances are provisioned and deprovisioned automatically
  • High availability due to automatic instance failover
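To make the idea concrete, here is a minimal sketch of how a target-based autoscaler might decide how many instances to run. It is an illustration only, not Gradient's actual implementation: the `AutoscalingTarget` class, its field names, and the `desired_instances` function are hypothetical, and the rule shown (divide the total load by a per-instance target, round up, then clamp to a minimum and maximum) is a common approach rather than anything this post confirms about Gradient.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalingTarget:
    """Hypothetical parameter bundle; these names are NOT Gradient's API."""
    min_instances: int
    max_instances: int
    target_per_instance: float  # e.g. desired requests/sec handled per instance

    def __post_init__(self):
        # Reject configurations that could never be satisfied.
        if self.min_instances < 1 or self.max_instances < self.min_instances:
            raise ValueError("need 1 <= min_instances <= max_instances")
        if self.target_per_instance <= 0:
            raise ValueError("target_per_instance must be positive")


def desired_instances(total_load: float, params: AutoscalingTarget) -> int:
    """Return the instance count that keeps per-instance load at or below target."""
    # Round up so that no instance is pushed above the target.
    needed = math.ceil(total_load / params.target_per_instance)
    # Clamp to the configured bounds to cap cost and preserve availability.
    return max(params.min_instances, min(params.max_instances, needed))


if __name__ == "__main__":
    params = AutoscalingTarget(min_instances=1, max_instances=5,
                               target_per_instance=100.0)
    for load in (0, 250, 10_000):
        print(load, "->", desired_instances(load, params))
```

The lower bound keeps at least one instance available even when traffic drops to zero, and the upper bound caps spend during a traffic spike, which maps onto the availability and cost benefits listed above.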

To autoscale a deployment on Gradient, follow these steps:

Step 0 - Create a deployment

Step 1 - Choose a model to deploy

Step 2 - Select a recommended container or a custom container

Step 3 - Select a cluster and a machine

Step 4 - Give the deployment a name

Step 5 - Enable autoscaling and provide target autoscaling parameters

Step 6 - Provide a point of entry to the container (optional) and then create your deployment!

You should now have a brand new autoscaled deployment building in your deployments console. Congratulations!

For more information on autoscaling deployments and model serving, read the docs.

If you have any questions, feel free to drop us a note!