Autoscaling

Autoscaling lets the user define how many engines a pipeline starts with, the minimum number of engines the pipeline can run, and the maximum number of engines it can scale up to. As the user's workload increases and decreases, the pipeline scales up and down based on the average CPU utilization across its engines.
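The sketch below shows one way a CPU-driven scaling decision can be computed. It assumes a rule modeled on the Kubernetes Horizontal Pod Autoscaler (scale the engine count by the ratio of observed to target CPU, then clamp to the configured bounds). This is not taken from Wallaroo's source, and `desired_engines` is a hypothetical helper. A real autoscaler also applies a tolerance band and stabilization windows, which are omitted here.

```python
import math


def desired_engines(current: int, avg_cpu_percent: float, target_percent: int,
                    minimum: int, maximum: int) -> int:
    """Hypothetical HPA-style scaling rule; not Wallaroo's implementation."""
    # Scale the current engine count by how far observed CPU is from the target,
    # rounding up so the pipeline never ends up under-provisioned.
    raw = math.ceil(current * avg_cpu_percent / target_percent)
    # Keep the result within the configured minimum and maximum engine counts.
    return max(minimum, min(maximum, raw))


# 2 engines at 90% average CPU against a 60% target -> scale up to 3
print(desired_engines(2, 90, 60, minimum=2, maximum=5))
# 4 engines at 15% average CPU -> would drop to 1, but the minimum holds it at 2
print(desired_engines(4, 15, 60, minimum=2, maximum=5))
```

Clamping after the proportional step means the minimum and maximum always win, no matter how idle or busy the engines are.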

Example

Autoscaling is configured through the deployment configuration for a pipeline.

replica_count sets the initial number of engines for the pipeline. It defaults to 1.

replica_autoscale_min_max sets the minimum and maximum number of engines. The lowest allowed minimum is currently 1, which is also the default. The maximum must be set by the user.

autoscale_cpu_utilization sets the target average CPU utilization as an integer percentage. The default is 50, meaning the pipeline aims for an average of 50% CPU utilization across its engines.
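To summarize the three settings and their documented defaults, here is a small sketch that models them as a plain Python dataclass. `AutoscaleSettings` is a hypothetical class for illustration only, not part of the Wallaroo SDK. The defaults (initial count 1, minimum 1, CPU target 50) and the required maximum come from the text above. The check that the maximum is at least the minimum, and that the CPU target is positive, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscaleSettings:
    """Hypothetical mirror of the autoscaling settings; not a Wallaroo class."""
    maximum: int               # no default: the user must always supply it
    minimum: int = 1           # documented default and lowest allowed minimum
    replica_count: int = 1     # documented default initial engine count
    cpu_utilization: int = 50  # documented default target average CPU, in percent

    def __post_init__(self) -> None:
        # Documented constraint: the minimum cannot go below 1.
        if self.minimum < 1:
            raise ValueError("minimum must be at least 1")
        # Assumed constraint: the bounds must form a valid range.
        if self.maximum < self.minimum:
            raise ValueError("maximum must be greater than or equal to minimum")
        # Assumed constraint: the CPU target must be a positive percentage.
        if self.cpu_utilization <= 0:
            raise ValueError("cpu_utilization must be a positive percentage")


# Only the maximum is given; everything else falls back to the defaults.
settings = AutoscaleSettings(maximum=5)
print(settings)
```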

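import wallaroo

# Connect to the Wallaroo instance using single sign-on
wl = wallaroo.Client(auth_type="sso")

# Build a deployment configuration with autoscaling enabled
dc = (wallaroo.DeploymentConfigBuilder()
    .replica_count(1)                                  # initial number of engines
    .replica_autoscale_min_max(minimum=2, maximum=5)   # lower and upper engine bounds
    .autoscale_cpu_utilization(60)                     # target average CPU utilization (%)
    .build())

# Upload the model and deploy it; deploy() returns the pipeline, not the model
pipeline = wl.upload_model("cc-fraud", "keras_ccfraud.onnx").deploy("fraud", deployment_config=dc)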