Autoscaling
Autoscaling lets the user define how many engines a pipeline starts with, the minimum number of engines it can use, and the maximum number of engines it can scale up to. As the user's workload increases and decreases, the pipeline scales up and down based on the average CPU utilization across its engines.
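To make the scaling behavior concrete, here is a minimal sketch of a utilization-based scaling decision. It assumes a proportional rule similar to the one used by the Kubernetes Horizontal Pod Autoscaler; this page does not state the exact algorithm Wallaroo applies, and the function name desired_engines and its parameters are hypothetical, chosen only for illustration.

```python
import math


def desired_engines(current, avg_cpu, target_cpu, minimum, maximum):
    """Return an engine count for the next scaling step.

    ASSUMPTION: a proportional rule (current * observed / target, rounded up),
    clamped to the configured bounds. This is an illustration, not Wallaroo's
    documented algorithm.
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    # Scale the engine count by how far observed CPU is from the target.
    raw = math.ceil(current * avg_cpu / target_cpu)
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, raw))


# Two engines averaging 90% CPU against a 60% target call for a third engine.
print(desired_engines(current=2, avg_cpu=90, target_cpu=60, minimum=2, maximum=5))
```

Under this assumed rule, a lightly loaded pipeline shrinks only as far as the minimum, and a heavily loaded one grows only as far as the maximum.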
Example
Autoscaling is configured through the deployment configuration for a pipeline.
replica_count sets the initial number of engines for the pipeline. This defaults to 1.
replica_autoscale_min_max sets the minimum and maximum number of engines. The lowest minimum is currently 1, which is also the default. The maximum parameter must be set by the user.
autoscale_cpu_utilization is set as an integer representing the average CPU utilization. The default value is 50, which represents an average of 50% CPU utilization for the engines in a pipeline.
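import wallaroo

# Connect to the Wallaroo instance using single sign-on.
wl = wallaroo.Client(auth_type="sso")

# Build a deployment configuration with autoscaling enabled.
dc = (
    wallaroo.DeploymentConfigBuilder()
    .replica_count(1)                                  # initial number of engines
    .replica_autoscale_min_max(minimum=2, maximum=5)   # scaling bounds
    .autoscale_cpu_utilization(60)                     # average CPU utilization, in percent
    .build()
)

# Upload the model and deploy it with the autoscaling configuration.
model = wl.upload_model("cc-fraud", "keras_ccfraud.onnx").deploy("fraud", deployment_config=dc)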
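The constraints described in this section can be checked before a deployment configuration is built. The sketch below is plain Python and does not call the wallaroo SDK. The class name AutoscaleSettings and its fields are hypothetical, and the 1 to 100 range for CPU utilization is an assumption based on the value being a percentage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscaleSettings:
    """Hypothetical container mirroring the autoscaling parameters above."""

    maximum: int                 # no default: must be set by the user
    minimum: int = 1             # lowest allowed minimum, also the default
    replica_count: int = 1       # initial number of engines
    cpu_utilization: int = 50    # average CPU utilization, in percent

    def __post_init__(self):
        if self.minimum < 1:
            raise ValueError("minimum must be at least 1")
        if self.maximum < self.minimum:
            raise ValueError("maximum must be >= minimum")
        if self.replica_count < 1:
            raise ValueError("replica_count must be at least 1")
        # ASSUMPTION: a percentage is expected to fall between 1 and 100.
        if not isinstance(self.cpu_utilization, int) or not 1 <= self.cpu_utilization <= 100:
            raise ValueError("cpu_utilization must be an integer from 1 to 100")


# Only the maximum is supplied; everything else falls back to the defaults.
settings = AutoscaleSettings(maximum=5)
print(settings)
```

Validated values could then be passed to the builder methods shown in the example, which keeps configuration mistakes from surfacing only at deployment time.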