Walk it like I talk it
By Nate Nowack
One of the most common questions from OSS Prefect users is:
How do I run open-source Prefect server in a high availability (HA) mode?
Historically, it hasn’t been directly possible, for a couple of reasons:
- an in-memory event bus
- “singleton server” assumptions (e.g. caching automation objects in memory)
Recently, we’ve been working towards allowing users to run a setup that you can reasonably call HA, by:
- implementing Redis Streams-based messaging (sketched just after this list)
- using Postgres LISTEN/NOTIFY to broadcast events to all server instances
- unwinding those “singleton server” assumptions
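For the curious, opting a server into the Redis-backed messaging is a matter of settings. Here’s a minimal sketch, assuming the `prefect-redis` integration is installed; the `PREFECT_REDIS_MESSAGING_*` variable names are my assumption of how the connection details in the Helm values below are spelled as environment variables:

```bash
# Sketch: point an OSS Prefect server at Redis Streams-backed messaging.
# Assumes `pip install prefect-redis`; the PREFECT_REDIS_MESSAGING_* names
# are assumptions, mirroring the redis block in the Helm values further down.
export PREFECT_MESSAGING_BROKER=prefect_redis.messaging
export PREFECT_MESSAGING_CACHE=prefect_redis.messaging
export PREFECT_REDIS_MESSAGING_HOST=<REDIS_HOST_IP>
export PREFECT_REDIS_MESSAGING_PORT=6379

prefect server start --no-services
```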
We’ve had some brave souls test out the new recommended setup, but there have been some rough edges.
So, in the words of the visionary triumvirate known as Migos, one must:
Walk it like I talk it
… that is to say, we must run and use the HA setup that we suggest to others.
We always compulsively test fixes and features against actual Prefect servers, but it’s been a while since we’ve made a concerted effort to scale a single server installation. The new, broad interest in HA server configurations has made it a great time to do so!
Making our test bed before we lie in it
We decided on the following initial setup in GKE:
- 2 replicas of the Prefect webserver (i.e. `prefect server start --no-services`)
- 1 instance of the background services (i.e. `prefect server services start`)
- 1 instance of the Kubernetes worker (i.e. `prefect worker start --type kubernetes`; see the work pool note just below)
- 1 Google-managed Redis instance
- 1 Google-managed Postgres instance
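One prerequisite the list glosses over: the worker polls a work pool, which needs to exist on the server. If it doesn’t already, creating it is a one-liner (the pool name is a placeholder):

```bash
# Create the kubernetes-type work pool the worker will poll
prefect work-pool create <YOUR_WORK_POOL_NAME> --type kubernetes
```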
For the YAML junkies among us, here are (more or less) the Helm chart configurations we used.

`server.yaml`:
```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: prefect-server
spec:
  interval: 5m
  chart:
    spec:
      chart: prefect-server
      version: "2025.9.5190948" # Pin to specific version
      sourceRef:
        kind: HelmRepository
        name: prefect
        namespace: flux-system
  values:
    global:
      prefect:
        image:
          repository: prefecthq/prefect
          prefectTag: 3.4.17-python3.11-kubernetes
    server:
      replicaCount: 2
      loggingLevel: WARNING
      uiConfig:
        prefectUiApiUrl: http://localhost:4200/api
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: "1"
          memory: 1Gi
      # Add Redis auth environment variables to server pod
      extraEnvVarsSecret: prefect-redis-env-secret
    migrations:
      enabled: true
    backgroundServices:
      runAsSeparateDeployment: true
      # Use the Redis auth secret for password
      extraEnvVarsSecret: prefect-redis-env-secret
    messaging:
      broker: prefect_redis.messaging
      cache: prefect_redis.messaging
      redis:
        host: <REDIS_HOST_IP>
        port: 6379
        db: 0
        username: default
    # External PostgreSQL (Cloud SQL)
    postgresql:
      enabled: false
    # External Redis (Memorystore)
    redis:
      enabled: false
    # External database connection via secret
    # This secret will be created by SecretProviderClass from GCP Secret Manager
    secret:
      create: false
      name: prefect-db-connection-secret
```
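The release above references two externally-created secrets. In our setup the database secret comes from GCP Secret Manager via a SecretProviderClass, but for illustration, here’s a hedged sketch of creating both by hand; the key names are assumptions about what the chart and `prefect-redis` expect, so check the chart’s values before copying:

```bash
# Sketch only: the key names below are assumptions; verify them against the
# chart's values before relying on this.

# Redis auth, surfaced to server pods via extraEnvVarsSecret
kubectl create secret generic prefect-redis-env-secret \
  --from-literal=PREFECT_REDIS_MESSAGING_PASSWORD='<YOUR_REDIS_AUTH_STRING>'

# External Postgres connection string for the server
kubectl create secret generic prefect-db-connection-secret \
  --from-literal=connection-string='postgresql+asyncpg://<USER>:<PASSWORD>@<HOST>:5432/prefect'
```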
`worker.yaml`:
```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: prefect-worker
  namespace: <YOUR_NAMESPACE>
spec:
  chart:
    spec:
      chart: prefect-worker
      version: 2025.9.5190948
      sourceRef:
        kind: HelmRepository
        name: prefect
        namespace: flux-system
  driftDetection:
    mode: warn
  install:
    remediation:
      retries: 3
  interval: 5m
  maxHistory: 2
  upgrade:
    remediation:
      retries: 3
  values:
    worker:
      config:
        http2: false
        workPool: <YOUR_WORK_POOL_NAME>
      apiConfig: selfHostedServer
      selfHostedServerApiConfig:
        apiUrl: http://prefect-server.<YOUR_NAMESPACE>.svc.cluster.local:4200/api
      image:
        repository: prefecthq/prefect
        prefectTag: 3.4.17-python3.11-kubernetes
        pullPolicy: Always
      livenessProbe:
        enabled: true
      revisionHistoryLimit: 2
      resources:
        requests:
          memory: 1Gi
        limits:
          memory: 2Gi
```
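With both releases applied, a quick sanity check on the webserver replicas is to port-forward the service and hit the health endpoint (the service name may differ depending on your release name):

```bash
# Forward the server service locally and check the API is healthy
kubectl port-forward svc/prefect-server 4200:4200 &
curl http://localhost:4200/api/health
```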
You can read more about the Helm chart here, and more about Flux here.
Note that as of `prefect==3.4.16`, `prefect server start --no-services` truly avoids running any services, which sidesteps a bug that caused Redis connection errors in the server pod. We still need to audit each background service independently to ensure they can be horizontally scaled.
There were several PRs that came out of this deployment process:
and in general, in the coming weeks, we expect to feel more of the friction that Prefect server operators have felt in the past, as we expand our use of the server for our own needs.
For example, I immediately got very annoyed that I had to click many buttons to pause all schedules for all deployments, so I opened the #18860 PR above to add an `--all` flag to `prefect deployment schedule pause`.
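Once that flag lands, pausing every schedule becomes a one-liner instead of a pile of clicks:

```bash
# With the --all flag from #18860, pause schedules across all deployments at once
prefect deployment schedule pause --all
```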
Running flows against our new setup
We’ve got a nice set of flows deployed and executing successfully via our Kubernetes worker, which should help us catch bugs early using the nightly dev builds and generally act as a canary in the coal mine for issues with HA setups.
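For reference, a canary deployment like ours can be created with `prefect deploy`; the entrypoint, names, and schedule here are hypothetical placeholders:

```bash
# Hypothetical canary: deploy a flow to the kubernetes work pool on a schedule
prefect deploy flows/healthcheck.py:healthcheck \
  --name ha-canary \
  --pool <YOUR_WORK_POOL_NAME> \
  --cron "*/15 * * * *"
```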