A private, keyless async pipeline: human approval → Sora 2 video generation → Blob storage
This document treats the Shorts Factory demo, which automatically generates vertical short videos with Sora 2, as an Azure architecture case study. It explains which services the demo uses, how they are composed, and why it was designed this way, for engineers who want to build it themselves. (Baseline date: 2026-06-14, region: East US 2)
The live demo is protected by an access password. The "Open site" link above points to the Container Apps web app address, and a login screen appears first. This is a private pilot environment, not a public demo.
Shorts Factory is a multi-agent pipeline that automates the flow from "one-line topic → AI suggests 3 ideas → a human approves 1 → Sora 2 generates video → Blob storage → playback/download on the web."
The key themes are human-in-the-loop approval · asynchronous queues · keyless operation (no secrets) · private network isolation.
Web places only the job_id on the queue and immediately returns 202, while a separate worker consumes the queue for heavy generation work.
gpt-5.4 drives 5 logical agents (strategy, directing, story review, compliance, visual QA).
Product principle: public trends are used only as abstract signals; original videos are not copied, clipped, transcribed, or imitated.
Internet ──HTTPS(only public entry)──▶ Container Apps public ingress
│
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ VNet vnet-shorts-app (10.42.0.0/16) │
│ │
│ snet-containerapps (10.42.0.0/27, delegated to Microsoft.App/environments) │
│ ┌───────────────────────────┐ send only 1 approved job_id ┌───────────────────┐ │
│ │ Container App: Web │ ──────────────────────────────────▶ │ (Service Bus queue)│ │
│ │ shorts-factory-web-secure │ └─────────┬─────────┘ │
│ │ - FastAPI + web UI │ │ KEDA rule │
│ │ - Generate/approve 3 ideas│ ┌──────────────────────────────────────▼─────────┐ │
│ │ - replica 1~1 (always on) │ │ Container App: Worker shorts-factory-worker │ │
│ └──────────┬─────────────────┘ │ - Queue consume → direct/review/compliance/Sora/ │ │
│ │ │ assemble/QA │ │
│ │ │ - No ingress, runs only when queue has messages │ │
│ │ └──────────┬───────────────────────────┬──────────┘ │
│ snet-private-endpoints (10.42.1.0/24) │ read/write job JSON │ save MP4 │
│ ┌───────────────────┐ ┌───────────────────┐ │ │ │
│ │ pe-shorts-blob │ │ pe-shorts-servicebus│◀─┘ (Sender=Web / Receiver=Worker) │
│ └─────────┬─────────┘ └──────────┬─────────┘ │
└────────────┼────────────────────────┼─────────────────────────────────────────────────────┘
│ Private DNS │ Private DNS
▼ ▼
Azure Blob Storage Azure Service Bus (Premium) ┌──────────────────────────────┐
stshortsfacbc56b85d17 sbshortsfac6c7f27 │ Microsoft Foundry (East US 2) │
- shorts-jobs (job JSON) - queue shorts-generation │ foundry-ncxnghbrr7n6w │
- shorts-videos (MP4) - duplicate detection/DLQ/5 sends │ - gpt-5.4 (5 text agents) │
* Public net Disabled, keys Disabled * Public net Disabled, Private Link │ - sora-2 (vertical video) │
└──────────────────────────────┘
Worker ──Managed Identity(keyless)──▶ Foundry(gpt-5.4 / sora-2)
Flow Summary
1. A user enters topic, audience, tone, and length (default 36 seconds) on the web → Web uses gpt-5.4 (Strategy) to generate 3 original ideas and stores the job in Blob as awaiting_approval.
2. The user approves 1 idea → Web places only the job_id on the Service Bus queue and immediately returns 202 Accepted.
3. When a message appears in the queue, KEDA wakes the Worker → Worker runs directing (Director) → story review (Story Editor) → compliance → sequentially generates three 12-second scenes with Sora 2 → assembles with FFmpeg → performs visual QA on the final video → stores the MP4 in Blob (shorts-videos) → marks the job completed.
4. The user checks status on the web and plays/downloads the video.
| Layer | Resource | Verified Configuration | Role |
|---|---|---|---|
| Runtime | cae-shorts-secure (Container Apps Env) | VNet-connected, East US 2 | Shared runtime for Web/Worker |
| Web | shorts-factory-web-secure | External HTTPS ingress, 1 vCPU / 2 GiB, replica 1~1 | FastAPI + web UI, idea generation/approval |
| Worker | shorts-factory-worker | No ingress, 1 vCPU / 2 GiB, KEDA (Service Bus) rule | Queue consumption + video generation pipeline |
| Registry | acrue2423kqbdy4k | Basic | Web/Worker container images |
| Messaging | sbshortsfac6c7f27 | Premium, 1 MU, queue shorts-generation, public network Disabled | Approved job queue (durability, duplicate detection, DLQ) |
| Queue policy | shorts-generation | lock 5 min, max delivery 5 times, duplicate detection 10 min window, DLQ on expiration | One job at a time, safe retries |
| Storage | stshortsfacbc56b85d17 | StorageV2, Standard LRS, public network Disabled, Blob public access blocked, shared key blocked | Job JSON + completed MP4 |
| Containers | shorts-jobs / shorts-videos | Private | jobs/{id}.json / {id}/short.mp4 |
| AI | foundry-ncxnghbrr7n6w | Azure AI Services, East US 2 | Foundry project/model hosting |
| Model | gpt-5.4 (ver 2026-03-05) | GlobalStandard | Strategy, directing, review, compliance, visual QA |
| Model | sora-2 (ver 2025-12-08) | GlobalStandard | 720×1280 vertical video generation |
| Network | vnet-shorts-app + 2 PEs + 2 Private DNS zones | Private Endpoints for Blob/Service Bus | Private path isolation |
| Identity | mi-ue2423kqbdy4k / mi-shorts-worker | 2 user-assigned Managed Identities | Keyless authentication (role separation) |
Sora video generation is expensive, so POST /api/jobs creates only 3 ideas and stops there.
Only after a person selects 1 idea through POST /api/jobs/{id}/approve is the job placed on the queue.
The compliance agent can reject risky concepts before the Sora call, adding one more filter before money is spent.
→ Beginner analogy: an expensive order(video generation) proceeds only after the clerk asks, "Do you really want to pay?"
Creating one video takes several minutes. If that work runs inside a web request, it causes timeouts, duplicate charges, and scaling limits.
So Web puts only the job_id on the queue and immediately returns 202, while the Worker consumes the queue and performs the heavy work.
The worker is awakened by KEDA based on queue length, so it uses almost no resources when there is no work.
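The approval step described above can be sketched without any web framework. This is a minimal illustration, not the demo's actual handler: `InMemoryBackend`, `approve_job`, the `approved_idea` field, and the 404/409/422 responses are hypothetical, and the in-memory dict and list stand in for Blob storage and the Service Bus queue.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for Blob job storage and the Service Bus queue."""
    jobs: dict = field(default_factory=dict)   # job_id -> job document
    queue: list = field(default_factory=list)  # holds job_id strings only


def approve_job(backend, job_id, idea_index):
    """Approve one idea, enqueue only the job_id, and return immediately.

    Returns an (HTTP status code, response body) pair.
    """
    job = backend.jobs.get(job_id)
    if job is None:
        return 404, {"error": "job not found"}
    if job["status"] != "awaiting_approval":
        # Already approved or finished: never enqueue a second paid run.
        return 409, {"error": f"job is {job['status']}"}
    if not 0 <= idea_index < len(job["ideas"]):
        return 422, {"error": "idea_index out of range"}

    job["approved_idea"] = job["ideas"][idea_index]
    job["status"] = "generating"
    backend.queue.append(job_id)  # the message carries the job_id, nothing else
    return 202, {"job_id": job_id, "status": job["status"]}
```

The handler does no generation work itself. Everything slow happens in whichever process later reads the job_id from the queue, which is what keeps the web request short.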
Standard would be sufficient for this demo's queue traffic. The single decisive reason Premium is used is that Service Bus Private Link/Private Endpoint is supported only on the Premium tier. Premium is required to close the public endpoint and reach the queue only over a VNet private path; it also brings dedicated throughput and AZ redundancy options. → Trade-off: Premium incurs a fixed cost even when idle. If cost matters more than security isolation, Standard (+ public network + Entra authentication) or Storage Queue are alternatives.
| Option | Pros | Cons | Best Fit |
|---|---|---|---|
| Current: Service Bus Premium | Private Endpoint, DLQ, duplicate detection, dedicated throughput | Fixed cost | When private network isolation must be maintained |
| Service Bus Standard | Lower cost, retains most queue features | No Private Endpoint(public endpoint required) | When public network + Entra authentication is acceptable |
| Azure Storage Queue | Reuses existing Storage, supports Queue PE | No built-in DLQ/duplicate detection(app must implement) | When cost optimization matters more than security |
process_job first stores each stage and each per-scene Sora operation ID in the Blob job JSON, and only then starts polling.
Even if the worker dies or the message is redelivered, it does not create a new paid Sora job; it resumes by looking up the saved operation ID.
The storage key is also deterministic, {job_id}/short.mp4, so retries overwrite the same Blob. → The core invariant that prevents duplicate charges.
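A minimal sketch of this "save the operation ID, then poll" invariant follows. The names `ensure_scene_operation`, `video_blob_name`, and the `sora_operations` field are hypothetical, and `submit` and `save` are injected callables standing in for the Sora call and the Blob write; the real process_job is not shown in this document.

```python
def ensure_scene_operation(job, scene_index, submit, save):
    """Return the operation ID for a scene, submitting a new paid job only if none is saved.

    submit(scene_index) -> str starts a generation and returns its operation ID.
    save(job) persists the job document (Blob job JSON in the demo).
    """
    operations = job.setdefault("sora_operations", {})
    key = str(scene_index)  # JSON object keys are strings
    if key in operations:
        # Redelivered message or restarted worker: reuse the saved ID, do not pay again.
        return operations[key]
    operation_id = submit(scene_index)
    operations[key] = operation_id
    save(job)  # persist before any polling starts
    return operation_id


def video_blob_name(job_id):
    """Deterministic blob name, so a retry overwrites the same object."""
    return f"{job_id}/short.mp4"
```

One caveat of this sketch: a crash between `submit` returning and `save` completing would still lose the ID. Closing that window would need something extra, such as a client-supplied idempotency key, which is an assumption beyond what the source describes.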
No connection strings or API keys are used; everything authenticates through DefaultAzureCredential(Managed Identity at runtime).
- Web ID mi-ue2423kqbdy4k: AcrPull, Foundry User, Cognitive Services User, Storage Blob Data Contributor, Service Bus Data Sender
- Worker ID mi-shorts-worker: same as above, but with Service Bus Data Receiver in place of Service Bus Data Sender
By separating Sender/Receiver into different identities, Web cannot consume the queue and Worker cannot arbitrarily submit new jobs.
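As a rough sketch of what keyless client construction can look like with the Azure SDK for Python: the helper names below are hypothetical, and the demo's own wiring is not shown in this document. The SDK imports are deferred into the factory function so the endpoint helpers can be used without the SDK installed.

```python
def blob_account_url(account_name):
    """Public-cloud Blob endpoint for a storage account name."""
    return f"https://{account_name}.blob.core.windows.net"


def service_bus_namespace(namespace_name):
    """Fully qualified Service Bus namespace for the public cloud."""
    return f"{namespace_name}.servicebus.windows.net"


def make_keyless_clients(storage_account, bus_namespace):
    """Build Blob and Service Bus clients that authenticate with a token credential.

    No connection string or account key appears anywhere; access depends
    entirely on the RBAC roles granted to the running identity.
    """
    # Deferred imports: require azure-identity, azure-storage-blob, azure-servicebus.
    from azure.identity import DefaultAzureCredential
    from azure.servicebus import ServiceBusClient
    from azure.storage.blob import BlobServiceClient

    credential = DefaultAzureCredential()
    blobs = BlobServiceClient(
        account_url=blob_account_url(storage_account),
        credential=credential,
    )
    bus = ServiceBusClient(
        fully_qualified_namespace=service_bus_namespace(bus_namespace),
        credential=credential,
    )
    return blobs, bus
```

With user-assigned identities, DefaultAzureCredential usually needs to be told which identity to use, for example through the AZURE_CLIENT_ID environment variable or the `managed_identity_client_id` argument. How the demo supplies it is not stated here.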
Storage has publicNetworkAccess=Disabled, with Blob public access and shared keys both blocked. Service Bus also blocks public network access and disables local key authentication.
Both services are resolved/accessed only through Private DNS → Private Endpoint. The only external opening is the Web ingress(HTTPS).
→ Caution: uploading directly to Blob from a local PC will fail because of the private network. It works only from the Azure runtime(in the same VNet).
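One way to confirm the private path from inside the runtime is to check that a service hostname resolves into the private endpoint subnet. The sketch below is illustrative only: the function names are hypothetical, and the subnet value is taken from the diagram (snet-private-endpoints, 10.42.1.0/24).

```python
import ipaddress
import socket

# Private endpoint subnet from the architecture diagram.
PRIVATE_ENDPOINT_SUBNET = ipaddress.ip_network("10.42.1.0/24")


def is_private_endpoint_address(address, network=PRIVATE_ENDPOINT_SUBNET):
    """True when the IP address lies inside the private endpoint subnet."""
    return ipaddress.ip_address(address) in network


def resolves_privately(hostname):
    """Resolve a hostname and report whether every IPv4 result is a private endpoint IP.

    Expected to return False from a machine outside the VNet, where the
    name resolves to a public address.
    """
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_private_endpoint_address(a) for a in addresses)
```

Calling `resolves_privately("stshortsfacbc56b85d17.blob.core.windows.net")` from the worker container and from a local PC should give different answers if Private DNS is linked correctly.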
The Sora SDK supports only 4/8/12 seconds per scene, so 36 seconds is split into a 12-second hook → 12-second rise → 12-second payoff.
The last frame of each scene (720×1280 PNG) is passed to the next scene as an input_reference to create continuity.
FFmpeg stitches the scenes together with 0.12-second xfade/acrossfade transitions and encodes an H.264/AAC MP4.
After assembly, a visual QA agent inspects 15 actual frames; if the overall score is below 82 or any item is below 75, it partially regenerates up to 2 times starting from the failed scene (accepted scenes are preserved).
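The scene split and the QA gate reduce to a little arithmetic, sketched below. The function names and dictionary fields are hypothetical. The thresholds (82 overall, 75 per item) and the allowed scene lengths come from the text; the reading that regeneration covers the failed scene and every later scene is an assumption, based on each scene taking the previous scene's last frame as input.

```python
ALLOWED_SCENE_SECONDS = (4, 8, 12)
SCENE_ROLES = ("hook", "rise", "payoff")


def plan_scenes(total_seconds=36, scene_seconds=12):
    """Split a video length into equal scenes of an allowed duration."""
    if scene_seconds not in ALLOWED_SCENE_SECONDS:
        raise ValueError(f"scene length must be one of {ALLOWED_SCENE_SECONDS}")
    if total_seconds <= 0 or total_seconds % scene_seconds != 0:
        raise ValueError("total length must be a positive multiple of the scene length")
    count = total_seconds // scene_seconds
    if count == len(SCENE_ROLES):
        roles = SCENE_ROLES
    else:
        roles = tuple(f"scene_{i + 1}" for i in range(count))
    return [
        {"index": i, "role": roles[i], "start": i * scene_seconds, "seconds": scene_seconds}
        for i in range(count)
    ]


def qa_passes(overall, item_scores, min_overall=82, min_item=75):
    """Pass only if the overall score and every individual item clear their thresholds."""
    return overall >= min_overall and all(s >= min_item for s in item_scores.values())


def scenes_to_regenerate(failed_scene, scene_count):
    """Assumed policy: redo the failed scene and all later ones, keep earlier ones."""
    return list(range(failed_scene, scene_count))
```

For the default 36-second request, `plan_scenes()` yields three scenes starting at 0, 12, and 24 seconds.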
[*] ─▶ awaiting_approval ──human approval──▶ generating ──compliance rejection──▶ rejected
│
├─ temp assembly ─▶ visual_review ─ quality approved ─▶ completed
│ │
│ └ partial regen from failed scene ─▶ generating
└─ provider error ─▶ failed ── transient redelivery ─▶ generating
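The state diagram above can be encoded as a transition table so that an illegal status change fails loudly instead of silently corrupting the job JSON. This is a sketch: `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, and the edges are read directly off the diagram.

```python
# Each status maps to the set of statuses it may move to.
ALLOWED_TRANSITIONS = {
    "awaiting_approval": {"generating"},                    # human approval
    "generating": {"rejected", "visual_review", "failed"},  # compliance / assembly / provider error
    "visual_review": {"completed", "generating"},           # quality approved / partial regeneration
    "failed": {"generating"},                               # transient redelivery
    "rejected": set(),                                      # terminal
    "completed": set(),                                     # terminal
}


def transition(job, new_status):
    """Move a job to new_status, or raise ValueError if the diagram has no such edge."""
    current = job["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    job["status"] = new_status
    return job
```

A guard like this also makes redelivery safer: a duplicate message for a job that is already completed cannot push it back into generating.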
Stages the Worker records in the Blob job JSON: directing → creative_review → compliance → video → visual_review → storage → completed.
The queue lock is 5 minutes, but the SDK AutoLockRenewer renews it for up to 75 minutes; the Sora timeout is 20 minutes per scene. After 5 failures, the message moves to the DLQ.
Prices change, so no definitive figures are given here. Check actual estimates with the Azure Pricing Calculator and your subscription's Cost Management.
The Container Apps subnet must be delegated to Microsoft.App/environments, and Private Endpoints should be placed in a separate subnet.
The Private DNS zones (privatelink.blob.core.windows.net, privatelink.servicebus.windows.net) must be linked to the VNet so names resolve to private IPs. If they are missing, the worker cannot find the queue/Blob.
"A secure asynchronous pipeline that processes only human-approved expensive AI video generation, inside a private network, without secrets (Managed Identity), and resumes without duplicate charges even after failures": a pattern that satisfies security, cost control, and durability together.