Building a SaaS from scratch is one of those things that sounds straightforward until you actually start doing it. Podhoc — an AI-powered platform that turns any content into a podcast — has been one of the most demanding and rewarding engineering challenges I have taken on.

Here is what I learned.

The idea

The premise is simple: you paste a URL, and you get a podcast. YouTube video, PDF, blog post, Word document — Podhoc extracts the content, generates a conversational script using AI, and produces the audio with text-to-speech.

Simple to describe. Brutally complex to build well.

The transcript problem

The first real challenge was content extraction. Every source type is different. YouTube has subtitles (sometimes), PDFs have layers of formatting chaos, DOCX files have nested XML, and web pages are a jungle of divs, ads, and cookie banners.

Each processor had to be built as its own Lambda function. Each one had edge cases that only surfaced in production. A PDF with scanned images instead of text. A YouTube video with auto-generated captions in the wrong language. A DOCX file exported from Google Docs with invisible formatting.
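The per-source split can be sketched as a dispatcher that picks an extractor by source type. All names here are illustrative, not Podhoc's actual modules, and each placeholder function stands in for a real Lambda processor:

```python
from urllib.parse import urlparse

# Hypothetical extractors, one per source type, mirroring the
# one-Lambda-per-processor split described above.
def extract_youtube(url: str) -> str:
    return f"transcript for {url}"    # placeholder for subtitle fetching

def extract_pdf(url: str) -> str:
    return f"text layer of {url}"     # placeholder for PDF parsing / OCR fallback

def extract_web(url: str) -> str:
    return f"readable body of {url}"  # placeholder for boilerplate stripping

EXTRACTORS = {
    "youtube": extract_youtube,
    "pdf": extract_pdf,
    "web": extract_web,
}

def classify(url: str) -> str:
    """Crude source-type detection; a real system inspects content types too."""
    host = urlparse(url).netloc.lower()
    if "youtube.com" in host or "youtu.be" in host:
        return "youtube"
    if url.lower().endswith(".pdf"):
        return "pdf"
    return "web"

def extract(url: str) -> str:
    return EXTRACTORS[classify(url)](url)
```

The registry shape matters more than the detection logic: adding a new source type means adding one extractor and one table entry, and each extractor's edge cases stay contained.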

The lesson: the “simple” part of any pipeline — getting clean input — is where most of the real work lives.

Multi-provider AI architecture

Early on I made a decision that proved critical: build a provider-agnostic AI layer. Instead of coupling the system to a single LLM, Podhoc supports Google Gemini, OpenAI, Grok, and DeepSeek through a unified ProviderClient interface.

This was not premature abstraction. The AI landscape shifts fast. Models get deprecated, pricing changes overnight, and quality varies wildly across languages. Having the ability to swap providers per-request — or even per-user — turned out to be essential.

The same pattern applies to text-to-speech. Different voice engines handle different languages better. Spanish narration from one provider sounds robotic while another nails the intonation. The multi-provider architecture lets Podhoc route intelligently.
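The source names a unified ProviderClient interface; the method shape below is my guess at a minimal version, with a language-aware routing table for TTS on top. Everything except the name ProviderClient is an assumption:

```python
from typing import Protocol

class ProviderClient(Protocol):
    """Minimal provider-agnostic surface (method name is an assumption)."""
    def generate(self, prompt: str) -> str: ...

class GeminiClient:
    def generate(self, prompt: str) -> str:
        return f"[gemini] {prompt}"   # a real client would call the Gemini API

class OpenAIClient:
    def generate(self, prompt: str) -> str:
        return f"[openai] {prompt}"   # a real client would call the OpenAI API

# Swappable per request, or per user, by changing one lookup key.
PROVIDERS: dict[str, ProviderClient] = {
    "gemini": GeminiClient(),
    "openai": OpenAIClient(),
}

def generate_script(provider: str, prompt: str) -> str:
    return PROVIDERS[provider].generate(prompt)

# Hypothetical per-language TTS routing: pick the engine that handles
# a language's intonation best, fall back to a default.
TTS_BY_LANGUAGE = {"es": "engine_b", "en": "engine_a"}

def pick_tts(language: str, default: str = "engine_a") -> str:
    return TTS_BY_LANGUAGE.get(language, default)
```

Because callers only depend on the Protocol, deprecating a model or adding a provider touches the registry, not the pipeline.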

The orchestration nightmare

A podcast generation is not a single API call. It is a pipeline:

  1. Validate the source
  2. Fetch the transcript (which might take seconds or minutes depending on the source)
  3. Generate the script (LLM calls with retries and fallbacks)
  4. Split the script into audio units
  5. Generate audio for each unit
  6. Transcode and concatenate the final audio
  7. Update credits, notify the user

AWS Step Functions orchestrate this, but designing the state machine was its own puzzle. What happens when step 3 fails halfway through? What if the user runs out of credits between step 2 and step 3? What if the TTS provider returns a 429 at step 5?

Every failure mode needed a graceful path. Partial results needed cleanup. Credits needed to be refunded. The user needed to know what happened without seeing a stack trace.
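One way to make those failure paths concrete is a cleanup handler that a Step Functions Catch state routes failures to. The event shape, helper structures, and field names below are invented for illustration:

```python
def cleanup_failed_generation(event: dict, ledger: list, storage: dict) -> dict:
    """Hypothetical catch-all cleanup: delete partial artifacts, refund
    credits as a new ledger entry, and return a user-safe error summary."""
    job_id = event["job_id"]

    # Remove any partial audio units uploaded before the failure.
    for key in [k for k in storage if k.startswith(f"{job_id}/")]:
        del storage[key]

    # Refund by appending a compensating entry, never by mutating history.
    spent = sum(-e["amount"] for e in ledger
                if e["job_id"] == job_id and e["amount"] < 0)
    if spent:
        ledger.append({"job_id": job_id, "amount": spent, "reason": "refund"})

    # Surface a human-readable reason, never a stack trace.
    return {"job_id": job_id, "status": "failed",
            "message": event.get("error", "Generation failed; credits refunded.")}
```

In a real state machine, `storage` would be S3 and `ledger` a database table; the point is that cleanup is one state every failure funnels into, not logic scattered across seven steps.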

The database migration trap

One memorable incident: I migrated all primary keys from integer IDs to UUIDs. The migration ran cleanly. Then a new feature branch tried to add a table with user_id INTEGER REFERENCES users(id) — and the foreign key constraint blew up because users.id was now a UUID.

The fix was trivial. The lesson was not: always verify the current state of referenced columns, especially after schema migrations. Never assume.
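A cheap guard against that trap is checking the referenced column's current type before writing the migration. The query is standard Postgres information_schema SQL; the helper around it is a sketch with invented names:

```python
# Query a migration script might run first (Postgres information_schema):
FK_TYPE_QUERY = """
SELECT data_type
FROM information_schema.columns
WHERE table_name = %s AND column_name = %s
"""

def check_fk_type(columns: dict[str, str], ref_table: str, ref_column: str,
                  declared_type: str) -> None:
    """Fail fast if a new foreign key's type no longer matches the
    referenced column, e.g. after an integer-to-UUID migration.
    `columns` maps "table.column" to its current data_type, as read
    from information_schema before writing the migration."""
    actual = columns[f"{ref_table}.{ref_column}"]
    if actual.lower() != declared_type.lower():
        raise TypeError(
            f"{ref_table}.{ref_column} is {actual}, not {declared_type}; "
            "update the new column's type to match."
        )
```

Running this in CI against each environment's schema turns "never assume" from a resolution into a failing check.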

Authentication complexity

Podhoc supports email/password and social login (Google, Apple) through AWS Cognito. Sounds standard. Then you hit the edge cases.

What happens when a user signs up with Google, then later tries to sign in with the same email via password? Account linking. Custom Cognito attributes for tracking provider types and linking state. Pre-sign-up Lambda triggers that silently failed because the custom attributes did not exist in the Cognito schema yet.

The debugging was painful because Cognito errors are opaque. A user would fail to register and the only signal was a missing group assignment downstream. Tracing back to a missing custom:provider_type attribute took longer than building the feature.
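Cognito's pre-sign-up trigger receives an event carrying request.userAttributes and returns the same event with response flags set. The handler below sketches the guard I now wish every trigger had: checking that the custom attribute actually exists before depending on it. The linking logic is simplified to illustration:

```python
def pre_sign_up_handler(event: dict, context=None) -> dict:
    """Sketch of a Cognito pre-sign-up trigger. Triggers must return the
    event object; raising aborts sign-up with an opaque error, which is
    exactly the debugging pain described above."""
    attrs = event["request"]["userAttributes"]

    # Guard: if custom:provider_type is missing from the pool schema,
    # it simply won't appear here and downstream logic fails silently.
    provider_type = attrs.get("custom:provider_type")
    if provider_type is None:
        # Log loudly instead of failing silently downstream.
        print(f"WARNING: custom:provider_type missing for {attrs.get('email')}")
        provider_type = "unknown"

    # Illustrative linking decision: social sign-ups get auto-confirmed,
    # password sign-ups go through normal verification.
    if provider_type in ("google", "apple"):
        event["response"]["autoConfirmUser"] = True
        event["response"]["autoVerifyEmail"] = True
    return event
```

The `autoConfirmUser` and `autoVerifyEmail` response fields are real Cognito trigger outputs; the explicit missing-attribute warning is the part that would have saved the tracing time described above.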

Going serverless (for real)

The entire backend is serverless: Lambda functions, API Gateway, Step Functions, S3, RDS. No EC2 instances. No containers. No servers to patch at 3am.

The trade-off is cold starts and the Lambda execution model. Each function is isolated. Database connections need careful management — reused across invocations but scoped within a single Lambda instance. The common layer pattern (shared utilities across all functions) required its own build pipeline.
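The connection-reuse pattern is standard Lambda practice: anything at module scope survives across warm invocations of the same instance, so the connection is created once per container rather than once per call. A minimal sketch, with sqlite3 standing in for the RDS driver so it runs anywhere:

```python
import sqlite3

# Module scope persists across warm invocations of one Lambda instance.
_connection = None

def get_connection():
    global _connection
    if _connection is None:
        # In production this would be the RDS driver (e.g. psycopg2);
        # an in-memory sqlite3 database stands in to keep the sketch runnable.
        _connection = sqlite3.connect(":memory:")
    return _connection

def handler(event, context=None):
    conn = get_connection()  # created on cold start, reused on warm starts
    row = conn.execute("SELECT 1").fetchone()
    return {"ok": row[0] == 1}
```

The lazy initializer keeps cold starts from paying the connection cost before it is needed, while every warm invocation gets the cached handle.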

Deployment is handled through GitHub Actions. Infrastructure through Terraform. Every environment (dev, prod) has its own state, its own config files, its own Cognito pools. The CI/CD pipeline deploys Lambdas, web apps, database migrations, and infrastructure changes independently.

Credits and payments

Podhoc uses a credits-based model with Stripe. Users purchase credits, and each podcast generation deducts based on the source length and model used.

The credits ledger is immutable — every transaction is an append-only entry. This makes debugging straightforward and auditing trivial. Refunds on failed generations are automatic. The ledger tells the complete story.
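An append-only ledger reduces to two rules: every event is a new row, and the balance is always derived by folding over history. A minimal sketch with invented field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LedgerEntry:
    user_id: str
    amount: int   # positive = purchase or refund, negative = deduction
    reason: str   # e.g. "purchase", "generation", "refund"

def balance(ledger: list, user_id: str) -> int:
    # Never stored separately: the balance is recomputed from history,
    # so the ledger and the balance can never disagree.
    return sum(e.amount for e in ledger if e.user_id == user_id)

def deduct(ledger: list, user_id: str, cost: int, reason: str) -> None:
    if balance(ledger, user_id) < cost:
        raise ValueError("insufficient credits")
    ledger.append(LedgerEntry(user_id, -cost, reason))
```

A refund is just another append with a positive amount, which is why failed generations are easy to compensate and every debugging session starts from a complete story.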

What I would do differently

Start with fewer providers. The multi-provider architecture was the right call, but supporting four LLM providers and multiple TTS engines from day one spread testing and debugging thin. Ship with one, prove the concept, then expand.

Invest in observability earlier. Custom structured logging saved me repeatedly, but I wish I had set up proper distributed tracing from the start. Following a request through API Gateway → Lambda → Step Functions → multiple Lambdas → S3 is painful without it.

Simplify the auth flow. Social login account linking is a UX minefield. If I started over, I would pick one auth strategy and do it exceptionally well before adding complexity.

The state of Podhoc today

Podhoc is live, processing content, and generating podcasts. The platform handles multiple languages, multiple AI providers, and multiple content source types. It runs on a serverless infrastructure that scales to zero when idle and handles bursts without intervention.

Building it as a solo founder meant wearing every hat: product, engineering, infrastructure, design, and support. It is exhausting and clarifying in equal measure. Every decision has a direct consequence. Every shortcut surfaces eventually.

The challenge of building a SaaS is not any single technical problem. It is the compound complexity of solving hundreds of them while keeping the product coherent and the user experience simple.

Podhoc is far from done. But it works, it ships, and it turns URLs into podcasts. That is the point.