Most file conversion tools upload your files to a remote server, process them, and send them back. That means your data leaves your device — which can be a problem when you're working with sensitive documents.

I wanted to explore a more privacy-friendly approach: do as much as possible directly in the browser, and only fall back to server processing when the web platform can't do the job well. This "Hybrid (Browser-First)" model is what I used to build FastlyConvert — a multi-format conversion & compression suite covering PDF, image, video, and audio.

The Architecture: What Runs Where

Not every conversion can happen client-side. Browsers are great at image operations, but heavier workloads (e.g., large video transcoding, speech recognition) still require server compute for good UX and reliability.

Here's how I split the work today:

Conversion Type | Where it Runs | Technology
Image format (JPG, PNG, WebP) | 100% browser | Canvas API + toBlob()
Image resize / compress | 100% browser | Canvas API + OffscreenCanvas
HEIC to JPG/PNG | 100% browser | WebAssembly (heic2any)
PDF to Word / Excel / PPT | Server-side | Custom parser + layout engine
Video compression / conversion | Server-side | FFmpeg
Audio format conversion | Server-side | FFmpeg
Audio/Video to Text | Server-side | Whisper (speech-to-text)
Text to Speech | Server-side | OpenAI TTS (MP3 output)

Rule of thumb: if the Web API can handle it natively, keep it in the browser. Everything else goes to the server with strict privacy controls (e.g., temporary storage + auto-deletion).
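That rule of thumb can be encoded as a simple routing table that mirrors the table above. This is an illustrative sketch — the names `ROUTES` and `routeFor` are mine, not FastlyConvert's actual code:

```javascript
// Illustrative routing table: browser-first, server fallback.
// Task names and the routeFor() helper are hypothetical.
const ROUTES = {
  "image-format": "browser",  // Canvas API + toBlob()
  "image-resize": "browser",  // Canvas / OffscreenCanvas
  "heic": "browser",          // WASM (heic2any)
  "pdf-to-office": "server",  // custom parser + layout engine
  "video": "server",          // FFmpeg
  "audio": "server",          // FFmpeg
  "transcribe": "server",     // Whisper
  "tts": "server",            // OpenAI TTS
};

function routeFor(task) {
  // Default unknown tasks to the server — safer than attempting
  // an unsupported conversion in the browser and failing late.
  return ROUTES[task] ?? "server";
}
```

The useful property of a table like this is that a task can migrate from `"server"` to `"browser"` as the web platform improves, without touching the rest of the pipeline.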

Client-Side Image Conversion with Canvas API

The simplest conversion — say JPG to PNG — needs surprisingly little code:

async function convertImage(file, targetFormat) {
  const img = new Image();
  const url = URL.createObjectURL(file);

  return new Promise((resolve, reject) => {
    img.onload = () => {
      const canvas = document.createElement("canvas");
      canvas.width = img.naturalWidth;
      canvas.height = img.naturalHeight;

      const ctx = canvas.getContext("2d");
      ctx.drawImage(img, 0, 0);

      canvas.toBlob(
        (blob) => {
          URL.revokeObjectURL(url);
          // toBlob() yields null if encoding fails
          blob ? resolve(blob) : reject(new Error("Encoding failed"));
        },
        `image/${targetFormat}`, // e.g. "image/png", "image/webp"
        0.92 // quality — only honored by lossy encoders (JPEG/WebP)
      );
    };

    img.onerror = () => {
      URL.revokeObjectURL(url);
      reject(new Error("Could not decode image"));
    };

    img.src = url;
  });
}

Key gotcha: the quality parameter only applies to lossy encoders, i.e. JPEG and WebP. PNG is always lossless — the quality argument is simply ignored for PNG output.
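A second gotcha hides in the `image/${targetFormat}` template: toBlob() expects a real MIME subtype, and silently falls back to PNG for types it doesn't recognize — so passing "jpg" produces the invalid type "image/jpg" and you get a PNG back. A small helper (the name `formatToMime` is mine) normalizes extensions first:

```javascript
// Map common extensions to canonical MIME types before calling toBlob().
// "jpg" must become "image/jpeg" — "image/jpg" is not a registered type,
// and canvas.toBlob() silently falls back to PNG for unknown types.
function formatToMime(format) {
  const map = {
    jpg: "image/jpeg",
    jpeg: "image/jpeg",
    png: "image/png",
    webp: "image/webp",
  };
  return map[format.toLowerCase()] ?? "image/png";
}
```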

You can see this exact approach in action with FastlyConvert's image converter — JPG, PNG, and WebP conversions all happen client-side with zero server upload.

Handling HEIC (iPhone Photos) in the Browser

HEIC has been the default photo format on iPhones for years, but most browsers can't decode HEIC natively. For this, a WebAssembly approach works well.

I used heic2any (WASM-based):

import heic2any from "heic2any";

async function convertHeic(file) {
  // heic2any decodes HEIC via a WebAssembly decoder, entirely client-side
  const blob = await heic2any({
    blob: file,
    toType: "image/jpeg",
    quality: 0.92,
  });

  return blob;
}

This runs entirely in the browser — no server upload needed — which is exactly the kind of task where browser-first shines. This powers FastlyConvert's HEIC to JPG converter — your iPhone photos never leave the browser.

Server-Side: Video/Audio Processing with FFmpeg

For large video compression and audio transcoding, the browser can do it in theory (WASM FFmpeg exists), but in practice it's often:

  • too slow on low-end devices,
  • too memory-heavy for big files,
  • and hard to make reliable across browsers.

So I run FFmpeg server-side for video/audio tasks, and focus on:

  • clear presets (e.g., quality vs size),
  • predictable outputs (e.g., MP4 H.264 as the default),
  • and privacy policies (auto-deletion).

Example UX pattern that worked well: give users 3-4 "compression modes" (High Quality / Balanced / Max Compress) instead of asking them to tune bitrate and CRF on day one. FastlyConvert's video compressor uses exactly this approach.
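Those compression modes boil down to a mapping from a user-facing label to FFmpeg arguments. Here's a sketch of that idea — the CRF and preset values are illustrative assumptions, not FastlyConvert's production settings:

```javascript
// Map user-facing compression modes to FFmpeg H.264 arguments.
// CRF/preset values are illustrative; tune for your own quality/size targets.
const PRESETS = {
  "high-quality": { crf: 18, preset: "slow" },
  "balanced":     { crf: 23, preset: "medium" },
  "max-compress": { crf: 28, preset: "fast" },
};

function ffmpegArgs(input, output, mode) {
  const p = PRESETS[mode] ?? PRESETS["balanced"];
  return [
    "-i", input,
    "-c:v", "libx264",      // predictable default: MP4 + H.264
    "-crf", String(p.crf),  // constant rate factor: lower = better quality
    "-preset", p.preset,    // encoder speed/efficiency tradeoff
    "-c:a", "aac",
    output,
  ];
}
```

Users pick a word; the server picks the flags. Nobody has to learn what CRF means on day one.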

Server-Side: AI Transcription with Whisper (Audio/Video to Text)

For speech recognition, browser-only options still don't match Whisper's quality and language coverage at scale.

The key architectural decisions I made:

  1. Process only what the user requests (no extra analysis).
  2. Auto-delete uploaded files after a short retention window.
  3. Keep the API simple, and return clean text + language metadata.

Pseudo-code sketch:

@app.post("/api/transcribe")
async def transcribe(file: UploadFile):
    temp_path = save_temp(file)

    # schedule deletion
    schedule_delete(temp_path, hours=24)

    result = whisper_model.transcribe(
        temp_path,
        task="transcribe",
        language=detect_language(temp_path)
    )

    return {"text": result["text"], "language": result["language"]}

You can test this with FastlyConvert's speech-to-text tool, which supports 90+ languages and exports to TXT, SRT, or VTT.
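The SRT export mentioned above is mostly a matter of formatting segment timestamps. A minimal sketch, assuming segments shaped like Whisper's `{start, end, text}` output (the helper names are mine):

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const pad = (n, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Turn Whisper-style segments ({start, end, text}) into an SRT document.
function toSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```

VTT is nearly identical — a `WEBVTT` header and dots instead of commas in the timestamps — which is why offering all three export formats costs almost nothing once you have the segments.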

Server-Side: PDF Conversion — Why It Can't Be Client-Side (Yet)

PDF is not a "document" in the way most people think. It's closer to a printed page frozen in digital form — every character is placed at exact X/Y coordinates. There are no "paragraphs" or "tables" in the Word sense, just positioned text fragments and drawn lines.

Converting PDF to Word means reverse-engineering the visual layout back into a flow-based document model. That requires:

  • Table detection (recognizing grid patterns from drawn lines)
  • Font extraction or substitution
  • Column boundary detection
  • OCR for scanned pages

This is compute-heavy and relies on libraries that don't run well in browsers. The tradeoff: server-side processing, with the same auto-deletion policy as the other server tools.
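To make the problem concrete: a PDF text extractor hands you fragments with X/Y coordinates, and the converter's first job is guessing which fragments belong to the same line. A toy sketch of that step (the fragment shape `{x, y, str}` and the y-tolerance are assumptions for illustration — real converters handle rotation, font metrics, and much more):

```javascript
// Group positioned text fragments ({x, y, str}) into reading-order lines.
// Fragments whose baselines are within `tol` units are treated as one line;
// within a line, fragments are sorted left to right. PDF's y-axis grows
// upward, so we sort by descending y to read top to bottom.
function groupIntoLines(fragments, tol = 2) {
  const sorted = [...fragments].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines = [];
  for (const frag of sorted) {
    const line = lines.find((l) => Math.abs(l.y - frag.y) <= tol);
    if (line) line.frags.push(frag);
    else lines.push({ y: frag.y, frags: [frag] });
  }
  return lines.map((l) =>
    l.frags.sort((a, b) => a.x - b.x).map((f) => f.str).join(" ")
  );
}
```

Even this toy version shows why the problem is hard: the tolerance is a heuristic, and every heuristic (line grouping, column splits, table borders) compounds with the others.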

i18n: Supporting 7 Languages Without a Framework

FastlyConvert is plain HTML + vanilla JS (no React/Next). For i18n, I used a simple attribute-based approach:

const dict = translations[currentLang] || translations.en;

document.querySelectorAll("[data-i18n]").forEach((el) => {
  const key = el.getAttribute("data-i18n");
  el.textContent = dict[key] || el.textContent; // fall back to the markup's default text
});

This supports: English (en), French (fr), Japanese (ja), Spanish (es), Portuguese (pt), Simplified Chinese (zh-CN), Traditional Chinese (zh-TW).
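Behind the loop is just a nested dictionary. A minimal sketch of the assumed shape, with a lookup that falls back to English and then to the key itself, so a missing translation never blanks out the UI (keys and helper name are illustrative):

```javascript
// Assumed shape of the translations dictionary (keys are illustrative).
const translations = {
  en: { "hero.title": "Convert files in your browser" },
  fr: { "hero.title": "Convertissez vos fichiers dans le navigateur" },
};

// Resolve a key for a language: target language → English → the key itself.
function t(lang, key) {
  return translations[lang]?.[key] ?? translations.en?.[key] ?? key;
}
```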

No build step. No runtime framework. Just static pages + lightweight scripts.

Performance: Why No Build Step

Yes — it's "old school," but it's extremely effective for SEO-driven tool pages:

  • Users land directly on specific tools (e.g., /video-compressor, /text-to-speech)
  • Static files are edge-cached (Vercel/CDN)
  • No framework hydration overhead
  • Fewer moving parts = fewer deploy surprises

The tradeoff is duplication across many HTML pages, but for a conversion suite where speed + SEO matter most, it's a tradeoff I'm happy with.

Key Takeaways

  1. Use Canvas API for common image conversions — no server needed.
  2. WebAssembly (like heic2any) unlocks formats browsers can't decode natively.
  3. For heavy tasks, a hybrid browser-first approach gives better UX than "all-server" or "all-WASM".
  4. Auto-deletion policies are non-negotiable for file processing products.
  5. Vanilla HTML/JS can still win on performance for tool-style websites.
  6. PDF conversion is genuinely hard — the format gap between PDF and Word is much wider than most people realize.

Try It Out

See these patterns in production:

All free to use, no account required.