Most file conversion tools upload your files to a remote server, process them, and send them back. That means your data leaves your device — which can be a problem when you're working with sensitive documents.

I wanted to explore a more privacy-friendly approach: do as much as possible directly in the browser, and only fall back to server processing when the web platform can't do the job well. This "Hybrid (Browser-First)" model is what I used to build FastlyConvert — a multi-format conversion & compression suite covering PDF, image, video, and audio.

The Architecture: What Runs Where

Not every conversion can happen client-side. Browsers are great at image operations, but heavier workloads (e.g., large video transcoding, speech recognition) still require server compute for good UX and reliability.

Here's how I split the work today:

Conversion Type | Where it Runs | Technology
Image format (JPG, PNG, WebP) | 100% browser | Canvas API + toBlob()
Image resize / compress | 100% browser | Canvas API + OffscreenCanvas
HEIC to JPG/PNG | 100% browser | WebAssembly (heic2any)
PDF to Word / Excel / PPT | Server-side | Custom parser + layout engine
Video compression / conversion | Server-side | FFmpeg
Audio format conversion | Server-side | FFmpeg
Audio/Video to Text | Server-side | Whisper (speech-to-text)
Text to Speech | Server-side | OpenAI TTS (MP3 output)

Rule of thumb: if the Web API can handle it natively, keep it in the browser. Everything else goes to the server with strict privacy controls (e.g., temporary storage + auto-deletion).
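That rule of thumb can be encoded as a simple routing table that mirrors the table above. This is an illustrative sketch — the names `ROUTES` and `routeFor` are mine, not FastlyConvert's actual code:

```javascript
// Illustrative routing table: browser-first, server fallback.
// Task names and the routeFor() helper are hypothetical.
const ROUTES = {
  "image-format": "browser",  // Canvas API + toBlob()
  "image-resize": "browser",  // Canvas / OffscreenCanvas
  "heic": "browser",          // WASM (heic2any)
  "pdf-to-office": "server",  // custom parser + layout engine
  "video": "server",          // FFmpeg
  "audio": "server",          // FFmpeg
  "transcribe": "server",     // Whisper
  "tts": "server",            // OpenAI TTS
};

function routeFor(task) {
  // Default unknown tasks to the server — safer than attempting
  // an unsupported conversion in the browser and failing late.
  return ROUTES[task] ?? "server";
}
```

The useful property of a table like this is that a task can migrate from `"server"` to `"browser"` as the web platform improves, without touching the rest of the pipeline.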

Client-Side Image Conversion with Canvas API

The simplest conversion — say JPG to PNG — needs surprisingly little code:

async function convertImage(file, targetFormat) {
  const img = new Image();
  const url = URL.createObjectURL(file);

  return new Promise((resolve, reject) => {
    img.onload = () => {
      const canvas = document.createElement("canvas");
      canvas.width = img.naturalWidth;
      canvas.height = img.naturalHeight;

      const ctx = canvas.getContext("2d");
      ctx.drawImage(img, 0, 0);

      canvas.toBlob(
        (blob) => {
          URL.revokeObjectURL(url);
          // toBlob() yields null if encoding fails
          blob ? resolve(blob) : reject(new Error("Encoding failed"));
        },
        `image/${targetFormat}`, // e.g. "image/png", "image/webp"
        0.92 // quality — only honored by lossy encoders (JPEG/WebP)
      );
    };

    img.onerror = () => {
      URL.revokeObjectURL(url);
      reject(new Error("Could not decode image"));
    };

    img.src = url;
  });
}

Key gotcha: the quality parameter only applies to lossy encoders, i.e. JPEG and WebP. PNG is always lossless — the quality argument is simply ignored for PNG output.
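A second gotcha hides in the `image/${targetFormat}` template: toBlob() expects a real MIME subtype, and silently falls back to PNG for types it doesn't recognize — so passing "jpg" produces the invalid type "image/jpg" and you get a PNG back. A small helper (the name `formatToMime` is mine) normalizes extensions first:

```javascript
// Map common extensions to canonical MIME types before calling toBlob().
// "jpg" must become "image/jpeg" — "image/jpg" is not a registered type,
// and canvas.toBlob() silently falls back to PNG for unknown types.
function formatToMime(format) {
  const map = {
    jpg: "image/jpeg",
    jpeg: "image/jpeg",
    png: "image/png",
    webp: "image/webp",
  };
  return map[format.toLowerCase()] ?? "image/png";
}
```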

You can see this exact approach in action with FastlyConvert's image converter — JPG, PNG, and WebP conversions all happen client-side with zero server upload.

Handling HEIC (iPhone Photos) in the Browser

HEIC has been the default photo format on iPhones for years, but most browsers can't decode HEIC natively. For this, a WebAssembly approach works well.

I used heic2any (WASM-based):

import heic2any from "heic2any";

async function convertHeic(file) {
  // heic2any decodes HEIC via a WebAssembly decoder, entirely client-side
  const blob = await heic2any({
    blob: file,
    toType: "image/jpeg",
    quality: 0.92,
  });

  return blob;
}

This runs entirely in the browser — no server upload needed — which is exactly the kind of task where browser-first shines. This powers FastlyConvert's HEIC to JPG converter — your iPhone photos never leave the browser.

Server-Side: Video/Audio Processing with FFmpeg

For large video compression and audio transcoding, the browser can do it in theory (WASM FFmpeg exists), but in practice it's often:

  • too slow on low-end devices,
  • too memory-heavy for big files,
  • and hard to make reliable across browsers.

So I run FFmpeg server-side for video/audio tasks, and focus on:

  • clear presets (e.g., quality vs size),
  • predictable outputs (e.g., MP4 H.264 as the default),
  • and privacy policies (auto-deletion).

Example UX pattern that worked well: give users 3-4 "compression modes" (High Quality / Balanced / Max Compress) instead of asking them to tune bitrate and CRF on day one. FastlyConvert's video compressor uses exactly this approach.
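Those compression modes boil down to a mapping from a user-facing label to FFmpeg arguments. Here's a sketch of that idea — the CRF and preset values are illustrative assumptions, not FastlyConvert's production settings:

```javascript
// Map user-facing compression modes to FFmpeg H.264 arguments.
// CRF/preset values are illustrative; tune for your own quality/size targets.
const PRESETS = {
  "high-quality": { crf: 18, preset: "slow" },
  "balanced":     { crf: 23, preset: "medium" },
  "max-compress": { crf: 28, preset: "fast" },
};

function ffmpegArgs(input, output, mode) {
  const p = PRESETS[mode] ?? PRESETS["balanced"];
  return [
    "-i", input,
    "-c:v", "libx264",      // predictable default: MP4 + H.264
    "-crf", String(p.crf),  // constant rate factor: lower = better quality
    "-preset", p.preset,    // encoder speed/efficiency tradeoff
    "-c:a", "aac",
    output,
  ];
}
```

Users pick a word; the server picks the flags. Nobody has to learn what CRF means on day one.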

Server-Side: AI Transcription with Whisper (Audio/Video to Text)

For speech recognition, browser-only options still don't match Whisper's quality and language coverage at scale.

The key architectural decisions I made:

  1. Process only what the user requests (no extra analysis).
  2. Auto-delete uploaded files after a short retention window.
  3. Keep the API simple, and return clean text + language metadata.

Pseudo-code sketch:

@app.post("/api/transcribe")
async def transcribe(file: UploadFile):
    temp_path = save_temp(file)

    # schedule deletion
    schedule_delete(temp_path, hours=24)

    result = whisper_model.transcribe(
        temp_path,
        task="transcribe",
        language=detect_language(temp_path)
    )

    return {"text": result["text"], "language": result["language"]}

You can test this with FastlyConvert's speech-to-text tool, which supports 90+ languages and exports to TXT, SRT, or VTT.
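The SRT export mentioned above is mostly a matter of formatting segment timestamps. A minimal sketch, assuming segments shaped like Whisper's `{start, end, text}` output (the helper names are mine):

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const pad = (n, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Turn Whisper-style segments ({start, end, text}) into an SRT document.
function toSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```

VTT is nearly identical — a `WEBVTT` header and dots instead of commas in the timestamps — which is why offering all three export formats costs almost nothing once you have the segments.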

Server-Side: PDF Conversion — Why It Can't Be Client-Side (Yet)

PDF is not a "document" in the way most people think. It's closer to a printed page frozen in digital form — every character is placed at exact X/Y coordinates. There are no "paragraphs" or "tables" in the Word sense, just positioned text fragments and drawn lines.

Converting PDF to Word means reverse-engineering the visual layout back into a flow-based document model. That requires:

  • Table detection (recognizing grid patterns from drawn lines)
  • Font extraction or substitution
  • Column boundary detection
  • OCR for scanned pages

This is compute-heavy and relies on libraries that don't run well in browsers. The tradeoff: server-side processing, with the same auto-deletion policy as the other server tools.
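To make the problem concrete: a PDF text extractor hands you fragments with X/Y coordinates, and the converter's first job is guessing which fragments belong to the same line. A toy sketch of that step (the fragment shape `{x, y, str}` and the y-tolerance are assumptions for illustration — real converters handle rotation, font metrics, and much more):

```javascript
// Group positioned text fragments ({x, y, str}) into reading-order lines.
// Fragments whose baselines are within `tol` units are treated as one line;
// within a line, fragments are sorted left to right. PDF's y-axis grows
// upward, so we sort by descending y to read top to bottom.
function groupIntoLines(fragments, tol = 2) {
  const sorted = [...fragments].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines = [];
  for (const frag of sorted) {
    const line = lines.find((l) => Math.abs(l.y - frag.y) <= tol);
    if (line) line.frags.push(frag);
    else lines.push({ y: frag.y, frags: [frag] });
  }
  return lines.map((l) =>
    l.frags.sort((a, b) => a.x - b.x).map((f) => f.str).join(" ")
  );
}
```

Even this toy version shows why the problem is hard: the tolerance is a heuristic, and every heuristic (line grouping, column splits, table borders) compounds with the others.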

i18n: Supporting 7 Languages Without a Framework

FastlyConvert is plain HTML + vanilla JS (no React/Next). For i18n, I used a simple attribute-based approach:

const dict = translations[currentLang] || translations.en;

document.querySelectorAll("[data-i18n]").forEach((el) => {
  const key = el.getAttribute("data-i18n");
  el.textContent = dict[key] || el.textContent; // fall back to the markup's default text
});

This supports: English (en), French (fr), Japanese (ja), Spanish (es), Portuguese (pt), Simplified Chinese (zh-CN), Traditional Chinese (zh-TW).
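Behind the loop is just a nested dictionary. A minimal sketch of the assumed shape, with a lookup that falls back to English and then to the key itself, so a missing translation never blanks out the UI (keys and helper name are illustrative):

```javascript
// Assumed shape of the translations dictionary (keys are illustrative).
const translations = {
  en: { "hero.title": "Convert files in your browser" },
  fr: { "hero.title": "Convertissez vos fichiers dans le navigateur" },
};

// Resolve a key for a language: target language → English → the key itself.
function t(lang, key) {
  return translations[lang]?.[key] ?? translations.en?.[key] ?? key;
}
```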

No build step. No runtime framework. Just static pages + lightweight scripts.

Performance: Why No Build Step

Yes — it's "old school," but it's extremely effective for SEO-driven tool pages:

  • Users land directly on specific tools (e.g., /video-compressor, /text-to-speech)
  • Static files are edge-cached (Vercel/CDN)
  • No framework hydration overhead
  • Fewer moving parts = fewer deploy surprises

The tradeoff is duplication across many HTML pages, but for a conversion suite where speed + SEO matter most, it's a tradeoff I'm happy with.

Key Takeaways

  1. Use Canvas API for common image conversions — no server needed.
  2. WebAssembly (like heic2any) unlocks formats browsers can't decode natively.
  3. For heavy tasks, a hybrid browser-first approach gives better UX than "all-server" or "all-WASM".
  4. Auto-deletion policies are non-negotiable for file processing products.
  5. Vanilla HTML/JS can still win on performance for tool-style websites.
  6. PDF conversion is genuinely hard — the format gap between PDF and Word is much wider than most people realize.

Try It Out

See these patterns in production:

All free to use, no account required.