XConvert

Convert TTML to VTT Online

Convert a TTML subtitle file to WebVTT (VTT) quickly with a simple, browser-based converter.


How to Convert TTML to VTT Online

  1. Upload Your TTML File: Drag and drop a .ttml or .xml file (Netflix delivers TTML1 with either extension) or click "Add Files" to pick from disk. Paste the XML directly into the textarea if you only have the markup. Batch is supported — drop in a full season's worth of episode subtitles and each track converts in parallel.
  2. Confirm the Output Format: The target is fixed to .vtt (MIME type text/vtt), so there is nothing to choose. Conversion runs entirely in your browser session: no upload to a server, no account, no watermark, and the source TTML never leaves your device.
  3. Review the Generated Cues (Optional): Inspect the preview pane to confirm timings landed at millisecond precision (HH:MM:SS.mmm), line breaks survived the XML-to-text flattening, and any <br/>, <span>, or styling tags collapsed cleanly. Edit cues inline before downloading if you need to fix line wraps or correct a typo.
  4. Convert and Download: Click Convert. The output is a UTF-8 .vtt file with the required WEBVTT header, ready to drop into an HTML5 <track> element or upload to YouTube, Vimeo, Mux, JW Player, or any HLS/DASH packager.
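To make the steps above concrete, here is a minimal Python sketch of what such a conversion involves. It is not XConvert's implementation (that runs as JavaScript in the browser); the function names are hypothetical, and it assumes every <p> carries begin and end attributes in clock-time form (HH:MM:SS.fraction) with no frame-based timing, styling, or regions.

```python
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"  # TTML content namespace


def esc(text: str) -> str:
    """Escape characters that have special meaning in WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def clock_to_vtt(value: str) -> str:
    """Convert a TTML clock-time such as '00:00:01.5' to 'HH:MM:SS.mmm'."""
    hours, minutes, seconds = value.split(":")
    total_ms = round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def cue_text(elem) -> str:
    """Flatten a <p>: keep text, turn <br/> into newlines, drop other tags."""
    parts = [esc(elem.text or "")]
    for child in elem:
        if child.tag == TT_NS + "br":
            parts.append("\n")
        else:
            parts.append(cue_text(child))
        parts.append(esc(child.tail or ""))
    return "".join(parts)


def ttml_to_vtt(ttml: str) -> str:
    """Emit one WebVTT cue per TTML <p>, preceded by the WEBVTT header."""
    root = ET.fromstring(ttml)
    blocks = ["WEBVTT"]
    for p in root.iter(TT_NS + "p"):
        start = clock_to_vtt(p.attrib["begin"])  # assumes begin/end are present
        end = clock_to_vtt(p.attrib["end"])
        blocks.append(f"{start} --> {end}\n{cue_text(p).strip()}")
    return "\n\n".join(blocks) + "\n"
```

Calling ttml_to_vtt() on the text of a .ttml file returns a string you can write out as UTF-8 with a .vtt extension.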

Why Convert TTML to VTT?

TTML (Timed Text Markup Language) is the W3C XML-based subtitle standard used by professional broadcast and streaming workflows. Netflix mandates TTML1 delivery, Disney+ uses the IMSC 1.1 profile of TTML, and Amazon Prime Video accepts TTML variants. Browsers, however, don't render TTML natively — the HTML5 <track> element only accepts WebVTT (.vtt, MIME text/vtt), the W3C web-captioning format that started as a WHATWG draft in 2010. WebVTT now sits at roughly 96% global browser support per caniuse, including Chrome 23+, Firefox 31+, Safari 6+, Edge 12+, and every modern mobile browser. Converting TTML to VTT bridges the broadcast pipeline to the open web.

  • HTML5 video and the <track> element: Native browser captioning works only with .vtt. Drop the converted file into <track kind="subtitles" src="movie.vtt" srclang="en" label="English"> and every modern browser renders it without JavaScript.
  • Streaming packagers (HLS, DASH, CMAF): AWS Elemental MediaConvert, Bitmovin Encoder, Mux, and Wowza all consume WebVTT for subtitle sidecars referenced from the manifest. HLS in particular carries subtitles as segmented WebVTT (#EXT-X-MEDIA:TYPE=SUBTITLES), with IMSC1 in fragmented MP4 as the only alternative.
  • YouTube and Vimeo uploads: Both platforms accept WebVTT directly; YouTube's "Add subtitles" upload pipeline recognises .vtt immediately while TTML often needs a workaround (rename to .dfxp or strip namespaces).
  • VOD and OTT to web: If you have Netflix-style TTML1 from a localisation vendor and need a web preview, sample player, or marketing reel, VTT is the only format the native <track> element will load.
  • Removing XML overhead: A 30-minute TTML file with full styling and region metadata can be 200–400 KB; the equivalent WebVTT is typically 30–60 KB. The smaller payload means caption tracks download faster.
  • Editing in lightweight tools: VTT is plain text; you can hand-edit in VS Code, Sublime, or any text editor. TTML's nested XML is friendlier to validators than to humans.

TTML vs WebVTT — Format Comparison

| Property | TTML | WebVTT |
| --- | --- | --- |
| Standardiser | W3C (Timed Text Working Group); TTML1 a Recommendation since 2010, TTML2 since 2018 | W3C Candidate Recommendation Draft; originated at WHATWG in 2010 |
| File extension | .ttml, .xml, .dfxp (older alias) | .vtt |
| MIME type | application/ttml+xml | text/vtt |
| Underlying syntax | XML: verbose, namespaced, schema-validated | Plain text with WEBVTT header and cue blocks |
| Timing precision | Frames, ticks, clock-time, fractional seconds; needs ttp:frameRate for frame-based | Milliseconds only (HH:MM:SS.mmm); hours optional |
| Styling | Full inline + referenced <style> blocks; tts:color, tts:fontSize, region-based layout in %, em, cell, px | CSS ::cue pseudo-element + inline tags (<b>, <i>, <c.class>, <v Speaker>) |
| Positioning | Regions (top/bottom/custom) via tts:extent, tts:origin, tts:displayAlign | line, position, align, size, vertical settings on each cue |
| Hierarchy | Yes: <div> containers, timeContainer="seq", inherited styles | Flat: every cue is independent |
| Voice/speaker labels | <span tts:fontStyle="italic">Speaker:</span> ... convention | First-class <v Speaker Name> voice tag |
| Browser support in <track> | None | Chrome 23+, Firefox 31+, Safari 6+, Edge 12+, iOS Safari, Android Chrome (~96% global per caniuse) |
| Primary users | Netflix (TTML1), Disney+ (IMSC 1.1), Amazon Prime, broadcast captions delivery | HTML5 video players, HLS/DASH packagers, YouTube/Vimeo upload, web embeds |

What Survives the Conversion — And What Doesn't

The W3C maintains an official TTML to WebVTT Mapping spec describing the conversion algorithm. Some TTML constructs map cleanly; others have no WebVTT equivalent and are dropped or approximated.

| TTML feature | Maps to WebVTT? | Notes |
| --- | --- | --- |
| <p> paragraphs and timing | Yes (direct cue mapping) | Each <p> becomes one WebVTT cue with start/end times |
| <br/> line breaks | Yes | Preserved as literal newlines inside the cue |
| Text content + <span> flattening | Yes | Inherited styles collapse to inline tags |
| Bold / italic / underline | Yes | tts:fontWeight, tts:fontStyle, tts:textDecoration → <b>, <i>, <u> |
| Color and font-family | Partial | Carried over via a WebVTT STYLE block + ::cue(.cssClass) selectors |
| Frame-based or tick timing | Yes (converted) | Tool reads ttp:frameRate and emits millisecond timestamps; rounding to the nearest 1 ms is normal |
| Region positioning (%) | Approximate | Maps to line and position cue settings; pixel/em region values are rebased to % |
| timeContainer="seq" | Yes (flattened) | Cumulative child durations computed before emit |
| Sequential <div> containers | Yes (flattened) | Ancestor styles propagated to each leaf <p> |
| xml:lang per-element | Yes | Pushed into wrapper spans where needed; document-level xml:lang corresponds to the srclang attribute you set on <track> |
| Overlapping regions | Lossy | WebVTT applies automatic collision avoidance; intentional overlaps may shift |
| Ruby annotations | Limited | WebVTT defines <ruby> and <rt> cue tags, but player support is uneven; phonetic text may be dropped or inlined in parentheses |
| opacity, padding, zIndex | No | Not representable in WebVTT; silently dropped |
| Multiple language tracks in one file | No | TTML allows multiplexed languages; WebVTT requires one file per language track |
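The bold / italic / underline row can be illustrated with a short Python sketch. It is an assumption-laden simplification rather than the converter's actual code: the render() helper and STYLE_TAGS table are hypothetical names, only inline tts: attributes are handled (referenced style="..." ids and inheritance are ignored), and cue-text escaping is left out for brevity.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"           # TTML content namespace
TTS = "{http://www.w3.org/ns/ttml#styling}"  # TTML styling namespace

# (attribute, value that triggers the mapping, WebVTT cue tag)
STYLE_TAGS = [
    (TTS + "fontWeight", "bold", "b"),
    (TTS + "fontStyle", "italic", "i"),
    (TTS + "textDecoration", "underline", "u"),
]


def render(elem) -> str:
    """Flatten an element to cue text, wrapping styled content in cue tags."""
    inner = elem.text or ""
    for child in elem:
        inner += "\n" if child.tag == TT + "br" else render(child)
        inner += child.tail or ""
    for attr, value, tag in STYLE_TAGS:
        if elem.get(attr) == value:
            inner = f"<{tag}>{inner}</{tag}>"
    return inner
```

For a paragraph containing <span tts:fontStyle="italic">never</span>, render() yields <i>never</i> inside the surrounding text; features from the "No" rows simply have no entry in the table and fall away.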

Frequently Asked Questions

Will my Netflix or Disney+ TTML file convert correctly?

Yes. Netflix delivers TTML1 with percentage-based positioning, tts:textAlign, and tts:displayAlign — all of which map to WebVTT cue settings (line, position, align). Disney+'s IMSC 1.1 profile is a constrained TTML subset that converts even more cleanly because it already restricts itself to web-renderable styling. Amazon Prime's TTML variants behave the same way. The conversion preserves timing to the millisecond and keeps line breaks, bold/italic, and per-speaker voice labels. Region overlaps and ruby annotations are the two features most likely to need a manual touch-up afterward.

Why does the WebVTT file start with WEBVTT?

The W3C WebVTT 1 specification requires the first line of any conforming .vtt file to be the literal string WEBVTT (optionally followed by a space or tab and an arbitrary description). HTML5 video players reject files without this header — they treat the file as invalid and refuse to display captions. Our converter writes the header automatically; if you ever edit a .vtt file by hand, leave that first line alone.
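If you hand-edit files, a quick check along these lines can catch a damaged header before a player does. This is a small Python sketch with a hypothetical function name; it tests only the first line and does not validate the rest of the file.

```python
import re


def has_webvtt_header(data: bytes) -> bool:
    """Return True if the first line is WEBVTT, optionally followed by
    a space or tab and free-form text."""
    text = data.decode("utf-8-sig")  # utf-8-sig tolerates an optional leading BOM
    # WebVTT line terminators are CRLF, CR, or LF.
    first_line = re.split(r"\r\n|\r|\n", text, maxsplit=1)[0]
    return first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t"))
```

An SRT file renamed to .vtt fails this check, which is a common reason a caption track silently refuses to load.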

How does timing precision differ between the two formats?

TTML supports multiple timing dialects: clock-time (00:01:23.456), offset (83.456s), frames (00:01:23:11 with ttp:frameRate="24"), and ticks. WebVTT supports only HH:MM:SS.mmm (hours optional) at millisecond resolution. Frame-based TTML times are converted using the document's declared ttp:frameRate and ttp:frameRateMultiplier, and drop-frame timecode is honoured. Sub-millisecond TTML precision is rounded to the nearest millisecond, the finest resolution the WebVTT grammar allows; the resulting error of at most 0.5 ms is far too small for a viewer to notice.
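The arithmetic behind these conversions can be sketched in Python as follows. The function names are hypothetical, frame_rate is assumed to be the effective rate (ttp:frameRate multiplied by ttp:frameRateMultiplier), the default tick rate of 1 is an assumption, and drop-frame SMPTE timecode is deliberately not handled here.

```python
import re
from fractions import Fraction


def ttml_time_to_ms(value: str, frame_rate: Fraction = Fraction(30), tick_rate: int = 1) -> int:
    """Convert a TTML time expression to whole milliseconds (non-drop-frame only)."""
    offset = re.fullmatch(r"(\d+(?:\.\d+)?)(h|ms|m|s|f|t)", value)
    if offset:
        # Offset-time: a number followed by a metric, e.g. 83.456s or 2000f
        amount, metric = Fraction(offset.group(1)), offset.group(2)
        seconds = {
            "h": amount * 3600,
            "m": amount * 60,
            "s": amount,
            "ms": amount / 1000,
            "f": amount / frame_rate,
            "t": amount / tick_rate,
        }[metric]
    else:
        # Clock-time: HH:MM:SS(.fraction) or HH:MM:SS:FF
        parts = value.split(":")
        seconds = int(parts[0]) * 3600 + int(parts[1]) * 60 + Fraction(parts[2])
        if len(parts) == 4:
            seconds += Fraction(parts[3]) / frame_rate
    return round(seconds * 1000)  # exact rational arithmetic, rounded once


def ms_to_vtt(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, milli = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{milli:03d}"
```

Using Fraction instead of float keeps intermediate values exact, so the only rounding is the final step to whole milliseconds. For example, frame 11 at 24 fps is 458.33 ms into its second, which is emitted as .458.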

Can I keep speaker labels and styling?

Speaker labels in TTML are usually expressed as <span> prefixes ("JOHN: Where are we going?"). The converter detects common patterns and emits WebVTT <v Speaker> voice tags where it can, otherwise the speaker text remains inline. Bold, italic, and underline map to <b>, <i>, <u> cue tags. Colour and font-family are preserved via an emitted STYLE block at the top of the VTT file using ::cue(.classname) selectors — supported in Chrome, Firefox, Safari 10+, and Edge.
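One plausible way to detect such prefixes is a conservative pattern match, sketched below in Python. The regular expression and the add_voice_tag() name are illustrative assumptions, not the converter's actual rules: here a label counts only if it is an all-caps name at the start of the cue followed by a colon.

```python
import re

# All-caps name (letters, spaces, dots, apostrophes, hyphens), then ": ", then the line.
SPEAKER = re.compile(r"^([A-Z][A-Z .'\-]{0,30}):\s+(.+)$", re.DOTALL)


def add_voice_tag(cue_text: str) -> str:
    """Wrap a 'NAME: text' cue in a WebVTT voice tag; otherwise return it unchanged."""
    match = SPEAKER.match(cue_text)
    if not match:
        return cue_text
    name, line = match.groups()
    return f"<v {name.strip()}>{line}</v>"
```

With this pattern, "JOHN: Where are we going?" becomes <v JOHN>Where are we going?</v>, while a cue such as "Note: bring water" is left inline because the label is not all caps.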

Do positioning and regions survive?

Most do. TTML region origin and extent in percentages are converted to WebVTT line and position cue settings. Region origins specified in pixels, ems, or cell units are first rebased to a percentage relative to the video viewport — this can shift a caption a few percent if the source assumed a non-standard viewport. The W3C mapping spec notes that WebVTT cue boxes also lack explicit height, so vertical extent is approximated by multiplying line count by the assumed line height. Vertical writing modes (Japanese, traditional Chinese) map to WebVTT's vertical:rl / vertical:lr cue settings.
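The rebasing step can be sketched in Python like this. It is a simplified illustration under stated assumptions: the function names are hypothetical, the 1280x720 viewport is an assumed default, only % and px lengths are handled (em and cell units are not), and align:start is emitted so that position refers to the left edge of the cue box in left-to-right text.

```python
def to_percent(length: str, viewport_px: float) -> float:
    """Express a TTML length ('10%' or '128px') as a percentage of the viewport."""
    if length.endswith("%"):
        return float(length[:-1])
    if length.endswith("px"):
        return float(length[:-2]) / viewport_px * 100
    raise ValueError(f"unsupported length unit: {length!r}")


def pct(value: float) -> str:
    """Format a percentage compactly, e.g. 80.0 -> '80%', 12.345 -> '12.35%'."""
    return f"{round(value, 2):g}%"


def region_to_cue_settings(origin: str, extent: str, viewport=(1280, 720)) -> str:
    """Map tts:origin and tts:extent to WebVTT line, position, and size settings."""
    x, y = origin.split()
    width, _height = extent.split()  # WebVTT cue boxes have no explicit height
    position = to_percent(x, viewport[0])
    line = to_percent(y, viewport[1])
    size = to_percent(width, viewport[0])
    return f"line:{pct(line)} position:{pct(position)} size:{pct(size)} align:start"
```

A region with origin "128px 576px" and extent "1024px 108px" on the assumed 1280x720 viewport maps to line:80% position:10% size:80% align:start. If the source was authored against a different viewport, the percentages shift accordingly, which is the drift described above.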

Why is my converted VTT file so much smaller than the TTML source?

XML is verbose. A TTML document carries the XML declaration, multiple namespace prefixes (xmlns:tt, xmlns:tts, xmlns:ttp), a <head> with style and region definitions, and per-paragraph <span> styling, which easily adds up to 5–10x the byte count of the equivalent WebVTT. WebVTT, by design, is a plain-text format with one cue per block separated by blank lines. You get the same captions in roughly 80–90% fewer bytes, which means faster CDN delivery and cheaper bandwidth.

Can I batch-convert an entire season of episodes?

Yes. Drop in any number of TTML files; each is converted independently in your browser. The output is one .vtt per input, named after the source file. There's no cap on the number of files, and the work is parallelised across CPU cores. Multi-language TTML deliveries (one file per language) convert directly: each becomes a separate .vtt you can wire into multiple <track> elements on the same <video>.

Will the converted VTT work on YouTube, Vimeo, and HLS streams?

Yes. YouTube's caption upload accepts WebVTT directly; pick "Add new subtitles or CC", choose "Upload a file", select "With timing", and upload the .vtt. Vimeo accepts WebVTT via the same upload UI in the "CC/Subtitles" tab on each video. For HLS, the WebVTT is referenced from a subtitle rendition declared with #EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs" in the multivariant playlist; most packagers (Bitmovin, AWS MediaConvert, Mux, Bento4) consume the same .vtt we emit without modification.

Is the converter free, and does it work offline?

Yes to both. The conversion runs entirely in JavaScript in your browser — no upload, no account, no file-size cap beyond available device memory, no watermark, and no Pro tier gating output. Once the page is loaded you can disconnect from the network and it still works; the JavaScript is cached and the conversion is local.

© 2026 XConvert.com. All Rights Reserved.