Convert a TTML subtitle file to WebVTT (VTT) quickly with a simple, browser-based converter.
1. Drag and drop a `.ttml` or `.xml` file (Netflix delivers TTML1 with either extension), or click "Add Files" to pick one from disk. If you only have the markup, paste the XML directly into the textarea. Batch conversion is supported: drop in a full season's worth of episode subtitles and each track converts in parallel.
2. Convert. The output is a `.vtt` file (MIME type `text/vtt`). Conversion runs entirely in your browser session: no upload to a server, no account, no watermark, and the source TTML never leaves your device.
3. Preview the cues. Check that timestamps converted to WebVTT format (`HH:MM:SS.mmm`), that line breaks survived the XML-to-text flattening, and that any `<br/>`, `<span>`, or styling tags collapsed cleanly. Edit cues inline before downloading if you need to fix line wraps or correct a typo.
4. Download the `.vtt` file with the required `WEBVTT` header, ready to drop into an HTML5 `<track>` element or upload to YouTube, Vimeo, Mux, JW Player, or any HLS/DASH packager.

TTML (Timed Text Markup Language) is the W3C's XML-based subtitle standard, used across professional broadcast and streaming workflows. Netflix mandates TTML1 delivery, Disney+ uses the IMSC 1.1 profile of TTML, and Amazon Prime Video accepts TTML variants. Browsers, however, don't render TTML natively: the HTML5 `<track>` element only accepts WebVTT (`.vtt`, MIME `text/vtt`), the W3C web-captioning format that started as a WHATWG draft in 2010. WebVTT has roughly 96% global browser support according to caniuse, covering Chrome 23+, Firefox 31+, Safari 6+, Edge 12+, and every modern mobile browser. Converting TTML to VTT bridges the broadcast pipeline and the open web.
- HTML5 `<track>` element: native browser captioning works only with `.vtt`. Drop the converted file into `<track kind="subtitles" src="movie.vtt" srclang="en" label="English">` and every modern browser renders it without JavaScript.
- Streaming packagers: AWS Elemental MediaConvert, Bitmovin Encoder, Mux, and Wowza all consume WebVTT as sidecar subtitle tracks referenced from the manifest. HLS specifically requires WebVTT segments for subtitle renditions (`#EXT-X-MEDIA:TYPE=SUBTITLES`).
- Video platforms: uploaders accept `.vtt` immediately, while TTML often needs a workaround (renaming to `.dfxp` or stripping namespaces).

| Property | TTML | WebVTT |
|---|---|---|
| Standardiser | W3C Timed Text Working Group; TTML1 a Recommendation since 2010, TTML2 since 2018 | W3C Candidate Recommendation Draft; originated at WHATWG in 2010 |
| File extension | `.ttml`, `.xml`, `.dfxp` (older alias) | `.vtt` |
| MIME type | `application/ttml+xml` | `text/vtt` |
| Underlying syntax | XML: verbose, namespaced, schema-validated | Plain text with a `WEBVTT` header and cue blocks |
| Timing precision | Frames, ticks, clock-time, fractional seconds; frame-based timing needs `ttp:frameRate` | Milliseconds only (`HH:MM:SS.mmm`); hours optional |
| Styling | Full inline and referenced `<style>` definitions; `tts:color`, `tts:fontSize`, region-based layout in %, em, cell, px | CSS `::cue` pseudo-element plus inline tags (`<b>`, `<i>`, `<c.class>`, `<v Speaker>`) |
| Positioning | Regions (top, bottom, custom) via `tts:extent`, `tts:origin`, `tts:displayAlign` | `line`, `position`, `align`, `size`, `vertical` settings on each cue |
| Hierarchy | Yes: `<div>` containers, `timeContainer="seq"`, inherited styles | Flat: every cue is independent |
| Voice/speaker labels | By convention, e.g. `<span tts:fontStyle="italic">Speaker:</span> ...` | First-class `<v Speaker Name>` voice tag |
| Browser support in `<track>` | None | Chrome 23+, Firefox 31+, Safari 6+, Edge 12+, iOS Safari, Android Chrome (~96% global per caniuse) |
| Primary users | Netflix (TTML1), Disney+ (IMSC 1.1), Amazon Prime, broadcast caption delivery | HTML5 video players, HLS/DASH packagers, YouTube/Vimeo uploads, web embeds |

The W3C maintains an official TTML to WebVTT Mapping spec describing the conversion algorithm. Some TTML constructs map cleanly; others have no WebVTT equivalent and are dropped or approximated.
| TTML feature | Maps to WebVTT? | Notes |
|---|---|---|
| `<p>` paragraphs and timing | Yes: direct cue mapping | Each `<p>` becomes one WebVTT cue with start and end times |
| `<br/>` line breaks | Yes | Preserved as literal newlines inside the cue |
| Text content and `<span>` flattening | Yes | Inherited styles collapse to inline tags |
| Bold / italic / underline | Yes | `tts:fontWeight`, `tts:fontStyle`, `tts:textDecoration` → `<b>`, `<i>`, `<u>` |
| Color and font-family | Partial | Carried in a WebVTT `STYLE` block with `::cue(.cssClass)` selectors |
| Frame-based or tick timing | Yes: converted | The tool reads `ttp:frameRate` and emits millisecond timestamps; rounding to the nearest 1 ms is normal |
| Region positioning (%) | Approximate | Maps to `line` and `position` cue settings; pixel and em region values are rebased to % |
| `timeContainer="seq"` | Yes: flattened | Cumulative child durations are resolved into absolute cue times before output |
| Nested `<div>` containers | Yes: flattened | Ancestor styles are propagated to each leaf `<p>` |
| Per-element `xml:lang` | Yes | Pushed into wrapper spans where needed; the document-level `xml:lang` becomes `<track srclang="...">` metadata |
| Overlapping regions | Lossy | WebVTT applies automatic collision avoidance, so intentional overlaps may shift |
| Ruby annotations | Limited | WebVTT has basic `<ruby>`/`<rt>` cue tags but no equivalent of TTML2's ruby position and alignment controls; phonetic text may be dropped or inlined in parentheses |
| `opacity`, `padding`, `zIndex` | No | Not representable in WebVTT; silently dropped |
| Multiple language tracks in one file | No | TTML allows multiplexed languages; WebVTT requires one file per language track |

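To make the first rows of the mapping table concrete, here is a minimal Python sketch of the core transformation: each `<p>` becomes one cue, `<br/>` becomes a newline, and the `<span>` attributes `tts:fontWeight`, `tts:fontStyle`, and `tts:textDecoration` become `<b>`, `<i>`, and `<u>`. It is an illustration, not the converter's actual code. It assumes the TTML1 namespace (`http://www.w3.org/ns/ttml`), explicit clock-time `begin`/`end` attributes on every `<p>`, and inline styling only; referenced `<style>` elements, frame and tick timing, regions, `timeContainer="seq"`, and the legacy DFXP namespace are deliberately left out.

```python
import re
import xml.etree.ElementTree as ET

# TTML1 namespaces (assumed; legacy DFXP files use a different URI).
TT = "{http://www.w3.org/ns/ttml}"
TTS = "{http://www.w3.org/ns/ttml#styling}"

# (styling attribute, TTML value, WebVTT tag) for the inline styles mapped here.
STYLE_TAGS = [
    ("fontStyle", "italic", "i"),
    ("fontWeight", "bold", "b"),
    ("textDecoration", "underline", "u"),
]


def clean(text):
    """Escape WebVTT-reserved characters and collapse XML whitespace runs."""
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return re.sub(r"\s+", " ", text)


def clock_to_ms(expr):
    """'HH:MM:SS' or 'HH:MM:SS.fff' -> integer milliseconds."""
    h, m, s = expr.split(":")
    return round((int(h) * 3600 + int(m) * 60 + float(s)) * 1000)


def ms_to_vtt(ms):
    """Integer milliseconds -> WebVTT 'HH:MM:SS.mmm' timestamp."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def render(elem):
    """Flatten an element's mixed content into WebVTT cue text."""
    parts = [clean(elem.text or "")]
    for child in elem:
        if child.tag == TT + "br":
            parts.append("\n")
        else:
            inner = render(child)  # spans (and unknown elements) keep their text
            if child.tag == TT + "span":
                for attr, value, tag in STYLE_TAGS:
                    if child.get(TTS + attr) == value:
                        inner = f"<{tag}>{inner}</{tag}>"
            parts.append(inner)
        parts.append(clean(child.tail or ""))
    return "".join(parts)


def ttml_to_vtt(xml_text):
    """Convert a simple TTML1 document to WebVTT text."""
    root = ET.fromstring(xml_text)
    blocks = []
    for p in root.iter(TT + "p"):
        start, end = clock_to_ms(p.get("begin")), clock_to_ms(p.get("end"))
        # Strip each line so indentation around <br/> doesn't leak into cues.
        lines = [line.strip() for line in render(p).split("\n")]
        text = "\n".join(line for line in lines if line)
        blocks.append(f"{ms_to_vtt(start)} --> {ms_to_vtt(end)}\n{text}")
    return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"
```

Calling `ttml_to_vtt(xml_text)` returns the `WEBVTT` header, a blank line, and one cue block per `<p>`; a `<p>` reading `Where are<br/>we <span tts:fontStyle="italic">going</span>?` becomes a two-line cue ending in `we <i>going</i>?`.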
TTML from the major streaming services converts well. Netflix delivers TTML1 with percentage-based positioning, `tts:textAlign`, and `tts:displayAlign`, all of which map to WebVTT cue settings (`line`, `position`, `align`). Disney+'s IMSC 1.1 profile is a constrained TTML subset that converts even more cleanly, because it already restricts itself to web-renderable styling. Amazon Prime's TTML variants behave the same way. The conversion preserves timing to the millisecond and keeps line breaks, bold/italic, and per-speaker voice labels. Region overlaps and ruby annotations are the two features most likely to need a manual touch-up afterward.
Every `.vtt` file must begin with `WEBVTT`. The W3C WebVTT specification requires the first line of any conforming `.vtt` file to be the literal string `WEBVTT`, optionally followed by a space or tab and an arbitrary description. HTML5 video players reject files without this header: they treat the file as invalid and refuse to display captions. Our converter writes the header automatically; if you ever edit a `.vtt` file by hand, leave that first line alone.
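A hand-editing workflow or a build step can guard against a missing header with a check like the following Python sketch. The regular expression encodes the signature rule described above, plus the optional U+FEFF byte-order mark that the WebVTT specification also permits before `WEBVTT`; `has_valid_header` and `ensure_header` are illustrative names, not part of any library.

```python
import re

# Optional BOM, the literal "WEBVTT", then either end of line/file or a
# space/tab followed by free text up to the end of that first line.
HEADER_RE = re.compile(r"\ufeff?WEBVTT(?:[ \t][^\r\n]*)?(?:\r\n|\r|\n|$)")


def has_valid_header(vtt_text: str) -> bool:
    """True if the text starts with a conforming WebVTT signature line."""
    return HEADER_RE.match(vtt_text) is not None


def ensure_header(vtt_text: str) -> str:
    """Prepend 'WEBVTT' and a blank line when the signature is missing."""
    return vtt_text if has_valid_header(vtt_text) else "WEBVTT\n\n" + vtt_text
```

Note that `WEBVTTX` on the first line fails the check: anything after `WEBVTT` must be separated by a space or tab.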
TTML supports several timing dialects: clock-time (`00:01:23.456`), offset (`83.456s`), frames (`00:01:23:11` with `ttp:frameRate="24"`), and ticks. WebVTT supports only `HH:MM:SS.mmm` (hours optional) at millisecond resolution. Frame-based TTML times are converted using the document's declared `ttp:frameRate` and `ttp:frameRateMultiplier`, and drop-frame timecode is honoured. Sub-millisecond TTML precision is rounded to the nearest millisecond: that is the finest resolution the WebVTT grammar allows, and the resulting error is far below anything a viewer could perceive in caption timing.
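The timing rules above can be sketched as a small Python parser that turns each TTML time expression into integer milliseconds and formats the result as a WebVTT timestamp. This is an illustration under stated assumptions, not the tool's implementation: it assumes the `media` time base, takes `ttp:frameRate`, `ttp:frameRateMultiplier`, and `ttp:tickRate` as plain function arguments (defaulting to 30 fps, a 1:1 multiplier, and 1 tick per second), and does not handle drop-frame (`smpte` time base) or sub-frame values. Exact fractions are used so that rates like 30 × 1000/1001 don't accumulate floating-point error before the final rounding.

```python
import re
from fractions import Fraction

# hh:mm:ss, hh:mm:ss.fraction, or hh:mm:ss:frames (TTML clock-time).
CLOCK_RE = re.compile(r"^(\d{2,}):(\d{2}):(\d{2})(?:\.(\d+)|:(\d{2,}))?$")
# <number><metric>, e.g. "83.456s", "500ms", "120f", "900000t" (offset-time).
OFFSET_RE = re.compile(r"^(\d+(?:\.\d+)?)(h|ms|m|s|f|t)$")


def ttml_time_to_ms(expr, frame_rate=30, multiplier=(1, 1), tick_rate=1):
    """Convert a TTML time expression to integer milliseconds.

    frame_rate, multiplier and tick_rate stand in for the document's
    ttp:frameRate, ttp:frameRateMultiplier and ttp:tickRate attributes.
    """
    fps = Fraction(frame_rate) * Fraction(*multiplier)  # e.g. 30 * 1000/1001
    expr = expr.strip()
    clock = CLOCK_RE.match(expr)
    offset = OFFSET_RE.match(expr)
    if clock:
        h, m, s, fraction, frames = clock.groups()
        seconds = Fraction(int(h) * 3600 + int(m) * 60 + int(s))
        if fraction:
            seconds += Fraction(int(fraction), 10 ** len(fraction))
        elif frames:
            seconds += int(frames) / fps
    elif offset:
        value, metric = Fraction(offset.group(1)), offset.group(2)
        scale = {"h": 3600, "m": 60, "s": 1, "ms": Fraction(1, 1000),
                 "f": 1 / fps, "t": Fraction(1, tick_rate)}
        seconds = value * scale[metric]
    else:
        raise ValueError(f"unsupported TTML time expression: {expr!r}")
    # Exact rational arithmetic until here; round to the nearest millisecond.
    return round(seconds * 1000)


def ms_to_vtt(ms):
    """Integer milliseconds -> WebVTT 'HH:MM:SS.mmm' timestamp."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```

For the frame example above, `ttml_time_to_ms("00:01:23:11", frame_rate=24)` gives 83458 (83 s plus 11/24 s, about 458 ms), which `ms_to_vtt` formats as `00:01:23.458`.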
Speaker labels in TTML are usually expressed as `<span>` prefixes ("JOHN: Where are we going?"). The converter detects common patterns and emits WebVTT `<v Speaker>` voice tags where it can; otherwise the speaker text remains inline. Bold, italic, and underline map to the `<b>`, `<i>`, and `<u>` cue tags. Colour and font-family are preserved through a `STYLE` block emitted at the top of the VTT file, using `::cue(.classname)` selectors (supported in Chrome, Firefox, Safari 10+, and Edge).
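Speaker detection is necessarily heuristic. The Python sketch below shows one possible approach, assuming (hypothetically) that a speaker prefix is an all-caps name followed by a colon at the very start of the cue; a prefix such as `OK:` would be a false positive under this rule. It also shows how a colour could be written as a `STYLE` block rule and applied to cue text with a `<c.class>` span. The function names, the pattern, and the class names are illustrations, not the converter's actual rules.

```python
import re

# Hypothetical heuristic: a capital letter, then up to 30 capitals, spaces,
# dots, apostrophes or hyphens, then a colon, at the very start of the cue.
SPEAKER_RE = re.compile(r"^([A-Z][A-Z .'\-]{0,30}):\s+(.*)$", re.DOTALL)


def tag_speaker(cue_text: str) -> str:
    """'JOHN: Where are we going?' -> '<v John>Where are we going?'."""
    match = SPEAKER_RE.match(cue_text)
    if not match:
        return cue_text  # no recognisable prefix: leave the text inline
    name = match.group(1).strip().title()
    # The </v> end tag may be omitted when the voice span covers the whole cue.
    return f"<v {name}>{match.group(2)}"


def style_block(colors: dict) -> str:
    """Build a WebVTT STYLE block from {class name: CSS colour}."""
    rules = "\n".join(f"::cue(.{cls}) {{ color: {css}; }}" for cls, css in colors.items())
    return f"STYLE\n{rules}\n"


def color_span(text: str, cls: str) -> str:
    """Wrap cue text in a WebVTT class span so a ::cue(.cls) rule applies."""
    return f"<c.{cls}>{text}</c>"
```

In the output file, the `STYLE` block sits after the `WEBVTT` line and a blank line, and is separated from the first cue by another blank line.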
Most positioning carries over. TTML region origin and extent in percentages convert to WebVTT `line` and `position` cue settings. Region origins specified in pixels, ems, or cell units are first rebased to a percentage of the video viewport, which can shift a caption by a few percent if the source assumed a non-standard viewport. The W3C mapping spec notes that WebVTT cue boxes also lack an explicit height, so vertical extent is approximated by multiplying the line count by an assumed line height. Vertical writing modes (Japanese, traditional Chinese) map to WebVTT's `vertical:rl` / `vertical:lr` cue settings.
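The percentage mapping can be sketched as below. This Python function is an assumption-laden illustration, not the W3C mapping algorithm or the tool's code: it handles horizontal text only, accepts `%` and `px` lengths (pixels are rebased against an assumed 1920×1080 viewport; em and cell units are omitted), places the cue box's left edge at the region origin with a `line-left` position alignment, sizes the box to the region width, and anchors the line to the region's top, middle, or bottom according to `tts:displayAlign`. The defaults follow TTML's initial values (`textAlign="start"`, `displayAlign="before"`).

```python
def to_percent(length: str, viewport_px: float) -> float:
    """'10%' -> 10.0; '192px' -> percent of viewport_px. em/c units omitted."""
    length = length.strip()
    if length.endswith("%"):
        return float(length[:-1])
    if length.endswith("px"):
        return float(length[:-2]) * 100 / viewport_px
    raise ValueError(f"unsupported length: {length!r}")


def region_to_cue_settings(origin, extent, text_align="start",
                           display_align="before", viewport=(1920, 1080)):
    """Map tts:origin / tts:extent (plus alignment) to WebVTT cue settings."""
    ox, oy = (to_percent(v, px) for v, px in zip(origin.split(), viewport))
    width, height = (to_percent(v, px) for v, px in zip(extent.split(), viewport))
    # displayAlign picks which edge of the region the text hugs.
    line, line_align = {
        "before": (oy, "start"),
        "center": (oy + height / 2, "center"),
        "after": (oy + height, "end"),
    }[display_align]

    def pct(value):
        return f"{round(value, 2):g}%"

    # TTML textAlign values left/center/right/start/end are also valid WebVTT align values.
    return (f"line:{pct(line)},{line_align} position:{pct(ox)},line-left "
            f"size:{pct(width)} align:{text_align}")
```

For a bottom-of-frame region, `region_to_cue_settings("10% 80%", "80% 10%", "center", "after")` yields `line:90%,end position:10%,line-left size:80% align:center`; the pixel equivalent `"192px 864px"` / `"1536px 108px"` on the assumed viewport gives the same settings.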
XML is verbose. A TTML document carries the XML declaration, several namespace declarations (`xmlns:tt`, `xmlns:tts`, `xmlns:ttp`), a `<head>` with style and region definitions, and per-paragraph `<span>` styling, which often adds up to 5–10x the byte count of the equivalent WebVTT. WebVTT, by design, is plain text with one cue per block and blank lines between blocks. The same captions take roughly 80% less space, which means faster CDN delivery and lower bandwidth costs.
Batch conversion handles any number of TTML files; each is converted independently in your browser. The output is one `.vtt` per input, named after the source file. There's no cap on the number of files, and the work is parallelised across CPU cores. Multi-language TTML deliveries (one file per language) convert directly: each becomes a separate `.vtt` you can wire into its own `<track>` element on the same `<video>`.
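Wiring the per-language outputs into a page can be scripted. The Python sketch below derives each output name as described above (same stem, `.vtt` extension) and emits one `<track>` element per file. The helper names are hypothetical, and the language codes and labels are supplied by the caller; this sketch does not read them from the TTML.

```python
from html import escape
from pathlib import PurePath


def output_name(source: str) -> str:
    """'subs/S01E01.en.ttml' -> 'S01E01.en.vtt' (same stem, .vtt suffix)."""
    return PurePath(source).with_suffix(".vtt").name


def track_elements(tracks) -> str:
    """Build <track> tags from (source_file, srclang, label) tuples."""
    tags = []
    for source, srclang, label in tracks:
        tags.append(
            f'<track kind="subtitles" src="{escape(output_name(source))}" '
            f'srclang="{escape(srclang)}" label="{escape(label)}">'
        )
    return "\n".join(tags)
```

The returned markup goes inside the `<video>` element, one line per language.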
The output works on YouTube, Vimeo, and in HLS without changes. YouTube's caption upload accepts WebVTT directly: pick "Add new subtitles or CC", choose "Upload a file", select "With timing", and upload the `.vtt`. Vimeo accepts WebVTT through the upload UI in the "CC/Subtitles" tab on each video. For HLS, the `.vtt` is referenced from a subtitle media playlist, which the master playlist declares with `#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs"`; most packagers (Bitmovin, AWS MediaConvert, Mux, Bento4) consume the same `.vtt` we emit without modification.
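As a rough illustration of the HLS wiring, this Python sketch writes a single-segment VOD subtitle playlist for one `.vtt` file, plus the `#EXT-X-MEDIA` line that declares it in the master playlist. File names, the `subs` group ID, and the single-segment layout are assumptions. Real packagers usually split subtitles into segments aligned with the video and add an `X-TIMESTAMP-MAP` header to each WebVTT segment, which this sketch does not do. The video's `#EXT-X-STREAM-INF` entries must also carry `SUBTITLES="subs"` to reference the group.

```python
import math


def subtitle_media_playlist(vtt_uri: str, duration_s: float) -> str:
    """Single-segment VOD subtitle playlist pointing at one .vtt file."""
    return "\n".join([
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3+ allows decimal EXTINF durations
        # Target duration must be >= every segment duration rounded to an integer.
        f"#EXT-X-TARGETDURATION:{math.ceil(duration_s)}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        f"#EXTINF:{duration_s:.3f},",
        vtt_uri,
        "#EXT-X-ENDLIST",
    ]) + "\n"


def subtitle_media_tag(name, language, uri, group_id="subs", default=False):
    """#EXT-X-MEDIA line declaring the subtitle rendition in the master playlist."""
    return (f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="{group_id}",NAME="{name}",'
            f'LANGUAGE="{language}",DEFAULT={"YES" if default else "NO"},'
            f'AUTOSELECT=YES,URI="{uri}"')
```

For a 1325.4-second episode, `subtitle_media_playlist("subs_en.vtt", 1325.4)` sets `#EXT-X-TARGETDURATION:1326` and lists the file under a single `#EXTINF:1325.400,` entry.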
The converter is free and needs no server. The conversion runs entirely in JavaScript in your browser: no upload, no account, no file-size cap beyond available device memory, no watermark, and no Pro tier gating output. Once the page has loaded you can disconnect from the network and it still works, because the JavaScript is cached and the conversion is local.