Turn WebVTT (.vtt) subtitle files into TTML subtitles quickly with a simple, browser-based converter.
Drag and drop a .vtt file or click "Add Files" to select one. You can also paste WebVTT text directly. Batch upload is supported, and conversion runs in your browser, so files never leave your device.
Each <p> element in the resulting TTML is mapped from a WebVTT cue, with timing attributes (begin / end) and <span> styling derived from cue settings and class tags.
WebVTT is the de facto subtitle format for HTML5 video and HLS web playback, but most broadcast and OTT delivery specs ask for TTML, an XML-based, profile-driven standard published by the W3C. Converting VTT to TTML is what turns a working web caption file into a deliverable asset for streaming services, archive workflows, and packaging pipelines that won't accept plain-text VTT.
Netflix asks for TTML1 with an .xml or .ttml extension; Japanese-language assets must conform to the Netflix IMSC 1.1 Text Profile. A VTT track captured from a web player won't pass QC.
In fragmented MP4, TTML is carried under the stpp codec (ISO/IEC 14496-30), versus wvtt for WebVTT. Many CMAF workflows expect TTML on input even when the player output is WebVTT.

| Property | WebVTT (.vtt) | TTML (.ttml / .xml) |
|---|---|---|
| Structure | Plain text with WEBVTT header and cue blocks | XML document with <tt> root and namespaces |
| Standard | W3C WebVTT (Candidate Recommendation Draft) | W3C TTML2 (REC, 2018); profiles include IMSC 1.2, EBU-TT-D, SMPTE-TT |
| Timecode | 00:00:00.000 --> 00:00:02.000 delimiter line | begin="00:00:00.000" end="00:00:02.000" attributes |
| Styling | Inline cue settings + optional CSS via STYLE blocks | Attribute styles, referenced <style> elements, regions |
| Positioning | Cue settings: position, line, size, align | Dedicated <region> elements with percentage extents |
| Profiles | Single specification, no profiles | DFXP, SMPTE-TT, EBU-TT-D, SDP-US, CFF-TT, IMSC 1/1.1/1.2 |
| Primary use | HTML5 <track>, HLS WebVTT segments | OTT delivery (Netflix, Disney+), broadcast captioning, archive |
| fMP4 codec | wvtt | stpp (ISO/IEC 14496-30) |

| Profile | When to pick it | Notes |
|---|---|---|
| Plain TTML1 | Generic interchange, Netflix non-Japanese delivery | Default output; valid for most authoring tools |
| IMSC 1.1 Text Profile | Netflix Japanese delivery, modern OTT | Subset of TTML2; requires ttp:profile declaration |
| EBU-TT-D | European broadcasters (BBC, ARD, France TV), HbbTV | Distribution profile; mandates UTF-8 and specific region rules |
| SMPTE-TT | US broadcast captioning, FCC-compliant workflows | Adds SMPTE-specific metadata and image-based caption support |
| DFXP | Legacy Flash / Adobe / older packagers | Original TTML1 distribution profile; still accepted by many tools |

xconvert exports a generic TTML1 document. If your spec requires a stricter profile, open the file in a text editor and add the appropriate ttp:profile attribute and region definitions. The timing and text payload carry over between profiles unchanged.
Basic timing and cue text always carry over. Inline cue settings (position, line, align) and class spans map to TTML attributes and <span> elements per the W3C TTML-WebVTT mapping, but TTML supports a richer set of style and region constructs than WebVTT does. Expect to add regions, fonts, and colour palettes inside your finishing tool if your delivery spec requires them — the converter gives you a structurally valid TTML skeleton to build on.
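To make the "timing and text always carry over" point concrete, here is a minimal Python sketch of that baseline mapping. It is not xconvert's implementation: the function name `vtt_to_ttml` is hypothetical, cue settings are ignored, and inline cue tags are stripped rather than mapped to <span> elements.

```python
import html
import re
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"
# A WebVTT timing line; the hours field is optional in WebVTT.
TIMING = re.compile(
    r"(\d{2,}:)?(\d{2}:\d{2}\.\d{3})[ \t]+-->[ \t]+(\d{2,}:)?(\d{2}:\d{2}\.\d{3})"
)


def vtt_to_ttml(vtt_text, lang="en"):
    """Map basic WebVTT cues to a bare TTML document (timing and text only)."""
    ET.register_namespace("", TTML_NS)  # serialize TTML as the default namespace
    tt = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": lang})
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")

    blocks = re.split(r"\n[ \t]*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        # Blocks without a timing line (header, NOTE, STYLE) are skipped.
        for idx, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                break
        else:
            continue
        h1, t1, h2, t2 = match.groups()
        p = ET.SubElement(div, f"{{{TTML_NS}}}p", {
            "begin": (h1 or "00:") + t1,  # add the hours field if WebVTT omitted it
            "end": (h2 or "00:") + t2,
        })
        # Simplification: drop cue tags such as <b> or <c.name> instead of mapping them.
        text_lines = [html.unescape(re.sub(r"<[^>]*>", "", ln)) for ln in lines[idx + 1:]]
        for n, ln in enumerate(text_lines):
            if n == 0:
                p.text = ln
            else:
                br = ET.SubElement(p, f"{{{TTML_NS}}}br")  # line break inside the cue
                br.tail = ln
    return ET.tostring(tt, encoding="unicode")
```

The output has no <head>, styles, or regions, which is the kind of skeleton the paragraph above describes; anything a delivery spec needs beyond timing and text still has to be added afterwards.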
xconvert emits TTML1-compatible XML in the standard http://www.w3.org/ns/ttml namespace, which both TTML1 and TTML2 parsers accept. It does not declare an IMSC, EBU-TT-D, or SMPTE-TT profile by default. If your downstream tool requires one, add ttp:profile="http://www.w3.org/ns/ttml/profile/imsc1.1/text" (or the relevant profile URI) to the root <tt> element, together with the xmlns:ttp="http://www.w3.org/ns/ttml#parameter" namespace declaration if it is not already present.
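If you would rather script the profile declaration than edit the file by hand, the following Python sketch shows one way to do it with the standard library. The helper name `declare_profile` is hypothetical, and the sketch only sets the attribute: it does not check that the document actually satisfies the profile's constraints.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
TTP_NS = "http://www.w3.org/ns/ttml#parameter"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
TTM_NS = "http://www.w3.org/ns/ttml#metadata"
IMSC11_TEXT = "http://www.w3.org/ns/ttml/profile/imsc1.1/text"


def declare_profile(ttml_text, profile_uri=IMSC11_TEXT):
    """Return the TTML document with ttp:profile set on the root <tt> element."""
    # Register the usual TTML prefixes so re-serialization keeps them readable.
    ET.register_namespace("", TTML_NS)
    ET.register_namespace("ttp", TTP_NS)
    ET.register_namespace("tts", TTS_NS)
    ET.register_namespace("ttm", TTM_NS)

    root = ET.fromstring(ttml_text)
    if root.tag != f"{{{TTML_NS}}}tt":
        raise ValueError("root element is not a TTML <tt>")
    root.set(f"{{{TTP_NS}}}profile", profile_uri)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree discards the XML declaration and any comments when it re-serializes, so treat this as a starting point and validate the result against your delivery spec.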
Why does Netflix require .xml or .ttml and not .vtt?
Netflix's localization pipeline does QC, language tagging, and forced-narrative handling on TTML because the format carries explicit metadata and structured regions that WebVTT cannot represent unambiguously. The post-production spec page is explicit: "TTML1 with .xml or .ttml extension" for all subtitle and SDH files, with IMSC 1.1 required for Japanese.
Yes. xconvert runs the conversion entirely in your browser — uploaded VTT bytes never reach a server, and the page works after first load with the network disabled. For automation, the open-source sandflow/ttconv CLI is a good fit for CI pipelines and supports WebVTT, TTML, SRT, SCC, and STL.
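For the CI case, a batch job could wrap ttconv roughly as below. This is a sketch under assumptions: it presumes ttconv is installed and exposes its `tt convert -i <input> -o <output>` command on PATH, and the helper names and directory layout are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def ttconv_command(src, dst):
    """Build the ttconv invocation for a single file."""
    return ["tt", "convert", "-i", str(src), "-o", str(dst)]


def convert_all(src_dir, dst_dir):
    """Convert every .vtt file in src_dir to a .ttml file in dst_dir."""
    if shutil.which("tt") is None:
        raise RuntimeError("ttconv's `tt` command was not found on PATH")
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.vtt")):
        dst = Path(dst_dir) / (src.stem + ".ttml")
        # check=True makes a failed conversion fail the CI step.
        subprocess.run(ttconv_command(src, dst), check=True)
```

Calling `convert_all("subs/vtt", "subs/ttml")` would then convert a whole folder in one step; both paths are placeholders.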
WebVTT supports three cue kinds — subtitles, chapters, and metadata. Standard VTT→TTML mapping treats only subtitle cues as timed text; chapter and metadata cues are typically dropped because TTML has no direct equivalent. If you need chapters preserved, export them separately (for example as an MP4 chapter atom or a sidecar JSON) before converting.
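As an illustration of the sidecar-JSON option, the Python sketch below turns a chapters track into a JSON list. It assumes the chapters live in their own .vtt file (as they do when served through a chapters track), and the function name and JSON field names are hypothetical.

```python
import json
import re

# Start and end timestamps of a WebVTT timing line; hours are optional.
TIMING = re.compile(
    r"((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})[ \t]+-->[ \t]+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def chapters_to_json(vtt_text):
    """Serialize the cues of a chapters .vtt file as a JSON array."""
    chapters = []
    blocks = re.split(r"\n[ \t]*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for idx, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                chapters.append({
                    "start": match.group(1),
                    "end": match.group(2),
                    # Chapter titles are normally one line; join defensively.
                    "title": " ".join(lines[idx + 1:]).strip(),
                })
                break
    return json.dumps(chapters, indent=2)
```

Running this before conversion keeps the chapter markers available to downstream tools even though they will not appear in the TTML output.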
A WebVTT file represents one language track; multi-language web playback uses one .vtt per language. TTML can carry multiple languages in a single document using xml:lang attributes on <div> or <p> elements, which is one reason it is preferred for live broadcast distribution where switchable language tracks are encoded together.
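The single-document, multi-language layout can be sketched as follows. This Python example assumes the cues for each language have already been parsed into (begin, end, text) tuples; `build_multilang_ttml` is a hypothetical helper, and whether a given delivery spec or player accepts more than one language per file is something to check separately.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_multilang_ttml(tracks):
    """tracks maps a language tag to a list of (begin, end, text) cues."""
    ET.register_namespace("", TTML_NS)
    # An empty xml:lang on the root signals "no default language".
    tt = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": ""})
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    for lang, cues in tracks.items():
        # One <div> per language, tagged with xml:lang.
        div = ET.SubElement(body, f"{{{TTML_NS}}}div", {f"{{{XML_NS}}}lang": lang})
        for begin, end, text in cues:
            p = ET.SubElement(div, f"{{{TTML_NS}}}p", {"begin": begin, "end": end})
            p.text = text
    return ET.tostring(tt, encoding="unicode")
```

For example, `build_multilang_ttml({"en": [("00:00:01.000", "00:00:02.000", "Hello")], "fr": [("00:00:01.000", "00:00:02.000", "Bonjour")]})` yields one document with two language-tagged <div> elements.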
Soft line breaks inside cue text become <br/> elements in TTML. Positioning encoded in VTT cue settings (line:70%, position:50%, size:80%) maps to region or tts:origin / tts:extent style attributes. Exact pixel-accurate placement may need a manual pass in your authoring tool, because TTML positioning is profile-dependent — IMSC, EBU-TT-D, and SMPTE-TT define their own conventions.
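A rough idea of the positioning mapping, as a Python sketch. It is a deliberate simplification rather than the W3C mapping: only percentage values are handled, position is treated as the horizontal centre of the cue box, line as its top edge, and `default_top` is an arbitrary placeholder used when no percentage line is given. The function name is hypothetical.

```python
def cue_settings_to_region(settings, default_top=85.0):
    """Approximate tts:origin / tts:extent values from WebVTT cue settings."""
    opts = dict(item.split(":", 1) for item in settings.split() if ":" in item)

    def pct(name, default):
        # Values may carry an alignment suffix, e.g. "50%,line-left".
        value = opts.get(name, "").split(",")[0]
        return float(value[:-1]) if value.endswith("%") else default

    width = pct("size", 100.0)
    position = pct("position", 50.0)
    top = pct("line", default_top)
    # Centre the box on `position`, then keep it inside the frame.
    left = min(max(position - width / 2, 0.0), 100.0 - width)
    return {
        "tts:origin": f"{left:g}% {top:g}%",
        "tts:extent": f"{width:g}% {100.0 - top:g}%",
    }
```

With the settings quoted above, `cue_settings_to_region("line:70% position:50% size:80%")` gives an origin of "10% 70%" and an extent of "80% 30%", which is the sort of starting value you would then adjust by hand for a specific profile.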
Yes — use TTML to VTT for the reverse direction. The W3C mapping is lossy in both directions: TTML→VTT drops region styling and most attribute styles, while VTT→TTML produces a minimal skeleton without profile-specific regions. If you need to round-trip without loss, keep the original VTT alongside the TTML master.
xconvert also supports VTT to SRT for plain SubRip output, VTT to DFXP for the legacy TTML1 distribution profile, VTT to ASS for Advanced SubStation Alpha files (the format Aegisub uses), and SRT to TTML if you're starting from a SubRip source.