Changelog
[Unreleased]
- No changes yet
[0.6.1] - 2025-11-15
Fixed
- Corrected the IEEE 754 round-to-nearest-even examples in the Time Format Requirements so implementers no longer inherit incorrect rounding guidance.
- Made URI validation severity consistent across the General Validation section so RFC 3986 violations are always treated as ERROR-level issues.
Clarified
- Renamed repeated headings (e.g., Metadata Fields, Segment Time Fields, Speaker ID Format Specifications) and updated cross-references so every anchor target exists and navigation works across renderers.
- Added explicit subheadings for segment-level validation requirements to match existing summary links and improve intra-document references.
- Clarified the Optional vs. Mandatory Fields note to emphasize which optional fields may appear empty, aligned URI validation text with the ERROR-level requirement, detailed the default behavior when `word_timing_mode` is omitted, and spelled out how simultaneous segments must retain their provided order.
- Noted repository tooling updates:
  - Python CLI utilities now bootstrap their vendored dependencies automatically so local installs no longer depend on globally available `stjlib`, `srt`, or `webvtt`.
  - A vendored ISO 639 dataset backs the JavaScript validator, eliminating fragile runtime imports while preserving ISO 639-1/639-3 enforcement.
  - Added a project-level `pytest.ini` to scope discovery to first-party tests and ignore the new vendor tree, ensuring `pytest` and CI runs remain stable.
[0.6.0] - 2024-10-27
Breaking Changes
- File Extensions:
  - Changed primary recommended extension from `.stj.json` to `.stjson`
  - Added `.stj` and `.stj.json` as alternative supported extensions
  - Applications should be updated to:
    - Recognize all three extensions (`.stjson`, `.stj`, `.stj.json`)
    - Use `.stjson` as default when creating new files
    - Continue supporting `.stj.json` for backward compatibility
- Root Structure:
  - Moved the `version` field from the `metadata` section to the root `"stj"` object
  - Updated the root structure to:

    ```json
    {
      "stj": {
        "version": "0.6.0",
        "metadata": { ... },
        "transcript": { ... }
      }
    }
    ```

  - Specified that no additional properties are allowed at the root level
- Character Encoding Requirements:
  - Changed UTF-8 Byte Order Mark (BOM) from optional to prohibited
  - Files MUST be encoded in UTF-8 without a BOM
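
The breaking changes above can be sketched as a small pre-flight check. This is a minimal illustration, not the project's own loader: the function names are hypothetical, case-insensitive extension matching is an assumption, and "no additional properties at the root level" is read here as "the top-level JSON object contains only `"stj"`".

```python
import json

# Extensions named in the 0.6.0 entry; ".stjson" is the recommended default.
SUPPORTED_EXTENSIONS = (".stjson", ".stj.json", ".stj")
UTF8_BOM = b"\xef\xbb\xbf"


def has_supported_extension(filename: str) -> bool:
    """Return True if the name ends with one of the three STJ extensions.

    Lowercasing the name first is an assumption, not a stated rule.
    """
    return filename.lower().endswith(SUPPORTED_EXTENSIONS)


def parse_stj_bytes(raw: bytes) -> dict:
    """Decode and parse STJ content, applying the 0.6.0 breaking changes."""
    if raw.startswith(UTF8_BOM):
        raise ValueError("UTF-8 BOM is prohibited")
    document = json.loads(raw.decode("utf-8"))
    # Assumed reading: the top-level object holds exactly one key, "stj".
    if not isinstance(document, dict) or set(document) != {"stj"}:
        raise ValueError('top-level object must contain only "stj"')
    root = document["stj"]
    # "version" now lives on the root "stj" object, not inside "metadata".
    if not isinstance(root, dict) or "version" not in root:
        raise ValueError('"version" must be a field of the root "stj" object')
    return root
```

A caller would typically run `has_supported_extension` on the path first and then pass the file's raw bytes to `parse_stj_bytes`, so the BOM check happens before any text decoding.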
Changed
- Mandatory Fields:
  - Clarified that `transcript.segments[].start` and `transcript.segments[].end` are optional fields that become mandatory when timing information is included
  - Made the `metadata` section optional
  - Updated the "Mandatory vs. Optional Fields Summary" to reflect these changes
- Handling of Anonymous Speakers:
  - Specified that the `name` field MUST be omitted for anonymous speakers
  - Updated examples to remove the `"name": "Unknown"` entries for anonymous speakers
  - Ensured consistency in representing anonymous speakers throughout the document
- Time Format Requirements:
  - Specified the rounding rules for time values with more than 3 decimal places, using IEEE 754 round-to-nearest-even
  - Updated examples to illustrate the rounding behavior and edge cases
  - Emphasized that `is_zero_duration` MUST be included when `start` equals `end`, and MUST NOT be included otherwise
  - Added detailed processing requirements for time values, including validation severity levels
- Extensions Field Requirements:
  - Corrected formatting errors and improved clarity regarding custom namespace guidelines
  - Emphasized that applications MUST report an error if a reserved namespace is used
  - Provided clearer guidance on using prefixes like `"custom_"` for provisional namespaces
- Validation Approach:
  - Added explicit severity levels (ERROR, WARNING, INFO) for validation issues
  - Defined specific validation requirements and their corresponding severity levels
  - Added structured validation response format requirements
  - Reordered validation steps for logical flow
  - Added performance considerations for validation implementations
  - Required implementations to collect multiple validation issues when possible
- Best Practices and Compliance:
  - Removed any ordering requirements within JSON objects, as JSON objects are unordered collections
  - Ensured that all examples and guidelines align with JSON standards and best practices
  - Maintained consistency in terminology and formatting throughout the document
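
The time-value rules in this list can be illustrated with a short sketch. It assumes time values are handled as decimal text, so that ties are exact, and uses Python's `decimal` module with `ROUND_HALF_EVEN` as a stand-in for the round-to-nearest-even rule. The function names are hypothetical, and the specification's own examples remain authoritative for edge cases involving binary floats. The zero-duration check only tests whether the field is present, because this changelog does not state what value it must carry.

```python
from decimal import Decimal, ROUND_HALF_EVEN

MILLISECOND = Decimal("0.001")


def round_time(value: str) -> Decimal:
    """Round a decimal time literal to 3 places; ties go to the even digit."""
    return Decimal(value).quantize(MILLISECOND, rounding=ROUND_HALF_EVEN)


def zero_duration_problems(item: dict) -> list[str]:
    """Check the is_zero_duration presence rule for a segment or word."""
    problems = []
    if "start" not in item or "end" not in item:
        return problems  # no timing information, so the rule does not apply
    is_zero = item["start"] == item["end"]
    has_flag = "is_zero_duration" in item
    if is_zero and not has_flag:
        problems.append("is_zero_duration MUST be included when start equals end")
    if not is_zero and has_flag:
        problems.append("is_zero_duration MUST NOT be included when start differs from end")
    return problems


print(round_time("1.0005"))  # tie, previous digit 0 is even: 1.000
print(round_time("1.0015"))  # tie, previous digit 1 is odd: 1.002
```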
Fixed
- JSON Examples:
  - Corrected all JSON examples to ensure validity
  - Removed comments within JSON code blocks, as they are not allowed in JSON syntax
- Formatting Errors:
  - Corrected typographical errors and improved overall formatting for better readability
  - Ensured consistent use of terminology and style throughout the document
- Consistency Issues:
  - Addressed inconsistencies regarding the usage of `is_zero_duration`
  - Updated examples to reflect the correct usage of `is_zero_duration`
  - Confirmed the consistent treatment of overlapping segments as ERROR level issues across all relevant sections
Added
- Empty Value Handling:
  - Added explicit rules for handling empty arrays, objects, and strings
  - Specified which fields may be empty and which must be omitted
  - Added validation requirements for empty value handling
- Validation Response Format:
  - Added structured validation response format requirements
  - Specified required fields for validation responses (severity, path, code, message, etc.)
  - Added examples of proper validation response formatting
- Clarification on `start` and `end` Fields:
  - Provided clear guidance on when `start` and `end` fields are required
  - Emphasized that they become mandatory when timing information is included
- RFC 2119 Key Words:
  - Added a section defining the usage of requirement level keywords (MUST, SHOULD, etc.) as per RFC 2119
  - Ensured consistent use of these keywords throughout the document
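
To make the validation response format concrete, here is a minimal sketch of an issue record carrying the four fields named above (severity, path, code, message) and a validator that collects every issue rather than stopping at the first. The class and function names, the `TIMING_INCOMPLETE` code, and the path syntax are all hypothetical. The single check shown, that `start` and `end` appear together, is one assumed reading of the "mandatory when timing information is included" rule.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "ERROR"
    WARNING = "WARNING"
    INFO = "INFO"


@dataclass
class ValidationIssue:
    severity: Severity
    path: str     # location of the offending value (format is illustrative)
    code: str     # machine-readable identifier (hypothetical naming)
    message: str  # human-readable explanation

    def to_dict(self) -> dict:
        """Plain-dict form suitable for JSON serialization."""
        return {
            "severity": self.severity.value,
            "path": self.path,
            "code": self.code,
            "message": self.message,
        }


def validate_segment_timing(segments: list) -> list[ValidationIssue]:
    """Collect one issue per segment that has only one of start/end."""
    issues = []
    for index, segment in enumerate(segments):
        if ("start" in segment) != ("end" in segment):
            issues.append(ValidationIssue(
                severity=Severity.ERROR,
                path=f"stj.transcript.segments[{index}]",
                code="TIMING_INCOMPLETE",
                message="start and end must be provided together",
            ))
    return issues
```

Returning a list keeps the validator compatible with the requirement to report multiple issues when possible. Callers can then filter by `severity` to decide whether to reject the file.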
Removed
- Ordering Constraints in JSON Objects:
  - Removed any statements imposing ordering on fields within JSON objects
- Mandatory `metadata` Section:
  - Removed the requirement for the `metadata` section to be mandatory
  - Updated the specification to reflect that `metadata` is now optional
Clarified
- Usage of `extensions` Field:
  - Provided clearer guidance on the usage of custom namespaces within the `extensions` field
  - Encouraged developers to use prefixes like `"custom_"` to avoid conflicts with reserved namespaces
- Validation Requirements:
  - Specified that implementations MUST perform validation in the sequence outlined to ensure consistency and completeness
  - Clarified the severity levels for validation issues and the appropriate handling for each
  - Added guidance on implementing error recovery strategies
  - Specified when recovery attempts are appropriate
- Character Encoding Requirements:
  - Clarified that the UTF-8 Byte Order Mark (BOM) MUST NOT be used
  - Provided guidance on proper handling of control characters and Unicode normalization
- Time Value Processing:
  - Clarified the processing and validation requirements for time values, including rounding rules and edge cases
  - Added examples to illustrate proper handling of time values
[0.5.0] - 2024-10-24
Added
- Comprehensive character encoding requirements:
  - Mandatory UTF-8 encoding with optional BOM
  - String content validation rules
  - Unicode normalization requirements
  - Character encoding validation requirements
- Detailed Speaker ID Requirements:
  - Format specifications for `speaker_id`
  - Allowed characters, length constraints
  - Uniqueness and case sensitivity rules
  - Guidelines for representing anonymous speakers
- Comprehensive time format requirements and constraints:
  - Specific implementation requirements for handling time values
  - Time value precision and range requirements
  - Consolidated time-related validation requirements
Changed
- Renamed `additional_info` fields to `extensions` throughout the specification
- Introduced namespaces within the `extensions` field for structured custom data
- Reserved specific namespaces (`stj`, `webvtt`, `ttml`, `ssa`, `srt`, `dfxp`, `smptett`) for future official use
- Updated style definitions to use namespaced `extensions` instead of `x_` prefixed properties
- Changed zero-duration segment representation from `segment_duration` in `extensions` to a boolean `is_zero_duration` field
- Changed zero-duration word representation from `word_duration` in `extensions` to a boolean `is_zero_duration` field
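
The reserved namespace list lends itself to a simple check. The sketch below assumes the `extensions` field is a JSON object keyed by namespace and that matching is exact and case-sensitive. The function name and message wording are hypothetical. The error behavior follows the 0.6.0 entry, which requires applications to report an error when a reserved namespace is used.

```python
# Namespaces reserved for future official use, as listed in this entry.
RESERVED_NAMESPACES = frozenset(
    {"stj", "webvtt", "ttml", "ssa", "srt", "dfxp", "smptett"}
)


def reserved_namespace_errors(extensions: dict) -> list[str]:
    """Return one error message per reserved namespace found in extensions."""
    return [
        f'reserved namespace "{namespace}" MUST NOT be used in extensions'
        for namespace in extensions
        if namespace in RESERVED_NAMESPACES
    ]


example = {"custom_myapp": {"confidence": 0.9}, "webvtt": {"line": 3}}
print(reserved_namespace_errors(example))
# ['reserved namespace "webvtt" MUST NOT be used in extensions']
```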
Clarified
- Default behavior of the `word_timing_mode` field when omitted:
  - Treated as `"complete"` when the `words` array is present with complete coverage
  - Treated as `"none"` when the `words` array is absent
  - Invalid when the `words` array is present but incomplete (must explicitly specify `"partial"`)
- Added explicit validation rules for each `word_timing_mode` value
- Standardized cross-references to the Time Format Requirements section throughout the document
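
The default-resolution rules for an omitted `word_timing_mode` can be sketched as a small function. This changelog does not define "complete coverage", so the sketch assumes a simple test: the word texts equal the whitespace-separated tokens of the segment text. That test, the `text` field names, and the function names are illustrative assumptions only.

```python
def words_cover_text(segment: dict) -> bool:
    """Assumed coverage test: word texts match the tokens of the segment text."""
    tokens = segment.get("text", "").split()
    return [word.get("text") for word in segment["words"]] == tokens


def effective_word_timing_mode(segment: dict) -> str:
    """Resolve the timing mode, applying the defaults when the field is omitted."""
    mode = segment.get("word_timing_mode")
    if mode is not None:
        return mode  # an explicit value is taken as given
    if "words" not in segment:
        return "none"
    if words_cover_text(segment):
        return "complete"
    # words present but incomplete: the mode must be stated explicitly
    raise ValueError(
        'incomplete words array requires "word_timing_mode": "partial"'
    )
```

Raising on the incomplete case mirrors the rule that such a segment is invalid. A full validator would more likely record an issue and continue than raise.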
[0.4.0] - 2024-10-23
Added
- `word_timing_mode` field in segments to indicate the completeness of word-level timing data
- Validation requirements for all parts of the JSON:
  - Segments
  - Words
  - Speakers
  - Styles
  - Additional fields
Changed
- Clarified the relationship between segment-level text and word-level details, accounting for `word_timing_mode`
Added
- Additional examples demonstrating the use of `word_timing_mode`