Data exchange between heterogeneous systems depends on reliable format transformation. Organizations frequently need to convert data between formats to keep their systems interoperable. The Data Format Description Language (DFDL) is a standard for modeling and describing text and binary data formats. This capability is crucial for legacy systems integration, data migration, and modern API interfaces.
JSON (JavaScript Object Notation) has become the de facto standard for data exchange in web applications, cloud services, and APIs due to its simplicity, human readability, and widespread support across programming languages. Converting various document formats to JSON is therefore a common requirement in many integration scenarios.
While DFDL provides a robust framework for describing and parsing diverse data formats, several alternative standards and technologies have been developed specifically for JSON transformations. This paper explores these DFDL-like standards, comparing their features, capabilities, and suitability for different use cases.
DFDL is a modeling language for describing general text and binary data in a standard way. It was published as an Open Grid Forum Recommendation in February 2021 and as an ISO standard in April 2024. DFDL is intended to support data interchange and high-performance data handling.
DFDL allows for the description of text, dense binary, and legacy data formats in a vendor-neutral declarative manner. Apache Daffodil, an implementation of the DFDL specification, provides tools that can describe and parse a wide variety of data, including self-descriptive data formats.
DFDL builds on XML Schema to describe structured data: the schema defines the logical model of the data, while DFDL annotations describe its native (non-XML) format. The result is a powerful mechanism for converting various data formats to a standardized representation.
DFDL provides several key capabilities:
- Description of both textual and binary data formats
- Support for legacy and modern data formats
- Handling of scientific, numeric, commercial, and industry-specific formats
- Vendor-neutral approach to data format description
DFDL converts text and binary data to a corresponding XML document by using XML Schema with special DFDL annotations. These annotations describe the native format of the data, enabling the transformation process.
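While DFDL is powerful for describing and parsing diverse data formats, its parse result (the DFDL infoset) is conventionally represented as XML. Producing JSON in a particular target shape then typically requires an additional transformation step, for example with XSLT, although some processors, including Apache Daffodil, can also emit a JSON representation of the infoset directly. This indirect approach can add complexity and processing overhead when JSON is the target format.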
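The XML-then-JSON path described above can be sketched with the Python standard library alone. This is not DFDL and does not use Apache Daffodil: the `FIELDS` layout, the `record` element name, and the sample bytes are all hypothetical. A declarative field list stands in for the schema annotations, the parse step yields an XML tree, and a second step turns that tree into JSON.

```python
import json
import struct
import xml.etree.ElementTree as ET

# Hypothetical declarative layout: (field name, struct format code).
# In real DFDL this information would live in annotations on an XML Schema.
FIELDS = [("id", "H"), ("temperature", "h"), ("label", "4s")]


def parse_to_xml(data: bytes) -> ET.Element:
    """Step 1: parse one big-endian fixed-layout record into an XML element."""
    fmt = ">" + "".join(code for _, code in FIELDS)
    values = struct.unpack(fmt, data)
    root = ET.Element("record")
    for (name, _), value in zip(FIELDS, values):
        if isinstance(value, bytes):
            # Fixed-width text field: decode and drop the trailing padding.
            value = value.decode("ascii").rstrip()
        ET.SubElement(root, name).text = str(value)
    return root


def xml_to_json(root: ET.Element) -> str:
    """Step 2: flatten the XML tree into a JSON object."""
    return json.dumps({child.tag: child.text for child in root})


raw = struct.pack(">Hh4s", 7, -12, b"AB  ")  # made-up sample record
xml_root = parse_to_xml(raw)
xml_text = ET.tostring(xml_root, encoding="unicode")
json_text = xml_to_json(xml_root)
print(xml_text)
print(json_text)
```

In this sketch every JSON value ends up as a string, because the numeric types are lost once the data passes through XML text. That is one illustration of the extra work the indirect route can involve.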
Several standards have been developed specifically for JSON-to-JSON transformations, offering more direct paths than DFDL for JSON output.
JSONata is a lightweight query and transformation language for JSON data. Inspired by the location path semantics of XPath 3.1, it allows sophisticated queries to be expressed in a compact and intuitive notation.
JSONata provides:
- Lightweight query and transformation capabilities
- Path-based navigation through JSON structures
- Built-in operators and functions for manipulating and combining data
- User-defined function creation
- Formatting of query results into any JSON output structure
JSONata uses an intuitive syntax similar to JavaScript dot notation for navigating JSON structures. For example:
    Address.City                             /* Simple path navigation */
    FirstName & ' ' & Surname                /* String concatenation */
    Phone[type = 'mobile'].number            /* Filtered selection */
    $sum(Order.Product.(Price * Quantity))   /* Aggregate function */
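For readers less familiar with path-based languages, the following plain-Python sketch spells out what the four expressions above compute. The sample document is hypothetical and shaped only so that each expression has something to select. No JSONata library is used.

```python
# Hypothetical sample document for the four expressions.
doc = {
    "FirstName": "Fred",
    "Surname": "Smith",
    "Address": {"City": "Winchester"},
    "Phone": [
        {"type": "home", "number": "0203 544 1234"},
        {"type": "mobile", "number": "077 7700 1234"},
    ],
    "Order": [
        {"Product": [{"Price": 10.0, "Quantity": 2}, {"Price": 2.5, "Quantity": 4}]},
        {"Product": [{"Price": 1.0, "Quantity": 3}]},
    ],
}

# Address.City
city = doc["Address"]["City"]

# FirstName & ' ' & Surname
full_name = doc["FirstName"] + " " + doc["Surname"]

# Phone[type = 'mobile'].number
mobile_numbers = [p["number"] for p in doc["Phone"] if p["type"] == "mobile"]

# $sum(Order.Product.(Price * Quantity))
order_total = sum(
    item["Price"] * item["Quantity"]
    for order in doc["Order"]
    for item in order["Product"]
)

print(city, full_name, mobile_numbers, order_total)
```

One difference worth noting: the Python version always returns a list for the filtered selection, whereas JSONata collapses a single match to the bare value.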
JSONata is particularly well-suited for:
- Transforming JSON data for API responses
- Extracting specific information from complex JSON structures
- Generating formatted reports from JSON data
- Implementing lightweight data processing pipelines
JSLT is a complete query and transformation language for JSON. The language design is inspired by jq, XPath, and XQuery.
JSLT can be used as:
- A query language to extract values from JSON (e.g., .foo.bar[0])
- A filter/check language to test JSON objects (e.g., starts-with(.foo.bar[0], "http://"))
- A transformation language to convert between JSON formats
JSLT supports various operations including:
- Context node access (.)
- Key access in objects (.<name>)
- Array indexing (.[<index>])
- Array slicing (.[<from> : <to>])
- Conditional expressions (if (<expr>) <expr> else <expr>)
- Variable definitions (let <name> = <expr>)
- Array transformations ([for (<expr>) <expr>])
- Object transformations ({for (<expr>) <expr> : <expr>})
- Function declarations (def <name>(<name>, <name>...) <expr>)
An example transform in JSLT:
    {
        "time": round(parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") * 1000),
        "device_manufacturer": .device.manufacturer,
        "device_model": .device.model,
        "language": .device.acceptLanguage,
        "os_name": .device.osType,
        "os_version": .device.osVersion,
        "platform": .device.platformType,
        "user_properties": {
            "is_logged_in" : boolean(.actor."spt:userId")
        }
    }
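As a rough guide to what the template above does, here is a plain-Python rendering of the same mapping. It is a sketch, not JSLT: the `transform` function and the sample event are hypothetical, and edge cases such as missing keys are handled in the simplest way (`dict.get`), which may differ from JSLT's own behavior.

```python
from datetime import datetime


def transform(event: dict) -> dict:
    """Plain-Python rendering of the template: one output key per line."""
    device = event.get("device", {})
    actor = event.get("actor", {})
    # "%z" accepts a literal "Z" suffix on Python 3.7 and later.
    published = datetime.strptime(event["published"], "%Y-%m-%dT%H:%M:%S%z")
    return {
        # Seconds since the epoch, scaled to milliseconds and rounded.
        "time": round(published.timestamp() * 1000),
        "device_manufacturer": device.get("manufacturer"),
        "device_model": device.get("model"),
        "language": device.get("acceptLanguage"),
        "os_name": device.get("osType"),
        "os_version": device.get("osVersion"),
        "platform": device.get("platformType"),
        "user_properties": {"is_logged_in": bool(actor.get("spt:userId"))},
    }


# Hypothetical input event; all field values are made up for illustration.
sample = {
    "published": "2018-01-01T00:00:00Z",
    "device": {
        "manufacturer": "Acme",
        "model": "X1",
        "acceptLanguage": "en",
        "osType": "Android",
        "osVersion": "8.0",
        "platformType": "mobile",
    },
    "actor": {"spt:userId": "user-42"},
}
result = transform(sample)
print(result)
```

The comparison shows why a template language is attractive here: the JSLT version is close to the shape of the output document itself, with no imperative scaffolding.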
JSLT has been used in production at Schibsted since January 2018, performing about 9 billion transforms per day. The language has found applications in various domains and is integrated with platforms like Apache Camel, Apache NiFi, and IBM Cloud Pak for Business Automation.
JOLT is a JSON-to-JSON transformation library written in Java where the "specification" for the transform is itself a JSON document.
JOLT provides:
- A set of transforms that can be "chained" together to form the overall JSON-to-JSON transform
- Focus on transforming the structure of JSON data, not manipulating specific values
- Consumption and production of "hydrated" JSON (in-memory tree of Maps, Lists, Strings, etc.)
JOLT includes several stock transforms:
- Shift: copy data from the input tree and put it in the output tree
- Default: apply default values to the tree
- Remove: remove data from the tree
- Sort: sort the Map key values alphabetically (for debugging and human readability)
- Cardinality: "fix" the cardinality of input data
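The idea of chaining a Shift and a Default transform can be illustrated with a small Python emulation. This is only a sketch under simplifying assumptions: real JOLT specs are JSON documents executed by the Java library and support wildcards and other features not modeled here. The function names (`shift`, `default`, `chain`) and the sample data are hypothetical.

```python
from copy import deepcopy


def set_path(tree: dict, dotted: str, value) -> None:
    """Write value into tree at a dotted path, creating dicts on the way."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        tree = tree.setdefault(key, {})
    tree[leaf] = value


def shift(data, spec: dict, out=None) -> dict:
    """Copy values from data into a new tree; spec leaves name the output paths."""
    out = {} if out is None else out
    for key, target in spec.items():
        if not isinstance(data, dict) or key not in data:
            continue  # nothing to copy for this spec entry
        if isinstance(target, dict):
            shift(data[key], target, out)
        else:
            set_path(out, target, data[key])
    return out


def default(data: dict, spec: dict) -> dict:
    """Fill in values from spec wherever data has no entry."""
    out = deepcopy(data)
    for key, value in spec.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = default(out[key], value)
        else:
            out.setdefault(key, value)
    return out


def chain(data: dict, steps: list) -> dict:
    """Apply each (operation, spec) pair in order, feeding results forward."""
    for operation, spec in steps:
        data = operation(data, spec)
    return data


source = {"rating": {"primary": {"value": 3}, "quality": {"value": 4}}}
steps = [
    (shift, {"rating": {"primary": {"value": "Rating"},
                        "quality": {"value": "SecondaryRatings.quality"}}}),
    (default, {"Range": 5, "SecondaryRatings": {"quality": 0, "sharpness": 1}}),
]
result = chain(source, steps)
print(result)
```

Note that both steps only move or add values. Neither computes new ones, which matches JOLT's stated focus on structure rather than value manipulation.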
JOLT is particularly useful for:
- Transforming JSON data from Elasticsearch, MongoDB, Cassandra, and similar stores before exposing it to external consumers
- Extracting data from large JSON documents for specific consumption
- Implementing structural transformations with custom Java code for data manipulation
JSON-LD (JavaScript Object Notation for Linked Data) is a method of encoding linked data using JSON. One goal for JSON-LD was to require as little effort as possible from developers to transform their existing JSON to JSON-LD.
JSON-LD is designed around the concept of a "context" to provide additional mappings from JSON to an RDF model. The context links object properties in a JSON document to concepts in an ontology.
In order to map the JSON-LD syntax to RDF, JSON-LD allows values to be coerced to a specified type or to be tagged with a language. A context can be embedded directly in a JSON-LD document or put into a separate file and referenced from different documents.
A JSON-LD example using the FOAF (friend of a friend) ontology:
    {
      "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {
          "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
          "@type": "@id"
        },
        "Person": "http://xmlns.com/foaf/0.1/Person"
      },
      "@id": "https://me.example.com",
      "@type": "Person",
      "name": "John Smith",
      "homepage": "https://www.example.com/"
    }
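To show how the context in the example above maps short property names to ontology IRIs, the following Python sketch applies it by hand. It covers only a small subset of what a conforming JSON-LD processor does (the standard expansion algorithm also normalizes values into arrays and value objects), and the `expand_terms` helper is hypothetical.

```python
def expand_terms(doc: dict) -> dict:
    """Replace context-defined terms with their IRIs (simplified, flat documents only)."""
    context = doc.get("@context", {})

    def iri_of(term: str) -> str:
        # A term definition is either a plain IRI string or a dict with "@id".
        definition = context.get(term, term)
        return definition["@id"] if isinstance(definition, dict) else definition

    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@type":
            expanded[key] = iri_of(value)
        elif key.startswith("@"):
            expanded[key] = value
        else:
            definition = context.get(key)
            # "@type": "@id" in a term definition marks the value as an IRI reference.
            if isinstance(definition, dict) and definition.get("@type") == "@id":
                value = {"@id": value}
            expanded[iri_of(key)] = value
    return expanded


person = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
            "@type": "@id",
        },
        "Person": "http://xmlns.com/foaf/0.1/Person",
    },
    "@id": "https://me.example.com",
    "@type": "Person",
    "name": "John Smith",
    "homepage": "https://www.example.com/",
}
expanded = expand_terms(person)
print(expanded)
```

The point of the sketch is that the original JSON keys stay untouched for ordinary consumers, while the context supplies enough information to recover globally unambiguous property identifiers.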
JSON-LD is used by:
- Schema.org
- Google Knowledge Graph
- Search engine optimization activities
- Biomedical informatics
- Representing provenance information
- ActivityPub, the federated social networking protocol
- Internet of Things (IoT) for describing network-facing interfaces
Some standards focus primarily on querying JSON data rather than transformation, though they can be combined with other tools to achieve transformation goals.
JSONPath is a query language for querying values in JSON. It was inspired by XML's XPath and was intended as a lightweight companion to JSON implementations in programming languages such as PHP and JavaScript.
JSONPath provides:
- A string syntax for selecting and extracting JSON values from within a given JSON value
- A way to navigate through complex JSON values to retrieve required data
- Path expressions written as strings (e.g., $.foo)
- Standardization via RFC 9535 (February 2024)
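A minimal evaluator for a small subset of this syntax gives a feel for how path expressions select values. The sketch below handles only member names (`.name`), array indexes (`[0]`), and the wildcard (`[*]`). Filters, slices, and descendant selectors from RFC 9535 are deliberately left out. The `query` function and sample data are hypothetical, not part of any JSONPath library.

```python
import re

# Tokens for the supported subset: .name, [index], [*]
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]|\[(\*)\]")


def query(path: str, document):
    """Evaluate a restricted JSONPath expression and return the list of matches."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath query starts with '$'")
    nodes = [document]
    pos = 1
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos}")
        name, index, wildcard = match.groups()
        selected = []
        for node in nodes:
            if name is not None and isinstance(node, dict) and name in node:
                selected.append(node[name])
            elif index is not None and isinstance(node, list) and int(index) < len(node):
                selected.append(node[int(index)])
            elif wildcard and isinstance(node, list):
                selected.extend(node)
            elif wildcard and isinstance(node, dict):
                selected.extend(node.values())
        nodes = selected
        pos = match.end()
    return nodes


data = {"foo": {"bar": [{"id": 1}, {"id": 2}]}}
print(query("$.foo.bar[*].id", data))
print(query("$.foo.bar[0].id", data))
print(query("$.missing", data))
```

The result is always a list of matched values, possibly empty, so a path that matches nothing is not an error in this sketch.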
JSONPath is suitable for:
- Navigating through complex JSON values to retrieve required data
- Filtering and selecting specific elements within JSON structures
- Testing and validation of JSON data
- Integration with other tools for more complex transformations
JSONiq is a query language specifically designed for the JSON data model. It builds on lessons learned over more than 50 years of work on relational query systems and semi-structured data.
JSONiq offers:
- Full composability: every language construct is an expression
- Query capabilities similar to SQL (Project, Select, Filter, Join, Group, Order)
- Standardized and open approach based on W3C standards
- Support for reading other data formats (text, CSV, Parquet), depending on the implementation
- Ability to work with denormalized data across the entire normalization spectrum
- Suitability for data lakes and NoSQL databases
JSONiq syntax example:
    let $stats := collection("stats")
    for $access in $stats
    group by $url := $access.url
    return
      {
        "url": $url,
        "avg": avg($access.response_time),
        "hits": count($access)
      }
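The grouping query above has a close analogue in ordinary Python, which may help readers map the FLWOR clauses to familiar operations. The `stats` list below is a hypothetical stand-in for `collection("stats")`. No JSONiq engine is involved.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for collection("stats").
stats = [
    {"url": "/home", "response_time": 120},
    {"url": "/home", "response_time": 80},
    {"url": "/about", "response_time": 50},
]

# group by $url := $access.url
groups = defaultdict(list)
for access in stats:
    groups[access["url"]].append(access)

# return { "url": ..., "avg": ..., "hits": ... } once per group
report = [
    {
        "url": url,
        "avg": mean(a["response_time"] for a in accesses),
        "hits": len(accesses),
    }
    for url, accesses in groups.items()
]
print(report)
```

In the JSONiq version, `$access` refers to all items of the current group after the `group by` clause, which is why `avg` and `count` can be applied to it directly.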
The JSONiq specifications are stable and maintained, with community support available on Stack Overflow. The language has multiple implementations, including RumbleDB, Zorba, Xidel, SirixDB, and several others.
Some transformation technologies are specifically designed for integration platforms, combining document transformation with broader integration capabilities.
DataWeave is a query and transformation language developed by MuleSoft, particularly designed for integration scenarios within the Mule runtime engine.
DataWeave provides:
- A language for reading and parsing data from one format, transforming it, and writing it out in a different format
- Support for various data formats, with JSON being a primary format
- Integration with MuleSoft's integration platform
- A script structure consisting of a header with directives and a body with expressions
- Extensive built-in functions for data manipulation
DataWeave is particularly well-suited for:
- Enterprise integration scenarios using the MuleSoft platform
- API transformations for request/response handling
- Converting between various data formats (JSON, XML, CSV, etc.)
- Complex data mapping and transformation within integration flows
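The read, transform, write pattern described above can be illustrated without DataWeave itself. The following Python sketch converts a CSV payload to JSON while reshaping each record. It is not DataWeave syntax, and the payload, field names, and `csv_to_json` function are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV payload.
csv_text = "id,first,last,amount\n1,Ada,Lovelace,12.50\n2,Alan,Turing,7.25\n"


def csv_to_json(text: str) -> str:
    """Read CSV rows, reshape each one, and write the result as JSON."""
    rows = csv.DictReader(io.StringIO(text))
    payload = [
        {
            "id": int(row["id"]),
            "name": f'{row["first"]} {row["last"]}',
            "amount": float(row["amount"]),
        }
        for row in rows
    ]
    return json.dumps(payload, indent=2)


output = csv_to_json(csv_text)
print(output)
```

In this sketch the input format, the mapping, and the output format are all hard-coded in one function. A DataWeave script, as described above, separates the format directives in its header from the mapping expressions in its body.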
| Standard | Primary Purpose | Query Capabilities | Transformation Capabilities | Input Formats | Output Formats | Implementation Language | Standardization Status |
|---|---|---|---|---|---|---|---|
| DFDL | Data format description | Limited | Structural transformation | Text, Binary | XML | Scala/Java (Apache Daffodil) | ISO Standard (2024) |
| JSONata | JSON query & transformation | Path-based | Rich with functions | JSON | JSON | JavaScript | Open source, no formal standard |
| JSLT | JSON query & transformation | Path-based | Template-based | JSON | JSON | Java | Open specification |
| JOLT | JSON transformation | Limited | Chain-based transforms | JSON | JSON | Java | Open specification |
| JSON-LD | Linked data representation | Limited | Context-based | JSON | JSON-LD | Multiple | W3C Recommendation |
| JSONPath | JSON query | Path-based | None (query only) | JSON | JSON subsets | Multiple | RFC 9535 (2024) |
| JSONiq | JSON query | SQL-like | Template-based | Multiple | JSON | Multiple | Open specification |
| DataWeave | Integration data transformation | Path-based | Template & function-based | Multiple | Multiple | MuleSoft-specific | Proprietary |
For simple JSON transformations with minimal restructuring:
- JSONPath is ideal for simple data extraction
- JSONata provides an intuitive approach with minimal syntax
- JSLT offers straightforward template-based transformation
For complex transformations involving deeply nested structures:
- JSONata excels with its rich function library
- JSLT provides powerful templating with conditionals
- JSONiq offers SQL-like capabilities for complex queries
- JOLT allows chaining of transformations for complex restructuring
For enterprise integration use cases:
- DataWeave is purpose-built for integration platforms
- JOLT integrates well with Java-based systems
- DFDL with additional transformations handles complex legacy formats
For semantic data representation and linked data:
- JSON-LD is specifically designed for linked data
- JSONiq supports complex data modeling with its type system
For big data transformation scenarios:
- JSONiq with implementations like RumbleDB is optimized for data lakes
- JSLT has proven performance at scale (billions of transforms per day)
The landscape of standards for document transformation to JSON offers diverse approaches tailored to different use cases. While DFDL provides a comprehensive framework for describing and parsing various data formats, it primarily targets XML as an output format, requiring additional transformation steps to produce JSON.
For direct JSON-to-JSON transformations, standards like JSONata, JSLT, and JOLT offer more streamlined approaches. These technologies focus specifically on JSON manipulation with syntax and features optimized for JSON structures. JSONPath and JSONiq provide powerful query capabilities that can be combined with transformation tools for more complex scenarios.
Integration-focused solutions like DataWeave embed transformation capabilities within broader integration platforms, offering unified approaches for enterprise scenarios.
The choice of standard depends on several factors:
- Input data complexity: For binary or complex legacy formats, DFDL may be required
- Integration requirements: Integration platforms like MuleSoft benefit from DataWeave
- Performance needs: JSLT has demonstrated high-performance capabilities at scale
- Semantic requirements: JSON-LD addresses linked data representation
- Developer experience: JSONata and JSONPath offer intuitive syntax for JSON developers
As data integration needs continue to evolve, these standards are likely to develop further, with increased standardization (as seen with JSONPath's recent RFC status) and enhanced capabilities to address emerging use cases.
Recent standardization milestones, such as the ISO publication of DFDL and RFC 9535 for JSONPath, suggest a maturing ecosystem for document transformation to JSON, giving organizations better-supported options for their integration needs. As JSON remains a primary data exchange format, these standards will play an increasingly important role in data interoperability across systems and organizations.