@donbr
Created April 21, 2025 15:36
Standards Similar to DFDL for Converting Documents to JSON

1. Introduction

Data exchanged between heterogeneous systems usually has to be transformed along the way, and organizations routinely convert data between formats so that their systems can interoperate. The Data Format Description Language (DFDL) is a standard for modeling and describing text and binary data formats. This capability is crucial for legacy systems integration, data migration, and modern API interfaces.

JSON (JavaScript Object Notation) has become the de facto standard for data exchange in web applications, cloud services, and APIs due to its simplicity, human readability, and widespread support across programming languages. Converting various document formats to JSON is therefore a common requirement in many integration scenarios.

While DFDL provides a robust framework for describing and parsing diverse data formats, several alternative standards and technologies have been developed specifically for JSON transformations. This paper explores these DFDL-like standards, comparing their features, capabilities, and suitability for different use cases.

2. Data Format Description Language (DFDL)

DFDL is a modeling language for describing general text and binary data in a standard way. It was published as an Open Grid Forum Recommendation in February 2021 and adopted as an ISO standard in April 2024.

DFDL allows for the description of text, dense binary, and legacy data formats in a vendor-neutral declarative manner. Apache Daffodil, an implementation of the DFDL specification, provides tools that can describe and parse a wide variety of data, including self-descriptive data formats.

DFDL uses XML Schema to describe structured data. This approach leverages XML Schema to define the logical model of data while using DFDL annotations to describe the native (non-XML) format of the data. The result is a powerful mechanism for converting various data formats to a standardized representation.

2.1 DFDL Key Features

DFDL provides several key capabilities:

  • Description of both textual and binary data formats
  • Support for legacy and modern data formats
  • Handling of scientific, numeric, commercial, and industry-specific formats
  • Vendor-neutral approach to data format description

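A DFDL processor parses text and binary data into a logical infoset, conventionally serialized as an XML document, by applying an XML Schema that carries DFDL annotations. These annotations describe the native format of the data (for example lengths, encodings, delimiters, and byte order), which is what makes the parse possible.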
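The core idea, a declarative description of the native format driving a generic parser, can be illustrated outside DFDL. The following is a minimal Python sketch and not DFDL or Apache Daffodil: the layout table `RECORD_FORMAT`, the field names, and the sample bytes are all hypothetical, and the standard `struct` module stands in for a real DFDL processor.

```python
import json
import struct

# Hypothetical record layout, expressed as data rather than as parsing code.
# Each entry: (field name, struct format code, text encoding or None).
RECORD_FORMAT = [
    ("id", "I", None),           # 4-byte unsigned integer
    ("temperature", "f", None),  # 4-byte IEEE 754 float
    ("station", "8s", "ascii"),  # 8-byte space-padded text field
]


def parse_record(data: bytes, layout=RECORD_FORMAT, byte_order: str = ">") -> dict:
    """Parse one fixed-length binary record according to a declarative layout."""
    fmt = byte_order + "".join(code for _, code, _ in layout)
    values = struct.unpack(fmt, data)
    record = {}
    for (name, _, encoding), value in zip(layout, values):
        if encoding is not None:
            # Text fields arrive as bytes: decode them and drop the padding.
            value = value.decode(encoding).rstrip()
        record[name] = value
    return record


# Build a sample big-endian record, then parse it back into a logical model.
sample = struct.pack(">If8s", 42, 21.5, b"OSLO    ")
parsed = parse_record(sample)
print(json.dumps(parsed))
```

Changing the layout table changes what the parser accepts without touching the parsing code, which is the property a DFDL schema provides in a far more general form.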

2.2 Limitations for JSON Transformation

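While DFDL is powerful for describing and parsing diverse data formats, its logical model is defined with XML Schema and its output is most commonly serialized as XML. Unless the DFDL implementation can emit JSON itself (Apache Daffodil, for example, can represent the parsed infoset as JSON as well as XML), converting to JSON requires an additional transformation step, typically using technologies like XSLT. Either way, the result mirrors the structure of the DFDL schema, so reshaping it into a particular target JSON layout is a separate task. This indirect approach can add complexity and processing overhead when JSON is the target format.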
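Where only XML output is available, the extra conversion step might look like the following Python sketch. It is a simplified illustration rather than a general XML-to-JSON mapping: the XML document is a hypothetical stand-in for the result of a DFDL parse, and attributes, namespaces, and mixed content are ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(element: ET.Element):
    """Convert an XML element to a JSON-compatible value (simplified)."""
    children = list(element)
    if not children:
        return element.text.strip() if element.text else None
    result = {}
    for child in children:
        value = element_to_json(child)
        if child.tag in result:
            # Repeated tag: collect the values into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical XML infoset, standing in for the output of a DFDL parse.
xml_infoset = """
<record>
  <id>42</id>
  <station>OSLO</station>
  <reading>21.5</reading>
  <reading>22.0</reading>
</record>
"""

root = ET.fromstring(xml_infoset.strip())
converted = {root.tag: element_to_json(root)}
print(json.dumps(converted))
```

Every leaf value comes out as a string, because the type information lives in the schema rather than in the XML text. A production conversion would need to consult the schema to recover numbers and booleans, which is part of the overhead described above.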

3. JSON-Oriented Transformation Standards

Several languages and libraries have been developed specifically for JSON-to-JSON transformation. When the input is already JSON, they offer a more direct path to JSON output than DFDL.

3.1 JSONata

JSONata is a lightweight query and transformation language for JSON data. Inspired by the location path semantics of XPath 3.1, it allows sophisticated queries to be expressed in a compact and intuitive notation.

3.1.1 Key Features

JSONata provides:

  • Lightweight query and transformation capabilities
  • Path-based navigation through JSON structures
  • Built-in operators and functions for manipulating and combining data
  • User-defined function creation
  • Formatting of query results into any JSON output structure

3.1.2 Syntax and Examples

JSONata uses an intuitive syntax similar to JavaScript dot notation for navigating JSON structures. For example:

Address.City                            /* Simple path navigation */
FirstName & ' ' & Surname               /* String concatenation */
Phone[type = 'mobile'].number           /* Filtered selection */
$sum(Order.Product.(Price * Quantity))  /* Aggregate function */
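To make the semantics of these four expressions concrete, the sketch below computes the same results in plain Python. The input document is hypothetical (its field names are chosen to match the expressions), and no JSONata library is used.

```python
# Hypothetical input document shaped to fit the four expressions above.
document = {
    "FirstName": "Fred",
    "Surname": "Smith",
    "Address": {"City": "Winchester"},
    "Phone": [
        {"type": "home", "number": "0203 544 1234"},
        {"type": "mobile", "number": "077 7700 1234"},
    ],
    "Order": [
        {"Product": [
            {"Price": 10.0, "Quantity": 2},
            {"Price": 5.0, "Quantity": 1},
        ]},
        {"Product": [{"Price": 2.5, "Quantity": 4}]},
    ],
}

# Address.City
city = document["Address"]["City"]

# FirstName & ' ' & Surname
full_name = document["FirstName"] + " " + document["Surname"]

# Phone[type = 'mobile'].number
mobile_numbers = [p["number"] for p in document["Phone"] if p["type"] == "mobile"]

# $sum(Order.Product.(Price * Quantity))
order_total = sum(
    product["Price"] * product["Quantity"]
    for order in document["Order"]
    for product in order["Product"]
)

print(city, full_name, mobile_numbers, order_total)
```

One difference is worth noting: JSONata returns a single matching value on its own rather than wrapped in an array, whereas the list comprehension here always yields a list. The path expressions also iterate over arrays implicitly, which is why the Python version needs explicit nested loops.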

3.1.3 Use Cases

JSONata is particularly well-suited for:

  • Transforming JSON data for API responses
  • Extracting specific information from complex JSON structures
  • Generating formatted reports from JSON data
  • Implementing lightweight data processing pipelines

3.2 JSLT (JSON query and transformation language)

JSLT is a complete query and transformation language for JSON. The language design is inspired by jq, XPath, and XQuery.

3.2.1 Key Features

JSLT can be used as:

  • A query language to extract values from JSON (e.g., .foo.bar[0])
  • A filter/check language to test JSON objects (e.g., starts-with(.foo.bar[0], "http://"))
  • A transformation language to convert between JSON formats

3.2.2 Syntax and Examples

JSLT supports various operations including:

  • Context node access (.)
  • Key access in objects (.<name>)
  • Array indexing (.[<index>])
  • Array slicing (.[<from> : <to>])
  • Conditional expressions (if (<expr>) <expr> else <expr>)
  • Variable definitions (let <name> = <expr>)
  • Array transformations ([for (<expr>) <expr>])
  • Object transformations ({for (<expr>) <expr> : <expr>})
  • Function declarations (def <name>(<name>, ...) <expr>)

An example transform in JSLT:

{
  "time": round(parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") * 1000),
  "device_manufacturer": .device.manufacturer,
  "device_model": .device.model,
  "language": .device.acceptLanguage,
  "os_name": .device.osType,
  "os_version": .device.osVersion,
  "platform": .device.platformType,
  "user_properties": {
    "is_logged_in" : boolean(.actor."spt:userId")
  }
}
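For readers unfamiliar with JSLT, the transform above can be approximated in Python as follows. This is a sketch of the same mapping, not a JSLT implementation: the function name `transform` and the sample event are hypothetical, `datetime.strptime` with `%z` stands in for `parse-time`, and Python truthiness stands in for `boolean()`.

```python
from datetime import datetime


def transform(event: dict) -> dict:
    """Approximate the JSLT template: pick, rename, and convert fields."""
    device = event.get("device", {})
    actor = event.get("actor", {})
    # parse-time yields seconds since the epoch; the template scales to milliseconds.
    published = datetime.strptime(event["published"], "%Y-%m-%dT%H:%M:%S%z")
    return {
        "time": round(published.timestamp() * 1000),
        "device_manufacturer": device.get("manufacturer"),
        "device_model": device.get("model"),
        "language": device.get("acceptLanguage"),
        "os_name": device.get("osType"),
        "os_version": device.get("osVersion"),
        "platform": device.get("platformType"),
        "user_properties": {
            "is_logged_in": bool(actor.get("spt:userId")),
        },
    }


# Hypothetical input event with the fields the template reads.
event = {
    "published": "2018-01-01T00:00:00Z",
    "device": {
        "manufacturer": "Acme",
        "model": "X1",
        "acceptLanguage": "en",
        "osType": "Android",
        "osVersion": "8.0",
        "platformType": "mobile",
    },
    "actor": {"spt:userId": "user-123"},
}

result = transform(event)
print(result)
```

The JSLT version expresses the same mapping as a template whose shape matches the output, which is the main readability argument for template-based transformation languages.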

3.2.3 Use Cases and Deployment

JSLT has been used in production at Schibsted since January 2018, performing about 9 billion transforms per day. The language has found applications in various domains and is integrated with platforms like Apache Camel, Apache NiFi, and IBM Cloud Pak for Business Automation.

3.3 JOLT (JSON Language for Transform)

JOLT is a JSON-to-JSON transformation library written in Java where the "specification" for the transform is itself a JSON document.

3.3.1 Key Features

JOLT provides:

  • A set of transforms that can be "chained" together to form the overall JSON-to-JSON transform
  • Focus on transforming the structure of JSON data, not manipulating specific values
  • Consumption and production of "hydrated" JSON (in-memory tree of Maps, Lists, Strings, etc.)

3.3.2 Stock Transforms

JOLT includes several stock transforms:

  • Shift: copy data from the input tree and put it in the output tree
  • Default: apply default values to the tree
  • Remove: remove data from the tree
  • Sort: sort the Map key values alphabetically (for debugging and human readability)
  • Cardinality: "fix" the cardinality of input data
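The chaining idea can be sketched in a few lines of Python. This is a toy model, not the JOLT library: the chain is written as a list of operation/spec pairs in the spirit of JOLT, but the spec format used here (a flat map of dotted paths) is a hypothetical simplification, and only shift-like and default-like steps are modeled.

```python
from copy import deepcopy


def get_path(tree, path):
    """Return the value at a dotted path, or None if any key is missing."""
    node = tree
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(tree, path, value):
    """Set the value at a dotted path, creating intermediate objects."""
    keys = path.split(".")
    node = tree
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def shift(tree, spec):
    """Copy values from input paths to output paths; other input is dropped."""
    output = {}
    for source, target in spec.items():
        value = get_path(tree, source)
        if value is not None:
            set_path(output, target, value)
    return output


def default(tree, spec):
    """Fill in values only for paths that are absent from the tree."""
    output = deepcopy(tree)
    for path, value in spec.items():
        if get_path(output, path) is None:
            set_path(output, path, value)
    return output


OPERATIONS = {"shift": shift, "default": default}


def run_chain(tree, chain):
    """Apply each step to the output of the previous one."""
    for step in chain:
        tree = OPERATIONS[step["operation"]](tree, step["spec"])
    return tree


chain = [
    {"operation": "shift",
     "spec": {"rating.primary.value": "Rating", "rating.primary.max": "Range"}},
    {"operation": "default", "spec": {"Range": 5, "Source": "unknown"}},
]
input_doc = {"rating": {"primary": {"value": 3}, "quality": {"value": 4}}}
result = run_chain(input_doc, chain)
print(result)
```

Because each step consumes and produces an in-memory tree, steps can be reordered or added without changing the others. Here the default step supplies `Range` only because the shift step found no `rating.primary.max` to copy.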

3.3.3 Use Cases

JOLT is particularly useful for:

  • Transforming JSON data from Elasticsearch, MongoDB, Cassandra, etc., before exposing it to external consumers
  • Extracting data from large JSON documents for specific consumption
  • Implementing structural transformations declaratively, with custom Java code reserved for manipulating values

3.4 JSON-LD (JSON for Linked Data)

JSON-LD (JavaScript Object Notation for Linked Data) is a method of encoding linked data using JSON. One goal for JSON-LD was to require as little effort as possible from developers to transform their existing JSON to JSON-LD.

3.4.1 Key Features

JSON-LD is designed around the concept of a "context" to provide additional mappings from JSON to an RDF model. The context links object properties in a JSON document to concepts in an ontology.

In order to map the JSON-LD syntax to RDF, JSON-LD allows values to be coerced to a specified type or to be tagged with a language. A context can be embedded directly in a JSON-LD document or put into a separate file and referenced from different documents.

3.4.2 Syntax and Examples

A JSON-LD example using the FOAF (friend of a friend) ontology:

{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
      "@type": "@id"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "https://me.example.com",
  "@type": "Person",
  "name": "John Smith",
  "homepage": "https://www.example.com/"
}
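To show what the context contributes, the Python sketch below expands the terms of this document into full IRIs and prints the result as N-Triples-style lines. It is a deliberately simplified illustration and not the JSON-LD expansion algorithm: it handles only a flat document with string values, and the function names `expand_term` and `to_ntriples` are hypothetical.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

document = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
            "@type": "@id",
        },
        "Person": "http://xmlns.com/foaf/0.1/Person",
    },
    "@id": "https://me.example.com",
    "@type": "Person",
    "name": "John Smith",
    "homepage": "https://www.example.com/",
}


def expand_term(term, context):
    """Map a short term to its IRI using the context (unknown terms pass through)."""
    definition = context.get(term, term)
    if isinstance(definition, dict):
        return definition["@id"]
    return definition


def to_ntriples(doc):
    """Emit one statement per property of a flat JSON-LD document."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    lines = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            lines.append(f"<{subject}> <{RDF_TYPE}> <{expand_term(value, context)}> .")
            continue
        definition = context.get(key)
        # "@type": "@id" coerces the value to an IRI instead of a string literal.
        is_iri = isinstance(definition, dict) and definition.get("@type") == "@id"
        obj = f"<{value}>" if is_iri else json.dumps(value)
        lines.append(f"<{subject}> <{expand_term(key, context)}> {obj} .")
    return lines


triples = to_ntriples(document)
for line in triples:
    print(line)
```

The `homepage` value is emitted as an IRI while `name` stays a string literal, which is the type coercion described in section 3.4.1.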

3.4.3 Use Cases

JSON-LD is used by or in:

  • Schema.org
  • Google Knowledge Graph
  • Search engine optimization activities
  • Biomedical informatics
  • Representing provenance information
  • ActivityPub, the federated social networking protocol
  • Internet of Things (IoT) for describing network-facing interfaces

4. Query-Focused Standards

Some standards focus primarily on querying JSON data rather than transformation, though they can be combined with other tools to achieve transformation goals.

4.1 JSONPath

JSONPath is a query language for querying values in JSON. It was inspired by XML's XPath and was intended as a lightweight companion to JSON implementations in programming languages such as PHP and JavaScript.

4.1.1 Key Features

JSONPath provides:

  • A string syntax for selecting and extracting JSON values from within a given JSON value
  • A way to navigate through complex JSON values to retrieve required data
  • Path expressions written as strings (e.g., $.foo)
  • Standardization via RFC 9535 (February 2024)
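How a path expression selects values can be shown with a very small evaluator. The Python sketch below supports only a subset of the syntax (the root `$`, `.name`, `[index]`, and `[*]`) and is not a conforming RFC 9535 implementation; the function name `query` and the sample document are hypothetical.

```python
import re

# One path segment: .name, [index], or [*]
SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(-?\d+)\]|\[(\*)\]")


def query(path: str, document):
    """Return the list of values selected by a (subset) JSONPath query."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath query starts with '$'")
    nodes = [document]
    position = 1
    while position < len(path):
        match = SEGMENT.match(path, position)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {position}")
        name, index, _wildcard = match.groups()
        selected = []
        for node in nodes:
            if name is not None:
                if isinstance(node, dict) and name in node:
                    selected.append(node[name])
            elif index is not None:
                i = int(index)
                if isinstance(node, list) and -len(node) <= i < len(node):
                    selected.append(node[i])
            else:  # wildcard: every element or member value
                if isinstance(node, list):
                    selected.extend(node)
                elif isinstance(node, dict):
                    selected.extend(node.values())
        nodes = selected
        position = match.end()
    return nodes


doc = {"store": {"book": [
    {"title": "A", "price": 8},
    {"title": "B", "price": 12},
]}}

titles = query("$.store.book[*].title", doc)
first_price = query("$.store.book[0].price", doc)
missing = query("$.store.missing", doc)
print(titles, first_price, missing)
```

A query always yields a list of matches, and a path that matches nothing yields an empty list rather than an error. Filters, slices, and the descendant operator are omitted here.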

4.1.2 Use Cases

JSONPath is suitable for:

  • Navigating through complex JSON values to retrieve required data
  • Filtering and selecting specific elements within JSON structures
  • Testing and validation of JSON data
  • Integration with other tools for more complex transformations

4.2 JSONiq

JSONiq is a query language designed specifically for the JSON data model. It draws on decades of experience with relational query systems and with query languages for semi-structured data.

4.2.1 Key Features

JSONiq offers:

  • Full composability: every language construct is an expression
  • Query capabilities similar to SQL (project, select, filter, join, group, order)
  • An open specification that builds on W3C standards
  • Support for inputs beyond JSON (such as text, CSV, and Parquet), depending on the implementation
  • The ability to work with data anywhere on the normalization spectrum, from fully normalized to heavily denormalized
  • Suitability for data lakes and NoSQL databases

4.2.2 Syntax and Examples

JSONiq syntax example:

let $stats := collection("stats")
for $access in $stats
group by $url := $access.url
return
{
  "url": $url,
  "avg": avg($access.response_time),
  "hits": count($access)
}
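The query above groups access records by URL and reports the average response time and hit count per group. An equivalent computation in plain Python, over a hypothetical in-memory list standing in for the `stats` collection, might look like this:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for collection("stats").
stats = [
    {"url": "/home", "response_time": 120},
    {"url": "/home", "response_time": 80},
    {"url": "/about", "response_time": 50},
]

# group by $url := $access.url
groups = defaultdict(list)
for access in stats:
    groups[access["url"]].append(access)

# return { "url": ..., "avg": ..., "hits": ... } for each group
report = [
    {
        "url": url,
        "avg": mean([access["response_time"] for access in accesses]),
        "hits": len(accesses),
    }
    for url, accesses in groups.items()
]

print(report)
```

In the JSONiq version, `$access` is rebound to the sequence of records in each group after the `group by` clause, which is why `avg` and `count` can be applied to it directly. The Python version has to build that grouping explicitly.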

4.2.3 Status and Implementations

JSONiq specifications are stable and maintained, with active support offered on StackOverflow. The language has multiple implementations including RumbleDB, Zorba, Xidel Engine, SirixDB, and several others.

5. Integration-Focused Solutions

Some transformation technologies are specifically designed for integration platforms, combining document transformation with broader integration capabilities.

5.1 DataWeave (MuleSoft)

DataWeave is a query and transformation language developed by MuleSoft, particularly designed for integration scenarios within the Mule runtime engine.

5.1.1 Key Features

DataWeave provides:

  • A language for reading and parsing data from one format, transforming it, and writing it out in a different format
  • Support for various data formats, with JSON being a primary format
  • Integration with MuleSoft's integration platform
  • A script structure consisting of a header with directives and a body with expressions
  • Extensive built-in functions for data manipulation
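The read, transform, write pattern in the first bullet can be sketched in Python as an analogy. This is not DataWeave syntax and does not use any MuleSoft API: the CSV input, the field names, and the three helper functions are hypothetical, chosen only to show one format going in and another coming out.

```python
import csv
import io
import json

# Hypothetical CSV payload arriving from an upstream system.
csv_input = "id,name,price\n1,Widget,9.50\n2,Gadget,12.00\n"


def read_csv(text: str) -> list:
    """Reader: parse the input format into an in-memory structure."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> dict:
    """Body: reshape the data and convert value types."""
    return {
        "items": [
            {
                "id": int(row["id"]),
                "name": row["name"].upper(),
                "price": float(row["price"]),
            }
            for row in rows
        ]
    }


def write_json(value: dict) -> str:
    """Writer: serialize the result in the output format."""
    return json.dumps(value, indent=2)


json_output = write_json(transform(read_csv(csv_input)))
print(json_output)
```

In a DataWeave script, the choice of input and output formats is declared by directives in the header and only the transformation itself appears in the body, so the reader and writer functions shown here would not be written by hand.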

5.1.2 Use Cases

DataWeave is particularly well-suited for:

  • Enterprise integration scenarios using the MuleSoft platform
  • API transformations for request/response handling
  • Converting between various data formats (JSON, XML, CSV, etc.)
  • Complex data mapping and transformation within integration flows

6. Comparative Analysis

6.1 Feature Comparison

| Standard  | Primary Purpose                 | Query Capabilities | Transformation Capabilities | Input Formats | Output Formats | Implementation Language | Standardization Status |
|-----------|---------------------------------|--------------------|-----------------------------|---------------|----------------|-------------------------|------------------------|
| DFDL      | Data format description         | Limited            | Structural transformation   | Text, Binary  | XML            | Java (Apache Daffodil)  | ISO Standard (2024)    |
| JSONata   | JSON query & transformation     | Path-based         | Rich with functions         | JSON          | JSON           | JavaScript              | De facto standard      |
| JSLT      | JSON query & transformation     | Path-based         | Template-based              | JSON          | JSON           | Java                    | Open specification     |
| JOLT      | JSON transformation             | Limited            | Chain-based transforms      | JSON          | JSON           | Java                    | Open specification     |
| JSON-LD   | Linked data representation      | Limited            | Context-based               | JSON          | JSON-LD        | Multiple                | W3C Recommendation     |
| JSONPath  | JSON query                      | Path-based         | None (query only)           | JSON          | JSON subsets   | Multiple                | RFC 9535 (2024)        |
| JSONiq    | JSON query                      | SQL-like           | Template-based              | Multiple      | JSON           | Multiple                | Open specification     |
| DataWeave | Integration data transformation | Path-based         | Template & function-based   | Multiple      | Multiple       | MuleSoft-specific       | Proprietary            |

6.2 Use Case Suitability

6.2.1 Simple Transformations

For simple JSON transformations with minimal restructuring:

  • JSONPath is ideal for simple data extraction
  • JSONata provides an intuitive approach with minimal syntax
  • JSLT offers straightforward template-based transformation

6.2.2 Complex, Nested Transformations

For complex transformations involving deeply nested structures:

  • JSONata excels with its rich function library
  • JSLT provides powerful templating with conditionals
  • JSONiq offers SQL-like capabilities for complex queries
  • JOLT allows chaining of transformations for complex restructuring

6.2.3 Integration Scenarios

For enterprise integration use cases:

  • DataWeave is purpose-built for integration platforms
  • JOLT integrates well with Java-based systems
  • DFDL with additional transformations handles complex legacy formats

6.2.4 Data Modeling and Semantics

For semantic data representation and linked data:

  • JSON-LD is specifically designed for linked data
  • JSONiq supports complex data modeling with its type system

6.2.5 Big Data Processing

For big data transformation scenarios:

  • JSONiq with implementations like RumbleDB is optimized for data lakes
  • JSLT has proven performance at scale (billions of transforms per day)

7. Conclusion

The landscape of standards for document transformation to JSON offers diverse approaches tailored to different use cases. While DFDL provides a comprehensive framework for describing and parsing various data formats, it is oriented toward XML as an output format, so producing JSON in a desired shape typically takes additional transformation steps.

For direct JSON-to-JSON transformations, standards like JSONata, JSLT, and JOLT offer more streamlined approaches. These technologies focus specifically on JSON manipulation with syntax and features optimized for JSON structures. JSONPath and JSONiq provide powerful query capabilities that can be combined with transformation tools for more complex scenarios.

Integration-focused solutions like DataWeave embed transformation capabilities within broader integration platforms, offering unified approaches for enterprise scenarios.

The choice of standard depends on several factors:

  • Input data complexity: For binary or complex legacy formats, DFDL may be required
  • Integration requirements: Integration platforms like MuleSoft benefit from DataWeave
  • Performance needs: JSLT has demonstrated high-performance capabilities at scale
  • Semantic requirements: JSON-LD addresses linked data representation
  • Developer experience: JSONata and JSONPath offer intuitive syntax for JSON developers

As data integration needs continue to evolve, these standards are likely to develop further, with increased standardization (as seen with JSONPath's recent RFC status) and enhanced capabilities to address emerging use cases.

The trend toward standardization indicates a maturing ecosystem for document transformation to JSON, providing organizations with reliable, well-supported options for their integration needs. As JSON solidifies its position as a primary data exchange format, these standards will play an increasingly critical role in ensuring seamless data interoperability across systems and organizations.
