Abstract Syntax Tree
What an abstract syntax tree is, why HL7v2 fits one cleanly, and the shape of the tree Glion's parser produces.
An Abstract Syntax Tree (AST) is an important concept for understanding how Glion processes messages. The AST is a standard representation for structured data, used in programming languages, compilers, linters, and document processors.
Glion uses the AST as the foundation for all message processing, from validation to transformation to output. Understanding the AST is key to writing effective plugins and working with Glion's processing model.
HL7v2 as a tree structure
HL7v2 is a hierarchical wire format with a fixed and finite depth. A message contains segments, segments contain fields, fields contain repetitions, repetitions contain components, and components contain sub-components. This hierarchy is fundamental to the meaning of the message: each level of the hierarchy has a specific semantic role, and the same string of characters can have different meanings depending on where it lives in the hierarchy. For instance, 123 in PID-3 (patient identifier list) means something very different from 123 in OBX-3 (observation identifier).
HL7v2 encodes this hierarchy with delimiters (by default \r between segments, | between fields, ~ between repetitions, ^ between components, and & between sub-components), but the structure is implicit in the string. This is efficient for storing and transmitting messages, but it makes querying, understanding, and transforming them more complex.
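To make the implicit structure concrete, here is a minimal sketch that splits a single segment on the default delimiters. The helper name splitSegment is hypothetical and is not Glion's parser; it assumes the default delimiters, ignores escape sequences, and does not handle the special layout of the MSH segment.

```typescript
// Hypothetical helper for illustration only, not Glion's parser.
// Assumes the default delimiters and ignores escape sequences and MSH.
type SubComponents = string[];
type Components = SubComponents[];
type Repetitions = Components[];

function splitSegment(segment: string): { name: string; fields: Repetitions[] } {
  const [name, ...rawFields] = segment.split("|");
  const fields = rawFields.map((field) =>
    field.split("~").map((repetition) =>
      repetition.split("^").map((component) => component.split("&")),
    ),
  );
  return { name, fields };
}

const pid = splitSegment("PID|1||123^^^HOSP&1.2.3&ISO^MR~456^^^HOSP^PI");

// Arrays are zero-based while HL7v2 positions are one-based,
// so PID-3 lives at fields[2].
console.log(pid.fields[2][0][0][0]); // PID-3, repetition 1, component 1: "123"
console.log(pid.fields[2][0][3][1]); // PID-3, repetition 1, component 4, sub-component 2: "1.2.3"
```

Even in this small case, reaching one value takes four nested splits and index arithmetic, which is the complexity a tree representation is meant to remove.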
In software engineering, the problem of working with structured data in a flat format is often solved by parsing the input into an Abstract Syntax Tree (AST). An AST is a hierarchical representation of the meaningful parts of the input, where each node represents a unit of meaning and its children represent the units one level inside it. The tree carries the same information as the source, but organized so that the structure is explicit and queryable.
Once a message is a tree, the operations a consumer cares about are tree walks. Querying for a field is a node lookup. Validating against a profile compares each node against the profile's rules. Translating to FHIR emits a resource per relevant node. The same tree supports all of them; a plugin written for one composes with plugins written for the others.
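The sketch below illustrates two of these operations as tree walks over one small tree. The node shape (HL7Node), the Rule type, and the functions getField and validate are assumptions made for illustration; Glion's actual node types and plugin APIs may differ, and the tree is flattened to message, segment, and field levels to keep the example short.

```typescript
// Hypothetical node shape for illustration; not Glion's actual types.
interface HL7Node {
  type: string;
  name?: string; // segment name, e.g. "PID"
  value?: string; // text of a leaf node
  children?: HL7Node[];
}

const tree: HL7Node = {
  type: "message",
  children: [
    {
      type: "segment",
      name: "PID",
      children: [
        { type: "field", value: "1" },
        { type: "field", value: "" },
        { type: "field", value: "123" },
      ],
    },
    {
      type: "segment",
      name: "OBX",
      children: [
        { type: "field", value: "1" },
        { type: "field", value: "NM" },
        { type: "field", value: "123" },
      ],
    },
  ],
};

// Query: a node lookup by segment name and one-based field number.
function getField(root: HL7Node, segment: string, fieldNumber: number): HL7Node | undefined {
  const found = root.children?.find((n) => n.type === "segment" && n.name === segment);
  return found?.children?.[fieldNumber - 1];
}

// Validation: compare the node each rule points at against that rule.
interface Rule {
  segment: string;
  field: number;
  required: boolean;
}

function validate(root: HL7Node, rules: Rule[]): string[] {
  const problems: string[] = [];
  for (const rule of rules) {
    const node = getField(root, rule.segment, rule.field);
    if (rule.required && (node === undefined || node.value === "")) {
      problems.push(`${rule.segment}-${rule.field} is required`);
    }
  }
  return problems;
}

console.log(getField(tree, "PID", 3)?.value); // "123", a patient identifier
console.log(getField(tree, "OBX", 3)?.value); // "123", an observation identifier
console.log(validate(tree, [
  { segment: "PID", field: 2, required: true },
  { segment: "PID", field: 3, required: true },
])); // ["PID-2 is required"]
```

Both functions read the same tree and neither touches the wire format, which is why operations written this way can be combined freely.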
Abstract Syntax Tree
There are different ways to represent the same information as a tree, each with its own trade-offs. An AST is abstract in the sense that it omits details that do not affect the meaning of the message and focuses on the hierarchical structure and semantics, in a form that is easy to work with programmatically.
For instance, an AST for HL7v2 might omit the delimiters and the exact character offsets of each field, and instead focus on representing the segments, fields, repetitions, components, and sub-components as nodes in a tree with their values and positions in the hierarchy. This makes it easier to write plugins that operate on the meaning of the message, rather than on the details of the wire format.
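One way to see what "abstract" means here: HL7v2 lets a message declare its own delimiter characters, so two different wire strings can carry the same content. The sketch below, which uses hypothetical node types and a hypothetical parseField function rather than Glion's implementation, shows both strings producing an identical tree once the delimiters are left out.

```typescript
// Hypothetical node types and parser for illustration; not Glion's implementation.
interface ComponentNode {
  type: "component";
  value: string;
}

interface FieldNode {
  type: "field";
  children: ComponentNode[];
}

// The separator guides parsing but is not stored in the resulting tree.
function parseField(raw: string, componentSeparator: string): FieldNode {
  return {
    type: "field",
    children: raw
      .split(componentSeparator)
      .map((value): ComponentNode => ({ type: "component", value })),
  };
}

const standard = parseField("123^MR", "^"); // default component separator
const custom = parseField("123#MR", "#"); // a message that declared "#" instead

// Same meaning, same tree, regardless of the wire-level delimiter.
console.log(JSON.stringify(standard) === JSON.stringify(custom)); // true
```

A plugin that reads the second component sees "MR" in both cases and never needs to know which separator the sender used.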
Unified framework
unified is an open-source plugin ecosystem for working with syntax trees in JavaScript and TypeScript. It provides hundreds of packages for building parsers, transformers, compilers, linters, and formatters.
unified powers widely used projects such as remark for Markdown and rehype for HTML, so it is battle-tested against real workloads.
A few things make unified a good fit for Glion:
- Every unified tree obeys the unist node specification: every node has a type, an optional children array, and an optional position.
- Packages like unist-util-visit walk Glion's tree the same way they walk a Markdown tree.
- The built-in virtual file format, vfile, tracks metadata and diagnostics.
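The shared node contract is what makes generic tooling possible. The sketch below hand-writes a tiny walker to stay self-contained; it is a simplified stand-in for what unist-util-visit provides, not that package's API. The UnistNode interface follows the unist shape described above, while walk, collectTypes, and the HL7v2 node type names are assumptions made for illustration.

```typescript
// Minimal unist-style node: `type` is required, `children` and `position` are optional.
interface Point {
  line: number;
  column: number;
  offset?: number;
}

interface Position {
  start: Point;
  end: Point;
}

interface UnistNode {
  type: string;
  children?: UnistNode[];
  position?: Position;
  [key: string]: unknown; // node-specific data such as `value` or `name`
}

// Hypothetical depth-first walker; relies only on `type` and `children`.
function walk(node: UnistNode, visitor: (node: UnistNode) => void): void {
  visitor(node);
  for (const child of node.children ?? []) {
    walk(child, visitor);
  }
}

function collectTypes(root: UnistNode): string[] {
  const types: string[] = [];
  walk(root, (n) => types.push(n.type));
  return types;
}

// An HL7v2-like tree (node type names are assumptions, not Glion's schema).
const hl7Tree: UnistNode = {
  type: "message",
  children: [
    { type: "segment", name: "PID", children: [{ type: "field", value: "123" }] },
  ],
};

// A Markdown tree in the shape remark produces.
const markdownTree: UnistNode = {
  type: "root",
  children: [
    { type: "paragraph", children: [{ type: "text", value: "hello" }] },
  ],
};

console.log(collectTypes(hl7Tree)); // ["message", "segment", "field"]
console.log(collectTypes(markdownTree)); // ["root", "paragraph", "text"]
```

The walker never inspects anything specific to HL7v2 or Markdown, so the same utility serves both trees.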
Building Glion's AST on top of unified was a deliberate choice, made to leverage the ecosystem and the proven design of unified and unist. Glion's AST is therefore not a custom design, but an application of a well-known and widely used standard for syntax trees in the JavaScript ecosystem.
Further reading
- unist specification: the AST contract Glion obeys.
- unist-util-visit: the walker most plugins use.
- Message structure: the HL7v2 hierarchy this AST mirrors.