ASN.1 Utilities

Utility for reading and writing ASN.1 based content, including handling BER and DER.

The Ups and Downs of ASN.1

Pros

  • Encoding independence. The ASN.1 IDL (schema) is entirely decoupled from the wire format. A single schema can be used with BER, DER, PER, XER, JER, or OER without any changes to the type definitions. This makes it uniquely adaptable: the same protocol definition can target a compact binary transport and a human-readable XML or JSON transport simultaneously.

  • Highly standardized. ASN.1 is defined by ITU-T X.680–X.696 and the corresponding ISO 8824/8825 series. Every encoding rule (BER, DER, PER, …) is itself a formal standard, not a convention or implementation detail. This rigour is why ASN.1 is the foundation of X.509 certificates, PKCS/CMS cryptographic messages, SNMP, LDAP, Kerberos, and most of the LTE/5G air interface protocols — domains where interoperability is non-negotiable and correctness is safety-critical.

  • Rich, formal type system with built-in constraints. The IDL supports value range constraints (INTEGER (1..255)), size constraints (UTF8String (SIZE (1..64))), alphabet constraints (FROM ("A".."Z")), component constraints (WITH COMPONENTS), and more. Constraints are part of the schema, not application logic, so a conformant codec can validate them automatically.

  • Deterministic canonical encoding. DER (Distinguished Encoding Rules) is a strict subset of BER that produces exactly one valid byte sequence for any value. This property is essential for cryptographic use cases (signature verification, certificate fingerprinting) and for protocol interoperability where two independent implementations must produce identical output.

  • Forward and backward compatibility. Extension markers (...) allow new fields to be added to SEQUENCE and SET types without breaking existing decoders. A decoder that does not know about an extension field simply ignores it — the protocol evolves gracefully over decades, as seen in cellular standards that have been extended continuously since the 1980s.

  • Self-describing in BER. The BER Type-Length-Value encoding embeds the tag of every value on the wire. A generic BER decoder can walk any encoded message and display its structure without the schema — useful for diagnostics and protocol analysis tools.

  • Information Object Classes. The class mechanism (X.681) allows strongly typed, parameterised table-driven protocols: a SEQUENCE field can be constrained to carry exactly the type identified by a companion OID field, enforcing relationships between fields at the schema level rather than at runtime.
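To make the self-describing property of BER concrete, here is a minimal sketch of a schema-less TLV walker. The class name TlvWalker is hypothetical and not part of this library; the sketch assumes definite lengths and low tag numbers (below 31) only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not this library's API. */
final class TlvWalker {
    /** Walks definite-length BER/DER bytes and returns one line per element. */
    static List<String> walk(byte[] data) {
        List<String> out = new ArrayList<>();
        walk(data, 0, data.length, 0, out);
        return out;
    }

    private static void walk(byte[] data, int start, int end, int depth, List<String> out) {
        int pos = start;
        while (pos < end) {
            int identifier = data[pos++] & 0xFF;
            int tagClass = identifier >> 6;               // 0 UNIVERSAL, 1 APPLICATION, 2 context, 3 PRIVATE
            boolean constructed = (identifier & 0x20) != 0;
            int tagNumber = identifier & 0x1F;
            if (tagNumber == 0x1F) {
                throw new IllegalArgumentException("high tag numbers are not handled in this sketch");
            }
            int length = data[pos++] & 0xFF;
            if (length == 0x80) {
                throw new IllegalArgumentException("indefinite length is not handled in this sketch");
            }
            if (length > 0x80) {                          // long form: low 7 bits give the octet count
                int count = length & 0x7F;
                if (count > 4) {
                    throw new IllegalArgumentException("length too large for this sketch");
                }
                length = 0;
                for (int i = 0; i < count; i++) {
                    length = (length << 8) | (data[pos++] & 0xFF);
                }
            }
            if (length < 0 || length > end - pos) {
                throw new IllegalArgumentException("length runs past the end of the data");
            }
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");                        // two spaces per nesting level
            }
            line.append("class=").append(tagClass)
                .append(" tag=").append(tagNumber)
                .append(" len=").append(length)
                .append(constructed ? " constructed" : " primitive");
            out.add(line.toString());
            if (constructed) {
                walk(data, pos, pos + length, depth + 1, out);   // content is itself a run of TLVs
            }
            pos += length;
        }
    }
}
```

Feeding it the bytes 30 07 02 01 05 0C 02 68 69 yields a constructed UNIVERSAL 16 (SEQUENCE) containing a UNIVERSAL 2 (INTEGER) and a UNIVERSAL 12 (UTF8String), without any schema involved.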

Cons

  • The ASN.1 IDL syntax is very complicated. The grammar has evolved over 40 years of ITU/ISO standards and accumulated significant complexity: multiple ways to write the same construct, context-sensitive parsing rules, and corner cases that trip up even experienced implementors. Reading a real-world .asn1 file (e.g. from a 3GPP spec) requires considerable domain knowledge.

  • Very limited OSS tooling. Unlike protobuf or JSON, the ASN.1 ecosystem is dominated by expensive commercial compilers. Open-source alternatives for IDL parsing, BER/DER codecs, and code generation are sparse and often incomplete. This library exists in part to address that gap for the JVM.

  • Non-human-readable binary encodings. BER and DER bytes are opaque without a decoder. While XER and JER exist, they are rarely used in practice, so debugging live traffic requires specialised tooling rather than a plain text editor or curl.

  • Steep learning curve. The combination of the complex IDL, multiple encoding rules with subtle differences, and sparse documentation means onboarding a new developer is significantly harder than for protobuf or JSON-based systems.

  • Niche community. ASN.1 expertise is concentrated in telecom, cryptography, and defense. General-purpose web and backend development communities have almost no exposure to it, making recruitment and knowledge transfer harder.

Comparison to alternatives

vs. protobuf (and gRPC)

Protobuf is the most direct competitor: both are binary, IDL-first, and designed for compact, efficient encoding.

| Aspect | ASN.1 (DER/BER) | Protobuf |
|---|---|---|
| Standardisation | ITU-T/ISO formal standard | Google-maintained, de-facto |
| Schema language | Rich, complex; constraints, classes, multiple encoding rules | Simple, approachable; field numbers, oneof, maps |
| Encoding options | BER, DER, PER, XER, JER, OER (all standardised) | Binary wire format + JSON mapping |
| Canonical encoding | DER is fully deterministic; required for crypto | Not canonical by default; field order is unspecified |
| Compactness | PER is among the most compact binary formats that exist | Compact, but not as tunable as Aligned/Unaligned PER |
| Tooling | Sparse OSS ecosystem; commercial tools dominate | Excellent: protoc, plugins for every major language |
| Constraints | First-class schema-level (size, value range, alphabet) | None in schema; must validate in application code |
| Extensibility | Extension markers (...) with formal semantics | New fields with new field numbers; unknown fields forwarded |
| Adoption | Telecom, PKI, SNMP, LDAP, Kerberos | Microservices, gRPC, storage formats (BigQuery, etc.) |

Bottom line: if you are designing a new service inside a modern infrastructure, protobuf is almost always the better practical choice — the tooling, community, and simplicity win. ASN.1 is the right answer when the protocol must be formally standardised, interoperate with existing telecom or security infrastructure, or requires canonical binary encoding (e.g. for signing).

vs. JSON (e.g. Jackson)

JSON is the de-facto standard for web APIs and configuration; Jackson is the dominant Java library for working with it.

| Aspect | ASN.1 (DER/BER) | JSON / Jackson |
|---|---|---|
| Human readability | Not readable without a decoder | Fully human-readable |
| Compactness | Highly compact (binary TLV or PER) | Verbose; field names repeated in every message |
| Schema | Formal IDL with enforced constraints | Optional (JSON Schema); rarely enforced at codec level |
| Type fidelity | Full: distinct integer sizes, binary data, OIDs, dates | Weak: no integers vs. floats, no native binary, no date type |
| Number precision | Arbitrary-precision integers, exact decimals | IEEE 754 doubles; large integers lose precision |
| Binary data | Native OCTET STRING and BIT STRING types | Requires Base64 encoding; 33 % size overhead |
| Tooling / ecosystem | Sparse | Ubiquitous: every language has multiple mature libraries |
| Developer experience | High barrier; specialist knowledge required | Minimal barrier; every web developer knows JSON |
| Interoperability | Guaranteed by formal standards | Loose; behaviour differences across parsers (number limits, key order, duplicate keys) |
| Streaming / partial decode | BER supports indefinite-length encoding | Requires streaming JSON parsers; less standardised |

Bottom line: JSON/Jackson wins on developer experience, ecosystem breadth, and debuggability for everything that lives near a browser or HTTP API. ASN.1 wins on encoding efficiency, type safety, and formal correctness for bandwidth-constrained or security-critical protocols. For a JVM microservice exchanging data over HTTP, JSON is the pragmatic default; for an IoT device sending telemetry over a constrained radio link, ASN.1 PER is often worth the complexity.
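As a rough worked example of the binary-data row above, the following sketch compares the size of a DER OCTET STRING with the Base64 text needed to carry the same payload in JSON. BinaryOverhead is a hypothetical name used only for this illustration; the JSON figure leaves out the surrounding quotes and field name.

```java
import java.util.Base64;

/** Hypothetical helper for illustration only; not this library's API. */
final class BinaryOverhead {
    /** Size in bytes of a DER OCTET STRING holding payloadLength content bytes. */
    static int derOctetStringSize(int payloadLength) {
        int lengthOctets;
        if (payloadLength < 0x80) {
            lengthOctets = 1;                            // short form: a single length octet
        } else {
            int n = 0;
            for (int v = payloadLength; v > 0; v >>>= 8) {
                n++;                                     // minimal number of octets for the length value
            }
            lengthOctets = 1 + n;                        // long form: count octet plus n length octets
        }
        return 1 + lengthOctets + payloadLength;         // tag + length + content
    }

    /** Number of characters Base64 needs for the same payload. */
    static int base64Size(int payloadLength) {
        return Base64.getEncoder().encodeToString(new byte[payloadLength]).length();
    }
}
```

For a 300 byte payload this gives 304 bytes as a DER OCTET STRING against 400 Base64 characters, which is where the 33 % figure comes from.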

The ASN.1 Syntax

There is plenty of material available for figuring out the ASN.1 IDL syntax. Among others, I have used various collections of ASN.1 files found on the internet, and publicly available books.

First a number of terms and what is meant by them:

  • identifier: An identifier is a string matching [a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)*.
  • Type-Name: A type is identified by an identifier that starts with an upper-case letter. It names a type that is either defined in the ASN.1 IDL or taken from the UNIVERSAL namespace. Note that the UNIVERSAL type namespace has a number of types whose names contain a space, like OCTET STRING, BIT STRING or OBJECT IDENTIFIER. Only these universally known types may contain the space character as part of the name.
  • value-name: A value (non-type field or identifier) starts with a lower-case letter, and is a name pointing to a value with a type.
  • SubType: A modification of a type, usually enclosed in (...), but sometimes not; exceptions will be mentioned later.
  • String literals may be:
    • "Double quoted for unicode string"
    • '0100'B for encoding bit string, with single-quotes.
    • '917421436587'H for hexadecimal encoded octet strings, with single-quotes.
    • 'foo' or {'foo' 9 'bar'} for ASCII encoded strings (aka IA5 in the ASN.1 spec): either a single-quoted string, or a brace-enclosed sequence of single-quoted strings and decimal code points, where the code points can represent non-printable characters.
  • Integer numbers are of the form -?[1-9][0-9]*, and can represent all
  • It is not allowed to represent decimal form numbers (with a dot), but are defined
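The naming rules above can be sketched as a small classifier. This is only an illustration under the assumption that the identifier pattern is [a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)*; the class Asn1Names is hypothetical, and the UNIVERSAL type names that contain a space would need separate handling.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not this library's API. */
final class Asn1Names {
    // Letters and digits in hyphen-separated groups; no leading digit, no double or trailing hyphen.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    /** Type names start with an upper-case letter. */
    static boolean isTypeName(String s) {
        return isIdentifier(s) && Character.isUpperCase(s.charAt(0));
    }

    /** Value names start with a lower-case letter. */
    static boolean isValueName(String s) {
        return isIdentifier(s) && Character.isLowerCase(s.charAt(0));
    }
}
```

With this, Type-Name classifies as a type name and value-name as a value name, while bad--name, trailing- and 9lives are rejected.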

Each ASN.1 file is referred to as a Module.

Module-Name { module object id }
DEFINITIONS
-- add file options here
-- ( AUTOMATIC | IMPLICIT | EXPLICIT )? TAGS
BEGIN

-- put stuff here

END

Note that the ID part is not mandatory, but if it is missing, this ASN.1 spec cannot be imported into other ASN.1 spec files. If EXPORTS is specified, then ONLY those types and values may be referenced outside this module. If no EXPORTS clause is present, then ALL types and values may be referenced outside the module.

EXPORTS Type-Name, Type-Name2;
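The export rule can be sketched as a single predicate. The class ModuleExports and its parameters are hypothetical; a null exports set stands in for a module without an EXPORTS clause.

```java
import java.util.Set;

/** Hypothetical helper for illustration only; not this library's API. */
final class ModuleExports {
    /**
     * @param name         the type or value name another module wants to reference
     * @param definedNames everything defined in the exporting module
     * @param exports      the names listed after EXPORTS, or null when the clause is absent
     */
    static boolean isVisibleOutside(String name, Set<String> definedNames, Set<String> exports) {
        if (!definedNames.contains(name)) {
            return false;                     // cannot export what the module does not define
        }
        // No EXPORTS clause: everything is visible. Otherwise only the listed names are.
        return exports == null || exports.contains(name);
    }
}
```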

Any type or value that is not defined in the module itself or in the UNIVERSAL type namespace must be imported before it can be used in this module.

IMPORTS
   Type-Name, value-name FROM Other-File-Spec { id }
   Type-Name2 FROM Third-File-Spec { id };

Note that the ID part is mandatory when importing.

Type-Name ::= `Type-Definition`
Class-Name ::= CLASS `Class-Specification`
Type-Set `Type-Specification` ::= { v1 | v2 }

value-name ::= `Type-Specification` `value`
value-name `Type-Specification` ::= `value`
value-name `Type-Specification` ::= { `value specification` }

Here I use the term Type definition for how a type is defined, while the term Type specification is a reference to a type with optional sub-typing. The latter has less flexibility in what it can specify, but both can take a type and modify it before using it in its relative position.

And yes, for some reason both ways of defining values are allowed (:facepalm:).

Classes

Class-Name ::= CLASS {
  &Class-Type,
  &class-value Value-Type
} WITH SYNTAX {
  [`optional string containing one or more fields from above`]
  `string containing fields from above`
  `in total all fields must be referenced exactly once`
}
class-Value Class-Name ::= {
  SYNTAX WORD value-name
  MORE WORDS Type-Name
  EVEN MORE WORDS 42
}
Class-Object-Set Class-Name ::= { class-Value | class-Value-2, ... }

-- AFAIK only SEQUENCE types may be templated in this way.
Templated-Type ::= SEQUENCE {
  identifier  Class-Name.&class-value({Class-Object-Set}),
  argument    Class-Name.&Class-Type({Class-Object-Set}{@identifier})
}
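One way to picture the Templated-Type above at runtime is a lookup table keyed by the identifier value, where each entry knows how to decode the matching argument. This is only a sketch of the idea: ObjectSetTable is a hypothetical name, identifiers are plain strings, and the decoders are arbitrary functions rather than real ASN.1 codecs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper for illustration only; not this library's API. */
final class ObjectSetTable {
    // One entry per object in the object set: identifier value -> decoder for the argument type.
    private final Map<String, Function<byte[], Object>> decoders = new HashMap<>();

    ObjectSetTable register(String identifier, Function<byte[], Object> decoder) {
        decoders.put(identifier, decoder);
        return this;
    }

    /** Decodes the argument with the decoder selected by the companion identifier field. */
    Object decodeArgument(String identifier, byte[] content) {
        Function<byte[], Object> decoder = decoders.get(identifier);
        if (decoder == null) {
            throw new IllegalArgumentException("identifier not in object set: " + identifier);
        }
        return decoder.apply(content);
    }
}
```

An identifier that is not in the table is rejected, which mirrors the table constraint in the schema; an extensible object set (with ...) would instead have to keep the argument as an opaque value.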
  • TODO: Figure out the ANY DEFINED BY type. It looks like the only reasonable way to represent it is to have a TLV of unknown type, and let the implementers connect the dots.


Constructed types.

There are three kinds of constructed types:

  • CHOICE
  • SEQUENCE
  • SET
Constructed-Type ::= Base-Type {
   component Component-Type,
   ..., -- either this or ...
   ...! err1 err2, -- <- exception marker ?????
   extended Component-Type-2 -- for version2
}
err1 INTEGER ::= 501
err2 INTEGER ::= 503

Constructed types and extensibility.

Normally constructed types have a fixed component set, but by either adding the ... extensible-marker component, or by setting.

Choice

A choice is a meta-type that signifies the presence of exactly one of a set of alternatives.

Choice-Type ::= CHOICE {
  first-choice [0] Value-Type-1,
  second-choice [1] Value-Type-2
}
  • Matching a type against a CHOICE is the same as matching against any of its fields. A CHOICE is not encoded into its own constructed group unless it is used in an EXPLICIT numbered field in a sequence or set.
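Since a CHOICE adds no tag of its own, a decoder can pick the alternative directly from the identifier octet of the value it finds. The sketch below assumes context-specific tags below 31, as in the Choice-Type example; ChoiceMatcher is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical helper for illustration only; not this library's API. */
final class ChoiceMatcher {
    /**
     * Returns the name of the alternative whose context-specific tag number matches
     * the identifier octet, or null when no alternative matches.
     */
    static String match(Map<Integer, String> alternativesByTag, int identifierOctet) {
        int tagClass = (identifierOctet & 0xFF) >> 6;   // 2 means context-specific
        int tagNumber = identifierOctet & 0x1F;         // bit 0x20 (constructed) is ignored here
        if (tagClass != 2) {
            return null;
        }
        return alternativesByTag.get(tagNumber);
    }
}
```

With first-choice registered under 0 and second-choice under 1, the octets 0x80 and 0xA0 both select first-choice and 0x81 selects second-choice, while 0x02 (a UNIVERSAL INTEGER) matches nothing.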

Sequence and Set

Sequences and sets are pretty similar, except that the SEQUENCE type can take advantage of the fact that fields must come in the order they were defined.

Sub-Types and constraints

String-Type ::= UTF8String (SIZE (from..to)) -- bounded
String-Type ::= UTF8String (SIZE (from..)) -- unbounded on higher numbers
String-Type ::= UTF8String (SIZE (fixed)) -- fixed size
String-Type ::= IA5String (FROM ("A" .. "Z")) -- limited alphabet
String-Type ::= IA5String (FROM ("0" .. "9" | "A" .. "Z")) (SIZE (1 ..)) -- limited alphabet, at least one character
Integer-Type ::= INTEGER (min..max) -- integer with bounded value.
Integer-Type ::= INTEGER (min..) -- integer with lower-bound value.
Integer-Type ::= INTEGER (..max) -- integer with upper-bound value.
Sequence-Type ::= Origin-Type (WITH COMPONENTS {..., `additional components`})
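A codec could enforce the size, alphabet and value range constraints above with checks along these lines. This is a minimal sketch: Constraints and its method names are hypothetical, an open upper bound is passed as Integer.MAX_VALUE or Long.MAX_VALUE, and SIZE is assumed to count characters rather than encoded bytes.

```java
/** Hypothetical helper for illustration only; not this library's API. */
final class Constraints {
    /** SIZE (min..max), counted in characters (code points). */
    static boolean sizeInRange(String value, int min, int max) {
        int size = value.codePointCount(0, value.length());
        return size >= min && size <= max;
    }

    /** FROM (...): every character must fall inside one of the inclusive ranges. */
    static boolean permittedAlphabet(String value, char[][] ranges) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean allowed = false;
            for (char[] range : ranges) {
                if (c >= range[0] && c <= range[1]) {
                    allowed = true;
                    break;
                }
            }
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    /** INTEGER (min..max), both bounds inclusive. */
    static boolean valueInRange(long value, long min, long max) {
        return value >= min && value <= max;
    }
}
```

For example, IA5String (FROM ("0".."9" | "A".."Z")) (SIZE (1..)) corresponds to requiring both permittedAlphabet with the two ranges and sizeInRange(value, 1, Integer.MAX_VALUE).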

For extending a sequence with additional components: They will be inserted where the extension point is, or at the end if