Information technology - SoftWare Hash IDentifier (SWHID) Specification V1.2

This specification defines a standard data format for referencing software artifacts that match the data model of modern distributed version control systems. This format includes the typical tree-like structure of a filesystem hierarchy, but also, special nodes to track revisions and releases, as well as the full status of a version control system, with all its development branches. A key property of SWHIDs is that they can be computed using cryptographically strong functions directly from the digital objects they refer to, by anyone that has access to a copy of those objects. This enables decentralised and independent verification of integrity, without relying on a registry or a central authority. The computation of the SWHID identifiers is based on Merkle Acyclic Directed Graphs, a natural generalization of Merkle trees. The resolution of SWHIDs, that is, the process of obtaining a copy of a digital artifact corresponding to a given SWHID, is outside the scope of this specification.

General Information

Status
Published
Publication Date
22-Apr-2025
Current Stage
6060 - International Standard published
Start Date
23-Apr-2025
Due Date
03-Jan-2026
Completion Date
23-Apr-2025

Overview

ISO/IEC 18670:2025 - SoftWare Hash IDentifier (SWHID) Specification V1.2 defines a standardized data format for intrinsic identifiers that reference software artifacts modeled after modern distributed version control systems. SWHIDs encode the tree-like filesystem hierarchy, special nodes for revisions, releases, and the full VCS state (branches, snapshots). A core design goal is decentralized integrity verification: SWHIDs are computable directly from the digital objects (using cryptographically strong functions and Merkle Acyclic Directed Graphs), so anyone with a copy of the objects can independently verify identity and integrity without a central registry. Note: the specification covers identifier syntax and computation; resolving identifiers (retrieving artifacts) is explicitly out of scope.

Key Topics and Requirements

  • Identifier structure and syntax: SWHIDs use a compact core identifier format (scheme version "1") with object-type tags such as cnt (content), dir (directory), rev (revision), rel (release), and snp (snapshot). Qualified identifiers may include contextual qualifiers after semicolons.
  • Intrinsic identifiers: Identifiers are derived from artifact content and metadata so they are reproducible and tamper-evident.
  • Merkle Acyclic Directed Graphs (Merkle ADGs): Computation of SWHIDs is based on Merkle ADGs, a generalization of Merkle trees suited to version control graphs.
  • Cryptographic functions and SHA1: The specification references SHA‑1 (RFC‑3174) and treats it as a partial function: when counter-cryptanalysis detects a Shattered-style collision during hashing, no SHA‑1 value is returned, so collision-prone files cannot be referenced ambiguously.
  • Qualifiers and fragments: Support for context qualifiers (origin, visit, anchor, path) and fragment qualifiers (lines, bytes) lets users pinpoint subparts or contextual views of artifacts.
  • Compatibility: The standard addresses compatibility considerations with established VCS tools (e.g., Git).
  • Formal grammar: ABNF grammar (RFC‑5234) and IRI handling (RFC‑3987) define exact syntax and percent-encoding rules.

Applications and Who Uses It

  • Software archive and preservation projects (for long-term reproducible identifiers)
  • Software supply chain security and provenance tracking (integrity checks, auditability)
  • Package managers, dependency scanners and SBOM tools (precise artifact referencing)
  • Source code repositories and CI/CD systems (traceability across builds and branches)
  • Researchers and reproducible science practitioners (exact version referencing)
  • Security teams and compliance auditors (tamper detection and verification without centralized trust)

Related Standards

  • RFC‑3174 (SHA‑1), RFC‑3986 / RFC‑3987 (URI/IRI syntax), RFC‑5234 (ABNF)
  • The specification references Software Heritage as an origin of examples but does not require its use.

ISO/IEC 18670:2025 is a practical, interoperable approach for uniquely and verifiably identifying software artifacts at all granularities across distributed development ecosystems. Keywords: SWHID, software hash identifier, intrinsic identifier, Merkle ADG, SHA1, software provenance, software supply chain, integrity verification.

Standard
ISO/IEC 18670:2025 - Information technology - SoftWare Hash IDentifier (SWHID) Specification V1.2
Released: 23.04.2025
English language
14 pages

Frequently Asked Questions

ISO/IEC 18670:2025 is a standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology - SoftWare Hash IDentifier (SWHID) Specification V1.2". It defines a standard data format for referencing software artifacts that match the data model of modern distributed version control systems. SWHIDs can be computed directly from the objects they refer to, which allows decentralised verification of integrity. The resolution of SWHIDs is outside the scope of the specification.

ISO/IEC 18670:2025 is classified under the following ICS (International Classification for Standards) categories: 35.040.10 - Coding of character sets. The ICS classification helps identify the subject area and facilitates finding related standards.

Standards Content (Sample)


International Standard
ISO/IEC 18670
First edition
2025-04
Information technology - SoftWare Hash IDentifier (SWHID) Specification V1.2
© ISO/IEC 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Syntax
5 Core identifiers
5.1 General
5.2 Contents
5.3 Directories
5.4 Revisions
5.5 Releases
5.6 Snapshots
5.7 Compatibility with Git
6 Qualified identifiers
6.1 Qualifiers
6.2 Fragment qualifiers
6.2.1 General
6.2.2 Lines qualifier
6.2.3 Bytes qualifier
6.3 Context qualifiers
6.3.1 General
6.3.2 Origin qualifier
6.3.3 Visit qualifier
6.3.4 Path qualifier
6.3.5 Anchor qualifier
6.4 Comparing qualified SWHIDs
6.5 Recommendations
Annex A (informative) Specification versioning
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by JDF [as The SoftWare Hash Identifier (SWHID) Specification Version 1.0]
and drafted in accordance with its editorial rules. It was adopted, under the JTC 1 PAS procedure, by Joint
Technical Committee ISO/IEC JTC 1, Information technology.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.

Introduction
Modern software relies heavily on open source components that are developed collaboratively in a
distributed setting, and that are assembled to create complex systems that evolve at a fast pace.
This has strengthened the need to precisely track, ensure availability, and guarantee integrity of the
components that go into a given system for a variety of stakeholders. Academia needs to ensure that
research results are reproducible, industry needs to improve the traceability of the software supply chain,
and developer communities need tools to cope with the increasing complexity.
A key building block for addressing this issue is a system of intrinsic identifiers that allows users to precisely
pinpoint the exact version of any software artifact, at all levels of granularity, without relying on any central
registry or naming authority.
With this specification, the SWHID working group makes such a system of intrinsic identifiers, originally
developed for the Software Heritage [1] universal source code archive, available to all stakeholders.
For the sake of clarity, examples have been drawn directly from the Software Heritage archive; however, it
is important to note that systems for the persistent archival of software artifacts, as well as resolution of
SWHIDs, are outside the scope of this specification, which does not require the use of Software Heritage.

International Standard ISO/IEC 18670:2025(en)
Information technology — SoftWare Hash IDentifier (SWHID)
Specification V1.2
1 Scope
This specification defines a standard data format for referencing software artifacts that match the data
model of modern distributed version control systems.
This format includes the typical tree-like structure of a filesystem hierarchy, but also, special nodes to
track revisions and releases, as well as the full status of a version control system, with all its development
branches.
A key property of SWHIDs is that they can be computed using cryptographically strong functions directly
from the digital objects they refer to, by anyone that has access to a copy of those objects. This enables
decentralised and independent verification of integrity, without relying on a registry or a central authority.
The computation of the SWHID identifiers is based on Merkle Acyclic Directed Graphs, a natural
generalization of Merkle trees.
The resolution of SWHIDs, that is, the process of obtaining a copy of a digital artifact corresponding to a
given SWHID, is outside the scope of this specification.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
RFC-3174, US Secure Hash Algorithm 1 (SHA1), The Internet Society Network Working Group, https://tools.ietf.org/html/rfc3174
RFC-3986, Uniform Resource Identifier (URI): Generic Syntax, The Internet Society Network Working Group, https://tools.ietf.org/html/rfc3986
RFC-3987, Internationalized Resource Identifiers (IRIs), The Internet Society Network Working Group, https://tools.ietf.org/html/rfc3987
RFC-5234, Augmented BNF for Syntax Specifications: ABNF, The Internet Society Network Working Group, https://tools.ietf.org/html/rfc5234
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
branch
parallel line of development in a version control system (3.7), that stems from the main line

3.2
Git
distributed version control system (3.7) created by Linus Torvalds in 2005
3.3
hierarchical file system
method of organizing and managing files in a computer where data is stored hierarchically
3.4
intrinsic identifier
identifier that can be computed directly from the object that it identifies, without needing access to a registry
3.5
repository
storage location for software development artifacts (3.8) including but not limited to source code, build
scripts, and documentation
3.6
SHA1
Secure Hash Algorithm 1
hash function that takes as input a sequence of bytes and produces a 160-bit (20-byte) hash value
Note 1 to entry: The returned value is called SHA1 checksum, or simply SHA1 when there is no risk of ambiguity
between the function and the returned value. A detailed description of how to compute SHA1 is available in RFC-3174.
In the wake of the Shattered attack of 2017 (see [3]), it is now possible to produce collision-prone files that are
different but return the same SHA1 checksums. It is however possible to detect, during SHA1 computation,
such SHA1-colliding files using counter-cryptanalysis (see [2]).
As collision-prone files are problematic from the point of view of unequivocal identification and integrity
verification, the SWHID standard takes measures to avoid that such files are referenced using only SHA1
checksums. For the purpose of this specification, the SHA1 function is therefore considered to be a partial
function, that only returns a value when a Shattered-style collision is not detectable using the techniques
described in [2] and the reference implementation of it available at
https://github.com/cr-marcstevens/sha1collisiondetection
(Git commit ID 38096fc021ac5b8f8207c7e926f11feb6b5eb17c, or version stable-v1.0.3).
When such a collision is detected during SHA1 computation, no SHA1 can be obtained for the object in question
and hence, depending on the context, a valid SWHID might not exist for it.
In most cases, SHA1s in this specification are computed on objects after adding specific headers to them,
making "trivial" collision-prone files still perfectly valid and hence referenceable using SWHIDs.
3.7
version control system
revision control system
source control system
software tool that helps manage different versions of software development artifacts (3.8)
3.8
software artifact
object
representation of a distinct entity identifiable by a SWHID
3.9
metadata
supplementary information associated with a software artifact (3.8)
3.10
UNIX epoch
time reference point that denotes the precise moment at 00:00:00 Coordinated Universal Time (UTC) on 1
January 1970
4 Syntax
A SWHID consists of two separate parts: a mandatory core_identifier that can identify any software artifact,
and an optional list of qualifiers that allows specification of the context where the object is meant to be seen
and that points to a subpart of the object itself.
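Syntactically, SWHIDs are generated by the <identifier> entry point in the following grammar (which uses
notation defined by RFC-5234):
    <identifier> ::= <identifier_core> [ <qualifiers> ] ;
    <identifier_core> ::= "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
    <scheme_version> ::= "1" ;
    <object_type> ::=
        "snp"  (* snapshot *)
      | "rel"  (* release *)
      | "rev"  (* revision *)
      | "dir"  (* directory *)
      | "cnt"  (* content *)
      ;
    <object_id> ::= 40 * <hex_digit> ;  (* intrinsic id, hex-encoded *)
    <dec_digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
    <hex_digit> ::= <dec_digit> | "a" | "b" | "c" | "d" | "e" | "f" ;
    <qualifiers> ::= ";" <qualifier> [ <qualifiers> ] ;
    <qualifier> ::= <context_qualifier> | <fragment_qualifier> ;
    <context_qualifier> ::= <origin_ctxt> | <visit_ctxt> | <anchor_ctxt> | <path_ctxt> ;
    <origin_ctxt> ::= "origin" "=" <url_escaped> ;
    <visit_ctxt> ::= "visit" "=" <identifier_core> ;
    <anchor_ctxt> ::= "anchor" "=" <identifier_core> ;
    <path_ctxt> ::= "path" "=" <path_absolute_escaped> ;
    <fragment_qualifier> ::= "lines" "=" <range> | "bytes" "=" <range> ;
    <range> ::= <number> ["-" <number>] ;
    <number> ::= <dec_digit> + ;
    <url_escaped> ::= (* RFC 3987 IRI *)
    <path_absolute_escaped> ::= (* RFC 3987 absolute path *)
The last two symbols are defined as:
— <url_escaped> is an IRI as defined in RFC-3987; and
— <path_absolute_escaped> is an ipath-absolute from RFC-3987.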
In both of these, all occurrences of ; (and %, as required by the RFC) have been percent-encoded (as %3B
and %25 respectively). Other characters may be percent-encoded, for example, to improve readability and/or
embeddability of SWHID in other contexts.
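The percent-encoding rule and the semicolon-separated qualifier list can be illustrated with a short Python sketch. It is not part of the specification: the function names `escape_qualifier_value` and `split_swhid` and the example.org origin are hypothetical, the input to the escaping function is assumed to be a raw IRI or path whose `;` and `%` characters are literal, and no grammar validation beyond the `key=value` shape is attempted.

```python
def escape_qualifier_value(value: str) -> str:
    """Percent-encode '%' and ';' in a qualifier value (hypothetical helper).

    '%' is replaced first so that the '%' introduced by encoding ';'
    is not encoded a second time.
    """
    return value.replace("%", "%25").replace(";", "%3B")


def split_swhid(swhid: str) -> tuple[str, dict[str, str]]:
    """Split a SWHID into its core identifier and a dict of qualifiers.

    Qualifier values are returned still percent-encoded. Because every
    literal ';' inside a value has been encoded as %3B, splitting on ';'
    is unambiguous.
    """
    core, *parts = swhid.split(";")
    qualifiers: dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed qualifier: {part!r}")
        qualifiers[key] = value
    return core, qualifiers


if __name__ == "__main__":
    origin = escape_qualifier_value("https://example.org/a;b%c")
    swhid = (
        "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"
        f";origin={origin};lines=9-15"
    )
    print(split_swhid(swhid))
```

Keeping the values encoded after splitting leaves the decision of when to decode them to the caller.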
5 Core identifiers
5.1 General
A core identifier is composed of four fields, each separated by a colon (:), as follows:
The first field is the type of the identifier, and it is defined to be swh.
The second field is the version of the identifier scheme and for this version of the specification it is defined to be 1.
The third field is a tag corresponding to the type of object identified, as follows:
— cnt for contents (see 5.2)
— dir for directories (see 5.3)
— rev for revisions (see 5.4)
— rel for releases (see 5.5)
— snp for snapshots (see 5.6)
The fourth field is the intrinsic identifier of the object. This is a hex-encoded (using lowercase ASCII
characters) hash value computed from the content and relevant metadata of the object.
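As an illustration only, the four fields described above can be checked with a regular expression. The sketch below is not taken from the specification; the names `CORE_RE` and `parse_core` are hypothetical, and it only validates the shape of a core identifier, not whether the hash corresponds to any real object.

```python
import re

# Four colon-separated fields: "swh", scheme version "1", an object-type
# tag, and 40 lowercase hexadecimal digits (a hex-encoded 20-byte hash).
CORE_RE = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})")


def parse_core(text: str) -> tuple[str, str]:
    """Return (object_type, object_id) or raise ValueError (hypothetical helper)."""
    match = CORE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a core SWHID: {text!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_core("swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"))
```

Using `fullmatch` rather than `match` rejects trailing characters, and the character class rejects uppercase hex digits, in line with the lowercase requirement stated above.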
5.2 Contents
A content is a byte sequence, typically, the content of a file. For this type of object, the intrinsic identifier is
the hash of it; that is, the SHA1 of the byte sequence obtained by concatenating:
— the ASCII string "blob" (4 bytes),
— an ASCII space,
— the length of the content as ASCII-encoded decimal digits,
— a NULL byte,
— and the actual content of the file.
No metadata is used for this type of object (in particular, notice that there is no file name mentioned here).
As an example, swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2 is the SWHID computed from
the full text of the GPL3 license.
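The concatenation described above can be sketched in a few lines of Python. This is an illustrative sketch, not a conforming implementation: the function name `content_swhid` is hypothetical, and it uses the plain `hashlib.sha1` function, which does not perform the collision detection that this specification requires of SHA1 (see 3.6).

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute the core SWHID of a content object (hypothetical helper).

    Assumption: plain SHA1 is used, without counter-cryptanalysis,
    so Shattered-style colliding inputs are not rejected here.
    """
    # Header: "blob", a space, the decimal length, and a NULL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    print(content_swhid(b""))
    # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(content_swhid(b"hello world\n"))
    # swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because only the bytes and their length enter the hash, two files with the same content but different names yield the same identifier.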
5.3 Directories
Directories are data structures commonly used in hierarchical file systems to group together files and other
directories, and to hold relevant metadata about them, in the form of directory entries.
This specification adopts the same convention as the Git version control system, and only takes into account
as metadata the name of the directory entries (as a sequence of arbitrary bytes, excluding ASCII '/' and the
NULL byte) and a simplified representation of the access rights.
The names of entries in a directory shall be distinct from one another.
In order to compute the intrinsic identifier of a directory, it is necessary to compute first the SWHID of each
object listed in that directory.
Then a serialization of the directory is created, as follows:
1. sort the directory entries using the following algorithm:
a. for each entry pointing to a directory, append an ASCII '/' to its name
b. sort all entries using the byte order of their (modified) name
2. for each entry, with a given name (unmodified), add a sequence of bytes composed of
a. the normalized access rights, encoded as a sequence of ASCII-encoded octal digits ('100644' for
regular [i.e., non-special] files, '100755' for executable files, '120000' for symbolic links, and '40000'
for directories),
b. an ASCII space,
c. the name as a string of bytes,
d. a NULL byte,
e. the intrinsic identifier of the content or directory, encoded as a sequence of 20 bytes.
The intrinsic identifier of the directory is the SHA1 of the byte sequence obtained by concatenating
— the ASCII string "tree" (4 bytes),
— an ASCII space,
— the length of the previously obtained serialization as ASCII-encoded decimal digits,
— a NULL byte,
— and the previously obtained serialization.
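The sorting, serialization, and hashing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Entry`, `serialize_directory`, and `directory_swhid` are hypothetical, only the four access-right values listed above are considered, and plain `hashlib.sha1` is used without the collision detection described in 3.6.

```python
import hashlib
from typing import Iterable, NamedTuple

DIR_MODE = b"40000"


class Entry(NamedTuple):
    """One directory entry (hypothetical type for this sketch)."""
    name: bytes    # entry name, without '/' or NULL bytes
    mode: bytes    # b"100644", b"100755", b"120000" or b"40000"
    target: bytes  # 20-byte intrinsic identifier of the entry


def _sort_key(entry: Entry) -> bytes:
    # Step 1: directories are sorted as if their name ended with '/'.
    return entry.name + b"/" if entry.mode == DIR_MODE else entry.name


def serialize_directory(entries: Iterable[Entry]) -> bytes:
    entries = list(entries)
    names = [e.name for e in entries]
    if len(set(names)) != len(names):
        raise ValueError("entry names shall be distinct")
    # Step 2: mode, space, unmodified name, NULL byte, 20-byte identifier.
    return b"".join(
        e.mode + b" " + e.name + b"\x00" + e.target
        for e in sorted(entries, key=_sort_key)
    )


def directory_swhid(entries: Iterable[Entry]) -> str:
    body = serialize_directory(entries)
    header = b"tree " + str(len(body)).encode("ascii") + b"\x00"
    return "swh:1:dir:" + hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    print(directory_swhid([]))
    # swh:1:dir:4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The modified sort key matters when a directory and a file share a prefix: a directory named `a` sorts as `a/`, which places it after a file named `a.c` because `.` (0x2E) precedes `/` (0x2F) in byte order.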
As an example, swh:1:dir:d198b
...


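ISO/IEC 18670:2025, "Information technology - SoftWare Hash IDentifier (SWHID) Specification V1.2", defines a standard data format for referencing software artifacts based on the data model of modern distributed version control systems. The scope of this standard is very important, and it contributes in particular to improving efficiency and accuracy in software development. The main strength of SWHIDs is that they can be computed directly from digital objects using cryptographically strong functions. As a result, anyone with access to a copy of the target object can independently verify its integrity. This property enables decentralized management that does not depend on a central registry or authority, which is very useful in environments where reliability is required. Furthermore, the computation of SWHIDs is based on Merkle Acyclic Directed Graphs, a natural generalization of Merkle trees. This structure combines the hierarchical tree-like structure of a filesystem with special nodes that track revisions and releases, making it possible to capture the full state of a version control system. This standard is indispensable for managing and tracking software artifacts, and plays an important role in ensuring transparency and integrity at each stage of development. The resolution of SWHIDs, that is, the process of obtaining a copy of the digital artifact corresponding to a given SWHID, is outside the scope of this standard, although it is an extremely important element in integrity verification. Overall, ISO/IEC 18670:2025 can be said to contribute greatly to improving industry reliability and process efficiency through the standardization of hash identifiers in software development.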

The ISO/IEC 18670:2025 standard, titled "Information technology - SoftWare Hash IDentifier (SWHID) Specification V1.2," provides a robust framework for referencing software artifacts in modern distributed version control systems. Its scope is clearly defined, specifically addressing the need for a standard data format that encapsulates the tree-like structure of filesystem hierarchies while incorporating special nodes for tracking revisions, releases, and the overall status of a version control system, including development branches. One of the significant strengths of the SWHID specification is its emphasis on decentralized and independent verification of integrity. By enabling the computation of SWHIDs using cryptographically strong functions from the digital objects they represent, the standard removes the dependency on central registries or authorities. This feature is particularly relevant in today's software development landscape, where maintaining the integrity of software is crucial, especially in collaborative environments that utilize distributed version control systems. Moreover, ISO/IEC 18670:2025 builds on Merkle Acyclic Directed Graphs, a generalization of Merkle trees. This adaptation enhances the capability of software versioning and artifact management, ensuring that developers can efficiently manage complex project structures. While the standard provides a comprehensive framework for SWHID computation, it clearly states that the resolution of SWHIDs (obtaining a copy of the digital artifact corresponding to a given SWHID) is outside its scope. This clarity allows developers to understand both the utility and limitations of the standard, positioning it effectively within the broader context of software development practices. Overall, ISO/IEC 18670:2025 stands out as a necessary advancement in information technology, especially for teams working with distributed version control systems. Its focus on standardization, integrity verification, and technical robustness makes it a valuable asset for modern software development methodologies.

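The ISO/IEC 18670:2025 standard defines a standard data format for referencing software artifacts, and conforms to the data model of modern distributed version control systems. The standard not only includes the typical tree structure of a filesystem hierarchy, but also special nodes for tracking revisions and releases, and covers the full state of a version control system with all its development branches. One of the key properties of SWHIDs is that they can be computed directly from the digital objects they refer to using cryptographically strong functions. This allows anyone with access to a copy of the objects to perform independent, decentralized integrity verification, without relying on a central registry or authority. The standard bases the computation of SWHID identifiers on Merkle Acyclic Directed Graphs, which can be regarded as a natural generalization of Merkle trees. In addition, the resolution of SWHIDs, that is, the process of obtaining a copy of the digital artifact corresponding to a given SWHID, is outside the scope of this specification. These definitions play a very important role in modern software development environments, and contribute in particular to ensuring integrity in distributed or collaborative development. The ISO/IEC 18670:2025 standard provides clarity in managing and referencing software artifacts, and increases its potential for use in a variety of applications.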