ISO 24620-5
Language resource management — Controlled human communication (CHC) — Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text
This document establishes basic principles and a methodology to recognize personal data written in free text, in different languages (whether agglutinating, inflectional or isolating) and countries. This document is applicable to protecting human data circulating in national and international industries, and private and public organizations. This document is applicable to processing by human beings and/or automated processing, and to various domains (e.g. law, finance, health). It does not apply to automated image processing. This document uses formal methods only, as statistical methods are very different in nature.
Gestion des ressources linguistiques — Communication humaine contrôlée (CHC) — Partie 5: Principes lexico-morpho-syntaxiques et méthodologie pour la reconnaissance et la protection des données à caractère personnel dans du texte
Standards Content (Sample)
FINAL DRAFT International Standard
ISO/FDIS 24620-5
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2024-03-07
Voting terminates on: 2024-05-02
Language resource management — Controlled human communication (CHC) — Part 5: Lexico-morpho-syntactic principles and methodology for personal data recognition and protection in text
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number: ISO/FDIS 24620-5:2024(en)
© ISO 2024
COPYRIGHT PROTECTED DOCUMENT
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation for controlled human communication
5 Basic principles and methodology
5.1 General
5.2 Specific issues
5.3 Principles
5.3.1 Overview
...
Date: 2023-12-28
ISO/FDIS 24620-5:2024(E)
Date: 2024-02-21
ISO/TC 37/SC 4/WG 5
Secretariat: KATS
Language resource management — Controlled human
communication (CHC) — Part 5: Lexico-morpho-syntactic
principles and methodology for personal data recognition and
protection in text (DataPro)
Gestion des ressources linguistiques — Communication humaine contrôlée (CHC) — Partie 5:
Principes lexico-morpho-syntaxiques et méthodologie pour la détection et protection des données
personnelles dans les textes (DataPro)
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation for controlled human communication
5 Basic principles and methodology
5.1 General
5.2 Specific issues
5.3 Principles
5.3.1 Overview
5.3.2 Lexical, morphological and syntactic indicants
6 Applications
6.1 General
6.2 Different language families
6.3 Languages and countries
6.4 Semes in text
6.5 Applications for personal data recognition
Annex A (informative) Examples of text in different languages and different semes
Annex B (informative) Examples of hidden text with seme indications
Annex C (informative) Table of semes in context
Bibliography
ISO/FDIS 2
...