ISO/TC 37/SC 4/WG 6 - Linguistic annotation
General Information
ISO 24623-1:2018 describes an abstract metamodel designed to accommodate any corpus query language (QL) and to provide a basis for coarse-grained classification. The metamodel consists of several components referred to as Corpus Query Lingua Franca (CQLF) classes, levels, and modules, and is illustrated with examples from the Single-stream class (where a single data stream is used to organize the relevant data structures). Within this class, the document discusses three CQLF levels (Linear, Complex and Concurrent), as well as their subdivision into modules, dictated by functional and modelling criteria. ISO 24623-1:2018 does not provide a way to specify further details beyond these divisions, nor does its scope include QLs designed to query more than one concurrent data stream, as in multimodal or parallel corpora (although such QLs can still be classified according to the criteria suggested here for less expressive QLs).
- Standard, 17 pages, English
- Standard, 12 pages, English
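The class and level divisions named in the ISO 24623-1:2018 abstract can be pictured as a small data structure. The following is only a rough sketch under stated assumptions: the names `CQLFLevel`, `QLClassification`, `supports` and the query language "ExampleQL" are hypothetical and do not come from the standard, and modules are left out because the abstract does not name them.

```python
from dataclasses import dataclass
from enum import Enum


class CQLFLevel(Enum):
    """The three levels the abstract names within the Single-stream class."""
    LINEAR = "Linear"
    COMPLEX = "Complex"
    CONCURRENT = "Concurrent"


@dataclass(frozen=True)
class QLClassification:
    """Hypothetical record placing one query language in the metamodel."""
    query_language: str   # name of the QL being classified
    cqlf_class: str       # e.g. "Single-stream"
    levels: frozenset     # CQLF levels the QL is judged to cover


def supports(classification: QLClassification, level: CQLFLevel) -> bool:
    """Return True if the QL was classified as covering the given level."""
    return level in classification.levels


# "ExampleQL" is a made-up name used for illustration, not a real QL.
example = QLClassification(
    query_language="ExampleQL",
    cqlf_class="Single-stream",
    levels=frozenset({CQLFLevel.LINEAR, CQLFLevel.COMPLEX}),
)
```

A coarse-grained comparison of two query languages could then amount to comparing their `levels` sets, which is roughly the kind of classification the abstract describes.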
ISO 24615-2:2018 describes an XML-conformant serialization of the ISO 24615-1 metamodel, with the objective of supporting interoperability across language resources or language processing components in the domain of syntactic annotations. As an extension of ISO 24615-1, this document is also coordinated with ISO 24612.
- Standard, 17 pages, English
- Standard, 12 pages, English
ISO 24624:2016 specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts. Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.
- Standard, 39 pages, English
- Standard, 32 pages, English
- Standard, 34 pages, French
ISO 24615-1:2014 describes the syntactic annotation framework (SynAF), a high-level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615-1:2014 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.
- Standard, 25 pages, English
- Standard, 20 pages, English
- Standard, 20 pages, French
ISO 24611:2012 provides a framework for the representation of annotations of word-forms in texts; such annotations concern tokens, their relationship with lexical units, and their morpho-syntactic properties. It describes a metamodel for morpho-syntactic annotation that refers to the data categories contained in the ISOCat data category registry (DCR, as defined in ISO 12620). It also describes an XML serialization for morpho-syntactic annotations, with equivalences to the guidelines of the Text Encoding Initiative (TEI).
- Standard, 65 pages, English
- Standard, 58 pages, English
- Standard, 63 pages, French
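The three things the ISO 24611:2012 abstract says a word-form annotation concerns (tokens, their link to a lexical unit, and morpho-syntactic properties) can be illustrated with a minimal in-memory model. This is a sketch only: the class names, the `surface` helper and the feature names `pos` and `number` are assumptions made for illustration, and they reproduce neither the standard's XML serialization nor the ISOCat data categories.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    """A span of the source text, identified by character offsets."""
    start: int  # offset of the first character
    end: int    # offset one past the last character


@dataclass
class WordFormAnnotation:
    """Hypothetical record linking tokens to a lexical unit and properties."""
    tokens: List[Token]   # one or more spans realizing the word-form
    lemma: str            # the lexical unit the word-form belongs to
    features: Dict[str, str] = field(default_factory=dict)


def surface(text: str, annotation: WordFormAnnotation) -> str:
    """Recover the surface string covered by the annotation's tokens."""
    return " ".join(text[t.start:t.end] for t in annotation.tokens)


text = "The cats sleep"
cats = WordFormAnnotation(
    tokens=[Token(4, 8)],
    lemma="cat",
    # Illustrative feature names, not taken from any registry.
    features={"pos": "noun", "number": "plural"},
)
```

Keeping tokens as offsets rather than copied strings leaves the source text untouched, so that several annotation layers could point at the same spans.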
The basic concepts and general principles of word segmentation as defined in ISO 24614-1 apply to Chinese, Japanese and Korean. Text needs to be segmented into tokens, words, phrases or some other types of smaller textual units in order to perform certain computational applications on language resources, such as natural language processing, information retrieval and machine translation. ISO 24614-2:2011 is restricted to the segmentation of a text into words or other word segmentation units (WSUs). This task is distinct from morphological or syntactic analysis per se, although it greatly depends on morphosyntactic analysis. It is also different from the task of laying out a framework for constructing a lexicon and identifying its lexical entries, namely lemmas and lexemes. The frameworks for the latter tasks are provided by ISO 24611, ISO 24613 and ISO 24615. ISO 24614-2:2011 specifies rules for delineating WSUs for Chinese, Japanese and Korean. Some rules are common to all three languages, though each language also has its own distinct rules for identifying WSUs. The common features are discussed, then the distinct rules are laid out for Chinese, for Japanese and for Korean.
- Standard, 49 pages, English
- Standard, 43 pages, English
ISO 24614-1:2010 presents the basic concepts and general principles of word segmentation, and provides language-independent guidelines to enable written texts to be segmented, in a reliable and reproducible manner, into word segmentation units (WSUs). The many applications and fields that need to segment texts into words, and to which ISO 24614-1:2010 can therefore be applied, include translation, content management, speech technologies, computational linguistics and lexicography.
- Standard, 20 pages, English
- Standard, 15 pages, English
- Draft, 24 pages, English
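To make the idea of reproducible segmentation concrete, here is a deliberately simple sketch that splits a text into units and records their character offsets. Splitting on whitespace is an assumption chosen for brevity and is not the standard's set of rules; it would not work for unspaced scripts such as Chinese or Japanese. The function name `segment` is hypothetical.

```python
import re
from typing import List, Tuple


def segment(text: str) -> List[Tuple[int, int, str]]:
    """Split text on whitespace, returning (start, end, unit) triples.

    The same input always yields the same output, which is the
    reproducibility property the abstract asks of a segmentation.
    """
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


units = segment("word segmentation units")
```

Recording offsets alongside each unit lets a later annotation layer refer back to the exact span of the original text.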
ISO 24615:2010 describes the syntactic annotation framework (SynAF), a high-level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615:2010 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.
- Standard, 18 pages, English
- Standard, 23 pages, English