Definitions

A few definitions so that we understand each other.
Tom McArthur defines «corpus» (a bit of Latin commonly used in the
field's jargon; plural «corpora»)[1] as:
- A collection of texts, especially if complete and
self-contained: the corpus of Anglo-Saxon verse.
- In linguistics and lexicography, a body of texts,
utterances or other specimens considered more or less
representative of a language, and usually stored as an
electronic database. Currently, computer corpora may store
many millions of running words, whose features can be analysed
by means of «tagging» (the addition of identifying and
classifying tags[2] to words and other formations) and the use
of «concordancing programs». «Corpus linguistics» studies data
in any such corpus.
Corpus markup responds to the need for what is known as
«text annotation», that is, adding linguistic information of
several kinds (a minimal tagging sketch follows this list):
- Part-of-speech (POS) tagging
- Syntactic annotation (parsed corpora)
- Pragmatic annotation
- Rhetorical information
- Discourse structure
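To make the most basic of these layers concrete, here is a minimal
part-of-speech tagging sketch in Python. It assumes the NLTK library
and its tokenizer and tagger models are available; it is just one
possible tagger among many, not the tool any particular corpus uses.

    # Minimal POS-tagging sketch with NLTK (assumes the library and its
    # models have been installed/downloaded beforehand).
    import nltk

    # One-time downloads, run once in your environment:
    # nltk.download("punkt")
    # nltk.download("averaged_perceptron_tagger")

    sentence = "Computer corpora may store many millions of running words."
    tokens = nltk.word_tokenize(sentence)   # split the text into word forms
    tagged = nltk.pos_tag(tokens)           # attach a POS tag to each token

    print(tagged)
    # e.g. [('Computer', 'NN'), ('corpora', 'NNS'), ('may', 'MD'), ...]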
Concordances: an index (usually alphabetical) of the words of a text,
in which the word under analysis appears at the centre of a line,
surrounded to its right and left by the other words with which it
occurs in a given context.
The “Tutorial: Concordances and Corpora”[3] continues:
The most common form of concordance today is the
«Keyword-in-Context (KWIC) index», in which
each word is centered in a fixed-length field (e.g., 80
characters).
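A KWIC index is easy to sketch by hand. The snippet below (function
and sample text invented for the example) centres every occurrence of
a keyword in a fixed-width line, clipping the left and right context
to fit:

    # Toy KWIC (Keyword-in-Context) concordance: each hit is centred in a
    # fixed-length field, with left/right context clipped to that width.
    def kwic(text, keyword, width=60):
        tokens = text.split()
        half = (width - len(keyword)) // 2
        lines = []
        for i, tok in enumerate(tokens):
            if tok.lower().strip(".,;:") == keyword.lower():
                left = " ".join(tokens[:i])[-half:].rjust(half)
                right = " ".join(tokens[i + 1:])[:half].ljust(half)
                lines.append(f"{left} {tok} {right}")
        return lines

    sample = ("A corpus is a body of texts. "
              "The corpus of Anglo-Saxon verse is one example.")
    for line in kwic(sample, "corpus"):
        print(line)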
«Concordance programs (concordancers)»[4]:
Concordance programs are basic tools for the corpus
linguist. Since most corpora are incredibly large, it is a
fruitless enterprise to search a corpus without the help of
a computer. Concordance programs turn the electronic texts
into databases which can be searched. Usually (1) word
queries are always possible, but most programs also offer
(2) the possibility of searching for word combinations
within a specified range of words and (3) of looking up
parts of words (substrings, in particular affixes, for
example). If the program is a bit more sophisticated, it
might also provide its user with (4) lists of collocates
or (5) frequency lists.
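The queries listed above can be sketched with the standard library
alone; the snippet below illustrates the ideas (frequency list, affix
search, collocates within a window) and is not the interface of any
real concordancer:

    # Typical concordancer queries over a tokenised text: a frequency list,
    # an affix (substring) search, and collocates within +/- 2 words of a
    # node word.
    from collections import Counter

    tokens = "the cat sat on the mat and the cat slept on the warm mat".split()

    # (5) frequency list
    freq = Counter(tokens)
    print(freq.most_common(3))               # [('the', 4), ('cat', 2), ...]

    # (3) substring/affix search: every word form ending in "-at"
    print(sorted({t for t in tokens if t.endswith("at")}))

    # (4) collocates of "cat" within a window of 2 words to either side
    window, node = 2, "cat"
    collocates = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    collocates[tokens[j]] += 1
    print(collocates.most_common())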
The following text by Melamed is interesting
(http://www.cs.nyu.edu/cs/projects/proteus/bma/):
A «bitext» consists of
two texts that are mutual translations. A bitext
map is a fine-grained description of the
correspondence relation between elements of the two halves
of a bitext. Finding such a map is the first step to
building translation models. It is also the first step in
applications like automatic detection of omissions in
translations.
Alignments (rendered in the Spanish technical literature as
‘alineaciones’, ‘alineamientos’, ‘emparejamientos’ or
‘correspondencias’) are “watered-down” bitext maps
that we can derive from general bitext maps.
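To make "alignment" a little more concrete, here is a deliberately
naive sketch. It is not Melamed's bitext-mapping method: it simply
pairs the sentences of a bitext one to one by order and flags pairs
whose length ratio looks suspicious, which is the crudest possible
hint of an omission.

    # Naive 1-to-1 sentence pairing for a bitext, flagging pairs whose
    # character-length ratio exceeds a threshold (a crude omission hint).
    def naive_alignment(src_sentences, tgt_sentences, max_ratio=2.0):
        pairs = []
        for src, tgt in zip(src_sentences, tgt_sentences):
            ratio = max(len(src), len(tgt)) / max(1, min(len(src), len(tgt)))
            pairs.append((src, tgt, ratio > max_ratio))
        return pairs

    es = ["La casa es grande.", "Hace frío."]
    en = ["The house is big.", "It is cold."]
    for src, tgt, suspicious in naive_alignment(es, en):
        print(f"{src!r} <-> {tgt!r}  suspicious={suspicious}")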
The Final Report of the POINTER project makes an effort (successfully,
I think) to clarify the terms ‘lexicology’, ‘lexicography’,
‘terminology’ and ‘terminography’
(http://www.computing.surrey.ac.uk/ai/pointer/report/section1.html#2).
The quotation is long, but I think none of it is wasted.
While lexicology is the study of
words in general, terminology is the
study of special-language words or terms associated with
particular areas of specialist knowledge[5]. Neither lexicology
nor terminology is directly concerned with any particular
application. Lexicography, however, is
the process of making dictionaries, most commonly of
general-language words, but occasionally of special-language
words (i.e. terms). Most general-purpose dictionaries also
contain a number of specialist terms, often embedded within
entries together with general-language
words. Terminography (or often
misleadingly "terminology"), on the other hand, is
concerned exclusively with compiling collections of the
vocabulary of special languages. The outputs of this work
may be known by a number of different names —often
used inconsistently— including
"terminology", "specialised vocabulary",
"glossary", and so on.
Dictionaries are word-based: lexicographical work
starts by identifying the different senses of a particular
word form. The overall presentation to the user is generally
alphabetical, reflecting the word-based working
method. Synonyms —different form same meaning—
are therefore usually scattered throughout the dictionary,
whereas polysemes (related but different senses) and
homonyms (same form, different meaning) are grouped
together.
While a few notable attempts have been made to produce
conceptually-based general-language dictionaries (or
"thesauri"), the results of such attempts are bound
to vary considerably according to the cultural and
chronological context of the author.
By contrast, high-quality terminologies are always in
some sense concept-based, reflecting the fact that the terms
which they contain map out an area of specialist knowledge
in which encyclopaedic information plays a central
role. Such areas of knowledge tend to be highly constrained
(e.g. "viticulture"; "viniculture";
"gastronomy"; and so on, rather than "food
and drink"), and therefore more amenable to a
conceptual organisation than is the case with the totality
of knowledge covered by general language. The relations
between the concepts which the terms represent are the main
organising principle of terminographical work, and are
usually reflected in the chosen manner of presentation to
the user of the terminology. Conceptually-based work is
usually presented in the paper medium in a thesaurus-type
structure, often mapped out by a system of classification
(e.g. UDC) accompanied by an alphabetical index to allow
access through the word form as well as the concept. In
terminologies, synonyms therefore appear together as
representations of the same meaning (i.e. concept), whereas
polysemes and homonyms are presented separately in different
entries.
Dictionaries of the general language are descriptive
in their orientation, arising from the lexicographer's
observation of usage. Terminologies may also be descriptive
in certain cases (depending on subject field and/or
application), but prescription (also:
"normalisation" or "standardisation")
plays an essential role, particularly in scientific,
technical and medical work where safety is a primary
consideration. Standardisation is normally understood as the
elimination of synonymy and the reduction of
polysemy/homonymy, or the coinage of neologisms to reflect
the meaning of the term and its relations to other terms.
«Terminology management», itself a
neologism, was coined to emphasise the need for a
methodology to collect, validate, organise, store, update,
exchange and retrieve individual terms or sets of terms for
a given discipline. This methodology is put into operation
through the use of computer-based information management
systems called «Terminology Management
Systems» (TMS).
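The concept-based organisation described above, and the retrieval a
TMS has to support, can be pictured with a tiny data structure: one
record per concept, synonyms grouped inside the record, and an index
of word forms leading back to the concept. The field names are
invented for the example and do not correspond to any real TMS.

    # Minimal concept-based termbase sketch: one record per concept, with
    # synonyms grouped inside it, plus an inverted index of word forms.
    termbase = {
        "C001": {
            "subject_field": "viticulture",
            "definition": "Cultivation and harvesting of grapes.",
            "terms": {"es": ["viticultura"], "en": ["viticulture", "grape growing"]},
        },
    }

    index = {}
    for concept_id, record in termbase.items():
        for lang_terms in record["terms"].values():
            for term in lang_terms:
                index.setdefault(term.lower(), set()).add(concept_id)

    def lookup(term):
        """Retrieve the concept records in which a given term appears."""
        return [termbase[cid] for cid in index.get(term.lower(), set())]

    print(lookup("grape growing")[0]["definition"])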
Martínez de Sousa, under the headword terminología in his
Diccionario de lexicografía práctica, says:

Today, terminology is a well-structured science that is concerned
with creating the lexical catalogues specific to the sciences,
technical fields, trades and so on, starting from coherent systems
established by national and international bodies.
The SALT project distinguishes between «lexbases» and «termbases»:
the former are designed to be used in machine translation, the latter
as translation-support resources; EAGLES speaks of «termbanks».
EAGLES-I provides the following definition of
«translation memory»[6]:
a multilingual text archive containing (segmented,
aligned, parsed and classified) multilingual texts,
allowing storage and retrieval of aligned multilingual
text segments against various search conditions.
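As a rough picture of what "retrieval of aligned text segments against
various search conditions" means in practice, the sketch below stores a
couple of aligned segment pairs and retrieves the closest source match
above a similarity threshold. difflib is only a stand-in for the fuzzy
matching of real TM engines, and the segments are made up.

    # Toy translation memory: aligned source/target segments plus fuzzy
    # retrieval of the best match above a similarity threshold.
    from difflib import SequenceMatcher

    memory = [
        ("La casa es grande.", "The house is big."),
        ("Hace frío esta noche.", "It is cold tonight."),
    ]

    def best_match(query, threshold=0.7):
        """Return (similarity, source, target) for the closest stored segment."""
        scored = [
            (SequenceMatcher(None, query, src).ratio(), src, tgt)
            for src, tgt in memory
        ]
        best = max(scored)
        return best if best[0] >= threshold else None

    print(best_match("La casa es muy grande."))
    # e.g. (0.9, 'La casa es grande.', 'The house is big.')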