The tagset description file has two sections, <DecompositionRules> and <DirectTranslations>, which describe how tags are converted to their short version and decomposed into morphological feature-value pairs.
Section <DirectTranslations> describes a direct mapping from a tag to its short version and to its feature-value pair list. Each line in the section corresponds to one tag, and has the format:
tag short-tag feature-value-pairs
For instance, the line:
NCMS000 NC postype=common|gender=masc|number=sg
states that the tag NCMS000 is shortened as NC and that its list of feature-value pairs is the one specified.
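As an illustration of this line format, the following minimal Python sketch splits one such line into its three fields. The function name parse_direct_translation is hypothetical and is not part of any library. The sketch assumes that fields are separated by whitespace and that the feature-value pairs are joined with "|" and "=", as in the example line.

```python
def parse_direct_translation(line):
    """Split a 'tag short-tag feature-value-pairs' line into its parts.

    Hypothetical helper for illustration only.
    """
    tag, short_tag, pairs = line.split()
    # Each pair has the form feature=value; pairs are separated by '|'.
    features = dict(pair.split("=", 1) for pair in pairs.split("|"))
    return tag, short_tag, features


tag, short_tag, features = parse_direct_translation(
    "NCMS000 NC postype=common|gender=masc|number=sg"
)
print(tag, short_tag, features)
```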
This section takes precedence over section <DecompositionRules>, and can be used as an exception list. If a tag is found in section <DirectTranslations>, that entry is applied and any rule in section <DecompositionRules> for this tag is ignored.
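The precedence described here can be pictured as a lookup with a fallback. The Python sketch below is only an assumption about how such a lookup might be organised. The names translate, direct and decompose are hypothetical, and the fallback used in the example is a placeholder rather than a real decomposition rule.

```python
def translate(tag, direct, decompose):
    """Return (short_tag, features) for a tag.

    direct    -- dict built from the <DirectTranslations> section
    decompose -- fallback callable standing in for <DecompositionRules>
    """
    if tag in direct:
        # The exception list wins: decomposition rules are not consulted.
        return direct[tag]
    return decompose(tag)


direct = {"NCMS000": ("NC", "postype=common|gender=masc|number=sg")}


def placeholder(tag):
    # Placeholder fallback: first two digits as short tag, no features.
    return (tag[:2], "")


print(translate("NCMS000", direct, placeholder))
print(translate("NP00G0", direct, placeholder))
```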
Section <DecompositionRules> encodes rules to compute the morphological features from the digits of an EAGLES tag. The form of each line is:
tag short-tag-size digit-description-1 digit-description-2 ...
where tag is the digit for the category in the EAGLES PoS tag (i.e. the first digit: N, V, A, etc.), and short-tag-size is an integer stating the length of the short version of the tag (e.g. if the value is 2, the first two digits of the EAGLES PoS tag will be used as the short version). Finally, the digit-description-n fields contain information on how to interpret each digit in the EAGLES PoS tag.
There should be as many digit-description fields as there are digits in the PoS tag for that category. Each digit-description field has the format:
feature/digit:value;digit:value;digit:value;...
That is: the name of the feature encoded by that digit, followed by a slash, and then a semicolon-separated list of translation pairs that give the feature value for each possible digit in that position.
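To make the field format concrete, here is a small Python sketch that turns one digit-description field into a feature name and a digit-to-value table. The function name parse_digit_description is hypothetical, and the sketch assumes well-formed input with no whitespace inside the field.

```python
def parse_digit_description(field):
    """Parse 'feature/digit:value;digit:value;...' into (feature, mapping).

    Hypothetical helper for illustration only.
    """
    feature, _, pairs = field.partition("/")
    mapping = {}
    for pair in pairs.split(";"):
        if pair:  # tolerate a trailing semicolon
            digit, value = pair.split(":", 1)
            mapping[digit] = value
    return feature, mapping


feature, mapping = parse_digit_description("gen/F:f;M:m;C:c")
print(feature, mapping)
```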
For instance, the rule for Spanish noun PoS tags is (in a single line):
N 2 postype/C:common;P:proper gen/F:f;M:m;C:c num/S:s;P:p;N:c
neclass/S:person;G:location;O:organization;V:other
grade/A:augmentative;D:diminutive
and states that any tag starting with N (unless it is found in section <DirectTranslations>) will be shortened using its first two digits (e.g. NC, or NP). Then, the description of each digit in the tag follows, encoding, in order, the features postype, gen, num, neclass and grade.
If a feature is underspecified or not applicable, a zero (0) is expected in the appropriate position of the PoS tag.
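The following Python sketch shows one way the noun rule could be applied to a tag. It is a sketch under stated assumptions, not the actual implementation: the names parse_rule and decompose are hypothetical, the sample tags are chosen only for illustration, features whose digit is 0 are assumed to be left out of the result, digits missing from the rule are silently skipped, and the output is joined with "|" in the style of the <DirectTranslations> example.

```python
RULE = ("N 2 postype/C:common;P:proper gen/F:f;M:m;C:c num/S:s;P:p;N:c "
        "neclass/S:person;G:location;O:organization;V:other "
        "grade/A:augmentative;D:diminutive")


def parse_rule(line):
    """Parse a <DecompositionRules> line into (category, size, descriptions)."""
    category, size, *fields = line.split()
    descriptions = []
    for field in fields:
        feature, _, pairs = field.partition("/")
        mapping = dict(p.split(":", 1) for p in pairs.split(";") if p)
        descriptions.append((feature, mapping))
    return category, int(size), descriptions


def decompose(tag, rule):
    """Return (short_tag, feature_string) for a tag, following the rule."""
    category, size, descriptions = rule
    if not tag.startswith(category):
        raise ValueError(f"rule for {category!r} does not apply to {tag!r}")
    features = []
    # Pair each digit after the category digit with its description.
    for digit, (feature, mapping) in zip(tag[len(category):], descriptions):
        if digit == "0":
            continue  # assumption: underspecified features are omitted
        if digit in mapping:
            features.append(f"{feature}={mapping[digit]}")
    return tag[:size], "|".join(features)


rule = parse_rule(RULE)
print(decompose("NCFP0D", rule))  # ('NC', 'postype=common|gen=f|num=p|grade=diminutive')
print(decompose("NP00G0", rule))  # ('NP', 'postype=proper|neclass=location')
```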
With the example rule described above, the tag translations in table
4.1.1 would take place:
Lluís Padró 2013-09-09