The user map file, which maps regular expressions to lemma-PoS pairs, contains one regular expression per line. Each line has the format: regex lemma1 tag1 lemma2 tag2 ...
The lemma may be any string literal, or $$, meaning that the string matching the regular expression is to be used as the lemma.
E.g.:
    @[a-z][0-9] $$ NP00000
    <.*> XMLTAG Fz
    hulabee hulaboo JJS hulaboo NNS
The first rule will recognize tokens such as @john or @peter4, and assign them the tag NP00000 (proper noun) and the matching string as lemma.
The second rule will recognize tokens starting with ``<'' and ending with ``>'' (such as <HTML> or <br/>) and assign them the literal XMLTAG as lemma and the tag Fz (punctuation:others) as PoS.
The third rule will assign both lemma-tag pairs to each occurrence of the word ``hulabee''. This rule is only an illustration: if you want to add a word to your dictionary, the dictionary module is the right place.
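
The line format described above can be illustrated with a small Python sketch. This is not the module's actual implementation. It makes several assumptions: fields are separated by whitespace, a regular expression must match the whole token, and the first matching rule wins. The function names parse_user_map and analyze are hypothetical, and the first rule uses a widened pattern (@[a-z][a-z0-9]*) chosen for this illustration so that the sample tokens match under whole-token matching.

```python
import re

def parse_user_map(text):
    """Parse user-map text into a list of (compiled_regex, [(lemma, tag), ...]).

    Assumption: fields on each line are separated by whitespace.
    """
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pattern, rest = fields[0], fields[1:]
        if not rest or len(rest) % 2 != 0:
            raise ValueError(f"expected lemma/tag pairs after regex: {line!r}")
        # Group the remaining fields two by two: (lemma1, tag1), (lemma2, tag2), ...
        pairs = list(zip(rest[0::2], rest[1::2]))
        rules.append((re.compile(pattern), pairs))
    return rules

def analyze(token, rules):
    """Return the (lemma, tag) pairs of the first rule matching the token.

    Assumptions: whole-token matching, first matching rule wins.
    A lemma of "$$" is replaced by the matched token itself.
    """
    for regex, pairs in rules:
        if regex.fullmatch(token):
            return [(token if lemma == "$$" else lemma, tag) for lemma, tag in pairs]
    return []  # no rule applies to this token

# Illustrative rules; the first pattern is a hypothetical variant (see lead-in).
USER_MAP = """\
@[a-z][a-z0-9]* $$ NP00000
<.*> XMLTAG Fz
hulabee hulaboo JJS hulaboo NNS
"""

rules = parse_user_map(USER_MAP)
print(analyze("@peter4", rules))  # [('@peter4', 'NP00000')]
print(analyze("<br/>", rules))    # [('XMLTAG', 'Fz')]
print(analyze("hulabee", rules))  # [('hulaboo', 'JJS'), ('hulaboo', 'NNS')]
```

Tokens that match no rule yield an empty list here. In a real pipeline such tokens would presumably be left to the other analysis modules.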