User Map File

The user map file maps regular expressions to lemma-PoS pairs. It contains one regular expression per line, and each line has the format: regex lemma1 tag1 lemma2 tag2 ...

The lemma may be any string literal, or $$, meaning that the string matching the regular expression is to be used as the lemma.
E.g.:

  
   @[a-z][a-z0-9]* $$ NP00000
   <.*> XMLTAG Fz
   hulabee hulaboo JJS hulaboo NNS

The first rule will recognize tokens such as @john or @peter4, and assign them the tag NP00000 (proper noun) and the matching string as lemma.

The second rule will recognize tokens starting with "<" and ending with ">" (such as <HTML> or <br/>) and assign them the literal XMLTAG as lemma and the tag Fz (punctuation:others) as PoS.

The third rule will assign both lemma-tag pairs to each occurrence of the word "hulabee". This rule is only an illustration: if you want to add a word to your dictionary, the dictionary module is the right place to do it.
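
The behaviour of the three rules can be illustrated with a small stand-alone Python sketch. It is not the module's actual implementation, and it rests on several assumptions that the text above does not state: fields are separated by whitespace, a regex must match the whole token, the first matching rule wins, and Python's re syntax stands in for whatever regex engine the module uses. The names parse_user_map and analyze are hypothetical. The @ rule is spelled @[a-z][a-z0-9]* in this sketch, a pattern that covers tokens such as @john and @peter4 under whole-token matching.

```python
import re

# Example rules in the "regex lemma1 tag1 lemma2 tag2 ..." format.
USER_MAP = """
@[a-z][a-z0-9]* $$ NP00000
<.*> XMLTAG Fz
hulabee hulaboo JJS hulaboo NNS
"""


def parse_user_map(text):
    """Parse user-map lines into (compiled_regex, [(lemma, tag), ...]) rules."""
    rules = []
    for line in text.splitlines():
        fields = line.split()  # assumption: whitespace-separated fields
        if not fields:
            continue  # skip blank lines
        regex, rest = fields[0], fields[1:]
        if not rest or len(rest) % 2 != 0:
            raise ValueError(f"expected lemma/tag pairs after regex: {line!r}")
        pairs = list(zip(rest[0::2], rest[1::2]))
        rules.append((re.compile(regex), pairs))
    return rules


def analyze(token, rules):
    """Return the lemma-tag pairs of the first rule matching the whole token."""
    for regex, pairs in rules:
        if regex.fullmatch(token):  # assumption: whole-token match
            # "$$" stands for the matched string itself.
            return [(token if lemma == "$$" else lemma, tag)
                    for lemma, tag in pairs]
    return []  # no rule applies


rules = parse_user_map(USER_MAP)
for tok in ["@peter4", "<br/>", "hulabee", "plain"]:
    print(tok, analyze(tok, rules))
```

In this sketch, a token that matches no rule yields an empty list, which leaves room for other analyzers to handle it.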



Lluís Padró 2013-09-09