The machine-learning-based Named Entity Classification (NEC) module reads a configuration file with the following sections:
<RGF>
contains one line with the path to the RGF file of the model. This file defines the features that are taken into account for NEC.
<RGF> ner.rgf </RGF>
<Classifier>
contains one line with the kind of classifier to use. Valid values are AdaBoost and SVM.
<Classifier> AdaBoost </Classifier>
<ModelFile>
contains one line with the path to the model file to be used. The model file must match the classifier type given in section <Classifier>.
<ModelFile> ner.abm </ModelFile>
The .abm files contain AdaBoost models based on shallow Decision Trees (see [CMP03] for details). You don't need to understand their format unless you want to delve into the code of the AdaBoost classifier.
The .svm files contain Support Vector Machine models generated by libsvm [CL11]. You don't need to understand their format unless you want to delve into the code of libsvm.
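The requirement that the model file match the classifier can be checked before anything is loaded. The following Python sketch is only an illustration, not part of the module: it assumes that the file extension reflects the model type (.abm for AdaBoost, .svm for SVM), as in the file names used in this section, and the function name check_model_matches is hypothetical.

```python
from pathlib import Path

# Assumption: the extension reflects the model type, following the
# file names used in this section (.abm for AdaBoost, .svm for SVM).
EXPECTED_SUFFIX = {"AdaBoost": ".abm", "SVM": ".svm"}


def check_model_matches(classifier: str, model_file: str) -> None:
    """Raise ValueError if the classifier is unknown or the suffix differs."""
    if classifier not in EXPECTED_SUFFIX:
        raise ValueError(
            f"unknown classifier {classifier!r}; expected AdaBoost or SVM"
        )
    expected = EXPECTED_SUFFIX[classifier]
    if Path(model_file).suffix != expected:
        raise ValueError(
            f"{classifier} expects a {expected} model file, got {model_file!r}"
        )


check_model_matches("AdaBoost", "ner.abm")  # passes silently
```

A check like this only catches obvious mismatches by name. It does not inspect the contents of the model file.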
<Lexicon>
contains one line with the path to the lexicon file of the learnt model. The lexicon is used to translate string-encoded features generated by libfries to integer-encoded features needed by libomlet. The lexicon file is generated by libfries at training time.
<Lexicon> ner.lex </Lexicon>
The .lex file is a dictionary that assigns a number to each symbolic feature used in the AdaBoost or SVM model. You don't need to understand this either unless you are a Machine Learning student or the like.
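The role of the lexicon can be pictured as a plain lookup from feature strings to integers. The Python sketch below is a conceptual illustration only: the feature names and numbers are invented, the on-disk format of the .lex file is not shown here, and skipping features that are missing from the lexicon is an assumption rather than documented behaviour.

```python
def encode_features(features, lexicon):
    """Map symbolic feature names to integer ids.

    Features absent from the lexicon are skipped (an assumption made
    for this sketch, not documented behaviour).
    """
    return [lexicon[f] for f in features if f in lexicon]


# Hypothetical lexicon contents; real lexicons are produced at training time.
lexicon = {"form=madrid": 1, "capitalized": 2, "prev_tag=SP": 3}

encoded = encode_features(["capitalized", "form=madrid", "suffix=id"], lexicon)
print(encoded)  # [2, 1]
```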
<Classes>
contains only one line listing each class code of the model followed by the PoS tag it is translated to.
<Classes> 0 NP00SP0 1 NP00G00 2 NP00O00 3 NP00V00 </Classes>
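The content of this section reads naturally as alternating code and tag pairs. As a minimal Python sketch (the helper name parse_classes is hypothetical, and this is not the module's own parser), such a line can be turned into a lookup table like this:

```python
def parse_classes(line):
    """Parse 'code tag code tag ...' into a {code: tag} dictionary."""
    fields = line.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating code/tag pairs")
    # Even positions hold the class codes, odd positions the tags.
    return {int(code): tag for code, tag in zip(fields[0::2], fields[1::2])}


classes = parse_classes("0 NP00SP0 1 NP00G00 2 NP00O00 3 NP00V00")
print(classes[1])  # NP00G00
```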
<NE_Tag>
contains only one line with the PoS tag assigned by the NER module, which will be used to select the named entities to be classified.
<NE_Tag> NP00000 </NE_Tag>
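Putting the sections together, a complete configuration file can be read with a few lines of Python. This is a rough sketch for inspecting such files, not the parser used by the module: the function name read_sections is hypothetical, the sample file is assembled from the examples in this section, and the token list is invented to show how the NE_Tag value selects the words to classify.

```python
import re

# Matches <Name> ... </Name>, whether on one line or spread over several.
SECTION_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def read_sections(text):
    """Return {section name: stripped content} for each section found."""
    return {name: body.strip() for name, body in SECTION_RE.findall(text)}


EXAMPLE = """
<RGF> ner.rgf </RGF>
<Classifier> AdaBoost </Classifier>
<ModelFile> ner.abm </ModelFile>
<Lexicon> ner.lex </Lexicon>
<Classes> 0 NP00SP0 1 NP00G00 2 NP00O00 3 NP00V00 </Classes>
<NE_Tag> NP00000 </NE_Tag>
"""

config = read_sections(EXAMPLE)

# Hypothetical (word, PoS tag) pairs standing in for the NER module output.
tokens = [("Barcelona", "NP00000"), ("es", "VSIP3S0"), ("grande", "AQ0CS0")]

# Only tokens carrying the NE_Tag value are candidates for classification.
to_classify = [word for word, tag in tokens if tag == config["NE_Tag"]]
print(to_classify)  # ['Barcelona']
```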
Lluís Padró 2013-09-09