There are two different modules able to perform NE recognition. They can be instantiated directly, or via a wrapper that will create the right module depending on the configuration file.
The API for the wrapper is the following:
    class WINDLL ner {
      public:
        /// Constructor
        ner(const std::wstring &);
        /// Destructor
        ~ner();

        /// analyze given sentence
        void analyze(sentence &) const;
        /// analyze given sentences
        void analyze(std::list<sentence> &) const;
        /// analyze sentence, return analyzed copy
        sentence analyze(const sentence &) const;
        /// analyze sentences, return analyzed copy
        std::list<sentence> analyze(const std::list<sentence> &) const;
    };
The parameter to the constructor is the absolute name of a configuration file, which must contain the desired module type (basic or bio) in a line enclosed by the tags <Type> and </Type>.
The rest of the file must contain the configuration options specific to the selected NER type, described below.
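As an illustration of the type selection described above, the following minimal sketch shows how a wrapper might locate the <Type> section and decide which module to create. It is not the library's actual implementation: NerType, trim and read_ner_type are hypothetical names introduced here, and the sketch assumes that each tag and the type name sit on their own line.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical enumeration of the two module types named in the text.
enum class NerType { Basic, Bio };

// Strip leading and trailing whitespace from a line.
static std::wstring trim(const std::wstring &s) {
    const wchar_t *ws = L" \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::wstring::npos) return L"";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: scan configuration text for the <Type> section and
// return the module type it names. Assumes the tags sit on their own lines.
NerType read_ner_type(std::wistream &config) {
    std::wstring line;
    bool in_type = false;
    while (std::getline(config, line)) {
        line = trim(line);
        if (line == L"<Type>") { in_type = true; continue; }
        if (line == L"</Type>") break;
        if (!in_type || line.empty()) continue;   // skip other sections
        if (line == L"basic") return NerType::Basic;
        if (line == L"bio") return NerType::Bio;
        throw std::invalid_argument("unknown NER type in <Type> section");
    }
    throw std::invalid_argument("no NER type found in <Type> section");
}

// Convenience overload that works on configuration text held in memory.
NerType read_ner_type(const std::wstring &text) {
    std::wistringstream in(text);
    return read_ner_type(in);
}
```

With this sketch, a configuration whose <Type> section contains the line basic yields NerType::Basic, one containing bio yields NerType::Bio, and any other value is rejected with an exception. A real wrapper would then construct the corresponding module and pass it the remaining options.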
The basic module is simple and fast, and easy to adapt for use in new languages, provided capitalization is the basic clue for NE detection in the target language. The estimated performance of this module is about 85% correctly recognized named entities.
The bio module is based on machine learning algorithms. It has a higher precision (over 90%), but it is considerably slower than basic, and adapting it to new languages requires a training corpus plus some feature engineering.