The number detection module is language dependent: it recognizes numerical expressions (e.g., 1,220.54 or two-hundred sixty-five) and assigns them a normalized value as lemma.
The module is essentially a finite-state automaton that recognizes valid numerical expressions. Since the structure of the automaton and the actions that compute the actual numerical value are different for each language, the automaton is coded in C++ and has to be rewritten for any new language.
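To illustrate the idea of an automaton whose transitions carry value-computing actions, here is a minimal sketch for English number words below one thousand. It is not the FreeLing implementation: the function parse_number_words, the state names, and the small lexicon are all hypothetical and chosen only for this example.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical word classes and automaton states (not FreeLing names).
enum WordClass { UNIT, TEEN, TENS, HUNDRED };
enum State { START, AFTER_UNIT, AFTER_HUNDRED, AFTER_TENS, DONE };

// Returns true and stores the numeric value if `text` is a valid English
// number expression below 1000, e.g. "two-hundred sixty-five".
// Words must be lowercase; hyphens are treated as word separators.
bool parse_number_words(std::string text, long &value) {
  static const std::map<std::string, std::pair<WordClass, int>> lexicon = {
    {"one", {UNIT, 1}}, {"two", {UNIT, 2}}, {"three", {UNIT, 3}},
    {"four", {UNIT, 4}}, {"five", {UNIT, 5}}, {"six", {UNIT, 6}},
    {"seven", {UNIT, 7}}, {"eight", {UNIT, 8}}, {"nine", {UNIT, 9}},
    {"ten", {TEEN, 10}}, {"eleven", {TEEN, 11}}, {"twelve", {TEEN, 12}},
    {"thirteen", {TEEN, 13}}, {"fourteen", {TEEN, 14}},
    {"fifteen", {TEEN, 15}}, {"sixteen", {TEEN, 16}},
    {"seventeen", {TEEN, 17}}, {"eighteen", {TEEN, 18}},
    {"nineteen", {TEEN, 19}},
    {"twenty", {TENS, 20}}, {"thirty", {TENS, 30}}, {"forty", {TENS, 40}},
    {"fifty", {TENS, 50}}, {"sixty", {TENS, 60}}, {"seventy", {TENS, 70}},
    {"eighty", {TENS, 80}}, {"ninety", {TENS, 90}},
    {"hundred", {HUNDRED, 100}}
  };

  for (char &c : text) if (c == '-') c = ' ';
  std::istringstream in(text);
  std::string word;
  State st = START;
  value = 0;

  while (in >> word) {
    auto it = lexicon.find(word);
    if (it == lexicon.end()) return false;   // unknown word: reject
    const WordClass wc = it->second.first;
    const int n = it->second.second;

    // Each branch is one transition plus the action that updates the value.
    if (st == START && wc == UNIT)               { value = n;    st = AFTER_UNIT; }
    else if (st == START && wc == TEEN)          { value = n;    st = DONE; }
    else if (st == START && wc == TENS)          { value = n;    st = AFTER_TENS; }
    else if (st == AFTER_UNIT && wc == HUNDRED)  { value *= 100; st = AFTER_HUNDRED; }
    else if (st == AFTER_HUNDRED && (wc == UNIT || wc == TEEN))
                                                 { value += n;   st = DONE; }
    else if (st == AFTER_HUNDRED && wc == TENS)  { value += n;   st = AFTER_TENS; }
    else if (st == AFTER_TENS && wc == UNIT)     { value += n;   st = DONE; }
    else return false;                           // no valid transition
  }
  return st != START;                            // every non-start state accepts
}
```

With this sketch, "two-hundred sixty-five" is accepted with value 265, while "sixty twenty" is rejected because no transition leaves the AFTER_TENS state on a tens word. Both the lexicon and the transition table are specific to English, which is why a different language needs a different automaton.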
For languages that do not have a specific automaton implemented, a generic module is used to recognize number-like expressions that contain numerical digits.
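A generic, digit-based recognizer of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions, not the FreeLing code: the function normalize_number and its states are hypothetical, the separators are assumed to be single characters, and thousands groups are assumed to hold exactly three digits.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of FreeLing): checks whether `tok` is a
// digit-based number using `dec` as decimal symbol and `thou` as thousands
// symbol. On success, `out` holds the value with the thousands symbols
// removed and '.' as decimal point. On failure, `out` is unspecified.
bool normalize_number(const std::string &tok, char dec, char thou,
                      std::string &out) {
  assert(dec != thou);  // the two separators must be distinguishable
  enum State { START, INT, GROUP, FRAC_START, FRAC } st = START;
  int int_len = 0;    // digits seen before the first thousands symbol
  int group_len = 0;  // digits seen in the current thousands group
  out.clear();

  for (char c : tok) {
    const bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
    switch (st) {
      case START:                       // a number must start with a digit
        if (!digit) return false;
        out += c; int_len = 1; st = INT;
        break;
      case INT:                         // leading digits, no separator yet
        if (digit) { out += c; ++int_len; }
        else if (c == thou && int_len <= 3) { group_len = 0; st = GROUP; }
        else if (c == dec) { out += '.'; st = FRAC_START; }
        else return false;
        break;
      case GROUP:                       // inside a three-digit group
        if (digit && group_len < 3) { out += c; ++group_len; }
        else if (c == thou && group_len == 3) { group_len = 0; }
        else if (c == dec && group_len == 3) { out += '.'; st = FRAC_START; }
        else return false;
        break;
      case FRAC_START:                  // at least one fractional digit
        if (!digit) return false;
        out += c; st = FRAC;
        break;
      case FRAC:                        // remaining fractional digits
        if (!digit) return false;
        out += c;
        break;
    }
  }
  // Accept only complete numbers: no dangling separator or short group.
  return st == INT || st == FRAC || (st == GROUP && group_len == 3);
}
```

Under these assumptions, both "1,220.54" (with '.' and ',') and "1.220,54" (with ',' and '.') normalize to 1220.54, which shows why the decimal and thousands symbols have to be supplied by the caller rather than guessed from the token.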
There is no configuration file to be provided to the class when it is instantiated. The API of the class is:
class numbers {
  public:
    /// Constructor: receives the language code, and the decimal
    /// and thousand point symbols
    numbers(const std::string &, const std::string &, const std::string &);

    /// analyze given sentence.
    void analyze(sentence &) const;

    /// analyze given sentences.
    void analyze(std::list<sentence> &) const;

    /// return analyzed copy of given sentence
    sentence analyze(const sentence &) const;

    /// return analyzed copy of given sentences
    std::list<sentence> analyze(const std::list<sentence> &) const;
};
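The constructor expects three string parameters, in this order: the language code, which determines whether a language-specific automaton or the generic recognizer is used; the decimal point symbol; and the thousands point symbol.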
Lluís Padró 2013-09-09