Basic NER module (np)

The first NER module is the np class, a finite-state automaton (FSA) that detects sequences of capitalized words, allowing for functional words inside the sequence (e.g. Bank of England) and taking into account capitalization at sentence beginnings.

It can be instantiated via the ner wrapper described above, or directly via its own API:

class np: public ner_module, public automat {
  public:
    /// Constructor, receives a configuration file.
    np(const std::string &); 

    /// Detect multiwords starting at given sentence position
    bool matching(sentence &, sentence::iterator &) const;

    /// analyze given sentence.
    void analyze(sentence &) const;
    /// analyze given sentences.
    void analyze(std::list<sentence> &) const;
    /// return analyzed copy of given sentence
    sentence analyze(const sentence &) const;
    /// return analyzed copy of given sentences
    std::list<sentence> analyze(const std::list<sentence> &) const;
};
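
To give a feel for the kind of detection np performs, the following is a minimal, self-contained sketch of a capitalized-sequence scanner. It is not FreeLing code and does not use the np API: the function find_entities, the helpers is_capitalized and lowercase, and the two word sets (functional, common_initial) are hypothetical names introduced only for this illustration. The common_initial set is an assumed stand-in for whatever information the real module uses to decide whether a capitalized sentence-initial word is just an ordinary word.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of FreeLing): true if the word starts
// with an uppercase ASCII letter.
static bool is_capitalized(const std::string &w) {
  return !w.empty() && std::isupper(static_cast<unsigned char>(w[0]));
}

// Hypothetical helper: ASCII lowercase copy of a word.
static std::string lowercase(std::string w) {
  for (char &c : w)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return w;
}

// Illustrative scanner (an assumption, not the np implementation).
// Returns half-open [begin, end) word ranges of candidate named entities
// in one sentence.
//   functional:     lowercase words allowed inside an entity ("of", "the")
//   common_initial: lowercase words that are NOT treated as entity starts
//                   when they appear capitalized at sentence position 0
std::vector<std::pair<std::size_t, std::size_t>> find_entities(
    const std::vector<std::string> &words,
    const std::set<std::string> &functional,
    const std::set<std::string> &common_initial) {
  std::vector<std::pair<std::size_t, std::size_t>> found;
  const std::size_t n = words.size();
  std::size_t i = 0;
  while (i < n) {
    const bool starts_entity =
        is_capitalized(words[i]) &&
        !(i == 0 && common_initial.count(lowercase(words[i])) > 0);
    if (!starts_entity) {
      ++i;
      continue;
    }
    // 'end' is one past the last capitalized word accepted so far.
    // Functional words are crossed tentatively and only kept if another
    // capitalized word follows them.
    std::size_t end = i + 1;
    for (std::size_t j = end; j < n; ++j) {
      if (is_capitalized(words[j])) {
        end = j + 1;
      } else if (functional.count(words[j]) == 0) {
        break;
      }
    }
    found.emplace_back(i, end);
    i = end;
  }
  return found;
}
```

For the sentence "The Bank of England raised rates", with functional = {"of", "the"} and common_initial = {"the"}, the sketch reports the single range [1, 4), that is, "Bank of England". The sentence-initial "The" is skipped, and a trailing functional word is never included because the range only grows when a capitalized word is reached.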

The file that controls the behaviour of the simple NE recognizer consists of the following sections:

Lluís Padró 2013-09-09