BIO NER module (bioner)

The machine-learning-based NER module uses a classification algorithm to decide whether each word begins a named entity (B), is inside one (I), or is outside any entity (O). A simple Viterbi algorithm is then applied to guarantee that the resulting tag sequence is coherent.
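
As an illustration of the coherence step, the following is a minimal sketch of Viterbi decoding over B/I/O tags. It is not FreeLing's implementation: the names viterbi_bio, Scores and the TAG_* constants are hypothetical, the per-word scores are assumed to be log-probabilities produced by some classifier, and coherence is assumed to mean only that an I tag must follow a B or another I (so a sentence cannot start with I).

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical tag indices used throughout this sketch.
enum Tag { TAG_B = 0, TAG_I = 1, TAG_O = 2 };

// Per-word scores for B, I and O (assumed to be log-probabilities).
using Scores = std::array<double, 3>;

// Assumption: the only forbidden transition is O -> I.
bool allowed(int prev, int next) {
  return !(prev == TAG_O && next == TAG_I);
}

// Returns the highest-scoring coherent tag sequence as a string of 'B', 'I', 'O'.
std::string viterbi_bio(const std::vector<Scores>& scores) {
  const double NEG_INF = -std::numeric_limits<double>::infinity();
  const std::size_t n = scores.size();
  if (n == 0) return "";

  // best[t][k]: best total score of a coherent sequence ending in tag k at word t.
  // back[t][k]: tag at word t-1 on that best sequence.
  std::vector<Scores> best(n);
  std::vector<std::array<int, 3>> back(n);

  // A sentence cannot start inside an entity.
  best[0] = scores[0];
  best[0][TAG_I] = NEG_INF;

  for (std::size_t t = 1; t < n; ++t) {
    for (int k = 0; k < 3; ++k) {
      best[t][k] = NEG_INF;
      back[t][k] = TAG_B;
      for (int j = 0; j < 3; ++j) {
        if (!allowed(j, k)) continue;
        const double s = best[t - 1][j] + scores[t][k];
        if (s > best[t][k]) {
          best[t][k] = s;
          back[t][k] = j;
        }
      }
    }
  }

  // Pick the best final tag, then follow the back pointers.
  int k = TAG_B;
  for (int j = 1; j < 3; ++j) {
    if (best[n - 1][j] > best[n - 1][k]) k = j;
  }
  const char names[3] = {'B', 'I', 'O'};
  std::string tags(n, 'O');
  for (std::size_t t = n; t-- > 0;) {
    tags[t] = names[k];
    if (t > 0) k = back[t][k];
  }
  return tags;
}
```

Picking the best tag for each word independently can yield sequences such as O I I. The decoder in this sketch instead returns the best-scoring sequence among those that contain no O-to-I transition.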

It can be instantiated via the ner wrapper described above, or directly via its own API:

class bioner: public ner_module {
  public:
    /// Constructor, receives the name of the configuration file.
    bioner ( const std::string & );

    /// Analyze the given sentence in place.
    void analyze(sentence &) const;
    /// Analyze the given sentences in place.
    void analyze(std::list<sentence> &) const;
    /// Return an analyzed copy of the given sentence.
    sentence analyze(const sentence &) const;
    /// Return an analyzed copy of the given sentences.
    std::list<sentence> analyze(const std::list<sentence> &) const;
};

The configuration file specifies the required model and lexicon files, which may be generated from a training corpus using the scripts provided with FreeLing (in the folder src/utilities/nerc). See the README and the comments in the scripts for instructions.

The most important file in the set is the .rgf file, which contains a definition of the context features that must be extracted for each named entity. The feature rule language is described in section 4.4.

The sections of the configuration file for the bioner module are:

Lluís Padró 2013-09-09