Named Entity Classification Module

The mission of the Named Entity Classification module is to assign a class to the named entities in the text. It is a machine-learning-based module, so the classes can be anything the model has been trained to recognize.

When an entity is classified, its PoS tag is replaced with the label that the model defines for the assigned class.

This module requires a NER module to have been applied beforehand: if no entities have been recognized, none can be classified.

Models provided with FreeLing distinguish four classes: Person (tag NP00SP0), Geographical location (NP00G00), Organization (NP00O00), and Others (NP00V00).

If you have an annotated corpus, you can train your own models using the scripts in src/utilities/nerc. See the README in that directory and the comments inside the scripts for details.

Among the files that make up a model, the most important is the .rgf file, which defines the context features to be extracted for each named entity. The feature rule language is described in section 4.4.

The API of the class is the following:

class nec {
  public:
    /// Constructor: receives the name of the configuration file.
    nec(const std::string &);

    /// analyze given sentence.
    void analyze(sentence &) const;
    /// analyze given sentences.
    void analyze(std::list<sentence> &) const;
    /// return analyzed copy of given sentence
    sentence analyze(const sentence &) const;
    /// return analyzed copy of given sentences
    std::list<sentence> analyze(const std::list<sentence> &) const;
};

The constructor receives a single parameter: the name of the module's configuration file, whose contents are described below.

Lluís Padró 2013-09-09