Part-of-Speech Tagger Module

Two modules are able to perform PoS tagging. The application should decide which method to use and instantiate the corresponding class.

The first PoS tagger is the hmm_tagger class, a classical trigram Markovian tagger that follows [Bra00].

The second module, named relax_tagger, is a hybrid system capable of integrating statistical and hand-coded knowledge, following [Pad98].
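Since both taggers derive from POS_tagger, an application can choose the method once and keep the rest of its code independent of that choice. The sketch below illustrates the idea with stand-in types only: the sentence struct, its tagged_by field, the virtual analyze method, and the make_tagger helper are all hypothetical, and the stand-in constructors take no arguments, unlike the real ones listed in this section.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in types for illustration only; the real classes come from the
// FreeLing headers and have the constructors listed in this section.
struct sentence {
  std::string tagged_by;  // hypothetical field, used only to show dispatch
};

class POS_tagger {
 public:
  virtual ~POS_tagger() = default;
  virtual void analyze(sentence &) const = 0;
};

class hmm_tagger : public POS_tagger {
 public:
  void analyze(sentence &s) const override { s.tagged_by = "hmm"; }
};

class relax_tagger : public POS_tagger {
 public:
  void analyze(sentence &s) const override { s.tagged_by = "relax"; }
};

// Hypothetical helper: pick the tagger once, from an application setting.
std::unique_ptr<POS_tagger> make_tagger(const std::string &method) {
  assert(method == "hmm" || method == "relax");
  if (method == "hmm") return std::unique_ptr<POS_tagger>(new hmm_tagger());
  return std::unique_ptr<POS_tagger>(new relax_tagger());
}
```

With this arrangement, the code that calls analyze does not need to know which tagger was instantiated.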

The hmm_tagger module is somewhat faster than relax_tagger, but the latter allows you to add manual constraints to the model. The hmm_tagger API is the following:

class hmm_tagger: public POS_tagger {
  public:
    /// Constructor
    hmm_tagger(const std::string &, bool, unsigned int, unsigned int kb=1);

    /// analyze given sentence.
    void analyze(sentence &) const;
    /// analyze given sentences.
    void analyze(std::list<sentence> &) const;
    /// return analyzed copy of given sentence
    sentence analyze(const sentence &) const;
    /// return analyzed copy of given sentences
    std::list<sentence> analyze(const std::list<sentence> &) const;

    /// given an analyzed sentence find out probability 
    /// of the k-th best sequence
    double SequenceProb_log(const sentence &, int k=0) const;

};
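To make the notion of a sequence log-probability concrete, the following sketch scores one tagged sequence under a toy trigram HMM, assuming the textbook formulation (sum of log transition and log emission probabilities). It is not FreeLing's implementation: toy_hmm, sequence_log_prob, and the "<s>" padding symbol are hypothetical names chosen for the example, and the real model may differ in smoothing and other details.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Toy model tables; the names and layout are illustrative, not FreeLing's.
struct toy_hmm {
  // P(t3 | t1, t2), keyed by (t1, t2, t3)
  std::map<std::tuple<std::string, std::string, std::string>, double> trans;
  // P(word | tag), keyed by (tag, word)
  std::map<std::pair<std::string, std::string>, double> emit;
};

// Log-probability of one tagged sequence under the toy trigram model.
// "<s>" pads the two positions before the first word. Missing table
// entries make std::map::at throw std::out_of_range.
double sequence_log_prob(const toy_hmm &m,
                         const std::vector<std::string> &words,
                         const std::vector<std::string> &tags) {
  assert(words.size() == tags.size());
  std::string t1 = "<s>", t2 = "<s>";
  double lp = 0.0;
  for (std::size_t i = 0; i < words.size(); ++i) {
    lp += std::log(m.trans.at(std::make_tuple(t1, t2, tags[i])));
    lp += std::log(m.emit.at(std::make_pair(tags[i], words[i])));
    t1 = t2;       // slide the two-tag history window
    t2 = tags[i];
  }
  return lp;
}
```

Working in log space avoids numeric underflow on long sentences, which is a common reason for an API to return a log-probability rather than a raw probability.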

The hmm_tagger constructor receives the following parameters:

The relax_tagger module can be tuned with hand-written constraints, but it is about twice as slow as hmm_tagger. It also cannot produce the k best sequences.

class relax_tagger : public POS_tagger {
  public:
    /// Constructor, given the constraint file and config parameters
    relax_tagger(const std::string &, int, double, double, bool, unsigned int);

    /// analyze given sentence.
    void analyze(sentence &) const;
    /// analyze given sentences.
    void analyze(std::list<sentence> &) const;
    /// return analyzed copy of given sentence
    sentence analyze(const sentence &) const;
    /// return analyzed copy of given sentences
    std::list<sentence> analyze(const std::list<sentence> &) const;
};

The relax_tagger constructor receives the following parameters:

The iteration number, scale factor, and threshold parameters are specific to the relaxation labelling algorithm. Refer to [Pad98] for details.
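As a rough illustration of how these three parameters typically interact, the sketch below runs a generic relaxation labelling loop over the candidate tags of a single word. It is a simplified sketch under stated assumptions, not the relax_tagger code: the relax function, the relax_result struct, and the support callback are hypothetical, the supports are assumed to be normalized by dividing by the scale factor (with clamping to [-1, 1] added here for safety), and the loop is assumed to stop when no weight changes by more than the threshold or when the iteration limit is reached.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using weights = std::vector<double>;  // one weight per candidate tag

struct relax_result {
  weights w;       // final tag weights (they sum to 1)
  int iterations;  // number of updates actually performed
};

// Hypothetical generic loop: `support` returns one raw support value per
// tag, given the current weights.
relax_result relax(
    weights w,
    const std::function<std::vector<double>(const weights &)> &support,
    int max_iter, double scale_factor, double threshold) {
  assert(scale_factor > 0.0);
  int done = 0;
  while (done < max_iter) {
    const std::vector<double> s = support(w);
    weights next(w.size());
    double norm = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) {
      // Assumption: dividing by the scale factor normalizes the support;
      // the clamp keeps (1 + support) non-negative.
      const double sj = std::max(-1.0, std::min(1.0, s[j] / scale_factor));
      next[j] = w[j] * (1.0 + sj);
      norm += next[j];
    }
    if (norm <= 0.0) break;  // every tag fully rejected: keep old weights
    double change = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) {
      next[j] /= norm;  // renormalize so the weights sum to 1
      change = std::max(change, std::fabs(next[j] - w[j]));
    }
    w = next;
    ++done;
    if (change < threshold) break;  // converged
  }
  return {w, done};
}
```

Under these assumptions, a larger scale factor shrinks each update step, so convergence takes more iterations but is less abrupt.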



Lluís Padró 2013-09-09