Morphological Analyzer Module

The morphological analyzer is a meta-module which does not perform any processing of its own.

It is just a convenience module that simplifies instantiating and calling the submodules described in the next sections (3.5 to 3.13).

At instantiation time, it receives a maco_options object, containing information about which submodules have to be created and which files must be used to create them.
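The following is a minimal sketch of how a meta-module might turn such an options object into a set of submodules: only flagged submodules are created, and each one receives its own data file. The types `toy_options` and `submodule` and the function `create_submodules` are hypothetical simplifications, not FreeLing's API. The real maco_options covers ten submodules, and the actual creation order inside maco is not shown here.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a submodule: it only remembers its name and
// the data file it was created from (real submodules load that file).
struct submodule {
    std::string name, data_file;
    submodule(std::string n, std::string f)
        : name(std::move(n)), data_file(std::move(f)) {}
};

// Hypothetical, reduced options record in the spirit of maco_options:
// one activation flag and one data file per submodule.
struct toy_options {
    bool MultiwordsDetection = false, DictionarySearch = false,
         ProbabilityAssignment = false;
    std::string LocutionsFile, DictionaryFile, ProbabilityFile;
};

// Creates only the submodules whose flag is set, passing each its data file.
std::vector<submodule> create_submodules(const toy_options &opt) {
    std::vector<submodule> mods;
    if (opt.MultiwordsDetection)   mods.emplace_back("locutions", opt.LocutionsFile);
    if (opt.DictionarySearch)      mods.emplace_back("dictionary", opt.DictionaryFile);
    if (opt.ProbabilityAssignment) mods.emplace_back("probabilities", opt.ProbabilityFile);
    return mods;
}
```

With only `DictionarySearch` enabled, `create_submodules` returns a single entry, so disabled submodules cost nothing at analysis time.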

A calling application may bypass this module and call the submodules directly.

The Morphological Analyzer API is:

class maco {
  public:
    /// Constructor. Receives a set of options.
    maco(const maco_options &); 

    /// Analyze the given sentence in place.
    void analyze(sentence &) const;
    /// Analyze the given sentences in place.
    void analyze(std::list<sentence> &) const;
    /// Return an analyzed copy of the given sentence.
    sentence analyze(const sentence &) const;
    /// Return an analyzed copy of the given sentences.
    std::list<sentence> analyze(const std::list<sentence> &) const;
};
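The four analyze overloads come in two flavours: in-place and copy-returning. The sketch below mimics them with a hypothetical `toy_maco` class and a toy `sentence` type (neither is FreeLing's) to show which overload a call selects. The "analysis" is just a flag being set.

```cpp
#include <list>
#include <string>

// Toy sentence: only records whether it has been analyzed.
struct sentence {
    std::string text;
    bool analyzed = false;
};

// Hypothetical class mirroring the four analyze() overloads of maco.
class toy_maco {
  public:
    // In place: modifies the caller's sentence.
    void analyze(sentence &s) const { s.analyzed = true; }

    // In place, for every sentence in the list.
    void analyze(std::list<sentence> &ls) const {
        for (sentence &s : ls) analyze(s);
    }

    // Copy: leaves the argument untouched and returns an analyzed copy.
    sentence analyze(const sentence &s) const {
        sentence copy = s;
        analyze(copy);  // non-const lvalue, so the in-place overload runs
        return copy;
    }

    // Copy, for a whole list.
    std::list<sentence> analyze(const std::list<sentence> &ls) const {
        std::list<sentence> copy = ls;
        analyze(copy);
        return copy;
    }
};
```

Under standard C++ overload resolution, a non-const lvalue binds more closely to `sentence &` than to `const sentence &`, so `analyze(s)` on a modifiable sentence selects the in-place version. Passing a const object or a temporary (for instance `std::as_const(s)` in C++17) selects the copying one. The same rule applies to the two list overloads.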

The maco_options class has the following API:

class maco_options {
  public:
    /// Language analyzed
    std::wstring Lang;

    /// Submodules to activate
    bool UserMap, AffixAnalysis, MultiwordsDetection,
         NumbersDetection, PunctuationDetection,
         DatesDetection,   QuantitiesDetection,
         DictionarySearch, ProbabilityAssignment,
         NERecognition;

    /// Names of data files to provide to each submodule.
    std::wstring UserMapFile, LocutionsFile, QuantitiesFile,
                 AffixFile, ProbabilityFile, DictionaryFile,
                 NPdataFile, PunctuationFile;

    /// module-specific parameters for number recognition
    std::wstring Decimal, Thousand;
    /// module-specific parameters for probabilities
    double ProbabilityThreshold;
    /// module-specific parameters for dictionary
    bool InverseDict, RetokContractions;

    /// constructor
    maco_options(const std::wstring &);

    /// Option setting methods provided to ease Perl interface generation.
    /// Since option data members are public and can be accessed directly
    /// from C++, the following methods are not necessary, but may be
    /// convenient at times.
    /// The parameters follow the same order as the variables defined above.
    void set_active_modules(bool,bool,bool,bool,bool,bool,bool,bool,bool,bool);
    void set_data_files(const std::wstring &,const std::wstring &,const std::wstring &,
                        const std::wstring &,const std::wstring &,const std::wstring &,
                        const std::wstring &,const std::wstring &);
    void set_nummerical_points(const std::wstring &,const std::wstring &);
    void set_threshold(double);
    void set_inverse_dict(bool);
    void set_retok_contractions(bool);
};
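Because the setters take their parameters in member-declaration order, calling a setter is equivalent to assigning the members by name. The sketch below illustrates this with a hypothetical, reduced `mini_options` class (four flags instead of ten). It is not the real maco_options implementation.

```cpp
// Hypothetical reduced options class: public members plus positional
// setters whose parameters follow the member declaration order.
class mini_options {
  public:
    bool UserMap = false, AffixAnalysis = false,
         MultiwordsDetection = false, NumbersDetection = false;
    double ProbabilityThreshold = 0.0;

    // Positional setter: argument i sets the i-th flag declared above.
    void set_active_modules(bool umap, bool affix, bool mw, bool num) {
        UserMap = umap;
        AffixAnalysis = affix;
        MultiwordsDetection = mw;
        NumbersDetection = num;
    }

    void set_threshold(double t) { ProbabilityThreshold = t; }
};
```

A long run of unnamed `bool` arguments is easy to get out of order, so from C++ assigning the public members by name is often the clearer choice. The positional form mainly suits generated bindings such as the Perl interface.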

To instantiate a Morphological Analyzer object, the calling application needs to instantiate a maco_options object, initialize its fields with the desired values, and use it to call the constructor of the maco class.

The created object will create the required submodules. When asked to analyze some sentences, it simply passes them down to each submodule in turn and returns the final result.
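The forwarding behaviour described above can be sketched as a pipeline that does no work itself. The names `pipeline`, `submodule`, and `tracer` are hypothetical, and each toy submodule only records its name on the sentence so the calling order is visible. The order used in the usage line is illustrative and does not reflect maco's actual internal order.

```cpp
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy sentence: records which submodules have processed it, in order.
struct sentence {
    std::vector<std::string> trace;
};

// Hypothetical submodule interface: each one enriches a sentence in place.
using submodule = std::function<void(sentence &)>;

// Hypothetical meta-module: forwards each sentence through its submodules.
class pipeline {
  public:
    explicit pipeline(std::vector<submodule> mods) : mods_(std::move(mods)) {}

    void analyze(sentence &s) const {
        for (const submodule &m : mods_) m(s);  // each stage sees the previous output
    }

    void analyze(std::list<sentence> &ls) const {
        for (sentence &s : ls) analyze(s);
    }

  private:
    std::vector<submodule> mods_;
};

// Builds a toy submodule that just appends its name to the trace.
submodule tracer(const std::string &name) {
    return [name](sentence &s) { s.trace.push_back(name); };
}
```

For example, `pipeline p({tracer("numbers"), tracer("dictionary"), tracer("probabilities")});` followed by `p.analyze(sentences);` leaves every sentence with the trace `numbers, dictionary, probabilities`. This is the same result a caller would get by invoking the three submodules directly in that order.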

The maco_options class has convenience methods to set the values of the options, but note that all the members are public, so the user application can set those values directly if preferred.

Lluís Padró 2013-09-09