Alternatives Suggestion Module

This module retrieves from its dictionary the entries most similar to the input form. The similarity is computed according to a configurable string edit distance (SED) measure.
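
As an illustration of what a string edit distance measures, the following is a minimal sketch of the plain unit-cost Levenshtein distance. It is an assumption chosen for clarity: the measure actually used by the module is configurable and is computed by the underlying FSM search, not by this code. The function name edit_distance is hypothetical and is not part of the module's API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not part of FreeLing: unit-cost Levenshtein distance
// (each insertion, deletion, or substitution costs 1).
int edit_distance(const std::wstring &a, const std::wstring &b) {
  // prev holds the distances for the previous row of the DP table,
  // cur the row being filled in.
  std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);

  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = static_cast<int>(i);  // deleting the first i characters of a
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      int del = prev[j] + 1;     // drop a[i-1]
      int ins = cur[j - 1] + 1;  // insert b[j-1]
      cur[j] = std::min({subst, del, ins});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}
```

Under this unit-cost assumption, edit_distance(L"kitten", L"sitting") yields 3. A configurable measure could assign different costs to each operation or to specific character pairs.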

The alternatives module can be created to perform a direct search of the form in a dictionary, or to search for the phonetic transcription of the form in a dictionary of phonetic transcriptions. In the latter case, the orthographic forms corresponding to the phonetically similar words are returned. For instance, if a misspelled word such as spid is found, this module will find that it sounds very close to a correct word in the dictionary (speed), and return the correctly spelled alternatives. This module is based on the fast search algorithms on FSMs included in the finite-state library FOMA (http://code.google.com/p/foma).

The API for this module is the following:

class alternatives {
  public:
    /// Constructor
    alternatives(const std::wstring &);
    /// Destructor
    ~alternatives();

    /// direct access to results of underlying FSM
    void get_similar_words(const std::wstring &, 
                           std::list<std::pair<std::wstring,int> > &) const;
    
    /// analyze given sentence.
    void analyze(sentence &) const;
    /// analyze given sentences.
    void analyze(std::list<sentence> &) const;
    /// return analyzed copy of given sentence
    sentence analyze(const sentence &) const;
    /// return analyzed copy of given sentences
    std::list<sentence> analyze(const std::list<sentence> &) const;
};

This module will find alternatives for words in the sentences, and enrich them with a list of forms, each with the corresponding SED value. The forms are added to the alternatives member of class word, which is a std::list<std::pair<std::wstring,int> >. The list can be traversed using the iterators word::alternatives_begin() and word::alternatives_end().
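
The traversal described above can be sketched as follows. To keep the example self-contained, it uses a hypothetical stand-in type, word_stub, which mirrors only the two iterator accessors named in the text. It is not the real word class, and the helper closest_alternative is likewise an assumption made for illustration.

```cpp
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for the word class: it mirrors only the
// alternatives list and the two iterator accessors named in the text.
struct word_stub {
  typedef std::list<std::pair<std::wstring, int> > alt_list;
  alt_list alts;
  alt_list::const_iterator alternatives_begin() const { return alts.begin(); }
  alt_list::const_iterator alternatives_end() const { return alts.end(); }
};

// Hypothetical helper: return the alternative form with the lowest
// SED value, or an empty string if the word has no alternatives.
std::wstring closest_alternative(const word_stub &w) {
  std::wstring best;
  int best_dist = -1;  // -1 means "no candidate seen yet"
  for (word_stub::alt_list::const_iterator it = w.alternatives_begin();
       it != w.alternatives_end(); ++it) {
    if (best_dist < 0 || it->second < best_dist) {
      best = it->first;        // the alternative form
      best_dist = it->second;  // its SED value
    }
  }
  return best;
}
```

With the real module, the same loop would presumably run over each word of a sentence after a call to analyze, using the iterators provided by the word class itself.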

The constructor of this module expects a configuration file containing the following sections:

Lluís Padró 2013-09-09