Splitter Module

The splitter module receives lists of word objects (produced by the tokenizer or by any other means in the calling application) and buffers them until a sentence boundary is detected. It then returns a list of sentence objects.

The splitter's buffer may retain some of the tokens if the given list does not end with a clear sentence boundary. The calling application can then submit further token lists to be added, or ask the splitter to flush the buffer.

Note that the splitter is not thread-safe when the buffer is not flushed at each call.

The API for the splitter class is:

class splitter {
  public:
    /// Constructor. Receives a file with the desired options
    splitter(const std::string &);

    /// Add list of words to the buffer, and return complete sentences
    /// that can be built.
    /// The boolean states if a buffer flush has to be forced (true) or
    /// some words may remain in the buffer (false) if the splitter 
    /// needs to wait to see what is coming next.
    std::list<sentence> split(const std::list<word> &, bool);
    /// Split the given list and add the resulting sentences to the output parameter
    void split(const std::list<word> &, bool, std::list<sentence> &);
};
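
The buffering and flush behaviour described above can be illustrated with a small self-contained sketch. It is not the library's implementation: `toy_splitter`, its `pending()` helper, and the stand-in `word` and `sentence` types are hypothetical, and the rule that a sentence ends at a "." token is an assumption made only for this example (the real splitter takes its rules from the options file given to the constructor). The two `split` methods mirror the signatures declared above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-ins for the library's word and sentence types.
using word = std::string;
using sentence = std::list<word>;

// Toy splitter mirroring the two split() signatures shown above.
// Assumption: a sentence ends at a "." token.
class toy_splitter {
  public:
    // Add words to the buffer and return the sentences completed so far.
    std::list<sentence> split(const std::list<word> &ws, bool flush) {
      std::list<sentence> out;
      split(ws, flush, out);
      return out;
    }

    // Same as above, but append the completed sentences to 'out'.
    void split(const std::list<word> &ws, bool flush, std::list<sentence> &out) {
      for (const word &w : ws) {
        buffer.push_back(w);
        if (w == ".") {            // boundary found: emit the buffered sentence
          out.push_back(buffer);
          buffer.clear();
        }
      }
      if (flush && !buffer.empty()) {  // forced flush: emit whatever remains
        out.push_back(buffer);
        buffer.clear();
      }
    }

    // Number of words still waiting in the buffer (helper for this sketch only).
    std::size_t pending() const { return buffer.size(); }

  private:
    sentence buffer;  // state kept between calls when flush is false
};

void run_example() {
  toy_splitter sp;

  // First chunk: one complete sentence, then the start of another.
  std::list<sentence> s1 = sp.split({"Hello", "world", ".", "This", "is"}, false);
  assert(s1.size() == 1);
  assert(sp.pending() == 2);   // "This", "is" stay in the buffer

  // Second chunk completes the buffered sentence.
  std::list<sentence> s2 = sp.split({"unfinished", "."}, false);
  assert(s2.size() == 1 && s2.front().size() == 4);

  // Last chunk has no boundary, so the caller forces a flush.
  std::list<sentence> s3 = sp.split({"Trailing", "words"}, true);
  assert(s3.size() == 1 && sp.pending() == 0);
}
```

The `buffer` member is state carried from one call to the next whenever the flush flag is false. In this sketch, that is why calls from different threads on the same object would interfere unless every call flushes.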




Lluís Padró 2013-09-09