Sample programs

The directory src/main/simple_examples in the tarball contains some example programs to illustrate how to call the library.

See the README file in that directory for details on what each of the programs does.

The most complete program in that directory is sample.cc. It is very similar to the program shown below, which reads text from stdin, morphologically analyzes it, and processes the results.

Note that, depending on the application, the input text could come from a speech recognition system, from an XML parser, or from any other source that suits the application's goals. Similarly, instead of being written to output, the resulting analysis could be fed to a translation system, sent to a dialogue control module, etc.

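// Headers and namespaces below are assumed from the FreeLing sample
// programs; check sample.cc for the exact set required by your version.
#include <iostream>
#include <list>
#include <string>
#include "freeling.h"

using namespace std;
using namespace freeling;

// application-side processing, defined further below
void ProcessResults(const list<sentence> &ls);

int main() {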
  wstring text;
  list<word> lw;
  list<sentence> ls;

  // set locale to a UTF-8 compatible locale
  util::init_locale(L"default");

  // if FreeLing was compiled with --enable-traces, you can activate
  // the required trace verbosity for the desired modules.
  //   traces::TraceLevel=4;
  //   traces::TraceModule=0xFFFFF;
  
  // ====== instantiate analyzers as needed =====

  wstring path=L"/usr/local/share/freeling/es/";
  tokenizer tk(path+L"tokenizer.dat"); 
  splitter sp(path+L"splitter.dat");
  
  // morphological analysis has a lot of options, and for simplicity they 
  // are packed up in a maco_options object. First, create the maco_options
  // object with default values.
  maco_options opt(L"es");  

  // then, set required options on/off  
  opt.UserMap=false;                 opt.AffixAnalysis = true;
  opt.MultiwordsDetection = true;    opt.NumbersDetection = true; 
  opt.PunctuationDetection = true;   opt.DatesDetection = true;
  opt.QuantitiesDetection = false;   opt.DictionarySearch = true; 
  opt.ProbabilityAssignment = true;  opt.NERecognition = true;   
  // alternatively, you can set active modules in a single call:
  // opt.set_active_modules(false,true,true,true,true,true,false,true,true,true);

  // and provide files for morphological submodules. Note that it is not necessary
  // to set opt.QuantitiesFile, since the Quantities module was deactivated.
  opt.UserMapFile=L"";                 opt.LocutionsFile=path+L"locucions.dat";
  opt.AffixFile=path+L"afixos.dat";    opt.ProbabilityFile=path+L"probabilitats.dat"; 
  opt.DictionaryFile=path+L"dicc.src"; opt.NPdataFile=path+L"np.dat"; 
  opt.PunctuationFile=path+L"../common/punct.dat"; 
  // alternatively, you can set the files in a single call:
  // opt.set_data_files(L"", path+L"locucions.dat", L"", path+L"afixos.dat",
  //                   path+L"probabilitats.dat", path+L"dicc.src",
  //                   path+L"np.dat", path+L"../common/punct.dat");

  // create the analyzer with the set of maco_options just built
  maco morfo(opt); 
  // create an HMM tagger for Spanish (with retokenization ability, and forced 
  // to choose only one tag per word)
  hmm_tagger tagger(L"es", path+L"tagger.dat", true, true); 
  // create chunker
  chart_parser parser(path+L"grammar-dep.dat");
  // create dependency parser 
  dep_txala dep(path+L"dep/dependences.dat", parser.get_start_symbol());
  
  // ====== Start text processing =====

  // get plain text input lines while not EOF.
  while (getline(wcin,text)) {
    
    // tokenize input line into a list of words
    lw=tk.tokenize(text);
    
    // accumulate the list of words in the splitter buffer, returning a list of
    // sentences. The resulting list may be empty if the splitter does not yet
    // have enough evidence to decide that a complete sentence has been found.
    // It may also contain more than one sentence (since a single input line
    // may consist of several complete sentences).
    ls=sp.split(lw, false);
    
    // perform morphosyntactic analysis, disambiguation, and parsing
    morfo.analyze(ls);
    tagger.analyze(ls);
    parser.analyze(ls);
    dep.analyze(ls);

    // Do application-side processing with analysis results so far.
    ProcessResults(ls);
    
    // clear temporary lists;
    lw.clear(); ls.clear();    
  }
  
  // No more lines to read. Flush the splitter (second argument set to true)
  // to make sure it doesn't retain anything
  sp.split(lw, true, ls);   
 
  // analyze sentence(s) which might be lingering in the buffer, if any.
  morfo.analyze(ls);
  tagger.analyze(ls);
  parser.analyze(ls);
  dep.analyze(ls);

  // process remaining sentences, if any.
  ProcessResults(ls);
  
}

The processing performed on the results depends on the goal of the application (translation, indexing, etc.). To illustrate the structure of the linguistic data objects, a simple procedure is presented below, in which the processing consists merely of printing the results to stdout in XML format.

void ProcessResults(const list<sentence> &ls) {
  
  list<sentence>::const_iterator is;
  word::const_iterator a;   // iterator over all analyses of a word
  sentence::const_iterator w;
  
  // for each sentence in list
  for (is=ls.begin(); is!=ls.end(); is++) {

    wcout<<L"<SENT>"<<endl;
    // for each word in sentence
    for (w=is->begin(); w!=is->end(); w++) {
      
      // print word form, with PoS and lemma chosen by the tagger
      wcout<<L"  <WORD form=\""<<w->get_form();
      wcout<<L"\" lemma=\""<<w->get_lemma();
      wcout<<L"\" pos=\""<<w->get_tag();
      wcout<<L"\">"<<endl;
      
      // for each possible analysis of the word, output lemma, tag and probability
      for (a=w->analysis_begin(); a!=w->analysis_end(); ++a) {

        // print analysis info
        wcout<<L"    <ANALYSIS lemma=\""<<a->get_lemma();
        wcout<<L"\" pos=\""<<a->get_tag();
        wcout<<L"\" prob=\""<<a->get_prob();
        wcout<<L"\"/>"<<endl;
      }
      
      // close word XML tag after the list of analyses
      wcout<<L"  </WORD>"<<endl;
    }
    
    // close sentence XML tag
    wcout<<L"</SENT>"<<endl;
  }
}

The above sample program can be found in /src/main/simple_examples/sample.cc in the FreeLing tarball. The actual program also outputs the tree structures resulting from parsing, which are omitted here for simplicity.

Once you have compiled and installed FreeLing, you can build this sample program (or any other you may want to write) with the command:
g++ -o sample sample.cc -lfreeling

The option -lfreeling links against the libfreeling library, which is the final result of the FreeLing compilation process. Check the README file in the simple_examples directory to learn more about compiling and using the sample programs.

You may need to add -I and/or -L options to the compilation command, depending on where the headers and binaries of the required libraries are located. For instance, if you installed some of the libraries in /usr/local/mylib instead of the default location /usr/local, you will have to add the options -I/usr/local/mylib/include -L/usr/local/mylib/lib to the command above.

Issuing make in /src/main/simple_examples will compile all sample programs in that directory. Make sure that the paths to the FreeLing installation directory in the Makefile are correct.

Lluís Padró 2013-09-09