public class BobChrisTreeNormalizer extends TreeNormalizer implements TreeTransformer
The normalizations of the original (Prolog) BobChrisNormalize were:

1. Remap the root node to be called 'ROOT'.
2. Truncate all nonterminal labels before characters introducing annotations according to TreebankLanguagePack (traditionally -, =, | or #, the last for BLLIP).
3. Remap the representation of certain leaf symbols (brackets etc.).
4. Map all leaf nodes to lowercase.
5. Delete empty/trace nodes (ones marked '-NONE-').
6. Recursively delete any nodes that do not dominate any words.
7. Delete A over A nodes where the top A dominates nothing else.
8. Remove backslashes from lexical items (the Treebank inserts them to escape slashes (/) and stars (*)).

Step 4 is deliberately omitted, and a few things are purely aesthetic.
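Steps 2 and 8 are plain string operations. The Java sketch below is a hypothetical, standalone illustration, not this class's code: it hardcodes the traditional annotation characters instead of asking a TreebankLanguagePack, and the rule that a label beginning with an annotation character (such as -NONE-) keeps everything up to the matching closing character is an assumption of the sketch.

```java
import java.util.Set;

/**
 * Hypothetical standalone sketch of steps 2 and 8; not CoreNLP's implementation.
 * The annotation characters are hardcoded here, whereas the real class takes
 * them from its TreebankLanguagePack.
 */
final class LabelNormalizationSketch {

  // Traditional annotation-introducing characters; '#' is the BLLIP one.
  private static final Set<Character> ANNOTATION_CHARS = Set.of('-', '=', '|', '#');

  private LabelNormalizationSketch() {}

  /**
   * Step 2: cut a nonterminal label before its first annotation character,
   * e.g. "NP-SBJ-1" -> "NP" and "S=2" -> "S". Assumption of this sketch: a label
   * that starts with an annotation character, such as "-NONE-" or "-LRB-",
   * keeps everything up to and including the matching closing character.
   */
  static String truncateAnnotations(String label) {
    int start = 1;
    if (!label.isEmpty() && ANNOTATION_CHARS.contains(label.charAt(0))) {
      int close = label.indexOf(label.charAt(0), 1);
      if (close < 0) {
        return label; // no closing character: leave the label alone
      }
      start = close + 1;
    }
    for (int i = start; i < label.length(); i++) {
      if (ANNOTATION_CHARS.contains(label.charAt(i))) {
        return label.substring(0, i);
      }
    }
    return label;
  }

  /** Step 8: drop the backslashes that escape '/' and '*', e.g. "1\/2" -> "1/2". */
  static String removeBackslashes(String word) {
    return word.replace("\\/", "/").replace("\\*", "*");
  }
}
```

For example, truncateAnnotations("NP-SBJ-1") returns "NP", truncateAnnotations("-NONE-") returns "-NONE-" unchanged, and removeBackslashes applied to the token 1\/2 returns 1/2.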
14 June 2002: It now deletes unary A over A nodes if both nodes' labels are equal (7), and (6) was always part of the Tree.prune() functionality.

30 June 2005: Also splice out an EDITED node, just in case you're parsing the Brown corpus.
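The tree-level steps (5, 6 and 7, plus the EDITED change) amount to a prune followed by a splice. The sketch below illustrates that order on a minimal hypothetical Node class rather than CoreNLP's Tree. The filter logic is modeled on the descriptions above (a -NONE- preterminal is empty; a unary node whose only child has the same label, or any EDITED node, is spliced out, keeping its children), and never splicing the root itself is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Minimal stand-in for a parse tree node (hypothetical; not CoreNLP's Tree). */
final class Node {
  final String label;
  final List<Node> kids;

  Node(String label, List<Node> kids) {
    this.label = label;
    this.kids = List.copyOf(kids);
  }

  static Node leaf(String word) { return new Node(word, List.of()); }

  static Node node(String label, Node... kids) { return new Node(label, List.of(kids)); }

  boolean isLeaf() { return kids.isEmpty(); }

  boolean isPreTerminal() { return kids.size() == 1 && kids.get(0).isLeaf(); }

  @Override
  public String toString() {
    if (isLeaf()) return label;
    StringBuilder sb = new StringBuilder("(").append(label);
    for (Node k : kids) sb.append(' ').append(k);
    return sb.append(')').toString();
  }
}

/** Hypothetical sketch of steps 5-7 and the EDITED splice; not CoreNLP's code. */
final class TreeCleanupSketch {

  private TreeCleanupSketch() {}

  /** Step 5: reject empty/trace preterminals labeled -NONE-. */
  static final Predicate<Node> EMPTY_FILTER =
      t -> !(t.isPreTerminal() && "-NONE-".equals(t.label));

  /** Step 7 and the 2005 change: reject unary A over A nodes and EDITED nodes. */
  static final Predicate<Node> A_OVER_A_FILTER = t -> {
    if (t.isLeaf() || t.isPreTerminal()) return true;
    if ("EDITED".equals(t.label)) return false;
    return !(t.kids.size() == 1 && t.label.equals(t.kids.get(0).label));
  };

  /** Drops nodes failing keep; step 6: a nonterminal left with no children is dropped too. */
  static Node prune(Node t, Predicate<Node> keep) {
    if (!keep.test(t)) return null;
    if (t.isLeaf()) return t;
    List<Node> newKids = new ArrayList<>();
    for (Node k : t.kids) {
      Node pk = prune(k, keep);
      if (pk != null) newKids.add(pk);
    }
    return newKids.isEmpty() ? null : new Node(t.label, newKids);
  }

  /** Replaces each node failing keep by its (already processed) children. */
  static List<Node> spliceOut(Node t, Predicate<Node> keep) {
    if (t.isLeaf()) return List.of(t);
    List<Node> newKids = new ArrayList<>();
    for (Node k : t.kids) newKids.addAll(spliceOut(k, keep));
    return keep.test(t) ? List.of(new Node(t.label, newKids)) : newKids;
  }

  /** Prune first, then splice; the root itself is never spliced (an assumption). */
  static Node normalize(Node tree) {
    Node pruned = prune(tree, EMPTY_FILTER);
    if (pruned == null) return null;
    List<Node> kids = new ArrayList<>();
    for (Node k : pruned.kids) kids.addAll(spliceOut(k, A_OVER_A_FILTER));
    return new Node(pruned.label, kids);
  }
}
```

On the input (ROOT (S (EDITED (NP (PRP I))) (NP (NP (PRP I))) (VP (VB go) (NP (-NONE- *T*))))) the sketch yields (ROOT (S (NP (PRP I)) (NP (PRP I)) (VP (VB go)))): the trace and the NP left empty by its removal are pruned, and the EDITED node and the upper NP are spliced out with their children kept. Pruning before splicing matters, since removing a trace can leave a new unary A over A behind.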
| Modifier and Type | Class and Description |
|---|---|
| static class | BobChrisTreeNormalizer.AOverAFilter |
| static class | BobChrisTreeNormalizer.EmptyFilter |
| Modifier and Type | Field and Description |
|---|---|
| protected java.util.function.Predicate<Tree> | aOverAFilter |
| protected java.util.function.Predicate<Tree> | emptyFilter |
| protected TreebankLanguagePack | tlp |
| Constructor and Description |
|---|
| BobChrisTreeNormalizer() |
| BobChrisTreeNormalizer(TreebankLanguagePack tlp) |
| Modifier and Type | Method and Description |
|---|---|
| protected java.lang.String | cleanUpLabel(java.lang.String label): Removes things like hyphenated functional tags and equals signs from the end of a node label. |
| java.lang.String | normalizeNonterminal(java.lang.String category): Normalizes the contents of a nonterminal. |
| java.lang.String | normalizeTerminal(java.lang.String leaf): Normalizes the contents of a leaf. |
| Tree | normalizeWholeTree(Tree tree, TreeFactory tf): Normalizes a whole tree; one can assume that this is the root. |
| Tree | transformTree(Tree tree): Does whatever one needs to do to a particular tree. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface TreeTransformer: apply

protected final TreebankLanguagePack tlp
protected java.util.function.Predicate<Tree> emptyFilter
protected java.util.function.Predicate<Tree> aOverAFilter
public BobChrisTreeNormalizer()
public BobChrisTreeNormalizer(TreebankLanguagePack tlp)
public java.lang.String normalizeTerminal(java.lang.String leaf)
Overrides: normalizeTerminal in class TreeNormalizer
Parameters: leaf - The String that decorates the leaf

public java.lang.String normalizeNonterminal(java.lang.String category)
Overrides: normalizeNonterminal in class TreeNormalizer
Parameters: category - The String that decorates this nonterminal node

protected java.lang.String cleanUpLabel(java.lang.String label)
Removes things like hyphenated functional tags and equals signs from the end of a node label. … null.
Parameters: label - The label from the treebank

public Tree normalizeWholeTree(Tree tree, TreeFactory tf)
Overrides: normalizeWholeTree in class TreeNormalizer
Parameters: tree - The tree to be normalized; tf - the TreeFactory to create new nodes (if needed)

public Tree transformTree(Tree tree)
Description copied from interface: TreeTransformer
Does whatever one needs to do to a particular tree. This routine is passed a whole Tree, and could itself work recursively, but the canonical usage is to invoke this method via the Tree.transform() method, which applies the transformer bottom-up to each local Tree; hence an implementation of TreeTransformer should merely examine and change a local (one-level) Tree.
Specified by: transformTree in interface TreeTransformer
Parameters: tree - A tree. Classes implementing this interface can assume that the tree passed in is not null.
Returns: the transformed Tree
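The bottom-up contract described for transformTree can be illustrated with a small standalone sketch. LocalTree and LocalTreeTransformer are hypothetical stand-ins for Tree and TreeTransformer, not CoreNLP types. The transformer below looks at one level only, collapsing a unary node whose only child is a nonterminal with the same label, yet because children are transformed before their parent, longer chains collapse as well.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-method interface mirroring the transformTree contract. */
interface LocalTreeTransformer {
  LocalTree transformTree(LocalTree t);
}

/** Minimal immutable tree (hypothetical; not CoreNLP's Tree). */
final class LocalTree {
  final String label;
  final List<LocalTree> kids;

  LocalTree(String label, List<LocalTree> kids) {
    this.label = label;
    this.kids = List.copyOf(kids);
  }

  static LocalTree of(String label, LocalTree... kids) {
    return new LocalTree(label, List.of(kids));
  }

  /** Bottom-up: every child subtree is transformed before its parent is passed to tt. */
  LocalTree transform(LocalTreeTransformer tt) {
    List<LocalTree> newKids = new ArrayList<>();
    for (LocalTree k : kids) newKids.add(k.transform(tt));
    return tt.transformTree(new LocalTree(label, newKids));
  }

  @Override
  public String toString() {
    if (kids.isEmpty()) return label;
    StringBuilder sb = new StringBuilder("(").append(label);
    for (LocalTree k : kids) sb.append(' ').append(k);
    return sb.append(')').toString();
  }
}

/** Examines one level only: a unary node over a same-labeled nonterminal is replaced by that child. */
final class CollapseUnaryAOverA implements LocalTreeTransformer {
  @Override
  public LocalTree transformTree(LocalTree t) {
    if (t.kids.size() == 1) {
      LocalTree child = t.kids.get(0);
      if (!child.kids.isEmpty() && child.label.equals(t.label)) {
        return child;
      }
    }
    return t;
  }
}
```

Applied to (NP (NP (NP (NN dog)))), transform returns (NP (NN dog)), while a tree with no unary A over A, such as (S (NP (PRP I)) (VP (VB go))), comes back unchanged.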