graphpath -- Analyze RDF Data

The graphpath package provides a Python implementation
of the GraphPath little-language together with an inference rule evaluator and
support for common RDF APIs.

graphpath.expr
    Class(r), Node(v) and Property(r) with their operators.
graphpath.entail
    RuleBase is a dictionary of rules and Sandbox is a rule evaluator and entailment cache.
graphpath.redadapt
    After this module is imported, GraphPath expressions can be bound to Redland RDF.Model objects.
graphpath.libadapt
    Adapts the rdflib RDF API to GraphPath. After this module is imported, GraphPath expressions can be bound to rdflib.store.AbstractTripleStore objects.
graphpath.util

graphpath.expr

This module implements GraphPath expressions as Python objects. The GraphPath operators are implemented on these objects as overloaded Python operators. The expression objects are immutable and comparable. Their operators are documented in the reference.
The following classes provide the elementary GraphPath steps from which more complex expressions may be composed using the operators. (The module also contains the classes that implement composite expressions but they are not intended to be directly instantiated.)
Once constructed, an expression is evaluated by binding it (with >>)
to a graph object (e.g. a Model or TripleStore) and iterating the result.
In the following, r and v stand for node objects.
In the Redland environment r and v must be RDF.Node objects. For rdflib
r should be a URIRef and v
should be either a URIRef or Literal. In addition,
all implementations are expected to accept a string
for v.
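
To make the binding step concrete, here is a minimal toy model of how a `graph >> step` expression can be supported through Python operator overloading. It is only a sketch: ToyProperty and BoundStep are hypothetical stand-ins, not graphpath classes, the graph is a plain list of triples, and the choice to yield terminal nodes on iteration is an assumption made for illustration.

```python
# Toy model only: these classes are NOT graphpath's implementation.


class BoundStep(object):
    """A step bound to a graph; iterating it evaluates the step."""

    def __init__(self, triples, prop):
        self.triples = triples
        self.prop = prop

    def __iter__(self):
        # Assumption: evaluation yields the terminal node of each matching arc.
        for s, p, o in self.triples:
            if p == self.prop:
                yield o


class ToyProperty(object):
    """Hypothetical stand-in for a step that traverses arcs labelled r."""

    def __init__(self, r):
        self.r = r

    def __rrshift__(self, graph):
        # Python calls this for `graph >> step` when the graph's own type
        # does not implement the >> operator.
        return BoundStep(graph, self.r)


graph = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "likes", "tea"),
]

# Bind the step to the graph with >> and iterate the result.
results = sorted(graph >> ToyProperty("knows"))
print(results)
```

Defining the reflected operator on the expression side is one way to let arbitrary graph objects appear on the left of >> without modifying their classes.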

Node(v)
    [...] v.
Nodes(s)
    [...] s. The argument, s, is an iterable yielding nodes.
Class(r)
    [...] r.
Subject()
    [...]
Any()
    [...]
Self()
    [...] Any, constructs a self step that matches any node.
Property(r)
    [...] r.
HasNo(p)
    [...] p.
Map(p)
    [...] p as a mapping of initial nodes to terminal nodes. If m=Map(p) then iter(m) iterates the initial nodes of all paths matching p and m[key] is the set of terminal nodes of those paths where key is the initial node.
trace(p)
    [...] p. If q=trace(p) then q and p are equivalent expressions except that q will print tracing information to sys.stdout during evaluation.

graphpath.entail

This module implements the GraphPath rule and inference system.

RuleBase()
    Constructs an empty collection of rules.

A RuleBase object
implements the mapping protocol where keys may be of type Class or Property
and values may be arbitrary GraphPath expression objects.
If rules is a RuleBase object, p a GraphPath
expression, and r a resource, then:

rules[Property(r)]=p
    Defines a property r containing arcs for all paths matching p. More precisely, every path matching p entails an arc in r with the same respective initial and terminal nodes.
rules[Class(r)]=p
    Defines a class r containing all nodes matching the predicate p. To be exact, every path matching p entails an rdf type arc with the same initial node and a terminal node r.

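
The entailment described by these two rule forms can be sketched in plain Python. This is an illustration under stated assumptions, not graphpath code: paths are modelled as (initial, terminal) pairs, arcs as (subject, property, object) tuples, and the helper names entail_property and entail_class, the "rdf:type" label and the example resources are all hypothetical.

```python
RDF_TYPE = "rdf:type"  # shorthand label for the rdf type property (assumed)


def entail_property(paths, r):
    """Each (initial, terminal) path entails an arc (initial, r, terminal)."""
    return set((initial, r, terminal) for initial, terminal in paths)


def entail_class(paths, r):
    """Each path entails an rdf type arc from its initial node to r."""
    return set((initial, RDF_TYPE, r) for initial, _terminal in paths)


# Pretend these are the paths matched by some expression p.
paths = [("alice", "carol"), ("bob", "dave")]

property_arcs = entail_property(paths, "ex:grandparentOf")
class_arcs = entail_class(paths, "ex:Grandparent")
```

Note that the class form ignores the terminal node of each path: only the initial node matters, which is why p acts as a predicate in that case.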
Rule definitions are cumulative so that rules[Property(r)]=p; rules[Property(r)]=q is
equivalent to rules[Property(r)]=p|q. Similarly, repeated class definitions are
equivalent to a union.
A RuleBase can be queried using the usual mapping operations.
The keys (Class and Property objects) are iterated with iter(rules)
and a rule may be recovered with a subscripting operation: rule=rules[Property(r)].
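
The cumulative behaviour of rule assignment can be modelled with a small mapping class. The sketch below is a toy, not the RuleBase implementation: ToyRuleBase is a hypothetical name, keys and expressions are plain strings, and a frozenset of alternatives stands in for the union p|q.

```python
class ToyRuleBase(object):
    """Toy mapping in which repeated assignment accumulates alternatives."""

    def __init__(self):
        self._rules = {}

    def __setitem__(self, key, expr):
        # Accumulate rather than overwrite; the frozenset stands in for p|q.
        existing = self._rules.get(key, frozenset())
        self._rules[key] = existing | frozenset([expr])

    def __getitem__(self, key):
        return self._rules[key]

    def __iter__(self):
        # Iterating the mapping yields its keys.
        return iter(self._rules)

    def __len__(self):
        return len(self._rules)


rules = ToyRuleBase()
rules["ex:ancestor"] = "ex:parent"
rules["ex:ancestor"] = "ex:parent/ex:ancestor"  # adds to the first rule

keys = list(rules)
alternatives = rules["ex:ancestor"]
```

After the two assignments there is still a single key, and looking it up returns both alternatives.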

Sandbox(graph, rules)
    Constructs a rule evaluator for the given graph and set of rules. The graph is a Model or TripleStore object or another Sandbox instance. The rules argument is a RuleBase object.

A Sandbox object represents a graph and can be bound to a GraphPath expression (with the >>
operator). For example:
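
    import RDF
    import graphpath.redadapt  # enables binding to Redland RDF.Model objects
    from graphpath.expr import Class, Property, Node
    from graphpath.entail import RuleBase, Sandbox

    graph = RDF.Model()
    rules = RuleBase()
    ...
    augmented = Sandbox(graph, rules)
    for result in augmented >> Class(r)/Property(p)[Property(q)/Node(v)]:
        print(result)
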
The Sandbox conceptually contains all of the arcs from the initial graph
augmented with their entailments, inferred by applying the given rules.
Expressions bound to the Sandbox will be matched against the augmented graph.
The inference algorithm assumes that the initial graph remains unchanged for the life of the
Sandbox. However, construction of a new Sandbox is cheap and does not involve
rule evaluation, which occurs on demand. A rule is evaluated when required to evaluate an expression bound
to the Sandbox or another rule, recursively. Entailments are cached
in the Sandbox and released when it is destroyed.
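
The on-demand evaluation and caching described above can be illustrated with a simplified model. This is a sketch under assumptions, not the Sandbox algorithm: ToySandbox is a hypothetical class, rules are plain Python functions rather than GraphPath expressions, and recursive rules are not handled.

```python
class ToySandbox(object):
    """Toy evaluator: rules run on first use and their results are cached."""

    def __init__(self, triples, rules):
        # The initial graph is frozen, mirroring the assumption that it
        # does not change during the life of the sandbox.
        self.triples = frozenset(triples)
        self.rules = rules  # property -> function(sandbox) -> set of (s, o)
        self._cache = {}
        self.evaluations = 0  # counts rule evaluations, for demonstration

    def arcs(self, prop):
        """Return all (subject, object) pairs for prop, asserted or entailed."""
        if prop not in self._cache:
            pairs = set((s, o) for s, p, o in self.triples if p == prop)
            if prop in self.rules:
                self.evaluations += 1
                pairs |= self.rules[prop](self)
            self._cache[prop] = pairs
        return self._cache[prop]


def grandparent_of(sandbox):
    # Evaluating this rule pulls in "parentOf" arcs through the sandbox.
    parent = sandbox.arcs("parentOf")
    return set((a, c) for a, b in parent for b2, c in parent if b == b2)


sandbox = ToySandbox(
    [("ann", "parentOf", "bob"), ("bob", "parentOf", "cy")],
    {"grandparentOf": grandparent_of},
)
evaluations_at_start = sandbox.evaluations  # construction evaluates nothing

first = sandbox.arcs("grandparentOf")   # the rule runs here
second = sandbox.arcs("grandparentOf")  # served from the cache
```

Construction only stores the graph and the rules, so it stays cheap; the rule runs once, on the first query that needs it.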
Modules are provided to add support for the Redland and rdflib APIs.

graphpath.redadapt
    After graphpath.redadapt is imported, GraphPath expressions can be bound to Redland RDF.Model objects.
graphpath.libadapt
    After graphpath.libadapt is imported, GraphPath expressions can be bound to rdflib.store.AbstractTripleStore objects.

Both modules may be imported into the same program, although mixing RDF APIs in the same GraphPath expression is not generally possible.
Each adapter module defines a class, Population,
that adapts graphs for binding.
However, it is not normally necessary to construct objects of this class explicitly.
The expression
Population(g)>>p should yield the same bound GraphPath expression as
g>>p.
The Population protocol can be implemented
to adapt new graph data structures to GraphPath.

Population(g)
    [...] graphpath.expr.StrategyError, which will enable other bindings to be attempted. To enable binding, the constructor (i.e. the class or a factory) should be appended to the list graphpath.expr.adapters.
values(subject, property)
    [...]
match(property, object)
    [...]
rdf_type
    ([...] Class() step.)
__iter__
    [...]

The type of the subject, predicate and object (or value) arguments should be whatever type is used for these in the underlying RDF API.
It should also be possible to use elementary Python types for values, including strings at least. The adapter should convert these to and from the underlying RDF API value node type.
Individual methods can be left unimplemented although this will affect the ability
to evaluate or the speed of certain GraphPath expressions. Unimplemented methods
should raise graphpath.expr.StrategyError.
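
As a rough illustration of the protocol, the sketch below adapts a plain list of triples. Much of it is assumption: TripleListPopulation is a hypothetical name, the local StrategyError is a stand-in for graphpath.expr.StrategyError so that the example runs without graphpath installed, and the meanings given to values and match (objects for a subject and property, subjects for a property and object) are guesses from the method signatures. The rdf_type and __iter__ methods are deliberately left unimplemented.

```python
class StrategyError(Exception):
    """Local stand-in for graphpath.expr.StrategyError."""


class TripleListPopulation(object):
    """Hypothetical adapter over a list of (subject, property, object)."""

    def __init__(self, g):
        # Assumption: decline graphs this adapter does not understand so
        # that other registered adapters can be tried.
        if not isinstance(g, list):
            raise StrategyError("not a list of triples")
        self.triples = g

    # Parameter names follow the documented signatures, even though they
    # shadow Python built-ins inside these methods.
    def values(self, subject, property):
        # Assumed meaning: objects reached from subject via property.
        for s, p, o in self.triples:
            if s == subject and p == property:
                yield o

    def match(self, property, object):
        # Assumed meaning: subjects that have an arc property -> object.
        for s, p, o in self.triples:
            if p == property and o == object:
                yield s

    def rdf_type(self, *args):
        # Left unimplemented; the real signature is not assumed here.
        raise StrategyError("rdf_type is not implemented")

    def __iter__(self):
        raise StrategyError("iteration is not implemented")


# With graphpath available, registration would follow the text above:
#     graphpath.expr.adapters.append(TripleListPopulation)

pop = TripleListPopulation([("a", "p", "b"), ("c", "p", "b")])
objects = list(pop.values("a", "p"))
subjects = list(pop.match("p", "b"))
```

Raising the error from the unimplemented methods, rather than returning an empty result, keeps a missing capability distinguishable from a genuine absence of matches.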