This module implements the microdata->RDF algorithm, as documented by the U{W3C Semantic Web Interest Group Note<http://www.w3.org/TR/2012/NOTE-microdata-rdf-20120308/>}.
The module can be used via a stand-alone script (an example is part of the distribution) or bound to a CGI script as a Web Service. An example CGI script is also added to the distribution. Both the local script and the distribution may have to be adapted to local circumstances.
The graph-oriented entry points return an RDFLib.Graph object instead of a serialization thereof. See the description of the L{pyMicrodata class<pyMicrodata.pyMicrodata>} for further details on the possible entry points.
There is also, as part of this module, a L{separate entry for CGI calls<processURI>}.
By default, the output format for the graph is RDF/XML. At present, the following formats are available (with the corresponding key to be used in the package entry points):
- “xml”: U{RDF/XML<http://www.w3.org/TR/rdf-syntax-grammar/>} (default)
- “turtle”: U{Turtle<http://www.w3.org/TR/turtle/>}
- “nt”: U{N-Triples<http://www.w3.org/TR/rdf-testcases/#ntriples>}
- “json”: U{JSON-LD<http://json-ld.org/spec/latest/json-ld-syntax/>}
@summary: Microdata parser (distiller)
@requires: Python version 2.5 or up
@requires: U{RDFLib<http://rdflib.net>}
@requires: U{html5lib<http://code.google.com/p/html5lib/>} for the HTML5 parsing; note possible dependencies on Python’s version on the project’s web site
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
@copyright: W3C
Bases: rdflib.plugins.parsers.pyMicrodata.MicrodataError
Raised when HTTP problems are detected. It does not add any new functionality to the Exception class.
Bases: exceptions.Exception
Superclass of the exceptions representing error conditions in microdata processing. It does not add any new functionality to the Exception class.
The standard processing of a microdata URI with options given in a form, i.e., as an entry point from a CGI call.
The call accepts extra form options (e.g., HTTP GET options) through the C{form} parameter.
@param uri: URI to access. Note that the “text:” and “uploaded:” values are treated separately; the former is for textual input (in which case a StringIO is used to get the data) and the latter is for an uploaded file, where the form gives access to the file directly.
@param outputFormat: serialization format, as understood by RDFLib. Note that though “turtle” is a possible parameter value, some versions of the RDFLib Turtle serializer do odd (though legal) things with namespaces, defining unusual and unwanted prefixes.
@param form: extra call options (from the CGI call) to set up the local options (if any)
@type form: cgi FieldStorage instance
@return: serialized graph
@rtype: string
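The separate treatment of the “text:” and “uploaded:” values might be sketched as follows. This is a minimal illustration rather than the module's own code: the helper name C{open_cgi_input}, the form field names C{"text"} and C{"uploaded"}, and the use of a plain dictionary in place of a cgi FieldStorage instance are all assumptions.

```python
import io


def open_cgi_input(uri, form):
    """Return a file-like object for the special values, else the URI itself.

    `form` is assumed to be dict-like; the field names "text" and
    "uploaded" are hypothetical.
    """
    if uri == "text:":
        # Textual input: wrap the submitted string in an in-memory stream.
        return io.StringIO(form["text"])
    if uri == "uploaded:":
        # Uploaded file: the form is assumed to hold an open file object.
        return form["uploaded"]
    # Anything else is an ordinary URI, to be dereferenced by the caller.
    return uri
```

In both special cases the caller receives something it can read from directly, so the rest of the processing does not need to know where the data came from.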
Main processing class for the distiller
@ivar base: the base value for processing
@ivar http_status: HTTP Status, to be returned when the package is used via a CGI entry. Initially set to 200, may be modified by exception handlers
@keyword base: URI for the default “base” value (usually the URI of the file to be processed)
@keyword vocab_expansion: whether vocab expansion should be performed or not
@type vocab_expansion: Boolean
@keyword vocab_cache: if vocabulary expansion is done, then perform caching of the vocabulary data
@type vocab_cache: Boolean
Extract the RDF Graph from a DOM tree.
@param dom: a DOM Node element, the top level entry node for the whole tree (to be clear, a dom.documentElement is used to initiate processing)
@keyword graph: an RDF Graph (if None, then a new one is created)
@type graph: rdflib Graph instance
@return: an RDF Graph
@rtype: rdflib Graph instance
Extract an RDF graph from a microdata source. The source is parsed, the RDF extracted, and the RDF Graph is returned. This is a front-end to the L{pyMicrodata.graph_from_DOM} method.
@param name: a URI, a file name, or a file-like object
@return: an RDF Graph
@rtype: rdflib Graph instance
Extract an RDF graph from a microdata source and serialize it. The source is parsed, the RDF extracted, and serialization is done in the specified format.
@param name: a URI, a file name, or a file-like object
@keyword outputFormat: serialization format. Can be one of “turtle”, “n3”, “xml”, “pretty-xml”, “nt”. “xml” and “pretty-xml”, as well as “turtle” and “n3”, are synonyms.
@return: a serialized RDF Graph
@rtype: string
Extract an RDF graph from a list of microdata sources and serialize them in one graph. The sources are parsed, the RDF extracted, and serialization is done in the specified format.
@param names: list of sources, each can be a URI, a file name, or a file-like object
@keyword outputFormat: serialization format. Can be one of “turtle”, “n3”, “xml”, “pretty-xml”, “nt”. “xml” and “pretty-xml”, as well as “turtle” and “n3”, are synonyms.
@return: a serialized RDF Graph
@rtype: string
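The synonym rule for outputFormat can be pictured as a small lookup table. The sketch below only mirrors the statement in the text; the names C{_CANONICAL_FORMAT} and C{normalize_output_format} are hypothetical, and the choice of which synonym counts as canonical is an assumption.

```python
# Hypothetical table: each accepted key maps to one canonical name,
# following the synonym pairs listed in the documentation.
_CANONICAL_FORMAT = {
    "turtle": "turtle",
    "n3": "turtle",
    "xml": "xml",
    "pretty-xml": "xml",
    "nt": "nt",
}


def normalize_output_format(key):
    """Map an outputFormat key to its canonical name, or raise ValueError."""
    try:
        return _CANONICAL_FORMAT[key.lower()]
    except KeyError:
        raise ValueError("unsupported output format: %r" % key)
```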
The core of the Microdata->RDF conversion, a more or less verbatim implementation of the U{W3C IG Note<http://www.w3.org/TR/microdata-rdf/>}. Because the implementation was also used to check the note itself, it tries to be fairly close to the text.
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
Evaluation context structure. See Section 4.1 of the U{W3C IG Note<http://www.w3.org/TR/microdata-rdf/>} for the details.
@ivar current_type: an absolute URL for the current type, used when an item does not contain an item type
@ivar memory: mapping from items to RDF subjects
@type memory: dictionary
@ivar current_name: an absolute URL for the in-scope name, used for generating URIs for properties of items without an item type
@ivar current_vocabulary: an absolute URL for the current vocabulary, from the registry
Get the memory content (i.e., RDF subject) for ‘item’, or None if not stored yet
@param item: an ‘item’, in microdata terminology
@type item: DOM Element Node
@return: None, or an RDF Subject (URIRef or BNode)
During the generation algorithm a new copy of the current context has to be made with a new current type.
At the moment, the content of memory is copied, i.e., a fresh dictionary is created and the content copied over. It is not clear whether that is necessary, though; a simple reference may be enough.
@param itype: an absolute URL for the current type
@return: a new evaluation context instance
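The copy semantics described here can be illustrated with a small stand-in class. It is a sketch under assumptions: the class and method names are hypothetical, plain strings stand in for DOM nodes and RDF subjects, and carrying the name and vocabulary over to the copy is assumed rather than stated in the text.

```python
class EvaluationContextSketch:
    """Hypothetical stand-in for the evaluation context structure."""

    def __init__(self):
        self.current_type = None
        self.memory = {}  # maps items to RDF subjects
        self.current_name = None
        self.current_vocabulary = None

    def get_memory(self, item):
        # None signals that the item has no subject stored yet.
        return self.memory.get(item)

    def set_memory(self, item, subject):
        self.memory[item] = subject

    def new_copy(self, itype):
        ctx = EvaluationContextSketch()
        ctx.current_type = itype
        ctx.current_name = self.current_name
        ctx.current_vocabulary = self.current_vocabulary
        # A fresh dictionary: later additions to the copy do not leak
        # back into the original context.
        ctx.memory = dict(self.memory)
        return ctx
```

With a shared reference instead of a fresh dictionary, subjects recorded while processing a nested item would also become visible to the enclosing context; whether that matters is the open question the text raises.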
This class encapsulates methods that are defined by the U{microdata spec<http://dev.w3.org/html5/md/Overview.html>}, as opposed to the RDF conversion note.
@ivar document: top of the DOM tree, as returned by the HTML5 parser
@ivar base: the base URI of the DOM tree, either set from the outside or via a <base> element
@param document: top of the DOM tree, as returned by the HTML5 parser
@param base: the base URI of the DOM tree, either set from the outside or via a <base> element
This is a method defined for DOM 2 HTML, but the HTML5 parser does not seem to define it. Oh well...
@param id: value of an @id attribute to look for
@return: array of nodes whose @id attribute matches C{id} (formally, there should be only one...)
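A lookup of this kind amounts to a recursive walk over the tree. The following sketch uses the standard library's minidom instead of the HTML5 parser's DOM, and the function name C{get_elements_by_id} is hypothetical.

```python
from xml.dom import minidom


def get_elements_by_id(root, id_value):
    """Collect every element under `root` whose id attribute equals id_value."""
    matches = []

    def walk(node):
        if node.nodeType != node.ELEMENT_NODE:
            return
        if node.hasAttribute("id") and node.getAttribute("id") == id_value:
            matches.append(node)
        for child in node.childNodes:
            walk(child)

    walk(root)
    return matches
```

Returning a list rather than a single node matches the description: a well-formed document has at most one match, but the walk does not rely on that.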
Bases: rdflib.plugins.parsers.pyMicrodata.microdata.Microdata
Top level class encapsulating the conversion algorithms as described in the W3C note.
@ivar graph: an RDF graph
@type graph: RDFLib Graph
@ivar document: top of the DOM tree, as returned by the HTML5 parser
@ivar ns_md: the Namespace for the microdata vocabulary
@ivar base: the base of the DOM tree, either set from the outside or via a <base> element
@param graph: an RDF graph
@type graph: RDFLib Graph
@param document: top of the DOM tree, as returned by the HTML5 parser
@keyword base: the base of the DOM tree, either set from the outside or via a <base> element
@keyword vocab_expansion: whether vocab expansion should be performed or not
@type vocab_expansion: Boolean
@keyword vocab_cache: if vocabulary expansion is done, then perform caching of the vocabulary data
@type vocab_cache: Boolean
Top level entry to convert and generate all the triples. It finds the top level items, and generates triples for each of them; additionally, it generates a top level entry point to the items from base in the form of an RDF list.
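Finding the top level items can be sketched as a tree walk. The sketch assumes the microdata definition of a top level item (an element carrying @itemscope but no @itemprop), uses minidom in place of the HTML5 parser's DOM, and introduces the hypothetical name C{find_top_level_items}.

```python
from xml.dom import minidom


def find_top_level_items(root):
    """Return elements that have @itemscope but no @itemprop, in document order."""
    items = []

    def walk(node):
        if node.nodeType != node.ELEMENT_NODE:
            return
        if node.hasAttribute("itemscope") and not node.hasAttribute("itemprop"):
            items.append(node)
        # Keep descending: nested elements without @itemprop are
        # top level items in their own right.
        for child in node.childNodes:
            walk(child)

    walk(root)
    return items
```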
Generate a full URI for a predicate, using the type, the vocabulary, etc.
For details of this entry, see Section 4.4 of the W3C note.
@param name: name of the property, i.e., what appears in @itemprop
@param context: an instance of an evaluation context
@type context: L{Evaluation_Context}
Generate the property values for a specific subject and predicate. The context should specify whether the objects should be added as an RDF list or as individual triples.
@param subject: RDF subject
@type subject: RDFLib Node (URIRef or blank node)
@param predicate: RDF predicate
@type predicate: RDFLib URIRef
@param objects: RDF objects
@type objects: list of RDFLib nodes (URIRefs, Blank Nodes, or literals)
@param context: evaluation context
@type context: L{Evaluation_Context}
Generate the triples for a specific item. See the W3C Note for the details.
@param item: the DOM Node for the specific item
@type item: DOM Node
@param context: an instance of an evaluation context
@type context: L{Evaluation_Context}
@return: a URIRef or a BNode for the (RDF) subject
Generate an RDF object, i.e., the value of a property. Note that if this element contains an @itemscope, then a recursive call to L{MicrodataConversion.generate_triples} is done and the return value of that method (i.e., the subject for the corresponding item) is returned as the object.
Otherwise, either URIRefs are created for <a>, <img>, etc, elements, or a Literal; the latter gets a time-related type for the <time> element.
@param node: the DOM Node for which the property values should be generated
@type node: DOM Node
@param context: an instance of an evaluation context
@type context: L{Evaluation_Context}
@return: an RDF resource (URIRef, BNode, or Literal)
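The non-recursive part of this choice (URI for link-like elements, a time value for <time>, plain text otherwise) can be sketched with the standard library alone. Everything named here is an assumption for illustration: the function C{property_value}, the abbreviated element table, and the tagged tuples that stand in for RDFLib URIRef and Literal objects. The @itemscope case and literal datatypes are left out.

```python
from urllib.parse import urljoin
from xml.dom import minidom

# Abbreviated, illustrative table: element name -> attribute holding a URL.
_URL_ATTRIBUTE = {"a": "href", "area": "href", "link": "href",
                  "img": "src", "audio": "src", "video": "src"}


def _text(node):
    """Concatenate the text nodes below `node`, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(_text(child) for child in node.childNodes)


def property_value(node, base):
    """Return a (kind, value) pair for an element without @itemscope."""
    tag = node.tagName.lower()
    if tag in _URL_ATTRIBUTE:
        # Link-like elements yield a URI, made absolute against the base.
        return ("uri", urljoin(base, node.getAttribute(_URL_ATTRIBUTE[tag])))
    if tag == "time" and node.hasAttribute("datetime"):
        # The real method attaches a time-related datatype at this point.
        return ("time", node.getAttribute("datetime"))
    return ("literal", _text(node))
```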
Hardcoded version of the current microdata->RDF registry. There is also a local registry to include some test cases. Finally, there is a local dictionary for prefix mapping for the registry items; these are the preferred prefixes for those vocabularies, and are used to make the output nicer.
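A registry with preferred prefixes could be consulted roughly as follows. The table contents, the flat dictionary layout, and the function names are illustrative assumptions and do not reproduce the module's hardcoded registry.

```python
# Illustrative entries only: vocabulary URI prefix -> preferred prefix.
REGISTRY_SKETCH = {
    "http://schema.org/": "schema",
    "http://example.org/vocab/": "ex",
}


def find_vocabulary(type_uri):
    """Return the longest registered URI prefix matching type_uri, or None."""
    matches = [v for v in REGISTRY_SKETCH if type_uri.startswith(v)]
    return max(matches, key=len) if matches else None


def preferred_prefix(type_uri):
    """Return the preferred prefix for the vocabulary of type_uri, or None."""
    vocabulary = find_vocabulary(type_uri)
    return REGISTRY_SKETCH[vocabulary] if vocabulary else None
```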
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
Various utilities for pyMicrodata
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
A wrapper around the urllib2 method to open a resource. Beyond accessing the data itself, the class sets the content location. The class also adds an accept header to the outgoing request, namely text/html and application/xhtml+xml (unless set explicitly by the caller).
@ivar data: the real data, i.e., a file-like object
@ivar headers: the return headers as sent back by the server
@ivar location: the real location of the data (i.e., after possible redirection and content negotiation)
@param name: URL to be opened
@keyword additional_headers: additional HTTP request headers to be added to the call
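The handling of the accept header can be sketched without opening a connection. The sketch uses Python 3's urllib.request rather than urllib2, and the names C{build_request} and C{DEFAULT_ACCEPT} are hypothetical; only the header values come from the description above.

```python
from urllib.request import Request

DEFAULT_ACCEPT = "text/html, application/xhtml+xml"


def build_request(name, additional_headers=None):
    """Build a request for `name`, adding the default Accept header if unset."""
    headers = dict(additional_headers or {})
    # Header names are case-insensitive, so compare in lower case.
    if not any(key.lower() == "accept" for key in headers):
        headers["Accept"] = DEFAULT_ACCEPT
    # Constructing a Request does not touch the network.
    return Request(name, headers=headers)
```

The returned object would then be passed to urllib.request.urlopen, after which the response's final URL and headers correspond to the location and headers attributes described above.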
Generate an RDF List from vals, returns the head of the list
@param graph: RDF graph
@type graph: RDFLib Graph
@param vals: array of RDF Resources
@return: head of the List (an RDF Resource)
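An RDF list is a chain of nodes linked by rdf:first and rdf:rest and terminated by rdf:nil. The sketch below builds that chain with plain tuples and strings instead of an RDFLib Graph; the function name, the blank node labels, and the choice to return rdf:nil for an empty input are assumptions.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_blank_ids = itertools.count()


def generate_rdf_list(triples, vals):
    """Append the list triples for `vals` to `triples`; return the list head."""
    if not vals:
        return RDF + "nil"
    # One fresh blank node label per list cell.
    nodes = ["_:l%d" % next(_blank_ids) for _ in vals]
    for index, (node, val) in enumerate(zip(nodes, vals)):
        triples.append((node, RDF + "first", val))
        is_last = index + 1 == len(nodes)
        rest = RDF + "nil" if is_last else nodes[index + 1]
        triples.append((node, RDF + "rest", rest))
    return nodes[0]
```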
Generate an (absolute) URI: if v is a fragment, it is resolved against base; otherwise v is returned unchanged.
@param base: absolute URI for base
@param v: relative or absolute URI
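This behaviour maps closely onto the standard library's URL utilities. A minimal sketch, assuming that "absolute" can be approximated by the presence of a URI scheme; the function name C{generate_uri} is hypothetical and Python 3's urllib.parse is used.

```python
from urllib.parse import urljoin, urlparse


def generate_uri(base, v):
    """Return v if it already has a scheme, else resolve it against base."""
    if urlparse(v).scheme:
        return v
    return urljoin(base, v)
```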
Get (recursively) the full text from a DOM Node.
@param Pnode: DOM Node
@return: string