How to parse reStructuredText in Python?
I'd like to extend the answer from Gareth Latty. "What you probably want is the parser at docutils.parsers.rst" is a good starting point, but what comes next? Namely: how do you actually parse reStructuredText in Python?
Below is a complete answer for Python 3.6 and docutils 0.14:
import docutils.nodes
import docutils.parsers.rst
import docutils.utils
import docutils.frontend

def parse_rst(text: str) -> docutils.nodes.document:
    """Parse reStructuredText source into a docutils document tree."""
    parser = docutils.parsers.rst.Parser()
    # Default settings for the components involved (here only the parser).
    components = (docutils.parsers.rst.Parser,)
    settings = docutils.frontend.OptionParser(components=components).get_default_values()
    # Empty document that the parser populates in place.
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    parser.parse(text, document)
    return document
The resulting document can then be processed with, for example, the visitor below, which prints every reference node in the document:
class MyVisitor(docutils.nodes.NodeVisitor):

    def visit_reference(self, node: docutils.nodes.reference) -> None:
        """Called for "reference" nodes."""
        print(node)

    def unknown_visit(self, node: docutils.nodes.Node) -> None:
        """Called for all other node types."""
        pass
Here's how to run it:
doc = parse_rst('spam spam lovely spam')
visitor = MyVisitor(doc)
doc.walk(visitor)
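Note that the sample text 'spam spam lovely spam' contains no links, so the run above prints nothing. As a minimal sketch of the same approach on input that does contain references, the variant below collects link targets instead of printing nodes. The LinkCollector class and the sample text are made up for illustration, and the hasattr check is an assumption of mine so that the sketch also runs on newer docutils releases that offer docutils.frontend.get_default_settings():

```python
import docutils.frontend
import docutils.nodes
import docutils.parsers.rst
import docutils.utils


def parse_rst(text):
    """Parse reStructuredText into a docutils document tree."""
    parser_cls = docutils.parsers.rst.Parser
    if hasattr(docutils.frontend, "get_default_settings"):
        # Helper available in newer docutils releases.
        settings = docutils.frontend.get_default_settings(parser_cls)
    else:
        # Older releases, such as the 0.14 used in this answer.
        settings = docutils.frontend.OptionParser(
            components=(parser_cls,)).get_default_values()
    document = docutils.utils.new_document("<rst-doc>", settings=settings)
    parser_cls().parse(text, document)
    return document


class LinkCollector(docutils.nodes.NodeVisitor):
    """Hypothetical visitor that stores the target URI of each reference."""

    def __init__(self, document):
        super().__init__(document)
        self.uris = []

    def visit_reference(self, node):
        # External links carry their target in the "refuri" attribute.
        uri = node.get("refuri")
        if uri:
            self.uris.append(uri)

    def unknown_visit(self, node):
        # Ignore every other node type.
        pass


# Sample input: one named hyperlink and one standalone URL.
text = (
    "See `Docutils <https://docutils.sourceforge.io/>`_ "
    "and https://www.python.org for details."
)
doc = parse_rst(text)
collector = LinkCollector(doc)
doc.walk(collector)
print(collector.uris)
```

Collecting into a list rather than printing makes the visitor easier to reuse, for example to check links after parsing.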
Docutils does indeed contain the tools to do this.
What you probably want is the parser at docutils.parsers.rst.
See this page for details on what is involved. There are also some examples at docutils/examples.py; in particular, check out the internals() function, which is probably of interest.
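For a quick look at the parsed tree without wiring up the parser by hand, a minimal sketch using docutils.core.publish_doctree() might look like the following. This is not the internals() function itself but a related high-level entry point, and the sample source text is made up for illustration:

```python
import docutils.core

# Made-up sample document: a title and one paragraph with inline markup.
source = """\
Title
=====

A paragraph with *emphasis*.
"""

# Runs the standalone reader and the reStructuredText parser,
# and returns the resulting document tree.
doctree = docutils.core.publish_doctree(source)

# pformat() renders the tree as indented pseudo-XML, handy for exploration.
print(doctree.pformat())
```

Unlike calling the parser directly, this route also applies the reader's standard transforms, so the tree may differ slightly from what a bare Parser.parse() call produces.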