Implementing "nested inline markup" in reStructuredText and Sphinx#
Background#
Sphinx is a popular documentation generator used by many open source communities. By default it uses reStructuredText (hereafter rST) as its markup language.
Unlike Markdown, rST does not yet support nested inline markup, so text like "bold code" or "italic link" doesn't render as expected:
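rST source | Rendered as | OK? |
---|---|---|
**bold** | bold | ✔️ |
``code`` | code | ✔️ |
``**bold code**`` | **bold code** (monospaced, asterisks shown as-is) | ❌ |
**``bold code``** | ``bold code`` (bold, backquotes shown as-is) | ❌ |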
In rST, inline markup is implemented by Interpreted Text Roles. For example, the markup **foo** is equivalent to :strong:`foo`: "foo" is the interpreted text, and "strong" is the role name, which tells the renderer that "foo" should be rendered in bold. The same goes for the markup ``foo`` and :literal:`foo`.
rST source | Rendered as | OK? |
---|---|---|
:strong:`bold` | bold | ✔️ |
:literal:`code` | code | ✔️ |
Interpreted text is only interpreted once, so markup and roles inside interpreted text are treated as plain text. This means role syntax cannot be nested either:
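[Table: four attempts to nest the strong and literal roles inside each other. None of them renders as bold code (❌); the inner role syntax, for example :literal:`bold code`, is shown as plain text.]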
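The behaviour described above can be checked directly with docutils. The following is a minimal sketch, assuming docutils is installed; `parse_paragraph` is a hypothetical helper introduced here for illustration, and `report_level` is raised only to keep parser warnings quiet.

```python
from docutils import nodes
from docutils.core import publish_doctree


def parse_paragraph(source: str) -> nodes.paragraph:
    """Parse an rST snippet and return its first paragraph node."""
    # report_level=5 only silences warnings; it does not change the parse result.
    document = publish_doctree(source, settings_overrides={"report_level": 5})
    return document[0]


# The markup form and the role form produce the same doctree.
markup = parse_paragraph("**foo**")
role = parse_paragraph(":strong:`foo`")
same_tree = markup.pformat() == role.pformat()

# Attempted nesting: the inner backquotes stay plain text inside <strong>.
nested = parse_paragraph("**``bold code``**")
strong = nested[0]

print(same_tree)
print(type(strong).__name__, repr(strong.astext()))
```

The second parse yields a single strong node whose only child is a text node, so no literal node is ever created for the inner markup.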
Fortunately, rST is extensible: it lets users create custom roles. If we create a role that combines the effects of two existing roles, then "bold code" becomes possible, and it does work:
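rST source | Rendered as | OK? |
---|---|---|
:strong_literal:`bold code` | bold code (bold and monospaced) | ✔️ |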
The sphinxnotes-comboroles extension#
I wrote a Sphinx extension, sphinxnotes-comboroles, which can dynamically create composite roles from existing roles.
First, install the extension from PyPI:
$ pip install sphinxnotes-comboroles
Then, add the extension name to the extensions configuration item in your conf.py:
extensions = [
    # …
    'sphinxnotes.comboroles',
    # …
]
To create the strong_literal role described above, add the following configuration, which tells the extension to composite the existing roles strong and literal into a new role strong_literal:
comboroles_roles = {
    'strong_literal': ['strong', 'literal'],
}
Finally, you can use it:
rST source | Rendered as |
---|---|
:strong_literal:`bold code` | bold code (bold and monospaced) |
Nested Parse#
We said earlier that markup inside interpreted text is not parsed, but the extension lets us force the interpreted text to be parsed, like this:
comboroles_roles = {
    'parsed_literal': (['literal'], True),  # enable nested_parse
}
The above configuration creates a composite role parsed_literal with Nested Parse enabled, so the text "**bold code**" can be parsed.
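rST source | Rendered as | OK? |
---|---|---|
:literal:`**bold code**` | **bold code** (monospaced, asterisks shown as-is) | ❌ |
:parsed_literal:`**bold code**` | bold code (bold and monospaced) | ✔️ |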
Furthermore, hyperlinks, substitutions, and even roles inside the interpreted text can be parsed too.
Note
For nested roles, the backquote (`) in interpreted text needs to be escaped.
Works with other Extensions#
The extension is not limited to standard roles; it also works with roles provided by other extensions.
sphinx.ext.extlinks#
sphinx.ext.extlinks is a built-in Sphinx extension for creating shortened external links.
With the following configuration, extlinks creates the enwiki role, then comboroles creates a literal_enwiki role based on it:
extlinks = {
    'enwiki': ('https://wikipedia.org/wiki/%s', '📖 %s'),
}
comboroles_roles = {
    'literal_enwiki': ['literal', 'enwiki'],
}
See also
Inspired by sphinx-doc/sphinx#11745
sphinxnotes.strike#
sphinxnotes.strike is another extension I wrote, which adds strikethrough text support to Sphinx:
comboroles_roles = {
    'literal_strike': ['literal', 'strike'],
}
rST source | Rendered as |
---|---|
:literal_strike:`text` | text (monospaced and struck through) |
Limitation#
Warning
Because of how it is implemented internally, the extension can only be used to composite simple roles and may CRASH Sphinx when compositing complex roles. If it crashes, DO NOT report to Sphinx first; please report to sphinx-notes/comboroles#new
How it works#
Some readers may be curious about how the extension is implemented. In fact, it is quite simple: about 30 lines of code.
The Docutils Document Tree#
Before going further, we need a basic understanding of the docutils Document Tree [1] (hereafter doctree). The doctree describes the data structure of an rST document (a *.rst file) [2].
Here is a simplified diagram of the hierarchy of elements in the doctree. We only focus on the highlighted lines, the "[text]+ (inline markup)" cell at the bottom left:
+--------------------------------------------------------------------+
| document  [may begin with a title, subtitle, decoration, docinfo]  |
|                             +--------------------------------------+
|                             | sections  [each begins with a title] |
+-----------------------------+-------------------------+------------+
| [body elements:]                                      | (sections) |
|         | - literal | - lists  |       | - hyperlink  +------------+
|         |   blocks  | - tables |       |   targets    |
| para-   | - doctest | - block  | foot- | - sub. defs  |
| graphs  |   blocks  |   quotes | notes | - comments   |
+---------+-----------+----------+-------+--------------+
| [text]+ | [text]    | (body elements)  | [text]       |
| (inline +-----------+------------------+--------------+
| markup) |
+---------+
The highlighted lines describe the content model of Inline Elements. All the inline markup and roles we just discussed are inline elements.
Inline elements directly contain text data, and may also contain further inline elements. [4]
We already know that roles cannot contain further roles, so we conclude that the limitation on nested inline markup is caused by rST's syntax rather than by rST's content model.
Using the rst2pseudoxml command-line tool, we can convert rST source code to a text representation of the doctree:
rST:

**bold** ``code``

doctree:

<document source="untitled.rst">
    <paragraph>
        <strong>
            bold
        <literal>
            code
Words enclosed in angle brackets < and > represent nodes of the doctree. You can see that the role :strong:`bold` is converted to a <strong> node (the next section explains how), with the interpreted text "bold" as its child.
The doctree of "bold code" is a combination of a <strong> node and a <literal> node, which looks like:
<strong>
    <literal>
        bold code
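The content model really does accept this nesting: the same tree can be built by hand with the docutils node classes. This is a minimal sketch, assuming docutils is installed; it only illustrates the target structure and is not how the extension builds it.

```python
from docutils import nodes

# Element constructors take (rawsource, text, *children).
# Build the inner <literal> first, then pass it to <strong> as a child.
literal = nodes.literal("", "bold code")
strong = nodes.strong("", "", literal)

# pformat() gives the same pseudo-XML view that rst2pseudoxml prints.
print(strong.pformat())
```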
Dynamic compositing#
All docutils roles are implemented in the same way [5]:
1. Define the role function, which receives the parser context, creates and returns inline elements (nodes), and does any additional processing required.
2. Register the role under a name, such as "strong", so that users can use it.
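The two steps can be sketched with plain docutils as follows. The role name `shout` and the function `shout_role` are hypothetical, invented for this illustration; the sketch assumes docutils is installed.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.parsers.rst import roles


# Step 1: define the role function.
def shout_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Hypothetical role: render the interpreted text as upper-case <strong>."""
    node = nodes.strong(rawtext, text.upper())
    # A role function returns (list of nodes, list of system messages).
    return [node], []


# Step 2: register it under a name so that :shout:`...` works in rST.
roles.register_local_role("shout", shout_role)

document = publish_doctree(":shout:`hello`",
                           settings_overrides={"report_level": 5})
shouted = document[0][0]  # document -> paragraph -> node created by the role
print(document.pformat())
```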
We could simply write a role function that returns a fixed combination like <strong> <literal> text, but that is not very elegant. There may be many combinations of various markups, and I don't want to implement them one by one. A better idea is:
1. In the function, look up the role functions for a set of role names and get the corresponding nodes by calling them.
2. Nest these nodes together.
Note that not all node combinations make sense; it depends on the complexity of the role functions and on the implementation of the builders. Fortunately:
- Most markup role functions are very simple: they just wrap a docutils.nodes.TextElement around the text [6].
- The most commonly used builder is the HTML builder. From its point of view, combinations of nodes are combinations of HTML tags, which makes sense in most cases.
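To see how simple such a role function is, it can be called by hand. This is a sketch under assumptions: it reaches into the private `_role_registry` mapping of docutils, which may change between versions, and it passes `None` as the inliner, which only works because the generic roles do not use it.

```python
from docutils import nodes
from docutils.parsers.rst import roles

# Standard roles such as "strong" live in the private canonical registry.
strong_role = roles._role_registry["strong"]

# Arguments: role name, raw source, interpreted text, line number, inliner.
result, messages = strong_role("strong", ":strong:`foo`", "foo", 0, None)
node = result[0]

print(type(node).__name__, len(node), type(node[0]).__name__)
```

The returned node is one TextElement with one Text child, which is the shape the compositing step relies on.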
The code implementation#
sphinx.util.docutils.SphinxRole provides helper methods for creating roles in Sphinx, so we use it instead of defining a role function directly:
class CompositeRole(SphinxRole):
    #: Rolenames to be composited
    rolenames: list[str]

    def __init__(self, rolenames: list[str]):
        self.rolenames = rolenames
The run method is equivalent to the role function, but bound to the SphinxRole subclass we created:
def run(self) -> tuple[list[Node], list[system_message]]:
    ...
Here we look up the role functions. _roles and _role_registry are unexported variables of docutils.parsers.rst.roles that store the mapping from role name to role function:
components = []
for r in self.rolenames:
    if r in roles._roles:
        components.append(roles._roles[r])
    elif r in roles._role_registry:
        components.append(roles._role_registry[r])
    else:
        # Error handling...
        pass
Note
We cannot do the lookup in __init__, because some roles created by third-party extensions do not exist yet at that time.
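One way to fill in the elided error handling is a small lookup helper that raises when a name is unknown. This is only a sketch: `lookup_role` is a hypothetical name, not part of docutils or of the extension, and it relies on the same private registries described above.

```python
from docutils.parsers.rst import roles


def lookup_role(rolename: str):
    """Return the role function registered under ``rolename``.

    Local roles are checked first, then the standard ones.
    """
    for registry in (roles._roles, roles._role_registry):
        if rolename in registry:
            return registry[rolename]
    raise KeyError(f"unknown role: {rolename!r}")


strong_role = lookup_role("strong")
try:
    lookup_role("no_such_role")
    missing_reported = False
except KeyError:
    missing_reported = True

print(callable(strong_role), missing_reported)
```

Because the helper is called at run time rather than at construction time, roles registered later by other extensions are still found.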
Run all the role functions, passing the parameters through as they are, then collect the returned nodes:
nodes: list[TextElement] = []
for comp in components:
    ns, _ = comp(self.name, self.rawtext, self.text, self.lineno,
                 self.inliner, self.options, self.content)
    # Error handling...
    nodes.append(ns[0])  # the TextElement returned by the role function
Each role function should return exactly one docutils.nodes.TextElement that contains exactly one docutils.nodes.Text as a child, like this:
<TextElement>
    <Text>
Nest the nodes together by replacing the Text node with the inner (i+1) TextElement:
for i in range(0, len(nodes) - 1):
    nodes[i].replace(nodes[i][0], nodes[i+1])
before:

i=0: <strong>
       <text>
i=1: <literal>
       <text>

replace:

i=0: <strong>
       <text> ◄─┐
                │ replace
i=1: <literal> ─┘
       <text>

after:

i=0: <strong>
       <literal>
         <text>
Now nodes[0] is the root of the node combination, so we just return it:
return [nodes[0]], []
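The replacement loop is not limited to two roles. Here is a minimal sketch with three layers, assuming docutils is installed; the nodes are created by hand in place of calling role functions, and the list is named `layers` purely for this illustration.

```python
from docutils import nodes

# Three sibling nodes, each wrapping its own copy of the text,
# in the shape that three simple role functions would return them.
layers = [
    nodes.strong("", "text"),
    nodes.emphasis("", "text"),
    nodes.literal("", "text"),
]

# Replace the Text child of each node with the next node in the list.
for i in range(len(layers) - 1):
    layers[i].replace(layers[i][0], layers[i + 1])

root = layers[0]
print(root.pformat())
```

After the loop only the innermost node still holds a Text child, so the text appears exactly once in the tree.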
So the complete code looks like this:
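from docutils.nodes import Node, TextElement, system_message
from docutils.parsers.rst import roles
from sphinx.util.docutils import SphinxRole


class CompositeRole(SphinxRole):
    #: Rolenames to be composited
    rolenames: list[str]

    def __init__(self, rolenames: list[str]):
        self.rolenames = rolenames

    def run(self) -> tuple[list[Node], list[system_message]]:
        # Look up the role functions at run time, not in __init__.
        components = []
        for r in self.rolenames:
            if r in roles._roles:
                components.append(roles._roles[r])
            elif r in roles._role_registry:
                components.append(roles._role_registry[r])
            else:
                # Error handling...
                pass

        # Call every role function with the same arguments.
        nodes: list[TextElement] = []
        for comp in components:
            ns, _ = comp(self.name, self.rawtext, self.text, self.lineno,
                         self.inliner, self.options, self.content)
            # Error handling...
            nodes.append(ns[0])  # the TextElement returned by the role function

        # Nest the nodes: each Text child is replaced by the next node.
        for i in range(0, len(nodes) - 1):
            nodes[i].replace(nodes[i][0], nodes[i+1])

        return [nodes[0]], []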
The above code has been simplified for ease of explanation. For the complete implementation, please refer to sphinx-notes/comboroles.
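To try the idea without a Sphinx project, the same logic can be written as a plain docutils role function. This is a sketch under assumptions, not the extension's actual code: `make_composite_role` is a hypothetical helper, error handling is omitted, and it depends on the private `_roles` and `_role_registry` mappings of docutils.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.parsers.rst import roles


def make_composite_role(rolenames):
    """Return a role function that nests the nodes of the given roles."""

    def composite(name, rawtext, text, lineno, inliner, options=None, content=None):
        layers = []
        for rolename in rolenames:
            # Private registries: local roles first, then standard roles.
            role_fn = roles._roles.get(rolename) or roles._role_registry[rolename]
            result, _messages = role_fn(name, rawtext, text, lineno, inliner,
                                        options or {}, content or [])
            layers.append(result[0])
        # Replace each outer node's Text child with the next inner node.
        for outer, inner in zip(layers, layers[1:]):
            outer.replace(outer[0], inner)
        return [layers[0]], []

    return composite


roles.register_local_role("strong_literal",
                          make_composite_role(["strong", "literal"]))
document = publish_doctree(":strong_literal:`bold code`",
                           settings_overrides={"report_level": 5})
combined = document[0][0]  # document -> paragraph -> composite node
print(document.pformat())
```

The printed tree should show a literal node nested inside a strong node, matching the "bold code" doctree shown earlier.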
Footnotes#