this converter is currently supposed to be able to reproduce the
docbook-generated html DOMs exactly, though not necessarily the
html *files*. it mirrors many docbook behaviours that seem rather odd,
such as top-level sections in chapters using the same heading depth as
understood by html as their parent chapters do. over time we can
hopefully remove all special casing needed to reproduce docbook
rendering, but for now at least it doesn't hurt *too* much.
this will be necessary for html since there we have to do chunking into
multiple files ourselves. writing one file from the caller of the
converter and all others from within the converter is unnecessarily
spread out, and returning a dict of file names and their contents is not
quite as meaningful for docbook (which has only one file to begin with).
it's not hooked up to anything yet, but that will come soon. there's a
bit of docbook compat here that must be interoperable with the actual
docbook exporter, but luckily it's not all that much.
the basic html renderer. it doesn't have all the docbook compatibility
codes embedded into it, but there is a good amount. this renderer is
unaware of manual structure and does not traverse structural include
tokens (if it finds any it'll just fail), that task falls to derived
classes. once we have more uses for structural includes than just the
manual we may revisit this decision.
the docbook toolchain uses docbook-xsl to generate its TOC, our html
renderer will have to do this on its own. this generator uses a very
straight-forward algorithm of only inspecting headings, but anything
else could be inspected as well. (examples come to mind, but those do
not have titles and would thus make for bad toc entries)
we also use path information (that will be taken from include block args
in the html renderer) to produce navigation information. the algorithm
we use mirrors what docbook does, linking to the next/previous files in
depth-first toc order.
toc entries are linked to the tokens they refer to for easy use later.
while docbook relies on external chunk-toc info to do chunking of the
rendered manual we have nothing of the sort for html. there it seems
easiest to add annotations to blocks to create new chunks. such
annotations could be extended to docbook to create the chunk-toc instead
of passing it in externally, but with docbook on the way out that seems
like a waste of effort.
text content in the toplevel file of a book will not render properly.
the first proper element will be a preface, part, or chapter anyway, and
those require includes to produce.
parts do not currently allow headings in the part file itself, but
that's mainly a renderer limitation. we can add support for headings in
part intros when we need them
in all other cases includes must be followed by either another include,
a heading, or end of file. text content could not be properly linked to
from a TOC without a preceding heading.
without this we cannot build a TOC to arbitrary depth without generating
ids for headings, but generated ids are fragile and liable to either
break or point to different things if the manual changes shape. we
already have the convention that all headings should have an id, this
formalizes it.
while not technically necessary for correct rendering of *contents* we
do need to disallow heading levels being skipped to build a correct
TOC. treating headings that have skipped a number of levels to actually
be headings that many levels up only gets confusing, and inserting
artifical intermediate headings suffers from problems, such as which ids
to use and what to call them.
check that all required headings are present during parsing, not during
rendering. building a correct TOC will need this since every TOC entry
needs a heading to set its title, and every included substructure needs
a title.
also improve the error message on repeated title headings slightly,
giving the end line turns out to not be very useful.
for most of our data classes we can use dataclasses.dataclass with
frozen=True or even plain named tuples. the TOC structure we'll need to
generate proper navigation links is most easily represented and used as
a cyclic structure though, and for that we can use neither. if we want
to make the TOC structures immutable (which seems like a good idea)
we'll need a hack of *some* kind, and this hack seems like the least intrusive.
we should really be rendering options at *rendering* time, not at parse
time. currently this is just an academic exercise, but the html renderer
will have to inspect the options.json data after the entire document has
been parsed, but before anything gets rendered.
these weren't used for anything. options never was (and does not contain
any information for the renderer that we *want* to honor), and env is
not used because typed renderer state is much more useful for all our cases.
our renderers carry significantly more state than markdown-it wants to
easily cater for, and the html renderer will need even more state still.
relying on the markdown-it-provided rendering functions has already
proven to be a nuisance, and since parsing and rendering are split well
enough we can just replace the rendering part with our own stuff outright.
this also frees us from the tyranny of having to set instance variables
before calling super().__init__ just to make sure that the renderer
creation callback has access to everything it needs.
the old method of pasting parts of options.json into a markdown document
and hoping for the best no longer works now that options.json contains
more than just docbook. given the infrastructure we have now we can
actually render options.md properly, so we may as well do that.
inline anchors are not allowed in option docs per the manual, and the
sole class we current have (.keycap) is never used anyway. disallow them
for now to avoid future surprises.
the same goes for examples, which aren't even documented in the manual yet.
move the restrictions we care about into a mixin class. a few more
restrictions will appear soon and a few new converters as well, the
renderers of which need not have these restrictions already baked in by
accident (like the manpage renderer does right now).
add the ability to set the info string for a newly created fenced code
block, and a flag to always emit a fenced block. the commonmark
converter will need this to faithfully recreate fenced and indented code
blocks.
with mypy type checking and Mapping types this is a lot less useful than
anticipated. let's drop it for simplicity and having fewer dependencies.
frozendict 2.3.5 also broke the mypy checks.
options processing is pretty slow right now, mostly because the
markdown-it-py parser is pure python (and with performance
pessimizations at that). options parsing *is* embarassingly parallel
though, so we can just fork out all the work to worker processes and
collect the results.
multiprocessing probably has a greater benefit on linux than on darwin
since the worker spawning method darwin uses is less efficient than
fork() on linux. this hasn't been tested on darwin, only on linux, but
if anything darwin will be faster with its preferred method.
this adds support for structural includes to nixos-render-docs.
structural includes provide a way to denote the (sub)structure of the
nixos manual in the markdown source files, very similar to how we used
literal docbook blocks before, and are processed by nixos-render-docs
without involvement of xml tooling. this will ultimately allow us to
emit the nixos manual in other formats as well, e.g. html, without going
through docbook at all.
alternatives to this source layout were also considered:
a parallel structure using e.g. toml files that describe the document
tree and links to each part is possible, but much more complicated to
implement than the solution chosen here and makes it harder to follow
which files have what substructure. it also makes it much harder to
include a substructure in the middle of a file.
much the same goes for command-line arguments to the converter, only
that command-lined arguments are even harder to specify correctly and
cannot be reasonably pulled together from many places without involving
another layer of tooling. cli arguments would also mean that the manual
structure would be fixed in default.nix, which is also not ideal.
<part> is different from all other blocks we care about in that it
requires textual content to be wrapped in <partintro>. add support for
this to the generic docbook renderer, which will just assume that a part
is the whole document start to finish. we do make provision for the
manual renderer to close a partintro tag early though.
__context__ is always set to the prior exception, even when not using
the raise from form. __cause__ is only set during raise from. use
__cause__ so we can override a leaf exception (eg KeyError to something
more meaningful).
render all manual chapters to docbook from scratch every time the manual
is built. nixos-render-docs is quick enough at this to not worry about
the cost (needing only about a second), and it means we can remove
md-to-db.sh in the next commit.
no changes to the rendered html manual except for replacements and smartquotes.
we'll soon add another docbook converter that does not emit a section as
a collection of chapters, but sections or chapters on their own. this
should clarify naming a bit before there can be any confusion.
this is currently only supported by the docbook exporter, and even the
docbook exporter doesn't do much with them. we mirror the conversion
pandoc did for consistency with the previous manual chapter conversion,
which is to add just an anchor with the given id. future exporters that
go directly to html might want to do more.
this is a subset of pandoc's fenced divs. currently we only use this for
admonitions (which get a new name to differentiate them from other kinds
of blocks), but more users will appear soon.
this is used by release notes (and we don't want to break links to
those), and is also technically allowed anyway. we will *not* extend the
regex to allow more characters just yet due to a mozilla recommendation
against it (cf https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/id)
this should've been a core rule from the beginning. not being a core
rule made it always run after smartquotes and replacements, which
could've wrecked the id.
this lets us parse the `[F12]{.keycap}` syntax we recently introduced to
the nixos manual markdown sources. the docbook renderer emits the keycap
element for this class, the manpage renderer will reject it because it's
not entirely clear what to do with it: while html has <kbd> mandoc has
nothing of the sort, and with no current occurences in options doc we
don't have to settle on a (potentially bad) way to render these.
this is pretty much what pandoc calls bracketed spans. since we only
want to support ids and classes it doesn't seem fair to copy the name,
so we'll call them "attributed span" for now. renderers are expected to
know about *all* classes they could encounter and act appropriately, and
since there are currently no classes with any defined behavior the most
appropriate thing to do for now is to reject all classes.
now that the renderer produces the output we want to keep for the future
we can add a test that checks all of its features. this test notably
does not include markdown headings since we don't want to have those in
manpages (at least right now), but tests for other converters may add
headings for themselves.