I maintain a man-page-to-DocBook converter, doclifter. A side effect of this program is that
it serves as a validator for the correctness and portability of the
markup used on Unix manual pages. I test it by running it against all
the manual pages in a full Xubuntu 18.04 installation with some extras; there are 13699 of these on my development
machine, of which 566 already have DocBook masters. It converts 12796
(97.43%) of the remaining 13133 into valid XML-DocBook.
Most of the remaining 2.57% of errors happen because groff(1)
and its kin have weak-to-nonexistent validity checking. Often,
doclifter fails because of outright errors in macro usage that groff
does not catch. Sometimes it fails on constructions that are legal but
perverse. Very occasionally it throws an error because a man page is
correct but has a structure that cannot be translated to DocBook. I
keep a database of patches for such problems, and periodically
try to push fix patches out to the manual-page maintainers.
(These are lower numbers and a higher error rate than in some
previous reports because I now use i3 rather than GNOME or KDE. Many
of the userland manuals that I used to check are no longer installed
where my test procedure can see them. Because bad markup tends to be
concentrated in the older manual pages of core tools, a larger random
sample pulls down the error rate.)
Even if you do not care about DocBook, this cleanup work benefits
all third-party manual page viewers, including the GNOME and KDE
documentation browsers; groff constructions that confuse doclifter
are very likely to produce visible problems on these.
The table below lists the 324 pages (2.47%) on which doclifter
fails but where the failure can be prevented with a fix patch to
the manual page source. 13 pages (0.10%) remain intractable,
generally due to markup problems more severe than a point patch can
address. I am working with the individual projects responsible to get
those cleaned up.
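The percentages quoted above can be rechecked from the raw counts. The following is a minimal sketch that uses only the figures given in the text; the variable names and the `pct` helper are illustrative and are not part of doclifter.

```python
# Counts taken from the report text above.
total_pages = 13699          # manual pages on the test machine
with_docbook_masters = 566   # pages that already have DocBook masters
converted = 12796            # pages lifted to valid XML-DocBook
patchable = 324              # failures preventable with a fix patch
intractable = 13             # failures a point patch cannot address

# Pages doclifter actually has to convert.
candidates = total_pages - with_docbook_masters
# Pages on which conversion fails.
failures = candidates - converted


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


print(candidates)                    # 13133
print(failures)                      # 337
print(pct(converted, candidates))    # 97.43
print(pct(failures, candidates))     # 2.57
print(pct(patchable, candidates))    # 2.47
print(pct(intractable, candidates))  # 0.1
```

The patchable and intractable counts sum to the total number of failures (324 + 13 = 337), so the two categories account for every page that does not convert.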
It is likely that you are reading this because you have received
email telling you that patches are associated with your name or list
address. Please consider incorporating them, or equivalents, in your
next release. Also, please write back and tell me what you plan to do
so I can keep my database up-to-date.
If you are not already considering it, please think about moving
the documentation masters of your project to DocBook (or some format
from which you can generate DocBook). If everybody moved to using
DocBook as a common exchange format, it would become much easier to
support unified browsing of all system documentation with Web-like
hypertext capabilities, automatic indexing, and rich search
facilities.
Tools to generate man pages, HTML, and PostScript from DocBook
files are open-source and generally available. My program, doclifter,
should make moving your manual-page masters to DocBook a fairly
painless process.
Many major open source projects (including the Linux kernel, the
Linux Documentation Project, X.org, GNOME, KDE, and FreeBSD) have
already moved to DocBook or are in the process of doing so.
(Individual entries for accepted patches are no longer shown.)
Summary: 300 patches pending, 618 accepted, 0 rejected.
Status codes are as follows:
n | No response yet.
p | Maintainer has informed me that this is fixed in the masters, but I have not seen the fix yet.
y | Accepted
r | Rejected
s | Superseded (page lifts correctly without the patch)
[0-9]+ | Number of mailings sent
b | Address is blocked
Problem codes are explained after the table.
Error codes:
- A
- Dot or single-quote at start of line turns it into a garbage command.
This is a serious error; some lines of your page get silently lost
when it is formatted.
- B
- ( ) notation for mandatory parts of command syntax should be { }.
- C
- Broken command synopsis syntax. This may mean you're using a
construction in the command synopsis other than the standard
[ ] | { }, or it may mean you have running text in the command synopsis
section (the latter is not technically an error, but most cases of it
are impossible to translate into DocBook markup), or it may mean the
command syntax fails to match the description.
- D
- The .ce macro cannot be rendered in HTML, so I have replaced .ce
before captions with an .RS/.RE block including both caption
and table.
- E
- My translator trips over a useless command in list markup.
- F
- Nonexistent or non-local .Sx target.
- G
- .Pp is structurally incorrect in an element list.
- H
- Renaming SYNOPSIS because either (a) third-party viewers and
translators will try to interpret it as a command synopsis and become
confused, or (b) it actually needs to be named "SYNOPSIS" with no
modifier for function prototypes to be properly recognized.
- I
- Use of low-level troff hackery to set special indents or breaks can't
be translated. The page will have rendering faults in HTML, and
probably also under third-party man page browsers such as Xman,
Rosetta, and the KDE help browser. This patch eliminates .br, .ta, .ti,
.ce, .in, and \h in favor of requests like .RS/.RE that have
structural translations.
- J
- Ambiguous or invalid backslash. This doesn't cause groff a problem,
but it confuses doclifter and may confuse older troff implementations.
- K
- Renaming stock man macros throws warnings in doclifter and is likely
to cause failures on third-party manual browsers. Please redo this
page so it uses distinct names for the custom macros.
- L
- List syntax error. This means .IP, .TP or .RS/.RE markup is garbled.
Common causes include .TP just before a section header, .TP entries
with tags but no bodies, and mandoc lists with no trailing .El.
These confuse doclifter, and may also mess up stricter man-page
browsers like Xman and Rosetta.
- M
- Missing Feature Test Macros header.
- N
- Extraneous . at start of line.
- O
- Command-line options described are not actually implemented.
- P
- Removed unnecessary \c that confused the doclifter parser.
- Q
- Missing Description header.
- R
- .ce markup can't be structurally translated, and is likely
to cause rendering flaws in generated HTML.
- S
- DEPRECATED: in function syntax cannot be translated. Also, the
code and examples need to be marked up better.
- T
- Junk at the beginning of the manual page.
- U
- Unbalanced group in command synopsis. You probably forgot
to open or close a [ ] or { } group properly.
- V
- .SS is not .SH and they cannot be used interchangeably. You get away
with this by accident in roff, but it will badly confuse other tools
that look at man pages.
- W
- Missing or garbled name section. The most common form of garbling
is a missing - or extra -. Or your manual page may have been generated
by a tool that doesn't emit a NAME section as it should. Or your page
may add running text such as a version or authorship banner. These
problems make it impossible to lift the page to DocBook. They
can also confuse third-party manpage browsers and some implementations
of man -k.
- X
- Unknown or invalid macro. That is, one that does not fit in the
macro set that the man page seems to be using. This is a serious
error; it often means part of your text is being lost or rendered
incorrectly.
- Y
- I have been unable to identify an upstream maintainer for this
Ubuntu/Debian package, and am notifying the generic "Maintainer"
address in the package. Please forward appropriately. Also please
fix the package metadata so it identifies the upstream maintainers.
- Z
- Your Synopsis is exceptionally creative. Unfortunately, that means
it cannot be translated to structural markup even when things like
running-text inclusions have been moved elsewhere.
- a
- Incorrect use of BSD list syntax confused doclifter's parser.
- b
- \c is an obscure feature; third-party viewers sometimes don't
interpret it. Plain \ is safer.
- c
- Function declarations had to be modified in order to fit into
the DocBook DTD. This is not an error in troff usage, but it
reduces the quality of the HTML that can be generated from this page
through the DocBook toolchain.
- d
- .eo/.ec and complex tab-stop hackery can't be translated to XML/HTML
and are almost certain to confuse third-party readers such as
Rosetta and Xman.
- e
- Macro definitions in the NAME section confuse doclifter and are
likely to screw up third-party man viewers with their own parsers.
- f
- Presentation-level use of .SS could not be structurally
translated. I changed lower-level instances to .TP or .B.
- g
- Use of a double quote for inch measurements often confuses people
who aren't from the Anglosphere.
- h
- Unbalanced .RS or .EE
- i
- Non-ASCII character in document synopsis can't be parsed.
- j
- Parenthesized comments in command synopsis. This is impossible
to translate to DocBook.
- k
- Misspelled macro name.
- l
- Invalid or unterminated font escape.
- m
- Contains a request or escape that is outside the portable subset that
can be rendered by non-groff viewers such as the KDE and GNOME help
browsers.
- n
- C function syntax has extra or missing paren.
- o
- TBL markup not used where it should be. Tables stitched together
with .ta or list requests can't be lifted to DocBook and will often
choke third-party viewers such as TKMan, Xman, Rosetta, etc.
- p
- Garbage trailing \ in function synopsis.
- q
- The .ul request used here can't be translated into document structure.
I put these files in a hanging list, which can be.
- r
- I supplied a missing mail address. Without it, the .TP at the end of the
authors list was ill-formed.
- s
- Changed page to use the .URL macro now preferred in man(7).
- t
- Synopsis has to be immediately after NAME section for DocBook
translation to work.
- u
- Use local definitions of .EX/.EE or .DS/.DE to avoid low-level troff
requests in the page body. There are plans to add these to groff man;
in the interim, this patch adds a compatible definition to your page.
- v
- .in and .EX have crossed or inverted scopes.
- w
- .SS markup in name section seriously confuses parsing, and sections
don't follow standard naming conventions.
- x
- Syntax had to be rearranged because of an options callout.
This is still excessively complicated; third-party man-page
viewers are likely to choke on it.
- y
- This page was generated from some sort of non-man markup. Please
fix the upstream markup so that it generates a well-formed
manual page with the indicated corrections.
- z
- Garbled comment syntax.
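Two of the problems listed above, codes A and U, lend themselves to simple mechanical checks. The sketch below is a hedged illustration, not doclifter's own logic: the function names, the regular expression, and the deliberately tiny `KNOWN` macro set are all assumptions made for the example. A real checker would need the full request and macro vocabulary of the macro package in use.

```python
import re

# Hypothetical, intentionally incomplete set of request and macro names.
KNOWN = {"TH", "SH", "SS", "TP", "IP", "PP", "RS", "RE",
         "B", "I", "BR", "br", "nf", "fi"}

# A control line starts with . or ', optional blanks, then a name.
CONTROL = re.compile(r"^[.'][ \t]*(\S*)")


def suspicious_control_lines(lines, known=KNOWN):
    """Return (line_number, text) for control lines with an unrecognized name.

    Heuristic for code A: running text that begins with . or ' is parsed
    as a request, so its "name" will usually not be a known macro.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = CONTROL.match(line)
        if not match:
            continue  # ordinary text line
        name = match.group(1)
        if name == "" or name.startswith('\\"'):
            continue  # empty request or comment line
        if name not in known:
            hits.append((number, line))
    return hits


def unbalanced_groups(synopsis):
    """Return True if [ ] and { } groups are not properly nested (code U)."""
    closers = {"]": "[", "}": "{"}
    stack = []
    for ch in synopsis:
        if ch in "[{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return True
    return bool(stack)


page = [".SH NAME", "'quoted' text that gets lost", "plain text"]
print(suspicious_control_lines(page))         # [(2, "'quoted' text that gets lost")]
print(unbalanced_groups("foo [-a] {x | y}"))  # False
print(unbalanced_groups("foo [-a {x | y}"))   # True
```

The usual repair for a text line flagged by the first check is to prefix it with the zero-width escape \& so that the leading . or ' is no longer read as a control character.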