I maintain a man-page-to-DocBook converter, doclifter. A side effect of this program is that
it serves as a validator for the correctness and portability of the
markup used on Unix manual pages. I test it by running it against all
the manual pages in a full Fedora Core 6 installation with some extras; there are 13467 of these on my development
machine. It converts 13017 (96.66%) into valid XML-DocBook.
Most failures in the remaining 3.34% happen because groff(1)
and its kin have weak-to-nonexistent validity checking. Often,
doclifter fails because of outright errors in macro usage that groff
does not catch. Sometimes it fails on constructions that are legal but
perverse. Very occasionally it throws an error because a man page is
correct but has a structure that cannot be translated to DocBook. I
keep a database of patches for such problems, and periodically
try to push fix patches out to the manual-page maintainers.
Even if you do not care about DocBook, this cleanup work benefits
all third-party manual page viewers, including the GNOME and KDE
documentation browsers; groff constructions that confuse doclifter
are very likely to produce visible problems on these.
The table below is a listing of the 394 (2.93%) pages on which
doclifter fails, but the failure can be prevented with a fix patch to
the manual page source. 56 pages (0.42%) remain intractable,
generally due to markup problems more severe than a point patch can
address. I am working with the individual projects responsible to get
those cleaned up.
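As a quick sanity check on the figures quoted above, here is a minimal Python sketch. The variable names and the pct helper are illustrative only and are not part of doclifter; the counts are the ones given in the text.

```python
# Counts quoted in the text above; the names are illustrative only.
total_pages = 13467
converted = 13017
patchable = 394      # failures that a fix patch can prevent
intractable = 56     # failures a point patch cannot address


def pct(count, total=total_pages):
    """Return count as a percentage of total, rounded to two places."""
    return round(100.0 * count / total, 2)


failures = total_pages - converted
# Every failing page is counted as either patchable or intractable.
assert failures == patchable + intractable

print(pct(converted), pct(failures), pct(patchable), pct(intractable))
```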
It is likely that you are reading this because you have received
email telling you that patches are associated with your name or list
address. Please consider incorporating them, or equivalents, in your
next release. Also, please write back and tell me what you plan to do
so I can keep my database up-to-date.
If you are not already considering it, please think about moving
the documentation masters of your project to DocBook (or some format
from which you can generate DocBook). If everybody moved to using
DocBook as a common exchange format, it would become much easier to
support unified browsing of all system documentation with Web-like
hypertext capabilities, automatic indexing, and rich search
facilities.
Tools to generate man pages, HTML, and PostScript from DocBook
files are open-source and generally available. My program, doclifter,
should make moving your manual-page masters to DocBook a fairly
painless process.
Many major open source projects (including the Linux kernel, the
Linux Documentation Project, X.org, GNOME, KDE, and FreeBSD) have
already moved to DocBook or are in the process of doing so.
Summary: 349 patches pending, 223 accepted, 8 rejected.
Status codes are as follows:
n | No response yet.
p | Maintainer has informed me that this is fixed in the masters, but I have not seen the fix yet.
y | Accepted.
r | Rejected.
[0-9]+ | Number of mailings sent.
b | Address is blocked.
Problem codes are explained after the table.
Error codes:
- 0
- This problem reflects a serious bug in db2man.
- 1
- Improper line wrap in .ds text.
- 2
- Description as well as name is required in a name section.
- 3
- Garbled escape sequence
- 4
- SYNTAX section is actually a Unix SYNOPSIS and should be so marked.
- 5
- Unescaped \d looks like a troff down-motion. This probably messes
up the rendering of the page in some environments, and certainly
confuses automated translation to XML.
- 7
- Unbalanced highlights. This is not technically an error, but
it makes display programs and translators to other formats
much more likely to break.
- 8
- Use of low-level troff hackery to set special indents or breaks can't
be translated. The page will have rendering faults in HTML, and
probably also under third-party man page browsers such as Xman, TkMan,
Rosetta, and the KDE help browser. This patch eliminates .br, .ti
and .in in favor of requests like .nf/.fi, and .RS/.RE that have
structural translations.
- 9
- Macro call is run on to the end of a line.
- A
- Dot or single-quote at start of line turns it into a garbage command.
This is a serious error; some lines of your page get silently lost
when it is formatted.
- B
- -[0-9] cannot be rendered in DocBook command-syntax markup.
This is not technically an error, but it makes the page impossible
to translate to DocBook.
- C
- Broken command synopsis syntax. This may mean you're using a
construction in the command synopsis other than the standard
[ ] | { }, or it may mean you have running text in the command synopsis
section (the latter is not technically an error, but it's impossible
to translate into DocBook markup).
- D
- Section or macro out of place; this confuses translators.
- E
- My translator trips over a useless command in list markup.
- F
- List structure can be better expressed with .IP.
- G
- Since this page was generated from db2man, I understand that
some of its problems may be due to db2man bugs that need to be reported
upstream. I am sending this heads-up nevertheless because at least
some of the problems can be fixed or worked around in your sources.
- H
- Illegal metavariable in a Synopsis description.
- I
- .it or .ti macro use is impossible to translate structurally.
- J
- Ambiguous or invalid backslash. This doesn't cause groff a problem,
but it confuses doclifter and may confuse older troff implementations.
- K
- You seem to be distributing a formatted manual page rather than source.
This may be a packaging error.
- L
- List syntax error. This means .IP, .TP or .RS/.RE markup is garbled.
This confuses doclifter, and may also mess up stricter troff
interpreters like Xman, Rosetta, and TkMan.
- M
- Macro definition is in a location (such as the Synopsis section)
that confuses translation tools.
- N
- Unbalanced or superfluous quotes may screw up argument parsing.
- O
- Running text in what should be a Unix command synopsis.
The right fix for this is to change the section name.
- P
- Garbage in the man page probably reflects a bug in the netpbm
makeman utility.
- Q
- You used .UN where .UR is needed.
- R
- English usage errors; apparently the writer is not a native speaker.
- S
- DEPRECATED: in function syntax cannot be translated. Also, the
code and examples need to be marked up better.
- T
- There are multiple description lines. This makes it impossible to
translate the page to DocBook. It may also confuse some
implementations of man -k.
- U
- Unbalanced group in command synopsis. You probably forgot
to open or close a [ ] or { } group properly.
- V
- .DS and .DE input macros are swapped.
- W
- Missing or garbled name section. The most common form of garbling
is a missing - or extra -. Or your manual page may have been generated
by a tool that doesn't emit a NAME section as it should. Or your page
may add running text such as a version or authorship banner. These
problems make it impossible to lift the page to DocBook. They
can also confuse third-party manpage browsers and some implementations
of man -k.
- X
- Unknown or invalid macro. That is, one that does not fit in the
macro set that the man page seems to be using. This is a serious
error; it often means part of your text is being lost or rendered
incorrectly.
- Y
- Missing or garbled section header.
- Z
- Garbage character at beginning of file.
- a
- .EX/.EE macros are missing or misplaced.
- c
- SYNOPSIS must come before DESCRIPTION or other sections. Otherwise,
checking the correctness of library pages that may have multiple
Synopsis subheadings becomes too difficult.
- d
- This old-style C prototype is too hard to parse; it is best to fix it.
- e
- Garbage generated by docbook2man
- f
- Presentational use of .SH messes up section parsing.
- g
- Run-on .B or .I macro.
- i
- Macro invocation in conditional confuses the doclifter parser.
- j
- Parenthesized comments in command synopsis. This is impossible
to translate to DocBook.
- l
- Page consists solely of NAME and SYNOPSIS.
- m
- Contains a request that is outside the portable subset that can be
rendered by non-groff viewers such as the KDE help browser, Xman,
TkMan, or manserver.
- n
- Unbalanced .RS or .RE macro
- o
- TBL markup not used where it should be. Tables stitched together
with .ta requests can't be lifted to DocBook and will often
choke third-party viewers such as TkMan, Xman, Rosetta, etc.
- p
- It is unnecessary to explain basic shell redirects on a man page.
It is also bad style, especially when doing so produces an
unparseable SYNOPSIS section.
- q
- Formatting file lists as tables in a Synopsis is impossible to
translate into DocBook.
- r
- .Ss within a Synopsis confuses my section parser.
- s
- Example URLs which should *not* be turned into hyperlinks need
an invisible stopper so tools which try to lift URLs from the page
source will pass over them.
- t
- Unclosed .RS; this appears to be a broken attempt to express list structure.
- u
- Use local definitions of .EX/.EE or .DS/.DE to avoid low-level troff
requests in the page body. There are plans to add these to groff man;
in the interim, this patch adds a compatible definition to your page.
- v
- Doubled dot before macro command.
- w
- You wrote .br where you meant .sp -- as written, the markup will
fail to produce a blank line where one was clearly intended.
- x
- Unclosed .nf needs a .fi
- y
- Page is empty. This probably means there is some sort of glitch in
your build machinery.
- z
- pod2man generates an unbalanced .RS tag. This is a bug.
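Several of the problems listed above can be caught mechanically before a page ships. What follows is a minimal sketch in Python, not doclifter's actual logic; the function name lint_man_source and its heuristics are assumptions made for illustration. It covers only codes n (unbalanced .RS/.RE), x (unclosed .nf), and v (doubled dot), and it ignores groff subtleties such as .RE accepting a level argument.

```python
import re


def lint_man_source(text):
    """Return (line_number, code, message) tuples for a man page source.

    Hypothetical helper: checks only problem codes n, x, and v.
    """
    problems = []
    rs_stack = []    # line numbers of .RS calls not yet closed by .RE
    nf_line = None   # line number of the currently open .nf, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Code v: two dots followed by a macro name. A bare ".." is left
        # alone because it legitimately ends a macro definition.
        if re.match(r"\.\.[A-Za-z]", line):
            problems.append((lineno, "v", "doubled dot before macro command"))
            continue
        match = re.match(r"[.']\s*([A-Za-z]+)", line)
        if match is None:
            continue  # ordinary text line or comment
        request = match.group(1)
        if request == "RS":
            rs_stack.append(lineno)
        elif request == "RE":
            if rs_stack:
                rs_stack.pop()
            else:
                problems.append((lineno, "n", ".RE without a matching .RS"))
        elif request == "nf":
            if nf_line is None:
                nf_line = lineno
        elif request == "fi":
            nf_line = None
    for lineno in rs_stack:
        problems.append((lineno, "n", ".RS is never closed by .RE"))
    if nf_line is not None:
        problems.append((nf_line, "x", "unclosed .nf needs a .fi"))
    return problems


# A small made-up page fragment exhibiting all three problems.
sample = "\n".join([
    ".SH DESCRIPTION",
    ".RS",
    "Indented text.",
    "..B oops",
    ".nf",
    "literal text",
])
for lineno, code, message in lint_man_source(sample):
    print(f"line {lineno}: [{code}] {message}")
```

A check like this is deliberately conservative: it reports line numbers and the matching problem code so that a maintainer can look at the source directly, rather than attempting any automatic repair.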