Name

sitemap — make a site map from meta tags in an HTML tree

Synopsis

sitemap [-x] [-h start-dir] [-c config-file]

Description

sitemap indexes all pages under the start directory and writes an HTML map page to standard output. The code looks for description information for each page in a META DESCRIPTION header; if it doesn't find one, the page is omitted from the index. That is, HTML pages to be indexed should have a meta tag with its name attribute set to description and its content attribute set to a brief description of the contents. For example,


 <head>
   <title>Sitemap documentation</title>
   <meta name="description" 
     content="Documentation for sitemap program to index HTML pages.">
 </head>

The output of sitemap is an HTML page that contains a list of descriptions and links to the indexed pages. This output can be configured via an rc file (see below).

With the -x option, the script emits a list of pathnames of pages that would be eligible for indexing but have no description meta tag. This can be helpful in finding pages from which this information was omitted by mistake.
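
The description lookup described above can be sketched in a few lines of Python. This is a minimal illustration built on the standard html.parser module, not the code sitemap itself uses; the names DescriptionFinder and find_description are hypothetical. A return value of None corresponds to a page that would be omitted from the index and reported by -x.

```python
from html.parser import HTMLParser


class DescriptionFinder(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching meta tag is kept.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        # Attribute names arrive lowercased; compare the value case-insensitively.
        if (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")


def find_description(html_text):
    """Return the description string, or None if the page has none."""
    finder = DescriptionFinder()
    finder.feed(html_text)
    return finder.description
```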

Arguments

If no options are supplied, the start directory is the directory indicated by the DOCUMENT_ROOT or HOME environment variables, in that order. If neither variable is specified on a UNIX system, the effective user's home directory (as indicated in the passwd file) will be used. If a start-dir directory is supplied with the -h option, then that directory will be used as the start directory. In both of these cases, the configuration will be read from a file named .sitemaprc in the start directory. If the configuration file does not exist, sitemap will run with a set of default parameters, which is usually not what you want.

If a config-file configuration file is specified with the -c option, then the configuration for sitemap will be read from that file. In this case, the start directory will be the directory containing the configuration file.
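
The resolution rules above can be summarized as a short Python sketch. The function name resolve_locations and its parameters are hypothetical, and the behavior when both -h and -c are given is an assumption (here -c takes precedence), since the text does not specify it.

```python
import os


def resolve_locations(start_dir=None, config_file=None, environ=None):
    """Return (start_dir, config_path) following the rules described above.

    start_dir mirrors the -h argument; config_file mirrors the -c argument.
    """
    environ = os.environ if environ is None else environ
    if config_file is not None:
        # -c: the start directory is the directory containing the config file.
        # Assumption: -c wins if -h is also given.
        config_path = os.path.abspath(config_file)
        return os.path.dirname(config_path), config_path
    if start_dir is None:
        # No -h: try DOCUMENT_ROOT first, then HOME.
        start_dir = environ.get("DOCUMENT_ROOT") or environ.get("HOME")
    if start_dir is None:
        # UNIX fallback: home directory of the effective user from passwd.
        import pwd
        start_dir = pwd.getpwuid(os.geteuid()).pw_dir
    return start_dir, os.path.join(start_dir, ".sitemaprc")
```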

Configuration File

sitemap is a Python script. To configure the strings used in the index page header and footer, you can create a configuration file in your home directory called .sitemaprc (or as indicated by the command-line parameter). A skeleton of a configuration file is provided with the program. The file should start with the text [sitemap] on a line by itself. Subsequent lines should be name=value pairs. Lines beginning with the # character are treated as comments and are ignored. The possible field names in the configuration file are listed below:

Hometitle=title

The title of your homepage. The generated site map will contain a link with this text.

Homepage=url

The URL of your homepage. The generated site map will contain a link back to this page.

Indextitle=title

The title for the generated site map page.

Headinfo=any HTML text

Any additional HTML you want to include in the <head> section of the site map. Use with care - only certain tags are legal in the <head> of a page.

Sitemap=string

Path of a file to be used as the stylesheet for the generated site map page.

Prefix=url

An optional URL prefix to put before each pathname. Normally, sitemap outputs each filename as a site-relative path beginning with a '/', on the assumption that the start directory can be accessed with the URL '/'. (That is, the start directory would be the directory indicated by the web server's DOCUMENT_ROOT.) If this is incorrect (e.g. you are indexing a user's home page whose URL begins with '/~username'), you can supply the alternative URL prefix here.

Dirtitle=title

The title string to use for directories. Directories are listed and linked in the generated site map page with this text.

Fullname=name

Your full name. This name will be included in one corner of the generated site map page. You may want to list a company name or a copyright statement instead, for example.

Mailaddr=address

E-mail address of a contact person. Since the e-mail address will be linked on the generated site map page, you may want to set this parameter to the e-mail address of a contact person or a webmaster.

Lang=language

ISO 639-1 language code for the boilerplate text included in the output (Czech = cs, English = en, French = fr, German = de, Italian = it, Norwegian = nb, Portuguese = pt, Spanish = es, or Swedish = sv).

Icondirs=icon path

The path (relative to the start directory or a URL) of the icon for directories. The icon must be 33 pixels wide (or scaleable to that size). If omitted, no icon will be displayed next to site map entries for directories.

Icontext=icon path

The path (relative to the start directory or a URL) of the icon for HTML files. The icon must be 33 pixels wide (or scaleable to that size). If omitted, no icon will be displayed next to site map entries for HTML pages.

Indexfiles=file1 file2 file3

A space-separated list of files to treat as index or main pages for a directory. Any file with a filename exactly equal to one of the indicated filenames will be treated as an index page. Index pages sort to the top of the list of files in a directory. For example, index.html or default.htm might be good candidates for this parameter.

Exclude=word1 word2

A space-separated list of words to ignore when scanning files and directories. sitemap will skip any file or entire subdirectory that contains any of the words in its path. For example, Test or CVS may be good candidates for this parameter.

Debug=y

Set this parameter to view the computed configuration file name, start directory, document root, and prefix in the generated site map page. You'll need to view the source of the generated HTML file because these values will be listed within an HTML comment. Search for the word Debugging in the generated HTML page.
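
As an illustration of the file format, the sketch below parses a small configuration with Python's standard configparser module and applies the Indexfiles and Exclude settings. Whether sitemap itself uses configparser is an assumption; all values in SAMPLE_RC and the helper names load_settings, is_excluded, and order_files are hypothetical. Note that configparser lowercases field names by default, and that sorting the non-index files alphabetically is also an assumption.

```python
import configparser

SAMPLE_RC = """\
[sitemap]
# Hypothetical values, for illustration only
Hometitle=Example Home
Homepage=/index.html
Indextitle=Site Map
Lang=en
Indexfiles=index.html default.htm
Exclude=Test CVS
"""


def load_settings(text):
    """Parse rc-file text and return the [sitemap] section as a dict."""
    # interpolation=None keeps '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["sitemap"])


def is_excluded(path, exclude_words):
    """True if any excluded word occurs anywhere in the path."""
    return any(word in path for word in exclude_words)


def order_files(filenames, index_files):
    """Sort names alphabetically, with index pages first."""
    return sorted(filenames, key=lambda name: (name not in index_files, name))


settings = load_settings(SAMPLE_RC)
index_files = settings["indexfiles"].split()
exclude_words = settings["exclude"].split()
```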

Use Under CGI

You can use sitemap to generate site maps on the fly. Any command-line argument can be passed as the query string (i.e. a string immediately following the URL of the CGI script and a '?' character).

sitemap will deduce that it is running under the CGI by virtue of the fact that the REMOTE_ADDR environment variable is defined. If so, it outputs a content-type header (text/html) ahead of the HTML page.

When running as a CGI script, sitemap does not assume that the document root is necessarily identical with the start directory. It inspects the DOCUMENT_ROOT environment variable and constructs a prefix in an attempt to get from the server document root to the start directory. This will fail if the start directory is not a subdirectory under the document root, in which case the prefix directive in the configuration file should be used.
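
The CGI detection and prefix construction described above might look roughly like the following. This is a sketch under stated assumptions, not the script's actual code: the function names are hypothetical, and both paths are assumed to be absolute.

```python
import os


def running_under_cgi(environ):
    """The documented test: REMOTE_ADDR is defined in the environment."""
    return "REMOTE_ADDR" in environ


def prefix_from_docroot(start_dir, document_root):
    """Return the URL prefix leading from the document root to start_dir,
    or None when start_dir is not inside the document root."""
    start = os.path.normpath(start_dir)
    root = os.path.normpath(document_root)
    if os.path.commonpath([start, root]) != root:
        return None  # caller should fall back to the Prefix setting
    rel = os.path.relpath(start, root)
    # An empty prefix means the start directory is the document root itself.
    return "" if rel == "." else "/" + rel.replace(os.sep, "/")
```

A None result corresponds to the failure case mentioned above, where the Prefix setting in the configuration file is needed.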

Authors

Eric S. Raymond

Immo Huneke

Tom Bryan