Reproducible computing with Lir

Boris Vassilev

2020-01-30

Lir is a tool for reproducible computing. The name Lir stands for literate, reproducible computing. With Lir you can organize, document, and automate your work flow.

Any combination of programming languages and software platforms can be used in the same work flow. All executable code is maintained in the Lir source file, along with rules for generating the results of running each program. A Lir source file additionally contains the documentation of a project.

In this document, we provide detailed Lir installation instructions, followed by a User’s Guide, and the full implementation of Lir at the end.

This document is self-hosting: the implementation of Lir presented here can be used to produce the final human-readable version of this source file. This makes informal validation somewhat easier (“does the final document look right?”), but it also means that the only guarantee is that it implements enough of Lir to be able to compile itself.

Installation

You need a reasonably modern Linux/BSD operating system. Any of the major GNU/Linux and BSD distributions, as well as OS X, fulfils that requirement. It should be possible to use Lir on Microsoft Windows 10, but at the moment this is not officially supported; if you try installing Lir on Windows 10, please share your experience.

In particular, you will need an up-to-date version of the GNU Toolchain (autoconf, make, gcc, binutils, glibc), as well as Bash along with the standard command line tools like grep, sed, and awk (to name a few).

Prerequisites

Furthermore, you need to install the following software: GNU Make, Pandoc, noweb, and SWI-Prolog.

Lir makes use of existing tools for literate programming, making, and markup. A working knowledge of these tools is not required, but it is beneficial, at least at the beginning.

Makefiles

It would help you if you had some working knowledge of writing makefiles, as understood by GNU Make. Reading the official documentation is a good start; at least take a look at the first few chapters.
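As a reminder of the basic shape, a make rule names a target, its prerequisites, and a tab-indented recipe; the file names below are made up for illustration:

```makefile
# target : prerequisites
#     recipe (each recipe line MUST start with a tab)
results.txt: analyze.sh input.dat
	bash analyze.sh input.dat > results.txt
```

make re-runs the recipe only when one of the prerequisites is newer than the target.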

Markdown

You need to have a basic idea of the concept of markdown. Armed with this knowledge, open the excellent manual provided by Pandoc and use it as a reference.

Literate Programming with noweb

Understanding the concept of literate programming is a must. You may take a look at the original publication by Donald E. Knuth (pdf link); you can also skip this at first. The Wikipedia page for noweb contains enough detail to get you acquainted with the syntax. A Lir source file is a valid noweb source file.

Once you have all prerequisites, clone the Lir repository and change to the lir directory:

$ git clone https://github.com/borisvassilev/lir.git
$ cd lir

Before you can install Lir, you need to adjust three paths that may differ between installations.

Setting the installation paths

Figure out where you have installed the noweb library files. If you installed from the official Ubuntu directories, this should be /usr/lib/noweb.

If you cannot find this directory, the directory where the files were installed is mentioned towards the bottom of the nowebfilters man page, under the FILES section. So try this:

$ man nowebfilters

If for some reason this doesn’t work either, you can try looking for one of the files in the library, for example:

$ locate emptydefn

If you cannot find a file called emptydefn on your system, you have not installed noweb correctly: please do so now.

Now that you know where the library files are, you need to open the Lir source file lir.lir and edit the three paths on lines 96, 100, and 105 of this file.

First, the folder where all lir components are installed:

C1: ⟪Path to lir
Appears in C10, C15, C35, C36

"$HOME"/lib/lir

Then, the folder where the main lir script will be installed; make sure it is on your system PATH.

C2: ⟪Path to lir binaries
Appears in C10

"$HOME"/bin

Finally, the folder containing the noweb library files. If you installed noweb normally (and as root), this will read /usr/lib/noweb.

C3: ⟪Path to noweb
Appears in C10, C34

"$HOME"/lib/noweb

Now save and close the file lir.lir, and run:

$ bash bootstrap

This will extract and install all Lir files to your computer, and use the installed Lir to compile this source file to an HTML document. You can use the command line above to reinstall Lir if you update it, or if you change the source.

User’s Guide

Lir defines a syntax for describing a reproducible computation, and provides the tools for running the computation on the input files to obtain results. You must have already read the Tutorial; you can use this Guide as a reference.

Source file

The Lir source file is a valid noweb source file. Lir adds semantics by recognizing and interpreting keywords embedded in code chunk names. A Lir source file has the extension “.lir”. It is a plain text file. Within this file, there are code chunks, as defined by noweb, and documentation chunks (everything between and around code chunks). Documentation chunks may be formatted and structured with markdown as understood by Pandoc.

Code chunks

A code chunk starts with a code chunk header, followed by the contents of the code chunk, and a code chunk footer. The code chunk header is the string “<<Code chunk name>>=” on a line by itself. The code chunk footer is the string “@” on a line by itself. If the code chunk must contain a line starting with “<<” or “@”, use an additional “@” to escape these: “@<<” and “@@”.

A code chunk ends with a footer line or with the header line of another code chunk.
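For instance, a chunk whose body must contain literal lines beginning with “@” or “<<” could be written like this (a made-up example in noweb syntax):

```
<<show the escapes>>=
@@ this line begins with a literal @
@<<this line begins with literal double angle brackets>>
@
```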

Here is a link to the Lir source of the few following paragraphs; please read those in the raw Lir source to understand the code chunk syntax and examples!

Below, we see a code chunk with the name “: Just an example”.

C4: ⟪: Just an example
root chunk

These are the contents
of this chunk.
Another code chunk is included by a name reference.

Code chunks may refer to other code chunks by name: just put the name in double angle brackets. Whenever a code chunk is extracted, all code chunk references are replaced with their contents. If we define:

C5: ⟪Another code chunk
Appears in C4

Yet another code chunkMoar content!1!!!1Yet another code chunk

C6: ⟪Yet another code chunk
Appears in C5

***

Then, the full contents of : Just an example, if extracted from the source file, will be:

These are the contents
of this chunk.
***Moar content!1!!!1*** is included by a name reference.
The “tangled” code chunk

This is how we generate the listing:

C7: ⟪:make
root chunk

definition continued in C69

just-an-example-tangled: lir.lir
	notangle -R": Just an example" $< > $@

C69
S1: lir.lir

(If you don’t understand what happens in the lines above, please read the Tutorial.)

Special code chunks

All code chunks that have names starting with a colon (:) are given special treatment. The code chunk : Just an example is a code chunk that will not be considered for extracting.

Sources and sinks

We represent the work flow as a directed, acyclic graph (DAG). In this DAG, each node is a data object: a file. Some of these files are input data: they are the sources of the DAG. Some of these files are final results: they are the sinks of the DAG.

The sources of the DAG (the input files to the work flow) must be declared explicitly in a special code chunk whose name is “:source ” followed by the file name. For example, this is how we declared :source lir.lir above.

Display items

To declare a placeholder for a display item – a Figure, a Table, or a Listing – use a code chunk with a special name. The name must start with “:figure”, “:table”, or “:listing”, followed by a single space and the name of the file that should be displayed.

Here is a link to the following few paragraphs in the Lir source file; you need to look at the source to understand the examples!

In the contents of a display item, you must declare the title of the display item, and you may declare a caption for the display item. This is a figure with a caption:

Figure 1: example.svg
Example figure This is the caption for the display item.

Make rules

All dependencies and rules are defined within make code chunks. The names of these code chunks must begin with “:make”. Here is an example:

C8: ⟪:make The example above
root chunk

example.svg: make-example.R
	Rscript --vanilla $< $@

This defines a rule for generating the example SVG image displayed in :figure example.svg. The rule has one target, example.svg, and one prerequisite, the R script make-example.R. It defines a rule for generating the target: by executing Rscript with the command line argument --vanilla and the script as the first argument and the target file as the second argument (note that obviously, you need to have R installed on your computer if you want to generate this figure and the final document).
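The point of expressing the work flow as make rules is that make only re-runs a recipe when a prerequisite is newer than the target. Here is a self-contained sketch of that behaviour, with a rule of the same shape but hypothetical file names (no R needed):

```shell
#!/usr/bin/env bash
set -o errexit
dir=$(mktemp --directory)
# A one-rule makefile: out.txt is built from in.txt.
printf 'out.txt: in.txt\n\tcp in.txt out.txt\n' > "$dir"/Makefile
echo hello > "$dir"/in.txt
make -C "$dir"       # first run: executes the recipe
make -C "$dir"       # second run: reports that out.txt is up to date
rm -r "$dir"
```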

Root code chunks

A root code chunk is a code chunk that is not referenced by name in any other code chunk.

Executable code

All root code chunks that don’t have a special name are extracted to files during the first stage of Lir, lir-tangle. The name of the code chunk determines the name of the file. Here is the executable R script that generates the example above:

C9: ⟪make-example.R
root chunk

out <- commandArgs(trailingOnly = TRUE)[1]
svg(out, 8, 5)
curve(sin, -2*pi, 2*pi, type="l", col="darkred")
curve(cos, -2*pi, 2*pi, type="l", col="darkblue", add = TRUE)
dev.off() -> foo

Citations

You can add citations in the markup recognized by Pandoc. The easiest way to produce the citations and a bibliography is to add a bibliography file in the working directory. The file must have the same base name as the Lir source file, and the extension “.bib”. For example, since this source file is named lir.lir, the bibliography is in lir.bib. The bibliography file should be in the format recognized by Biblatex; a good free tool for maintaining a bibliography is JabRef, available for all major operating systems.
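For example, an entry for the Knuth article mentioned earlier might look like this in lir.bib (the key knuth1984 is our choice):

```bibtex
@article{knuth1984,
  author  = {Knuth, Donald E.},
  title   = {Literate Programming},
  journal = {The Computer Journal},
  volume  = {27},
  number  = {2},
  pages   = {97--111},
  year    = {1984}
}
```

In a documentation chunk, writing [@knuth1984] then produces the citation and adds the entry to the bibliography.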

The bibliography will be at the end of the final document. If you want to have it listed in the table of contents, you can put a section heading at the end of the document. The last line of the Lir source file can be for example “# References” or “# Bibliography”.

Running Lir

Lir provides the command line program lir. To extract all code, execute it, and compile the final document, you need to run lir with the name of the Lir source file as the only argument. For example, if you are in the working directory of the Lir distribution, you can do:

$ lir lir.lir

This will run the three steps: tangling, making, and weaving. You can also do each step separately, by providing one of tangle, make, or weave as the first argument to lir:

$ lir tangle lir.lir
$ lir make lir.lir
$ lir weave lir.lir

This can be useful if you need to run the make step on another machine.

State

When a Lir source file is interpreted, a current state of the project is created in the hidden directory .lir inside the working directory. Every time lir is re-run on a source file, the state is updated. Only files that have changed are updated: thus, running Lir does not run programs that have not changed on input data that has not changed.

The state contains all executable files extracted from the Lir source file, makefiles to generate all results, symbolic links to all input data, and all results. The state in .lir can be copied as it is to another machine, and used to generate all results remotely.

It is safe to remove the state by deleting the .lir directory and all its contents. This will force all intermediate results to be re-generated.

Implementation

This is the complete source of Lir.

Making and installing

C10: ⟪build.sh
root chunk

Bash header

echo "*** Making and Installing..."
notangle -t8 -filter emptydefn -RMakefile lir.lir > Makefile
export LIRPATH=Path to lir
export LIRBINPATH=Path to lir binaries
export NOWEBPATH=Path to noweb

make uninstall
make clean
make
make install
echo "*** Done!"

rm -rf .lir
echo "*** Using lir to tangle and weave lir..."
lir lir.lir
echo "*** Weaved document is in lir.html!"
cp -v lir.html docs/lir.html
echo "*** Generating index..."
title=$(mktemp)
echo "---" > "$title"
echo "title: Lir" >> "$title"
echo "author: Boris Vassilev (University of Helsinki)" >> "$title"
echo "..." >> "$title"
sed "s/^@/@@/" README.md \
        | sed "s/^<</@<</" \
        | cat "$title" - \
        > docs/index.lir
(cd docs ; lir index.lir)
rm "$title"
echo "*** Generating index done!"

This script is used when the bash script file bootstrap is evaluated.

Makefile

This makefile is used by the build script above.

C11: ⟪Makefile
root chunk

nwpipemodules = driver.pl nwpipe.pl lirhtml.pl yaml.pl

nwpipesources = nwpipe-pandoc.pl $(nwpipemodules)

tangled = lir lir-weave $(nwpipesources) lir.css

all : lir.lir
	$(NOWEBPATH)/markup -t lir.lir \
                | $(NOWEBPATH)/emptydefn \
                | $(NOWEBPATH)/mnt -t $(tangled)
	chmod u+x lir lir-weave

install :
	mkdir --parents $(LIRPATH)
	cp --verbose --preserve \
                lir.css $(nwpipesources) lir-weave $(LIRPATH)
	mkdir --parents $(LIRBINPATH)
	cp --verbose --preserve \
                lir $(LIRBINPATH)
.PHONY : install

clean :
	-rm $(tangled)
.PHONY : clean

uninstall :
	-rm -r $(LIRPATH)
	-rm $(LIRBINPATH)/lir
.PHONY : uninstall

Bash scripts

According to Stack Overflow, the portable Bash header is:

C12: ⟪Bash header
Appears in C10, C15, C32

definition continued in C13

#! /usr/bin/env bash
C13

Furthermore, we set the scripts to immediately exit with error on failure, consider failure in a pipe a failure, and to consider “empty” (unset) variables an error:

C13: ⟪Bash header
C12

set -o errexit
set -o pipefail
set -o nounset
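A quick, self-contained way to see what each option buys us; every child bash below fails on purpose:

```shell
#!/usr/bin/env bash
# errexit: the script stops at the first failing command.
bash -o errexit -c 'false; echo unreachable' \
    || echo "errexit: aborted before the echo"
# pipefail: a failure anywhere in a pipeline fails the whole pipeline.
bash -o pipefail -c 'false | true' \
    || echo "pipefail: pipeline reported failure"
# nounset: expanding an unset variable is a fatal error.
bash -o nounset -c 'echo "$no_such_variable"' 2>/dev/null \
    || echo "nounset: unset variable rejected"
```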

Overview

The tools provided by lir support two goals:

  1. Generating results by running the code in a lir source file.
  2. Presenting all code and possibly results in a human-readable form.

This is usually done in three independent steps: tangling, making, and weaving.

The wrapper script

The program is installed as a script that can be invoked with one of the three actions, tangle, make, or weave, as the first argument.

C14: ⟪lir
root chunk

lir.bash

C15: ⟪lir.bash
Appears in C14

Bash header
Define functions used by lir

if [ "$#" -eq 0 ]
then
        :
else

if [ $# -eq 1 ]
then
        lir tangle "$1" && lir make "$1" && lir weave "$1" && exit
fi

LIR_SOURCE="$2"
LIR_DIR=.lir
MORE_OPTS="${@:3}"
BASENAME=$(basename "$LIR_SOURCE" .lir)
STATE_SOURCE="__LIR_SOURCE_$LIR_SOURCE"

case "$1" in
tangle)
        mkdir --verbose --parents "$LIR_DIR"
        tmpdir=$(mktemp --directory --tmpdir="$LIR_DIR")

        awk 'Nameless code chunks as continuations .awk' \
                        "$LIR_SOURCE" \
                | awk 'Close all codechunks with @ .awk' \
                | awk -v \
                        evalcmd="bash -o errexit -o pipefail -o nounset" \
                        'Evaluate code chunk contents .awk' \
                > "$LIR_DIR"/"$STATE_SOURCE"

        noroots "$LIR_DIR"/"$STATE_SOURCE" \
                | Drop double angle brackets \
                | Remove empty lines \
                > "$tmpdir"/rootchunks
        Remove special chunknames "$tmpdir"/rootchunks \
                > "$tmpdir"/filechunks
        Get source chunknames "$tmpdir"/rootchunks \
                > "$tmpdir"/sourcechunks
        Get sink chunknames "$tmpdir"/rootchunks \
                > "$tmpdir"/sinkchunks
    
        < "$tmpdir"/filechunks lir-mtangle "$LIR_DIR"/"$STATE_SOURCE" "$LIR_DIR"
        < "$tmpdir"/sourcechunks lir-use "$LIR_DIR"

        if grep --quiet --line-regexp ':make.*' "$tmpdir"/rootchunks
        then
                noroots "$LIR_DIR"/"$STATE_SOURCE" \
                        | sed -n 's/^<<\(:make.*\)>>$/-R\1/p' \
                        | tr '\n' '\0' \
                        | xargs --null notangle -t8 \
                                "$LIR_DIR"/"$STATE_SOURCE" \
                        > "$LIR_DIR"/.makedag
        else
                echo '' > "$LIR_DIR"/.makedag
        fi

        echo -n 'all: ' > "$LIR_DIR"/.makeall
        < "$tmpdir"/sinkchunks tr '\n' ' ' >> "$LIR_DIR"/.makeall
        echo ";" >> "$LIR_DIR"/.makeall

        echo -n "$BASENAME.html: $STATE_SOURCE " > "$LIR_DIR"/.makehtml
        < "$tmpdir"/sinkchunks tr '\n' ' ' >> "$LIR_DIR"/.makehtml
        echo ";" >> "$LIR_DIR"/.makehtml

        # Check for a bib file and add bibliography
        LIRBIB=""
        if [ -r "$BASENAME".bib ]
        then
                BIBTARGET=$(readlink -f "$BASENAME".bib)
                LIRBIB="--bibliography=$BIBTARGET"
        fi
        echo -e "\tPath to lir"'/lir-weave '"$LIRBIB"' < $< > $@' \
                >> "$LIR_DIR"/.makehtml

        rm -r "$tmpdir"
        ;;
make)
        make --jobs --directory="$LIR_DIR" \
                --makefile=.makeall --makefile=.makedag \
                all
        ;;
weave)
        make --directory="$LIR_DIR" \
                --makefile=.makehtml \
                "$BASENAME".html
        cp --verbose --update ".lir/$BASENAME.html" "$BASENAME.html"
        ;;
*)
        echo "ERROR: unknown command $1"
        ;;
esac

fi
# if [ "$#" == "0" ]

Tangling

C16: ⟪Define functions used by lir
Appears in C15

lir-use
lir-mtangle

C17: ⟪lir-mtangle
Appears in C16

lir-mtangle () {
while IFS='' read -r chunk
do
        notangle -t8 -R"$chunk" "$1" | cpif "$2/$chunk"
done
}
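lir-mtangle pipes each extracted chunk through noweb’s cpif, which overwrites the target file only when its content has actually changed; an unchanged file keeps its old timestamp, so make does not rebuild anything that depends on it. A rough stand-in, for illustration only (the name cpif_sketch is ours):

```shell
#!/usr/bin/env bash
# Overwrite "$1" with standard input only if the content differs.
cpif_sketch () {
    local target="$1" tmp
    tmp=$(mktemp)
    cat > "$tmp"
    if cmp --silent "$tmp" "$target" 2>/dev/null
    then rm "$tmp"              # unchanged: keep the old timestamp
    else mv "$tmp" "$target"    # changed or new: replace the file
    fi
}
```

A typical use would mirror the function above: `notangle -t8 -R"$chunk" source.lir | cpif_sketch "$chunk"`.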

Not all data in an analysis naturally fits into the lir source file. Examples already mentioned above are the input data, or an externally maintained file like a bibliography.

Taking such a file in use simply means making a symbolic link to the file in the working directory. However, keep in mind that the externally maintained file still needs to be in the project directory, and its name should be a path relative to the project directory. The make utility checks the timestamp on the target of the link, so when a “used” file changes, this is taken into account.

The argument to the function is the directory in which the symbolic links will be made. The file names are read from standard input, one per line.

C18: ⟪lir-use
Appears in C16

lir-use () {
while IFS='' read -r used
do
        useddir=$(dirname "$used")
        mkdir  --verbose --parents "$1/$useddir"
        original=$(readlink --canonicalize "$used")
        if [ -f "$original" ]
        then
                ln --symbolic --force "$original" "$1/$used"
        else
                >&2 echo "LIR ERROR: File not found! ($original)"
                exit 1
        fi
done
}
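To see the function at work, here is a throwaway run with a simplified copy of it (hypothetical file names, temporary directory):

```shell
#!/usr/bin/env bash
set -o errexit
# Simplified copy of lir-use: read file names from standard input,
# link each one into the directory given as the first argument.
lir_use_sketch () {
    while IFS='' read -r used
    do
        mkdir --parents "$1/$(dirname "$used")"
        original=$(readlink --canonicalize "$used")
        ln --symbolic --force "$original" "$1/$used"
    done
}
work=$(mktemp --directory)
( cd "$work"
  echo 'x,y' > data.csv
  mkdir state
  echo data.csv | lir_use_sketch state
  cat state/data.csv )        # the link resolves to the original file
rm -r "$work"
```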

C19: ⟪Drop double angle brackets
Appears in C15

sed -n 's/^<<\(.*\)>>$/\1/p'

C20: ⟪Remove empty lines
Appears in C15

sed -n '/^\s*$/!p'

C21: ⟪Remove special chunknames
Appears in C15

sed -n '/^:/!p'

C22: ⟪Get source chunknames
Appears in C15

sed -n 's/^:source \(.\+\)$/\1/p'

C23: ⟪Get sink chunknames
Appears in C15

sed -n 's/^:\(figure\|table\|listing\|result\) \(.\+\)/\2/p'
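The four little filters above are easiest to understand on sample input. Here they are applied to a hypothetical list of root chunk names (and, first, to a raw noroots-style line):

```shell
#!/usr/bin/env bash
set -o errexit
# noroots prints names in double angle brackets; drop the brackets.
echo '<<:source data.csv>>' | sed -n 's/^<<\(.*\)>>$/\1/p'
# prints: :source data.csv

rc=$(mktemp)    # a made-up list of root chunk names
printf '%s\n' ':source data.csv' ':make Analysis' \
              ':figure plot.svg' 'script.R' > "$rc"
sed -n '/^:/!p' "$rc"                     # file chunks: script.R
sed -n 's/^:source \(.\+\)$/\1/p' "$rc"   # sources: data.csv
sed -n 's/^:\(figure\|table\|listing\|result\) \(.\+\)/\2/p' "$rc"  # sinks: plot.svg
rm "$rc"
```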

Source transformations

C24: ⟪Nameless code chunks as continuations .awk
Appears in C15

BEGIN {
        Setup awk to treat lines as single-field records
        last_name = "<<>>="
}
/^<<>>=$/ { $0 = last_name }
/^<<.+>>=$/ { last_name = $0 }
{ print $0 }
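To see the effect, feed a fragment with a nameless continuation chunk through the filter; the awk program below is the chunk above with its references expanded by hand:

```shell
#!/usr/bin/env bash
printf '%s\n' '<<foo>>=' 'a' '@' '<<>>=' 'b' '@' \
    | awk 'BEGIN { RS="\n"; FS="\n"; last_name = "<<>>=" }
           /^<<>>=$/   { $0 = last_name }
           /^<<.+>>=$/ { last_name = $0 }
           { print $0 }'
# The nameless header <<>>= is replaced with <<foo>>=.
```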

C25: ⟪Close all codechunks with @ .awk
Appears in C15

BEGIN {
        Setup awk to treat lines as single-field records
        in_chunk = 0
}
/Begin code awk regex/ {
        if (in_chunk) print "@"
        in_chunk = 1
}
/End code awk regex/ { in_chunk = 0 }
{ print $0 }
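Run on a fragment where the first chunk is not closed, the filter inserts the missing “@”; again, the program below is the chunk above with its references expanded by hand:

```shell
#!/usr/bin/env bash
printf '%s\n' '<<a>>=' 'x' '<<b>>=' 'y' '@' \
    | awk 'BEGIN { RS="\n"; FS="\n"; in_chunk = 0 }
           /^<<.*>>=$/ { if (in_chunk) print "@"; in_chunk = 1 }
           /^@/        { in_chunk = 0 }
           { print $0 }'
# An "@" is inserted before <<b>>= to close chunk a.
```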

The contents of each code chunk named :eval will be evaluated by an invocation of bash. Whatever is written to standard output will be pasted as it is in place of the code chunk before anything else is done with the source file.

C26: ⟪Evaluate code chunk contents .awk
Appears in C15

BEGIN {
        Setup awk to treat lines as single-field records
}
/^<<:eval>>=$/ {
        Execute contents as standard input of [[evalcmd]]
        Write output of [[evalcmd]] to awk’s output
        next
}
{ print $0 }    

To make awk treat lines of input as records with one field each, set both the record and the field separator to the newline character. With these settings, an empty line (nothing but a new line) will have 0 fields (NF == 0) and a length of 0 (length($0) == 0).

C27: ⟪Setup awk to treat lines as single-field records
Appears in C24, C25, C26

RS="\n"
FS="\n"
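This claim is easy to check: with both separators set to the newline character, only the empty record reports zero fields and zero length:

```shell
#!/usr/bin/env bash
printf 'one\n\ntwo\n' \
    | awk 'BEGIN { RS="\n"; FS="\n" } { print NR, NF, length($0) }'
# 1 1 3
# 2 0 0
# 3 1 3
```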

C28: ⟪Execute contents as standard input of [[evalcmd]]
Appears in C26

while (getline > 0 && $0 !~ /End code awk regex/)
        print $0 |& evalcmd
close(evalcmd, "to")

C29: ⟪Write output of [[evalcmd]] to awk’s output
Appears in C26

save_rs = RS
RS = "^$"
evalcmd |& getline x
close(evalcmd)
printf "%s", x
RS = save_rs

C30: ⟪Begin code awk regex
Appears in C25

^<<.*>>=$

C31: ⟪End code awk regex
Appears in C25, C28

^@

Weaving

Weaving means generating the final human-readable documentation from the lir source file. The original source is transformed by several filters before being passed to Pandoc, which generates an HTML document that can be viewed in a browser. Everything is wrapped in a bash script, lir-weave, that is installed along with the other Lir components. It reads the lir source from standard input and writes the HTML to standard output.

C32: ⟪lir-weave
root chunk

Bash header
Weave lir source to HTML

C33: ⟪Weave lir source to HTML
Appears in C32

Source to noweb pipeline representation \
| Noweb pipeline representation to Pandoc source \
| Pandoc source to HTML

A custom noweb pipeline is used, as we definitely want to be able to use “anonymous” chunks as continuations of the previous chunk. Since tabs can be significant, for example in Makefiles, those are preserved (the -t option to markup). The -delay option to noidx makes sure that the index of code chunks is emitted before the last document chunk. This is necessary to be able to generate a bibliography with Pandoc.

C34: ⟪Source to noweb pipeline representation
Appears in C33

"Path to noweb"/markup -t \
        | "Path to noweb"/noidx -delay

To make it marginally easier to deal with the pipeline representation, we completely remove any empty text tokens from it before transforming it to Pandoc source. We then use nwpipe-pandoc.pl to transform the noweb pipeline representation of the lir source file to a valid Pandoc source file.

C35: ⟪Noweb pipeline representation to Pandoc source
Appears in C33

sed -n '/^@text $/!p' \
| swipl -q -g main -t "halt(2)" Path to lir/nwpipe-pandoc.pl

C36: ⟪Pandoc source to HTML
Appears in C33

pandoc "$@" \
        --preserve-tabs \
        --from=markdown \
        --table-of-contents \
        --standalone \
        --self-contained \
        --to=html5+smart \
        --css="Path to lir"/lir.css \
        --output=-

Lir to Pandoc

The program nwpipe-pandoc.pl is written in Prolog, using the SWI-Prolog implementation (Wielemaker et al. 2012). It is written as a stand-alone command line program: once compiled, it can be run from the command line as part of a pipe. It uses the module driver.pl to do the actual work.

C37: ⟪nwpipe-pandoc.pl
root chunk

No banner or informational messages
Expand meta-predicates at compile time
Succeed once without choice points

:- use_module(driver, [nwpipeline_pandoc/1]).

main :-
        current_prolog_flag(argv, Argv), % command line arguments
        goal_is_det(nwpipeline_pandoc(Argv)),
        halt. % exit with success
main :-
        throw(error(mode_error(failure),_)),
        halt(1). % otherwise, exit with failure (1)

During development, it is good to know if the program has left behind any unintentional choice points – they are certainly errors, as the program is designed to succeed deterministically. They would be “hidden” by the halt/0 used at the end of the main program.

C38: ⟪Succeed once without choice points
Appears in C37

goal_is_det(Goal) :-
        setup_call_cleanup(true, Goal, Det = true),
        (       Det == true
        ->      true
        ;       !,
                throw(error(mode_error(notdet),_))
        ).

Prolog programs are usually used from the “top level” (the Prolog REPL), not as command-line tools. We need to explicitly turn off the REPL banner and informational messages written to standard output.

C39: ⟪No banner or informational messages
Appears in C37

:- set_prolog_flag(verbose, silent).

Meta-predicates (predicates that take other predicates as arguments) are fundamentally slow: they need to evaluate the passed predicate dynamically. This can be avoided for the more commonly used meta-predicates (maplist/1+N, forall/2 etc) by treating them as macros and expanding them at compile time.

C40: ⟪Expand meta-predicates at compile time
Appears in C37

:- use_module(library(apply_macros)).

The driver uses nwpipe.pl to parse its input to native Prolog terms and lirhtml.pl to emit a valid Pandoc source document with embedded HTML fragments.

C41: ⟪driver.pl
root chunk

:- module(driver, [nwpipeline_pandoc/1]).

:- use_module(nwpipe, [nwpipe_term/2]).
:- use_module(lirhtml, [emit_html/2]).

nwpipeline_pandoc(_) :-
        I = user_input,
        read_string(I, _, S),
        split_string(S, "\r\n", "\r\n", Str_Ls),
        maplist(string_codes, Str_Ls, Ls),
        nwpipe_term(Ls, T),
        emit_html(user_output, T).

Parse the noweb pipeline

This module is used by the driver.pl to convert the list of lines representing the line-oriented noweb pipeline to a Prolog term. This is done in two steps.

  1. Each line in the list is converted to a term;
  2. The flat list of terms is parsed to obtain a term that represents the structure of the document.

C42: ⟪nwpipe.pl
root chunk

:- module(nwpipe, [nwpipe_term/2]).

nwpipe_term(Pipe, Doc) :-
        maplist(nwtoken_term, Pipe, Terms),
        phrase(lir_doc(Doc), Terms).

Noweb token  Prolog term
Flat list of terms  structured term

A noweb token in the noweb pipeline representation (Ramsey 1992) is parsed to become a Prolog term. Here, we use an approach to parsing them described in “The Craft of Prolog” (O’Keefe 1990). We use the keyword at the beginning of each token to look up (deterministically) in a table the list of items that this token contains. Then, each item is converted to a Prolog term using a mini-interpreter for the “language” of noweb tokens.

C43: ⟪Noweb token Prolog term
Appears in C42

:- use_module(library(dcg/basics), [nonblanks//1, integer//1]).

nwtoken_term([0'@|Token], Term) :-
        phrase(token(Back, Term), Token, Back).

token(Back, Term) -->
        nonblanks(NB),
        {       atom_codes(Key, NB),
                keyword(Key, Items, Back, Term)
        },
        items(Items).
Keyword table
Mini-interpreter for items

C44: ⟪Keyword table
Appears in C43

Structural keywords
Tagging keywords

C45: ⟪Structural keywords
Appears in C44

keyword(begin, [space, chunk_kind(K)], _, K).
keyword(end, [], _, end).
keyword(text, [space, string_rest(S, Rest)], Rest, text(S)).
keyword(nl, [], [], nl).
keyword(defn, [space, string_rest(S, Rest)], Rest, defn(S)).
keyword(use, [space, string_rest(S, Rest)], Rest, use(S)).
keyword(quote, [], [], quote).
keyword(endquote, [], [], endquote).

C46: ⟪Tagging keywords
Appears in C44

keyword(file, [space, atom_rest(File, Rest)], Rest, file(File)).
keyword(xref, [space, xref(XRef, Rest)], Rest, XRef).
keyword(index, [space, index(Index, Rest)], Rest, Index).

Cross-referencing keyword table
Index table

C47: ⟪Cross-referencing keyword table
Appears in C46

Basic cross-referencing
Linking previous and next definitions of a code chunk
Continued definitions of the current chunk
The list of all code chunks
Chunks where the code is used

C48: ⟪Index table
Appears in C46

index(beginindex, [], [], index_beginindex).
index(endindex, [], [], index_endindex).

C49: ⟪Basic cross-referencing
Appears in C47

xref(label, [space, atom_rest(L, Rest)], Rest, xref_label(L)).
xref(ref, [space, atom_rest(L, Rest)], Rest, xref_ref(L)).

C50: ⟪Linking previous and next definitions of a code chunk
Appears in C47

xref(prevdef, [space, atom_rest(L, Rest)], Rest, xref_prevdef(L)).
xref(nextdef, [space, atom_rest(L, Rest)], Rest, xref_nextdef(L)).

C51: ⟪Continued definitions of the current chunk
Appears in C47

xref(begindefs, [], [], xref_begindefs).
xref(defitem, [space, atom_rest(L, Rest)], Rest, xref_defitem(L)).
xref(enddefs, [], [], xref_enddefs).

C52: ⟪Chunks where the code is used
Appears in C47

xref(beginuses, [], [], xref_beginuses).
xref(useitem, [space, atom_rest(L, Rest)], Rest, xref_useitem(L)).
xref(enduses, [], [], xref_enduses).
xref(notused, [space, string_rest(Name, Rest)], Rest, xref_notused(Name)).

C53: ⟪The list of all code chunks
Appears in C47

xref(beginchunks, [], [], xref_beginchunks).
xref(chunkbegin,
        [space, atom(L), space, string_rest(Name, Rest)],
        Rest,
        xref_chunkbegin(L, Name)).
xref(chunkuse, [space, atom_rest(L, Rest)], Rest, xref_chunkuse(L)).
xref(chunkdefn, [space, atom_rest(L, Rest)], Rest, xref_chunkdefn(L)).
xref(chunkend, [], [], xref_chunkend).
xref(endchunks, [], [], xref_endchunks).

Using the tables above, we have looked up the exact items that we expect in a noweb token. This is a small interpreter that takes the list of items and converts each item to the corresponding Prolog term.

C54: ⟪Mini-interpreter for items
Appears in C43

items([]) --> [].
items([I|Is]) -->
        item(I),
        items(Is).

Individual items

C55: ⟪Individual items
Appears in C54

Atoms and text
“Space”
Chunk number
Chunk kind
Cross-reference

Identifiers are converted to atoms, while “text” (text and names) are converted to strings:

C56: ⟪Atoms and text
Appears in C55

item(atom(A)) -->
        nonblanks(Codes),
        {   atom_codes(A, Codes)
        }.

% Using when/2 here is probably an ugly hack.
% Could not figure out a better way to deal with
% "rest of line" situations.
item(atom_rest(A, Rest)) -->
        {   when(ground(Rest), atom_codes(A, Rest))
        }.

item(string_rest(Str, Rest)) -->
        {   when(ground(Rest), string_codes(Str, Rest))
        }.

We assume that “Space” is represented by a single “space” character; the “Hacker’s Guide” is not explicit about this, but so far it has always worked.

C57: ⟪“Space”
Appears in C55

item(space) --> [0'\s]. % could it be another "white"?...

Integers are used to enumerate the code and documentation:

C58: ⟪Chunk number
Appears in C55

item(chunk_number(N)) --> integer(N).

Code and documentation chunks are delimited by the same keywords; the Chunk kind is encoded in a secondary keyword:

C59: ⟪Chunk kind
Appears in C55

item(chunk_kind(CK)) -->
        nonblanks(Codes),
        {   atom_codes(CK, Codes)
        }.

Finally, Cross-reference. Note that it employs quite a few secondary keywords collected in their own look up tables, Cross-referencing keyword table and Index table.

C60: ⟪Cross-reference
Appears in C55

item(xref(XRef, Rest)) -->
        nonblanks(Codes),
        {   atom_codes(X, Codes),
                xref(X, Items, Rest, XRef)
        },
        items(Items).
item(index(Index, Rest)) -->
        nonblanks(Codes),
        {   atom_codes(X, Codes),
                index(X, Items, Rest, Index)
        },
        items(Items).

Convert the pipeline to a term

The structure of the document is implicitly present in the noweb pipeline representation: documentation and code chunks are delimited by a start and an end token, new lines and other formatting are contained in text tokens, and so on. The pipeline representation also contains cross-referencing information, using labels and references to them. It is however a bit inconvenient to use this representation for generating markup. This is because the order of tokens in the pipeline representation exactly mirrors the layout of the final human-readable documentation, as generated by noweb, and lir uses a slightly different layout.

At that point, we have a list of Prolog terms. Those are easier to deal with in the context of Prolog, as it is much easier to make the rules deterministic. The general approach is to consume the next input and use it as the first argument to a rule, thus taking advantage of Prolog’s first-argument indexing. Then, the rules defined below describe a state machine where each rule is a state and each rule clause is a transition that depends on the last input.

On the highest level, the noweb pipeline is made of a sequence of files. Note, however, that since in lir the input is standard input (and not a list of files), there will be only one, unnamed file in the pipeline.

C61: ⟪Flat list of terms structured term
Appears in C42

lir_doc(L) --> [file('')],
        lir_rest(L).

lir_rest(L) --> [X], !,
        lir_file(X, L).
lir_rest([]) --> [].

Parse a file

A file is a sequence of documentation and code chunks; parsing it yields a list of chunks and an index. To parse the contents of display-item meta-data, we use the yaml.pl module.

C62: ⟪Parse a file
Appears in C61

:- use_module(yaml, [yaml_to_dict/2]).

lir_file(docs, L) --> [X],
        docs(X, L).
lir_file(code, [Code|L]) -->
        code(C, Name, Label, M),
        {   (   Name = name(N)
                ->  Code = code_name_label_meta(C, N, Label, M)
                ;   Name = display(KW, FN_str)
                ->  code_codelist(C, Codes),
                        yaml_to_dict(Codes, Display_meta),
                        Code = display(KW, FN_str, Display_meta, Label)
                )
        },
        lir_rest(L).
lir_file(nl, [nl|L]) -->
        lir_rest(L).
lir_file(xref_beginchunks, [chunks_list(Cs)|L]) --> [X],
        xref_chunks(X, Cs),
        lir_rest(L).
lir_file(index_beginindex, [index(I)|L]) --> [X],
        index_list(X, I),
        lir_rest(L).

code_codelist(C, L) :-
        maplist(code_atomic, C, L0),
        atomics_to_string(L0, S0),
        string_codes(S0, L).

code_atomic(text(Text), Text).
code_atomic(nl, "\n").

Parse a documentation chunk
Parse a code chunk
Parse the list of chunks
Parse the index

A module that parses a code list containing YAML data into a Prolog dictionary holding a native representation of that data. This module makes use of SWI-Prolog’s library dcg/basics.

C63: ⟪yaml.pl
root chunk

:- module(yaml, [yaml_to_dict/2]).

:- use_module(library(dcg/basics), [white//0,
                                                                        whites//0,
                                                                        string_without//2]).

yaml_to_dict(Codes, Dict) :-
        phrase(yaml_es(Es), Codes),
        dict_create(Dict, yaml, Es).

Make a list of YAML key-value pairs

C64: ⟪Make a list of YAML key-value pairs
Appears in C63

yaml_es([K-V|Es]) --> yaml_key_val(K, V), !, yaml_es(Es).
yaml_es([]) --> [].

Parse one YAML key-value pair

The keyword starts at the very beginning of the line, does not contain any blank characters (space, tab, newline…), and is ended by a colon followed by a single space. The character that follows that space, here called C, determines if this is a block scalar or not.

C65: ⟪Parse one YAML key-value pair
Appears in C64

yaml_key_val(K, V) --> yaml_key_codes(KCs), ": ", !,
        {   atom_codes(K, KCs)
        },
        [C],
        yaml_val(C, V).

Get YAML key
Get YAML value

Any graph character will do, but no blanks of any kind are allowed.

C66: ⟪Get YAML key
Appears in C65

yaml_key_codes([C|Cs]) --> [C],
        {   code_type(C, graph)
        },
        yaml_key_codes(Cs).
yaml_key_codes([]) --> [].

The only recognized block notation is literal style, indicated by a “|”.

C67: ⟪Get YAML value
Appears in C65

yaml_val(0'|, V) --> "\n", !,
        yaml_indented_block(V_codes),
        {   string_codes(V, V_codes)
        }.
yaml_val(C, V_str) --> string_without("\n", Cs), "\n",
        {   string_codes(V_str, [C|Cs])
        }.

Read YAML indented block

All indenting is removed, but all newlines are preserved.

C68: ⟪Read YAML indented block
Appears in C67

yaml_indented_block(Block) --> white, whites, !,
        indented_lines(Block).
yaml_indented_block([]) --> [].

indented_lines([C|Cs]) --> [C],
        {   C \== 0'\n
        },
        !,
        indented_lines(Cs).
indented_lines([0'\n|Block]) --> [0'\n], !,
        yaml_indented_block(Block).
S2: lir.lir

C69: ⟪:make
C7

test-display: lir.lir
	ls -l -h lir.lir > test-display

Listing 2: test-display
lrwxrwxrwx 1 boris boris 32 Jan 30 15:26 lir.lir -> /home/boris/code/own/lir/lir.lir
A test for display environments. This is just a small example, nothing more.

A documentation chunk can contain text, new lines, and quoted code.

C70: ⟪Parse a documentation chunk
Appears in C62

definition continued in C71

docs(end, L) -->
        lir_rest(L).
docs(text(T), [text(Text)|L]) -->
        docs_text(Ts),
        {   atomics_to_string([T|Ts], Text)
        },
        [X],
        docs(X, L).
docs(nl, [nl|L]) --> [X],
        docs(X, L).
docs(quote, [quote(Q)|L]) --> [X],
        quote(X, Q),
        [Y],
        docs(Y, L).

Parse quoted code
C71

Consecutive text tokens that are not interrupted by any other structural content are collected and concatenated.

C71: ⟪Parse a documentation chunk
C70

docs_text([T|Ts]) --> [text(T)], !,
        docs_text(Ts).
docs_text(["\n"|Ts]) --> [nl], !,
        eol(Ts).
docs_text([]) --> [].

eol([T|Ts]) --> [text(T)], !,
        docs_text(Ts).
eol([]) --> [].

C72: ⟪Parse quoted code
Appears in C70

quote(text(T), [text(T)|Q]) --> [X],
        quote(X, Q).
quote(xref_ref(L), [quote_use(N, L)|Q]) --> [use(N), X],
        quote(X, Q).
quote(endquote, []) --> [].

C73: ⟪Parse a code chunk
Appears in C62

code(Cs, Defn, L, M) -->
        [X],
        defn(X, M_pairs0),
        {   selectchk(label(L), M_pairs0, M_pairs1),
                selectchk(defn(Defn), M_pairs1, M_pairs),
                dict_create(M, code, M_pairs)
        },
        code_content(Cs).

Parse the code chunk header
Parse the code chunk contents

C74: ⟪Parse the code chunk header
Appears in C73

defn(nl, []) --> [].
defn(xref_label(L), [label(L)|M]) --> [X],
        defn(X, M).
defn(xref_ref(L), [ref(L)|M]) --> [X],
        defn(X, M).
defn(defn(N), [defn(Name)|M]) -->
        {   (   display_item(N, KW, FN_str)
                ->  Name = display(KW, FN_str)
                ;   Name = name(N)
                )
        },
        [X],
        defn(X, M).
defn(xref_notused(_N), [uses(notused)|M]) --> [X],
        defn(X, M).
defn(xref_beginuses, [uses(Us)|M]) --> [X],
        uses(X, Us, Us),
        [Y],
        defn(Y, M).
defn(xref_prevdef(L), [prev(L)|M]) --> [X],
        defn(X, M).
defn(xref_nextdef(L), [next(L)|M]) --> [X],
        defn(X, M).
defn(language(L), [language(L)|M]) --> [X],
        defn(X, M).
defn(xref_begindefs, [defs(Ds)|M]) --> [X],
        defs(X, Ds, Ds),
        [Y],
        defn(Y, M).

Treat display items differently
Parse defs and uses

C75: ⟪Treat display items differently
Appears in C74

display_item(Name, KW, FN_str) :-
        sub_string(Name, 0, 1, _, ":"),
        once( sub_string(Name, Before_sep, 1, After_sep, " ") ),
        Length_kw is Before_sep - 1,
        sub_string(Name, 1, Length_kw, _After_kw, KW_str),
        display_item_keyword(KW_str, KW),
        Before_fn is Before_sep + 1,
        sub_string(Name, Before_fn, After_sep, 0, FN_str).

/* Seems this is deterministic when 1st argument is a string */
display_item_keyword("source", source).
display_item_keyword("result", result).
display_item_keyword("figure", figure).
display_item_keyword("listing", listing).
display_item_keyword("table", table_name).

C76: ⟪Parse the code chunk contents
Appears in C73

code_content([text(Text)|Cs]) -->
        text_token(T), !,
        code_text(Ts),
        {   atomics_to_string([T|Ts], Text)
        },
        code_content(Cs).
code_content([code_use(L, R, N)|Cs]) -->
        [xref_label(L), xref_ref(R), use(N)], !,
        code_content(Cs).
code_content([]) --> [end].

text_token("\n") --> [nl].
text_token(T) --> [text(T)].
    
code_text([T|Ts]) -->
        text_token(T), !,
        code_text(Ts).
code_text([]) --> [].

C77: ⟪Parse defs and uses
Appears in C74

uses(xref_enduses, _, []) --> [].
uses(xref_useitem(L), Us, [L|Us0]) --> [X],
        uses(X, Us, Us0).

defs(xref_enddefs, _, []) --> [].
defs(xref_defitem(L), Ds, [L|Ds0]) --> [X],
        defs(X, Ds, Ds0).

C78: ⟪Parse the list of chunks
Appears in C62

xref_chunks(xref_endchunks, []) --> [].
xref_chunks(xref_chunkbegin(L, N), [chunk(L, N, Us, Ds)|Cs]) --> [X],
        xref_chunk(X, Us, Ds),
        [Y],
        xref_chunks(Y, Cs).

xref_chunk(xref_chunkend, [], []) --> [].
xref_chunk(xref_chunkuse(U), [chunkuse(U)|Us], Ds) --> [X],
        xref_chunk(X, Us, Ds).
xref_chunk(xref_chunkdefn(D), Us, [chunkdefn(D)|Ds]) --> [X],
        xref_chunk(X, Us, Ds).

We don’t have an index for now.

C79: ⟪Parse the index
Appears in C62

index_list(index_endindex, []) --> [].

Emit Pandoc HTML

C80: ⟪lirhtml.pl
root chunk

definition continued in C92 + C93 + C94 + C95

:- module(lirhtml, [emit_html/2]).

:- use_module(library(http/html_write)).

emit_html(Out, L) :-
        counters_db(L, DB),
        phrase(lir_pandoc(P, DB), L),
        phrase(html(P), H),
        print_html(Out, H).

counters_db/2: Generate counters, add them to a database
lir_pandoc//2: Lir to Pandoc HTML
C92

Add to a database all numbered items: code chunks, display items (figure, listing, table), as well as sources and sinks of the DAG representing the dataflow. For the code chunks, we also format the names. This is necessary because code chunk names are inside HTML tags by the time they reach Pandoc, so they are no longer converted from Markdown; instead, we do the conversion here.

C81: ⟪counters_db/2: Generate counters, add them to a database
Appears in C80

counters_db(L, DB) :-
        include(is_code_name_label_meta, L, CNLMs),
        maplist(code_name_label_meta_Name_Label, CNLMs, Ns, Ls),
        names_fmt_fmtd(Ns, html5, FNs),
    
        pairs_keys_values(LFs, Ls, FNs),
        dict_create(CC_names, name, LFs),
        findall(Label-Nr, nth1(Nr, Ls, Label), LNrs),
        dict_create(CC_nrs, nr, LNrs),

        maplist(display_dict(L),
                        [source, result, listing, figure, table_name],
                        [DSNrs, DRNrs, DLNrs, DFNrs, DTNrs]),

        maplist(dict_create,
                        [CodeD, SourceD, ResultD, ListingD, FigureD, TableD],
                        [code,  source,  result,  listing,  figure,  table_name],
                        [[name:CC_names, nr:CC_nrs],
                          [namestr:"S", nr:DSNrs],
                          [namestr:"Result ", nr:DRNrs],
                          [namestr:"Listing ", nr:DLNrs],
                          [namestr:"Figure ", nr:DFNrs],
                          [namestr:"Table ", nr:DTNrs]]),

        dict_create(DB, db,
                                [code:CodeD,
                                  source:SourceD,
                                  result:ResultD,
                                  listing:ListingD,
                                  figure:FigureD,
                                  table_name:TableD]).

is_code_name_label_meta(code_name_label_meta(_,_,_,_)).
code_name_label_meta_Name_Label(code_name_label_meta(_,N,L,_), N,L).

display_dict(L, Item, Dict) :-
        include(is_display_item(Item), L, Ls),
        findall(Label-Nr,
                        nth1(Nr, Ls, display(Item, _, _, Label)),
                        DNrs),
        dict_create(Dict, nr, DNrs).

is_display_item(Item, display(Item,_,_,_)).
    
Use Pandoc to pre-format chunk names

C82: ⟪Use Pandoc to pre-format chunk names
Appears in C81

:- use_module(library(process), [process_create/3]).
:- use_module(library(sgml), [load_structure/3]).

to_par(S, PS) :- atomics_to_string(["<span>", S, "</span>"], PS).

names_fmt_fmtd([], html5, []).
names_fmt_fmtd([N|Ns], html5, FNs) :-
        maplist(to_par, [N|Ns], PNs),
        atomics_to_string(PNs, Ns_str),
        process_create(path(pandoc),
                                      ["--to=html5"],
                                      [stdin(pipe(Pandoc_in)),
                                        stdout(pipe(Pandoc_out))]),
        format(Pandoc_in, "~s", [Ns_str]),
        close(Pandoc_in),
        load_structure(Pandoc_out, DOM, [dialect(xml)]),
        close(Pandoc_out),
        (   DOM = [element(p, [], Names)]
        ->  maplist(names_fmtd, Names, FNs)
        ;   FNs = []
        ).

:- use_module(library(sgml_write), [xml_write/3]).

names_fmtd(element(span, [], StrDOM), Fmtd) :-
        with_output_to(string(Fmtd),
                                      xml_write(current_output,
                                                          StrDOM,
                                                          [header(false), layout(false)])).

C83: ⟪: foo
root chunk

This is a test. There was a problem with chunks that start with a colon
followed by a space, as they were interpreted as definitions by Pandoc.

C84: ⟪lir_pandoc//2: Lir to Pandoc HTML
Appears in C80

lir_pandoc(P, DB) -->
        [X], !,
        lir_pandoc(X, P, DB).
lir_pandoc([], _DB) --> [].

Lir to Pandoc: individual items
HTML for common symbols

C85: ⟪Lir to Pandoc: individual items
Appears in C84

Lir documentation to Pandoc
Lir code chunks to Pandoc
Lir display items to Pandoc
Ignore the list of chunks and the index
Individual display items

C86: ⟪Lir documentation to Pandoc
Appears in C85

lir_pandoc(nl, ["\n"|P], DB) --> lir_pandoc(P, DB).
lir_pandoc(text(T), [\[T]|P], DB) --> lir_pandoc(P, DB).
lir_pandoc(quote(Q), [span(class(quote), QC)|P], DB) -->
        {   phrase(lir_pandoc(QC, DB), Q)
        },
        lir_pandoc(P, DB).
lir_pandoc(quote_use(Name, Label),
                        [span(class("quoteduse"),
                                    [\openparen, \thinnbsp,
                                      a(href("#~a"-Label), Name),
                                      \thinnbsp, \closeparen])|P], DB) -->
        lir_pandoc(P, DB).

C87: ⟪Lir code chunks to Pandoc
Appears in C85

lir_pandoc(code_name_label_meta(C, _, L, M),
                      [div(class("codechunk"),
                                [\chunk_defn(M, L, DB),
                                  \chunk_uses(M, DB),
                                  \chunk_defs(M, DB),
                                  \chunk_prev(M, DB),
                                  pre(\chunk_content(C, DB)),
                                  \chunk_next(M, DB)])|P], DB) -->
        lir_pandoc(P, DB).

C88: ⟪Lir display items to Pandoc
Appears in C85

lir_pandoc(display(KW, FN_str, Meta, Label),
                      [div([class("lirdisplay"),id(Label)],
                                [\display(KW, FN_str, Meta, Label, DB)])|P], DB) -->
        lir_pandoc(P, DB).

C89: ⟪Ignore the list of chunks and the index
Appears in C85

lir_pandoc(chunks_list(_Cs), P, DB) -->
        lir_pandoc(P, DB).
lir_pandoc(index(_), P, DB) --> [], lir_pandoc(P, DB).

C90: ⟪HTML for common symbols
Appears in C84

nbsp --> html(&(nbsp)).
thinnbsp --> html(span(style("white-space:nowrap"), &(0x2009))).
openparen --> html(&('Lang')). % Lang
closeparen --> html(&('Rang')). % Rang
prevsym --> html(&(0x21E7)). % UPWARDS WHITE ARROW
nextsym --> html(&(0x21E9)). % DOWNWARDS WHITE ARROW
contsym --> html(&(darr)).
contmoresym --> html(&(0x21E3)). % DOWNWARDS DASHED ARROW
defeq --> html(&(equiv)).
defend --> html(&(0x25FC)). % Black square

C91: ⟪Individual display items
Appears in C85

:- discontiguous display//5.

display(source, FN, _, Label, DB) -->
        display_file(source, FN, Label, DB).
display(result, FN, _, Label, DB) -->
        display_file(result, FN, Label, DB).

display_file(Item, FN, Label, DB) -->
        {   absolute_file_name(FN, Absolute_FN)
        },
        html(div(class("lir~a"-Item),
                          [span(class("~aname"-Item),
                                      ["~s~d:"-[DB.Item.namestr,
                                                          DB.Item.nr.Label]]),
                            \nbsp,
                            span(class("~afile"-Item),
                                      a(href(Absolute_FN), code(FN)))])).

display(listing, FN, Meta, Label, DB) -->
        {   file_contents(FN, FC)
        },
        html(div(class("lirlisting"),
                          [\display_header(listing, FN, Label, DB),
                            div(class("listingcontents"), pre([FC])),
                            \display_title_caption(listing, Meta)])).

file_contents(FN, FC) :-
        setup_call_cleanup(open(FN, read, In),
                                              read_string(In, _, FC),
                                              close(In)).

display_header(Item, FN, Label, DB) -->
        {   absolute_file_name(FN, Absolute_FN)
        },
        html(div(class("~aheader"-Item),
                          [span(class("~aname"-Item),
                                      ["~s~d:"-[DB.Item.namestr,
                                                          DB.Item.nr.Label]]),
                            \nbsp,
                            span(class("~afile"-Item),
                                      a(href(Absolute_FN), code(FN)))])).

display_title_caption(Item, M) -->
        html(div(class("~ameta"-Item),
                          [span(class("~atitle"-Item), [M.title]),
                            \display_optional_caption(Item, M)])).

display_optional_caption(Item, M) -->
        {   yaml{caption:Caption} :< M
        },
        !,
        html([" ", span(class("~acaption"-Item), [Caption])]).
display_optional_caption(_, _) --> [].

display(table_name, FN, Meta, Label, DB) -->
        {   file_contents(FN, FC)
        },
        html(div(class("lirtable"),
                          [\display_header(table_name, FN, Label, DB),
                            table([\[FC]]), % a real <table> element; table_name is only the internal keyword
                            \display_title_caption(table_name, Meta)])).

display(figure, FN, Meta, Label, DB) -->
        html(div(class("lirfigure"),
                          [\display_header(figure, FN, Label, DB),
                            \figure_contents(FN),
                            \display_title_caption(figure, Meta)])).

figure_contents(FN) -->
        html(div(class("figurecontents"),
                          [img(src("~s"-FN))])).

Chunk contents:

C92: ⟪lirhtml.pl
C80

chunk_content([], _DB) --> [].
chunk_content([C|Cs], DB) -->
        chunk_content_(C, Cs, DB).

chunk_content_(nl, Cs, DB) -->
        html("\n"),
        chunk_content(Cs, DB).
chunk_content_(text(T), Cs, DB) -->
        html(T),
        chunk_content(Cs, DB).
chunk_content_(code_use(L, R, _), Cs, DB) -->
        html(span(class("embeddeduse"),
                            [\openparen, \thinnbsp,
                              a([id(L), href("#~a"-R)],
                                  [\[DB.code.name.R]]),
                              \thinnbsp, \closeparen])),
        chunk_content(Cs, DB).
C93

Code chunk headers:

C93: ⟪lirhtml.pl
C92

chunk_defn(_M, L, DB) -->
        html(span([class("defn"),id(L)],
                            [span(class("chunknr"), "C~d:"-DB.code.nr.L), \nbsp,
                              \openparen, \thinnbsp,
                              a(href("#~a"-L), \[DB.code.name.L]),
                              \thinnbsp, \closeparen, \thinnbsp, \defeq])).
chunk_uses(M, DB) -->
        {   code{uses:Us} :< M
        }, !,
        html(span(class("chunkuses"), \chunk_uses_(Us, DB))).
chunk_uses(_, _) --> [].
C94

Uses in the header:

C94: ⟪lirhtml.pl
C93

chunk_uses_([U|Us], DB) -->
        html([br([]),
                  "Appears in ",
                  a(href("#~a"-U),["C~d"-DB.code.nr.U])]),
        chunk_uses_rest(Us, DB).
chunk_uses_(notused, _DB) --> html([br([]), "root chunk"]).
chunk_uses_rest([], _DB) --> [].
chunk_uses_rest([U|Us], DB) -->
        html([", ", a(href("#~a"-U), ["C~d"-DB.code.nr.U])]),
        chunk_uses_rest(Us, DB).
C95

Previous and next chunks in the header:

C95: ⟪lirhtml.pl
C94

chunk_next(M, DB) -->
        {   code{next:L} :< M
        }, !,
        html(span(class("chunknext"),
                  a(href("#~a"-L),
                      [\nextsym, \thinnbsp, "C~d"-DB.code.nr.L]))).
chunk_next(_, _DB) -->
        html(\defend).

chunk_prev(M, DB) -->
        {   code{prev:L} :< M
        },
        html([br([]),
                    span(class("chunkprev"),
                              a(href("#~a"-L),
                                  [\prevsym, \thinnbsp, "C~d"-DB.code.nr.L]))]).
chunk_prev(_, _DB) --> [].

chunk_defs(M, DB) -->
        {   code{defs:Ds} :< M
        }, !,
        html(span(class("chunkdefs"), \chunk_defs_(Ds, DB))).
chunk_defs(_, _) --> [].

chunk_defs_([D|Ds], DB) -->
        html([br([]),
                    "definition continued in ",
                    a(href("#~a"-D),
                        [\contsym, \thinnbsp, "C~d"-DB.code.nr.D])]),
        chunk_defs_rest(Ds, DB).
chunk_defs_rest([], _) --> [].
chunk_defs_rest([D|Ds], DB) -->
        html([" + ",
                    a(href("#~a"-D),
                        [\contmoresym, \thinnbsp, "C~d"-DB.code.nr.D])]),
        chunk_defs_rest(Ds, DB).

Layout

The system tries to separate content and layout as much as possible. It also aims to provide a sensible default layout for the human-readable documentation.

Code chunk language

The program in langs.cpp can be used as a filter to noweb. It deduces the programming language of code chunks based on the names of the chunks. This is a table that maps known extensions to programming language names.

C96: ⟪Ext-Lang
Appears in C111

{"pl", "prolog"},
{"sh", "bash"},
{"bash", "bash"},
{"cpp", "cpp"},
{"R", "R"},
{"awk", "awk"},
{"sed", "sed"},
{"css", "css"}

The program implements a state machine with the states necessary to extract code chunk names and the uses within a code chunk. The start state is out; when a code chunk starts, it transitions to code, where it expects a code chunk name, and transitions to content; while in content, it collects uses, and goes back to out at the end of the code chunk. Throughout, each line from input is saved to a list of all lines.

C97: ⟪langs.cpp
root chunk

Includes (langs)
Function definitions (langs)

int main()
{
        Variable definitions (langs)

out:
{   
        Out transitions
        goto out;
}

code:
{
        Code transitions
        goto code;
}

content:
{
        Content transitions
        Content: collect uses
        goto content;
}

end:
{
        Propagate language to uses
        Output all lines, with language after each defn
        return 0;
}

error:
        return 1;
}

When in out, end of input signals transition to the end state. A @begin code token signals a transition to code.

C98: ⟪Out transitions
Appears in C97

if (!std::getline(std::cin, line)) goto end;
lines.push_back(line);

if (string_prefix(line, {"@begin code"})) goto code;

When in code, end of input is an error. A defn token contains the code chunk name. The name is processed and the state machine transitions to content.

C99: ⟪Code transitions
Appears in C97

if (!std::getline(std::cin, line)) goto error;
lines.push_back(line);

if (string_prefix_rest(line, {"@defn "}, name)) {
        Process code chunk name
        goto content;
}

A code chunk name is inserted into the DAG of code chunks. Initially the set of neighbours is empty. Note that std::map::insert will only insert if the key does not yet exist, so it is safe to do this unconditionally.

C100: ⟪Process code chunk name
Appears in C99

uses.insert({name, {}}); 
Try to set code chunk language

If the language of the chunk can be guessed from the name, the chunk name and its language are recorded. The code chunk name is also added to the queue later used for the breadth-first traversal of the DAG of code chunks.

C101: ⟪Try to set code chunk language
Appears in C100

std::string cl;
if (name_dict_lang(name, langs, cl)) {
        chunk_lang.insert({name, cl});
        pending.push(name);
}

When in content, end of input is an error. An @end code token signals the transition back to out.

C102: ⟪Content transitions
Appears in C97

if (!std::getline(std::cin, line)) goto error;
lines.push_back(line);

if (string_prefix(line, {"@end code"})) goto out;

When in content, each @uses token is used to populate the set of neighbours of the current code chunk in the DAG.

C103: ⟪Content: collect uses
Appears in C97

std::string un;
if (string_prefix_rest(line, {"@use "}, un)) uses[name].insert(un);

The language of a code chunk can be deduced in two ways. First, from the code chunk name; this is done as soon as the code chunk is encountered for the first time (see Try to set code chunk language). Second, if a code chunk without a known language is used by a code chunk with a language, it inherits it. To achieve this, a breadth-first traversal of the DAG of code chunks is used. Initially, the queue holds the names of all chunks with known languages (also done while reading). Any child code chunk that does not yet have a language has its language set to that of the parent, and is pushed to the back of the queue.

C104: ⟪Propagate language to uses
Appears in C97

while (!pending.empty()) {
        std::string next = pending.front();
        pending.pop();

        for (auto c : uses[next]) {
                auto x = chunk_lang.find(c);
                if (x == chunk_lang.cend()) {
                        chunk_lang.insert({c, chunk_lang[next]});
                        pending.push(c);
                }
        }
}

At the end, all saved lines are emitted. After each defn token, a language token is added. For code chunks that don’t have a language, the txt language is used.

C105: ⟪Output all lines, with language after each defn
Appears in C97

for (auto l : lines) {
        std::cout << l << '\n';

        if (string_prefix_rest(l, {"@defn "}, name)) {
                auto x = chunk_lang.find(name);
                if (x != chunk_lang.cend())
                        std::cout << "@language " << x->second << '\n';
                else
                        std::cout << "@language txt\n";
        }
}

These variables are “global” to the main function; in other words, they are available to all states of the state machine.

C106: ⟪Variable definitions (langs)
Appears in C97

definition continued in C112

A list of all lines as they are read
The DAG of the code chunks as adjacency-list
The language of each code chunk
A queue for the traversal of the chunks DAG
Mapping of extensions to languages
C112

C107: ⟪A list of all lines as they are read
Appears in C106

std::list<std::string> lines{};

C108: ⟪The DAG of the code chunks as adjacency-list
Appears in C106

std::map<std::string, std::set<std::string>> uses{};

C109: ⟪The language of each code chunk
Appears in C106

std::map<std::string, std::string> chunk_lang{};

C110: ⟪A queue for the traversal of the chunks DAG
Appears in C106

std::queue<std::string> pending{};

C111: ⟪Mapping of extensions to languages
Appears in C106

const std::map<std::string, std::string> langs{Ext-Lang};

Two strings, one for the last line that was read and one for the name of the last code chunk defined with defn:

C112: ⟪Variable definitions (langs)
C106

std::string line;
std::string name;

The necessary standard libraries:

C113: ⟪Includes (langs)
Appears in C97

#include <iostream>
#include <string>
#include <list>
#include <map>
#include <set>
#include <queue>

C114: ⟪Function definitions (langs)
Appears in C97

Does a string have a prefix?
Can you guess the language from the name?

C115: ⟪Does a string have a prefix?
Appears in C114

definition continued in C116

bool
string_prefix(const std::string& s, const std::string& t)
{
        if (0 == s.compare(0, t.length(), t)) return true;
        return false;
}
C116

A three-argument version to get the rest, too:

C116: ⟪Does a string have a prefix?
C115

bool
string_prefix_rest(const std::string& s, const std::string& t,
                                      std::string& rest)
{
        if (string_prefix(s, t)) {
                rest = s.substr(t.length());
                return true;
        }
        return false;
}

C117: ⟪Can you guess the language from the name?
Appears in C114

bool
name_dict_lang(const std::string& n,
                              const std::map<std::string, std::string>& d,
                              std::string& l)
{
        size_t x = n.find_last_of('.');
        if (x == std::string::npos) return false;

        ++x;
        auto ext_lang = d.find(n.substr(x));
        if (ext_lang == d.cend()) return false;

        l = ext_lang->second;
        return true;
}

HTML

For HTML documentation, lir.css is used.

C118: ⟪lir.css
root chunk

Text width and margin
Headers in sans-serif
Quoted code in monospace
Use names typesetting
Highlight chunk names on hover
List of chunks formatting
Display item formatting

The text width is limited and a left margin is inserted to improve readability.

C119: ⟪Text width and margin
Appears in C118

p {
        max-width: 14cm;
}
blockquote {
        max-width: 11cm;
        font-size: small;
}
ul, ol, dl,
span.chunkdefs {
        max-width: 12cm;
}
body {
        padding-left: 1cm;
}

C120: ⟪Headers in sans-serif
Appears in C118

h1, h2, h3 {
        font-family: sans-serif;
        max-width: 12cm;
}

C121: ⟪Use names typesetting
Appears in C118

span.sourcefile a,
span.resultfile a,
span.listingfile a,
span.figurefile a,
span.chunkdefs a,
span.chunkuses a,
span.chunkprev a,
span.chunknext a,
span.defn a,
span.quoteduse a,
span.embeddeduse a {
        text-decoration-line: none;
}
span.chunkdefs,
span.chunkuses,
span.chunkprev,
span.chunknext {
        font-size: small;
}
span.quoteduse {
        font-family: serif;
}
span.embeddeduse {
        font-family: serif;
}
span.chunknr {
        font-weight: bold;
}

C122: ⟪Highlight chunk names on hover
Appears in C118

span.quoteduse a {
        color: darkGreen;
}
span.quoteduse a:hover {
        color: limeGreen;
}
span.embeddeduse a {
        color: darkblue;
        font-style: italic;
}
span.embeddedusesym {
        font-style: normal;
}
span.embeddeduse a:hover {
        color: dodgerBlue;
}
span.sourcefile a,
span.resultfile a,
span.listingfile a,
span.figurefile a,
span.chunkdefs a,
span.chunkprev a,
span.chunknext a,
span.defn a,
span.chunkuses a {
        color: darkred;
}
span.sourcefile a:hover,
span.resultfile a:hover,
span.listingfile a:hover,
span.figurefile a:hover,
span.chunkdefs a:hover,
span.chunkprev a:hover,
span.chunknext a:hover,
span.defn a:hover,
span.chunkuses a:hover {
        color: darkGoldenRod;
}

C123: ⟪Quoted code in monospace
Appears in C118

span.quote {
        font-family: monospace;
}

C124: ⟪List of chunks formatting
Appears in C118

ul.chunkslist {
        list-style-type: square;
}

C125: ⟪Display item formatting
Appears in C118

div.lirtable th {
        border-style: none none solid none;
        border-width: 0 0 1px 0;
}
div.lirtable th,
div.lirtable td {
        padding-left: 3.3mm;
        padding-right: 3.3mm;
        padding-top: 0.9mm;
        padding-bottom: 0.9mm;
}
div.lirtable table {
        font-family: sans-serif;
        font-size: small;
        margin-top: 3mm;
        margin-bottom: 3mm;
        border-collapse: collapse;
        border-style: solid none solid none;
        border-width: 2px 0 1px 0;
}
div.lirlisting {
        max-width: 14cm;
}
span.sourcename,
span.resultname,
span.tablename,
span.figurename,
span.listingname {
        font-weight: bold;
}
div.figurecontents img {
        display: block;
        max-width: 14cm;
        width: auto;
        height: auto;
}
div.listingcontents {
        font-size: small;
        margin-top: 3mm;
        margin-bottom: 3mm;
        border-style: solid none solid none;
        border-width: 0.3mm;
}
span.sourcefile,
span.resultfile,
span.tablefile,
span.figurefile,
span.listingfile {
        font-style: italic;
}
div.tablemeta,
div.figuremeta,
div.listingmeta {
        max-width: 14cm;
        font-size: small;
}
span.tabletitle,
span.figuretitle,
span.listingtitle {
        font-weight: bold;
}
div.codechunk,
div.lirlisting,
div.lirtable,
div.lirfigure {
        margin-bottom: 3mm;
}

Bibliography

O’Keefe, Richard A. 1990. The Craft of Prolog. Edited by Ehud Shapiro. The MIT Press.

Ramsey, Norman. 1992. “The Noweb Hacker’s Guide.” Departement of Computer Science Princeton University September, September. http://www.cs.tufts.edu/~nr/noweb/guide.ps.

Wielemaker, Jan, Tom Schrijvers, Markus Triska, and Torbjörn Lager. 2012. “SWI-Prolog.” Theory and Practice of Logic Programming 12 (1-2): 67–96.