Reproducible computing with Lir

Boris Vassilev

2020-01-30

Lir is a tool for reproducible computing. The name Lir stands for literate, reproducible computing. With Lir you can organize, document, and automate your work flow.

Any combination of programming languages and software platforms can be used in the same work flow. All executable code is maintained in the Lir source file, along with rules for generating the results of running each program. A Lir source file additionally contains the documentation of a project.

In this document, we provide detailed Lir installation instructions, followed by a User’s Guide, and the full implementation of Lir at the end.

This document is self-hosting: the implementation of Lir presented here can be used to produce the final human-readable version of this source file. This makes informal validation somewhat easier (“does the final document look right?”), but it also means that the only guarantee is that it implements enough of Lir to be able to compile itself.

Installation

You need a reasonably modern Linux/BSD operating system. Any of the major GNU/Linux and BSD distributions, as well as OS X, fulfils that requirement. It should be possible to use Lir on Microsoft Windows 10, but at the moment this is not officially supported; if you try installing Lir on Windows 10, please share your experience.

In particular, you will need an up-to-date version of the GNU Toolchain (autoconf, make, gcc, binutils, glibc), as well as Bash along with the standard command line tools like grep, sed, and awk (to name a few).

Prerequisites

Furthermore, you need to install the following software: GNU Make, Pandoc, noweb, and SWI-Prolog.

Lir makes use of existing tools for literate programming, making, and markup. A working knowledge of these tools is not required, but it is beneficial, at least at the beginning.

Makefiles

It would help you if you had some working knowledge of writing makefiles, as understood by GNU Make. Reading the official documentation is a good start; at least take a look at the first few chapters.
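As a reminder of the basic shape, a make rule names a target, its prerequisites, and a tab-indented recipe; the file names below are made up for illustration:

```makefile
# target : prerequisites
#     recipe (each recipe line MUST start with a tab)
results.txt: analyze.sh input.dat
	bash analyze.sh input.dat > results.txt
```

make re-runs the recipe only when one of the prerequisites is newer than the target.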

Markdown

You need to have a basic idea of the concept of markdown. Armed with this knowledge, open the excellent manual provided by Pandoc and use it as a reference.

Literate Programming with noweb

Understanding the concept of literate programming is a must. You may take a look at the original publication by Donald E. Knuth (pdf link); you can also skip this at first. The Wikipedia page for noweb contains enough detail to get you acquainted with the syntax. A Lir source file is a valid noweb source file.

Once you have all prerequisites, clone the Lir repository and change to the lir directory:

$ git clone https://github.com/borisvassilev/lir.git
$ cd lir

Before you can install Lir, you need to adjust three paths that may differ between installations.

Setting the installation paths

Figure out where you have installed the noweb library files. If you installed from the official Ubuntu directories, this should be /usr/lib/noweb.

If you cannot find this directory, the directory where the files were installed is mentioned towards the bottom of the nowebfilters man page, under the FILES section. So try this:

$ man nowebfilters

If for some reason this doesn’t work either, you can try looking for one of the files in the library, for example:

$ locate emptydefn

If you cannot find a file called emptydefn on your system, you have not installed noweb correctly: please do so now.

Now that you know where the library files are, you need to open the Lir source file lir.lir and edit the three paths on lines 96, 100, and 105 of this file.

First, the folder where all lir components are installed:

C1: ⟪Path to lir
Appears in C10, C15, C35, C36

"$HOME"/lib/lir

Then, the folder where the main lir script will be installed; make sure it is on your system PATH.

C2: ⟪Path to lir binaries
Appears in C10

"$HOME"/bin

Finally, the folder containing the noweb library files. If you installed noweb normally (and as root), this will read /usr/lib/noweb.

C3: ⟪Path to noweb
Appears in C10, C34

"$HOME"/lib/noweb

Now save and close the file lir.lir, and run:

$ bash bootstrap

This will extract and install all Lir files to your computer, and use the installed Lir to compile this source file to an HTML document. You can use the command line above to reinstall Lir if you update it, or if you change the source.

User’s Guide

Lir defines a syntax for describing a reproducible computation, and provides the tools for running the computation on the input files to obtain results. You must have already read the Tutorial; you can use this Guide as a reference.

Source file

The Lir source file is a valid noweb source file. Lir adds semantics by recognizing and interpreting keywords embedded in code chunk names. A Lir source file has the extension “.lir”. It is a plain text file. Within this file, there are code chunks, as defined by noweb, and documentation chunks (everything between and around code chunks). Documentation chunks may be formatted and structured with markdown as understood by Pandoc.

Code chunks

A code chunk starts with a code chunk header, followed by the contents of the code chunk, and a code chunk footer. The code chunk header is the string “<<Code chunk name>>=” on a line by itself. The code chunk footer is the string “@” on a line by itself. If the code chunk must contain a line starting with “<<” or “@”, use an additional “@” to escape these: “@<<” and “@@”.

A code chunk ends with a footer line or with the header line of another code chunk.
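For instance, a chunk whose body must contain literal lines beginning with “@” or “<<” could be written like this (a made-up example in noweb syntax):

```
<<show the escapes>>=
@@ this line begins with a literal @
@<<this line begins with literal double angle brackets>>
@
```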

Here is a link to the Lir source of the few following paragraphs; please read those in the raw Lir source to understand the code chunk syntax and examples!

Below, we see a code chunk with the name “: Just an example”.

C4: ⟪: Just an example
root chunk

These are the contents
of this chunk.
Another code chunk is included by a name reference.

Code chunks may refer to other code chunks by name: just put the name in double angle brackets. Whenever a code chunk is extracted, all code chunk references are replaced with their contents. If we define:

C5: ⟪Another code chunk
Appears in C4

Yet another code chunkMoar content!1!!!1Yet another code chunk

C6: ⟪Yet another code chunk
Appears in C5

***

Then, the full contents of : Just an example, if extracted from the source file, will be:

These are the contents
of this chunk.
***Moar content!1!!!1*** is included by a name reference.
The “tangled” code chunk

This is how we generate the listing:

C7: ⟪:make
root chunk

definition continued in C69

just-an-example-tangled: lir.lir
	notangle -R": Just an example" $< > $@

C69
S1: lir.lir

(If you don’t understand what happens in the lines above, please read the Tutorial.)

Special code chunks

All code chunks that have names starting with a colon (:) are given special treatment. The code chunk : Just an example is a code chunk that will not be considered for extracting.

Sources and sinks

We represent the work flow as a directed, acyclic graph (DAG). In this DAG, each node is a data object: a file. Some of these files are input data: they are the sources of the DAG. Some of these files are final results: they are the sinks of the DAG.

The sources of the DAG (the input files to the work flow) must be declared explicitly in a special code chunk whose name is “:source ” followed by the file name. For example, this is how we declared :source lir.lir above.

Display items

To declare a placeholder for a display item – a Figure, a Table, or a Listing – use a code chunk with a special name. The name must start with “:figure”, “:table”, or “:listing”, followed by a single space and the name of the file that should be displayed.

Here is a link to the following few paragraphs in the Lir source file; you need to look at the source to understand the examples!

In the contents of a display item, you must declare the title of the display item, and you may declare a caption for the display item. This is a figure with a caption:

Figure 1: example.svg
Example figure This is the caption for the display item.

Make rules

All dependencies and rules are defined within make code chunks. The names of these code chunks must begin with “:make”. Here is an example:

C8: ⟪:make The example above
root chunk

example.svg: make-example.R
	Rscript --vanilla $< $@

This defines a rule for generating the example SVG image displayed in :figure example.svg. The rule has one target, example.svg, and one prerequisite, the R script make-example.R. It defines a rule for generating the target: by executing Rscript with the command line argument --vanilla and the script as the first argument and the target file as the second argument (note that obviously, you need to have R installed on your computer if you want to generate this figure and the final document).
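The point of expressing the work flow as make rules is that make only re-runs a recipe when a prerequisite is newer than the target. Here is a self-contained sketch of that behaviour, with a rule of the same shape but hypothetical file names (no R needed):

```shell
#!/usr/bin/env bash
set -o errexit
dir=$(mktemp --directory)
# A one-rule makefile: out.txt is built from in.txt.
printf 'out.txt: in.txt\n\tcp in.txt out.txt\n' > "$dir"/Makefile
echo hello > "$dir"/in.txt
make -C "$dir"       # first run: executes the recipe
make -C "$dir"       # second run: reports that out.txt is up to date
rm -r "$dir"
```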

Root code chunks

A root code chunk is a code chunk that is not referenced by name in any other code chunk.

Executable code

All root code chunks that don’t have a special name are extracted to files during the first stage of Lir, lir-tangle. The name of the code chunk determines the name of the file. Here is the executable R script that generates the example above:

C9: ⟪make-example.R
root chunk

out <- commandArgs(trailingOnly = TRUE)[1]
svg(out, 8, 5)
curve(sin, -2*pi, 2*pi, type="l", col="darkred")
curve(cos, -2*pi, 2*pi, type="l", col="darkblue", add = TRUE)
dev.off() -> foo

Citations

You can add citations in the markup recognized by Pandoc. The easiest way to produce the citations and a bibliography is to add a bibliography file in the working directory. The file must have the same base name as the Lir source file, and the extension “.bib”. For example, since this source file is named lir.lir, the bibliography is in lir.bib. The bibliography file should be in the format recognized by Biblatex; a good free tool for maintaining a bibliography is JabRef, available for all major operating systems.
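For example, an entry for the Knuth article mentioned earlier might look like this in lir.bib (the key knuth1984 is our choice):

```bibtex
@article{knuth1984,
  author  = {Knuth, Donald E.},
  title   = {Literate Programming},
  journal = {The Computer Journal},
  volume  = {27},
  number  = {2},
  pages   = {97--111},
  year    = {1984}
}
```

In a documentation chunk, writing [@knuth1984] then produces the citation and adds the entry to the bibliography.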

The bibliography will be at the end of the final document. If you want to have it listed in the table of contents, you can put a section heading at the end of the document. The last line of the Lir source file can be for example “# References” or “# Bibliography”.

Running Lir

Lir provides the command line program lir. To extract all code, execute it, and compile the final document, you need to run lir with the name of the Lir source file as the only argument. For example, if you are in the working directory of the Lir distribution, you can do:

$ lir lir.lir

This will run the three steps: tangling, making, and weaving. You can also do each step separately, by providing one of tangle, make, or weave as the first argument to lir:

$ lir tangle lir.lir
$ lir make lir.lir
$ lir weave lir.lir

This can be useful if you need to run the make step on another machine.

State

When a Lir source file is interpreted, a current state of the project is created in the hidden directory .lir inside the working directory. Every time lir is re-run on a source file, the state is updated. Only files that have changed are updated: thus, running Lir does not run programs that have not changed on input data that has not changed.

The state contains all executable files extracted from the Lir source file, makefiles to generate all results, symbolic links to all input data, and all results. The state in .lir can be copied as it is to another machine, and used to generate all results remotely.

It is safe to remove the state by deleting the .lir directory and all its contents. This will force all intermediate results to be re-generated.

Implementation

This is the complete source of Lir.

Making and installing

C10: ⟪build.sh
root chunk

Bash header

echo "*** Making and Installing..."
notangle -t8 -filter emptydefn -RMakefile lir.lir > Makefile
export LIRPATH=Path to lir
export LIRBINPATH=Path to lir binaries
export NOWEBPATH=Path to noweb

make uninstall
make clean
make
make install
echo "*** Done!"

rm -rf .lir
echo "*** Using lir to tangle and weave lir..."
lir lir.lir
echo "*** Weaved document is in lir.html!"
cp -v lir.html docs/lir.html
echo "*** Generating index..."
title=$(mktemp)
echo "---" > "$title"
echo "title: Lir" >> "$title"
echo "author: Boris Vassilev (University of Helsinki)" >> "$title"
echo "..." >> "$title"
sed "s/^@/@@/" README.md \
        | sed "s/^<</@<</" \
        | cat "$title" - \
        > docs/index.lir
(cd docs ; lir index.lir)
rm "$title"
echo "*** Generating index done!"

This script is used when the bash script file bootstrap is evaluated.

Makefile

This makefile is used by the build script above.

C11: ⟪Makefile
root chunk

nwpipemodules = driver.pl nwpipe.pl lirhtml.pl yaml.pl

nwpipesources = nwpipe-pandoc.pl $(nwpipemodules)

tangled = lir lir-weave $(nwpipesources) lir.css

all : lir.lir
	$(NOWEBPATH)/markup -t lir.lir \
                | $(NOWEBPATH)/emptydefn \
                | $(NOWEBPATH)/mnt -t $(tangled)
	chmod u+x lir lir-weave

install :
	mkdir --parents $(LIRPATH)
	cp --verbose --preserve \
                lir.css $(nwpipesources) lir-weave $(LIRPATH)
	mkdir --parents $(LIRBINPATH)
	cp --verbose --preserve \
                lir $(LIRBINPATH)
.PHONY : install

clean :
	-rm $(tangled)
.PHONY : clean

uninstall :
	-rm -r $(LIRPATH)
	-rm $(LIRBINPATH)/lir
.PHONY : uninstall

Bash scripts

According to Stack Overflow, the portable Bash header is:

C12: ⟪Bash header
Appears in C10, C15, C32

definition continued in C13

#! /usr/bin/env bash
C13

Furthermore, we set the scripts to immediately exit with error on failure, consider failure in a pipe a failure, and to consider “empty” (unset) variables an error:

C13: ⟪Bash header
C12

set -o errexit
set -o pipefail
set -o nounset
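A quick, self-contained way to see what each option buys us; every child bash below fails on purpose:

```shell
#!/usr/bin/env bash
# errexit: the script stops at the first failing command.
bash -o errexit -c 'false; echo unreachable' \
    || echo "errexit: aborted before the echo"
# pipefail: a failure anywhere in a pipeline fails the whole pipeline.
bash -o pipefail -c 'false | true' \
    || echo "pipefail: pipeline reported failure"
# nounset: expanding an unset variable is a fatal error.
bash -o nounset -c 'echo "$no_such_variable"' 2>/dev/null \
    || echo "nounset: unset variable rejected"
```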

Overview

The tools provided by lir support two goals:

  1. Generating results by running the code in a lir source file.
  2. Presenting all code and possibly results in a human-readable form.

This is usually done in three independent steps: tangling, making, and weaving.

The wrapper script

The program is installed as a script that can be invoked with one of the three actions, tangle, make, or weave, as the first argument.

C14: ⟪lir
root chunk

lir.bash

C15: ⟪lir.bash
Appears in C14

Bash header
Define functions used by lir

if [ "$#" -eq 0 ]
then
        :
else

if [ $# -eq 1 ]
then
        lir tangle "$1" && lir make "$1" && lir weave "$1" && exit
fi

LIR_SOURCE="$2"
LIR_DIR=.lir
MORE_OPTS="${@:3}"
BASENAME=$(basename "$LIR_SOURCE" .lir)
STATE_SOURCE="__LIR_SOURCE_$LIR_SOURCE"

case "$1" in
tangle)
        mkdir --verbose --parents "$LIR_DIR"
        tmpdir=$(mktemp --directory --tmpdir="$LIR_DIR")

        awk 'Nameless code chunks as continuations .awk' \
                        "$LIR_SOURCE" \
                | awk 'Close all codechunks with @ .awk' \
                | awk -v \
                        evalcmd="bash -o errexit -o pipefail -o nounset" \
                        'Evaluate code chunk contents .awk' \
                > "$LIR_DIR"/"$STATE_SOURCE"

        noroots "$LIR_DIR"/"$STATE_SOURCE" \
                | Drop double angle brackets \
                | Remove empty lines \
                > "$tmpdir"/rootchunks
        Remove special chunknames "$tmpdir"/rootchunks \
                > "$tmpdir"/filechunks
        Get source chunknames "$tmpdir"/rootchunks \
                > "$tmpdir"/sourcechunks
        Get sink chunknames "$tmpdir"/rootchunks \
                > "$tmpdir"/sinkchunks
    
        < "$tmpdir"/filechunks lir-mtangle "$LIR_DIR"/"$STATE_SOURCE" "$LIR_DIR"
        < "$tmpdir"/sourcechunks lir-use "$LIR_DIR"

        if grep --quiet --line-regexp ':make.*' "$tmpdir"/rootchunks
        then
                noroots "$LIR_DIR"/"$STATE_SOURCE" \
                        | sed -n 's/^<<\(:make.*\)>>$/-R\1/p' \
                        | tr '\n' '\0' \
                        | xargs --null notangle -t8 \
                                "$LIR_DIR"/"$STATE_SOURCE" \
                        > "$LIR_DIR"/.makedag
        else
                echo '' > "$LIR_DIR"/.makedag
        fi

        echo -n 'all: ' > "$LIR_DIR"/.makeall
        < "$tmpdir"/sinkchunks tr '\n' ' ' >> "$LIR_DIR"/.makeall
        echo ";" >> "$LIR_DIR"/.makeall

        echo -n "$BASENAME.html: $STATE_SOURCE " > "$LIR_DIR"/.makehtml
        < "$tmpdir"/sinkchunks tr '\n' ' ' >> "$LIR_DIR"/.makehtml
        echo ";" >> "$LIR_DIR"/.makehtml

        # Check for a bib file and add bibliography
        LIRBIB=""
        if [ -r "$BASENAME".bib ]
        then
                BIBTARGET=$(readlink -f "$BASENAME".bib)
                LIRBIB="--bibliography=$BIBTARGET"
        fi
        echo -e "\tPath to lir"'/lir-weave '"$LIRBIB"' < $< > $@' \
                >> "$LIR_DIR"/.makehtml

        rm -r "$tmpdir"
        ;;
make)
        make --jobs --directory="$LIR_DIR" \
                --makefile=.makeall --makefile=.makedag \
                all
        ;;
weave)
        make --directory="$LIR_DIR" \
                --makefile=.makehtml \
                "$BASENAME".html
        cp --verbose --update ".lir/$BASENAME.html" "$BASENAME.html"
        ;;
*)
        echo "ERROR: unknown command $1"
        ;;
esac

fi
# if [ "$#" == "0" ]

Tangling

C16: ⟪Define functions used by lir
Appears in C15

lir-use
lir-mtangle

C17: ⟪lir-mtangle
Appears in C16

lir-mtangle () {
while IFS='' read -r chunk
do
        notangle -t8 -R"$chunk" "$1" | cpif "$2/$chunk"
done
}
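lir-mtangle pipes each extracted chunk through noweb’s cpif, which overwrites the target file only when its content has actually changed; an unchanged file keeps its old timestamp, so make does not rebuild anything that depends on it. A rough stand-in, for illustration only (the name cpif_sketch is ours):

```shell
#!/usr/bin/env bash
# Overwrite "$1" with standard input only if the content differs.
cpif_sketch () {
    local target="$1" tmp
    tmp=$(mktemp)
    cat > "$tmp"
    if cmp --silent "$tmp" "$target" 2>/dev/null
    then rm "$tmp"              # unchanged: keep the old timestamp
    else mv "$tmp" "$target"    # changed or new: replace the file
    fi
}
```

A typical use would mirror the function above: `notangle -t8 -R"$chunk" source.lir | cpif_sketch "$chunk"`.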

Not all data in an analysis naturally fits into the lir source file. Examples already mentioned above are the input data, or an externally maintained file like a bibliography.

Taking such a file in use simply means making a symbolic link to the file in the working directory. However, keep in mind that the externally maintained file still needs to be in the project directory, and its name should be a path relative to the project directory. The make utility checks the timestamp on the target of the link, so when a “used” file changes, this is taken into account.

The argument to the function is the directory in which the symbolic links will be made. The file names are read from standard input, one per line.

C18: ⟪lir-use
Appears in C16

lir-use () {
while IFS='' read -r used
do
        useddir=$(dirname "$used")
        mkdir  --verbose --parents "$1/$useddir"
        original=$(readlink --canonicalize "$used")
        if [ -f "$original" ]
        then
                ln --symbolic --force "$original" "$1/$used"
        else
                >&2 echo "LIR ERROR: File not found! ($original)"
                exit 1
        fi
done
}
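To see the function at work, here is a throwaway run with a simplified copy of it (hypothetical file names, temporary directory):

```shell
#!/usr/bin/env bash
set -o errexit
# Simplified copy of lir-use: read file names from standard input,
# link each one into the directory given as the first argument.
lir_use_sketch () {
    while IFS='' read -r used
    do
        mkdir --parents "$1/$(dirname "$used")"
        original=$(readlink --canonicalize "$used")
        ln --symbolic --force "$original" "$1/$used"
    done
}
work=$(mktemp --directory)
( cd "$work"
  echo 'x,y' > data.csv
  mkdir state
  echo data.csv | lir_use_sketch state
  cat state/data.csv )        # the link resolves to the original file
rm -r "$work"
```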

C19: ⟪Drop double angle brackets
Appears in C15

sed -n 's/^<<\(.*\)>>$/\1/p'

C20: ⟪Remove empty lines
Appears in C15

sed -n '/^\s*$/!p'

C21: ⟪Remove special chunknames
Appears in C15

sed -n '/^:/!p'

C22: ⟪Get source chunknames
Appears in C15

sed -n 's/^:source \(.\+\)$/\1/p'

C23: ⟪Get sink chunknames
Appears in C15

sed -n 's/^:\(figure\|table\|listing\|result\) \(.\+\)/\2/p'
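The four little filters above are easiest to understand on sample input. Here they are applied to a hypothetical list of root chunk names (and, first, to a raw noroots-style line):

```shell
#!/usr/bin/env bash
set -o errexit
# noroots prints names in double angle brackets; drop the brackets.
echo '<<:source data.csv>>' | sed -n 's/^<<\(.*\)>>$/\1/p'
# prints: :source data.csv

rc=$(mktemp)    # a made-up list of root chunk names
printf '%s\n' ':source data.csv' ':make Analysis' \
              ':figure plot.svg' 'script.R' > "$rc"
sed -n '/^:/!p' "$rc"                     # file chunks: script.R
sed -n 's/^:source \(.\+\)$/\1/p' "$rc"   # sources: data.csv
sed -n 's/^:\(figure\|table\|listing\|result\) \(.\+\)/\2/p' "$rc"  # sinks: plot.svg
rm "$rc"
```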

Source transformations

C24: ⟪Nameless code chunks as continuations .awk
Appears in C15

BEGIN {
        Setup awk to treat lines as single-field records
        last_name = "<<>>="
}
/^<<>>=$/ { $0 = last_name }
/^<<.+>>=$/ { last_name = $0 }
{ print $0 }
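To see the effect, feed a fragment with a nameless continuation chunk through the filter; the awk program below is the chunk above with its references expanded by hand:

```shell
#!/usr/bin/env bash
printf '%s\n' '<<foo>>=' 'a' '@' '<<>>=' 'b' '@' \
    | awk 'BEGIN { RS="\n"; FS="\n"; last_name = "<<>>=" }
           /^<<>>=$/   { $0 = last_name }
           /^<<.+>>=$/ { last_name = $0 }
           { print $0 }'
# The nameless header <<>>= is replaced with <<foo>>=.
```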

C25: ⟪Close all codechunks with @ .awk
Appears in C15

BEGIN {
        Setup awk to treat lines as single-field records
        in_chunk = 0
}
/Begin code awk regex/ {
        if (in_chunk) print "@"
        in_chunk = 1
}
/End code awk regex/ { in_chunk = 0 }
{ print $0 }
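Run on a fragment where the first chunk is not closed, the filter inserts the missing “@”; again, the program below is the chunk above with its references expanded by hand:

```shell
#!/usr/bin/env bash
printf '%s\n' '<<a>>=' 'x' '<<b>>=' 'y' '@' \
    | awk 'BEGIN { RS="\n"; FS="\n"; in_chunk = 0 }
           /^<<.*>>=$/ { if (in_chunk) print "@"; in_chunk = 1 }
           /^@/        { in_chunk = 0 }
           { print $0 }'
# An "@" is inserted before <<b>>= to close chunk a.
```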

The contents of each code chunk named :eval will be evaluated by an invocation of bash. Whatever is written to standard output will be pasted as it is in place of the code chunk before anything else is done with the source file.

C26: ⟪Evaluate code chunk contents .awk
Appears in C15

BEGIN {
        Setup awk to treat lines as single-field records
}
/^<<:eval>>=$/ {
        Execute contents as standard input of [[evalcmd]]
        Write output of [[evalcmd]] to awk’s output
        next
}
{ print $0 }    

To make awk treat lines of input as records with one field each, set both the record and the field separator to the newline character. With these settings, an empty line (nothing but a new line) will have 0 fields (NF == 0) and a length of 0 (length($0) == 0).

C27: ⟪Setup awk to treat lines as single-field records
Appears in C24, C25, C26

RS="\n"
FS="\n"
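This claim is easy to check: with both separators set to the newline character, only the empty record reports zero fields and zero length:

```shell
#!/usr/bin/env bash
printf 'one\n\ntwo\n' \
    | awk 'BEGIN { RS="\n"; FS="\n" } { print NR, NF, length($0) }'
# 1 1 3
# 2 0 0
# 3 1 3
```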

C28: ⟪Execute contents as standard input of [[evalcmd]]
Appears in C26

while (getline > 0 && $0 !~ /End code awk regex/)
        print $0 |& evalcmd
close(evalcmd, "to")

C29: ⟪Write output of [[evalcmd]] to awk’s output
Appears in C26

save_rs = RS
RS = "^$"
evalcmd |& getline x
close(evalcmd)
printf "%s", x
RS = save_rs

C30: ⟪Begin code awk regex
Appears in C25

^<<.*>>=$

C31: ⟪End code awk regex
Appears in C25, C28

^@

Weaving

Weaving means generating the final human-readable documentation from the lir source file. The original source is transformed by several filters before being passed to Pandoc, which generates an HTML document that can be viewed in a browser. Everything is wrapped in a bash script, lir-weave, that is installed along with the other Lir components. It reads the lir source from standard input and writes the HTML to standard output.

C32: ⟪lir-weave
root chunk

Bash header
Weave lir source to HTML

C33: ⟪Weave lir source to HTML
Appears in C32

Source to noweb pipeline representation \
| Noweb pipeline representation to Pandoc source \
| Pandoc source to HTML

A custom noweb pipeline is used, as we definitely want to be able to use “anonymous” chunks as continuations of the previous chunk. Since tabs can be significant, for example in Makefiles, those are preserved (the -t option to markup). The -delay option to noidx makes sure that the index of code chunks is emitted before the last document chunk. This is necessary to be able to generate a bibliography with Pandoc.

C34: ⟪Source to noweb pipeline representation
Appears in C33

"Path to noweb"/markup -t \
        | "Path to noweb"/noidx -delay

To make it marginally easier to deal with the pipeline representation, we completely remove any empty text tokens from it before transforming it to Pandoc source. We then use nwpipe-pandoc.pl to transform the noweb pipeline representation of the lir source file to a valid Pandoc source file.

C35: ⟪Noweb pipeline representation to Pandoc source
Appears in C33

sed -n '/^@text $/!p' \
| swipl -q -g main -t "halt(2)" Path to lir/nwpipe-pandoc.pl

C36: ⟪Pandoc source to HTML
Appears in C33

pandoc "$@" \
        --preserve-tabs \
        --from=markdown \
        --table-of-contents \
        --standalone \
        --self-contained \
        --to=html5+smart \
        --css="Path to lir"/lir.css \
        --output=-

Lir to Pandoc

The program nwpipe-pandoc.pl is written in Prolog, using the SWI-Prolog implementation (Wielemaker et al. 2012). It is written as a stand-alone command line program: once compiled, it can be run from the command line as part of a pipe. It uses the module driver.pl to do the actual work.

C37: ⟪nwpipe-pandoc.pl
root chunk

No banner or informational messages
Expand meta-predicates at compile time
Succeed once without choice points

:- use_module(driver, [nwpipeline_pandoc/1]).

main :-
        current_prolog_flag(argv, Argv), % command line arguments
        goal_is_det(nwpipeline_pandoc(Argv)),
        halt. % exit with success
main :-
        throw(error(mode_error(failure),_)),
        halt(1). % otherwise, exit with failure (1)

During development, it is good to know if the program has left behind any unintentional choice points – they are certainly errors, as the program is designed to succeed deterministically. They would be “hidden” by the halt/0 used at the end of the main program.

C38: ⟪Succeed once without choice points
Appears in C37

goal_is_det(Goal) :-
        setup_call_cleanup(true, Goal, Det = true),
        (       Det == true
        ->      true
        ;       !,
                throw(error(mode_error(notdet),_))
        ).

Prolog programs are usually used from the “top level” (the Prolog REPL), not as command-line tools. We need to explicitly turn off the REPL banner and informational messages written to standard output.

C39: ⟪No banner or informational messages
Appears in C37

:- set_prolog_flag(verbose, silent).

Meta-predicates (predicates that take other predicates as arguments) are fundamentally slow: they need to evaluate the passed predicate dynamically. This can be avoided for the more commonly used meta-predicates (maplist/1+N, forall/2 etc) by treating them as macros and expanding them at compile time.

C40: ⟪Expand meta-predicates at compile time
Appears in C37

:- use_module(library(apply_macros)).

The driver uses nwpipe.pl to parse its input to native Prolog terms and lirhtml.pl to emit a valid Pandoc source document with embedded HTML fragments.

C41: ⟪driver.pl
root chunk

:- module(driver, [nwpipeline_pandoc/1]).

:- use_module(nwpipe, [nwpipe_term/2]).
:- use_module(lirhtml, [emit_html/2]).

nwpipeline_pandoc(_) :-
        I = user_input,
        read_string(I, _, S),
        split_string(S, "\r\n", "\r\n", Str_Ls),
        maplist(string_codes, Str_Ls, Ls),
        nwpipe_term(Ls, T),
        emit_html(user_output, T).

Parse the noweb pipeline

This module is used by the driver.pl to convert the list of lines representing the line-oriented noweb pipeline to a Prolog term. This is done in two steps.

  1. Each line in the list is converted to a term;
  2. The flat list of terms is parsed to obtain a term that represents the structure of the document.

C42: ⟪nwpipe.pl
root chunk

:- module(nwpipe, [nwpipe_term/2]).

nwpipe_term(Pipe, Doc) :-
        maplist(nwtoken_term, Pipe, Terms),
        phrase(lir_doc(Doc), Terms).

Noweb token  Prolog term
Flat list of terms  structured term

A noweb token in the noweb pipeline representation (Ramsey 1992) is parsed to become a Prolog term. Here, we use an approach to parsing them described in “The Craft of Prolog” (O’Keefe 1990). We use the keyword at the beginning of each token to look up (deterministically) in a table the list of items that this token contains. Then, each item is converted to a Prolog term using a mini-interpreter for the “language” of noweb tokens.

C43: ⟪Noweb token Prolog term
Appears in C42

:- use_module(library(dcg/basics), [nonblanks//1, integer//1]).

nwtoken_term([0'@|Token], Term) :-
        phrase(token(Back, Term), Token, Back).

token(Back, Term) -->
        nonblanks(NB),
        {       atom_codes(Key, NB),
                keyword(Key, Items, Back, Term)
        },
        items(Items).
Keyword table
Mini-interpreter for items

C44: ⟪Keyword table
Appears in C43

Structural keywords
Tagging keywords

C45: ⟪Structural keywords
Appears in C44

keyword(begin, [space, chunk_kind(K)], _, K).
keyword(end, [], _, end).
keyword(text, [space, string_rest(S, Rest)], Rest, text(S)).
keyword(nl, [], [], nl).
keyword(defn, [space, string_rest(S, Rest)], Rest, defn(S)).
keyword(use, [space, string_rest(S, Rest)], Rest, use(S)).
keyword(quote, [], [], quote).
keyword(endquote, [], [], endquote).

C46: ⟪Tagging keywords
Appears in C44

keyword(file, [space, atom_rest(File, Rest)], Rest, file(File)).
keyword(xref, [space, xref(XRef, Rest)], Rest, XRef).
keyword(index, [space, index(Index, Rest)], Rest, Index).

Cross-referencing keyword table
Index table

C47: ⟪Cross-referencing keyword table
Appears in C46

Basic cross-referencing
Linking previous and next definitions of a code chunk
Continued definitions of the current chunk
The list of all code chunks
Chunks where the code is used

C48: ⟪Index table
Appears in C46

index(beginindex, [], [], index_beginindex).
index(endindex, [], [], index_endindex).

C49: ⟪Basic cross-referencing
Appears in C47

xref(label, [space, atom_rest(L, Rest)], Rest, xref_label(L)).
xref(ref, [space, atom_rest(L, Rest)], Rest, xref_ref(L)).

C50: ⟪Linking previous and next definitions of a code chunk
Appears in C47

xref(prevdef, [space, atom_rest(L, Rest)], Rest, xref_prevdef(L)).
xref(nextdef, [space, atom_rest(L, Rest)], Rest, xref_nextdef(L)).

C51: ⟪Continued definitions of the current chunk
Appears in C47

xref(begindefs, [], [], xref_begindefs).
xref(defitem, [space, atom_rest(L, Rest)], Rest, xref_defitem(L)).
xref(enddefs, [], [], xref_enddefs).

C52: ⟪Chunks where the code is used
Appears in C47

xref(beginuses, [], [], xref_beginuses).
xref(useitem, [space, atom_rest(L, Rest)], Rest, xref_useitem(L)).
xref(enduses, [], [], xref_enduses).
xref(notused, [space, string_rest(Name, Rest)], Rest, xref_notused(Name)).

C53: ⟪The list of all code chunks
Appears in C47

xref(beginchunks, [], [], xref_beginchunks).
xref(chunkbegin,
        [space, atom(L), space, string_rest(Name, Rest)],
        Rest,
        xref_chunkbegin(L, Name)).
xref(chunkuse, [space, atom_rest(L, Rest)], Rest, xref_chunkuse(L)).
xref(chunkdefn, [space, atom_rest(L, Rest)], Rest, xref_chunkdefn(L)).
xref(chunkend, [], [], xref_chunkend).
xref(endchunks, [], [], xref_endchunks).

Using the tables above, we have looked up the exact items that we expect in a noweb token. This is a small interpreter that takes the list of items and converts each item to the corresponding Prolog term.

C54: ⟪Mini-interpreter for items
Appears in C43

items([]) --> [].
items([I|Is]) -->
        item(I),
        items(Is).

Individual items

C55: ⟪Individual items
Appears in C54

Atoms and text
“Space”
Chunk number
Chunk kind
Cross-reference

Identifiers are converted to atoms, while “text” (text and names) are converted to strings:

C56: ⟪Atoms and text
Appears in C55

item(atom(A)) -->
        nonblanks(Codes),
        {   atom_codes(A, Codes)
        }.

% Using when/2 here is probably an ugly hack.
% Could not figure out a better way to deal with
% "rest of line" situations.
item(atom_rest(A, Rest)) -->
        {   when(ground(Rest), atom_codes(A, Rest))
        }.

item(string_rest(Str, Rest)) -->
        {   when(ground(Rest), string_codes(Str, Rest))
        }.

We assume that “Space” is represented by a single “space” character; the “Hacker’s Guide” is not explicit about this, but so far it has always worked.

C57: ⟪“Space”
Appears in C55

item(space) --> [0'\s]. % could it be another "white"?...

Integers are used to enumerate the code and documentation:

C58: ⟪Chunk number
Appears in C55

item(chunk_number(N)) --> integer(N).

Code and documentation chunks are delimited by the same keywords; the Chunk kind is encoded in a secondary keyword:

C59: ⟪Chunk kind
Appears in C55

item(chunk_kind(CK)) -->
        nonblanks(Codes),
        {   atom_codes(CK, Codes)
        }.

Finally, Cross-reference. Note that it employs quite a few secondary keywords collected in their own look up tables, Cross-referencing keyword table and Index table.

C60: ⟪Cross-reference
Appears in C55

item(xref(XRef, Rest)) -->
        nonblanks(Codes),
        {   atom_codes(X, Codes),
                xref(X, Items, Rest, XRef)
        },
        items(Items).
item(index(Index, Rest)) -->
        nonblanks(Codes),
        {   atom_codes(X, Codes),
                index(X, Items, Rest, Index)
        },
        items(Items).

Convert the pipeline to a term

The structure of the document is implicitly present in the noweb pipeline representation: documentation and code chunks are delimited by a start and an end token, new lines and other formatting are contained in text tokens, and so on. The pipeline representation also contains cross-referencing information, using labels and references to them. It is however a bit inconvenient to use this representation for generating markup. This is because the order of tokens in the pipeline representation exactly mirrors the layout of the final human-readable documentation, as generated by noweb, and lir uses a slightly different layout.

At that point, we have a list of Prolog terms. Those are easier to deal with in the context of Prolog, as it is much easier to make the rules deterministic. The general approach is to consume the next input and use it as the first argument to a rule, thus taking advantage of Prolog’s first-argument indexing. Then, the rules defined below describe a state machine where each rule is a state and each rule clause is a transition that depends on the last input.

On the highest level, the noweb pipeline is made of a sequence of files. Note, however, that since in lir the input is standard input (and not a list of files), there will be only one, unnamed file in the pipeline.

C61: ⟪Flat list of terms structured term
Appears in C42

lir_doc(L) --> [file('')],
        lir_rest(L).

lir_rest(L) --> [X], !,
        lir_file(X, L).
lir_rest([]) --> [].

Parse a file

A file is a sequence of documentation and code chunks; parsing it yields a list of chunks and an index. To parse the contents of display-item meta-data, we use the yaml.pl module.

C62: ⟪Parse a file
Appears in C61

:- use_module(yaml, [yaml_to_dict/2]).

lir_file(docs, L) --> [X],
        docs(X, L).
lir_file(code, [Code|L]) -->
        code(C, Name, Label, M),
        {   (   Name = name(N)
                ->  Code = code_name_label_meta(C, N, Label, M)
                ;   Name = display(KW, FN_str)
                ->  code_codelist(C, Codes),
                        yaml_to_dict(Codes, Display_meta),
                        Code = display(KW, FN_str, Display_meta, Label)
                )
        },
        lir_rest(L).
lir_file(nl, [nl|L]) -->
        lir_rest(L).
lir_file(xref_beginchunks, [chunks_list(Cs)|L]) --> [X],
        xref_chunks(X, Cs),
        lir_rest(L).
lir_file(index_beginindex, [index(I)|L]) --> [X],
        index_list(X, I),
        lir_rest(L).

code_codelist(C, L) :-
        maplist(code_atomic, C, L0),
        atomics_to_string(L0, S0),
        string_codes(S0, L).

code_atomic(text(Text), Text).
code_atomic(nl, "\n").

Parse a documentation chunk
Parse a code chunk
Parse the list of chunks
Parse the index

A module that parses a code list containing YAML data into a Prolog dictionary holding a native representation of that data. This module makes use of SWI-Prolog’s library dcg/basics.

C63: ⟪yaml.pl
root chunk

:- module(yaml, [yaml_to_dict/2]).

:- use_module(library(dcg/basics), [white//0,
                                                                        whites//0,
                                                                        string_without//2]).

yaml_to_dict(Codes, Dict) :-
        phrase(yaml_es(Es), Codes),
        dict_create(Dict, yaml, Es).

Make a list of YAML key-value pairs

C64: ⟪Make a list of YAML key-value pairs
Appears in C63

yaml_es([K-V|Es]) --> yaml_key_val(K, V), !, yaml_es(Es).
yaml_es([]) --> [].

Parse one YAML key-value pair

The keyword starts at the very beginning of the line, does not contain any blank characters (space, tab, newline…), and is ended by a colon followed by a single space. The character that follows that space, here called C, determines if this is a block scalar or not.

C65: ⟪Parse one YAML key-value pair
Appears in C64

yaml_key_val(K, V) --> yaml_key_codes(KCs), ": ", !,
        {   atom_codes(K, KCs)
        },
        [C],
        yaml_val(C, V).

Get YAML key
Get YAML value

Any graph character will do, but no blanks of any kind are allowed.

C66: ⟪Get YAML key
Appears in C65

yaml_key_codes([C|Cs]) --> [C],
        {   code_type(C, graph)
        },
        yaml_key_codes(Cs).
yaml_key_codes([]) --> [].

The only recognized block notation is literal style, indicated by a “|”.

C67: ⟪Get YAML value
Appears in C65

yaml_val(0'|, V) --> "\n", !,
        yaml_indented_block(V_codes),
        {   string_codes(V, V_codes)
        }.
yaml_val(C, V_str) --> string_without("\n", Cs), "\n",
        {   string_codes(V_str, [C|Cs])
        }.

Read YAML indented block

All indenting is removed, but all newlines are preserved.

C68: ⟪Read YAML indented block
Appears in C67

yaml_indented_block(Block) --> white, whites, !,
        indented_lines(Block).
yaml_indented_block([]) --> [].

indented_lines([C|Cs]) --> [C],
        {   C \== 0'\n
        },
        !,
        indented_lines(Cs).
indented_lines([0'\n|Block]) --> [0'\n], !,
        yaml_indented_block(Block).
S2: lir.lir

C69: ⟪:make
C7

test-display: lir.lir
	ls -l -h lir.lir > test-display

Listing 2: test-display
lrwxrwxrwx 1 boris boris 32 Jan 30 15:26 lir.lir -> /home/boris/code/own/lir/lir.lir
A test for display environments. This is just a small example, nothing more.

A documentation chunk can contain text, new lines, and quoted code.

C70: ⟪Parse a documentation chunk
Appears in C62

definition continued in C71

docs(end, L) -->
        lir_rest(L).
docs(text(T), [text(Text)|L]) -->
        docs_text(Ts),
        {   atomics_to_string([T|Ts], Text)
        },
        [X],
        docs(X, L).
docs(nl, [nl|L]) --> [X],
        docs(X, L).
docs(quote, [quote(Q)|L]) --> [X],
        quote(X, Q),
        [Y],
        docs(Y, L).

Parse quoted code
C71

Consecutive text tokens that are not interrupted by any other structural content are collected and concatenated.

C71: ⟪Parse a documentation chunk
C70

docs_text([T|Ts]) --> [text(T)], !,
        docs_text(Ts).
docs_text(["\n"|Ts]) --> [nl], !,
        eol(Ts).
docs_text([]) --> [].

eol([T|Ts]) --> [text(T)], !,
        docs_text(Ts).
eol([]) --> [].

C72: ⟪Parse quoted code
Appears in C70

quote(text(T), [text(T)|Q]) --> [X],
        quote(X, Q).
quote(xref_ref(L), [quote_use(N, L)|Q]) --> [use(N), X],
        quote(X, Q).
quote(endquote, []) --> [].

C73: ⟪Parse a code chunk
Appears in C62

code(Cs, Defn, L, M) -->
        [X],
        defn(X, M_pairs0),
        {   selectchk(label(L), M_pairs0, M_pairs1),
                selectchk(defn(Defn), M_pairs1, M_pairs),
                dict_create(M, code, M_pairs)
        },
        code_content(Cs).

Parse the code chunk header
Parse the code chunk contents

C74: ⟪Parse the code chunk header
Appears in C73

defn(nl, []) --> [].
defn(xref_label(L), [label(L)|M]) --> [X],
        defn(X, M).
defn(xref_ref(L), [ref(L)|M]) --> [X],
        defn(X, M).
defn(defn(N), [defn(Name)|M]) -->
        {   (   display_item(N, KW, FN_str)
                ->  Name = display(KW, FN_str)
                ;   Name = name(N)
                )
        },
        [X],
        defn(X, M).
defn(xref_notused(_N), [uses(notused)|M]) --> [X],
        defn(X, M).
defn(xref_beginuses, [uses(Us)|M]) --> [X],
        uses(X, Us, Us),
        [Y],
        defn(Y, M).
defn(xref_prevdef(L), [prev(L)|M]) --> [X],
        defn(X, M).
defn(xref_nextdef(L), [next(L)|M]) --> [X],
        defn(X, M).
defn(language(L), [language(L)|M]) --> [X],
        defn(X, M).
defn(xref_begindefs, [defs(Ds)|M]) --> [X],
        defs(X, Ds, Ds),
        [Y],
        defn(Y, M).

Treat display items differently
Parse defs and uses

C75: ⟪Treat display items differently
Appears in C74

display_item(Name, KW, FN_str) :-
        sub_string(Name, 0, 1, _, ":"),
        once( sub_string(Name, Before_sep, 1, After_sep, " ") ),
        Length_kw is Before_sep - 1,
        sub_string(Name, 1, Length_kw, _After_kw, KW_str),
        display_item_keyword(KW_str, KW),
        Before_fn is Before_sep + 1,
        sub_string(Name, Before_fn, After_sep, 0, FN_str).

/* Seems this is deterministic when 1st argument is a string */
display_item_keyword("source", source).
display_item_keyword("result", result).
display_item_keyword("figure", figure).
display_item_keyword("listing", listing).
display_item_keyword("table", table_name).

C76: ⟪Parse the code chunk contents
Appears in C73

code_content([text(Text)|Cs]) -->
        text_token(T), !,
        code_text(Ts),
        {   atomics_to_string([T|Ts], Text)
        },
        code_content(Cs).
code_content([code_use(L, R, N)|Cs]) -->
        [xref_label(L), xref_ref(R), use(N)], !,
        code_content(Cs).
code_content([]) --> [end].

text_token("\n") --> [nl].
text_token(T) --> [text(T)].
    
code_text([T|Ts]) -->
        text_token(T), !,
        code_text(Ts).
code_text([]) --> [].

C77: ⟪Parse defs and uses
Appears in C74

uses(xref_enduses, _, []) --> [].
uses(xref_useitem(L), Us, [L|Us0]) --> [X],
        uses(X, Us, Us0).

defs(xref_enddefs, _, []) --> [].
defs(xref_defitem(L), Ds, [L|Ds0]) --> [X],
        defs(X, Ds, Ds0).

C78: ⟪Parse the list of chunks
Appears in C62

xref_chunks(xref_endchunks, []) --> [].
xref_chunks(xref_chunkbegin(L, N), [chunk(L, N, Us, Ds)|Cs]) --> [X],
        xref_chunk(X, Us, Ds),
        [Y],
        xref_chunks(Y, Cs).

xref_chunk(xref_chunkend, [], []) --> [].
xref_chunk(xref_chunkuse(U), [chunkuse(U)|Us], Ds) --> [X],
        xref_chunk(X, Us, Ds).
xref_chunk(xref_chunkdefn(D), Us, [chunkdefn(D)|Ds]) --> [X],
        xref_chunk(X, Us, Ds).

We don’t have an index for now.

C79: ⟪Parse the index
Appears in C62

index_list(index_endindex, []) --> [].

Emit Pandoc HTML

C80: ⟪lirhtml.pl
root chunk

definition continued in C92 + C93 + C94 + C95

:- module(lirhtml, [emit_html/2]).

:- use_module(library(http/html_write)).

emit_html(Out, L) :-
        counters_db(L, DB),
        phrase(lir_pandoc(P, DB), L),
        phrase(html(P), H),
        print_html(Out, H).

counters_db/2: Generate counters, add them to a database
lir_pandoc//2: Lir to Pandoc HTML
C92

Add to a database all numbered items: code chunks, display items (figure, listing, table), as well as sources and sinks of the DAG representing the dataflow. For the code chunks, we also format the names. This is necessary because code chunk names are inside HTML tags by the time they reach Pandoc, so they are no longer converted from Markdown; instead, we do the conversion here.

C81: ⟪counters_db/2: Generate counters, add them to a database
Appears in C80

counters_db(L, DB) :-
        include(is_code_name_label_meta, L, CNLMs),
        maplist(code_name_label_meta_Name_Label, CNLMs, Ns, Ls),
        names_fmt_fmtd(Ns, html5, FNs),
    
        pairs_keys_values(LFs, Ls, FNs),
        dict_create(CC_names, name, LFs),
        findall(Label-Nr, nth1(Nr, Ls, Label), LNrs),
        dict_create(CC_nrs, nr, LNrs),

        maplist(display_dict(L),
                        [source, result, listing, figure, table_name],
                        [DSNrs, DRNrs, DLNrs, DFNrs, DTNrs]),

        maplist(dict_create,
                        [CodeD, SourceD, ResultD, ListingD, FigureD, TableD],
                        [code,  source,  result,  listing,  figure,  table_name],
                        [[name:CC_names, nr:CC_nrs],
                          [namestr:"S", nr:DSNrs],
                          [namestr:"Result ", nr:DRNrs],
                          [namestr:"Listing ", nr:DLNrs],
                          [namestr:"Figure ", nr:DFNrs],
                          [namestr:"Table ", nr:DTNrs]]),

        dict_create(DB, db,
                                [code:CodeD,
                                  source:SourceD,
                                  result:ResultD,
                                  listing:ListingD,
                                  figure:FigureD,
                                  table_name:TableD]).

is_code_name_label_meta(code_name_label_meta(_,_,_,_)).
code_name_label_meta_Name_Label(code_name_label_meta(_,N,L,_), N,L).

display_dict(L, Item, Dict) :-
        include(is_display_item(Item), L, Ls),
        findall(Label-Nr,
                        nth1(Nr, Ls, display(Item, _, _, Label)),
                        DNrs),
        dict_create(Dict, nr, DNrs).

is_display_item(Item, display(Item,_,_,_)).
    
Use Pandoc to pre-format chunk names

C82: ⟪Use Pandoc to pre-format chunk names
Appears in C81

:- use_module(library(process), [process_create/3]).
:- use_module(library(sgml), [load_structure/3]).

to_par(S, PS) :- atomics_to_string(["<span>", S, "</span>"], PS).

names_fmt_fmtd([], html5, []).
names_fmt_fmtd([N|Ns], html5, FNs) :-
        maplist(to_par, [N|Ns], PNs),
        atomics_to_string(PNs, Ns_str),
        process_create(path(pandoc),
                                      ["--to=html5"],
                                      [stdin(pipe(Pandoc_in)),
                                        stdout(pipe(Pandoc_out))]),
        format(Pandoc_in, "~s", [Ns_str]),
        close(Pandoc_in),
        load_structure(Pandoc_out, DOM, [dialect(xml)]),
        close(Pandoc_out),
        (   DOM = [element(p, [], Names)]
        ->  maplist(names_fmtd, Names, FNs)
        ;   FNs = []
        ).

:- use_module(library(sgml_write), [xml_write/3]).

names_fmtd(element(span, [], StrDOM), Fmtd) :-
        with_output_to(string(Fmtd),
                                      xml_write(current_output,
                                                          StrDOM,
                                                          [header(false), layout(false)])).

C83: ⟪: foo
root chunk

This is a test. There was a problem with chunks that start with a colon
followed by a space, as they were interpreted as definitions by Pandoc.

C84: ⟪lir_pandoc//2: Lir to Pandoc HTML
Appears in C80

lir_pandoc(P, DB) -->
        [X], !,
        lir_pandoc(X, P, DB).
lir_pandoc([], _DB) --> [].

Lir to Pandoc: individual items
HTML for common symbols

C85: ⟪Lir to Pandoc: individual items
Appears in C84

Lir documentation to Pandoc
Lir code chunks to Pandoc
Lir display items to Pandoc
Ignore the list of chunks and the index
Individual display items

C86: ⟪Lir documentation to Pandoc
Appears in C85

lir_pandoc(nl, ["\n"|P], DB) --> lir_pandoc(P, DB).
lir_pandoc(text(T), [\[T]|P], DB) --> lir_pandoc(P, DB).
lir_pandoc(quote(Q), [span(class(quote), QC)|P], DB) -->
        {   phrase(lir_pandoc(QC, DB), Q)
        },
        lir_pandoc(P, DB).
lir_pandoc(quote_use(Name, Label),
                        [span(class("quoteduse"),
                                    [\openparen, \thinnbsp,
                                      a(href("#~a"-Label), Name),
                                      \thinnbsp, \closeparen])|P], DB) -->
        lir_pandoc(P, DB).

C87: ⟪Lir code chunks to Pandoc
Appears in C85

lir_pandoc(code_name_label_meta(C, _, L, M),
                      [div(class("codechunk"),
                                [\chunk_defn(M, L, DB),
                                  \chunk_uses(M, DB),
                                  \chunk_defs(M, DB),
                                  \chunk_prev(M, DB),
                                  pre(\chunk_content(C, DB)),
                                  \chunk_next(M, DB)])|P], DB) -->
        lir_pandoc(P, DB).

C88: ⟪Lir display items to Pandoc
Appears in C85

lir_pandoc(display(KW, FN_str, Meta, Label),
                      [div([class("lirdisplay"),id(Label)],
                                [\display(KW, FN_str, Meta, Label, DB)])|P], DB) -->
        lir_pandoc(P, DB).

C89: ⟪Ignore the list of chunks and the index
Appears in C85

lir_pandoc(chunks_list(_Cs), P, DB) -->
        lir_pandoc(P, DB).
lir_pandoc(index(_), P, DB) --> [], lir_pandoc(P, DB).

C90: ⟪HTML for common symbols
Appears in C84

nbsp --> html(&(nbsp)).
thinnbsp --> html(span(style("white-space:nowrap"), &(0x2009))).
openparen --> html(&('Lang')). % Lang
closeparen --> html(&('Rang')). % Rang
prevsym --> html(&(0x21E7)). % UPWARDS WHITE ARROW
nextsym --> html(&(0x21E9)). % DOWNWARDS WHITE ARROW
contsym --> html(&(darr)).
contmoresym --> html(&(0x21E3)). % DOWNWARDS DASHED ARROW
defeq --> html(&(equiv)).
defend --> html(&(0x25FC)). % Black square

C91: ⟪Individual display items
Appears in C85

:- discontiguous display//5.

display(source, FN, _, Label, DB) -->
        display_file(source, FN, Label, DB).
display(result, FN, _, Label, DB) -->
        display_file(result, FN, Label, DB).

display_file(Item, FN, Label, DB) -->
        {   absolute_file_name(FN, Absolute_FN)
        },
        html(div(class("lir~a"-Item),
                          [span(class("~aname"-Item),
                                      ["~s~d:"-[DB.Item.namestr,
                                                          DB.Item.nr.Label]]),
                            \nbsp,
                            span(class("~afile"-Item),
                                      a(href(Absolute_FN), code(FN)))])).

display(listing, FN, Meta, Label, DB) -->
        {   file_contents(FN, FC)
        },
        html(div(class("lirlisting"),
                          [\display_header(listing, FN, Label, DB),
                            div(class("listingcontents"), pre([FC])),
                            \display_title_caption(listing, Meta)])).

file_contents(FN, FC) :-
        setup_call_cleanup(open(FN, read, In),
                                              read_string(In, _, FC),
                                              close(In)).

display_header(Item, FN, Label, DB) -->
        {   absolute_file_name(FN, Absolute_FN)
        },
        html(div(class("~aheader"-Item),
                          [span(class("~aname"-Item),
                                      ["~s~d:"-[DB.Item.namestr,
                                                          DB.Item.nr.Label]]),
                            \nbsp,
                            span(class("~afile"-Item),
                                      a(href(Absolute_FN), code(FN)))])).

display_title_caption(Item, M) -->
        html(div(class("~ameta"-Item),
                          [span(class("~atitle"-Item), [M.title]),
                            \display_optional_caption(Item, M)])).

display_optional_caption(Item, M) -->
        {   yaml{caption:Caption} :< M
        },
        !,
        html([" ", span(class("~acaption"-Item), [Caption])]).
display_optional_caption(_, _) --> [].

display(table_name, FN, Meta, Label, DB) -->
        {   file_contents(FN, FC)
        },
        html(div(class("lirtable"),
                          [\display_header(table_name, FN, Label, DB),
                            table([\[FC]]), % a real <table> element; table_name is only the internal keyword
                            \display_title_caption(table_name, Meta)])).

display(figure, FN, Meta, Label, DB) -->
        html(div(class("lirfigure"),
                          [\display_header(figure, FN, Label, DB),
                            \figure_contents(FN),
                            \display_title_caption(figure, Meta)])).

figure_contents(FN) -->
        html(div(class("figurecontents"),
                          [img(src("~s"-FN))])).

Chunk contents:

C92: ⟪lirhtml.pl
C80

chunk_content([], _DB) --> [].
chunk_content([C|Cs], DB) -->
        chunk_content_(C, Cs, DB).

chunk_content_(nl, Cs, DB) -->
        html("\n"),
        chunk_content(Cs, DB).
chunk_content_(text(T), Cs, DB) -->
        html(T),
        chunk_content(Cs, DB).
chunk_content_(code_use(L, R, _), Cs, DB) -->
        html(span(class("embeddeduse"),
                            [\openparen, \thinnbsp,
                              a([id(L), href("#~a"-R)],
                                  [\[DB.code.name.R]]),
                              \thinnbsp, \closeparen])),
        chunk_content(Cs, DB).
C93

Code chunk headers:

C93: ⟪lirhtml.pl
C92

chunk_defn(_M, L, DB) -->
        html(span([class("defn"),id(L)],
                            [span(class("chunknr"), "C~d:"-DB.code.nr.L), \nbsp,
                              \openparen, \thinnbsp,
                              a(href("#~a"-L), \[DB.code.name.L]),
                              \thinnbsp, \closeparen, \thinnbsp, \defeq])).
chunk_uses(M, DB) -->
        {   code{uses:Us} :< M
        }, !,
        html(span(class("chunkuses"), \chunk_uses_(Us, DB))).
chunk_uses(_, _) --> [].
C94

Uses in the header:

C94: ⟪lirhtml.pl
C93

chunk_uses_([U|Us], DB) -->
        html([br([]),
                  "Appears in ",
                  a(href("#~a"-U),["C~d"-DB.code.nr.U])]),
        chunk_uses_rest(Us, DB).
chunk_uses_(notused, _DB) --> html([br([]), "root chunk"]).
chunk_uses_rest([], _DB) --> [].
chunk_uses_rest([U|Us], DB) -->
        html([", ", a(href("#~a"-U), ["C~d"-DB.code.nr.U])]),
        chunk_uses_rest(Us, DB).
C95

Previous and next chunks in the header:

C95: ⟪lirhtml.pl
C94

chunk_next(M, DB) -->
        {   code{next:L} :< M
        }, !,
        html(span(class("chunknext"),
                  a(href("#~a"-L),
                      [\nextsym, \thinnbsp, "C~d"-DB.code.nr.L]))).
chunk_next(_, _DB) -->
        html(\defend).

chunk_prev(M, DB) -->
        {   code{prev:L} :< M
        },
        html([br([]),
                    span(class("chunkprev"),
                              a(href("#~a"-L),
                                  [\prevsym, \thinnbsp, "C~d"-DB.code.nr.L]))]).
chunk_prev(_, _DB) --> [].

chunk_defs(M, DB) -->
        {   code{defs:Ds} :< M
        }, !,
        html(span(class("chunkdefs"), \chunk_defs_(Ds, DB))).
chunk_defs(_, _) --> [].

chunk_defs_([D|Ds], DB) -->
        html([br([]),
                    "definition continued in ",
                    a(href("#~a"-D),
                        [\contsym, \thinnbsp, "C~d"-DB.code.nr.D])]),
        chunk_defs_rest(Ds, DB).
chunk_defs_rest([], _) --> [].
chunk_defs_rest([D|Ds], DB) -->
        html([" + ",
                    a(href("#~a"-D),
                        [\contmoresym, \thinnbsp, "C~d"-DB.code.nr.D])]),
        chunk_defs_rest(Ds, DB).

Layout

The system tries to separate content and layout as much as possible. It also aims to provide a sensible default layout for the human-readable documentation.

Code chunk language

The program in langs.cpp can be used as a filter to noweb. It deduces the programming language of code chunks based on the names of the chunks. This is a table that maps known extensions to programming language names.

C96: ⟪Ext-Lang
Appears in C111

{"pl", "prolog"},
{"sh", "bash"},
{"bash", "bash"},
{"cpp", "cpp"},
{"R", "R"},
{"awk", "awk"},
{"sed", "sed"},
{"css", "css"}

The program implements a state machine with the states necessary to extract code chunk names and the uses within a code chunk. The start state is out; when a code chunk starts, it transitions to code, where it expects a code chunk name, and transitions to content; while in content, it collects uses, and goes back to out at the end of the code chunk. Throughout, each line from input is saved to a list of all lines.

C97: ⟪langs.cpp
root chunk

Includes (langs)
Function definitions (langs)

int main()
{
        Variable definitions (langs)

out:
{   
        Out transitions
        goto out;
}

code:
{
        Code transitions
        goto code;
}

content:
{
        Content transitions
        Content: collect uses
        goto content;
}

end:
{
        Propagate language to uses
        Output all lines, with language after each defn
        return 0;
}

error:
        return 1;
}

When in out, end of input signals transition to the end state. A @begin code token signals a transition to code.

C98: ⟪Out transitions
Appears in C97

if (!std::getline(std::cin, line)) goto end;
lines.push_back(line);

if (string_prefix(line, {"@begin code"})) goto code;

When in code, end of input is an error. A defn token contains the code chunk name. The name is processed and the state machine transitions to content.

C99: ⟪Code transitions
Appears in C97

if (!std::getline(std::cin, line)) goto error;
lines.push_back(line);

if (string_prefix_rest(line, {"@defn "}, name)) {
        Process code chunk name
        goto content;
}

A code chunk name is inserted into the DAG of code chunks. Initially the set of neighbours is empty. Note that std::map::insert will only insert if the key does not yet exist, so it is safe to do this unconditionally.

C100: ⟪Process code chunk name
Appears in C99

uses.insert({name, {}}); 
Try to set code chunk language

If the language of the chunk can be guessed from the name, the chunk name and its language are recorded. The code chunk name is also added to the queue later used for the breadth-first traversal of the DAG of code chunks.

C101: ⟪Try to set code chunk language
Appears in C100

std::string cl;
if (name_dict_lang(name, langs, cl)) {
        chunk_lang.insert({name, cl});
        pending.push(name);
}

When in content, end of input is an error. An @end code token signals the transition back to out.

C102: ⟪Content transitions
Appears in C97

if (!std::getline(std::cin, line)) goto error;
lines.push_back(line);

if (string_prefix(line, {"@end code"})) goto out;

When in content, each @uses token is used to populate the set of neighbours of the current code chunk in the DAG.

C103: ⟪Content: collect uses
Appears in C97

std::string un;
if (string_prefix_rest(line, {"@use "}, un)) uses[name].insert(un);

The language of a code chunk can be deduced in two ways. First, from the code chunk name; this is done as soon as the code chunk is encountered for the first time (see Try to set code chunk language). Second, if a code chunk without a known language is used by a code chunk with a language, it inherits it. To achieve this, a breadth-first traversal of the DAG of code chunks is used. Initially, the queue holds the names of all chunks with known languages (also done while reading). Any child code chunk that does not yet have a language has its language set to that of the parent, and is pushed to the back of the queue.

C104: ⟪Propagate language to uses
Appears in C97

while (!pending.empty()) {
        std::string next = pending.front();
        pending.pop();

        for (auto c : uses[next]) {
                auto x = chunk_lang.find(c);
                if (x == chunk_lang.cend()) {
                        chunk_lang.insert({c, chunk_lang[next]});
                        pending.push(c);
                }
        }
}

At the end, all saved lines are emitted. After each defn token, a language token is added. For code chunks that don’t have a language, the txt language is used.

C105: ⟪Output all lines, with language after each defn
Appears in C97

for (auto l : lines) {
        std::cout << l << '\n';

        if (string_prefix_rest(l, {"@defn "}, name)) {
                auto x = chunk_lang.find(name);
                if (x != chunk_lang.cend())
                        std::cout << "@language " << x->second << '\n';
                else
                        std::cout << "@language txt\n";
        }
}

These variables are “global” to the main function; in other words, they are available to all states of the state machine.

C106: ⟪Variable definitions (langs)
Appears in C97

definition continued in C112

A list of all lines as they are read
The DAG of the code chunks as adjacency-list
The language of each code chunk
A queue for the traversal of the chunks DAG
Mapping of extensions to languages
C112

C107: ⟪A list of all lines as they are read
Appears in C106

std::list<std::string> lines{};

C108: ⟪The DAG of the code chunks as adjacency-list
Appears in C106

std::map<std::string, std::set<std::string>> uses{};

C109: ⟪The language of each code chunk
Appears in C106

std::map<std::string, std::string> chunk_lang{};

C110: ⟪A queue for the traversal of the chunks DAG
Appears in C106

std::queue<std::string> pending{};

C111: ⟪Mapping of extensions to languages
Appears in C106

const std::map<std::string, std::string> langs{Ext-Lang};

Two strings, one for the last line that was read and one for the name of the last code chunk defined with defn:

C112: ⟪Variable definitions (langs)
C106

std::string line;
std::string name;

The necessary standard libraries:

C113: ⟪Includes (langs)
Appears in C97

#include <iostream>
#include <string>
#include <list>
#include <map>
#include <set>
#include <queue>

C114: ⟪Function definitions (langs)
Appears in C97

Does a string have a prefix?
Can you guess the language from the name?

C115: ⟪Does a string have a prefix?
Appears in C114

definition continued in C116

bool
string_prefix(const std::string& s, const std::string& t)
{
        if (0 == s.compare(0, t.length(), t)) return true;
        return false;
}
C116

A three-argument version to get the rest, too:

C116: ⟪Does a string have a prefix?
C115

bool
string_prefix_rest(const std::string& s, const std::string& t,
                                      std::string& rest)
{
        if (string_prefix(s, t)) {
                rest = s.substr(t.length());
                return true;
        }
        return false;
}

C117: ⟪Can you guess the language from the name?
Appears in C114

bool
name_dict_lang(const std::string& n,
                              const std::map<std::string, std::string>& d,
                              std::string& l)
{
        size_t x = n.find_last_of('.');
        if (x == std::string::npos) return false;

        ++x;
        auto ext_lang = d.find(n.substr(x));
        if (ext_lang == d.cend()) return false;

        l = ext_lang->second;
        return true;
}

HTML

For HTML documentation, lir.css is used.

C118: ⟪lir.css
root chunk

Text width and margin
Headers in sans-serif
Quoted code in monospace
Use names typesetting
Highlight chunk names on hover
List of chunks formatting
Display item formatting

The text width is limited and a left margin is inserted to improve readability.

C119: ⟪Text width and margin
Appears in C118

p {
        max-width: 14cm;
}
blockquote {
        max-width: 11cm;
        font-size: small;
}
ul, ol, dl,
span.chunkdefs {
        max-width: 12cm;
}
body {
        padding-left: 1cm;
}

C120: ⟪Headers in sans-serif
Appears in C118

h1, h2, h3 {
        font-family: sans-serif;
        max-width: 12cm;
}

C121: ⟪Use names typesetting
Appears in C118

span.sourcefile a,
span.resultfile a,
span.listingfile a,
span.figurefile a,
span.chunkdefs a,
span.chunkuses a,
span.chunkprev a,
span.chunknext a,
span.defn a,
span.quoteduse a,
span.embeddeduse a {
        text-decoration-line: none;
}
span.chunkdefs,
span.chunkuses,
span.chunkprev,
span.chunknext {
        font-size: small;
}
span.quoteduse {
        font-family: serif;
}
span.embeddeduse {
        font-family: serif;
}
span.chunknr {
        font-weight: bold;
}

C122: ⟪Highlight chunk names on hover
Appears in C118

span.quoteduse a {
        color: darkGreen;
}
span.quoteduse a:hover {
        color: limeGreen;
}
span.embeddeduse a {
        color: darkblue;
        font-style: italic;
}
span.embeddedusesym {
        font-style: normal;
}
span.embeddeduse a:hover {
        color: dodgerBlue;
}
span.sourcefile a,
span.resultfile a,
span.listingfile a,
span.figurefile a,
span.chunkdefs a,
span.chunkprev a,
span.chunknext a,
span.defn a,
span.chunkuses a {
        color: darkred;
}
span.sourcefile a:hover,
span.resultfile a:hover,
span.listingfile a:hover,
span.figurefile a:hover,
span.chunkdefs a:hover,
span.chunkprev a:hover,
span.chunknext a:hover,
span.defn a:hover,
span.chunkuses a:hover {
        color: darkGoldenRod;
}

C123: ⟪Quoted code in monospace
Appears in C118

span.quote {
        font-family: monospace;
}

C124: ⟪List of chunks formatting
Appears in C118

ul.chunkslist {
        list-style-type: square;
}

C125: ⟪Display item formatting
Appears in C118

div.lirtable th {
        border-style: none none solid none;
        border-width: 0 0 1px 0;
}
div.lirtable th,
div.lirtable td {
        padding-left: 3.3mm;
        padding-right: 3.3mm;
        padding-top: 0.9mm;
        padding-bottom: 0.9mm;
}
div.lirtable table {
        font-family: sans-serif;
        font-size: small;
        margin-top: 3mm;
        margin-bottom: 3mm;
        border-collapse: collapse;
        border-style: solid none solid none;
        border-width: 2px 0 1px 0;
}
div.lirlisting {
        max-width: 14cm;
}
span.sourcename,
span.resultname,
span.tablename,
span.figurename,
span.listingname {
        font-weight: bold;
}
div.figurecontents img {
        display: block;
        max-width: 14cm;
        width: auto;
        height: auto;
}
div.listingcontents {
        font-size: small;
        margin-top: 3mm;
        margin-bottom: 3mm;
        border-style: solid none solid none;
        border-width: 0.3mm;
}
span.sourcefile,
span.resultfile,
span.tablefile,
span.figurefile,
span.listingfile {
        font-style: italic;
}
div.tablemeta,
div.figuremeta,
div.listingmeta {
        max-width: 14cm;
        font-size: small;
}
span.tabletitle,
span.figuretitle,
span.listingtitle {
        font-weight: bold;
}
div.codechunk,
div.lirlisting,
div.lirtable,
div.lirfigure {
        margin-bottom: 3mm;
}

Bibliography

O’Keefe, Richard A. 1990. The Craft of Prolog. Edited by Ehud Shapiro. The MIT Press.

Ramsey, Norman. 1992. “The Noweb Hacker’s Guide.” Departement of Computer Science Princeton University September, September. http://www.cs.tufts.edu/~nr/noweb/guide.ps.

Wielemaker, Jan, Tom Schrijvers, Markus Triska, and Torbjörn Lager. 2012. “SWI-Prolog.” Theory and Practice of Logic Programming 12 (1-2): 67–96.