2020-01-30
Lir is a tool for reproducible computing. The name Lir stands for literate, reproducible computing. With Lir you can organize, document, and automate your workflow.
Any combination of programming languages and software platforms can be used in the same workflow. All executable code is maintained in the Lir source file, along with rules for generating the results of running each program. A Lir source file additionally contains the documentation of a project.
In this document, we provide detailed Lir installation instructions, followed by a User’s Guide, and the full implementation of Lir at the end.
This document is self-hosting: the implementation of Lir presented here can be used to produce the final human-readable version of this source file. This makes informal validation somewhat easier (“does the final document look right?”), but it also means that the only guarantee is that it implements enough of Lir to be able to compile itself.
You need a reasonably modern Linux/BSD operating system. Any of the major GNU/Linux and BSD distributions, as well as OS X, fulfills that requirement. It should be possible to use Lir on Microsoft Windows 10, but at the moment this is not officially supported; if you try to install Lir on Windows 10, please share your experience.
In particular, you will need an up-to-date version of the GNU Toolchain (autoconf, make, gcc, binutils, glibc), as well as Bash along with the standard command line tools like grep, sed, and awk (to name a few).
Furthermore, you need to install the following software:
Noweb, “A Simple, Extensible Tool for Literate Programming.” Available in the Ubuntu official repositories. You can also look at the Debian package or even attempt to compile it from source (if you do that, make sure you get version 2.11b).
SWI-Prolog, a comprehensive free Prolog environment. It is easy to install from source, and for Ubuntu, the maintainer provides a PPA. Just make sure you have the latest development or stable release, and not one of the very outdated releases available in the official repositories of most distributions.
Pandoc, a universal document converter. Installing from source is a bit of a hassle, but you don’t have to: just get the installer for the latest release. It is important that you have the latest (patched) release, as there have been bugs in major releases that get in the way.
Lir makes use of existing tools for literate programming, making, and markup. A working knowledge of these tools is not required, but it is beneficial, at least at the beginning.
It would help you if you had some working knowledge of writing makefiles, as understood by GNU Make. Reading the official documentation is a good start; at least take a look at the first few chapters.
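For a taste of what is to come, the make rules later in this document all follow the same basic shape; here is a minimal, made-up example (in make, the automatic variable $< stands for the first prerequisite, and $@ for the target):
output.txt: process.sh input.txt
	bash $< input.txt > $@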
You need to have a basic idea of the concept of markdown. Armed with this knowledge, open the excellent manual provided by Pandoc and use it as a reference.
noweb
Understanding the concept of literate programming is a must. You may take a look at the original publication by Donald E. Knuth (pdf link); you can also skip this at first. The Wikipedia page for noweb contains enough detail to get you acquainted with the syntax. A Lir source file is a valid noweb source file.
Once you have all prerequisites, clone the Lir repository and change to the lir directory:
$ git clone https://github.com/borisvassilev/lir.git
$ cd lir
Before you can install Lir, three paths that might be different for each installation need to be adjusted.
Figure out where you have installed the noweb library files. If you installed from the official Ubuntu repositories, this should be /usr/lib/noweb.
If you cannot find this directory, the directory where the files were installed is mentioned towards the bottom of the nowebfilters man page, under the FILES section. So try this:
$ man nowebfilters
If for some reason this doesn’t work either, you can try looking for one of the files in the library, for example:
$ locate emptydefn
If you cannot find a file called emptydefn on your system, you have not installed noweb correctly: please do so now.
Now that you know where the library files are, you need to open the Lir source file lir.lir and edit the three paths on lines 96, 100, and 105 of this file.
First, the folder where all lir components are installed:
C1: ⟪ Path to lir ⟫ ≡
Appears in C10, C15, C35, C36
"$HOME"/lib/lir◼
Then, the folder where the main lir script will be installed; make sure it is on your system PATH.
C2: ⟪ Path to lir binaries ⟫ ≡
Appears in C10
"$HOME"/bin◼
Finally, the folder containing the noweb library files. If you installed noweb normally (and as root), this will read /usr/lib/noweb.
C3: ⟪ Path to noweb ⟫ ≡
Appears in C10, C34
"$HOME"/lib/noweb◼
Now save and close the file lir.lir, and run:
$ bash bootstrap
This will extract and install all Lir files to your computer, and use the installed Lir to compile this source file to an HTML document. You can use the command line above to reinstall Lir if you update it, or if you change the source.
Lir defines a syntax for describing a reproducible computation, and provides the tools for running the computation on the input files to obtain results. You must have already read the Tutorial; you can use this Guide as a reference.
The Lir source file is a valid noweb source file. Lir adds semantics by recognizing and interpreting keywords embedded in code chunk names. A Lir source file has the extension “.lir”. It is a plain text file. Within this file, there are code chunks, as defined by noweb, and documentation chunks (everything between and around code chunks). Documentation chunks may be formatted and structured with markdown as understood by Pandoc.
A code chunk starts with a code chunk header, followed by the contents of the code chunk, and a code chunk footer. The code chunk header is the string “<<Code chunk name>>=” on a line by itself. The code chunk footer is the string “@” on a line by itself. If the code chunk must contain a line starting with “<<” or “@”, use an additional “@” to escape these: “@<<” and “@@”.
A code chunk ends with a footer line or with the header line of another code chunk.
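For illustration, a complete chunk in the raw syntax looks like this; the name hello.sh and its contents are made up for this example:
<<hello.sh>>=
echo "Hello, Lir!"
@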
Here is a link to the Lir source of the following few paragraphs; please read those in the raw Lir source to understand the code chunk syntax and examples!
Below, we see a code chunk with the name “: Just an example”.
C4: ⟪ : Just an example ⟫ ≡
root chunk
These are the contents
of this chunk. ⟪ Another code chunk ⟫ is included by a name reference.◼
Code chunks may refer to other code chunks by name: just put the name in double angle brackets. Whenever a code chunk is extracted, all code chunk references are replaced with their contents. If we define:
C5: ⟪ Another code chunk ⟫ ≡
Appears in C4
⟪ Yet another code chunk ⟫Moar content!1!!!1⟪ Yet another code chunk ⟫◼
C6: ⟪ Yet another code chunk ⟫ ≡
Appears in C5
***◼
Then, the full contents of ⟪ : Just an example ⟫, if extracted from the source file, will be:
Listing: just-an-example-tangled
These are the contents of this chunk. ***Moar content!1!!!1*** is included by a name reference.
This is how we generate the listing:
C7: ⟪ :make ⟫ ≡
root chunk
definition continued in ↓ C69
just-an-example-tangled: lir.lir
	notangle -R": Just an example" $< > $@
⇩ C69
(Source: lir.lir)
(If you don’t understand what happens in the lines above, please read the Tutorial.)
All code chunks that have names starting with a colon (:) are given special treatment. The code chunk ⟪ : Just an example ⟫ is a code chunk that will not be considered for extracting.
We represent the workflow as a directed, acyclic graph (DAG). In this DAG, each node is a data object: a file. Some of these files are input data: they are the sources of the DAG. Some of these files are final results: they are the sinks of the DAG.
The sources of the DAG (the input files to the workflow) must be declared explicitly in a special code chunk with a name starting with “:source”, followed by a single space and the name of the input file.
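In the raw source, such a declaration could look like this (the file name is hypothetical, and the chunk body is left empty here; only the file name in the chunk name matters):
<<:source data/measurements.csv>>=
@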
To declare a placeholder for a display item – a Figure, a Table, or a Listing – use a code chunk with a special name. The name must start with “:figure”, “:table”, or “:listing”, followed by a single space and the name of the file that should be displayed.
Here is a link to the following few paragraphs in the Lir source file; you need to look at the source to understand the examples!
In the contents of a display item, you must declare the title of the display item, and you may declare a caption for the display item. This is a figure with a caption:
Figure: example.svg
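For reference, the declaration behind a figure like this one follows the pattern below in the raw source; the title and caption texts are illustrative, not the exact ones used:
<<:figure example.svg>>=
title: The sine and the cosine.
caption: |
  An example figure, generated by the R script shown below.
@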
All dependencies and rules are defined within make code chunks. The names of these code chunks must begin with “:make”. Here is an example:
C8: ⟪ :make The example above ⟫ ≡
root chunk
example.svg: make-example.R
	Rscript --vanilla $< $@◼
This defines a rule for generating the example SVG image displayed in ⟪ :figure example.svg ⟫. The rule has one target, example.svg, and one prerequisite, the R script make-example.R. It defines a recipe for generating the target: execute Rscript with the command line option --vanilla, the script as the first argument, and the target file as the second argument. (Note that you need to have R installed on your computer if you want to generate this figure and the final document.)
A root code chunk is a code chunk that is not referenced by name in any other code chunk.
All root code chunks that don’t have a special name are extracted to files during the first stage of Lir, lir-tangle. The name of the code chunk determines the name of the file. Here is the executable R script that generates the example above:
C9: ⟪ make-example.R ⟫ ≡
root chunk
out <- commandArgs(trailingOnly = TRUE)[1]
svg(out, 8, 5)
curve(sin, -2*pi, 2*pi, type="l", col="darkred")
curve(cos, -2*pi, 2*pi, type="l", col="darkblue", add = TRUE)
dev.off() -> foo◼
You can add citations in the markup recognized by Pandoc. The easiest way to produce the citations and a bibliography is to add a bibliography file in the working directory. The file must have the same base name as the Lir source file, and the extension “.bib”. For example, since this source file is named lir.lir, the bibliography is in lir.bib. The bibliography file should be in the format recognized by BibLaTeX; a good free software for maintaining a bibliography is JabRef, available for all operating systems.
The bibliography will be at the end of the final document. If you want to have it listed in the table of contents, you can put a section heading at the end of the document. The last line of the Lir source file can be for example “# References” or “# Bibliography”.
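For example, if lir.bib contains an entry with the key Ramsey1992 (an illustrative key), the documentation text can cite it with the usual Pandoc syntax:
The pipeline representation is described by Ramsey [@Ramsey1992].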
Lir provides the command line program lir. To extract all code, execute it, and compile the final document, you need to run lir with the name of the Lir source file as the only argument. For example, if you are in the working directory of the Lir distribution, you can do:
$ lir lir.lir
This will run the three steps: tangling, making, and weaving. You can also do each step separately, by providing one of tangle, make, or weave as the first argument to lir:
$ lir tangle lir.lir
$ lir make lir.lir
$ lir weave lir.lir
This can be useful if you need to run lir-make on another machine.
When a Lir source file is interpreted, a current state of the project is created in the hidden directory .lir inside the working directory. Every time lir is re-run on a source file, the state is updated. Only files that have changed are updated: thus, running Lir does not run programs that have not changed on input data that has not changed.
The state contains all executable files extracted from the Lir source file, makefiles to generate all results, symbolic links to all input data, and all results. The state in .lir can be copied as it is to another machine, and used to generate all results remotely.
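For example, one possible way to run the heavy making step elsewhere (host and directory names are hypothetical) is to tangle locally, copy the state, and invoke the same makefiles remotely:
$ lir tangle analysis.lir
$ rsync --archive --copy-links .lir/ remote:analysis/.lir/
$ ssh remote 'cd analysis && make --jobs --directory=.lir --makefile=.makeall --makefile=.makedag all'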
It is safe to remove the state by deleting the .lir directory and all its contents. This will force all intermediate results to be re-generated.
This is the complete source of Lir.
C10: ⟪ build.sh ⟫ ≡
root chunk
⟪ Bash header ⟫
echo "*** Making and Installing..."
notangle -t8 -filter emptydefn -RMakefile lir.lir > Makefile
export LIRPATH=⟪ Path to lir ⟫
export LIRBINPATH=⟪ Path to lir binaries ⟫
export NOWEBPATH=⟪ Path to noweb ⟫
make uninstall
make clean
make
make install
echo "*** Done!"
rm -rf .lir
echo "*** Using lir to tangle and weave lir..."
lir lir.lir
echo "*** Weaved document is in lir.html!"
cp -v lir.html docs/lir.html
echo "*** Generating index..."
title=$(mktemp)
echo "---" > "$title"
echo "title: Lir" >> "$title"
echo "author: Boris Vassilev (University of Helsinki)" >> "$title"
echo "..." >> "$title"
sed "s/^@/@@/" README.md \
| sed "s/^<</@<</" \
| cat "$title" - \
> docs/index.lir
(cd docs ; lir index.lir)
rm "$title"
echo "*** Generating index done!"◼
This script is used when the bash script file bootstrap is evaluated.
This makefile is used by the build script above.
C11: ⟪ Makefile ⟫ ≡
root chunk
nwpipemodules = driver.pl nwpipe.pl lirhtml.pl yaml.pl
nwpipesources = nwpipe-pandoc.pl $(nwpipemodules)
tangled = lir lir-weave $(nwpipesources) lir.css
all : lir.lir
	$(NOWEBPATH)/markup -t lir.lir \
	| $(NOWEBPATH)/emptydefn \
	| $(NOWEBPATH)/mnt -t $(tangled)
	chmod u+x lir lir-weave
install :
	mkdir --parents $(LIRPATH)
	cp --verbose --preserve \
	    lir.css $(nwpipesources) lir-weave $(LIRPATH)
	mkdir --parents $(LIRBINPATH)
	cp --verbose --preserve \
	    lir $(LIRBINPATH)
.PHONY : install
clean :
	-rm $(tangled)
.PHONY : clean
uninstall :
	-rm -r $(LIRPATH)
	-rm $(LIRBINPATH)/lir
.PHONY : uninstall◼
The portable ⟪ Bash header ⟫ is, according to Stack Overflow:
C12: ⟪ Bash header ⟫ ≡
Appears in C10, C15, C32
definition continued in ↓ C13
#! /usr/bin/env bash
⇩ C13
Furthermore, we set the scripts to immediately exit with error on failure, consider failure in a pipe a failure, and to consider “empty” (unset) variables an error:
C13: ⟪ Bash header ⟫ ≡
⇧ C12
set -o errexit
set -o pipefail
set -o nounset◼
The tools provided by lir support two goals: generating all results, and generating the final human-readable documentation, both from the same lir source file. This is usually done in three independent steps: tangling, making, and weaving.
Tangling: Extract all source code from the lir source file and bring in external dependencies.
This might be scripts that are immediately executable, or code that needs to be compiled.
For externally maintained files to be managed by lir, they need to be explicitly declared in the lir source file. Example use cases would be input data files, or a bibliography file maintained by a reference manager.
Making: Generate all results.
This is a step that usually will include resolving dependencies. In the case of intermediate analysis steps, a final result will depend on an intermediate result. For example, a figure might be generated from the results of an analysis of the input data.
In the case of programs that need to be compiled, a compilation step will have to occur before actually running the program.
Weaving: Compile the final documentation.
Once all results and figures have been generated, the original lir source file is processed to obtain a valid Pandoc markdown source file. At that point, Pandoc is used to compile this to a final human-readable file with any results and figures generated in the previous step.
The program is installed as a script that can be invoked with one of the three actions, tangle, make, or weave, as the first argument.
C14: ⟪ lir ⟫ ≡
root chunk
⟪ lir.bash ⟫◼
C15: ⟪ lir.bash ⟫ ≡
Appears in C14
⟪ Bash header ⟫
⟪ Define functions used by lir ⟫
if [ "$#" -eq 0 ]
then :
else
    if [ $# -eq 1 ]
    then
        lir tangle "$1" && lir make "$1" && lir weave "$1" && exit
    fi
    LIR_SOURCE="$2"
    LIR_DIR=.lir
    MORE_OPTS="${@:3}"
    BASENAME=$(basename "$LIR_SOURCE" .lir)
    STATE_SOURCE="__LIR_SOURCE_$LIR_SOURCE"
    case "$1" in
        tangle)
            mkdir --verbose --parents "$LIR_DIR"
            tmpdir=$(mktemp --directory --tmpdir="$LIR_DIR")
            awk '⟪ Nameless code chunks as continuations .awk ⟫' \
                "$LIR_SOURCE" \
            | awk '⟪ Close all codechunks with @ .awk ⟫' \
            | awk -v \
                evalcmd="bash -o errexit -o pipefail -o nounset" \
                '⟪ Evaluate code chunk contents .awk ⟫' \
            > "$LIR_DIR"/"$STATE_SOURCE"
            noroots "$LIR_DIR"/"$STATE_SOURCE" \
            | ⟪ Drop double angle brackets ⟫ \
            | ⟪ Remove empty lines ⟫ \
            > "$tmpdir"/rootchunks
            < "$tmpdir"/rootchunks ⟪ Remove special chunknames ⟫ \
            > "$tmpdir"/filechunks
            < "$tmpdir"/rootchunks ⟪ Get source chunknames ⟫ \
            > "$tmpdir"/sourcechunks
            < "$tmpdir"/rootchunks ⟪ Get sink chunknames ⟫ \
            > "$tmpdir"/sinkchunks
            < "$tmpdir"/filechunks lir-mtangle "$LIR_DIR"/"$STATE_SOURCE" "$LIR_DIR"
            < "$tmpdir"/sourcechunks lir-use "$LIR_DIR"
            if grep --quiet --line-regexp ':make.*' "$tmpdir"/rootchunks
            then
                noroots "$LIR_DIR"/"$STATE_SOURCE" \
                | sed -n 's/^<<\(:make.*\)>>$/-R\1/p' \
                | tr '\n' '\0' \
                | xargs --null notangle -t8 \
                    "$LIR_DIR"/"$STATE_SOURCE" \
                > "$LIR_DIR"/.makedag
            else
                echo '' > "$LIR_DIR"/.makedag
            fi
            echo -n 'all: ' > "$LIR_DIR"/.makeall
            < "$tmpdir"/sinkchunks tr '\n' ' ' >> "$LIR_DIR"/.makeall
            echo ";" >> "$LIR_DIR"/.makeall
            echo -n "$BASENAME.html: $STATE_SOURCE " > "$LIR_DIR"/.makehtml
            < "$tmpdir"/sinkchunks tr '\n' ' ' >> "$LIR_DIR"/.makehtml
            echo ";" >> "$LIR_DIR"/.makehtml
            # Check for a bib file and add bibliography
            LIRBIB=""
            if [ -r "$BASENAME".bib ]
            then
                BIBTARGET=$(readlink -f "$BASENAME".bib)
                LIRBIB="--bibliography=$BIBTARGET"
            fi
            echo -e "\t "'⟪ Path to lir ⟫/lir-weave '"$LIRBIB"' < $< > $@' \
            >> "$LIR_DIR"/.makehtml
            rm -r "$tmpdir"
            ;;
        make)
            make --jobs --directory="$LIR_DIR" \
                --makefile=.makeall --makefile=.makedag \
                all
            ;;
        weave)
            make --directory="$LIR_DIR" \
                --makefile=.makehtml \
                "$BASENAME".html
            cp --verbose --update ".lir/$BASENAME.html" "$BASENAME.html"
            ;;
        *)
            echo "ERROR: unknown command $1"
            ;;
    esac
fi # if [ "$#" == "0" ]◼
C16: ⟪ Define functions used by lir ⟫ ≡
Appears in C15
⟪ lir-mtangle ⟫
⟪ lir-use ⟫◼
C17: ⟪ lir-mtangle ⟫ ≡
Appears in C16
lir-mtangle () {
    while IFS='' read -r chunk
    do
        notangle -t8 -R"$chunk" "$1" | cpif "$2/$chunk"
    done
}◼
Not all data in an analysis naturally fits into the lir source file. Examples already mentioned above are the input data, or an externally maintained file like a bibliography.
Taking such a file into use simply means making a symbolic link to the file in the working directory. However, keep in mind that the externally maintained file still needs to be in the project directory, and its name should be a path relative to the project directory. The make utility will check the timestamp on the target of the link, so when a “used” file changes, this will be taken into consideration.
The argument to the function is the directory in which the symbolic links will be made. The file names are read from standard input, one per line.
C18: ⟪ lir-use ⟫ ≡
Appears in C16
lir-use () {
    while IFS='' read -r used
    do
        useddir=$(dirname "$used")
        mkdir --verbose --parents "$1/$useddir"
        original=$(readlink --canonicalize "$used")
        if [ -f "$original" ]
        then
            ln --symbolic --force "$original" "$1/$used"
        else
            >&2 echo "LIR ERROR: File not found! ($original)"
            exit 1
        fi
    done
}◼
C19: ⟪ Drop double angle brackets ⟫ ≡
Appears in C15
sed -n 's/^<<\(.*\)>>$/\1/p'◼
C20: ⟪ Remove empty lines ⟫ ≡
Appears in C15
sed -n '/^\s*$/!p'◼
C21: ⟪ Remove special chunknames ⟫ ≡
Appears in C15
sed -n '/^:/!p'◼
C22: ⟪ Get source chunknames ⟫ ≡
Appears in C15
sed -n 's/^:source \(.\+\)$/\1/p'◼
C23: ⟪ Get sink chunknames ⟫ ≡
Appears in C15
sed -n 's/^:\(figure\|table\|listing\|result\) \(.\+\)/\2/p'◼
C24: ⟪ Nameless code chunks as continuations .awk ⟫ ≡
Appears in C15
BEGIN {
    ⟪ Setup awk to treat lines as single-field records ⟫
    last_name = "<<>>="
}
/^<<>>=$/ { $0 = last_name }
/^<<.+>>=$/ { last_name = $0 }
{ print $0 }◼
C25: ⟪ Close all codechunks with @ .awk ⟫ ≡
Appears in C15
BEGIN {
    ⟪ Setup awk to treat lines as single-field records ⟫
    in_chunk = 0
}
/⟪ Begin code awk regex ⟫/ {
    if (in_chunk) print "@"
    in_chunk = 1
}
/⟪ End code awk regex ⟫/ { in_chunk = 0 }
{ print $0 }◼
The contents of each code chunk named :eval will be evaluated by an invocation of bash. Whatever is written to standard output will be pasted as it is in place of the code chunk before anything else is done with the source file.
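For example, a chunk like the following (a made-up illustration) is replaced by the current date before any further processing of the source file:
<<:eval>>=
date --iso-8601
@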
C26: ⟪ Evaluate code chunk contents .awk ⟫ ≡
Appears in C15
BEGIN {
    ⟪ Setup awk to treat lines as single-field records ⟫
}
/^<<:eval>>=$/ {
    ⟪ Execute contents as standard input of [[evalcmd]] ⟫
    ⟪ Write output of [[evalcmd]] to awk’s output ⟫
    next
}
{ print $0 }◼
To make awk treat lines of input as records with one field each, set both the record and the field separator to the newline character. With these settings, an empty line (nothing but a newline) will have 0 fields (NF == 0) and a length of 0 (length($0) == 0).
C27: ⟪ Setup awk to treat lines as single-field records ⟫ ≡
Appears in C24, C25, C26
RS="\n"
FS="\n"◼
C28: ⟪ Execute contents as standard input of [[evalcmd]] ⟫ ≡
Appears in C26
while (getline > 0 && $0 !~ /⟪ End code awk regex ⟫/)
    print $0 |& evalcmd
close(evalcmd, "to")◼
C29: ⟪ Write output of [[evalcmd]] to awk’s output ⟫ ≡
Appears in C26
save_rs = RS
RS = "^$"
evalcmd |& getline x
close(evalcmd)
printf "%s", x
RS = save_rs◼
C30: ⟪ Begin code awk regex ⟫ ≡
Appears in C25
^<<.*>>=$◼
C31: ⟪ End code awk regex ⟫ ≡
Appears in C25, C28
^@◼
Weaving means generating the final human-readable documentation from the lir source file. The original source is transformed by several filters before being passed to Pandoc for generating an HTML document that can be viewed in a browser. Everything is wrapped as a bash script that will be installed, along with ⟪ lir ⟫, in LIRBINPATH. It reads the lir source from standard input and writes the HTML to standard output.
C32: ⟪ lir-weave ⟫ ≡
root chunk
⟪ Bash header ⟫
⟪ Weave lir source to HTML ⟫◼
C33: ⟪ Weave lir source to HTML ⟫ ≡
Appears in C32
⟪ Source to noweb pipeline representation ⟫ \
| ⟪ Noweb pipeline representation to Pandoc source ⟫ \
| ⟪ Pandoc source to HTML ⟫◼
A custom noweb pipeline is used, as we definitely want to be able to use “anonymous” chunks as continuations of the previous chunk. Since tabs can be significant, for example in Makefiles, those are preserved (the -t option to markup). The -delay option to noidx makes sure that the index of code chunks is emitted before the last document chunk. This is necessary to be able to generate a bibliography with Pandoc.
C34: ⟪ Source to noweb pipeline representation ⟫ ≡
Appears in C33
"⟪ Path to noweb ⟫"/markup -t \
| "⟪ Path to noweb ⟫"/noidx -delay◼
To make it marginally easier to deal with the pipeline representation, we completely remove any empty text tokens from it before transforming it to Pandoc source. We then use ⟪ nwpipe-pandoc.pl ⟫ to transform the noweb pipeline representation of the lir source file to a valid Pandoc source file.
C35: ⟪ Noweb pipeline representation to Pandoc source ⟫ ≡
Appears in C33
sed -n '/^@text $/!p' \
| swipl -q -g main -t "halt(2)" ⟪ Path to lir ⟫/nwpipe-pandoc.pl◼
C36: ⟪ Pandoc source to HTML ⟫ ≡
Appears in C33
pandoc "$@" \
--preserve-tabs \
--from=markdown \
--table-of-contents \
--standalone \
--self-contained \
--to=html5+smart \
--css="⟪ Path to lir ⟫"/lir.css \
--output=-
Lir to Pandoc
The program ⟪ nwpipe-pandoc.pl ⟫ is written in Prolog, using the SWI-Prolog implementation (Wielemaker et al. 2012). It is written as a stand-alone command line program: once compiled, it can be run from the command line as a part of a pipe. It uses the module ⟪ driver.pl ⟫ to do the actual work.
C37: ⟪ nwpipe-pandoc.pl ⟫ ≡
root chunk
⟪ No banner or informational messages ⟫
⟪ Expand meta-predicates at compile time ⟫
:- use_module(driver, [nwpipeline_pandoc/1]).
main :-
    current_prolog_flag(argv, Argv),      % command line arguments
    goal_is_det(nwpipeline_pandoc(Argv)),
    halt.                                 % exit with success
main :-
    throw(error(mode_error(failure),_)),
    halt(1).                              % otherwise, exit with failure (1)
⟪ Succeed once without choice points ⟫◼
During development, it is good to know if the program has left behind any unintentional choice points – they are certainly errors, as the program is designed to succeed deterministically. They would be “hidden” by the halt/0 used at the end of the main program.
C38: ⟪ Succeed once without choice points ⟫ ≡
Appears in C37
goal_is_det(Goal) :-
    setup_call_cleanup(true, Goal, Det = true),
    (   Det == true
    ->  true
    ;   !,
        throw(error(mode_error(notdet),_))
    ).◼
Prolog programs are usually used from the “top level” (the Prolog REPL), not as command-line tools. We need to explicitly turn off the REPL banner and informational messages written to standard output.
C39: ⟪ No banner or informational messages ⟫ ≡
Appears in C37
:- set_prolog_flag(verbose, silent).◼
Meta-predicates (predicates that take other predicates as arguments) are fundamentally slow: they need to evaluate the passed predicate dynamically. This can be avoided for the more commonly used meta-predicates (maplist/1+N, forall/2, etc.) by treating them as macros and expanding them at compile time.
C40: ⟪ Expand meta-predicates at compile time ⟫ ≡
Appears in C37
:- use_module(library(apply_macros)).◼
The driver uses ⟪ nwpipe.pl ⟫ to parse its input to native Prolog terms and ⟪ lirhtml.pl ⟫ to emit a valid Pandoc source document with embedded HTML fragments.
C41: ⟪ driver.pl ⟫ ≡
root chunk
:- module(driver, [nwpipeline_pandoc/1]).
:- use_module(nwpipe, [nwpipe_term/2]).
:- use_module(lirhtml, [emit_html/2]).
nwpipeline_pandoc(_) :-
    I = user_input,
    read_string(I, _, S),
    split_string(S, "\r\n", "\r\n", Str_Ls),
    maplist(string_codes, Str_Ls, Ls),
    nwpipe_term(Ls, T),
    emit_html(user_output, T).◼
noweb pipeline
This module is used by the ⟪ driver.pl ⟫ to convert the list of lines representing the line-oriented noweb pipeline to a Prolog term. This is done in two steps.
C42: ⟪ nwpipe.pl ⟫ ≡
root chunk
:- module(nwpipe, [nwpipe_term/2]).
nwpipe_term(Pipe, Doc) :-
    maplist(nwtoken_term, Pipe, Terms),
    phrase(lir_doc(Doc), Terms).
⟪ Noweb token → Prolog term ⟫
⟪ Flat list of terms → structured term ⟫◼
A noweb token in the noweb pipeline representation (Ramsey 1992) is parsed to become a Prolog term. Here, we use an approach to parsing them described in “The Craft of Prolog” (O’Keefe 1990). We use the keyword at the beginning of each token to look up (deterministically) in a table the list of items that this token contains. Then, each item is converted to a Prolog term using a mini-interpreter for the “language” of noweb tokens.
C43: ⟪ Noweb token → Prolog term ⟫ ≡
Appears in C42
:- use_module(library(dcg/basics), [nonblanks//1, integer//1]).
nwtoken_term([0'@|Token], Term) :-
    phrase(token(Back, Term), Token, Back).
token(Back, Term) -->
    nonblanks(NB),
    { atom_codes(Key, NB),
      keyword(Key, Items, Back, Term)
    },
    items(Items).
⟪ Keyword table ⟫
⟪ Mini-interpreter for items ⟫◼
C44: ⟪ Keyword table ⟫ ≡
Appears in C43
⟪ Structural keywords ⟫
⟪ Tagging keywords ⟫◼
C45: ⟪ Structural keywords ⟫ ≡
Appears in C44
keyword(begin, [space, chunk_kind(K)], _, K).
keyword(end, [], _, end).
keyword(text, [space, string_rest(S, Rest)], Rest, text(S)).
keyword(nl, [], [], nl).
keyword(defn, [space, string_rest(S, Rest)], Rest, defn(S)).
keyword(use, [space, string_rest(S, Rest)], Rest, use(S)).
keyword(quote, [], [], quote).
keyword(endquote, [], [], endquote).◼
C46: ⟪ Tagging keywords ⟫ ≡
Appears in C44
keyword(file, [space, atom_rest(File, Rest)], Rest, file(File)).
keyword(xref, [space, xref(XRef, Rest)], Rest, XRef).
keyword(index, [space, index(Index, Rest)], Rest, Index).
⟪ Cross-referencing keyword table ⟫
⟪ Index table ⟫◼
C47: ⟪ Cross-referencing keyword table ⟫ ≡
Appears in C46
⟪ Basic cross-referencing ⟫
⟪ Linking previous and next definitions of a code chunk ⟫
⟪ Continued definitions of the current chunk ⟫
⟪ Chunks where the code is used ⟫
⟪ The list of all code chunks ⟫◼
C48: ⟪ Index table ⟫ ≡
Appears in C46
index(beginindex, [], [], index_beginindex).
index(endindex, [], [], index_endindex).◼
C49: ⟪ Basic cross-referencing ⟫ ≡
Appears in C47
xref(label, [space, atom_rest(L, Rest)], Rest, xref_label(L)).
xref(ref, [space, atom_rest(L, Rest)], Rest, xref_ref(L)).◼
C50: ⟪ Linking previous and next definitions of a code chunk ⟫ ≡
Appears in C47
xref(prevdef, [space, atom_rest(L, Rest)], Rest, xref_prevdef(L)).
xref(nextdef, [space, atom_rest(L, Rest)], Rest, xref_nextdef(L)).◼
C51: ⟪ Continued definitions of the current chunk ⟫ ≡
Appears in C47
xref(begindefs, [], [], xref_begindefs).
xref(defitem, [space, atom_rest(L, Rest)], Rest, xref_defitem(L)).
xref(enddefs, [], [], xref_enddefs).◼
C52: ⟪ Chunks where the code is used ⟫ ≡
Appears in C47
xref(beginuses, [], [], xref_beginuses).
xref(useitem, [space, atom_rest(L, Rest)], Rest, xref_useitem(L)).
xref(enduses, [], [], xref_enduses).
xref(notused, [space, string_rest(Name, Rest)], Rest, xref_notused(Name)).◼
C53: ⟪ The list of all code chunks ⟫ ≡
Appears in C47
xref(beginchunks, [], [], xref_beginchunks).
xref(chunkbegin, [space, atom(L), space, string_rest(Name, Rest)],
     Rest, xref_chunkbegin(L, Name)).
xref(chunkuse, [space, atom_rest(L, Rest)], Rest, xref_chunkuse(L)).
xref(chunkdefn, [space, atom_rest(L, Rest)], Rest, xref_chunkdefn(L)).
xref(chunkend, [], [], xref_chunkend).
xref(endchunks, [], [], xref_endchunks).◼
Using the tables above, we have looked up the exact items that we expect in a noweb token. This is a small interpreter that takes the list of items and converts each item to the corresponding Prolog term.
C54: ⟪ Mini-interpreter for items ⟫ ≡
Appears in C43
items([]) --> [].
items([I|Is]) -->
item(I),
items(Is).
⟪ Individual items ⟫◼
C55: ⟪ Individual items ⟫ ≡
Appears in C54
⟪ Atoms and text ⟫
⟪ “Space” ⟫
⟪ Chunk number ⟫
⟪ Chunk kind ⟫
⟪ Cross-reference ⟫◼
Identifiers are converted to atoms, while “text” (text and names) are converted to strings:
C56: ⟪ Atoms and text ⟫ ≡
Appears in C55
item(atom(A)) --> nonblanks(Codes),
    { atom_codes(A, Codes) }.
% Using when/2 here is probably an ugly hack.
% Could not figure out a better way to deal with
% "rest of line" situations.
item(atom_rest(A, Rest)) -->
    { when(ground(Rest), atom_codes(A, Rest)) }.
item(string_rest(Str, Rest)) -->
    { when(ground(Rest), string_codes(Str, Rest)) }.◼
We assume that ⟪ “Space” ⟫ is represented by a single “space” character; the “Hacker’s Guide” is not explicit about this, but so far it has always worked.
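A definition consistent with this assumption consumes exactly one space character:
C57: ⟪ “Space” ⟫ ≡
Appears in C55
item(space) --> " ".◼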
Integers are used to enumerate the code and documentation:
C58: ⟪ Chunk number ⟫ ≡
Appears in C55
item(chunk_number(N)) --> integer(N).◼
Code and documentation chunks are delimited by the same keywords; the ⟪ Chunk kind ⟫ is encoded in a secondary keyword:
C59: ⟪ Chunk kind ⟫ ≡
Appears in C55
item(chunk_kind(CK)) --> nonblanks(Codes),
    { atom_codes(CK, Codes) }.◼
Finally, ⟪ Cross-reference ⟫. Note that it employs quite a few secondary keywords collected in their own look up tables, ⟪ Cross-referencing keyword table ⟫ and ⟪ Index table ⟫.
C60: ⟪ Cross-reference ⟫ ≡
Appears in C55
item(xref(XRef, Rest)) --> nonblanks(Codes),
    { atom_codes(X, Codes),
      xref(X, Items, Rest, XRef)
    },
    items(Items).
item(index(Index, Rest)) --> nonblanks(Codes),
    { atom_codes(X, Codes),
      index(X, Items, Rest, Index)
    },
    items(Items).◼
The structure of the document is implicitly present in the noweb pipeline representation: documentation and code chunks are delimited by a start and an end token, new lines and other formatting are contained in text tokens, and so on. The pipeline representation also contains cross-referencing information, using labels and references to them. It is however a bit inconvenient to use this representation for generating markup. This is because the order of tokens in the pipeline representation exactly mirrors the layout of the final human-readable documentation, as generated by noweb, and lir uses a slightly different layout.
At that point, we have a list of Prolog terms. Those are easier to deal with in the context of Prolog, as it is much easier to make the rules deterministic. The general approach is to consume the next input and use it as the first argument to a rule, thus taking advantage of Prolog’s first-argument indexing. Then, the rules defined below describe a state machine where each rule is a state and each rule clause is a transition that depends on the last input.
On the highest level, the noweb pipeline is made of a sequence of files. Note, however, that since in lir the input is standard input (and not a list of files), there will be only one, unnamed file in the pipeline.
C61: ⟪ Flat list of terms → structured term ⟫ ≡
Appears in C42
lir_doc(L) --> [file('')],
lir_rest(L).
lir_rest(L) --> [X], !,
lir_file(X, L).
lir_rest([]) --> [].
⟪ Parse a file ⟫◼
A file is a sequence of documentation and code chunks. It will contain a list of chunks and an index. To parse the contents of display item meta-data, we use the ⟪ yaml.pl ⟫ module.
C62: ⟪ Parse a file ⟫ ≡
Appears in C61
:- use_module(yaml, [yaml_to_dict/2]).
lir_file(docs, L) --> [X], docs(X, L).
lir_file(code, [Code|L]) -->
    code(C, Name, Label, M),
    { (   Name = name(N)
      ->  Code = code_name_label_meta(C, N, Label, M)
      ;   Name = display(KW, FN_str)
      ->  code_codelist(C, Codes),
          yaml_to_dict(Codes, Display_meta),
          Code = display(KW, FN_str, Display_meta, Label)
      )
    },
    lir_rest(L).
lir_file(nl, [nl|L]) --> lir_rest(L).
lir_file(xref_beginchunks, [chunks_list(Cs)|L]) --> [X],
    xref_chunks(X, Cs),
    lir_rest(L).
lir_file(index_beginindex, [index(I)|L]) --> [X],
    index_list(X, I),
    lir_rest(L).
code_codelist(C, L) :-
    maplist(code_atomic, C, L0),
    atomics_to_string(L0, S0),
    string_codes(S0, L).
code_atomic(text(Text), Text).
code_atomic(nl, "\n").
⟪ Parse a documentation chunk ⟫
⟪ Parse a code chunk ⟫
⟪ Parse the list of chunks ⟫
⟪ Parse the index ⟫◼
This module allows the user to parse a code list containing YAML data to a Prolog dictionary containing a native representation of the YAML data. It makes use of SWI-Prolog’s library dcg/basics.
C63: ⟪ yaml.pl ⟫ ≡
root chunk
:- module(yaml, [yaml_to_dict/2]).
:- use_module(library(dcg/basics), [white//0,
whites//0,
string_without//2]).
yaml_to_dict(Codes, Dict) :-
phrase(yaml_es(Es), Codes),
dict_create(Dict, yaml, Es).
⟪ Make a list of YAML key-value pairs ⟫◼
C64: ⟪ Make a list of YAML key-value pairs ⟫ ≡
Appears in C63
yaml_es([K-V|Es]) --> yaml_key_val(K, V), !, yaml_es(Es).
yaml_es([]) --> [].
⟪ Parse one YAML key-value pair ⟫◼
The keyword starts at the very beginning of the line, does not contain any blank characters (space, tab, newline…), and is ended by a colon followed by a single space. The character that follows that space, here called C, determines if this is a block scalar or not.
C65: ⟪ Parse one YAML key-value pair ⟫ ≡
Appears in C64
yaml_key_val(K, V) -->
    yaml_key_codes(KCs), ": ", !,
    { atom_codes(K, KCs) },
    [C],
    yaml_val(C, V).
⟪ Get YAML key ⟫
⟪ Get YAML value ⟫◼
Any graph character will do, but no blanks of any kind allowed.
C66: ⟪ Get YAML key ⟫ ≡
Appears in C65
yaml_key_codes([C|Cs]) --> [C],
    { code_type(C, graph) },
    yaml_key_codes(Cs).
yaml_key_codes([]) --> [].◼
The only recognized block notation is literal style, indicated by a “|”.
C67: ⟪ Get YAML value ⟫ ≡
Appears in C65
yaml_val(0'|, V) --> "\n", !,
yaml_indented_block(V_codes),
{ string_codes(V, V_codes)
}.
yaml_val(C, V_str) --> string_without("\n", Cs), "\n",
{ string_codes(V_str, [C|Cs])
}.
⟪ Read YAML indented block ⟫◼
All indenting is removed, but all newlines are preserved.
C68: ⟪ Read YAML indented block ⟫ ≡
Appears in C67
yaml_indented_block(Block) --> white, whites, !,
    indented_lines(Block).
yaml_indented_block([]) --> [].
indented_lines([C|Cs]) --> [C],
    { C \== 0'\n }, !,
    indented_lines(Cs).
indented_lines([0'\n|Block]) --> [0'\n], !,
    yaml_indented_block(Block).◼
(Source: lir.lir)
(Listing: test-display)
lrwxrwxrwx 1 boris boris 32 Jan 30 15:26 lir.lir -> /home/boris/code/own/lir/lir.lir
A documentation chunk can contain text, new lines, and quoted code.
C70: ⟪ Parse a documentation chunk ⟫ ≡
Appears in C62
definition continued in ↓ C71
docs(end, L) -->
lir_rest(L).
docs(text(T), [text(Text)|L]) -->
docs_text(Ts),
{ atomics_to_string([T|Ts], Text)
},
[X],
docs(X, L).
docs(nl, [nl|L]) --> [X],
docs(X, L).
docs(quote, [quote(Q)|L]) --> [X],
quote(X, Q),
[Y],
docs(Y, L).
⇩ C71
Consecutive text tokens that are not interrupted by any other structural contents are collected and concatenated.
C71: ⟪ Parse a documentation chunk ⟫ ≡
⇧ C70
docs_text([T|Ts]) --> [text(T)], !,
    docs_text(Ts).
docs_text(["\n"|Ts]) --> [nl], !,
    eol(Ts).
docs_text([]) --> [].
eol([T|Ts]) --> [text(T)], !,
    docs_text(Ts).
eol([]) --> [].◼
C72: ⟪ Parse quoted code ⟫ ≡
Appears in C70
quote(text(T), [text(T)|Q]) --> [X],
    quote(X, Q).
quote(xref_ref(L), [quote_use(N, L)|Q]) --> [use(N), X],
    quote(X, Q).
quote(endquote, []) --> [].◼
C73: ⟪ Parse a code chunk ⟫ ≡
Appears in C62
code(Cs, Defn, L, M) --> [X],
    defn(X, M_pairs0),
    { selectchk(label(L), M_pairs0, M_pairs1),
      selectchk(defn(Defn), M_pairs1, M_pairs),
      dict_create(M, code, M_pairs)
    },
    code_content(Cs).
⟪ Parse the code chunk header ⟫
⟪ Parse the code chunk contents ⟫◼
C74: ⟪ Parse the code chunk header ⟫ ≡
Appears in C73
defn(nl, []) --> [].
defn(xref_label(L), [label(L)|M]) --> [X], defn(X, M).
defn(xref_ref(L), [ref(L)|M]) --> [X], defn(X, M).
defn(defn(N), [defn(Name)|M]) -->
    { (   display_item(N, KW, FN_str)
      ->  Name = display(KW, FN_str)
      ;   Name = name(N)
      )
    },
    [X], defn(X, M).
defn(xref_notused(_N), [uses(notused)|M]) --> [X], defn(X, M).
defn(xref_beginuses, [uses(Us)|M]) --> [X],
    uses(X, Us, Us),
    [Y], defn(Y, M).
defn(xref_prevdef(L), [prev(L)|M]) --> [X], defn(X, M).
defn(xref_nextdef(L), [next(L)|M]) --> [X], defn(X, M).
defn(language(L), [language(L)|M]) --> [X], defn(X, M).
defn(xref_begindefs, [defs(Ds)|M]) --> [X],
    defs(X, Ds, Ds),
    [Y], defn(Y, M).
⟪ Treat display items differently ⟫
⟪ Parse defs and uses ⟫◼
C75: ⟪ Treat display items differently ⟫ ≡
Appears in C74
display_item(Name, KW, FN_str) :-
    sub_string(Name, 0, 1, _, ":"),
    once( sub_string(Name, Before_sep, 1, After_sep, " ") ),
    Length_kw is Before_sep - 1,
    sub_string(Name, 1, Length_kw, _After_kw, KW_str),
    display_item_keyword(KW_str, KW),
    Before_fn is Before_sep + 1,
    sub_string(Name, Before_fn, After_sep, 0, FN_str).
/* Seems this is deterministic when 1st argument is a string */
display_item_keyword("source", source).
display_item_keyword("result", result).
display_item_keyword("figure", figure).
display_item_keyword("listing", listing).
display_item_keyword("table", table_name).◼
C76: ⟪ Parse the code chunk contents ⟫ ≡
Appears in C73
code_content([text(Text)|Cs]) --> text_token(T), !,
    code_text(Ts),
    { atomics_to_string([T|Ts], Text) },
    code_content(Cs).
code_content([code_use(L, R, N)|Cs]) -->
    [xref_label(L), xref_ref(R), use(N)], !,
    code_content(Cs).
code_content([]) --> [end].
text_token("\n") --> [nl].
text_token(T) --> [text(T)].
code_text([T|Ts]) --> text_token(T), !,
    code_text(Ts).
code_text([]) --> [].◼
C77: ⟪ Parse defs and uses ⟫ ≡
Appears in C74
uses(xref_enduses, _, []) --> [].
uses(xref_useitem(L), Us, [L|Us0]) --> [X],
    uses(X, Us, Us0).
defs(xref_enddefs, _, []) --> [].
defs(xref_defitem(L), Ds, [L|Ds0]) --> [X],
    defs(X, Ds, Ds0).◼
C78: ⟪ Parse the list of chunks ⟫ ≡
Appears in C62
xref_chunks(xref_endchunks, []) --> [].
xref_chunks(xref_chunkbegin(L, N), [chunk(L, N, Us, Ds)|Cs]) --> [X],
    xref_chunk(X, Us, Ds),
    [Y], xref_chunks(Y, Cs).
xref_chunk(xref_chunkend, [], []) --> [].
xref_chunk(xref_chunkuse(U), [chunkuse(U)|Us], Ds) --> [X],
    xref_chunk(X, Us, Ds).
xref_chunk(xref_chunkdefn(D), Us, [chunkdefn(D)|Ds]) --> [X],
    xref_chunk(X, Us, Ds).◼
We don’t have an index for now.
C79: ⟪ Parse the index ⟫ ≡
Appears in C62
index_list(index_endindex, []) --> [].◼
C80: ⟪ lirhtml.pl ⟫ ≡
root chunk
definition continued in ↓ C92 + ⇣ C93 + ⇣ C94 + ⇣ C95
:- module(lirhtml, [emit_html/2]).
:- use_module(library(http/html_write)).
emit_html(Out, L) :-
    counters_db(L, DB),
    phrase(lir_pandoc(P, DB), L),
    phrase(html(P), H),
    print_html(Out, H).
⟪ counters_db/2: Generate counters, add them to a database ⟫
⟪ lir_pandoc//2: Lir to Pandoc HTML ⟫
⇩ C92
Add to a database all numbered items: code chunks, display items (figure, listing, table), as well as sources and sinks of the DAG representing the dataflow. For the code chunks, we also format the names. This is necessary because code chunk names are inside HTML tags by the time they make it to Pandoc, and are not considered for converting from markdown: instead, we do this here.
C81: ⟪ counters_db/2: Generate counters, add them to a database ⟫ ≡
Appears in C80
counters_db(L, DB) :-
include(is_code_name_label_meta, L, CNLMs),
maplist(code_name_label_meta_Name_Label, CNLMs, Ns, Ls),
names_fmt_fmtd(Ns, html5, FNs),
pairs_keys_values(LFs, Ls, FNs),
dict_create(CC_names, name, LFs),
findall(Label-Nr, nth1(Nr, Ls, Label), LNrs),
dict_create(CC_nrs, nr, LNrs),
maplist(display_dict(L),
[source, result, listing, figure, table_name],
[DSNrs, DRNrs, DLNrs, DFNrs, DTNrs]),
maplist(dict_create,
[CodeD, SourceD, ResultD, ListingD, FigureD, TableD],
[code, source, result, listing, figure, table_name],
[[name:CC_names, nr:CC_nrs],
[namestr:"S", nr:DSNrs],
[namestr:"Result ", nr:DRNrs],
[namestr:"Listing ", nr:DLNrs],
[namestr:"Figure ", nr:DFNrs],
[namestr:"Table ", nr:DTNrs]]),
dict_create(DB, db,
[code:CodeD,
source:SourceD,
result:ResultD,
listing:ListingD,
figure:FigureD,
table_name:TableD]).
is_code_name_label_meta(code_name_label_meta(_,_,_,_)).
code_name_label_meta_Name_Label(code_name_label_meta(_,N,L,_), N,L).
display_dict(L, Item, Dict) :-
include(is_display_item(Item), L, Ls),
findall(Label-Nr,
nth1(Nr, Ls, display(Item, _, _, Label)),
DNrs),
dict_create(Dict, nr, DNrs).
is_display_item(Item, display(Item,_,_,_)).
⟪ Use Pandoc to pre-format chunk names ⟫◼
C82: ⟪ Use Pandoc to pre-format chunk names ⟫ ≡
Appears in C81
:- use_module(library(process), [process_create/3]).
:- use_module(library(sgml), [load_structure/3]).
to_par(S, PS) :-
    atomics_to_string(["<span>", S, "</span>"], PS).
names_fmt_fmtd([], html5, []).
names_fmt_fmtd([N|Ns], html5, FNs) :-
    maplist(to_par, [N|Ns], PNs),
    atomics_to_string(PNs, Ns_str),
    process_create(path(pandoc), ["--to=html5"],
                   [stdin(pipe(Pandoc_in)),
                    stdout(pipe(Pandoc_out))]),
    format(Pandoc_in, "~s", [Ns_str]),
    close(Pandoc_in),
    load_structure(Pandoc_out, DOM, [dialect(xml)]),
    close(Pandoc_out),
    (   DOM = [element(p, [], Names)]
    ->  maplist(names_fmtd, Names, FNs)
    ;   FNs = []
    ).
:- use_module(library(sgml_write), [xml_write/3]).
names_fmtd(element(span, [], StrDOM), Fmtd) :-
    with_output_to(string(Fmtd),
                   xml_write(current_output, StrDOM,
                             [header(false), layout(false)])).◼
C83: ⟪ : foo ⟫ ≡
root chunk
This is a test. There was a problem with chunks that start with a colon followed by a space, as they were interpreted as definitions by Pandoc.◼
C84: ⟪ lir_pandoc//2: Lir to Pandoc HTML ⟫ ≡
Appears in C80
lir_pandoc(P, DB) --> [X], !,
    lir_pandoc(X, P, DB).
lir_pandoc([], _DB) --> [].
⟪ Lir to Pandoc: individual items ⟫
⟪ HTML for common symbols ⟫◼
C85: ⟪ Lir to Pandoc: individual items ⟫ ≡
Appears in C84
⟪ Lir documentation to Pandoc ⟫
⟪ Lir code chunks to Pandoc ⟫
⟪ Lir display items to Pandoc ⟫
⟪ Ignore the list of chunks and the index ⟫
⟪ Individual display items ⟫◼
C86: ⟪ Lir documentation to Pandoc ⟫ ≡
Appears in C85
lir_pandoc(nl, ["\n"|P], DB) -->
    lir_pandoc(P, DB).
lir_pandoc(text(T), [\[T]|P], DB) -->
    lir_pandoc(P, DB).
lir_pandoc(quote(Q), [span(class(quote), QC)|P], DB) -->
    { phrase(lir_pandoc(QC, DB), Q) },
    lir_pandoc(P, DB).
lir_pandoc(quote_use(Name, Label),
           [span(class("quoteduse"),
                 [\openparen, \thinnbsp,
                  a(href("#~a"-Label), Name),
                  \thinnbsp, \closeparen])|P],
           DB) -->
    lir_pandoc(P, DB).◼
C87: ⟪ Lir code chunks to Pandoc ⟫ ≡
Appears in C85
lir_pandoc(code_name_label_meta(C, _, L, M),
           [div(class("codechunk"),
                [\chunk_defn(M, L, DB),
                 \chunk_uses(M, DB),
                 \chunk_defs(M, DB),
                 \chunk_prev(M, DB),
                 pre(\chunk_content(C, DB)),
                 \chunk_next(M, DB)])|P],
           DB) -->
    lir_pandoc(P, DB).◼
C88: ⟪ Lir display items to Pandoc ⟫ ≡
Appears in C85
lir_pandoc(display(KW, FN_str, Meta, Label),
           [div([class("lirdisplay"), id(Label)],
                [\display(KW, FN_str, Meta, Label, DB)])|P],
           DB) -->
    lir_pandoc(P, DB).◼
C89: ⟪ Ignore the list of chunks and the index ⟫ ≡
Appears in C85
lir_pandoc(chunks_list(_Cs), P, DB) -->
    lir_pandoc(P, DB).
lir_pandoc(index(_), P, DB) --> [],
    lir_pandoc(P, DB).◼
C90: ⟪ HTML for common symbols ⟫ ≡
Appears in C84
nbsp --> html(&(nbsp)).
thinnbsp --> html(span(style("white-space:nowrap"), &(0x2009))).
openparen --> html(&('Lang')).   % Lang
closeparen --> html(&('Rang')).  % Rang
prevsym --> html(&(0x21E7)).     % UPWARDS WHITE ARROW
nextsym --> html(&(0x21E9)).     % DOWNWARDS WHITE ARROW
contsym --> html(&(darr)).
contmoresym --> html(&(0x21E3)). % DOWNWARDS DASHED ARROW
defeq --> html(&(equiv)).
defend --> html(&(0x25FC)).      % Black square◼
C91: ⟪ Individual display items ⟫ ≡
Appears in C85
:- discontiguous display//5.
display(source, FN, _, Label, DB) -->
    display_file(source, FN, Label, DB).
display(result, FN, _, Label, DB) -->
    display_file(result, FN, Label, DB).
display_file(Item, FN, Label, DB) -->
    { absolute_file_name(FN, Absolute_FN) },
    html(div(class("lir~a"-Item),
             [span(class("~aname"-Item),
                   ["~s~d:"-[DB.Item.namestr, DB.Item.nr.Label]]),
              \nbsp,
              span(class("~afile"-Item),
                   a(href(Absolute_FN), code(FN)))])).
display(listing, FN, Meta, Label, DB) -->
    { file_contents(FN, FC) },
    html(div(class("lirlisting"),
             [\display_header(listing, FN, Label, DB),
              div(class("listingcontents"), pre([FC])),
              \display_title_caption(listing, Meta)])).
file_contents(FN, FC) :-
    setup_call_cleanup(open(FN, read, In),
                       read_string(In, _, FC),
                       close(In)).
display_header(Item, FN, Label, DB) -->
    { absolute_file_name(FN, Absolute_FN) },
    html(div(class("~aheader"-Item),
             [span(class("~aname"-Item),
                   ["~s~d:"-[DB.Item.namestr, DB.Item.nr.Label]]),
              \nbsp,
              span(class("~afile"-Item),
                   a(href(Absolute_FN), code(FN)))])).
display_title_caption(Item, M) -->
    html(div(class("~ameta"-Item),
             [span(class("~atitle"-Item), [M.title]),
              \display_optional_caption(Item, M)])).
display_optional_caption(Item, M) -->
    { yaml{caption:Caption} :< M }, !,
    html([" ", span(class("~acaption"-Item), [Caption])]).
display_optional_caption(_, _) --> [].
display(table_name, FN, Meta, Label, DB) -->
    { file_contents(FN, FC) },
    html(div(class("lirtable"),
             [\display_header(table_name, FN, Label, DB),
              table_name([\[FC]]),
              \display_title_caption(table_name, Meta)])).
display(figure, FN, Meta, Label, DB) -->
    html(div(class("lirfigure"),
             [\display_header(figure, FN, Label, DB),
              \figure_contents(FN),
              \display_title_caption(figure, Meta)])).
figure_contents(FN) -->
    html(div(class("figurecontents"),
             [img(src("~s"-FN))])).◼
Chunk contents:
C92: ⟪ lirhtml.pl ⟫ ≡
⇧ C80
chunk_content([], _DB) --> [].
chunk_content([C|Cs], DB) -->
    chunk_content_(C, Cs, DB).
chunk_content_(nl, Cs, DB) --> html("\n"),
    chunk_content(Cs, DB).
chunk_content_(text(T), Cs, DB) --> html(T),
    chunk_content(Cs, DB).
chunk_content_(code_use(L, R, _), Cs, DB) -->
    html(span(class("embeddeduse"),
              [\openparen, \thinnbsp,
               a([id(L), href("#~a"-R)], [\[DB.code.name.R]]),
               \thinnbsp, \closeparen])),
    chunk_content(Cs, DB).
⇩ C93
Code chunk headers:
C93: ⟪ lirhtml.pl ⟫ ≡
⇧ C92
chunk_defn(_M, L, DB) -->
    html(span([class("defn"), id(L)],
              [span(class("chunknr"), "C~d:"-DB.code.nr.L),
               \nbsp, \openparen, \thinnbsp,
               a(href("#~a"-L), \[DB.code.name.L]),
               \thinnbsp, \closeparen, \thinnbsp, \defeq])).
chunk_uses(M, DB) -->
    { code{uses:Us} :< M }, !,
    html(span(class("chunkuses"), \chunk_uses_(Us, DB))).
chunk_uses(_, _) --> [].
⇩ C94
Uses in the header:
C94: ⟪ lirhtml.pl ⟫ ≡
⇧ C93
chunk_uses_([U|Us], DB) -->
    html([br([]), "Appears in ",
          a(href("#~a"-U), ["C~d"-DB.code.nr.U])]),
    chunk_uses_rest(Us, DB).
chunk_uses_(notused, _DB) -->
    html([br([]), "root chunk"]).
chunk_uses_rest([], _DB) --> [].
chunk_uses_rest([U|Us], DB) -->
    html([", ", a(href("#~a"-U), ["C~d"-DB.code.nr.U])]),
    chunk_uses_rest(Us, DB).
⇩ C95
Previous and next chunks in the header:
C95: ⟪ lirhtml.pl ⟫ ≡
⇧ C94
chunk_next(M, DB) -->
    { code{next:L} :< M }, !,
    html(span(class("chunknext"),
              a(href("#~a"-L),
                [\nextsym, \thinnbsp, "C~d"-DB.code.nr.L]))).
chunk_next(_, _DB) --> html(\defend).
chunk_prev(M, DB) -->
    { code{prev:L} :< M },
    html([br([]),
          span(class("chunkprev"),
               a(href("#~a"-L),
                 [\prevsym, \thinnbsp, "C~d"-DB.code.nr.L]))]).
chunk_prev(_, _DB) --> [].
chunk_defs(M, DB) -->
    { code{defs:Ds} :< M }, !,
    html(span(class("chunkdefs"), \chunk_defs_(Ds, DB))).
chunk_defs(_, _) --> [].
chunk_defs_([D|Ds], DB) -->
    html([br([]), "definition continued in ",
          a(href("#~a"-D), [\contsym, \thinnbsp, "C~d"-DB.code.nr.D])]),
    chunk_defs_rest(Ds, DB).
chunk_defs_rest([], _) --> [].
chunk_defs_rest([D|Ds], DB) -->
    html([" + ", a(href("#~a"-D),
                   [\contmoresym, \thinnbsp, "C~d"-DB.code.nr.D])]),
    chunk_defs_rest(Ds, DB).◼
The system tries to separate content and layout as much as possible. It also aims to provide a sensible default layout for the human-readable documentation.
The program in ⟪ langs.cpp ⟫ can be used as a filter to noweb. It deduces the programming language of code chunks based on the names of the chunks. This is a table that maps known extensions to programming language names.
C96: ⟪ Ext-Lang ⟫ ≡
Appears in C111
{"pl", "prolog"}, {"sh", "bash"}, {"bash", "bash"}, {"cpp", "cpp"}, {"R", "R"}, {"awk", "awk"}, {"sed", "sed"}, {"css", "css"}◼
The program implements a state machine with the states necessary to extract code chunk names and the uses within a code chunk. The start state is out; when a code chunk starts, it transitions to code, where it expects a code chunk name, and transitions to content; while in content, it collects uses, and goes back to out at the end of the code chunk. Throughout, each line from input is saved to a list of all lines.
C97: ⟪ langs.cpp ⟫ ≡
root chunk
⟪ Includes (langs) ⟫
⟪ Function definitions (langs) ⟫
int main()
{
    ⟪ Variable definitions (langs) ⟫
out:
    {
        ⟪ Out transitions ⟫
        goto out;
    }
code:
    {
        ⟪ Code transitions ⟫
        goto code;
    }
content:
    {
        ⟪ Content transitions ⟫
        ⟪ Content: collect uses ⟫
        goto content;
    }
end:
    {
        ⟪ Propagate language to uses ⟫
        ⟪ Output all lines, with language after each defn ⟫
        return 0;
    }
error:
    return 1;
}◼
When in out, end of input signals transition to the end state. A @begin code token signals a transition to code.
C98: ⟪ Out transitions ⟫ ≡
Appears in C97
if (!std::getline(std::cin, line)) goto end;
lines.push_back(line);
if (string_prefix(line, {"@begin code"})) goto code;◼
When in code, end of input is an error. A defn token contains the code chunk name. The name is processed and the state machine transitions to content.
C99: ⟪ Code transitions ⟫ ≡
Appears in C97
if (!std::getline(std::cin, line)) goto error;
lines.push_back(line);
if (string_prefix_rest(line, {"@defn "}, name)) {
    ⟪ Process code chunk name ⟫
    goto content;
}◼
A code chunk name is inserted into the DAG of code chunks. Initially the set of neighbours is empty. Note that std::map::insert will only insert if the key does not yet exist, so it is safe to do this.
C100: ⟪ Process code chunk name ⟫ ≡
Appears in C99
uses.insert({name, {}});
⟪ Try to set code chunk language ⟫◼
If the language of the chunk can be guessed from the name, the chunk name and its language are recorded. The code chunk name is also added to the queue later used for the breadth-first traversal of the DAG of code chunks.
C101: ⟪ Try to set code chunk language ⟫ ≡
Appears in C100
std::string cl;
if (name_dict_lang(name, langs, cl)) {
    chunk_lang.insert({name, cl});
    pending.push(name);
}◼
When in content, end of input is an error. An @end code token signals the transition back to out.
C102: ⟪ Content transitions ⟫ ≡
Appears in C97
if (!std::getline(std::cin, line)) goto error;
lines.push_back(line);
if (string_prefix(line, {"@end code"})) goto out;◼
When in content, each @use token is used to populate the set of neighbours of the current code chunk in the DAG.
C103: ⟪ Content: collect uses ⟫ ≡
Appears in C97
std::string un;
if (string_prefix_rest(line, {"@use "}, un))
    uses[name].insert(un);◼
The language of a code chunk can be deduced in two ways. First, from the code chunk name; this is done as soon as the code chunk is encountered for the first time (see ⟪ Try to set code chunk language ⟫). Second, if a code chunk without a known language is used by a code chunk with a language, it inherits it. To achieve this, a breadth-first traversal of the DAG of code chunks is used. Initially, the queue holds the names of all chunks with known languages (also done while reading). Any child code chunk that does not yet have a language has its language set to that of the parent, and is pushed to the back of the queue.
C104: ⟪ Propagate language to uses ⟫ ≡
Appears in C97
while (!pending.empty()) {
    std::string next = pending.front();
    pending.pop();
    for (auto c : uses[next]) {
        auto x = chunk_lang.find(c);
        if (x == chunk_lang.cend()) {
            chunk_lang.insert({c, chunk_lang[next]});
            pending.push(c);
        }
    }
}◼
At the end, all tokens are emitted. If the token is defn, a language token is added right after it. For code chunks that don’t have a language, the txt language is used.
C105: ⟪ Output all lines, with language after each defn ⟫ ≡
Appears in C97
for (auto l : lines) {
    std::cout << l << '\n';
    if (string_prefix_rest(l, {"@defn "}, name)) {
        auto x = chunk_lang.find(name);
        if (x != chunk_lang.cend())
            std::cout << "@language " << x->second << '\n';
        else
            std::cout << "@language txt\n";
    }
}◼
These variables are “global” to the main function, or in other words, are available to all states of the state machine.
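They are collected in one chunk:
C106: ⟪ Variable definitions (langs) ⟫ ≡
Appears in C97
definition continued in ↓ C112
⟪ A list of all lines as they are read ⟫
⟪ The DAG of the code chunks as adjacency-list ⟫
⟪ The language of each code chunk ⟫
⟪ A queue for the traversal of the chunks DAG ⟫
⟪ Mapping of extensions to languages ⟫
⇩ C112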
C107: ⟪ A list of all lines as they are read ⟫ ≡
Appears in C106
std::list<std::string> lines{};◼
C108: ⟪ The DAG of the code chunks as adjacency-list ⟫ ≡
Appears in C106
std::map<std::string, std::set<std::string>> uses{};◼
C109: ⟪ The language of each code chunk ⟫ ≡
Appears in C106
std::map<std::string, std::string> chunk_lang{};◼
C110: ⟪ A queue for the traversal of the chunks DAG ⟫ ≡
Appears in C106
std::queue<std::string> pending{};◼
C111: ⟪ Mapping of extensions to languages ⟫ ≡
Appears in C106
const std::map<std::string, std::string> langs{
    ⟪ Ext-Lang ⟫
};◼
Two strings, one for the last line that was read and one for the name of the last code chunk defined with defn:
C112: ⟪ Variable definitions (langs) ⟫ ≡
⇧ C106
std::string line;
std::string name;◼
The necessary standard libraries:
C113: ⟪ Includes (langs) ⟫ ≡
Appears in C97
#include <iostream>
#include <string>
#include <list>
#include <map>
#include <set>
#include <queue>◼
C114: ⟪ Function definitions (langs) ⟫ ≡
Appears in C97
⟪ Does a string have a prefix? ⟫
⟪ Can you guess the language from the name? ⟫◼
C115: ⟪ Does a string have a prefix? ⟫ ≡
Appears in C114
definition continued in ↓ C116
bool string_prefix(const std::string& s, const std::string& t)
{
    if (0 == s.compare(0, t.length(), t)) return true;
    return false;
}
⇩ C116
A three-argument version to get the rest, too:
C116: ⟪ Does a string have a prefix? ⟫ ≡
⇧ C115
bool string_prefix_rest(const std::string& s, const std::string& t,
                        std::string& rest)
{
    if (string_prefix(s, t)) {
        rest = s.substr(t.length());
        return true;
    }
    return false;
}◼
C117: ⟪ Can you guess the language from the name? ⟫ ≡
Appears in C114
bool name_dict_lang(const std::string& n,
                    const std::map<std::string, std::string>& d,
                    std::string& l)
{
    size_t x = n.find_last_of('.');
    if (x == std::string::npos) return false;
    ++x;
    auto ext_lang = d.find(n.substr(x));
    if (ext_lang == d.cend()) return false;
    l = ext_lang->second;
    return true;
}◼
For HTML documentation, ⟪ lir.css ⟫ is used.
C118: ⟪ lir.css ⟫ ≡
root chunk
⟪ Text width and margin ⟫
⟪ Headers in sans-serif ⟫
⟪ Use names typesetting ⟫
⟪ Highlight chunk names on hover ⟫
⟪ Quoted code in monospace ⟫
⟪ List of chunks formatting ⟫
⟪ Display item formatting ⟫◼
The text width is limited and a left margin is inserted to improve readability.
C119: ⟪ Text width and margin ⟫ ≡
Appears in C118
p { max-width: 14cm; }
blockquote { max-width: 11cm; font-size: small; }
ul, ol, dl, span.chunkdefs { max-width: 12cm; }
body { padding-left: 1cm; }◼
C120: ⟪ Headers in sans-serif ⟫ ≡
Appears in C118
h1, h2, h3 { font-family: sans-serif; max-width: 12cm; }◼
C121: ⟪ Use names typesetting ⟫ ≡
Appears in C118
span.sourcefile a, span.resultfile a, span.listingfile a,
span.figurefile a, span.chunkdefs a, span.chunkuses a,
span.chunkprev a, span.chunknext a, span.defn a,
span.quoteduse a, span.embeddeduse a {
    text-decoration-line: none;
}
span.chunkdefs, span.chunkuses, span.chunkprev, span.chunknext {
    font-size: small;
}
span.quoteduse { font-family: serif; }
span.embeddeduse { font-family: serif; }
span.chunknr { font-weight: bold; }◼
C122: ⟪ Highlight chunk names on hover ⟫ ≡
Appears in C118
span.quoteduse a { color: darkGreen; }
span.quoteduse a:hover { color: limeGreen; }
span.embeddeduse a { color: darkblue; font-style: italic; }
span.embeddedusesym { font-style: normal; }
span.embeddeduse a:hover { color: dodgerBlue; }
span.sourcefile a, span.resultfile a, span.listingfile a,
span.figurefile a, span.chunkdefs a, span.chunkprev a,
span.chunknext a, span.defn a, span.chunkuses a {
    color: darkred;
}
span.sourcefile a:hover, span.resultfile a:hover,
span.listingfile a:hover, span.figurefile a:hover,
span.chunkdefs a:hover, span.chunkprev a:hover,
span.chunknext a:hover, span.defn a:hover,
span.chunkuses a:hover {
    color: darkGoldenRod;
}◼
C123: ⟪ Quoted code in monospace ⟫ ≡
Appears in C118
span.quote { font-family: monospace; }◼
C124: ⟪ List of chunks formatting ⟫ ≡
Appears in C118
ul.chunkslist { list-style-type: square; }◼
C125: ⟪ Display item formatting ⟫ ≡
Appears in C118
div.lirtable th {
    border-style: none none solid none;
    border-width: 0 0 1px 0;
}
div.lirtable th, div.lirtable td {
    padding-left: 3.3mm; padding-right: 3.3mm;
    padding-top: 0.9mm; padding-bottom: 0.9mm;
}
div.lirtable table {
    font-family: sans-serif; font-size: small;
    margin-top: 3mm; margin-bottom: 3mm;
    border-collapse: collapse;
    border-style: solid none solid none;
    border-width: 2px 0 1px 0;
}
div.lirlisting { max-width: 14cm; }
span.sourcename, span.resultname, span.tablename,
span.figurename, span.listingname { font-weight: bold; }
div.figurecontents img {
    display: block; max-width: 14cm;
    width: auto; height: auto;
}
div.listingcontents {
    font-size: small;
    margin-top: 3mm; margin-bottom: 3mm;
    border-style: solid none solid none;
    border-width: 0.3mm;
}
span.sourcefile, span.resultfile, span.tablefile,
span.figurefile, span.listingfile { font-style: italic; }
div.tablemeta, div.figuremeta, div.listingmeta {
    max-width: 14cm; font-size: small;
}
span.tabletitle, span.figuretitle, span.listingtitle {
    font-weight: bold;
}
div.codechunk, div.lirlisting, div.lirtable, div.lirfigure {
    margin-bottom: 3mm;
}◼
O’Keefe, Richard A. 1990. The Craft of Prolog. Edited by Ehud Shapiro. The MIT Press.
Ramsey, Norman. 1992. “The Noweb Hacker’s Guide.” Department of Computer Science, Princeton University, September. http://www.cs.tufts.edu/~nr/noweb/guide.ps.
Wielemaker, Jan, Tom Schrijvers, Markus Triska, and Torbjörn Lager. 2012. “SWI-Prolog.” Theory and Practice of Logic Programming 12 (1-2): 67–96.