yruba - why rules for bash?

... because it's cool!

Introduction

The idea of yruba is similar to make, ant and other rule evaluators used to build software. Both make and ant are a pain to use and debug. Some reasons are:

Whether yruba manages to get out of syntax hell depends on your taste for shell script syntax. The rule concept is, however, more explicit and more clearly defined than in either ant or make (see Concepts).

Yruba is intended to be simple to use for everyone who has done some shell scripting. Most build systems could in principle be written as one long shell script. Then, however, everything would be done every time, which is a waste of time and resources. Consequently, a control structure is needed that differs from the common loops, conditional statements and function calls: namely, rules. Yruba adds rules to shell scripting.

Concepts

The sole purpose of yruba is to get targets up-to-date. A target is often a single file, but can also represent a list of files or some overall goal to reach. A target gets updated by mapping it to a task that implements the necessary actions. A task has a name, and it is defined by a list of dependencies, a test and a command or function to execute:

dependencies
The optional list of dependencies for a task contains names of other targets. These must be up-to-date before the task can even be considered.
test
The optional test makes sure the task is not run unnecessarily. The task's command is run only if the test returns true (exit code 0), signifying that the target is out-of-date.
command
The required command for a task performs all actions necessary to bring the target(s) up-to-date.
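
To make the interplay of the three parts concrete, here is a minimal sketch of a rule evaluator. It is not yruba's code: the function consider and the example tasks app and lib are hypothetical, target mapping is left out, and the dependency list is simply split on whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical, heavily simplified evaluator. NOT yruba's implementation.
consider() {
  local target=$1 task=$1 deps=() dep
  # 1. Ask the optional dep_<task> function for the dependency list.
  if declare -F "dep_$task" >/dev/null; then
    read -r -a deps <<<"$("dep_$task" "$target")"
  fi
  # 2. Every dependency must be up-to-date before the task is considered.
  for dep in "${deps[@]}"; do
    consider "$dep" || return 1
  done
  # 3. The optional test decides; a non-zero exit code means "up-to-date".
  if declare -F "test_$task" >/dev/null; then
    "test_$task" "$target" "${deps[@]}" || return 0
  fi
  # 4. The required command brings the target up-to-date.
  "cmd_$task" "$target" "${deps[@]}"
}

# Two example tasks: "app" depends on "lib"; only "lib" has a test.
log=""
dep_app()  { echo "lib"; }
cmd_app()  { log+="app "; }
test_lib() { [ -z "$libBuilt" ]; }
cmd_lib()  { libBuilt=yes; log+="lib "; }

consider app    # runs cmd_lib, then cmd_app
consider app    # lib is now up-to-date; app has no test and runs again
echo "$log"
```

The second call shows why a test is worth writing: without one, a task's command runs every time the target is considered.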

Get Going

Yruba comes as the shell script yruba. The script looks for a task description file with the default name yrules. This file uses bash syntax to define tasks as described in section Writing Tasks. A default target can be specified in yrules, but targets can also be specified on the command line. To get a feel for how things go, have a look at this simple example.

defaultTarget="hello"     # optional

# map requests for *.msg to makemsg
mapTarget '*.msg' makemsg

# task "makemsg"
test_makemsg() { ! test -f "$1"; }
cmd_makemsg()  { echo "Yruba is cool!" >"$1"; }

# task "hello"
dep_hello() { echo hello.msg; }
cmd_hello() { cat hello.msg; }

When yruba evaluates this rules file, it will first consider target hello. Since hello is not mapped to a task name with mapTarget, hello itself is taken as the task name. Task hello has the dependency hello.msg, as produced by the function dep_hello. Now hello.msg is considered as a target. Due to the target mapping for *.msg, the task to run is makemsg. This task has no dependencies (no dep_makemsg), but it has a test (test_makemsg). If the test finds that the file hello.msg does not exist, the task's command, cmd_makemsg, is run. Once this is done, cmd_hello is run immediately, since hello has no test.

Target Mapping

The difference between target and task is only a subtle one. By default, a target and its task have the same name. If you have a task called tgz to pack your source files into a tar archive, you may use tgz as a target name in a dependency list and on the command line:

% yruba tgz

But it is also possible to map targets via a pattern to a task. If there is one task, called compileC, that compiles C source files into object files, we want to use this task for all targets *.o. This is done by calling mapTarget:

mapTarget '*.o' compileC

Whenever a target matching *.o is now considered, it is mapped to the task compileC.
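
The lookup from target to task can be pictured with the following sketch. The names map_target_sketch and task_for are made up for illustration and the code is not taken from yruba; it only mirrors the documented behaviour that an unmapped target uses the task of the same name and that a pattern added later has priority.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins, not yruba's mapTarget.
patterns=()
tasks=()

map_target_sketch() {
  patterns+=("$1")
  tasks+=("$2")
}

task_for() {
  local target=$1 i
  # Walk backwards so that mappings added later win.
  for (( i=${#patterns[@]}-1; i>=0; i-- )); do
    case $target in
      ${patterns[i]}) echo "${tasks[i]}"; return 0 ;;
    esac
  done
  echo "$target"    # default: the task has the same name as the target
}

map_target_sketch '*.o' compileC
map_target_sketch 'main.o' linkMain

task_for foo.o
task_for main.o
task_for tgz
```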

Writing Tasks

Implementation of a task requires up to three shell functions or scripts: one for the dependency list, one for the test and one for the command.

In what follows, the term function is used throughout although a script can usually also be used.

The Dependency List

A task may depend on targets that must be up-to-date before its command can be run. The dependencies are specified by implementing a function that prints the targets on stdout. It will be called with a single argument — the target under consideration. The name of the function must be the name of the task prefixed by dep_. For a task called jar, intended to produce a jar file from Java class files, this would be:

dep_jar() {
  # $1, the target's name, not used here
  echo compileJava
}

This assumes that compileJava is a target that can be mapped to a task or is itself the name of a task to create the class files we want to pack. Users of make may be tempted to list all the class files as dependencies. But this is only because make does not separate the dependencies from the test. With yruba the list of class files only comes into play in the test function as described below.

The string returned by dep_* will undergo one round of eval processing in a statement like

eval set -- "$deps"

in order to have proper list handling. In most cases this does no harm, but if there are potentially dangerous characters in some dependencies, then the use of lappend is recommended to build up the dependency list:

deps=$(lappend "$deps" "$someDep")
...
echo "$deps"
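
What the eval round does to a dependency string can be tried in plain bash, without yruba. The file names below are invented for the demonstration.

```shell
#!/usr/bin/env bash
# A name containing a space, printed naively: eval splits it in two.
naive='my file.c other.c'
eval set -- "$naive"
naive_count=$#

# The same names with shell quoting around the problematic element.
quoted="'my file.c' other.c"
eval set -- "$quoted"
quoted_count=$#
first=$1

echo "naive: $naive_count elements, quoted: $quoted_count elements, first: $first"
```

Producing the quoted form by hand is error-prone as soon as names contain quotes themselves, which is the job lappend takes over.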

A note on source files as dependencies: make users may also like to put source files, e.g. xyz.c, in the dependency list. This, however, is only necessary if xyz.c is itself created by a code generator or must be checked out of a repository, for example. Otherwise yruba will merely check if the file is there and continue. But this is a waste of time, because if the file was inadvertently deleted, an error message will result in the test function soon enough.

The dep_* function for a task is optional. If it does not exist, an empty dependency list is assumed.

Use of Variables

An additional detail to know is that the dep_* function is called in a subshell. As a result it has access to all shell variables but changing them will not have any effect after the return from the function.
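
The effect is easy to reproduce. The sketch assumes that the subshell comes from a command substitution, which is one common way to capture a function's output; the task name demo and the variable counter are made up.

```shell
#!/usr/bin/env bash
counter=0

dep_demo() {
  counter=$((counter + 1))   # visible only inside the subshell
  echo "some.target"
}

deps=$(dep_demo demo)        # command substitution runs dep_demo in a subshell
echo "deps=$deps counter=$counter"
```

The dependency list arrives as expected, but counter is unchanged in the calling shell.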

The Test

After yruba has recursively considered and, where necessary, updated all dependencies of a task, it calls the task's test function. The name of this function must be the task's name prefixed by test_. Now that all the dependencies are up-to-date, the test can finally determine if the target under consideration is out-of-date. For the task jar the test function would be

test_jar() {
  local jarfile=$1
  # $2, $3, etc., the dependencies, not used here
  JARCONTENT= ... # find list of files to pack
  old "$jarfile" -d $JARCONTENT
}

The first argument of the test function is the target under consideration. The following arguments are the dependency list as produced by the dep_* function. The function old comes with yruba.

The task's command, as described below, is called only if the test returns true (exit code 0). If it returns false (exit code ≠ 0), the target is assumed to be up-to-date already. The test_* function is optional. If it does not exist, the target is assumed to be out-of-date and the command is called unconditionally.

Use of Variables

In contrast to the dep_* function, the test_* function is not called in a subshell. Consequently it is able to set or change global variables. In the example this is used to set JARCONTENT to the list of files to pack. The variable will be used again in the cmd_* function to finally pack the jar file (see below). Be careful, however, not to inadvertently change variables somewhere up the call stack. If in doubt, use local variables.

After calling the test function, yruba resets the current directory to where it was before the call to not disturb other tasks.

The Command

The minimum necessary to implement a task is the function that updates the target. Its name must be the task name prefixed with cmd_. For the task jar this is cmd_jar. When the function is eventually called by yruba, the first argument will be the name of the target and the other arguments will be the elements of the dependency list.

cmd_jar() {
  local jarfile=$1
  # refer to JARCONTENT as set by test_jar
  jar cf "$jarfile" $JARCONTENT
}

Use of Variables

Like test_*, cmd_* is not called in a subshell. This makes it possible, for example, to set a variable as the result of a task, as opposed to the classical generation of a file. The comments about the dangers of changing global variables apply as above.

After calling the command function, yruba resets the current directory to where it was before the call to not disturb other tasks.
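
The hand-off of a global variable from test_* to cmd_* can be tried without yruba by calling the two functions in the order yruba would. The task bundle, the variable BUNDLE_INPUTS and the file names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'a\n' >"$workdir/a.txt"
printf 'b\n' >"$workdir/b.txt"

test_bundle() {
  local bundle=$1
  # Global on purpose: cmd_bundle reads it again.
  BUNDLE_INPUTS=("$workdir"/*.txt)
  [ ! -f "$bundle" ]         # out-of-date if the bundle does not exist yet
}

cmd_bundle() {
  local bundle=$1
  cat "${BUNDLE_INPUTS[@]}" >"$bundle"
}

# Traced by hand: test first, command only if the test returns true.
if test_bundle "$workdir/all.out"; then
  cmd_bundle "$workdir/all.out"
fi
cat "$workdir/all.out"
```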

Special Variables

defaultTarget — the default target

The variable should normally be set somewhere at the beginning of the yrules file. It provides the target to consider if none is given to yruba on the command line.

yruba_* — internal variables of yruba

Variables prefixed with either yruba_ or YRUBA_ should not be changed anywhere, because they contain yruba internal information.

Library Functions

Yruba comes with a few predefined functions that may help in writing tasks. Some of the functions produce a result to be picked up by the caller. This may be a string printed on stdout. If the function implements a boolean test, then the result is delivered via the exit code. In the following documentation, we say the test returns true if the exit code is 0, as is customary in shell programming. An exit code other than 0 denotes false.


old — compare file modification times

old t1 [...] -d [d1 [...]]

result: exit code

Returns true if any of the file targets t1 [...] is older than (test -ot) any of the dependency files listed after -d.

Option -d must be present, but may have zero arguments. If no dependency is given, false is returned, meaning the target(s) are not old. If any of the dependencies does not exist as a file (test -f), yruba exits with an error message. If any of the targets does not exist as a file, true is returned (this is a feature of test -ot).
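
The rules above can be written down as a small function. This is a sketch under the name old_sketch, not the function shipped with yruba; in particular it returns exit code 2 for a missing dependency instead of terminating the script, so that it can be tried interactively.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation of the documented behaviour of old.
old_sketch() {
  local targets=() t d
  # Everything before -d is a target.
  while [ $# -gt 0 ] && [ "$1" != "-d" ]; do
    targets+=("$1")
    shift
  done
  [ "${1-}" = "-d" ] || { echo "old_sketch: option -d is required" >&2; return 2; }
  shift
  # Every dependency must exist as a regular file.
  for d in "$@"; do
    [ -f "$d" ] || { echo "old_sketch: missing dependency: $d" >&2; return 2; }
  done
  # True as soon as one target is older than one dependency;
  # -ot is also true when the target does not exist.
  for t in "${targets[@]}"; do
    for d in "$@"; do
      [ "$t" -ot "$d" ] && return 0
    done
  done
  return 1
}

tmp=$(mktemp -d)
touch -t 202001010000 "$tmp/app.jar"      # older target
touch -t 202101010000 "$tmp/App.class"    # newer dependency
old_sketch "$tmp/app.jar" -d "$tmp/App.class" && echo "app.jar is out-of-date"
```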


haveClass — does the current CLASSPATH provide a given class?

haveClass class

result: exit code

Tries to compile a small Java source file that references the given class. The class must be in fully qualified dot-notation. The compiler used is ${JAVAC} with a default of javac. The function returns true if the file can be compiled without error. No CLASSPATH is explicitly set, meaning that the value from the environment is relevant.


die — print message and exit

die [text [...]]

result: none

The given text is printed to stderr, prefixed by yruba:. Then the script is exited with code 1.


dlog — print indented informational message

dlog [-n] [text [...]]

result: none

Prints the given text to stdout, properly indented according to the current rule evaluation nesting level. If -n is given as the first argument, it is passed to echo, preventing a terminating end-of-line.


lappend — append an element to a list

lappend list elem

result: stdout

Maintaining a list in the shell is difficult as soon as the elements contain space, newline or quote characters. The l* functions provided with yruba try to help with this. The list as a whole is stored as a string in any shell variable. The content of the variable is kept quoted in a way that

eval set -- "$list"

sets the positional parameters exactly to the list's elements, given that list contains the list representation.

A typical call to lappend should look like

mylist=$(lappend "$mylist" "$newelem")

Don't forget the quotes around the list and the element variable.


lcreate — create a list from the arguments given

lcreate [arg [...]]

result: stdout

Creates a list for use with yruba's l* functions. Each parameter of lcreate is made into a list element. The string representing the list is printed on stdout.

mylist=$(lcreate "bla'\"bla"  '$b'  "\\")

In particular, the intention is to make sure that

list=$(lcreate "$@")
eval set -- "$list"

does not change the values of the positional parameters.
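
The round trip can be demonstrated with a stand-in built on bash's printf %q. The function lcreate_sketch is hypothetical, and how yruba's own lcreate quotes its elements is not shown here; the sketch only illustrates the property described above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for lcreate, using bash's printf %q for quoting.
lcreate_sketch() {
  # With no arguments, print nothing, so the list is empty.
  [ $# -eq 0 ] || printf '%q ' "$@"
}

set -- "bla'\"bla" '$b' "\\" "two words"
list=$(lcreate_sketch "$@")
eval set -- "$list"

# Keep copies for inspection.
count=$#
first=$1 second=$2 third=$3 fourth=$4
printf '%s\n' "$count" "$first" "$second" "$third" "$fourth"
```

After the eval, the positional parameters hold the same four strings as before, including the quotes, the dollar sign and the backslash.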

This function was formerly called lquote.


lget — get ith element of a list

lget list index

result: stdout

Retrieves the element at index position index from the given list. Indexing starts at 0. If the index is out of range, an error is printed to stderr and exit code 1 is returned. Example:

mylist=$(lcreate -0- -1- -2- -3- -4-)
elem=$(lget "$mylist" 3)

will set variable elem to the string -3-.


lhead — get first element of a list

lhead list

result: stdout

Retrieves the first element of the list. If the list is empty, an error is printed to stderr and exit code 1 is returned. This function is a shortcut for lget "$list" 0. Example:

mylist=$(lcreate -0- -1- -2- -3- -4-)
elem=$(lhead "$mylist")

will set variable elem to the string -0-.

See also ltail.


lpush — prepend an element to a list

lpush element list

result: stdout

The first parameter is prepended to the list represented by the second parameter. See lappend for an introduction to yruba's l* functions. For example

mylist=$(lpush "$elem" "$mylist")

will establish the content of variable elem as the first element in the list represented by the content of variable mylist.


ltail — remove first element from a list

ltail list

result: stdout

Removes the first element of the given list and returns the remaining list. If the given list is empty, an error is printed to stderr and exit code 1 is returned. Example:

mylist=$(lcreate -0- -1- -2- -3- -4-)
mylist=$(ltail "$mylist")

will set the variable mylist to a string representing the list with the four elements -1-, -2-, -3- and -4-.


mapFilenames — map file names to different directory and change extension

mapFilenames olddir newdir ext fname [...]

result: stdout

Removes a prefix of the length of the string olddir from each filename given, typically a directory name, replaces it with newdir, and also sets the extension to ext. The dot is implicitly assumed as part of the extension and need not be specified. Special cases of ext are:

The example

l=$(mapFilenames jsrc classes class jsrc/*/*.java)

assumes that some Java source files are sitting in package directories below jsrc, say in pack1 and mypack. All these source files are picked up by the glob pattern jsrc/*/*.java. An example file name would be jsrc/pack1/Blorb.java. The function mapFilenames replaces jsrc with classes and changes the extension to class to get classes/pack1/Blorb.class.

The result is assembled by means of lappend to make sure that even dangerous file names containing, say, quotes or backslashes are properly handled. Consequently the result is a list that can be handled safely with the other l* functions, and in particular

eval set -- "$l"

will set the positional parameters such that

for x in "$@"; do ... done

will correctly iterate over even the strangest file names.

Note: Although the first parameter is a string — and typically you would just specify the directory to remove — only the length of this string is relevant, because exactly this number of characters is removed from each filename given.
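
The length-based prefix removal can be illustrated with a short sketch. The function map_filenames_sketch is hypothetical: it prints one plain name per line instead of building a quoted list with lappend, and it does not handle the special cases of ext.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the mapping, not yruba's mapFilenames.
map_filenames_sketch() {
  local olddir=$1 newdir=$2 ext=$3 f rest
  shift 3
  for f in "$@"; do
    rest=${f:${#olddir}}      # drop as many characters as olddir is long
    printf '%s\n' "$newdir${rest%.*}.$ext"
  done
}

map_filenames_sketch jsrc classes class jsrc/pack1/Blorb.java
# Only the length of the first argument matters:
map_filenames_sketch XXXX classes class jsrc/pack1/Blorb.java
```

Both calls print classes/pack1/Blorb.class, because XXXX has the same length as jsrc.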


mapTarget — map a target pattern to a task name

mapTarget pattern task

result: none

By default, every target is handled by the task with the same name. With mapTarget it is, however, possible to declare that a target matching the given pattern shall be handled by the named task. As an example consider:

mapTarget '*.o' compileC

This makes sure that every target matching the pattern *.o will be handled by the task compileC. The function mapTarget should usually be called before rule evaluation starts. The pattern will go once through eval, so any special character must be properly quoted. If several patterns match a target, the pattern which was added later has priority.

Yruba maintains the pattern map in a shell case construct stored in the variable yruba_tagmap. In desperate cases it may help debugging to just print its contents.


ydoc — add a one-line description to a target

ydoc target text ...

result: none

Registers text as a short description of target. To list the descriptions of targets, use command line option -i.