A Guide to Elly
===============

This guide describes the Elly programming language.  Elly is designed
as a "golfing" language: it aims to be able to express programs using
a minimal number of bytes.  It also aims to be fairly easy to read and
particularly fast to write: programs are written and read in a
readable "source code" form.  However, the source code must be
converted into a heavily compressed "encoded" form before it can be
used by the interpreter (this conversion can be done automatically,
and the encoded form can also be expanded back out into valid source
code automatically, although it might not use exactly the same syntax
as the original; it will at least have the same functionality).

Elly was inspired by the programming language
[Jelly](https://github.com/DennisMitchell/jellylanguage) (which is
also a golfing language), but (in addition to being easier to read and
write) aims to save space over Jelly by sorting out parsing
ambiguities more efficiently (especially in loops) and using better
representations for the constants.

This guide is intended to be usable both as a tutorial, and as a
reference guide.  Every feature of the language is discussed, but the
order in which features are discussed is chosen so that reading the
guide in order will avoid introducing too many features at once and
form a sensible progression for learning the language.


Full control mode
-----------------

Like most programming lmnguages, Elly allows you to write arbitrary
expressions by combining constants, functions and builtins in any
combination you like.  Unlike most languages, this is *not* the usual
way to write an expression in Elly; most expressions can be
abbreviated, allowing them to be encoded in fewer bytes.  However,
it's the most general-purpose method, and a good place to start when
learning the language.

If you want to write an arbitrary expression, you need to start it
with a "full control marker", which is an at sign `@`; this lets an
Elly implementation know that it should be looking at the code as-is,
rather than trying to implictly insert variables.  You can then follow
this with the expression you want.  Although it is unidiomatic to use
`@` where unnecessary in Elly, this guide will start by placing an `@`
on every expression; later on, it will explain the rules for where you
can omit one, and then omit them from then on.

An Elly *expression* uses *prefix notation*: constants are just
written normally, but to call a function or use a built-in function (a
"builtin"), you write the name of the function, followed by each of
the arguments in turn.  (Each function has a known number of
arguments, so it's always possible to unambiguously match a function
with its arguments, even without parentheses.)  The various
constants/functions/builtins that make up an expression are, in the
source code form, separated by spaces.

For example, if we wanted to write the expression "square root of (5
squared minus 4 squared)", we could write it like this:

    @ sqrt minus square 5 square 4

Elly is forgiving with names of built-in functions: most builtins have
multiple possible names that can be used in the source code.  So you
could also write it like this:

    @ √ - ² 5 ² 4

These programs are equivalent; after being converted into the encoded
form that's used as input to the interpreter, they will become the
same sequence of bytes (the exact name you use for the builtin doesn't
matter, the encoded representation only cares about which builtin
you're using).  This mechanism is designed to make it easier to
remember the builtin names; you can guess at the name, and normally
it'll be a valid name for the builtin.  (It's also designed to make it
easier to *type* the builtin names; names like `√` are easy to read,
but not so easy to type.)

Evaluation order in Elly is chosen to produce the same results as if
the arguments of a builtin were evaluated before the builtin itself
(from left to right), with the exception of builtins like `and` that
are specifically designed to skip evaluating the right argument if the
left argument has a specific value.  (This detail normally doesn't
matter unless calculating the arguments has side effects, but it means
that you can, e.g., assign to a variable while calculating the first
argument of a builtin, then read the assigned value while calculating
the second.)

When using a golfing language, it's usually important to know how many
bytes your program will encode to (in order to know what sort of
rearrangements will save bytes).  For full control mode, this is very
simple: the `@` costs a byte, and then you simply pay the costs for
each builtin/function/constant you use individually (which is normally
a single byte for each builtin and constant, except for the case of
larger literals like `20` or `12345678` or `"Hello, World!"` which
typically cost multiple bytes, and builtins whose names start with `@`
which cost two bytes).


Simple functions and programs
-----------------------------

In order to write an actual Elly program, you have to start by writing
the definition of a function; this will be the *main function* that
defines what the program does.  A function in Elly can take 0, 1 or 2
arguments.  (A program can take more, but the extra arguments become
global variables rather than being used as arguments to the function.)
It is possible to specify the number of arguments explicity with an
I/O specification (see below), but normally this would be
automatically determined by looking at how many arguments the program
was actually given.

A function has two local variables; these are initially used to hold
its arguments, but can be assigned to while the function is running
(with no effect on the actual arguments).  These variables are called
`X` and `Y`.

  * If the function has two arguments, initially `X` holds the first
    argument and `Y` the second argument.

  * If the function has only one argument, this is initially stored in
    `Y`; `X` will start out undefined, but when the function's initial
    expression has been evaluated, `X` will automatically be given the
    value that the expression evaluated to if it did not gain a value
    as a side effect of evaluating the expression.

  * If the function has no arguments, `X` and `Y` both start out
    undefined.  When the function's initial expression has been
    evaluated, its value will automatically be stored in both `X` and
    `Y` (although if either variable gained a value during the
    evaluation of that expression, that value will not be
    overwritten).

The simplest form of a function (thus, of an Elly program) is a single
expression; the value that the expression evaluates to will be the
value returned from the function.  (Except when an I/O specification
is used to override this default rule, a full program will
automatically print the return value from the main function to
standard output, letting you see what value it computed.)

As a running example through this guide, we'll try to write a program
which takes a number *n* as input, and calculates (*n* + 2) × (*n* +
3).  (In mathematical notation, this program would be
"λ*n*.(*n*+2)(*n*+3)".) Using the language constructs implemented so
far, we can implement this easily (but not very tersely) using a main
function consisting of a single expression:

    @ × + Y 2 + Y 3

(If we weren't sure how to type `×`, we could have written it as
`multiply` instead.)

This program will be 8 bytes long in encoded form: 1 for the full
control marker, 3 for the 3 builtin usages, 2 for the 2 small integer
constants, and 2 for the two uses of `Y`.

If we wanted to write this as a separate function within a larger
program, we'd need to explicitly mark it as a function (the program as
a whole is automatically interpreted as a function body, but if you
have other functions, their bodies need to be marked as functions
explicitly). That's done using parentheses around the function, and a
number of arguments before them:

    1( @ × + Y 2 + Y 3 )

A function is automatically called at the location where it's defined,
so you generally define a function at the first location where you'd
want to use it.  Subsequent calls to the same function are done via
using builtins whose purpose is to call user-defined functions.

Functions are not allowed to be completely empty; you must have
something between the `1(` (or whatever) and `)`, and the content you
place there is not allowed to abbreviate down to 0 bytes.  The
exception is when you have no closing `)` (i.e. you're in the main
program); in this case, you are allowed to simply just end the file to
produce an empty function.

If a function's closing `)` would appear at the end of the program,
you may omit it.


Pipelines
---------

Although simple functions are easy to understand, a function in Elly
doesn't need to consist of just a single expression.  Instead,
functions more commonly consist of a *pipeline*: an expression
followed by a list of *statements*.  Elly statements are sort-of like
statements in most other programming languages, in that they could
just be bits of code that you're calling for their side effects.
However, they also have an input and a return value; the input of each
statement is the return value of the previous statement or expression
(and the return value of a function is the return value of the last
statement in it, or of its initial expression if there are no
statements), and statements are more commonly used to create an
appropriate return value rather than for side effects.

Any function or builtin that takes 1 or 2 arguments can be used as a
statement: the argument to the statement will be used as the first
argument to the function or builtin, and if it requires a second
argument, an expression to produce that argument needs to be specified
just after it in the source code.  This gives us a couple of
alternative ways to write our (*n* + 2) × (*n* + 3) program:

    @ + Y 2  × @ + Y 3
    @ Y  + @ 2  × @ + Y 3

(There is no requirement to add additional whitespace between the
"pipeline stages" of an Elly program, but it won't cost bytes after
encoding and makes it more readable, so it's generally worth adding
some.  The normal whitespace convention in Elly is to place one space
between tokens, and one additional space before each statement.)

Statements do not *have* to be formed from builtins that are usable in
expressions; a few statements are usable only as statements.  This
sort of statement could potentially take any number of extra arguments
(which are all written just after the statement name, prefix-notation
style), in addition to their input.

Sometimes, when writing a pipeline, it is useful to swap the arguments
of a function or builtin; for example, the mathematical function
λ*n*.(2 − (3 × (4 + *n*))) would be most clearly written as a pipeline
that starts with *n*, adds 4, multiplies by 3, and subtracts *from* 2.
In Elly, you can express this form of argument swapping by adding a
`'` to the end of a function or builtin usage:

    @ Y  + @ 4  × @ 3  −' @ 2

Although reversed builtins like `−'` don't have an encoding of their
own, the `'` nonetheless never costs a byte; if it's followed by a
full control expression, it's encoded via using a different encoding
of the full control marker `@`, and otherwise it's encoded as part of
the tacit pipeline stage shape (see below).  You cannot use a reversed
builtin within a full control expression (but you can typically just
swap the arguments manually).

I personally find the pipeline form of a program to be closest to the
way I think about programs in golfing languages.  Although it
typically comes out longer when using nothing but full control mode,
it also tends to be the tersest way to write a program when you take
full advantage of the various abbreviation mechanisms (which are
explained below), so it should normally be your first approach when
trying to write an Elly program (with complex expressions used only
for expressions that are too "branchy" to fit into a linear pipeline,
or in cases where you hardly mention variables and thus could gain no
advantage from abbreviating them).


Implicit variables, and how to count bytes in Elly
--------------------------------------------------

Elly is a token-based language.  Although there are more than 256
spellings of tokens in total, many of them are equivalent (e.g. `*`
versus `×` versus `multiply`) and thus encoded the same way, and many
won't be legal in any given context.  As such, there are fewer than
256 possible one-byte tokens at any given point in the code, so each
can be unambiguously encoded as a single byte.  The whitespace between
tokens is just used to make the program easier to read (there is
always some whitespace between tokens and the exact amount or nature
of the whitespace doesn't matter, so the whitespace doesn't need to be
encoded at all).

Some tokens are rarely used, and intentionally encoded with multiple
bytes.  For example, there are a lot of possible literals, and (apart
from small integers) each individual literal is rarely used; so a
string, integer or list literal will often have to be longer than one
byte.  For literals, there isn't a clear visual indication of the
extra bytes, so it helps to use an Elly encoding tool if you care
about the exact number of bytes there.

There are also some rarely-used builtins (such as `@first_last`, which
returns the first and last elements of a list); despite being a single
token each, these tokens are encoded as two bytes in order to save
encoding space for more commonly used tokens.  Two-byte builtins all
have names starting with `@`, as a warning that an extra byte is being
used (this is intentionally reminiscent of the full control marker
`@`, which acts as a warning/reminder that if an expression isn't
being abbreviated, it needs an extra byte to represent).

Sometimes, though, a token will cost no bytes at all, allowing an Elly
program to be shorter than it looks; a few tokens can be represented
with no bytes at all (these tokens can be recognised in source code
because they contain lowercase variable names).  This section explains
how this is possible, and how you can save bytes with implicit
variables in your own programs.


### Implicit variables in full control expressions

Implicit (tacit) variables are the main advantage/twist that Elly has
over other golfing languages, and are often able to make programs much
shorter without making them substantially less clear.

Functions are generally built around their two main variables, `X` and
`Y`.  When written as capital letters like this, use of the variables
costs a byte each.  However, it's also often possible to specify that
these variables should be used, but without spending any bytes to do
so.

Requesting an implicit variable is very simple: you just write the
variable name in lowercase.  So we can save bytes in the first two
versions of our example program we've seen so far simply by making the
first `y` in each expression implicit, leading to the following
programs (which are each 7 bytes long when encoded):

    @ × + y 2 + Y 3
    @ + y 2  × @ + y 3

Of course, it isn't always possible to "magically" save bytes like
this; the fact that a `y` is used, and its location, needs to be
specified somehow.  Elly implements this by allocating more than one
possible encoding to various tokens (most notably `@`, `:`, `each`);
one possible encoding of `@` will simply just represent an `@` on its
own, but another encoding will represent an `@` followed by a `y` as
soon as possible, another will represent an `@` followed by a `y` a
little later, and so on.  ("As soon as possible" is generally after
one builtin has been seen: you are not allowed to write a full-control
expression that consists of nothing but a single tacit variable, thus
`@ y` is illegal.)

The Elly encoder is thus able to represent the exact program you
requested, in the exact number of bytes you requested, via modifying
the encoding of the `@` so that it encodes both itself, and the
implicit `y`.  In practice, this typically means that the cost of an
`@` will "refund" itself as long as you use an implicitly
representable variable fairly early in that expression.  The variable
that can be represented implicitly is usually either `y` or a
generator scope variable (see the discussion of generators below);
however, in the initial expression of a 2-argument function, it is `x`
that can be used implicitly.

(This section does not include details on the exact way that, e.g.,
the encoding of the `@` is adjusted.  If you care about them, see the
separate guide on Encoding for information on how Elly's encoding
process works.  However, those details shouldn't matter for learning
the language; you just need to know that this saves a byte, and why
it can save a byte, rather than the exact details on what numbers go
where in the resulting encoded file.)


### Incomplete full control expressions

There's another way that full control expressions can be abbreviated:
if the function (or program) ends in the middle of one.

In this situation, sufficiently many implicit variables are implied
into the end of the encoding to complete the incomplete expression.
We can take advantage of this by placing implicit variables at the end
of the program in our source code.  The last variable placed will be a
`y`; if more than one variable is missing, all but the last will be an
`x`.

So our example program:

    @ × + Y 2 + Y 3

could be written like this, by rearranging it to end with a `y`:

    @ × + y 2 + 3 y

in order to allow us to make both `y`s implicit, giving us a 6-byte
program.  (The first `y` is being encoded in the full control marker,
the second `y` at EOF.)  We're making progress!


### Tacit pipeline stages

You may be starting to get sick of the `@` signs everywhere
(especially because they often cost bytes).  As explained above, the
full control marker is necessary whenever you want to be able to write
an *arbitrary* expression, which can have any shape you want and which
puts the variables wherever you want.  However, most expressions
aren't arbitrary, and have one of a few common shapes and use
variables in one of a few common ways.

If you don't use a full control marker, therefore, you can still write
expressions, and statements containing expressions.  To keep things
simple, the meaning is exactly the same as it would be without the
full control marker.  However, rather than being able to write
arbitrary expressions, you are limited to using a few common shapes
for the expression; the code is a syntax error if you try to go
outside the set of common shapes and don't spend a byte on a
full control marker to permit this (because most of the available
non-full-control encoding space is used to encode *which* tacit shape
is in use, in addition to the builtins/constants/functions that make
up the statement or expression).  But because these shapes are so
common in practice, you can often write much of the program without
resorting to full control.  The big advantage is that the common
shapes are often packed full of implicit variables, saving you bytes
not only on omitting the `@` signs, but also by lowercasing your `x`s
and `y`s.

For the purpose of determining tacit pipeline stage shapes, the
2-argument functions are split into two groups, C (which includes
builtins which there is never or rarely a reason to swap the arguments
of); and D (all other 2-argument builtins, and user-defined 2-argument
functions).

Here are the various tacit pipeline stage shapes available (here, `K`
represents any constant / 0-arg function, `F` any 1-arg function
except generators, `G` any 1-arg function that is a generator (see the
section on generators below), and `C` and `D` represent 2-arg
functions in groups C and D respectively):

  * As an initial expression:
    * `x` (only in the initial expression of a 2-arg function)
    * `y` (only in the initial expression of a 1-arg function)
    * for a 0-arg function, *any* initial expression may omit the `@`
      (as a tradeoff for this, no implicit `x`/`y` will be available)
  * As a statement:
    * `F` / `G`
    * `C y` / `D y`
    * `C K` / `D K`
    * `C F y` / `D F y` / `C G y` / `D G y`
    * `D' K`
    * `D' G y`
    * `C G K`
    * `C G F y`
    * `C C x y` / `C D x y`
    * `C C K y` / `C D K y` (only in cases in which for the `C`/`D` in
      the argument, `C x y` / `D x y` is trivially equivalent to
      something of the form `F y` or `K`, typically because neither
      `X` nor `Y` have changed since `Y` was implicitly assigned to
      `X`)
    * `C G C x y` / `C G D x y` 
    * `C G C K y` / `C G D K y` (under the same circumstances in which
      `C C K y` / `C D K y` are allowed)
  * For the last statement in the function, `D' y` and `C x` are also
    permitted tacit shapes (these are not allowed for earlier
    statements), except when `y` is in use as a generator scope
    variable.

Above, we looked at the following pipeline for calculating our running
example:

    @ Y  + @ 2  × @ + Y 3

Under full control, this looks pretty horrible (and is 10 bytes
long!).  However, we can easily convert it entirely to tacit pipeline
stages:

    y  + 2  × + 3 y

This is not only more readable than what we've seen before, it's also
shorter: because the two uses of `y` are implicit, this comes to only
five bytes.  Note that we had to swap the arguments of the `+` at the
end so that the last pipeline stage has a valid tacit shape; addition
produces the same result regardless of its argument order (which is
why it's in group C), so this swap doesn't change the meaning of the
program.  The `× + K y` shape is legal because if we had
hypothetically written `× + x y` instead, we could alternatively have
written it as `× twice y`, i.e. in the form `C F y`, making the
encoding that would otherwise have been used for `× + x y` available
for a different shape.  The Elly decoder takes into account whether or
not we necessarily have `X` equal to `Y` when working out whether
expressions like `+ x y` could be useful or not.

Normally, you will want to write an Elly program almost entirely with
tacit pipeline stages, both to save on full control markers and to
make as many variables implicit as possible.


Variables
---------

As mentioned above, the most important varilables in Elly are `X` and
`Y` (especially `Y`, which is the most commonly available in tacit
shapes).  `X` and `Y` are local variables, local to the function that
contains them.  There are a few other variables, too; most of these
are global variables that hold their value throughout the whole
program, with the most important being `R` and `M`.

In order to assign to a variable, we normally use the `→` (or `->`)
syntax.  Each of the four main variables has its own assignment
operator as a builtin, e.g. `->Y` is a 1-argument builtin that assigns
its argument to `Y` and returns the value assigned.  When these
builtins are used inside an expression (rather than as a statement on
their own), they automatically turn on full control mode (thus, for
example, you can't write `+ →X y` except at the end of a function, but
can write `+ →X × Y 3`); this is based on the reasoning that there
isn't normally much point in assigning one variable directly to
another as a side effect.  `→X` and friends have only a single
encoding, so you can't place tacit variables inside this sort of full
control expression (except at the end of the function).

As a side note, functions with fewer than 2 arguments have an
automatic assignment to `X` after the initial expression is evaluated
(unless it was explicitly assigned within the initial expression).  To
make programs more readable, you should make this implict assignment
explicit by writing `→x` (or `->x`) at this point in the program,
unless the value of `X` is not used beyond that point.  (Likewise,
functions with 0 arguments allow an explicit `→y`.)  These cost no
bytes because they exist implicitly anyway (thus the lowercase
variable names), and the variable will actually be assigned to even if
you forget them.


### Saving bytes via using variables

Frequently, functions will discover that they don't need to remember
the value of `X` and/or `Y` all the way through.  So it's often worth
thinking about whether using a variable to remember a partial result
will lead to shorter code to recalculate it.

Let's think about our running example, the λ*n*.(*n*+2)(*n*+3)
program, which we have so far "golfed down" to 5 bytes.  Suppose we
wanted to implement a slightly different program instead:
λ*n*.(*n*+2)÷(*n*+3).  We can't simply just write `y  + 2  ÷ + 3 y`,
because `÷ + 3 y` is not a valid tacit pipeline stage (unlike `×`, the
argument order to `÷` matters, so the encoding space is needed to
remember which order the arguments are supposed to come in).  However,
by using a variable assignment, it's nonetheless possible to write
this program in 5 bytes.

The trick is to notice that "increment" (i.e. "plus 1") is a byte
cheaper to express than "plus 2" or "plus 3".  That means that instead
of adding to our argument `Y` twice, we can remember the value of `y +
2` and increment it:

    y  + 2  →Y  ÷ inc y

Incidentally, the same program can be encoded in 4 bytes by using a
mathematical rearrangement: (*n*+2)÷(*n*+3) = (*n*+3−1)÷(*n*+3) =
(*n*+3)÷(*n*+3)−1÷(*n*+3) = 1−1÷(*n*+3), which we can write as

    y  + 3  reciprocal  complement

When golfing, it's important to look for rearrangements that could
help you express the same algorithm in fewer bytes.  But it's also
important to look for a simpler algorithm, because maybe that
algorithm will come out terser than your original algorithm did.


### Which variables are available?

`X` and `Y` are general-purpose local variables (local to the
containing function), which initially hold the arguments but which you
can subsequently use how you like.  `X` has an additional purpose,
being automatically overwritten by a few builtins that calculate a
related value to the value you're calculating, e.g. when doing
floor-division of two numbers, the quotient will be output, and the
remainder will be stored in `X`.  Additionally, if a function contains
an explicit assignment to `X` (`→X`), but does not use `X` anywhere
after the assignment, the last value assigned to `X` will be used as
the function's return value (rather than the return value of the last
statement in it).  This return value override is based on whether `X`
actually *was* read, not on whether it *could have been* read.

`Z` is a special-purpose local variable; it refers to the return value
of the previous statement (or of the initial expression if this is the
first statement).  Normally this isn't very useful (as that value is
typically being used implicitly anyway), but you might sometimes want
to specify it as `Z` in a complex full control expression.  You cannot
specify it tacitly.

`I`, `J`, `K`, `L` (and `I0`, `I1`, … `I9`) are temporary, read-only,
local variables used to store the "scope variable" of an active
generator.  They can be named tacitly (e.g. as `i`) in the same
contexts as `y`, as long as they are used sufficiently close to the
start of the generator's scope, or of a full control expression; at
longer distances, they may need to be named explicitly as 1 byte
(`I`).  See "Generators" below for more information.

`NI`, `NJ`, `NK`, etc. are *numerical* loop counters, associated with
the generators whose scope variables are `I`, `J`, `K`, etc.; for
example, when looping over a list using scope variable `I`, `I` will
hold the list element and `NI` will hold the index of the location
within the list from which that element was taken.  These variables
can never be tacit, and numerical loop counters associated with
generators other than the next to be collected can be named only near
the start of a generator's scope or the start of a full control
expression (otherwise, there is nowhere to encode the choice of which
loop counter you mean); additionally, if they appear inside a full
control expression, that expression cannot contain tacit variables to
the left of the non-next-to-be-collected numerical loop counter.

`ZI`, `ZJ`, `ZK` are effectively the "working space" for parallel
generators; they are lists, holding the values of `Z` on every
parallel stream of the generators at once.  In `each`-like generators,
collecting the generator with scope variable `I` (`I.`) will return
the value of `ZI`.  `ZI` and friends have the same encoding
restrictions as `NI` and friends (i.e. when referring to a generator
other than the next to be collected, you're very restricted in where
you can put them).

`R` is a general-purpose *global* variable (whose initial value is a
falsey empty list).  You can assign to it in one function, then read
the value in a different function (whereas assigning to `X` or `Y`
won't change the value of the corresponding variable in the calling
function).

Similarly to `X`, if a program assigns to `R` and yet the value of `R`
is not read at any time during the execution of the program (except by
builtins that modify `R` in-place without exposing its value to the
calling code), the last value assigned to `R` will be the output from
the program (as opposed to the output of the main function).  This is
slightly different from the rule for `X` because it affects the
program as a whole, not just a single function; and because reads of
`R` before the assignment turn off the special handling.

`M` is a bit of a special case: it can be thought of as a global
map/dictionary, or as a set of global variables that are named after
Elly values.  For example, `M 3` is a specific variable, that can be
read from and assigned to, and so is `M [10]`, and `M "abc"`, and even
`M null` (note that although nulls are not normally considered equal
to each other, as discussed below, they are considered equivalent for
the purpose of looking up values in `M`).  This means that `M` itself
is not a 0-argument builtin, but rather a 1-argument builtin (the
argument is the key of the `M` entry to read); and `→M` is a
2-argument builtin (the value to assign is the first argument, the map
key is the second).  Every value within `M` is initially a falsey
zero.

`$` is a list that holds the command-line arguments to the program as
a whole.  Separately, there are variables `$1`, `$2`, `$3`, `$4`, `$5`
that hold the first 5 command-line arguments to the program, if they
exist.  If there are fewer than 5 arguments, then the unused arguments
in `$2`…`$5` are initialized to certain constants that would otherwise
require two bytes to represent:

  * `$2`: 16
  * `$3`: 32 (the codepoint of a space character, in ASCII/Unicode)
  * `$4`: 256
  * `$5`: 100

`$`, and `$1`…`$5`, are read-only; they don't have assignment
builtins or statements of their own.

`$1` is also a bit of a special case.  In cases where some unusual
encoding is in use (see "Unusual encodings" below), then the variable
`$1` will contain information specific to the encoding, allowing the
program to introspect into the way it was encoded.  (This does not
change the actual first input to the program, which will still be used
as the main function's initial `X` or `Y`, and which will still be
usually accessible as `@ first_nonnull $`, or `@ index_into 0 $` if
your program accepts nulls as arguments.)

There are no other variables available.  For most small programs,
`X`/`Y`/`R` will be sufficient (in fact, even `R` is often
unnecessary), but if your program is large and you need a lot of
variables, you can create an unlimited supply by using entries within
`M`.


Generators
----------

Unlike imperative languages, where control flow is typically
accomplished using control flow statements, the usual (and idiomatic)
way to perform control flow in Elly is to use *generators* (the only
other forms of control flow are recursion and `when`).  Generators can
be thought of as control flow builtins whose scope is determined by
the way that values flow through the program; they return a value
multiple times (in series or in parallel), and then are *collected* at
some subsequent point (which either starts the next iteration by returning
control flow to the generator and allowing it to returning a new value; or
else stops looping and outputs some value based on the result of all
the computations so far).

For example, let's look at the generator `each_prefix`, which runs
some code on each prefix of a list (with the outputs being listed in
order from shortest input to longest input).  Here are a 0-argument
program, and an expression, that calculate the squared sum of the
prefixes of a specific list, in order from longest to shortest:

    [1,3,5]  each_prefix :I  sum  square  I.  reverse
    @ reverse I. square sum :I each_prefix [1,3,5]

The program returns, and the expression evaluates to, [81,16,1,0].

In addition to the generator builtin itself, the important syntax here
is the `:I … I.`.  This specifies the scope of the generator.  The
`:I` at the start of the scope specifies what the scope is named (it
can be named `I`, `J`, `K`, `L` or `I0`, `I1`, `I2`, etc.; the name is
not encoded in the program so your choice of name doesn't matter).
The `I.` at the end shows the end of the generator's scope, where the
generator is collected.  So in this example, the scope is around two
builtins (`square` and `sum`); while in the scope of the generator,
they will run once for every prefix of the list that was the input to
`each_prefix`, looking at a different prefix each time, so three
different numbers have their squared sum calculated.  When an
`each_prefix` generator (in general, any prefix whose name starts
`each`) is collected, the various results at the point of collection
get combined back into a single list.  The `reverse` is outside the
scope of the generator, so it affects the entire list of results as a
whole, running only once.

Within the scope of a generator, the variable with the same name as
the scope holds the value that was returned from the generator.  You
can name this value tacitly in the same contexts as you would normally
be able to tacitly name `y` (as long as you are close enough to the
`:` start-of-scope marker, because the choice of `y` versus `i`, etc.,
is encoded using a choice of multiple encodings for `:`).  You can
also name it tacitly in a full control expression, if close enough to
the leading `@` (in this case, it's the full control marker that
encodes `y` versus `i`).  If we wanted to take the sum of the squares
of each prefix of a list, rather than the square of the sum, one
approach would be to take the dot product of each prefix with itself
(dot product is "multiply corresponding elements, then sum"):

    y  each_prefix :I  dot_product i  I.

`dot_product y` is a legal tacit shape, so we can use `dot_product i`
here instead.

You are not allowed to nest generators more than 7 deep in a single
function (placing a limit on this reduces the amount of encoding space
that needs to be dedicated to highly nested generators, freeing up
encoding space for encoding other constructs).


### Tacit scope markers

In our examples so far, we've been using explicit scope markers; the
start-of-scope marker `:` costs a byte, and the end-of-scope marker
`.` costs an additional byte.  As usual for Elly, that's the general
but inefficient way of doing things, and there are some cases in which
we can make one or both scope markers tacit, saving a byte or even two
bytes.  The notation for a tacit scope marker is, unsurprisingly, to
write the name of the scope variable in lowercase.

The `.` marker can be made tacit in most situations; colons have
multiple possible encodings, and so if the `.` is close enough to the
matching `:`, the exact position of a tacit `.` marker can be
specified by varying the encoding of the `:`.  The exact distance you
can reach depends on how many other things we're trying to encode in
the colon (a lot of implicit `i`s and `y`s in between will really
shorten the distance, for example, as we need to encode which is in
use; and heavily nesting generators will make the reachable distance a
little shorter); as a special case, an end-of-scope marker can always
be tacit if it appears right at the end of the function (with nothing
but other tacit scope markers after it).  The `.` marker can also be
made tacit (in fact must be made tacit) if the `:` marker is tacit and
the generator is not `each`.  However, if the scope of a generator is
empty (i.e. there are no builtins between the start and end of the
scope), the end marker may never be tacit (often this implies that the
start marker cannot be tacit either).

As for the `:` marker, there are four situations in which this can be
made tacit.

The first is if we're using the `each` builtin, which runs the
statements in its scope once for each element of the original list
(and combines the results back into a list when collected at the end
of its scope, as usual for `each` generators).  `each` is sufficiently
commonly used that it always gets an implicit colon (and may not use
an explicit colon); it has just as many different encodings as the
colon does, so the exact sort of colon we're using is implicitly
included within the `each`.

Our running example, the λ*n*.(*n* + 2)(*n* + 3) program, can be
written in terms of an `each` generator:

    y  + :i each [2,3]  i.  product

The program is 5 bytes long, tying our other attempts so far.
(Normally we'd need a full control marker to write a statement of
shape `C F K`, such as `+ each [2,3]`, but there is a special case for
1-argument generators and `C G K` is allowed.)  However, it
generalises in a different way from the other attempts (you could
easily modify it to calculate λ*n*.(*n* + 2)(*n* + 3)(*n* + 5), for
example).

The second situation is a "1-statement scope"; if a scope consists of
exactly one statement, then the `:` at its start can usually be tacit.
(An exception: you cannot make it tacit if the generator is the first
statement in the function, the generator takes a second argument, and
the function's initial expression is a tacit `x` or `y`.)  In this
situation, if the statement doesn't have a full control marker,
there's nowhere which can encode tacit `y` versus tacit scope
variables; the rule is that within a minimum size 1-statement scope, a
tacit `y` is accepted only if the statement is of the form
`2_argument_function y` exactly (you cannot use a tacit scope variable
with that sort of context), and any other tacit use of a variable
within the statement must be of the scope variable.  Related to this
is a "1-builtin scope" within a full control expression, which also
allows an implicit `:` (in this case, the scope variable cannot be
used tacitly); note that this is only allowed in full control
expressions, not when the generator is part of the expression of a
tacit statement, and may occasionally be disallowed within deeply
nested scopes (because some of the encodings normally used for this
are instead needed for the scope variables `I`, `J`, `K`, etc.).

The third situation is the opposite extreme: when the scope of a
generator that is a statement covers all the other statements of a
function (not just all the *remaining* statements; the builtin
creating the generator has to be the first statement after the initial
expression).  In this case, the scope variable entirely replaces `Y`;
in fact, you can think of `Y` being used as the scope variable.  The
syntax for this uses `:y` as the start-of-scope marker; the
end-of-scope marker would be `y.`, but you can omit it if you like.
(As an exception to the "cover the entire function" rule, you may nest
two or more whole-function scopes, along the lines of `y each :y each
:y reverse y.  y.`.  Only the innermost `y` will be visible, with the
others being shadowed.)

The fourth situation is similar: a generator that forms the root of
the initial expression of a 0- or 1-argument function.  In this
situation, the generator's output is implicitly being assigned to `X`.
You may therefore use `X` as the scope variable if you wish.  In this
situation, the colon may be implicit if the matching `.x` comes at the
end of the function (you may still use `X` as the scope variable if
the `.x` comes elsewhere, but will need to spend a byte on `:X` in
that case).  For example, it's legal to write this as a 0-argument
program:

    :x each_prefix [1,2,3]  →y  pair reverse y  x.

This would produce the output
[[[],[]],[[1],[1]],[[1,2],[2,1]],[[1,2,3],[3,2,1]]].  (Note that there's no
cheap way to specify the scope variable `x` directly, but the implicit
`→y` in a 0-argument function means that we can just use `y` instead.)
It is worth noting that this situation uses the same encoding that
would otherwise be used for a 1-statement scope; therefore, if you
want to write a generator that starts in a function's initial
expression and is scoped to only its first statement, you will need to
use an explicit `:I` or the like for its scope.

Our "sum of squares of each prefix" example covers both a single
statement, and the entire function, so we could write it in any of the
following forms (explicit colon, 1-statement tacit colon,
from-initial-expression tacit colon, all-other-statements tacit colon,
with the end of scope being tacit in each case):

    y  each_prefix :I  dot_product i  i.
    y  each_prefix :i  dot_product I  i.
    @ :x each_prefix y  dot_product X  x.
    y  each_prefix :y  dot_product y  y.

In this case, the last form is preferable; the first form costs a byte
for the colon, the second and third cost a byte for the scope variable
(which can't be tacit because the default for a tacit variable in this
context is `y`), and the third costs an additional byte for the full
control marker.  The last, however, allows both the variable and the
colon to be tacit, giving us a 2-byte program.

The various tacit forms exist because they make different trade-offs;
some (such as the all-other-statements form) are very terse when they
work but often won't fit your program; others (such as explicit start
of scope with tacit end of scope) will be usable for most programs but
are slightly longer.  It's often worth trying them all out to see
which will give the shortest code for any particular problem you might
be having.


### Generator control flow patterns

There are two main sorts of generator in Elly, in terms of how they
affect the program's control flow.

One possibility is to create an explicit loop, with separate
iterations (a *serial* generator).  For example, the `first_n_ints`
builtin creates a generator that loops through successive integers as
input, stopping when it finds a given number of truthy results (and
outputting the inputs that lead to those results).  In this situation,
it's collecting the iterator that causes the next iteration to occur.
For example, look at this 0-argument program to find the first 5
square numbers that are 1 less than a prime number (because there are
no arguments, we can write the leading `5` tacitly with no `@`):

    5  first_n_ints :y  is_square  and inc y  is_prime  y.

Here, the loop starts at `first_n_ints :y` and ends at `y.`, so all
three statements in between will be run once on each iteration.  If we
added a statement with a side effect there, it would run every
iteration; if we added two statements with side effects there, they
would alternate with each other, as both would run on every iteration.
`first_n_ints` behaves like this because the number of iterations to
run is not known when the generator starts to generate possibilities.
This iterative sort of generator must be entirely well-nested; any
generator whose scope starts outside its must have a scope that
finishes outside its, and any generator whose scope starts inside its
must have a scope that finishes inside its.

Most generators, however, are like `each`: there's a fixed number of
times the statements inside can run, and each of the executions are
independent of each other.  This sort of generator (a *parallel*
generator) does not place a single loop around all the statements
inside it, but rather runs each of them in the usual sequence; only
when a value derived from the generator is used is a loop needed, and
the loop will run just around a single function or builtin.  If we
look for squares that are 1 less than a prime in an explicit list (we
can do this using the `filter` generator, which is like `each` but
creates a list of the inputs that mapped to truthy values when the
generator was collected, rather than a list of the values that were
collected themselves), the program will look very similar:

    [6,9,16,36]  filter :y  is_square  and inc y  is_prime  y.

but now the evaluation order is different, and all four of our
hardcoded numbers will be checked for squareness first, then
incremented, then checked for primality.  If we had placed two
builtins with side effects inside this `filter` scope, then we'd see
all the side effects from the first builtin, followed by all the side
effects from the second bultin – even if they were part of the same
expression.

The main reason for this modified evaluation order is that it allows
generators to be non-well-nested.  It's entirely legal to do something
like this:

    [[1,2,3],[4,5]]  each :i  each :j  double  i.  j.

`each :i  each :j  double  j.  i.` would simply just double each
variable of the nested structure – `each :j` runs twice, so there are
two `J` generators (that get collected simultaneously), but they just
reconstruct the lists they were given.  This mis-nested version,
however, means that the `I` generator has to be collected before the
`J` generator.  This collection happens within the scope of the two
`J` generators, so we end up with one list entry for each value from
each `J` generator, producing the six lists [2,8], [2,10], [4,8],
[4,10], [6,8], [6,10] (which end up being collected into a single list
by the `J` collector).  Note that the two `J` generators have merged
with each other; there were two `J` generators while the `I` generator
still existed, but they merged when the generator that was keeping
them separate (here the `I` generator) was collected.  (In general,
you can always look at where you are relative to the scopes to see how
many copies of something there are, e.g. something that is copied by
an `I` scope will have multiple parallel copies only inside the `I`
scope and will always be merged when leaving it.)

Mis-nesting generators is allowed only when both generators are either
`each`, or have an explicit start-of-scope marker; tacit
start-of-scope markers cannot be mis-nested, except on `each`.  (And
of course, both generators must be parallel; serial generators cannot
meaningfully be mis-nested.)

In a parallel generator, it's also possible to peek at the output from
the "other" streams that are running in parallel.  The variables `ZI`,
`ZJ`, etc., hold the lists that would be produced if the generator
with the corresponding scope variable (`I`, `J`, etc.) were collected
right now (accessing the variables does not actually collect the
generator, though, merely lets you know what would happen if you did);
in other words, they contain all possible variations of `Z`, in all
the streams visible from the generator.  Using these variables
sometimes saves the need to collect a generator, look at `Z`, and
restart the generator again (perhaps forgetting the value of its scope
variable in the process).


### Two-variable generators

Although the vast majority of generators take one *argument*, in the
sense that one input is provided to the builtin, a couple of
generators effectively have two scope variables.  For example, `fold`
evaluates some code on the first two elements of a given list, then on
the result and the third, then on that result and the fourth, etc.;
this means that two pieces of information need to be provided from the
generator to its scope.

In Elly, generators like `fold` nonetheless only have one scope
variable named – the second of the two variables.  This is the
variable that can be referred to by name, and used implicitly in the
same contexts that `y` can; and it is written at the ends of the scope
in the same way that normal scope variables are.  The other is the
"output from the generator builtin", which will be the input to the
first expression or statement inside the generator.  (You can think of
this mechanism as being similar to the two arguments of a two-argument
function; `x` is the default for the initial expression, but most
tacit mentions inside the function will be of `y`.)  As a simple
example, if we wanted to write a function that sums a list without
using the `sum` builtin, we could fold addition over the list:

    y  fold :j  + j  j.

If you need to refer to the "first" scope variable more than once, you
will need to assign it to some other variable for safekeeping, or (if
you don't want to overwrite any of your variables) use a temporary
function call so that its parameters act as temporary variables.
Here are three ways you could fold λ*i*λ*j*.*i*+*ij* over a list:

    y   fold :j  2( x  + × x y ) j  j.
    y   fold :J  →X  + × x j  j.
    y   fold :y  →X  + × x y  y.

However, often you can avoid the need for this by either using `Z`:

    y   fold :j  + × Z j  j.

or rearranging the equation:

    y   fold :j  × inc j  j.


### `while` and `until`

[TODO: need magic for the RHS of these]


### `when`

Apart from generators and recursion, Elly has one other way to do
control flow: `when`, typically used to produce an early return from
the loops created by generators.  `when` is not itself a generator,
but rather a 2-argument builtin.  If its second argument is truthy, it
simply just returns its first argument; but if the second argument is
falsey, it effectively "cancels" the current code execution.  That
means that the executions of a parallel generator that contain a
failing `when` are removed entirely (so they won't be collected), and
an iterative generator or `while` loop will stop at the point it had
reached at the end of the previous iteration.  If a failing `when` is
used outside a loop, it causes the currently running function to stop
executing and immediately return `null`.

For example, this program will output [1,5,7]:

    [1,2,5,6,7]  each :i  when is_odd i  i.

`when` belongs to group C, so you can use fairly complex tacit shapes
with it (at the, generally irrelevant, cost of not being able to write
its reversed form `when'`; in many but not all of these situations you
can use `and` instead).


Data types and autovectorisation
--------------------------------

At this point, it's worth looking at what data types can exist within
an Elly program.  There are basically only three types of data:

  * **Numbers**.  Most Elly programs use only integers for their
    numbers.  However, Elly does transparently support other types of
    numbers (such as decimal, rational and complex numbers); these can
    be freely mixed with integers and will produce the mathematically
    expected results.  Additionally, they will be used to represent
    the mathematically expected results when necessary, e.g. the
    square root of −1 is an imaginary number not an error, and `÷ 2 3`
    is the exact number ⅔ rather than being rounded to 1 or truncated
    to 0, even though the inputs were integers.
    
    There is no maximum value for a number; Elly uses an unbounded
    (bignum) representation for integers, and for the numerator and
    denominator of a rational number, and these representations are
    used for the real and complex parts of a complex number.
    Irrational numbers do, however, have to be approximated.

    Numeric literals are limited to integers.  In source code, the
    normal form of a numeric literal is written in decimal.  Negative
    literals start with a hyphen-minus or Unicode minus sign, and
    (except in the case of −1) cost a byte more than the corresponding
    positive number.

    Infinities, NaN, etc., are not considered to be possible values
    for numbers.  Operations that should construct an infinite or
    impossible result, such as division by 0, return null in Elly.

    Elly does not have a separate character type.  Rather, each number
    has a "display flag" that determines whether it should be
    displayed as a number, byte or character when doing I/O.  This
    flag has no impact on any calculation – values with the flag set
    act the same way in calculations as values without the flag set –
    but will affect whether, e.g., a number with value of 65 is output
    as `65` or as `'A'`.  The flag is preserved by builtins that just
    copy a number around without inspecting it, and automatically set
    back to "number" in most other cases (with the exception that
    addition, and some addition-based operations like increments, will
    keep the display flag from the additon's left argument unless the
    resulting value would be inconsistent with the display flag, such
    as a byte with value 259, or a character code with value 4½).
    
    To support display-flagged numbers, there is a second form of
    numeric literal available in the source code: the byte literal,
    consisting of a backslash followed by any byte (this is a token on
    its own, so it must be followed by a space as usual), and encoding
    a number display-flagged as "byte".  For example, `\d` is an
    encoding of the number 100, but flagged to display as a byte.
    This can also be spelled as `byte.` followed by two hexadecimal
    digits, e.g. `byte.64`.

    When doing I/O on numbers that requires an unambiguous
    representation (e.g. printing them as a list element, or parsing
    them from a string), character-flagged numbers are written as a
    character in single quotes (with special cases of `'\''` for a
    character-flagged 39, and `'\u{…}'` for unprintable characters),
    byte-flagged numbers are written in backslash-byte or `byte.xx`
    format (depending on whether they are printable or not), a
    number-flagged 0 is written as `0`, and other number-flagged
    numbers are written in the form
    <code>(<i>a</i>+<i>b</i>i)/<i>c</i></code> or
    <code>(<i>a</i>-<i>b</i>i)/<i>c</i></code> (with
    <code><i>a</i></code> and a possible subsequent <i>+</i> omitted
    if it is 0, <code><i>b</i></code> omitted if it is 1 or −1,
    <code>+<i>b</i>i</code> omitted if *b* is 0,
    <code>/<i>c</i></code> omitted if *c* is 1, and the parentheses
    omitted if any of *a*, *b*, and *c* are).  If a number is printed
    on its own, and is not number-flagged, the character or byte it
    represents will be printed directly (e.g. a character-flagged 100
    prints as `d`).

  * **Lists**.  A list has a specific length (which is a nonnegative
    integer), and a number of elements equal to that length, e.g. a
    length 3 list has a first, a second, and a third element.  Lists
    are 0-indexed, i.e. an index of 0 corresponds to the first
    element, of 1 to the second element, and so on.  (Many builtins
    that operate on lists will "wrap" indexes beyond the list,
    e.g. element −1 is the last element, and element 4 of a 3-element
    list is the second element.  But this is part of the behaviour of
    the builtin, not part of the definition of a list.)

    List elements can be of any type supported by Elly, i.e. they can
    be numbers, other lists, or even null.  There is no requirement
    for all the elements of a list to have the same type, e.g. a list
    like `[1, [2, []], null]` is perfectly legal.

    Elly does not have a separate string type; a string is just a list
    of its characters.  For example, the string `"Elly"` is actually
    just the list `[69, 108, 108, 121]` (the display flag would
    typically be set to "character" on each of the elements, but it is
    almost invisible to Elly and matters only when formatting output,
    and calculations will produce the same value regardless).  A list
    may be output using string or bytestring notation if every element
    of the list has a "character" or "byte" display flag respectively,
    or if the list contains a mix of "character" and "byte"-flagged
    elements but they all correspond to ASCII characters.  As with
    numbers, if a list is printed on its own (rather than being an
    element of another list, or used in a context where an unambiguous
    representation is reuqired), a character string or bytestring is
    just printed directly by printing the characters or bytes it
    contains.

    List literals are limited to lists of nonnegative integers, but
    other sorts of lists can be created at runtime via running code or
    parsing strings.  In the encoded form, there are four types of
    list literal: one which is specialised to encode the sort of lists
    that contain opaque integers, one which can only encode ranges
    from 0 or 1 to a small-to-medium integer, one which is specialised
    to encode lists of character codes and which sets the display flag
    to "character" on the elements, and one which can only encode
    lists that are valid encodings of some Elly program (which have no
    trailing bytestring).  In source code, list literals are either
    written as a sequence of decimal integers, separated by commas and
    enclosed in square brackets; or as a character string in double
    quotes `""` (this allows escape codes like `\\`, `\"`, `\n`, and
    `\u{…}`); or as an Elly program surrounded by (depending on the
    number of arguments it takes) `@0{ … }`, `1{ … }`, or `2{ … }`, to
    represent the list that contains the bytes that encode that
    program (you may omit the `}` at the end of the file if you wish,
    but this doesn't save any bytes).  The `@0{` delimiter costs an
    extra byte compared to the other two, thus the `@` at the start of
    its name.  In I/O, the square-bracketed and double-quoted forms
    are allowed, as is a bytestring form (which is the same as the
    string form but with an extra "b" at the start and with `\x`
    escapes rather than `\u{…}` escapes, e.g. `b"a\x01b"`).

  * **Null**.  A null is a special value, distinct from all valid
    values; to obtain a null, you can use `null`, a 0-argument builtin
    that returns one.  A comparison involving a null is treated as
    meaningless; all such comparisions will return false, even when
    comparing two nulls for equality (e.g. it is not true that `[1,
    null]` is the same list as `[1, null]`), although some builtins
    (such as `→M`) treat two different nulls as being interchangeable.
    Bear in mind that the rule is not "null is not equal to null"; two
    nulls are neither equal to each other, nor unequal to each other,
    they're impossible to compare at all.

    A null is used for two purposes.  One is as the main
    representation of Boolean false (most values in Elly can be either
    truthy or falsey depending on how they were produced, but nulls
    are always false).  The other is as a sentinel value to specify
    that part of a data structure is inapplicable / has errored / has
    been determined not to be the part that we're looking for / should
    be abandoned.
    
    Generally speaking, any calculation that takes a null as input
    will produce a null as output, unless the calculation is
    specifically designed to operate on nulls or on Booleans.  If a
    calculation takes a list including a null as input, or takes a
    null as input and produces a list, this will only null out any
    parts of the output which would depend on the null value
    specifically.  Additionally, if a calculation takes a list as
    input and inspects its elements, but interprets it as an unordered
    collection of elements rather than as an ordered list, generally
    any nulls within the list are ignored (and only the non-null
    elements are used); for example, `lex_maximum` will give you the
    maximum non-null element of a list.
    
    In I/O, a null is represented as `null`.

Builtins will generally try their best to operate on data that has the
"wrong type".  In the case of nulls, this is easy (if the builtin
isn't specialised for operating on nulls, you get a null result).  The
rules for unexpected mixes of lists and integers are more interesting,
and are known as *autovectorisation*:

  * If a builtin/statement expects a list, and is given an integer,
    the integer will be treated as a list of the nonnegative integers
    strictly below that integer.  So, e.g., `@ i. square :i each 4` is
    an expression that evaluates to the list `[0,1,4,9]`.  This
    happens regardless of the other inputs.

    At present, the behaviour when a non-integer number attempts to
    autovectorise into a list in this way is not defined in general.
    However, some builtins will special-case this to generalise what
    their behaviour on the integers is to other sorts of numbers; for
    example, `index_into` (which indexes into a list, wrapping if
    necessary) effectively computes a modulus operation when indexing
    into an integer, and thus using `index_into` to "index into" a
    non-integer number will also calculate the value of the "index"
    modulus the "list".

  * If a builtin/statement has exactly one argument that is expected
    to be a number, but is actually a list, an implicit `each` will be
    added onto that argument (with its scope covering the builtin but
    nothing else).  So, e.g., `@ + [1,2] 3` is an expression that
    evaluates to the list `[4,5]`; we've added `3` to each element of
    the list `[1,2]`.

  * If a builtin/statement has two or more arguments that are expected
    to be numbers, but are actually lists, then it will run in a loop
    over the first elements of the lists in question, the second
    elements of the lists in question, and so on.  (This is not the
    same as adding an `each` to every argument; `@ + [1,2] [3,5]` is
    `[4,7]`, whereas `@ j. i. + :j each [1,2] :i each [3,5]` is
    `[[4,6],[5,7]]`.  The generator with the same behaviour as the
    autovectorisation is `corresponding`.)  If the lists have
    different lengths, then the shorter list will be padded to the
    length of the longer using falsey integer zeroes: `@ + [1,2]
    [3,4,5]` is `[4,6,5]`.

  * The above rules apply recursively.  For example, a list of lists
    can end up creating two nested loops: `@ + [[1],[2,3]] 4` is
    `[[5],[6,7]]`.  Likewise, `tighten` (which expects a list of lists
    and concatenates all those lists together) may end up expanding
    integers to lists twice: `@ tighten 4` is `[0,0,1,0,1,2]` (the
    tightening of `[[],[0],[0,1],[0,1,2]]`).

In the previous section, we saw how we could write our running example
as `y + @ :i each [2,3] i.  product`.  Using autovectorisation, we can
see that the `each` is unnecessary; the `+` (which is designed to add
numbers) will autovectorise when given a list as an argument.  So we
can write the program as `y + [2,3] product`.  This encodes to only 4
bytes, just half the size of the naive full control version, and is
probably the shortest way to express the program in Elly.


### Comparisons

One thing to be aware of is that Elly has two basic families of
comparison builtins.

One of these families are the *numerical* comparison builtins.  The
basic numerical comparison builtins are `≤` and `≥`.  As the names
suggest, they are only used to compare numbers; if you try to compare
lists using them, they will autovectorise.  So for example, the
expression

    @ ≤ [1, 2, 4] [1, 2, 3]

will output a list of two truthy numbers and a falsey number.

The other family is the *lexicographic* comparison builtins.  These
compare arbitrary values, and never autovectorise.  For the purpose of
lexicographic comparisons, any comparison involving `null` returns
false (regardless of what sort of comparison it is: equality,
inequality, less-than, greater-than…), a comparison of two numbers
compares their numerical values, and a comparison of two lists
compares the elements in corresponding positions of the lists, with
the first differing element determining the result of the comparison.

Lexicographic comparisons can try to compare things of different types
from each other.  When comparing two lists of different lengths, the
common elements are compared first: if they are the same, then the
shorter list is considered lexicographically lower.  When comparing a
list to a number, numbers whose real part is in the range *n*
exclusive to *n*+1 inclusive are considered to be marginally greater
than the list [0, 1, …, *n*] (i.e. if compared to this list, they will
be considered greater than it; and if compared to any other list, the
result will be the same as when comparing this list to that list).

To directly lexicographically compare two values, you can use
`lex_greater`; in general, builtins for lexicographic comparisons have
names starting `lex_`.  The selection of 1-byte builtins for
comparisons is different between lexicographic and numerical
comparisons, e.g. you can sort lexicographically in 1 byte but not
numerically, and lexicographic comparisons have a greater-than whereas
numerical comparisons have a greater-than-or-equal.

Complex numbers also present some difficulty for comparison functions,
because they aren't normally considered to be ordered.  These are also
handled differently between lexicographic and numerical comparisons.
Numerical comparisons treat complex numbers (i.e. numbers whose
imaginary part is not zero) very similarly to `null`s: complex numbers
will compare equal to themselves, and unequal to any other number, but
anything that cares about the order (`≤`, `≥` and friends) will return
false.  Meanwhile, lexicographic comparisons mostly only look at the
real part of a number; the imaginary part will be used as a tiebreak
when comparing two numbers which different but identical in their real
parts (the number whose imaginary part is larger upon dividing by `i`
will be considered greater).


### Truthiness and falsiness

Some builtins care about whether a value is logically "true" or
"false".  In Elly, this doesn't have much relationship to the actual
value; rather, each value is associated with an extra boolean that
determines whether it's currently truthy or falsey.  (The main
exception to this is that nulls are always falsey.)

Generally speaking, builtins that are designed to handle data in an
opaque way (without examining it) will leave its truthiness alone.
Builtins that are designed to return booleans will return null for
false, and some useful value related to the input value for true
(often but not always the input itself); in this case, the output will
be converted to its truthy version (even if it was previously falsey).
Builtins that have functionality unrelated to booleans
(e.g. arithmetic) will generally mark 0s in the output as falsey,
whilst marking other elements as truthy.  When an autovectorisation
assembles its results into a list, the resulting list wrapper (as
opposed to its elements) will be falsey if it is empty or contains any
falsey elements, and truthy otherwise; builtins that generate lists
might also use this scheme (but will mark any nonempty list as truthy
if they could plausibly generate an empty output from nonempty input).

In some cases, two builtins with only vaguely related functionality
will be merged together, because one of them conceptually returns only
a boolean and the other conceptually returns only a value.  For
example, `≤` (which determines whether its left argument is
numerically less than or equal to its right argument) and `smaller`
(which takes two arguments and returns the numerically smaller) are
actually the same builtin, implemented as taking two arguments,
returning the smaller, and marking it as truthy or falsey depending on
which argument was smaller.  This sort of merge can be a little
confusing, but helps to make programs terser by increasing the number
of builtins that can be given single-byte names (by allowing some of
them to share).

Note that falsiness or truthiness is generally only important when
using a builtin that specifically checks for it, so most of the time,
you won't have to worry about the truth values of your data (and
besides, it tends to get overwritten all the time anyway).  The
truthiness or falsiness of the output of the whole program will affect
the program's exit code, but typically not anything else.


I/O specifications
------------------

[TODO: Where do these things go? If we put them at the end, they can't
adapt argument count, but if we put them at the start, we can't parse
them correctly.

Current plan: encode these by starting a function with a C1 byte,
which shouldn't be needed for other meanings, followed by a
self-delimiting specification.]

I/O specifications are a domain-specific language, embedded within
Elly, for parsing input and formatting output, and specifying where a
function/program takes its input from and where it writes it to.

Placing an I/O specification at the start of the main program turns
off any attempt to adapt its argument count to the number of
command-line arguments given; it will always take exactly the number
of inputs that the I/O specification implies.  (The place to take
those inputs from can be included in the specification itself; if it
isn't, they're taken from arguments first, then from a trailing
bytestring (see below), then from standard input.)  As an exception to
this, if the I/O specification lists no inputs at all, command-line
inputs will be taken via the normal mechanism.

Functions other than the main program can also start with an I/O
specification.  Again, this can affect the number of arguments the
function is given.  The `0(`/`1(`/`2(` specifies the number of
arguments the *caller* must specify when calling the function; the
number of arguments the function *receives* is determined by the I/O
specification, and might be more (or occasionally even fewer).  The
I/O specification can read the function arguments; any which it does
not read at all will be provided to the function after the arguments
produced by the I/O specification.

It's possible that an I/O specification will specify that a function
takes 3 or more arguments.  In this case, all but the last will be
grouped together into a list.

To write an I/O specification, you start by writing I/O specification
commands, followed by a `;` (which counts against the bytecount).
[TODO: is this obsolete?] This is always distinguishable from the `;`
used by control flow statements, because no control flow statements
have appeared yet in the function.  The commands can be explicitly
marked to say whether they appear to inputs or outputs; but if they
aren't, they apply first to the program/function's return value, and
then to its inputs in order.

There are three sorts of I/O specification commands, explained in the
next three sections.


### Formatting commands

Formatting commands specify how a single value (normally a number)
should be converted to (or from) a string for I/O.  For example, this
includes `hexadecimal` which outputs the number in hexadecimal, and
`fractional` which specifies that non-integer numbers should be output
as a fraction (rather than a decimal).

A run of consecutive formatting commands will all be interpreted as
applying to the same value.  For example, `hexadecimal uppercase` will
output the value in uppercase hexadecimal.  The order can matter;
formatting commands will only affect formatting commands to their left
in the run.  So for example, `hexadecimal uppercase parseable` would
format the number 121 as `0x7B`, whereas `parseable uppercase
hexadecimal` would format it as `0X7b`; `uppercase` applies to
`hexadecimal` in the former case, and to `parseable` (which in this
case prefixes a `0x` so that the number is identifiable as
hexadecimal) in the latter case.  Note, however, that `parseable`
still prepends a `0x` in the latter case, despite appearing to the
left of `hexadecimal`, because otherwise it would do nothing at all.

Sometimes you want to break up a run of consecutive formatting
commands, to apply them to two different values.  A run will
automatically break upon seeing a redundant or contradictory command;
`hexadecimal decimal` can't apply to a single value, so it would end
up applying to two values.  For example, this program:

    hexadecimal  decimal  ;  y  double

will read a decimal integer (from command-line argument or standard
input), double it, and output the doubled value in hexadecimal.

However, there may be some cases in which the formatting commands are
consistent with each other, but nonetheless you want to apply them to
different values.  For example, if we wanted to input a number in
hexadecimal, and output a number with an explicit decimal point, we
can't use `show_point hexadecimal` because this is internally
consistent (producing numbers like `0x4.`).  Instead, we need to
insert an additional `;` in the middle of the run to break it up;
here's how we read a number in hexadecimal, and output it in (decimal,
by default) with an explicit decimal point:

    show_point  ;  hexadecimal  ;  y

Most formatting commands take no arguments.  A few, however, do need
an argument; as usual in Elly, this is written directly after the
command itself.  You don't have many tacit shapes available for
formatting command arguments; your only options are to use a single
0-argument builtin/function (which would typically be a literal or
variable), or else to enter full control mode.

Two examples of formatting commands with arguments are `precision` and
`fieldwidth`.  `precision` determines the number of decimal places (or
significant figures under `scientific`, hexadecimal places under
`hexadecimal`, etc.) to display; the number will be rounded if there
are more and padded out with trailing zeroes if there are fewer.
`fieldwidth` determines the preferred number of characters to use to
write the value; if the value naturally comes out shorter, it will be
padded out to the given width, and if it naturally comes out longer,
it will be rounded or converted to scientific notation if possible.
(However, sometimes this isn't possible, and the value will take up
more space than the given `fieldwidth`.)  Incidentally, the nature of
the padding can be changed by reordering the commands; something like
`fieldwidth 10 decimal` will attempt to pad the number using decimal
digits (so we end up with leading zeroes in this case), whereas if we
reverse this to `decimal fieldwidth 10`, we get the default padding of
spaces.

`precision` and `fieldwidth` also have special cases that allow them
to be joined together with their argument into a single token; instead
of `fieldwidth 10` (two bytes), you can use `fieldwidth_10` (one
byte).  Joining into a single-byte token is allowed up to
`precision_9` and `fieldwidth_19`.  You can also join into a
double-byte token (with leading `@` because it's a double-byte token),
going as high as `@precision_99` and `@fieldwidth_399`.  The
double-byte versions are encoded by using two consecutive `precision`
or `fieldwidth` commands; thus, you'll need to separate them with a
`;` if they were meant to apply to different values.

Formatting commands are mostly only used for output, because often
it's possible to automatically determine the format of input anyway.
Sometimes, though, you want to restrict what inputs are legal, or
enforce an unusual interpretation (e.g. parsing octal numbers, which
typically look like decimal if they don't have a leading `0o`).  So
you can use formatting command on inputs too.  An input that doesn't
fit the required format will parse as `null` and produce a warning to
stderr.


### Composition commands

Composition commands allow the various pieces of a value to be
formatted separately and combined back together again.  One example is
the `join ","` command, which will treat the value being output as a
list, and output the elements separated by commas (with no whitespace;
if you wanted whitespace that would be the `join ", "` command).
Composition commands effectively consume the value being formatted and
give you a new value to format (in this case, "an element"; there is
an implicit loop over the elements of the list).  In this way, they
act a little like generators do (although the syntax is very
different).

When composition commands have arguments (and they usually do),
they're compressed together with the argument.  For example, the
literal `"and"` encodes to three bytes, and `" and "` to five.
However, `join "and"` is still three bytes, and `join " and "` is also
three bytes, because there is a special case for `join` followed by a
dictionary word possibly surrounded by spaces.  `join` in particular
has a huge number of special cases because it's so common; very common
combinations like `join ", "` can encode to just a single byte.

`join` also has two variants, `ojoin` and `rojoin`, that allow you to
special-case the behaviour of the start or end of the list.  `ojoin`
will specify how to join the first element of the list to the rest of
the list; the "new value" produced to format will be the remainder of
the list (the first element itself will be formatted the same way as
an *element* of that list).  Likewise, `rojoin` will specify how to
join the last element of the list to the rest of the list.  If the
list happens to be only one element long, both `ojoin` and `rojoin`
are no-ops.  Here's how we'd format a list as `1, 2, 3 and 4`:

    rojoin " and " join ", "  ;  [1,2,3,4]

`join` also has a variant `map`.  This applies to all inputs and
outputs simultaneously, and effectively runs the entire function (or
entire main program) once for every item in the input, creating the
same sort of item in the output.  As an example, we can reverse every
line of the input, without needing an explicit loop:

    map "\n"  ;  y  reverse

Aside from the very common `join` and its variants, composition
commands can also be used to split apart values in various ways and
format the parts separately.  For example, `num,denom` replaces a
number given as input with a 2-element list consisting of its
numerator and denominator.  Most of these come in comma and semicolon
versions: `num;denom` replaces a number given as input with *two*
numbers, its numerator and a denominator, and must be followed with a
format for each individually.  Generally speaking, irrelevant parts of
the format are just ignored (`num,denom` will convert an integer to a
singleton list, and `num;denom` will format an integer using the
numerator format and disregard the denominator format), although there
are sometimes variants like `num;denom1` which force the denominator
to exist (in this case, with value 1).

The remaining family of composition commands is typified by, e.g.,
`surround "()"`.  These don't change the value to be formatted at all,
but do change the output that it formats to (in this case, placing
parentheses around the value that would be formatted otherwise).

Composition commands are written as though they are being used to
output a value, but they can also be used to input a value; for
example, placing `join ","` on an input would attempt to input a
comma-separated list.


### Location commands

Location commands specify where a value is read from or written to.
You write these before the other formatting commands related to the
value.  For example, `read_line byte` as an I/O specification will
read a line from standard input, and interpret it as a bytestring; as
another example, `write_stderr binary` will format the function's
output in binary and write it to the standard error stream.

Location commands don't exist within the "standard sequence" of
locations to read from or write to; they just give you extra places to
read from or write to.  If you use a location command to redirect the
output of the main program, it doesn't get implicitly written to
standard output (like it normally would be); however, using a location
command to redirect the output of a function other than the main
program will not prevent it returning its normal return value.  You
can use multiple location commands referencing outputs to output the
same output multiple times.

There are three special location commands, `return_value`,
`return_bytes`, `return_chars`, which overwrite the output of the
currently running function (causing it to return a different value).
`return_bytes` and `return_chars` format the output as a string (as
though it were about to be output), but then return a string (of bytes
or codepoints respectively) from the function rather than outputting
it.  `return_value` just outputs the value from the function directly,
as an Elly data type, without formatting it at all.

Note that in addition to being placed before the description of the
format of a value as a whole, you can place a location command at the
start of the formatting of a part of a value (which was created by
splitting a value with a composition command).  This will send just
that part of the value to a different location.  You could, for
example, write `real;complex write_stdout write_stderr` to write the
real part of a number to standard output, and the complex part to
standard error.


Miscellaneous features
----------------------

### Features that only exist in source code

Elly is primarily focused on optimising the encoded representation
(which is the representation on which the interpreter works).
However, the encoded representation is normally produced by compiling
the source code representation.  In most cases, the source code
notation is designed to make it obvious how large the encoded
representation would be, so there's a very close correspondence
between them.  However, there are some features that exist only at the
source code level, and don't appear in the encoded representation.

The most obvious of these is source-code-level comments.  These start
with a `#` and last to the end of the line.  The content of a
source-code-level comment will be entirely ignored by the compiler;
and these comments will be entirely omitted from the encoded
representation (and thus will have no effect when the program actually
runs).  Comments are allowed anywhere except inside a literal.

Another option you have is to specify builtins via specifying the raw
bytes that represent them when encoded.  This might be useful if you
want to force a specific encoding of a builtin, e.g. to obey some
source layout rule, but it mostly exists to give a method of referring
to nonexistent builtins in error messages.  The syntax involves a
period `.` followed by the bytes in big-endian hexadecimal.  If the
builtin is a literal, you need to place an additional `.` between the
part that specifies what sort of literal it is, and the part that
specifies what value the literal has (e.g. `−128` would not be
`.3F3B80`, but rather `.3F3B.80`).

The same "specify via raw bytes" mechanism can be used to specify just
the data part of a literal.  For example, you could write the builtin
`280` as `common_integer.0A`.  This syntax is used by interactive Elly
programming environments when they want to visualise how a literal is
encoded and/or how many bytes it takes up.

Another feature that's often useful is the `function_by_cases`
builtin.  At the source code level, it looks something like this:

    y  function_by_cases {
         "Monday": 0,
         "Tuesday": 1,
         "Wednesday": 2,
         "Thursday": 3,
         "Friday": 4,
         "Saturday": 5,
         "Sunday": 6,
       } 7

At the source code level, the builtin takes two arguments and a
literal dictionary.  If the argument is one of the keys in the
dictionary (which can be arbitrary Elly data; they don't need to be
strings, but could be nested lists or numbers or even a null), then
the return value will be calculated by finding the value corresponding
to the key, then indexing into the right-hand argument.  If the
argument is not one of the keys in the dictionary, the return value
will be an element of the right-hand argument that's arbitrary, but
consistent for any given input.  In order for the mechanism to work,
the number of elements in (or for numbers, the value of) the right
hand argument must be the smallest prime power that's strictly greater
than any of the values in the dictionary (in this case, 7 is the
smallest prime power that's strictly greater than 6).

The reason that `function_by_cases` is useful is that there's no need
to encode the entire literal dictionary when producing the encoded
representation.  Rather, the translation will look for a "magic
formula" that happens to produce the right results, and encode that.
In the program above, there's a 1 in 7<sup>7</sup> = 823543 chance
that any given random formula would produce the correct output (out of
7 possibilities) for each of the 7 inputs; so it's quite likely that
we'll find an appropriate formula within the first 16<sup>5</sup> =
1048576 attempts and thus be able to represent which specific formula
is in use with a five-nybble number.  (For short magic formulas, only
one nybble is required to record the length of the payload.)

Encoding and then decoding the program above would therefore produce a
program that looks something like this (although the last five
hexadecimal digits would almost certainly be different; the leading
`2` encodes the length):

    y  function_by_cases.2391DF 7

In other words, our day-of-the-week-to-index function can be encoded a
lot more tersely than it might first look; it's only give bytes long
(one for `function_by_cases`, three for the payload's length and
value, and one for the `7`).  Generally speaking, whenever you have a
function with only a small number of valid inputs and no real pattern,
using `function_by_cases` will produce terser code than you'd get
trying to produce a magic formula by hand.


### Trailing bytestrings

A trailing bytestring is a method of embedding arbitrary literal
binary data directly into the program, without escaping,
length-prefixing or delimters.  This has two main uses; one is to save
bytes when you need to embed a large amount of incompressible (or
already-compressed) data into your program (because there's no need to
spend any encoding space on specifying where the data ends); and the
other is for writing quines and similar programs where you care about
which particular characters exist within the source code or encoding,
because trailing bytestrings are never compressed.

Trailing bytestrings exist in both the source code and encoded
representations, and work the same way; they start immediately after
an unmatched `)` token (which is normally used to end a function
definition), and last until the end of the file.  (In the source code
definition, this is a literal `)` character followed by a single
space, with the trailing bytestring starting just after the space; in
the encoded representation, this is an umatched use of the octet 0xC0,
which encodes the `)` and `}` tokens.)

The value of a trailing bytestring is used to determine the values of
`X` and `Y` before they are defined (e.g. in the initial expression of
a function that takes fewer than 2 arguments), or to specify a value
that an I/O specification reads.  If neither of these constructs exist
in the program (and thus the trailing bytestring would otherwise be
unused), the value of the trailing bytestring replaces the last
implicit `y` in the program.

As the name suggests, trailing bytestrings are lists of numbers in the
range 0 to 255 inclusive, representing the raw bytes that make up the
bytestring (no attempt is made to, e.g., do UTF-8 decoding on them);
the numbers in question will have a display flag of "byte".

A good example of trailing bytestrings is writing a quine in Elly
(that is, a program which prints itself).  `repeat 2 y )` encodes to
the bytes `43 32 C0`, so the encoding `43 32 C0 43 32 C0` will, when
executed, print itself.  (We could make this program shorter still by
instead appending `append x y )` to itself, i.e. `48 C0 48 C0`.)

Likewise, if we wanted the program to print a source representation of
itself (rather than the encoded representation that actually gets
executed), we could do this:

    repeat 2 y ) repeat 2 y )

One thing to watch out for: because functions cannot consist entirely
of tacit expressions, and must encode to at least one byte, a trailing
bytestring is not allowed at the start of a program.  If you want to
write a program that prints a literal trailing bytestring, add a no-op
to it, e.g.

    restart x y ) Hello, world!

Bear in mind, though, that you will almost certainly end up with a
shorter program if you allow Elly to compress the string:

    "Hello, world!"

This encodes down to a 9 byte encoding (as opposed to 15 for the
trailing bytestring version); the only reason you might want to use
the 15-byte version is that the string "Hello, world!" will appear
literally in the encoded program, rather than being compressed.


### Unusual encodings

[...]