package BrainFlak::Symbolic; =pod =encoding utf8 =head1 NAME BrainFlak::Symbolic: symbolic compilation of Brain-Flak programs =head1 SYNOPSIS use BrainFlak::Symbolic ':all'; print dumpstate(compile("({}{})", initstate(["x"]))); # [($XH-2 elements), $Xa+$x] # [($YH elements)] # Working: $W+$Xa+$x print dumpstate(compile("({}{})", lonestate(["x", "y"]))); # [$x+$y] # [] # Working: $W+$x+$y print dumpstate(compile("((()){})", fullprogram())); # [2] # [] # Working: 2 print superopt([["a","b"]], [["b"]]); # ({}<{}>) print superopt([["a","b"]], [["b"]], 1); # ({}[({})]{}) =head1 DESCRIPTION Brain-Flak is an esoteric programming language based around stacks of numbers. C is a Perl module that attempts to determine how a given fragment of Brain-Flak code would behave "in the abstract", i.e. to produce a formula describing its behaviour in an arbitrary situation. Unfortunately, adding control flow to the language makes it impossible in general to produce descriptions; thus, this module looks at the fragment of the language without its control flow command C<{…}>. A Brain-Flak program typically has two stacks, an I and a I, which can be swapped (allowing operations to be done on the secondary stack). Brain-Flak also has a variant Mini-Flak, which deletes three commands (C<< <> >>, C<< <…> >>, and C<[]>), and is otherwise identical; due to not having the C<< <> >> command, this variant does not have a secondary stack. =head2 Commands Brain-Flak is normally considered to have eight commands: four commands consisting of an empty pair of matching bracket characters (C<()>, C<[]>, C<< <> >>, C<{}>), and four commands consisting of matching bracket characters with contents (C<(…)>, C<[…]>, C<< <…> >>, C<{…}>). This module is focused on seven of them. However, this module is designed to work with fragments of Brain-Flak code, and that includes fragments with mismatched brackets. From the view of this module, therefore, Brain-Flak has ten commands, of which two (C<{> and C<}>) are not handled (because C<{…}> is the control flow command), so eight are interesting to this module: =over 6 =item C<(> C<[> C<< < >> These commands start a I of commands; Brain-Flak adds together the I of every command in the group, and then decides what to do with those value when the group ends. As such, this module treats all three as interchangeable (although it will attempt to use the correct opening bracket when producing Brain-Flak code as output). =item C<)> Closes the current group of commands. The sum of their return values is pushed to the active stack, and also used as this command's return value. (This command is therefore Brain-Flak's main method of copying data.) =item C<]> Closes the current group of commands. The return value of this command is minus the sum of return values of the group. =item C<< > >> Closes the current group of commands. Their return values are entirely discarded, and this command itself returns 0. =item C<()> Returns 1. =item C<[]> Returns the number of elements on the active stack. =item C<{}> Pops and returns the top element of the active stack. If it's empty, returns 0 instead. =item C<< <> >> Swaps the active and secondary stack. =back =head2 The "working stack" Because this module deals with Brain-Flak program fragments, it needs to precisely define what it means for a program to have unmatched brackets. This definition is done by looking at all the running totals that are currently being formed by unclosed command groups, and treating these as a third "working stack". Return values of commands add to the top element of the working stack (because groups I the return values of the commands inside them), and thus C<(>, which opens a new group without closing any existing groups, is effectively equivalent to pushing a 0 onto the working stack. The bottom element of the working stack represents the sum of all the command return values from the program as a whole. This is often considered uninteresting (and anyway is typically not accessible), so some functions in this module use a representation of the working stack that ignores it. =head2 Lonely program fragments A program fragment is considered by this module to be I if it makes sense by itself, and doesn't try to interact with the program around it. This definition means that the program fragment in question must not try to inspect the height of a stack using C<[]>, and must name all the stack elements it interacts with (i.e. if this module is told "run this lonely program fragment on a stack ending with elements I and I", it'll complain if the fragment attempts to access any stack elements beyond the second). Lonely fragments are particularly easy to analyse, and are generally reusable, and so are highly suited for use with this module. =head2 Program states This module uses the concept of a "program state" to describe the state of a running program, possibly in an only partially defined way (e.g. "the active stack has at least two elements, and the top element is twice as large as the second element" is a valid program state despite being far from fully specified). Most functions in this module take states as inputs, produce them as outputs, or both. The behaviour of a lonely program fragment can be fully defined in terms of a pair of states (i.e. "this program takes state A to state B", where state A is sufficiently general that it could name any input to the program); this can be thought of as I the program to a relationship between states, and is the main purpose of this module. Non-lonely fragments can also be defined this way, with a caveat: such fragments can effectively do conditionals on the stack height via copying the stack height, reading a few stack elements, and comparing the new stack height to the old stack height (as a method of seeing whether C<{}> commands popped the stack or read a zero). This module does not attempt to resolve such conditionals, and thus when it tries to compile a non-lonely program fragment into a relationship between states, the relationship is only guaranteed to hold if there are sufficiently many stack elements present already to prevent the stack ever expanding. =cut use warnings; use strict; use Exporter qw/import/; use Math::Algebra::Symbols; use Carp; =head1 EXPORTS You can import the entirety of this module by using C<:all>: use BrainFlak::Symbolic ':all'; or import specific functions by listing them: use BrainFlak::Symbolic 'superopt', 'fixparens'; Nothing is imported by default. You can use the abbreviation C<:state> for the three functions that create program states (C, C, C). =cut our @EXPORT_OK = qw/compile superopt fixparens dumpstate initstate fullprogram lonestate dumpstate/; our %EXPORT_TAGS = (all => [@EXPORT_OK], state => [qw/initstate fullprogram lonestate/]); sub _pop { my $stacks = shift; my $index = shift; my $stackused = shift; my $rv = pop $stacks->[$index]->@*; if (!defined $rv) { ref $stackused or return 0; defined $stackused->[$index+5] or die "Stack underflow on stack $index"; $rv = symbols($stackused->[$index+2] . ($stackused->[$index+5]++)); } $rv; } sub _push { my $stacks = shift; my $index = shift; push $stacks->[$index]->@*, @_; } my $parened = qr/(? # empty | [^()] (?&parened) | $ (?&parened) $ (?&parened) )/x; my $atom = qr/-?\d+ | \$?[a-z]+ | $$parened$/x; # Note: associativity here is backwards, but it doesn't matter because we're # simply checking to see if this matches, not generating a parse tree, and # doing it this way round avoids an infinite loop my $mul = qr/(? $atom (?: # empty | \s* \* \s* (?&mul) | \s* \/ \s* (?&mul) ))/x; my $add = qr /(? $mul (?: # empty | \s* \+ \s* (?&add) | \s* \- \s* (?&add) ))/x; sub _symparse { my $s = shift; defined $s or return $s; # handle "unknown" in targets ref $s and return $s; # handle Symbolic objects $s =~ s/^\s*//; $s =~ s/\s*$//; $s =~ /^-?\d+$/ and return $s; $s =~ /^\$?([a-z]+)$/ and return symbols($1); $s =~ /^$parened$/ or croak "Mismatched parentheses: $s"; # The associativity here, OTOH, needs to be correct. my $a1; my $a2; $s =~ /^(?$add)\s*\+\s*(?$mul)$/ and ($a1, $a2) = ($+{a1}, $+{a2}) and return _symparse($a1) + _symparse($a2); $s =~ /^(?$add)\s*\-\s*(?$mul)$/ and ($a1, $a2) = ($+{a1}, $+{a2}) and return _symparse($a1) - _symparse($a2); $s =~ /^(?$mul)\s*\*\s*(?$atom)$/ and ($a1, $a2) = ($+{a1}, $+{a2}) and return _symparse($a1) * _symparse($a2); $s =~ /^(?$mul)\s*\/\s*(?$atom)$/ and ($a1, $a2) = ($+{a1}, $+{a2}) and return _symparse($a1) / _symparse($a2); $s =~ /^$(?$add)$$/ and return _symparse($+{a1}); croak "Cannot parse expression: $s"; } my %bpartner = (")" => "(", "]" => "[", "}" => "{", ">" => "<"); =head2 initstate my $state = initstate(["x+1",2,"x"],[]); Produces a state value, an opaque scalar that describes a state of a running Brain-Flak program. State values can be partially defined; this particular function outputs a state that's as general as possible, subject to any arguments you might give to define it. The function takes up to three arguments, specifying some number of elements at the top of the active, secondary, and working stacks respectively. When specifying the top of a stack yourself, the topmost element is at the right end of the stack (i.e. the same terminology as Perl's C and C builtins). The elements themselves can be integers, or symbolic expressions (this is notated as in algebra; strings of lowercase English letters can be used as variables, and you can write expressions using addition, subtraction, multiplication, division, and parentheses). So for example, the example above is specifying that the active stack has some value on top, followed by 2, followed by the top stack element plus 1. (You may if you wish place a C<$> sign before the name of a variable, as in Perl; this will be silently ignored.) Stack elements below the area you specify will be referred to using autogenerated names: C, C, C, etc. for the active stack (C is the element just below the bottom specified element), C, C, C, etc. for the secondary stack. If you don't give a third argument, the name C will be used to refer to the initial top of the working stack (and in the common case where you're working with a program fragment with matched brackets, the fragment as a whole will only add to this, so you'll end up with a top-of-working-stack of the form C for some C). Fragments with unmatched closing brackets can interact with deeper elements on the working stack, which will be named C, C, C, and so on. The remaining part of the state is the stack height. The initial heights of the working stack and secondary stack are named C and C respectively (so if you produced a debug dump of the state shown above, it would explain the active stack as "x+1, 2, x, and XH-3 more elements"). =cut sub initstate { $_[0] ||= []; $_[1] ||= []; $_[2] ||= [symbols("W")]; [[map {[map {_symparse($_)} @$_]} @_], [symbols('XH'), symbols('YH'), 'X', 'Y', 'W', 'a', 'a', 'a']] } =head2 lonestate my $state = lonestate(["x+1",2,"x"],[]); Produces a state that describes the initial state of a lonely program fragment. The syntax is identical to C, but instead of naming all the unspecified parts of the state, those parts are considered to be off limits; if you ask C to do something that would make them relevant (e.g. by running the program fragment C<[]>, or reading more stack elements that are specified), it will C rather than attempting to describe the resulting state. As such, states produced via this function can be used to enforce that a program fragment is supposed to be lonely, and interact only with a specific number of elements on specific stacks. =cut sub lonestate { $_[0] ||= []; $_[1] ||= []; $_[2] ||= [symbols("W")]; [[map {[map {_symparse($_)} @$_]} @_], [scalar $_[0]->@*, scalar $_[1]->@*, 'X', 'Y']]; } =head2 fullprogram my $state = fullprogram(); Produces a state that describes the initial state of a Brain-Flak program running as a whole. As opposed to the starting state of a program fragment, which could be anything, a full Brain-Flak program always starts with all the stacks empty (apart from the working stack, which contains a 0; this eventually collects the sum of return values of the program as a whole, but at the start of the program, there have been no values to sum yet, so the sum is 0). If you wanted to use this module to execute Brain-Flak programs via simulating all the non-control-flow parts of them using this module and doing the control flow yourself, this would be how you got at a suitable initial state. Unlike the other state constructors, this one produces states that track stack heights accurately (rather than assuming that there are always enough elements, like C does); in particular, stack heights as reported by C<[]> will respond correctly to attempts to run C<{}> on an empty stack. =cut sub fullprogram { [[[], [], [0]], 0]; } =head2 dumpstate print dumpstate($state); Debug output for the Brain-Flak program states used by this module. The output will contain three lines, listing the states of the active, secondary, and working stacks in a human-readable form, and is intended to be read by people rather than machines. As such, the exact output format is subject to change. =cut sub dumpstate { my $state = shift; my $s = $state->[0]; my $stackused = $state->[1]; my $rv = ""; local $" = ', '; for my $n (0, 1) { my @stack = $s->[$n]->@*; if (ref $stackused) { my $elcount = $stackused->[$n] - (scalar @stack); unshift @stack, "($elcount elements)" unless $elcount == 0; } $rv .= "[@stack]\n"; } my @stack2 = $s->[2]->@*; $rv .= "Working: @stack2\n"; } =head2 compile my $newstate = compile($program, $oldstate); Runs a Brain-Flak program fragment symbolically; with a given starting state (called C<$oldstate> above), the state that will result (C<$newstate>) is calculated. The resulting state uses the same terminology as the original state (so for example, if you initially constructed a state using C, C will consisently be used to refer to the I height of the active stack). Or to put it another way, if you repeatedly run C to step through the states of a program, all the states will consistently use the same names for the same quantities. For example: my $state = initstate(); # could be anything $state = compile("({}{})", $state); # active stack is $Xa+$Xb $state = compile("({}{})", $state); # active stack is $Xa+$Xb+$Xc We've run a program that adds the top two stack elements together twice, so the combined effect is to add together the top three stack elements. Relative to the initial state, therefore, we have an active stack whose top element is C<$Xa+$Xb+$Xc>, whose fourth element is C<$Xd>, and so on. If the state in question was produced via C (or via C with such a state as input), this function will C if any attempt is made to operate on values that were not explicitly mentioned as existing withing the initial state. =cut sub compile { my $prog = shift; my $state = shift; # Deep-clone the input state, so that we can modify it. my $s = [[$state->[0]->[0]->@*], [$state->[0]->[1]->@*], [$state->[0]->[2]->@*]]; my $stackused = ref $state->[1] ? [$state->[1]->@*] : 0; while ($prog ne '') { if ($prog =~ s/^\<\>//) { # Swap the first and second stack. $s = [@{$s}[1, 0, 2]]; if (ref $stackused) { my @newstackused = @{$stackused}[(1, 0, 3, 2, 4, 6, 5, 7)]; $#newstackused = $#$stackused; $stackused = \@newstackused; } next; } if ($prog =~ s/^//) { # Increment the third stack. Stack heights unchanged. my $x = _pop $s, 2, $stackused; _push $s, 2, $x + 1; next; } if ($prog =~ s/^\{\}//) { # Pop top of first and third stacks, add, push to third stack. # The first stack gets 1 element shorter. my $x = _pop $s, 0, $stackused; my $y = _pop $s, 2, $stackused; _push $s, 2, $x + $y; ref $stackused and $stackused->[0]--; next; } if ($prog =~ s/^\[\]//) { # Add the height of the first stack to the top element of the # third stack. Stack heights unchanged. ref $stackused and scalar @$stackused < 8 and die "[] used, but stack height is secret"; my $height = ref $stackused ? $stackused->[0] : scalar $s->[0]->@*; my $x = _pop $s, 2, $stackused; _push $s, 2, $x + $height; next; } if ($prog =~ s/^[[(<]//) { # Push 0 onto the third stack. First/second stack height unchanged. _push $s, 2, 0; next; } if ($prog =~ s/^\)//) { # Pop the third stack. Push the popped element onto the first stack. # Additionally, increase the new top element of the third stack by # the element we just moved. The first stack gets 1 element larger. my $x = _pop $s, 2, $stackused; my $y = _pop $s, 2, $stackused; _push $s, 0, $x; _push $s, 2, $x + $y; ref $stackused and $stackused->[0]++; next; } if ($prog =~ s/^\]//) { # Pop the third stack. Decrease the new top element of the third # stack by the popped value. First/second stack height unchanged. my $x = _pop $s, 2, $stackused; my $y = _pop $s, 2, $stackused; _push $s, 2, $y - $x; next; } if ($prog =~ s/^>//) { # Pop the third stack, discarding the resulting value. # First/second stack height unchanged. _pop $s, 2, $stackused; next; } if ($prog =~ s/^[{}]//) { die "BrainFlak::Symbolic::compile does not handle loops"; } $prog =~ /^./s; die "Unexpected character '$prog'"; } [$s, $stackused]; } sub _stackmatch { my $actual = shift; my $expected = shift; $#$actual == $#$expected or return; for my $i (0..$#$actual) { defined $expected->[$i] and !($expected->[$i] == $actual->[$i]) and return; } 1 } my %delta2length = ('(' => 1, ')' => -1, ']' => -1, '>' => -1, '()' => 0, '{}' => 0, '[]' => 0, '<>' => 0); =head2 fixparens print fixparens("(((]}>"); # <{[]}> The C function takes a Brain-Flak program fragment as input and produces an equivalent fragment as output. The only change in the output will be as to which characters are used to represent the start of a group of commands; C<(>, C<[>, C<{>, or C<< < >> will be used if the group is subsequently closed using C<)>, C<]>, C<}>, or C<< > >> respectively. In other words, it helps document the code via ensuring that the command that opens a group reflects how it's subsequently closed. (Unlike the rest of this module, C<{…}> is also understood as part of the bracket matching.) Although this module does not care whether you use C<(>, C<[>, or C<{> to open a group of commands, other tools that work with Brain-Flak might, and thus you may find this command useful for normalizing Brain-Flak programs to work with them. (The main reason this function exists is for internal use, to ensure that brackets will be correctly matched in the output it generates. However, it was made public as it's a well-defined operation that might potentially be useful.) =cut sub fixparens { my $prog = shift; my @stack; my $rv = ""; while ($prog ne '') { $prog =~ s/(.)$//; my $char = $1; if ($char =~ /[[({<]/) { my $closing = pop @stack; if (defined $closing) { $rv = $bpartner{$closing} . $rv; } else { $rv = $char . $rv; } } else { push @stack, $char; $rv = $char . $rv; } } $rv; } =head2 superopt print superopt([["a","b"]], [["a+b"]]); # "({}{})" A brute-force attempt to find a lonely program fragment that maps a specific state to a specific other state. You can think of this as the inverse operation of C. (This is a "superoptimiser" for program length: it produces the shortest possible program with the given functionality. Note that due to its brute-force nature, it is often slow, and it nearly always uses vast amounts of RAM; this is despite the fact that several tricks are used to try to analyse more promising approaches first, whilst ensuring that the resulting program is the shortest possible.) The first two arguments are array references, giving the initial and final stacks respectively. (You can specify up to three stacks, in the usual active, secondary, working order; the stacks themselves are also array references.) The second argument also permits C stack elements, to specify that you don't care what value ends up in that particular stack slot. If you fail to specify what should happen on a particular stack, it's treated as if you'd specified an empty array (apart from an unknown sum-so-far at the bottom of the working stack); that means that any pre-existing elements on that stack can't be touched if you give it as the input, and the stack will be cleaned of any elements specified by the input if you give it as the output. The produced program fragment will swap the active and secondary stacks an even number of times (usually zero), i.e. it will have no net effect on which stack is active. In order to avoid spending time tracking program fragments which differ only in their return value, the "bottom visible element" of the working stack (i.e. the "sum of return values in this group so far", including of the program fragment being generated) is assumed to be irrelevant (and should not be specified in the output state; otherwise, the superoptimiser will attempt to leave the given value on the working stack I an arbitrary return value). The return value is a Brain-Flak lonely program fragment. C will be automatically called on this, ensuring that, when the desired effect on the working stack makes this possible, the resulting fragment will use the correct sort of opening bracket when starting groups. If you give a third argument, and that argument is boolean-true, the superoptimiser will use Mini-Flak rather than Brain-Flak, ignoring the C<< <> >> and C<< <…> >> instructions. You can use a fourth argument to monitor the progress of the superoptimiser; for every program it inspects, it'll call the fourth argument (if defined) as a function, giving it the inspected program and the resulting state as arguments. =cut sub superopt { my $initial = shift; my $target = shift; # note: omits bottom element of working stack my $mini = shift; my $progress = shift; $target->[1] ||= []; $target->[2] ||= []; $target = [map {[map {_symparse($_)} @$_]} @$target]; # Find out which variable names are in the target. If those disappear # entirely from the state, we're definitely in a dead end. my $targetkey = join ' ', map $_//"?", $target->[0]->@*, '|', $target->[1]->@*, '|', $target->[2]->@*; my $targetfilter = $targetkey; my $target2length = scalar $target->[2]->@*; $targetfilter =~ s/[^a-z]//g; my %targetchars; $targetchars{$_} = undef for split '', $targetfilter; my @symbols; if ($mini) { @symbols = qw/( ) ] () {}/; } else { @symbols = qw/( ) ] > () {} <>/; } my $initstate = lonestate(@$initial); # Deduplicate empty arrays to save memory. my $emptyref = []; # @totry holds triples binned by new program length. # Each triple is (new program, added command, state before # that command was added). my @totry = ([], [["(", "(", $initstate]], [["()", "()", $initstate], ["{}", "{}", $initstate], ["<>", "<>", $initstate]]); $mini and pop $totry[2]->@*; my $length = 1; my $maxlength = 2; my %seen; OUTER: while ($length < $maxlength) { # Find the next potential program of this length to try. my $triple = shift $totry[$length]->@*; while (!defined $triple) { $triple = shift $totry[++$length]->@*; } my ($newprog, $added, $oldstate) = @$triple; # Attempt to run that program. my $newstate; eval { $newstate = compile($added, $oldstate); 1 } or next; $mini || !scalar $newstate->[0][1]->@* and $newstate->[0][1] = $emptyref; my @tail = $newstate->[0][2]->@*; scalar @tail or next; shift @tail; # Have we seen it before? my $key = join ' ', $newstate->[0][0]->@*, ($added eq '(' ? "!" : "|"), $newstate->[0][1]->@*, $newstate->[1][2], @tail; exists $seen{$key} and next; $seen{$key} = undef; # Do we still have all the variables we should? # This slows down the program a bit, but saves on memory. $key=~/$_/ or next OUTER for keys %targetchars; $progress and $progress->($newprog, $newstate); # Is it a valid solution? if ($newstate->[1][2] eq 'X' && _stackmatch($newstate->[0][0], $target->[0]) && _stackmatch($newstate->[0][1], $target->[1]) && _stackmatch(@tail, $target->[2])) { return fixparens $newprog; } # Where can we go next? for my $next (@symbols) { # Don't allow length-1 symbols after (, except another (. # They'd be parsed incorrectly. $added eq '(' and length $next == 1 and $next ne '(' and next; my $pn = "$newprog$next"; my $l = length $pn; # A*-ish optimisation: each character in the program changes the # height of the third stack by at most 1, so predict the resulting # stack height and penalise length by 1 character per difference # in the resulting values. # # We can't do this optimisation on more than one stack # because it's possible for a single command to change two # stacks at once. my $cur2length = scalar $newstate->[0][2]->@*; my $next2length = $cur2length + $delta2length{$next}; $l += abs($next2length - $target2length); push $totry[$l]->@*, [$pn, $next, $newstate]; $l > $maxlength and $maxlength = $l; } } return; } 1; =head1 AUTHOR ais523, Eais523@nethack4.orgE =head1 COPYRIGHT AND LICENSE Copyright © 2018 by Alex Smith This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.26.1 or, at your option, any later version of Perl 5 you may have available. =cut