================================================ ==Permutation Partial-Compiler And Interpreter== This is the design of an interpreter and EXPONENTIALLY-FAST partial-compiler of a subset of Java. If you start with a string of code, made of a sequence of known leaf codes (allowing new leaf codes to be added at runtime if they exist in new code), each with vars renamed to overlap vars in other leaf codes, and you choose random substrings of that code to copy, move, or delete, or rename vars in, then that code can theoretically get ready and run in less time than it takes to concat the new sequence of code strings. *** The time to get ready approaches c*log(c) where c is the number of changes, not the code size. *** Optionally you can avoid creating the new code string by using CodeTree objects directly instead of the strings of code they represent. Only substrings of code that occur or are used most often will be compiled. This is statistical partial-compiling at code string granularity, not class or function granularity. An example of a code string is "f5 = Math.sin(500*f3-f40); f5 = -f5; f3 = f5;", and it could share compiled optimizations with "f5 = Math.sin(500*f3-f40); f3 = f5;". Usually will only compile normed code (CodeTree (extends Func) or Func that is not a CodeTree). Ignoring ob vars (which Audivolv does not use yet but are in the Func interface), normed means the flo vars (f50000, f23, f30, etc) are renamed to first occur in the code in order f0, f1, f2... The order "f0 f0 f1 f0 f2 f1 f0" is normed because the first f0 is before the first f1 is before the first f2. For the same programming-language and string of code, all renamings of vars (allowing name to control location in stack frame) and stack frame sizes are grouped together, for the purpose of sharing compiled optimizations and avoiding compiling very similar code multiple times. Each normed CodeTree will have most of the info for efficiently renaming and reordering vars and creating variations and combinations of code. Each normed CodeTree should be (may be hard to find all of them) grouped with all non-normed CodeTree that are renamings of its vars. Example: Non-normed CodeTree "f3 = f5000*.3 + .7*f3*f2;" should be a wrapper for normed CodeTree "f0 = f1*.3 + .7*f0*f2;". When "f3 = f5000*.3 + .7*f3*f2;" is run, it should copy f3, f5000 (flos[f+5000]), and f2 higher in the stack, call "f0 = f1*.3 + .7*f0*f2;" on that higher part of the stack (recursion), then copy flosStackFrameSize, flosStackFrameSize+1 (flos[f+flos()+1]), and flosStackFrameSize+2 to the original locations: f3, f5000 (flos[f+5000]), and f2. flos() and obs() are the flo and ob stack frame sizes and can be bigger than needed in a child to allow a parent Func to call the child and keep the same stack frame size in the whole tree. If the child's stack frame was smaller than the parent's, and the child recursed, then the child could overwrite the parent's data. Its ok, but confusing, for child stack frame sizes to be bigger than parents. Its best to keep them equal. Each Func has functions to compile it immediately (any amount 0 to 1) and a function to tell how much compiled it is, but they are not necessary. The most common way to become compiled is to wait for Audivolv to see what parts of evolving code occur most often and compile those, and later use that to optimize existing CodeTree. Each CodeTree can have a Func that it wraps that does the same action but faster because its more compiled. All non-normed CodeTree should be a certain type optimized for wrapping a normed Func and reordering its vars on the flos[] stack each time the CodeTree is run. Each Func (or just CodeTree?) has a function to replace the wrapped Func with a different Func thats a faster combination of existing compiled Func. Example: In a linear sequence of code "abcdefghi", the following parts could be compiled: "ab", "cdef", "g", and "hi", and the sequence of those 4 Func could be optimized by replacing it with "abc", "defg", and "hi", if those 3 Func were also compiled. Each leaf code string (written by a person, maybe in the hypercube dir) is always compiled, but combinations of them may be compiled or instead run in interpreted-mode or partial-compiled (a combination). Either way, it has the exact same behaviors (ignoring speed) as "abcdefghi". When I write "defg", I do not mean e has the same vars as e in "abcdefghi". I mean the normed form of "defg" can move vars on the flos[] stack every time it runs to have exactly the same behavior as if it used the same vars. Its a probably-non-normed Func "defg" with the same vars that wraps the compiled normed version of "defg". If "defg" is already normed, and thats the one you need, which is rare, use it directly and gain speed. A normed "defg" does not have the same behaviors as the sequence of a normed "d", normed "e", normed "f", and normed "g" because they probably share vars. In a normed "defg", "d" is always normed, but if "e" shares any vars with "d" then the order of vars in "e" is restricted by the order they occur and were renamed to be normed in "d". Example: If "d" is "f1 = f9*f0;" and "e" is "f5 *= f1*f9;", then normed "de" is "f0 = f1*f2; f3 *= f0*f1;". In the normed "de", "e" is "f3 *= f0*f1;" which is not normed because f3 occurs before f0 and f1. It increases speed to compile the normed version of "de" and use it in code that contains "de", but that compiled version can only optimize Funcs that rename all vars in "de" together. Example: "f0 = f1*f2; f3 *= f0*f1;" --> "f50 = f21*f32; f3 *= f50*f21;". Both of the resulting non-normed child Funcs should already be grouped with their normed versions. "f50 = f21*f32;" should be grouped with "f0 = f1*f2;". "f3 *= f50*f21;" should be grouped with "f0 *= f1*f2;". The wrapping of a normed Func in a non-normed Func should be done efficiently in cpu time and memory so wrappers can be created extremely fast the same way as programmers are not afraid to use the following Java code: String s = "a long string................................."; while(s.length() > 0 && s.charAt(0) != 'x') s = s.substring(1); //remove first char That creates a new String object thats 1 char smaller each iteration of the loop. If the String class was designed differently, that would take time proportional to s.length() squared, but instead it takes time proportional to s.length(). The Big-O is less. Similarly, anyFunc.rename(vars to any other var names) should return in constant-time and memory and when the new Func is run, it should be only a small constant slower (to copy the vars on the stack). Also, Func.equals(Object) and Func.hashCode() must have the same small Big-O of memory and cpu time. COMPILE-OPTIMIZATION-TREE: The following data-structure needs that small Big-O of renaming vars to get new Funcs: Its a tree where every path leads to a compiled Func that has the behaviors of that path. Every edge to a child node represents a leaf Func thats probably not normed but could be. Its exact var names are important. The first node in every path is always normed, but as explained above, a normed sequence of leafs only requires the first leaf be normed and the others to follow its var order. This tree contains only normed paths. To optimize any CodeTree that is a linear sequence of leaf Funcs as its childs, start from each child and view from that child to the end (or a maximum length 10 leafs for example) as a normed CodeTree, and find all paths in the COMPILE-OPTIMIZATION-TREE (will find a max of 10 paths, if subsequence length is 10) and remember which subsequences are compiled. If The CodeTree to optimize has 50 leaf childs (and nothing else), and we are using max 10 subsequence length, then the maximum number of compiled subsequences that could be found equals the number of substrings at most length 10 (and at least 1) in a string length 50. There could be many possible ways to optimize it, but only 1 can be used. Each possible optimization is some combination of those subsequences that covers all 50 leafs exactly once. Example from above{ In a linear sequence of code "abcdefghi", the following parts could be compiled: "ab", "cdef", "g", and "hi", and the sequence of those 4 Func could be optimized by replacing it with "abc", "defg", and "hi", } STATISTICAL-COMPILE-CHOOSING-TREE: Using the same type of data-structure as COMPILE-OPTIMIZATION-TREE uses, and keeping a count in each node of the tree, we can easily learn which subsequences are best to compile. There are many CodeTree in the evolving population, and the subsequences that occur in more of them should be compiled. Theres many possible ways to weight which is best, but this way is simple and will work: In a separate thread (maybe lower priority) than the one playing sound, put all subsequences of all CodeTree (in the evolving population) into this STATISTICAL-COMPILE-CHOOSING-TREE and count how often each occurs. Then sort all the paths in the tree by this function: howManyTimesSequenceOccurs * lengthOfSequence. Start at the high end of those sorted paths and compile maybe 20% of them. As they are compiled, add them to the COMPILE-OPTIMIZATION-TREE. Then call the optimization function on all CodeTree currently being used (maybe on the whole population but thats probably too slow), but certainly call it on the currently playing audio CodeTree. Calling the optimization function never changes any behaviors except it takes less cpu time, and does not make the sound skip or affect the Thread playing sound. Func (and therefore CodeTree), if stateless()==true, are defined as IMMUTABLE for a good reason. Of all possible combinations of ["ab","cdef","g","hi"] vs ["abc","defg","hi"] etc, choosing the one with the highest efficiency (usually the shortest) is a combinatorial optimization problem, and much research has been done on that, and I will use that research later, but for now, a greedy randomized algorithm will work almost as well. For now, I will simply try maybe 100 combinations, and choose the best of those. Each time greedy-randomized algorithm runs, it generates a new subsequence resulting in a provably equal behavior code. The only difference is the speed. There are no traps or dead-ends in the COMPILE-OPTIMIZATION-TREE. You can start from any available sequence ("cdef" or "defg" can not be used together) and always have a way to complete the whole sequence to be optimized. If you start with "defg", then "a", "b", and "c" will have to be added in some way preceding it, and there is always a way to do that. It may be to use "a", "b", and "c" as leafs. All leafs (in normed form) are always compiled when they are first loaded into Audivolv. The "greedy randomized algorithm" I will use for now (before using "combinatorial optimization" research) is to weight toward choosing longer subsequences (like "defg" and "cdef" instead of "hi" or "g") first, but with a lower chance allow any sequence to be tried. After that is tried maybe 100 times, choose the best one, create a Func (CodeTree?) with those childs, and use that Func (CodeTree?) as the wrapped object in all CodeTree that are variations with vars renamed. This is the design of an interpreter and EXPONENTIALLY-FAST partial-compiler of a subset of Java. The subset of Java is all possible combinations of leaf code strings with all possible renamings of their vars, and all leaf code must be stateless code (only uses flos[] and obs[] stack), and there is no limit on the number of leaf code strings you can add. Optionally, leaf code strings could be added when code is found that is not a combination of existing leafs and renamed vars. That could make it practical to have an eval(codeString) function for that subset of Java, and usually around 90% of the text in codeString would already be compiled and used instead of running those parts in interpreted-mode. Instead of eval(codeString), my main reason for creating it is to share compiled optimizations between an exponential number of variations of Java code that Audivolv evolves to play music. --Ben F Rayfield, creator of Audivolv and this new "Permutation Partial-Compiler And Interpreter". ================================================ ===Some of this is old design. Above is newer=== Func has flos() and obs() functions that each return int, but regardless of the var's index in List vars(".*"), "f50000" needs a stack frame at least size 50001 in the flo part of the stack. CodeTree extends Func and has the same problem. OLD... THIS IS BAD DESIGN AND WILL NOT BE USED{ Divide Funcs into 3 categories: (1) Normed. Vars are f0, f1, f2... o0, o1... (2) Permutated. Same vars as normed but reordered. (3) Externstack. Vars are any subset and permutation of f0, f1, f2... o0, o1... but flos() and/or obs() do not describe the range of flos[] and obs[] used. } Instead, flos() and obs() always contain the whole range that can be used, and the only exception is if obs[index in range] is an array that contains arrays that contain arrays... that lead to an array lower in the obs[] stack, and (today 5/2009) I am still thinking about that. OLD{ In interpreted-mode, a CodeTree containing Externstack (and possibly the other types) can simply run its childs in order without copying parameters before and after calling them. } Also 5/2009 I am still defining interpreted-mode (of running CodeTree recursively). No code uses the obs[] stack yet, but the next few versions will. Considering the flos[] stack only, running a child CodeTree can be done in 2 ways, depending on if an optimized version of the child is normed or not. Normed flo vars means it has vars f0, f1, f2... in that order. (1) If normedOptimizedChild, copy flos higher in flos[] stack, call the child on smaller optimized range (size child.flos()) starting at f+thisParent.flos(), and copy the modified flos back to their original location in thisParent's flos[] range. Example: Code "f20 = f15;" in the child is "f0 = f1;" which must be recursed starting at flos[f+thisParent.flos()] then copied back to flos[f+20] and flos[f+15]. (2) If not normedOptimizedChild, run the child in thisParent's current flos[] range because child already uses the correct indexs in flos[] and does not need translation. Example: Code "f20 = f15;" and the child does exactly that. (1) and (2) are both interpreted-mode. The fully compiled version of thisParent would take all Java code strings of itself and childs recursively, as a single String, and compile it to a single Java-class, and directly call it in thisParent's run function with the exact same parameters. A slightly less compiled (more interpreted or partial-compiled) version of that is to use the same Java-class in thisParent and all permuatations of renaming its vars by recursing only once, the same as (1) above. --Ben F Rayfield, creator of Audivolv and this new "Permutation Partial-Compiler And Interpreter".