Enticed by the strong recommendation from some of you, I bought the iPad version of TIS-100p last Sunday and… I can’t put it down!
It is very challenging, but so much fun. I find it very satisfying to solve each puzzle and then go back and try to optimize it for cycle, node, or instruction counts.
The best part is that – to me – it’s very close to home: it is similar to the micro-optimization work I normally have to do in my own games framework and programs (in Assembly, of course) – and just like those, it’s all for fun so it never feels like work!
Moreover, it gives me that same WarGames feeling of finding something you are not supposed to have and trying to figure it out by sheer cleverness and experimentation.
No kidding, it took me about 30 minutes of staring at the stupid interface, pushing buttons around and reading and re-reading the manual over and over just to figure out how to start the darn thing. I was even tempted to go online to search for hints on what to do (but I didn’t). Eventually I stumbled upon the solution to the first puzzle and from then on… boy, I couldn’t let go!
I would be interested in seeing other people’s solutions to the puzzles and sharing mine to whomever is interested. I can’t really tell how good my solutions rank, since I have nothing to compare against. I’m not that clever in any case, so I could stand to learn techniques from others.
To start the ball rolling, below is my solution to the Signal Edge Detector problem, which I just completed this morning. On my first iteration, with pure brute force, I managed a “respectable” (pshaw!) 343 cycles. By the third try, I got it down to 216 and reduced the node count. Whether that’s pretty good or lame, I can’t tell. It’s pretty neat, though!
I’m playing the TIS-100 version on Steam and I can get rough statistics about other players’ achievements. I always optimize for cycles, regardless of the number of nodes or instructions. This unfortunately means that I loose “elegance”, which is something that I would like to preserve, but in TIS-100 you often have to sacrifice it if you want a faster code.
These are the statistics about my Signal Edge Detector solution, the little arrow shows roughly where my result stays among the results of other players.
At first I tried something similar to your solution, using the BAK register, but I wasn’t very happy with the results because I wanted to get rid of SWP/SAV instructions and more importantly I wasn’t using parallelization, which is an extremely important technique in TIS-100 to get faster code. So I used one core to divide the work between two other cores.
The sequence of inputs ABCDE is divided in a way that a first core computes the difference between AB, the second code the difference between BC, the first core the difference between CD and so on… in this way they don’t need to use the BAK register because they directly get the numbers to compare.
The main advantage of this design is that it can be parallelized even further, if you want. I could divide the work between three cores, for example. But I stopped at two because I got an acceptable result and because I “understood the logic” of how to optimize it even more. In these cases to me the puzzle is “solved” and I prefer to jump to the next task. But I’ll probably run a new cycle of optimizations after I will complete all the tasks.
It’s a very good result for this kind of approach. You should be able to easily go under 200 cycles parallelizing it even further.
My design is very different and it uses a lot more instructions per node: it’s a decision that will pay off only when I will copy the code to several other nodes.
There are also some solutions that make sense in my case but probably not in your current design. For example, I saved quite an amount of cycles getting rid of the “neg”. I’m not sure that it would be beneficial in your case, though.
To explain why I find parallelization a boring activity, I just copied my main code to more cores and I easily got 133 cycles, even if I didn’t use all the available cores.
I find it interesting to design an algorithm that has to take advantage of parallelization, but after I have found what it seems to me a good theoretical solution, the goal of reducing the cycles becomes trivial, because all you have to do is to copy-and-paste the same code to several cores and manage the flow of the input data.