
Movfuscator: How to Write C Programs with Only One Instruction



Dolan finishes the paper with a tongue-in-cheek call to action to write a mov-only compiler. Another researcher, with the confusingly similar-looking name Domas, picked up the gauntlet. Chris Domas is a cybersecurity expert at the nonprofit Battelle Memorial Institute, a job which he says gives him "time to explore the fringe areas of cybersecurity." There is little so fringe as his movfuscator (pronounced like "obfuscator"), the compiler he constructed to output only mov instructions. The first version of movfuscator compiled brainfuck. While brainfuck is the classic esolang used for Turing-completeness proofs, it's impractical and quite low-level, so it serves here primarily as an initial experiment in translating the behaviors covered by the assembly commands into movs: incrementing, branching, etc. Version 2 takes the leap to compiling C, a far more ambitious undertaking; now we have function calls and data structures, and can write our all-mov programs in one of the most widely used programming languages of all time.




movfuscator – Compile Into ONLY mov Instructions



The example program Domas shares in his overview is a primality test (determining which in a set of values are prime), written in C. Compiling it with gcc (the widely used C compiler) generates thirty lines of assembly code. Movfuscator produces 150+ lines of assembly -- all mov instructions, of course. The strategy Domas uses for branching (drawn from the original paper) can be compared to what we saw in Evan Buswell's piece. Buswell turns sections of code on and off by masking out jump commands. Domas can't do this with movs, so he actually executes every one of the "skipped" statements, but first switches the pointers to a scratch memory space so they won't affect the "real" program data. Domas illustrates the difference with the control flow graphs below. The movfuscator output is a straight line; every iteration of the program follows the same enormous loop.


Domas's project has received some notoriety, mostly due to his rapid-fire talk at Shakacon on movfuscator and its follow-up project reductio, discussed below. The video (embedded below) is worth watching for a more in-depth discussion of the technical challenges of the project. Domas went so far as to create floating-point handling -- normally not possible with movs -- by writing an enormous (500,000 mov instructions!) Arithmetic Logic Unit.


The project has received enough attention that two other programmers, Julian Kirsch and Clemens Jonischkeit, responded with a demovfuscator, which analyzes code compiled to all movs and attempts to recreate traditional control flow. As astonishing as movfuscator itself is, unwinding the seemingly undifferentiated flow of movs back into something traditional is an incredible feat.


Mark Barnes has ported movfuscator to the Harvard architecture. This means reductio can be ported as well -- which feels particularly apt, as the division between code and data is built into the Harvard architecture itself. Reductio shows how fluid these two concepts are: any clear-cut division can be undermined, and the types of overflow exploits this architecture resists can actually happen if the code is written strangely enough.


In fact, though, the compiler makes a few compromises, including using a jump to call external functions and a floating-point instruction. Both can be avoided by recompiling libraries and adding the mov-only floating-point emulator, which is entertaining but probably not very performant.


Wraparound of IP can work well in 16-bit mode, where IP is 16-bit but CS:IP form a 20-bit linear address (real mode), or in 16-bit protected mode, where CS is a 64k window somewhere in linear address space. That is, you can have a 64k block of instructions in only part of your address space, with other space left for data. The DS segment can use a different base. (32-bit addressing modes are possible in 16-bit mode, so you still have the full power of any register and scaled-index addressing mode.) Note that the mnemonic for reading and writing segment registers is also mov.


Recently, in PoC||GTFO 0x12, Chris Domas demonstrated a minimal Turing-complete virtual machine that implements only a mov instruction, where the operands for the mov are taken from a data list of memory addresses and offsets. When executed, every program runs the same instructions but provides different functionality. With this paper he released a tool called 'reductio', a Python script that leverages his movfuscator compiler and reduction techniques to compile C programs into an operand data list that is acted on by the VM. The proof of concept he demonstrated was on x86, but he states that it should be possible to adapt it to other architectures.


Adapting this to AVR assembler is not complicated but requires more instructions, as AVR is a load/store RISC architecture. Because we are using an 8-bit AVR with 16-bit addressable memory, we utilise special 16-bit registers that combine two 8-bit registers to address memory. These register pairs are known as the X, Y and Z registers. As most of our memory-mapped IO registers are 8-bit, our VM only loads and stores 8 bits per instruction, in comparison to the x86 implementation that moves 32 bits.


As the movfuscator compiler and 'reductio' target x86 and 32-bit mov instructions, we have to adapt some of the methodology. Movfuscator is also notorious for producing very long data segments, which is fine on a PC with plenty of RAM and fast processor speeds, but on an embedded device we have limited resources; for example, the AVR on which we built our PoC is limited to just 8 KB of RAM.


To reduce the amount of data used, we include only the lookup tables and jump tables required for the functionality we want, and we write the operands manually using Atmel's AVR assembler directives. We also implement a different branching technique, one that would not be possible for movfuscator to utilise.


However, we are limited to 8-bit load/store instructions with 16-bit addressable memory. This means we can only change one byte of the 16-bit virtual instruction pointer at a time, limiting direct branching to destinations within the same 0x100-byte-aligned segment of memory. If a branch crosses a 0x100-aligned boundary, we have to include a branch-handling routine at the beginning of each segment. When we want to branch, we store the destination and instead branch to the branch handler routine in the same segment. This routine first stores the high byte of the destination into the virtual instruction pointer, which moves execution to the next instruction of the branch routine in the correct segment. The next instruction then moves the low byte of the destination into the virtual instruction pointer, finishing the branch.


It would be nice to be able to compile C to movfuscator VM operands. One approach would be to use movfuscator and adapt the 'reductio' script to output 8-bit moves rather than 32-bit, but then we would be stuck with a lot of overhead from movfuscator. Alternatively, we could write a new movfuscator compiler by going through the AVR assembler instructions and building macros for each instruction.


The M/o/Vfuscator (short 'o', sounds like "mobfuscator") compiles programs into "mov" instructions, and only "mov" instructions. Arithmetic, comparisons, jumps, function calls, and everything else a program needs are all performed through mov operations; there is no self-modifying code, no transport-triggered calculation, and no other form of non-mov cheating.


So, why aren't these ultra-RISC processors used instead? It's a real pain to write a compiler for them, and you give up a lot of other things that the processor can do. It's really nice to have a bitwise AND and an add, rather than trying to do everything with incrementing registers and looping. That's the basis of a favorite programming language titled Brainfuck, which has 8 instructions.


Starting off, I used demovfuscator on it (you can find it here). It can do a couple of things. The first is that it can create a graph roughly showing the code flow of the binary. The second is that it can generate an ELF that replaces some of the `mov` instructions with the other instructions typically used, which makes it a bit easier to reverse.


A paper by Stephen Dolan proving that the x86 mov instruction is Turing-complete was released in 2013 and was followed by the movfuscator project, which built an entire compiler on the idea. See the paper: sd601/papers/mov.pdf. In the paper, a Universal Turing Machine (UTM) implemented from mov instructions, with only a single jump instruction to loop the program, is proposed as the proof of Turing-completeness.


Magic: The Gathering is a popular and famously complicated trading card game about magical combat. In this paper we show that optimal play in real-world Magic is at least as hard as the Halting Problem, solving a problem that has been open for a decade. To do this, we present a methodology for embedding an arbitrary Turing machine into a game of Magic such that the first player is guaranteed to win the game if and only if the Turing machine halts. Our result applies to how real Magic is played, can be achieved using standard-size tournament-legal decks, and does not rely on stochasticity or hidden information. Our result is also highly unusual in that all moves of both players are forced in the construction. This shows that even recognising who will win a game in which neither player has a non-trivial decision to make for the rest of the game is undecidable. We conclude with a discussion of the implications for a unified computational theory of games and remarks about the playability of such a board in a tournament setting.


The C preprocessor is only Turing-complete if executed in a loop which feeds the output to the input ad infinitum. An example won the IOCCC 2001 contest; look into the herrmann1 entry. Nonetheless, it is included in this list for coolness.


The "move" part of moving, then, is achieved by the fact that the compiler knows and remembers that v1 cannot be used anymore after the move (unless it's a mutable variable, in which case you can move a new vector into it, which will overwrite the invalidated triple that points to the same vec logically owned by v2). You can think of the data in v1 after the move as garbage data, as if the variable were uninitialized again.

