Build interpreter for non-existent language
Ruby, 67 lines of regex substitutions
I decided to write the interpreter in regex, while sticking to efficient algorithms.
I could have gone for plain bytes, but using symbols makes the code more readable in my opinion. Of course, if we could pack two instructions in one byte...
Concatenation of negative values results in ten's complement behavior, reflecting the internal representation.
Division is integer division and the remainder is never negative.
subs = [
# stack expansion
[/^ ?([$iv*\/+\-^%dtsz.])/, ' 0 \1' ],
[/^ (\d+ [$*\/+\-^%tsz])/, ' 0 \1' ],
[/^ ((\d+ ){2,3}z)/, ' 0 \1' ],
[/ (0|9)\1+/, ' \1' ],
# concatenation
[/ (\d+) (?:0+|9+)(\d+) \$/, ' \1\2 ' ],
[/ (\d+) (0|9) \$/, ' \1\2 ' ],
# swaps
[/ ((?:\d+ )*)(\d+) </, ' \2 \1' ],
[/ (\d+)((?: \d+)*) >/, '\2 \1 ' ],
[/ (\d+) (\d+) s/, ' \2 \1 '],
[/ (\d+ \d+) (\d+ \d+) z/, ' \2 \1 '],
# dups
[/ (\d+) d/, ' \1 \1 '],
[/ (\d+ \d+) t/, ' \1 \1 '],
# pop
[/ (\d+) \./, ' ' ],
# increment / decrement
[/ (\d+) i/, ' \1I '], [/ (\d+) v/, ' \1V '],
*(%w[0I 1I 2I 3I 4I 5I 6I 7I 8I 9I].zip [*?1..?9, 'I0']),
*(%w[0V 1V 2V 3V 4V 5V 6V 7V 8V 9V].zip ['V9', *?0..?8]),
[' 1', ' 01'], [' 8', ' 98'], [' I', ' '], [' V', ' '],
# addition, subtraction
[/ (\d+) (\d+) \+/, ' \1P \2P ' ], #init addition
[/ (\d+) (\d+) \-/, ' \1S \2S ' ], #init subtraction
[/ ([PS](\d)\w*) (\d+[PS]\w*) /, ' \2\1 \3 ' ], #sign extend left
[/ (\d+[PS]\w*) ([PS](\d)\w*) /, ' \1 \3\2 ' ], #sign extend right
[/ (\d*)(\d)P(\S*) (\d*)0P(0*) /, ' \1P\2\3 \4P0\5 '], #advance addition
[/ (\d*)(\d)S(\S*) (\d*)0S(0*) /, ' \1S\2\3 \4S0\5 '], #advance subtraction
[/ (\d+)P(\S*) (\d*[1-5])P(0*) /, ' \1IP\2 \3VP\4 ' ], #transfer left
[/ (\d+)P(\S*) (\d*[6-9])P(0*) /, ' \1VP\2 \3IP\4 ' ], #transfer right
[/ (\d+)S(\S*) (\d*[1-5])S(0*) /, ' \1VS\2 \3VS\4 ' ], #decrement both
[/ (\d+)S(\S*) (\d*[6-9])S(0*) /, ' \1IS\2 \3IS\4 ' ], #increment both
[/ [PS](\S+) [PS]0+ /, ' \1 ' ], #finish
# digitwise negation
*(%w[9N 8N 7N 6N 5N 4N 3N 2N 1N 0N].zip [*'N0'..'N9']),
#multiplication and division by 2
*([*'H0'..'H9'].zip %w[0H 0F 1H 1F 2H 2F 3H 3F 4H 4F]),
*([*'F0'..'F9'].zip %w[5H 5F 6H 6F 7H 7F 8H 8F 9H 9F]),
*(%w[0T 1T 2T 3T 4T 5T 6T 7T 8T 9T].zip %w[T0 T2 T4 T6 T8 TI0 TI2 TI4 TI6 TI8]),
['H ', ' '], [' T', ' '],
# sign correction for */%
[/ (\d+) (9\d*) ([*\/%])/, ' \1NI \2NI \3'], [' N', ' '],
# multiplication
[/ (0+ \d+|\d+ 0+) \*/, ' 0 ' ], #multiplication by zero
[/ (\d+) (0\d*[02468]) \*/, ' \1T H\2 *' ], #multiplication by an even number
[/ (\d+) (0\d*[13579]) \*/, ' \1 \1 \2V *+'], #multiplication by an odd number
# division / modulo
[?/, 'r.'], [?%, 'rs.'],
[/ (0|9)(\d*) (0\d+) r/, ' \3 0 \1D\2 ' ], #init division
[/ (\d+) (\d+) (0\d*)D(\d*) /, ' \1 \2I \3SD\4 \1S ' ], #subtract divisor
[/ (\d+) (\d+) (9\d*)D(\d)(\d*) /, ' \1 \2V0 \3P\4D\5 \1P '], #add divisor and advance
[/ (\d+) (\d+) (9\d*)D /, ' \2V \3P \1P ' ], #add divisor and finish
#exponentiation
[/ \d+ 0+ \^/, ' 01 ' ], # case: zeroth power
[/ 9\d+ 9+ \^/, ' 9 ' ], # case: reciprocal of negative
[/ \d+ 9\d+ \^/, ' 0 ' ], # case: high negative power
[/ 0\d+ 9\d+ \^/, ' 0 ' ], # case: reciprocal of positive
[/ (\d+) 0+1 \^/, ' \1 ' ], # case: power of one
[/ (\d+) (\d*[02468]) \^/, ' \1 \1 *H\2 ^' ], # case: even exponent
[/ (\d+) (\d*[13579]) \^/, ' \1 \2V ^\1 *' ], # case: odd exponent
]
x = gets.tr '^$iv*/+\-^%><dtsz.', ''
until x =~ /^ (\d+ )*$/
subs.each do |sub|
x.sub!(*sub) # && (puts x; sleep 0.1)
end
end
As for the bonus round, the shortest solution I've come up with (13 characters) is a clean solution:
iistisii$<$<$
x86 assembly (on Win32)
“SPEED!” seems to be hugely important here, and we all know nothing beats assembly language in that regard. So, let’s do that in assembly!
This is an implementation of the language in x86 assembly language (in NASM syntax), with the numbers stored and interpreted as unsigned 32-bit integers, using the native x86 stack directly. Stack underflow, and overflow during any arithmetic operation (or division by zero) is a runtime error, terminating the program with an error message.
global _start
extern _GetCommandLineA@0
extern _GetStdHandle@4
extern _CreateFileA@28
extern _GetFileSize@8
extern _LocalAlloc@8
extern _ReadFile@20
extern _CloseHandle@4
extern _WriteFile@20
section .text
; ---------------------------------------------------------------------------------------
; Initialization
; ---------------------------------------------------------------------------------------
_start:
; Retrieve command line
CALL _GetCommandLineA@0
; Skip argv[0]
MOV ESI, EAX
XOR EAX, EAX
skipuntilspace:
MOV AL, [ESI]
INC ESI
TEST EAX, EAX
JE missingparam
CMP EAX, ' '
JNE skipuntilspace
INC ESI
; Open the file
PUSH 0
PUSH 80h
PUSH 3
PUSH 0
PUSH 1
PUSH 80000000h
PUSH ESI
CALL _CreateFileA@28
CMP EAX, -1
JE cannotopenfile
; Get its size
PUSH EAX
PUSH 0
PUSH EAX
CALL _GetFileSize@8
PUSH EAX
; Allocate memory buffer
PUSH EAX
PUSH 0
CALL _LocalAlloc@8
TEST EAX, EAX
MOV ESI, EAX
JZ outofmemory
POP ECX
POP EAX
PUSH EAX
; Store end-of-program pointer
MOV [programend], ESI
ADD [programend], ECX
; Read the file contents
PUSH 0
PUSH buff
PUSH ECX
PUSH ESI
PUSH EAX
CALL _ReadFile@20
TEST EAX, EAX
JZ cannotopenfile
; Close the file
CALL _CloseHandle@4
; ---------------------------------------------------------------------------------------
; Main loop of the interpreter
; ---------------------------------------------------------------------------------------
; Store the end of stack into EBP
MOV EBP, ESP
; Push an initial 0 onto the stack
XOR EAX, EAX
PUSH EAX
mainloop:
; Load the next opcode, if not end of program
XOR EAX, EAX
CMP ESI, [programend]
MOV AL, [ESI]
JAE endloop
LEA ESI, [ESI+1]
; Check if the opcode is valid
CMP EAX, (maxop - opcodetable) / 8
JA fault_invalidopcode
; Check for required stack space
MOV ECX, [opcodetable + 8 * EAX + 4]
LEA EDI, [ESP + ECX]
CMP EDI, EBP
JA fault_stackunderflow
; Jump to the respective opcode handler
MOV EAX, [opcodetable + 8 * EAX]
JMP EAX
; ---------------------------------------------------------------------------------------
; Implementation of the specific operations
; ---------------------------------------------------------------------------------------
; ************** CAT 0000 (0): Concatenate (Combine top two numbers in a stack as if they were a string. ex: 12,5 -> 125)
op_concatenate:
POP EBX
POP EAX
MOV ECX, EAX
MOV EDI, 10
concat_loop:
XOR EDX, EDX
SHL EBX, 1
DIV EDI
LEA EBX, [4 * EBX + EBX]
TEST EAX, EAX
JNZ concat_loop
ADD EBX, ECX
PUSH EBX
JMP mainloop
; ************** INC 0001 (1): Increment (Add 1 to the number on the top of the stack)
op_increment:
POP EAX
ADD EAX, 1
PUSH EAX
JNC mainloop
JMP fault_intoverflow
; ************** DEC 0010 (2): Decrement (Subtract one from the number at the top of the stack)
op_decrement:
POP EAX
SUB EAX, 1
PUSH EAX
JNC mainloop
JMP fault_intoverflow
; ************** MUL 0011 (3): Multiply (Multiply the top two numbers in the stack)
op_multiply:
POP EAX
POP EDX
MUL EDX
TEST EDX, EDX
PUSH EAX
JZ mainloop
JMP fault_intoverflow
; ************** DIV 0100 (4): Divide (Divide the 2nd-to-top number by the top number on the stack)
op_divide:
POP ECX
TEST ECX, ECX
POP EAX
JZ fault_dividebyzero
XOR EDX, EDX
DIV ECX
PUSH EAX
JMP mainloop
; ************** MOD 0101 (5): Add (Add the top two numbers on the stack)
op_add:
POP EAX
ADD [ESP], EAX
JNC mainloop
JMP fault_intoverflow
; ************** SUB 0110 (6): Subtract (Subtract the top number on the stack from the one below it)
op_subtract:
POP EAX
SUB [ESP], EAX
JNC mainloop
JMP fault_intoverflow
; ************** EXP 0111 (7): Exponent (Calculate the second-to-top number to the power of the top number)
op_exponent:
POP ECX
POP EBX
MOV EAX, 1
exploop:
TEST ECX, 1
JZ expnomult
MUL EBX
TEST EDX, EDX
JNZ fault_intoverflow
expnomult:
SHR ECX, 1
JZ expdone
XCHG EAX, EBX
MUL EAX
TEST EDX, EDX
XCHG EAX, EBX
JZ exploop
JMP fault_intoverflow
expdone:
PUSH EAX
JMP mainloop
; ************** MOD 1000 (8): Modulus: (Find the second-to-top number modulo the top one)
op_modulus:
POP ECX
TEST ECX, ECX
POP EAX
JZ fault_dividebyzero
XOR EDX, EDX
IDIV ECX
PUSH EDX
JMP mainloop
; ************** ROR 1001 (9): Rotate Right (Shift the stack down one. The number on the bottom is now on the top)
op_rotright:
MOV EAX, [EBP - 4]
LEA ECX, [EBP - 4]
SUB ECX, ESP
MOV EDX, ESI
SHR ECX, 2
LEA EDI, [EBP - 4]
LEA ESI, [EBP - 8]
STD
REP MOVSD
MOV [ESP], EAX
CLD
MOV ESI, EDX
JMP mainloop
; ************** ROL 1010 (A): Rotate Left (Shift the stack up one. The number on the top is now on the bottom)
op_rotleft:
MOV EAX, [ESP]
LEA ECX, [EBP - 4]
SUB ECX, ESP
MOV EDX, ESI
SHR ECX, 2
LEA ESI, [ESP + 4]
MOV EDI, ESP
REP MOVSD
MOV [EBP - 4], EAX
MOV ESI, EDX
JMP mainloop
; ************** DUP 1011 (B): Duplicate (Copy the top number so that it appears twice. ex: 4,1 becomes 4,1,1)
op_duplicate:
PUSH DWORD [ESP]
JMP mainloop
; ************** DU2 1100 (C): Double Duplicate (Copy the top two numbers on the stack. ex: 4,1,2 becomes 4,1,2,1,2)
op_dblduplicate:
PUSH DWORD [ESP+4]
PUSH DWORD [ESP+4]
JMP mainloop
; ************** SWP 1101 (D): Swap (Swap the top two numbers on the stack. ex: 4,1,2 becomes 4,2,1)
op_swap:
POP EAX
POP EDX
PUSH EAX
PUSH EDX
JMP mainloop
; ************** SW2 1110 (E): Double Swap (Swap the top two numbers with two below them.ex: 1,2,3,4,5 becomes 1,4,5,2,3)
op_dblswap:
POP EAX
POP EBX
POP ECX
POP EDX
PUSH EBX
PUSH EAX
PUSH EDX
PUSH ECX
JMP mainloop
; ************** POP 1111 (F): Delete/Pop (Remove the number at the top of the stack)
op_pop:
POP EAX
JMP mainloop
; ---------------------------------------------------------------------------------------
; End of the program: print out the resulting stack and exit
; ---------------------------------------------------------------------------------------
endloop:
MOV ESI, ESP
printloop:
CMP ESI, EBP
JNB exit
MOV EAX, [ESI]
MOV EBX, ESI
PUSH EBX
CALL printnum
POP EBX
LEA ESI, [EBX + 4]
JMP printloop
exit:
MOV ESP, EBP
;POP EAX
XOR EAX, EAX
RET
; ---------------------------------------------------------------------------------------
; Faults
; ---------------------------------------------------------------------------------------
fault_invalidopcode:
MOV EAX, err_invalidopcode
JMP fault
fault_stackunderflow:
MOV EAX, err_stackunderflow
JMP fault
fault_dividebyzero:
MOV EAX, err_dividebyzero
JMP fault
fault_intoverflow:
MOV EAX, err_intoverflow
JMP fault
fault:
CALL print
MOV EAX, crlf
CALL print
MOV ESP, EBP
MOV EAX, 1
RET
missingparam:
MOV EAX, err_missingparameter
JMP fault
cannotopenfile:
MOV EAX, err_cannotopenfile
JMP fault
outofmemory:
MOV EAX, err_outofmemory
JMP fault
; ---------------------------------------------------------------------------------------
; Helper functions
; ---------------------------------------------------------------------------------------
printnum:
MOV EBX, 10
CALL printnumrec
MOV EAX, crlf
JMP print
printnumrec:
PUSH EAX
PUSH EDX
XOR EDX, EDX
DIV EBX
TEST EAX, EAX
JZ printnumend
CALL printnumrec
printnumend:
MOV EAX, EDX
CALL printdigit
POP EDX
POP EAX
RET
printdigit:
ADD EAX, '0'
MOV [printbuff], EAX
MOV EAX, printbuff
JMP print
print:
MOV ESI, EAX
PUSH 0
PUSH buff
CALL strlen
PUSH EAX
PUSH ESI
PUSH -11
CALL _GetStdHandle@4
PUSH EAX
CALL _WriteFile@20
RET
strlen:
XOR ECX, ECX
strlen_loop:
CMP BYTE [ESI+ECX], 0
JE strlen_end
LEA ECX, [ECX+1]
JMP strlen_loop
strlen_end:
MOV EAX, ECX
RET
; ---------------------------------------------------------------------------------------
; Data
; ---------------------------------------------------------------------------------------
section .data
; Table of opcode handlers and required stack space (in bytes, i.e. 4*operands)
opcodetable:
DD op_concatenate, 8
DD op_increment, 4
DD op_decrement, 4
DD op_multiply, 8
DD op_divide, 8
DD op_add, 8
DD op_subtract, 8
DD op_exponent, 8
DD op_modulus, 8
DD op_rotright, 0
DD op_rotleft, 0
DD op_duplicate, 4
DD op_dblduplicate, 8
DD op_swap, 8
DD op_dblswap, 16
DD op_pop, 4
maxop:
crlf DB 13, 10, 0
err_invalidopcode DB "Invalid opcode", 0
err_stackunderflow DB "Stack underflow", 0
err_dividebyzero DB "Division by zero", 0
err_intoverflow DB "Integer overflow", 0
err_missingparameter: DB "Missing parameter: Use nexlang file.bin", 0
err_cannotopenfile: DB "Unable to open input file", 0
err_outofmemory: DB "Not enough memory", 0
section .bss
programend RESD 1
printbuff RESD 1
buff RESD 1
To compile this, use something like
nasm.exe -fwin32 nexlang.asm
ld -o nexlang.exe -e _start nexlang.obj -s -lkernel32
The program receives the name of the binary file containing the program on the command line (e.g. nexlang.exe testprg.bin
). When finished, it prints the final contents of the stack to standard output in a human-readable format.
To help with testing, save the following into nex.def
:
%define CAT DB 00h
%define INC DB 01h
%define DEC DB 02h
%define MUL DB 03h
%define DIV DB 04h
%define ADD DB 05h
%define SUB DB 06h
%define EXP DB 07h
%define MOD DB 08h
%define ROR DB 09h
%define ROL DB 0Ah
%define DUP DB 0Bh
%define DU2 DB 0Ch
%define SWP DB 0Dh
%define SW2 DB 0Eh
%define POP DB 0Fh
And then write your NEX (“non-existing”, as named in the question title) programs using the above-defined mnemonics, and compile with something like
nasm.exe -p nex.def -o prg.bin prg.nex
E.g. for the original test case, use the following prg.nex
:
INC ; 1
INC ; 2
INC ; 3
INC ; 4
DUP ; 4 4
DU2 ; 4 4 4 4
ADD ; 8 4 4
DU2 ; 8 4 8 4 4
ADD ; 12 8 4 4
DUP ; 12 12 8 4 4
ROR ; 4 12 12 8 4
ADD ; 16 12 8 4
And finally, for the “2014” challenge, use the following 14-byte NEX program:
DUP ; 0 0
DUP ; 0 0 0
INC ; 1 0 0
INC ; 2 0 0
SWP ; 0 2 0
CAT ; 20 0
SWP ; 0 20
INC ; 1 20
DUP ; 1 1 20
INC ; 2 1 20
INC ; 3 1 20
INC ; 4 1 20
CAT ; 14 20
CAT ; 2014
GolfScript, 64 chars
OK, so I decided to try and golf this. And what better language for golfing than GolfScript?
Conveniently, GolfScript itself is already a stack-based language with single-byte commands, and, as it happens, 11 out of your 16 commands map directly to built-in GolfScript commands. So all I really need to do to interpret your language is to implement the remaining five commands in GolfScript and build a translation table:
0\{'`+~
)
(
*
/
+
-
?
%
](+~
])\~
.
1$1$
\
[@]\+~\
;'n%=~}/]-1%`
The code looks kind of spread out, because I'm using newlines as delimiters for the translation table. The initial 0\
pushes a zero onto the stack and moves it below the input program. The { }/
loop, comprising most of the code, takes the input program off the stack and iterates the loop body over each of its characters, and the final ]-1%`
collects the stack into an array, reverses it (because your sample output starts from the top of the stack) and stringifies it.
The loop body starts with a 16-line single-quoted string. n%
splits this string at line breaks, =
looks up the substring corresponding to the input character, and ~
evaluates the substring as GolfScript code.
Finally, here are the GolfScript implementations of the 16 commands:
- 0 =
`+~
: concatenate two numbers as strings - 1 =
)
: increment - 2 =
(
: decrement - 3 =
*
: multiply - 4 =
/
: divide - 5 =
+
: add - 6 =
-
: subtract - 7 =
?
: raise to power - 8 =
%
: modulus - 9 =
](+~
: rotate stack right - A =
])\~
: rotate stack left - B =
.
: duplicate - C =
1$1$
: double duplicate - D =
\
: swap - E =
[@]\+~\
: double swap - F =
;
: pop
I'm kind of unhappy with the double swap — it's ugly and much longer than any of the other commands. It feels like there ought to be a better way, but if so, I haven't found it yet. Still, at least it works.
For example, running the program above on the input (given as a GolfScript / Ruby / Perl / Python / etc. double-quoted string):
"\x01\x01\x0B\x0C\x05\x0C\x05\x0B\x09\x05"
yields the output:
[8 6 4 2]
Edit: I managed to save two more chars, for a total of 62 chars, using a more compact encoding of the translation table. However, it kind of sacrifices readability:
0\{(')(*/+-?%'1/'](+~
])\~
.
1$1$
\
[@]\+~\
;
`+~'n/+=~}/]-1%`
Notable features of this version include the (
at the beginning of the loop, which shifts the command indices from 0..15 to -1..14 so that I can put the long sequence of single-character commands from 1 to 8 at the beginning of the table. This allows me to store them in a separate string and eliminate the eight newlines delimiting them; alas, the extra complexity costs me six characters elsewhere.