A recent edition of [Babbage’s] The Chip Letter discusses the obscurity of assembly language. The author points out that assembly language is more often read than written, yet nearly every assembly language is hampered by obscurity left over from the days when punched cards had 80 columns and a six-letter symbol was all you could manage in the computer’s limited memory. For example, without looking it up, what does the ARM instruction FJCVTZS do? The instruction’s full name is Floating-point Javascript Convert to Signed Fixed-point Rounding Towards Zero. Not super helpful.
The author suggests that nothing is stopping you from writing a literate assembler that is made to be easier to read. Most C compilers will accept some sort of asm statement, and you could probably manage that with compile-time string construction and macros. However, there is a better possibility.
Reuse, Recycle
I already have a universal cross assembler that uses some simple tricks to convert standard-looking assembly language formats into C code that is then compiled. Executing the resulting program writes the machine language to a file in the format you choose. It is very easy to set up, and in the middle, there’s a nice C program that emits machine code.
At the heart of the system is a C program that lives in soloasm.c. It handles command line options and output file generation. It calls an external function, genasm, with a single integer argument. When that argument is 1, the assembler is in its first pass, and you only need to fill in label values with real numbers. When it is 2, you actually fill in the array that holds the code.
That array is defined in the __solo_info structure (soloasm.h). It includes the size of the memory, a pointer to the code, the processor’s word size, the beginning and end addresses, and an error flag. Normally, the system converts your assembly language input into a bunch of function calls it writes inside the genasm function. But in this case, I want to reuse soloasm.c to create a literate assembly language.
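To make the two-pass flow concrete, here is a minimal sketch of how a driver might call genasm twice. Everything except the name genasm is an assumption for illustration: SoloInfo is a made-up stand-in for __solo_info, and emit, assemble, and the field names are hypothetical, not the actual contents of soloasm.c or soloasm.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for __solo_info; the field names are assumptions
// based only on the description in the text.
struct SoloInfo {
    std::vector<uint8_t> code;  // memory image that receives the output
    unsigned word_bits = 8;     // processor word size
    std::size_t begin = 0;      // first address used
    std::size_t end = 0;        // one past the last address used
    bool error = false;         // set when something goes wrong
};

static SoloInfo info;
static std::size_t location = 0;  // current assembly address (assumed)
static int current_pass = 0;

// Pass 1 only advances the address; pass 2 also stores the byte.
void emit(uint8_t b) {
    if (location >= info.code.size()) { info.error = true; return; }
    if (current_pass == 2) info.code[location] = b;
    ++location;
    if (location > info.end) info.end = location;
}

// Stand-in for the generated genasm(): the same calls run on both passes.
void genasm(int pass) {
    assert(pass == 1 || pass == 2);
    current_pass = pass;
    location = 0;
    emit(0xC4);  // 1802 NOP
    emit(0xC4);
}

// Hypothetical driver: run pass 1, then pass 2 if nothing failed.
bool assemble(std::size_t memory_size) {
    info = SoloInfo{};
    info.code.assign(memory_size, 0);
    genasm(1);
    if (!info.error) genasm(2);
    return !info.error;
}
```

Calling assemble(16) would run both passes over a 16-byte image. With a one-byte image, the second NOP does not fit, so the error flag is set on pass 1 and pass 2 never runs.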
Modernize
I wrote all this a long time ago, but I wanted the creation of literate assembly to be easier, so I decided to do a low-effort conversion to C++. This allows you to use nice data structures for the symbol table, for example. However, I didn’t use all the C++ features I could have, simply in the interest of time.
The base class is reasonably agnostic about the processor, and, as an example, I’ve provided a literate RCA 1802 assembler. Just a proof of concept, so I could probably name the instructions a bit more consistently, and there is plenty of room for other improvements, but it gets my point across.
Here’s an excerpt of a blinking light program written for the 1802 using the standard assembler syntax:
        ORG 0
Main:
        LDI HIGH(R3Go)
        PHI R3
        LDI LOW(R3Go)
        PLO R3
        SEP R3
R3Go:
        LDI HIGH(Delay)
        PHI R9
        LDI LOW(Delay)
        PLO R9
        LDI HIGH(Stack)
        PHI R7
        LDI LOW(Stack)
        PLO R7
        SEX R7
        LDI 0
        STR R7
Loop:
        OUT 4
        . . .
        NOP
        BR DELAY1
        ORG $F0
Stack:  DB 0
        END Main
Now here is the exact same program written for the literate assembler:
// Simple 1802 Literate Program
#include "lit1802.h"

#define ON 1
#define OFF 0
#define DELAYPC 9      // delay subroutine
#define DELAYR 8       // delay count register
#define MAINPC 3       // Main routine PC
#define RX 7           // RX value
#define DELAYVAL 0xFF  // time to delay (0-255)

void Program(void)
{
    Origin(0x0);
    // Blinky light program
    // Main:
    Define_Label("Main");
    // Force R3 as PC just in case
    Load_R_Label(MAINPC,"R3Go");
    Set_PC_To_Register(MAINPC);
    // Here we are P=3
    // R3Go:
    Define_Label("R3Go");
    // Set R9 to delay routine (default PC=0)
    Load_R_Label(DELAYPC,"Delay");
    // Set RX=7 at memory 00F0
    Load_R_Label(RX,"Stack");
    Set_X_To_Register(RX);
    Load_D_Imm(0);
    Store_D_To_Reg_Address(RX);
    // Loop:
    Define_Label("Loop");
    Output_Mem_RX_Incr(4);  // write count to LED
    . . .
    NOP(10);
    Branch(Label("Delay1"));  // note... could define BRANCH as _BRANCH and then #define Branch(l) _BRANCH(Label(l)) if you like...
    Location(0xF0);  // storage for RX
    // Stack:
    Define_Label("Stack");
    Byte();
    End_Program(Label("Main"));  // End of program
}
Well, admittedly, there are comments and symbols, but still. You can download both files if you want to compare. You can also find the entire project online.
Under the Hood
The idea is simple. Each function simply populates an array with the byte or bytes necessary. Admittedly, the 1802 is pretty simple. It would be harder to do this for a modern processor with many instructions and complex modes. But not impossible.
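As a rough illustration of "each function populates an array," here is a sketch of how a few of the instruction functions might look. The function names come from the program above and the opcodes are the standard 1802 encodings, but the bodies, the image buffer, and the emit helper are my assumptions, not the actual code in lit1802.h.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static std::vector<uint8_t> image;  // hypothetical output buffer

static void emit(uint8_t b) { image.push_back(b); }

// SEP Rn: opcode 0xD0 | n
void Set_PC_To_Register(uint8_t reg) {
    assert(reg < 16);
    emit(static_cast<uint8_t>(0xD0 | reg));
}

// SEX Rn: opcode 0xE0 | n
void Set_X_To_Register(uint8_t reg) {
    assert(reg < 16);
    emit(static_cast<uint8_t>(0xE0 | reg));
}

// STR Rn: opcode 0x50 | n
void Store_D_To_Reg_Address(uint8_t reg) {
    assert(reg < 16);
    emit(static_cast<uint8_t>(0x50 | reg));
}

// OUT p: opcode 0x60 | p, valid for ports 1 through 7
void Output_Mem_RX_Incr(uint8_t port) {
    assert(port >= 1 && port <= 7);
    emit(static_cast<uint8_t>(0x60 | port));
}

// NOP: opcode 0xC4, repeated count times (the optional argument)
void NOP(unsigned count = 1) {
    while (count--) emit(0xC4);
}
```

Each call appends one byte per instruction here because these particular 1802 instructions are all single-byte. A processor with multi-byte encodings and addressing modes would need more logic per function, but the shape stays the same.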
You can do lots of things to make life easier, both while programming and while setting up instructions. For example, if you wanted 100 NOP instructions, you could write:
for (int i = 0 ; i < 100 ; i++) NOP();
On the other hand, NOP has an optional argument that will do it for you. You can freely use the C++ compiler and the macro preprocessor to make your life easier. For example, a common task on the 1802 is putting a constant value like a label into a register. The lit1802.h file has a helper function to make this easy:
void Load_R_Label(uint8_t reg,const std::string s)
{
    Load_D_Imm(HIGH(s));
    Put_High_Register(reg);
    Load_D_Imm(LOW(s));
    Put_Low_Register(reg);
}
Obviously, you can change the names to suit or have as many aliases as you want. Don’t forget that function call overhead, like calling Load_R_Label, is incurred at build time, when the assembler program runs, not in the 1802 code. You wind up with the same machine code either way.
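The "same machine code either way" claim is easy to check with a small sketch. The function names match the ones used above, and the opcodes are the standard 1802 encodings for LDI, PHI, and PLO. The symbol table, the emit helper, and the way HIGH and LOW look up a label are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

static std::vector<uint8_t> image;               // hypothetical output buffer
static std::map<std::string, uint16_t> symbols;  // hypothetical symbol table

static void emit(uint8_t b) { image.push_back(b); }

// High and low bytes of a label's address (0 if the label is unknown).
uint8_t HIGH(const std::string &s) { return static_cast<uint8_t>(symbols[s] >> 8); }
uint8_t LOW(const std::string &s)  { return static_cast<uint8_t>(symbols[s] & 0xFF); }

// LDI: opcode 0xF8 followed by the immediate byte
void Load_D_Imm(uint8_t value) { emit(0xF8); emit(value); }

// PHI Rn: opcode 0xB0 | n
void Put_High_Register(uint8_t reg) {
    assert(reg < 16);
    emit(static_cast<uint8_t>(0xB0 | reg));
}

// PLO Rn: opcode 0xA0 | n
void Put_Low_Register(uint8_t reg) {
    assert(reg < 16);
    emit(static_cast<uint8_t>(0xA0 | reg));
}

// The helper is just the four calls bundled together.
void Load_R_Label(uint8_t reg, const std::string &s) {
    Load_D_Imm(HIGH(s));
    Put_High_Register(reg);
    Load_D_Imm(LOW(s));
    Put_Low_Register(reg);
}
```

With Stack defined at 0x00F0, Load_R_Label(7, "Stack") and the four calls written out by hand both append the same six bytes to the image. The helper only exists while the assembler runs on the host.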
The assembler is two-pass. The first pass only defines labels. The second pass generates real code. This would make it hard, for example, to create a smart jump instruction that used a branch when the target was near and a long jump when it was far unless you don’t mind padding the branch with a NOP, which would not save space but might save execution time.
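Here is one way the padded smart jump could work, as a sketch under stated assumptions. Smart_Jump, Origin as written here, the symbol table, and the pass variable are all hypothetical; only Define_Label, Label, and NOP borrow their names from the program above. The opcodes are the standard 1802 short branch (BR, 0x30) and long branch (LBR, 0xC0). The trick is that the instruction always occupies three bytes, so label addresses computed in pass 1 stay valid in pass 2.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

static std::vector<uint8_t> image(65536, 0);     // hypothetical 64K image
static std::map<std::string, uint16_t> symbols;  // hypothetical symbol table
static uint16_t location = 0;
static int pass = 1;

static void emit(uint8_t b) {
    if (pass == 2) image[location] = b;  // only pass 2 writes code
    ++location;
}

void Origin(uint16_t address) { location = address; }

void Define_Label(const std::string &name) {
    if (pass == 1) symbols[name] = location;  // pass 1 records addresses
}

// Unknown (forward) labels read as 0 during pass 1.
uint16_t Label(const std::string &name) {
    auto it = symbols.find(name);
    return it == symbols.end() ? 0 : it->second;
}

void NOP(unsigned count = 1) { while (count--) emit(0xC4); }

// Hypothetical smart jump: always three bytes long.
void Smart_Jump(const std::string &name) {
    uint16_t target = Label(name);
    // A short branch replaces only the low byte of the PC, so the target
    // must share a page with the branch's operand byte.
    uint16_t operand_addr = static_cast<uint16_t>(location + 1);
    bool same_page = (target >> 8) == (operand_addr >> 8);
    if (pass == 2 && same_page) {
        emit(0x30);                                   // BR
        emit(static_cast<uint8_t>(target & 0xFF));
        NOP();                                        // padding, never executed
    } else {
        emit(0xC0);                                   // LBR
        emit(static_cast<uint8_t>(target >> 8));
        emit(static_cast<uint8_t>(target & 0xFF));
    }
}

void Program() {
    Smart_Jump("Near");  // forward reference, same page
    Smart_Jump("Far");   // forward reference, different page
    Define_Label("Near");
    NOP();
    Origin(0x0200);
    Define_Label("Far");
    NOP();
}

void assemble() {
    symbols.clear();
    for (pass = 1; pass <= 2; ++pass) {
        location = 0;
        Program();
    }
}
```

After assemble(), the jump to Near becomes a short branch plus a NOP and the jump to Far stays a long branch. Both take three bytes, so nothing is saved in size, which is exactly the tradeoff described above.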
There would be other complications for a modern processor. For example, you would not want to allocate the entire memory space, and you would probably want to generate relocatable output. But this is truly a proof of concept. None of those things are impossible; they are just more work.
Bottom Line
If you want to make assembly more readable, there are benefits and it doesn't have to be that hard to do. You could also write a litasm disassembler to convert object code into this kind of format.
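A disassembler along those lines could start out as simple as the sketch below. It is not part of the project: disassemble, disassemble_one, and call are hypothetical helpers, only a handful of 1802 opcodes are handled, and printing unknown bytes as Byte with an argument is an assumption, since the program above only shows Byte() with no argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Format "Name(arg);" with the argument in decimal or two-digit hex.
static std::string call(const char *name, unsigned arg, bool hex = false) {
    char buf[64];
    if (hex) std::snprintf(buf, sizeof buf, "%s(0x%02X);", name, arg);
    else     std::snprintf(buf, sizeof buf, "%s(%u);", name, arg);
    return buf;
}

// Decode one instruction starting at pos and advance pos past it.
std::string disassemble_one(const std::vector<uint8_t> &code, std::size_t &pos) {
    assert(pos < code.size());
    uint8_t op = code[pos++];
    unsigned n = op & 0x0F;  // register number for the Rn opcodes
    if (op == 0xF8 && pos < code.size())  // LDI takes one operand byte
        return call("Load_D_Imm", code[pos++], true);
    if (op == 0xC4) return "NOP();";
    switch (op & 0xF0) {
        case 0xB0: return call("Put_High_Register", n);       // PHI
        case 0xA0: return call("Put_Low_Register", n);        // PLO
        case 0xD0: return call("Set_PC_To_Register", n);      // SEP
        case 0xE0: return call("Set_X_To_Register", n);       // SEX
        case 0x50: return call("Store_D_To_Reg_Address", n);  // STR
    }
    if (op >= 0x61 && op <= 0x67)                             // OUT 1..7
        return call("Output_Mem_RX_Incr", op - 0x60u);
    return call("Byte", op, true);  // anything else is emitted as raw data
}

// Turn a whole code image into one literate call per line.
std::vector<std::string> disassemble(const std::vector<uint8_t> &code) {
    std::vector<std::string> lines;
    std::size_t pos = 0;
    while (pos < code.size()) lines.push_back(disassemble_one(code, pos));
    return lines;
}
```

A real tool would also need to recover labels for branch targets and cover the full instruction set, but the table-driven core would look much like this.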
Regarding the Universal Assembler, it is a universal relocatable macro assembler and linker that does not target any particular processor. Instead, you define the instruction set grammar using pseudo-instructions. This is a convenient assembler for custom micro-engines you might design for an FPGA or ASIC.