Published: 2023-05-16

Tests of Reluctance

Happy to say, I made plenty of progress over the past week, after a slight interlude. Unfortunately, that will make this post a bit of a long one.

Progress

When I left last time, I’d been working on getting memory operations (pointers and dereferencing) working correctly. After that, I managed to work my way through array definitions, initialisations and subscript accessing. There was a bit of a tricky time getting assignments to work correctly, but super satisfying once it was working.

After that, I paused the rapid progress so I could set up a test suite. The last time I worked on a compiler (as a student, during my degree), we had access to a test suite. It was composed of small example programs that exercised different areas of the compiler, as well as different features of the language. It would do a comparison between the expected program output and the actual result, as well as check for compiler errors.

I wanted something similar for Fang. Compilers are tightly coupled and complex pieces of software, so verifying that I didn’t break a feature that was working is especially valuable. It’s easy to do this by accident sometimes.

That said, I didn’t want to spend too long working on testing as I’m eager to get more of the language implemented, so I put a script together in Bash which:

runs the compiler
checks the compiler’s exit code and output
if the compiler produced a program, runs it and checks its output and exit code.

All of this is compared to the test case and if it matches, “PASS”!

A list of tests, marked with a green “PASS” to indicate all is good in the world.

This ensures that the compiler’s behaviour is as expected. Relying on matching text content is somewhat brittle (and it makes debugging by text output trickier) but it’s been super helpful.
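As a rough illustration of that flow, here is a minimal sketch in Python rather than the Bash the real script uses. The `Expected` record, the `check_case` function and the idea of passing the commands in as lists are all assumptions made for the example, not how the actual suite is laid out.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Expected:
    """Hypothetical test-case record: what the compiler and program should do."""
    compiler_exit: int = 0
    compiler_output: str = ""
    program_exit: Optional[int] = None  # None means no program is expected
    program_output: str = ""


def run(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    done = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return done.returncode, done.stdout


def check_case(compile_cmd: List[str], program_cmd: List[str],
               expected: Expected) -> bool:
    """Return True when the compiler, and the program if any, match the case."""
    # Step 1: run the compiler and compare its exit code and text output.
    if run(compile_cmd) != (expected.compiler_exit, expected.compiler_output):
        return False
    # Step 2: only run the produced program when the case expects one.
    if expected.program_exit is None:
        return True
    return run(program_cmd) == (expected.program_exit, expected.program_output)
```

In a real run, `compile_cmd` would be the compiler invocation for one test file and `program_cmd` the binary it produced. Comparing exact text is what makes this approach brittle: a single changed character in a message turns a PASS into a FAIL.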

I slapped together a few programs to demonstrate the basic features so far, and getting them all green definitely feels good (I’m pretty familiar with ANSI terminal codes so colouring text is easy. Gotta have reds and greens!).
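For anyone who hasn’t used those codes, the colouring comes down to wrapping the text in an SGR escape sequence and resetting afterwards. This is a small Python sketch of the idea (the real script is Bash, and the function names here are made up for illustration):

```python
# Standard ANSI SGR escape sequences: 31 = red, 32 = green, 0 = reset.
RED = "\033[31m"
GREEN = "\033[32m"
RESET = "\033[0m"


def status_label(passed: bool) -> str:
    """Return a coloured PASS or FAIL label for terminal output."""
    colour, word = (GREEN, "PASS") if passed else (RED, "FAIL")
    return f"{colour}{word}{RESET}"


def report_line(test_name: str, passed: bool) -> str:
    """Format one line of the test report, label first."""
    return f"{status_label(passed)} {test_name}"
```

Resetting after the label matters; without it the rest of the terminal output stays coloured.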

Next up, I decided to put some work into error handling. A compiler should produce useful error messages to aid a programmer debugging their work. This also comes in handy for a compiler developer debugging a compiler. So, I identified areas where I knew errors were being caught and added messages which try to explain why the error occurred. Then, I wrote test programs to trigger those messages, ran the test suite and checked it all got picked up correctly.

It’s a huge improvement, but I know there are still some points where a message could be added or improved. These will get added over time.

The test suite was particularly helpful as I worked on the next couple of features: allocating stack space for local variables, and handling static memory for global variables and constants.

These both required changes in how accesses to variables had been working so far, and catching the resulting breakages with the test suite was so helpful. The issues here mostly came down to “fighting the architecture”.

Fighting the Architecture

Occasionally, an aspect of the compiler will be conceptually or architecturally simple, and then it’ll get complicated because my compilation target is currently ARM64 (and Apple hardware to boot). So, I might spend ages fighting obscure stack alignment issues or strange page offset addressing stuff. This Stack Overflow answer set me on the right path this time.

ARM64 operates on 64-bit values, although you can convince it to work on 32-bit values sometimes as well. It has some support for 8-bit and 16-bit values, but that introduces a little weirdness unless you’re extremely consistent and careful. Up until this point, I’d been allocating a full 64 bits for each variable, even though my data types mostly fit in single bytes (pointers require the full 64 bits though). My code generator didn’t really account for data size when I hacked it together originally. It would sometimes read a single byte when it meant to get a full-width value, or, when writing to something that should be a byte, it would overwrite adjacent values in memory as well, leading to all kinds of bugs.
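To make those two failure modes concrete, here is a small Python simulation using a `bytearray` as stand-in memory. It isn’t Fang’s code generator; the offsets, values and helper names are invented for the example, and it assumes little-endian byte order, which is what ARM64 on Apple hardware uses.

```python
import struct

# struct format codes for little-endian unsigned integers, keyed by size.
FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def store(mem: bytearray, offset: int, value: int, size: int) -> None:
    """Write `value` into `mem` as an unsigned integer of `size` bytes."""
    struct.pack_into(FORMATS[size], mem, offset, value)


def load(mem: bytearray, offset: int, size: int) -> int:
    """Read an unsigned integer of `size` bytes from `mem`."""
    return struct.unpack_from(FORMATS[size], mem, offset)[0]


def demo():
    mem = bytearray(16)

    # Two byte-sized variables packed next to each other.
    store(mem, 0, 7, 1)   # variable a at offset 0
    store(mem, 1, 9, 1)   # variable b at offset 1

    # Bug 1: assigning to `a` with a 64-bit store also wipes out `b`.
    store(mem, 0, 42, 8)
    clobbered_b = load(mem, 1, 1)

    # Bug 2: reading a pointer with a byte load keeps only the low byte.
    store(mem, 8, 0x1122334455667788, 8)
    truncated_pointer = load(mem, 8, 1)

    return clobbered_b, truncated_pointer
```

Calling `demo()` returns `(0, 0x88)`: `b` has been zeroed by the over-wide store, and the byte-sized load recovers only the lowest byte of the pointer. Threading the operand size through every load and store is the fix in both cases.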

I think I’ve got most of it squared away now, although there are a few places where we could be more “compact”. The stack pointer has to stay aligned to a 16-byte boundary, so it’s easier to deal with function parameters in 16-byte slots, even if we could pack more values in there.
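The usual way to respect that 16-byte rule is to round a function’s frame size up to the next multiple of 16 before adjusting the stack pointer. A minimal Python sketch of the arithmetic follows; the function name is hypothetical, and it deliberately ignores per-variable alignment and any saved registers.

```python
STACK_ALIGNMENT = 16  # bytes; must be a power of two for the mask trick


def aligned_frame_size(local_sizes):
    """Sum the sizes of the locals and round up to a multiple of 16."""
    total = sum(local_sizes)
    # Adding 15 and clearing the low four bits rounds up to the next multiple.
    return (total + STACK_ALIGNMENT - 1) & ~(STACK_ALIGNMENT - 1)
```

For example, two byte-sized variables and one pointer need 10 bytes, which rounds up to 16, while 17 bytes of locals would reserve 32. The slack is the “less compact” cost mentioned above.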

The ARM64 backend is mostly for debugging and proving out the architectural abstractions, so this inefficiency is acceptable here. It won’t be when I do the true backends for MOS6502 and Z80 chips later, but both work natively with 8-bit and 16-bit values, so it’ll be a much less complicated issue.

What’s next?

There are two major features for the language coming next…

Spreading code across multiple files/modules. This is needed to allow for “library” code. At the moment, Fang has access to the write system call because of a block of Fang code that gets prepended to every file that is compiled. I’d prefer it if you specified this explicitly, and if the lexical scanner accounted for the different files in error reporting.
Composite record types (aka “structs”, if you come from C or Go). This is a pretty important piece of the language design, so hopefully we can get it to work fairly quickly.

Until next time!