Published: 2023-05-06

Referential Treatment

Last time, I had just gotten functions to work reliably, after a lot of puzzling out how the stack frame was going to work on ARM64, at least as a model for simpler platforms.

Since then, I started trying to implement memory pointer operations: converting variables to pointers and dereferencing them into values, as well as assigning to them through a pointer.

Getting and using pointers in expressions wasn’t too hard to get working, but assigning through them proved to be a problem.

Until now, my parser took a pretty ham-fisted approach to differentiating between “lvalues” and “rvalues”. An lvalue is something which has an associated storage location, and thus can be assigned to. Everything else is an rvalue.

The parser needs to decide which is which so that we can generate the appropriate assembly code for (a) loading the address of a variable or (b) loading the value stored in the variable, in the appropriate circumstances.
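To make the difference concrete, here is a minimal sketch in Python (not Fang's actual code generator). It assumes a local variable lives at a fixed offset from the stack pointer, and the name emit_variable is made up for illustration. The ARM64 instructions are only indicative of the two cases.

```python
def emit_variable(offset: int, as_lvalue: bool) -> list[str]:
    """Emit illustrative ARM64 for a local assumed to live at [sp, #offset]."""
    if as_lvalue:
        # lvalue: we want the storage location itself, so compute its address
        return [f"add x0, sp, #{offset}"]
    # rvalue: we want whatever is currently stored at that location
    return [f"ldr x0, [sp, #{offset}]"]


address_code = emit_variable(16, as_lvalue=True)
value_code = emit_variable(16, as_lvalue=False)
```

The same variable produces an address computation in one case and a memory load in the other, which is why the parser has to know which mode it is in.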

Previously, I just assumed that every assignment targeted a named variable, but that stops working once you want to assign through a pointer.

Fang’s syntax and semantics around pointers are currently modelled on the C ones. The “dereference” symbol is “@” in Fang, rather than “*”. I like this from a human reading perspective, because an expression like “@x” would mean “the value at x’s address”.

But to assign, you want to do “@x = 42;” and that requires a different set of assembly operations to be produced to store things correctly.
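As a rough illustration of why the two cases differ, the sketch below (Python, with made-up helper names, and again assuming the pointer variable x sits at an offset from the stack pointer) emits one plausible ARM64 sequence for reading “@x” and another for “@x = 42;”. It is not what Fang actually generates, and it assumes the constant is small enough to fit a single mov.

```python
def emit_deref_load(ptr_offset: int) -> list[str]:
    """Reading @x: fetch the pointer held in x, then load what it points to."""
    return [
        f"ldr x0, [sp, #{ptr_offset}]",  # x0 = pointer stored in x
        "ldr x0, [x0]",                  # x0 = value at that address
    ]


def emit_deref_store(ptr_offset: int, value: int) -> list[str]:
    """Assigning @x = value: fetch the pointer, then store through it."""
    return [
        f"mov x1, #{value}",             # x1 = value to store (small constant assumed)
        f"ldr x0, [sp, #{ptr_offset}]",  # x0 = pointer stored in x
        "str x1, [x0]",                  # write the value to the pointed-to address
    ]


read_code = emit_deref_load(8)
write_code = emit_deref_store(8, 42)
```

In both cases the pointer itself is loaded as a value. What changes is whether the final instruction reads from or writes to the address it holds.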

In this case, I stumbled on a partial solution thanks to the great collection of blogs/tutorials/musings by Warren Toomey, aka DoctorWkt.

Basically, the parser starts in lvalue mode, and treats almost everything like an lvalue until it reaches an assignment. From there, everything on the right-hand side of the assignment is treated as an rvalue. You can use the “ref” operator to convert some things back into lvalues, and that’s how you get a pointer to an object in memory.

This status is propagated down the AST using a stack.
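A minimal sketch of that idea in Python follows. The Node class, the node kinds and the annotate function are all invented for illustration, and only variables are marked as lvalues to keep it short. The real implementation will differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                  # "assign", "ref", "var" or "num"
    children: list["Node"] = field(default_factory=list)
    lvalue: bool = False                       # filled in by annotate()


def annotate(node: Node, modes: list[bool]) -> None:
    """Mark nodes using the mode on top of the stack (True means lvalue mode)."""
    # In this sketch only variables have a storage location to speak of.
    node.lvalue = modes[-1] and node.kind == "var"
    if node.kind == "assign":
        target, value = node.children
        annotate(target, modes)      # left side keeps the current mode
        modes.append(False)          # right side is rvalues only
        annotate(value, modes)
        modes.pop()
    elif node.kind == "ref":
        modes.append(True)           # "ref" switches back to lvalue mode
        annotate(node.children[0], modes)
        modes.pop()
    else:
        for child in node.children:
            annotate(child, modes)


# x = y      : y is read as a value
plain = Node("assign", [Node("var"), Node("var")])
annotate(plain, [True])

# x = ref y  : y is wanted as an address
taken = Node("assign", [Node("var"), Node("ref", [Node("var")])])
annotate(taken, [True])
```

Pushing and popping around each subtree means the mode is restored automatically once the walker returns from the right-hand side or from the operand of “ref”.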

There were a couple of edge cases to handle where an expression needs to be processed in rvalue mode from the beginning, as no assignment can occur (mostly conditions in for loops, while loops and if statements, but also arguments in function calls).

I decided that this was easier to handle inside my type system resolver, rather than the parser itself, so during the recursive descent through the AST, the node’s lvalue or rvalue status is updated, and any changes are pushed onto the stack, if necessary.

Getting all this figured out was a little demoralising as the refactoring to get to this point broke code generation entirely for a while.

Now we are back on track though, and pointers work as intended!

It should only be a short leap from here to implementing array accesses, so hopefully that will follow soon.