Building a virtual machine Part 3 - Choice. The problem is choice
Don't you love the way all my titles in this series refer to famous movies? Well, I think it's cool anyway. :) As always, if you want to help out, you'll find the links at http://smokevm.sourceforge.net - though there's nothing much there yet.
Writing a VM (like writing anything else) involves a series of choices. Some of them are clear-cut once you take things like your goals and resources into consideration, while others can be real head-scratchers. Here are a few of those choices (I might write about them in detail in a later post).
Calling a function - Register-based vs Stack-based
All the popular mainstream VMs (JVM, CLR, etc.) today are stack-based. This means that for every function call, the arguments are pushed onto the stack, and any form of communication involves pushing values onto the stack and popping them off again. Parrot, however, uses a different, register-based mechanism. It has a zillion registers where you can pretty much stuff in anything. So, instead of pushing 2 integer variables onto the stack for an 'add' function, you just put them in 2 integer registers and call the function.
Now, both have their pros and cons - the Parrot folks are convinced that their way is better as it offers better performance (the theory being that compiler writers know how to write compilers for register-based machines better). As tempting as this looks, I played it safe on this one and stuck to a stack-based VM, as I was on safe ground there.
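To make the difference concrete, here is a rough sketch of the same 'add' done both ways. It is not Smoke's code and not Parrot's either: the names StackVM and RegisterVM, the int32 operands and the 8-register file are all made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stack-based: operands travel through an operand stack.
// (StackVM is a hypothetical name, not a type from any real VM.)
struct StackVM {
    std::vector<std::int32_t> stack;

    void push(std::int32_t v) { stack.push_back(v); }

    std::int32_t pop() {
        assert(!stack.empty() && "operand stack underflow");
        std::int32_t v = stack.back();
        stack.pop_back();
        return v;
    }

    // ADD names no operands: it pops two values and pushes their sum.
    void add() {
        std::int32_t rhs = pop();
        std::int32_t lhs = pop();
        push(lhs + rhs);
    }
};

// Register-based: the instruction itself says where its inputs and
// output live. (RegisterVM and the size 8 are assumptions.)
struct RegisterVM {
    static constexpr int kNumRegs = 8;
    std::int32_t regs[kNumRegs] = {};

    // ADD dst, a, b: reads two registers and writes a third.
    void add(int dst, int a, int b) {
        assert(dst >= 0 && dst < kNumRegs);
        assert(a >= 0 && a < kNumRegs);
        assert(b >= 0 && b < kNumRegs);
        regs[dst] = regs[a] + regs[b];
    }
};
```

With the stack version, adding 2 and 3 takes four steps (push, push, add, pop). With the register version it is one add(0, 1, 2) once the values sit in registers 1 and 2, at the cost of a fatter instruction that has to carry three register numbers.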
Calling functions - Continuation Passing Style or Not
Continuation Passing Style (or CPS for short) is this elegant way of calling functions used mainly by the Lisp guys. If you don't understand continuations, don't even bother trying to understand this. Basically, instead of every function returning its value to its parent, the parent function tells the child function which function to pass the result to. No function call ever returns. Yes - I know it doesn't make sense. Dan says it better than I ever could.
Probably the most famous application was Guy Steele's work on the RABBIT compiler for Scheme back in the 70s. Parrot uses CPS to call functions too (they seem to do all the cool things).
Again, I decided to play it safe and stick to a stack-based mechanism. Though having a CPS mechanism would make it easy for me to do continuations in the VM, I wasn't sure how to convert all programs to CPS. However, the way my VM is structured, we could probably add CPS without too much effort.
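If the "no function call ever returns" bit sounds too abstract, here is a tiny sketch of the same computation in direct style and in CPS. The function names and the Cont alias are invented for this example. Also note that C++ doesn't guarantee tail-call elimination, so at the machine level these calls do still return; a VM built around CPS would arrange for them not to.

```cpp
#include <cassert>
#include <functional>

// Direct style: each function hands its result back to its caller.
int add_direct(int a, int b) { return a + b; }
int square_direct(int x) { return x * x; }

// CPS: each function takes an extra argument k (the continuation)
// and passes its result to k instead of returning it.
using Cont = std::function<void(int)>;

void add_cps(int a, int b, const Cont& k) {
    assert(k && "continuation must be callable");
    k(a + b);
}

void square_cps(int x, const Cont& k) {
    assert(k && "continuation must be callable");
    k(x * x);
}

// (a + b) squared, in CPS. The parent tells add_cps where to send the
// sum: to a continuation that squares it and forwards the result to k.
void sum_squared_cps(int a, int b, const Cont& k) {
    add_cps(a, b, [&k](int sum) { square_cps(sum, k); });
}
```

Calling sum_squared_cps(2, 3, k) ends up invoking k with 25, the same value square_direct(add_direct(2, 3)) returns. The difference is only in who decides what happens next: the callee's caller, or the continuation that was passed in.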
Executing byte code (Direct function calls vs. Indirect function calls vs. TIL vs. the Big Switch vs. JIT)
Now, you have the byte code with you - but how do you go about executing it? You have a bunch of options again. Frankly, I didn't know much about this till I looked at Dan Sugalski's presentation on the topic. We decided to go with the big switch for now - but with the extra feature that the programmer can add opcodes and handle them *at runtime*. Kaushik is itching to finish everything else and do the JIT :-) - so I'll leave a discussion on the JIT to him.
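For the curious, a big switch with runtime-added opcodes could look roughly like the sketch below. This is not Smoke's actual dispatch loop: the opcode numbers, the one-byte PUSH operand and the extra_ops table are all assumptions I've made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Built-in opcodes (the numbering is made up for this sketch).
enum Opcode : std::uint8_t { OP_PUSH = 0, OP_ADD = 1, OP_HALT = 2 };

struct VM {
    std::vector<std::int32_t> stack;
    // Handlers for opcodes registered at runtime, keyed by opcode byte.
    std::unordered_map<std::uint8_t, std::function<void(VM&)>> extra_ops;

    std::int32_t pop() {
        assert(!stack.empty() && "operand stack underflow");
        std::int32_t v = stack.back();
        stack.pop_back();
        return v;
    }

    // Runs until HALT or the end of the code. Returns false if it
    // meets a malformed PUSH or an opcode nobody handles.
    bool run(const std::vector<std::uint8_t>& code) {
        std::size_t pc = 0;
        while (pc < code.size()) {
            std::uint8_t op = code[pc++];
            switch (op) {  // the "big switch"
            case OP_PUSH:
                if (pc >= code.size()) return false;  // missing operand
                stack.push_back(code[pc++]);          // one-byte immediate
                break;
            case OP_ADD: {
                std::int32_t rhs = pop();
                std::int32_t lhs = pop();
                stack.push_back(lhs + rhs);
                break;
            }
            case OP_HALT:
                return true;
            default: {
                // Not built in: look for a handler added at runtime.
                auto it = extra_ops.find(op);
                if (it == extra_ops.end()) return false;
                it->second(*this);
                break;
            }
            }
        }
        return true;
    }
};
```

Adding an opcode at runtime is then just a map insert, for example registering a handler under opcode 100 that doubles the top of the stack. The built-in opcodes stay on the fast switch path, and only unknown ones pay for the table lookup.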
Garbage collection (Ref counting vs Mark-Sweep vs Copying vs Generational)
This thing deserves its own blog post - so when I find the time, I'll write up something on it. Right now, we're going with a ref counting mechanism (cyclic references be damned!) till we find the time to do a mark-sweep. As for a generational GC, that just seems like too much work considering the limited time we have.
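Here is a minimal sketch of what ref counting looks like and why cycles are a problem. Again, this is an illustration and not Smoke's object model: Object, retain, release, set_ref and the live_objects counter are hypothetical names, and each object holds just one outgoing reference to keep things short.

```cpp
#include <cassert>

// Number of objects currently allocated (only here so the leak is visible).
int live_objects = 0;

struct Object {
    int refcount = 0;
    Object* ref = nullptr;  // single outgoing reference
    Object() { ++live_objects; }
    ~Object() { --live_objects; }
};

void retain(Object* o) {
    if (o) ++o->refcount;
}

// Drops one reference. When the count hits zero the object releases
// whatever it points to and is then freed.
void release(Object* o) {
    if (!o) return;
    assert(o->refcount > 0 && "release without matching retain");
    if (--o->refcount == 0) {
        release(o->ref);
        delete o;
    }
}

// Points from->ref at 'to', keeping the counts balanced.
// Retain before release so that re-setting the same target is safe.
void set_ref(Object* from, Object* to) {
    retain(to);
    release(from->ref);
    from->ref = to;
}
```

If a points to b and you release both, everything gets freed. If x points to y and y points back to x, releasing both only brings each count down to 1, so neither is ever deleted. That is exactly the cyclic-reference leak a mark-sweep collector would catch.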
The language - C or C++
We went for C++ in the end, as most of the VM code I saw written in C was trying to emulate classes and objects anyway. Kaushik still hasn't forgiven me for this ;-)
Every day, we run into several design choices, though these were probably the biggest ones. I'll blog about the others if I find any of them interesting.