Sriram Krishnan (Moved to http://www.sriramkrishnan.com/blog)

Search. Usability. Virtual machines.Geek stuff

<December 2008>
SuMoTuWeThFrSa
30123456
78910111213
14151617181920
21222324252627
28293031123
45678910


Navigation

Subscriptions

News

Link blog
Technorati Profile
The Blogs I read
Creative Commons Licence
This work is licensed under a Creative Commons License.


Monday, February 21, 2005 - Posts

Building a virtual machine Part 3 - Choice. The problem is choice

Don't you love the way all my titles in this series refer to famous movies. Well - I think it's cool anyway.:) As always, if you want to help out, you'll find the links at http://smokevm.sourceforge.net - though there's nothing much there yet.

Writing a VM  (like writing anything else) involves a series of choices. Some of them are clear cut if you take things like your goal, resources,etc into consideration while others can be real head-scratchers.Here are a few of those choices (I might write about them in detail in a later post)

Calling a function - Register-based vs Stack based

All the popular mainstream VMs(JVM/CLR,etc) today are stack-based.This means that for every function call, the arguments are pushed onto the stack and any form of communication involves pushing and popping off the stack. Parrot, however, uses a different register based mechanism.It has a zillion registers where you can pretty much stuff in anything. So, instead of pushing in 2 integer variables for an 'add' function, you just put them in 2 integer registers and call the function.

Now, both have their pros and cons - the Parrot folks are convinced that their way is better as it offers better performance (the theory being that compiler writers know how to write compilers for register-based machines better). Now, as tempting as this looks, I played safe on this one and stuck to a stack-based VM as I was safe ground on that one.

 

Calling functions - Continuation Passing Style or Not

Continuation Passing Style (or CPS in short) is this elegant way of calling functions  used mainly by the Lisp guys. If you don't understand continuations, don't even bother trying to understand this. Basically, instead of every function returning its value to its parent, the parent function tells the child function which function to pass the result to. No function call ever returns. Yes - I know it doesn't make sense. Dan says it better than I ever could.

Probably the most famous application was Guy Steele's work on the RABBIT compiler for Scheme back in the 70s. Parrot uses a CPS-style to call functions too (they seem to do all the cool things).

Again, I decided to play it safe and stick to stack based mechanism. Though having a CPS mechanism would make it easy for me to do continuations in the VM, I wasn't sure as to how to convert all programs to CPS style. However, the way my VM is structured, we could probably add CPS without too much effort.

 

Executing byte code (Direct function calls vs. Indirect function calls vs. TIL vs. the Big Switch vs. JIT)

Now, you have the byte code with you - but how do you go about executing it? You have a bunch of options again. Frankly, I didn't know much about this till I looked at Dan Sugalski's presentation on the topic. We decided to go with the big switch for now - but with the extra feature of the programmer having the ability to add opcodes and handle them *at runtime*. Kaushik is itching to finish everything else and do the JIT :-) - so I'll leave a discussion on the JIT to him.

 

Garbage collection (Ref counting vs Mark-Sweep vs Copying vs Generational)

This thing deserves its own blog post - so when I find the time, I'll write up something on it. Right now, we're going with a ref counting mechanism (cyclic references be damned!) till we find the time to do a mark-sweep. As for a generational GC, that just seems like too much work considering the limited time we have

The language - C or C++

We went for C++ in the end as I saw most code written in C for VMs trying to emulate classes and objects anyway. Kaushik still hasn't forgiven me for this ;-)

 

Every day, we run into several design choices, though these were probably the biggest ones. I'll blog the others if I find any interesting

 

posted Monday, February 21, 2005 3:30 AM by sriram

Building a virtual machine Part 2 - the Virtual Machine diaries

 

For the last 2 weeks, I've been reading up a lot on virtual machines - past and present.It's been a lot of fun actually. Some random notes from what I've seen and done so far

  •  Everything worth inventing has been invented by the Lisp guys 2 decades ago. People like Guy Steele have done some really cool work
  •  Smalltalk has the most kick-ass bytecode and VM design I've seen. Unfortunately, simplicity is tough - if I was smarter, I would have made mine as simple as  Smalltalk's. But hey - I'm not that smart..so complexity is more my thing
  •  Python's simplicity is present at every level - right down to the VM
  • .NET's bytecodes seem to be tuned for speed more than anything else
  •  This stuff is hard. Really hard.
  • If you're doing anything involving virtual machines, Dan Sugalski's blog is a must-read.
  • I hate C++
  • I love Visual Studio.NET
  • Hungarian ain't that bad...in the beginning,atleast
  • COM's QueryInterface concepts come in handy when you least expect it.
  • Don Box is one smart guy
  • Getting Sourceforge + CVS + SSH to work on Windows is a huge pain
If that doesn't make sense , don't worry. Start writing a virtual machine and you'll figure out what I mean.:-)

posted Monday, February 21, 2005 3:09 AM by sriram

Building a virtual machine 1 - Virtual Machine Begins

Sorry for the wrong delay everyone. I haven't been able to blog for 2 reasons. The first being that an unexpected attack of 'too-lazy-to-blog'. And I deal with the second one in this post

For the last 2 weeks,I've been deep down into the internals of program execution - the virtual machine. For those who know me, this should come somewhat as a surprise, for I've been known to flinch whenever I see a pointer and run away screaming when I see C or C++ code. And that was probably a reason in me choosing to do a virtual machine for my final year project. I wanted to see how the lower levels of a virtual machine looked like, before getting back to my lambdas and search engines.

3 weeks ago, my team(Anand, Balakrishnan) decided that our whole Bittorrent idea was not good enough (for several reasons) and that we needed something else. After some mucking around, a virtual machine was what we decided on. We decided to stick with the name 'Smoke' though we had called our Bittorrent app the same thing. Why? Well - call us sentimental. Plus, I think it is kind of cool.If you want to see how we're doing, head on over to http://smokevm.sourceforge.net

Some 2 weeks later, we had barely started . After all, we had 3 months to go (which some of you old hands might recognize as an extremely short time to build something of the size of a VM).However, sometime last week, we got a huge shock when we heard that instead of the promised 3 months, it would now be 3 'weeks' and by March 2nd week or so, we would need to have the whole thing done.

Bummer.

Giving ourselves one firm kick on the backside, I thanked my lucky stars that I had roped in too extremely capable people. One was Boy Wonder Kaushik .Another was Aarthi, who promised to write the Lisp compiler for our VM.

A lot of people have a lot of definitions of what a 'VM' is.If you look at Citeseer, you'll see a lot of papers on a varied range of virtual machines ranging from CLR/JVM-like VMs to Virtual PC-like VMs with a lot of stuff inbetween. In our case, a 'VM' refers to a CLR/JVM-like Vm. Something that can take the compiled byte code of a language, and run the damn thing, doing things like memory management,exceptions along the way.

Now, you might be wondering why we're doing a VM at all. After all, things like the CLR and the JVM exist. Well..our answer would be ' we're different'. The problem with the JVM and the CLR is that they're inherently built for statically typed languages. Fitting on a language like Jython or IronPython requires a lot of work. On the other hand, dynamic runtimes like the CPython runtime and the various Lisp implementations are built for their own languages and require quite some work if we need to compile a different language.

Our basic idea was to do a virtual machine designed from the ground up for functional and dynamic languages. This means doing things like multi-method dispatch, continuations, closures,etc - but doing them *inside* the VM.

This is *very* similar to what the Parrot (www.parrotcode.org) guys are doing and in fact, I've been lurking on their lists for a few weeks now. If I had had the choice, I would have built upon their code- but my problem is that my college doesn't like people modifying existing code. So working on Parrot or Rotor was out of the question.Bummer again.

Recipe
 
Different VMs have different things. Here's what we decided on as our final list of 'ingredients'. These are the things we want to deliver 3 weeks from now
  • 2 compilers.Right now,we're looking at Python and Common Lisp
  • A byte code interpreter (the heart of the VM)
  • An assembler (similar to ILASM)
  • A GC (don't ask what kind - don't know yet. Ref counting until we find the time to do something else)
  • All that funky VM stuff (typing, continuations,etc,etc)
Right now, we're looking to copy from existing VMs as much as we can since we really don't have time to design a lot of stuff. This doesn't mean that we don't want to innovate - just that we don't want to reinvent every wheel. I'm personally looking forward to figuring out how to implement continuations inside the VM itself (sorry Roshan! :-) )
 
As the more astute among you would have figured out by now, I plan on making this into a series of posts on virtual machines. Hopefully, I'll have a post some 3 weeks from now about my completed VM
 

posted Monday, February 21, 2005 3:03 AM by sriram




Powered by Dot Net Junkies, by Telligent Systems