February 2005 - Posts

Write Less, Think More: The Code-Bloat Antidote

I'm always trying to get people to write better code. Come to think of it, I'm always trying to write better code myself. And the more I analyze how programmers approach writing software, the more I realize that it's often a thoughtless process: it's easy to bang away on the keyboard tackling each challenge almost on a line-by-line basis, disregarding the big-picture of your application's overall design. Call me pessimistic, but it's so easy that finding well thought-out code is quite a rare thing these days.

Recently I've been experimenting with an idea I had to encourage developers to improve their code. The idea involves enforcing an arbitrary line limit on the code they write. For example, I may tell them to implement a certain feature, but that the code they write cannot be longer than, say, 100 lines. That may sound like a stupid idea, but it always ends up with better and more well thought-out code. Let me explain the logic behind this.

Given enough time, a programmer of almost any ability can implement pretty much any application. The result may end up a completely unmaintainable mess, but it'll work for sure. Sometimes. Eventually. But at what point does an application become a mess? Usually when it's too big: when there are too many lines of code.

Have you ever had to take on someone else's code and complained that the code is unmanageable because it's too small? I doubt it. Generally, even unmaintainable code can easily become maintainable if it's small enough, because we can simply rewrite the complicated sections. However, when you're faced with an unmaintainable application consisting of millions of lines of code, the likelihood of a bad design increases, and with it the likelihood of a complete rewrite. By keeping your project's line count low, you're helping to keep your project maintainable.

Now, I know it's easy to make these blanket statements that the more lines of code, the more complex the application becomes - but it's not as random as you might think. The concept certainly isn't new, but I do think it deserves more attention.

When you're forced to write code in fewer lines, you suddenly take on an entirely different mindset and approach to programming. Consider how it changes your approach:

  • You're forced to re-use code because it might be the only way you can reduce the number of lines.
  • You start thinking twice before copying and pasting code.
  • You look at your classes to see how you can use them polymorphically.
  • You derive more classes and build a class hierarchy.
  • You might be forced to use declarative programming techniques, because by moving the essence of your application's initialization into configuration you might find you've ended up with less duplication of logic.
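As a rough illustration of the last point, here is a minimal sketch of moving per-case logic into a declarative table. The class name, the loan kinds and the rates are all made up for the example, and it assumes a compiler with generics and collection initializers (C# 3 or later). In a real application the table might just as well be loaded from a configuration file.

```csharp
using System;
using System.Collections.Generic;

static class LoanRates
{
    // Hypothetical data: one entry per loan kind replaces one if/else branch.
    // The rates are invented purely for illustration.
    static readonly Dictionary<string, decimal> AnnualRateByKind =
        new Dictionary<string, decimal>
        {
            { "mortgage", 0.05m },
            { "car",      0.08m },
            { "business", 0.10m },
        };

    // A single lookup handles every kind; supporting a new kind means
    // adding a row to the table, not writing another branch.
    public static decimal AnnualRate(string kind)
    {
        decimal rate;
        if (!AnnualRateByKind.TryGetValue(kind, out rate))
            throw new ArgumentException("Unknown loan kind: " + kind, "kind");
        return rate;
    }
}
```

Calling LoanRates.AnnualRate("car") looks the rate up in the table, and an unknown kind fails loudly instead of silently falling through a chain of conditions.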

The bottom line is that quite a lot of good programming practices are forced onto you when you're given a constraint on the size of the code you write.

And the key isn't necessarily to set an arbitrary line limit, but to ask the question: can this code be any smaller than it is? By setting an arbitrary line limit, you're forcing developers to make it as small as it can be. The line limit is just a carrot dangling on the end of a stick.

If you're a lazy programmer, which I am, then some of this may come naturally to you. Of course I don't mean mentally lazy, I mean physically lazy. Yes, so lazy you can't be bothered to move your fingers to type out all those lines. Lazy programmers don't need a team lead telling them to write fewer lines of code; the very thought of having to maintain, let alone type out, all that code is enough to constrain the amount of code they write in a day.

The tendency for code to keep growing is often called "code bloat", and I think it's sufficiently widespread to warrant some major action. You don't have to look very far to find the evidence: it seems the more memory and hard disk space available on average machines, the less care goes into keeping code small. It really wasn't all that long ago, relatively speaking, that a useful operating system could fit on a 180 KB floppy disk. And now Windows XP on my machine takes up almost 2 gigabytes.

Let's look at an example of what I mean. Take this C# code here:

class Mortgage
{
    public bool IsLoanActive;
    public float Principal;
    public float Interest;
}

class CarLoan
{
    public bool IsLoanActive;
    public float Principal;
    public float Interest;
}

class BusinessLoan
{
    public bool IsLoanActive;
    public float Principal;
    public float Interest;
}

// returns the total interest for a year of all active loans.
float GetTotalInterestForThisYear( Mortgage[ ] mortgages, CarLoan[ ] loans1, BusinessLoan[ ] loans2)
{
    float totalInterest = 0;
    foreach ( Mortgage mort in mortgages)
    {
        if ( mort.IsLoanActive)
        {
            totalInterest = totalInterest + mort.Interest * 12;
        }
    }     
    foreach ( CarLoan loan1 in loans1)
    {
        if ( loan1.IsLoanActive)
        {
            totalInterest = totalInterest + loan1.Interest * 12;
        }
    }
    foreach ( BusinessLoan loan2 in loans2)
    {
        if ( loan2.IsLoanActive)
        {
            totalInterest = totalInterest + loan2.Interest * 12;
        }
    }
    return totalInterest;
}     

// returns a list of all active loan amounts
float[ ] GetAllLoanAmounts( Mortgage[ ] mortgages, CarLoan[ ] loans1, BusinessLoan[ ] loans2)
{
    int totalActiveLoans = 0;
    // find out how many active loans there are
    foreach ( Mortgage mort in mortgages)
    {
        if ( mort.IsLoanActive)
        {
            totalActiveLoans = totalActiveLoans + 1;
        }
    }     
    foreach ( CarLoan loan1 in loans1)
    {
        if ( loan1.IsLoanActive)
        {
            totalActiveLoans = totalActiveLoans + 1;
        }
    }
    foreach ( BusinessLoan loan2 in loans2)
    {
        if ( loan2.IsLoanActive)
        {
            totalActiveLoans = totalActiveLoans + 1;
        }
    }
    float[ ] amounts = new float[ totalActiveLoans] ;
    int count = 0;
    // put the amounts into the array
    foreach ( Mortgage mort in mortgages)
    {
        if ( mort.IsLoanActive)
        {
            amounts[ count] = mort.Principal;
            count = count + 1;
        }
    }     
    foreach ( CarLoan loan1 in loans1)
    {
        if ( loan1.IsLoanActive)
        {
            amounts[ count] = loan1.Principal;
            count = count + 1;
        }
    }
    foreach ( BusinessLoan loan2 in loans2)
    {
        if ( loan2.IsLoanActive)
        {
            amounts[ count] = loan2.Principal;
            count = count + 1;
        }
    }
    return amounts;
}

The code inside the function bodies totals about 75 lines. It may not look that bad to the untrained eye, but just look at all that redundancy: code that almost looks like it was copied and pasted. I hate seeing visual patterns in my source code. If I see a pattern, then that tells me the code needs to be refactored.

You can actually reduce those 75 lines to about 10 lines (while retaining the same method signatures), and perhaps fewer, with only a little additional thought.

using System.Collections;   // needed for ArrayList below

class Loan
{
    public bool IsLoanActive;
    public float Principal;
    public float Interest;
}

class Mortgage : Loan { }
class CarLoan : Loan { }
class BusinessLoan : Loan { }

float GetTotalInterestForThisYear( Mortgage[ ] mortgages,
                                   CarLoan[ ] loans1,
                                   BusinessLoan[ ] loans2)
{
    float total = 0;
    foreach ( Loan[ ] loans in new Loan[ ] [ ] { mortgages, loans1, loans2 } )
        foreach ( Loan loan in loans)
            if ( loan.IsLoanActive) total += loan.Interest * 12;
    return total;
}

// returns a list of all active loan amounts
float[ ] GetAllLoanAmounts( Mortgage[ ] mortgages,
                            CarLoan[ ] loans1,
                            BusinessLoan[ ] loans2)
{
    ArrayList list = new ArrayList( ) ;
    foreach ( Loan[ ] loans in new Loan[ ] [ ] { mortgages, loans1, loans2 } )
        foreach ( Loan loan in loans)
            if ( loan.IsLoanActive) list.Add( loan.Principal) ;
    return ( float[ ] ) list.ToArray( typeof( float) ) ;        
}

In this code, we're making good use of polymorphism. We need fewer comments because it's harder to get lost while reading the code. We're also making some innovative use of jagged arrays to eliminate the need to repeat the code for each parameter. And with some additional understanding of .NET's collection types, we can create dynamic arrays that save us from having to determine the size up-front.
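For readers on a compiler with generics and iterators (C# 2.0 or later), the same idea can be sketched a little differently. This is only one possible variation, not the code from the post: the LoanReports class and the ActiveLoans helper are hypothetical names, and Interest is assumed to be a monthly amount, as the multiplication by 12 suggests. A List<float> replaces the ArrayList, which avoids boxing each float and the cast at the end.

```csharp
using System.Collections.Generic;

class Loan
{
    public bool IsLoanActive;
    public float Principal;
    public float Interest;   // assumed to be a monthly amount
}

class Mortgage : Loan { }
class CarLoan : Loan { }
class BusinessLoan : Loan { }

static class LoanReports
{
    // Hypothetical helper: yields every active loan from any number of arrays.
    // Array covariance lets Mortgage[], CarLoan[] etc. be passed as Loan[].
    static IEnumerable<Loan> ActiveLoans(params Loan[][] groups)
    {
        foreach (Loan[] group in groups)
            foreach (Loan loan in group)
                if (loan.IsLoanActive) yield return loan;
    }

    // returns the total interest for a year of all active loans
    public static float GetTotalInterestForThisYear(
        Mortgage[] mortgages, CarLoan[] loans1, BusinessLoan[] loans2)
    {
        float total = 0;
        foreach (Loan loan in ActiveLoans(mortgages, loans1, loans2))
            total += loan.Interest * 12;
        return total;
    }

    // returns a list of all active loan amounts
    public static float[] GetAllLoanAmounts(
        Mortgage[] mortgages, CarLoan[] loans1, BusinessLoan[] loans2)
    {
        List<float> amounts = new List<float>();
        foreach (Loan loan in ActiveLoans(mortgages, loans1, loans2))
            amounts.Add(loan.Principal);
        return amounts.ToArray();
    }
}
```

The filtering rule now lives in exactly one place, so a change to what counts as an active loan touches a single line.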

Of course there are dangers to telling developers to reduce the number of lines of code. Remember those days when C programmers would write lines so cryptic it was impossible to tell whether the code was obfuscated or not? Think of 'if' statements that contained multiple assignments, pointer dereferences and increment operators all on one line. Anybody who does this in the name of "code bloat antidote" is missing the point. Reducing the number of lines isn't really about the number of lines - it's about reducing the amount of code. Just because you can write the following to reduce the lines of code:

if ((age2 = age++) > (age3--) ? (age3 = age2++) == 0 : false)

…doesn't mean you should. You're not reducing the code here, you're just putting it all on one line. Plus you're making it more cryptic at the same time. If there's any tenet as useful as reducing the lines of code, it's making your code more readable. Reducing code is important, but never at the cost of clarity. Unfortunately there's no easy way to detect code like this, as it doesn't raise a compiler warning or break any FxCop rules.

You should also consider that shorter, more compact code does not necessarily perform better. Using polymorphism can reduce code but make method calls slower because of extra v-table lookups. However, most of the time the trade-off is justified.

So to conclude let me summarize the points I've covered:

  1. Write less code.
  2. Think more about the code you do write.
  3. Learn APIs thoroughly, to save you 'reinventing the wheel'.
  4. Always consider how readable your code is.

Hope you found this advice useful. So how about the next time you write a function, set yourself a line limit of 20 lines and see what difference it makes to the quality of the code you write.

And if you want to keep track of your "total lines of code" for C# projects, check out my line count utility.

 

Brad Abrams on Designing Inheritance Hierarchies (Summary)

I've just finished watching another interesting lesson from Brad Abrams of Microsoft, this time focusing on designing inheritance hierarchies. I've summarized what I learnt below. Of course you should watch it yourself if you have the time!

1. Dangers of Over-Designing. Version 1 should ALWAYS be as simple as possible, meeting only the minimum of the immediate requirements. The use cases and actual requirements aren't known until Version 2, and it's very easy to get carried away by imaginary requirements and end up with a design so complex that it gets rewritten for Version 2 anyway.

2. Give preference to broad, shallow hierarchies. Try not to go deeper than 3 levels, as deep hierarchies become difficult to maintain and extend.

JW: Personally I find it amusing how the “book smarts“ who learn everything about object orientation from books go straight in and over-design their classes. Sure it's comforting to think you've coded for every possible eventuality, but reality is always quite different. There's nothing quite as discomforting as being bogged down with maintenance while the requirements are piling up. Brad's advice here is priceless.

3. Consider making base classes abstract, to give a clear message that they are not complete and are intended to be extended to provide real implementation.

4. Virtual members are both powerful and dangerous. Powerful because of their extensibility, but dangerous because your code can become more fragile when the true implementation of the methods isn't known. As such, virtual members should be used sparingly.

5. When overriding, don't change the semantics of the member. You should be consistent with the contract defined by the base class.

6. Have a concrete scenario for every virtual member you define: Try to provide an example of when the member would need to be overridden.

7. Ensure that base members calling the virtual will still be able to function reliably with different implementations of the virtual. This may require some defensive coding when calling virtuals in your base classes.

8. Consider making the virtual member an abstract member when the class cannot provide any meaningful implementation of it.

9. When possible, use base classes over interfaces. Base classes let you provide a default implementation, making it easier to sub-class and customize. Interfaces require a total re-implementation making it far more difficult to use.

10. Interfaces also have versioning problems that base classes avoid. An interface is a contract and should not be modified once released. Adding a new method to the interface will break existing classes that implement that interface (as the new method implementation will be missing). Base classes can provide a default implementation, and so existing sub-classes will continue to function correctly across versions.

11. An alternative to providing default implementation is by using aggregation, where a default implementation is either contained or delegated to, and exposed through methods returning contained instances or through interface methods that delegate. This is quite common in the COM world, and provides an interesting solution where multiple inheritance is required. [a similar pattern is the decorator pattern].

12. Interfaces are useful for letting you access discrete aspects of an object's functionality, when you want to hide (or not have to be bound to) the true identity of the object.

13. If you are using interfaces, keep the interfaces very small (1 or 2 members is best). IComparable is a good example.

14. You don't always want interface methods to clutter your implementation type. Consider using private interface method implementations (known as explicit interface implementations). For example, 'int IComparable.CompareTo(object obj)' will hide the method at the class level, but make it visible when the object is cast to the interface.

15. Private interface method implementations help with interface versioning, because an object can implement two different interfaces with the same method signature.
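To make points 13 and 14 concrete, here is a minimal sketch of an explicit interface implementation. The Temperature class is a hypothetical example, not something from Brad's talk. IComparable and its single CompareTo(object) method are the real framework interface.

```csharp
using System;

// Hypothetical type used only to illustrate explicit interface implementation.
class Temperature : IComparable
{
    private readonly double degrees;

    public Temperature(double degrees) { this.degrees = degrees; }

    public double Degrees { get { return degrees; } }

    // Explicit implementation: the method does not appear on Temperature
    // itself, only when the object is accessed through IComparable.
    int IComparable.CompareTo(object obj)
    {
        if (obj == null) return 1;   // by convention, any instance sorts after null
        Temperature other = obj as Temperature;
        if (other == null)
            throw new ArgumentException("Object is not a Temperature", "obj");
        return degrees.CompareTo(other.degrees);
    }
}
```

With this in place, a call such as someTemperature.CompareTo(x) does not compile, but ((IComparable)someTemperature).CompareTo(x) does, and Array.Sort on a Temperature[] still works because it goes through the interface.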

Thanks Brad for the useful tips.

Brad Abrams on Class Members (Summary)

Brad Abrams (MS) has put up another one of his classes on framework design/implementation considerations. In this one he goes into the correct usage of constructors, methods, properties, fields and events.

To save you the time of watching it, or reading the transcript, I've tried to summarize what he said here:

1. Constructors should contain minimal functionality. Just capture the data passed in and be done. Some choose to do lots of work in the constructor to make other operations on the class faster. The problem here is that often you won't use the class at all, and the processing will be wasted.

2. It's ok to throw exceptions in the constructor.

3. Always add a default constructor to avoid versioning issues. Adding another constructor will remove the implicit default constructor, so it's best to add the default constructor explicitly up-front.

4. Provide property alternatives to large constructors. As well as providing parameterized constructors, also provide default constructors with configuration options via properties. Try to use the "drag and drop" metaphor - where you construct the object (drop it on a form) and then set properties later, rather than forcing the developer to set the properties when constructing the object.

5. Ensure methods with the same name do exactly the same thing. When overloading, do not change the behavior of the function with each overload. Name the parameters consistently between overloads. If overloading is used to offer optional parameters, the overload should be doing nothing extra other than providing a default value for the missing parameter.

6. There's no reason to make all overloads virtual. Just make the most complex overload virtual, as the others should delegate to it with default values. The exception is if you want to override the defaults for missing parameters.

7. To help the JITter inline code (for performance): minimize virtual declarations, minimize size of methods and minimize the number of local declarations you have. All of these can stop the just-in-time compiler from optimizing your code.

8. Avoid publicly exposed fields from framework classes. You give up so much control and versioning, it's much easier in the long run to use properties.

9. When defining properties, don't create write-only properties and consider using PropertyChanged events when the properties are on controls.

10. Keep property getters simple. They shouldn't throw exceptions, as this can cause problems in debuggers. If they actually perform some work and/or change the state of the class, it should be a method and not a property getter.

11. Properties should be independent of one another. You should be able to set properties in any order. If a specific order is required, use a method.

12. Avoid properties that return arrays, instead use a method. You shouldn't be providing direct access to internal arrays in your class, so your property should be copying the array. Because it's doing work, it should be a method instead.

13. Use indexers (e.g. this[]) if the backing storage for the class is a collection. Keep the indexer itself simple: string or int parameters.

14. Name events with a verb, e.g. Click, Paint, DrawItem, DropDown.

15. Events should return void. They should follow the signature of sender, event-arguments.

16. Keep event argument parameters strongly typed.

17. Put a try/finally around where you're raising an event because the handler could throw an exception.

18. Use statics where the method can be invoked independently of the class instance, and where instantiation is overkill, e.g. Math. When creating singletons use static readonly.

19. Avoid using ref to return multiple parameters. Instead put the values into a struct and return the struct. When using 'out' in C# parameters, consider the fact that 'out' is not supported in the CLR and ends up being translated to a 'ref' parameter anyway, so the out semantics cannot always be enforced.

20. Always validate arguments passed into public methods. When validating arguments to public methods, throw exceptions derived from ArgumentException.
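Points 5, 6 and 20 fit together naturally, and a small sketch may help. The Greeter class below is a made-up example: the simple overload does nothing except supply a default value, while the most complete overload is the only virtual one and the only place where arguments are validated.

```csharp
using System;

// Hypothetical class illustrating overload delegation and argument validation.
class Greeter
{
    // Simple overload: only fills in the default for the missing parameter.
    public string Greet(string name)
    {
        return Greet(name, "Hello");
    }

    // Most complete overload: the single virtual member. Subclasses override
    // this one and the simpler overload picks up the new behavior for free.
    public virtual string Greet(string name, string greeting)
    {
        // Validate public arguments with ArgumentException-derived types.
        if (name == null) throw new ArgumentNullException("name");
        if (greeting == null) throw new ArgumentNullException("greeting");
        if (name.Length == 0)
            throw new ArgumentException("name must not be empty", "name");

        return greeting + ", " + name + "!";
    }
}
```

Because the parameter is called name in both overloads and the one-argument version adds no logic of its own, the two methods cannot drift apart in behavior.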

 

SolGen: C# Solution Generator v0.1

Bottom Line: Download a free C# solution file generator here, that attempts to turn any C# project file into a buildable solution (sln) file.

Introduction

After getting tired of tweaking solution (.sln) files and project references to create buildable solution files, I decided to write a utility to do this for me. The result is SolGen.

SolGen is a utility that takes any C# project file and generates a buildable Visual Studio solution (.sln file), containing that project and all its dependencies (and their dependencies and so on). It also updates each of the projects' references to point to the projects in the solution. By running this command line tool on the project you end up with a solution that you can instantly build, without having to worry about locating dependencies or updating references manually.

Background

There are lots of issues with solution files in Visual Studio. I've heard a lot of complaints about HintPaths having full or incorrect paths, often where the HintPath points to one particular configuration (Debug) but the build targets Release, causing a mismatch of DLLs. While the projects may exist in the solution, Visual Studio does not always pick up the projects and update references, leaving you to manually remove and re-add any references that show up with warning triangles.

Using this simple tool you can automate the creation of a very straightforward build process, without having to worry about creating a build script yourself. The utility intelligently locates projects and effectively creates a build script for you. Once the solution is created, just run devenv.exe, passing in your file and "/build" to build, and your build procedure is complete. You can also specify whether you want a debug or release build, and can even schedule it as a process and have it email the result of the build every night.

How It Works

The SolGen utility works by parsing the content of the main project, and any projects under the specified folder. It collects information on the GUID, AssemblyName and references of each project file. It then uses this to determine, by AssemblyName, which references are actually pointing to which projects. It then updates the references to point to the project GUID rather than the HintPath. The GUID is required for the project to be recognized by references when it is part of the solution.

The utility then determines the dependencies of the project and generates a solution of all the projects required to build the main output.
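The dependency step can be pictured with a short sketch. This is not SolGen's actual source: ProjectInfo, SolutionPlanner and ProjectsToBuild are hypothetical names, the project data is assumed to have been read from the project files already, and the code assumes .NET 3.5 or later for HashSet. It matches references to projects by assembly name and walks the dependencies until nothing new turns up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical summary of what was read from one project file.
class ProjectInfo
{
    public string ProjectGuid;
    public string AssemblyName;
    public List<string> ReferencedAssemblies = new List<string>();
}

static class SolutionPlanner
{
    // Returns the main project plus every discovered project it depends on,
    // directly or indirectly. References are matched to projects by assembly name.
    public static List<ProjectInfo> ProjectsToBuild(
        ProjectInfo main, IEnumerable<ProjectInfo> discovered)
    {
        Dictionary<string, ProjectInfo> byAssembly =
            new Dictionary<string, ProjectInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (ProjectInfo project in discovered)
            byAssembly[project.AssemblyName] = project;

        List<ProjectInfo> result = new List<ProjectInfo>();
        HashSet<ProjectInfo> seen = new HashSet<ProjectInfo>();
        Stack<ProjectInfo> pending = new Stack<ProjectInfo>();
        pending.Push(main);

        while (pending.Count > 0)
        {
            ProjectInfo current = pending.Pop();
            if (!seen.Add(current)) continue;   // already handled, also guards against cycles
            result.Add(current);

            foreach (string name in current.ReferencedAssemblies)
            {
                ProjectInfo dependency;
                // References with no matching project (framework DLLs, for
                // example) are left alone.
                if (byAssembly.TryGetValue(name, out dependency))
                    pending.Push(dependency);
            }
        }
        return result;
    }
}
```

Projects found under the folder that the main project never reaches are simply left out of the result, which is what keeps the generated solution down to what is needed to build the main output.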

Usage Guidelines

SolGen is a command-line tool, so you should invoke it from the DOS prompt. When you run SolGen you will be presented with a list of options for running the utility.

Use: solgen projectfile.csproj [solfile.sln c:\folder]

The most important parameter you will supply is the project file. Given only a project file, it can generate a solution with the same name as the project (but the SLN extension), and look for all C# projects in the parent folder downwards. If you want to override the solut