Sriram Krishnan (Moved to http://www.sriramkrishnan.com/blog)

Search. Usability. Virtual machines.Geek stuff

<August 2008>
SuMoTuWeThFrSa
272829303112
3456789
10111213141516
17181920212223
24252627282930
31123456


Navigation

Subscriptions

News

Link blog
Technorati Profile
The Blogs I read
Creative Commons Licence
This work is licensed under a Creative Commons License.


Thursday, October 14, 2004 - Posts

Review of Google desktop search

From the moment I read the Slashdot post, I've been playing with the Google desktop search utility (http://desktop.google.com).  Even though it's in beta, I'm amazed at the amount of sheen and polish. Make no mistake-  this is a typical Google piece of work. Thoughtful, usable and takes care of all the small details. Let's get in, shall we?

Download and installation

I'm amazed at the size of the download - its only 440 Kb. Its half the size of the Google toolbar! If this had been a Microsoft utility, it would have been some 20 MB with some 5 dependencies  and a service pack required. Download went pretty smoothly and quickly - though I downloaded it at the peak slashdotting moment. At that time, there must have been thousands of concurrent downloads streaming out from Google's (or Akamai's?) servers.

Installation went pretty smoothly for me. One annoying thing is that it insists on closing all your open Office, IE and Firefox windows(probably the first time I've seen an application ask to shut down Firefox). It didn't shutdown Outlook properly for me as my Outlook was in the throes of some operation and I had to manually give Outlook the boot.

After installing, it runs in your notification area (or system tray, if you prefer to call it that).

Some people have reported issues with the installation - in particular, some versions of the ISA client conflict with it. Google says this is a known issue in the knowledge base. Also, this utility needs 1 GB of free space in your system drive to install - this is where it stores its index.

First time indexing

Once it has finished installing, it starts building its index for the first time. Here's a problem - I've been letting it index for over 12 hours and it still hasn't finished for me. This is even after leaving the computer on overnight. The help docs say that it indexes when the system is idle - I wish there were some option which said "Index now and CPU usage be damned!". I want it to index but I'm not able to force it to which is a tad frustrating.

In its current release, GDS indexes the following items

- Outlook/ Outlook Express email

- Office documents/text files

- AOL chat

- Web history/cache

I like the way they integrate into Outlook - very unobtrusive and don't make much ado.Typical Google ,actually. Its surprising that they don't do MSN chat archives as its in a simple XML format too. Yahoo chat might be another story altogether as they encrypt their chat history. And all your Firefox users might be a tad disappointed - this thing doesn't see your Firefox history yet. But I'm pretty sure this'll be fixed in a future version.

Search

Search is done in a couple of ways. I have to admit that I never thought Google would take this approach - I was so stuck on clunky search UIs that I forgot the KISS principle. For those of you who don't know this yet, here's the kicker - this thing integrates with your normal Google usage.

When you search for a term in Google, you see the results from the local file system at the top - which you can then expand, drill down on,etc.

Search is a mixed experience - most of it is very good and very relevant as you would expect from Google. They seem to give a lot of importance to the date (as they don't have Pagrerank on the local file system).

I like how fast search is. I find myself viewing the cached version of my emails rather than wait for the elephantine Outlook to be roused from its slumber to view the mail. Oh yes- Google Desktop Search does the same kind of caching as the web version - which is a rather cool feature.

With that description out of the way, here are my thoughts

Things I like:

- The fact that it has Google written all over it.Unlike X1 or Lookout which are heavy and clunky, this is simplicity personified. And it fast. Really fast. But then, what else would you expect from Google?

- It handles all the common file formats with Google promising to throw in more. MSN Messenger not being recognized is a bit of a pain but I'm sure this will be fixed soon And I'd love PDF searching abilities too.

- It integrates right into your Google search. You don't have to do anything special to use it. I suspect Google wants users to think 'Let's Google it' whenever they want to search for anything, be it stuff on the web, their mail,their documents, their misplaced keys. Whatever you want to find in the future, Google is probably what you need to use.

- It does Outlook search better than Lookout can. And since it caches the text, you can now search your Outlook mail without Outlook open. And read it without Outlook open as well. Too cool!

- When it finds any result from your IE cache, it displays a thumbnail of the webpage right next to the search result. I wonder how they do this -my guess is that they plug into the Explorer thumbnail interfaces and get an image generated for them that way. And just to put the Google icing on the cake, the generated image is a PNG too

- It is flexible just like the traditional Google search. You have options for searching for specific filetypes by using the filetype: syntax and you can choose which folders you want left alone. Before privacy freaks get all worked up about Google indexing their life - relax. You can turn off specific file formats, paths,etc.

- And my favorite thing. It just works. And it doesn't get in your way. For those of you who are into UI designing, you would know that this is a huge compliment to their UI engineers.

Things I don't like that much

Well, the software is in beta and Google will probably address most of these issues before the release.

- I want an option of installing it some other drive as most people don't have much space on their system drive.

[Update: You can change where it stores its index by storing the data_dir variable stored in the registry]

- I really think that searching through web history should be better. I think the interface for this requires some thinking - for I usually get too many web history results cluttering up the first page of my search. I'm not sure what the exact solution would be - but this require a slight bit of work

- I would like the ability to add filetypes. Taking this idea along, it would be perfect if Google released a plug-in API for this(I'm sure they use one internally). For right now, I have to use plain old grep to search through all my source code in c# , Vb or Python which is a huge pain. Plugins and a SDK would be great.

Final thoughts

Rarely have I spent several hours on my comp just searching, but this is what GDS has made me do. One thing common to all Google software is that you feel 'good' on using it - be it Gmail or Picasa(Orkut is a black sheep though). I searched for a friend's name and found the name in some files which I had forgotten about. If I were Microsoft I would be worried - for when Microsoft were chasing Netscape, they were chasing an almost stationary target. But Google keeps pushing the bar higher and higher. And I'm not sure why they bought Lookout - but frankly, Lookout can't hold a candle to this.

The search wars have officially begun - and Google has brought the battle right to Microsoft's home turf - the desktop. With people like AOL and Yahoo also developing their own search software, finding stuff is gonna become a lot of fun.

As a friend of mine (http://myxp.blogspot.com) put it last night - "Google desktop would probably make that search dog run away with its tail between its legs".

And its about time too.

Scoble - are you sure it isn't too late for your search champs?

posted Thursday, October 14, 2004 5:29 PM by sriram

Functional programming in Python

Over the past 2 months, I've got into several arguments with close friends of mine on functional languages - especially Lisp. It is quite fun to engage in these debates - though you know they're utterly pointless :-) Usually, I'd wind up defending my favourite language by saying that you could pull off most functional-programming paradigms in Python.

Well - when you run your mouth, you must be prepared to actually know what you're talking about. After some digging around, I was amazed at how easily I could easily use Python to live life the FP-way, so to speak.

Warning: If you're new to functional programming,the some parts of this post could really make your head hurt. If you're an experienced FP or Python programmer, do be warned that I do over-simplify things somewhat in certain parts.

With that disclaimer, out of the way, let's dig in. I'm running this on a alpha release of Python 2.4 but Python 2.3 should be good enough to run this code. If you're running on earlier versions, you may need to do some 'futures' importing.

The laughably simple stuff

One of the reasons Paul Graham gives for using 'car' and 'cdr' instead of 'first' and 'rest' is that they line up properly in your editor. Btw, 'car' is a function which returns the first element of a list - and you can consider a list to be an array which grows automagically and doesn't worry about what it contains. 'cdr' is a function which takes a list and returns everything but the first element. So, putting together 'car' and 'cdr' give you the entire list.

So just to spite those people who turn up their noses at Python for not having 'car' and 'cdr' (you know who you are), let's write a simple version which does no error checking-  but does show off Python's slice operator.

def car(lst):
    return lst[0]

def cdr(lst):
    return lst[1:]

print car( [1,2,3,4,5,6,7,8] )   -->> 1

print cdr( [1,2,3,4,5,6,7,8] )  -->> [2,3,4,5,6,7,8]

Lets get all Pythony with higher order functions

I can hear all the FP-theorists laughing at this."Show us the functions, baby", they scream out [Note: As far as I know, FP theorists don't go about about screaming 'baby'. They usually argue about arcane stuff like monads,macros and other stuff you really don't want to know about].

Functional programming gets its name from the fact that it gives so much importance to functions.Duh!  [Note: This doesn't seem to hold true in all cases. What is so sharp about C# and schemy about Scheme.Hmm]. Therefore, in FP languages, you have the ability to pass functions as parameters and do weird stuff with them. This is not really the same as passing function pointers - FP lets you do a lot more as you'll soon see.

What we're going to do here is already present inside Python as the 'filter' function - but I thought I'd do it again to show you how simple it is. The goal of this exercise to write something like Common Lisp's 'remove-if' function. It works like this - you pass it a function taking one parameter and returning a boolean(we'll call this the predicate, just to sound cool) . You also pass the function a list. Now this function goes on its merry way, calling the function for every element in the list - and here's the kicker - only keeping the elements that the predicate returns false for. Or in other words, read it in English - Remove if this function returns true.

So, let's do this in Python. As functions are first-class citizens in Python (almost), you can pass a function in a parameter without any extra work. So let's implement remove_if - but this time showing off Python's list comprehensions.

def remove_if(predicate, lst):

    return [elem for elem in lst if not predicate(elem)]

print remove_if(lambda x:x % 2, [1,2,3,4,5,6,7,8])

Now, there's quite an amount of new stuff in there, so I'll explain. First, we pass a predicate (the function we'll be calling on every element on the list) and the list itself. We also pass the list itself (the lst parameter). Now, we make use of Python's list comprehensions.

The next line is a shorthand way of writing something like

        return_lst = []
        for elem in lst:
 
         if predicate(elem):
              return_lst.append(elem)
  
        return return_lst

Now, we pass an anonymous function to it which checks whether a number is odd, and then a list. The output of that print call would be 1,3,5,7

This is a pretty trivial example - so let's try something more complicated. I stepped right into enemy territory and took the first example I could find for higher order procedures from "Structure and Interpretation of Computer Programs" . This is the same sum of a series program that you find on page 54. It generates the sum of a series given the predicate that can generate the terms and an iterator-like predicate which generates the next term in the series.

def sum_generic(term,a,next,b):

    if a > b:
        return 0
    else:
        return term(a) + sum_generic(term,next(a),next,b)

Now, let's use this sum_generic function we've defined to do a cubic series - a sum of all the cubes from 1 to 10. To do that, we just have to say

print sum_generic(lambda x:x*x*x, 1, lambda x:1 + x, 10)
-->> 3025

Here the first lambda function generates the terms, and the second one increments the terms by one.This is nothing but 1^3 + 2^3 +... + 10^3. Nice, isn't it?

Returning those pesky lambdas

The Python-haters in the crowd are getting worried now. Yet some of them sneer "You're not truly FP until you can return functions based on functions you take as parameters". Reminds me of those WWF (now WWE) cards I used to collect as a kid. You were not a true fun untill you had *all* of them. Oh well...

Here's where those among you who haven't seem much of the lambda-colored stuff might find it tough-going. Functions being modified and returned is not commonly seen in the imperative world. I have to admit I was a bit confused with this aspect. In fact, I almost considered dropping the whole thing when I saw the example given for this in the "Python cookbook" which involves a wrapper class. But then, a couple of pages later, I saw something that made me stop dead in my tracks. Stop dead as in I've-been-trampled-over-by-a-dinosaur dead.Shocked. And here's why.

 Python has lexical scoping.

Read that. Read that again and let it sink into your veins.I can imagine Aarthi gaping in shock at this. As she knows that this means that Python *can* do a lot of things she thought it wouldn't be able to do. And we'll see exactly what those things are.

For those of you who are wondering what lexical scoping(as opposed to dynamic scoping) is, it's like this. You bind variables to scopes based on how they are at compile-time (or at tokenizing-time) rather than how they are when they are getting executed. It may not be immediately apparent why this is a good thing...but trust me, its a very good thing. For this put together with nested functions support in Python means we can weave some FP magic.

First of all, let's do something simple. I'm tired of writing 'print' after every function call. Wouldn't it be nice if the function printed its output as well as returning it? Pretty easy, you'd say - just add a print statement. But what if its a function you didn't write? And what if you want to do it for a lot of functions? Obviously, writing so many wrapper functions which just print the output for you is not going to be fun.

What you need is some way of modifying the function itself  ( and here, we're talking about any function..whether you've written it..or someone else)- so that it not only does what it is supposed to do - but *also* prints its output before returning it. Think about that - you have to *modify* a function.  Here's how I've done it in Python (you need 2.3 for this - or do from __future__ import nested_scopes in earlier versions).

def print_func(predicate):
   
    def func(*args):
       result =predicate(*args)
       print result
       return result
   
    return func

Looks pretty simple, doesn't it?  We take a predicate as a parameter - and then call it with the arguments which will be specified later. And before returning the result, we make sure we print it out. After having done all this, we return this function. This is a very good example of how closures are cool - we are generating a function at runtime based on various parameters and then returning it.

def add(x,y):
    return x+y

y=print_func(add)

y(5,8)

-->> 13

Now, we can pass any function to it and we'll get a spanking new function generated for us.In this case, we pass it  an add function and we get back a function which not only sums its parameters - but also prints the result.

[Note: My code doesn't take care of named parameters - as I wanted to keep it pretty simple. It should be trivial to add support for it]

At this point , Kaushik(http://www.dotnetjunkies.com/weblog/kaushik) might argue that the same would be possible in .NET by modifying the IL at runtime or in C/C++ by changing the program at runtime. But even he,being the C/C++ hacker he is, might find it a tad difficult to do it in 6 lines of elegant code :-) And he definitely won't want to write the following code in C or C++

[Update: And just to prove once and for all that he is god when it comes to C++ - Kaushik actually took me up on my challenge and wrote up a post at http://dotnetjunkies.com/WebLog/kaushik/archive/2004/10/15/28624.aspx]

Currying

We saw in the previous example how to take a function as a parameter and return a function which calls that predicate. Generalizing on that topic, we get the concept of 'currying'. Yes - I found the name funny too. I would strongly suggest you do a Google search and read up the various articles which do a better job of explaining what currying is. I like to think of currying as calling functions where someone else has already done the trouble of defining some of the parameters for you. To illustrate this, let me steal an example from the Python cookbook.

Think about how you'd write a 'multiply' function. You'd probably write something like this

def multiply(operand1, operand2):

    return operand1 * operand2

Now, let's say you are a sadistic control freak and instead of giving your users the ability to multiply any two numbers, you only give them the ability to double them.  So you'd go about writing a wrapper function,right? You could use currying to accomplish your task. Wouldn't it be cool if you could just say


double = curry(operator.mul, 2)
triple = curry(operator.mul, 3)

double(6)

-->> 12

Now, you could go about adding any number of functions without having to right wrapper functions for each one. Now, let's see how to implement a simplistic 'curry' - which doesn't do named parameters.

def curry(*args):

     def callit(*moreargs):

          return args[0](*(args[1:]+moreargs))
  
return callit

What we're doing is calling the first parameter we get as a function with the rest of the parameters as arguments. We package this in a function and return it [Note: All that * stuff corresponds to variable-length arguments in other languages]

I think I'll stop now - but have fun exploring Python. Its been a long time since I've found a programming language so much fun :-)

 

 


       

 

 

posted Thursday, October 14, 2004 12:00 AM by sriram




Powered by Dot Net Junkies, by Telligent Systems