Place a context-sensitive search on your website or blog

Google is making it too easy for the modern web developer. I remember writing a brute-force search engine in Perl for a small web site about 11 years ago. It was slow and resource-intensive, and it would not have scaled to a site of any real size.

Skip forward a bit. Google now provides the ability to add a custom search to your site at no cost. Unless yours is for a non-profit organization, displaying ads is mandatory, but you do have the option to share in the profit (or pay to have the ads removed).

Google Custom Search is really quite flexible. You can tell it which sites to include or exclude. You can have it boost the ranking of a given set of sites or simply filter out all the sites that don't meet your criteria. Custom Search even lets you generate these rules on the fly so that results can be tailored to a particular user or a particular page of the site.

This article is not intended to be an in-depth tutorial on Custom Search. For documentation, see here: http://www.google.com/coop/docs/cse/

The definition of your search engine can either be hosted by Google or hosted by you (the latter is termed a Linked CSE). For context-sensitive search, yours must be a Linked CSE.

Custom Search Engines (CSEs) are defined by a simple XML specification. This specification may be generated dynamically for interesting results. Below is a very simple example of a dynamically generated CSE which parses all of the links out of the referring page in order to include only those sites within the search results. The code snippet is written in PHP.

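<?php
// Serve the CSE specification as XML.
header('Content-type: text/xml');

// Page whose links define the search scope: the query string if one was
// given, otherwise the referring page (either may be missing).
$url = !empty($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : '';
if (!$url && !empty($_SERVER['HTTP_REFERER'])) $url = $_SERVER['HTTP_REFERER'];

echo <<<CSE
<?xml version="1.0" encoding="UTF-8"?>
<GoogleCustomizations>
  <CustomSearchEngine keywords="" language="en">
    <Title>TECHHEAD</Title>
    <Description>The search for All Things Tech</Description>
    <Context>
      <BackgroundLabels>
        <Label name="techhead_site" mode="FILTER"/>
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>

CSE;

if ($url) {
    // Decode first so that an already-encoded URL is not encoded twice.
    $url = urlencode(urldecode($url));
    // Ask Google to build annotations from the links on that page.
    // Inside an XML attribute, "&" has to be written as "&amp;".
    echo <<<CSE
  <Include type="Annotations"
    href="http://www.google.com/cse/tools/makeannotations?url=$url&amp;label=techhead_site"/>

CSE;
}

// Static annotation: this site is always part of the results.
echo <<<CSE
  <Annotations>
    <Annotation about="*.techhead.biz/*">
      <Label name="techhead_site"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>
CSE;
?>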

Notice how certain annotations are included dynamically based on $url (the script's query string if there is one, otherwise its HTTP referer). Annotations match URL patterns with labels. The CSE is set to FILTER out any site that isn't labeled "techhead_site", so the annotations define what is considered for inclusion in the search results.
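
The dynamic part is really just string building, and the only subtle bits are the two layers of escaping: the page URL has to be URL-encoded as a query parameter, and the finished href has to be XML-escaped as an attribute value. Here is a rough Python sketch of that one step. The endpoint and label are copied from the listing above; the function name include_element is made up for illustration, and nothing here checks whether Google still serves that endpoint.

```python
from urllib.parse import quote_plus, unquote_plus
from xml.sax.saxutils import quoteattr

# Endpoint copied from the PHP listing; its availability is not assumed.
MAKE_ANNOTATIONS = "http://www.google.com/cse/tools/makeannotations"


def include_element(page_url, label="techhead_site"):
    """Build an <Include> element requesting annotations for page_url.

    include_element is a hypothetical helper, not part of any Google API.
    """
    # Decode, then re-encode, so an already-encoded URL is not encoded twice.
    encoded = quote_plus(unquote_plus(page_url))
    href = f"{MAKE_ANNOTATIONS}?url={encoded}&label={label}"
    # quoteattr wraps the value in quotes and turns "&" into "&amp;".
    return f'<Include type="Annotations" href={quoteattr(href)}/>'
```

Calling include_element("http://blog.techhead.biz/") yields one well-formed element whose href carries the encoded page URL and the label.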

Now we have a dynamic CSE definition. To use it, paste the appropriate HTML code snippets (http://www.google.com/coop/docs/cse/cref.html) into your web page or blog template. You will need to replace the value of the hidden form parameter "cref" with the URL of your PHP program. For a more concrete example, simply look at the page source of this blog at http://blog.techhead.biz

For a much better Google Custom Search tutorial, see here: http://www.free-your-mind.info/ Then come back to fill in the blanks and make your search engine context-sensitive.

Building a true object-oriented system for Common Lisp

While the title of this post may be seen as inflammatory, allow me to elaborate before dashing off to the comment link and turning your flamethrower loose.

As you have probably been taught, Dr. Alan Kay coined the term "object oriented" (OO). His term, his rules (which can be found here: http://www.purl.org/stefan_ram/pub/doc_kay_oop_en). The Common Lisp Object System (CLOS) does not fit the bill to be called object oriented, although it still arrives at many of the benefits of object-oriented principles (and perhaps more; I will expound on this later). But as Dr. Kay mentions in the above link, OO can be done in Lisp.

I have been exploring programming languages a great deal as of late. One language in particular, Erlang, has caught my attention because of its much-hyped nothing-is-shared model of concurrency. Erlang achieves scalability by breaking tasks into many lightweight processes that communicate with each other by sending messages. (Hmm. Where have I seen that before?) There was something very OO looking about some of the code samples I saw. Only instead of method names being "invoked" on an object, a process listens for messages of different types, and those message types are represented by "atoms", which are essentially named constants that carry some meaning to the program. Communication between processes in Erlang fits with Dr. Kay's requirements for an OO language (messaging, late binding, etc.). It also fits the Actor model.

It can be quite confusing to distinguish between the Actor model and that of OO programming. The reason? (And this will likely be a point of some dispute.) There isn't really any difference. The term "Actor model", I believe, just had to be invented because the term "object oriented" had come to represent something other than the original meaning. OO programming has now become synonymous with stack-based models where messages are passed up and down the stack and "messages" are really just function calls conceptualized as messages. Actors on the other hand are discrete (messages and responses don't traverse a call stack) and are therefore inherently concurrent. Erlang processes and web services are good examples of the Actor model.
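
To make the distinction concrete, here is a minimal actor sketched in Python rather than Erlang. It is only an illustration: the class name CounterActor and the message tags "add", "get" and "stop" are invented for the example, plain strings stand in for atoms, and a thread with a queue stands in for a lightweight process with a mailbox. The point is that send returns immediately and the reply comes back as another message instead of traveling up a call stack.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, and one thread draining it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, tag, *payload):
        """Asynchronous send: enqueue the message and return at once."""
        self._mailbox.put((tag, payload))

    def join(self, timeout=None):
        """Wait for the actor's thread to finish after a "stop" message."""
        self._thread.join(timeout)

    def _run(self):
        # Messages are handled one at a time, in arrival order.
        while True:
            tag, payload = self._mailbox.get()
            if tag == "add":
                self._count += payload[0]
            elif tag == "get":
                # The reply is itself a message, put on the caller's queue.
                payload[0].put(self._count)
            elif tag == "stop":
                return
            # Unknown tags are silently dropped in this sketch.
```

A caller would send ("add", 2) and ("add", 40), then send ("get", reply_queue) and read 42 from reply_queue once the actor gets around to answering.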

Now back to CL. CLOS uses multi-methods instead of message passing, so it is not strictly OO. However, as promised earlier, I'm going to stroke the CL users a little bit here and propose that perhaps the CLOS model is more appropriate to be called "object oriented" than Dr. Kay's model. Why? Because OO programming is (supposed to be) all about the encapsulation of behaviors, and objects don't do anything. But functions do.

What do you mean, "An object can't do anything"? Well, in real life, an object (I think of a rock or a piece of wood) does not do anything. A rock cannot skip itself, change its location, size, or color, or tell you any of these things about itself if you send it a message. A rock is simply a rock. If you think in procedural terms, it would be a data structure.

A data structure? Wha? I thought that "everything is an object"? OO languages assign a sort of anthropomorphic view to objects (objects are people too), which makes perfect sense if you look at the design principles behind Smalltalk (the language Dr. Kay and his colleagues created to go along with the term). One of the major design goals was to create a human interface with the machine, to allow the programmer to think in an (arguably) more human way.

Along those lines (thinking like a human), I like the term Actor better than Object because it can be explained in a way that makes sense. A rock cannot tell you anything about itself. However, an actor portraying a rock can. (I'm not a rock. I just play one on TV.) This way, everything can be an actor and the model stays coherent. There is no data, only actors that portray it. (Again, refer to Dr. Kay's letter above, where he talks about dataless programming.)

This makes sense to me as I often equate programming language models to models of the universe. But the question remains how to best model the universe. What is the essential building block of the universe? Is there one?

So what is the building block of a program? An object? And what about not only modeling what an object can do but what can be done to it? What rules? What forces apply?

Enter functions. In CLOS, a generic function (a group of methods, often called a multi-method) defines a behavior for an object. And these behaviors govern object interactions. Just as a rock has no knowledge of everything that can be done to it, the behaviors do not belong explicitly to the rock, but rather to the context in which the rock is to be used.
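
For readers who have not met multi-methods, the idea can be sketched in a few lines of Python. This is a simplified stand-in, not CLOS: the names defmethod and interact and the Thing, Rock and Pond classes are invented for the example, and the lookup only approximates CLOS method precedence by preferring the most specific match on the first argument, then on the second. What it does show is that the behavior lives in a function chosen by the types of both arguments, not inside either object.

```python
class Thing:
    pass


class Rock(Thing):
    pass


class Pond(Thing):
    pass


_methods = {}  # maps (type_a, type_b) to an implementation


def defmethod(type_a, type_b):
    """Register the decorated function for one pair of argument types."""
    def register(fn):
        _methods[(type_a, type_b)] = fn
        return fn
    return register


def interact(a, b):
    """Generic function: dispatch on the types of both arguments."""
    for cls_a in type(a).__mro__:        # most specific class first
        for cls_b in type(b).__mro__:
            fn = _methods.get((cls_a, cls_b))
            if fn is not None:
                return fn(a, b)
    raise TypeError("no applicable method")


@defmethod(Rock, Pond)
def _(rock, pond):
    return "the rock skips across the pond"


@defmethod(Thing, Thing)
def _(a, b):
    return "nothing happens"
```

Here interact(Rock(), Pond()) picks the specific method, while interact(Pond(), Rock()) falls back to the general one, and neither class knows anything about skipping.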

And this model is more coherent for "objects" (as I described them earlier). But for our anthropomorphic figures, objects that DO, it makes more sense to just ask them (as in the messaging model).

Most modern programming languages have lost coherency in their model. So when I ask what the building block of a program is, it's not simply rhetoric. The answer will, of course, vary by model, but then is it even possible to introduce a model where there is a single coherent answer? Well, of course it is possible. But perhaps a better question should be, is it practical? Is it a viable way to model our universe?

The question came to me when I realized that an object could be implemented as a function and vice versa, so it's sort of a chicken-and-egg problem.

In some languages (one of them is CL), a function may be curried with a particular parameter or live within an enclosing scope. These functions are essentially just objects as we know them. Here's a short example:

;; private-var is reachable only through the closures stored below.
(let* ((private-var 2)
       (method-dictionary
         `((set . ,(lambda (x) (setf private-var x)))
           (+ . ,(lambda (x) (+ x private-var)))
           (- . ,(lambda (x) (- x private-var))))))
  ;; "Send a message": look the name up in the dictionary and apply it.
  (defun simple-object (method-name &rest args)
    (apply (cdr (assoc method-name method-dictionary)) args)))

(simple-object '+ 2) ; equals 4
(simple-object 'set 40)
(simple-object '+ 2) ; equals 42

A whole object system can be created using such closures. In fact, that's exactly what I've done. It may not have been the most practical exercise. (I'm certainly not advocating using it over CLOS.) But if you are interested in language design, as I am, you may find it enlightening. It is prototype based, and state is kept private. You can find it here or on Wikipedia as YACLOS, so feel free to improve on the code if you are so inclined.
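
To give a flavor of what "prototype based, with private state" can mean when built from closures, here is a small sketch in Python. It is emphatically not YACLOS and makes no claim about how that code works; make_object, the "set" message and the internal "__lookup__" message are all invented for the illustration. Each object is just a function closing over its own slot table, and a slot that is missing locally is looked up in the prototype while the original receiver is kept as self.

```python
def make_object(prototype=None, **slots):
    """Return an object represented as a message-dispatching closure."""
    state = dict(slots)  # private: reachable only through obj below

    def lookup(name):
        if name in state:
            return state[name]
        if prototype is not None:
            # Delegate the lookup (not the call) to the prototype.
            return prototype("__lookup__", name)
        raise AttributeError(name)

    def obj(message, *args):
        if message == "__lookup__":
            return lookup(*args)
        if message == "set":
            name, value = args
            state[name] = value
            return value
        slot = lookup(message)
        # Simplification: any callable slot is treated as a method and is
        # passed the original receiver; everything else is plain data.
        return slot(obj, *args) if callable(slot) else slot

    return obj


point = make_object(x=1, y=2, total=lambda self: self("x") + self("y"))
child = make_object(prototype=point, x=40)
```

With these definitions, child("total") finds the method on the prototype but reads x from the child and y from the prototype, and a later child("set", "y", 0) changes only the child.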

Beust Coding Challenge

Once again I am late to the party. Cedric posted a coding challenge on his blog. It looks like Crazy Bob took the prize for most creative (and fastest) implementation. I would like to think that I could have come up with a similar solution, but then again, I also like to think that I could look like Schwarzenegger if I ever started working out.

However, Cedric posted a follow-up, “Port Bob's code to your language.” Bob’s code was in Java, and since I am a Java programmer, that sort of rules out a port. But I was able to make a few performance enhancements. Instead of a doubly linked list, I used a singly linked list and thus was able to avoid a few unnecessary assignments and eliminate a few checks. In addition, once the max has been found, I use a Throwable to unwind the stack. On my Intel MacBook, I see performance improvements of 10-20%. The code can be found here: ModifiedBeustSequence. Or for the comparison, see here: CompareBeustSequence.
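
The stack-unwinding trick is easy to show in isolation. The sketch below is in Python, not Java, and it is neither Bob's code nor my modified version: it assumes the challenge asks for the numbers from 1 up to some max whose digits do not repeat (an assumption, since the rules are not restated here), and it tracks used digits with a bitmask where the real code uses a linked list. The names Done and unique_digit_numbers are made up. Because the numbers are generated in increasing order, the first value past the limit means the search is over, and raising an exception abandons every pending recursive call at once.

```python
class Done(Exception):
    """Raised to abandon the whole search once the limit is passed."""


def unique_digit_numbers(limit):
    """Return, in increasing order, the numbers in 1..limit with no repeated digit."""
    found = []

    def extend(value, used, remaining):
        # value: digits chosen so far; used: bitmask of digits already taken;
        # remaining: how many digits still have to be appended.
        if remaining == 0:
            if value > limit:
                raise Done  # unwinds all recursive frames in one step
            found.append(value)
            return
        for digit in range(10):
            if digit == 0 and value == 0:
                continue  # no leading zero
            if not used & (1 << digit):
                extend(value * 10 + digit, used | (1 << digit), remaining - 1)

    try:
        for length in range(1, 11):  # at most 10 distinct digits
            extend(0, 0, length)
    except Done:
        pass
    return found
```

Without the exception, each level of the recursion would need its own "are we finished?" check on the way back up, which is the kind of bookkeeping the Throwable lets the Java version skip.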

But sticking to the assignment, I did do a straight port of my modified version into Common Lisp, which I have been toying with lately (the port can be found here: beust.lisp). I’m sure that any respectable Lisp programmer could come up with a much more Lisp-like solution. If you go to the trouble, I would love to see it.

The Groovy Programming Language

Long before Guice, there was PicoContainer. I used to lurk on the mailing lists of PicoContainer and the Avalon Framework project back in the day when Inversion of Control (IoC) was a relatively hot topic. (In fact, if you Google “Jonathan Hawkes” at this time, one of the very few mentions of me is one of my very few posts on the Avalon list.) However, this article is not intended to be an introduction to IoC.

PicoContainer is how I discovered Groovy.

Ant was in its heyday. XML was still somewhat of a buzzword (if it is not now). The folks at the NanoContainer project (PicoContainer’s big brother) made the (gasp) startling realization that all of the configuration files required by Avalon and the Spring framework were actually code. XML configuration files can be large and unwieldy. Learning all of the correct tags and attributes can be a more daunting task for a programmer than simply learning a new API. So why not just write configuration code in… well… code?

One reason is that it’s nice to be able to change configuration on the fly, or at least without a recompile. This is where Groovy comes in.

Groovy is a language for the Java VM. To call it a scripting language would be somewhat of a misnomer. Groovy produces Java bytecode. It can call Java classes and be called by Java classes. However, as with a scripting language, you can change the source and run it again without a separate build step. (Unless you want your vanilla Java code to call it in a type-safe manner.)

Groovy essentially extends the syntax of Java. In most cases, valid Java code will be valid Groovy code. However, like a scripting language, Groovy does not force you to know the type of an object at “compile time”. It supports closures and lots of other little goodies.

Go check it out. If you are a Java programmer, you owe it to yourself.

http://groovy.codehaus.org

What is a TECHHEAD?

“When I was a child, I was quite adept at taking things apart. Putting them back together on the other hand...”

This is a story I casually tell clients as I have their notebook computer splayed out all over the room. They think that I am teasing them, and I am in part, but as with most jesting, there lies a kernel of truth. And the truth is, in my life I have taken apart many more things than I have put back together, caused myself more problems than I have fixed, and started more projects than I have finished. I find that learning is bold, messy, and full of unforeseen consequences. Mine is a tale of learning.

My name is Jonathan Hawkes, and I am a TECHHEAD.

I began my odyssey into computer programming when I was seven years old. I started out by writing simple games in BASIC on my friend’s Commodore 64. We would spend the better hours of the day punching away at the integrated keyboard, testing, modifying and testing again. We would draw out sprites onto graph paper, convert our makeshift bitmap into hexadecimal, input the figures into our program and enjoy the fruits of our hard-earned artistic labor. Then long after sundown, we would give a simple salute, power down the machine, and say a solemn goodbye to our wasted day’s effort. (At that time, my friend had no floppy drive.)

The subsequent years passed in similar fashion. Games, robots, lasers, etc. — Some were built; all were taken apart; few survived. BBSs were the primary mode of communication. L.O.R.D. and TradeWars rocked. Apogee was the coolest company ever.

Then Al Gore said, “Let there be internet.” And there was internet. The local college had a dial-up gateway that would allow terminal-based access to Gopher, Usenet and the WWW (through the Lynx browser). Way cool. I would page through binary Usenet groups while capturing output to a text file within HyperTerminal. I wrote a UUEncoder/Decoder in C and had instant access to a far larger range of games and other content than any BBS of which I had been a member.

More years passed. My family subscribed to a “real” dial-up account. It came with a finite amount of hosted web space and CGI access. Bingo. I soon discovered that Perl was more suited to the challenge of building web applications than C, and my first Perl project was a sort of simple wiki (circa 1995) called the ScrawlWall (which was just a text box that anyone could modify and save to leave a short note or ASCII art). That same year, I also developed an AJAX precursor, the JavaScript ScrawlWall, which could be saved and loaded without ever leaving the page. Before shutting the project down in 1998, I hosted over 8,000 ScrawlWalls.

After that began my career. As an independent contractor, I developed numerous web sites as well as a (then impressive) cross-browser DHTML library called Pane DOM. This was back when Netscape Navigator only allowed manipulation of content through “layers” and Internet Explorer had many of the same limited capabilities that it does now. Being as impressive a (still) teenager as I was, I was offered a full-time job by one of my clients. I was newly married (Yikes! Kids these days), so I jumped at the opportunity. This was near the time the first dot-com bubble burst. My company proceeded to sink over 3 million dollars into advertising and not a tenth of that into development. Though we had an economic development grant of over half a million, the plug was pulled on our project before we ever had a finished product (or spent the money). Brilliant!

However, my boss, who was one of those really great guys you work for once in a lifetime, managed to keep the team working for a while on various other projects (with remarkably similar outcomes). In the end, we did manage a Cobol-to-Java rewrite of a popular online commodity futures and options trading platform. And I led its development and maintenance for a number of years.

Salute. Power down. Say goodbye.

The problem with having a job that requires your mind is that when you lose your mind, your job will soon follow. So once my project no longer required my guardianship, and I could no longer stand the sight of it, I did what any reasonable person would do. I quit my well-paying job and became a professional ski patroller.

For any other kind of person, this may have been the end of the story. But for a select few there will always be a draw, an inexplicable force that compels one to the soft glow of a computer display and the distinctly stale aroma of the great indoors. So a number of years later, I traded in my ski boots for house shoes and returned to the home office. Thus TECHHEAD was (re)born.

Are you seeing a pattern here? I am a firm believer in reincarnation. I just believe that we have one lifetime in which to accomplish it. So now I am back where I started. But I am learning. I am building. And starting with this, my first blog post, I am sharing. Today I hope to build something truly great, scrap it, and start again tomorrow with the same enthusiasm and perhaps just a little bit wiser.