Stuff I'll forget if I don't write it down: February 2011

My Blog has moved to Github Pages

Thursday 24 February 2011

ElasticSearch vs SOLRCloud

For an upcoming work project I need a scalable search platform - scalable to tens or hundreds of millions of documents (news articles), and millions of queries per day. We're a (mostly) Java shop, and have a lot of experience with Lucene, so two solutions that pique my curiosity are SOLRCloud (SOLR + ZooKeeper) and ElasticSearch.

Initial Impressions - ElasticSearch

ElasticSearch is impressive. It's clean, simple, and elegant. For those who are familiar with Compass, ElasticSearch can be considered as Compass 3.0 (quoting Shay Banon, author of Compass). ElasticSearch has been under development for about 9 months at time of writing, and is currently at version 0.15. It appears to be very actively developed, with new features and fixes flowing steadily.

My main worry at this point is that there appears to be only one "resource" active on the project - Shay Banon (@kimchy) himself, who seems to be architect, developer, documentation-writer, and a prolific commenter on forums.

Noteworthy features include:

Document-oriented / Schema-free (JSON documents)
Store, retrieve, index and search multiple versions of documents
Self-hosting RESTful web-service api
Exposes the full power of Lucene queries
Multiple Indexes in one cluster (described as Multi-Tenancy)
Built from the ground-up with scalability and distributed-operation in mind - supporting distributed search, automatic fail-over and re-balancing, with no single point of failure
Support for async write/backup to shared storage (Gateway, in ElasticSearch parlance)
"Percolator" (aka. prospective search)

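Prospective search inverts the usual flow: queries are stored up front, and each incoming document is matched against them. The following is a toy sketch of that idea in plain Java, not ElasticSearch's percolator API. The names (`ProspectiveSearchSketch`, `register`, `percolate`) are made up for illustration, and a "query" here is assumed to be nothing more than a set of terms that must all appear in the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy illustration of prospective search; NOT the ElasticSearch API.
class ProspectiveSearchSketch {

    // Stored queries, keyed by a caller-chosen name.
    // Assumption: a query is simply a set of required terms.
    private final Map<String, Set<String>> queries =
            new LinkedHashMap<String, Set<String>>();

    // Register a query ahead of time, before any document arrives.
    void register(String name, String... requiredTerms) {
        Set<String> terms = new HashSet<String>();
        for (String term : requiredTerms) {
            terms.add(term.toLowerCase(Locale.ROOT));
        }
        queries.put(name, terms);
    }

    // Return the names of every stored query that the document satisfies.
    List<String> percolate(String document) {
        Set<String> words = new HashSet<String>(
                Arrays.asList(document.toLowerCase(Locale.ROOT).split("\\W+")));
        List<String> matches = new ArrayList<String>();
        for (Map.Entry<String, Set<String>> entry : queries.entrySet()) {
            if (words.containsAll(entry.getValue())) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}
```

For a news-article use case, the stored queries would be something like saved alerts, and each newly published article would be passed through `percolate` to find which alerts it triggers.
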
Initial Impressions - SOLRCloud

SOLR is a project from the same (Apache) stable as Lucene itself, and the projects have recently merged to some degree. SOLRCloud is an extension that integrates ZooKeeper with SOLR with the express aim of "enabling and simplifying the creation and use of Solr clusters."

SOLRCloud is described as "still under development", i.e., not yet a GA release. Currently proclaimed features include:

Central configuration of the entire cluster
Automatic load-balancing and fail-over for queries
ZooKeeper integration for cluster coordination and configuration (not sure I would have listed that as a feature personally!)

I'll add that SOLRCloud is part of the SOLR code-base, and is being developed by core Lucene and SOLR committers including Mark Miller and Yonik Seeley. This can only be a good thing :). On top of all that, SOLR has been around for a good long time now, so it is battle-tested and there's lots of information available (including numerous books).

That said, I still have a good few worries:

Setup/deployment just sounds fiddly - it is recommended not to deploy ZooKeeper embedded with SOLR (though I cannot find any explanation to back up that recommendation), which means you need both a ZooKeeper ensemble - multiple ZooKeeper instances - and a SOLRCloud ... er ... cloud.
No GA release as yet, and no roadmap that I can find (this is the closest I got).

Next Steps

My next step is to dive into both technologies to see which best suits our needs, and how difficult each is likely to be to manage in a medium/large-scale deployment.

Tuesday 15 February 2011

Progressive Enhancement with GWT, part 3

Read this article on my new blog

Read about my GWT.Progressive library on my new blog

This is the third part in a series, following my thoughts on using GWT in SEO'able web applications. The other parts in the series are part 1 and part 2.

In my previous posts I described an idea for progressive enhancement using GWT - "activating" server-generated html, to combine GWT goodness with an SEO friendly server-generated website, and my findings after some initial trials.

One of the problems I described in that second post was that it would be very difficult to work with these widgets if nested widgets could not be automatically (or at least easily) bound to fields within this widget.

After a little playing around and learning about GWT Generators I now have what seems like a nice solution, using a Generator to do almost all of the donkey work. Think of it like UiBinder, but with the templates provided at runtime (courtesy of the server). Here's an example class that automatically binds sub-widgets - an Image in this case - to a field of that class:

public class MyWidget extends Widget {
   
    interface MyActivator extends ElementActivator<MyWidget> {}
    private static MyActivator activator = GWT.create(MyActivator.class);
   
    @RuntimeUiField(tag="img", cssClass="small") Image small;
   
    public MyWidget(Element anElement) {
        // this will set our element and bind our image field.
        setElement(activator.activate(this, anElement));
   
        // now we can play with our fields.
        small.addClickHandler(new ClickHandler() {
            public void onClick(ClickEvent aEvent) {
                Window.alert("clicked!");
            }
        });
    }   
}

This class will bind onto any html that has an image tag somewhere in its inner-html, for example:

<div> <!-- Say our MyWidget is bound here -->
  <div>
    <span>
      <img class="small" src="/images/image.jpg"> <!-- will be bound to our Image widget -->
    </span>
  </div>
</div>

Anyone familiar with UiBinder will recognize the pattern I've used for the "activator":

Extend an interface with no methods

interface MyActivator extends ElementActivator<MyWidget> {}

GWT.create() an instance of that interface

private static MyActivator activator = GWT.create(MyActivator.class);

Then use it to initialize your widget

setElement(activator.activate(this, anElement));

The nice thing about this is we can automatically bind as many widgets as we like onto various sites within the inner-html of our current widget's element. It doesn't mess with the structure (unless you explicitly do so after the binding is done for you), and you can have as much other html within the elements as you like - it will just be left alone, which gives your designers the flexibility to change the layout quite a lot without necessarily needing to re-compile your GWT code.

Currently I have my generator set up to allow your widgets to bind to a choice of tag-name or css-class or both, for example:

// bind to the first <div> found by breadth-first search of child elements
@RuntimeUiField(tag="div") Label label;

// bind to first element with class="my-widget" found by breadth-first search
@RuntimeUiField(cssClass="my-widget") Label label;

// bind to first <div> with class="my-widget" found by breadth-first search
@RuntimeUiField(tag="div", cssClass="my-widget") Label label;
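
The lookup rule described in those comments can be sketched in plain Java, outside GWT. This is an assumption about how such a search might work, not the generator's actual code: `DomNode` is a hypothetical stand-in for a DOM element, `findFirst` is a made-up helper, and a `null` tag or class is taken to mean "don't filter on this".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a DOM element; not a GWT type.
class DomNode {
    final String tag;
    final String cssClasses; // space-separated, like the HTML class attribute
    final List<DomNode> children = new ArrayList<DomNode>();

    DomNode(String tag, String cssClasses, DomNode... kids) {
        this.tag = tag;
        this.cssClasses = cssClasses == null ? "" : cssClasses;
        children.addAll(Arrays.asList(kids));
    }

    // A null filter matches anything.
    boolean matches(String wantedTag, String wantedClass) {
        boolean tagOk = wantedTag == null || wantedTag.equalsIgnoreCase(tag);
        boolean classOk = wantedClass == null
                || Arrays.asList(cssClasses.split("\\s+")).contains(wantedClass);
        return tagOk && classOk;
    }
}

class BreadthFirstFinder {
    // Returns the first descendant of root (root itself excluded) that
    // matches both filters, visiting shallower levels first; null if none.
    static DomNode findFirst(DomNode root, String tag, String cssClass) {
        Deque<DomNode> queue = new ArrayDeque<DomNode>(root.children);
        while (!queue.isEmpty()) {
            DomNode node = queue.removeFirst();
            if (node.matches(tag, cssClass)) {
                return node;
            }
            queue.addAll(node.children); // deeper levels wait behind this level
        }
        return null;
    }
}
```

The practical consequence of going breadth-first is that a shallow match always wins over a deeper one, regardless of document order, which is worth keeping in mind when a designer rearranges the html.
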

Notice in my examples so far I'm binding standard GWT widgets onto the nested elements. This works for the elements I've used in these examples because they all have a

public static Type wrap(Element anElement)

method which allows those widgets to be bound onto elements that are already attached to the DOM.

It is also possible to bind widgets of your own making in one of two ways:

Create a wrap method like

public static MyWidget wrap(Element anElement)

Create a single-argument public Constructor that accepts an Element as its argument.
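
Those two options amount to a lookup rule: use a static wrap(Element) if the type has one, otherwise fall back to a public constructor taking an Element. A GWT Generator would have to apply a rule like this at compile time, since GWT client code has no runtime reflection, so the sketch below uses plain `java.lang.reflect` purely to illustrate the rule. The preference order (wrap first) is an assumption, and `FakeElement`, `WrapOnly`, `CtorOnly` and `WidgetFactorySketch` are hypothetical stand-ins, not part of GWT or of my generator.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for GWT's Element.
class FakeElement {
    final String tag;
    FakeElement(String tag) { this.tag = tag; }
}

// A widget that only offers a static wrap method.
class WrapOnly {
    final FakeElement element;
    private WrapOnly(FakeElement e) { element = e; }
    public static WrapOnly wrap(FakeElement e) { return new WrapOnly(e); }
}

// A widget that only offers a public single-argument constructor.
class CtorOnly {
    final FakeElement element;
    public CtorOnly(FakeElement e) { element = e; }
}

class WidgetFactorySketch {
    // Binds a widget of the given type onto the element, trying
    // static wrap(FakeElement) first and the constructor second.
    static <T> T bind(Class<T> type, FakeElement element) {
        try {
            try {
                Method wrap = type.getMethod("wrap", FakeElement.class);
                if (Modifier.isStatic(wrap.getModifiers())
                        && type.isAssignableFrom(wrap.getReturnType())) {
                    return type.cast(wrap.invoke(null, element));
                }
            } catch (NoSuchMethodException noWrapMethod) {
                // fall through to the constructor
            }
            return type.getConstructor(FakeElement.class).newInstance(element);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot bind " + type.getName(), e);
        }
    }
}
```
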

Activate-able widgets can be nested within other such widgets - with no limits that I am aware of so far - and it is also possible to assign nested widgets to a List field in the enclosing widget, like this:

@RuntimeUiField(tag="img") List<Image> images;

This will search recursively for any <img> tags inside the enclosing widget's element and bind them all to Image widgets that will be added to the List. The current limitations here are that the List must be declared either as List or ArrayList, and parameterized with a concrete type that meets the criteria defined above (i.e. has a static wrap(Element) method, or a single-arg constructor that takes an Element as the argument).

A remaining question is how to bind the outer-most Widget. Currently I'm doing that using the DOM scanning code I wrote during earlier experiments and which I'm also using in the automatic scanning process set up by the Generator. For example to find the outer-most widgets and kick off the binding process I have something like this in my EntryPoint:

public void onModuleLoad() {
    // collect the activated widgets so we can work with them afterwards
    List<MyWidget> _myWidgets = new ArrayList<MyWidget>();
    for (Element _e : Elements.getByCssClass("outer-most-widget")) {
        _myWidgets.add(new MyWidget(_e));
    }
    // do stuff with our widgets ...
}

I think of this as very similar to the RootPanel situation - "normal" GWT apps kick off by getting a RootPanel (the body tag) or RootPanels (looked up by id), to which everything else is added. It would be nice to hide away some of that scanning code inside a "top-level" widget - much like RootPanel does for the normal case. I can imagine this might look something like:

public void onModuleLoad() {
    Page _page = Page.activate();
    _page.doStuffWithWidgets();
    // ...
}

I still have lots of things to figure out and questions to answer, for example:

What's the performance like when binding many hundreds of widgets?
How will this really work when I make ajax requests for more data? (should I make ajax requests for html snippets which I add to the DOM and then bind onto, or switch to json for ajax requests and make my widgets able to replicate themselves from an initial html template?)
What's the best way to divide labour between developers and designers, and for them to organize their interaction? (Ideally I'd like there to be something of a cycle between them, where the designer can rough-out a page design, agree the componentisation with the developer, the developer knocks out some components and a build which the designer can use to activate their static designs, add fidelity, work on other pages with the same components, etc).
Where is the sweet-spot between creating high-fidelity html server-side and decorating it client-side using GWT? Should the GWT components really be just for adding dynamism, or is it a good idea to use them to build additional html sweetness? - I mean the server could dish out html that is more of a model than a view (just enough "view" to satisfy SEO), and the GWT layer acts as a client-side controller and view (SOFEA/TSA with a nod to SEO).

I'll try to keep posting as I work things out.

This probably belongs in a separate post, but with reference to that last point on TSA (Thin Server Architecture) - the working group list the following points to define the concept:

1. Do not use server-side templating to create the web page.
2. Use a classical and simple client-server model, where the client runs in the browser.
3. Separate concerns using protocol between client and server and get a much more efficient and less costly development model.

I'm right behind them on (2) and (3), and also on (1) for "enterprise" apps where SEO is a non-goal. However, for an app that needs SEO, (1) is a deal-breaker, so I'd offer this alternative 1st rule instead:

Use server-side templating to produce a model for the client to consume which minimally satisfies the needs of SEO.

Sunday 13 February 2011

Progressive Enhancement with GWT, part 2

This is the second part in a series, following my thoughts on using GWT in SEO'able web applications. The other parts in the series are part 1 and part 3.

Since my earlier post, I spent a little time (only a few hours really, so far) trying a few things out. Here's a smattering of things I learned...

Scanning for elements and binding widgets onto them is easy. Making those widgets behave just like widgets in any normal GWT app needs a little more work.

Who's the daddy?

One big problem to get around is that normally GWT widgets are attached via a hierarchy of other widgets (parents) leading back to the RootPanel, whereas when you bind onto some arbitrary element that is already on the page you don't get this hierarchy for free.

When widgets are added to a parent widget some magic happens to set up things like the eventing system. Without that magic you can add as much event-handling plumbing as you like, but it won't work because your widget isn't wired into the eventing system.

Actually getting around this is not all that difficult. Simply invoking onAttach() will wire up your widget, though it's a little unpleasant to have to do that.

Another problem with the lack of hierarchy is, well, there's no hierarchy. Things that you would normally do in GWT widgets - like adding, removing or replacing child widgets - get a little trickier. If you want to use the technique recursively (and why wouldn't you?), you need to allow widgets to bind to elements inside other widgets without causing them to be removed from and re-attached to the DOM, but crucially you still need to add them as 'logical' children of the parent widget, otherwise the parent knows nothing about the child widgets and can't do any of those "normal" operations with them.

To do that there are two problems to overcome:

1. The parent needs to have the children added to it, so that the set of child widgets is known and available for manipulation (say by extending ComplexPanel and using the getChildren() method).
2. Some of the child widgets might need hard, typed references in the parent widget to allow direct manipulation of the child widget - just like in a "normal" GWT widget you would keep a reference to the Button you added in the constructor in order that you can bind ClickHandlers to it or toggle its enabled-ness.

Point 1 is easily solved - any widget that wants to play this way needs to support adding other widgets without triggering an attachment to the current element. When you add a normal child widget to a normal parent widget, the child is detached from its current parent - logically and physically - so that its html element is actually inserted into the DOM under the parent's element. This is not what we want when binding onto a template - we just want the logical attachment step, so we need to support an add method something like:

public void logicalAdd(Widget aWidget) {
    getChildren().add(aWidget);
    adopt(aWidget);
}
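
To see why the distinction matters, here is a small plain-Java model of the two kinds of add. It is only a model under stated assumptions, not GWT's implementation: `ModelElement` and `ModelWidget` are hypothetical stand-ins, and whatever event wiring the real adopt step involves is ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DOM element with a parent pointer.
class ModelElement {
    ModelElement parent;
    final List<ModelElement> children = new ArrayList<ModelElement>();

    // Physically moves the child under this element, as a DOM append would.
    void appendChild(ModelElement child) {
        if (child.parent != null) {
            child.parent.children.remove(child);
        }
        child.parent = this;
        children.add(child);
    }
}

// Hypothetical stand-in for a widget that tracks its logical children.
class ModelWidget {
    final ModelElement element;
    ModelWidget parent;
    final List<ModelWidget> logicalChildren = new ArrayList<ModelWidget>();

    ModelWidget(ModelElement element) {
        this.element = element;
    }

    // "Normal" add: move the child's element under ours, then adopt it.
    void add(ModelWidget child) {
        element.appendChild(child.element);
        logicalAdd(child);
    }

    // Logical-only add: record the relationship, leave the DOM untouched.
    void logicalAdd(ModelWidget child) {
        logicalChildren.add(child);
        child.parent = this;
    }
}
```

With `logicalAdd`, an image nested a few levels down in the server-generated html stays exactly where the designer put it, yet the parent widget still knows about it; with the normal `add`, the image's element would be pulled out and re-attached directly under the parent's element.
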

I've yet to try to solve point 2. So far I've built:

Tools to help with scanning for elements to bind to, and then binding the right widget.
Plumbing to allow recursively binding widgets with logical hierarchy intact (point 1 above).
An example that binds widgets recursively - an outer container, an inner container, and a bunch of widgets inside that are manipulated by the inner container.

I'll try to update the post with an example at some point. Meanwhile my next challenge is to solve point 2 such that widget developers can build their widgets in a fairly typical GWT way.

As an aside, I lay awake for a while last night pondering the ability to give designers a client-side templating system, where they can write the html for a component once (declaring it to be a template, which may include recursive binding points for GWT-activated widgets) and then re-use it elsewhere within their html by reference to the template. I'm sure this would be possible, though its utility might extend only to mock-ups.

Saturday 12 February 2011

Installing fonts in Ubuntu

Installing fonts in Ubuntu is very easy these days - just open a ttf file and you are presented with a nice sample of the font (quick brown fox style), and a button in the bottom right corner to install the font.

Nice'n'easy, but you're not quite done yet. You'll definitely need to restart running apps before the font becomes available to them, and quite possibly you'll need to rebuild the font cache, which you can do by rebooting (hah!) or:

sudo fc-cache -fv

Btw, check out Eurostile. It's about 50 years old, but nonetheless is one of the most gorgeous fonts I've ever seen.