Introducing Hashdot

From the summary of Hashdot as published on SourceForge:

Hashdot elevates Java-platform script interpreters to first class status on Unix-like operating systems. It provides a script aware replacement to the stock ‘java’ launcher, and thus avoids numerous issues in using the ‘java’ launcher to bootstrap a script interpreter. All relevant interpreter and JVM options (i.e: Java heap size) may be specified directly in a script header and/or via system profiles, without resorting to environment variables, command line arguments, and the additional wrapper shell scripts needed to maintain them.

There are some interesting aspects to this work. Firstly, Hashdot really needed to be written in C for tight UNIX integration (fork(), dlopen(), prctl(), setsid(), etc.). Though I had developed in C and C++ for many years, it’s been a while, and I was surprised at how cumbersome I now found it.

It’s a bit of an odd predicament to be forced to write C, essentially as glue code between UNIX, java, and a Java-platform script interpreter. When I took on my first large-scale java project in the late 90’s (Java 1.1, Netscape Search federation and UI) I reserved the right to do any heavy lifting that was required in C++. Surprisingly, it didn’t come to that then, and only rarely has in the last 10 years. I have spent some time optimizing inner loops in a style of java that looks all too close to C. So here I am writing actual C again, but as glue code rather than for performance reasons, and simply to gain some modest comforts while writing ruby and java for the JVM!

It’s doubly odd to be writing C glue code in order to make decidedly incremental and cosmetic improvements to a UNIX, Java, and JRuby stack. What if, however, the sum of the current stack’s integration warts, annoyances, and complexities might otherwise retard broad acceptance of that stack by any of the UNIX, Java, or Ruby camps? Maybe then it would be worth diving in.

The Stock ‘java’ Launcher

Understanding the why of Hashdot must begin with a problem statement, and the problems begin with the stock ‘java’ launcher command line:

% java -h
Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)

where options include:
    -client       to select the "client" VM
    -server       to select the "server" VM
    -cp <class search path of directories and zip/jar files>
    -classpath <class search path of directories and zip/jar files>
                  A : separated list of directories, JAR archives,
                  and ZIP archives to search for class files.
    -D<name>=<value>
                  set a system property
    ...

Like many before and after, the Java programming language started life as a toy. It must have seemed reasonable at that point for a user starting a java program via a class’s main method to effectively link its dependencies by passing -cp on the command line. Anyone using Java in any sort of performance-intensive application would know that also setting an appropriate maximum heap size (-Xmx) and, for long-lived services, using the -server VM is essential. All told, the following example “command line” is typical for a full production java service:

java -server -Xss512k -Xms32m -Xmx256m -XX:NewSize=16m -Xnoclassgc
-Dsun.net.inetaddr.ttl=30 -Dsun.net.inetaddr.negative.ttl=0
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=/home/david/run/\
  apache-tomcat-5.5.20/conf/logging.properties
-Djava.endorsed.dirs=/home/david/run/apache-tomcat-5.5.20/common/endorsed
-classpath :/home/david/run/apache-tomcat-5.5.20/bin/bootstrap.jar\
  :/home/david/run/apache-tomcat-5.5.20/bin/commons-logging-api.jar
-Dcatalina.base=/home/david/run/apache-tomcat-5.5.20
-Dcatalina.home=/home/david/run/apache-tomcat-5.5.20
-Djava.io.tmpdir=/home/david/run/apache-tomcat-5.5.20/temp
org.apache.catalina.startup.Bootstrap start

Since it’s rather impractical to type this on the command line, the obvious solution is to write a shell script wrapper that exec’s it. Ironically, by not offering a better facility for process bootstrapping, and in the name of portability, Sun pushed developers into writing highly non-portable shell scripts. For a developer starting his career in Java, this was an awkward introduction to (ba)sh scripts. For a complete server such as Tomcat used in the above example, I count 1276 lines of shell script (sh + bat) and a maintenance job in its own right.
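To make concrete what such a wrapper has to assemble, here is a minimal sketch in ruby. It is only an illustration, not how Tomcat’s scripts work: java_command is a hypothetical helper, and the flags and paths are borrowed from the example above.

```ruby
# Hypothetical helper: assemble a java launcher argument list from its parts.
def java_command(jvm_flags:, properties:, classpath:, main_class:, args: [])
  cmd = ['java'] + jvm_flags
  # Each system property becomes its own -Dname=value argument.
  properties.each { |name, value| cmd << "-D#{name}=#{value}" }
  # Classpath entries are joined with the platform separator (":" on UNIX).
  cmd << '-classpath' << classpath.join(File::PATH_SEPARATOR)
  cmd << main_class
  cmd + args
end

home = '/home/david/run/apache-tomcat-5.5.20'
cmd = java_command(
  jvm_flags:  %w[-server -Xms32m -Xmx256m],
  properties: { 'catalina.base' => home, 'catalina.home' => home },
  classpath:  ["#{home}/bin/bootstrap.jar"],
  main_class: 'org.apache.catalina.startup.Bootstrap',
  args:       ['start']
)
puts cmd.join(' ')
```

A wrapper would then hand the list to exec(*cmd). Every one of these details lives outside the program being launched, which is exactly the maintenance burden described above.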

Enter JRuby

Scripting languages like ruby, perl, or python have all done a much better job than java in supporting cross-platform portability while also not hamstringing OS, file system, or console operability. This was but one aspect that drew me to jruby. Would it not be possible to deploy entire, potentially complex, java-based services as easily installed gems and directly executable ruby scripts, rather than the typical amalgam of tarballs, XML config files, and wrapper bash scripts?

I believe the answer is a resounding “yes!” and that it will be well worth the effort, but there is some work to do to really get there. Let’s consider the following very simple theoretical service implemented in java with a jruby-based launch script:

#!/opt/dist/jruby-1.1.4/bin/jruby

require 'myservice.jar'
import 'com.foobar.MyService'

service = MyService.new
service.run

Note the use of a UNIX-standard ‘#!’ hashbang (or shebang) to reference the interpreter for the script. The UNIX shell or system exec calls will detect the hashbang and invoke the named interpreter executable, passing the script file as an argument. While this works fine with any native script interpreter (ruby, perl, python, bash, etc.) it won’t work out of the box with jruby or any other java-platform script interpreters not resorting to native code. The reason is that the referenced ‘jruby’ launcher is itself a bash script which needs to exec a java command line with the required details. On Linux at least, multiple scripts/interpreters cannot be chained in this way. The workaround presented for jruby is:

#!/usr/bin/env jruby

…which then requires something like:

PATH=/opt/dist/jruby-1.1.4/bin:$PATH

…to be set in a user or system global profile. Besides raising some traditional security concerns, the use of the env workaround has notable side effects. First, it introduces a PATH dependency that must be managed external to the script. Next, consider wanting to set various jruby interpreter and java options made available as command line flags to jruby, for example enabling full compilation and the java max heap size:

#!/usr/bin/env jruby -X+C -J-Xmx256m

But this also doesn’t work, at least on current versions of Linux. When used as a hashbang, the jruby -X+C -J-Xmx256m is passed to env as a single parameter, and (fortunately) no program by that name is found in the path. Similarly, the jruby script launcher supports JRUBY_OPTS as an alternative means to set these options:

#!/usr/bin/env JRUBY_OPTS='-X+C -J-Xmx256m' jruby

But again, on current versions of Linux, this approach fails even more spectacularly: it causes an infinite loop! Brilliant!
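Both failures follow from how the hashbang line is split. The sketch below is a simplified model in ruby, not the kernel’s or env’s actual code. It assumes that Linux passes everything after the interpreter path as one argument, and that env (as GNU env does) treats leading arguments containing ‘=’ as environment assignments. The names hashbang_argv and env_plan are hypothetical.

```ruby
# Simplified model of how Linux builds argv for a hashbang script:
# everything after the interpreter path is passed as ONE argument.
def hashbang_argv(first_line, script_path)
  return nil unless first_line.start_with?('#!')
  interpreter, rest = first_line[2..-1].strip.split(/[ \t]+/, 2)
  argv = [interpreter]
  argv << rest unless rest.nil? || rest.empty?
  argv << script_path
  argv
end

# Simplified model of env: leading arguments containing '=' become
# environment assignments; whatever remains is the command to run.
def env_plan(args)
  assignments = args.take_while { |a| a.include?('=') }
  command = args.drop(assignments.length)
  [assignments, command]
end

flags = hashbang_argv("#!/usr/bin/env jruby -X+C -J-Xmx256m", "./myservice.rb")
p flags
# => ["/usr/bin/env", "jruby -X+C -J-Xmx256m", "./myservice.rb"]

opts = hashbang_argv("#!/usr/bin/env JRUBY_OPTS='-X+C -J-Xmx256m' jruby", "./myservice.rb")
p env_plan(opts[1..-1])
# => [["JRUBY_OPTS='-X+C -J-Xmx256m' jruby"], ["./myservice.rb"]]
```

Under this model, the first case asks env to run a program literally named “jruby -X+C -J-Xmx256m”, which does not exist. In the second case the whole remainder is swallowed as one assignment, so the command env is left to run is the script itself. Executing the script triggers the same hashbang again, which would explain the loop.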

You could tune all of the java and jruby option flags by directly modifying the jruby launcher script in the jruby distribution, but then you are stuck with one set of settings per host and jruby install. Poor at best. The only practical option (short of native code) is to write a wrapper script to launch any ruby script we want to set options for:

#!/bin/bash

# Hard code myservice.rb location or have additional fun determining
# relative location via $0
exec /opt/dist/jruby-1.1.4/bin/jruby -X+C -J-Xmx256m -J-server \
     /home/david/run/myservice.rb

Isn’t this fun? We are now back where we started with the java launcher: needing to write wrapper scripts around any significant production application. Besides being a nuisance and hiding important details in a separate wrapper script file, this approach is particularly at odds with the bin script support offered by RubyGems. Without a native launcher like Hashdot, does this lack of exec integration not diminish jruby by comparison, not with java, but with MRI (native) ruby?

In my next post on Hashdot, I’ll demonstrate how Hashdot solves these and other historic blemishes and limitations of java and java-platform script interpreters like JRuby.