[RAD stands for Ruby Ape Diaries, of which this is part VI.] The reason I first built the Ape in JRuby was so I could get at all those nice Java APIs, and that turned out to be a good reason. Of course, there is a bit of impedance mismatch, and I ended up writing some glue code. I kind of suspect that, should JRuby catch on, there’s going to be scope for quite a bit of this glue-ware. This fragment is just a few examples of the genre, offered to illustrate and perhaps provoke thought.
All but one of these examples omit an initial require 'java' and a bunch of lines of include_class.
Table of Contents · I can’t imagine that anyone would actually want to read the whole thing.
Parsing an XML Document · There are lots of ways to do this, but I’m using about the plainest-vanilla Java approach. This is somewhat but not entirely unlike REXML::Document.new.
def Document.new(text)
  begin
    unless @@dbf
      @@dbf = DocumentBuilderFactory.newInstance
      @@dbf.setNamespaceAware true
    end
    db = @@dbf.newDocumentBuilder
    @dom = db.parse(InputSource.new(StringReader.new(text)))
  rescue NativeException
    @last_error = $!.to_s
  end
end
XML Escaping and Unescaping · Not strictly Java-related, but kind of interesting. I’m sure this is somewhere in Ruby but for some reason I couldn’t find it, so this code survives in the native-Ruby version of the Ape.
def Parser.escape(text)
  text.gsub(/([&<'">])/) do
    case $1
    when '&' then '&amp;'
    when '<' then '&lt;'
    when "'" then '&apos;'
    when '"' then '&quot;'
    when '>' then '&gt;'
    end
  end
end
def Parser.unescape(text)
  text.gsub(/&([^;]*);/) do
    case $1
    when 'lt' then '<'
    when 'amp' then '&'
    when 'gt' then '>'
    when 'apos' then "'"
    when 'quot' then '"'
    end
  end
end
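For what it’s worth, the same pair can be written as table lookups. What follows is a minimal sketch, not the Ape’s code: the module name XmlEscape and its two constants are made up for illustration. One deliberate difference is that it leaves references it doesn’t recognize, such as &nbsp;, alone rather than dropping them.

```ruby
# Hypothetical module for illustration; only the five predefined XML
# entities are handled.
module XmlEscape
  ESCAPES = {
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    "'" => '&apos;',
    '"' => '&quot;'
  }.freeze

  # Reverse table: '&amp;' => '&', '&lt;' => '<', and so on.
  UNESCAPES = ESCAPES.invert.freeze

  # String#gsub accepts a Hash and replaces each match with its value.
  def self.escape(text)
    text.gsub(/[&<>'"]/, ESCAPES)
  end

  # Only the five known references match, so anything else passes through.
  def self.unescape(text)
    text.gsub(/&(?:amp|lt|gt|apos|quot);/, UNESCAPES)
  end
end

escaped = XmlEscape.escape(%q{Tom & "Jerry" <cat>})
puts escaped
puts XmlEscape.unescape(escaped)
```

Because gsub makes a single pass, a doubly-escaped string comes back only one level per call to unescape, which is usually what you want.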
Serializing XML · The XML is stored in an org.w3c.dom structure. This is just a partial implementation; it works for me because I produce an entity-free DOM. This is more elaborate than it needs to be because I thought I was going to have to take care of all the namespace-declaration book-keeping, but no, it seems that the DOM has all the xmlns goo appearing as if it were actual attributes. Go figure. It’s interesting to note that you access Java static fields using Ruby’s :: separator.
class Unparser
  def initialize output
    @output = output
  end
  def unparse doc
    unparse1 doc.dom
  end
  def unparse1 node
    case node.getNodeType
    when Node::CDATA_SECTION_NODE, Node::TEXT_NODE
      out Parser.escape(node.getNodeValue)
    when Node::COMMENT_NODE, Node::PROCESSING_INSTRUCTION_NODE
      # bah
    when Node::DOCUMENT_NODE
      node = node.getFirstChild
      unparse1 node
    when Node::DOCUMENT_TYPE_NODE
      node = node.getNextSibling
      unparse1 node
    when Node::ELEMENT_NODE
      unparseStartTag node
      Nodes.each_node(node.getChildNodes) { |child| unparse1(child) }
      unparseEndTag node
    when Node::ENTITY_NODE, Node::ENTITY_REFERENCE_NODE, Node::NOTATION_NODE
      raise(ArgumentError, "Floating XML goo, can't serialize")
    else
      raise(ArgumentError, "Unrecognized node type #{node.getNodeType}")
    end
  end
  def out str
    @output << str.to_s
  end
  def unparseStartTag node
    out '<'
    unparseName node
    Nodes.each_node(node.getAttributes) { |a| unparseAttribute(a) }
    out '>'
  end
  def unparseAttribute attr
    out ' '
    unparseName attr
    out '="'
    out Parser.escape(attr.getNodeValue)
    out '"'
  end
  def unparseName node
    out node.getNodeName
  end
  def unparseEndTag node
    out '</'
    unparseName node
    out '>'
  end
end
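The shape of that recursion is easier to see without the DOM in the way. Here is a rough sketch in plain Ruby, under loud assumptions: TextNode, ElementNode, and MiniUnparser are hypothetical stand-ins, not the org.w3c.dom classes, and the attribute hash simply carries the xmlns declaration as one more entry, which is how the DOM presented it to me.

```ruby
# Hypothetical stand-ins for DOM nodes; not the org.w3c.dom classes.
TextNode    = Struct.new(:value)
ElementNode = Struct.new(:name, :attributes, :children)

class MiniUnparser
  ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              "'" => '&apos;', '"' => '&quot;' }.freeze

  # output can be anything that responds to <<: a String, an Array, an IO.
  def initialize(output)
    @output = output
  end

  def unparse(node)
    case node
    when TextNode
      @output << escape(node.value)
    when ElementNode
      @output << '<' << node.name
      # Namespace declarations are ordinary entries in the attribute hash,
      # so no separate book-keeping is needed.
      node.attributes.each do |name, value|
        @output << ' ' << name << '="' << escape(value) << '"'
      end
      @output << '>'
      node.children.each { |child| unparse(child) }
      @output << '</' << node.name << '>'
    else
      raise ArgumentError, "Unrecognized node #{node.inspect}"
    end
    @output
  end

  private

  def escape(text)
    text.gsub(/[&<>'"]/, ESCAPES)
  end
end

doc = ElementNode.new('a:entry',
                      { 'xmlns:a' => 'http://www.w3.org/2005/Atom', 'label' => 'Q&A' },
                      [TextNode.new('1 < 2')])
puts MiniUnparser.new(String.new).unparse(doc)
```

Writing through << rather than building and returning strings is the same choice the real Unparser makes; it lets the caller decide whether the bytes land in a string or go straight to a file.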
XPath · The three useful XPath functions in REXML are each, first, and match. Here are the JRuby versions.
class XPath
  @@xpf = nil
  def XPath.match node, path, namespaces={}
    node = fixNode node
    xp = XPath.newXP namespaces
    collect xp.evaluate(path, node, XPathConstants::NODESET)
  end
  def XPath.each node, path, namespaces = {}
    node = fixNode node
    xp = XPath.newXP namespaces
    list = xp.evaluate(path, node, XPathConstants::NODESET)
    collect(list).each { |node| yield node }
  end
  def XPath.first node, path, namespaces = {}
    node = fixNode node
    xp = XPath.newXP namespaces
    xp.evaluate(path, node, XPathConstants::NODE)
  end
I’ve tested them and they do what you’d expect; the real Ruby versions dropped in without anything breaking. Now, there are some interesting housekeeping functions, all private. In particular, there’s newXP, which needs to exist because a Java XPath needs an interface to call back to get namespace-prefix mappings. (Cognoscenti will note the theft of a line of code from REXML::XPath).
  def XPath.newXP namespaces
    raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash
    @@xpf ||= XPathFactory.newInstance
    xp = @@xpf.newXPath
    xp.namespaceContext= NSCT.new namespaces
    return xp
  end
You might wonder how you implement a Java Interface in JRuby? No problem, inherit from it just like any other class-like thingie. No need to implement any of the interface’s methods if you don’t use them. This feels strange, exotic, and cool to me.
class NSCT < NamespaceContext
  def initialize namespaces
    super() # Required due to bug JRuby-66
    @namespaces = namespaces
  end
  def getNamespaceURI prefix
    if prefix == 'xml'
      XMLConstants::XML_NS_URI
    else
      @namespaces[prefix]
    end
  end
end
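If you wanted to follow the NamespaceContext contract a little more closely, the Javadoc asks for fixed answers for the xml and xmlns prefixes and an empty string for an unbound prefix. Here’s a sketch of just that lookup logic in plain Ruby so it runs anywhere. PrefixMap is a hypothetical name, the constants are copied by hand from javax.xml.XMLConstants, and under JRuby the class would inherit from NamespaceContext as above rather than stand alone.

```ruby
# Plain-Ruby stand-in (hypothetical): under JRuby this class would inherit
# from javax.xml.namespace.NamespaceContext instead of Object.
class PrefixMap
  # Values copied from javax.xml.XMLConstants.
  XML_NS_URI   = 'http://www.w3.org/XML/1998/namespace'
  XMLNS_NS_URI = 'http://www.w3.org/2000/xmlns/'
  NULL_NS_URI  = ''

  def initialize(namespaces)
    raise ArgumentError, 'namespaces must be a Hash' unless namespaces.is_a?(Hash)
    @namespaces = namespaces
  end

  # Reserved prefixes get their fixed URIs; unbound prefixes get the
  # empty string rather than nil.
  def getNamespaceURI(prefix)
    case prefix
    when 'xml'   then XML_NS_URI
    when 'xmlns' then XMLNS_NS_URI
    else @namespaces.fetch(prefix, NULL_NS_URI)
    end
  end
end

ctx = PrefixMap.new({ 'atom' => 'http://www.w3.org/2005/Atom' })
puts ctx.getNamespaceURI('atom')
puts ctx.getNamespaceURI('xml')
```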
Attribute Hashing · REXML gives each element an attributes member which is a hash of attribute values by name. Here’s the Java-based namespace-sensitive version.
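class Attributes
  # should only be called by Element
  def initialize node
    @node = node
    @attrsNode = node.getAttributes
  end
  def [] name
    if name =~ /^(.*):(.*)$/
      # resolve the prefix against the namespaces in scope on the element
      ns = @node.lookupNamespaceURI $1
      anode = @attrsNode.getNamedItemNS(ns, $2)
    else
      anode = @attrsNode.getNamedItem(name)
    end
    # nil, rather than an exception, when the attribute is absent
    anode && anode.getNodeValue
  end
end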
Rubyfying a NodeList · The problem is that lots of things in the Java DOM are NodeLists, which are retrieved by number. You can’t do anything in Ruby without an each-like method, so here it is.
def each_node(list)
  len = list.getLength
  (0 ... len).each do |i|
    yield list.item(i)
  end
end
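Once there’s an each, Ruby’s Enumerable will hand you map, select, and friends for free. A minimal sketch of that idea follows, with the caveat that both classes are invented for illustration: FakeNodeList stands in for org.w3c.dom.NodeList so the example runs without Java, and NodeListView is a hypothetical wrapper, not something from the Ape.

```ruby
# Hypothetical stand-in for org.w3c.dom.NodeList: items are fetched by index.
class FakeNodeList
  def initialize(items)
    @items = items
  end

  def getLength
    @items.length
  end

  def item(index)
    @items[index]
  end
end

# Hypothetical wrapper around any object that answers getLength and item(i).
class NodeListView
  include Enumerable

  def initialize(list)
    @list = list
  end

  # Enumerable builds map, select, first, to_a and the rest on top of each.
  def each
    return enum_for(:each) unless block_given?
    @list.getLength.times { |i| yield @list.item(i) }
    self
  end
end

names = NodeListView.new(FakeNodeList.new(%w[title author entry]))
puts names.map(&:upcase).join(', ')
puts names.select { |n| n.length > 5 }.join(', ')
```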
Calling Jing · As I said before, the Jing API is fairly gnarly. This is perhaps the most extreme example of brute-forcing the way across the impedance mismatch. In this case I include all the includes.
require 'java'
include_class 'com.thaiopensource.validate.rng.CompactSchemaReader'
include_class 'com.thaiopensource.validate.ValidationDriver'
include_class 'org.xml.sax.InputSource'
include_class 'java.io.StringReader'
include_class 'java.io.StringWriter'
include_class 'com.thaiopensource.xml.sax.ErrorHandlerImpl'
include_class 'com.thaiopensource.util.PropertyMapBuilder'
include_class 'com.thaiopensource.validate.ValidateProperty'
class Validator
  attr_reader :error
  def initialize(text)
    @error = false
    @schemaError = StringWriter.new
    schemaEH = ErrorHandlerImpl.new(@schemaError)
    properties = PropertyMapBuilder.new
    properties.put(ValidateProperty::ERROR_HANDLER, schemaEH)
    @driver = ValidationDriver.new(properties.toPropertyMap,
                                   CompactSchemaReader.getInstance)
    if !@driver.loadSchema(InputSource.new(StringReader.new(text)))
      @error = @schemaError.toString
    end
  end
  def validate(text)
    if @driver.validate(InputSource.new(StringReader.new(text)))
      return true
    else
      @error = @schemaError.toString
      return false
    end
  end
end