choose your own toppings: whatever code inside CDATA

I really should be packing for an overseas vacation that begins tomorrow, but I wanted to jot some stuff down before I forget – and I intend to forget a lot!

Anywho, in a previous post I wrote about putting XSL inside a CDATA block inside an XML config file. I had the following example:

<map name="LibriVox">
  <XSLT>./XSLT/LibriVox_to_Solr.xsl</XSLT>
  <nextXSL>
  <![CDATA[
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
      <xsl:variable name="baseURL" select="'%s'" />
      <xsl:variable name="URL_params" select="'%s'" />
      <xsl:variable name="offset_" select="substring-after($URL_params,'=')" />
      <xsl:variable name="offset" select="substring-before($offset_,'&amp;')" />
      <xsl:variable name="limit_" select="substring-after($URL_params,'&amp;')" />
      <xsl:variable name="limit" select="substring-after($limit_,'=')" />
      <xsl:variable name="output">
        <xsl:value-of select="$baseURL" />
        <xsl:text>?offset=</xsl:text>
        <xsl:value-of select="$offset+50" />
        <xsl:text>&amp;limit=</xsl:text>
        <xsl:value-of select="50" />
      </xsl:variable>
      <xsl:value-of select="$output" />
    </xsl:template>
  </xsl:stylesheet>
  ]]>
  </nextXSL>
</map>

This is part of this pOAIndexter script I'm working on for, well, work.

The XML code above is from one of the config files where the <XSLT> element points to an XSL file used to process metadata retrieved from a website. In this case, the point is to make a Solr-compatible XML document that can be used for indexing purposes. The second element, <nextXSL>, is used to return to pOAIndexter the URL for the next batch of metadata for a given feed, i.e. the next page or the next set within a collection, etc.
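To make the CDATA trick concrete, here is a minimal Python sketch of how a script could read such a map with the standard library's `xml.etree.ElementTree`. It assumes a hypothetical `<config>` root holding the `<map>` elements and a hypothetical `load_map()` helper; pOAIndexter's actual parsing code may look quite different. What it shows is that the parser hands back the CDATA block as plain text, so the stylesheet arrives as a string that can be processed later instead of as parsed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical config file with a <config> root; the real layout may differ.
CONFIG = """<config>
<map name="LibriVox">
  <XSLT>./XSLT/LibriVox_to_Solr.xsl</XSLT>
  <nextXSL>
  <![CDATA[
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
  </xsl:stylesheet>
  ]]>
  </nextXSL>
</map>
</config>"""


def load_map(config_xml, name):
    """Return (transform path, next-URL template) for the <map> called `name`."""
    root = ET.fromstring(config_xml)
    mapping = root.find(f"map[@name='{name}']")
    if mapping is None:
        raise KeyError(name)
    xslt_path = mapping.findtext("XSLT")
    # CDATA content comes back as ordinary text, so the stylesheet is
    # an unparsed string here, not a tree of xsl:* child elements.
    template = mapping.findtext("nextXSL").strip()
    return xslt_path, template


path, template = load_map(CONFIG, "LibriVox")
print(path)  # ./XSLT/LibriVox_to_Solr.xsl
print(template.splitlines()[0])
```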

As you can see, there are two weird-looking variables at the top:

      <xsl:variable name="baseURL" select="'%s'" />
      <xsl:variable name="URL_params" select="'%s'" />

That's because pOAIndexter fills in these placeholders, first with the base URL of the batch it just retrieved and then with that batch's URL parameters, before it runs the XSL in the <nextXSL> element. The XSL then returns the next URL as a string.
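Here is a rough Python sketch of that hand-off. It assumes the placeholders are filled with Python's `%` operator (the `%s` markers suggest it, but that is an assumption), and `fill_placeholders()` and `next_url()` are hypothetical names. `next_url()` is a plain-Python equivalent of the stylesheet's string slicing, so you can see which URL comes back; the example.org base URL is made up.

```python
# Abbreviated copy of the <nextXSL> template: just the two placeholder lines.
TEMPLATE = """<xsl:variable name="baseURL" select="'%s'" />
<xsl:variable name="URL_params" select="'%s'" />"""


def fill_placeholders(template, base_url, url_params):
    # Assumption: filled with Python's % operator, base URL first,
    # then the query parameters of the batch just retrieved.
    return template % (base_url, url_params)


def next_url(base_url, url_params):
    # Same slicing as the XSL: text after the first "=", up to the first "&".
    # Like the stylesheet, this assumes "offset=N&limit=M" in that order.
    offset = url_params.split("=", 1)[1].split("&", 1)[0]
    return f"{base_url}?offset={int(offset) + 50}&limit=50"


base = "https://example.org/api/feed"  # hypothetical feed URL
filled = fill_placeholders(TEMPLATE, base, "offset=0&limit=50")
print(filled)
print(next_url(base, "offset=0&limit=50"))  # https://example.org/api/feed?offset=50&limit=50
```

One consequence of the `%` approach: any literal percent sign in a snippet, such as a URL-encoded character, would have to be written as `%%` to survive the substitution.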

I chose XSL because, speaking as a librarian, it seems to be familiar to a lot of metadata and digital library folks, and they could extend the capabilities of pOAIndexter without having to know Python. But all along I wanted people to be able to process the metadata and return the next URL with whatever scripting language they want, provided the interpreter exists on their system.

Say, for example, you prefer PHP. Instead of using XSL, you could use something like this:

<map name="LibriVox">
  <PHP>./PHP/LibriVox_to_Solr.php</PHP>
  <nextPHP>
  <![CDATA[
  <?php
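  $baseURL = '%s';
  $URL_params = '%s';

  // Same logic as the XSL version: "offset=0&limit=50" gives offset 50
  parse_str($URL_params, $params);
  $output = $baseURL . '?offset=' . ((int) $params['offset'] + 50) . '&limit=50';

  echo $output; // the next URL, printed to standard output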
  ?>
  ]]>
  </nextPHP>
</map>

That way PHP could be used to make the Solr-XML file and to return to pOAIndexter the next URL string so that the next batch of metadata from a feed could be processed/transformed. Of course, you could mix and match, too – XSLT for making the Solr-XML file and PHP just for getting the next URL.
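Mixing and matching could work by choosing the interpreter from the element name. The sketch below is hypothetical throughout: the `INTERPRETERS` table, the `nextJS` and `nextPython` tags, and `pick_next_snippet()` are illustrations, not pOAIndexter's actual behavior. It finds the first child whose tag starts with `next` and reports which command would run it, leaving the `<XSLT>` transform alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: which command runs each kind of next-URL snippet.
# Only nextPHP appears in the post; nextJS and nextPython are illustrative.
INTERPRETERS = {
    "nextPHP": ["php"],
    "nextJS": ["phantomjs"],
    "nextPython": ["python3"],
}


def pick_next_snippet(map_element):
    """Return (element tag, snippet text) for the first <next...> child."""
    for child in map_element:
        if child.tag.startswith("next"):
            return child.tag, (child.text or "").strip()
    raise ValueError("no <next...> element in this <map>")


# XSLT for the Solr-XML file, PHP just for the next URL.
mixed = ET.fromstring("""<map name="LibriVox">
  <XSLT>./XSLT/LibriVox_to_Solr.xsl</XSLT>
  <nextPHP><![CDATA[<?php echo 'next URL goes here'; ?>]]></nextPHP>
</map>""")

tag, snippet = pick_next_snippet(mixed)
print(tag, INTERPRETERS[tag])  # nextPHP ['php']
```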

That's actually pretty simple to do with common scripting languages like Python, PHP, Perl, Ruby, etc. But what I really wanted to support was JavaScript. Partly because, well, it would just be cool, but also because it's another one of those common languages that a lot of people know, even when the other scripting languages they know vary a lot from one colleague to the next.
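For languages with a command-line interpreter, the mechanics can be as small as this Python sketch: write the snippet to a temporary file, run the interpreter on it, and read back standard output. `run_snippet()` is a hypothetical helper, and it assumes the interpreter (for example `php`) is installed and on the PATH. The demo uses the Python interpreter running the sketch, since that one is guaranteed to exist.

```python
import os
import subprocess
import sys
import tempfile


def run_snippet(code, command, suffix):
    """Write `code` to a temp file, run `command + [path]`, return stdout.

    `command` is the interpreter invocation, e.g. ["php"] (hypothetical
    usage; the interpreter must already be installed and on the PATH).
    """
    # Strip the indentation picked up inside the CDATA block: PHP, for one,
    # echoes anything that sits before its opening <?php tag.
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as f:
        f.write(code.strip() + "\n")
        path = f.name
    try:
        result = subprocess.run(
            command + [path], capture_output=True, text=True, check=True
        )
    finally:
        os.remove(path)
    return result.stdout.strip()


# Swap in ["php"] and ".php" for a <nextPHP> snippet.
snippet = """
    print("https://example.org/api/feed?offset=50&limit=50")
"""
print(run_snippet(snippet, [sys.executable], ".py"))
```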

But I didn't know how to run JavaScript from the command line in a way that lets pOAIndexter capture the next URL from the standard output stream.

Well, enter PhantomJS.
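PhantomJS runs a script file given on the command line (`phantomjs script.js`), sends `console.log()` output to standard output, and keeps running until the script calls `phantom.exit()`. A minimal Python sketch of driving it along those lines follows; the `NEXT_JS` snippet, its example.org URL, and both helper functions are hypothetical, and the run step is skipped when `phantomjs` isn't on the PATH.

```python
import os
import shutil
import subprocess
import tempfile

# Hypothetical <nextJS> snippet with both placeholders already filled in.
NEXT_JS = """
var baseURL = 'https://example.org/api/feed';
var URL_params = 'offset=0&limit=50';
var offset = parseInt(URL_params.split('=')[1].split('&')[0], 10) + 50;
console.log(baseURL + '?offset=' + offset + '&limit=50');
"""


def prepare_for_phantomjs(code):
    """PhantomJS keeps running until phantom.exit() is called, so append it."""
    code = code.strip()
    if "phantom.exit(" not in code:
        code += "\nphantom.exit();"
    return code + "\n"


def run_with_phantomjs(code):
    exe = shutil.which("phantomjs")
    if exe is None:
        raise RuntimeError("phantomjs is not on PATH")
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(prepare_for_phantomjs(code))
        path = f.name
    try:
        # console.log() output from the script arrives on stdout.
        out = subprocess.run([exe, path], capture_output=True, text=True,
                             check=True, timeout=30).stdout
    finally:
        os.remove(path)
    return out.strip()


if shutil.which("phantomjs"):
    print(run_with_phantomjs(NEXT_JS))
```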

That is all. Time to pack.
