<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blog.humaneguitarist.org &#187; XML</title>
	<atom:link href="http://blog.humaneguitarist.org/category/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.humaneguitarist.org</link>
	<description>discoveries in digital audio, music notation, and information encoding</description>
	<lastBuildDate>Tue, 07 Feb 2012 03:33:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>geo this, geo that: easy acquisition of KML files with BatchGeo</title>
		<link>http://blog.humaneguitarist.org/2012/01/28/geo-this-geo-that-easy-acquisition-of-kml-files-with-batchgeo/</link>
		<comments>http://blog.humaneguitarist.org/2012/01/28/geo-this-geo-that-easy-acquisition-of-kml-files-with-batchgeo/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 14:52:44 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[technophilia]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[BatchGeo]]></category>
		<category><![CDATA[geolocation]]></category>
		<category><![CDATA[KML]]></category>
		<category><![CDATA[laziness]]></category>
		<category><![CDATA[maps]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=4072</guid>
		<description><![CDATA[Geolocation/geocoding is so &#34;hip&#34; these days. Everyone&#39;s so obsessed with where they and other things are. There&#39;s almost a comparison to be made with 3-D filmmaking &#8230; Funny. Not too many folks seem all that concerned with when things are. Anyway &#8230; At work, we have a database with all the libraries we serve and their addresses. And [...]]]></description>
			<content:encoded><![CDATA[<p>Geolocation/geocoding is so &quot;hip&quot; these days. Everyone&#39;s so obsessed with where they and other things are. There&#39;s almost a comparison to be made with 3-D filmmaking &#8230;</p>
<p>Funny. Not too many folks seem all that concerned with <em>when</em> things are.</p>
<p>Anyway &#8230;</p>
<p>At work, we have a database with all the libraries we serve and their addresses. And the other week we needed to quickly make a map with all their locations.</p>
<p>If necessity is the mother of invention, laziness is its favorite uncle.</p>
<p>Enter <a href="http://batchgeo.com/">BatchGeo</a>. We were able to take those values from our database and get a map generated in minutes. But it gets better.</p>
<p>One of the nice things about this process is that in addition to a map, you also get the option to download a <a href="http://code.google.com/apis/kml/">KML</a> file. From this little XML file, it&#39;s a simple process (via XSL or otherwise) to make a delimited file containing the institution names as entered and their latitude and longitude (altitude is also available).</p>
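<p>Here&#39;s a minimal sketch of the &quot;otherwise&quot; route, using Python&#39;s standard library instead of XSL. It&#39;s an illustration only: the sample KML is made up, and the structure (each institution as a Placemark with a name and a Point) is an assumption about what a BatchGeo export looks like. One thing worth knowing is that KML lists coordinates as longitude,latitude[,altitude], in that order, so the values get swapped for a latitude-first file.</p>
<pre class="brush:python">import csv
import io
import xml.etree.ElementTree as ET

# Made-up KML standing in for a BatchGeo export (structure assumed).
SAMPLE_KML = """&lt;kml xmlns="http://www.opengis.net/kml/2.2"&gt;
  &lt;Document&gt;
    &lt;Placemark&gt;
      &lt;name&gt;Example Public Library&lt;/name&gt;
      &lt;Point&gt;&lt;coordinates&gt;-78.6382,35.7796,0&lt;/coordinates&gt;&lt;/Point&gt;
    &lt;/Placemark&gt;
    &lt;Placemark&gt;
      &lt;name&gt;Another Branch Library&lt;/name&gt;
      &lt;Point&gt;&lt;coordinates&gt;-80.8431,35.2271,0&lt;/coordinates&gt;&lt;/Point&gt;
    &lt;/Placemark&gt;
  &lt;/Document&gt;
&lt;/kml&gt;"""


def local(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def kml_to_rows(kml_text):
    """Yield (name, latitude, longitude, altitude) for each Placemark."""
    root = ET.fromstring(kml_text)
    for placemark in root.iter():
        if local(placemark.tag) != "Placemark":
            continue
        name, coords = "", ""
        for elem in placemark.iter():
            if local(elem.tag) == "name":
                name = (elem.text or "").strip()
            elif local(elem.tag) == "coordinates":
                coords = (elem.text or "").strip()
        if not coords:
            continue  # e.g. a Placemark without a Point
        # KML order is longitude,latitude[,altitude].
        parts = coords.split(",")
        altitude = parts[2] if len(parts) &gt; 2 else ""
        yield name, parts[1], parts[0], altitude


def kml_to_csv(kml_text):
    """Return the Placemarks as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "latitude", "longitude", "altitude"])
    writer.writerows(kml_to_rows(kml_text))
    return out.getvalue()


print(kml_to_csv(SAMPLE_KML))</pre>
<p>If you&#39;d rather stay in XSL, the same idea is a template over the Placemark elements that splits the coordinates string with substring-before() and substring-after().</p>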
<p>From there, it&#39;s not brain surgery to get those coordinates into a database and use an SQL JOIN to pull out an institution&#39;s name, and now its coordinates, too, whenever we need them.</p>
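<p>To make the JOIN part concrete, here&#39;s a small sketch using Python&#39;s built-in sqlite3 module. The table and column names are hypothetical, and joining on the institution name assumes the names in the KML-derived file match the names in the database exactly.</p>
<pre class="brush:python">import sqlite3

# Hypothetical schema; a real database will have its own tables and columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE institutions (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE coordinates (name TEXT PRIMARY KEY, latitude REAL, longitude REAL);
""")
conn.executemany(
    "INSERT INTO institutions (name, address) VALUES (?, ?)",
    [("Example Public Library", "1 Main St"),
     ("Another Branch Library", "2 Oak Ave")],
)
# Rows as they might come out of the KML-derived delimited file.
conn.executemany(
    "INSERT INTO coordinates VALUES (?, ?, ?)",
    [("Example Public Library", 35.7796, -78.6382),
     ("Another Branch Library", 35.2271, -80.8431)],
)

# Push out each institution's name along with its coordinates.
rows = conn.execute("""
    SELECT i.name, c.latitude, c.longitude
    FROM institutions AS i
    JOIN coordinates AS c ON c.name = i.name
    ORDER BY i.name
""").fetchall()

for name, lat, lon in rows:
    print(name, lat, lon)</pre>
<p>Joining on a shared ID column instead of the name would be sturdier if two institutions could ever share a name.</p>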
<p>Just in case someone wants/needs to do something similar with an address book or a list of businesses, etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2012/01/28/geo-this-geo-that-easy-acquisition-of-kml-files-with-batchgeo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>layer cake: XML config files with XSL inside CDATA</title>
		<link>http://blog.humaneguitarist.org/2011/11/12/layer-cake-xml-config-files-with-xsl-inside-cdata/</link>
		<comments>http://blog.humaneguitarist.org/2011/11/12/layer-cake-xml-config-files-with-xsl-inside-cdata/#comments</comments>
		<pubDate>Sat, 12 Nov 2011 17:18:51 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[XML]]></category>
		<category><![CDATA[CDATA]]></category>
		<category><![CDATA[dessert]]></category>
		<category><![CDATA[XSL]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=3599</guid>
		<description><![CDATA[Sometimes in life &#8211; or coding projects &#8211; there are regrets. But there is cake, too. Anyway, for a current project I want to place some XSL inside an XML config file. But of course, you can&#39;t just drop XML inside XML without coating it in something. So for another project, PubMed2XL, I did something [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes in life &#8211; or coding projects &#8211; there are regrets.</p>
<p>But there is cake, too.</p>
<p><img alt="yummy looking layer cake" class="alignnone" height="269" src="http://www.wilton.com/img/spumoni-layer-cake-main.jpg" title="layer cake" width="269" /></p>
<p>Anyway, for a current project I want to place some XSL inside an XML config file.</p>
<p>But of course, you can&#39;t just drop XML inside XML without coating it in something.</p>
<p>So for another project, <a href="http://blog.humaneguitarist.org/projects/pubmed2xl/">PubMed2XL</a>, I did something like this:</p>
<pre class="brush:xml">&lt;cell&gt;{{?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?}}
  {{xsl:stylesheet version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;}}
  {{xsl:output method = &quot;text&quot; /}}
  {{xsl:template match=&quot;/&quot;}}
    {{xsl:value-of select=&quot;//PMID&quot; /}}
  {{/xsl:template}}
  {{/xsl:stylesheet}}
&lt;/cell&gt;
</pre>
<p>Putting XSL inside double curly brackets works just fine, but now I know a better way: just put it inside a <a href="http://www.w3schools.com/xml/xml_cdata.asp">CDATA</a> section!</p>
<pre class="brush:xml">&lt;map name=&quot;LibriVox&quot;&gt;
  &lt;XSLT&gt;./XSLT/LibriVox_to_Solr.xsl&lt;/XSLT&gt;
  &lt;nextXSL&gt;
  &lt;![CDATA[
  &lt;xsl:stylesheet version=&quot;1.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;
    &lt;xsl:output method=&quot;text&quot;/&gt;
    &lt;xsl:template match=&quot;/&quot;&gt;
      &lt;xsl:variable name=&quot;baseURL&quot; select=&quot;&#39;%s&#39;&quot; /&gt;
      &lt;xsl:variable name=&quot;URL_params&quot; select=&quot;&#39;%s&#39;&quot; /&gt;
      &lt;xsl:variable name=&quot;offset_&quot; select=&quot;substring-after($URL_params,&#39;=&#39;)&quot; /&gt;
      &lt;xsl:variable name=&quot;offset&quot; select=&quot;substring-before($offset_,&#39;&amp;amp;&#39;)&quot; /&gt;
      &lt;xsl:variable name=&quot;limit_&quot; select=&quot;substring-after($URL_params,&#39;&amp;amp;&#39;)&quot; /&gt;
      &lt;xsl:variable name=&quot;limit&quot; select=&quot;substring-after($limit_,&#39;=&#39;)&quot; /&gt;
      &lt;xsl:variable name=&quot;output&quot;&gt;
        &lt;xsl:value-of select=&quot;$baseURL&quot; /&gt;
        &lt;xsl:text&gt;?offset=&lt;/xsl:text&gt;
        &lt;xsl:value-of select=&quot;$offset+50&quot; /&gt;
        &lt;xsl:text&gt;&amp;amp;limit=&lt;/xsl:text&gt;
        &lt;xsl:value-of select=&quot;50&quot; /&gt;
      &lt;/xsl:variable&gt;
      &lt;xsl:value-of select=&quot;$output&quot; /&gt;
    &lt;/xsl:template&gt;
  &lt;/xsl:stylesheet&gt;
  ]]&gt;
  &lt;/nextXSL&gt;
&lt;/map&gt;
</pre>
<p>Duh. I guess that, like a good XML parser, I always just ignored anything inside a CDATA section. I never thought I&#39;d need to use one.</p>
<p>Putting the XSL inside a CDATA section worked like a charm in terms of being able to read it with a script and perform an XSLT with it.</p>
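<p>As a sketch of that &quot;read it with a script&quot; step, here&#39;s how Python&#39;s standard ElementTree hands back the CDATA contents as plain text. The config is a trimmed-down version of the one above, and filling the two &#39;%s&#39; placeholders with Python %-formatting is an assumption about how they get used, not something stated here.</p>
<pre class="brush:python">import xml.etree.ElementTree as ET

# A trimmed-down config in the same shape as the one above (XSL shortened).
CONFIG = """&lt;map name="LibriVox"&gt;
  &lt;XSLT&gt;./XSLT/LibriVox_to_Solr.xsl&lt;/XSLT&gt;
  &lt;nextXSL&gt;
  &lt;![CDATA[
  &lt;xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    &lt;xsl:output method="text"/&gt;
    &lt;xsl:template match="/"&gt;
      &lt;xsl:variable name="baseURL" select="'%s'" /&gt;
      &lt;xsl:variable name="URL_params" select="'%s'" /&gt;
      &lt;xsl:value-of select="concat($baseURL, '?', $URL_params)" /&gt;
    &lt;/xsl:template&gt;
  &lt;/xsl:stylesheet&gt;
  ]]&gt;
  &lt;/nextXSL&gt;
&lt;/map&gt;"""

config = ET.fromstring(CONFIG)
# The parser hands back the CDATA section as ordinary text, tags and all.
xsl_template = config.findtext("nextXSL").strip()

# Assumption: the two '%s' placeholders are filled with Python %-formatting.
# Any "&amp;" in a filled-in value must be written as "&amp;amp;", because the
# result gets parsed as XML.
xsl_source = xsl_template % ("http://example.org/api", "offset=0&amp;amp;limit=50")

# Parsing it again shows the extracted XSL is well-formed XML.
stylesheet = ET.fromstring(xsl_source)
print(stylesheet.tag)</pre>
<p>With lxml installed, the same string could then be compiled with etree.XSLT(etree.XML(xsl_source)) and applied to a parsed document.</p>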
<p>Luckily, the PubMed2XL script can use either the CDATA way of embedding XSL or my <a href="http://en.wikipedia.org/wiki/Curly_Howard">Curly</a> solution &#8211; not that I knew that when I wrote it!</p>
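<p>For the curious, undoing the Curly notation can be as simple as two string replacements. This is a hypothetical helper, not necessarily what PubMed2XL does internally. One catch: XSLT itself uses {{ and }} inside attribute value templates to mean a literal brace, so this trick would mangle any stylesheet that relies on that, which is one more point in CDATA&#39;s favor.</p>
<pre class="brush:python">def uncurl(text):
    """Hypothetical helper: turn {{...}} back into &lt;...&gt;."""
    return text.replace("{{", "&lt;").replace("}}", "&gt;")


curly = '{{xsl:value-of select="//PMID" /}}'
print(uncurl(curly))  # &lt;xsl:value-of select="//PMID" /&gt;</pre>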
<p>It&#39;s certainly easier to cut/paste the XSL into the CDATA block without having to swap the angle brackets for curly brackets or vice versa. It&#39;s also just easier to read, which makes it easier to edit and troubleshoot. And it tastes better, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2011/11/12/layer-cake-xml-config-files-with-xsl-inside-cdata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>pretty printing XML with Python, lxml, and XSLT</title>
		<link>http://blog.humaneguitarist.org/2011/11/12/pretty-printing-xml-with-python-lxml-and-xslt/</link>
		<comments>http://blog.humaneguitarist.org/2011/11/12/pretty-printing-xml-with-python-lxml-and-xslt/#comments</comments>
		<pubDate>Sat, 12 Nov 2011 16:05:04 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[XML]]></category>
		<category><![CDATA[lxml]]></category>
		<category><![CDATA[pretty printing]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=3597</guid>
		<description><![CDATA[Last week or so I was doing some work with Python and lxml. And, like a lot of people, I found that lxml&#39;s pretty printing wasn&#39;t really doing anything for me. I couldn&#39;t find any native lxml solutions to make my XML look pretty. All I found were some functions on various code sites written [...]]]></description>
			<content:encoded><![CDATA[<p>Last week or so I was doing some work with Python and <a href="http://lxml.de/">lxml</a>. And, like a lot of people, I found that <a href="http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output">lxml&#39;s pretty printing</a> wasn&#39;t really doing anything for me.</p>
<p>I couldn&#39;t find any native lxml solutions to make my XML look pretty. All I found were some functions on various code sites written by people to pretty print the XML using a bunch of regular expressions. Yuck.</p>
<p>So I thought, &quot;Why not use XSLT to pretty print my XML?&quot; and I found an XSL written by none other than <a href="http://en.wikipedia.org/wiki/Michael_Kay_%28software_engineer%29">Michael Kay</a> on <a href="http://www.dpawson.co.uk/xsl/sect2/pretty.html#d8621e19">this</a> page (see comment #4).</p>
<p>And it seems to work just fine as a function to return pretty XML, not to mention it&#39;s super short and sweet.</p>
<p>Anyway, here&#39;s an example of using the XSL for pretty printing.</p>
<pre class="brush:python">from lxml import etree

def prettify(someXML):
  #for more on lxml/XSLT see: http://lxml.de/xpathxslt.html#xslt-result-objects
  xslt_tree = etree.XML(&#39;&#39;&#39;\
    &lt;!-- XSLT taken from Comment 4 by Michael Kay found here:
    http://www.dpawson.co.uk/xsl/sect2/pretty.html#d8621e19 --&gt;
    &lt;xsl:stylesheet version=&quot;1.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;
    &lt;xsl:output method=&quot;xml&quot; indent=&quot;yes&quot; encoding=&quot;UTF-8&quot;/&gt;
      &lt;xsl:strip-space elements=&quot;*&quot;/&gt;
      &lt;xsl:template match=&quot;/&quot;&gt;
        &lt;xsl:copy-of select=&quot;.&quot;/&gt;
      &lt;/xsl:template&gt;
    &lt;/xsl:stylesheet&gt;&#39;&#39;&#39;)
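  transform = etree.XSLT(xslt_tree) # compile the stylesheet
  result = transform(someXML) # apply it to the parsed document
  return str(result) # serialize the result tree (Python 3; was unicode(result) on Python 2)

myXML = etree.XML(&#39;&lt;a&gt;&lt;b&gt;&lt;c&gt;&lt;d/&gt;&lt;/c&gt;&lt;/b&gt;&lt;/a&gt;&#39;)
print(prettify(myXML))</pre>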
<p>The example above would output the following:</p>
<p><code>&gt;&gt;&gt; <br />
	&lt;?xml version=&quot;1.0&quot;?&gt;<br />
	&lt;a&gt;<br />
	&nbsp; &lt;b&gt;<br />
	&nbsp;&nbsp;&nbsp; &lt;c&gt;<br />
	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;d/&gt;<br />
	&nbsp;&nbsp;&nbsp; &lt;/c&gt;<br />
	&nbsp; &lt;/b&gt;<br />
	&lt;/a&gt;<br />
	</code></p>
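<p>For what it&#39;s worth, there are non-XSLT routes too. The lxml FAQ linked above suggests parsing with etree.XMLParser(remove_blank_text=True) so that pretty_print=True has whitespace it&#39;s free to rewrite. And if lxml isn&#39;t a requirement, Python 3.9 and later ship ElementTree.indent() in the standard library; here&#39;s a minimal sketch of that swapped-in approach.</p>
<pre class="brush:python">import xml.etree.ElementTree as ET


def prettify_stdlib(xml_text, space="  "):
    """Re-indent XML using only the standard library (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)  # replaces whitespace-only text and tails
    return ET.tostring(root, encoding="unicode")


print(prettify_stdlib("&lt;a&gt;&lt;b&gt;&lt;c&gt;&lt;d/&gt;&lt;/c&gt;&lt;/b&gt;&lt;/a&gt;"))</pre>
<p>Unlike the XSLT version, this prints no XML declaration and writes the empty element as &lt;d /&gt;.</p>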
<p>By the way, I don&#39;t even need to see the XML I&#39;m processing most of the time, so why all the pretty printing fuss?</p>
<p>Well, because it bothers me &#8230;</p>
<p>And all good XML should look like an <a href="http://en.wikipedia.org/wiki/X-wing">X-wing</a> starfighter. If it doesn&#39;t, you&#39;re probably doing something wrong or your schema just sucks.</p>
<p>It isn&#39;t called an <u>X</u>-wing for no reason.</p>
<p><img alt=":P" src="http://blog.humaneguitarist.org/wp-content/plugins/fckeditor-for-wordpress-plugin/ckeditor/plugins/smiley/images/tounge_smile.gif" title=":P" /></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2011/11/12/pretty-printing-xml-with-python-lxml-and-xslt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>indexing and searching timed text with Solr</title>
		<link>http://blog.humaneguitarist.org/2011/10/16/indexing-and-searching-timed-text-with-solr/</link>
		<comments>http://blog.humaneguitarist.org/2011/10/16/indexing-and-searching-timed-text-with-solr/#comments</comments>
		<pubDate>Sun, 16 Oct 2011 14:54:33 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[digital audio]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[timed text]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=3449</guid>
		<description><![CDATA[I&#39;m still learning about Solr so maybe this post is much ado about nothing. But according to this nabble.com thread, one can&#39;t index a source XML document in Solr with its native XML structure intact and then in turn search that structure as one can in an XML database like BaseX. For most things, that&#39;s [...]]]></description>
			<content:encoded><![CDATA[<p>I&#39;m still learning about Solr so maybe this post is much ado about nothing. But according to <a href="http://lucene.472066.n3.nabble.com/Storing-indexing-and-searching-XML-documents-in-Solr-tp2958452p2959272.html">this</a> nabble.com thread, one can&#39;t index a source XML document in Solr with its native XML structure intact and then in turn search that structure as one can in an XML database like <a href="http://basex.org/">BaseX</a>.</p>
<p>For most things, that&#39;s fine. For titles, creators, descriptions, etc., I just need to index the value of a given element like &lt;title&gt; so that I can search for that element&#39;s value.</p>
<p>But for timed text, it&#39;s different. Or at least, it can be.</p>
<p>Say I have this DFXP snippet for an audio file with an &quot;id&quot; value of &quot;XYZ&quot;.</p>
<p><code>&lt;p begin=&quot;10.0s&quot; end=&quot;30.0s&quot;&gt;Hello world!&lt;/p&gt;</code></p>
<p>I would need the user to be able to search for the string &quot;Hello world!&quot; or part of it, but I would also need to index at least the value of the &quot;begin&quot; attribute, so that if the user clicks on the &quot;Hello world!&quot; line in their search results, I can pass that value to a page that will play the file &quot;XYZ&quot; starting at the 10 second mark. And I don&#39;t want the &quot;10&quot; second value to be something they search against, since they might be searching for the string &quot;10&quot; within the text itself.</p>
<p>So I&#39;m wondering how to do that with Solr.</p>
<p>Maybe when I learn more I&#39;ll discover a better way to do this, but for now I&#39;m thinking I could do the following:</p>
<p>First, I would pretty much index the timed text twice in Solr.</p>
<p><code>&lt;doc&gt;<br />
	&nbsp; &lt;field name=&quot;id&quot;&gt;XYZ&lt;/field&gt;<br />
	...<br />
	&nbsp; &lt;field name=&quot;timedText-stripped&quot;&gt;Hello world!&lt;/field&gt;<br />
	&nbsp; &lt;field name=&quot;timedText&quot;&gt;Hello world! {10}&lt;/field&gt;<br />
	&lt;/doc&gt;<br />
	</code></p>
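<p>Building those two fields from the DFXP could be scripted. Here&#39;s a minimal Python sketch using the standard library&#39;s ElementTree. Assumptions: the sample DFXP is made up; &quot;begin&quot; values are offsets in seconds like &quot;10.0s&quot; (clock times such as 00:00:10.0 aren&#39;t handled); and the output is a bare &lt;doc&gt; that would still need to be wrapped in &lt;add&gt; before posting to Solr, with both timed-text fields declared multiValued in the schema, since each line adds another value.</p>
<pre class="brush:python">import xml.etree.ElementTree as ET

# Made-up DFXP/TTML sample; real files will carry more structure.
DFXP = """&lt;tt xmlns="http://www.w3.org/ns/ttml"&gt;
  &lt;body&gt;&lt;div&gt;
    &lt;p begin="10.0s" end="30.0s"&gt;Hello world!&lt;/p&gt;
    &lt;p begin="30.0s" end="42.5s"&gt;Goodbye world!&lt;/p&gt;
  &lt;/div&gt;&lt;/body&gt;
&lt;/tt&gt;"""


def local(tag):
    """Drop the '{namespace}' prefix so any DFXP/TTML namespace works."""
    return tag.rsplit("}", 1)[-1]


def seconds(offset):
    """'10.0s' -&gt; '10'; assumes offset-time in seconds only."""
    return f"{float(offset.rstrip('s')):g}"


def solr_doc(media_id, dfxp_text):
    """Index each &lt;p&gt; twice: plain text, and text plus {begin}."""
    doc = ET.Element("doc")
    ET.SubElement(doc, "field", name="id").text = media_id
    for p in ET.fromstring(dfxp_text).iter():
        if local(p.tag) != "p":
            continue
        text = "".join(p.itertext()).strip()
        ET.SubElement(doc, "field", name="timedText-stripped").text = text
        tagged = f"{text} {{{seconds(p.get('begin'))}}}"
        ET.SubElement(doc, "field", name="timedText").text = tagged
    return ET.tostring(doc, encoding="unicode")


print(solr_doc("XYZ", DFXP))</pre>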
<p>Besides the &quot;id&quot; of the audio file, this would index:</p>
<ul>
<li>just the text &quot;Hello world!&quot;</li>
<li>the text of &quot;Hello world!&quot; with the &quot;begin&quot; attribute value in curly braces.</li>
</ul>
<p>I guess this way the user could be made to search across the &quot;timedText-stripped&quot; field, while the XSL that can be passed to Solr to display results could render the &quot;timedText&quot; field so that the text &quot;Hello world!&quot; links to whatever page will play file &quot;XYZ&quot; starting at the 10 second mark. Basically, by planting the &quot;begin&quot; value in curly braces, I can parse the string for the text and the &quot;begin&quot; value as separate things.</p>
<p>So, here&#39;s a really crappy XSL snippet that would do something like that. It assumes a variable &quot;$id&quot; exists that equals &quot;XYZ&quot;, the identifier for the example audio file.</p>
<pre class="brush:xml">&lt;xsl:for-each select=&quot;//field[@name=&#39;timedText&#39;]&quot;&gt;
  &lt;xsl:variable name=&quot;whole&quot;&gt;
    &lt;xsl:value-of select=&quot;.&quot;/&gt;
    &lt;!-- Gets entire element string --&gt;
  &lt;/xsl:variable&gt;
  &lt;xsl:variable name=&quot;text&quot;&gt;
    &lt;xsl:value-of select=&quot;substring-before($whole,&#39;{&#39;)&quot;/&gt;
    &lt;!-- Gets text prior to seconds --&gt;
  &lt;/xsl:variable&gt;
  &lt;xsl:variable name=&quot;begin&quot;&gt;
    &lt;xsl:value-of select=&quot;substring-before(substring-after($whole,&#39;{&#39;),&#39;}&#39;)&quot;/&gt;
    &lt;!-- Gets seconds value from end of string --&gt;
  &lt;/xsl:variable&gt;
  &lt;a href=&quot;someMediaPlayer.php?id={$id}&amp;amp;begin={$begin}&quot;&gt;
    &lt;xsl:value-of select=&quot;$text&quot;/&gt;
  &lt;/a&gt;
  &lt;!-- So, I&#39;m saying that
  &quot;someMediaPlayer.php?id=XYZ&amp;begin=10&quot;
  would launch a player that would start file XYZ at the 10 seconds mark.
  --&gt;
&lt;/xsl:for-each&gt;
</pre>
<p>The search output would be some HTML code like so:</p>
<p><code>&lt;a href=&quot;someMediaPlayer.php?id=XYZ&amp;amp;begin=10&quot;&gt;Hello world!&lt;/a&gt;</code></p>
<p>It seems weird to index something twice, more or less, but as user Erick says in the nabble.com thread, &quot;You&#39;ve gotta take off your DB hat and not worry about duplicating data.&quot;</p>
<p>But now as I write this, I&#39;m wondering if I can&#39;t just index as follows:</p>
<p><code>&nbsp; &lt;field name=&quot;text&quot;&gt;Hello world!&lt;/field&gt;<br />
	&nbsp; &lt;field name=&quot;begin&quot;&gt;10&lt;/field&gt;</code></p>
<p>and trust that for each &quot;text&quot; field there will be a matching &quot;begin&quot; field, so that the two can be used in tandem to create the same HTML link as above. Sounds like I need to play around some more.</p>
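<p>If the parallel-field idea pans out, the pairing step itself is easy. Here&#39;s a minimal sketch with made-up values; it assumes the search results hand back a document&#39;s &quot;text&quot; and &quot;begin&quot; values as two lists in the order they were indexed, which is exactly the part that still needs testing.</p>
<pre class="brush:python">from html import escape

# Made-up stored values for document "XYZ", assumed to come back in index order.
texts = ["Hello world!", "Goodbye world!"]
begins = ["10", "30"]


def links(media_id, texts, begins):
    """Pair the i-th text with the i-th begin value and build player links."""
    if len(texts) != len(begins):
        raise ValueError("every text value needs a matching begin value")
    return [
        f'&lt;a href="someMediaPlayer.php?id={escape(media_id)}&amp;amp;begin={escape(b)}"&gt;'
        f"{escape(t)}&lt;/a&gt;"
        for t, b in zip(texts, begins)
    ]


for link in links("XYZ", texts, begins):
    print(link)</pre>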
<p><img alt=":)" src="http://blog.humaneguitarist.org/wp-content/plugins/fckeditor-for-wordpress-plugin/ckeditor/plugins/smiley/images/regular_smile.gif" title=":)" /></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2011/10/16/indexing-and-searching-timed-text-with-solr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>learning about XProc on a Sunday morning</title>
		<link>http://blog.humaneguitarist.org/2011/08/28/learning-about-xproc-on-a-sunday-morning/</link>
		<comments>http://blog.humaneguitarist.org/2011/08/28/learning-about-xproc-on-a-sunday-morning/#comments</comments>
		<pubDate>Sun, 28 Aug 2011 14:28:45 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[XML]]></category>
		<category><![CDATA[Daisy]]></category>
		<category><![CDATA[pipe]]></category>
		<category><![CDATA[XML processing]]></category>
		<category><![CDATA[XProc]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=3046</guid>
		<description><![CDATA[There are some cool PowerPoint slides on the xfront.com page about XProc, which I didn&#39;t know anything about until today. I like the idea of a one-stop-shop for all kinds of XML processing, but I think unless I had a specific need to use it I&#39;d probably use a Python script or something to sequentially [...]]]></description>
			<content:encoded><![CDATA[<p>There are some cool <a href="http://www.xfront.com/xproc/xproc.zip">PowerPoint slides</a> on the <a href="http://www.xfront.com/xproc/">xfront.com</a> page about <a href="http://www.w3.org/TR/xproc/">XProc</a>, which I didn&#39;t know anything about until today.</p>
<p>I like the idea of a one-stop shop for all kinds of XML processing, but unless I had a specific need for it, I think I&#39;d probably use a Python script or something to sequentially do some batch XML work on a given document. That kind of one-off scripting is exactly what XProc is meant to replace, but I guess it all depends on one&#39;s needs. I should certainly think about it in terms of doing things with MusicXML, though.</p>
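<p>For comparison, the &quot;Python script&quot; approach can be as small as a list of functions run in order. This is a toy sketch of that idea, not XProc and not any library&#39;s pipeline API; the step functions are hypothetical, and each one takes and returns an ElementTree element.</p>
<pre class="brush:python">from functools import reduce
import xml.etree.ElementTree as ET


def strip_blank_text(root):
    """Drop whitespace-only text and tails left over from indentation."""
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return root


def uppercase_titles(root):
    """Hypothetical step: upper-case every &lt;title&gt;."""
    for title in root.iter("title"):
        title.text = (title.text or "").upper()
    return root


def mark_processed(root):
    """Hypothetical step: flag the root element."""
    root.set("processed", "yes")
    return root


def run_pipeline(xml_text, steps):
    """Parse once, apply each step in order, serialize once."""
    root = reduce(lambda doc, step: step(doc), steps, ET.fromstring(xml_text))
    return ET.tostring(root, encoding="unicode")


print(run_pipeline("&lt;doc&gt;\n  &lt;title&gt;Etude&lt;/title&gt;\n&lt;/doc&gt;",
                   [strip_blank_text, uppercase_titles, mark_processed]))</pre>
<p>XProc&#39;s pitch is to describe pipelines like this declaratively in XML rather than in a one-off script.</p>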
<p>Anyway, I&#39;ve only been through one slideshow so far, and it&#39;s long at about 170 slides, but I found it well done and easy to understand.</p>
<p>Also, there&#39;s a list of XProc implementations <a href="http://xproc.org/implementations/">here</a> &#8211; Java, Java, Java &#8230;</p>
<p>Apparently, there used to be a Python implementation on <a href="https://github.com/bendyer/py-xproc">GitHub</a>, but it&#39;s pulling a 404. Bummer. Well, at least GitHub&#39;s 404 message is a cool homage to Star Wars!</p>
<p><img alt="GitHub 404" height="266" src="http://www.jacho.net/images/fun/github404.png" width="640" /></p>
<p>Lastly, this <a href="http://code.google.com/p/daisy-pipeline/wiki/XProcOverview">daisy-pipeline</a> for Daisy talking books looks interesting, too.</p>
<p>So is this post just a fancy way for me to save bookmarks for my future use or what?</p>
<p><img alt=":P" src="http://blog.humaneguitarist.org/wp-content/plugins/fckeditor-for-wordpress-plugin/ckeditor/plugins/smiley/images/tounge_smile.gif" title=":P" /></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2011/08/28/learning-about-xproc-on-a-sunday-morning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>fun with lxml, part 2</title>
		<link>http://blog.humaneguitarist.org/2011/04/09/fun-with-lxml-part-2/</link>
		<comments>http://blog.humaneguitarist.org/2011/04/09/fun-with-lxml-part-2/#comments</comments>
		<pubDate>Sat, 09 Apr 2011 15:56:34 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[XML]]></category>
		<category><![CDATA[libxml]]></category>
		<category><![CDATA[test code]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=2586</guid>
		<description><![CDATA[Just following up on a previous post from about a month ago &#8230; Per a request, I need to tweak some software of mine to allow a user to specify a parent element in an XML document and in turn retrieve child element values. Big deal. That&#39;s what XSLT is for &#8211; blah, blah, blah. [...]]]></description>
			<content:encoded><![CDATA[<p>Just following up on a <a href="http://blog.humaneguitarist.org/2011/03/07/fun-with-libxml/">previous post </a>from about a month ago &#8230;</p>
<p>Per a request, I need to tweak some software of mine to allow a user to specify a parent element in an XML document and in turn retrieve child element values. Big deal. That&#39;s what XSLT is for &#8211; blah, blah, blah. But this is particularly for PubMed XML exports and <a href="http://blog.humaneguitarist.org/projects/pubmed2xl/">turning those into Excel files</a>.</p>
<p>Anyway, a given child element&#39;s value needs to be selectable (e.g. by position) and placed into an Excel cell. Alternatively, it should be possible to place all the children&#39;s values into one cell, separated by a delimiter.</p>
<p>So before I try and tinker with the software I want to work a solution out using test code:</p>
<pre class="brush:python">from lxml import etree

##### Step 1
    # make an XML example
xml = &#39;&lt;a&gt;  \
            &lt;b&gt;  \
                &lt;c&gt;cee1&lt;/c&gt;  \
                &lt;d&gt;dee1&lt;/d&gt;  \
                &lt;c&gt;cee2&lt;/c&gt;  \
                &lt;d&gt;dee2&lt;/d&gt;  \
            &lt;/b&gt;  \
            &lt;b&gt;bee&lt;/b&gt;  \
            &lt;c&gt;cee3&lt;/c&gt; \
        &lt;/a&gt;&#39;

##### Step 2
    # parse the XML example
parseXML = etree.XML(xml)

##### Step 3
    # get the first (i.e. the zero-th) &lt;b&gt; element
first_b = parseXML.findall(&#39;.//b&#39;)[0]

##### Step 4
    # get a list of all the children of that first &lt;b&gt; element
b_childList = list(first_b) # list(element) replaces the deprecated getchildren()

##### Step 5
    # make a new list called &quot;c_list&quot; with only &lt;c&gt; elements
    # that are children of our first &lt;b&gt; element

c_list = [] # make an empty list to put things in and
# place into that list only element *values* for child elements
# of first &lt;b&gt; element from children that are &lt;c&gt; elements only
for child in b_childList:
    if child.tag == &#39;c&#39;:
        c_list.append(child.text)

##### Step 6
    # print desired results

for c in c_list: #print all values, one per line
    print (c)

print (&#39;-&#39;*4) # print dash line for reading ease
print (&#39;; &#39;.join(c_list)) # print all values on one line with a delimiter

print (&#39;-&#39;*4)
print (c_list[1]) #print only the second &lt;c&gt; element value
</pre>
<p>Here are the results:<code><br />
	&gt;&gt;&gt; <br />
	cee1<br />
	cee2<br />
	----<br />
	cee1; cee2<br />
	----<br />
	cee2</code></p>
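<p>Steps 3 through 5 can also be collapsed into a single path expression. Here&#39;s a sketch using the standard library&#39;s ElementTree, swapped in so it runs without lxml; lxml&#39;s own findall() should accept the same ElementPath expression, and its xpath() method supports full XPath, e.g. (//b)[1]/c/text(). The XML is the same example as above, reformatted.</p>
<pre class="brush:python">import xml.etree.ElementTree as ET

xml = """&lt;a&gt;
  &lt;b&gt;&lt;c&gt;cee1&lt;/c&gt;&lt;d&gt;dee1&lt;/d&gt;&lt;c&gt;cee2&lt;/c&gt;&lt;d&gt;dee2&lt;/d&gt;&lt;/b&gt;
  &lt;b&gt;bee&lt;/b&gt;
  &lt;c&gt;cee3&lt;/c&gt;
&lt;/a&gt;"""

root = ET.fromstring(xml)
# "b[1]" is the first &lt;b&gt; child of the root; "/c" picks its &lt;c&gt; children.
c_list = [c.text for c in root.findall("./b[1]/c")]

print("; ".join(c_list))  # cee1; cee2
print(c_list[1])          # cee2</pre>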
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2011/04/09/fun-with-lxml-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>fun with lxml</title>
		<link>http://blog.humaneguitarist.org/2011/03/07/fun-with-libxml/</link>
		<comments>http://blog.humaneguitarist.org/2011/03/07/fun-with-libxml/#comments</comments>
		<pubDate>Mon, 07 Mar 2011 17:13:08 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[scripts]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[libxml]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[xml parsing]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=2271</guid>
		<description><![CDATA[First off, I don&#39;t consider myself a programmer. I just know enough to dabble even though I try and learn new stuff all the time in the hope that I &#8211; as someone in digital libraries &#8211; can occasionally write something that can serve the needs of others, rather than serving my ego. Don&#39;t get [...]]]></description>
			<content:encoded><![CDATA[<p>First off, I don&#39;t consider myself a programmer. I just know enough to dabble even though I try and learn new stuff all the time in the hope that I &#8211; as someone in digital libraries &#8211; can occasionally write something that can serve the needs of others, rather than serving my ego. Don&#39;t get me started on people who try and write software that has no utility other than patting themselves on the back &#8230;</p>
<p>Anyway, that&#39;s another post for another time.</p>
<p>So, the other day I got some questions/feature requests for <a href="http://vimeo.com/15098984">PubMed2XL</a> and so I started thinking about ways to tackle a few of the issues. It kinda makes me feel like a real programmer when people in the real world are asking about the software &#8211; but only for a few minutes before I make myself come back down to earth.</p>
<p><img alt=":/" src="http://blog.humaneguitarist.org/wp-content/plugins/fckeditor-for-wordpress-plugin/ckeditor/plugins/smiley/images/confused_smile.gif" title=":/" /></p>
<p>Currently, the software places into a spreadsheet cell the value of one XML element, the position of which is defined by the user in the <a href="http://blog.humaneguitarist.org/projects/pubmed2xl/#Changing">setup file</a>. But it seems there may be a need to concatenate ALL the values for a given element into one spreadsheet cell. So I wrote a little function to help me get started with that.</p>
<p>The code uses <a href="http://www.w3schools.com/xml/simple.xml">this</a> simple restaurant-based XML file from W3Schools and uses the awesome <a href="http://lxml.de/">lxml</a> Python library.</p>
<p>When run, it yields the following:</p>
<p><samp>&gt;&gt;&gt; <br />
	Calories for the first entree:<br />
	650<br />
	Calories for all entrees:<br />
	650; 900; 900; 600; 950<br />
	</samp></p>
<p>And here&#39;s the code:</p>
<pre class="brush:python">#import required modules (lxml is non-standard; it likely needs to be installed)
import urllib #makes it easy to read documents from the web!
from lxml import etree #great XML parser and more!
                       #see: http://lxml.de/

#retrieve values from an XML file
def ElementCherryPicker(xpathArg, positionArg):
    &#39;&#39;&#39;
    This places all the element values for the element passed as the
    &quot;xpathArg&quot; argument into a list called &quot;elementBox&quot;. It then returns
    the list item preceeding the one specified by the &quot;positionArg&quot; argument.
    This means passing a &quot;1&quot; equates to the first item in the list instead
    of the traditional &quot;0&quot;. If &quot;0&quot; is passed then the entire list will be
    returned as a string with a delimiter of &#39;; &#39;.
    &#39;&#39;&#39;
    positionArg = positionArg - 1
    elements = parseUrl.findall(xpathArg) #make list of all matching elements
    elementBox = [] #create empty list
    for element in elements:
        elementBox.append(element.text) #place element values into the list
    if positionArg != -1:
        try:
            elementBox = elementBox[positionArg]
        except:
            elementBox = [] #if no element at stated position exists,
                            #then make the list empty again
    else:
        delimiter = &#39;; &#39;
        elementBox = delimiter.join(elementBox)
    return elementBox

#define, open, read, and parse an XML file
url= &#39;http://www.w3schools.com/xml/simple.xml&#39;
readUrl = urllib.urlopen(url).read()
parseUrl = etree.XML(readUrl)

#print header and the values returned from ElementCherryPicker()
print &#39;Calories for the first entree:&#39;
print ElementCherryPicker(&#39;.//calories&#39;, 1)
print &#39;Calories for all entrees:&#39;
print ElementCherryPicker(&#39;.//calories&#39;, 0)
</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2011/03/07/fun-with-libxml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>of ADLs and SMIL and stuff</title>
		<link>http://blog.humaneguitarist.org/2011/01/09/of-adls-and-smil-and-stuff/</link>
		<comments>http://blog.humaneguitarist.org/2011/01/09/of-adls-and-smil-and-stuff/#comments</comments>
		<pubDate>Sun, 09 Jan 2011 18:02:34 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[digital audio]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[AATranslator]]></category>
		<category><![CDATA[ADL]]></category>
		<category><![CDATA[AudioRegent]]></category>
		<category><![CDATA[Kino]]></category>
		<category><![CDATA[session files]]></category>
		<category><![CDATA[SimpleADL]]></category>
		<category><![CDATA[SMIL]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=2195</guid>
		<description><![CDATA[Even more than usual &#8211; this post is me thinking out loud. So some of the stuff at the bottom might not make sense since it refers to some software of mine that really only I use. This morning I played around a little with Kino, an open-source video editor, and Adobe Audition, Adobe&#39;s flagship [...]]]></description>
			<content:encoded><![CDATA[<p>Even more than usual &#8211; this post is me thinking out loud. So some of the stuff at the bottom might not make sense since it refers to some software of mine that really only I use.</p>
<p>This morning I played around a little with Kino, an open-source video editor, and Adobe Audition, Adobe&#39;s flagship audio editor &#8211; which is based on their <a href="http://www.adobe.com/special/products/audition/syntrillium.html">acquisition of Cool Edit</a>.</p>
<p>I wanted to play around with Kino because it can export the project timeline to <a href="http://www.w3.org/TR/SMIL/">SMIL</a>. I was mainly interested in seeing whether it could be used as a pseudo audio editor &#8211; the idea being that it could serve as a quick and dirty SMIL exporter. Well, it doesn&#39;t seem to support importing audio; I couldn&#39;t get it to import WAV or OGG files. It&#39;s still a cool application, though.</p>
<p>The session exports from Audition are, as expected, pretty dense. For people like me who work in libraries, there are questions about where to set limits on how much can and should be done in digital audio &quot;preservation&quot; (funny, I don&#39;t remember ordering jam and bread &#8230;). At least I think there need to be limits, unless libraries want to start being creators, too, and admit that in doing so they are donating material of their own editorial design back to themselves. Anyway, if we are imposing limits, I&#39;m not sure XML session exports running to thousands of lines for simple edits are a good idea.</p>
<p>I&#39;d like to see other session formats without downloading demos for all kinds of audio editing software (some more expensive packages don&#39;t even seem to offer demos). For a small fee, there&#39;s always <a href="http://www.aatranslator.com.au/">AATranslator</a>.</p>
<p>But getting back to SMIL, I&#39;m wondering how to use it in conjunction with <a href="http://blog.humaneguitarist.org/projects/audioregent/">AudioRegent</a> without writing more code into the application &#8211; for now.</p>
<p>It would seem pretty easy to create a SMIL to <a href="http://blog.humaneguitarist.org/projects/audioregent/#SimpleADL">SimpleADL</a> XSLT and set up a chain to create derivative files.</p>
<p>Specifically, say I have a source file called source.wav and two SMIL files like these:</p>
<p>source-1.smil.xml</p>
<pre class="brush:xml">&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;smil xmlns=&quot;http://www.w3.org/2001/SMIL20/Language&quot;&gt;
  &lt;body&gt;
    &lt;seq&gt;
      &lt;audio src=&quot;source.wav&quot; clipBegin=&quot;00:00:00.000&quot; clipEnd=&quot;00:00:30.000&quot;/&gt;
    &lt;/seq&gt;
  &lt;/body&gt;
&lt;/smil&gt;
</pre>
<p>and source-2.smil.xml</p>
<pre class="brush:xml">&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;smil xmlns=&quot;http://www.w3.org/2001/SMIL20/Language&quot;&gt;
  &lt;body&gt;
    &lt;seq&gt;
      &lt;audio src=&quot;source.wav&quot; clipBegin=&quot;00:00:30.000&quot; clipEnd=&quot;00:00:50.000&quot;/&gt;
    &lt;/seq&gt;
    &lt;seq&gt;
      &lt;audio src=&quot;source.wav&quot; clipBegin=&quot;00:01:00.000&quot; clipEnd=&quot;00:02:00.000&quot;/&gt;
    &lt;/seq&gt;
  &lt;/body&gt;
&lt;/smil&gt;
</pre>
<p>The assumption is that two derivative files, source-1 and source-2, are to be made from source.wav, one per SMIL file.</p>
<p>All I&#39;d need to do then is set up a chain like this:</p>
<ol>
<li>Transform source-1.smil.xml into temp.adl.xml via XSLT.</li>
<li>Have AudioRegent make source.ogg by pointing it, via the command line options, to the source file, source.wav, and the SimpleADL file, temp.adl.xml.</li>
<li>Rename source.ogg to source-1.ogg &#8211; i.e. with the same prefix as the corresponding SMIL file.</li>
<li>Transform source-2.smil.xml into temp.adl.xml via XSLT, overwriting temp.adl.xml.</li>
<li>Have AudioRegent make source.ogg by pointing it, via the command line options, to the source file, source.wav, and the SimpleADL file, temp.adl.xml.</li>
<li>Rename source.ogg to source-2.ogg &#8211; i.e. with the same prefix as the corresponding SMIL file.</li>
</ol>
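<p>The six steps above can be outlined as a small Python 3 driver. This is only a sketch under stated assumptions: run_chain, smil_to_adl and render are hypothetical names, and the XSLT step and the AudioRegent call are passed in as plain functions, so AudioRegent&#39;s actual command-line options are not assumed here. render is assumed to take the source file and the SimpleADL file and return the path of the source.ogg it wrote.</p>

```python
import os

SMIL_SUFFIX = ".smil.xml"


def run_chain(source_path, smil_paths, smil_to_adl, render, work_dir="."):
    """Hypothetical driver: one SMIL file in, one renamed OGG file out, per pass."""
    adl_path = os.path.join(work_dir, "temp.adl.xml")
    outputs = []
    for smil_path in smil_paths:
        # Steps 1 and 4: transform the SMIL file, overwriting temp.adl.xml each pass.
        smil_to_adl(smil_path, adl_path)
        # Steps 2 and 5: stand-in for the AudioRegent call; returns the path of source.ogg.
        produced = render(source_path, adl_path)
        # Steps 3 and 6: name the output after the SMIL prefix, e.g. source-1.ogg.
        name = os.path.basename(smil_path)
        if name.endswith(SMIL_SUFFIX):
            prefix = name[: -len(SMIL_SUFFIX)]
        else:
            prefix = os.path.splitext(name)[0]
        target = os.path.join(work_dir, prefix + ".ogg")
        os.replace(produced, target)  # rename, overwriting any earlier file
        outputs.append(target)
    return outputs
```

<p>Passing the two steps in as functions keeps the loop testable with stand-ins (for example, a render that just writes an empty source.ogg) before any real XSLT processor or AudioRegent build is wired in.</p>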
<p>Here&#39;s what temp.adl.xml would look like initially (step 1):</p>
<pre class="brush:xml">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;audioDecisionList filename=&quot;source.wav&quot;&gt;
  &lt;region id=&quot;_01&quot;&gt;
    &lt;in unit=&quot;seconds&quot;&gt;0&lt;/in&gt;
    &lt;duration unit=&quot;seconds&quot;&gt;30&lt;/duration&gt;
  &lt;/region&gt;
  &lt;outputAsTracks&gt;false&lt;/outputAsTracks&gt;
&lt;/audioDecisionList&gt;
</pre>
<p>And then it would look like this during the second pass (step 4):</p>
<pre class="brush:xml">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;audioDecisionList filename=&quot;source.wav&quot;&gt;
  &lt;region id=&quot;_01&quot;&gt;
    &lt;in unit=&quot;seconds&quot;&gt;30&lt;/in&gt;
    &lt;duration unit=&quot;seconds&quot;&gt;20&lt;/duration&gt;
  &lt;/region&gt;
  &lt;region id=&quot;_02&quot;&gt;
    &lt;in unit=&quot;seconds&quot;&gt;60&lt;/in&gt;
    &lt;duration unit=&quot;seconds&quot;&gt;60&lt;/duration&gt;
  &lt;/region&gt;
  &lt;outputAsTracks&gt;false&lt;/outputAsTracks&gt;
&lt;/audioDecisionList&gt;</pre>
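<p>Until the XSLT exists, here is a rough Python 3 stand-in for the SMIL-to-SimpleADL step, using only xml.etree.ElementTree. It assumes, as in the examples above, that every &lt;audio&gt; clip points at the same source file and uses full or partial clock values (HH:MM:SS.fff or MM:SS.fff); SMIL timecount values such as 30s are not handled. Each clip becomes a region whose in is clipBegin and whose duration is clipEnd minus clipBegin, so 00:00:30.000 to 00:00:50.000 gives in 30 and duration 20. The function names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

SMIL_NS = "{http://www.w3.org/2001/SMIL20/Language}"


def clock_to_seconds(value):
    """Convert a full or partial SMIL clock value ('00:01:30.500', '01:30') to seconds."""
    seconds = 0.0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def format_seconds(value):
    """Write seconds without trailing zeros: 30.0 becomes '30', 20.5 becomes '20.5'."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def smil_to_simple_adl(smil_root):
    """Build a SimpleADL tree with one region per SMIL audio clip, in document order."""
    clips = list(smil_root.iter(SMIL_NS + "audio"))
    # Assumes all clips share one source file, as in source-1 and source-2 above.
    adl = ET.Element("audioDecisionList", filename=clips[0].get("src"))
    for index, clip in enumerate(clips, start=1):
        begin = clock_to_seconds(clip.get("clipBegin"))
        end = clock_to_seconds(clip.get("clipEnd"))
        region = ET.SubElement(adl, "region", id=f"_{index:02d}")
        ET.SubElement(region, "in", unit="seconds").text = format_seconds(begin)
        ET.SubElement(region, "duration", unit="seconds").text = format_seconds(end - begin)
    ET.SubElement(adl, "outputAsTracks").text = "false"
    return adl


SOURCE_2 = """<smil xmlns="http://www.w3.org/2001/SMIL20/Language"><body>
<seq><audio src="source.wav" clipBegin="00:00:30.000" clipEnd="00:00:50.000"/></seq>
<seq><audio src="source.wav" clipBegin="00:01:00.000" clipEnd="00:02:00.000"/></seq>
</body></smil>"""
print(ET.tostring(smil_to_simple_adl(ET.fromstring(SOURCE_2)), encoding="unicode"))
```

<p>Because iter() walks the whole tree, clips nested in separate &lt;seq&gt; elements (as in source-2.smil.xml) still come out as consecutive regions.</p>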
<p>By the way, since the SimpleADL files are temporary, I don&#39;t see why &#8211; rather than converting the time format to seconds &#8211; I couldn&#39;t just use something like this:</p>
<pre class="brush:xml">&lt;in unit=&quot;time&quot;&gt;00:01:00.000&lt;/in&gt;
&lt;duration unit=&quot;time&quot;&gt;00:01:00.000&lt;/duration&gt;</pre>
<p>or something &#8230;</p>
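<p>If SimpleADL did accept a clock form, the reading side would only need to handle both units. A minimal Python 3 sketch, assuming a hypothetical adl_value_to_seconds() helper; unit=&quot;time&quot; is not part of the format as shown above, and nothing here reflects AudioRegent&#39;s actual parser.</p>

```python
import xml.etree.ElementTree as ET


def adl_value_to_seconds(element):
    """Read an <in> or <duration> element as seconds, accepting the existing
    unit="seconds" form or the hypothetical unit="time" clock form."""
    unit = element.get("unit")
    text = (element.text or "").strip()
    if unit == "seconds":
        return float(text)
    if unit == "time":
        # HH:MM:SS.fff (or MM:SS.fff): fold each field into the running total.
        total = 0.0
        for part in text.split(":"):
            total = total * 60 + float(part)
        return total
    raise ValueError(f"unsupported unit: {unit!r}")


region = ET.fromstring('<region id="_02"><in unit="time">00:01:00.000</in>'
                       '<duration unit="time">00:01:00.000</duration></region>')
print(adl_value_to_seconds(region.find("in")), adl_value_to_seconds(region.find("duration")))
```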
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2011/01/09/of-adls-and-smil-and-stuff/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MXMLiszt version 0.9.1 released</title>
		<link>http://blog.humaneguitarist.org/2010/10/02/mxmliszt-version-0-9-1-released/</link>
		<comments>http://blog.humaneguitarist.org/2010/10/02/mxmliszt-version-0-9-1-released/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 17:08:00 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[music notation]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[MXMLiszt]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=1763</guid>
		<description><![CDATA[I&#39;ve made some minor changes to MXMLiszt to address a bug that began to appear after months of trouble-free performance. So here are the changes I made to address the issue related to the display of MODS metadata: Created mods.css file to display MODS on a transparent background. Changed displayMODS.php to display MODS files via [...]]]></description>
			<content:encoded><![CDATA[<p>I&#39;ve made some minor changes to MXMLiszt to address a bug that began to appear after months of trouble-free performance.</p>
<p>So here are the changes I made to address the issue related to the display of MODS metadata:</p>
<ul>
<li>Created a mods.css file to display MODS on a transparent background.</li>
<li>Changed displayMODS.php to display MODS files via an &lt;iframe&gt;. The previous version was using the mods.xsl stylesheet to parse the MODS element values in real time.</li>
</ul>
<p>You can read the documentation and download the source code for version 0.9.1 <a href="http://blog.humaneguitarist.org/projects/mxmliszt/">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2010/10/02/mxmliszt-version-0-9-1-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PubMed to Excel: PubMed2XL version 0.9</title>
		<link>http://blog.humaneguitarist.org/2010/09/19/pubmed2xl-version-0-9/</link>
		<comments>http://blog.humaneguitarist.org/2010/09/19/pubmed2xl-version-0-9/#comments</comments>
		<pubDate>Mon, 20 Sep 2010 00:03:22 +0000</pubDate>
		<dc:creator>nitin</dc:creator>
				<category><![CDATA[scripts]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[PubMed]]></category>
		<category><![CDATA[PubMed2XL]]></category>

		<guid isPermaLink="false">http://blog.humaneguitarist.org/?p=1713</guid>
		<description><![CDATA[I&#39;ve released the first Beta version of PubMed2XL, a Windows application that converts article lists from pubmed.gov into Microsoft Excel files. If you&#39;d like to use the software you can download it. Yes, it&#39;s free. Here&#39;s a little video tutorial on installing and using the software: PubMed2XL: Basic Installation and Use from nitin arora on [...]]]></description>
			<content:encoded><![CDATA[<p>I&#39;ve released the first Beta version of PubMed2XL, a Windows application that converts article lists from <a href="http://pubmed.gov">pubmed.gov</a> into Microsoft Excel files.</p>
<p>If you&#39;d like to use the software you can <a href="http://blog.humaneguitarist.org/projects/pubmed2xl/#Download">download</a> it. Yes, it&#39;s free.</p>
<p><img alt=":P" src="http://blog.humaneguitarist.org/wp-content/plugins/fckeditor-for-wordpress-plugin/ckeditor/plugins/smiley/images/tounge_smile.gif" title=":P" /></p>
<p style="margin-bottom: 0.08in;">Here&#39;s a little video tutorial on installing and using the software:</p>
<p><iframe frameborder="0" height="300" src="http://player.vimeo.com/video/15098984" width="400"></iframe></p>
<p><a href="http://vimeo.com/15098984">PubMed2XL: Basic Installation and Use</a> from <a href="http://vimeo.com/user3665532">nitin arora</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<p>PubMed2XL&#39;s documentation is available at: <a href="http://blog.humaneguitarist.org/projects/pubmed2xl/">blog.humaneguitarist.org/projects/pubmed2xl/</a>.</p>
<p>The documentation includes a download link to the program files.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.humaneguitarist.org/2010/09/19/pubmed2xl-version-0-9/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

