blog.humaneguitarist.org

LS-598 #2: XQuery problems and solutions

[Thu, 04 Feb 2010 13:29:47 +0000]
Just a quick morning post today ...

For the last 10 days or so I've been struggling with some major problems that arose in trying to implement effective XQueries on my web demo.

1. Dublin Core doesn't allow me to differentiate creator "types", so I was limited to searching across the DC:creator element for all creators, be they Composer, Lyricist, or Arranger. MusicXML does differentiate these types, so essentially Dublin Core was making me "dumb down" some information. I want people to be able to search for creators specifically by their role: Composer, Lyricist, or Arranger.
2. I needed a way to iterate an XQuery over all the MusicXML documents and I needed it to be relatively fast. A demo is a demo, but impatience is impatience and I just can't accept slow query processing.
3. The XQuery processor I was using didn't support some XQuery functions that would allow a searcher to type in "Bach" and retrieve documents for which the creator was "J.S Bach", "Johann Sebastian Bach", "Bach, J.S", "P.D.Q Bach", etc. This was really limiting the search/query coolness factor and I wasn't at all happy about it.

Here are my solutions (details to follow in a few days or so):

1. Ditch Dublin Core and switch to MODS, which does allow me to specify the role of a creator. Last week, I made a MusicXML to MODS XSL transformation for descriptive metadata and it's working well.
2. Steal an idea from Using XQuery on MusicXML Databases for Musicological Analysis [http://ismir2008.ismir.net/papers/ISMIR2008_217.pdf] so that rather than iterate one query (say, for the number of notes in a piece) across multiple MusicXML docs, I just concatenate all the MusicXML documents. The original files are left alone, but a "super" MusicXML file gets created so that one can query just that one file, hence no need for lengthy iteration.
I'm not sure how those fellows did it, but I just automated it via PHP using the following format:

    <hyperMXML>
      <hypoMXML file="foo1.xml"> 1st MusicXML document </hypoMXML>
      <hypoMXML file="foo2.xml"> 2nd MusicXML document </hypoMXML>
    </hyperMXML>

3. Switch XQuery processors! I'll go into the ones that didn't support the function I needed another time, but I will say that BaseX [http://www.inf.uni-konstanz.de/dbis/basex/index.php] did the trick.

Below is the query that searches for creators with "Bach" somewhere in MusicXML's creator element. For the deliverable demo, I won't be querying these big MusicXML documents for simple descriptive metadata like Creator; that's what the MODS is for. But this is just an example. The "ftcontains" syntax is what allows for retrieval of values where "Bach" is somewhere within the element, but isn't necessarily equivalent to the entire element value.

    for $i in doc("../temp/concat/concatMXML.xml")/hyperMXML/hypoMXML/score-partwise
    where $i/identification/creator ftcontains "Bach"
    return ($i/work/work-title)

___________________________________________________________________________
This blog post is part of a semester-long investigation into digital encoding of symbolic music representation (SMR), its context in libraries, web-based delivery, preservation and metadata, and search and retrieval technologies.
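
For anyone wanting to try the concatenation idea from solution 2, here is a minimal sketch in Python rather than PHP. It is not the script used for the demo, which isn't shown in this post. The function name concat_musicxml and the "scores" directory in the usage note are made up for illustration, and the sketch assumes every source file is well-formed XML with a single root element such as score-partwise.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def concat_musicxml(src_dir, out_path):
    """Wrap every *.xml file in src_dir inside one <hyperMXML> document.

    Hypothetical helper: the wrapper element names follow the format
    shown in the post, everything else is an assumption.
    """
    hyper = ET.Element("hyperMXML")
    for path in sorted(Path(src_dir).glob("*.xml")):
        # One <hypoMXML> wrapper per source file, tagged with its file name.
        hypo = ET.SubElement(hyper, "hypoMXML", {"file": path.name})
        # Only the parsed root element is copied, so each file's XML
        # declaration and DOCTYPE do not end up in the combined file.
        hypo.append(ET.parse(str(path)).getroot())
    # The source files are only read; the combined tree goes to out_path.
    ET.ElementTree(hyper).write(str(out_path), encoding="utf-8",
                                xml_declaration=True)
    return hyper
```

Calling concat_musicxml("scores", "../temp/concat/concatMXML.xml") would then produce a file shaped like the one the query above reads. The output path should sit outside the source directory, otherwise a second run would pick up the combined file as one of its inputs.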