blog.humaneguitarist.org

discoveries in digital audio, music notation, and information encoding

Archive for the ‘XML’ tag

XSLT: a practical usage example with Pubmed records

Update, December 10, 2010: If you are interested in getting PubMed citations into a spreadsheet application (Excel, etc.) please see PubMed2XL. PubMed2XL is free software that can convert PubMed citations into a Microsoft Excel file.

As part of my coursework for the University of Alabama SLIS program, I took a database class last year. Long story short, one of the assignments was to create a Microsoft Access dbase based on Medline records.

The records were already provided for us, as well as a Java-based script to parse the information into a tab-delimited format prior to import into Access.

For extra credit, we were given another script that would parse records from an Ovid database. If we could find access to an Ovid dbase (I couldn't as they were all password protected, understandably), we could run the script, parse the records and bring them into Access for additional credit.

But there was a way to use a free source, Pubmed, and still get the job done.

How? Well, Pubmed allows article information to be exported as XML.

Once in XML, there was no need for a script to parse the information. From there it was simple to bring the information into Access. I found it easier to import it into Excel, clean it up, and then import that Excel data source into Access.

But what if you have OpenOffice?

I'm not aware of a simple way to import XML documents into OpenOffice Calc (their spreadsheet app) or Base (their dbase app).

But by using XSLT, there's a way around this issue.

Here are the steps:

  1. Conduct searches in Pubmed.
  2. Send your articles to the Clipboard.
  3. Set display to "XML".
  4. Send the results to "File" (see image below).
  5. Save the file as "pubmed_results.txt".
  6. Change the file's extension from "txt" to "xml".
  7. Open the document in a text editor.
  8. Above the DOCTYPE declaration (i.e. <!DOCTYPE PubmedArticleSet PUBLIC … ">), and below the opening <?xml … ?> declaration if there is one, add the following line:

<?xml-stylesheet type="text/xsl" href="pubmed_xslt.xsl"?>

  9. Re-save the file.
  10. Then, download this file (the "pubmed_xslt.xsl" stylesheet named in the line above) to the same directory as your "pubmed_results.xml" file.
  11. Now click on "pubmed_results.xml"; your browser should now display select data in an HTML tabular format.
  12. From here, simply copy/paste the tabular data into OpenOffice Calc, clean it up as desired, save it as a ".ods" file, hook it up to OpenOffice Base, and design your queries, etc.
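
If you'd rather not do the renaming and hand-editing (saving the ".txt" export as ".xml" and adding the stylesheet line) yourself, here is a minimal Python sketch of those steps. It is not part of the original workflow. The function names `add_stylesheet_pi` and `convert` are made up for illustration, and it assumes the exported file contains a `<!DOCTYPE` declaration, as PubMed's XML export did when this was written.

```python
from pathlib import Path

# The processing instruction from step 8; href must match the stylesheet's file name.
PI_TEMPLATE = '<?xml-stylesheet type="text/xsl" href="{href}"?>'


def add_stylesheet_pi(xml_text, href="pubmed_xslt.xsl"):
    """Return xml_text with the stylesheet line inserted just above the DOCTYPE."""
    if "<?xml-stylesheet" in xml_text:
        return xml_text  # already linked to a stylesheet; leave it alone
    pos = xml_text.find("<!DOCTYPE")
    if pos == -1:
        # Assumption: the export always carries a DOCTYPE declaration.
        raise ValueError("no DOCTYPE declaration found")
    return xml_text[:pos] + PI_TEMPLATE.format(href=href) + "\n" + xml_text[pos:]


def convert(src="pubmed_results.txt", dst="pubmed_results.xml"):
    """Read the saved export, add the stylesheet line, and write the .xml copy."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(add_stylesheet_pi(text), encoding="utf-8")
```

Calling `convert()` in the directory that holds "pubmed_results.txt" should leave you with a "pubmed_results.xml" that is ready for the stylesheet download step.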

And now you've got a totally Free (minus the cost of a laptop, internet connexion, etc.) desktop dbase of Medline results.

* Note that the XML stylesheet I provided only displays certain info. You can always open the stylesheet in a text editor and set it to display more information, such as Abstract, etc.
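
If editing the stylesheet feels like too much, a small script can pull the same sort of fields (plus Abstract) straight into tab-delimited text that Calc can open. To be clear, this is an alternative to the XSLT route above, not what the stylesheet does. The sketch below is hedged accordingly: the `FIELDS` mapping and the function names are my own, and the element paths are the ones I'd expect in a PubMed XML export, so check them against your own file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column name -> path below each <PubmedArticle> element (assumed layout).
FIELDS = {
    "PMID": "MedlineCitation/PMID",
    "Title": "MedlineCitation/Article/ArticleTitle",
    "Journal": "MedlineCitation/Article/Journal/Title",
    "Abstract": "MedlineCitation/Article/Abstract/AbstractText",
}


def text_of(article, path):
    """Return the text of the first element at path, or '' if it is missing."""
    node = article.find(path)
    # itertext() also picks up text inside inline markup such as <i>...</i>.
    return "" if node is None else "".join(node.itertext()).strip()


def pubmed_to_tsv(xml_text):
    """Turn PubMed XML into tab-delimited text, one row per article."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(list(FIELDS))  # header row
    for article in root.iter("PubmedArticle"):
        writer.writerow([text_of(article, path) for path in FIELDS.values()])
    return out.getvalue()
```

Note that only the first AbstractText element is read, so structured abstracts with several sections would need a little more work.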


 

--------------

Written by nitin

August 15th, 2009 at 1:48 pm

Posted in XML

XSLT transformations: "more than meets the eye"

A few months ago, my department head encouraged us to learn about XML stylesheets and XSLT transformations. After picking at it here and there, I finally had my breakthrough with it this weekend. Of course, were I more patient, I could have gotten paid to do this at work tomorrow.

As usual, the majority of the work is in finding examples and explanations that speak to me. This thread was particularly helpful.

One of the biggest breakthroughs – as embarrassing as it is to admit – was my realization that one needed an XSLT processor to actually create a new XML document based on the instructions provided in the stylesheet.

I’ve been experimenting with both the Saxon and Microsoft processors. Rather than run them from the Windows command prompt, I’ve been using the command line interface in the jEdit text editor. There’s a built-in XSLT processor plug-in with jEdit, but I couldn’t get it to work, hence the use of the aforementioned methods.

If I understand correctly, one of the uses of this will be to take XML data about audio files generated from the JSTOR/Harvard Object Validation Environment (JHOVE) and map the pertinent information to another schema/XML document. That’s a bit out of my league right now, but a modest start is yet a start.

I’ll also be interested in using transformations to make customized XML documents from MusicXML sources and Zotero exports. Admittedly, I have no real ideas as to what I’d need to do this for, but I simply have a hankering to think of related projects. Maybe pulling the lyrics out of a MusicXML document into a TEI verse document?

--------------

Written by nitin

August 9th, 2009 at 6:38 pm

Posted in XML
