PubMed2XL

PubMed2XL 2.01


Table of Contents

Introduction

Download

Using PubMed2XL

Changing Settings

Advanced Capabilities for Developers

FAQ


Introduction

PubMed2XL is a Microsoft Windows and Linux application that converts PubMed.gov citations to Microsoft Excel 97-2003 (.xls) or OpenDocument (.ods) spreadsheets.

PubMed2XL is licensed under the MIT software license.

PubMed2XL includes software developed by Roman V. Kiseliov.


Download

You can download PubMed2XL by clicking on one of the links below.

Click here to download the Windows self-installer (Windows)

Click here to download the ZIP file (Windows/Linux).

The Windows installer creates a shortcut to the main application in the installation directory and in the Start Menu. I highly recommend you choose "C:\PubMed2XL-2.01" as your installation directory. Otherwise, Windows may not allow the software to create new Excel files on your computer.

Both download files contain Windows executables and the Python source code. Linux users can run the Python files after installing the dependencies noted in the "/docs/DEPENDENCIES.TXT" file.

PubMed2XL 2.0 has been tested on 32-bit versions of Windows 7 and Linux Mint Xfce Edition.

I would like to provide a self-installing Mac OS version one day. If someone can help compile one, please leave me a note. Thanks!

Note: in December 2015, a user sent instructions on how to get the application to work on a Mac with WineBottler. You can read more here.

Older versions can be accessed here.


Using PubMed2XL

To learn how to install (Windows) and use PubMed2XL, please see the video tutorial below.

PubMed2XL: Basic Installation and Use from nitin arora on Vimeo.


Changing Settings

The "Options" tab (image below) gives users options for:

  1. selecting the output format, Excel (.xls) or OpenDocument spreadsheet (.ods), using the "Toggle Output Format" option,
  2. selecting whether or not to process citations from books using the "Toggle Book Citations" option,
  3. and altering the spreadsheet column output (see "Stylesheets" below).

The "Preferences>Save Preferences" command allows users to save any changes to the options.

PubMed2XL Options tab

Stylesheets

PubMed2XL uses "stylesheets" to determine the output format of the spreadsheet that it creates. In other words, column order, column titles, and the PubMed data placed in each column's cells are all controlled by a PubMed2XL stylesheet.

Users can specify a different stylesheet by clicking on "Options>Change Stylesheet" prior to using the "Tools>PubMed XML to spreadsheet" command.

Creating Stylesheets

Users familiar with XML, and with the specific XML that PubMed outputs, can change the format of the spreadsheet that PubMed2XL creates by writing their own stylesheet. Because stylesheets are external files in a well-known markup language, researchers, librarians, and others can customize PubMed2XL's output and easily share stylesheets with each other and with their friends and patrons.

More information and instructions for making new stylesheets can be found in the PubMed2XL stylesheet XSD located at "./styles/schema/". By studying the schema document and the default stylesheet, advanced users can better customize PubMed2XL to their needs.

Starting with version 1.0, PubMed2XL uses XSLT 1.0 as the only method of parsing data from PubMed XML files. This is in contrast to the 0.9.x versions, which used a home-grown XML processing language. As such, PubMed2XL 1.0+ is not backwards compatible with version 0.9.x stylesheets. The move to XSLT was made to accommodate user requests for more capabilities.

If you need assistance customizing a stylesheet, please leave a comment below so that we can all work together. Thanks!

Here is an example stylesheet that will make a spreadsheet with one column called "PMID" in which the PubMed article ID value will be placed. The data in the cell is hyperlinked to the article's page on PubMed.gov using the optional <hyperlink> element.

<?xml version="1.0" encoding="UTF-8" ?>
<spreadsheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema/PubMed2XL-2.0.xsd">
    <column>
        <title>PMID</title>
        <cell><![CDATA[
            <xsl:stylesheet version="1.0" encoding="UTF-8" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:output method = "text" />
            <xsl:template match="/">
                <xsl:value-of select="//PMID" />
            </xsl:template>
            </xsl:stylesheet>]]>
        </cell>
        <hyperlink><![CDATA[
            <xsl:stylesheet version="1.0" encoding="UTF-8" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:output method = "text" />
            <xsl:template match="/">
                <xsl:text>http://www.ncbi.nlm.nih.gov/pubmed/</xsl:text>
                <xsl:value-of select="//PMID" />
            </xsl:template>
            </xsl:stylesheet>]]>
        </hyperlink>
    </column>
</spreadsheet>
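
Before loading a custom stylesheet into PubMed2XL, it can help to check that every column has a title and that the XSLT embedded in each <cell> and <hyperlink> is itself well-formed XML. The following is a minimal sketch using only the Python standard library. It is not part of PubMed2XL: the function names "describe_columns" and "is_well_formed" are hypothetical, and the sketch only checks well-formedness, not validity against the XSD in "./styles/schema/".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the one-column example stylesheet (XML declaration and
# schema attributes omitted for brevity).
STYLESHEET = """<spreadsheet>
    <column>
        <title>PMID</title>
        <cell><![CDATA[
            <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:output method="text" />
            <xsl:template match="/">
                <xsl:value-of select="//PMID" />
            </xsl:template>
            </xsl:stylesheet>]]>
        </cell>
        <hyperlink><![CDATA[
            <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:output method="text" />
            <xsl:template match="/">
                <xsl:text>http://www.ncbi.nlm.nih.gov/pubmed/</xsl:text>
                <xsl:value-of select="//PMID" />
            </xsl:template>
            </xsl:stylesheet>]]>
        </hyperlink>
    </column>
</spreadsheet>"""


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def describe_columns(stylesheet_xml):
    """Summarize each <column>: its title and whether its XSLT parses."""
    root = ET.fromstring(stylesheet_xml)
    summary = []
    for column in root.findall("column"):
        # CDATA content is returned as ordinary element text.
        cell_xslt = column.findtext("cell", default="").strip()
        hyperlink = column.find("hyperlink")
        summary.append({
            "title": column.findtext("title", default="").strip(),
            "cell_ok": is_well_formed(cell_xslt),
            "has_hyperlink": hyperlink is not None,
            # None means the optional <hyperlink> element is absent.
            "hyperlink_ok": (is_well_formed((hyperlink.text or "").strip())
                             if hyperlink is not None else None),
        })
    return summary


if __name__ == "__main__":
    for info in describe_columns(STYLESHEET):
        print(info)
```

To check a stylesheet on disk, read the file's contents and pass them to "describe_columns()"; any column reporting "cell_ok" as False has XSLT that needs fixing before it can be used.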

Advanced Capabilities for Developers

The PubMed2XL GUI application is, as of version 2, built atop a Python library, "pm2xl.py".

The goal of the library is to give developers a way to automate tasks: searching PubMed programmatically, creating PubMed XML files from a list of PMIDs (PubMed IDs), and, of course, making a spreadsheet from a PubMed XML file. If you are a programmer and use the library, I'd really appreciate feedback on how you use it as well as any constructive criticism.

Advanced users with Python programming skills can use the library in other scripts and for task automation. Any of the "pm2xl.py" functions can also be called via the command line, allowing it to be used by programmers who prefer other scripting languages.

For example, consider the "makeSheet()" function which converts a PubMed XML file into a spreadsheet.

To call the function from the command line one can do the following:

$ python ./pm2xl.py makeSheet('myPubMedXML.xml', outputFile='myPubMedSpreadsheet.xls')

Note that only single quotation marks can be used to enclose strings on the command line. Also note that, on error, the script will return a "1" unless the string "DEBUG" is passed as the last argument, as in the example below, which also demonstrates using the Windows ".exe" version of the library instead of the Python version.

$ pm2xl.exe makeSheet('myPubMedXML.xml') DEBUG

For more information on the library functions, please see the PyDoc documentation located at "./docs/pm2xl.html".
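
For scripted use, the command-line calls shown above can be assembled programmatically. The sketch below is an illustration, not part of "pm2xl.py": the helpers "build_call", "build_command", and "run_pm2xl" are hypothetical names, and it assumes the script accepts the whole call expression as a single argument and signals errors through its exit status. Only "makeSheet()", its "outputFile" argument, the single-quote rule, and the trailing "DEBUG" flag come from the text above.

```python
import subprocess


def _quote(value):
    """Wrap a string argument in single quotes, as the command line requires."""
    text = str(value)
    if "'" in text:
        raise ValueError("argument must not contain a single quote: %r" % text)
    return "'%s'" % text


def build_call(function_name, *args, **kwargs):
    """Format a call such as makeSheet('in.xml', outputFile='out.xls').

    String arguments only; every value is single-quoted.
    """
    parts = [_quote(arg) for arg in args]
    parts += ["%s=%s" % (name, _quote(value)) for name, value in kwargs.items()]
    return "%s(%s)" % (function_name, ", ".join(parts))


def build_command(call, debug=False, program=("python", "./pm2xl.py")):
    """Return the argument list for one pm2xl invocation."""
    command = list(program) + [call]
    if debug:
        # Per the text above, DEBUG must be the last argument.
        command.append("DEBUG")
    return command


def run_pm2xl(call, debug=False, program=("python", "./pm2xl.py")):
    """Run pm2xl and return its exit status (assumed to be 1 on error)."""
    return subprocess.call(build_command(call, debug=debug, program=program))


if __name__ == "__main__":
    # An ".ods" extension in outputFile requests an OpenDocument spreadsheet.
    call = build_call("makeSheet", "myPubMedXML.xml",
                      outputFile="myPubMedSpreadsheet.ods")
    print(build_command(call))
    print(build_command(call, debug=True, program=("pm2xl.exe",)))
```

Passing the command as a list avoids shell quoting problems with the parentheses in the call expression. If "pm2xl.py" turns out to expect the call split differently, only "build_command" needs to change.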


FAQ

    1. How many citations can PubMed2XL process?
      1. PubMed2XL is not recommended for XML files containing more than a few thousand PubMed citations (fewer than 5,000).
      2. I've tested PubMed2XL for ~5,000 records. The XML file downloaded from PubMed.gov was roughly 50 megabytes. It took PubMed2XL less than 1 minute to create an 8 megabyte Excel (.xls) file.
      3. I also tested nearly 25k records using a development version of PubMed2XL from the command line. That took approximately 1.5 hours to process and it took OpenOffice almost 10 minutes just to open the spreadsheet.
      4. Both tests were run on my Lenovo T510 (431328U) with the default PubMed2XL stylesheet, which is verbose because it retrieves a lot of data from the PubMed citations, including abstracts.
    2. I don't have Excel. Can I still use PubMed2XL?
      1. PubMed2XL creates an Excel (.xls) file that can also be opened in OpenOffice and other applications, including Google Docs.
    3. Why do I see the message "File Error: data may have been lost." in Excel when I open a file created by PubMed2XL?
      1. PubMed2XL uses the older Excel 97-2003 format (.xls) instead of the newer format (.xlsx); this seems to be the sole culprit in generating the error message. If you are using Excel 2010+ and see this error, click through it, use the "Save As" command to save the PubMed2XL file in the newer Excel format (.xlsx), and delete the ".xls" file created by PubMed2XL. I have never seen any data actually lost despite the error message.
    4. I don't want to create an Excel (.xls) file and/or I prefer to use OpenOffice or LibreOffice, etc. Can PubMed2XL create Open Document (.ods) spreadsheets?
      1. Yes. Just use the "Options>Toggle Output Format" command. Programmers can set the extension to ".ods" using the "outputFile" argument in "makeSheet()".
    5. Does the software process book-based citations if found within the XML?
      1. By default, no. But you can, as of version 2.0, use the "Options>Toggle Book Citations" command.
      2. Note that the default stylesheet included with the software is tailored to journal citations. As such, many of the columns for book citations will likely be left blank.
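
Regarding FAQ 1 and 5 above: before converting a large download, it may be worth counting how many journal and book citations the XML file contains. The following is a rough sketch using the Python standard library, not a PubMed2XL feature. It assumes the usual PubMed export layout, with "PubmedArticle" and "PubmedBookArticle" records, and the names "count_citations", "within_recommended_size", and "RECOMMENDED_MAX" are hypothetical. The 5,000 limit is taken from FAQ 1.

```python
import io
import xml.etree.ElementTree as ET

# Upper bound suggested in FAQ 1 ("less than 5k").
RECOMMENDED_MAX = 5000

# Tiny stand-in for a PubMed XML download: two journal records, one book.
SAMPLE = (
    b"<PubmedArticleSet>"
    b"<PubmedArticle><MedlineCitation><PMID>1</PMID></MedlineCitation></PubmedArticle>"
    b"<PubmedArticle><MedlineCitation><PMID>2</PMID></MedlineCitation></PubmedArticle>"
    b"<PubmedBookArticle><BookDocument><PMID>3</PMID></BookDocument></PubmedBookArticle>"
    b"</PubmedArticleSet>"
)


def count_citations(source):
    """Count journal and book records in a PubMed XML file.

    `source` may be a file path or a binary file object. The file is
    streamed, so it is never held in memory as one complete tree.
    """
    counts = {"PubmedArticle": 0, "PubmedBookArticle": 0}
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag in counts:
            counts[element.tag] += 1
            # Discard the finished record's children to limit memory use.
            element.clear()
    return counts


def within_recommended_size(counts, include_books=False):
    """Check the number of records to be processed against FAQ 1's limit.

    Book records only count when book citations are toggled on.
    """
    total = counts["PubmedArticle"]
    if include_books:
        total += counts["PubmedBookArticle"]
    return total < RECOMMENDED_MAX


if __name__ == "__main__":
    counts = count_citations(io.BytesIO(SAMPLE))
    print(counts, within_recommended_size(counts))
```

For a real download, pass the file path instead, for example count_citations('myPubMedXML.xml').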
--------------

90 Comments

  1. J

    Hi,

    This is an awesome software!
    I tried to convert PubMed XML results to Excel, but it would not show anything? It would process but I don't get anything. Is it because I have 64bit operating system with Excel 2016? Please help!

    When I exit, I also get an error message: "They are logged in: logs/2017-11-29-10h-01m-11s.log."

    Thanks!

    1. nitin (Post author)

      For future reference: we were able to get this working on J's computer by uninstalling PubMed2XL and re-installing with the following destination folder (selected during installation): "C:\PubMed2XL-2.01".

      The default "Program Files" installation area was causing problems because Windows would not allow PubMed2XL to write Excel files to J's hard drive.
