blog.humaneguitarist.org
PubMed2XL
[Sun, 08 Aug 2010 14:12:00 +0000]
PubMed2XL 2.01
___________________________________________________________________________
Table of Contents
Introduction Download Using PubMed2XL Changing Settings Advanced Capabilities for Developers FAQ
___________________________________________________________________________
Introduction
PubMed2XL is a Microsoft Windows and Linux application that can convert PubMed [http://www.ncbi.nlm.nih.gov/pubmed].gov [http://www.ncbi.nlm.nih.gov/pubmed] citations to Microsoft Excel 2007 (.xls) or OpenDocument (.ods) spreadsheets. PubMed2XL is licensed under the MIT software license [http://opensource.org/licenses/MIT]. PubMed2XL includes software [http://sourceforge.net/projects/pyexcelerator/] developed by Roman V. Kiseliov [http://sourceforge.net/users/rvk].
___________________________________________________________________________
Download
You can download PubMed2XL by clicking on one of the links below. Click here [http://blog.humaneguitarist.org/uploads/PubMed2XL/PubMed2XL-2.01-setup.exe] to download the Windows self-installer (Windows) Click here [http://blog.humaneguitarist.org/uploads/PubMed2XL/PubMed2XL-2.01.zip] to download the ZIP file (Windows/Linux). The Windows installer will create a shortcut to the main application in the same directory in which you install the software as well as in the Start Menu. I highly recommend you choose "C:\PubMed2XL-2.01" as your installation directory. Otherwise, the software may not be allowed to create new Excel files on your Windows computer. Both download files contain Windows executables and the Python source code. Linux users can run the Python files after installing the dependencies noted in the "/docs/DEPENDENCIES.TXT" file. PubMed2XL 2.0 has been tested on 32-bit versions of Windows 7 and Linux Mint Xfce Edition. Note: as of December 2015 a user sent instructions on how to get the application to work on a Mac with WineBottler. You can read more here [http://blog.humaneguitarist.org/2015/12/28/user-contributed-content-getting-pubmed2xl-to-work-on-macos/].
___________________________________________________________________________
Using PubMed2XL
To learn how to install (Windows) and use PubMed2XL, please see the video tutorial below.
IFRAME: http://player.vimeo.com/video/15098984
PubMed2XL: Basic Installation and Use [http://vimeo.com/15098984] from nitin arora [http://vimeo.com/user3665532] on Vimeo [http://vimeo.com].
___________________________________________________________________________
Changing Settings
The "Options" tab (image below) gives users options for:
1. selecting the output between Excel 2007 format and OpenDocument spreadsheet format using the "Toggle Output Format" option,
2. selecting whether or not to process citations from books using the "Toggle Book Citations" format,
3. and altering the spreadsheet column output (see "Stylesheets" below).
The "Preferences>Save Preferences" command allows users to save any changes to the options. IMAGE: "PubMed2XL Options tab"[http://blog.humaneguitarist.org/uploads/PubMed2XL/version2_changeStyle.png]
Stylesheets
Pubmed2XL uses "stylesheets" to determine the output format of the spreadsheet that it creates. In other words, things like column order, column title, and the PubMed data placed in a given column's cell is controlled by a PubMed2XL stylesheet. Users can specify a different stylesheet by clicking on "Options>Change Stylesheet" prior to using the "Tools>PubMed XML to spreadsheet" command.
Creating Stylesheets
Users familiar with XML [http://www.w3schools.com/xml/] and the specific XML outputted by PubMed can change the format of the outputted Excel file that PubMed2XL creates by creating their own stylesheet. Using an external stylesheet in a well-known markup language allows researchers and librarians, etc. to customize PubMed2XL's output and to easily share stylesheets with each other and their friends and patrons. More information and instructions for making new stylesheets can be seen in the Pubmed2XL stylesheet XSD [http://www.w3schools.com/schema/default.asp] located at "./styles/schema/". By studying the schema document and the default stylesheet, advanced users can better customize PubMed2XL to their needs. Starting with version 1.0, PubMed2XL uses XSLT [http://www.w3schools.com/xsl/xsl_intro.asp] 1.0 as the only method of parsing data from PubMed XML files. This is in contrast to 9.x versions that used a home-grown XML processing language. As such, PubMed2XL 1.0+ is not backwards compatible with version 9.x stylesheets. The move to XSLT was to accommodate requests from users for more capabilities. If you need assistance customizing a stylesheet, please leave a comment below so that we can all work together. Thanks! Here is an example stylesheet that will make a spreadsheet with on column called "PMID" in which the PubMed article ID value will be placed. The data in the cell is hyperlinked to the article's page on PubMed.gov using the optional element.
<?xml version="1.0" encoding="UTF-8" ?>
<spreadsheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema/PubMed2XL-2.0.xsd">
<column>
<title>PMID</title>
<cell><![CDATA[
<xsl:stylesheet version="1.0" encoding="UTF-8" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method = "text" />
<xsl:template match="/">
<xsl:value-of select="//PMID" />
</xsl:template>
</xsl:stylesheet>]]>
</cell>
<hyperlink><![CDATA[
<xsl:stylesheet version="1.0" encoding="UTF-8" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method = "text" />
<xsl:template match="/">
<xsl:text>http://www.ncbi.nlm.nih.gov/pubmed/</xsl:text>
<xsl:value-of select="//PMID" />
</xsl:template>
</xsl:stylesheet>]]>
</hyperlink>
</column>
</spreadsheet>
___________________________________________________________________________
Advanced Capabilities for Developers
The PubMed2XL GUI application is, as of version 2, built atop a Python library, "pm2xl.py". The goal of the library is to provide developers a way to automate tasks from searching PubMed programmatically, to creating PubMed XML files from a list of PMIDs (PubMed IDs), and to - of course - making a spreadsheet from a PubMed XML file. If you are a programmer and use the library, I'd really appreciate feedback as to how you use the library as well as any constructive criticism. Advanced users with Python programming skills can use the library in other scripts and for task automation. Any of the "pm2xl.py" functions can also be called via the command line, allowing it to be used by programmers who prefer other scripting languages. For example, consider the "makeSheet()" function which converts a PubMed XML file into a spreadsheet. To call the function from the command line one can do the following:
$ python ./pm2xl.py makeSheet('myPubMedXML.xml', outputFile='myPubMedSpreadsheet.xls')
Note, that only single quotation marks can be used when enclosing strings from the command line. Also note that, on error, the script will return a "1" unless the string "DEBUG" is passed as the last argument as in the example below - which also demonstrates using the Windows ".exe" version of the library instead of the Python version.
$ pm2xl.exe makeSheet('myPubMedXML.xml') DEBUG
For more information on the library functions, please see the PyDoc documentation [http://blog.humaneguitarist.org/uploads/PubMed2XL/pm2xl.html] located at "./docs/pm2xl.html".
___________________________________________________________________________
FAQ
1.
1. How many citations can PubMed2XL process?
1. PubMed2XL is not recommended for processing an XML file of PubMed citations for more than a few thousand citations (less than 5k).
2. I've tested PubMed2XL for ~5,000 records. The XML file downloaded from PubMed.gov was roughly 50 megabytes. It took PubMed2XL less than 1 minute to create an 8 megabyte Excel (.xls) file.
3. I also tested nearly 25k records using a development version of PubMed2XL from the command line. That took approximately 1.5 hours to process and it took OpenOffice almost 10 minutes just to open the spreadsheet.
4. Both tests were on my Lenovo T510 (431328U) [http://www.amazon.com/Lenovo-ThinkPad-431328U-15-6-Inch-Laptop/dp/B0032WH4NY] and used the default Pubmed2XL stylesheet which is verbose as it retrieves a lot of data from the PubMed citations including abstracts.
2. I don't have Excel. Can I still use PubMed2XL?
1. PubMed2XL creates an Excel (.xls) file that can also be opened in OpenOffice [http://www.openoffice.org/] and other applications, including Google Docs.
3. Why do I see the message "File Error: data may have been lost." in Excel when I open a file created by PubMed2XL?
1. PubMed2XL uses the Excel 2007 format (.xls) instead of the newer format (.xlsx); this seems to be the sole culprit in generating the error message. If you are using Excel 2010+ and are seeing this error, try clicking through the error and use the "Save As" command to save the PubMed2XL file in the newer Excel format (.xlsx) and delete the ".xls" file created by PubMed2XL. I've never seen that any data was actually lost despite the error message.
4. I don't want to create an Excel (.xls) file and/or I prefer to use OpenOffice [http://www.openoffice.org/] or LibreOffice [https://www.libreoffice.org/], etc. Can PubMed2XL create Open Document [http://en.wikipedia.org/wiki/OpenDocument] (.ods) spreadsheets?
1. Yes. Just use the "Options>Toggle Output Format" command. Programmers can use set the extension to ".ods" using the "outputFile" argument in "makeSheet()".
5. Does the software process book-based citations if found within the XML?
1. By default, no. But you can, as of version 2.0, use the "Options>Toggle Book Citations" command.
2. Note that the default stylesheet included with the software is tailored to journal citations. As such, many of the columns for book citations will likely be left blank.
COMMENTS
Hi DG, The NLM Journal catalog XML is a completely different data set and data format than the PubMed citations that PubMed2XL parses. So PubMed2XL can't help you. Unless you find another solution, directly loading the XML into Excel might be your best option - of course, that might require a lot of work to figure out how to do. I'd recommend you contact NLM directly and see if they know of anything that can help you. thanks,
Hi, I wish to know is there any stylesheet available to analyze (excel file format) for NLM Journal catalogue which can also be downloaded in XML file format. Presently there are about more than 5200 journals listed with PubMed. When we try downloading in XML file format and then useing MS-Access or Excel tried it doesn't work. Thanks in anticipation,
Hi DG, I think the author affiliation is only provided for the first Author for PubMed. If you know of a specific article that also provides it for different authors, please let me know and send me the article link. I have a previous stylesheet that will get the Affiliation for the 1st Author and the MESH terms. Note: all the terms appear in one column and are semicolon separated. Instructions: Save this file ("SAVE AS" via your browser): http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_2.x/jkorkou_120813.xml in PubMed2XL's "/styles" folder. Then, when you run PubMed2XL, go to "Tools>Select Stylesheet" and select "jkorkou_120813.xml" before selecting "File>Convert PubMed File".
Sir, This is very useful tool to analyse PubMed records. I could retrieve First Author Affiliation with Jackson style-worksheet. How to retrieve last author affiliation and MeSH term. I mean is there any similar style sheet available. If yes please inform from where I can download the same. Thanks, DG
For future reference: we were able to get this working on J's computer by uninstalling PubMed2XL and re-installing with the following destination folder (selected during installation): "C:\PubMed2XL-2.01". The default "Program File" installation area was causing problems because Windows didn't want to allow PubMed2XL to write Excel files to J's hard drive.
Hi, This is an awesome software! I tried to convert PubMed XML results to Excel, but it would not show anything? It would process but I don't get anything. Is it because I have 64bit operating system with Excel 2016? Please help! When I exit, I also get an error message--"They are logged in: logs/2017-11-29-10h-01m-11s.log. Thanks!
I emailed Yogesh offline to let then know this would be custom programming and is not something supported by PubMed2XL.
Thanks Nitin for this wonderful application. I wanna ask you can you provide similar softwares for other databases like KoreaMed and EuropePMC ?
Note: I'd written Pruthvi offline to suggest that his downloaded results may have included book citations. By default, the PubMed2XL doesn't extract these. But it can with the "Options">"Toggle Book Citations" option.
Hi, I am downloading nine thousand data from pubmed, but when i extract using this data, its showing only 2000 data, is it possible to increase the limit. THank you
Hi Subha, It looks like I already did this for someone else. You'll need to download this file: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_2.x/claudia_121213.xml and place the file inside PubMed2XL's "styles" folder. Then do: – launch PubMed2XL – go to TOOLS> SELECT STYLESHEET – select the XML file you just downloaded – then choose FILE> CONVERT PUBMED FILE and process you PubMed XML file you already exported from pubmed.gov. thanks,
Great software!! Very helpful. Any chance you (or I) can add a "Last Author" column. The last authors are usually head of labs/principal investigators, so would really be helpful.
Hi Andrea, I'm glad the software is useful. I'm sorry for the delay in responding. I don't think there is a way to extract your query and place it in the Excel file automatically. You could, however, just create a new column in the spreadsheet after it's made. You could call that column "source_query" or something like that and past the query you typed into every cell of results. If I've misunderstood your question, let me know. Thanks, Nitin
Hi, First of al, many thanks for your software: it is great and allows for tremendous time saving. Going to the question: is there a way to retrieve in the excel file the search terms used to generate the search results? I am thinking of something similar to what is displayed in the box "search details" on the right hand sinde of the page showing search hits. Many thanks for letting me know. Best, Andrea
Thanks for your question. I've had one user write up something on how to get it to work on a Mac. Here's the link: http://blog.humaneguitarist.org/2015/12/28/user-contributed-content-getting-pubmed2xl-to-work-on-macos/ I hope that helps.
Hi, I used to use this software on my windows, however i changed to a mac and would like to use. Do you know if there is a way to download? Thank you
Hi I want to get keywords of articles in a column of excel file, how can i do that?
Here's the reply I sent to Emdy last month via email ... I have something that will sort of do this. Because the XML data isn't flat it doesn't map super-well to Excel. But you will need to download this file to alter the output of the Excel output: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_2.x/jason_160321.xml Place the file inside PubMed2XL's "styles" folder. Then: – launch PubMed2XL – go to TOOLS>SELECT STYLESHEET – select the XML file you just downloaded – then choose FILE>CONVERT PUBMED FILE and process you PubMed XML file you got from rom pubmed.gov. An explanation of the output is here: http://blog.humaneguitarist.org/projects/pubmed2xl/comment-page-1/#comment-26306 Let me know if these instructions make sense. thanks,
Sounds like the issue is with your IT department. Maybe you could try talking to them rather than posting here.
On a gov computer. Can't download and install anything. So, this is worthless to me.
Hi Nitin, Thanks for this tool - it is great. At present, I can only get the affiliation of the first author only. Do you know if there is a way of pulling out all the authors and their associated affiliations? I should probably point out up front that I am not that technical, sorry! :) Any help would be gratefully received Thanks, Emdy
This is excellent, thankyou so much for sharing (saved me hours of work!) :)
Hi Sandy, I think you'd want to just save items to the Clipboard and then export your Clipboard to XML. Here's a link that might help: http://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.Saving_and_Emailing_Results_a thanks, Nitin
Hi Nitin, Thanks very much for putting this together! I have a very basic questions. For searches with >500 or >1000 citations, what is the right approach to select all and download all to an XML file? Do you create a collection and add to it, or is there another way to select all citations other than creating a collection?
While I followed up with Jason via email, I forgot to post his stylesheet online. It's here: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_2.x/jason_160321.xml [http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_2.x/jason_160321.xml] Here's some information on the output as I wrote to Jason. This stylesheet retains the "All Authors (Last, Initials)" column with a SEMICOLON plus SPACE ("; ") delimiter between each author. After that is a new "All Authors (Afilliation)" column. Multiple affiliations for the same author should be separated by a " | ". A "; " should separate affiliations for each author. So for three authors, the affiliation for the 2nd author should follow the first semicolon in the cell - provided there are no semicolons in the PubMed affiliation string for the first author - if that makes sense. Note: depending on what's encoded in the XML there might not be ANY author affiliations. It looks like they might have added per-author affiliations more recently. So if there aren't any, that column could just have lots of cells with semicolons and no text.
Hi Nitin, I completely forgot I posted this question. Thanks so much for your reply. Your software works great for me. Just one thing, only the first authors affiliation is extracted into the excel file. Would it be possible for you to write a stylesheet that extracts all the authors affiliations please? (these can be found under the author information drop down. Having them all in the same cell would be fine. Thanks for your help! Jason
Hi Jason, The default output doesn't get that data, but the software can be made to get the data - if it exists as discrete information from the export. I can follow up with you via email, but for now this previous thread shows how the author affiliation can be extracted: http://blog.humaneguitarist.org/projects/pubmed2xl/comment-page-1/#comment-213. Hope that helps, Nitin
Does this software get email addresses and institutions from pubmed?
Chris is correct. PubMed2XL works within Winebottler on OS X (only tried Capitan). Use the advanced mode in Winebottler to install PubMed2XL.exe (Select copy all files in the folder to the App bundle). Once the xls has been created by PubMed2XL, you may have to open the file directly. Thanks again Nitin! ___________________________________________________________________________ Editor's note: I've uploaded the email instructions Bob sent for getting PubMed2XL 2.0 working on his Mac with WineBottler. You can read more here [http://blog.humaneguitarist.org/2015/12/28/user-contributed-content-getting-pubmed2xl-to-work-on-macos/] and see the instructions. Thanks Bob.
I am PhD student working on dissertation, I have Mac, I was wondering if you could help me to install pubmed2xl to mac, I tried, however, without application, I was not able to open the file. ___________________________________________________________________________ Editor's note: I've emailed Sherry off-list to let her know that PubMed2XL isn't currently available for Mac OS, but I wanted to leave her comment up so others could see it. Technically, one could run PubMed2XL on a Mac using the Python source files, but that would require a good bit of technical know-how.
Nitin, I have managed to figure out quite a tidy workaround to run PubMed2XL successfully on Mac OS X, by installing it with 'WineBottler' (available free). This even allows you to distribute a Mac version of PubMed2XL as a stand-alone Mac App. This is a great product and even greater now that it can be used by those of us who appreciate a decent operating system! Nitin if you would like me to share this stand-alone copy with you so that you can distribute it on your site, let me know. Alternatively, as it is such a big file (600mb), you may think it more appropriate to provide instructions for your users on how to install it themselves onto a Mac OS. Chris ___________________________________________________________________________ Editor's note: I've requested the instructions from Chris, but until/if I receive them I at least wanted others to see the note about WineBottler. And I agree that Mac OS is "decent" ... and only just decent. ;-)
Hi Peter, I wasn't even aware of Microsoft SCCM until you posted your comment. I'm not familiar with silent installs, are you trying to install this on several machines at once? Sorry I can't be of any help.
hi, i know this thread as a bit old, but does this application lend itself to silent installs via something like microsoft sccm. if it does do you have any info on how to achieve that. Thanks P.
Hey David, Thanks for the question. Short answer: yes, this is possible. I'll message you off-list, but since this is for a non-PubMed XML source, I can't really offer any support, but I might throw a modified version you're way that supports the WHO format.
Like all, I find PubMed2XL very useful. I am also using the World Health Organisation Global Health Library (http://www.globalhealthlibrary.net/php/index.php [http://www.globalhealthlibrary.net/php/index.php]) to compile a spreadsheet. unfortunately, there is no option to export a CVS file, but there is a button to save the search in XML format. May it be possible to adapt PubMed2XL to create an XL file with basic details such as ID, URL, & citation? I'm a writer, not tech, so doing so is beyond me, but maybe others have thought of this already?
For anyone wanting to do the same thing as Franco, I've pasted (with his permission) the solution he came up with below. - Nitin. Hi Nitin, thank you for your answer. I found a solution for my problem which work very well. After the search in PMC (I need to search in PMC because some keyword I’m interested in are only listed in the “material and method” section, which is not searchable in pubmed) I can simply click on see related articles in pubmed which gives me the same list in pubmed. Then I do the save as xml and use your fantastic tool. I adapted the style sheet to give me the First and the Last Author with Firstname and Lastname in separated Fields. Thank you again, Cheers Franco
Hi Franco, The PMC XML output is very different from the PubMed XML format - that's why it doesn't work. If you want to email me off-list at "nitaro74 AT gmail DOT com" about what you are wanting to do we can talk more. thanks,
Hi Nitin, this programm is absolute fantastic, thank you so much. Is there a way to also convert PMC search results to Excel. I tried but with the default style sheet it doesn't work. Cheers, Franco
Thanks for the feedback.
Thanks a lot. I appreciate your effort so much.
Sorry for the confusion Thomas. To run it on Linux, you'll have to run the Python files. The dependencies are listed in the "/docs/DEPENDENCIES.TXT" file. What distro/version are you using?
How is this app supposed to be linux-compliant? Binaries are clearly for Windows, or does the author assume integration of WINE??
Since this was a pretty specific/custom request for David's work, I'm not going to upload a new stylesheet but just going to paste the code below so I don't lose it. <column> <title>MeSH</title> <cell> <![CDATA[ <xsl:stylesheet version="1.0" encoding="UTF-8" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method = "text" /> <xsl:template match="/"> <xsl:for-each select="//MeshHeading"> <xsl:value-of select="DescriptorName" /> <xsl:for-each select="QualifierName"> <xsl:text>/</xsl:text> <xsl:value-of select="." /> </xsl:for-each> <xsl:if test="position()!=last()"> <xsl:text>||</xsl:text> </xsl:if> </xsl:for-each> </xsl:template> </xsl:stylesheet> ]]> </cell> <hyperlink /> </column> <pre><code></code></pre>
Hello, I am trying to modify the code provided by jkorkou (jkorkou_120813.xml) to get MeSH descriptors AND qualifiers into Excel. Each MeSH descriptor may have zero, one, or more qualifiers. What I would like in Excel is for all the data to go into one cell in the form descriptor/qualifer1/ qualifier2. Below is what the XML looks like from PubMed and what I have come up with so far. The problem is that only the first qualifier is coming into Excel. If there is more than one qualifier, it is ignored. Do you have any suggestions? Thank you. <MeshHeadingList> <MeshHeading> <DescriptorName MajorTopicYN="N">HIV Infections</DescriptorName> <QualifierName MajorTopicYN="N">epidemiology</QualifierName> <QualifierName MajorTopicYN="Y">prevention & control</QualifierName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN="N">Humans</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN="N">Urinary Tract Infections</DescriptorName> <QualifierName MajorTopicYN="N">epidemiology</QualifierName> <QualifierName MajorTopicYN="Y">prevention & control</QualifierName> </MeshHeading> </MeshHeadingList> <column> <title>MeSH</title> <cell> <xsl:stylesheet version="1.0" encoding="UTF-8" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method = "text" /> <xsl:template match="/"> <xsl:for-each select="//MeshHeading"> <xsl:choose> <xsl:when test="position()=last()"> <xsl:value-of select="DescriptorName" /> <xsl:text>/</xsl:text> <xsl:value-of select="QualifierName" /> </xsl:when> <xsl:otherwise> <xsl:value-of select="DescriptorName" /> <xsl:text>/</xsl:text> <xsl:value-of select="QualifierName" /> <xsl:text>||</xsl:text> </xsl:otherwise> </xsl:choose> </xsl:for-each> </xsl:template> </xsl:stylesheet> </cell> <hyperlink /> </column>
Just FYI to any other readers, claudia now has a stylesheet that also outputs the Last Author to the default output. Here’s the link to the stylesheet: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_2.x/claudia_121213.xml Note this is designed for Pubmed2XL versions 2.x.
hi, love your tool! do you have a script to have the last author listed? Thanks a lot, Claudia
Just FYI to any other readers, jjrok now has a stylesheet that supports getting the MESH descriptors in one column, they are SEMICOLON delimited. Note: there's also a new field for ISSN which got in there and jjrok also added a column for "Chemical Substances". Here's the link to the stylesheet: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_2.x/jkorkou_120813.xml Note this is compatible with Pubmed2XL versions 1.x and 2.x. I recommend using version 2 of the software.
hello, does one of you has came up with a stylesheet to extract every meshterms in one cell but separated by commas or | ? I am very interested in this but not familiar enough with XML to do it... Thanks a lot, jrrok
Glad to hear it's handy. :-)
Thanks very much, Nitin. That looks like a very good solution. BTW, so far, despite having to cut-and-paste new records to my existing spreadsheet, I'm finding PubMed2XL very handy!
Hey Oliver, Thanks for the question. PubMed2XL can't achieve this, but you should be able to use your spreadsheet application (Excel, etc.) to copy/paste and combine two different spreadsheets. If you have Excel you might want to take a look at this add-in called RDBMerge (http://www.rondebruin.nl/win/addins/rdbmerge.htm) to combine multiple spreadsheet files. Hope this helps, Nitin
I'm excited to try this thing. I'm wondering, though, if I start a new search session, can I append to the same spreadsheet of a previous session?
Thanks for leaving a comment. I'm glad it's helping. - Nitin
Thank you!! Thank you!! This was exactly what I needed, I was trying all day with the .XML file in EXCEL and ACCESS and couldn't find the way to organize the data in a logical way, and then with like 5 clicks my work was done!!! God bless you, Melissa
Hi Doreen, I’ll need you to send me the following at my email (nitaro74 AT gmail DOT com): - version of PubMed2XL you are using, - a copy of the XML file you got from PubMed (i.e. email as attachment). You can get the version number by using the Help>About menu option. thanks, Nitin
Hi, Nitin, it seems fantastic program i've ever seen ! however it doen's work for me T.T i am using excel 2007, it doesn't work .. if I click "convert pubmed file" and select "PubMedResults.xml", PubMed2XL stops. it does not show me any pop-up, and not create any xls files. what should i do ?
Hi Nitin, That is perfect. Thank you so much :-) I am so impressed with this tool and it saves so much time. What a great job! Thanks, Andy
Hey Andy, If I understand correctly, you want a column with a link to the article that shows the actual full link in the column? If so, try downloading this style: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_1.x/andy_030613.xml and storing it in the "/styles" folder. When you run PubMed2XL, go to "Tools>Select Stylesheet" and select "andy_030613.xml" before selecting "File>Convert PubMed File".
What a fabulous tool, thank you so much! I can't believe I have been doing so many lengthy procedures for so long. I have a question if you are able to help please. I would like to have a column in which the URL for the PMID is displayed as a hyperlink that actually shows the text instead of just the PMID number (if that makes sense). I know I could do this manually in Excel, I am just unsure of how to modify the xml style to do this automatically. Any help would be greatly appreciated. Thank you
Oscar and I figured out that the most likely reason this wasn't working was an accent mark and/or whitespace in his Windows user name. When he moved PubMed2XL to C:\PubMed2XL it worked (i.e. bypassing the path with his user name). Something for me to keep in mind for version 2.
Many thanks for your tool, it look very useful. However, even though I have followed step by step the instructions, I cannot generate the Excel file. To be honest, I don't know what is going on. I have Excel 2010 and Windows 7. I'll really appreciate your kind help.
Thanks Christophe. Keep an eye out for a new version (I'm aiming for early March). The new version will still have a GUI interface, but will also be a Python module and a command line tool - the latter two being helpful if you really want to do some serious automation and integrate PubMed2XL into other scripts. thanks, Nitin
Thank you very much for this software. This will save me many hours of work !!! It is perfect for what I need to do (bibliometric analysis) using pubmed data Christophe
Hey David, Thanks for writing. This really isn't possible with the software as it wasn't designed to that. Getting everything really would be a job for a totally new script. But there would be semantic issues in regard to naming the column fields per the data that got extracted - which is likely to vary from article to article within the XML. At some point this kind of thing also raises questions about trying to totally flatten out nested data in the first place and would likely point to the need to ingest the XML in a native XML database. If you have Excel you could try importing the XML directly though I imagine that would require a lot of cleanup.
A stylesheet that outputs everything would be ideal. For my situation, and I am guessing many others, the best way to handle multi-value fields (like authors, MeSH, etc) would be to put all the values in one cell with a delimiter like a double verticle pipe (||).
Ok, I sent you an email. Thanks!
Hey Aaron, I can check it out. I'll need you to send me the following at my email (nitaro74 AT gmail DOT com): - version of PubMed2XL you are using - a copy of the XML file you got from PubMed (i.e. email as attachment) - a copy of the stylesheet (as email attachement) - a copy, if you have it, of the outputted Excel file (as attachment). thanks,
Hi Nitin, You helped me with a new stylesheet that pulls the first author's affiliation. I used this a few months ago successfully, but now it doesn't seem to be working. I have selected to use that custom style sheet, and restarted the program and my computer, but there is still no affiliation column. Any thoughts? Much thanks!
Hi Oren, I don't have a style which would export everything. One of the problems is that I don't know all the elements, though one could perhaps generated a generic "catch-all" style by parsing the document type definition for PubMed - something like this: http://www.ncbi.nlm.nih.gov/entrez/query/static/PubMed.dtd. But this would still be sub-optimal given that there are still questions related to how to display multi-valued fields like all the authors, MESH terms, etc. within one cell. If you are putting it into a database, you might be able to directly import the XML file into Excel (if you have Excel). I think one could use the import "as a read-only workbook option" - or something. I think you'll have duplicates that way (i.e. row per author per PMID), but maybe that could be cleaned up with a pivot table or database query. Actually, that might be more desirable than only having one row per article since the latter is less granular.
Nitin Thanks, you have solved a problem I have been strugling with for some time, i have used work arounds through web of sceince which can export to excel however i do prefer Pubmed. Is there a style which could let me export all the fields to excel? I can later decide if i use them in my database. Thanks
Hey Steve, My bad. Here's the style sheet (check columns "O" and "S"): http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_1.x/steve_102711.xml Let me know if it works. Thanks,
Thanks, the other program worked great! Sorry for delayed response. I was hoping that there would be a way to pull out the country of the first author's affiliation, rather than the name of institution, etc. But that may be more difficult since the country appears in different places in that string of text. Much thanks!
Nitin, Sorry if I was not clear. I was after an output style that included BOTH, PUBLICATION TYPE and AUTHOR LOCATION. so a combination of the two styles you have added above. Many thanks Steve p.s. excellent piece of work by the way
Hey Steve, If you just need to remove Publication Type and Author from the default output, it's pretty easy to delete these in Excel, etc. You could even record a macro in your spreadsheet program to do that with one click.
Would you be able to put together style sheet that included all the basics but with PUBLICATION TYPE and AUTHOR Location. That would be fabulous. Many thanks Steve
Hey Aaron, Thanks for the feedback. You're lucky I'm home sick today, so I took a stab at the stylesheet. If it isn't what you want, I would need more detail from you so you'd need to email me off-list. Re: first author's location, I used the Affiliation field which may or may not have the information. You can go here: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_1.x/ And download the file with your name on it. Make sure to save it with an XML extension ("yourFilename.xml") inside PubMed2XL’s "styles" folder. Then: - launch PubMed2XL - go to TOOLS>SELECT STYLESHEET - select the XML file you just downloaded - then choose FILE>CONVERT PUBMED FILE and process the PubMed XML file from pubmed.gov. thanks,
May you help me create a stylesheet that exports only the PMID, Title, First Author's Name, Pub Date, State/Country of 1st author, Journal Name, Abstract? Until then, I will just download the default and delete unneeded columns. MUCH THANKS!
GOD BLESS YOU, sir! This saved me bazillions of hours!!
Hey Theis, Thanks for the feedback. I like the GUI builder idea! I've added it to my to-do list. As far as Publication Type, I *think* I know what you mean. You can try saving this file: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_1.x/theis_101411.xml and placing it inside PubMed2XL’s “styles” folder. Make sure it’s saved as an XML file such as “theis_101411.xml”. Then: - launch PubMed2XL - go to TOOLS>SELECT STYLESHEET - select the XML file you just downloaded - then choose FILE>CONVERT PUBMED FILE and process the PubMed XML file from pubmed.gov. There should now be in Column O a "Publication Type(s)" field containing semicolon separate values for each publication type a la: "Journal Article; Research Support, Non-U.S. Gov't". Then you can filter the data with the spreadsheet software's "Filter" feature by things such as: Contains "Journal Article" AND Does not Contain "Research Support", etc. I don't know if that helps. If not, let me know and maybe we can figure something out.
Thnx, so much for this great application. Been using an old script from your blog for my last project - this solution is perfect! Prehaps a GUI for selecting what columns to put in the output file is in v2.0? In the most pubmed listings a "publication type" tag has been added but it often contains several lines - any ideas how to handle this so you can sort in "reviews" "case reports" "RCT's" etc. A danish user Theis
great tool!..... makes my work much easier :) thanks
Thank you for your work on this program, it is perfect for what I need to do.
Hey Robert, There appears to only be one Affiliation listing per article. You can try saving this file: http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_1.x/robert_091111.xml [http://blog.humaneguitarist.org/uploads/PubMed2XL/moreStyles/version_1.x/robert_091111.xml] and placing it inside PubMed2XL's "styles" folder. Make sure it's saved as an XML file such as "robert_091111.xml". Then: - launch PubMed2XL - go to TOOLS>SELECT STYLESHEET - select the XML file you just downloaded - then choose FILE>CONVERT PUBMED FILE and process the PubMed XML file from pubmed.gov. This is for PubMed2XL version 1.0 or higher only. Let me know if this helps. Thanks, Nitin
Hi Nitin, I know nothing about XML coding, but have spent several hours trying to figure out how to add to your default stylesheet the additional information for "Affiliation". Help please! Cheers, Robert
Thanks Nitin.
Hey Aidan, I’ve seen that same error on Office 2010, but I can’t see any evidence of a real problem. I don’t know much about the library I’m using to make an Excel 2007 file, but that might have something to do with it. As far as the outputted filename, the only way (other than to rename the file after the fact) is to use the command line version: $ PubMed2XL-1.0-CL --in="pubmed_result.xml" --out="pubmed_result.xls" The example above assumes you downloaded the PubMed XML file to the same folder as the PubMed2XL application and that you want to save the spreadsheet to the same directory.
Nitin, This was just what I needed to round out a set of Excel macros. I could not work out how extract XML info in VBA, so this does the trick wonderfully. 2Qs though 1)I have Excel 2010 in WIn7 x64. When I open your Excel output it says "File error: data may have been lost". Everything seems OK but I don't know what is happening. Any ideas? 2) Is there a way to alter the file output name to a generic like "pubmed result"? I would like my macro to identify and open the data directly, but with the variable file name it makes life tricky. Thanks again, Aidan
Thanks Alison. I'm glad it's useful.
This is awesome and so much faster than the workarounds I was thinking of to extract out the abstracts. I got my CSV file...which is great and then was trying to think of a way to get the abstracts in there for reference. This saved me a lot of time and was very easy to use. The default output is great and it is pretty easy to customize if you would like an alternative output. Thanks.
Dear Nitin, This is AWESOME!!!
Hey Jim, Excel isn't required - I have OpenOffice. Can you send me a copy of the XML file you are trying to use? You can email it to: nitaro74 AT gmail DOT com Thanks and sorry for the trouble.
Hi Nitin, I tried to download and install your software in VMware Fusion, which is a Mac Virtual MS OS. The program appears to install ok but will not read the PubMED txt (XML) file. Do you need Excel installed for your program to work? Cheers, Jim
Thanks for leaving a comment! - and for checking out the software.
Nice work - this is a very handy tool!