blog.humaneguitarist.org
easy calls to OpenCalais with Python, daggummit!
[Fri, 23 Mar 2012 17:28:00 +0000]
Yesterday, I wrote this [http://blog.humaneguitarist.org/2012/03/22/make-you-some-facets-boy/] post about using Yahoo's deprecated term extraction web service to generate "subjects" - or whatever you want to call them - for an item based on the metadata housed in a Solr-compatible XML file. I'd also wondered about doing the same thing with OpenCalais [http://www.opencalais.com/].
Before we go any further, I'd just like to say I wrote that post from my hotel room. I'm writing today's from the Denver airport with about 2 hours to kill before my flight departs. And I'd also like to point out that when writing blog posts with spotty Wi-Fi connections, one should not compose their post online through WordPress. I'm using WordPad, and I should probably make that a habit.
Yeah, so anyway there's not that much good documentation on how to make calls on the Calais site. By "good" I mean there's no code sample to rip off. I'm sure it's perfectly fine for people who actually know what they're doing.
Using "The Google" I found this [http://www.flagonwiththedragon.com/2011/06/08/dead-simple-python-calls-to-open-calais-api/] helpful post on making calls to OpenCalais. While I found it very well written and the code very helpful, I didn't want to have "httplib2" as a dependency since it's not available out-of-the-box with Python 2.7, as far as I know. Nor did I want to do anything with JSON. I'm just trying to make a simple POST request to the OpenCalais REST API - is all.
Using that post's code as a starting point, I whipped up some simple Python without "httplib2".
Note that this code passes three parameters to the API through the following variables:
* "myCalaisAPI_key": this is where to paste your API key once you get it from Calais here [http://www.opencalais.com/APIkey].
* "sampleText": this is a string of plain text to send to Calais for it to analyze and build terms for.
* "calaisParams": these are the options to pass to the service in XML format.
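To make that a bit more concrete, here's a minimal sketch of how those three values end up as a form-encoded POST body. It's written for Python 3, where `urllib.urlencode` lives at `urllib.parse.urlencode`. The field names `licenseID`, `content`, and `paramsXML` are the ones used in the code further down. The helper name `build_calais_body` is just something I made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

def build_calais_body(api_key, text, params_xml):
    """Form-encode the three Calais fields into a POST body (bytes).

    build_calais_body is a hypothetical helper; the field names mirror
    the ones in the urllib/urllib2 code in this post.
    """
    fields = {
        'licenseID': api_key,     # your Calais API key
        'content': text,          # the plain text to analyze
        'paramsXML': params_xml,  # the options, as an XML string
    }
    # urlencode escapes spaces, angle brackets, colons, etc. so the XML
    # survives the trip as an ordinary form field.
    return urlencode(fields).encode('utf-8')

body = build_calais_body('my-key', 'Tim Tebow joined the Jets.', '<c:params/>')
# Decoding the body again gets the original values back.
print(parse_qs(body.decode('utf-8')))
```

The point is simply that the XML options don't get any special treatment. They're just one more form field, escaped like everything else.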
Note that I'm specifically requesting what I really want, "social tags", via the following option:
c:enableMetadataType="GenericRelations,SocialTags"
... and I'm specifically requesting a simple result format as follows:
c:outputFormat="Text/Simple"
There are other options, including RDF, that can be requested per the options mentioned on this [http://www.opencalais.com/documentation/calais-web-service-api/forming-api-calls/input-parameters] page.
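If you end up swapping those options around a lot, here's a sketch (Python 3 again) that builds the same `paramsXML` string as the one hard-coded below, with the metadata types and output format pulled out as arguments. The function name `make_calais_params` is hypothetical. The defaults are just the values I'm using in this post; any other output format string you pass in is an assumption you should check against the Input Parameters page linked above.

```python
from xml.sax.saxutils import quoteattr

def make_calais_params(metadata_types=('GenericRelations', 'SocialTags'),
                       output_format='Text/Simple',
                       content_type='text/txt'):
    """Return a paramsXML string shaped like the one in this post.

    make_calais_params is a hypothetical helper; quoteattr() wraps each
    value in quotes and escapes &, <, > and quotes so the XML stays valid.
    """
    return (
        '<c:params xmlns:c="http://s.opencalais.com/1/pred/" '
        'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        f'  <c:processingDirectives c:contentType={quoteattr(content_type)}\n'
        f'    c:enableMetadataType={quoteattr(",".join(metadata_types))}\n'
        f'    c:outputFormat={quoteattr(output_format)}/>\n'
        '  <c:userDirectives/>\n'
        '  <c:externalMetadata/>\n'
        '</c:params>\n'
    )

# Same options as the code in this post:
print(make_calais_params())
```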
If you look at the code, you can see I'm asking Calais to analyze some text about Tim Tebow [http://en.wikipedia.org/wiki/Tim_Tebow] since I was in Denver when the Denver Broncos [http://en.wikipedia.org/wiki/Denver_Broncos] football team acquired Peyton Manning [http://en.wikipedia.org/wiki/Peyton_Manning] and traded Tebow to the New York Jets. The text is from a USA Today article from, um, yesterday.
The Jets, I'd like to state, are not worthy of a hyperlink. And that's only part of the reason I'm sad to see Tebow go there. Alas.
Anyway, here's the output below, followed by the code. Note that - as mentioned in the code - I'm using the slightly older REST API. But what do I care right now? I'm just testing.
Here's the output:
<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.-->
<!--
Company: HBO,
Organization: New York Jets,
Person: Tim Tebow,
TVShow: Hard Knocks,
-->
<OpenCalaisSimple>
<Description>
<calaisRequestID>dafa6c80-b4f6-77b1-1363-de96bb7764f4</calaisRequestID>
<id>http://id.opencalais.com/ODNr1ciDte8wwv0nU3G1jw</id>
<about>http://d.opencalais.com/dochash-1/895ba8ff-4c32-3ae1-9615-9a9a9a1bcb39</about>
<docTitle/>
<docDate>2012-03-23 00:56:09.679</docDate>
<externalMetadata/>
</Description>
<CalaisSimpleOutputFormat>
<Company count="1" relevance="0.643" normalized="HBO &amp; Company">HBO</Company>
<Organization count="1" relevance="0.643">New York Jets</Organization>
<Person count="1" relevance="0.643">Tim Tebow</Person>
<TVShow count="1" relevance="0.643">Hard Knocks</TVShow>
<SocialTags>
<SocialTag importance="2">Training camp<originalValue>Training camp (National Football League)</originalValue>
</SocialTag>
<SocialTag importance="2">New York Jets<originalValue>New York Jets</originalValue>
</SocialTag>
<SocialTag importance="2">Florida Gators football team<originalValue>2008 Florida Gators football team</originalValue>
</SocialTag>
<SocialTag importance="1">Tim Tebow<originalValue>Tim Tebow</originalValue>
</SocialTag>
<SocialTag importance="1">HBO<originalValue>HBO</originalValue>
</SocialTag>
<SocialTag importance="1">Hard Knocks<originalValue>Hard Knocks (TV series)</originalValue>
</SocialTag>
<SocialTag importance="1">Entertainment_Culture</SocialTag>
<SocialTag importance="1">Sports</SocialTag>
</SocialTags>
<Topics>
<Topic Taxonomy="Calais" Score="1.000">Entertainment_Culture</Topic>
<Topic Taxonomy="Calais" Score="1.000">Sports</Topic>
</Topics>
</CalaisSimpleOutputFormat>
</OpenCalaisSimple>
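Since what I actually want are those social tags, here's a minimal sketch of pulling them (plus the entities like Person and TVShow) out of a Text/Simple response with the standard library's `xml.etree.ElementTree`. It assumes the element names shown in the output above. The function name `parse_simple_output` is hypothetical, and the sample string is a trimmed copy of that output, with the ampersand in "HBO &amp; Company" escaped the way well-formed XML needs it.

```python
import xml.etree.ElementTree as ET

def parse_simple_output(xml_text):
    """Return (entities, social_tags) from a Calais Text/Simple response.

    entities:    list of (type, name, relevance), e.g. ('Person', 'Tim Tebow', 0.643)
    social_tags: list of (tag, importance), e.g. ('Sports', 1)
    """
    root = ET.fromstring(xml_text)  # the leading XML comments are skipped
    body = root.find('CalaisSimpleOutputFormat')
    entities, tags = [], []
    for elem in body:
        if elem.tag == 'SocialTags':
            for tag in elem.findall('SocialTag'):
                # .text is the tag itself; <originalValue> is a child element.
                tags.append(((tag.text or '').strip(), int(tag.get('importance'))))
        elif elem.tag != 'Topics':
            # Company, Organization, Person, TVShow, ... carry a relevance score.
            entities.append((elem.tag, (elem.text or '').strip(),
                             float(elem.get('relevance'))))
    return entities, tags

SAMPLE = '''<!--Use of the Calais Web Service is governed by the Terms of Service.-->
<OpenCalaisSimple>
<CalaisSimpleOutputFormat>
<Company count="1" relevance="0.643" normalized="HBO &amp; Company">HBO</Company>
<Person count="1" relevance="0.643">Tim Tebow</Person>
<SocialTags>
<SocialTag importance="2">Training camp<originalValue>Training camp (National Football League)</originalValue>
</SocialTag>
<SocialTag importance="1">Sports</SocialTag>
</SocialTags>
<Topics>
<Topic Taxonomy="Calais" Score="1.000">Sports</Topic>
</Topics>
</CalaisSimpleOutputFormat>
</OpenCalaisSimple>'''

entities, tags = parse_simple_output(SAMPLE)
print(entities)
print(tags)
```

From there, turning the tags into facets is just a matter of writing them into your Solr XML like in yesterday's post.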
And the code:
# this code is based on: http://www.flagonwiththedragon.com/2011/06/08/dead-simple-python-calls-to-open-calais-api/
import urllib, urllib2
#########################
##### set API key and REST URL values.
myCalaisAPI_key = '' # your Calais API key.
calaisREST_URL = 'http://api.opencalais.com/enlighten/rest/' # this is the older REST interface.
# info on the newer one: http://www.opencalais.com/documentation/calais-web-service-api/api-invocation/rest
# alert user and shut down if the API key variable is still empty.
if myCalaisAPI_key == '':
    print "You need to set your Calais API key in the 'myCalaisAPI_key' variable."
    import sys
    sys.exit(1)
#########################
##### set the text to ask Calais to analyze.
# text from: http://www.usatoday.com/sports/football/nfl/story/2012-03-22/Tim-Tebow-Jets-hoping-to-avoid-controversy/53717542/1
sampleText = '''
Like millions of football fans, Tim Tebow caught a few training camp glimpses of the New York Jets during the summer of 2010 on HBO's Hard Knocks.
'''
#########################
##### set XML parameters for Calais.
# see "Input Parameters" at: http://www.opencalais.com/documentation/calais-web-service-api/forming-api-calls/input-parameters
calaisParams = '''
<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<c:processingDirectives c:contentType="text/txt"
c:enableMetadataType="GenericRelations,SocialTags"
c:outputFormat="Text/Simple"/>
<c:userDirectives/>
<c:externalMetadata/>
</c:params>
'''
#########################
##### send data to Calais API.
# see: http://www.opencalais.com/APICalls
dataToSend = urllib.urlencode({
'licenseID': myCalaisAPI_key,
'content': sampleText,
'paramsXML': calaisParams
})
#########################
##### get API results and print them.
results = urllib2.urlopen(calaisREST_URL, dataToSend).read()
print results
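The code above is Python 2, which is what I'm running. In Python 3 there's no `urllib2`; its pieces moved into `urllib.request` and `urllib.parse`. Here's a sketch of the same POST for Python 3, split up so the request can be built and inspected without actually hitting the network. The function names are hypothetical, the URL is the same older REST endpoint from above, and decoding the response as UTF-8 is an assumption on my part.

```python
import urllib.parse
import urllib.request

# The older REST interface used in this post.
CALAIS_REST_URL = 'http://api.opencalais.com/enlighten/rest/'

def make_calais_request(api_key, text, params_xml, url=CALAIS_REST_URL):
    """Build (but don't send) the same POST the urllib2 code makes."""
    data = urllib.parse.urlencode({
        'licenseID': api_key,
        'content': text,
        'paramsXML': params_xml,
    }).encode('utf-8')  # Python 3 wants bytes for the request body
    # Passing data= makes this a POST request.
    return urllib.request.Request(url, data=data)

def call_calais(api_key, text, params_xml):
    """Send the request and return the response body as text."""
    req = make_calais_request(api_key, text, params_xml)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode('utf-8')  # assumes a UTF-8 response
```

Usage would be `print(call_calais(myCalaisAPI_key, sampleText, calaisParams))` with the same variables as the Python 2 version.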
COMMENTS
Thanks for the sample code. With reference to offline blog writing, have you tried Windows Live Writer? I use it all the time and find it really useful for composition and publishing to multiple sites. You can get it from http://www.microsoft.com/en-us/download/details.aspx?id=8621 [http://www.microsoft.com/en-us/download/details.aspx?id=8621]
Hey Peter, I've been meaning to thank you for posting. It's been a while since I tried Live Writer ... maybe I should give it another shot. I've tried uploading blog posts directly from OpenOffice, too, but it seems both the blog editors and WordPress mangle a lot of the formatting/HTML and I have to do all kinds of manual correction. TRUE STORY: As I was writing this reply my internet cut out, so I pasted my response in a plain text editor and re-pasted it (then added this note). Creepy!