easy calls to OpenCalais with Python, daggummit!

Yesterday, I wrote this post about using Yahoo's deprecated term extraction web service to generate "subjects" – or whatever you want to call them – for an item based on the metadata housed in a Solr-compatible XML file. I'd also wondered about doing the same thing with OpenCalais.

Before we go any further, I'd just like to say I wrote that post from my hotel room. I'm writing today's from the Denver airport with about 2 hours to kill before my flight departs. And I'd also like to point out that when writing blog posts with spotty Wi-Fi connections, one should not compose their post online through WordPress. I'm using WordPad, and I should probably make that a habit.

Yeah, so anyway, the Calais site doesn't have much good documentation on how to make calls. By "good" I mean there's no code sample to rip off. I'm sure it's perfectly fine for people who actually know what they're doing.

Using "The Google" I found this helpful post on making calls to OpenCalais. While I found it very well written and the code very helpful, I didn't want to have "httplib2" as a dependency since it's not available out-of-the-box with Python 2.7, as far as I know. Nor did I want to do anything with JSON. I'm just trying to make a simple POST request to the OpenCalais REST API – is all.

Using that post's code as a starting point, I whipped up some simple Python without "httplib2".

Note that this code passes three parameters to the API through the following variables:

  • "myCalaisAPI_key": this is where to paste your API key once you get it from Calais here.
  • "sampleText": this is a string of plain text for Calais to analyze and build terms from.
  • "calaisParams": these are the options to pass to the service in XML format. 

Note that I'm specifically requesting what I really want, "social tags", via the following option:

c:enableMetadataType="GenericRelations,SocialTags"

… and I'm specifically requesting a simple result format as follows:

c:outputFormat="Text/Simple"

There are other options, including RDF, that can be requested per the options mentioned on this page.
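
As a rough sketch of how those two options slot into the parameters XML, here is a small helper that builds the string from arguments. It is written for Python 3, and the function name `build_calais_params` and its argument names are made up for illustration; only the XML layout and the default option values are taken from the code further down.

```python
from xml.sax.saxutils import quoteattr


def build_calais_params(metadata_types=("GenericRelations", "SocialTags"),
                        output_format="Text/Simple",
                        content_type="text/txt"):
    """Return a paramsXML string (hypothetical helper, not part of any Calais library)."""
    # quoteattr() escapes each value and wraps it in quotes for use as an XML attribute.
    return (
        '<c:params xmlns:c="http://s.opencalais.com/1/pred/" '
        'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        '  <c:processingDirectives c:contentType=%s\n'
        '      c:enableMetadataType=%s\n'
        '      c:outputFormat=%s/>\n'
        '  <c:userDirectives/>\n'
        '  <c:externalMetadata/>\n'
        '</c:params>'
    ) % (
        quoteattr(content_type),
        quoteattr(",".join(metadata_types)),  # the service takes a comma-separated list
        quoteattr(output_format),
    )


# With no arguments, the helper reproduces the options used in this post.
print(build_calais_params())
```

Calling `build_calais_params(metadata_types=["SocialTags"])` would ask for social tags only, assuming the service accepts a single metadata type the same way it accepts a list.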

If you look at the code, you can see I'm asking Calais to analyze some text about Tim Tebow since I was in Denver when the Denver Broncos football team acquired Peyton Manning and traded Tebow to the New York Jets. The text is from a USA Today article from, um, yesterday.

The Jets, I'd like to state, are not worthy of a hyperlink. And that's only part of the reason I'm sad to see Tebow go there. Alas.

Anyway, here's the output below, followed by the code. Note that – as mentioned in the code – I'm using the slightly older REST API. But what do I care right now. I'm just testing.

Here's the output:

<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.-->
<!--
Company: HBO,
Organization: New York Jets,
Person: Tim Tebow,
TVShow: Hard Knocks,
-->
<OpenCalaisSimple>
  <Description>
    <calaisRequestID>dafa6c80-b4f6-77b1-1363-de96bb7764f4</calaisRequestID>
    <id>http://id.opencalais.com/ODNr1ciDte8wwv0nU3G1jw</id>
    <about>http://d.opencalais.com/dochash-1/895ba8ff-4c32-3ae1-9615-9a9a9a1bcb39</about>
    <docTitle/>
    <docDate>2012-03-23 00:56:09.679</docDate>
    <externalMetadata/>
  </Description>
  <CalaisSimpleOutputFormat>
    <Company count="1" relevance="0.643" normalized="HBO &amp; Company">HBO</Company>
    <Organization count="1" relevance="0.643">New York Jets</Organization>
    <Person count="1" relevance="0.643">Tim Tebow</Person>
    <TVShow count="1" relevance="0.643">Hard Knocks</TVShow>
    <SocialTags>
      <SocialTag importance="2">Training camp<originalValue>Training camp (National Football League)</originalValue>
      </SocialTag>
      <SocialTag importance="2">New York Jets<originalValue>New York Jets</originalValue>
      </SocialTag>
      <SocialTag importance="2">Florida Gators football team<originalValue>2008 Florida Gators football team</originalValue>
      </SocialTag>
      <SocialTag importance="1">Tim Tebow<originalValue>Tim Tebow</originalValue>
      </SocialTag>
      <SocialTag importance="1">HBO<originalValue>HBO</originalValue>
      </SocialTag>
      <SocialTag importance="1">Hard Knocks<originalValue>Hard Knocks (TV series)</originalValue>
      </SocialTag>
      <SocialTag importance="1">Entertainment_Culture</SocialTag>
      <SocialTag importance="1">Sports</SocialTag>
    </SocialTags>
    <Topics>
      <Topic Taxonomy="Calais" Score="1.000">Entertainment_Culture</Topic>
      <Topic Taxonomy="Calais" Score="1.000">Sports</Topic>
    </Topics>
  </CalaisSimpleOutputFormat>
</OpenCalaisSimple>
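
To pull just the social tags out of a response like the one above, something along these lines might do. It is a sketch in Python 3 using the standard library's `xml.etree.ElementTree`; `extract_social_tags` and `sample_output` are hypothetical names, and the sample is a trimmed copy of the output above, not a fresh response from the service.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the Text/Simple response shown above (assumption: same layout).
sample_output = '''<OpenCalaisSimple>
  <CalaisSimpleOutputFormat>
    <Person count="1" relevance="0.643">Tim Tebow</Person>
    <SocialTags>
      <SocialTag importance="2">Training camp<originalValue>Training camp (National Football League)</originalValue>
      </SocialTag>
      <SocialTag importance="1">Sports</SocialTag>
    </SocialTags>
  </CalaisSimpleOutputFormat>
</OpenCalaisSimple>'''


def extract_social_tags(simple_xml):
    """Return a list of (tag name, importance) pairs from a Text/Simple response."""
    root = ET.fromstring(simple_xml)
    tags = []
    for node in root.iter("SocialTag"):
        # node.text is the text before the <originalValue> child, i.e. the short tag name.
        name = (node.text or "").strip()
        tags.append((name, int(node.get("importance"))))
    return tags


print(extract_social_tags(sample_output))
# → [('Training camp', 2), ('Sports', 1)]
```

Each pair keeps the "importance" attribute alongside the name, so the list can be filtered or sorted before the tags are used as subjects.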

And the code:

# this code is based on: http://www.flagonwiththedragon.com/2011/06/08/dead-simple-python-calls-to-open-calais-api/

import urllib, urllib2

#########################
##### set API key and REST URL values.

myCalaisAPI_key = '' # your Calais API key.
calaisREST_URL = 'http://api.opencalais.com/enlighten/rest/' # this is the older REST interface.
# info on the newer one: http://www.opencalais.com/documentation/calais-web-service-api/api-invocation/rest

# alert user and shut down if the API key variable is still empty.
if myCalaisAPI_key == '':
  import sys
  # passing a message to sys.exit() prints it to stderr and exits with a non-zero status.
  sys.exit("You need to set your Calais API key in the 'myCalaisAPI_key' variable.")

#########################
##### set the text to ask Calais to analyze.

# text from: http://www.usatoday.com/sports/football/nfl/story/2012-03-22/Tim-Tebow-Jets-hoping-to-avoid-controversy/53717542/1
sampleText = '''
Like millions of football fans, Tim Tebow caught a few training camp glimpses of the New York Jets during the summer of 2010 on HBO's Hard Knocks.
'''

#########################
##### set XML parameters for Calais.

# see "Input Parameters" at: http://www.opencalais.com/documentation/calais-web-service-api/forming-api-calls/input-parameters
calaisParams = '''
<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <c:processingDirectives c:contentType="text/txt"
      c:enableMetadataType="GenericRelations,SocialTags"
      c:outputFormat="Text/Simple"/>
  <c:userDirectives/>
  <c:externalMetadata/>
</c:params>
'''

#########################
##### send data to Calais API.

# see: http://www.opencalais.com/APICalls
dataToSend = urllib.urlencode({
    'licenseID': myCalaisAPI_key,
    'content': sampleText,
    'paramsXML': calaisParams
})

#########################
##### get API results and print them.

results = urllib2.urlopen(calaisREST_URL, dataToSend).read()
print results
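
For Python 3, where `urllib2` no longer exists, a rough equivalent of the request step could look like the sketch below. The function name `build_calais_request`, the placeholder API key, and the stub `<c:params/>` string are all made up for illustration, and the sketch only builds the request without sending it. Whether the older endpoint still answers has not been checked here.

```python
import urllib.parse
import urllib.request

# Same older REST endpoint as in the listing above.
CALAIS_REST_URL = 'http://api.opencalais.com/enlighten/rest/'


def build_calais_request(api_key, text, params_xml, url=CALAIS_REST_URL):
    """Build (but do not send) the form-encoded POST request (hypothetical helper)."""
    body = urllib.parse.urlencode({
        'licenseID': api_key,
        'content': text,
        'paramsXML': params_xml,
    }).encode('utf-8')  # Python 3 expects the POST body as bytes
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body)


# Placeholder values; a real call needs a real key and the full params XML.
request = build_calais_request('YOUR-API-KEY', 'Some text to analyze.', '<c:params/>')
print(request.get_method())

# Sending it would then be something like the line below,
# assuming the response is UTF-8 encoded:
# results = urllib.request.urlopen(request).read().decode('utf-8')
```

Keeping the build step separate from the send step makes it possible to inspect the encoded body before anything goes over the wire.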
--------------

2 Comments

  1. Peter Quirk

    Thanks for the sample code.
    With reference to offline blog writing, have you tried Windows LiveWriter? I use it all the time and find it really useful for composition and publishing to multiple sites. You can get it from http://www.microsoft.com/en-us/download/details.aspx?id=8621

    1. nitin (Post author)

      Hey Peter,

      I've been meaning to thank you for posting.

      It's been a while since I tried LiveWriter … maybe I should give it another shot. I've tried uploading blogs directly from OpenOffice, too, but it seems both the blog editors and WordPress mangle a lot of the formatting/HTML and I have to do all kinds of manual correction.

      TRUE STORY: As I was writing this reply my internet cut out, so I pasted my response in a plain text editor and re-pasted it (then added this note). Creepy!
