blog.humaneguitarist.org

discoveries in digital audio, music notation, and information encoding

Archive for the ‘scripts’ Category

the serpent, the apple, and Joe


For better or worse, the one application of mine that people actually use is the one I wrote pretty casually with Python over a couple weekends from bed because I was too lazy or hungover to get moving on those days.

That software, PubMed2XL, lets people do a few things with downloaded citations from PubMed.gov that aren't currently offered directly from the site. I've gotten some nice feedback from librarians, researchers, and information-y people at companies that have found it useful.

This post isn't a plug though; it's more an acknowledgement of something that I didn't really realize in full at the time. And that is: when one writes software that people go on to actually use, one had better be prepared to support it. Now, the software's simple enough that there haven't been real bugs save one, but it does eat at me that I can't offer a simple way for it to work on multiple platforms.

While the Windows version is really easy to set up – thanks to py2exe and Inno Setup – getting it running on Linux is a bit more work, given all the distro variations and dependency installation. But getting it running on a Mac – particularly with an easy-to-use installer – isn't going to be possible unless I can find someone to compile it for a Mac who will also test it and compile future versions. Sure, there's the possibility of using Wine, but that's still asking a lot from end users.

Normally, I wouldn't care. Apple doesn't make it easy for people to develop for Macs unless you fork over the change for a Mac – and I ain't buying a copy of OSX and doing the Hackintosh bit. But, since the software is ultimately about health-related research, I do care.

Unfortunately I made – with the advantage of hindsight – two coding decisions that create problems.

First, I chose PyQt as the GUI toolkit for the software simply because it looks prettier than Python's native Tkinter. My reasoning at the time was that people were more likely to trust better-looking software, even though it's just a small window with some basic menu options. Eventually, I added a progress bar, too, so downgrading to Tkinter has become less of an option.

Second (and this is the big one), I used lxml since the PubMed2XL setup files employ XSL to tell the software what data to put in a spreadsheet cell. Granted, lxml is freakin' fantastic, but since it's not a pure Python module I can't just distribute it in a folder and import the module locally. Not that I had much of a choice: there's no built-in XSLT-capable module that ships with Python, 'far as I know.

So I've been asking myself how to make the serpent (Python) and the apple (OSX) get along.

I've considered just making PubMed2XL a web app, but that would entail expenses for me that simply offering people a desktop app doesn't.

So, I think the solution lies in a cup of Joe. That's to say that a Java app is the obvious solution, specifically using Jython.

That would leave me to replace PyQt with Swing. I'm fine with that. It's not like PyQt is all that Pythonic in the first place. There's a nice Jython/Swing tutorial here.

And as for the XSLT component, this tutorial on XSLT with Jython and native Java libraries should help immensely.

So, I should be able to use Jython to make a cross-platform version of PubMed2XL. I don't necessarily want to, but given the type of research I'd like to help facilitate (in a very small way, I know), I think I probably should.

--------------


Written by nitin

January 7th, 2012 at 10:22 am

HammerFlicks



Table of Contents

Introduction

Source Code

How it Works

Live Demo

FAQ


Introduction

HammerFlicks is a small project to discover which Hammer Films movies are available as streaming movies on Netflix. You can see it live here.

As of November 2011, the HammerFlicks script runs three times a day, querying the Netflix API as to which Hammer Films are available for streaming. HammerFlicks checks for films (not shorts) listed in the Hammer Filmography on Wikipedia.

The previous name of the project was "HammerFlix" though this was changed to comply with the Netflix API branding requirements.

If you're interested in learning why this project exists, please read the original "HammerFlix" entry here.


Source Code

The source code isn't available for download yet. If you'd like a copy, just leave a comment/request and that'll motivate me to clean up all the code enough to share.

:P


How it Works

HammerFlicks sends each movie title from HammerFlicks_filmography.txt (derived from the Wikipedia filmography) to the Netflix API and requests only one result back per movie. It assumes that if the production year for the returned result is within one year of the date listed in the filmography file that the result is most likely the Hammer Film in question. If the production year reported by Netflix doesn't match the +/- one year range, then it is assumed that Netflix doesn't carry the movie in either DVD or streaming formats – i.e. the returned result was for the closest match within the Netflix catalog, though for a completely different film.
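The year check described above boils down to a one-line comparison. Here's a minimal sketch in Python (HammerFlicks itself is PHP); the function name and the `tolerance` parameter are made up for illustration and aren't from the actual source.

```python
def is_probable_match(filmography_year, netflix_year, tolerance=1):
    """Return True when the Netflix year is within `tolerance` years of the filmography year."""
    return abs(int(netflix_year) - int(filmography_year)) <= tolerance


# Years arrive as strings from a text file or an XML response, hence the int() calls.
print(is_probable_match("2011", "2012"))  # True: one year apart
print(is_probable_match("1954", "1960"))  # False: outside the +/- one year range
```

A wider tolerance catches more date disagreements between sources but also lets more unrelated films through, which is why a title comparison is still needed on top of this.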

Cross-reference links are also created by HammerFlicks given that many earlier Hammer Films were released under different titles (UK and US versions). As seen below, the filmography file contains a non-unique "movie_id" field which is used to associate a title with its corresponding alternate title. In the example below, "The Public Life of Henry the Ninth" has no alternate title, but "The Mystery of the Marie Celeste" and "The Phantom Ship" are notated as the same film as they both share the same movie_id. As such, HammerFlicks will create a cross-reference link between the two.

title | year | titles_and_year | movie_id
The Public Life of Henry The Ninth | 1935 | The Public Life of Henry The Ninth 1935 | 1
The Mystery of the Marie Celeste | 1935 | The Mystery of the Marie Celeste / The Phantom Ship 1935 | 2
The Phantom Ship | 1935 | The Mystery of the Marie Celeste / The Phantom Ship 1935 | 2
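For the curious, the cross-referencing idea can be sketched in a few lines of Python. This is only an illustration of grouping titles by a shared movie_id, not the real HammerFlicks code; the `rows` list just restates the sample above and `build_cross_references` is a hypothetical name.

```python
from collections import defaultdict

# Rows mirror the sample above: (title, year, movie_id).
rows = [
    ("The Public Life of Henry The Ninth", "1935", "1"),
    ("The Mystery of the Marie Celeste", "1935", "2"),
    ("The Phantom Ship", "1935", "2"),
]


def build_cross_references(rows):
    """Map each title to the other titles that share its movie_id."""
    by_id = defaultdict(list)
    for title, _year, movie_id in rows:
        by_id[movie_id].append(title)
    return {
        title: [other for other in by_id[movie_id] if other != title]
        for title, _year, movie_id in rows
    }


xrefs = build_cross_references(rows)
print(xrefs["The Phantom Ship"])                    # ['The Mystery of the Marie Celeste']
print(xrefs["The Public Life of Henry The Ninth"])  # []
```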

When HammerFlicks has determined whether or not the film likely lives on Netflix, it creates an HTML "snippet" inside the ./apiResults folder with any applicable cross-reference links. Applicable hyperlinks to the Netflix page for the film as well as the Watch Instantly link are also placed within the snippet.

The PHP similar_text function is used to assign a "match score" within each snippet if it believes Netflix carries the film. This helps determine the reliability of HammerFlicks, given that an incorrect match is occasionally generated when the result returned is for a completely different movie that happens to fall within the correct date range. Comparing the sent title with the one returned by Netflix helps address this issue.

Finally, the HammerFlicks.php file strings all the snippets together, allowing a user to traverse the list of Hammer Films and see which ones are on Netflix or not.


Live Demo

You can see HammerFlicks in the frame below or you can go directly to the page by clicking here. I recommend the latter; otherwise, if you click on a link, the Netflix page will open in the frame below as well.


FAQ

  1. I don't care for Hammer Films, but would like to do something similar for a list of other films. Can the source code be easily modified for this purpose?
    • I don't see why not. You probably only need to edit the "HammerFlicks_filmography.txt" file to reflect the films you are interested in. Editing the CSS file and logo would also be necessary for aesthetic reasons though they have no effect on functionality. Of course, you'll need to register for your own API key.
    • Note that HammerFlicks is only designed to support cross-referencing films with no more than one alternate title. This may or may not be an issue depending on the films you are interested in checking.
--------------


Written by nitin

November 27th, 2011 at 12:00 pm


bidi bidi bidi and more on pOAIndexter-ing metadata


It's shaping up to be a sunny day and this means I need to go on a long walk.

But before I do that, I'll follow up to this post about grabbing OAI metadata from an online source and throwing the metadata into Solr for searching purposes, etc.

Last night – while streaming the Gil Gerard iteration of Buck Rogers – I wrote a small PHP script to grab this OAI metadata from the Library of Congress' site. BTW: this is a cool page of theirs that helps one get started with OAI feeds, etc.

Aside: Is it only since the advent of hypertext that the word "this" began appearing in a referential context within documents?

As I mentioned in the previous post, an XML config file will instruct the code where to get the metadata and which XSL file will be used to transform the data into something Solr can chew on. I haven't bothered with the config file yet, so for now I just tested it on the specific metadata linked to above since the config file aspect of this is the most trivial component of the whole thing.

Anyway, below is the PHP file, the OAI to Solr XSL file, and a snippet of the output. Last is a Python script that does the same thing as the PHP. It's not OO like the PHP file, but I just whipped it up this morning for shiggles.

Here's the PHP …

<?php

function grabMetadata($urlArg) {
    $ch = curl_init(); // see: http://php.net/manual/en/book.curl.php
    curl_setopt($ch, CURLOPT_URL, $urlArg);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $curlOut = curl_exec($ch);
    curl_close($ch); // Close the handle before returning; anything after "return" never runs.
    return $curlOut;
}

// See "http://www.php.net/manual/en/xsltprocessor.transformtoxml.php" for instructions re: XSL processing as below.
function useXSL($output) {
    $search_results = new DOMDocument;
    $search_results->loadXML($output);
    // If you just use "load" instead of "loadXML" it won't work unless you first stored the XML results in a file (boo!).
    // For info on "loadXML" see: http://www.php.net/manual/en/domdocument.loadxml.php
    $proc = new XSLTProcessor;
    $xsl = new DOMDocument;
    $xsl->load('OAI_to_solr.xsl');
    $proc->importStyleSheet($xsl);
    $processed = $proc->transformToXML($search_results);
    return $processed;
}

function writeSOLR($solrXML) {
    $myFile = "for_solr-PHP.xml";
    $fh = fopen($myFile, 'w') or die("can't open file");
    fwrite($fh, utf8_encode($solrXML)); // For UTF-8, see: http://www.php.net/manual/en/function.fwrite.php#73764
    fclose($fh);
}

// Do stuff ...
$output = grabMetadata('http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&metadataPrefix=oai_dc&set=papr');
writeSOLR(useXSL($output));
?>
The XSL file …
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
exclude-result-prefixes="oai_dc dc">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:template match="/">
    <add>
      <xsl:for-each select="//oai_dc:dc">
        <doc>
          <field name="identifier">
            <xsl:value-of select="dc:identifier" />
          </field>
          <field name="title">
            <xsl:value-of select="dc:title" />
          </field>
          <field name="creator">
            <xsl:value-of select="dc:creator" />
          </field>
          <xsl:for-each select="dc:subject">
            <field name="subject">
              <xsl:value-of select="." />
            </field>
          </xsl:for-each>
          <field name="description">
            <xsl:value-of select="dc:description" />
          </field>
        </doc>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>
The Millionaire and his wife … er, wrong show. I mean the sample Solr XML snippet …
<add>
  <doc>
    <field name="identifier">http://hdl.loc.gov/loc.mbrsmi/amrlv.4007</field>
    <field name="title">[Theater commercial--electric refrigerators]. Buy an electric refrigerator /</field>
    <field name="creator">AFI/Kalinowski (Eugene) Collection (Library of Congress)</field>
    <field name="subject">Refrigerators.</field>
    <field name="subject">Advertising--Electric household appliances--Pennsylvania--Pittsburgh.</field>
    <field name="subject">Trade shows--Pennsylvania--Pittsburgh.</field>
    <field name="subject">Silent films.</field>
    <field name="subject">Pittsburgh (Pa.)--Manufactures.</field>
    <field name="description">Largely graphic commercial for electric refrigerators in general and a refrigerator show, presumably in Pittsburgh, in particular.</field>
  </doc>

...

</add>
Some Python for fun …
import urllib.request
from lxml import etree # see: http://lxml.de/

## some OAI metadata from the Library of Congress
url = 'http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&metadataPrefix=oai_dc&set=papr'
metadata = etree.XML(urllib.request.urlopen(url).read())

## the XSL file that will transform the OAI metadata to Solr
xsl = etree.parse('OAI_to_solr.xsl')

## XSL transformation
transform = etree.XSLT(xsl)
result = transform(metadata)

## the outputted Solr XML (Python 3: str(result) is already a text string)
with open('for_solr-PY.xml', 'w', encoding='utf-8-sig') as fw:
    fw.write(str(result))

And most importantly, the introduction to Buck Rogers in the 25th Century – Season 1, of course! I couldn't even make it through the first ten minutes of the Season 2 opener. I mean they changed the introduction which was brilliant and brilliantly narrated – as you shall see!

I'd prefer to watch the South Park spoof over the Season 2 insult-to-perfection any day of the week.

And here's a bad-ass fan trailer that I think respects the greatness of the first season.

--------------


Written by nitin

October 15th, 2011 at 9:05 am

Posted in scripts


pOAIndexter: grabbing and indexing online metadata


As per usual, a good bit of my computer-y stuff at home relates to something that's come up at work. And as usual, I'm pretty ignorant of what I'm getting myself into, but I don't mind.

The other week, my boss and I met with some great people at digitalnc.org and we started talking about the idea of having a super simple, lightweight approach to providing a one-stop-shop search interface for collections across the state – provided those collections expose their metadata somehow. For now, we talked about limiting this to people who do so with an OAI feed and grabbing that metadata. But eventually, this thing should be metadata agnostic – in the sense that it isn't about a metadata format, but just the data itself.

By the way, I guess "grabbing" and "feed" aren't what I typically see with OAI – about which I admittedly don't know much – but I don't care. Same difference.

Of course, there's nothing new to this. I guess one could use Blacklight or VuFind to do this kind of thing, but even though those are existing open source projects, I'm not sure that doing so isn't overkill and won't in turn increase dependencies and maintenance overhead.

Actually, that's a topic for another time – I mean the idea that just because part of something is capable of doing what you want doesn't necessarily make it a better option than rolling one's own if using and updating said something entails more cost in the long run. Paved roads often get you there faster, but a willingness to get lost now and then is how you learn where all the really cool local bars are …

;)

Anyway, here's what I'm thinking. A small script would simply look at an XML setup file from which it would know which places to go grab metadata from, the type of feed, the last time the metadata was requested, and stuff like the resumptionToken if applicable. It would also store the appropriate XSL file to process the metadata with so that the metadata could be passed into Solr to be indexed and searchable. Anyone whose site doesn't provide metadata as XML could simply create a web service that does so, e.g. a RESTful MySQL to XML thingamajig. The outputted XML just needs to have an XSL that will facilitate passing it to Solr for that data to be part of the shared metadata store. And since XSL is the universal translator in this context, other metadata types such as RSS/ATOM feeds could be grabbed, too. All one needs to do is add to the XML config file so the script knows to retrieve metadata from that site and make sure there's an XSL file that can be used to facilitate passing the data into Solr. So in the end all this should take in terms of coding is a small script, one XML config file, and as many XSL files as needed.
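To make the idea a bit more concrete, here's a rough sketch of what reading such a setup file might look like in Python. The config doesn't exist yet, so everything about its layout is an assumption: the element names (`sources`, `source`, `url`, `xsl`, `lastHarvest`, `resumptionToken`), the `name` and `type` attributes, and the placeholder date are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical config layout; every element/attribute name and the date are placeholders.
CONFIG = """\
<sources>
  <source name="loc-papr" type="oai">
    <url>http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=papr</url>
    <xsl>OAI_to_solr.xsl</xsl>
    <lastHarvest>2011-10-01</lastHarvest>
    <resumptionToken></resumptionToken>
  </source>
</sources>
"""


def load_sources(config_xml):
    """Turn the config into a list of dicts, one per place to grab metadata from."""
    root = ET.fromstring(config_xml)
    sources = []
    for node in root.findall("source"):
        sources.append({
            "name": node.get("name"),
            "type": node.get("type"),
            "url": node.findtext("url"),
            "xsl": node.findtext("xsl"),
            "last_harvest": node.findtext("lastHarvest"),
            # An empty element means there is no pending resumptionToken.
            "resumption_token": node.findtext("resumptionToken") or None,
        })
    return sources


for source in load_sources(CONFIG):
    print(source["name"], source["type"], source["xsl"])
```

The script proper would then loop over these entries, fetch each URL, run the named XSL over the response, and post the result to Solr.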

For fun and to start learning about Solr, I just manually grabbed some OAI metadata from Caltech yesterday – it was for some oral histories. Then I ran them through an XSL file and posted them to Solr. Within no time I had a searchable, local metadata store to play around with (screenshot below). Since I was using all the defaults from the Solr tutorial I had to map the <dc:creator> field to things like manufacturer, since the default is set up for an electronics store.

Solr screenshot

BTW if we use this, at some point I won't be able to call it "pOAIndexter" but for now I can.

Since I don't know if I'll do this in Python or PHP and since OAI is what we'll work on first, I guess it stands for "Python or PHP OAI Indexer".

Yes, I'm a dork.

--------------


Written by nitin

October 2nd, 2011 at 11:20 am

HammerFlix 3: Village of the DOMed


Update, November 27, 2011: If you're looking for a live list of Hammer Films streaming on Netflix you can see it here.

To read more about the HammerFlicks project, click here.

Grrr. Obsession is a good ally for creativity, but it's still annoying.

I woke up today and decided to have a decent breakfast and do some very light work on HammerFlix – a small project to use the Netflix API to discover which Hammer Films movies are available on Netflix's Watch Instantly.

Basically, I just added some buttons/JavaScript that will allow one to filter out the non-matches, show only Hammer films that HammerFlix thinks are on Netflix, or show only the ones that are apparently available for streaming.

Sure, there's still some work to be done to improve the reliability of the results as I mentioned yesterday, but I'm not too worried about it for now.

If you're wondering why the search for "The House Across the Lake" from 1954 shows "Them!" as a streaming match it's because "The House Across the Lake" isn't on Netflix and "Them!" is the first match the API returns. In this case, both movies happen to share 1954 as the release year. Currently, if HammerFlix sees that the first API returned result matches the release year, it reports the movie as available on Netflix. So, I need to make it a little smarter than that, but not this weekend.

Anyway, you can see the latest results here.

Dorks can view the source files in this folder. The best thing in there is the MIT license generated by the Spiteful Open Source License Generator.

:P

No more coding this weekend … time for a long walk sans electronics.

Update: OK, so I lied. I couldn't resist. I just added a score for how reliable the match is.

Go here to see the HammerFlix results from about 10pm EST tonight on September 26, 2011.

There's now a "match score" under each thing HammerFlix claims is on Netflix. This is done by comparing the title from the Wikipedia filmography against the "short" and "regular" titles from the Netflix API XML result (i.e. two scores). I used the PHP similar_text() function to get the score for each and then averaged them for the "match score".
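For anyone who wants to play with the scoring idea outside of PHP, here's a small Python sketch. Python has no similar_text(), so this swaps in difflib's SequenceMatcher, which uses a different matching algorithm and may give somewhat different numbers than PHP would. The function names are made up, and the example assumes Netflix returns "Them!" as both the short and regular title.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Percentage similarity of two strings (difflib stand-in for PHP's similar_text)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def match_score(sent_title, netflix_short, netflix_regular):
    """Average the similarity of the sent title against both Netflix titles."""
    scores = [
        similarity_percent(sent_title, candidate)
        for candidate in (netflix_short, netflix_regular)
    ]
    return sum(scores) / len(scores)


print(round(match_score("The House Across the Lake", "Them!", "Them!")))           # 20
print(round(match_score("Vampire Circus", "Vampire Circus", "Vampire Circus")))    # 100
```

A low score alongside a matching year is a hint that the API simply returned its closest guess rather than the film that was asked for.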

There's occasionally a PHP "undefined offset" error for a couple of films. That's no big deal and I'll fix it later by just making sure I test for some stuff before printing to screen. But it seems to work pretty well. The "The House Across the Lake"  vs. "Them!" mismatch only gets a 20% rating, so that tells me it might not actually be the same movie.

And as a reward, I'm watching "To the Devil, a Daughter" now …

--------------


Written by nitin

September 25th, 2011 at 11:00 am

Posted in scripts


in the can: another HammerFlix update


Update, November 27, 2011: If you're looking for a live list of Hammer Films streaming on Netflix you can see it here.

To read more about the HammerFlicks project, click here.

Last night while streaming Star Trek: TOS on Netflix or Qwikster or whatever, I finished up this Python file around 3am that makes this semicolon-delimited file of Hammer Films and their production year if you throw it at a local copy of this Wikipedia/Hammer Films filmography page.

Had enough of "this"? Well, there's more where that, I mean this, came from.

So, I woke up at about 9am and worked on this PHP version of HammerFlix.

And finally, about 12 hours after going to bed, I've got this list of Hammer Films on Netflix and whether they stream or not.  I should mention that I tested with both the "UK" and "US" versions of the titles if both were present in Wikipedia, so there are a few duplicate items in the list.

Eventually, I'll make the PHP file clickable so anyone can run it and get a current report. But as of right now, I've used up all my Netflix API allowance for the day.

Now, as I mentioned in this post I was only testing to see if a movie title and its production year matched the Wikipedia filmography data in order to determine if the movie was available on Netflix and whether it streamed.

Testing that way isn't perfect. For example, the Wikipedia page lists 2011 as the year for "The Woman in Black" but IMDB and Netflix say it's 2012. So that movie isn't reported correctly by HammerFlix.

I'm noticing a few other things, too, but I'll work on it later. Seriously, I need to get on with my weekend …

Maybe I'll watch a Hammer film tonight!

--------------


Written by nitin

September 24th, 2011 at 3:24 pm

Posted in scripts


a HammerFlix update


Update, November 27, 2011: If you're looking for a live list of Hammer Films streaming on Netflix you can see it here.

To read more about the HammerFlicks project, click here.

Principal photography has begun on HammerFlix – a small project to use the Netflix API to discover which Hammer Films movies are available on Netflix's Watch Instantly.

So far, I've got a PHP file that has two functions.

The first one, named Igor, takes two arguments: a movie title and its release year. Igor then sends this to the Netflix API and retrieves only the first result for searching against that given title. Then Igor sends the XML version of the API results to another function, Master.

Master, aka Dr. Frankenstein, then evaluates the result. If the Netflix release date for the movie returned by the API matches the year value sent to Igor, then Master will display the link to that movie on Netflix. If the movie is available via Watch Instantly, Master will also display the link to the streaming movie. If the year doesn't match, Master reports that no results were found.

Testing against just the first match and using release year as the only qualifier might not be the best, but I think it might work pretty decently. If not, I'll have Igor retrieve more results and then Master can evaluate more results and use more test criteria before assuming the movie isn't on Netflix.

The next step is to get all the Hammer titles from the Hammer Filmography on Wikipedia and send each title and release year to HammerFlix. There might be some open/linked data opportunities later down the road with dbPedia, but that's not important for now.

You can see this very basic test of HammerFlix 0.01 here.

Anyway, here's the code for the development version of 0.01.

<?php

//Igor does the hard work of hitting up the Netflix API for movies matching $title.
//The code is mainly from: http://developer.netflix.com/page/resources/sample_php
function Igor($title, $year) {

    include ('../authentication/myAPI.php'); //this includes my Netflix API key and shared secret as $apiKey and $sharedSecret.

    //build stuff to send to API.
    $arguments = Array(
        'term' => $title,
        'expand' => 'formats',
        'max_results' => '1',
        'output' => 'xml'
    );

    $path = "http://api.netflix.com/catalog/titles";
    $oauth = new OAuthSimple();
    $signed = $oauth->sign(Array('path' => $path,
                'parameters' => $arguments,
                'signatures' => Array('consumer_key' => $apiKey,
                    'shared_secret' => $sharedSecret
                    )));

    //hit up API via CURL.
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $signed['signed_url']);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    //curl_setopt($curl, CURLOPT_SETTIMEOUT, 2); //Nitin commented this out on 2/5/2011 to prevent a PHP error message.
    $buffer = curl_exec($curl);
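    if (curl_errno($curl)) {
        die("An error occurred: " . curl_error($curl)); //curl_error() needs the handle as its argument.
    }
    curl_close($curl); //release the handle once the response is in $buffer.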

    Master($buffer, $title, $year); //send XML results to Master().
}

//Master (Dr. Frankenstein) parses/returns the Netflix XML results retrieved by Igor.
function Master($buffer, $title, $year) {

    $xml = simplexml_load_string($buffer);
    $movieInfo = ($xml->catalog_title);
    $short = "short";
    $movieTitle_short = ($movieInfo->title->attributes()->$short);
    $regular = "regular";
    $movieTitle_regular = ($movieInfo->title->attributes()->$regular);
    $movieLink = ($movieInfo->id);
    $movieId = str_replace("http://api.netflix.com/catalog/titles/movies/", "", $movieLink);
    $movieYear = ($movieInfo->release_year);

    //test if movie is available for Watch Instantly/streaming.
    $streaming = $xml->xpath('//availability/category/@label');
    foreach ($streaming as $instantTest) {
        if ($instantTest == 'instant') {
            $streams = '';
        }
    }

    //output findings.
    echo "<li><p>Testing:<em> " . $title . " </em>from the year:<em> " . $year . "</em><br />";
    if ($movieYear == $year) {
        echo "<a href='http://movies.netflix.com/WiMovie/" . $movieId . "'>" . $movieTitle_short . "</a>";

        //IT'S ALIVE!!!
        //aka: show user the Watch Instantly link if it exists.
        if (isset($streams)) {
            echo "<br /><strong><a href='http://movies.netflix.com/WiPlayer?movieid=" . $movieId . "'>Watch Instantly</a></strong>";
        }
    } else {
        echo "No match.";
    }
    echo "</p></li>";
}

//Create life!
//aka: start doing things.
include ('../authentication/OAuthSimple.php');

echo "<ul>"; //put results in unordered list; send arguments to Igor().
Igor("The Brides of Dracula", "1960");
Igor("The Brides of Dracula", "1961");
Igor("Dracula Has Risen from the Grave", "1968");
Igor("Vampire Circus", "1972");
echo "</ul>";
?>
--------------


Written by nitin

September 10th, 2011 at 1:16 pm

Posted in scripts


SAVS: a Simple Audio/Video Synchronizer


About a year ago I did some text to audio synchronization tests with HTML5 and Flash.

The tests were partially successful, but I think what really mattered is that I set four goals that I felt needed to be met before the word "synchronization" could truly be used:

  1. The user should be able to click on a line of text and hear the related media.
  2. The user should be able to "scrub" ahead on the media player and the text should follow.
  3. The page should report where in the document the user is.
  4. The page should automatically keep the media/text synchronized without user intervention.

Basically, I've seen a few people make it so that you could watch media while the transcript text was also on the page (scrollable as opposed to overlaid closed captions) and the user could click on a line and have the movie/audio skip ahead to that moment (goal #1). That's great and all, but that's not synchronization.

;)

Synchronization is a two-way street, and I've been working this past week, during what I'm calling "4 days of madness", to come up with a really simple solution to real synchronization. I did run across this really cool RadioLab page that achieves goal #1, but as much as I like it I want more features with less flash (as in "flash and dash", not Adobe Flash!) and less code. No mistake: it looks fantastic, and I also appreciate that they've got the text timed to clusters of a couple of words rather than by line. But the only thing I've seen that gets it all "right" from my perspective was a subscription resource by Alexander St. Press. It achieved all the goals above using a Flash player, and the rest appeared to be done with JavaScript and some jQuery smooth scrolling. It was also timed by clusters of words and not just by line or by paragraph.

Of course, conceptually it's the same whether one marks up their text – in the temporal sense – by line or by word, but it's a little more work to do it by word. Unfortunately, I've seen people do the opposite: they use a static unit of time like 60 seconds and only mark up the text every minute. That's taking the easy way out and also misses the point entirely, since it makes the text subservient to an arbitrary unit of time. Would it be acceptable if closed captioning and subtitles on your foreign films only showed up in large chunks every minute? I would hope not, and in the case of the former it would violate the spirit if not the letter of the "law" in regard to accessibility. If done right, you can use the same timed text file both to serve up captions and to show the full text on the page. It's more time and cost efficient to re-purpose the same data for two needs.

Anyway, let's get back to Alexander St. Press. I loved what I saw when my boss (I work at NC Live) showed it to me. I got really excited and said something like, "This is what I've been waiting to see!". In addition to the great and true syncing, they also had a feature that would let the user make and share clips, much the way you can on sites like NBC's Meet The Press. The Alexander St. Press site also allowed you to annotate that clip, which is a great feature for teachers and librarians, etc. Alexander St. Press also has this with their classical music streaming subscription service, which in the spirit of full disclosure I pay for. They ALSO had a cool timeline where you could see what I call "hot spots" – places where others had made clips. The idea, I guess, is that spots on the timeline with more clusters would indicate a particular point of interest. Nothing new, because you see that all the time with streaming sports like the US Open's site where you can go back and watch previous moments in matches and then "go live" at any time. But the difference is, of course, that Alexander St. Press was using user-contributed clips.

So long story short (or just not as long), in a few weeks I need to present these ideas to some people and talk about how we think these features could be useful for our users. And the more I struggled with how to talk about these concepts without a prototype, the more I thought I would a) sound like I'm crazy and b) sound like I'm full of hot air.

I decided that it was time to go back to some earlier tests of mine from early April and just build a prototype so we could just show it to people and not have to talk theoretical speak. I think it's generally easier to explain and convince people of the utility of software by showing it rather than telling it. Actions > words, right?

Well, early tests are working and only required me to add one line of ActionScript to our current Flash player, and only about 50 lines of JavaScript are needed to keep the text and media synced. The tests I did were for some PBS videos we purchased along with closed captioning files.
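The core of two-way syncing is just a lookup in each direction over a timed text file. Below is a minimal sketch of that logic in Python rather than the JavaScript the prototype uses; it is not the prototype's code, and the cue data and function names are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical timed text: (start time in seconds, transcript line), sorted by start time.
cues = [
    (0.0, "First line of the transcript."),
    (4.5, "Second line."),
    (9.0, "Third line."),
]
starts = [start for start, _text in cues]


def line_for_time(seconds):
    """Media -> text: index of the cue to highlight at this playback position."""
    return max(bisect_right(starts, seconds) - 1, 0)


def time_for_line(index):
    """Text -> media: playback position to seek to when a line is clicked."""
    return starts[index]


print(line_for_time(5.2))  # 1, i.e. "Second line."
print(time_for_line(2))    # 9.0
```

Calling `line_for_time` whenever the player reports a new position covers scrubbing and automatic syncing (goals #2 and #4), while `time_for_line` covers clicking on text (goal #1). Marking up by word cluster instead of by line only means more, shorter cues; the lookup stays the same.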

I was so excited that it was finally working that I went home during those "4 days of madness" to write an HTML5 version which is virtually identical to the Flash version. It's got basic clip-making features as well as a very basic tool, inspired by this video score tutorial, to make timed text files provided you have the audio and full text in hand. Eventually, I'll comment the code up, improve some options, and post a download link to the source for the HTML5 version. At work, we'll probably eventually offer the code as it's tweaked to meet our aesthetic needs, etc. As you'll see in the demo video below, I have no aesthetics!

I'll shut up now and leave you to the video if you're interested. I recommend watching it in HD so you can read the words on the page.

As my friend whom the HTML5 version is kinda named after likes to say:

More later …

SAVS: a Simple Audio/Verse Synchronizer from nitin arora on Vimeo.

Update, September 20, 2011: To avoid confusion as to what this does, I'm renaming this from "Simple Audio/Video Synchronizer" to "Simple Audio/Verse Synchronizer" or something …

:)

Update, October 16, 2011: Cool, I found one more thing that meets all four goals at http://www.dinglabs.com. They're pitching it as a foreign language learning tool, but same difference. Also, that site led me to TranscriberAG, a tool for transcribing audio.

--------------

Written by nitin

September 5th, 2011 at 9:39 am

making a DOT graph for PHP include statements

A couple of months ago, I posted about my experience with making a Python dependency graph.

Of course, as the post states, I was originally looking for a way to make a graph showing the relationship among PHP files in regard to include statements.

Well, I'm home sick and after a few hours of trying to find an easy, out-of-the-box solution I gave up and rolled my own Python script to make me a DOT graph file.

I didn't have anything better to do.

:(

The results are pretty simplistic, but I'm happy enough with it for now.

The Python script takes three arguments: the directory in which the PHP files exist, whether to search recursively or not (0=no, 1=yes), and the name of the output file as such:

$ python makeDOT.py blog/wordpress 1 wordpressIncludes.dot

#####
#importing modules
import glob, re, sys, os, fnmatch
br = "\n"
tab = "\t"

#####
#exiting if all 3 arguments are not passed via command line
def fail():
    print ("ERROR: " + str(len(sys.argv)-1) + " of 3 required arguments provided.")
    sys.exit(1) #non-zero exit status signals the error to the calling shell

#####
#getting arguments passed via command line
#(a missing argument raises IndexError, so only that is caught)

#testing for root DIRECTORY string
try: myDir = sys.argv[1]
except IndexError: fail()

#testing for RECURSION boolean
try: myRec = sys.argv[2]
except IndexError: fail()

#testing for OUTPUT filename string
try: myFile = sys.argv[3]
except IndexError: fail()

#####
#making list of PHP files within DIRECTORY
if myRec == "0": #without recursion
    PHP_list = glob.glob(myDir + "/*.php")
elif myRec == "1": #with recursion
    PHP_list = []
    for dirname, dirnames, filenames in os.walk(myDir):
        for filename in filenames:
            if fnmatch.fnmatch(filename, "*.php"):
                PHP_list.append(os.path.join(dirname, filename))
else: #any other value would otherwise leave PHP_list undefined
    print ("ERROR: the recursion argument must be 0 or 1.")
    sys.exit(1)

#make an empty list;
#tuples will go in the list;
#each tuple will contain a PHP filename and a PHP filename it includes
includeList = []

#iterate through each PHP file and place tuples in the list
for phpFile in PHP_list:
    #path relative to DIRECTORY, with forward slashes
    phpName = phpFile.replace("\\","/")[len(myDir)+1:]
    with open(phpFile, "r") as fileOpen: #file is closed when the block ends
        #for each line in a PHP file
        for line in fileOpen:
            #"include" covers include(), include_once();
            #"require" covers require(), require_once()
            for keyword in ("include", "require"):
                m = re.match(r"(.*)" + keyword + r"(.*\()(.*)\)", line)
                if m:
                    #strip quotes rather than slicing with [1:-1], which
                    #dropped real characters from unquoted arguments
                    matchFile = m.group(3).strip()
                    matchFile = matchFile.replace("\\","/")
                    matchFile = matchFile.replace("\"","")
                    matchFile = matchFile.replace('\'',"")
                    if matchFile.endswith(".php"): #only PHP files
                        includeList.append((phpName, matchFile))

#####
#creating DOT file
with open(myFile, "w") as dot: #file is closed when the block ends
    #writing to DOT file, one quoted "a" -> "b"; edge per line
    dot.write("digraph {" + br)
    for a,b in includeList:
        dot.write(tab + "\"" + a + "\" -> \"" + b + "\";" + br)
    dot.write("}")

#####
#exiting
sys.exit()
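
To see what the include-matching pattern in the script picks up, here is a small standalone sketch. The `find_includes` function is a hypothetical helper written for illustration, not part of makeDOT.py, and the sample PHP lines are made up. One assumption differs from the script's slicing: this sketch strips quote characters from the captured argument instead of cutting off its first and last character.

```python
import re

def find_includes(line):
    """Return the .php targets named by include/require calls on one line."""
    found = []
    for keyword in ("include", "require"):
        # Same pattern shape as the script: anything, the keyword,
        # anything up to "(", then the argument up to the last ")".
        m = re.match(r"(.*)" + keyword + r"(.*\()(.*)\)", line)
        if m:
            target = m.group(3).strip().replace("\"", "").replace("'", "")
            if target.endswith(".php"):  # only PHP files
                found.append(target)
    return found

print(find_includes('include_once("lib/header.php");'))  # ['lib/header.php']
print(find_includes("require('config.php');"))           # ['config.php']
print(find_includes("echo 'hello';"))                    # []
```

Because the pattern allows any text before the keyword, a commented-out line such as `// include("x.php");` is matched too, so the graph can contain edges for includes that never actually run.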

I ran the Python script on the PHP scripts for MXMLiszt.

Then I used the "circo" layout engine in Graphviz – specifically the Gvedit.exe application – on this resultant DOT file.

Here's the result:


--------------

Written by nitin

July 30th, 2011 at 1:03 pm

AudioRegent 1.3.1 released

I've updated AudioRegent to version 1.3.1.

You can read an overview of the software and get the download link to the new version here.

The only reason I updated the software is that, as I've mentioned before, I've been having problems with Windows (and only recently at that) when calling executables from the command line.

What seems to have helped is to no longer pass a command as a string a la:

RunSoxString = SoxPath + " ./outWavs/" + OggArray[cnt] + ws + "--comment-file comment.txt ./outOggs/" + str(OggArray[cnt])[:-4] + "." + outputType + ws + SoxOptions
RunSox = subprocess.Popen([RunSoxString], shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
RunSox.wait() #wait until the subprocess finishes

Now, it seems I have to pass it as a Python list (aka an array):

RunSoxString = SoxPath + " ./outWavs/" + OggArray[cnt] + ws + "--comment-file comment.txt ./outOggs/" + str(OggArray[cnt])[:-4] + "." + outputType + ws + SoxOptions
import shlex
RunSoxList = shlex.split(RunSoxString)
RunSox = subprocess.Popen(RunSoxList, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
RunSox.wait() #wait until the subprocess finishes
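
One caveat for other platforms: per the Python documentation, on POSIX a list combined with shell=True treats only the first item as the command and hands the remaining items to the shell itself, so the call above may behave differently on Linux. Below is a minimal sketch of an alternative, not AudioRegent's actual code. It builds the list directly instead of splitting a string, and it leaves out shell=True. The names `build_sox_args` and `run` and their parameters are hypothetical, and the demo runs the Python interpreter in place of SoX so that it works without SoX installed. It also uses communicate() rather than wait(), since wait() can block when a piped stream fills up.

```python
import subprocess
import sys

def build_sox_args(sox_path, wav_name, output_type, sox_options):
    """Build the argument list directly, so no string splitting is needed."""
    base_name = wav_name[:-4]  # drop the 4-character extension, as the original slice does
    return ([sox_path, "./outWavs/" + wav_name,
             "--comment-file", "comment.txt",
             "./outOggs/" + base_name + "." + output_type]
            + list(sox_options))

def run(args):
    """Run a command given as a list; return (exit code, stdout, stderr)."""
    proc = subprocess.Popen(args, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # reads both pipes, then waits for exit
    return proc.returncode, out, err

# Stand-in demo: run the Python interpreter instead of SoX.
code, out, err = run([sys.executable, "-c", "print('done')"])
```

With a list and no shell, file names containing spaces need no extra quoting, because each list item reaches the program as one argument.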

By the way, I totally haven't tested this new version enough to distribute it and I haven't tested it at all on a Linux box. But since no one's using it, I'm not too worried.

--------------

Written by nitin

July 28th, 2011 at 6:31 pm

Posted in digital audio,scripts
