SAVS: a Simple Audio/Video Synchronizer

[Mon, 05 Sep 2011 13:39:59 +0000]
About a year ago I did some text-to-audio synchronization tests with HTML5 and Flash. The tests were partially successful, but I think what really mattered is that I set four goals that I felt needed to be met before the word "synchronization" could truly be used:
1. The user should be able to click on a line of text and hear the related media.
2. The user should be able to "scrub" ahead on the media player and the text should follow.
3. The page should report where in the document the user is.
4. The page should automatically keep the media and text synchronized without user intervention.
Basically, I've seen a few people make it so that you could watch media while the transcript text was also on the page (scrollable, as opposed to overlaid closed captions), and the user could click on a line and have the movie or audio skip ahead to that moment (goal #1). That's great and all, but that's not synchronization. ;) Synchronization is a two-way street, and this past week, during what I'm calling "4 days of madness," I've been working to come up with a really simple solution for real synchronization.
I did run across this really cool RadioLab page that achieves goal #1, but as much as I like it, I want more features with less flash (as in "flash and dash," not Adobe Flash!) and less code. Make no mistake: it looks fantastic, and I also appreciate that they've got the text timed to clusters of a couple of words rather than by line. But the only thing I've seen that gets it all "right" from my perspective was a subscription resource by Alexander St. Press. It achieved all the goals above using a Flash player, and the rest appeared to be done with JavaScript and some jQuery smooth scrolling. It was also timed by clusters of words and not just by line or by paragraph. Conceptually it's the same whether one marks up the text (in the temporal sense) by line or by word, but of course it's a little more work to do it by word.
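Goals #1 through #4 reduce to two lookups over the same timed text: text to time (click a line, seek the player) and time to text (given the playhead, find the current line). Below is a minimal TypeScript sketch of both, assuming the timed text is held as an array of cues sorted by start time and not overlapping. The `Cue` shape and the function names are illustrative assumptions, not taken from SAVS itself. Using a binary search means a scrub to anywhere in the media (goal #2) is handled the same way as normal playback.

```typescript
// A cue ties one line (or cluster of words) of transcript to a span of media time.
// This shape is an assumption for illustration.
interface Cue {
  start: number; // seconds, inclusive
  end: number; // seconds, exclusive
  text: string;
}

// Goal #1 (text -> time): clicking line `index` should seek the player here.
function seekTimeForLine(cues: Cue[], index: number): number {
  return cues[index].start;
}

// Goals #2-#4 (time -> text): return the index of the cue containing `time`,
// or -1 if the playhead is in a gap between cues (or outside all of them).
// The binary search assumes `cues` is sorted by start and non-overlapping.
function activeCueIndex(cues: Cue[], time: number): number {
  let lo = 0;
  let hi = cues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (time < cues[mid].start) {
      hi = mid - 1;
    } else if (time >= cues[mid].end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}

const cues: Cue[] = [
  { start: 0, end: 4.2, text: "First line of the transcript." },
  { start: 4.2, end: 9.8, text: "Second line." },
  { start: 11.0, end: 15.5, text: "Third line, after a pause." },
];

console.log(activeCueIndex(cues, 5)); // 1
console.log(activeCueIndex(cues, 10.5)); // -1 (in the gap between cues)
console.log(seekTimeForLine(cues, 2)); // 11
```

In a browser, one plausible wiring is to call `activeCueIndex(cues, media.currentTime)` from the media element's `timeupdate` event and only highlight and scroll when the returned index changes (goals #3 and #4), while each line's click handler sets `media.currentTime = seekTimeForLine(cues, i)` (goal #1).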
Unfortunately, I've seen people do the opposite: they use a static unit of time like 60 seconds and only mark up the text every minute. That's taking the easy way out, and it also misses the point entirely, since it makes the text subservient to an arbitrary unit of time. Would it be acceptable if closed captions, or the subtitles on your foreign films, only showed up in large chunks every minute? I would hope not, and in the case of closed captions it would violate the spirit, if not the letter, of the "law" in regard to accessibility. If done right, you can use the same timed text file both to serve up captions and to show the full text on the page. It's more time- and cost-efficient to re-purpose the same data for two needs.
Anyway, let's get back to Alexander St. Press. I loved what I saw when my boss (I work at NC LIVE) showed it to me. I got really excited and said something like, "This is what I've been waiting to see!" In addition to the great and true syncing, they also had a feature that would let the user make and share clips, much the way you can on sites like NBC's Meet the Press. The Alexander St. Press site also allowed you to annotate a clip, which is a great feature for teachers, librarians, etc. Alexander St. Press also has this feature in their classical music streaming subscription service, which, in the spirit of full disclosure, I pay for. They ALSO had a cool timeline where you could see what I call "hot spots": places where others had made clips. The idea, I guess, is that spots on the timeline where more clips cluster would indicate a particular point of interest. That's nothing new; you see it all the time with streaming sports, like the US Open's site, where you can go back and watch previous moments in matches and then "go live" at any time. The difference, of course, is that Alexander St. Press was using user-contributed clips.
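The point about re-purposing one timed text file can be illustrated with a small parser: the same list of cues feeds both a caption overlay (the one cue active at a given time) and the full on-page transcript. This TypeScript sketch assumes a simplified WebVTT-style input and is not a complete WebVTT parser: it handles only the header, optional cue identifiers, timing lines, and plain cue text; cue settings are discarded and markup such as `<i>` is left as-is. The function names are hypothetical.

```typescript
// Illustrative cue shape (an assumption, not a SAVS data structure).
interface Cue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// "HH:MM:SS.mmm" or "MM:SS.mmm" -> seconds.
function parseTimestamp(ts: string): number {
  const parts = ts.trim().split(":").map(Number);
  let seconds = 0;
  for (const p of parts) {
    if (isNaN(p)) throw new Error("bad timestamp: " + ts);
    seconds = seconds * 60 + p;
  }
  return seconds;
}

// Blocks are separated by blank lines; a cue block has a "start --> end"
// line followed by one or more lines of text.
function parseCues(source: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = source.replace(/\r\n/g, "\n").split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block
      .split("\n")
      .map((l) => l.trim())
      .filter((l) => l !== "");
    // Skip an optional cue identifier; the cue proper starts at the timing line.
    let t = 0;
    while (t < lines.length && lines[t].indexOf("-->") === -1) t++;
    if (t === lines.length) continue; // "WEBVTT" header or other non-cue block
    const [startText, endText] = lines[t].split("-->");
    cues.push({
      start: parseTimestamp(startText),
      // Drop any cue settings after the end time, e.g. "align:start".
      end: parseTimestamp(endText.trim().split(/\s+/)[0]),
      text: lines.slice(t + 1).join(" "),
    });
  }
  return cues;
}

// Use 1: the caption to overlay at a given playback time ("" if none).
function captionAt(cues: Cue[], time: number): string {
  for (const c of cues) {
    if (time >= c.start && time < c.end) return c.text;
  }
  return "";
}

// Use 2: the full transcript shown on the page, one cue per line.
function transcript(cues: Cue[]): string {
  return cues.map((c) => c.text).join("\n");
}

const vtt = `WEBVTT

00:00:00.000 --> 00:00:04.200
First line of the transcript.

00:00:04.200 --> 00:00:09.800
Second line,
split over two caption rows.
`;

const cues = parseCues(vtt);
console.log(captionAt(cues, 5)); // Second line, split over two caption rows.
console.log(transcript(cues));
// First line of the transcript.
// Second line, split over two caption rows.
```

Because both views read from one parsed list, a timing correction made once in the file shows up in the captions and in the on-page text alike.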
So, long story short (or just not as long), in a few weeks I need to present these ideas to some people and talk about how we think these features could be useful for our users. The more I struggled with how to talk about these concepts without a prototype, the more I thought I would sound like a) I'm crazy and b) I'm full of hot air. I decided it was time to go back to some earlier tests of mine from early April and build a prototype, so we could show it to people rather than speak in theoretical terms. I think it's generally easier to explain the utility of software, and to convince people of it, by showing rather than telling. Actions > words, right?
Well, the early tests are working. They required adding only one line of ActionScript to our current Flash player, and only about 50 lines of JavaScript are needed to keep the text and media synced. The tests I did were for some PBS videos we purchased along with closed captioning files. I was so excited that it was finally working that I went home during those "4 days of madness" and wrote an HTML5 version that is virtually identical to the Flash version. It has basic clip-making features as well as a very basic tool, inspired by this video score tutorial, for making timed text files provided you have the audio and full text in hand. Eventually I'll comment the code, improve some options, and post a download of the source for the HTML5 version. At work, we'll probably eventually offer the code once it's been tweaked to meet our aesthetic needs, etc. As you'll see in the demo video below, I have no aesthetics!
I'll shut up now and leave you to the video if you're interested. I recommend watching it in HD so you can read the words on the page. As my friend whom the HTML5 version is kinda named after likes to say: More later ...
Video: SAVS: a Simple Audio/Verse Synchronizer, from nitin arora on Vimeo.
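The post doesn't describe how the timed-text tool or the clip maker works internally, so what follows is only a guess at one simple approach, written in TypeScript. It assumes the person timing the text listens to the audio and taps a key as each line begins; `buildCues`, `clipUrl`, and the `taps`/`duration` parameters are hypothetical names. Each line starts at its own tap and ends at the next line's tap, with the last line running to the end of the media. For clips, it assumes the temporal syntax of the W3C Media Fragments URI spec (`#t=start,end`, in seconds), which many HTML5 players can honor; the example URL is a placeholder.

```typescript
// Illustrative cue shape (an assumption, not a SAVS data structure).
interface Cue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Hypothetical tap-to-time tool: taps[i] is the playback time (seconds)
// at which the person timing the text pressed a key as line i began.
function buildCues(lines: string[], taps: number[], duration: number): Cue[] {
  if (taps.length !== lines.length) {
    throw new Error("expected exactly one tap per line");
  }
  const cues: Cue[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (i > 0 && taps[i] < taps[i - 1]) {
      throw new Error("taps must be in increasing time order");
    }
    // Each line runs until the next line starts; the last runs to the end.
    const end = i + 1 < taps.length ? taps[i + 1] : duration;
    cues.push({ start: taps[i], end, text: lines[i] });
  }
  return cues;
}

// Hypothetical clip maker: a shareable link using the Media Fragments
// temporal syntax "#t=start,end".
function clipUrl(mediaUrl: string, start: number, end: number): string {
  if (!(start >= 0 && end > start)) throw new Error("invalid clip range");
  return mediaUrl + "#t=" + start + "," + end;
}

const cues = buildCues(
  ["First line.", "Second line.", "Third line."],
  [0.5, 3.1, 7.4],
  12
);
console.log(cues.map((c) => c.start + "-" + c.end).join(" ")); // 0.5-3.1 3.1-7.4 7.4-12
console.log(clipUrl("https://example.com/episode.mp4", 3.1, 7.4));
// https://example.com/episode.mp4#t=3.1,7.4
```

Timing by taps keeps the tool tiny, at the cost of precision: the person timing the text reacts a little late, so a real tool might let them nudge each start time afterward.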
Update, September 20, 2011: To avoid confusion about what this does, I'm renaming it from "Simple Audio/Video Synchronizer" to "Simple Audio/Verse Synchronizer" or something ... :)
Update, October 16, 2011: Cool, I found one more thing that meets all four goals. They're pitching it as a foreign-language learning tool, but same difference. That site also led me to TranscriberAG, a tool for transcribing audio.