About a year ago I did some text to audio synchronization tests with HTML5 and Flash.
The tests were partially successful, but what really mattered is that I set four goals I felt needed to be met before the word "synchronization" could truly be used:
- The user should be able to click on a line of text and hear the related media.
- The user should be able to "scrub" ahead on the media player and the text should follow.
- The page should report where in the document the user is.
- The page should automatically keep the media/text synchronized without user intervention.
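To make the goals concrete, here's a minimal sketch of the core lookup that goals #2 and #4 depend on. It assumes (my invention, not any particular implementation) that the timed text is an array of cues, each with a start time in seconds and a line of text; the job is to find which line is "current" for any playback position.

```javascript
// Hypothetical timed-text cues: a start time in seconds plus the line of
// text. A real version would load these from a timed text file.
const cues = [
  { start: 0.0, text: "First line of the transcript." },
  { start: 4.2, text: "Second line." },
  { start: 9.7, text: "Third line." },
];

// Return the index of the cue to highlight: the last cue whose start
// time is at or before the current playback time, or -1 if none is.
function findActiveCue(cues, time) {
  let active = -1;
  for (let i = 0; i < cues.length; i++) {
    if (cues[i].start <= time) {
      active = i;
    } else {
      break; // cues are sorted by start time, so we can stop here
    }
  }
  return active;
}

// Wiring this to an HTML5 <audio> element would cover goals #2 and #4:
// "timeupdate" fires as playback progresses and again after a scrub.
//
//   audio.addEventListener("timeupdate", () => {
//     highlightLine(findActiveCue(cues, audio.currentTime));
//   });
//
// Goal #1 is just the inverse: clicking line i seeks the player.
//
//   lineElements[i].onclick = () => { audio.currentTime = cues[i].start; };
```

Goal #3 then falls out for free, since the active cue index *is* the "where am I in the document" answer.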
Basically, I've seen a few people make it so that you could watch media while the transcript text was also on the page (scrollable as opposed to overlaid closed captions) and the user could click on a line and have the movie/audio skip ahead to that moment (goal #1). That's great and all, but that's not synchronization.
Anyway, let's get back to Alexander St. Press. I loved what I saw when my boss (I work at NC LIVE) showed it to me. I got really excited and said something like, "This is what I've been waiting to see!". In addition to the great and true syncing, they also had a feature that would let the user make and share clips, much the way you can on sites like NBC's Meet The Press. The Alexander St. Press site also allowed you to annotate that clip, which is a great feature for teachers and librarians, etc. Alexander St. Press also has this with their classical music streaming subscription service, which in the spirit of full disclosure I pay for. They ALSO had a cool timeline where you could see what I call "hot spots" – places where others had made clips. The idea, I guess, is that spots on the timeline with more clusters would indicate a particular point of interest. Nothing new, because you see that all the time with streaming sports like the US Open's site where you can go back and watch previous moments in matches and then "go live" at any time. But the difference is, of course, that Alexander St. Press was using user-contributed clips.
So long story short (or just not as long), in a few weeks I need to present these ideas to some people and talk about how we think these features could be useful for our users. And the more I struggled with how to talk about these concepts without a prototype, the more I thought I would a) sound crazy and b) sound like I was full of hot air.
I decided it was time to go back to some earlier tests of mine from early April and build a prototype, so we could show it to people instead of talking in theoreticals. I think it's generally easier to explain and convince people of the utility of software by showing it rather than telling it. Actions > words, right?
I was so excited that it was finally working that I went home during those "4 days of madness" to write an HTML5 version which is virtually identical to the Flash version. It's got basic clip-making features as well as a very basic tool, inspired by this video score tutorial, for making timed text files, provided you have the audio and full text in hand. Eventually, I'll comment up the code, improve some options, and post a download of the source for the HTML5 version. At work, we'll probably eventually offer the code as it's tweaked to meet our aesthetic needs, etc. As you'll see in the demo video below, I have no aesthetics!
I'll shut up now and leave you to the video if you're interested. I recommend watching it in HD so you can read the words on the page.
As my friend whom the HTML5 version is kinda named after likes to say:
More later …
Update, September 20, 2011: To avoid confusion as to what this does, I'm renaming this from "Simple Audio/Video Synchronizer" to "Simple Audio/Verse Synchronizer" or something …
Update, October 16, 2011: Cool, I found one more thing that meets all four goals at http://www.dinglabs.com. They're pitching it as a foreign language learning tool, but same difference. Also, that site led me to TranscriberAG, a tool for transcribing audio.