discoveries in digital audio, music notation, and information encoding

Archive for the ‘streaming media’ tag

jAUs 3-D: no glasses required


So, I am slightly obsessive-compulsive.

I worked a little more tonight on this jAUs thing to add support for start and stop time attributes in the HTML5 <audio> tag.

Video should work, too, with a little work, but I don't care about that right now.

What I did tonight was make it so that if a "stopTime" attribute is used, then once that point is reached the player pauses playback and moves the scrubber back to the original "startTime" value.

If there is no "stopTime" attribute, then after the file has played through to the end, the scrubber moves back to the "startTime" value. Again, playback is already paused at that point.
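The reset logic described above can be sketched as a small pure function plus some event wiring. This is a sketch of the behavior as the post describes it, not the actual jAUs code; `shouldReset` is a hypothetical name.

```javascript
// Hypothetical helper: decide whether playback should rewind to startTime,
// given the current time, the optional stop time (null if no "stopTime"
// attribute), and whether the file has played to its end.
function shouldReset(currentTime, stopTime, ended) {
  if (stopTime !== null) {
    return currentTime > stopTime; // past "stopTime": pause and rewind
  }
  return ended; // no "stopTime": rewind once the file has ended
}

// In a browser, the wiring would look something like this (sketch only):
// audio.addEventListener("timeupdate", function () {
//   if (shouldReset(this.currentTime, stop, this.ended)) {
//     this.pause();
//     this.currentTime = start;
//   }
// });
```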

I've tested this with Firefox 9.0.1, Internet Explorer 9, Safari 4.0.5, Opera 11.60, and Chrome 16.0.912.75 on my Windows 7 (32-bit) laptop.

Running everything locally from my desktop, I had no problems, with one exception worth mentioning: if I used the "autoplay" attribute and set it to "true", Chrome didn't start playback as it should. It seems that may be a Chrome problem others are having, too.

Testing with the files uploaded to my hosted account was a different story. Opera seemed to need a page refresh before the scrubber would locate itself at the proper positions, though adding a pesky alert() at the beginning seemed to make Opera happy. Chrome and Safari seemed to take a few seconds to get situated, and they generally needed a restart to move the scrubber to the right place for the last audio player; I didn't test the alert() trick with those two. Firefox did well, although I hate the way it moves the audio players around depending on whether the audio has been played or not. And that leaves IE9, which, hands down, did the best. Maybe that's because of the exception I'd added for it, as I mentioned in an earlier post.

So, there's still work to do and things to investigate.

Also, I haven't tested this with really long files or anything, so I don't know how that would go. But then again, as I've heard others say, HTML5 media elements aren't really meant for long-form media anyway.

Oh yeah, one more thing: I did in fact change the attributes to "data-startTime" and "data-stopTime" to make the HTML valid HTML5.
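One wrinkle worth noting with data-* attributes: the HTML parser lowercases attribute names, so data-startTime is actually stored as data-starttime; getAttribute() still finds it because attribute lookup is case-insensitive on HTML elements. A tiny sketch of that name mapping (`datasetKey` is a hypothetical helper; the real dataset API also camel-cases hyphenated names, which this simple case ignores):

```javascript
// Map a data-* attribute name to the key the browser stores it under:
// strip the "data-" prefix and lowercase the rest,
// e.g. "data-startTime" -> "starttime".
function datasetKey(attrName) {
  return attrName.toLowerCase().replace(/^data-/, "");
}
```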

Here's the HTML5 code itself; note that the JavaScript has now been moved into a separate JS file.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <script type="text/javascript" src="jAUs.js"></script>
  </head>
  <body>
    <p>This is the entire recording of Shakespeare's Sonnet 130, read by Nathan Miller for
    <a href="">LibriVox</a>.</p>
    <audio controls>
      <source src="sonnet130_shakespeare_njm.mp3" type="audio/mp3" />
      <source src="sonnet130_shakespeare_njm.ogg" type="audio/ogg" />
      Your browser does not support the audio element.
    </audio>
    <p>"My mistress' eyes are nothing like the sun." - <em>start = 10; end = 13.</em></p>
    <audio class="jAUs" controls data-startTime="10" data-stopTime="13">
      <source src="sonnet130_shakespeare_njm.mp3" type="audio/mp3" />
      <source src="sonnet130_shakespeare_njm.ogg" type="audio/ogg" />
      Your browser does not support the audio element.
    </audio>
    <p>"End of poem." - <em>start = 57; no end specified.</em></p>
    <audio class="jAUs" controls data-startTime="57">
      <source src="sonnet130_shakespeare_njm.mp3" type="audio/mp3" />
      <source src="sonnet130_shakespeare_njm.ogg" type="audio/ogg" />
      Your browser does not support the audio element.
    </audio>
  </body>
</html>

Oh, and here's the JavaScript file:

jAUs: JavaScript <audio> Shark.

Copyright (c) 2012 Nitin Arora.

***** Note: This software is still in ALPHA. Please refrain from using
the code without first contacting Nitin Arora at

jAUs is licensed under the MIT license:

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

function jAUs(){
  //note: audioTagArray is intentionally global so jAUs_3() can see it.
  audioTagArray = document.getElementsByClassName("jAUs");
  for (var i = 0; i < audioTagArray.length; i++){
    var thisAudioTag = audioTagArray[i];
    jAUs_2(audioTagArray, thisAudioTag, i);
  }
}

function jAUs_2(audioTagArray, thisAudioTag, i){
  //if this is placed directly into jAUs() - i.e. not a separate function,
  //then this whole thing doesn't seem to work.
  if (navigator.appName == "Microsoft Internet Explorer"){
    //IE won't accept a currentTime until the data has loaded.
    thisAudioTag.onloadeddata = function(){
      var thisAudioTag_startTime = thisAudioTag.getAttribute("data-startTime");
      thisAudioTag.currentTime = thisAudioTag_startTime;
    };
  }
  else {
    var thisAudioTag_startTime = thisAudioTag.getAttribute("data-startTime");
    thisAudioTag.currentTime = thisAudioTag_startTime;
  }
  var thisAudioTag_stopTime = thisAudioTag.getAttribute("data-stopTime");
  var stopString = "jAUs_3(this.currentTime," + thisAudioTag_stopTime + "," + i + ");";
  //stopString holds, e.g., "jAUs_3(this.currentTime,13,0);" where the last
  //argument is the tag's index; it runs on every "timeupdate" event.
  thisAudioTag.setAttribute("ontimeupdate", stopString);
}

function jAUs_3(this_currentTime, thisAudioTag_stopTime, i){
  if (thisAudioTag_stopTime){
    //if there's a data-stopTime attribute then ...
    if (this_currentTime > thisAudioTag_stopTime){
      //... pause and reset the audio to data-startTime when data-stopTime is reached.
      audioTagArray[i].pause();
      audioTagArray[i].currentTime = audioTagArray[i].getAttribute("data-startTime");
    }
  }
  else if (audioTagArray[i].ended == true){
    //if there's no data-stopTime, move back to data-startTime when playback has ended.
    audioTagArray[i].currentTime = audioTagArray[i].getAttribute("data-startTime");
  }
}

window.onload = function(){
  jAUs();
};

Update, January 12, 2012: Turns out "jAUs" is the name of a robotics SDK, and I'm not too crazy about that name anyway. So, I'm leaning toward "m(AUj)ulate" (pronounced like 'modulate'), which would stand for something like "My Untimely Little Audio Tag Extender". The word "untimely" is, of course, a play on the fact that time is what this is all about. The parenthetical bit refers to "audio" (AU) and JavaScript (j).

And, yes, I care much more about the name/acronym than the script itself.


Update, January 15, 2012: OK, this is interesting. If, for the source of the audio, I actually use the audio files hosted on the LibriVox site, like so …

  <audio class="jAUs" controls data-startTime="10" data-stopTime="13">
    <source src="" type="audio/mp3" />
    <source src="" type="audio/ogg" />
    Your browser does not support the audio element.
  </audio>

… then this seems to be working OK in all the browsers. Chrome seems, per casual observation, the slowest in terms of getting the scrubber moved to the appropriate points, but I guess this is progress.

Related to all this stuff I've been messing with, I found this: Consistent event firing with HTML5 video – Dev.Opera. They, too, use an alert() to notify the user that metadata is loaded, via "onloadedmetadata", but in my tests it seemed like the alert() call itself was what was fixing some browsers' inability to set the current time as my script instructed.
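A common alternative to the alert() trick is to wait until the browser reports that timing metadata is available before seeking. A minimal sketch, assuming the element and start time are already in hand (`seekWhenReady` is a hypothetical name, not from the Dev.Opera article or jAUs):

```javascript
// Seek once timing metadata is available. readyState >= 1 means
// HAVE_METADATA, i.e. the duration and seekable ranges are known.
function seekWhenReady(audio, startTime) {
  if (audio.readyState >= 1) {
    audio.currentTime = startTime;
  } else {
    audio.addEventListener("loadedmetadata", function () {
      audio.currentTime = startTime;
    });
  }
}
```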



Written by nitin

January 11th, 2012 at 7:55 pm

SAVS: a Simple Audio/Video Synchronizer


About a year ago I did some text-to-audio synchronization tests with HTML5 and Flash.

The tests were partially successful, but what really mattered is that I set four goals that I felt needed to be met before the word "synchronization" could truly be used:

  1. The user should be able to click on a line of text and hear the related media.
  2. The user should be able to "scrub" ahead on the media player and the text should follow.
  3. The page should report where in the document the user is.
  4. The page should automatically keep the media/text synchronized without user intervention.
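The goals above boil down to a two-way mapping between playback time and text. A minimal sketch of the time-to-text direction (goals 2–4), assuming a cue list of { time, text } objects sorted by start time; that data shape is a hypothetical, since the post doesn't commit to a format:

```javascript
// Return the index of the cue that should be highlighted at currentTime:
// the last cue whose start time is <= currentTime, or -1 before the first cue.
function cueIndexAt(cues, currentTime) {
  var index = -1;
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].time <= currentTime) {
      index = i;
    } else {
      break;
    }
  }
  return index;
}

// Goal 1 is the inverse direction: clicking a line seeks the player, e.g.
// media.currentTime = cues[clickedIndex].time;
```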

Basically, I've seen a few people make it so that you can watch media while the transcript is also on the page (scrollable, as opposed to overlaid closed captions), and the user can click on a line to have the movie/audio skip to that moment (goal #1). That's great and all, but that's not synchronization.


Synchronization is a two-way street, and I've been working this past week, during what I'm calling "4 days of madness", to come up with a really simple solution to real synchronization. I did run across this really cool RadioLab page that achieves goal #1, but as much as I like it, I want more features with less flash (as in "flash and dash", not Adobe Flash!) and less code. Make no mistake: it looks fantastic, and I also appreciate that they've timed the text to clusters of a couple of words rather than by line. But the only thing I've seen that gets it all "right", per my perspective, was a subscription resource by Alexander St. Press. It achieved all the goals above using a Flash player, and the rest appeared to be done with JavaScript and some jQuery smooth scrolling. It was also timed by clusters of words and not just by line or by paragraph. Of course, conceptually it's the same whether one marks up the text – in the temporal sense – by line or by word, though it's a little more work to do it by word.

Unfortunately, I've seen people do the opposite: they use a static unit of time, like 60 seconds, and only mark up the text every minute. That's taking the easy way out, and it also misses the point entirely, since it makes the text subservient to an arbitrary unit of time. Would it be acceptable if the closed captioning and subtitles on your foreign films only showed up in large chunks every minute? I would hope not, and in the case of the former it would violate the spirit, if not the letter, of the "law" in regard to accessibility. If done right, you can use the same timed-text file both to serve up captions and to show the full text on the page. It's more time- and cost-efficient to re-purpose the same data for two needs.

Anyway, let's get back to Alexander St. Press. I loved what I saw when my boss (I work at NC Live) showed it to me. I got really excited and said something like, "This is what I've been waiting to see!". In addition to the great and true syncing, they also had a feature that would let the user make and share clips, much the way you can on sites like NBC's Meet The Press. The Alexander St. Press site also allowed you to annotate that clip, which is a great feature for teachers and librarians, etc. Alexander St. Press also has this with their classical music streaming subscription service, which in the spirit of full disclosure I pay for. They ALSO had a cool timeline where you could see what I call "hot spots" – places where others had made clips. The idea, I guess, is that spots on the timeline with more clusters would indicate a particular point of interest. Nothing new, because you see that all the time with streaming sports like the US Open's site where you can go back and watch previous moments in matches and then "go live" at any time. But the difference is, of course, that Alexander St. Press was using user-contributed clips.

So, long story short (or just not as long): in a few weeks I need to present these ideas to some people and talk about how we think these features could be useful for our users. And the more I struggled with how to talk about these concepts without a prototype, the more I thought I would a) sound crazy and b) seem full of hot air.

I decided it was time to go back to some earlier tests of mine from early April and just build a prototype, so we could show it to people and not have to talk in theoreticals. I think it's generally easier to explain and convince people of the utility of software by showing rather than telling. Actions > words, right?

Well, early tests are working, and they only required me to add one line of ActionScript to our current Flash player; only about 50 lines of JavaScript are needed to keep the text and media synced. The tests I did were with some PBS videos we purchased along with closed-captioning files.

I was so excited that it was finally working that I went home during those "4 days of madness" to write an HTML5 version, which is virtually identical to the Flash version. It's got basic clip-making features as well as a very basic tool, inspired by this video score tutorial, for making timed-text files, provided you have the audio and full text in hand. Eventually, I'll comment the code up, improve some options, and post a download link to the source for the HTML5 version. At work, we'll probably eventually offer the code as it's tweaked to meet our aesthetic needs, etc. As you'll see in the demo video below, I have no aesthetics!

I'll shut up now and leave you to the video if you're interested. I recommend watching it in HD so you can read the words on the page.

As my friend whom the HTML5 version is kinda named after likes to say:

More later …

SAVS: a Simple Audio/Verse Synchronizer from nitin arora on Vimeo.

Update, September 20, 2011: To avoid confusion as to what this does, I'm renaming this from "Simple Audio/Video Synchronizer" to "Simple Audio/Verse Synchronizer" or something …


Update, October 16, 2011: Cool, I found one more thing that meets all four goals at They're pitching it as a foreign-language learning tool, but same difference. Also, that site led me to TranscriberAG, a tool for transcribing audio.



Written by nitin

September 5th, 2011 at 9:39 am