blog.humaneguitarist.org

discoveries in digital audio, music notation, and information encoding

Archive for the ‘timed text’ tag

full-text searching of timed text and a farewell to Andy Roddick

It’s been a while since I had one of my “So, I’m home sick today and wrote this silly, little script” things.

Well, here’s another one while the antibiotics take root.

I’ve always wanted to do something with offering full-text search against timed-text files and allowing a user to click on a result and skip to the audio segment matching the returned line of timed-text, etc. Hulu has had a BETA version of this kind of thing for a while and I suspect others do too.

Well, today I just whipped up a little search API using PHP and MySQL. It’s a nice little start and super easy to do.

I made a database table using the timed-text data from my SAVS project, OpenOffice Calc, and phpMyAdmin. The text is from Shakespeare’s Sonnet 130 using a LibriVox recording (version #14, Miller). BTW, parsing DFXP or SRT files and throwing those into a table is easy, but it’s not within the scope of this little mock-up.
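For anyone curious about the SRT side of that, here is a rough sketch in JavaScript of turning SRT cues into rows shaped like the table columns used in this post. It is not part of the mock-up: the function names are made up for illustration, and it assumes well-formed cues (an index line, a timing line, then the text).

```javascript
// Hypothetical helpers, not part of the original mock-up.
// Converts an SRT timestamp such as "00:00:13,000" to seconds.
function srtTimeToSeconds(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{1,3})$/.exec(stamp.trim());
  if (!m) {
    throw new Error("Unrecognized SRT timestamp: " + stamp);
  }
  const millis = Number(m[4].padEnd(3, "0"));
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + millis / 1000;
}

// Turns SRT text into row objects using the column names from the CSV in this post.
function parseSrt(srt, filePrefix) {
  return srt
    .replace(/\r\n?/g, "\n")            // normalize line endings
    .trim()
    .split(/\n{2,}/)                    // cues are separated by blank lines
    .filter(function (block) { return block.includes("-->"); })
    .map(function (block) {
      const rows = block.split("\n");
      const times = rows[1].split("-->");
      return {
        line_id: Number(rows[0]),
        line_text: rows.slice(2).join(" "),
        start_time: srtTimeToSeconds(times[0]),
        // ignore any positioning hints that follow the stop time
        stop_time: srtTimeToSeconds(times[1].trim().split(/\s+/)[0]),
        file_prefix: filePrefix
      };
    });
}
```

Each returned object could then be written out as a CSV row or an INSERT statement, whichever is handier.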

If I send a query for “rare love” to the API as such:

http://blog.humaneguitarist.org/uploads/SAVS/currentVersion/search/?q=rare%20love

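… I get a JSON response like the following (this particular sample came from a search for “mistress reeks”, as the highlighted terms show):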

{
  "results":{
    "result":[
      {
        "text":"Than in the breath that from my mistress reeks.",
        "highlighted_text":"Than in the breath that from my <mark>mistress<\/mark> <mark>reeks<\/mark>.",
        "startTime":"34",
        "stopTime":"37",
        "source":"sonnet130_shakespeare_njm",
        "relevance":"4.04993200302124"
      },
      {
        "text":"My mistress, when she walks, treads on the ground:",
        "highlighted_text":"My <mark>mistress<\/mark>, when she walks, treads on the ground:",
        "startTime":"46",
        "stopTime":"49",
        "source":"sonnet130_shakespeare_njm",
        "relevance":"1.62977826595306"
      }
    ]
  }
}

Note that the text is returned in the “text” field and I’m also trying to return a “highlighted_text” field in which search terms are surrounded by the HTML5 “mark” tag. There’s also a relevance score … of sorts (pun!).
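To make the “highlighted_text” idea concrete, here is a small JavaScript sketch of the same kind of highlighting done on the client. The function names are made up for this example and it is not how the API itself does it. It escapes the text first and then wraps matches, so the “mark” tags are the only markup in the result.

```javascript
// Hypothetical helpers for illustration only.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wraps every occurrence of each search word in <mark>, ignoring case
// and keeping the casing found in the text.
function highlightTerms(text, query) {
  const terms = query
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map(function (t) { return escapeRegExp(escapeHtml(t)); });
  const safe = escapeHtml(text);
  if (terms.length === 0) {
    return safe;
  }
  // One pass over all terms so earlier <mark> tags are never re-matched.
  const pattern = new RegExp("(" + terms.join("|") + ")", "gi");
  return safe.replace(pattern, "<mark>$1</mark>");
}
```

One known rough edge: this matches substrings, so a search word like “amp” could land inside an escaped entity. Matching on word boundaries would be a reasonable refinement.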

It needs a lot of work, but there’s enough data returned to launch an audio segment using some HTML5/JavaScript or some Flash or Silverlight API, etc. Hey, it ain’t too bad for a bad stomach and some sports-entertainment distractions.
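As a rough idea of the HTML5/JavaScript side, the sketch below turns one result object into something playable. It is only an illustration: the function names, the “audio/” folder, and the “.mp3” extension are assumptions, since the API only returns a file prefix. The first helper builds a URL with a Media Fragments time range (“#t=start,stop”), and the second seeks a media element and pauses it at the stop time.

```javascript
// Hypothetical helpers; the base URL and file extension are assumptions.
function resultToSegmentUrl(result, baseUrl, extension) {
  const start = Number(result.startTime);
  const stop = Number(result.stopTime);
  // "#t=start,stop" is the temporal form of a W3C Media Fragments URI.
  return baseUrl + encodeURIComponent(result.source) + extension + "#t=" + start + "," + stop;
}

// Plays only the matching segment on an <audio> or <video> element.
function playSegment(audio, result) {
  const stop = Number(result.stopTime);
  function onTimeUpdate() {
    if (audio.currentTime >= stop) {
      audio.pause();
      audio.removeEventListener("timeupdate", onTimeUpdate);
    }
  }
  audio.addEventListener("timeupdate", onTimeUpdate);
  audio.currentTime = Number(result.startTime);
  audio.play();
}
```

With the first result above, `resultToSegmentUrl(result, "audio/", ".mp3")` would give “audio/sonnet130_shakespeare_njm.mp3#t=34,37”, which could be set as the “src” of an audio element.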

Below, I’ll paste the CSV file I used to make the table, the PHP script … and a personal note about the best male American tennis professional of the last decade.

Here’s the CSV file from the spreadsheet application (note the “line_text” field is full-text indexed in the database):

"line_id";"line_text";"start_time";"stop_time";"file_prefix"
"1";"Coral is far more red than her lips' red:";"13";"17";"sonnet130_shakespeare_njm"
"2";"If snow be white, why then her breasts are dun;";"17";"21";"sonnet130_shakespeare_njm"
"3";"If hairs be wires, black wires grow on her head.";"21";"26";"sonnet130_shakespeare_njm"
"4";"I have seen roses damask'd, red and white,";"26";"29";"sonnet130_shakespeare_njm"
"5";"But no such roses see I in her cheeks;";"29";"32";"sonnet130_shakespeare_njm"
"6";"And in some perfumes is there more delight";"32";"34";"sonnet130_shakespeare_njm"
"7";"Than in the breath that from my mistress reeks.";"34";"37";"sonnet130_shakespeare_njm"
"8";"I love to hear her speak, yet well I know";"37";"40";"sonnet130_shakespeare_njm"
"9";"That music hath a far more pleasing sound:";"40";"43";"sonnet130_shakespeare_njm"
"10";"I grant I never saw a goddess go, --";"43";"46";"sonnet130_shakespeare_njm"
"11";"My mistress, when she walks, treads on the ground:";"46";"49";"sonnet130_shakespeare_njm"
"12";"And yet, by heaven, I think my love as rare";"49";"54";"sonnet130_shakespeare_njm"
"13";"As any she belied with false compare.";"54";"56";"sonnet130_shakespeare_njm"

Here’s the PHP script:

<?php
//GET search words from URL parameter (empty string if "q" is missing)
$searchWords = isset($_GET["q"]) ? trim($_GET["q"]) : "";

//prepare for highlighting keywords
$search_array= explode(" ", $searchWords);

//prepare for output
$output = array();

//connect to database
include_once("db_setup.php");

//run query
//(note: the mysql_* functions were removed in PHP 7; newer code should use mysqli or PDO)
$searchWords = mysql_real_escape_string($searchWords);
$query = "SELECT *, MATCH(line_text) AGAINST(\"$searchWords\") AS relevance
FROM $table WHERE MATCH(line_text) AGAINST(\"$searchWords\" IN BOOLEAN mode)
ORDER BY relevance DESC";
$result = mysql_query($query);

if($result) {
    while($row = mysql_fetch_array($result)) {
      $line_text = $row["line_text"];
      $start_time = $row["start_time"];
      $stop_time = $row["stop_time"];
      $file_prefix = $row["file_prefix"];
      $relevance = $row["relevance"];

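      //highlight search words in line_text; escape first so the <mark> tags survive
      $highlighted_text = htmlspecialchars($line_text);
      $terms = array();
      foreach ($search_array as $word) {
        if ($word !== "") { //skip empty strings left by repeated spaces
          $terms[] = preg_quote(htmlspecialchars($word), "/");
        }
      }
      if (count($terms) > 0) {
        //one pass over all terms; '$0' keeps the casing found in the text
        $highlighted_text = preg_replace("/" . implode("|", $terms) . "/i", '<mark>$0</mark>', $highlighted_text);
      }

      $this_output = array("text" => htmlspecialchars($line_text),
      "highlighted_text" => $highlighted_text,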
      "startTime" => $start_time,
      "stopTime" => $stop_time,
      "source" => $file_prefix,
      "relevance" => $relevance);
      array_push($output, $this_output);
    }
}

//send JSON results
if (count($output) == 0) {
  $results = array("results" => "No results.");
}

else {
  $result = array("result" => $output);
  $results = array("results" => $result);
}

$response = json_encode($results);
include_once("indent_json.php");
header("Content-type: application/json; charset=UTF-8");
echo(indent_json($response));
?>

And here’s something more important.

As a huge tennis fan, I found today a melancholy one: Andy Roddick played his last match, losing just a few moments ago to Juan Martin del Potro. The Wikipedia article on Roddick here already lists him as retired, but the important thing to remember about Roddick is that he achieved more with less than a lot of other players with more talent, and he was entertaining to watch, win or lose, in big matches.

Thanks for the memories!

Written by nitin

September 5th, 2012 at 6:17 pm

less is more, a SAVS update

Just a quick post before I find a movie to stream on Netflix and ride out my Sunday …

So, I've been working a tad on SAVS, aka the "Simple Audio/Verse Synchronizer". And the changes really have to do with the data model for the timed text and for the backend/technical requirements. It's now all done with HTML, CSS, and JavaScript – as it should be.

First, the data model for a line of timed text in what I'm calling "st2" or "SAVS timed text" is now like this:

  <span
    class="savs-st2"
    data-startTime="10"
    data-stopTime="13">My mistress' eyes are nothing like the sun
  </span>

Before, it was much clunkier, like this:

  <p onclick="seekTo(10)" id="1">
    <span class="savs-text">My mistress' eyes are nothing like the sun</span>
    <span class="savs-time">10</span>
  </p>

That's to say, now – using the HTML5 "data-" attribute – the demands for the HTML markup are far fewer given that the JavaScript file "savs.js" takes care of more.

Before, with the older markup model, there was no support for a stop time value and one also had to take the responsibility for adding several attributes related to calling JavaScript functions and for creating "id" attributes for both the corresponding <audio> or <video> element as well as for the timed text, etc.

I actually have thought about doing this as a jQuery plugin, but I'm not sure I see the point. Simply including the "savs.js" file is easier. By editing the "savs.css" file, one can control the look of their page. But I digress …

Now that the data model is different and the JavaScript file does more, one can generate a "SAVS compliant" HTML doc with whatever they want.

See, before I was thinking I'd write a PHP script that would build the page, etc, etc. but then I realized that "No, that's not my job." People should be able to store their timed data however they want, generate their HTML however they want, and only have to use the "savs.js" file and the "st2" data model to get this to work.

Sort of.

One also needs to give their HTML5 <audio> or <video> element an id of "savs-player" and also needs to put an element somewhere in their HTML doc with an id of "savs-caption" a la:

<span id="savs-caption"></span>

That's where the captions go and it's currently required. If someone doesn't want to display captions, then they can just use CSS to hide that element.
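To show how little the page has to provide, here is a hedged sketch of how a script could wire those pieces together. To be clear, this is not the actual "savs.js"; the function names are invented for illustration, and it assumes the player and caption elements are found by the ids described above.

```javascript
// Hypothetical sketch; this is not the real savs.js.
// Collects every element that follows the "st2" data model.
function collectSt2Lines(doc) {
  const spans = Array.from(doc.querySelectorAll(".savs-st2"));
  return spans.map(function (el) {
    return {
      element: el,
      start: parseFloat(el.getAttribute("data-startTime")),
      stop: parseFloat(el.getAttribute("data-stopTime")),
      text: el.textContent.trim()
    };
  });
}

function attachSavs(doc) {
  const player = doc.getElementById("savs-player");
  const caption = doc.getElementById("savs-caption");
  const lines = collectSt2Lines(doc);

  // Clicking a line seeks the player to that line's start time.
  lines.forEach(function (line) {
    line.element.addEventListener("click", function () {
      player.currentTime = line.start;
    });
  });

  // As the media plays (or is scrubbed), show the matching line as the caption.
  player.addEventListener("timeupdate", function () {
    const t = player.currentTime;
    const active = lines.find(function (line) {
      return t >= line.start && t < line.stop;
    });
    caption.textContent = active ? active.text : "";
  });

  return lines;
}
```

In a browser this would be called once the page has loaded, e.g. `attachSavs(document)`.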

Anyway, I'm not explaining anything well since I'm in a rush to watch a movie and have a soda, so here's the latest demo and below is the original version shown via a screencast.

SAVS: a Simple Audio/Verse Synchronizer from nitin arora on Vimeo.

Written by nitin

March 11th, 2012 at 8:50 pm

Posted in digital audio,scripts

indexing and searching timed text with Solr

I'm still learning about Solr so maybe this post is much ado about nothing. But according to this nabble.com thread, one can't index a source XML document in Solr with its native XML structure intact and then in turn search that structure as one can in an XML database like BaseX.

For most things, that's fine. I mean for indexing titles, creators, and descriptions, etc. I just need to index the value of a given element like <title> so that I can search for that element's value.

But for timed text, it's different. Or at least, it can be.

Say I have this DFXP snippet for an audio file with an "id" value of "XYZ".

<p begin="10.0s" end="30.0s">Hello world!</p>

I would need the user to search for the string "Hello world!" or part of it but I would also need to index at least the value of the "begin" attribute so that I can pass that to a page that will play the file "XYZ" starting at the 10 second mark – if the user clicks on the "Hello world!" line in their search result. And I don't want the "10" second value to be something they search against since they might be searching for the string "10" within the text itself.
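Just to pin down what "the text" and "the begin value" are as separate things, here is a quick-and-dirty JavaScript sketch that pulls them out of DFXP paragraphs like the one above. It uses regular expressions rather than a real XML parser, the function names are invented, and it only handles simple offset times ("10.0s", "500ms") and "hh:mm:ss" clock times, so treat it as an illustration only.

```javascript
// Hypothetical helpers; a real implementation should use an XML parser.
function dfxpTimeToSeconds(value) {
  const offset = /^(\d+(?:\.\d+)?)(h|m|s|ms)$/.exec(value);
  if (offset) {
    const scale = { h: 3600, m: 60, s: 1, ms: 0.001 }[offset[2]];
    return Number(offset[1]) * scale;
  }
  const clock = /^(\d+):(\d{2}):(\d{2}(?:\.\d+)?)$/.exec(value);
  if (clock) {
    return Number(clock[1]) * 3600 + Number(clock[2]) * 60 + Number(clock[3]);
  }
  throw new Error("Unsupported time expression: " + value);
}

// Returns one object per <p>, keeping the searchable text apart from the timing.
function extractTimedLines(dfxp) {
  const lines = [];
  const pattern = /<p\b([^>]*)>([\s\S]*?)<\/p>/g;
  let match;
  while ((match = pattern.exec(dfxp)) !== null) {
    const begin = /\bbegin="([^"]*)"/.exec(match[1]);
    const end = /\bend="([^"]*)"/.exec(match[1]);
    lines.push({
      text: match[2].replace(/<[^>]+>/g, "").trim(), // drop any inline tags
      begin: begin ? dfxpTimeToSeconds(begin[1]) : null,
      end: end ? dfxpTimeToSeconds(end[1]) : null
    });
  }
  return lines;
}
```

The "text" value is what a user would search against, while "begin" and "end" would only be stored so a result can link to the right moment in the file.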

So I'm wondering how to do that with Solr.

Maybe when I learn more I'll discover a better way to do this, but for now I'm thinking I could do the following:

First, I would pretty much index the timed text twice in Solr.

<doc>
  <field name="id">XYZ</field>
...
  <field name="timedText-stripped">Hello world!</field>
  <field name="timedText">Hello World! {10}</field>
</doc>

After indexing the "id" of the audio file this would index:

  • just the text "Hello world!"
  • the text of "Hello world!" with the "begin" attribute value in curly braces.

I guess this way the user could be made to search across the "timedText-stripped" field but, via the XSL that can be passed to Solr to display results, the "timedText" field could be displayed in a manner that would make the text "Hello World!" link to whatever page will play file "XYZ" starting at the 10 second mark. Basically, by planting the "begin" value in curly braces, I can parse the string for the text and the "begin" value as separate things.

So, here's a really crappy XSL snippet that would do something like that. It assumes a variable "$id" exists that equals "XYZ", the identifier for the example audio file.

<xsl:for-each select="//field[@name='timedText']">
  <xsl:variable name="whole">
    <xsl:value-of select="."/>
    <!-- Gets entire element string -->
  </xsl:variable>
  <xsl:variable name="text">
    <xsl:value-of select="substring-before($whole,'{')"/>
    <!-- Gets text prior to seconds -->
  </xsl:variable>
  <xsl:variable name="begin">
    <xsl:value-of select="substring-before(substring-after($whole,'{'),'}')"/>
    <!-- Gets seconds value from end of string -->
  </xsl:variable>
  <a href="someMediaPlayer.php?id={$id}&amp;begin={$begin}">
    <xsl:value-of select="$text"/>
  </a>
  <!-- So, I'm saying that
  "someMediaPlayer.php?id=XYZ&begin=10"
  would launch a player that would start file XYZ at the 10 seconds mark.
  -->
</xsl:for-each>

The search output would be some HTML code like so:

<a href="someMediaPlayer.php?id=XYZ&amp;begin=10">Hello World!</a>

It seems weird to index something twice, more or less, but as user Erick says in the nabble.com thread, "You've gotta take off your DB hat and not worry about duplicating data."

But now as I write this, I'm wondering if I can't just index as follows:

  <field name="text">Hello world!</field>
  <field name="begin">10</field>

and trust that for each "text" field, there will be a matching "begin" field and that the two can be used in tandem to create the same HTML link as above. Sounds like I need to play around some more.

:)

Update, September 6, 2012: I wrote a related post to this yesterday in terms of searching across timed text with MySQL and in doing so I realized that the way I was thinking of doing it in Solr was off. Rather than doing it the way I outlined in the original post content (above) in which I was thinking to index all the timed text for a given recording in one Solr "doc" element, I think it makes much more sense to index each line in its own "doc" element as such:

<doc>
  <field name="id">someMediaPlayer.php?source=someFile.mp3&amp;begin=10&amp;end=30</field>
  ...
  <field name="startTime">10</field>
  <field name="stopTime">30</field> 
  <field name="timedText">Hello world!</field>
  <field name="source">someFile.mp3</field> 
</doc>

That way there's no need to post-parse any data fields to get the start and stop time. And, moreover, rather than construct the URL to launch that segment of audio you can just put the URL directly in the "id" field. You can always use Solr's built-in support for facets to facet off of the "source" field or some descriptive metadata like "title".
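As a sketch of what generating those per-line docs might look like, here is some JavaScript that builds a Solr XML update message ("add" wrapping one "doc" per line). The function names and the shape of the input objects are assumptions for this example, and the field names would of course have to exist in one's own Solr schema.

```javascript
// Hypothetical helpers; field names follow the example doc above.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds one <doc> per line of timed text, with the playback URL as the id.
function lineToSolrDoc(line) {
  const url = "someMediaPlayer.php?source=" + encodeURIComponent(line.source) +
    "&begin=" + line.startTime + "&end=" + line.stopTime;
  const fields = {
    id: url,
    startTime: line.startTime,
    stopTime: line.stopTime,
    timedText: line.text,
    source: line.source
  };
  const xml = Object.keys(fields).map(function (name) {
    return '  <field name="' + name + '">' + escapeXml(fields[name]) + "</field>";
  }).join("\n");
  return "<doc>\n" + xml + "\n</doc>";
}

// Wraps all docs in the <add> element that Solr's XML update handler expects.
function linesToSolrAdd(lines) {
  return "<add>\n" + lines.map(lineToSolrDoc).join("\n") + "\n</add>";
}
```

Calling `linesToSolrAdd` with an array like `[{ source: "someFile.mp3", startTime: 10, stopTime: 30, text: "Hello world!" }]` yields a message that could be posted to Solr's update handler.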

I'll file the original post under the "thinking out loud yet poorly" category.

Written by nitin

October 16th, 2011 at 10:54 am

SAVS: a Simple Audio/Video Synchronizer

About a year ago I did some text to audio synchronization tests with HTML5 and Flash.

The tests were partially successful, but I think what really mattered is that I set four goals that I felt needed to be met before the word "synchronization" could truly be used:

  1. The user should be able to click on a line of text and hear the related media.
  2. The user should be able to "scrub" ahead on the media player and the text should follow.
  3. The page should report where in the document the user is.
  4. The page should automatically keep the media/text synchronized without user intervention.
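In code terms, the four goals boil down to a two-way mapping between a playback time and a line of text. Here is a minimal JavaScript sketch of that mapping; it is not the prototype's actual code, the function names are made up, and it assumes each line carries a start and stop time in seconds.

```javascript
// Illustrative only; names and data shape are assumptions.

// Goals 2, 3 and 4: given the player's current time, find the matching
// line and report where in the document the user is.
function syncState(lines, currentTime) {
  const index = lines.findIndex(function (line) {
    return currentTime >= line.start && currentTime < line.stop;
  });
  return {
    index: index,
    caption: index === -1 ? "" : lines[index].text,
    position: index === -1 ? "" : "line " + (index + 1) + " of " + lines.length
  };
}

// Goal 1: given a clicked line, find the time the player should seek to.
function seekTimeForLine(lines, index) {
  return lines[index].start;
}
```

The first function would run whenever the player reports a new time (normal playback or scrubbing), and the second whenever the user clicks a line, which is what makes it a two-way street.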

Basically, I've seen a few people make it so that you could watch media while the transcript text was also on the page (scrollable as opposed to overlaid closed captions) and the user could click on a line and have the movie/audio skip ahead to that moment (goal #1). That's great and all, but that's not synchronization.

;)

Synchronization is a two-way street and I've been working this past week during what I'm calling "4 days of madness" to come up with a really simple solution to real synchronization. I did run across this really cool RadioLab page that achieves goal #1, but as much as I like it I want more features with less flash (as in "flash and dash" not Adobe Flash!) and less code.

No mistake: it looks fantastic and I also appreciate that they've got the text timed to clusters of a couple of words rather than by line, but the only thing I've seen that gets it all "right" per my perspective was a subscription resource by Alexander St. Press. It achieved all the goals above using a Flash player and the rest appeared to be done with JavaScript and some jQuery smooth scrolling. It was also timed by clusters of words and not just by line or by paragraph. Of course, conceptually it's the same whether one marks up their text, in the temporal sense, by line or by word, but it's a little more work to do it by word.

Unfortunately, I've seen people do the opposite: they use a static unit of time like 60 seconds and only mark up the text every minute. That's taking the easy way out and also misses the point entirely since it makes the text subservient to an arbitrary unit of time. Would it be acceptable if closed captioning and subtitles on your foreign films only showed up in large chunks every minute? I would hope not, and in the case of the former it would violate the spirit if not the letter of the "law" in regard to accessibility. If done right, you can use the same timed text file both to serve up captions and to show the full text on the page. It's more time and cost efficient to re-purpose the same data for two needs.

Anyway, let's get back to Alexander St. Press. I loved what I saw when my boss (I work at NC Live) showed it to me. I got really excited and said something like, "This is what I've been waiting to see!". In addition to the great and true syncing, they also had a feature that would let the user make and share clips, much the way you can on sites like NBC's Meet The Press. The Alexander St. Press site also allowed you to annotate that clip, which is a great feature for teachers and librarians, etc. Alexander St. Press also has this with their classical music streaming subscription service, which in the spirit of full disclosure I pay for. They ALSO had a cool timeline where you could see what I call "hot spots" – places where others had made clips. The idea, I guess, is that spots on the timeline with more clusters would indicate a particular point of interest. Nothing new, because you see that all the time with streaming sports like the US Open's site where you can go back and watch previous moments in matches and then "go live" at any time. But the difference is, of course, that Alexander St. Press was using user-contributed clips.

So long story short (or just not as long), in a few weeks I need to present these ideas to some people and talk about how we think these features could be useful for our users. And the more I struggled with how to talk about these concepts without a prototype, the more I thought I would sound like a) I'm crazy and b) I'm full of hot air.

I decided that it was time to go back to some earlier tests of mine from early April and just build a prototype so we could just show it to people and not have to talk theoretical speak. I think it's generally easier to explain and convince people of the utility of software by showing it rather than telling it. Actions > words, right?

Well, early tests are working and only required me to add one line of ActionScript to our current Flash player, and only about 50 lines of JavaScript are needed to keep the text and media synced. The tests I did were for some PBS videos we purchased along with closed captioning files.

I was so excited that it was finally working that I went home during those "4 days of madness" to write an HTML5 version which is virtually identical to the Flash version. It's got basic clip making features as well as a very basic tool inspired by this video score tutorial to make timed text files provided you have the audio and full text in hand. Eventually, I'll comment the code up and improve some options and post a download to the source for the HTML5 version. At work, we'll probably eventually offer the code as it's tweaked to meet our aesthetic needs, etc. As you'll see in the demo video below, I have no aesthetics!

I'll shut up now and leave you to the video if you're interested. I recommend watching it in HD so you can read the words on the page.

As my friend whom the HTML5 version is kinda named after likes to say:

More later …

SAVS: a Simple Audio/Verse Synchronizer from nitin arora on Vimeo.

Update, September 20, 2011: To avoid confusion as to what this does, I'm renaming this from "Simple Audio/Video Synchronizer" to "Simple Audio/Verse Synchronizer" or something …

:)

Update, October 16, 2011: Cool, I found one more thing that meets all the four goals at http://www.dinglabs.com. They're pitching it as a foreign language learning tool, but same difference. Also, that site led me to TranscriberAG, a tool for transcribing audio.

Written by nitin

September 5th, 2011 at 9:39 am

VideoScores from the MuseScore gang

Here's a cool screencast from Thomas on how to create a VideoScore for musescore.com.

I like how easy it is to do the matching of audio to score … I wonder if there's something similar for audio to transcript matching.

Written by nitin

January 9th, 2011 at 11:11 am
