blog.humaneguitarist.org

discoveries in digital audio, music notation, and information encoding

Archive for the ‘MySQL’ tag

full-text searching of timed text and a farewell to Andy Roddick

leave a comment

It’s been a while since I had one of my “So, I’m home sick today and wrote this silly, little script” things.

Well, here’s another one while the antibiotics take root.

I’ve always wanted to do something with offering full-text search against timed-text files and allowing a user to click on a result and skip to the audio segment matching the returned line of timed-text, etc. Hulu has had a BETA version of this kind of thing for a while and I suspect others do too.

Well, today I just whipped up a little search API using PHP and MySQL. It’s a nice little start and super easy to do.

I made a database table using the timed-text data from my SAVS project, OpenOffice Calc, and phpMyAdmin. The text is from Shakepeare’s Sonnet 130 using a LibriVox recording (version #14, Miller). BTW, parsing DFXP or SRT files and throwing those into a table is easy, but it’s not within the scope of this little mock-up.

If I send a query for “rare love” to the API as such:

http://blog.humaneguitarist.org/uploads/SAVS/currentVersion/search/?q=rare%20love

… I get the following JSON response:

{
  "results":{
    "result":[
      {
        "text":"Than in the breath that from my mistress reeks.",
        "highlighted_text":"Than in the breath that from my <mark>mistress<\/mark> <mark>reeks<\/mark>.",
        "startTime":"34",
        "stopTime":"37",
        "source":"sonnet130_shakespeare_njm",
        "relevance":"4.04993200302124"
      },
      {
        "text":"My mistress, when she walks, treads on the ground:",
        "highlighted_text":"My <mark>mistress<\/mark>, when she walks, treads on the ground:",
        "startTime":"46",
        "stopTime":"49",
        "source":"sonnet130_shakespeare_njm",
        "relevance":"1.62977826595306"
      }
    ]
  }
}

Note that the text is returned in the “text” field and I’m also trying to return a “highlighted_text” field in which search terms are surrounded by the HTML5 “mark” tag. There’s also a relevance score … of sorts (pun!).

It needs a lot of work, but there’s enough data returned to launch an audio segment using some HTML5/JavaScript or some Flash or Silverlight API, etc. Hey, it ain’t too bad for a bad stomach and some sports-entertainment distractions.

Below, I’ll paste the CSV file I used to make the table, the PHP script … and a personal note about the best male American tennis professional of the last decade.

Here’s the CSV file from the spreadsheet application (note the “line_text” field is full-text indexed in the database):

"line_id";"line_text";"start_time";"stop_time";"file_prefix"
"1";"Coral is far more red than her lips' red:";"13";"17";"sonnet130_shakespeare_njm"
"2";"If snow be white, why then her breasts are dun;";"17";"21";"sonnet130_shakespeare_njm"
"3";"If hairs be wires, black wires grow on her head.";"21";"26";"sonnet130_shakespeare_njm"
"4";"I have seen roses damask'd, red and white,";"26";"29";"sonnet130_shakespeare_njm"
"5";"But no such roses see I in her cheeks;";"29";"32";"sonnet130_shakespeare_njm"
"6";"And in some perfumes is there more delight";"32";"34";"sonnet130_shakespeare_njm"
"7";"Than in the breath that from my mistress reeks.";"34";"37";"sonnet130_shakespeare_njm"
"8";"I love to hear her speak, yet well I know";"37";"40";"sonnet130_shakespeare_njm"
"9";"That music hath a far more pleasing sound:";"40";"43";"sonnet130_shakespeare_njm"
"10";"I grant I never saw a goddess go, --";"43";"46";"sonnet130_shakespeare_njm"
"11";"My mistress, when she walks, treads on the ground:";"46";"49";"sonnet130_shakespeare_njm"
"12";"And yet, by heaven, I think my love as rare";"49";"54";"sonnet130_shakespeare_njm"
"13";"As any she belied with false compare.";"54";"56";"sonnet130_shakespeare_njm"

Here’s the PHP script:

<?php
//GET search words from URL parameter
$searchWords = trim($_GET["q"]);

//prepare for highlighting keywords
$search_array= explode(" ", $searchWords);

//prepare for output
$output = array();

//connect to database
include_once("db_setup.php");

//run query
$searchWords = mysql_real_escape_string($searchWords);
$query = "SELECT *, MATCH(line_text) AGAINST(\"$searchWords\") AS relevance
FROM $table WHERE MATCH(line_text) AGAINST(\"$searchWords\" IN BOOLEAN mode)
ORDER BY relevance DESC";
$result = mysql_query($query);

if($result) {
    while($row = mysql_fetch_array($result)) {
      $line_text = $row["line_text"];
      $start_time = $row["start_time"];
      $stop_time = $row["stop_time"];
      $file_prefix = $row["file_prefix"];
      $relevance = $row["relevance"];

      //highlight seach words in line_text
      $highlighted_text = $line_text;
      foreach ($search_array as $word) {
        $highlighted_text = str_ireplace($word, "<mark>$word</mark>", $highlighted_text);
      }

      $this_output = array("text" => htmlspecialchars($line_text),
      "highlighted_text" => htmlspecialchars($highlighted_text),
      "startTime" => $start_time,
      "stopTime" => $stop_time,
      "source" => $file_prefix,
      "relevance" => $relevance);
      array_push($output, $this_output);
    }
}

//send JSON results
if (count($output) == 0) {
  $results = array("results" => "No results.");
}

else {
  $result = array("result" => $output);
  $results = array("results" => $result);
}

$response = json_encode($results);
include_once("indent_json.php");
header("Content-type: application/json; charset=UTF-8");
echo(indent_json($response));
?>

And here’s something more important.

As a huge tennis fan, today was a melancholy one for me as Andy Roddick played his last match, having just lost a few moments ago to Juan Martin del Potro. The Wikipedia article on Roddick here already lists him as retired but the important thing to remember about Roddick is that he achieved more with less than a lot of other players with more talent and was entertaining to watch, win or loose, in big matches.

Thanks for the memories!

--------------

Related Content:

Written by nitin

September 5th, 2012 at 6:17 pm

Go Daddy and database connection problems

leave a comment

It's funny that GoDaddy.com uses a race car driver as their spokesperson, given that their service doesn't always "go".

Anyway, I created a new MySQL db in my Go Daddy account and I had all kinds of connectivity errors. Sometimes my PHP script would retrieve the data I wanted from the db and other times a connection just couldn't be established.

This seemed weird since my WordPress blog is using a MySQL db hosted by Go Daddy and I don't have any problems with connectivity and my blog.

Customer service was, of course, a waste of time. I got the generic, stock answers about verifying my PHP connection string … but if it connects sometimes but not all the times isn't is clear the issue isn't with my connection string? Worthless.

So what I did was cheat …

Since Go Daddy does make it easy to install WordPress or various other PHP/MySQL applications, I just installed one (OpenDB) that I didn't intend to use. In other words, I made Go Daddy create the db.

I then just deleted all the files on my server related to OpenDB.

Then, I used phpMyAdmin to drop all the tables in the db created for OpenDB as described here.

After that, I imported my tables and it seems that this workaround will allow me to access a MySQL db hosted by Go Daddy without PHP connection problems.

I wasted so much time on the weekend over this issue that I hope this post helps others deal with the issue if they have the same problem.

--------------

Related Content:

Written by nitin

June 26th, 2011 at 10:56 am

Posted in news,technophilia

Tagged with , ,

MusicSQL: initial thoughts

one comment

One of the nice things about an emerging standard, namely MusicXML, having a command center (Recordare LLC) is having a central place to learn about what’s new.

On Friday, I was looking at Recordare’s page of MusicXML related software for software that worked from the command line and noticed something new and really interesting: MusicSQL.

According the the Goodle Code page that hosts this project, MusicSQL is:

… a system for conducting complex searches of symbolic music databases. The database can import and export MusicXML files. In the current version searches are constructed using a command line interface or through simple Python scripting tools.

Basically, at least as I understand it, MusicSQL is a Python program that sits on top of a MySQL database – now I really hope Oracle doesn’t kill MySQL if it buys Sun.

I was so excited to get MusicSQL working that I didn’t notate all the little problems I had along the way. The documentation for MusicSQL is very good and is written for Windows, Mac, and Linux (Ubuntu) users. But I’m inconceivably impatient, so I just mowed through the installation with little care for remembering what I was doing.

I do remember that I had to install Python 2.5, whereas I already have Python 2.6 installed – now I have both. I put/installed all the dependencies in my Python 2.5 directory just to compartmentalized everything – the exception being MySQL, which I installed wherever the default is.

So far, I only ran the first query in the documentation that uses "scientific" musical notation in the form Nx, where "N" is the alphabetical note name, say C, and "x" is an integer that denotes what octave the note is a member of. In other words, a C-Major scale would be "Cx Dx Ex Fx Gx Ax Bx Cx+1", something like "C5 D5 … B5 C6", etc. You can place an integer before the note name to denote its duration.

Running the query from the command line, I was really happy with the speed and the output of MusicSQL for the test query.

One problem I did have, though, is I kept getting errors for another great feature of MusicSQL. Basically, after you run your query, you can see a PDF of the results (i.e. the music excerpt pertaining to the query results). The PDF is made by Lilypond, a text-based notation software that produces – in my opinion – the absolute best looking engraving out there, that’s why I use it (and yes, it’s free).

Now Lilypond doesn’t natively read MusicXML, it uses its own encoding. So MusicSQL takes advantage of a Python script that comes with the Lilypond install called "xml2ly" that converts MusicXML to Lilypond format. I left a message on the project forum for MusicSQL, so I’m hoping I can figure out what I need to do to get the Lilypond outout of the query results to work. At any rate, I do wonder how effective it can be since the conversion from MusicXML to Lilypond can sometimes get ugly.

I wonder if an alternative solution is to use the command line options for the MuseScore notation software to generate a PDF of the query results. Musescore can also convert MusicXML to other graphics formats (PNG) and even audio (WAV, FLAC, OGG), so theoretically it could be leveraged to make audio files for the corresponding query results.

At any rate, I’m really looking forward to the future developments of MusicSQL.

And as for using MuseScore’s command line in conjunction with MusicXML and how it can add value to a web collection of MusicXML docs – there will be more to that later …

--------------

Related Content:

Written by nitin

November 15th, 2009 at 3:54 pm

Switch to our mobile site