blog.humaneguitarist.org

discoveries in digital audio, music notation, and information encoding

Archive for the ‘GFA’ tag

keyword vs. phrase searching of the Soundboard, a GFA publication


As I mentioned before, last summer I went to the Guitar Foundation of America convention in Charleston.

I also mentioned that I'd asked some questions about whether the GFA journal, "Soundboard" was full-text indexed.

Via the FlippingBook software the GFA uses to display current issues online (membership required), there is full-text searching capability because the content is indexed as far as I can tell. But as I was saying, I don't think one can search across *all* online Soundboards simultaneously – i.e. fire off one query and get results across all online Soundboards. I could be wrong about that.

In contrast, the PDF back issues sold on a DVD-ROM are not full-text indexed nor full-text searchable with Adobe Acrobat Reader as far as I can tell. And I think this is where there's real confusion – perhaps on my part – about what we mean when we use terms like "keyword" searching.

To me, keyword searching means full-text and not a "find" (as in Acrobat Reader). The Webopedia site differentiates these as "keyword" and "phrase" searches, respectively. The GFA is using a different meaning, per the "How to search Soundboard back issues.pdf" file that comes with the DVD, for "keyword" searching:

"These issues have been processed both to reproduce the page-by-page appearance of the originals on your computer screen, and to apply an "optical character recognition" (OCR) process to the text, so that every page of every issue is now keyword searchable."

In my experience, however, the search provided internally via Adobe Acrobat Reader (and Foxit Reader, too) is what I'd just call a "find" (i.e. the same as Ctrl-F on your browser). In fact, in my version of Acrobat Reader and per the screenshot in the "How to search Soundboard back issues.pdf" file, Adobe also uses the phrase "find" and not "search" in their application. Their "Advanced Search" adds options really dealing with what to search (comments, all files in a folder, etc.) but not really how to search (in the algorithmic sense) – so, it's still a "find", though more feature-rich. Now, if you have Acrobat Pro (admittedly I do through work) you apparently can create an index and then actually do a full-text search, but that doesn't help people who don't have the pro version and won't/can't buy it.

Granted, I can index the PDF with my operating system (Windows) and do a full-text search, but I don't really get much useful information other than what files match. I don't get useful information on where the passage exists (page number, etc).

Consider the following passage from Soundboard Volume 1, Number 1, 1974:

"Mr. Llois Mauerhofer, Elizabethstrasse 93, 8010 Graz, Lustria, was reported working on a doctoral dissertation at the University of Graz on Leonard von Call, early 19th c. guitarist active in Vienna who is best remembered for his serenades for guitar and strings."

A "find" won't match that passage if you search for "Graz University" or "University Graz" or "strings Vienna" but a real keyword search likely would.
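To make the difference concrete, here is a minimal sketch in Python. It is not how Acrobat Reader or my database works internally; the function names `find` and `keyword_search` are just labels I'm using here. A "find" requires the query to appear as one contiguous string, while this simple keyword search only requires that each query word appear somewhere in the passage.

```python
import re

# The passage quoted above, reproduced as it appears in the scan.
passage = (
    "Mr. Llois Mauerhofer, Elizabethstrasse 93, 8010 Graz, Lustria, was "
    "reported working on a doctoral dissertation at the University of Graz "
    "on Leonard von Call, early 19th c. guitarist active in Vienna who is "
    "best remembered for his serenades for guitar and strings."
)

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"\w+", text.lower())

def find(query, text):
    """Phrase-style "find": the query must occur as one contiguous string."""
    return query.lower() in text.lower()

def keyword_search(query, text):
    """Keyword-style search: every query word must occur somewhere in the text."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))

for query in ("Graz University", "University Graz", "strings Vienna"):
    print(query, "| find:", find(query, passage),
          "| keyword:", keyword_search(query, passage))
```

All three queries fail the "find" test because the words never sit next to each other in that order, but they pass the keyword test because each word is present. A real full-text engine does more than this (ranking, stemming and so on), so treat it as an illustration of the idea only.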

Of course, a demonstration is in order, so using a tool called Apache Tika to extract the text from the aforementioned PDF scan of Soundboard v.1, #1, 1974; a little Python software script I wrote to output the data to a database-friendly file; and an online database, I indexed the data and made a little API – all that means is that there's a page you can go to, throw some search terms at it, and get the results back as structured data (um, usually not fun to read through).

By the way, I normally use more technical jargon in my posts but I have some guitarist buddies who I want to read this page.

Anyway, here are the three searches mentioned above that don't yield results in Acrobat Reader but do using a full-text search (you can see the search terms in bold in the links below). Don't worry if you can't read the output, just focus on the fact that something comes back (provided my database isn't down at the moment!).

http://blog.humaneguitarist.org/uploads/Soundboard/currentVersion/search/?q=Graz+University
http://blog.humaneguitarist.org/uploads/Soundboard/currentVersion/search/?q=University+Graz
http://blog.humaneguitarist.org/uploads/Soundboard/currentVersion/search/?q=strings+Vienna
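
For the geeks: the links above are just the search terms passed in a "q" parameter, with spaces encoded as plus signs. A small Python sketch of building such a link follows. The helper name `build_search_url` is made up for this example, and since I'm not documenting the response format here, the sketch only builds the URL and doesn't fetch or parse anything.

```python
from urllib.parse import urlencode

# Base URL taken from the links above.
BASE_URL = "http://blog.humaneguitarist.org/uploads/Soundboard/currentVersion/search/"

def build_search_url(terms):
    """Return the search URL for a string of space-separated search terms."""
    # urlencode() turns spaces into "+" and escapes anything unsafe in a URL.
    return BASE_URL + "?" + urlencode({"q": terms})

for terms in ("Graz University", "University Graz", "strings Vienna"):
    print(build_search_url(terms))
```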

For a more user-friendly version, try going here:

http://blog.humaneguitarist.org/uploads/Soundboard/currentVersion/soundboard_search.html

Try typing in the three searches mentioned above. Then try some more searches for fun. For simplicity's sake, I hard-coded the system to never return more than 10 results.

Of course, this should all scale to indexing the text of all the PDFs on the DVD, but exposing those openly on the web wouldn't be appropriate.

But my point with this demo is to say that this is more like what I meant by "keyword" searching at the GFA convention. There's probably a way to ingest the old PDFs into the FlippingBook software or at least something else like the Internet Archive book reader. That would probably require re-OCRing the images so that the coordinates of the words could be indexed as well, allowing one to see where on a page the results are, just as with the current issues via FlippingBook.

Ok, if you're still here and are a geek, here's the Python script, "soundboardToTabDelimited.py".

'''
usage example:
  $ python soundboardToTabDelimited.py V01-n1-1974.pdf

This yields "V01-n1-1974.xhtml" and then "V01-n1-1974.txt"
 
Note: you must have the lxml module installed (which isn't always fun).
You can get it here: http://lxml.de/
'''

import codecs, subprocess, sys
from lxml import etree

##### globals
tab = "\t"
br = "\n"


##### run Apache Tika on the file passed via the command line
soundboard = sys.argv[1].replace(".pdf", "")
command_string = "java -jar tika-app-1.2.jar %s > %s" %(soundboard + ".pdf", soundboard + ".xhtml")
command = subprocess.Popen(command_string, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
command.communicate() #wait until the subprocess finishes; unlike wait(), this also drains the pipes so a chatty Tika run can't block.


##### write file headers (this needs to be deleted if you're going to later import the file via PHPMyAdmin).
tab_delimited = codecs.open(soundboard + ".txt", "w", "utf-8") #output file

tab_delimited.write("journal_id" + tab + "volume" + tab + \
                    "issue" + tab + "year" + tab + \
                    "page_id" + tab + "text_id" + tab + "text" + br)


##### extract volume, issue, year from filename
volume = int(soundboard.split("-")[0].replace("V", ""))
issue = int(soundboard.split("-")[1].replace("n", ""))
year = int(soundboard.split("-")[2])
journal_id = "%04d_%04d_%04d" %(volume, issue, year)


##### parse xhtml file
soundboard_parse = etree.parse(soundboard + ".xhtml")
root = soundboard_parse.xpath(".")

div_tags = root[0].xpath("//xhtml:div[@class='page']",
             namespaces={"xhtml":"http://www.w3.org/1999/xhtml"})


##### extract text from each div/p tag and write data to file
page_id = 1
for div_tag in div_tags:
  text_id = 0
  p_tags = div_tag.xpath("xhtml:p",
             namespaces={"xhtml":"http://www.w3.org/1999/xhtml"})

  for p_tag in p_tags:
    p_text = p_tag.text
    if p_text is not None and p_text != "":
      p_text = p_text.replace(br, "")
      p_text = p_text.replace(tab, "  ")
      p_text = p_text.strip()
      if p_text != "":
        tab_delimited.write(str(journal_id) + tab + str(volume) + tab + \
                            str(issue) + tab + str(year) + tab + \
                            str(page_id) + tab + str(text_id) + \
                            tab + p_text + br)
        text_id = text_id + 1
     
  page_id = page_id + 1

tab_delimited.close()
# fin
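
As a rough illustration of why the page_id and text_id columns are worth writing out, here is a sketch that searches rows in the script's tab-delimited format and reports where each hit lives. This is not the online database or the API from the demo; it's a plain linear scan. The sample rows, including the page_id and text_id values and the second row's text, are invented for illustration, and the `search` function and its `limit` argument (mirroring the 10-result cap mentioned above) are names I made up for this example.

```python
import csv
import io
import re

# Stand-in for the script's output file, header row included.
# The page_id/text_id values and the second row's text are invented.
SAMPLE = (
    "journal_id\tvolume\tissue\tyear\tpage_id\ttext_id\ttext\n"
    "0001_0001_1974\t1\t1\t1974\t3\t0\t"
    "Mr. Llois Mauerhofer, Elizabethstrasse 93, 8010 Graz, Lustria, was "
    "reported working on a doctoral dissertation at the University of Graz "
    "on Leonard von Call, early 19th c. guitarist active in Vienna who is "
    "best remembered for his serenades for guitar and strings.\n"
    "0001_0001_1974\t1\t1\t1974\t4\t0\tPlaceholder paragraph with no matching words.\n"
)

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"\w+", text.lower())

def search(rows, query, limit=10):
    """Return (journal_id, page_id, text_id) for rows containing every query word."""
    terms = tokenize(query)
    hits = []
    for row in rows:
        words = set(tokenize(row["text"]))
        if all(term in words for term in terms):
            hits.append((row["journal_id"], int(row["page_id"]), int(row["text_id"])))
        if len(hits) == limit:
            break
    return hits

# QUOTE_NONE keeps any quotation marks in the journal text from confusing the parser.
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t", quoting=csv.QUOTE_NONE))
print(search(rows, "Graz University"))
print(search(rows, "strings Vienna"))
```

Because each row carries its journal_id, the same loop would work over rows from many issues at once, which is the "one query across all Soundboards" idea.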
--------------


Written by nitin

January 5th, 2013 at 12:35 pm

the GFA 2012, Charleston, and me


Last week I went back home in a couple of respects.

I went to Charleston, South Carolina to attend the 2012 Guitar Foundation of America International Convention and Competition.

I was born and reared in Columbia, SC but at the end of 2001 – after a brief stint near San Francisco with my cousin – I moved to Charleston.

Charleston's my home.

I lived there for almost 8 years, and in many respects I'm still there, albeit not physically. My greatest friendships were formed there and I loved living there. Unfortunately, at the time I was there the combination of my education and the job market made me move on. For now. I hope there's some way I can get back to Charleston one day. But it was great to visit again, if only for a week.

As for my other home, Music, the trip was really great, too. I met some new and interesting people, heard some great playing, and met up with some old friends.

I may work in libraries, but I developed a kinship for libraries only in the sense that they provided me a way to study music.

Anyway, in the lines below I'll outline what I did each day as those activities relate to Charleston and the GFA convention.

Tuesday, June 26

Got into town about an hour and a half late – I missed the I-26 exit because I turned my cell phone off because the GPS/Google Navigation makes it overheat. That, and I'm directionally challenged. Maybe it's time to ditch the smartphone and just buy the right tool: a standalone GPS unit for the car.

Checked into the King Charles Inn. Met Dave and chatted about Charleston before he helped take my luggage. Dave lives on a boat.

Went to dinner at Sermet's. Interior is totally different. Talked to Sean the bartender. Sean's an artist. He gave me his card; I passed it on to a friend of mine I met for coffee later in the week.

Late night drink at Burns Alley.

Later night drink at … well I can't say.

Wednesday, June 27

Went home-home. Visited my former apartment on 4 John St. Luckily, my landlord, Mr. James was outside. We talked on the upstairs patio for a couple of hours – about 4 John, his travels abroad, my health, the Chucktown home market, the neighborhood association, the public library, and lots of other things.

Lunch at good old La Hacienda. The waiter who used to come into the library and ask me for things to help his English is no longer there. Hope he's well wherever he is.

Dinner at El Cortile del Re. Never ate there when I lived there, so I stopped in. Chatted with Terry, my bartender. We talked about how nice it is to step out in the middle of the night sometimes in Charleston and just walk around all the old buildings and by the ocean. Peace.

At some point we, the owner, and one of the servers started talking about old Saturday Night Live skits, Ginger vs. Mary Ann, Daphne vs. Velma, et al.

We couldn't remember the names of the girls in Scooby-Doo and the owner thought it was funny that all the guys went to their cell phones to look it up while she didn't have a smartphone. We talked about how before cell phones the information wouldn't even have been important enough for us to even bother with researching. She also pointed out the one bathroom at the inn across the street and that the blinds were almost never closed and how easy it was to look inside – whether you'd want to or not.

Dessert at Belgian Gelato on King. This place is new and new to me. I like it.

Thursday, June 28

Met my friend Christina for coffee at Kudu. Talked about old times and got updates on friends. Talked about what we dubbed "short films": the idea that people often try to make "features", i.e. try to do more than they are capable of, and fail, while there are people out there making "shorts" – things of quality that actually affect people though they might seem less massive. Ultimately, nobody cares about how ambitious you are – or think you are – if you just produce junk. Quality matters. Precision matters. Aesthetics matter. Size doesn't. At least not in this context.

Talked about the lack of female plays/playwrights in Charleston. Gave her Sean's card.

Ah. Charleston summers … and Wimbledon. Watched Nadal get beaten by Rosol. If Nadal has a problem with Rosol's return motion, maybe he should stop delaying the server – which isn't only distracting, it's against the rules.

Lunch at Gilroy's.

Went to the Simmons Arts Center to get my GFA name tag, etc. and attend a lecture. Couldn't find registration desk. Had to ask around. Learned it was intermixed with the vendor booths. No sign that screamed at me and said "This is the registration desk." Strike one.

Went to room 309 per the GFA website to attend Alexander Dunn's "Beethoven Songs with the Guitar" lecture. Walked in on the Sergio Assad masterclass. Went back to registration. Learned that the printed program said the lecture was in room 101. Print > web; data mismanagement. Strike two.

Missed the first couple minutes of Dunn's presentation because of the communications problem. Great presentation. Particularly found the comments about the expertise of the Diabelli arrangements to be of interest. Enjoyed the songs that Dunn played with a singer. Would have preferred to hear on period instrument. Talked to Alexander about this later in the week, sounds like such a recording might be in the works down the road.

Skipped the Kavanaugh concert. Program consisted of student repertoire and her own pieces scattered throughout the program. Not what I go to the GFA for. Strike two and a half. Sorry.

Dinner at Bocci's with friends Jim and Karen. Jim had a presentation the next morning.

Dessert with Jim and Karen at Belgian Gelato on King. Used the coupons I got the night before.

Late drink at Burns Alley – hey, only the locals know about it and how to find it. I think that's the point.

Friday, June 29

Went to the GFA Round Table. Introduced by Executive Director Galen Wixson. Hosted by Tom Heck.

A few comments made about the online Soundboards (SB). Don't think my question about whether they are full-text indexed was understood. So I dropped it.

I was particularly asking about whether I could search across all SB's simultaneously online. While their ebook software (http://flippingbook.com) is indexing each SB and I can search full-text across an issue, I still don't see a way to search across all the ones that are available through the GFA website. Same for the back issues in PDF. Sure, I can probably index them with my operating system's built-in indexer, but we need to do more: index them and make that a web service. Or maybe I'm missing something.

And now why I came: presentation by Robert Coldwell of the Digital Guitar Archive (DGA). Good overview of Robert's work re: harvesting metadata from across diverse online digital score archives. "Metadata" never uttered once, of course. These are guitarists, not librarians.

Conversation starts about using iPads to view sheet music. I commented that it's interesting that all the talk is about co-opting a general purpose device (the iPad) when maybe it's simply insufficient for music. Mentioned something like the original Microsoft Surface is perhaps better. Maybe the stand and the tablet are one. Someone had previously mentioned wanting a digital score viewer that automatically turned the page as it listened to the player. I felt he and I were talking about what needs to happen whereas some others were talking about simply accepting what exists and trying to make it fit. Reminds me of what my former professor at South Carolina, Mr. Berg, always said about sitting: "…bring the guitar to you, don't bring yourself to the instrument (paraphrase)".

In light of some of the things I heard here and the website problems, I've got the feeling I might be able to make some contributions to the membership just in raising awareness about things like digital symbolic music representation, music metadata, etc. As of this post the winners are still not listed on the GFA Drupal site itself, but rather on the WordPress blog; the Drupal site is still presenting information about registration!

My friend Jim Buckland presented on the Guadagnini family of luthiers. Lots of new information for the attendees to absorb and some shattered stereotypes (people gasped audibly when they learned Giuliani's Fabricatore has a scale length that exceeds current convention). Jim's research is critically important and I keep pestering him to formally document as much as he can. After the allotted hour, people were asked if they wanted to hear more. Almost everyone stayed for about another hour. I helped Jim by advancing the PowerPoint slides so he could focus on speaking and answering questions.

Chatted with Alexander Dunn a bit. Met Jim McCutcheon. Took a few pictures for him, was in a few pictures with him.

Few of us agreed to have lunch. I walked with Robert and we discussed geek stuff like his SQL schema, his preferred scripting language choices for the DGA, and whether he'll eventually offer up an API. He later said this was the only technical conversation he's ever had at one of these conventions. I'll definitely be talking more to Robert and I hope there's something we can work on together down the road.

Lunch at Mellow Mushroom with Robert, Tom, the two Jims, and Karen. Had an Abita Purple Haze on draft. Saw beginnings of the Federer/Benneteau epic on the television. Took a nap in the hotel. Woke up to Fed taking the last set. Whew.

Quick dinner with Jim, Karen, Geoff Ferdon of Alhambra Guitars and Patrick, another South Carolina grad at Coast Bar and Grill.

Friday concert was the legendary Assad Brothers. Practically got a standing ovation for walking onto the stage. Second half by far stronger than the first. Amazing how different their techniques are given how seamlessly they merge their sounds.

Solo pieces by Sergio, played by Odair, were probably best thing on the program. Unlike some others, Sergio's music is on par with anything else they could play so it's OK if his music is played throughout the concert. Less established composers need to segment their stuff off to the end of the program in my opinion for two reasons: so that people can leave (sorry) and so one doesn't try to "boost" the value of their music by sandwiching it between established works.

Sitting behind me was an old friend, Rick Zender from the College of Charleston. That was a happy coincidence. It was great to talk with him again. I'd known him through my event programming activities when I worked at the Charleston County Public Library.

Dessert at Kaminsky's. Got my favorite: the Cuban coffee and some cake as well just for fun. Walked around the Market a little bit while waiting for a seat.

Saturday, June 30

Attended Christiano Porqueddu concert. All Gilardino, all the time. Given how many, um, flaky concert programs I saw this week I went to this largely in support of Porqueddu's commitment to serious programming. This is an international guitar festival after all. Let's be serious here.

The Gilardino I've heard and looked at is as hard to listen to as to play, but I still respect it. Porqueddu's playing was very good. He was in fact the only person I heard at the Sottile Theater (the default concert venue) that didn't use a microphone. Not that I associate that with good playing; rather it's just an observation.

By the way, if people are using mics, then I'm thinking the Sottile isn't the right venue. At the 1999 convention in Charleston, I remember hearing concerts at the nearby church (which one?, so many) and the acoustics and ambiance were superior to the Sottile (which is dealing with some internal renovation work, too).

Porqueddu said not one word during the concert. And I think he should re-think that approach. Especially when you are playing music many people have never even heard or heard of. Lighten things up, talk a little and guide us through what you're playing and why it means something to you. Tuning problems same as on some of his CDs. This could be due to a lot of reasons and is more likely due to a preference for certain intervals than it is due to any musical/hearing issues, etc. But it's still distracting here and there.

Walked around town with Jim and Karen. Hollywood Video on Calhoun/Alexander is gone and the Starbucks (where we went) seems run down. It used to feel so new. Walked by the library and saw an old co-worker coming out. Chatted for a few seconds.

Went to the Charleston Museum: they had guitars made by Martin Co. One had a Stauffer/Legnani headstock. Totally ruined by trying to make it a steel string after the fact. Major structural damage. Sold by Siegling Music House of Charleston. I've been saying for years that there are probably 19th century guitars lying around in Charleston here and there. Never saw guitars in the museum before. And I used to live only 3 doors down. I'll be researching Siegling …

Quick dinner at Nick's BBQ on King. Another place that's new to me.

Went to the Roland Dyens concert. Opened with an improvised piece. Played some Tchaikovsky and some Chopin amongst his own works. I had to use the restroom during the intermission and missed the first set of the second half.

Unfortunately, they had cranked the microphone up between the 1st and 2nd half.

Bad decision. Now, I'm not having to actively listen, now the music's being shoved in my face and the mid range is so muddy it's working against this man's fine playing. Why are we even using mics in classical guitar concerts? I hope this wasn't Dyens' idea. I'd prefer to blame the audience or the GFA for catering to the lowest common denominator. I need to start an anti-mic campaign for the next convention. The players don't need it and neither do responsible listeners.

Anyway, I enjoyed the Dyens though I would have preferred a more structured program. It felt a little too casual and light to me for a final concert.

Went out by myself to a few new bars whose names I can't recall. Thanks to Kristin and Sonny for doing shots at the bar after they pretty much closed for the night. Good luck to Sonny as he pursues his chemistry degree.

Sunday, July 1

Went to Clinton, SC for the day. Hung out with Jim, Karen, and their cat Bria.

Dinner at local Mexican restaurant. Unfortunately, our standard El Jalisco is closed on Sundays but this other place was good. Waiter was super happy that Spain trounced Italy in the football match.

Heard some of the new Maccari-Pugliese CD: "Music for Guitar & Piano". Speechless. I love these dudes and their playing. They need to be playing at a future GFA. They are too good. Plus, they're crazy fun to hang out with!

Watched "Tenure" with Luke Wilson – interesting to watch with two professors.

Headed to my parents' home in Columbia for a doctor's appointment the next day.

… and "life", as it were, resumes though I feel refreshed and rejuvenated.

Final Thoughts

I need to go to Charleston more:

  • it's only 4.5 hours away.
  • I miss it.

The GFA needs to do more:

  • better coordination between the data on the website, their blog, and their printed documents. i.e. better organization and communication before, during, and after the convention.
  • maybe form a task force to get feedback from members in a structured manner to try and make things even better next year.
  • consider having TV's in the waiting area of the halls with live feeds of the concerts for people who are too sick to sit comfortably inside the concert hall, etc.
  • consider social media: tweets, SMS, etc. for rapid updates during the convention and for on-the-fly meet ups and events.
  • work harder to promote scholarly activity. There were only like 2 scholarly presentations. Seriously? Come on folks. Rumor does have it that a scholarly, peer-reviewed supplement to the SB is being discussed. This needs to happen.
  • consider "tracks". i.e. every day should have a "performance" track about technique, etc., a "scholarly" track about research,  a "guitar and technology" track, etc, etc. An attendee should have options in each track throughout each day. Let's be more methodical, more organized, more comprehensive, and more deliberate with our gatherings.
  • ban microphones. If the player needs it, don't invite them. If the hall requires it, don't use it.
  • summer dates in places as hot as Charleston might not be the best idea.
--------------


Written by nitin

July 4th, 2012 at 11:40 am

Posted in music,news


when worlds collide: guitar meet online library


Thomas Heck's "Expanding the GFA's Online Resources" from a recent issue (Vol. 36, no. 3) of the Guitar Foundation of America's Soundboard listed some cool online resources that I know of and use and some that were new to me.

They all appear to be listed on the GFA page here: http://www.guitarfoundation.org/drupal/node/4112.

The coolest of all is the Boije Collection which I backed up to my local drive and all my backup areas as soon as I learned about it a few years ago.

I will never in my life learn a score by reading it off a screen, but I can print scores out from this collection and work on them. Given that it's mostly a 19th century music collection, it fits perfectly with the type of repertoire I work on. Thankfully, the scores are hand-written, if not facsimiles, as I really hate working on scores produced by machines. They aren't beautiful and don't look alive even though the music itself is. Also, they make learning the music harder for me and less fun. I could go on …

--------------


Written by nitin

December 19th, 2010 at 9:27 am

Posted in music,music notation

