AudioRegent 1.3.1 (Beta)

Table of Contents

Running AudioRegent

Changing Settings: AudioRegent.xml

How It Works

AudioRegent seeks to provide a simple yet effective way to automate the non-destructive creation of derivative audio files from master WAV files by means of an easy-to-use audio decision list.

AudioRegent utilizes:

  1. SoX (Sound eXchange), a command-line audio editing tool.

  2. SimpleADL, an XML-based audio decision list developed in conjunction with AudioRegent.

AudioRegent and SimpleADL are intended to be used by advanced users conversant in XML and digital audio technologies and terminologies.

AudioRegent is licensed under the MIT software license.


AudioRegent has been tested on 32-bit versions of Windows XP (SP2, SP3), Windows 7, and Xubuntu 9.10.

It has been tested using Python versions 2.5 and 2.6.

To install the program, download

    Unzip the file and place the root AudioRegent folder wherever you like on your system.

    Install SoX on your system if you don't already have it.

    • Make sure that the sox command can be executed from within the AudioRegent folder, for example by adding the SoX installation folder to your PATH.

    Lastly, you need to download and install Python version 2.5 or 2.6 if you don't already have it.

    • To date, AudioRegent has not been tested with Python versions 2.7 and 3.0.

    Running AudioRegent

    To run the default interface for AudioRegent do:

    $ python

    To see the available command-line options do:

    $ python --help

    Changing Settings: AudioRegent.xml

    Using a plain text editor, you can change some of AudioRegent's behavior by editing the element values in the AudioRegent.xml file.

    AudioRegent ignores the attribute values in AudioRegent.xml, but you might want to leave them intact as a record of the default values.

    Here are the default values:

      <outputType default="ogg">ogg</outputType>
      <SoxOptions default="gain -n -3">gain -n -3</SoxOptions>
      <comment default=""/>
      <delete_outWavs default="true">true</delete_outWavs>
      <timestampLogFiles default="false">false</timestampLogFiles>
    1. For <outputType>, choose one of the following lowercase values: wav, aif, flac, or ogg. This is the format of the final audio files placed in the outOggs folder.

      • If you are wondering why mp3 isn't an option, read the SoX format documentation (see "mp3"). In short, the default build of SoX can't create MP3 files because of licensing concerns. If you need MP3 output, you can either build SoX from source with MP3 encoding support or output lossless files and convert them to MP3 later with a third-party application.

    2. For <SoxOptions>, enter your preferred *valid* SoX effects for the derivative audio files. These effects will be applied to the final audio files in the outOggs folder. AudioRegent itself will not fail if the string is invalid, but SoX will, and your audio files will not be created properly.

    3. For <comment>, enter your preferred comment 'tag' or leave the element empty. If you specify a comment string, it will likely show up as embedded metadata in OGG, FLAC, and AIF files when one of these formats is chosen as the <outputType>.

    4. For <delete_outWavs>, use "true" if you want the *entire* outWavs folder emptied automatically. Use "false" if you want to leave the files in this folder intact. Be warned: with "false", *every* WAV file left in the outWavs folder will get a derivative in the outOggs folder, i.e. if there are pre-existing WAV files there, AudioRegent will make derivatives of them too.

    5. For <timestampLogFiles>, use "true" if you want the log filenames to be timestamped. Use "false" if you don't; in that case, the previous log files are overwritten every time you run AudioRegent.
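    To make the role of each element concrete, here is a minimal Python 3 sketch (not AudioRegent's own code) that reads only the element values from AudioRegent.xml, ignoring the attributes as described above, and assembles a SoX argument list from them. The root element name of AudioRegent.xml isn't shown here, so the sketch searches for each setting at any depth. The function names are hypothetical, and passing the comment through SoX's --comment output-file option is an assumption about how such a tag could be embedded.

```python
import shlex
import xml.etree.ElementTree as ET

# Default element values, as listed above.
DEFAULTS = {
    "outputType": "ogg",
    "SoxOptions": "gain -n -3",
    "comment": "",
    "delete_outWavs": "true",
    "timestampLogFiles": "false",
}
VALID_OUTPUT_TYPES = ("wav", "aif", "flac", "ogg")
BOOLEAN_SETTINGS = ("delete_outWavs", "timestampLogFiles")


def read_settings(xml_text):
    """Read settings from element values only; attributes are ignored."""
    root = ET.fromstring(xml_text)
    settings = {}
    for tag, default in DEFAULTS.items():
        # Assumption: the root element name is unknown, so search at any depth.
        elem = root.find(".//" + tag)
        if elem is None:
            settings[tag] = default  # missing element: fall back to the default
        else:
            settings[tag] = (elem.text or "").strip()
    for tag in BOOLEAN_SETTINGS:
        settings[tag] = settings[tag].lower() == "true"
    if settings["outputType"] not in VALID_OUTPUT_TYPES:
        raise ValueError("outputType must be one of %s" % (VALID_OUTPUT_TYPES,))
    return settings


def build_sox_command(in_wav, out_file, settings):
    """Assemble (but do not run) a SoX argument list for one derivative file."""
    cmd = ["sox", in_wav]
    if settings["comment"]:
        # --comment is a SoX output-file option, so it goes before the output file.
        cmd += ["--comment", settings["comment"]]
    cmd.append(out_file)
    # Effects such as "gain -n -3" follow the output file on the SoX command line.
    cmd += shlex.split(settings["SoxOptions"])
    return cmd
```

    With the default values, build_sox_command("outWavs/bar.wav", "outOggs/bar.ogg", settings) returns ["sox", "outWavs/bar.wav", "outOggs/bar.ogg", "gain", "-n", "-3"], which could then be handed to subprocess.call. Because the SoxOptions string is only split and never validated, a bad effect only shows up as a SoX error, which matches the behavior described in item 2.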


    SimpleADL stands for Simple Audio Decision List.

    SimpleADL is a homegrown XML-based way to:

    • optionally capture some basic statistics about a master WAV audio file,

    • define audio regions within the file,

    • and optionally notate comments and textual components within each region. These components may include information about the region, interview transcription text, song lyrics, theatrical dialog, etc.

    The XML schema for SimpleADL version 1.0 is located here:

    The basic tree structure of an example SimpleADL defining two regions is as follows:

    <audioDecisionList filename="">
      <region id="">
        <in unit="seconds"></in>
        <duration unit="seconds"></duration>
      </region>
      <region id="">
        <in unit="seconds"></in>
        <duration unit="seconds"></duration>
      </region>
    </audioDecisionList>

    SimpleADL's <region> element provides an easily retained record of desired regions within an audio file.

    The <in> element specifies where the region starts, while the <duration> element specifies the length of the region.

    The <outputAsTracks> element tells AudioRegent how to output derivative files: a value of "true" instructs AudioRegent to output one derivative audio file per region, while a value of "false" tells it to output a single derivative audio file consisting of all regions spliced together. For more, see the section below entitled How It Works.
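    Each region maps naturally onto SoX's trim effect, which takes a start position and a length. The following Python 3 sketch (hypothetical function names, not AudioRegent's code) reads the filename, the regions, and the <outputAsTracks> flag from a SimpleADL document and turns a region into trim arguments. It looks for <outputAsTracks> at any depth, since its exact position in the tree is an assumption here. Because ElementTree's default parser discards XML comments, commented-out regions are skipped automatically.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

Region = namedtuple("Region", ["id", "start", "duration"])


def _seconds(region, tag):
    """Return the value of <in> or <duration> in seconds, checking the unit."""
    elem = region.find(tag)
    if elem is None or elem.text is None:
        raise ValueError("region %r has no <%s> value" % (region.get("id"), tag))
    # SimpleADL 1.0 uses seconds for all temporal elements.
    if elem.get("unit", "seconds") != "seconds":
        raise ValueError("region %r: <%s> must be in seconds" % (region.get("id"), tag))
    return float(elem.text)


def parse_adl(xml_text):
    """Return (filename, regions, output_as_tracks) from SimpleADL text."""
    root = ET.fromstring(xml_text)
    # Commented-out <region> elements never reach the tree: comments are dropped.
    regions = [Region(r.get("id"), _seconds(r, "in"), _seconds(r, "duration"))
               for r in root.iter("region")]
    # Assumption: <outputAsTracks> may appear anywhere below the root.
    flag = root.findtext(".//outputAsTracks") or ""
    return root.get("filename"), regions, flag.strip().lower() == "true"


def trim_effect(region):
    """SoX effect arguments that keep `duration` seconds starting at `start`."""
    return ["trim", repr(region.start), repr(region.duration)]
```

    For a region with <in> 305 and <duration> 25, trim_effect returns ["trim", "305.0", "25.0"], so a command such as sox lecture.wav piece.wav trim 305.0 25.0 would keep 25 seconds of audio starting at the 305-second mark.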

    Because SimpleADL has such a basic tree structure, it's easily extensible. For example, here's a SimpleADL file with added technical metadata about the WAV file example2.wav. A <text> block is now also present.

    <?xml version="1.0" encoding="utf-8"?>
    <audioDecisionList filename="example2.wav">
      <statistics>
        <channel position="mono">
          <minimumSamplePosition unit="seconds">145.854921</minimumSamplePosition>
          <minimumSampleValue unit="dbfs">-0.440</minimumSampleValue>
          <maximumSamplePosition unit="seconds">168.396961</maximumSamplePosition>
          <maximumSampleValue unit="dbfs">-1.644</maximumSampleValue>
          <RMS_level unit="dbfs">-26.969</RMS_level>
        </channel>
        <length unit="seconds">15</length>
      </statistics>
      <region id="_01">
        <in unit="seconds">1</in>
        <duration unit="seconds">9</duration>
        <text>
          <p>Hello World!</p>
        </text>
      </region>
    </audioDecisionList>

    Note that the optional statistics shown are those available in Sony's Sound Forge 9.0. AudioRegent doesn't use the <statistics> element, so you can record whatever measurements you prefer and your software can produce, or leave the element out entirely.

    Also note that it's safest to use UTF-8 encoding for SimpleADL files. Otherwise, AudioRegent could crash if your SimpleADL file contains certain diacritics but is saved in Windows-1252 or a similar encoding.
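    One way to catch this encoding problem before it turns into a crash is to check that the file's bytes decode as UTF-8 and then hand the raw bytes to the XML parser, which applies the XML declaration itself. This is a minimal Python 3 sketch with hypothetical function names; AudioRegent itself may handle the problem differently.

```python
import xml.etree.ElementTree as ET


def load_adl_bytes(data):
    """Parse SimpleADL bytes, failing with a clear message if they aren't UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        raise ValueError("SimpleADL file is not valid UTF-8 (first bad byte at offset %d); "
                         "re-save it as UTF-8" % err.start)
    # Give the parser bytes, not text, so it honors the XML declaration itself.
    return ET.fromstring(data)


def load_adl_file(path):
    """Read a SimpleADL file in binary mode and parse it with the check above."""
    with open(path, "rb") as f:
        return load_adl_bytes(f.read())
```

    A file containing "café" saved as Windows-1252 stores the é as a single byte (0xE9) that is not valid UTF-8, so load_adl_bytes reports the offset of that byte instead of failing somewhere deeper in the processing.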

    How It Works

    The diagram below shows how AudioRegent would work for bar.wav and its accompanying SimpleADL file, bar.adl.xml. It shows what happens if the SimpleADL element <outputAsTracks> is set to "true" (left side of the image) or to "false" (right side of the image).

    By default, all WAV files created by AudioRegent in the outWavs and tempWavs folders are deleted automatically. If you want to retain the WAV files in the outWavs folder, see Changing Settings: AudioRegent.xml.
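    The two modes in the diagram can be sketched as SoX command plans. This Python 3 sketch is an assumption about the flow, not AudioRegent's code: each region is cut with trim, and in splice mode the pieces are written to tempWavs and then concatenated, which is what SoX does by default when it is given several input files. The naming scheme (WAV prefix plus region ID for tracks, bare prefix for the spliced file) is likewise an assumption, loosely based on the file names in the examples below, and regions are ordered by ID as described in the FAQ. A final pass would then convert each file in outWavs to the <outputType> format in outOggs while applying <SoxOptions>.

```python
import os


def plan_commands(wav, regions, as_tracks, out_dir="outWavs", temp_dir="tempWavs"):
    """Return SoX argument lists (nothing is executed) for one WAV file.

    `regions` is a list of (region_id, start_seconds, duration_seconds) tuples.
    """
    prefix = os.path.splitext(os.path.basename(wav))[0]
    ordered = sorted(regions, key=lambda region: region[0])  # "alphabetical" by region ID
    commands = []
    if as_tracks:
        # One derivative per region, e.g. AllForAPailOfWater_part1.wav (assumed naming).
        for region_id, start, duration in ordered:
            out = os.path.join(out_dir, prefix + region_id + ".wav")
            commands.append(["sox", wav, out, "trim", str(start), str(duration)])
    else:
        # Cut each region into tempWavs, then splice the pieces into one file.
        pieces = []
        for region_id, start, duration in ordered:
            piece = os.path.join(temp_dir, prefix + region_id + ".wav")
            commands.append(["sox", wav, piece, "trim", str(start), str(duration)])
            pieces.append(piece)
        # With several inputs and no combine option, SoX concatenates them in order.
        commands.append(["sox"] + pieces + [os.path.join(out_dir, prefix + ".wav")])
    return commands
```

    Sorting by ID before splicing is why the FAQ recommends zero-padded IDs: the spliced order follows string comparison, not the order of the regions in the file.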



    Here are a few SimpleADL examples that assume the default settings in AudioRegent.xml. Hopefully, they will give you an idea of what AudioRegent can do with SimpleADL files.


    Example 1:

    Let's assume we have a file called lecture.wav. It's five and a half minutes (330 seconds) long. There's a nasty, undesirable sound between the 300-second mark and the 305-second mark. We want to output a file that omits that sound.

    Using the following SimpleADL file:

    <audioDecisionList filename="lecture.wav">
      <region id="_01">
        <in unit="seconds">0</in>
        <duration unit="seconds">300</duration>
      </region>
      <region id="_02">
        <in unit="seconds">305</in>
        <duration unit="seconds">25</duration>
      </region>
    </audioDecisionList>

    AudioRegent would produce a 5-minute, 25-second file called lecture.ogg that doesn't contain our unwanted sound. This file will be normalized to -3 dBFS per the <SoxOptions> value of "gain -n -3".
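    The length of the output follows from simple arithmetic: 300 seconds from the first region plus 25 seconds from the second (305 to 330) gives 325 seconds. Here is a small Python 3 sketch of that check, with hypothetical helper names, which also rejects a region that runs past the end of the file:

```python
def spliced_length(regions, file_length):
    """Validate (id, start, duration) regions and return their total duration in seconds."""
    total = 0.0
    for region_id, start, duration in regions:
        if start < 0 or duration <= 0 or start + duration > file_length:
            raise ValueError("region %s (%g s + %g s) does not fit in a %g s file"
                             % (region_id, start, duration, file_length))
        total += duration
    return total


def minutes_seconds(seconds):
    """Format a duration such as 325 as '5 min 25 s'."""
    minutes, secs = divmod(seconds, 60)
    return "%d min %g s" % (minutes, secs)


lecture = [("_01", 0, 300), ("_02", 305, 25)]
print(minutes_seconds(spliced_length(lecture, 330)))  # 5 min 25 s
```

    Overlapping regions are simply counted twice, which matches what a spliced output would contain.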


    Example 2:

    Now consider the following SimpleADL file for a 45 second WAV file called AllForAPailOfWater.wav.

    <?xml version="1.0" encoding="utf-8"?>
    <audioDecisionList filename="AllForAPailOfWater.wav">
      <region id="_part1">
        <in unit="seconds">0</in>
        <duration unit="seconds">20</duration>
        <text>
          <p class="soundEffect">Sound of phone ringing.</p>
          <p class="transcript">Jack: Hello?</p>
          <p class="transcript">Jill: Hi, Jack. It's me Jill.</p>
          <p class="comment">Jack pauses for nearly 10 seconds.</p>
        </text>
      </region>
      <region id="_part2">
        <in unit="seconds">30</in>
        <duration unit="seconds">15</duration>
        <text>
          <p class="transcript">Jill: Jack, are you there?</p>
          <p class="transcript">Jack: What do you want?</p>
          <p class="transcript">Jill: I just want to know how your crown is? Are you OK?</p>
          <p class="transcript">Jack: Jill, you can't come tumbling after me anymore. I mean it. Goodbye.</p>
          <p class="soundEffect">Sound of phone hanging up.</p>
        </text>
      </region>
    </audioDecisionList>

    AudioRegent would produce two OGG files: AllForAPailOfWater_part1.ogg and AllForAPailOfWater_part2.ogg.

    Playing both files back-to-back lets you hear the conversation without having to sit through Jack's nearly 10-second pause. I realize that Jack's pause is part of the "story" of this conversation and, from a certain perspective, should be left in, but this is just an example.

    Now you have the sound files, but what else can be done? The SimpleADL file also contains a transcription of the conversation, so by using XSL/XSLT (or even copy/paste!) you could extract the text of the <p> elements whose "class" attribute equals "transcript", embed the OGG files with the HTML5 <audio> element, and generate an HTML5 page like this:

          <p>Part 1:</p>
          <audio src="AllForAPailOfWater_part1.ogg" controls="controls"></audio>
          <p>Jack: Hello?</p>
          <p>Jill: Hi, Jack. It's me Jill.</p>
          <p>Part 2:</p>
          <audio src="AllForAPailOfWater_part2.ogg" controls="controls"></audio>
          <p>Jill: Jack, are you there?</p>
          <p>Jack: What do you want?</p>
          <p>Jill: I just want to know how your crown is? Are you OK?</p>
          <p>Jack: Jill, you can't come tumbling after me anymore. I mean it. Goodbye.</p>

    This is what the page looks like in Firefox 3.0:


    Of course, this approach limits the user to hearing only one segment at a time. By mapping the transcript text to HTML and/or using something fancier for audio playback, perhaps some JavaScript code or the JavaScript API for the well-known JW Player, you could create more intricate audio-to-transcript matching scenarios, such as a web page that lets users start playback at any region by clicking on its transcript text and also listen to the whole interview continuously without interruption.

    Related: see the SAVS posts for more about audio/text synchronization.
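    The extraction step in Example 2 doesn't have to be done with XSLT. As an alternative, here is a Python 3 sketch using the standard library's ElementTree that pulls out the "transcript" paragraphs and emits the same kind of HTML fragment. It assumes the OGG files are named with the WAV prefix plus the region ID, as in Example 2; the function name is hypothetical, and it finds <p> elements at any depth inside each region rather than assuming a particular wrapper element.

```python
import html
import xml.etree.ElementTree as ET


def transcript_fragment(adl_xml, audio_ext=".ogg"):
    """Build an HTML5 fragment: one <audio> element per region plus its transcript lines."""
    root = ET.fromstring(adl_xml)
    prefix = root.get("filename").rsplit(".", 1)[0]  # "AllForAPailOfWater.wav" -> "AllForAPailOfWater"
    lines = []
    for number, region in enumerate(root.iter("region"), start=1):
        src = html.escape(prefix + region.get("id") + audio_ext, quote=True)
        lines.append("<p>Part %d:</p>" % number)
        # Explicit end tag: <audio/> is not self-closing in HTML5's HTML syntax.
        lines.append('<audio src="%s" controls="controls"></audio>' % src)
        for p in region.iter("p"):
            if p.get("class") == "transcript":
                lines.append("<p>%s</p>" % html.escape(p.text or "", quote=False))
    return "\n".join(lines)
```

    Sound effects and comments are skipped because only <p> elements with class="transcript" are copied; changing that test is all it takes to include them.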

    Example 3:

    Under the root AudioRegent folder is a folder called examples. Rather than write about it, you can just take a look at what's there:

    • DoReMi.wav

    • DoReMi.adl.xml

    • subfolder/tones.wav

    • subfolder2/C-D-E-F-G.wav

    • subfolder2/C-D-E-F-G.adl.xml

    Launch AudioRegent and type in "examples" at the prompt.

    This will run AudioRegent over the entire examples folder and process DoReMi.wav and C-D-E-F-G.wav. It will skip tones.wav because there is no SimpleADL file with the same filename prefix in examples/subfolder.

    After AudioRegent is done, take a look in the outOggs folder for your new files.

    By the way, you can also point AudioRegent to examples/subfolder2 if you want to skip DoReMi.wav. But if you only want to process DoReMi.wav and not C-D-E-F-G.wav, you'll have to use the command-line options:

    $ python --wav="examples/DoReMi.wav" --adl="examples/DoReMi.adl.xml"
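    The pairing rule described above (a WAV file is processed only when a SimpleADL file with the same filename prefix sits next to it) could be sketched in Python 3 as follows. The .adl.xml suffix comes from the example filenames; the function names are hypothetical and this is not AudioRegent's code.

```python
import os


def pair_paths(paths):
    """Return (wav, adl) pairs; WAV files without a same-prefix .adl.xml are skipped."""
    available = set(paths)
    pairs = []
    for path in sorted(paths):
        base, ext = os.path.splitext(path)
        adl = base + ".adl.xml"
        if ext.lower() == ".wav" and adl in available:
            pairs.append((path, adl))
    return pairs


def find_pairs(root_dir):
    """Walk root_dir and all its subfolders, pairing WAV and SimpleADL files."""
    paths = [os.path.join(dirpath, name)
             for dirpath, _dirnames, filenames in os.walk(root_dir)
             for name in filenames]
    return pair_paths(paths)
```

    Run against the examples folder, find_pairs("examples") would return the DoReMi and C-D-E-F-G pairs and leave out subfolder/tones.wav.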


    1. What if the only region I want to define is the *entire* WAV file?

      • In that case, use the <region> element once for the entire duration of the WAV file and set <outputAsTracks> to "false".

    2. Do regions have to be sequential? Can they overlap?

      • No, regions don't have to be sequential: your first region could occur later in the original WAV file than the second region you define. Regions can also overlap if you need them to, i.e. two regions can share a portion of audio.

    3. When I set <outputAsTracks> to "false", does AudioRegent combine the regions in the same sequence as in the SimpleADL file?

      • No, it should combine them in "alphabetical" order, so make sure your region ID values sort alphabetically in the order you want. For example, use ID values like "_01", "_02", "_10" instead of "_1", "_2", "_10", because the latter would be sorted as "_1", "_10", "_2", i.e. out of sequence.

    4. What if I need to define a region within a recording for purposes of documentation, but I don't want any tracks to be generated by it?

      • Simply comment out the region that you want AudioRegent to ignore.

    5. If AudioRegent doesn't use the <statistics> element why bother to collect it in SimpleADL?

      • It seems to me that if you ever suspect that your file is corrupt, having these statistics could help you determine if the file is corrupt or not. Also, if you ever need to re-digitize the source material (if it was indeed analog to begin with), having information on the peak value and its location could possibly help you make your newly digitized file nearly identical to the original one, thereby keeping your region values valid.

    6. Can I use a different time format for the <in> and <duration> values as well as for the <statistics> elements?

      • According to the SoX documentation, you should be able to use the hh:mm:ss.frac format, or samples instead of seconds, as time units, so for the <in> and <duration> values the answer is potentially "yes". I say "potentially" because SimpleADL version 1.0 requires seconds as the unit of time for all temporal elements. If you must use a time format other than seconds, you can either:

        • use a schema-less version of SimpleADL as found in AudioRegent version 1.1 or

        • alter the XML schema to your preferences and save the altered version of the schema on your own system or server.

      • I prefer seconds because a value in seconds is a simple floating-point number that doesn't mix hours, minutes, and seconds, and doesn't depend on the sample rate.

    7. I don't want to store my SimpleADL files in the same folder as my WAV files. What can I do if I want to use AudioRegent?

      • You can either:

        • temporarily move the SimpleADL files alongside their respective WAV files, run AudioRegent, and then move your SimpleADL files back to their original location or

        • use the command line options which let you explicitly declare the location of a WAV file and the SimpleADL file you want to use. Note that in this case you do not have to give your SimpleADL file the same prefix as your WAV file. For example:

          • $ python --wav="folderOne/foo.wav" --adl="folderTwo/bar.adl.xml"
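    Two of the answers above involve small computations. The following Python 3 sketch (hypothetical helper names) shows why zero-padded region IDs sort correctly, and converts hh:mm:ss.frac strings or sample counts to seconds so that values taken from other tools can still be written to a SimpleADL 1.0 file in seconds.

```python
def hms_to_seconds(text):
    """Convert "[[hh:]mm:]ss[.frac]" (e.g. "00:05:05.5") to seconds."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def samples_to_seconds(samples, sample_rate):
    """Convert a sample offset to seconds; the result depends on the sample rate."""
    return samples / float(sample_rate)


def seconds_to_hms(seconds):
    """Format seconds as hh:mm:ss.fff, e.g. 325 -> "00:05:25.000"."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%02d:%02d:%06.3f" % (hours, minutes, secs)


def padded_ids(count, width=2):
    """Region IDs such as "_01", "_02", ... that sort in numeric order."""
    return ["_%0*d" % (width, n) for n in range(1, count + 1)]


print(sorted(["_1", "_2", "_10"]))    # ['_1', '_10', '_2']  (out of sequence)
print(sorted(["_01", "_02", "_10"]))  # ['_01', '_02', '_10']
```

    For instance, 13,230,000 samples at 44,100 Hz is exactly 300 seconds, the same value as the first region in Example 1.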


