AudioRegent
AudioRegent 1.1
TERMS:
AudioRegent is licensed under the BSD software license.
AudioRegent
Copyright (c) 2010, Nitin Arora.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of Nitin Arora nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Introduction
AudioRegent is written in Python.
AudioRegent was written by Nitin Arora.
AudioRegent seeks to provide a simple yet effective way to non-destructively make web-deliverable audio files from master WAV files.
AudioRegent utilizes:
- SoX, i.e. Sound Exchange. SoX is a command line audio editing tool.
- SimpleADL, a simple/homegrown XML-based way to:
  - optionally capture some basic statistics about a master WAV audio file
  - define audio regions within the file
  - notate comments and textual components within each region like transcription text, lyrics, or dialog, etc.
AudioRegent and the accompanying SimpleADL are intended for advanced users conversant in XML and digital audio technologies and terminology.
Installation (Windows)
AudioRegent has been tested on Windows XP (SP2 and SP3) and Windows 7. Memory and processor requirements are minimal. If you can run Python and SoX, you shouldn't have any problems.
To run AudioRegent you need to download AudioRegent-1.1.zip which contains the following:
- AudioRegent.py – the AudioRegent program. Note: this will appear as AudioRegent.py.txt. You must rename the ".py.txt" extension to ".py" to make the program work. By renaming the extension you are acknowledging your acceptance of the licensing terms.
- AudioRegent.xml – the AudioRegent setup file. Editing it in a text editor allows one to alter some program defaults.
- AudioRegent.html – AudioRegent documentation
- SimpleADLmaker.py – a simple Python script to create a SimpleADL template based on simple user input. Note: this will appear as SimpleADL_Maker.py.txt. You must rename the ".py.txt" extension to ".py" to make the program work. By renaming the extension you are acknowledging your acceptance of the licensing terms.
- tempWavs – a folder where temporary WAV files are stored according to the region values in the SimpleADL file only if these regions are to be later concatenated together to form one large audio file that encompasses all regions.
- outWavs – a folder where temporary WAV files are stored according to the region values in the SimpleADL file; these WAV files can be automatically deleted or not depending on the setting in AudioRegent.xml.
- outOggs – the folder that contains the final output audio files, regardless of format chosen. These files will have effects added by SoX depending on the setting in AudioRegent.xml.
- outLogs – the folder that contains log files of SoX's activities as dictated by AudioRegent. "reportLog_wav" and "reportLog_ogg" report on the initial and final stages of processing, respectively. The files are in tab-delimited format and can be viewed in a text editor or in a spreadsheet application.
- examples – a folder with some example files.
Unzip AudioRegent-1.1.zip and place the root AudioRegent folder wherever you like on your system.
Additionally, you need to download SoX 14.3.0 for Windows and place at least the following files in the root AudioRegent folder:
- cygwin1.dll
- cyggomp-1.dll
- sox.exe
The rest of the SoX files should ideally be placed somewhere within the AudioRegent folder so as to keep all of SoX's documentation and licensing information intact.
If you are using an OS other than Windows, you'll need to make sure SoX is executable from within the AudioRegent folder.
*As of May 2010, SoX is now up to version 14.3.1 and uses differently named .dll files for the Windows version. You should be able to use the newer version of SoX. The important thing to know is that all files necessary to run SoX must be placed in the AudioRegent folder.
Lastly, you need to download and install Python 2.5 or 2.6. Linux and Mac users probably already have Python installed by default.
*Python 3.* has replaced the print statement [print "Hello World"] with the print function [print ("Hello World")], meaning that you must use Python 2.* for AudioRegent or update all print statements in AudioRegent manually or via automatic conversion. Once Python 3.* becomes the only production version of Python, I will update the print statements myself in a future release of AudioRegent.
Running AudioRegent
To run AudioRegent enter "python AudioRegent.py" from the command line. If using a Windows installation of Python, you can double-click the AudioRegent.py file.
Currently AudioRegent only works on WAV files (foo.wav) that have a matching SimpleADL file (foo.adl.xml) in the same folder as the WAV file. A later release might allow for SimpleADL files to be in a different folder.
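The pairing rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not AudioRegent's actual code: the function name find_wav_adl_pairs is hypothetical, the recursion into subfolders is assumed from the behavior described for the examples folder, and the sketch targets a current Python 3 rather than the Python 2 that AudioRegent itself needs.

```python
import os


def find_wav_adl_pairs(root_folder):
    """Return (wav_path, adl_path) tuples for every WAV file under
    root_folder that has a matching SimpleADL file in the same folder.

    Hypothetical helper: it illustrates the foo.wav / foo.adl.xml rule only.
    """
    pairs = []
    for folder, _subfolders, filenames in os.walk(root_folder):
        for name in sorted(filenames):
            base, extension = os.path.splitext(name)
            if extension.lower() != ".wav":
                continue
            # foo.wav must sit next to foo.adl.xml to be processed.
            adl_path = os.path.join(folder, base + ".adl.xml")
            if os.path.isfile(adl_path):
                pairs.append((os.path.join(folder, name), adl_path))
    return pairs
```

A WAV file without a SimpleADL file beside it is simply left out of the returned list.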
Changing Settings: AudioRegent.xml
Using a simple text editor, you can change some of the things AudioRegent does by changing the element values in the AudioRegent.xml file.
AudioRegent doesn't care about the attribute values, but you might want to leave them intact as a record of the default values.
Here are the default values:
<AudioRegentSetup>
  <outputType default="ogg">ogg</outputType>
  <SoxOptions default="gain -n -3">gain -n -3</SoxOptions>
  <comment default=""></comment>
  <delete_outWavs default="true">true</delete_outWavs>
  <timestampLogFiles default="false">false</timestampLogFiles>
</AudioRegentSetup>
- For <outputType> choose from the following lowercase values: wav, aif, flac, or ogg. This is the format of the final audio files to be found in the outOggs folder after running AudioRegent. Note: read the SoX format documentation (see "mp3") for information about rendering mp3 files; you cannot natively use SoX to make mp3 files. This is due to licensing concerns with MP3 technology. It may be simpler to output lossless WAV or AIF files via AudioRegent and use third party software to make MP3s from these lossless files.
- For <SoxOptions> just use your preferred and *valid* SoX settings for derivative audio files. These effects will be present in the final audio files in the outOggs folder. AudioRegent will not fail if an invalid string is used, but SoX will and your audio files will not get properly made. Read the SoX documentation for more information about settings. Note that the default setting normalizes every track individually. If you are trying to maintain the original volume contour of the audio as a single entity, then obviously leveling individual tracks is not the thing to do.
- For <comment> just enter your preferred comment 'tag' or leave blank/null. This will show up as embedded metadata in OGG, FLAC, and AIF files if one of these formats is chosen as the final audio file format for the outOggs folder. Read the SoX documentation for more information.
- For <delete_outWavs> use "true" if you want to empty the *entire* outWavs folder automatically. The outWavs folder contains the WAV audio files that are made per the instructions in the SimpleADL file. That is to say these files represent the regions defined within a SimpleADL file and get made regardless of your final chosen output format. Use "false" if you want to leave the files in this folder intact. Be warned: using "false" means that *every* WAV file in the outWavs folder will have a derivative placed in outOggs. So if there are pre-existing WAV files in outWavs, you'll be making derivatives of them even if you didn't want to.
- For <timestampLogFiles> use "false" if you don't want to timestamp the filenames for the log files. With this setting, the previous log files are overwritten every time you run AudioRegent. Use "true" if you do want these filenames to be timestamped.
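As a rough illustration of how these settings can be read, here is a minimal Python sketch that parses the default setup shown above with the standard library. The function name read_settings and the choice to turn the two true/false settings into booleans are assumptions made for this example; AudioRegent's own parsing may differ.

```python
import xml.etree.ElementTree as ET

# The default setup file content, copied from the listing above.
SETUP_XML = """<AudioRegentSetup>
<outputType default="ogg">ogg</outputType>
<SoxOptions default="gain -n -3">gain -n -3</SoxOptions>
<comment default=""></comment>
<delete_outWavs default="true">true</delete_outWavs>
<timestampLogFiles default="false">false</timestampLogFiles>
</AudioRegentSetup>"""


def read_settings(xml_text):
    """Hypothetical helper: map each setup element to its element value."""
    root = ET.fromstring(xml_text)
    settings = {}
    for child in root:
        # Only the element text matters; the "default" attribute is just a record.
        settings[child.tag] = (child.text or "").strip()
    # Interpret the two true/false settings as booleans (an assumption).
    for key in ("delete_outWavs", "timestampLogFiles"):
        settings[key] = settings.get(key, "").lower() == "true"
    return settings


settings = read_settings(SETUP_XML)
```

Because only element values are read, editing or deleting the "default" attributes has no effect on the result, which matches the description above.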
SimpleADL
SimpleADL stands for Simple Audio Decision List, even though in some ways it has more in common with a playlist than a decision list.
There's not yet an official schema for SimpleADL, but if you know audio, HTML/XHTML, and XML it's really easy to make SimpleADL files.
The basic tree structure of SimpleADL for a sample monaural file called example.wav is as follows:
<audioDecisionList filename="example.wav">
  <region id="_01">
    <in unit="seconds">0.5</in>
    <duration unit="seconds">601</duration>
  </region>
  <region id="_02">
    <in unit="seconds">612</in>
    <duration unit="seconds">299.05</duration>
  </region>
  <outputAsTracks>true</outputAsTracks>
</audioDecisionList>
SimpleADL's <region> element provides an easily retained record of desired regions within an audio file.
The <in> element specifies where the region starts, while the <duration> element specifies the length of the region.
The <outputAsTracks> element instructs AudioRegent whether these regions should be extracted as one audio file per region or as a single audio file that is essentially a concatenation of all regions. Specifically, an <outputAsTracks> value of "true" instructs AudioRegent to output one audio file per region, while a value of "false" tells AudioRegent to output only one audio file consisting of all regions spliced together.
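To make the region idea concrete, the following Python sketch reads the basic SimpleADL tree shown above and builds one SoX argument list per region. It is a sketch under stated assumptions: the function name region_commands is hypothetical, the use of SoX's trim effect (start, then length) is a guess at how regions could be cut, and the commands are only built, never run.

```python
import os
import xml.etree.ElementTree as ET

# The basic SimpleADL tree from the listing above.
ADL_XML = """<audioDecisionList filename="example.wav">
<region id="_01">
<in unit="seconds">0.5</in>
<duration unit="seconds">601</duration>
</region>
<region id="_02">
<in unit="seconds">612</in>
<duration unit="seconds">299.05</duration>
</region>
<outputAsTracks>true</outputAsTracks>
</audioDecisionList>"""


def region_commands(adl_text, out_folder="outWavs"):
    """Hypothetical helper: one SoX 'trim' argument list per <region>."""
    root = ET.fromstring(adl_text)
    wav_name = root.get("filename")
    base = os.path.splitext(wav_name)[0]
    commands = []
    for region in root.findall("region"):
        start = region.findtext("in").strip()
        length = region.findtext("duration").strip()
        # The region id is appended to the root filename, e.g. example_01.wav.
        out_path = out_folder + "/" + base + region.get("id") + ".wav"
        commands.append(["sox", wav_name, out_path, "trim", start, length])
    return commands


commands = region_commands(ADL_XML)
```

Each list could be handed to subprocess.call if SoX is installed; keeping the commands as data makes it easy to inspect them first.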
Because SimpleADL has such a basic tree structure, it's easily extensible. For example, here's a SimpleADL file with added technical metadata about the WAV file, foo.wav. A <text> block is now also present.
<audioDecisionList filename="foo.wav">
  <statistics>
    <channel position="mono">
      <minimumSamplePosition unit="seconds">145.854921</minimumSamplePosition>
      <minimumSampleValue unit="dbfs">-0.440</minimumSampleValue>
      <maximumSamplePosition unit="seconds">168.396961</maximumSamplePosition>
      <maximumSampleValue unit="dbfs">-1.644</maximumSampleValue>
      <RMS_level unit="dbfs">-26.969</RMS_level>
    </channel>
    <length unit="seconds">15</length>
  </statistics>
  <region id="_01">
    <in unit="seconds">1</in>
    <duration unit="seconds">9</duration>
    <text type="xhtml">
      <p type="comment">Hello World!</p>
    </text>
  </region>
  <outputAsTracks>true</outputAsTracks>
</audioDecisionList>
I consider that last example above to be SimpleADL as I intended. So let's consider that to be an example of SimpleADL's structure henceforth.
Now for some simple rules for SimpleADL regarding elements. Required elements are marked "y" in the Required column. Interval notation is used to document acceptable numerical ranges when applicable.
| Element | Parent Element | Has Children | Description | Required | Number of Occurrences | Legal Values | Additional Notes | Attribute | Attribute Description | Attribute Required | Attribute Number of Occurrences | Attribute Legal Values | Attribute Additional Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| audioDecisionList | n/a | y | root element | y | 1 | n/a | | filename | name of corresponding WAV file | y | 1 | name of corresponding WAV file (include .wav extension) | internally documents the file to be affected |
| statistics | audioDecisionList | y | audio statistics | n | 1 | n/a | A place to embed technical metadata. | | | | | | |
| length | statistics | n | length of audio signal | n | 1 | rational number; non-decimal range: [0, total length of audio in seconds], decimal range: [0, 999999] | | unit | unit of measure | y | 1 | "seconds"; you could also use the time format of your choice but you'll need to change the attribute value to reflect that | provides temporal context for element's value |
| channel | statistics | y | audio channel | n | [1,2] | n/a | supports only "mono" if 1 <channel> element exists OR "left" AND "right" if 2 <channel> elements exist | position | location of audio channel | y | 1 | "mono"; "left", "right" | establishes spatial context of child elements |
| minimumSamplePosition | channel | n | location of first occurrence of lowest amplitude value | n | 1 | rational number; non-decimal range: [0, total length of audio in seconds], decimal range: [0, 999999] | | unit | unit of measure | y | 1 | "seconds"; you could also use the time format of your choice but you'll need to change the attribute value to reflect that | provides temporal context for element's value |
| minimumSampleValue | channel | n | measurement of first occurrence of lowest amplitude value | n | 1 | rational number; non-decimal range: [-inf, 0], decimal range: [0, 999] | | unit | unit of measure | y | 1 | "dbfs" | provides amplitude context for element's value |
| maximumSamplePosition | channel | n | location of first occurrence of peak amplitude value, "Peak of the Peaks" | n | 1 | rational number; non-decimal range: [0, total length of audio in seconds], decimal range: [0, 999999] | | unit | unit of measure | y | 1 | "seconds"; you could also use the time format of your choice but you'll need to change the attribute value to reflect that | provides temporal context for element's value |
| maximumSampleValue | channel | n | measurement of first occurrence of peak amplitude value, "Peak of the Peaks" | n | 1 | rational number; non-decimal range: [-inf, 0], decimal range: [0, 999] | | unit | unit of measure | y | 1 | "dbfs" | provides amplitude context for element's value |
| RMS_level | channel | n | average power/amplitude of audio signal | n | 1 | rational number; non-decimal range: [-inf, 0], decimal range: [0, 999] | | unit | unit of measure | y | 1 | "dbfs" | provides amplitude context for element's value |
| region | audioDecisionList | y | defined segment of audio within WAV file | y | [1, +inf.) | n/a | | id | identifier for a defined section within the WAV file | y | 1 | null value or alpha-numeric (no whitespaces or illegal filename characters); each ID must be unique | value will be appended to the derivative audio file if <outputAsTracks> element = "true" |
| in | region | n | the start position or "in" point of a region | y | 1 | rational number; non-decimal range: [0, total length of audio in seconds], decimal range: [0, 999] | | unit | unit of measure | y | 1 | "seconds"; see FAQ item #6 for information regarding using different time formats with SoX | provides temporal context for element's value |
| duration | region | n | the length of a region | y | 1 | rational number; non-decimal range: [0, total length of audio in seconds], decimal range: [0, 999] | | unit | unit of measure | y | 1 | "seconds"; see FAQ item #6 for information regarding using different time formats with SoX | provides temporal context for element's value |
| text | region | y | textual content (administrative notes, descriptions, transcripts, etc.) | n | 1 | n/a | children must be coded in XHTML | type | declares textual markup language | y | 1 | "xhtml" | markup needs to conform to specification for possible extraction via XSLT (in the case of transcripts, etc.) |
| outputAsTracks | audioDecisionList | n | signifies whether regions are separate "tracks" or intellectual components of a single audio stream | y | 1 | "true", "false" | "true" = create [1, +inf.) audio derivatives per region; "false" = create 1 audio derivative (i.e. concatenate all regions) | | | | | | |
Note that the statistical elements used were based on the available statistics in Sony's Sound Forge 9.0. AudioRegent doesn't use the <statistics> element, so you could use the statistical measurements of your choice based on personal preference and what your software is capable of analyzing. You could also disregard this element, simply using a self-closing <statistics /> element. You could also omit the <statistics> element altogether.
It's safest to use UTF-8 encoding for SimpleADL files. Not doing so could cause AudioRegent to crash if your SimpleADL file contains certain diacritics.
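Since there is no official schema yet, a few of the rules from the table above can be checked with a short script. The following Python sketch is one possible approach and not part of AudioRegent: the function name check_simple_adl and the wording of the messages are made up for this example, and it only covers the required elements, the unique region IDs, and the "unit" attributes.

```python
import xml.etree.ElementTree as ET


def check_simple_adl(adl_text):
    """Hypothetical checker: return a list of problems (empty if none found)."""
    problems = []
    root = ET.fromstring(adl_text)
    if root.tag != "audioDecisionList":
        problems.append("root element must be <audioDecisionList>")
    if not root.get("filename"):
        problems.append("<audioDecisionList> needs a filename attribute")
    regions = root.findall("region")
    if not regions:
        problems.append("at least one <region> is required")
    seen_ids = set()
    for region in regions:
        region_id = region.get("id")
        if region_id is None:
            problems.append("<region> needs an id attribute")
        elif region_id in seen_ids:
            problems.append("duplicate region id: %r" % region_id)
        else:
            seen_ids.add(region_id)
        # <in> and <duration> are required exactly once, each with a unit.
        for tag in ("in", "duration"):
            children = region.findall(tag)
            if len(children) != 1:
                problems.append("region %r needs exactly one <%s>" % (region_id, tag))
                continue
            if not children[0].get("unit"):
                problems.append("<%s> in region %r needs a unit attribute" % (tag, region_id))
    tracks = root.findall("outputAsTracks")
    if len(tracks) != 1 or (tracks[0].text or "").strip() not in ("true", "false"):
        problems.append('exactly one <outputAsTracks> with "true" or "false" is required')
    return problems
```

The optional <statistics> and <text> elements are deliberately ignored here, in line with the note that AudioRegent does not use <statistics>.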
How it Works
The diagram below shows how AudioRegent would work if the SimpleADL element <outputAsTracks> was set to “true” (Left Side of image) or if it was set to “false” (Right Side of image).
By default, all WAV files created by AudioRegent are deleted automatically. If you want to retain the WAV files in the outWavs folder, see Changing Settings: AudioRegent.xml.

Examples
Here are a few SimpleADL examples. Hopefully, these will give you an idea of what can be done with AudioRegent when it uses these SimpleADL files.
Example 1:
In the “foo.wav” example above, AudioRegent would produce a 9 second OGG file called foo_01.ogg that corresponds to the span from the 1 second mark to the 10 second mark of the original file, foo.wav. This assumes "ogg" was left as the <outputType> in AudioRegent.xml.
Example 2:
Assuming we use the default settings in AudioRegent.xml, consider the following SimpleADL file for a 45 second WAV file called AllForAPailOfWater.wav:
<audioDecisionList filename="AllForAPailOfWater.wav">
  <statistics>
    <channel position="mono">
      <minimumSamplePosition unit="seconds">145.854921</minimumSamplePosition>
      <minimumSampleValue unit="dbfs">-0.440</minimumSampleValue>
      <maximumSamplePosition unit="seconds">168.396961</maximumSamplePosition>
      <maximumSampleValue unit="dbfs">-1.644</maximumSampleValue>
      <RMS_level unit="dbfs">-26.969</RMS_level>
    </channel>
    <length unit="seconds">45</length>
  </statistics>
  <region id="_part1">
    <in unit="seconds">0</in>
    <duration unit="seconds">20</duration>
    <text type="xhtml">
      <p type="soundEffect">Sound of phone ringing.</p>
      <p type="transcript">Jack: Hello?</p>
      <p type="transcript">Jill: Hi, Jack. It's me Jill.</p>
      <p type="comment">Jack pauses for nearly 10 seconds.</p>
    </text>
  </region>
  <region id="_part2">
    <in unit="seconds">30</in>
    <duration unit="seconds">15</duration>
    <text type="xhtml">
      <p type="transcript">Jill: Jack, are you there?</p>
      <p type="transcript">Jack: What do you want?</p>
      <p type="transcript">Jill: I just want to know how your crown is? Are you OK?</p>
      <p type="transcript">Jack: Jill, you can't come tumbling after me anymore. I mean it. Goodbye.</p>
      <p type="soundEffect">Sound of phone hanging up.</p>
    </text>
  </region>
  <outputAsTracks>true</outputAsTracks>
</audioDecisionList>
AudioRegent would use SoX to produce two WAV files (AllForAPailOfWater_part1.wav and AllForAPailOfWater_part2.wav) in the outWavs folder; it would then convert those files to create two OGG files in the outOggs folder: AllForAPailOfWater_part1.ogg and AllForAPailOfWater_part2.ogg. These OGG files would include the effects in AudioRegent.xml's <SoxOptions> element. That is to say, it “runs” the WAV files through your specified SoX effects to create the final audio files in the outOggs folder. As you can see, the region “id” attributes get appended to the root filename.
Listening to both files back to back would let you hear the conversation without having to sit through Jack's 10 second pause.
Note that by changing the value of <outputAsTracks> to "false", only one file would get made. AudioRegent would still tell SoX to make two separate WAV files (AllForAPailOfWater_part1.wav and AllForAPailOfWater_part2.wav), but it would put them in the tempWavs folder. Then, AudioRegent would splice the two WAV files into one WAV file (AllForAPailOfWater.wav) in the outWavs folder; it would then convert that file to create one OGG file in the outOggs folder: AllForAPailOfWater.ogg, which would be a 35 second sound file – the conversation minus the pause. This OGG file would include the effects in AudioRegent.xml's <SoxOptions> element. As you can see, in this case the region “id” attribute doesn't get used in the final filename, although you MUST assign an “id” attribute so that the temporary WAV files can get made. See the FAQ item #3 for more information.
By the way, I realize that Jack's pause is part of the "story" of this conversation, and from a certain perspective it should be left in, but this is just an example.
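The arithmetic behind the "false" case can be sketched as follows. This Python fragment is only an illustration: the function name concat_plan is hypothetical, the regions are typed in by hand rather than parsed, and it assumes SoX's behavior of concatenating multiple input files into the single output file named last. The regions are ordered by ID, as FAQ item #3 describes.

```python
# (id, in, duration) for the two regions of AllForAPailOfWater.wav,
# deliberately listed out of order to show the sort.
regions = [("_part2", 30.0, 15.0), ("_part1", 0.0, 20.0)]


def concat_plan(base_name, regions):
    """Hypothetical helper: build a SoX concatenation command and total length."""
    # Regions are combined in alphabetical order of their id values.
    ordered = sorted(regions, key=lambda region: region[0])
    temp_files = ["tempWavs/%s%s.wav" % (base_name, region_id)
                  for region_id, _start, _length in ordered]
    total_seconds = sum(length for _id, _start, length in ordered)
    # With several input files and no combine option, SoX concatenates them.
    command = ["sox"] + temp_files + ["outWavs/%s.wav" % base_name]
    return command, total_seconds


command, total_seconds = concat_plan("AllForAPailOfWater", regions)
```

The total of 20 + 15 seconds matches the 35 second file described above; the 10 second gap between the regions is simply never extracted.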
Now you have the sound files, but what else can be done? Well, you also have a transcription of the conversation so by using XSL/XSLT (or even copy/paste!) you could extract the text in the <p> tags where the attribute is "transcript", wrap the OGG files inside the new HTML 5 <audio> tag and generate an HTML 5 page like so:
<!DOCTYPE HTML>
<html>
  <body>
    <div>
      <p>Part 1:</p>
      <audio src="AllForAPailOfWater_part1.ogg" controls="controls"></audio>
      <p type="transcript">Jack: Hello?</p>
      <p type="transcript">Jill: Hi, Jack. It's me Jill.</p>
    </div>
    <hr />
    <div>
      <p>Part 2:</p>
      <audio src="AllForAPailOfWater_part2.ogg" controls="controls"></audio>
      <p type="transcript">Jill: Jack, are you there?</p>
      <p type="transcript">Jack: What do you want?</p>
      <p type="transcript">Jill: I just want to know how your crown is? Are you OK?</p>
      <p type="transcript">Jack: Jill, you can't come tumbling after me anymore. I mean it. Goodbye.</p>
    </div>
  </body>
</html>
This is what the page looks like in Firefox 3.0:

Of course, this approach limits the user to hearing only one segment at a time, so this is definitely a candidate for using an XSPF playlist in conjunction with something like the JW Player in order to allow users not only to start playback at any region but also to listen continuously to the whole conversation.
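For readers who would rather script the transcript extraction than use XSL/XSLT, here is a minimal Python sketch of the same idea. It is an assumption-laden illustration: the function name transcript_html is hypothetical, the generated page is simpler than the hand-written one above (no "Part" labels or <hr /> separators), and the audio filenames are assumed to follow the root filename plus region ID pattern described earlier.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def transcript_html(adl_text, extension="ogg"):
    """Hypothetical helper: build a simple HTML 5 page from a SimpleADL file."""
    root = ET.fromstring(adl_text)
    base = root.get("filename").rsplit(".", 1)[0]
    parts = ["<!DOCTYPE HTML>", "<html>", "<body>"]
    for region in root.findall("region"):
        # Assumed naming: root filename + region id + output extension.
        src = "%s%s.%s" % (base, region.get("id"), extension)
        parts.append("<div>")
        parts.append('<audio src="%s" controls="controls"></audio>'
                     % escape(src, {'"': "&quot;"}))
        for paragraph in region.findall("text/p"):
            # Keep only the paragraphs marked as transcript text.
            if paragraph.get("type") == "transcript":
                parts.append("<p>%s</p>" % escape(paragraph.text or ""))
        parts.append("</div>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)
```

Paragraphs of type "soundEffect" or "comment" are skipped, so only the spoken lines end up next to each audio player.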
Example 3:
Under the root AudioRegent folder is a folder called examples. Rather than write about it, you can just take a look at what's there:
- foo.wav
- foo.adl.xml
- subfolder/foo1.wav
- subfolder2/foo2.wav
- subfolder2/foo2.adl.xml
Launch AudioRegent and type in "examples" at the prompt.
This will run AudioRegent over the entire examples folder and will process foo.wav and foo2.wav. It will skip over foo1.wav because there's no SimpleADL file in the same folder to go along with it.
You can also point AudioRegent to "examples/subfolder2" if you want to omit processing foo.wav.
After AudioRegent is done, take a look in the outOggs folder for your new OGG files.
Technical Metadata Issues
For some reason, WAV files created by SoX are showing up with very little technical metadata using JHOVE. This might be a concern to people involved in preservation/archival activities. If that is the case, I recommend you use "wav" as your default output settings (by changing the <outputType> value in AudioRegent.xml). Then, use a third party audio editor such as Audacity to open the WAV files and re-save them, that is to say "normalize" the WAV files with another application. This can be done in batch with some applications for large-scale processing.
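As a quick sanity check before or after re-saving, the basic header fields of a WAV file can be read with Python's standard wave module. This sketch is not a substitute for JHOVE and is not something AudioRegent does; the function name wav_summary is hypothetical, and the wave module only handles uncompressed PCM files.

```python
import wave


def wav_summary(path):
    """Hypothetical helper: read basic technical metadata from a PCM WAV header."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "sample_rate": rate,
            "duration_seconds": frames / float(rate),
        }
```

Comparing the summary of a master file with that of its derivative is a cheap way to confirm that sample rate, bit depth, and channel count came through as expected.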
FAQ
1. What if the only region I want to define is the *entire* WAV file?
   - In that case, use the region element once for the entire duration of the WAV file. Just set <outputAsTracks> to "false".
2. Do regions have to be sequential? Can they overlap?
   - No, regions don't have to be sequential: your first region could occur at a later point in the original WAV file than the second region you define. Regions can also overlap if you need them to, meaning that two regions can share a portion of audio between them.
3. When I set <outputAsTracks> to "false", does AudioRegent combine the regions in the same sequence as in the SimpleADL file?
   - No, it should combine them in "alphabetical" order, so make sure your region ID values sort alphabetically in the sequence you want. For example, try ID values like "_01", "_02", "_10", etc. instead of "_1", "_2", "_10" as the latter would – on a Windows system at least – be arranged in the following order: "_1", "_10", "_2" – i.e. out of sequence.
4. What if I need to define a region within a recording for purposes of documentation, but I don't want any tracks to be generated by it?
   - Simply comment out the region that you want AudioRegent to ignore.
5. If AudioRegent doesn't use the <statistics> element, why bother to collect it in SimpleADL?
   - It seems to me that if you ever suspect that your file is corrupt, statistics could help you determine if the file is corrupt or not. Also, if you ever need to re-digitize the source material, having information on the peak value and its location could help you make your newly digitized file nearly identical to the original one, thereby keeping your region values valid.
6. I know I can use the time format of my choice for <statistics> since AudioRegent ignores that element, but can I use a different time format for the <in> and <duration> values?
   - According to About.com's SoX page, you should be able to use hh:mm:ss.frac format as well as using samples instead of seconds. You should change the "unit" attribute values if you do so. I simply prefer seconds since it isn't mixing different measurements as hh:mm:ss does. I strongly prefer it over samples since that would cause problems if you changed the sample rate of your master file.
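The ordering issue from FAQ item #3 is easy to reproduce with plain string sorting in Python. This is only a demonstration of alphabetical ordering in general, assumed here to approximate how the region files get ordered; the helper pad_id is hypothetical and merely shows one way to produce zero-padded ID values.

```python
unpadded = ["_1", "_2", "_10"]
padded = ["_01", "_02", "_10"]

# Plain string sorting compares character by character, so "_10" < "_2".
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

print(sorted_unpadded)  # ['_1', '_10', '_2']
print(sorted_padded)    # ['_01', '_02', '_10']


def pad_id(number, width=2):
    """Hypothetical helper: build a zero-padded region id such as '_07'."""
    return "_" + str(number).zfill(width)
```

If a recording has 100 or more regions, a wider padding (for example width=3) keeps the IDs in sequence.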
Updates to Last Version
Changes in version 1.1 since version 1.0:
- No longer checking for the existence of SoX's executable and .dll files for the Windows version. This is because the newer version of SoX, 14.3.1, uses differently named .dll files.
- Made SimpleADLmaker.py inform the user that it is tailored to Sony Sound Forge 9.0 and that they can delete or leave blank any statistical elements for which their audio software cannot retrieve values.
- Fixed bug in version 1.0 in which the "reportLog_wav.txt" file was only reporting on the last WAV file that was processed.
- Fixed bug that was sending final derivatives to the "OutOggs" folder, which doesn't exist; it should be "outOggs". This wasn't a problem with Windows, but made the script inoperable on Linux.
- Updated documentation.
Ideas for future releases
- Obviously, at some point a schema for SimpleADL is needed for easy validation, etc. But I think I need to sit on this for a while before making final decisions.
- It might be nice to also have a script that can batch-produce HTML files so that something like the HTML 5 example from above can be automated. Also, pages that sync transcript text to the audio could be generated.
- GUI version? I'm not sure about this. It takes like 5 seconds to tell AudioRegent what folder to look at so I'm not too preoccupied with a graphical interface. I'd rather concentrate on a version of AudioRegent that would support a Hyper-SimpleADL structure that would look like this:
  <audioDecisionList>
    <hyperRegion id="">
      <hypoRegion>
        <in />
        <duration />
      </hypoRegion>
      <hypoRegion>
        <in />
        <duration />
      </hypoRegion>
    </hyperRegion>
    <hyperRegion id="">
      <hypoRegion>
        <in />
        <duration />
      </hypoRegion>
    </hyperRegion>
  </audioDecisionList>
  This would allow one to make "tracks" that are themselves concatenations of regions within the WAV file. This would allow you to skip silences, etc. within a track itself. I suppose this would also eliminate the need for the <outputAsTracks> element. I'm also thinking about an even more advanced ADL format that would allow you to mix audio from different WAV files and specify particular channels to use from the audio files as well as any changes to speed, pitch, etc. But at that point I'm not sure someone wouldn't just seek out an audio editor with good XML export like perhaps Ardour? Still it's a thought …
- I'd also like to make an equivalent to AudioRegent for images using ImageMagick and an XML format one could call SimpleIDL, or Simple Image Decision List. A video equivalent (SimpleVDL) with FFmpeg is also an idea. Neither should stray too far, in terms of coding, from AudioRegent.