Tuesday, November 03, 2015

Evolution of the waveform suite

The History of the WAVEFORM SUITE for MATLAB

While in graduate school, I wrote this MATLAB toolbox called "The Waveform Suite" (2011 paper by Reyes & West) to solve a problem that I was having.  I was working with seismic data retrieved from a proprietary database, on a system that barely functioned, with some codes that were hand-me-downs.  I needed to take the data, run a bunch of metrics on it, and then store the results.  Furthermore, I wanted to be able to work on my laptop or my home computer and not be limited to the archaic SUN machine that was my university workstation.  

First iteration - The messy desktop

It quickly became apparent that I was spending an inordinate amount of time wrestling with the data and metadata, which lay in a pile of variables in my MATLAB workspace.  Something like:
Sta  Cha  Sta2 starttime1 freq1 freq2 starttime2
Cha2  D  DD  DDfilt1 npoles1 npoles2 nyq 
lat1 lat2  lon1  lon2 filttype1 filttype2
wSize nfft minfiltfreq maxfiltfreq ans n m idx cntr

You get the idea.  You had to remember that lat1 & lat2 belonged to Sta2, which had data DD, which you had filtered into a variable called DDfilt1.  Yes, some of the variable naming conventions were my own fault.  But even when the variables were created with easy-to-follow names, you still had to chase them everywhere.  The result was heavily dependent upon direct input from the user, one command at a time.  The work was done by a few scripts that would:
  • get data
  • filter data
  • display spectrogram
To change which data you retrieved, which filter was applied, or how the spectrogram was displayed, you needed to edit a parameter file.  Otherwise, everything occurred in the main workspace.

Each script worked with the variables in the main workspace and was hardwired to specific variable names, either in the script itself or in the accompanying parameter file.

Second iteration - Functions and Structs

The variables were tamed by grouping related values into structs, and the pile of workspace variables was tamed by turning the scripts into functions.  These two simple steps provided the most "bang for the buck", and I recommend them for every MATLAB program.

From individual variables to structs

For example, the "filter" information could be kept together using a struct with fields.  So, these variables

npoles1 npoles2 filttype1 filttype2 minfiltfreq maxfiltfreq

were replaced by

filt1.type         filt2.type
     .freqrange         .freqrange
     .poles             .poles

Likewise, I could package the station and channel information together with the sample frequency, start time, and data.
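
As a rough sketch of what that grouping might look like (the field names and values below are made up for illustration and are not the actual Waveform Suite layout):

```matlab
% Filter settings that used to live in npoles1, filttype1, minfiltfreq, maxfiltfreq
filt1.type      = 'bandpass';   % hypothetical value
filt1.freqrange = [1 5];        % [min max] in Hz
filt1.poles     = 2;

% Station, channel, timing, and samples now travel together
w.station    = 'STA1';                 % hypothetical station code
w.channel    = 'BHZ';                  % hypothetical channel code
w.samplefreq = 100;                    % Hz
w.starttime  = datenum(2015, 11, 3);   % MATLAB serial date number
w.data       = zeros(1000, 1);         % placeholder samples

% A second station is just another element of the struct array
w(2) = w(1);
w(2).station = 'STA2';
```

With everything attached to w(2), there is no longer any need to remember which loose variable belonged to which station.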

From scripts to functions

The accompanying filter function could then accept data and one of these filter variables:
function D = filtfilt(myfilt, data, sampfreq)
  % code that filters using myfilt.type, myfilt.freqrange, and myfilt.poles

Now, instead of having to hard-code each operation and variable, I could pass any variable into the function and get well-packaged behavior.  The additional benefit was that my workspace no longer contained all the temporary variables, such as n, m, idx and cntr, which were only needed in a loop or during some intermediate step.
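
Here is a minimal sketch of such a function, not the original code.  It assumes the Signal Processing Toolbox (for butter and filtfilt) and MATLAB R2016b or later (so that a function can sit at the end of a script file).  The name applyfilter and the 'bandpass' type string are placeholders chosen for this example.

```matlab
% Script section: build the inputs and call the function.
filt1.type      = 'bandpass';   % hypothetical value; passed straight to butter
filt1.freqrange = [1 5];        % Hz
filt1.poles     = 2;

sampfreq = 100;                        % Hz
t        = (0:999)' / sampfreq;        % 10 seconds of sample times
data     = sin(2*pi*3*t) + 2;          % 3 Hz signal sitting on a constant offset

filtered = applyfilter(filt1, data, sampfreq);

% Local function: nyq, b, and a exist only while it runs,
% so they never clutter the main workspace.
function D = applyfilter(myfilt, data, sampfreq)
  nyq    = sampfreq / 2;                                   % Nyquist frequency
  [b, a] = butter(myfilt.poles, myfilt.freqrange / nyq, myfilt.type);
  D      = filtfilt(b, a, data);                           % zero-phase filtering
end
```

Swapping in a different filter is now a matter of passing a different struct, rather than editing a parameter file.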

Third iteration - Classes

At this point, I had already significantly cleaned up my workflow and was fairly pleased with the results.  However, a few annoyances remained.
  1. Run-time errors were difficult to debug.  Sometimes unexpected, bad, or unassigned values could cause crashes or (worse) incorrect answers.  These bad values, however, might have been assigned minutes ago or even weeks ago.
  2. Working with multiple waveforms required the user to write loops.  This added lines of code and reduced readability.
  3. The user had to explicitly calculate ancillary data, such as the vector of sample times, the Nyquist frequency, or the period.
  4. The user always had to specify the field being worked on, like w.data.
The solution to #1 was to check values as they were assigned to the struct.  How? By using classes. Classes are super for hiding the ugly internals of the calculation and can intercept assignments, allowing you to do error-checking at the moment a value is assigned.  Better yet, when you create these checks, you can also either compensate for the bad data, or provide a useful (actionable) error message. 

Using classes could also fix #2, because the logic for dealing with multiple waveforms could be hidden, allowing the user to merely specify an operation on an array of waveforms. 

Classes provide a way to associate class-specific functions (AKA methods) with the data in the class.  Therefore, you can create functions to automatically calculate ancillary data (#3).  An additional benefit is that standard operators (like +, -, /, etc.) can be overloaded to handle the details of the class.

At this point, we've taken something like:

for n=1:numel(w)
  w(n).data = w(n).data + 1; % offset all data by 1
  w(n).freq = 20;            % set the sample frequency
end
[w.freq] = deal(20);        % alternate method of setting freq

and allowed it to be rewritten as:

w = w + 1;               % offset all data by 1
w = set(w,'freq',20);    % set sample frequency to 20 for all

The first line is much easier to read and deals with n-dimensional arrays of waveforms.  The second line became a little more convoluted, but it provides data validation and ensures that we can't accidentally assign [20 20] or 'oops' to the frequency values.

Fourth iteration - From old-style classes to the new 

Moving forward, we can take advantage of the way that MATLAB now handles classes to make the previous example even more readable:

w = w + 1;
[w.samplerate] = deal(20);

Although the samplerate usage is the same as it was for the struct, it is now validated!
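
To make that concrete, here is a minimal sketch of a new-style class.  SimpleWave is a hypothetical stand-in written for this post, not the actual waveform class, and it would need to be saved in its own file named SimpleWave.m.  It validates samplerate on assignment, calculates nyquist on the fly, and overloads + for arrays of objects.

```matlab
% SimpleWave.m - hypothetical example class, not the Waveform Suite's waveform class
classdef SimpleWave
  properties
    data = [];        % sample values
    samplerate = 1;   % samples per second
  end
  properties (Dependent)
    nyquist           % calculated on the fly from samplerate
  end
  methods
    function obj = set.samplerate(obj, val)
      % Runs on every assignment to samplerate, including via deal().
      if ~(isnumeric(val) && isscalar(val) && val > 0)
        error('SimpleWave:badSamplerate', ...
              'samplerate must be a positive numeric scalar.');
      end
      obj.samplerate = val;
    end
    function val = get.nyquist(obj)
      val = obj.samplerate / 2;
    end
    function w = plus(w, offset)
      % Overloads w + offset (object on the left, number on the right).
      % The loop over the array is hidden from the user.
      for n = 1:numel(w)
        w(n).data = w(n).data + offset;
      end
    end
  end
end
```

With that file on the path, usage might look like this:

```matlab
w = repmat(SimpleWave, 1, 3);   % 1x3 array of default objects
[w.data] = deal([1 2 3]);
w = w + 1;                      % every element's data becomes [2 3 4]
[w.samplerate] = deal(20);      % each assignment passes through set.samplerate
nyqs = [w.nyquist];             % [10 10 10]
% w(1).samplerate = 'oops';     % would stop with the SimpleWave:badSamplerate error
```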

How classes used to be

Classes in MATLAB were defined as a collection of files, all stored in the same folder named with @ + classname (for example, @waveform or @scnobject).

Values within the Waveform Suite classes were accessed using "get" and "set" functions, much the same way that one uses them with graphics objects:
cv = calcval(get(w,'nyquist'), otherparams);

How classes are now

cv = calcval([w.nyquist], otherparams);

While the old usage is still valid, the newer way to create classes is to have all the functions in the same file, and there is much greater control over the properties that each class contains.  Properties can be accessed as though they were struct fields, but can have value checking and/or be calculated on the fly.  The changes are pretty sweet and I look forward to continuing to work on them.

You might have noticed that there are square brackets [ ] around w.nyquist.  When I ask for a property (variable) of an array of objects, MATLAB returns each value separately, as a comma-separated list.  The brackets group them together into a single array and prevent the "too many input arguments" error.
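
The same expansion happens with a plain struct array, which makes for a small demonstration that needs no class definition.  The variable names and values here are made up for illustration.

```matlab
s(1).nyquist = 10;
s(2).nyquist = 25;

c = {s.nyquist};    % s.nyquist expands to two separate values; braces collect them in a cell
v = [s.nyquist];    % brackets concatenate them into one numeric vector

% sqrt(s.nyquist) is read as sqrt(10, 25): two arguments, so MATLAB raises an error.
r = sqrt([s.nyquist]);   % one vector argument, works as intended
```

Here c has two cells and v is a 1x2 vector, so wrapping the expression in brackets is what turns "many values" into "one argument".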

The Future of the WAVEFORM SUITE for MATLAB

The Waveform Suite had been hosted on Google Code, which has closed.  It has now been migrated to GitHub and is the core component of the GISMO suite, hosted under "Geoscience Community Codes".

I am in the process of rewriting all the core classes in the newer style, and still have some testing and standardization to do before I release it into the wild.  However, I am excited about the upcoming changes.

1 comment:

Noreply said...

Fantastic reading for someone who is going through the same process. Thank you for sharing! Really encouraging