Calculating syllable and song distances
It is possible to delineate higher-order
units in sounds, called “syllables” and “phrases”. A
syllable consists of a series of elements, and a phrase is
a repeated sequence of the same syllable. Normally, each
sound in the database corresponds to an individual signal
(e.g. a bird’s song). Researchers are often more interested
in the distances between phrases and songs than in the
distances between individual elements. Luscinia therefore
incorporates algorithms that
calculate phrase and song distances based on the distances
that are calculated between elements.
Phrase distances are calculated in two
ways. The first way is to compare the element distances in
each syllable in the two signals (if the syllables in a
given phrase differ in the number of elements they possess,
only “complete” syllables are considered here). The
distances between the elements are averaged, and the
average is normalized by the log of the length of each of
the elements. EQUATION
The best-matching pair of syllables between the two phrases
is selected to represent the overall distance between the
two phrases.
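
The best-match selection described above can be sketched as follows. This is a minimal illustration, not Luscinia's implementation: `element_distance` is a hypothetical callable supplied by the caller, elements are assumed to be paired by position, and only syllable pairs with the same number of elements are compared. The log-length normalization is omitted because the equation is not reproduced here.

```python
def syllable_distance(syl_a, syl_b, element_distance):
    """Mean distance between position-matched elements of two syllables.

    Assumption: elements are paired by position; the log-length
    normalization mentioned in the text is NOT applied here.
    """
    if len(syl_a) != len(syl_b) or not syl_a:
        raise ValueError("syllables must be non-empty and of equal length")
    total = sum(element_distance(a, b) for a, b in zip(syl_a, syl_b))
    return total / len(syl_a)


def phrase_distance(phrase_a, phrase_b, element_distance):
    """Distance of the best-matching syllable pair between two phrases.

    Each phrase is a list of syllables; each syllable is a list of elements.
    Assumption: syllable pairs with differing element counts are skipped.
    """
    candidates = [
        syllable_distance(sa, sb, element_distance)
        for sa in phrase_a
        for sb in phrase_b
        if sa and len(sa) == len(sb)
    ]
    if not candidates:
        raise ValueError("no comparable syllable pairs")
    return min(candidates)
```

For example, with elements represented as plain numbers and `element_distance = lambda a, b: abs(a - b)`, the phrases `[[1.0, 2.0], [1.5, 2.5]]` and `[[1.0, 3.0], [4.0, 4.0]]` have a distance of 0.5, the value of their closest syllable pair.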
Often, there may be differences in the way similar
syllables are segmented into different elements in
different phrases. Sometimes these differences may
represent important differences between the syllables, but
they may also result from difficulties in assessing field
recordings made in less than perfect conditions. Because
Luscinia builds up syllable comparisons from underlying
element comparisons, such segmentation differences tend to
be weighted very heavily in overall syllable distances. To
mitigate this, Luscinia also calculates syllable distances
in a second way: first, the elements in each syllable are
“stitched” together after the process of compression;
second, Luscinia compares these stitched-together
syllables as if they were
simply long elements. The distances resulting from these
comparisons are multiplied by 2 (an arbitrary parameter),
and if the result is smaller than the distance obtained
from the first, element-based method, it replaces that
distance. This method is far from perfect from a
conceptual point of view, but seems, in practice, to
provide a sensible weighting of segmentation errors.
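
The way the two phrase distances are combined can be sketched as below. The function and constant names are hypothetical; the two input distances are assumed to have been computed already, and only the factor of 2 and the "replace if smaller" rule come from the description above.

```python
# Multiplier applied to the stitched-syllable distance
# (described in the text as an arbitrary parameter).
STITCH_PENALTY = 2.0


def combined_phrase_distance(element_based, stitched):
    """Return the element-based distance unless the penalized
    stitched-syllable distance is smaller, in which case use that."""
    return min(element_based, STITCH_PENALTY * stitched)
```

With an element-based distance of 1.0 and a stitched distance of 0.25, the result is 0.5. With an element-based distance of 0.4, the stitched value does not win and 0.4 is kept.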
If no syllables have been marked in a song, each element is
considered a separate “phrase” for the purpose of
generating phrase distances.
Song distances are calculated in two
alternative ways. The first way simply finds the best
matches between phrases in the two songs, and averages
these to calculate an overall score. More precisely, the
algorithm takes each phrase in the first song, searches for
the phrase with the lowest distance to it in the second
song, and averages these values. This algorithm has the
disadvantage that it does not take into account the sequence
of phrases within the songs.
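
A minimal sketch of this first song-distance algorithm is given below. `phrase_distance` is a hypothetical callable standing in for whichever phrase distance is used, and songs are assumed to be non-empty lists of phrases. As described, the measure runs from the first song to the second, so swapping the arguments can change the result; whether Luscinia symmetrizes it is not stated here.

```python
def song_distance(song_a, song_b, phrase_distance):
    """Average, over phrases in song_a, of the distance to the
    closest phrase in song_b. Phrase order is ignored."""
    if not song_a or not song_b:
        raise ValueError("both songs must contain at least one phrase")
    best_matches = [
        min(phrase_distance(p, q) for q in song_b)  # closest phrase in song_b
        for p in song_a
    ]
    return sum(best_matches) / len(best_matches)
```

Because each phrase is matched independently, reordering the phrases of either song leaves the score unchanged, which is the limitation noted above.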
The second algorithm is very similar to the first, but
instead of using individual phrases, it uses consecutive
pairs of phrases (the distance between two pairs of phrases
is calculated as the average of two distances: that between
the first phrases of the two pairs and that between the
second phrases of the two pairs).
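
The pair-based variant can be sketched in the same style. Again, `phrase_distance` is a hypothetical callable, and the handling of songs with fewer than two phrases (raising an error) is an assumption made for this sketch, since the text does not say what happens in that case.

```python
def consecutive_pairs(song):
    """Return the consecutive phrase pairs (p0, p1), (p1, p2), ..."""
    return list(zip(song, song[1:]))


def pair_distance(pair_a, pair_b, phrase_distance):
    """Average of the first-to-first and second-to-second phrase distances."""
    first = phrase_distance(pair_a[0], pair_b[0])
    second = phrase_distance(pair_a[1], pair_b[1])
    return (first + second) / 2


def song_distance_pairs(song_a, song_b, phrase_distance):
    """Average, over consecutive pairs in song_a, of the distance to the
    closest consecutive pair in song_b."""
    pairs_a = consecutive_pairs(song_a)
    pairs_b = consecutive_pairs(song_b)
    if not pairs_a or not pairs_b:
        raise ValueError("each song needs at least two phrases")
    best_matches = [
        min(pair_distance(pa, pb, phrase_distance) for pb in pairs_b)
        for pa in pairs_a
    ]
    return sum(best_matches) / len(best_matches)
```

Unlike the phrase-by-phrase version, this measure is sensitive to order: with numeric stand-ins for phrases and an absolute-difference distance, the song `[1.0, 2.0, 3.0]` and its reversal `[3.0, 2.0, 1.0]` score 1.0 rather than 0.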