Calculating syllable and song distances

It is possible to delineate higher-order units in sounds, called “syllables” and “phrases”. A syllable consists of a series of elements, and a phrase is a repeated sequence of the same syllable. In addition, each sound in the database normally corresponds to an individual signal (e.g. a bird’s song). Researchers are often more interested in the distances between phrases and between songs than in the distances between individual elements. Luscinia therefore incorporates algorithms that calculate phrase and song distances from the distances calculated between elements.

Phrase distances are calculated in two ways. The first way is to compare the element distances in each syllable in the two signals (if the syllables in a given phrase differ in the number of elements they possess, only “complete” syllables are considered here). An average of the distances between each of the elements is calculated, normalized by the log of the length of each of the elements. EQUATION
The best-matching pair of syllables between the two phrases is selected to represent the overall distance between the two phrases.
Often, there may be differences in the way similar syllables are segmented into elements in different phrases. Sometimes these differences may represent important differences between the syllables, but they may also result from difficulties in assessing field recordings made in less than perfect conditions. Because Luscinia builds up syllable comparisons from underlying element comparisons, such segmentation differences tend to be weighted very heavily in overall syllable distances. To rectify this somewhat, Luscinia also calculates syllable distances in a second way: first, the elements in the syllable are “stitched” together after the process of compression; second, Luscinia compares these stitched-together syllables as if they were simply long elements. The distances resulting from these comparisons are multiplied by 2 (an arbitrary parameter), and if this value is smaller than the value obtained from the first method of calculating phrase distances, it replaces that value. This method is far from perfect from a conceptual point of view, but seems, in practice, to provide a sensible weighting of segmentation errors.
If no syllables have been marked in a song, each element is considered a separate “phrase” for the purpose of generating phrase distances.
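
The way the two phrase-distance estimates are combined can be illustrated with a minimal Python sketch. It is not Luscinia's own code: the function names, the input layout, and the assumption that a single stitched distance is available for each pair of phrases are all hypothetical, and the syllable distances are assumed to have been averaged and normalized already.

```python
STITCH_WEIGHT = 2.0  # the arbitrary multiplier applied to stitched distances


def best_syllable_pair_distance(syllable_distances):
    """Return the distance of the best-matching pair of syllables.

    syllable_distances[i][j] is assumed to hold the already-normalized
    distance between complete syllable i of the first phrase and
    complete syllable j of the second phrase.
    """
    return min(min(row) for row in syllable_distances)


def phrase_distance(syllable_distances, stitched_distance):
    """Combine the element-based and stitched estimates (illustrative only).

    The stitched distance is multiplied by STITCH_WEIGHT and replaces the
    element-based value only when it is the smaller of the two.
    """
    element_based = best_syllable_pair_distance(syllable_distances)
    weighted_stitched = STITCH_WEIGHT * stitched_distance
    return min(element_based, weighted_stitched)
```

In this sketch, a small stitched distance (for example, when two syllables differ mainly in how they were segmented) overrides a large element-based distance, whereas a large stitched distance leaves the element-based value unchanged.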

Song distances are calculated in two alternative ways. The first way simply finds the best matches between phrases in the two songs, and averages these to calculate an overall score. More precisely, the algorithm takes each phrase in the first song, searches for the phrase with the lowest distance to it in the second song, and averages these values. This algorithm has the disadvantage that it doesn’t take into account the sequence of phrases within the songs.
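
The first algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the matrix layout (rows for phrases of the first song, columns for phrases of the second) are hypothetical and are not taken from Luscinia.

```python
def song_distance_best_match(phrase_distances):
    """Average, over phrases of the first song, of the best match in the second.

    phrase_distances[i][j] is assumed to hold the distance between
    phrase i of the first song and phrase j of the second song.
    """
    best_matches = [min(row) for row in phrase_distances]
    return sum(best_matches) / len(best_matches)
```

Because each phrase is matched independently, reordering the phrases of either song leaves the result of this sketch unchanged, which is the disadvantage noted above. As written, the calculation also starts from the first song only, so swapping the two songs may give a different value.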
The second algorithm is very similar to the first, but instead of using individual phrases, it uses consecutive pairs of phrases (the distance between two pairs of phrases is calculated as the average of the distance between the first of each pair and the second of each pair).
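
A minimal Python sketch of the pair-based algorithm follows. The function names and the matrix layout are hypothetical, and the handling of songs with fewer than two phrases (an error is raised here) is an assumption, since that case is not described in the text.

```python
def pair_distance(phrase_distances, i, j):
    """Distance between the phrase pair starting at i and the pair starting at j.

    Calculated as the average of the distance between the first phrases of
    each pair and the distance between the second phrases of each pair.
    """
    first = phrase_distances[i][j]
    second = phrase_distances[i + 1][j + 1]
    return (first + second) / 2.0


def song_distance_pairs(phrase_distances):
    """Best-match average over consecutive pairs of phrases (illustrative only).

    phrase_distances[i][j] is assumed to hold the distance between
    phrase i of the first song and phrase j of the second song.
    """
    n_first = len(phrase_distances)
    n_second = len(phrase_distances[0])
    if n_first < 2 or n_second < 2:
        # Assumption: the source does not say what happens in this case.
        raise ValueError("each song needs at least two phrases")
    best_matches = []
    for i in range(n_first - 1):
        candidates = (pair_distance(phrase_distances, i, j)
                      for j in range(n_second - 1))
        best_matches.append(min(candidates))
    return sum(best_matches) / len(best_matches)
```

For two songs made up of the same three phrases in the same order, every pair finds a perfect match and the sketch returns 0. If the second song contains the same phrases in reverse order, no pair matches in both positions and the distance rises, which shows how this variant takes phrase sequence into account.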