Measuring vibrato

"Vibrato" in this case refers to signals that oscillate rapidly in frequency. In human singing, this oscillation is slow enough for us to hear the frequency waver. In bird song, the frequency oscillation is so fast that we perceive it as a "buzz". In fact, what Luscinia measures as vibrato is referred to as "buzziness" in some other publications. (I presume that the mechanisms of human vibrato and "buzzy" bird song are similar, with the difference being due to differences in vocal tract dimensions. However, I have no evidence to back that up. Luscinia is concerned with measuring the sounds themselves, and in referring to "vibrato" I am simply referring to rapid frequency oscillation of the signal.)

Examples of vibrato in bird song are shown below:



Fig. 1: Examples of vibrato in bird song, taken from chaffinches, swamp sparrows, white-crowned sparrows, and song sparrows. All were produced with the same spectrogram settings (Frame length = 5 ms, Time step = 0.5 ms).

Note that the examples differ in at least three respects: the amplitude of the oscillations, the frequency of the oscillations, and the form of the oscillations. As for form, many if not most vibratos appear to be produced with an approximately sinusoidal wave. However, there are plenty of examples with sawtooth or more complex waves. At present, Luscinia does not have a method to quantify the wave shape. However, it does record the oscillation frequency and amplitude.

The way it does this is currently quite simple: it carries out an FFT analysis of the frequency contour of the signal. If the peak and fundamental frequencies are close together (the peak is less than 1.5 times the fundamental), then it uses the peak frequency contour, since this is less susceptible to averaging errors. If the fundamental and peak frequencies are quite different from one another, then the fundamental is used. The FFT uses a window of 16 spectrogram samples, giving a resolution of only 8 frequency bins (if the element is less than 16 samples long, the vibrato parameters are set to 0). The finished spectrum is calculated after the FFT by: 1) taking the natural log of the values; 2) adding 3 to the result of 1; 3) setting any scores less than 0 to 0. In effect, any value whose natural log is below -3 is discarded. The cut-off of -3 was chosen by trial-and-error assessment of the algorithm.
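
The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Luscinia's own code. The function names (choose_contour, vibrato_spectrum) are hypothetical. The text does not say how "close together" is tested, how the 16-sample window is positioned, whether the mean is removed, which 8 bins are kept, or how the FFT output is scaled, so the sketch compares contour means, uses the first 16 samples, subtracts the mean, keeps bins 1 to 8, and divides magnitudes by the window length. A plain DFT stands in for the FFT.

```python
import cmath
import math

WINDOW = 16        # spectrogram samples per window (from the text)
LOG_OFFSET = 3.0   # added after the natural log, so the cut-off is -3 (from the text)


def choose_contour(peak, fundamental, ratio=1.5):
    """Use the peak-frequency contour when it is close to the fundamental.

    Assumption: "close together" is tested on the mean of each contour.
    """
    mean_peak = sum(peak) / len(peak)
    mean_fund = sum(fundamental) / len(fundamental)
    return peak if mean_peak < ratio * mean_fund else fundamental


def vibrato_spectrum(contour):
    """Return 8 log-scaled, thresholded bins, or None for a short element."""
    if len(contour) < WINDOW:
        return None  # the text sets the vibrato parameters to 0 in this case
    window = contour[:WINDOW]              # assumption: first 16 samples only
    mean = sum(window) / WINDOW
    centred = [x - mean for x in window]   # assumption: remove the constant offset
    spectrum = []
    for k in range(1, WINDOW // 2 + 1):    # assumption: bins 1..8, skipping bin 0
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / WINDOW)
                    for n, x in enumerate(centred))
        magnitude = abs(coeff) / WINDOW    # assumption: scale by window length
        # Steps 1-3 from the text: natural log, add 3, set negatives to 0.
        score = math.log(magnitude) + LOG_OFFSET if magnitude > 0 else 0.0
        spectrum.append(max(score, 0.0))
    return spectrum
```

For example, a contour that wavers sinusoidally four times within the 16 samples produces a single non-zero bin (the fourth), while every other bin falls below the cut-off and is set to 0.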

An estimated vibrato frequency is calculated by taking the mean of the frequencies in this spectrum, weighted by their intensities (in essentially the same way that mean frequency is calculated). The vibrato amplitude is calculated as the peak amplitude of the spectrum.
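
As a rough illustration of these two summary measures, the sketch below takes an 8-bin thresholded spectrum and returns a vibrato frequency and amplitude. It is not Luscinia's code: the function names are hypothetical, and it assumes that the "mean" is an intensity-weighted mean of the bin frequencies and that the 8 bins are the non-zero-frequency bins of a 16-sample transform. With the 0.5 ms time step quoted for Fig. 1, the contour is sampled 2000 times per second, so a 16-sample window would give bins spaced 125 Hz apart.

```python
def bin_frequencies(time_step_ms=0.5, window=16):
    """Centre frequency (Hz) of bins 1..window/2 for a contour sampled every time_step_ms."""
    sample_rate = 1000.0 / time_step_ms   # contour samples per second
    spacing = sample_rate / window        # Hz between adjacent bins
    return [k * spacing for k in range(1, window // 2 + 1)]


def vibrato_parameters(spectrum, frequencies):
    """Return (vibrato_frequency, vibrato_amplitude) for a thresholded spectrum."""
    total = sum(spectrum)
    if total == 0:
        return 0.0, 0.0  # nothing exceeded the cut-off, so no vibrato is reported
    # Assumption: intensity-weighted mean of the bin frequencies.
    frequency = sum(f * s for f, s in zip(frequencies, spectrum)) / total
    amplitude = max(spectrum)             # peak amplitude of the spectrum
    return frequency, amplitude


freqs = bin_frequencies()
print(vibrato_parameters([0, 2.0, 0, 2.0, 0, 0, 0, 0], freqs))  # (375.0, 2.0)
```

Because the weighted mean pools every bin above the cut-off, a spectrum with two equal peaks at 250 Hz and 500 Hz reports 375 Hz, a frequency at which neither peak lies. This is worth keeping in mind when interpreting the measure for complex wave shapes.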