Voice Biometrics — Biometric Workshop

Waveform (time domain)

Spectrogram — Mel scale (scrolling)

Mel Filterbank Energies (26 filters)

Live Audio Stats

RMS energy

Pitch (F0)

Spectral centroid

Zero-crossing rate
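
As a rough illustration of three of the stats listed above, the following sketch computes RMS energy, zero-crossing rate, and spectral centroid for one frame. It is a minimal sketch, not the page's own code: the function names are hypothetical, the frame is assumed to be a list of float samples, and the centroid function assumes the caller already has a magnitude spectrum covering bins 0 through fft_size / 2. Pitch (F0) estimation is left out because it needs a separate method such as autocorrelation.

```python
import math

def rms_energy(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def spectral_centroid(magnitudes, sample_rate, fft_size):
    """Magnitude-weighted mean frequency in Hz.

    Assumes magnitudes[k] belongs to frequency k * sample_rate / fft_size.
    Returns 0.0 for a silent frame to avoid dividing by zero.
    """
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    weighted = sum(
        k * sample_rate / fft_size * m for k, m in enumerate(magnitudes)
    )
    return weighted / total
```

A noisy or unvoiced sound (such as "s") tends to give a higher zero-crossing rate and centroid than a voiced vowel, which is why these values are useful as quick descriptors of the current frame.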

MFCCs (13 coefficients)

Each bar is one cepstral coefficient. C0 (energy) is omitted. The pattern across C1–C13 is a compact “voiceprint” of the current sound frame.
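
The step from filterbank energies to cepstral coefficients can be sketched as a discrete cosine transform (DCT-II) of the log energies, keeping C1 to C13 and skipping C0. This is a minimal sketch under stated assumptions: the function name is hypothetical, the DCT is left unscaled (libraries differ in their normalization), and a small floor is applied before the logarithm so that silent filters do not produce an error.

```python
import math

def mfcc_from_filterbank(energies, num_coeffs=13):
    """Return C1..C{num_coeffs} from mel filterbank energies.

    Applies an unscaled DCT-II to the log energies and omits C0,
    which mostly reflects overall frame energy.
    """
    n = len(energies)
    # Floor the energies so log() never sees zero (assumed floor value).
    log_e = [math.log(max(e, 1e-10)) for e in energies]
    coeffs = []
    for k in range(1, num_coeffs + 1):  # start at 1 to skip C0
        c = sum(
            log_e[m] * math.cos(math.pi * k * (m + 0.5) / n)
            for m in range(n)
        )
        coeffs.append(c)
    return coeffs
```

With 26 filterbank energies as input, the result is a 13-element list. A perfectly flat spectrum yields coefficients near zero, so the non-zero values describe the shape of the spectral envelope rather than its loudness.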

Speaker Verification

Record 3 times — your voice pattern is the biometric.

Enrolled profiles

No profiles enrolled yet

How voice recognition works

🎤

1. Capture

Microphone samples audio at 44.1 kHz — speak for ~10 seconds

→

📊

2. Frame + FFT

Short windows (25 ms) transformed to frequency domain

→

🔟

3. Mel filterbank

26 triangular filters on a perceptual frequency scale

→

🧮

4. MFCC

DCT compresses filterbank energies to 13 cepstral coefficients

→

🔍

5. Match

Mean MFCC vector compared via cosine similarity to enrolled profile
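
The matching step above can be sketched as follows: average the per-frame MFCC vectors into one mean vector, then score it against the enrolled profile with cosine similarity. This is a minimal sketch, not the page's implementation. The function names are hypothetical, and the 0.9 acceptance threshold is an assumed value chosen only for illustration; a real system would tune it on recorded data.

```python
import math

def mean_vector(frames):
    """Element-wise mean of a list of equal-length MFCC vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def verify(enrolled_profile, attempt_frames, threshold=0.9):
    """Compare an attempt to an enrolled mean MFCC vector.

    threshold is an assumed illustrative value, not taken from the page.
    Returns (accepted, score).
    """
    score = cosine_similarity(enrolled_profile, mean_vector(attempt_frames))
    return score >= threshold, score
```

Cosine similarity depends only on the direction of the vectors, not their length, so two recordings with the same coefficient pattern score close to 1 even when their coefficient magnitudes differ.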