John Martinsson

DMEL: The Differentiable Log-Mel Spectrogram as a Trainable Layer in Neural Networks

Authors: John Martinsson, Maria Sandsten

Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Year: 2024

Location: Seoul, Republic of Korea

Keywords: differentiable spectrogram, log-Mel spectrogram, audio classification, neural networks, Gaussian window, hyperparameter optimization

Abstract

In this paper we present the differentiable log-Mel spectrogram (DMEL) for audio classification. DMEL uses a Gaussian window, with a window length that can be jointly optimized with the neural network. DMEL is used as the input layer in different neural networks and evaluated on standard audio datasets. We show that DMEL achieves a higher average test accuracy for sub-optimal initial choices of the window length when compared to a baseline with a fixed window length. In addition, we analyse the computational cost of DMEL and compare it to a standard hyperparameter search over different window lengths, showing favorable results for DMEL. Finally, an empirical evaluation on a carefully designed dataset is performed to investigate whether the differentiable spectrogram actually learns the optimal window length. The design of the dataset relies on the theory of spectrogram resolution. We also empirically evaluate the convergence rate to the optimal window length.
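The key idea in the abstract is that the length of a Gaussian window is a continuous parameter. A loss computed from the log-Mel spectrogram can therefore be differentiated with respect to it. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It makes several assumptions. The window length is parameterized by the Gaussian standard deviation `sigma` (in samples) inside a fixed frame of `n_fft` samples with a fixed hop. The Mel filterbank is a plain HTK-style triangular one. All function names (`gaussian_window`, `log_mel_and_grad`, and so on) are hypothetical. The derivative with respect to `sigma` is written out by hand to avoid an autodiff dependency. In a framework such as PyTorch or JAX, the same forward pass would be differentiated automatically, and `sigma` would be a trainable parameter updated together with the network weights.

```python
import numpy as np

# Hypothetical sketch: Gaussian-window log-Mel spectrogram with an analytic
# derivative w.r.t. the window width sigma (not the paper's code).


def hz_to_mel(f):
    # HTK-style Hz -> Mel conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, ctr, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def gaussian_window(sigma, n_fft):
    """Gaussian window with std `sigma` (samples) and its derivative w.r.t. sigma."""
    n = np.arange(n_fft) - (n_fft - 1) / 2.0
    w = np.exp(-0.5 * (n / sigma) ** 2)
    dw_dsigma = w * n**2 / sigma**3  # d/dsigma of exp(-n^2 / (2 sigma^2))
    return w, dw_dsigma


def frame_signal(x, n_fft, hop):
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape (n_frames, n_fft)


def log_mel_and_grad(x, sigma, fb, n_fft=512, hop=128, eps=1e-6):
    """Log-Mel spectrogram and its elementwise derivative w.r.t. sigma."""
    frames = frame_signal(x, n_fft, hop)
    w, dw = gaussian_window(sigma, n_fft)
    X = np.fft.rfft(frames * w, axis=-1)    # STFT, shape (n_frames, n_bins)
    dX = np.fft.rfft(frames * dw, axis=-1)  # FFT is linear, so d/dsigma passes through it
    power = np.abs(X) ** 2
    dpower = 2.0 * np.real(np.conj(X) * dX)  # d|X|^2 = 2 Re(conj(X) dX)
    mel = power @ fb.T                       # shape (n_frames, n_mels)
    dmel = dpower @ fb.T
    return np.log(mel + eps), dmel / (mel + eps)


# Usage: a noisy 440 Hz tone, 1 s at 16 kHz (illustrative values only)
sr, n_fft = 16000, 512
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t) + 0.1 * np.random.default_rng(0).standard_normal(sr)
fb = mel_filterbank(n_mels=40, n_fft=n_fft, sr=sr)

sigma = 40.0  # kept well below n_fft / 2 so the Gaussian is not noticeably truncated
S, dS = log_mel_and_grad(x, sigma, fb, n_fft=n_fft)
# Toy scalar loss: mean log-Mel energy; its derivative w.r.t. sigma is the mean of dS
grad = dS.mean()
```

Because `grad` is an ordinary scalar derivative, a gradient step such as `sigma -= lr * grad` would update the window length. Whether a task loss actually drives `sigma` toward a good time-frequency resolution is what the paper evaluates empirically. The toy loss here only shows that the gradient exists and is cheap to compute.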

BibTeX

@INPROCEEDINGS{martinsson2024dmel,
  author={Martinsson, John and Sandsten, Maria},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={{DMEL: The Differentiable Log-Mel Spectrogram as a Trainable Layer in Neural Networks}}, 
  year={2024},
  volume={},
  number={},
  pages={},
  keywords={Differentiable spectrogram;Log-Mel spectrogram;Audio classification;Neural networks;Gaussian window;Hyperparameter optimization},
  doi={10.1109/ICASSP48485.2024.10446816}
}

License/copyright for PDF: “© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”