Efficient and precise annotation of local structures in data

Authors: John Martinsson

Published in: Licentiate Thesis, Lund University

Year: 2024

Location: Lund, Sweden

Keywords: sound event annotation, weak labeling, machine learning, bioacoustics

Abstract

Machine learning models are used to help scientists analyze large amounts of data across all fields of science. These models improve with more data and larger model sizes, mainly through supervised learning. Both supervised learning and model validation benefit from annotated datasets where the annotations are of high quality. A key challenge is annotating the amount of data needed to train large machine learning models, because annotation is a costly process and the collected labels can vary in quality. Methods that enable cheap, high-quality annotation are therefore needed. In this thesis, we consider ways to reduce the annotation cost and improve the label quality when annotating local structures in data. Examples of local structures are a sound event in an audio recording and a visual object in an image. By automatically detecting the boundaries of these structures, we allow the annotator to focus on the task of assigning a textual description to the local structure within those boundaries. In this setting, we analyze the limits of a commonly used annotation method and compare it to an oracle method, which acts as an upper bound on what can be achieved. Further, we propose new ways to perform this kind of annotation that result in higher label quality for the studied datasets at a reduced cost. Finally, we study ways to reduce annotation cost by making the most of each given annotation through better modelling approaches in general.

BibTeX

@misc{Martinsson2024_licentiate_thesis,
  author       = {{Martinsson, John}},
  isbn         = {{978-91-8104-199-6}},
  issn         = {{1404-028X}},
  keywords     = {{Annotation efficiency; Sound event detection; Machine learning}},
  language     = {{eng}},
  month        = {{10}},
  note         = {{Licentiate Thesis}},
  number       = {{3}},
  publisher    = {{Centre for Mathematical Sciences, Lund University}},
  series       = {{Licentiate Theses in Mathematical Sciences}},
  title        = {{Efficient and precise annotation of local structures in data}},
  url          = {{https://lup.lub.lu.se/search/files/195517213/Lic_avhandling_John_Martinsson_LUCRIS.pdf}},
  volume       = {{2024}},
  year         = {{2024}},
}