Compositional Learning of Audio Representations - Jury Questions


Information

Type
Thesis/HDR defence
Location
Ircam, Salle Shannon (Paris)
Date
December 1, 2025

Giovanni Bindi's thesis defence

Giovanni Bindi, doctoral student at Sorbonne University in the Computer Science, Telecommunications and Electronics (EDITE) graduate school in Paris, carried out his research entitled ‘Compositional learning of audio representations’ at the STMS laboratory (Ircam - Sorbonne University - CNRS - Ministry of Culture), as part of the Sound Analysis and Synthesis team, under the supervision of Philippe Esling.

The jury is composed of:

  • George Fazekas, Queen Mary University of London (Reviewer)
  • Magdalena Fuentes, New York University (Reviewer)
  • Ashley Burgoyne, Universiteit van Amsterdam (Examiner)
  • Mark Sandler, Queen Mary University of London (Examiner)
  • Geoffroy Peeters, Télécom Paris (Examiner)
  • Philippe Esling, Sorbonne University (Director)

Abstract:

This thesis explores the intersection of machine learning, generative models, and music composition. While machine learning has transformed many fields, its application to music presents unique challenges. We focus on compositional learning, which involves constructing complex musical structures from simpler, reusable components. Our goal is to provide an initial analysis of how this concept applies to musical audio.

Our framework consists of two phases: decomposition and recomposition. In the decomposition phase, we extract meaningful representations of instruments from polyphonic mixtures without requiring labeled data. This allows us to identify and separate different sound sources. In the recomposition phase, we introduce a generative approach that builds on these representations to create new musical arrangements. By structuring the process hierarchically—starting with drums and progressively adding other elements like bass and piano—we explore a flexible way to generate accompaniment.
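To make the hierarchical recomposition idea concrete, here is a minimal Python sketch of the loop described above, in which each new instrument is generated conditioned on everything produced before it. All names (`decompose`, `generate_track`, `recompose`) and the latent shapes are hypothetical placeholders for illustration, not the thesis's actual models or API.

```python
# Minimal sketch of the two-phase pipeline described in the abstract.
# All function names and shapes are hypothetical, not the thesis's API.

from typing import Dict, List

import numpy as np


def decompose(mixture: np.ndarray) -> Dict[str, np.ndarray]:
    """Hypothetical decomposition: map a polyphonic mixture to
    per-instrument latent representations (learned without labels)."""
    # Placeholder: a real model would infer these latents from audio.
    return {"drums": np.zeros(16), "bass": np.zeros(16), "piano": np.zeros(16)}


def generate_track(instrument: str, context: List[np.ndarray]) -> np.ndarray:
    """Hypothetical generative step for one instrument. A real model would
    condition on `context`; here we just sample a stand-in latent."""
    rng = np.random.default_rng()
    return rng.normal(size=16)


def recompose(order: List[str]) -> Dict[str, np.ndarray]:
    """Hierarchical recomposition: generate instruments in order, feeding
    each new part everything generated before it."""
    arrangement: Dict[str, np.ndarray] = {}
    for instrument in order:
        arrangement[instrument] = generate_track(instrument, list(arrangement.values()))
    return arrangement


# Drums first, then bass and piano, mirroring the hierarchy in the abstract.
new_arrangement = recompose(["drums", "bass", "piano"])
```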

Our findings suggest that compositional learning can improve source separation and structured music generation. While our approach shows promise, further work is needed to assess its broader applicability and generalization. We hope this research contributes to a better understanding of generative models in music and inspires future developments in computational creativity.

