PHALAR: Phasors for Learned Musical Audio Representations

Abstract

Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework that achieves a relative accuracy increase of up to 70% over the state of the art while requiring fewer than half the parameters and training 7× faster. By means of a Learned Spectral Pooling layer and a complex-valued head, PHALAR enforces pitch- and phase-equivariant inductive biases. PHALAR establishes a new retrieval state of the art across MoisesDB, Slakh, and ChocoChorales, and correlates significantly more strongly with human coherence judgments than semantic baselines. Finally, zero-shot beat tracking and linear chord probing confirm that PHALAR captures robust musical structure beyond the retrieval task.
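To give an intuition for the phase-aware ("phasor") embeddings the abstract describes, here is a minimal, self-contained sketch of how a complex-valued similarity can retain phase information that a real-valued cosine similarity would discard. The function names and toy embeddings are illustrative assumptions, not PHALAR's actual implementation.

```python
import cmath
import math

def normalize(z):
    """L2-normalize a list of complex numbers (a toy phasor embedding)."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in z)) or 1.0
    return [c / norm for c in z]

def phasor_similarity(za, zb):
    """Real part of the Hermitian inner product <za, zb>.

    This is maximal only when magnitudes match AND phases align, so two
    embeddings that differ by a per-dimension phase shift are distinguishable:
    the similarity keeps temporal/phase information.
    """
    return sum(a * b.conjugate() for a, b in zip(za, zb)).real

# Toy example: identical magnitudes, but the second dimension's phase differs.
z1 = normalize([cmath.rect(1.0, 0.0), cmath.rect(1.0, math.pi / 4)])
z2 = normalize([cmath.rect(1.0, 0.0), cmath.rect(1.0, math.pi / 2)])

print(round(phasor_similarity(z1, z1), 4))  # 1.0 — identical phasors
print(round(phasor_similarity(z1, z2), 4))  # 0.8536 — phase mismatch lowers similarity
```

A purely magnitude-based similarity would score these two embeddings as identical; the Hermitian inner product does not, which is the sense in which a complex-valued head can encode a phase-equivariant bias.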

Publication
Proceedings of the 43rd International Conference on Machine Learning
Michele Mancusi
Postdoctoral Researcher

PhD Student @SapienzaRoma CS | Intern @Musixmatch | Intern @Microsoft | Research Scientist @Sony | Senior Research Scientist @Moises

Giorgio Strano
PhD Student
Luca Cerovaz
Research Intern
Donato Crisostomi
PhD Student

PhD student @ Sapienza, University of Rome | former Applied Science intern @ Amazon Search, Luxembourg | former Research Science intern @ Amazon Alexa, Turin

Emanuele Rodolà
Full Professor