Hard or Just Unreached? Diagnosing the Sampling Blind Spot in Math-Reasoning Difficulty Estimation

Math and science reasoning benchmarks rely on pass@k, the fraction of sampled chains that reach gold, as the canonical per-example …

Luca Zhou, Sajel Shah, Emanuele Rodolà, Roberto Dessi

PHALAR: Phasors for Learned Musical Audio Representations

We train a contrastive learning music model for stem retrieval; it achieves state-of-the-art results thanks to its phase-aware architecture.

Davide Marincione, Michele Mancusi, Giorgio Strano, Luca Cerovaz, Donato Crisostomi, Roberto Ribuoli, Emanuele Rodolà

Multi-Way Representation Alignment

The Platonic Representation Hypothesis suggests that independently trained neural networks converge to increasingly similar latent …

Akshit Achara, Tatiana Gaintseva, Mateo Mahaut, Pritish Chakraborty, Viktor Stenby Johansson, Melih Barsbey, Emanuele Rodolà, Donato Crisostomi

Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success

Model merging combines knowledge from separately fine-tuned models, yet success factors remain poorly understood. While recent work …

Luca Zhou, Bo Zhao, Rose Yu, Emanuele Rodolà

Navigating the Latent Space Dynamics of Neural Models

Neural networks transform high-dimensional data into compact, structured representations, often modeled as elements of a lower …

Marco Fumero, Luca Moschella, Emanuele Rodolà, Francesco Locatello

Language Models are Injective and Hence Invertible

Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs …

Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà

Implicit Inversion turns CLIP into a Decoder

CLIP is a discriminative model trained to align images and text in a shared embedding space. Due to its multimodal structure, it serves …

Antonio D'Orazio, Maria Rosaria Briglia, Donato Crisostomi, Dario Loi, Emanuele Rodolà, Iacopo Masi

MASS: MoErging through Adaptive Subspace Selection

Model merging has recently emerged as a lightweight alternative to ensembling, combining multiple fine-tuned models into a single set …

Donato Crisostomi, Alessandro Zirilli, Antonio Andrea Gargiulo, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, Iacopo Masi, Emanuele Rodolà

Video Unlearning via Low-Rank Refusal Vector

Video generative models democratize the creation of visual content through intuitive instruction following, but they also inherit the …

Simone Facchiano, Stefano Saravalle, Matteo Migliarini, Edoardo De Matteis, Alessio Sampieri, Andrea Pilzer, Emanuele Rodolà, Indro Spinelli, Luca Franco, Fabio Galasso

EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding

Audio codecs power discrete music generative modelling, music streaming and immersive media by shrinking PCM audio to …

Luca Cerovaz, Michele Mancusi, Emanuele Rodolà
