C2M3: Cycle-Consistent Multi-Model Merging

Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà

September, 2024

Abstract

In this paper, we present a novel data-free method for merging neural networks in weight space. Differently from most existing works, our method optimizes for the permutations of network neurons globally across all layers. This allows us to enforce cycle consistency of the permutations when merging N >= 3 models, allowing circular compositions of permutations to be computed without accumulating error along the path. We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. We finally show that, when coupled with activation renormalization, our approach yields the best results in the task.

Type

Conference paper

Publication

Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS 2024)

C2M3: Cycle-Consistent Multi-Model Merging

Abstract

Donato Crisostomi

PhD Student

Marco Fumero

PostDoctoral Researcher, ISTA

Daniele Baieri

Postdoctoral Researcher, University of Milano-Bicocca

Emanuele Rodolà

Full Professor