AutoMerge: Search-Based Model Merging Framework for Effective Model Reuse

Abstract

Software reuse has long been recognized as a critical and widely studied topic in software engineering, offering substantial benefits in reducing development costs, improving software quality, and enhancing operational efficiency. This paradigm extends into deep learning through model reuse. Recently, model merging has emerged in the domain of large language models (LLMs) as a training-free approach that takes multiple task-specific models with the same architecture as source models and merges them without retraining, thereby enhancing model reuse within LLMs. However, no prior work has systematically investigated whether such an approach can be effectively applied to other deep learning models with different architectures across domains.

To bridge this gap, we present the first systematic study that evaluates five model merging techniques on three distinct model architectures across three domains, i.e., LLMs, image classification, and autonomous driving. Our findings reveal that directly applying existing model merging techniques leads to highly inconsistent results and falls notably short of their success within LLMs. Moreover, a single model merging technique often fails to handle the heterogeneous structural properties within a model, limiting its applicability to different model architectures across domains. Furthermore, the effectiveness of model merging techniques is highly sensitive to hyperparameter configurations, thereby constraining their potential for broader adoption.

Inspired by these insights, we propose AutoMerge, a novel search-based model merging framework that first segments complex models into multiple heterogeneous blocks and then systematically explores the merging space via Bayesian optimization to identify the merging technique and hyperparameter configuration for each block, producing a high-performing multi-task merged model from multiple source models trained on different tasks. Our evaluation demonstrates that AutoMerge preserves at least 87.98% of the source models' capabilities after merging, a 31.76% improvement over prior merging techniques. Moreover, the segmentation strategy in AutoMerge improves merging effectiveness by 22.21% compared to whole-model hyperparameter search. Further, AutoMerge achieves a 35.66% efficiency gain relative to retraining-based model reuse.

Prototype and Documents


Guided by Insight-1 from our empirical study, our goal is to build a model merging framework for effective reuse of multiple task-specific models that share the same architecture across different domains. Building on Insight-2, we apply different merging techniques and hyperparameters to the heterogeneous structural components within complex models. For models that share the same architecture but are trained on distinct tasks, we first partition them into multiple functional blocks based on their intrinsic structural properties through a model segmentor. Inspired by Insight-3, we adopt a search-based merger, which integrates an extendable set of merging techniques and validation datasets of different tasks, to determine the optimal merging configuration for these block pairs. After merging, we obtain a new merged model that maximally preserves the capabilities of both source models and minimizes the preservation discrepancy across tasks. For ease of presentation, we hereafter illustrate AutoMerge using the case of merging two task-specific models; the same principles, however, generalize naturally to merging more than two models.
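To make this workflow concrete, below is a minimal sketch of such a per-block, search-based merge loop. It is illustrative rather than the actual AutoMerge implementation: it assumes Optuna for the Bayesian (TPE) search, a shared pretrained base model for the task-arithmetic option, only two candidate merging techniques, and a user-supplied `evaluate` callback that scores a candidate model on the validation datasets of both tasks.

```python
# Illustrative per-block, search-based merging loop (not the AutoMerge code).
# Assumptions: models share one architecture, were fine-tuned from a common
# base model, and `evaluate(model)` returns a scalar to maximize (e.g., the
# mean of per-task validation scores).
import copy
import optuna


def linear_merge(a, b, alpha):
    """Weighted average of two parameter tensors."""
    return alpha * a + (1.0 - alpha) * b


def task_arithmetic(a, b, base, lam):
    """Add the scaled sum of both task vectors back onto the base weights."""
    return base + lam * ((a - base) + (b - base))


MERGE_METHODS = ("linear", "task_arithmetic")


def segment_blocks(model):
    """Partition a model into blocks by its top-level child modules."""
    return [name for name, _ in model.named_children()]


def merge_block(block_a, block_b, block_base, method, hp):
    """Return the merged state_dict for one block pair under a given method."""
    sd_a, sd_b, sd_base = (block_a.state_dict(), block_b.state_dict(),
                           block_base.state_dict())
    merged = {}
    for key, wa in sd_a.items():
        if method == "linear":
            merged[key] = linear_merge(wa, sd_b[key], hp)
        else:
            merged[key] = task_arithmetic(wa, sd_b[key], sd_base[key], hp)
    return merged


def search_merge(model_a, model_b, model_base, evaluate, trials=50):
    """Per-block Bayesian search over merging technique and hyperparameter."""
    blocks = segment_blocks(model_base)

    def objective(trial):
        candidate = copy.deepcopy(model_base)
        for name in blocks:
            method = trial.suggest_categorical(f"{name}/method", MERGE_METHODS)
            hp = trial.suggest_float(f"{name}/hp", 0.0, 1.0)
            state = merge_block(getattr(model_a, name), getattr(model_b, name),
                                getattr(model_base, name), method, hp)
            getattr(candidate, name).load_state_dict(state)
        return evaluate(candidate)

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=trials)
    return study.best_params
```

In the full framework, the set of merging techniques and their hyperparameter ranges are extendable, and the search objective can additionally penalize the preservation discrepancy across tasks rather than only maximizing the average score.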

We have implemented a prototype of AutoMerge in 3,256 lines of Python code. The prototype and accompanying documents are available on GitHub.

Ablation Study

The figures present the efficiency ablation results on the InterFuser architecture in terms of peak GPU memory usage and GPU utilization. AutoMerge has a higher time consumption than the w/o segmentor variant, taking approximately 2.14 more hours on average to complete the merging process. However, GPU memory usage and GPU utilization fluctuations are similar between AutoMerge and the w/o segmentor variant, suggesting that the computational load on the GPU remains largely unchanged.
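For reference, the following is one possible way to collect such efficiency metrics; it is a sketch rather than the exact profiling setup used in our experiments. It assumes `pynvml` for utilization sampling and PyTorch's CUDA memory statistics, and the `run_merge` callback, device index, and sampling interval are placeholders.

```python
# Sketch of collecting wall time, peak GPU memory, and GPU utilization
# while a merging run executes.  Utilization is polled via NVML on a
# background thread; peak memory comes from torch.cuda statistics.
import time
import threading
import torch
import pynvml


def sample_gpu_utilization(samples, stop_event, device_index=0, interval=1.0):
    """Poll NVML for GPU utilization (%) until stop_event is set."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    while not stop_event.is_set():
        samples.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
        time.sleep(interval)
    pynvml.nvmlShutdown()


def profile_merge(run_merge):
    """Run a merging function and report elapsed hours, peak GB, utilization."""
    torch.cuda.reset_peak_memory_stats()
    samples, stop = [], threading.Event()
    poller = threading.Thread(target=sample_gpu_utilization, args=(samples, stop))
    poller.start()
    start = time.time()
    run_merge()  # e.g., AutoMerge or the w/o segmentor variant
    elapsed_hours = (time.time() - start) / 3600
    stop.set()
    poller.join()
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    return elapsed_hours, peak_gb, samples
```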