Knowledge Distillation

Hydra is a multi-headed neural network architecture for distilling an ensemble into a single model. A shared heavy-weight body network learns a joint feature representation, and a separate light-weight head model for each ensemble member learns to capture that member's predictive behavior. Existing distillation methods typically train a single student network to imitate the (often averaged) prediction of a larger network or ensemble; Hydra instead distills the individual predictions of each ensemble member into its own head while amortizing computation through the shared body. This retains the diversity of the ensemble members' predictions, which is otherwise lost in knowledge distillation.

Source: Hydra: Preserving Ensemble Diversity for Model Distillation
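
Below is a minimal PyTorch sketch of this setup: a shared body feeding several light-weight heads, each trained to match the logits of one ensemble member via a temperature-scaled KL distillation loss. The layer sizes, the temperature, and the `teacher_logits` tensor are illustrative assumptions, and the standard Hinton-style KD objective stands in for the paper's exact training loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Hydra(nn.Module):
    """Shared heavy-weight body with one light-weight head per ensemble member."""

    def __init__(self, in_dim, hidden_dim, num_classes, num_heads):
        super().__init__()
        # Heavy-weight body: learns the joint feature representation.
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Light-weight heads: one per ensemble member, all sharing the body's features.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, num_classes) for _ in range(num_heads)
        )

    def forward(self, x):
        z = self.body(x)  # computed once, amortized across all heads
        # Per-head logits, stacked to shape (num_heads, batch, num_classes).
        return torch.stack([head(z) for head in self.heads])

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Match each head to its ensemble member with temperature-scaled KL divergence."""
    log_p = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    q = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    # T^2 restores gradient magnitude after temperature scaling (Hinton et al.).
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2

# Usage: distill the per-member logits of a hypothetical 5-member ensemble.
model = Hydra(in_dim=32, hidden_dim=128, num_classes=10, num_heads=5)
x = torch.randn(64, 32)                  # batch of 64 inputs
teacher_logits = torch.randn(5, 64, 10)  # one logit tensor per ensemble member
loss = distillation_loss(model(x), teacher_logits)
loss.backward()
```

Because the body's forward pass is shared, inference cost grows only with the small per-head layers rather than with full ensemble size.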

Tasks


Task                     Papers   Share
Language Modelling       2        18.18%
Earth Observation        1         9.09%
Virtual Try-on           1         9.09%
Management               1         9.09%
Benchmarking             1         9.09%
Recommendation Systems   1         9.09%
Experimental Design      1         9.09%
Decoder                  1         9.09%
Domain Adaptation        1         9.09%
