Deep Learning Model Compression Techniques: Advances, Opportunities, and Perspective
Abstract
In recent years, deep learning (DL) models have achieved remarkable success across a wide range of fields. These successes rest on intricate models whose hundreds of millions or even billions of parameters, together with high-performance graphics processing units (GPUs) or tensor processing units (TPUs), are largely responsible for their achievements. The need to deploy DL models on real-time devices with tight latency constraints, limited memory, and strict power budgets is the key driving force behind research into DL model compression techniques. In addition, growing data availability encourages multimodal fusion in DL models to boost predictive accuracy, which further increases model size. To produce compact DL models whose deployment is memory- and computation-efficient, the information encoded in the network parameters is compressed as far as possible, retaining only the bits necessary to carry out the task. A favorable trade-off between compression rate and accuracy loss must therefore be established so that models can be accelerated and compressed without a severe reduction in performance. In this paper, we survey DL model compression techniques used for both single-modality and multimodal deep learning tasks. We review numerous DL model compression methods that have advanced across a number of applications, discuss the benefits and drawbacks of the various compression and acceleration methods, such as their ineffectiveness in compressing more complicated networks with dimensionality-dependent structures, and conclude with the field's future prospects.