Search Results
7 total results found
Model Quantization
Quantization Granularity Quantization is a powerful technique for reducing the memory footprint of a model, but it often causes a drop in model accuracy. This is where the granularity of quantization comes into the picture. Selecting the right gr...
Neural Arch Search
Knowledge Distillation
Pruning
reference
Background on quantization techniques, starting from CNN quantization. In traditional CNNs, a very effective way to speed up network inference is INT8 quantization, i.e., quantizing the floating-point values of the weights and activations (feature maps) into an 8-bit integer representation. This has two benefits. First, replacing the original 32-bit values with a lower bit width reduces the volume of data accessed before and after each computation; on current microarchitectures, where data access takes far longer than the compute itself, this saves considerable time. Second, convolution and matrix operations (GEMM) can then be carried out with integer arithmetic; hardware units designed for this usually compute integers faster than floating point and provide more integer units, so I...
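The snippet above describes mapping float weights and activations to 8-bit integers. A minimal sketch of one common scheme, symmetric per-tensor INT8 quantization with NumPy (the scale rule and clipping range here are illustrative assumptions, not taken from the page itself):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map floats to int8 in [-127, 127]."""
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight tensor and check the round-trip error.
w = np.array([-0.8, 0.0, 0.4, 1.6], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The round-trip error is bounded by half the scale per element, which is the accuracy cost the snippet alludes to; finer granularity (per-channel scales) shrinks that bound.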
ONNX OP statistics
Alphabetical order (op: count):
Abs: 3
Add: 4325
And: 20
ArgMax: 2
AveragePool: 101
BatchNormalization: 3467
Cast: 3152
CategoryMapper: 8
Ceil: 27
Clip: 407
Compress: 2
Concat: 2706
Constant: 7714
ConstantOfShap...: 788
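Statistics like the alphabetical op table above can be produced by counting `op_type` over a model's graph nodes. A small sketch using `collections.Counter`; the stand-in `op_types` list is hypothetical so the snippet runs without a model file, and the commented `onnx.load` path shows how the real `onnx` package would supply it:

```python
from collections import Counter

# With the `onnx` package installed you would gather op types from a model:
#   import onnx
#   model = onnx.load("model.onnx")
#   op_types = [node.op_type for node in model.graph.node]
# Stand-in list (hypothetical) so this sketch is self-contained:
op_types = ["Conv", "Relu", "Conv", "Add", "Concat", "Conv"]

counts = Counter(op_types)
for op, n in sorted(counts.items()):  # alphabetical order, as in the table
    print(f"{op}: {n}")
```

Sorting the counter's items gives the same alphabetical presentation as the statistics page.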