Search Results
7 total results found
Model Quantization
Quantization Granularity Quantization is a powerful technique for reducing the memory footprint of a model, but it often causes a drop in model accuracy. This is where the granularity of quantization comes into the picture. Selecting the right gr...
Neural Arch Search
Knowledge Distillation
Pruning
reference
Background on quantization techniques, starting from CNN quantization. In traditional CNNs, a very effective way to speed up network inference is INT8 quantization, i.e., quantizing the floating-point values of the weights and activations (feature maps) into an 8-bit integer representation. This has two benefits. First, replacing the original 32-bit values with a lower bit width reduces the volume of data accessed before and after each computation; on current microarchitectures, where data access takes far longer than the compute itself, this saves considerable time. Second, convolution and matrix operations (GEMM) can then be carried out with integer arithmetic; hardware units designed for this usually compute integers faster than floating point and provide more integer units, so I...
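The snippet above describes mapping float weights and activations to 8-bit integers. A minimal sketch of one common scheme, symmetric per-tensor INT8 quantization with NumPy (the scale rule and clipping range here are illustrative assumptions, not taken from the page itself):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map floats to int8 in [-127, 127]."""
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight tensor and check the round-trip error.
w = np.array([-0.8, 0.0, 0.4, 1.6], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The round-trip error is bounded by half the scale per element, which is the accuracy cost the snippet alludes to; finer granularity (per-channel scales) shrinks that bound.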
ONNX OP statistics
Alphabetical order (op: count):
Abs: 3
Add: 4325
And: 20
ArgMax: 2
AveragePool: 101
BatchNormalization: 3467
Cast: 3152
CategoryMapper: 8
Ceil: 27
Clip: 407
Compress: 2
Concat: 2706
Constant: 7714
ConstantOfShap...: 788
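Statistics like the alphabetical op table above can be produced by counting `op_type` over a model's graph nodes. A small sketch using `collections.Counter`; the stand-in `op_types` list is hypothetical so the snippet runs without a model file, and the commented `onnx.load` path shows how the real `onnx` package would supply it:

```python
from collections import Counter

# With the `onnx` package installed you would gather op types from a model:
#   import onnx
#   model = onnx.load("model.onnx")
#   op_types = [node.op_type for node in model.graph.node]
# Stand-in list (hypothetical) so this sketch is self-contained:
op_types = ["Conv", "Relu", "Conv", "Add", "Concat", "Conv"]

counts = Counter(op_types)
for op, n in sorted(counts.items()):  # alphabetical order, as in the table
    print(f"{op}: {n}")
```

Sorting the counter's items gives the same alphabetical presentation as the statistics page.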