PyTorch quantization observer

Basic classes

| Name | Inherits from | Description |
| --- | --- | --- |
| ObserverBase | ABC, nn.Module | Base observer module; defines the interface (forward and calculate_qparams) that every observer implements. |
| UniformQuantizationObserverBase | ObserverBase | Common base for observers that use uniform quantization to calculate scale and zero_point. |
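Because ObserverBase is an abstract nn.Module, a custom observer only needs to record statistics in forward() (while passing the tensor through unchanged) and turn them into a scale and zero point in calculate_qparams(). Below is a minimal, illustrative sketch; the class name and the clipping range are invented for the example, and it assumes a PyTorch version where observers live under torch.ao.quantization.observer:

```python
import torch
from torch.ao.quantization.observer import ObserverBase

class ClippedMinMaxObserver(ObserverBase):
    """Toy observer: tracks min/max of activations clipped to [-3, 3]."""

    def __init__(self, dtype=torch.quint8):
        super().__init__(dtype=dtype)
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, x):
        # Record statistics only; observers must return the input unchanged.
        x_obs = x.detach().clamp(-3.0, 3.0)
        self.min_val = torch.minimum(self.min_val, x_obs.min())
        self.max_val = torch.maximum(self.max_val, x_obs.max())
        return x

    def calculate_qparams(self):
        # Affine mapping of [min_val, max_val] onto the quint8 range [0, 255].
        scale = (self.max_val - self.min_val).clamp(min=1e-8) / 255.0
        zero_point = (-self.min_val / scale).round().clamp(0, 255).to(torch.int64)
        return scale, zero_point
```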

Standard observers

| Name | Inherits from | Description |
| --- | --- | --- |
| MinMaxObserver | UniformQuantizationObserverBase | Computes quantization parameters based on the running min and max values. |
| MovingAverageMinMaxObserver | MinMaxObserver | Computes quantization parameters based on moving averages of the min and max values. |
| PerChannelMinMaxObserver | UniformQuantizationObserverBase | Computes quantization parameters based on the running per-channel min and max values. |
| MovingAveragePerChannelMinMaxObserver | PerChannelMinMaxObserver | Computes quantization parameters based on per-channel moving averages of the min and max values. |
| HistogramObserver | UniformQuantizationObserverBase | Records a running histogram of tensor values along with min/max values, and derives quantization parameters from it. |
| PlaceholderObserver | ObserverBase | Doesn't do anything; just passes its configuration to the quantized module's .from_float(). |
| RecordingObserver | ObserverBase | Mainly for debugging; records the tensor values during runtime. |
| NoopObserver | ObserverBase | Doesn't do anything; just passes its configuration to the quantized module's .from_float(). Mainly used for quantization to torch.float16. |
| FixedQParamsObserver | ObserverBase | Simulates quantize and dequantize with fixed, user-supplied quantization parameters. |
| ReuseInputObserver | ObserverBase | Reuses the observer of the operator that produced the input tensor (typically for ops like reshape). |
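In use, an observer is simply called on calibration data: it passes each tensor through unchanged while updating its statistics, and calculate_qparams() then returns the scale and zero point. A short usage sketch, with random tensors standing in for real calibration batches:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver, MovingAverageMinMaxObserver

# Run a few "calibration" batches through the observer.
obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
for _ in range(4):
    obs(torch.randn(8, 16))            # forward() only records min/max and returns the input

scale, zero_point = obs.calculate_qparams()
print(obs.min_val, obs.max_val, scale, zero_point)

# The moving-average variant updates min/max with an exponential moving
# average controlled by averaging_constant, instead of a hard running min/max.
ma_obs = MovingAverageMinMaxObserver(averaging_constant=0.01)
ma_obs(torch.randn(8, 16))
```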

Pre-defined observers (default instances)

| Name | Based on | Arguments / description |
| --- | --- | --- |
| default_observer | MinMaxObserver | quant_min=0, quant_max=127 |
| default_placeholder_observer | PlaceholderObserver | Default placeholder observer, usually used for quantization to torch.float16. |
| default_debug_observer | RecordingObserver | Default debug-only observer. |
| default_weight_observer | MinMaxObserver | dtype=torch.qint8, qscheme=torch.per_tensor_symmetric |
| default_histogram_observer | HistogramObserver | quant_min=0, quant_max=127 |
| default_per_channel_weight_observer | PerChannelMinMaxObserver | dtype=torch.qint8, qscheme=torch.per_channel_symmetric |
| default_dynamic_quant_observer | PlaceholderObserver | dtype=torch.float, compute_dtype=torch.quint8 |
| default_float_qparams_observer | PerChannelMinMaxObserver | dtype=torch.quint8, qscheme=torch.per_channel_affine_float_qparams, ch_axis=0 |
| weight_observer_range_neg_127_to_127 | MinMaxObserver | dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-127, quant_max=127, eps=2 ** -12 |
| per_channel_weight_observer_range_neg_127_to_127 | PerChannelMinMaxObserver | dtype=torch.qint8, qscheme=torch.per_channel_symmetric, quant_min=-127, quant_max=127, eps=2 ** -12 |
| default_float_qparams_observer_4bit | PerChannelMinMaxObserver | dtype=torch.quint4x2, qscheme=torch.per_channel_affine_float_qparams, ch_axis=0 |
| default_fixed_qparams_range_neg1to1_observer | FixedQParamsObserver | scale=2.0 / 256.0, zero_point=128, dtype=torch.quint8, quant_min=0, quant_max=255 |
| default_fixed_qparams_range_0to1_observer | FixedQParamsObserver | scale=1.0 / 256.0, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255 |
| default_symmetric_fixed_qparams_observer | default_fixed_qparams_range_neg1to1_observer | Alias. |
| default_affine_fixed_qparams_observer | default_fixed_qparams_range_0to1_observer | Alias. |
| default_reuse_input_observer | ReuseInputObserver | Default observer for operators that reuse their input's observer (e.g. reshape). |
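The names above are not classes but observer constructors partially applied via .with_args(), so calling one produces an observer instance configured with the listed arguments; they are typically paired into a QConfig. A small sketch, again assuming the torch.ao.quantization namespace:

```python
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import (
    MinMaxObserver,
    default_weight_observer,
    default_histogram_observer,
)

# Calling a pre-defined name instantiates the underlying observer class
# with its preset arguments.
w_obs = default_weight_observer()   # MinMaxObserver with qint8 / per_tensor_symmetric
print(type(w_obs).__name__, w_obs.dtype, w_obs.qscheme)

# Pre-defined observers are usually plugged into a QConfig; .with_args()
# builds the same kind of deferred constructor for custom settings.
qconfig = QConfig(
    activation=default_histogram_observer,
    weight=MinMaxObserver.with_args(dtype=torch.qint8,
                                    qscheme=torch.per_tensor_symmetric),
)
```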

Reposted from blog.csdn.net/m0_70885101/article/details/131955469