Albumentations method catalog
- Preface
- Basic call
- Pixel-level transforms
- AdvancedBlur (generalized normal filter using randomly chosen parameters)
- Blur
- CLAHE (Contrast Limited Adaptive Histogram Equalization)
- ChannelDropout (randomly drop one or more channels)
- ChannelShuffle (channel shuffle)
- ColorJitter (color jitter [brightness, contrast, saturation])
- Defocus
- Downscale
- Emboss (relief effect)
- Equalize (histogram equalization)
- FDA (Fourier Domain Adaptation, a simple form of style transfer)
- FancyPCA (RGB image color enhancement)
- FromFloat (multiply by the maximum value to convert to an integer type, the opposite of ToFloat)
- GaussNoise (Gaussian noise)
- GaussianBlur (Gaussian blur)
- GlassBlur (glass blur)
- HistogramMatching (histogram matching, which will cause hue changes)
- HueSaturationValue (hue, saturation, brightness)
- ISONoise (sensor noise)
- ImageCompression (image compression)
- InvertImg(255-img)
- MedianBlur (median filter)
- MotionBlur (motion blur)
- MultiplicativeNoise (multiplicative noise)
- Normalize
- PixelDistributionAdaptation
- Posterize (tone layering)
- RGBShift (RGB upper value shift for each channel)
- RandomBrightnessContrast (brightness, contrast)
- RandomFog (fog effect)
- RandomGamma (gamma transformation)
- RandomRain (rain effect)
- RandomShadow
- RandomSnow
- RandomSunFlare (solar flare effect)
- RandomToneCurve
- RingingOvershoot
- Sharpen
- Solarize (greater than threshold inversion)
- Spatter (lens raindrops and mud splash effect)
- Superpixels
- TemplateTransform
- ToFloat (divide by the maximum value to normalize, the opposite of FromFloat)
- ToGray (convert to grayscale (three channels))
- ToRGB (grayscale to three-channel RGB)
- ToSepia (add sepia filter)
- UnsharpMask (sharpening)
- ZoomBlur (zoom blur)
- Spatial-level transforms
- Affine
- BBoxSafeRandomCrop (random crop that keeps all bboxes)
- CenterCrop (crop center area)
- CoarseDropout (rectangular area cutout)
- Crop
- CropAndPad (crop or pad image edges)
- CropNonEmptyMaskIfExists (crop + zoom, you can ignore part of the mask area)
- ElasticTransform (elastic transformation)
- Flip
- GridDistortion
- GridDropout (grid cutout)
- HorizontalFlip (horizontal flip)
- Lambda
- LongestMaxSize (the long side is scaled to the specified size)
- MaskDropout (randomly erase target instances)
- NoOp (no operation)
- OpticalDistortion (optical distortion (barrel, pincushion))
- PadIfNeeded (border padding)
- Perspective (perspective transformation)
- PiecewiseAffine (local affine transformation, similar to ElasticTransform, but very slow)
- PixelDropout (randomly discards pixel values)
- RandomCrop (random crop)
- RandomCropFromBorders (image edge cropping will change the size)
- RandomCropNearBBox (specify cropping near rect)
- RandomGridShuffle (blocked shuffle)
- RandomResizedCrop (cropping + scaling, random aspect ratio of cropping area)
- RandomRotate90 (randomly rotate 90 degrees n times, that is, 0°, 90°, 180°, 270° random rotation)
Preface
This article explains in detail how to use each augmentation method. It draws on the source code to explain what each parameter means and its range of valid values, and on visualized results to show what each method does and how different parameter values affect the augmented image.
Following the official website, all augmentation methods are divided into two major categories: pixel-level transforms and spatial-level transforms. The difference is whether a method also changes the additional targets attached to the image (such as masks, bounding boxes, and keypoints): pixel-level transforms do not, spatial-level transforms do. For spatial-level transforms, an overview table records the changes each method makes to these additional targets. Within each category the methods are sorted alphabetically for easy lookup.
Release Notes
This article was originally written against Albumentations 1.3.0. Version 1.3 changed substantially from earlier versions (new transforms, a restructured module layout, and so on). Updating to 1.3.0 or later is recommended; otherwise some calls may fail or some import paths may be wrong. Some transforms in this article exist only in 1.3.0 or later, so if a function cannot be called, try updating.
Update albumentations: pip install -U albumentations
The code in this article assumes import albumentations as A by default, so A.transformxx is equivalent to albumentations.transformxx.
If there are any errors, please point them out in the comment area.
Extended reading
Official code website: https://github.com/albumentations-team/albumentations
Official documentation: https://albumentations.readthedocs.io/
Partially enhanced visualization: Albumentations data enhancement method (the results of VerticalFlip and HorizontalFlip are reversed in the article)
Road scene image enhancement: https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Albumentations already contains some of these implementations: RandomRain, RandomFog, RandomSunFlare, RandomShadow, RandomSnow.
Basic call
Notes
- When calling a transform, pay attention to the default parameter p, which is usually p=0.5 and occasionally p=1.
  This parameter is the probability of applying the transform and is not listed in the parameter descriptions that follow.
  View base initialization parameters: get_base_init_args()
  View transform initialization parameters: get_transform_init_args()
- Many parameters accept either a single number or a range of two numbers. A two-number range is generally sampled from at random. A single number is either converted to a default range (as with the ColorJitter parameters, explained in detail later) or used directly (as with the Spatter parameters). Pay attention to the distinction.
- The apply method of each transform is its core. The __init__ method preprocesses the input parameters, for example converting a single number into a range and checking that the parameter lies within its valid interval.
- The get_params() method cannot be called separately to recover the parameters behind a given result image, because a separate call performs the random sampling again.
  To fix the parameters, set both boundaries of the input range to the same value, so that random sampling can only return that value.
- Bounding boxes here refer to normalized coordinates (x/width, y/height) stored as floats, not absolute integer pixel values.
- Many methods pad the image border. For visualizations of the border_mode parameter, see:
  OpenCV filtering copyMakeBorder and borderInterpolate
  OpenCV image processing | 1.16 Convolution boundary processing
  OpenCV-expanding the boundary of the image
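The note about get_params() and pinned ranges can be illustrated without the library. The sketch below is a toy stand-in, not Albumentations code: ToyTransform and its alpha parameter are hypothetical and only mimic the pattern of sampling from a (min, max) range with random.uniform.

```python
import random


class ToyTransform:
    """Hypothetical stand-in that samples one parameter from a (min, max) range."""

    def __init__(self, alpha=(0.2, 0.5)):
        self.alpha = alpha

    def get_params(self):
        # Every call draws a fresh random value from the stored range.
        return {"alpha": random.uniform(*self.alpha)}


# Two separate calls sample independently, so they generally disagree.
free = ToyTransform(alpha=(0.2, 0.5))
print(free.get_params(), free.get_params())

# Setting both boundaries to the same value pins the sampled parameter.
pinned = ToyTransform(alpha=(0.3, 0.3))
print(pinned.get_params())  # always {'alpha': 0.3}
```

This is why a separate get_params() call says nothing about an image that was already produced, while a degenerate range makes the result reproducible.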
Demo Code
Demo code for calling an augmentation method, using Sharpen as an example:
```python
import cv2
import albumentations as A
if __name__ == "__main__":
    filename = 'src'
    src_img = cv2.imread(f'imgs/{filename}.jpg')  # OpenCV loads images in BGR order
    dst_path = f'imgs/{filename}_aug.jpg'
    # With p=0.5 the transform is applied on roughly half of the calls
    transform = A.Sharpen(alpha=(0.2, 0.5), lightness=(0.5, 1.0), p=0.5)
    img_aug = transform(image=src_img)['image']
    cv2.imwrite(dst_path, img_aug)
```
to_tuple()
This function is often used to convert a single input parameter into a range.
Note that the low parameter supplies the value for the other boundary of the range.
Example:
self.blur_limit = to_tuple(1, 3) # self.blur_limit = (1, 3)
self.blur_limit = to_tuple(5, 3) # self.blur_limit = (3, 5)
# source code
def to_tuple(param, low=None, bias=None):
    """Convert input argument to min-max tuple
    Args:
        param (scalar, tuple or list of 2+ elements): Input value.
            If value is scalar, return value would be (offset - value, offset + value).
            If value is tuple, return value would be value + offset (broadcasted).
        low:  Second element of tuple can be passed as optional argument
        bias: An offset factor added to each element
    """
    if low is not None and bias is not None:
        raise ValueError("Arguments low and bias are mutually exclusive")
    if param is None:
        return param
    if isinstance(param, (int, float)):
        if low is None:
            param = -param, +param
        else:
            param = (low, param) if low < param else (param, low)
    elif isinstance(param, Sequence):
        if len(param) != 2:
            raise ValueError("to_tuple expects 1 or 2 values")
        param = tuple(param)
    else:
        raise ValueError("Argument param must be either scalar (int, float) or tuple")
    if bias is not None:
        return tuple(bias + x for x in param)
    return tuple(param)
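A few calls make the modes of to_tuple concrete. The block below repeats a standalone copy of the function so that it runs on its own; the sample values are illustrative and not taken from any particular transform.

```python
from collections.abc import Sequence


def to_tuple(param, low=None, bias=None):
    """Standalone copy of the helper, kept only so this example is runnable."""
    if low is not None and bias is not None:
        raise ValueError("Arguments low and bias are mutually exclusive")
    if param is None:
        return param
    if isinstance(param, (int, float)):
        # Scalar: symmetric range unless `low` supplies the other boundary.
        if low is None:
            param = -param, +param
        else:
            param = (low, param) if low < param else (param, low)
    elif isinstance(param, Sequence):
        if len(param) != 2:
            raise ValueError("to_tuple expects 1 or 2 values")
        param = tuple(param)
    else:
        raise ValueError("Argument param must be either scalar (int, float) or tuple")
    if bias is not None:
        return tuple(bias + x for x in param)
    return tuple(param)


print(to_tuple(90))               # (-90, 90): symmetric range
print(to_tuple(7, low=3))         # (3, 7): low fills the other boundary
print(to_tuple(2, low=3))         # (2, 3): boundaries are ordered
print(to_tuple((1, 4), bias=1))   # (2, 5): bias shifts both boundaries
```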
Get the initialized default base parameters
Method: get_base_init_args()
It returns the two base parameters, "always_apply" and "p".
# source code
def get_base_init_args(self) -> Dict[str, Any]:
    return {"always_apply": self.always_apply, "p": self.p}
# demo code
transform1 = A.Emboss()
print(transform1.get_base_init_args())
# output
# {'always_apply': False, 'p': 0.5}
transform1 = A.Emboss(p=1)
print(transform1.get_base_init_args())
# output
# {'always_apply': False, 'p': 1}
Get the initialized default transform parameters
Method: get_transform_init_args()
It returns the transform's parameters other than the base parameters always_apply and p.
Note: before calling this method, the transform must implement get_transform_init_args_names() to specify which transform parameters to return, because the BasicTransform class does not implement it.
# source code from class Emboss(ImageOnlyTransform)
def get_transform_init_args_names(self):  # if a transform does not implement this method, implement it first
    return ("alpha", "strength")

def get_transform_init_args(self) -> Dict[str, Any]:
    return {k: getattr(self, k) for k in self.get_transform_init_args_names()}
# demo code
transform1 = A.Emboss()
print(transform1.get_transform_init_args())
# output
# {'alpha': (0.2, 0.5), 'strength': (0.2, 0.7)}
transform1 = A.Emboss(alpha=(0.1, 0.5))
print(transform1.get_transform_init_args())
# output
# {'alpha': (0.1, 0.5), 'strength': (0.2, 0.7)}
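The pattern behind get_transform_init_args() is small enough to reproduce in isolation. In the sketch below, ToyEmboss is a hypothetical class that borrows the Emboss parameter names for illustration; it is not the library class.

```python
class ToyEmboss:
    """Hypothetical class mirroring how a transform exposes its init arguments."""

    def __init__(self, alpha=(0.2, 0.5), strength=(0.2, 0.7), p=0.5):
        self.alpha = alpha
        self.strength = strength
        self.p = p

    def get_transform_init_args_names(self):
        # Names of the attributes to report; each transform lists its own.
        return ("alpha", "strength")

    def get_transform_init_args(self):
        # Look up each listed name on the instance.
        return {k: getattr(self, k) for k in self.get_transform_init_args_names()}


print(ToyEmboss(alpha=(0.1, 0.5)).get_transform_init_args())
# {'alpha': (0.1, 0.5), 'strength': (0.2, 0.7)}
```

Because the result is built only from the listed names, base parameters such as p never appear in it.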
Get random parameters
Method: get_params_dependent_on_targets()
BasicTransform does not implement this method. You can follow the ChannelShuffle() implementation below to return the parameters you want to inspect.
Note: this method cannot be called separately to recover the parameters behind a given result image, because a separate call draws new random numbers.
# ChannelShuffle.get_params_dependent_on_targets
def get_params_dependent_on_targets(self, params):
    img = params["image"]
    ch_arr = list(range(img.shape[2]))
    random.shuffle(ch_arr)
    return {"channels_shuffled": ch_arr}
# demo code
# View the channels_shuffled parameter randomly generated by the ChannelShuffle transform
param = A.ChannelShuffle().get_params_dependent_on_targets(
    dict(image=src_img))['channels_shuffled']
Pixel-level transforms
Pixel-level transforms will change just an input image and will leave any additional targets such as masks, bounding boxes, and keypoints unchanged.
Pixel-level transformations are listed below:
AdvancedBlur
Function: Blur the input image using a generalized normal filter with randomly selected parameters.
Parameter description:
ScaleFloatType = Union[float, Tuple[float, float]]
ScaleIntType = Union[int, Tuple[int, int]]
Of the parameters below, only blur_limit and rotate_limit are ScaleIntType; the rest are ScaleFloatType. All of them accept a single number or a range. A single number is converted to a range by internal logic, and the values actually applied are randomly sampled within the range.
- blur_limit: Maximum Gaussian kernel size for blurring the image. Can be 0 or a positive odd number. Default value: (3, 7).
  If it is 0, it is computed from the sigma parameter: round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1
- sigmaX_limit: Standard deviation of the Gaussian kernel in the X direction. Can be 0 or a positive number. The docstring gives the default as 0, but the constructor signature quoted below uses (0.2, 1.0).
  If it is 0, it is computed from the ksize parameter: sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8
  If it is a single positive number, it is converted to the range (0, sigmaX_limit), and the value is randomly sampled within that range.
- sigmaY_limit: Standard deviation of the Gaussian kernel in the Y direction. Same rules as sigmaX_limit.
- rotate_limit: Range of the angle used to rotate the Gaussian kernel. A single integer is converted to (-rotate_limit, rotate_limit). Default value: (-90, 90).
- beta_limit: Parameter that controls the shape of the distribution; 1 is the normal distribution, and the range must contain 1.0. Default value: (0.5, 8.0).
- noise_limit: Multiplicative factor that controls the strength of the kernel noise. Must be positive, preferably centered around 1.0. A single number is converted to the range (0, noise_limit). The docstring gives the default as (0.75, 1.25), but the constructor signature quoted below uses (0.9, 1.1).
Note: blur_limit and sigmaX_limit (sigmaY_limit) depend on each other in these calculations, so they cannot both be 0 at the same time. The constructor quoted below also raises a ValueError when the minimum values of sigmaX_limit and sigmaY_limit are both 0.
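Two pieces of the parameter logic are easy to check by hand: how an odd kernel size is drawn from blur_limit, and what sigma the quoted formula yields when sigma is left at 0. The sketch below uses only the standard library; sample_odd_ksize and sigma_from_ksize are hypothetical helper names, and the sampling assumes an odd lower bound such as the default 3.

```python
import random


def sample_odd_ksize(blur_limit=(3, 7)):
    # A step of 2 starting from an odd lower bound only yields odd sizes.
    return random.randrange(blur_limit[0], blur_limit[1] + 1, 2)


def sigma_from_ksize(ksize):
    # Formula quoted above for the case where sigma is 0.
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


sizes = sorted({sample_odd_ksize() for _ in range(200)})
print(sizes)  # a subset of [3, 5, 7]
for k in (3, 5, 7):
    print(k, round(sigma_from_ksize(k), 2))  # 3 -> 0.8, 5 -> 1.1, 7 -> 1.4
```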
# source code
class AdvancedBlur(ImageOnlyTransform):
    """Blur the input image using a Generalized Normal filter with a randomly selected parameters.
    This transform also adds multiplicative noise to generated kernel before convolution.
    Args:
        blur_limit: maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigmaX_limit: Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigmaX_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        sigmaY_limit: Same as `sigmaY_limit` for another dimension.
        rotate_limit: Range from which a random angle used to rotate Gaussian kernel is picked.
            If limit is a single int an angle is picked from (-rotate_limit, rotate_limit). Default: (-90, 90).
        beta_limit: Distribution shape parameter, 1 is the normal distribution. Values below 1.0 make distribution
            tails heavier than normal, values above 1.0 make it lighter than normal. Default: (0.5, 8.0).
        noise_limit: Multiplicative factor that control strength of kernel noise. Must be positive and preferably
            centered around 1.0. If set single value `noise_limit` will be in range (0, noise_limit).
            Default: (0.75, 1.25).
        p (float): probability of applying the transform. Default: 0.5.
    Reference:
        https://arxiv.org/abs/2107.10833
    Targets:
        image
    Image types:
        uint8, float32
    """

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigmaX_limit: ScaleFloatType = (0.2, 1.0),
        sigmaY_limit: ScaleFloatType = (0.2, 1.0),
        rotate_limit: ScaleIntType = 90,
        beta_limit: ScaleFloatType = (0.5, 8.0),
        noise_limit: ScaleFloatType = (0.9, 1.1),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = to_tuple(blur_limit, 3)
        self.sigmaX_limit = self.__check_values(to_tuple(sigmaX_limit, 0.0), name="sigmaX_limit")
        self.sigmaY_limit = self.__check_values(to_tuple(sigmaY_limit, 0.0), name="sigmaY_limit")
        self.rotate_limit = to_tuple(rotate_limit)
        self.beta_limit = to_tuple(beta_limit, low=0.0)
        self.noise_limit = self.__check_values(to_tuple(noise_limit, 0.0), name="noise_limit")
        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            raise ValueError("AdvancedBlur supports only odd blur limits.")
        if self.sigmaX_limit[0] == 0 and self.sigmaY_limit[0] == 0:
            raise ValueError("sigmaX_limit and sigmaY_limit minimum value can not be both equal to 0.")
        if not (self.beta_limit[0] < 1.0 < self.beta_limit[1]):
            raise ValueError("Beta limit is expected to include 1.0")

    @staticmethod
    def __check_values(
        value: Sequence[float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
    ) -> Sequence[float]:
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{name} values should be between {bounds}")
        return value

    def apply(self, img: np.ndarray, kernel: np.ndarray = None, **params) -> np.ndarray:
        return FMain.convolve(img, kernel=kernel)

    def get_params(self) -> Dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        sigmaX = random.uniform(*self.sigmaX_limit)
        sigmaY = random.uniform(*self.sigmaY_limit)
        angle = np.deg2rad(random.uniform(*self.rotate_limit))
        # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
        if random.random() < 0.5:
            beta = random.uniform(self.beta_limit[0], 1)
        else:
            beta = random.uniform(1, self.beta_limit[1])
        noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])
        # Generate mesh grid centered at zero.
        ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
        # Shape (ksize, ksize, 2)
        grid = np.stack(np.meshgrid(ax, ax), axis=-1)
        # Calculate rotated sigma matrix
        d_matrix = np.array([[sigmaX**2, 0], [0, sigmaY**2]])
        u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
        sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))
        inverse_sigma = np.linalg.inv(sigma_matrix)
        # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
        kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
        # Add noise
        kernel = kernel * noise_matrix
        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str]:
        return (
            "blur_limit",
            "sigmaX_limit",
            "sigmaY_limit",
            "rotate_limit",
            "beta_limit",
            "noise_limit",
        )
Three result images randomly generated with default parameters. Because the images are compressed when displayed side by side, the changes are hard to see with the naked eye.
Blur
Function: Blur the image.
Parameter description: blur_limit (int, (int, int)): Maximum kernel size for blurring the image. Valid range [3, inf). Default value: (3, 7).
# source code
class Blur(ImageOnlyTransform):
    """Blur the input image using a random-sized kernel.
    Args:
        blur_limit (int, (int, int)): maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, float32
    """

    def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)
        self.blur_limit = to_tuple(blur_limit, 3)

    def apply(self, img: np.ndarray, ksize: int = 3, **params) -> np.ndarray:
        return F.blur(img, ksize)

    def get_params(self) -> Dict[str, Any]:
        return {"ksize": int(random.choice(np.arange(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("blur_limit",)
CLAHE
Function: Apply Contrast Limited Adaptive Histogram Equalization to the input image.
Extended reading:
Image Enhancement - CLAHE
CLAHE algorithm learning
# source code
class CLAHE(ImageOnlyTransform):
    """Apply Contrast Limited Adaptive Histogram Equalization to the input image.
    Args:
        clip_limit (float or (float, float)): upper threshold value for contrast limiting.
            If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).
        tile_grid_size ((int, int)): size of grid for histogram equalization. Default: (8, 8).
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8
    """

    def __init__(self, clip_limit=4.0, tile_grid_size=(8, 8), always_apply=False, p=0.5):
        super(CLAHE, self).__init__(always_apply, p)
        self.clip_limit = to_tuple(clip_limit, 1)
        self.tile_grid_size = tuple(tile_grid_size)

    def apply(self, img, clip_limit=2, **params):
        if not is_rgb_image(img) and not is_grayscale_image(img):
            raise TypeError("CLAHE transformation expects 1-channel or 3-channel images.")
        return F.clahe(img, clip_limit, self.tile_grid_size)

    def get_params(self):
        return {"clip_limit": random.uniform(self.clip_limit[0], self.clip_limit[1])}

    def get_transform_init_args_names(self):
        return ("clip_limit", "tile_grid_size")
ChannelDropout
Function: Randomly drop some channels and fill them with a fixed value.
Parameter description:
- channel_drop_range (int, int): the closed interval [min_dropout_channel_num, max_dropout_channel_num]. The number of channels to drop is chosen at random within this range, and the specific channel indices to drop are then sampled at random.
  min_dropout_channel_num must be at least 1, and max_dropout_channel_num must be smaller than the number of image channels (not all channels can be dropped). Single-channel images are not supported. min_dropout_channel_num may equal max_dropout_channel_num. The default is (1, 1), that is, one channel is dropped at random.
- fill_value (int, float): Pixel value used to fill the dropped channels. Default: 0.
Detailed explanation of the drop mechanism:
- Determine the number of channels to drop:
  num_drop_channels = random.randint(channel_drop_range[0], channel_drop_range[1])
- Randomly select num_drop_channels of the image's channels:
  channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)
- Fill the selected channels_to_drop with fill_value:
  def channel_dropout(img, channels_to_drop, fill_value=0):
      if len(img.shape) == 2 or img.shape[2] == 1:
          raise NotImplementedError("Only one channel. ChannelDropout is not defined.")
      img = img.copy()
      img[..., channels_to_drop] = fill_value
      return img
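The three steps can be traced on a tiny image without NumPy. The sketch below is an illustration under assumptions, not the library code: the image is a nested list of [B, G, R] pixels, and sample_channels_to_drop and drop_channels are hypothetical helper names.

```python
import random


def sample_channels_to_drop(num_channels, channel_drop_range=(1, 1)):
    low, high = channel_drop_range
    if not 1 <= low <= high:
        raise ValueError("Invalid channel_drop_range")
    if high >= num_channels:
        raise ValueError("Can not drop all channels")
    # Step 1: how many channels to drop. Step 2: which ones.
    num_drop = random.randint(low, high)
    return random.sample(range(num_channels), k=num_drop)


def drop_channels(img, channels_to_drop, fill_value=0):
    # Step 3: overwrite the chosen channels of every pixel with fill_value.
    return [
        [[fill_value if c in channels_to_drop else v for c, v in enumerate(px)]
         for px in row]
        for row in img
    ]


white_bgr = [[[255, 255, 255], [255, 255, 255]]]  # 1x2 white image, BGR order
print(sample_channels_to_drop(3, (1, 2)))         # e.g. [2] or [0, 1]
print(drop_channels(white_bgr, [1, 2]))           # [[[255, 0, 0], [255, 0, 0]]]
```

Dropping G and R from a white BGR pixel leaves only B at 255, which is why a white background turns blue in that case.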
ChannelDropout source code is as follows:
# source code
class ChannelDropout(ImageOnlyTransform):
    """Randomly Drop Channels in the input Image.
    Args:
        channel_drop_range (int, int): range from which we choose the number of channels to drop.
        fill_value (int, float): pixel value for the dropped channel.
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, uint16, unit32, float32
    """

    def __init__(self, channel_drop_range=(1, 1), fill_value=0, always_apply=False, p=0.5):
        super(ChannelDropout, self).__init__(always_apply, p)
        self.channel_drop_range = channel_drop_range
        self.min_channels = channel_drop_range[0]
        self.max_channels = channel_drop_range[1]
        if not 1 <= self.min_channels <= self.max_channels:
            raise ValueError("Invalid channel_drop_range. Got: {}".format(channel_drop_range))
        self.fill_value = fill_value

    def apply(self, img, channels_to_drop=(0,), **params):
        return F.channel_dropout(img, channels_to_drop, self.fill_value)

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        num_channels = img.shape[-1]
        if len(img.shape) == 2 or num_channels == 1:
            raise NotImplementedError("Images has one channel. ChannelDropout is not defined.")
        if self.max_channels >= num_channels:
            raise ValueError("Can not drop all channels in ChannelDropout.")
        num_drop_channels = random.randint(self.min_channels, self.max_channels)
        channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)
        return {"channels_to_drop": channels_to_drop}

    def get_transform_init_args_names(self):
        return ("channel_drop_range", "fill_value")

    @property
    def targets_as_params(self):
        return ["image"]
OpenCV reads images in BGR order. When channels_to_drop=[1], the G channel is dropped and filled with 0, so the green part of the upper-right image becomes black.
When channels_to_drop=[0], the B channel is dropped and filled with 0, so the blue part of the lower-left image becomes black.
When channels_to_drop=[1,2], the G and R channels are dropped and filled with 0, so the green and red parts of the lower-right image become black. The white background has all three channels at 255; once G and R are set to 0, only the B channel remains at 255, so the background turns blue.
ChannelShuffle
Function: Randomly rearrange the channels of the input image.
# source code
class ChannelShuffle(ImageOnlyTransform):
    """Randomly rearrange channels of the input RGB image.
    Args:
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, float32
    """

    @property
    def targets_as_params(self):
        return ["image"]

    def apply(self, img, channels_shuffled=(0, 1, 2), **params):
        return F.channel_shuffle(img, channels_shuffled)

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        ch_arr = list(range(img.shape[2]))
        random.shuffle(ch_arr)  # generate a random channel order
        return {"channels_shuffled": ch_arr}

    def get_transform_init_args_names(self):
        return ()

####################### F.channel_shuffle
def channel_shuffle(img, channels_shuffled):
    img = img[..., channels_shuffled]
    return img
Upper right: OpenCV reads images in BGR order. channels_shuffled=[0,2,1] swaps the G and R channels, so green and red are interchanged in the image.
Lower right: channels_shuffled=[1,0,2] swaps the B and G channels, so blue and green are interchanged in the image.
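The two captions can be reproduced on a single pixel. The sketch below is a pure-Python stand-in for the NumPy indexing img[..., channels_shuffled]; shuffle_channels is a hypothetical helper name and the pixel values are made up.

```python
import random


def shuffle_channels(img, channels_shuffled):
    # Pure-Python stand-in for img[..., channels_shuffled].
    return [[[px[c] for c in channels_shuffled] for px in row] for row in img]


pixel_bgr = [[[10, 20, 30]]]  # one pixel: B=10, G=20, R=30
print(shuffle_channels(pixel_bgr, [0, 2, 1]))  # [[[10, 30, 20]]]: G and R swapped
print(shuffle_channels(pixel_bgr, [1, 0, 2]))  # [[[20, 10, 30]]]: B and G swapped

order = list(range(3))
random.shuffle(order)  # the same way the transform draws a random channel order
print(order)
```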
ColorJitter
Function: Randomly change the brightness, contrast, saturation, and hue of the image (each parameter is the jitter amplitude).
Compared to ColorJitter from torchvision, this transform gives slightly different results because Pillow (used in torchvision) and OpenCV (used in Albumentations) convert an image to HSV format with different formulas. Another difference is that Pillow uses uint8 overflow, whereas Albumentations uses value saturation.
Parameters (see the __check_values function in the source code below for details):
- Input form: brightness, contrast, saturation, and hue each accept a single number or a range (float, or tuple/list of two floats (min, max)). A single number is converted internally into a range, and a range given directly must lie within the parameter's valid interval.
- Input requirements:
  brightness, contrast, saturation: $\text{float} \in [0, +\infty)$, or a tuple (list) with both values in $[0, +\infty)$
  hue: $\text{float} \in [0, 0.5]$, or a tuple (list) with both values in $[-0.5, 0.5]$
- Valid intervals:
  brightness, contrast, saturation: [0, +inf)
  hue: [-0.5, 0.5]
- Internal logic for converting a single number into a range:
  brightness, contrast, saturation: [max(0, 1 - input_value), 1 + input_value]
  hue: [-input_value, +input_value]
Application (see the get_params function in the source code below for details):
- Each transformation factor is drawn with random.uniform over the processed parameter range.
- The four adjustments are applied in random order.
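The single-number conversion rule is easy to check with a few values. The sketch below is a simplified stand-in for that rule only; to_interval is a hypothetical name, and range inputs and bounds checks are left out.

```python
def to_interval(value, offset=1, clip=True):
    """Hypothetical helper following the single-number rule described above."""
    if value < 0:
        raise ValueError("A single number must be non negative.")
    interval = [offset - value, offset + value]
    if clip:
        # brightness, contrast and saturation never go below 0
        interval[0] = max(interval[0], 0)
    return interval


print(to_interval(0.2))                        # about [0.8, 1.2]
print(to_interval(1.5))                        # [0, 2.5]: lower bound clipped at 0
print(to_interval(0.2, offset=0, clip=False))  # [-0.2, 0.2]: hue is centered on 0, not clipped
```

The offset of 1 reflects that brightness, contrast, and saturation are multiplicative factors, while hue is an additive shift centered on 0.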
# source code
class ColorJitter(ImageOnlyTransform):
    """Randomly changes the brightness, contrast, and saturation of an image. Compared to ColorJitter from torchvision,
    this transform gives a little bit different results because Pillow (used in torchvision) and OpenCV (used in
    Albumentations) transform an image to HSV format by different formulas. Another difference - Pillow uses uint8
    overflow, but we use value saturation.
    Args:
        brightness (float or tuple of float (min, max)): How much to jitter brightness.
            brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]
            or the given [min, max]. Should be non negative numbers.
        contrast (float or tuple of float (min, max)): How much to jitter contrast.
            contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]
            or the given [min, max]. Should be non negative numbers.
        saturation (float or tuple of float (min, max)): How much to jitter saturation.
            saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]
            or the given [min, max]. Should be non negative numbers.
        hue (float or tuple of float (min, max)): How much to jitter hue.
            hue_factor is chosen uniformly from [-hue, hue] or the given [min, max].
            Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
    """

    def __init__(
        self,
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        hue=0.2,
        always_apply=False,
        p=0.5,
    ):
        super(ColorJitter, self).__init__(always_apply=always_apply, p=p)
        self.brightness = self.__check_values(brightness, "brightness")
        self.contrast = self.__check_values(contrast, "contrast")
        self.saturation = self.__check_values(saturation, "saturation")
        # hue is initialized with a different offset and bounds from the three parameters above
        self.hue = self.__check_values(hue, "hue", offset=0, bounds=[-0.5, 0.5], clip=False)

    @staticmethod
    # Process the input parameter; it must fall within the parameter's valid interval
    def __check_values(value, name, offset=1, bounds=(0, float("inf")), clip=True):
        if isinstance(value, numbers.Number):  # internal logic for converting a number into a range
            if value < 0:  # a single-number input must not be negative
                raise ValueError("If {} is a single number, it must be non negative.".format(name))
            value = [offset - value, offset + value]
            if clip:  # hue is not clipped; the other three parameters are
                value[0] = max(value[0], 0)
        elif isinstance(value, (tuple, list)) and len(value) == 2:
            if not bounds[0] <= value[0] <= value[1] <= bounds[1]:  # a range input must lie within the valid interval
                raise ValueError("{} values should be between {}".format(name, bounds))
        else:
            raise TypeError("{} should be a single number or a list/tuple with length 2.".format(name))
        return value

    def get_params(self):
        brightness = random.uniform(self.brightness[0], self.brightness[1])
        contrast = random.uniform(self.contrast[0], self.contrast[1])
        saturation = random.uniform(self.saturation[0], self.saturation[1])
        hue = random.uniform(self.hue[0], self.hue[1])
        transforms = [
            lambda x: F.adjust_brightness_torchvision(x, brightness),
            lambda x: F.adjust_contrast_torchvision(x, contrast),
            lambda x: F.adjust_saturation_torchvision(x, saturation),
            lambda x: F.adjust_hue_torchvision(x, hue),
        ]
        random.shuffle(transforms)  # the adjustments are applied in random order
        return {"transforms": transforms}

    def apply(self, img, transforms=(), **params):
        if not F.is_rgb_image(img) and not F.is_grayscale_image(img):  # only 1-channel and 3-channel images are supported
            raise TypeError("ColorJitter transformation expects 1-channel or 3-channel images.")
        for transform in transforms:
            img = transform(img)
        return img

    def get_transform_init_args_names(self):
        return ("brightness", "contrast", "saturation", "hue")
Note that the factor shown on each result image below is the value passed to the corresponding adjustment function, not the ColorJitter parameter itself. The relationship between the two is described in the parameter section above.
Brightness change:
Parameter influence: the larger the factor, the brighter the image, and vice versa.
Logic: clip(img_value * factor)
# Contents of F.adjust_brightness_torchvision
def _adjust_brightness_torchvision_uint8(img, factor):
    lut = np.arange(0, 256) * factor
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return cv2.LUT(img, lut)

@preserve_shape
def adjust_brightness_torchvision(img, factor):
    if factor == 0:
        return np.zeros_like(img)
    elif factor == 1:
        return img
    if img.dtype == np.uint8:
        return _adjust_brightness_torchvision_uint8(img, factor)
    return clip(img * factor, img.dtype, MAX_VALUES_BY_DTYPE[img.dtype])
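For uint8 images the brightness change is a lookup table with one entry per possible pixel value. The sketch below rebuilds that table with plain Python lists instead of NumPy; brightness_lut is a hypothetical name.

```python
def brightness_lut(factor):
    # One entry per uint8 value: scale, clip to [0, 255], truncate to int.
    return [int(min(255, max(0, v * factor))) for v in range(256)]


lut = brightness_lut(1.5)
print(lut[100], lut[200])        # 150 255: bright values saturate at 255
print(brightness_lut(0.5)[200])  # 100: a factor below 1 darkens
```

Saturating at 255 instead of wrapping around is the "value saturation" behavior mentioned in the ColorJitter description.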
Contrast changes:
parameter influence: the smaller the factor, the smaller the contrast between light and dark in the image; the larger the factor, the greater the contrast between light and dark in the image.
logic:clip(img_value * factor + mean * (1 - factor))
# contents of F.adjust_contrast_torchvision
def _adjust_contrast_torchvision_uint8(img, factor, mean):
    lut = np.arange(0, 256) * factor
    lut = lut + mean * (1 - factor)
    lut = clip(lut, img.dtype, 255)
    return cv2.LUT(img, lut)
@preserve_shape
def adjust_contrast_torchvision(img, factor):
    if factor == 1:
        return img
    if is_grayscale_image(img):
        mean = img.mean()
    else:
        mean = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY).mean()
    if factor == 0:
        return np.full_like(img, int(mean + 0.5), dtype=img.dtype)
    if img.dtype == np.uint8:
        return _adjust_contrast_torchvision_uint8(img, factor, mean)
    return clip(
        img.astype(np.float32) * factor + mean * (1 - factor),
        img.dtype,
        MAX_VALUES_BY_DTYPE[img.dtype],
    )
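To make the contrast formula concrete, here is a minimal NumPy-only sketch on a tiny grayscale array. The helper name adjust_contrast_sketch is made up for this illustration, and it simplifies the library function (no LUT, no special cases for factor 0 or 1).

```python
import numpy as np

def adjust_contrast_sketch(img, factor):
    """Blend every pixel with the image mean: img * factor + mean * (1 - factor)."""
    mean = img.mean()
    out = img.astype(np.float32) * factor + mean * (1 - factor)
    return np.clip(out, 0, 255).astype(np.uint8)

gray = np.array([[50, 100], [150, 200]], dtype=np.uint8)  # mean is 125

flat = adjust_contrast_sketch(gray, 0.0)    # every pixel collapses to the mean
double = adjust_contrast_sketch(gray, 2.0)  # values move away from the mean, then clip
print(flat)
print(double)
```

With factor 0 all pixels become the mean (no contrast left); with factor 2 the distance of each pixel from the mean doubles and values outside [0, 255] are clipped.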
Saturation change:
Parameter influence: the smaller the factor, the closer the image is to grayscale; the larger the factor, the more vivid the colors.
Logic: clip(img * factor + gray * (1 - factor)), a weighted blend of the original image and its grayscale version.
# contents of F.adjust_saturation_torchvision
@preserve_shape
def adjust_saturation_torchvision(img, factor, gamma=0):
    if factor == 1:
        return img
    if is_grayscale_image(img):
        gray = img
        return gray
    else:
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        gray = cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)  # all three channels hold the same value, which makes the weighted blend with the original easy
    if factor == 0:
        return gray
    # cv2.addWeighted: weighted blend of two images
    # result = img * factor + gray * (1 - factor) + gamma
    result = cv2.addWeighted(img, factor, gray, 1 - factor, gamma=gamma)
    if img.dtype == np.uint8:
        return result
    # OpenCV does not clip values for float dtype
    return clip(result, img.dtype, MAX_VALUES_BY_DTYPE[img.dtype])
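The blend can be reproduced without OpenCV. The sketch below is an illustration only: adjust_saturation_sketch is a hypothetical helper, and the grayscale weights (0.299, 0.587, 0.114) are the usual luma weights, assumed here to stand in for cv2.COLOR_RGB2GRAY.

```python
import numpy as np

def adjust_saturation_sketch(img, factor):
    """Blend an RGB image with its grayscale version: img * factor + gray * (1 - factor)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # assumed luma weights
    gray = (img.astype(np.float32) @ weights)[..., None]        # shape (H, W, 1)
    out = img.astype(np.float32) * factor + gray * (1 - factor)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

reddish = np.array([[[200, 50, 50]]], dtype=np.uint8)

as_gray = adjust_saturation_sketch(reddish, 0.0)  # all channels equal the gray value
boosted = adjust_saturation_sketch(reddish, 2.0)  # channels pushed away from gray
print(as_gray, boosted)
```

A factor below 1 pulls every channel toward the gray value, a factor above 1 pushes it away, which is why large factors make colors more vivid.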
Hue change:
Parameter influence: the further the factor is from 0, the stronger the hue shift. With factor=0 the hue is unchanged.
Logic: convert the image to the HSV color space, apply np.mod(hue_value + factor * 180, 180) to the hue channel, then convert back to the RGB color space.
# contents of F.adjust_hue_torchvision
def _adjust_hue_torchvision_uint8(img, factor):
    img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    lut = np.arange(0, 256, dtype=np.int16)
    lut = np.mod(lut + 180 * factor, 180).astype(np.uint8)
    img[..., 0] = cv2.LUT(img[..., 0], lut)
    return cv2.cvtColor(img, cv2.COLOR_HSV2RGB)
def adjust_hue_torchvision(img, factor):
    if is_grayscale_image(img):
        return img
    if factor == 0:
        return img
    if img.dtype == np.uint8:
        return _adjust_hue_torchvision_uint8(img, factor)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    img[..., 0] = np.mod(img[..., 0] + factor * 360, 360)
    return cv2.cvtColor(img, cv2.COLOR_HSV2RGB)
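The wrap-around of the hue channel is the only non-obvious step, so here is a small sketch of just that step. It assumes OpenCV's 8-bit convention in which hue lies in [0, 180); shift_hue_uint8 is a name invented for this example.

```python
import numpy as np

def shift_hue_uint8(hue, factor):
    """Shift 8-bit hue values (range [0, 180)) by factor * 180 and wrap around."""
    return np.mod(hue.astype(np.int16) + 180 * factor, 180).astype(np.uint8)

hue = np.array([0, 60, 170], dtype=np.uint8)

forward = shift_hue_uint8(hue, 0.25)    # +45, 170 wraps past 180
backward = shift_hue_uint8(hue, -0.25)  # -45, 0 wraps below 0
print(forward, backward)
```

Because hue is circular, a shift never saturates: values leaving one end of the range re-enter at the other.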
Additional reading
What is the difference between contrast and saturation?
Contrast is the ratio of the highest brightness to the lowest brightness in an image; saturation is the purity of its colors. The two differ in what they describe, in their characteristics, and in their effect on the image:
1. What they describe
- Contrast: the ratio of the highest brightness to the lowest brightness. The higher the contrast, the more obvious the difference between light and dark areas of the image.
- Saturation: the purity of the color. Pure blue, pure red, and pure green are highly saturated, while gray-blue, rose red, and grass green are less saturated. The higher the saturation, the more vivid the colors of the image.
2. Characteristics
- Contrast: the wider the range of brightness differences in the image, the greater the contrast, and vice versa. At a contrast ratio of 120:1, vivid and rich colors can be displayed easily; at 300:1, all levels of color can be supported.
- Saturation: saturation depends on the ratio of the chromatic component to the achromatic (gray) component of a color. The larger the chromatic component, the higher the saturation; the larger the achromatic component, the lower the saturation.
3. Effect
- Contrast: the greater the contrast, the clearer and more eye-catching the image and the more vivid its colors; with low contrast the whole picture looks gray. High contrast helps image clarity, detail, and the rendering of gray levels.
- Saturation: saturation (chroma) depends on the intensity of the light and on how that intensity is distributed across wavelengths. The highest saturation is generally reached by strong light of a single wavelength; with the wavelength distribution unchanged, weaker light gives lower saturation.
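The two notions can be measured separately. The following standard-library sketch is only an illustration: saturation is read from the HSV representation, and contrast_ratio is a simplified brightest-to-darkest luminance ratio (both helper names and the luma weights are assumptions of this example).

```python
import colorsys

def saturation(rgb):
    """HSV saturation of an 8-bit RGB color, in [0, 1]."""
    r, g, b = (c / 255 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[1]

def contrast_ratio(pixels):
    """Ratio of the brightest to the darkest luminance among the given RGB pixels."""
    lum = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    return max(lum) / max(min(lum), 1e-6)  # guard against division by zero

pure_red = (255, 0, 0)
gray_blue = (128, 128, 153)
print(saturation(pure_red), saturation(gray_blue))

low = contrast_ratio([(100, 100, 100), (150, 150, 150)])
high = contrast_ratio([(10, 10, 10), (250, 250, 250)])
print(low, high)
```

Pure red has saturation 1.0 while gray-blue is close to 0, independent of how bright or dark the pixels are; the contrast ratio, in turn, only looks at brightness and ignores color purity.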
Function: Image defocus
Parameters: radius > 0, the defocus radius. A single number is converted to the range [1, radius]. Default range: [3, 10].
alias_blur >= 0, the sigma parameter of the Gaussian blur. A single number is converted to the range [0, alias_blur]. Default range: [0.1, 0.5].
Parameter influence: the larger the radius, the stronger the defocus. Changes in alias_blur are barely perceptible to the naked eye.
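The conversion of a single number into a range, and the sampling that follows, can be sketched in plain Python. scalar_to_range is a hypothetical stand-in for the library's to_tuple(value, low=...) helper, written from the behavior described above rather than copied from the library.

```python
import random

def scalar_to_range(value, low):
    """Hypothetical stand-in for to_tuple(value, low=low): a scalar becomes (low, value)."""
    if isinstance(value, (int, float)):
        return (min(low, value), max(low, value))
    return tuple(value)

def sample_defocus_params(radius=(3, 10), alias_blur=(0.1, 0.5), rng=random):
    """Validate the ranges and draw one (radius, alias_blur) pair."""
    r_lo, r_hi = scalar_to_range(radius, low=1)
    a_lo, a_hi = scalar_to_range(alias_blur, low=0)
    if r_lo <= 0:
        raise ValueError("Parameter radius must be positive")
    if a_lo < 0:
        raise ValueError("Parameter alias_blur must be non-negative")
    # random.randint includes both ends, matching randint(lo, hi + 1) with an exclusive upper bound
    return {"radius": rng.randint(r_lo, r_hi), "alias_blur": rng.uniform(a_lo, a_hi)}

params = sample_defocus_params(radius=5, alias_blur=0.3)
print(params)
```

Passing radius=5 therefore means "any integer radius from 1 to 5", not "exactly 5"; to fix a value, pass a range with equal ends such as (5, 5).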
# source code
class Defocus(ImageOnlyTransform):
    """
    Apply defocus transform. See https://arxiv.org/abs/1903.12261.
    Args:
        radius ((int, int) or int): range for radius of defocusing.
            If limit is a single int, the range will be [1, limit]. Default: (3, 10).
        alias_blur ((float, float) or float): range for alias_blur of defocusing (sigma of gaussian blur).
            If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        Any
    """
    def __init__(
        self,
        radius: ScaleIntType = (3, 10),
        alias_blur: ScaleFloatType = (0.1, 0.5),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.radius = to_tuple(radius, low=1)
        self.alias_blur = to_tuple(alias_blur, low=0)
        if self.radius[0] <= 0:
            raise ValueError("Parameter radius must be positive")
        if self.alias_blur[0] < 0:
            raise ValueError("Parameter alias_blur must be non-negative")
    def apply(self, img: np.ndarray, radius: int = 3, alias_blur: float = 0.5, **params) -> np.ndarray:
        return F.defocus(img, radius, alias_blur)
    def get_params(self) -> Dict[str, Any]:
        return {
            "radius": random_utils.randint(self.radius[0], self.radius[1] + 1),
            "alias_blur": random_utils.uniform(self.alias_blur[0], self.alias_blur[1]),
        }
    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("radius", "alias_blur")
radius parameter changes:
alias_blur parameter changes:
Function: Reduce image quality by downsampling and then upsampling again. The image size is the same before and after the transformation.
Parameters: scale_min and scale_max, with 0 < scale_min <= scale_max < 1, give the range of the scaling factor. This is equivalent to the scale factor passed to a resize function.
interpolation specifies the scaling method; the default is nearest neighbor, cv2.INTER_NEAREST. It can be given in three ways, see the Args description in the source code below.
# interpolation parameter examples:
# Option 1: use NEAREST for both downsampling and upsampling
interpolation = cv2.INTER_NEAREST
# Option 2: nearest neighbor interpolation for downsampling, bilinear interpolation for upsampling
interpolation = dict(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_LINEAR)
# Option 3: AREA for downsampling, CUBIC for upsampling
interpolation = Downscale.Interpolation(downscale=cv2.INTER_AREA, upscale=cv2.INTER_CUBIC)
interpolation options:
- INTER_NEAREST: nearest neighbor interpolation
- INTER_LINEAR: bilinear interpolation (the default of cv2.resize; Downscale itself defaults to INTER_NEAREST)
- INTER_AREA: resampling using the pixel area relation. It is often the preferred method for image downsampling because it gives moiré-free results, but when the image is upsampled it behaves like INTER_NEAREST.
- INTER_CUBIC: bicubic interpolation over a 4x4 pixel neighborhood
- INTER_LANCZOS4: Lanczos interpolation over an 8x8 pixel neighborhood
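The downsample-then-upsample idea can be sketched with NumPy indexing alone. This is an illustration under simplifying assumptions: resize_nearest is a hypothetical nearest-neighbor resize whose index convention may differ from cv2.resize, and downscale_sketch is not the library's F.downscale.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor resize by picking source rows and columns (simplified convention)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows[:, None], cols]

def downscale_sketch(img, scale):
    """Shrink by `scale`, then enlarge back to the original size."""
    h, w = img.shape[:2]
    small_h, small_w = max(1, int(h * scale)), max(1, int(w * scale))
    small = resize_nearest(img, small_h, small_w)
    return resize_nearest(small, h, w)

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
degraded = downscale_sketch(img, 0.25)
print(degraded)
```

The output has the original 8x8 shape but only four distinct values, one per 4x4 block: the detail discarded during downsampling cannot be recovered by upsampling, which is exactly the quality loss the transform is meant to simulate.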
# source code
class Downscale(ImageOnlyTransform):
    """Decreases image quality by downscaling and upscaling back.
    Args:
        scale_min (float): lower bound on the image scale. Should be < 1.
        scale_max (float): upper bound on the image scale. Should be < 1.
        interpolation: cv2 interpolation method. Could be:
            - single cv2 interpolation flag - selected method will be used for downscale and upscale.
            - dict(downscale=flag, upscale=flag)
            - Downscale.Interpolation(downscale=flag, upscale=flag) -
            Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)
    Targets:
        image
    Image types:
        uint8, float32
    """
    class Interpolation:
        def __init__(self, *, downscale: int = cv2.INTER_NEAREST, upscale: int = cv2.INTER_NEAREST):
            self.downscale = downscale
            self.upscale = upscale
    def __init__(
        self,
        scale_min: float = 0.25,
        scale_max: float = 0.25,
        interpolation: Optional[Union[int, Interpolation, Dict[str, int]]] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(Downscale, self).__init__(always_apply, p)
        if interpolation is None:
            self.interpolation = self.Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)
            warnings.warn(
                "Using default interpolation INTER_NEAREST, which is sub-optimal."
                "Please specify interpolation mode for downscale and upscale explicitly."
                "For additional information see this PR https://github.com/albumentations-team/albumentations/pull/584"
            )
        elif isinstance(interpolation, int):
            self.interpolation = self.Interpolation(downscale=interpolation, upscale=interpolation)
        elif isinstance(interpolation, self.Interpolation):
            self.interpolation = interpolation
        elif isinstance(interpolation, dict):
            self.interpolation = self.Interpolation(**interpolation)
        else:
            raise ValueError(
                "Wrong interpolation data type. Supported types: `Optional[Union[int, Interpolation, Dict[str, int]]]`."
                f" Got: {type(interpolation)}"
            )
        if scale_min > scale_max:
            raise ValueError("Expected scale_min be less or equal scale_max, got {} {}".format(scale_min, scale_max))
        if scale_max >= 1:
            raise ValueError("Expected scale_max to be less than 1, got {}".format(scale_max))
        self.scale_min = scale_min
        self.scale_max = scale_max
    def apply(self, img: np.ndarray, scale: Optional[float] = None, **params) -> np.ndarray:
        return F.downscale(
            img,
            scale=scale,
            down_interpolation=self.interpolation.downscale,
            up_interpolation=self.interpolation.upscale,
        )
    def get_params(self) -> Dict[str, Any]:
        return {"scale": random.uniform(self.scale_min, self.scale_max)}
    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return "scale_min", "scale_max"
    def _to_dict(self) -> Dict[str, Any]:
        result = super()._to_dict()
        result["interpolation"] = {"upscale": self.interpolation.upscale, "downscale": self.interpolation.downscale}
        return result
To facilitate visualization, scale is set to 0.1. The following are the results of initializing and specifying different interpolation methods in three ways:
# demo code
import cv2
import matplotlib.pyplot as plt
import albumentations as A
if __name__ == "__main__":
    filename = '0'
    title_key = 'scale_method'
    src_img = cv2.imread(f'imgs/{filename}.jpg')
    dst_path = f'imgs/{filename}_aug.jpg'
    transform1 = A.Downscale(scale_min=0.1,
                             scale_max=0.1,
                             interpolation=cv2.INTER_NEAREST,
                             p=1)
    transform2 = A.Downscale(scale_min=0.1,
                             scale_max=0.1,
                             interpolation=dict(downscale=cv2.INTER_LINEAR,
                                                upscale=cv2.INTER_LINEAR),
                             p=1)
    transform3 = A.Downscale(scale_min=0.1,
                             scale_max=0.1,
                             interpolation=A.Downscale.Interpolation(
                                 downscale=cv2.INTER_AREA,
                                 upscale=cv2.INTER_AREA),
                             p=1)
    img_aug1 = transform1(image=src_img)['image']
    img_aug2 = transform2(image=src_img)['image']
    img_aug3 = transform3(image=src_img)['image']
    param1 = 'INTER_NEAREST'
    param2 = 'INTER_LINEAR'
    param3 = 'INTER_AREA'
    fontsize = 10
    # cv2.imread returns BGR, so the channel order is reversed for matplotlib
    plt.subplot(221)
    plt.axis('off')
    plt.title('src', fontdict={'fontsize': fontsize})
    plt.imshow(src_img[:, :, ::-1])
    plt.subplot(222)
    plt.axis('off')
    plt.title(f'{title_key}={param1}', fontdict={'fontsize': fontsize})
    plt.imshow(img_aug1[:, :, ::-1])
    plt.subplot(223)
    plt.axis('off')
    plt.title(f'{title_key}={param2}', fontdict={'fontsize': fontsize})
    plt.imshow(img_aug2[:, :, ::-1])
    plt.subplot(224)
    plt.axis('off')
    plt.title(f'{title_key}={param3}', fontdict={'fontsize': fontsize})
    plt.imshow(img_aug3[:, :, ::-1])
    plt.savefig(dst_path)
Function: Overlay a relief (emboss) effect
Parameter description:
alpha ((float, float)): controls the visibility of the embossed image. At 0 only the original image is kept; at 1.0 only the embossed image is kept.
result = (1 - alpha) * src_image + alpha * emboss_image
strength ((float, float)): strength of the relief effect.
The alpha parameter has a greater impact on the result than the strength parameter.
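Because convolution is linear, blending the two images is the same as blending the two kernels, which is how the source code further down builds its matrix. The sketch below rebuilds that 3x3 kernel with NumPy; emboss_kernel is a name chosen for this example.

```python
import numpy as np

def emboss_kernel(alpha, strength):
    """Blend the identity kernel with the emboss kernel: (1 - alpha) * I + alpha * E."""
    identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
    effect = np.array(
        [
            [-1 - strength, -strength, 0],
            [-strength, 1, strength],
            [0, strength, 1 + strength],
        ],
        dtype=np.float32,
    )
    return (1 - alpha) * identity + alpha * effect

no_effect = emboss_kernel(alpha=0.0, strength=0.7)
typical = emboss_kernel(alpha=0.3, strength=0.7)
print(typical, typical.sum())
```

With alpha=0 the kernel is the identity, so the image is unchanged. For any alpha and strength the kernel entries sum to 1, so flat regions keep their brightness and only edges are altered.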
# source code
class Emboss(ImageOnlyTransform):
    """Emboss the input image and overlays the result with the original image.
    Args:
        alpha ((float, float)): range to choose the visibility of the embossed image. At 0, only the original image is
            visible, at 1.0 only its embossed version is visible. Default: (0.2, 0.5).
        strength ((float, float)): strength range of the embossing. Default: (0.2, 0.7).
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    """
    def __init__(self, alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=False, p=0.5):
        super(Emboss, self).__init__(always_apply, p)
        self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
        self.strength = self.__check_values(to_tuple(strength, 0.0), name="strength")
    @staticmethod
    def __check_values(value, name, bounds=(0, float("inf"))):
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError("{} values should be between {}".format(name, bounds))
        return value
    @staticmethod
    def __generate_emboss_matrix(alpha_sample, strength_sample):
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [
                [-1 - strength_sample, 0 - strength_sample, 0],
                [0 - strength_sample, 1, 0 + strength_sample],
                [0, 0 + strength_sample, 1 + strength_sample],
            ],
            dtype=np.float32,
        )
        matrix = (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect
        return matrix
    def get_params(self):
        alpha = random.uniform(*self.alpha)
        strength = random.uniform(*self.strength)
        emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
        return {"emboss_matrix": emboss_matrix}
    def apply(self, img, emboss_matrix=None, **params):
        return F.convolve(img, emboss_matrix)  # convolution
    def get_transform_init_args_names(self):
        return ("alpha", "strength")
The following is a comparison of the visualization results. The effect of the alpha parameter is more obvious than that of the strength parameter.
Function: Histogram equalization
Parameter description: mode (str): {'cv', 'pil'}. Selects the OpenCV or the Pillow equalization method.
by_channels (bool): if True, histogram equalization is applied to each channel separately; if False, the image is converted to YCbCr and only the Y channel is equalized. Default: True
mask (np.ndarray, callable): if given, only the pixels selected by the mask are included in the analysis. It may be a 1-channel or 3-channel array, or a callable whose signature includes an `image` argument.
mask_params (list of str): parameters for the mask function.
Note: with by_channels=False the effect looks more natural and the hue shifts less.
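For readers who want to see what equalization does to the pixel values, here is a textbook CDF-based sketch for one uint8 channel. It is an illustration of the general technique, not the OpenCV or Pillow implementation selected by mode, and the function names are invented here.

```python
import numpy as np

def equalize_channel(channel):
    """Textbook histogram equalization of a single uint8 channel via its CDF."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]           # CDF value at the darkest level that occurs
    denom = channel.size - cdf_min
    if denom == 0:                       # constant image: nothing to spread out
        return channel.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[channel]

def equalize_by_channels(img):
    """Apply the equalization to every channel independently (the by_channels=True idea)."""
    return np.stack([equalize_channel(img[..., c]) for c in range(img.shape[-1])], axis=-1)

low_contrast = np.array([[100, 110], [120, 130]], dtype=np.uint8)
stretched = equalize_channel(low_contrast)
print(stretched)
```

The four gray levels that were packed into [100, 130] are spread over the full [0, 255] range. Equalizing R, G, and B independently changes their ratios, which is the source of the hue shifts mentioned in the note above.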
# source code
class Equalize(ImageOnlyTransform):
    """Equalize the image histogram.
    Args:
        mode (str): {'cv', 'pil'}. Use OpenCV or Pillow equalization method.
        by_channels (bool): If True, use equalization by channels separately,
            else convert image to YCbCr representation and use equalization by `Y` channel.
        mask (np.ndarray, callable): If given, only the pixels selected by
            the mask are included in the analysis. Maybe 1 channel or 3 channel array or callable.
            Function signature must include `image` argument.
        mask_params (list of str): Params for mask function.
    Targets:
        image
    Image types:
        uint8
    """
    def __init__(
        self,
        mode="cv",
        by_channels=True,
        mask=None,
        mask_params=(),
        always_apply=False,
        p=0.5,
    ):
        modes = ["cv", "pil"]
        if mode not in modes:
            raise ValueError("Unsupported equalization mode. Supports: {}. "
                             "Got: {}".format(modes, mode))
        super(Equalize, self).__init__(always_apply, p)
        self.mode = mode
        self.by_channels = by_channels
        self.mask = mask
        self.mask_params = mask_params
    def apply(self, image, mask=None, **params):
        return F.equalize(image,
                          mode=self.mode,
                          by_channels=self.by_channels,
                          mask=mask)
    def get_params_dependent_on_targets(self, params):
        if not callable(self.mask):
            return {"mask": self.mask}
        return {"mask": self.mask(**params)}
    @property
    def targets_as_params(self):
        return ["image"] + list(self.mask_params)
    def get_transform_init_args_names(self):
        return ("mode", "by_channels")
Function: Fourier Domain Adaptation (from https://github.com/YanchaoYang/FDA), a simple form of style transfer
Parameter description:
reference_images (List[str] or List(np.ndarray)): list of reference images or list of paths to reference images. If more than one reference image is provided, one of them is chosen at random for each transformation.
beta_limit (float or tuple of float): the coefficient beta from the paper. Values below 0.3 are recommended; the default is 0.1.
read_fn (Callable): function used to read an image; it should return a numpy array. Default: read_rgb_image.
# Default image-reading function; with it, reference_images should be a list of paths:
def read_rgb_image(path):
    image = cv2.imread(path, cv2.IMREAD_COLOR)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# If the reference images are already numpy arrays, read_fn can simply be the identity (lambda x: x):
target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
aug = A.FDA([target_image], read_fn=lambda x: x)
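The idea behind FDA is to swap the low-frequency part of the amplitude spectrum while keeping the phase of the source image. The NumPy sketch below follows that description; it is written for illustration (fda_sketch is a hypothetical name, and the exact window handling is an assumption) and is not the library's fourier_domain_adaptation function.

```python
import numpy as np

def fda_sketch(src, tgt, beta=0.1):
    """Replace the low-frequency amplitude of src with that of tgt, keeping the phase of src."""
    src = src.astype(np.float64)
    tgt = tgt.astype(np.float64)
    fft_src = np.fft.fft2(src, axes=(0, 1))
    fft_tgt = np.fft.fft2(tgt, axes=(0, 1))
    # Shift the zero frequency to the center so the low frequencies form a central square
    amp_src = np.fft.fftshift(np.abs(fft_src), axes=(0, 1))
    amp_tgt = np.fft.fftshift(np.abs(fft_tgt), axes=(0, 1))
    phase_src = np.angle(fft_src)
    h, w = src.shape[:2]
    border = int(np.floor(min(h, w) * beta))  # half-size of the swapped square
    cy, cx = h // 2, w // 2
    y1, y2 = cy - border, cy + border + 1
    x1, x2 = cx - border, cx + border + 1
    amp_src[y1:y2, x1:x2] = amp_tgt[y1:y2, x1:x2]
    amp_src = np.fft.ifftshift(amp_src, axes=(0, 1))
    mixed = amp_src * np.exp(1j * phase_src)
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))

rng = np.random.default_rng(0)
source = rng.integers(0, 256, (32, 32, 3)).astype(np.uint8)
target = rng.integers(0, 256, (32, 32, 3)).astype(np.uint8)
stylized = fda_sketch(source, target, beta=0.1)
unchanged = fda_sketch(source, source, beta=0.1)
print(stylized.shape)
```

A larger beta swaps a larger square of frequencies and therefore transfers more of the target's global appearance, at the cost of visible artifacts; this matches the recommendation to keep beta below 0.3. The result is a float array and would still need clipping and casting before being used as a uint8 image.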
class FDA(ImageOnlyTransform):
    """
    Fourier Domain Adaptation from https://github.com/YanchaoYang/FDA
    Simple "style transfer".
    Args:
        reference_images (List[str] or List(np.ndarray)): List of file paths for reference images
            or list of reference images.
        beta_limit (float or tuple of float): coefficient beta from paper. Recommended less 0.3.
        read_fn (Callable): User-defined function to read image. Function should get image path and return numpy
            array of image pixels.
    Targets:
        image
    Image types:
        uint8, float32
    Reference:
        https://github.com/YanchaoYang/FDA
        https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf
    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)
    """
    def __init__(
        self,
        reference_images: List[Union[str, np.ndarray]],
        beta_limit=0.1,
        read_fn=read_rgb_image,
        always_apply=False,
        p=0.5,
    ):
        super(FDA, self).__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.beta_limit = to_tuple(beta_limit, low=0)
    def apply(self, img, target_image=None, beta=0.1, **params):
        return fourier_domain_adaptation(img=img, target_img=target_image, beta=beta)
    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        target_img = self.read_fn(random.choice(self.reference_images))
        target_img = cv2.resize(target_img, dsize=(img.shape[1], img.shape[0]))
        return {"target_image": target_img}
    def get_params(self):
        return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}
    @property
    def targets_as_params(self):
        return ["image"]
    def get_transform_init_args_names(self):
        return ("reference_images", "beta_limit", "read_fn")
    def _to_dict(self):
        raise NotImplementedError("FDA can not be serialized.")
Results of running with existing images ( beta_limit=0.1 ):
Results in the official project:
Function: Color augmentation of RGB images via FancyPCA. FancyPCA causes little color distortion.
Parameter description:
alpha (float): how strongly the eigenvalues and eigenvectors are perturbed. The actual scale is sampled from a Gaussian distribution (mu=0, sigma=alpha).
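The PCA step can be sketched in a few lines of NumPy: compute the covariance of the RGB values, take its eigen-decomposition, and add a multiple of the principal components to every pixel. This is a simplified illustration with a single scalar alpha (the already-sampled scale); fancy_pca_sketch is not the library's F.fancy_pca.

```python
import numpy as np

def fancy_pca_sketch(img, alpha):
    """Shift all pixels along the principal color axes of the image, scaled by alpha."""
    pixels = img.reshape(-1, 3).astype(np.float64) / 255.0
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # 3x3 covariance of R, G, B
    eig_vals, eig_vecs = np.linalg.eigh(cov)    # columns of eig_vecs are the principal axes
    shift = eig_vecs @ (alpha * eig_vals)       # one offset per RGB channel
    out = img.astype(np.float64) + shift * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, (8, 8, 3)).astype(np.uint8)
same = fancy_pca_sketch(image, alpha=0.0)
shifted = fancy_pca_sketch(image, alpha=0.1)
print(shifted.dtype, shifted.shape)
```

Every pixel receives the same RGB offset, and that offset follows the color directions that already dominate the image, which is why the result tends to look like a plausible change in lighting rather than a random color cast.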
class FancyPCA(ImageOnlyTransform):
    """Augment RGB image using FancyPCA from Krizhevsky's paper
    "ImageNet Classification with Deep Convolutional Neural Networks"
    Args:
        alpha (float): how much to perturb/scale the eigen vecs and vals.
            scale is samples from gaussian distribution (mu=0, sigma=alpha)
    Targets:
        image
    Image types:
        3-channel uint8 images only
    Credit:
        http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
        https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image
        https://pixelatedbrian.github.io/2018-04-29-fancy_pca/
    """
    def __init__(self, alpha=0.1, always_apply=False, p=0.5):
        super(FancyPCA, self).__init__(always_apply=always_apply, p=p)
        self.alpha = alpha
    def apply(self, img, alpha=0.1, **params):
        img = F.fancy_pca(img, alpha)
        return img
    def get_params(self):
        return {"alpha": random.gauss(0, self.alpha)}
    def get_transform_init_args_names(self):
        return ("alpha", )
Visualization results from https://pixelatedbrian.github.io/2018-04-29-fancy_pca/ are attached below.
They show the transformation results for three scenes. The middle column is the FancyPCA result, where the color distortion is very small.
Function: Multiply the pixel values by the maximum value to convert the image from floating point to integer.
The inverse transform is ToFloat, which divides by the maximum value and converts the image from integer to floating point ([0, 1.0]).
# source code
class FromFloat(ImageOnlyTransform):
    """Take an input array where all values should lie in the range [0, 1.0], multiply them by `max_value` and then
    cast the resulted value to a type specified by `dtype`. If `max_value` is None the transform will try to infer
    the maximum value for the data type from the `dtype` argument.
    This is the inverse transform for :class:`~albumentations.augmentations.transforms.ToFloat`.
    Args:
        max_value (float): maximum possible input value. Default: None.
        dtype (string or numpy data type): data type of the output. See the `'Data types' page from the NumPy docs`_.
            Default: 'uint16'.
        p (float): probability of applying the transform. Default: 1.0.
    Targets:
        image
    Image types:
        float32
    .. _'Data types' page from the NumPy docs:
        https://docs.scipy.org/doc/numpy/user/basics.types.html
    """
    def __init__(self, dtype="uint16", max_value=None, always_apply=False, p=1.0):
        super(FromFloat, self).__init__(always_apply, p)
        self.dtype = np.dtype(dtype)
        self.max_value = max_value
    def apply(self, img, **params):
        return F.from_float(img, self.dtype, self.max_value)
    def get_transform_init_args(self):
        return {"dtype": self.dtype.name, "max_value": self.max_value}
# F.from_float()
def from_float(img, dtype, max_value=None):
    if max_value is None:
        try:
            max_value = MAX_VALUES_BY_DTYPE[dtype]
        except KeyError:
            raise RuntimeError(
                "Can't infer the maximum value for dtype {}. You need to specify the maximum value manually by "
                "passing the max_value argument".format(dtype)
            )
    return (img * max_value).astype(dtype)
# MAX_VALUES_BY_DTYPE = {
#     np.dtype("uint8"): 255,
#     np.dtype("uint16"): 65535,
#     np.dtype("uint32"): 4294967295,
#     np.dtype("float32"): 1.0,
# }
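A small worked example shows what the multiplication and cast do to concrete values. The helpers below are simplified re-statements of the functions above for illustration (from_float_sketch and to_float_sketch are names chosen here, and MAX_VALUES only covers two dtypes).

```python
import numpy as np

MAX_VALUES = {np.dtype("uint8"): 255, np.dtype("uint16"): 65535}

def from_float_sketch(img, dtype, max_value=None):
    """Scale values from [0, 1] up to the integer range and cast (the cast truncates)."""
    dtype = np.dtype(dtype)
    if max_value is None:
        max_value = MAX_VALUES[dtype]
    return (img * max_value).astype(dtype)

def to_float_sketch(img, max_value=None):
    """Inverse direction: divide by the maximum value to get float32 in [0, 1]."""
    if max_value is None:
        max_value = MAX_VALUES[img.dtype]
    return img.astype(np.float32) / max_value

x = np.array([0.0, 0.5, 1.0], dtype=np.float32)
as_u8 = from_float_sketch(x, "uint8")    # 0.5 * 255 = 127.5 is truncated to 127
as_u16 = from_float_sketch(x, "uint16")  # 0.5 * 65535 = 32767.5 is truncated to 32767
print(as_u8, as_u16)
```

Note that astype truncates toward zero rather than rounding, so a ToFloat followed by FromFloat round trip is not guaranteed to reproduce every integer value exactly.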
Function: Add Gaussian noise
parameter description:
var_limit ((float, float) or float): noise variance range. If it is a single float value, it will be converted into an interval range (0, var_limit). Default value: (10.0, 50.0).
mean (float): noise mean. Default value: 0
per_channel (bool): whether each channel is sampled independently. Default value: True
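The effect of per_channel is easiest to see from the shape of the sampled noise. The NumPy sketch below mirrors the sampling logic described by the parameters; the function names and the fixed seed are choices of this example, not part of the library.

```python
import numpy as np

def sample_gauss_noise(shape, var, mean=0.0, per_channel=True, seed=0):
    """Draw Gaussian noise with the given variance, per channel or shared across channels."""
    rng = np.random.default_rng(seed)
    sigma = var ** 0.5                           # the parameter is a variance, not a std
    if per_channel:
        return rng.normal(mean, sigma, shape)
    noise = rng.normal(mean, sigma, shape[:2])   # one value per pixel
    return noise[..., None] if len(shape) == 3 else noise

def add_noise_uint8(img, noise):
    """Add the noise and clip back to the uint8 range."""
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

img = np.full((4, 4, 3), 128, dtype=np.uint8)
shared = sample_gauss_noise(img.shape, var=25.0, per_channel=False)
independent = sample_gauss_noise(img.shape, var=25.0, per_channel=True)
noisy = add_noise_uint8(img, shared)
print(shared.shape, independent.shape)
```

With per_channel=False the noise has a trailing axis of length 1 and is broadcast to all channels, so it only changes brightness (gray grain). With per_channel=True each channel gets its own values, which produces colored speckles.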
# source code
class GaussNoise(ImageOnlyTransform):
    """Apply gaussian noise to the input image.
    Args:
        var_limit ((float, float) or float): variance range for noise. If var_limit is a single float, the range
            will be (0, var_limit). Default: (10.0, 50.0).
        mean (float): mean of the noise. Default: 0
        per_channel (bool): if set to True, noise will be sampled for each channel independently.
            Otherwise, the noise will be sampled once for all channels. Default: True
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, float32
    """
    def __init__(self, var_limit=(10.0, 50.0), mean=0, per_channel=True, always_apply=False, p=0.5):
        super(GaussNoise, self).__init__(always_apply, p)
        if isinstance(var_limit, (tuple, list)):
            if var_limit[0] < 0:
                raise ValueError("Lower var_limit should be non negative.")
            if var_limit[1] < 0:
                raise ValueError("Upper var_limit should be non negative.")
            self.var_limit = var_limit
        elif isinstance(var_limit, (int, float)):
            if var_limit < 0:
                raise ValueError("var_limit should be non negative.")
            self.var_limit = (0, var_limit)
        else:
            raise TypeError(
                "Expected var_limit type to be one of (int, float, tuple, list), got {}".format(type(var_limit))
            )
        self.mean = mean
        self.per_channel = per_channel
    def apply(self, img, gauss=None, **params):
        return F.gauss_noise(img, gauss=gauss)
    def get_params_dependent_on_targets(self, params):
        image = params["image"]
        var = random.uniform(self.var_limit[0], self.var_limit[1])
        sigma = var ** 0.5
        random_state = np.random.RandomState(random.randint(0, 2 ** 32 - 1))
        if self.per_channel:
            gauss = random_state.normal(self.mean, sigma, image.shape)
        else:
            gauss = random_state.normal(self.mean, sigma, image.shape[:2])
            if len(image.shape) == 3:
                gauss = np.expand_dims(gauss, -1)
        return {"gauss": gauss}
    @property
    def targets_as_params(self):
        return ["image"]
    def get_transform_init_args_names(self):
        return ("var_limit", "per_channel", "mean")
The larger the var_limit value, the more obvious the noise.
Function: Blur the image with a Gaussian filter.
Parameter description:
- blur_limit (int, (int, int)): maximum Gaussian kernel size used to blur the image. Must be 0 or an odd number, valid range: [0, inf).
  If it is 0, the kernel size is computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1.
  A single value is converted to the range (0, blur_limit), from which a random value is drawn.
  Default: (3, 7)
- sigma_limit (float, (float, float)): standard deviation of the Gaussian kernel, valid range: [0, inf).
  A single value is converted to the range (0, sigma_limit), from which a random value is drawn.
  If it is 0, sigma is computed from ksize as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8.
  Default: 0
If the lower bounds of blur_limit and sigma_limit are both 0, the lower bound of blur_limit is changed to 3.
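Two details above are easy to check by hand: the sigma that results from a given kernel size, and how an even kernel size drawn from the range is moved to an odd one. The sketch below restates both in plain Python; the function names are chosen for this example, and snap_to_odd mirrors the adjustment in the get_params method of the source code further down.

```python
def sigma_from_ksize(ksize):
    """Sigma used when sigma is 0: 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8

def snap_to_odd(ksize, upper):
    """Even, non-zero kernel sizes are moved to the next odd size (wrapping at upper + 1)."""
    if ksize != 0 and ksize % 2 != 1:
        ksize = (ksize + 1) % (upper + 1)
    return ksize

blur_limit = (3, 7)
# Every integer the random draw can return, and the kernel size it ends up as
candidates = [snap_to_odd(k, blur_limit[1]) for k in range(blur_limit[0], blur_limit[1] + 1)]
print(candidates)
print(sigma_from_ksize(3), sigma_from_ksize(7))
```

For the default range the five possible draws 3, 4, 5, 6, 7 map to 3, 5, 5, 7, 7, so if the draw is uniform, kernel size 3 is selected half as often as 5 or 7. The corresponding sigmas grow from about 0.8 (ksize 3) to about 1.4 (ksize 7).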
# source code
class GaussianBlur(ImageOnlyTransform):
    """Blur the input image using a Gaussian filter with a random kernel size.
    Args:
        blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, float32
    """
    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = to_tuple(blur_limit, 0)
        self.sigma_limit = to_tuple(sigma_limit if sigma_limit is not None else 0, 0)
        if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
            self.blur_limit = 3, max(3, self.blur_limit[1])
            warnings.warn(
                "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                "blur_limit minimum value changed to 3."
            )
        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            raise ValueError("GaussianBlur supports only odd blur limits.")
    def apply(self, img: np.ndarray, ksize: int = 3, sigma: float = 0, **params) -> np.ndarray:
        return F.gaussian_blur(img, ksize, sigma=sigma)
    def get_params(self) -> Dict[str, float]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
        if ksize != 0 and ksize % 2 != 1:
            ksize = (ksize + 1) % (self.blur_limit[1] + 1)
        return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}
    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("blur_limit", "sigma_limit")
Blur effects of different Gaussian kernel sizes (sigma takes the default value 0, calculated based on ksize):
Function: Apply glass noise (a frosted-glass blur).
Parameter description:
- sigma (float): standard deviation of the Gaussian kernel. Default: 0.7
- max_delta (int): maximum distance between two pixels that are swapped. Default: 4
- iterations (int): number of repetitions, valid range: [1, inf). Default: 2
- mode (str): computation mode, fast or exact. Default: fast. It affects the running speed.
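The characteristic part of the effect is the local pixel swapping controlled by max_delta and iterations. The sketch below shows only that step, in the spirit of the ImageNet-C reference implementation linked in the source code further down; it is an assumption-laden illustration that leaves out the Gaussian blur passes, and glass_shuffle_sketch is not the library's F.glass_blur.

```python
import numpy as np

def glass_shuffle_sketch(img, max_delta=4, iterations=2, seed=0):
    """Swap each interior pixel with a random neighbor at most max_delta away."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = out.shape[:2]
    for _ in range(iterations):
        for y in range(h - max_delta, max_delta, -1):
            for x in range(w - max_delta, max_delta, -1):
                # Offsets are drawn from [-max_delta, max_delta), upper bound excluded
                dy, dx = rng.integers(-max_delta, max_delta, size=2)
                ny, nx = y + dy, x + dx
                out[y, x], out[ny, nx] = out[ny, nx].copy(), out[y, x].copy()
    return out

img = np.arange(20 * 20, dtype=np.uint16).reshape(20, 20)
shuffled = glass_shuffle_sketch(img, max_delta=4, iterations=2)
print(shuffled[:3, :6])
```

Swapping only moves pixels around, so the set of pixel values is unchanged; a larger max_delta lets pixels travel further and more iterations repeat the process, which is why both strengthen the frosted-glass look.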
# source code
class GlassBlur(Blur):
    """Apply glass noise to the input image.
    Args:
        sigma (float): standard deviation for Gaussian kernel.
        max_delta (int): max distance between pixels which are swapped.
        iterations (int): number of repeats.
            Should be in range [1, inf). Default: (2).
        mode (str): mode of computation: fast or exact. Default: "fast".
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8, float32
    Reference:
    |  https://arxiv.org/abs/1903.12261
    |  https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
    """
    def __init__(
        self,
        sigma: float = 0.7,
        max_delta: int = 4,
        iterations: int = 2,
        always_apply: bool = False,
        mode: str = "fast",
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        if iterations < 1:
            raise ValueError(f"Iterations should be more or equal to 1, but we got {iterations}")
        if mode not in ["fast", "exact"]:
            raise ValueError(f"Mode should be 'fast' or 'exact', but we got {mode}")
        self.sigma = sigma
        self.max_delta = max_delta
        self.iterations = iterations
        self.mode = mode
    def apply(self, img: np.ndarray, dxy: np.ndarray = None, **params) -> np.ndarray:  # type: ignore
        assert dxy is not None
        return F.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)
    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
        img = params["image"]
        # generate array containing all necessary values for transformations
        width_pixels = img.shape[0] - self.max_delta * 2
        height_pixels = img.shape[1] - self.max_delta * 2
        total_pixels = width_pixels * height_pixels
        dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))
        return {"dxy": dxy}
    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("sigma", "max_delta", "iterations")
    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]
The larger the max_delta and iterations parameter values are, the stronger the frosted glass effect will be.
Function: Histogram matching. Adjusts the pixel values of the input image so that its histogram matches the histogram of a reference image. Each channel is matched independently, so the input image and the reference image must have the same number of channels.
Histogram matching can serve as a lightweight normalization for image processing (e.g. feature matching), especially when the images come from different sources or were taken under different conditions (e.g. lighting).
Parameter description: (the parameters are similar to those of the FDA transform. In FDA p defaults to 0.5; the HistogramMatching docstring states a default of p=1.0, although the constructor signature in the source code below sets p=0.5.)
- reference_images (List[str] or List(np.ndarray)): list of reference images or list of paths to reference images. If more than one reference image is provided, one of them is chosen at random for each transformation.
- blend_ratio (float, float): range of the weight used to blend the matched image with the original image. A value blend_ratio is sampled from this range; it is the weight of the histogram-matched image, and the original image gets the weight 1 - blend_ratio:
  img = cv2.addWeighted(matched, blend_ratio, img, 1 - blend_ratio, 0, dtype=get_opencv_dtype_from_numpy(img.dtype))
- read_fn (Callable): function used to read an image; it should return a numpy array. Default: read_rgb_image.
# 默认读图函数,对应的reference_images参数应为路径列表:
def read_rgb_image(path):
image = cv2.imread(path, cv2.IMREAD_COLOR)
return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# 若参考图像已经是numpy array格式,read_fn函数恒等读入即可(lambda x: x):
target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
aug = A.HistogramMatching([target_image], read_fn=lambda x: x)
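The match-then-blend idea can be sketched in plain NumPy. This is only an illustration in the spirit of the scikit-image approach, not the library's implementation: `match_channel` and `match_and_blend` are hypothetical helper names, uint8 input is assumed, and the real transform blends with cv2.addWeighted.

```python
import numpy as np

def match_channel(source, reference):
    """Map source intensities so that their CDF follows the reference CDF."""
    _, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)

def match_and_blend(img, reference, blend_ratio):
    """Match every channel independently, then blend with the original (uint8 assumed)."""
    matched = np.stack(
        [match_channel(img[..., c], reference[..., c]) for c in range(img.shape[-1])],
        axis=-1,
    )
    blended = blend_ratio * matched + (1.0 - blend_ratio) * img
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 128, size=(32, 32, 3), dtype=np.uint8)          # dark input
reference = rng.integers(128, 256, size=(32, 32, 3), dtype=np.uint8)  # bright reference
out_full = match_and_blend(img, reference, blend_ratio=1.0)  # fully matched
out_none = match_and_blend(img, reference, blend_ratio=0.0)  # original image
```

With blend_ratio=0.0 the original comes back unchanged, and with blend_ratio=1.0 every output value lies inside the reference's intensity range.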
# source code
class HistogramMatching(ImageOnlyTransform):
"""
Apply histogram matching. It manipulates the pixels of an input image so that its histogram matches
the histogram of the reference image. If the images have multiple channels, the matching is done independently
for each channel, as long as the number of channels is equal in the input image and the reference.
Histogram matching can be used as a lightweight normalisation for image processing,
such as feature matching, especially in circumstances where the images have been taken from different
sources or in different conditions (i.e. lighting).
See:
https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html
Args:
reference_images (List[str] or List(np.ndarray)): List of file paths for reference images
or list of reference images.
blend_ratio (float, float): Tuple of min and max blend ratio. Matched image will be blended with original
with random blend factor for increased diversity of generated images.
read_fn (Callable): Used-defined function to read image. Function should get image path and return numpy
array of image pixels.
p (float): probability of applying the transform. Default: 1.0.
Targets:
image
Image types:
uint8, uint16, float32
"""
def __init__(
self,
reference_images: List[Union[str, np.ndarray]],
blend_ratio=(0.5, 1.0),
read_fn=read_rgb_image,
always_apply=False,
p=0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.reference_images = reference_images
self.read_fn = read_fn
self.blend_ratio = blend_ratio
def apply(self, img, reference_image=None, blend_ratio=0.5, **params):
return apply_histogram(img, reference_image, blend_ratio)
def get_params(self):
return {
"reference_image": self.read_fn(random.choice(self.reference_images)),
"blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
}
def get_transform_init_args_names(self):
return ("reference_images", "blend_ratio", "read_fn")
def _to_dict(self):
raise NotImplementedError("HistogramMatching can not be serialized.")
With the middle image used as the reference, the transformed image also takes on a greenish tint.
Source of the following image: https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html
Function : Randomly change the hue, saturation, and brightness of the image.
Parameter description: hue_shift_limit, sat_shift_limit, val_shift_limit represent the change range of hue, saturation and brightness respectively. If the input is a single number, it will be converted into an interval (-input_val, input_val), and the value will be randomly selected within this interval.
If the task is color sensitive, the hue_shift_limit range should be smaller.
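The single-number-to-interval rule and the random sampling can be sketched as follows. `symmetric_range` and `sample_shifts` are hypothetical names used only for illustration. They mimic the behavior described above and are not the library's `to_tuple` helper.

```python
import random

def symmetric_range(limit):
    """Turn a single number into (-limit, limit); pass a (low, high) pair through."""
    if isinstance(limit, (int, float)):
        return (-limit, limit)
    low, high = limit
    return (low, high)

def sample_shifts(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, seed=None):
    """Draw one hue/saturation/value shift uniformly from each interval."""
    rng = random.Random(seed)
    ranges = {
        "hue_shift": symmetric_range(hue_shift_limit),
        "sat_shift": symmetric_range(sat_shift_limit),
        "val_shift": symmetric_range(val_shift_limit),
    }
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

# A color-sensitive task might narrow only the hue interval.
shifts = sample_shifts(hue_shift_limit=5, seed=0)
```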
# source code
class HueSaturationValue(ImageOnlyTransform):
"""Randomly change hue, saturation and value of the input image.
Args:
hue_shift_limit ((int, int) or int): range for changing hue. If hue_shift_limit is a single int, the range
will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).
sat_shift_limit ((int, int) or int): range for changing saturation. If sat_shift_limit is a single int,
the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).
val_shift_limit ((int, int) or int): range for changing value. If val_shift_limit is a single int, the range
will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
hue_shift_limit=20,
sat_shift_limit=30,
val_shift_limit=20,
always_apply=False,
p=0.5,
):
super(HueSaturationValue, self).__init__(always_apply, p)
self.hue_shift_limit = to_tuple(hue_shift_limit)
self.sat_shift_limit = to_tuple(sat_shift_limit)
self.val_shift_limit = to_tuple(val_shift_limit)
def apply(self, image, hue_shift=0, sat_shift=0, val_shift=0, **params):
if not is_rgb_image(image) and not is_grayscale_image(image):
raise TypeError(
"HueSaturationValue transformation expects 1-channel or 3-channel images."
)
return F.shift_hsv(image, hue_shift, sat_shift, val_shift)
def get_params(self):
return {
"hue_shift":
random.uniform(self.hue_shift_limit[0], self.hue_shift_limit[1]),
"sat_shift":
random.uniform(self.sat_shift_limit[0], self.sat_shift_limit[1]),
"val_shift":
random.uniform(self.val_shift_limit[0], self.val_shift_limit[1]),
}
def get_transform_init_args_names(self):
return ("hue_shift_limit", "sat_shift_limit", "val_shift_limit")
Function : Add camera sensor noise.
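Parameter description: color_shift (float, float): Variance range of the hue change, measured as a fraction of the 360-degree hue angle in HLS color space.
intensity (float, float): Multiplicative factor that controls the strength of the color and luminance noise.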
# source code
class ISONoise(ImageOnlyTransform):
"""
Apply camera sensor noise.
Args:
color_shift (float, float): variance range for color hue change.
Measured as a fraction of 360 degree Hue angle in HLS colorspace.
intensity ((float, float): Multiplicative factor that control strength
of color and luminace noise.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
"""
def __init__(self,
color_shift=(0.01, 0.05),
intensity=(0.1, 0.5),
always_apply=False,
p=0.5):
super(ISONoise, self).__init__(always_apply, p)
self.intensity = intensity
self.color_shift = color_shift
def apply(self,
img,
color_shift=0.05,
intensity=1.0,
random_state=None,
**params):
return F.iso_noise(img, color_shift, intensity,
np.random.RandomState(random_state))
def get_params(self):
return {
"color_shift": random.uniform(self.color_shift[0],
self.color_shift[1]),
"intensity": random.uniform(self.intensity[0], self.intensity[1]),
"random_state": random.randint(0, 65536),
}
def get_transform_init_args_names(self):
return ("intensity", "color_shift")
To make the effect clearly visible, the parameter values used in the figure are larger than the defaults.
Each parameter is an interval, so color_shift=0.02 in the figure means color_shift=(0.02, 0.02) in the actual call.
JpegCompression has been deprecated and has the same function as ImageCompression.
Function : JPEG and WebP image compression.
Parameter description:
quality_lower (float): The lowest quality of the image. In [0, 100] for JPEG, in [1, 100] for WebP.
quality_upper (float): The highest quality of the image. In [0, 100] for JPEG, in [1, 100] for WebP.
compression_type (ImageCompressionType): compression type, with two built-in options: ImageCompressionType.JPEG or ImageCompressionType.WEBP. Default type: ImageCompressionType.JPEG
The resolution does not change before and after compression.
# source code
class ImageCompression(ImageOnlyTransform):
"""Decrease Jpeg, WebP compression of an image.
Args:
quality_lower (float): lower bound on the image quality.
Should be in [0, 100] range for jpeg and [1, 100] for webp.
quality_upper (float): upper bound on the image quality.
Should be in [0, 100] range for jpeg and [1, 100] for webp.
compression_type (ImageCompressionType): should be ImageCompressionType.JPEG or ImageCompressionType.WEBP.
Default: ImageCompressionType.JPEG
Targets:
image
Image types:
uint8, float32
"""
class ImageCompressionType(IntEnum):
JPEG = 0
WEBP = 1
def __init__(
self,
quality_lower=99,
quality_upper=100,
compression_type=ImageCompressionType.JPEG,
always_apply=False,
p=0.5,
):
super(ImageCompression, self).__init__(always_apply, p)
self.compression_type = ImageCompression.ImageCompressionType(
compression_type)
low_thresh_quality_assert = 0
if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
low_thresh_quality_assert = 1
if not low_thresh_quality_assert <= quality_lower <= 100:
raise ValueError(
"Invalid quality_lower. Got: {}".format(quality_lower))
if not low_thresh_quality_assert <= quality_upper <= 100:
raise ValueError(
"Invalid quality_upper. Got: {}".format(quality_upper))
self.quality_lower = quality_lower
self.quality_upper = quality_upper
def apply(self, image, quality=100, image_type=".jpg", **params):
if not image.ndim == 2 and image.shape[-1] not in (1, 3, 4):
raise TypeError(
"ImageCompression transformation expects 1, 3 or 4 channel images."
)
return F.image_compression(image, quality, image_type)
def get_params(self):
image_type = ".jpg"
if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
image_type = ".webp"
return {
"quality": random.randint(self.quality_lower, self.quality_upper),
"image_type": image_type,
}
def get_transform_init_args(self):
return {
"quality_lower": self.quality_lower,
"quality_upper": self.quality_upper,
"compression_type": self.compression_type.value,
}
Function : 255 - pixel value
# F.invert(img)
def invert(img):
return 255 - img
Function: Use median filtering to achieve image blur.
Parameter description:
blur_limit (int or Tuple[int, int]): blur kernel size, the starting and ending values of the range must be odd numbers. Valid interval: [3, inf), default value: (3, 7)
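Why the kernel size must be odd, and what the median does to outliers, can be seen in a small NumPy sketch. This is an illustrative stand-in for cv2.medianBlur on a single-channel image. The edge-replicating padding and the name `median_filter` are assumptions of the sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, ksize=3):
    """Median filter for a 2-D image; an odd ksize gives every window a centre pixel."""
    if ksize % 2 != 1:
        raise ValueError("ksize must be odd")
    pad = ksize // 2
    padded = np.pad(img, pad, mode="edge")          # replicate border pixels
    windows = sliding_window_view(padded, (ksize, ksize))
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)

img = np.full((5, 5), 10, dtype=np.uint8)
img[2, 2] = 255                                      # a single "salt" pixel
filtered = median_filter(img, ksize=3)               # the outlier is replaced by 10
```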
# source code
class MedianBlur(Blur):
"""Blur the input image using a median filter with a random aperture linear size.
Args:
blur_limit (int): maximum aperture linear size for blurring the input image.
Must be odd and in range [3, inf). Default: (3, 7).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
super().__init__(blur_limit, always_apply, p)
if self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1:
raise ValueError("MedianBlur supports only odd blur limits.")
def apply(self, img: np.ndarray, ksize: int = 3, **params) -> np.ndarray:
return F.median_blur(img, ksize)
Function: Apply motion blur to the image.
Parameter description:
blur_limit (int or Tuple[int, int]): blur kernel size, the starting and ending values of the range must be odd numbers. Valid range: [3, inf), default value: (3, 7)
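allow_shifted (bool): whether the kernel may be shifted. The docstring states that True creates non-shifted kernels only and False creates randomly shifted kernels; note, however, that in the get_params code shown below the line is re-centered in the kernel only when allow_shifted is False. Default value: True.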
# source code
class MotionBlur(Blur):
"""Apply motion blur to the input image using a random-sized kernel.
Args:
blur_limit (int): maximum kernel size for blurring the input image.
Should be in range [3, inf). Default: (3, 7).
allow_shifted (bool): if set to true creates non shifted kernels only,
otherwise creates randomly shifted kernels. Default: True.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
blur_limit: ScaleIntType = 7,
allow_shifted: bool = True,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(blur_limit=blur_limit, always_apply=always_apply, p=p)
self.allow_shifted = allow_shifted
if not allow_shifted and self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1:
raise ValueError(f"Blur limit must be odd when centered=True. Got: {self.blur_limit}")
def get_transform_init_args_names(self) -> Tuple[str, ...]:
return super().get_transform_init_args_names() + ("allow_shifted",)
def apply(self, img: np.ndarray, kernel: np.ndarray = None, **params) -> np.ndarray: # type: ignore
return FMain.convolve(img, kernel=kernel)
def get_params(self) -> Dict[str, Any]:
ksize = random.choice(np.arange(self.blur_limit[0], self.blur_limit[1] + 1, 2))
if ksize <= 2:
raise ValueError("ksize must be > 2. Got: {}".format(ksize))
kernel = np.zeros((ksize, ksize), dtype=np.uint8)
x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
if x1 == x2:
y1, y2 = random.sample(range(ksize), 2)
else:
y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
def make_odd_val(v1, v2):
len_v = abs(v1 - v2) + 1
if len_v % 2 != 1:
if v2 > v1:
v2 -= 1
else:
v1 -= 1
return v1, v2
if not self.allow_shifted:
x1, x2 = make_odd_val(x1, x2)
y1, y2 = make_odd_val(y1, y2)
xc = (x1 + x2) / 2
yc = (y1 + y2) / 2
center = ksize / 2 - 0.5
dx = xc - center
dy = yc - center
x1, x2 = [int(i - dx) for i in [x1, x2]]
y1, y2 = [int(i - dy) for i in [y1, y2]]
cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)
# Normalize kernel
return {
"kernel": kernel.astype(np.float32) / np.sum(kernel)}
Note: a larger blur_limit does not necessarily produce a blurrier image. blur_limit only sets the range from which ksize is drawn, and the endpoints of the line drawn in the kernel are themselves sampled at random within [0, ksize - 1], so the resulting blur line can be long or short. The blur_limit value therefore only bounds the maximum possible blur.
Even with identical blur_limit settings, repeated runs produce different degrees of blur, but the strongest possible blur can only come from the call with the largest blur_limit.
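The point that blur strength follows the length of the line inside the kernel, not the kernel size itself, can be illustrated with a small NumPy sketch. `horizontal_line_kernel` is a hypothetical helper that draws a centered horizontal line; the library draws a randomly oriented line with cv2.line.

```python
import numpy as np

def horizontal_line_kernel(ksize, length):
    """ksize x ksize kernel containing a centred horizontal line of `length` pixels."""
    kernel = np.zeros((ksize, ksize), dtype=np.float32)
    start = (ksize - length) // 2
    kernel[ksize // 2, start:start + length] = 1.0
    return kernel / kernel.sum()                     # normalise so brightness is preserved

# blur_limit=(3, 7) yields these candidate kernel sizes (odd steps only).
candidate_sizes = [int(k) for k in np.arange(3, 7 + 1, 2)]

short_line = horizontal_line_kernel(7, 2)   # large kernel, short line: weak blur
long_line = horizontal_line_kernel(5, 5)    # smaller kernel, long line: stronger blur
```

Both kernels sum to 1, but the 7x7 kernel averages only 2 pixels while the 5x5 kernel averages 5.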
Function : Multiply the image by a random number or array.
Parameter description: multiplier (float or tuple of floats): the number by which the image is multiplied. If the input is an interval, the multiplier factor will be randomly sampled within the interval [multiplier[0], multiplier[1]). Default: (0.9, 1.1).
per_channel (bool): Whether to operate each channel individually. If True, the multiplier factor is different for each channel. Default False.
elementwise (bool): Whether it is a pixel-level operation. If it is True, the multiplicative factor of each pixel is randomly generated. Default False.
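How per_channel and elementwise combine is easiest to see from the shape of the sampled multiplier array. The sketch below follows the shape logic in the source listing that follows. `multiplier_shape` is a hypothetical helper name, and the final clipping to [0, 255] is an assumption of the sketch.

```python
import numpy as np

def multiplier_shape(img_shape, per_channel, elementwise):
    """Shape of the random multiplier for a given flag combination."""
    h, w = img_shape[:2]
    channels = img_shape[2] if len(img_shape) == 3 else 1
    c = channels if per_channel else 1
    return (h, w, c) if elementwise else (c,)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3)).astype(np.float32)

# One factor per channel, shared by all pixels of that channel.
shape = multiplier_shape(img.shape, per_channel=True, elementwise=False)
multiplier = rng.uniform(0.9, 1.1, size=shape)
noisy = np.clip(img * multiplier, 0, 255)   # broadcasting applies the factors
```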
# source code
class MultiplicativeNoise(ImageOnlyTransform):
"""Multiply image to random number or array of numbers.
Args:
multiplier (float or tuple of floats): If single float image will be multiplied to this number.
If tuple of float multiplier will be in range `[multiplier[0], multiplier[1])`. Default: (0.9, 1.1).
per_channel (bool): If `False`, same values for all channels will be used.
If `True` use sample values for each channels. Default False.
elementwise (bool): If `False` multiply multiply all pixels in an image with a random value sampled once.
If `True` Multiply image pixels with values that are pixelwise randomly sampled. Defaule: False.
Targets:
image
Image types:
Any
"""
def __init__(
self,
multiplier=(0.9, 1.1),
per_channel=False,
elementwise=False,
always_apply=False,
p=0.5,
):
super(MultiplicativeNoise, self).__init__(always_apply, p)
self.multiplier = to_tuple(multiplier, multiplier)
self.per_channel = per_channel
self.elementwise = elementwise
def apply(self, img, multiplier=np.array([1]), **kwargs):
return F.multiply(img, multiplier)
def get_params_dependent_on_targets(self, params):
if self.multiplier[0] == self.multiplier[1]:
return {
"multiplier": np.array([self.multiplier[0]])}
img = params["image"]
h, w = img.shape[:2]
if self.per_channel:
c = 1 if F.is_grayscale_image(img) else img.shape[-1]
else:
c = 1
if self.elementwise:
shape = [h, w, c]
else:
shape = [c]
multiplier = np.random.uniform(self.multiplier[0], self.multiplier[1], shape)
if F.is_grayscale_image(img) and img.ndim == 2:
multiplier = np.squeeze(multiplier)
return {
"multiplier": multiplier}
@property
def targets_as_params(self):
return ["image"]
def get_transform_init_args_names(self):
return "multiplier", "per_channel", "elementwise"
The noise is more visible when elementwise=True, because each pixel gets its own independently sampled multiplier.
Function : Image normalization.
Normalization formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value)
which is equivalent to: img = (img / max_pixel_value - mean) / std
Default parameters:
mean=(0.485, 0.456, 0.406),
std=(0.229, 0.224, 0.225),
max_pixel_value=255.0
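The equivalence of the two forms of the formula can be checked numerically. This is a minimal NumPy sketch using the default parameters; the random test image is only an illustration.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
max_pixel_value = 255.0

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8).astype(np.float32)

# Form 1: scale mean and std up to the pixel range.
form_a = (img - mean * max_pixel_value) / (std * max_pixel_value)
# Form 2: scale the image down to [0, 1] first, then standardise.
form_b = (img / max_pixel_value - mean) / std
```

Both arrays agree up to floating-point rounding, so either form can be used to reason about the output range.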
class Normalize(ImageOnlyTransform):
"""Normalization is applied by the formula: `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`
Args:
mean (float, list of float): mean values
std (float, list of float): std values
max_pixel_value (float): maximum possible pixel value
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
mean=(0.485, 0.456, 0.406),
std=(0.229, 0.224, 0.225),
max_pixel_value=255.0,
always_apply=False,
p=1.0,
):
super(Normalize, self).__init__(always_apply, p)
self.mean = mean
self.std = std
self.max_pixel_value = max_pixel_value
def apply(self, image, **params):
return F.normalize(image, self.mean, self.std, self.max_pixel_value)
def get_transform_init_args_names(self):
return ("mean", "std", "max_pixel_value")
Function :
# source code
Function : Reduce the number of bits in each color channel to achieve tonal layering. So the valid range of parameter num_bits is [0, 8].
Parameters: num_bits ((int, int) or int, or list of ints [r, g, b], or list of ints [[r1, r2], [g1, g2], [b1, b2]]): number of high bits to keep.
The smaller num_bits is, the more obvious the tonal layering. Valid value range: [0, 8], default value: 4.
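Keeping only the high bits amounts to a bit mask on each uint8 value. The sketch below is a minimal NumPy illustration of that idea, not the library's F.posterize, and its `posterize` function is written only for this example.

```python
import numpy as np

def posterize(img, num_bits):
    """Keep only the `num_bits` most significant bits of each uint8 value."""
    if not 0 <= num_bits <= 8:
        raise ValueError("num_bits must be in [0, 8]")
    mask = (0xFF << (8 - num_bits)) & 0xFF   # e.g. num_bits=4 -> 0b11110000
    return img & np.uint8(mask)

pixels = np.array([255, 200, 17], dtype=np.uint8)
four_bits = posterize(pixels, 4)   # 16 possible levels per channel
one_bit = posterize(pixels, 1)     # only 0 or 128 remain

levels = len(np.unique(posterize(np.arange(256, dtype=np.uint8), 4)))
```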
# source code
class Posterize(ImageOnlyTransform):
"""Reduce the number of bits for each color channel.
Args:
num_bits ((int, int) or int,
or list of ints [r, g, b],
or list of ints [[r1, r1], [g1, g2], [b1, b2]]): number of high bits.
If num_bits is a single value, the range will be [num_bits, num_bits].
Must be in range [0, 8]. Default: 4.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
"""
def __init__(self, num_bits=4, always_apply=False, p=0.5):
super(Posterize, self).__init__(always_apply, p)
if isinstance(num_bits, (list, tuple)):
if len(num_bits) == 3:
self.num_bits = [to_tuple(i, 0) for i in num_bits]
else:
self.num_bits = to_tuple(num_bits, 0)
else:
self.num_bits = to_tuple(num_bits, num_bits)
def apply(self, image, num_bits=1, **params):
return F.posterize(image, num_bits)
def get_params(self):
if len(self.num_bits) == 3:
return {
"num_bits":
[random.randint(i[0], i[1]) for i in self.num_bits]
}
return {
"num_bits": random.randint(self.num_bits[0], self.num_bits[1])}
def get_transform_init_args_names(self):
return ("num_bits", )
Function : Shift the values of each RGB channel.
Parameter description: r_shift_limit, g_shift_limit, b_shift_limit ((int, int) or int) respectively represent the value offset on the R, G, and B channels. If a single number is given, it is converted to the interval (-shift_limit, shift_limit), and the final applied value is randomly sampled within the interval.
# source code
class RGBShift(ImageOnlyTransform):
"""Randomly shift values for each channel of the input RGB image.
Args:
r_shift_limit ((int, int) or int): range for changing values for the red channel. If r_shift_limit is a single
int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).
g_shift_limit ((int, int) or int): range for changing values for the green channel. If g_shift_limit is a
single int, the range will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).
b_shift_limit ((int, int) or int): range for changing values for the blue channel. If b_shift_limit is a single
int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
r_shift_limit=20,
g_shift_limit=20,
b_shift_limit=20,
always_apply=False,
p=0.5,
):
super(RGBShift, self).__init__(always_apply, p)
self.r_shift_limit = to_tuple(r_shift_limit)
self.g_shift_limit = to_tuple(g_shift_limit)
self.b_shift_limit = to_tuple(b_shift_limit)
def apply(self, image, r_shift=0, g_shift=0, b_shift=0, **params):
if not F.is_rgb_image(image):
raise TypeError("RGBShift transformation expects 3-channel images.")
return F.shift_rgb(image, r_shift, g_shift, b_shift)
def get_params(self):
return {
"r_shift": random.uniform(self.r_shift_limit[0], self.r_shift_limit[1]),
"g_shift": random.uniform(self.g_shift_limit[0], self.g_shift_limit[1]),
"b_shift": random.uniform(self.b_shift_limit[0], self.b_shift_limit[1]),
}
def get_transform_init_args_names(self):
return ("r_shift_limit", "g_shift_limit", "b_shift_limit")
# F.shift_rgb: when the same formula is applied to every pixel, a lookup table can be used (cv2.LUT)
def _shift_image_uint8(img, value):
max_value = MAX_VALUES_BY_DTYPE[img.dtype]
lut = np.arange(0, max_value + 1).astype("float32")
lut += value
lut = np.clip(lut, 0, max_value).astype(img.dtype)
return cv2.LUT(img, lut)
@preserve_shape
def _shift_rgb_uint8(img, r_shift, g_shift, b_shift):
if r_shift == g_shift == b_shift:
h, w, c = img.shape
img = img.reshape([h, w * c])
return _shift_image_uint8(img, r_shift)
result_img = np.empty_like(img)
shifts = [r_shift, g_shift, b_shift]
for i, shift in enumerate(shifts):
result_img[..., i] = _shift_image_uint8(img[..., i], shift)
return result_img
def shift_rgb(img, r_shift, g_shift, b_shift):
if img.dtype == np.uint8:
return _shift_rgb_uint8(img, r_shift, g_shift, b_shift)
return _shift_rgb_non_uint8(img, r_shift, g_shift, b_shift)
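The lookup-table trick can be reproduced without OpenCV: the shifted, clipped value is computed once for each of the 256 possible inputs, and the image is then used as an index into that table. In this sketch NumPy fancy indexing stands in for cv2.LUT, and `shift_channel_uint8` is a hypothetical name.

```python
import numpy as np

def shift_channel_uint8(channel, value):
    """Add `value` to every uint8 pixel via a 256-entry lookup table."""
    lut = np.arange(256, dtype=np.float32) + value
    lut = np.clip(lut, 0, 255).astype(np.uint8)   # clipping avoids uint8 wrap-around
    return lut[channel]                           # index the table with the pixel values

channel = np.array([[0, 100, 250]], dtype=np.uint8)
brighter = shift_channel_uint8(channel, 20)
darker = shift_channel_uint8(channel, -30)
```

The table costs 256 operations regardless of image size, which is why the uint8 path uses it.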
Function : Randomly change the brightness and contrast of the input image. Similar transformation: ColorJitter
Parameter description:
- brightness_limit ((float, float) or float): brightness change factor. If entered as a single number, it will be converted into an interval (-limit, limit). Default value: (-0.2, 0.2)
- contrast_limit ((float, float) or float): contrast change factor. If entered as a single number, it will be converted into an interval (-limit, limit). Default value: (-0.2, 0.2)
- brightness_by_max (Boolean): If True, the brightness offset is scaled by the maximum value of the image dtype. If False, it is scaled by the mean value of the image. (The docstring words this as adjusting contrast, but the flag is passed together with the brightness term beta.) Default value: True
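The roles of the two sampled factors can be sketched in NumPy: alpha (1 plus the contrast sample) scales the image, and beta (the brightness sample) adds an offset. The sketch assumes a float image in [0, 1], and that the offset is beta times either the dtype maximum or the mean of the scaled image. The function name and the `max_value` parameter are hypothetical, so treat this as an approximation of F.brightness_contrast_adjust rather than its definition.

```python
import numpy as np

def brightness_contrast(img, alpha=1.0, beta=0.0, brightness_by_max=True, max_value=1.0):
    """Scale by alpha (contrast), then add a beta-proportional offset (brightness)."""
    out = img.astype(np.float32) * alpha
    if brightness_by_max:
        out = out + beta * max_value      # offset relative to the dtype maximum
    else:
        out = out + beta * out.mean()     # offset relative to the image mean (assumed)
    return np.clip(out, 0.0, max_value)

img = np.full((2, 2), 0.5, dtype=np.float32)
by_max = brightness_contrast(img, alpha=1.1, beta=0.1, brightness_by_max=True)
by_mean = brightness_contrast(img, alpha=1.1, beta=0.1, brightness_by_max=False)
```

For a mid-gray image the by-max variant brightens more, because the dtype maximum is larger than the image mean.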
# source code
class RandomBrightnessContrast(ImageOnlyTransform):
"""Randomly change brightness and contrast of the input image.
Args:
brightness_limit ((float, float) or float): factor range for changing brightness.
If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
contrast_limit ((float, float) or float): factor range for changing contrast.
If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
brightness_by_max (Boolean): If True adjust contrast by image dtype maximum,
else adjust contrast by image mean.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
brightness_limit=0.2,
contrast_limit=0.2,
brightness_by_max=True,
always_apply=False,
p=0.5,
):
super(RandomBrightnessContrast, self).__init__(always_apply, p)
self.brightness_limit = to_tuple(brightness_limit)
self.contrast_limit = to_tuple(contrast_limit)
self.brightness_by_max = brightness_by_max
def apply(self, img, alpha=1.0, beta=0.0, **params):
return F.brightness_contrast_adjust(img, alpha, beta,
self.brightness_by_max)
def get_params(self):
return {
"alpha":
1.0 +
random.uniform(self.contrast_limit[0], self.contrast_limit[1]),
"beta":
0.0 +
random.uniform(self.brightness_limit[0], self.brightness_limit[1]),
}
def get_transform_init_args_names(self):
return ("brightness_limit", "contrast_limit", "brightness_by_max")
Brightness change (contrast_limit=(0.1, 0.1), brightness_by_max=True):
Contrast change (brightness_limit=(0.01, 0.01), brightness_by_max=True):
brightness_by_max change:
brightness_limit=(0.1, 0.1), contrast_limit=(0.1, 0.1)
brightness_limit=(-0.1, -0.1), contrast_limit=(-0.1, -0.1)
Function : Add fog effect to the input image.
Parameter description: All parameters are float type, and the valid interval is [0, 1].
fog_coef_lower, fog_coef_upper: the minimum and maximum value of the fog intensity coefficient. The final applied intensity parameter is sampled and obtained within this range. Default range: [0.3, 1]
alpha_coef : The transparency of the fog circle. Default value: 0.08
# source code
class RandomFog(ImageOnlyTransform):
"""Simulates fog for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
fog_coef_lower (float): lower limit for fog intensity coefficient. Should be in [0, 1] range.
fog_coef_upper (float): upper limit for fog intensity coefficient. Should be in [0, 1] range.
alpha_coef (float): transparency of the fog circles. Should be in [0, 1] range.
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
fog_coef_lower=0.3,
fog_coef_upper=1,
alpha_coef=0.08,
always_apply=False,
p=0.5,
):
super(RandomFog, self).__init__(always_apply, p)
if not 0 <= fog_coef_lower <= fog_coef_upper <= 1:
raise ValueError(
"Invalid combination if fog_coef_lower and fog_coef_upper. Got: {}"
.format((fog_coef_lower, fog_coef_upper)))
if not 0 <= alpha_coef <= 1:
raise ValueError(
"alpha_coef must be in range [0, 1]. Got: {}".format(
alpha_coef))
self.fog_coef_lower = fog_coef_lower
self.fog_coef_upper = fog_coef_upper
self.alpha_coef = alpha_coef
def apply(self, image, fog_coef=0.1, haze_list=(), **params):
return F.add_fog(image, fog_coef, self.alpha_coef, haze_list)
@property
def targets_as_params(self):
return ["image"]
def get_params_dependent_on_targets(self, params):
img = params["image"]
fog_coef = random.uniform(self.fog_coef_lower, self.fog_coef_upper)
height, width = imshape = img.shape[:2]
hw = max(1, int(width // 3 * fog_coef))
haze_list = []
midx = width // 2 - 2 * hw
midy = height // 2 - hw
index = 1
while midx > -hw or midy > -hw:
for _i in range(hw // 10 * index):
x = random.randint(midx, width - midx - hw)
y = random.randint(midy, height - midy - hw)
haze_list.append((x, y))
midx -= 3 * hw * width // sum(imshape)
midy -= 3 * hw * height // sum(imshape)
index += 1
return {
"haze_list": haze_list, "fog_coef": fog_coef}
def get_transform_init_args_names(self):
return ("fog_coef_lower", "fog_coef_upper", "alpha_coef")
Function : Gamma transformation for image enhancement.
When gamma < 1, the image becomes brighter overall;
when gamma > 1, the image becomes darker overall.
# source code
class RandomGamma(ImageOnlyTransform):
"""
Args:
gamma_limit (float or (float, float)): If gamma_limit is a single float value,
the range will be (-gamma_limit, gamma_limit). Default: (80, 120).
eps: Deprecated.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, gamma_limit=(80, 120), eps=None, always_apply=False, p=0.5):
super(RandomGamma, self).__init__(always_apply, p)
self.gamma_limit = to_tuple(gamma_limit)
self.eps = eps
def apply(self, img, gamma=1, **params):
return F.gamma_transform(img, gamma=gamma)
def get_params(self):
return {
"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}
def get_transform_init_args_names(self):
return ("gamma_limit", "eps")
Main parameter: gamma_limit, default (80, 120). If only one value is entered, it will be converted to (-gamma_limit, gamma_limit). As the get_params() function shows, gamma_limit is 100 times the gamma parameter, so values above 100 in the gamma_limit range darken the image and values below 100 brighten it.
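The relationship between gamma_limit, gamma, and brightness can be checked with a short NumPy sketch. It assumes a float image with values in [0, 1] and a plain power-law transform. `gamma_from_limit` and `gamma_transform` are hypothetical helper names.

```python
import numpy as np

def gamma_from_limit(gamma_limit_value):
    """gamma_limit is expressed in percent: 100 corresponds to gamma = 1."""
    return gamma_limit_value / 100.0

def gamma_transform(img, gamma):
    """Power-law transform for a float image with values in [0, 1]."""
    return np.power(img, gamma)

img = np.array([0.25, 0.5, 0.81], dtype=np.float64)
brighter = gamma_transform(img, gamma_from_limit(50))    # gamma = 0.5 < 1
darker = gamma_transform(img, gamma_from_limit(200))     # gamma = 2.0 > 1
```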
Function : Add rain effect to the input image
Parameter description:
# Default parameters
slant_lower=-10,
slant_upper=10,
drop_length=20,
drop_width=1,
drop_color=(200, 200, 200),
blur_value=7,
brightness_coefficient=0.7,
rain_type=None
-
slant_lower, slant_upper: control the slope of the rain line, the value range is [-20, 20]. If slant_sample < 0, the rain line tilts to the left, otherwise it tilts to the right.
-
drop_length: Rainline length, value range [0, 100]. When the rain_type parameter is specified, the passed drop_length is invalid and the built-in value is used. See the code of the rain_type parameter.
-
drop_width: Rain line width, value range [1, 5].
-
drop_color (list of (r, g, b)): Rainline color.
# drop_length, drop_width, and drop_color are all arguments used to draw the rain lines (cv2.line)
for (rain_drop_x0, rain_drop_y0) in rain_drops:
    rain_drop_x1 = rain_drop_x0 + slant
    rain_drop_y1 = rain_drop_y0 + drop_length
    cv2.line(
        image,
        (rain_drop_x0, rain_drop_y0),
        (rain_drop_x1, rain_drop_y1),
        drop_color,
        drop_width,
    )
-
blur_value (int): kernel_size of cv2.blur(), it is necessary to blur the rainy day scene, because most rainy days are hazy.
-
brightness_coefficient (float): brightness factor, value range [0, 1]. Because rainy days are often cloudy and lack light.
-
rain_type: Rain degree, One of [None, “drizzle”, “heavy”, “torrential”], increasing from left to right.
if self.rain_type == "drizzle":
    num_drops = area // 770
    drop_length = 10
elif self.rain_type == "heavy":
    num_drops = width * height // 600
    drop_length = 30
elif self.rain_type == "torrential":
    num_drops = area // 500
    drop_length = 60
else:
    drop_length = self.drop_length
    num_drops = area // 600
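The effect of rain_type on the number and length of the rain lines can be worked through for a concrete image size. The function below restates the branch logic above as a standalone helper. `rain_plan` is a hypothetical name, and the 600x800 image is just an example.

```python
def rain_plan(height, width, rain_type=None, drop_length=20):
    """Return (num_drops, drop_length) for an image of the given size."""
    area = height * width
    # rain_type -> (area divisor, built-in drop length)
    presets = {"drizzle": (770, 10), "heavy": (600, 30), "torrential": (500, 60)}
    if rain_type in presets:
        divisor, preset_length = presets[rain_type]
        return area // divisor, preset_length      # the passed drop_length is ignored
    if rain_type is not None:
        raise ValueError("rain_type must be None, 'drizzle', 'heavy' or 'torrential'")
    return area // 600, drop_length                # only here does drop_length apply

drizzle = rain_plan(600, 800, "drizzle")
torrential = rain_plan(600, 800, "torrential", drop_length=45)
custom = rain_plan(600, 800, None, drop_length=45)
```

For a 600x800 image this gives 623 short drops for drizzle and 960 long drops for torrential rain, while the custom drop_length is honored only when rain_type is None.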
# source code
class RandomRain(ImageOnlyTransform):
"""Adds rain effects.
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
slant_lower: should be in range [-20, 20].
slant_upper: should be in range [-20, 20].
drop_length: should be in range [0, 100].
drop_width: should be in range [1, 5].
drop_color (list of (r, g, b)): rain lines color.
blur_value (int): rainy view are blurry
brightness_coefficient (float): rainy days are usually shady. Should be in range [0, 1].
rain_type: One of [None, "drizzle", "heavy", "torrential"]
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
slant_lower=-10,
slant_upper=10,
drop_length=20,
drop_width=1,
drop_color=(200, 200, 200),
blur_value=7,
brightness_coefficient=0.7,
rain_type=None,
always_apply=False,
p=0.5,
):
super(RandomRain, self).__init__(always_apply, p)
if rain_type not in ["drizzle", "heavy", "torrential", None]:
raise ValueError("raint_type must be one of ({}). Got: {}".format(
["drizzle", "heavy", "torrential", None], rain_type))
if not -20 <= slant_lower <= slant_upper <= 20:
raise ValueError(
"Invalid combination of slant_lower and slant_upper. Got: {}".
format((slant_lower, slant_upper)))
if not 1 <= drop_width <= 5:
raise ValueError(
"drop_width must be in range [1, 5]. Got: {}".format(
drop_width))
if not 0 <= drop_length <= 100:
raise ValueError(
"drop_length must be in range [0, 100]. Got: {}".format(
drop_length))
if not 0 <= brightness_coefficient <= 1:
raise ValueError(
"brightness_coefficient must be in range [0, 1]. Got: {}".
format(brightness_coefficient))
self.slant_lower = slant_lower
self.slant_upper = slant_upper
self.drop_length = drop_length
self.drop_width = drop_width
self.drop_color = drop_color
self.blur_value = blur_value
self.brightness_coefficient = brightness_coefficient
self.rain_type = rain_type
def apply(self, image, slant=10, drop_length=20, rain_drops=(), **params):
return F.add_rain(
image,
slant,
drop_length,
self.drop_width,
self.drop_color,
self.blur_value,
self.brightness_coefficient,
rain_drops,
)
@property
def targets_as_params(self):
return ["image"]
def get_params_dependent_on_targets(self, params):
img = params["image"]
slant = int(random.uniform(self.slant_lower, self.slant_upper))
height, width = img.shape[:2]
area = height * width
if self.rain_type == "drizzle":
num_drops = area // 770
drop_length = 10
elif self.rain_type == "heavy":
num_drops = width * height // 600
drop_length = 30
elif self.rain_type == "torrential":
num_drops = area // 500
drop_length = 60
else:
drop_length = self.drop_length
num_drops = area // 600
rain_drops = []
for _i in range(
num_drops): # If You want heavy rain, try increasing this
if slant < 0:
x = random.randint(slant, width)
else:
x = random.randint(0, width - slant)
y = random.randint(0, height - drop_length)
rain_drops.append((x, y))
return {
"drop_length": drop_length,
"slant": slant,
"rain_drops": rain_drops
}
def get_transform_init_args_names(self):
return (
"slant_lower",
"slant_upper",
"drop_length",
"drop_width",
"drop_color",
"blur_value",
"brightness_coefficient",
"rain_type",
)
Visual analysis:
Parameters not indicated in the figure use their default values.
When rain_type=None, drop_length takes effect. The length of 30 in the lower left is longer than the default length of 20 in the upper right rain line.
When rain_type is in ["drizzle", "heavy", "torrential"], drop_length is invalid and the built-in length is used. The corresponding length of torrential mode is 60. So although the drop_length values in the upper right and lower right images are the same, the lengths of the rain lines are different.
Function :
# source code
Function :
# source code
Function: Simulate a sun flare effect.
Parameter description:
- flare_roi (float, float, float, float): Region in which the flare center may appear, given as (x_min, y_min, x_max, y_max). All values are in the range [0, 1]. Default value: (0, 0, 1, 0.5).
- angle_lower, angle_upper (float): Should satisfy 0 <= angle_lower < angle_upper <= 1. The sampled value is multiplied by 2π to obtain the angle of the line along which the flare circles are placed.
- num_flare_circles_lower, num_flare_circles_upper (int): Range for the number of flare circles. Should satisfy 0 <= num_flare_circles_lower < num_flare_circles_upper.
- src_radius (int): Flare radius. src_radius is the largest radius; the inner radii are sampled at equal intervals. The default value is 400. Choose it relative to the image resolution; a slightly oversized value does little harm, because the outer rings of the halo carry very little weight.
  num_times = src_radius // 10
  rad = np.linspace(1, src_radius, num=num_times)  # sampled at equal intervals
  for i in range(num_times):
  cv2.circle(overlay, point, int(rad[i]), src_color, -1)
  ...
- src_color ((int, int, int)): Flare color.
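To make the "equal interval" sampling of the halo radii concrete, here is a minimal pure-Python sketch of the same spacing that np.linspace(1, src_radius, num=src_radius // 10) produces. The helper name flare_radii is hypothetical, and the blending of each circle into the image is not shown.

```python
def flare_radii(src_radius=400):
    """Evenly spaced integer radii from 1 to src_radius.

    Hypothetical helper; mirrors np.linspace(1, src_radius, num=src_radius // 10)
    followed by int() truncation of each value.
    """
    num_times = src_radius // 10
    if num_times == 0:
        return []
    if num_times == 1:
        return [1]
    steps = num_times - 1
    return [int(1 + (src_radius - 1) * i / steps) for i in range(num_times)]


radii = flare_radii(400)
print(len(radii), radii[:3], radii[-1])
```

With the default src_radius=400 this gives 40 concentric circles, the smallest of radius 1 and the largest of radius 400.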
# source code
class RandomSunFlare(ImageOnlyTransform):
"""Simulates Sun Flare for the image
From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Args:
flare_roi (float, float, float, float): region of the image where flare will
appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].
angle_lower (float): should be in range [0, `angle_upper`].
angle_upper (float): should be in range [`angle_lower`, 1].
num_flare_circles_lower (int): lower limit for the number of flare circles.
Should be in range [0, `num_flare_circles_upper`].
num_flare_circles_upper (int): upper limit for the number of flare circles.
Should be in range [`num_flare_circles_lower`, inf].
src_radius (int):
src_color ((int, int, int)): color of the flare
Targets:
image
Image types:
uint8, float32
"""
def __init__(
self,
flare_roi=(0, 0, 1, 0.5),
angle_lower=0,
angle_upper=1,
num_flare_circles_lower=6,
num_flare_circles_upper=10,
src_radius=400,
src_color=(255, 255, 255),
always_apply=False,
p=0.5,
):
super(RandomSunFlare, self).__init__(always_apply, p)
(
flare_center_lower_x,
flare_center_lower_y,
flare_center_upper_x,
flare_center_upper_y,
) = flare_roi
if (
not 0 <= flare_center_lower_x < flare_center_upper_x <= 1
or not 0 <= flare_center_lower_y < flare_center_upper_y <= 1
):
raise ValueError("Invalid flare_roi. Got: {}".format(flare_roi))
if not 0 <= angle_lower < angle_upper <= 1:
raise ValueError(
"Invalid combination of angle_lower nad angle_upper. Got: {}".format((angle_lower, angle_upper))
)
if not 0 <= num_flare_circles_lower < num_flare_circles_upper:
raise ValueError(
"Invalid combination of num_flare_circles_lower nad num_flare_circles_upper. Got: {}".format(
(num_flare_circles_lower, num_flare_circles_upper)
)
)
self.flare_center_lower_x = flare_center_lower_x
self.flare_center_upper_x = flare_center_upper_x
self.flare_center_lower_y = flare_center_lower_y
self.flare_center_upper_y = flare_center_upper_y
self.angle_lower = angle_lower
self.angle_upper = angle_upper
self.num_flare_circles_lower = num_flare_circles_lower
self.num_flare_circles_upper = num_flare_circles_upper
self.src_radius = src_radius
self.src_color = src_color
def apply(self, image, flare_center_x=0.5, flare_center_y=0.5, circles=(), **params):
return F.add_sun_flare(
image,
flare_center_x,
flare_center_y,
self.src_radius,
self.src_color,
circles,
)
@property
def targets_as_params(self):
return ["image"]
def get_params_dependent_on_targets(self, params):
img = params["image"]
height, width = img.shape[:2]
angle = 2 * math.pi * random.uniform(self.angle_lower, self.angle_upper)
flare_center_x = random.uniform(self.flare_center_lower_x, self.flare_center_upper_x)
flare_center_y = random.uniform(self.flare_center_lower_y, self.flare_center_upper_y)
flare_center_x = int(width * flare_center_x)
flare_center_y = int(height * flare_center_y)
num_circles = random.randint(self.num_flare_circles_lower, self.num_flare_circles_upper)
circles = []
x = []
y = []
for rand_x in range(0, width, 10):
rand_y = math.tan(angle) * (rand_x - flare_center_x) + flare_center_y
x.append(rand_x)
y.append(2 * flare_center_y - rand_y)
for _i in range(num_circles):
alpha = random.uniform(0.05, 0.2)
r = random.randint(0, len(x) - 1)
rad = random.randint(1, max(height // 100 - 2, 2))
r_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
g_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
b_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
circles += [
(
alpha,
(int(x[r]), int(y[r])),
pow(rad, 3),
(r_color, g_color, b_color),
)
]
return {
"circles": circles,
"flare_center_x": flare_center_x,
"flare_center_y": flare_center_y,
}
def get_transform_init_args(self):
return {
"flare_roi": (
self.flare_center_lower_x,
self.flare_center_lower_y,
self.flare_center_upper_x,
self.flare_center_upper_y,
),
"angle_lower": self.angle_lower,
"angle_upper": self.angle_upper,
"num_flare_circles_lower": self.num_flare_circles_lower,
"num_flare_circles_upper": self.num_flare_circles_upper,
"src_radius": self.src_radius,
"src_color": self.src_color,
}
Function: Sharpen the image. (A similar method is UnsharpMask.)
Parameter description:
alpha ((float, float)): Controls the visibility of the sharpened image. alpha=0 keeps only the original image; alpha=1.0 keeps only the sharpened image.
lightness ((float, float)): Controls the lightness of the sharpened image.
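How alpha and lightness interact is easiest to see on the convolution kernel itself. The sketch below rebuilds the kernel in the same way as __generate_sharpening_matrix in the source listing; the function name sharpening_kernel is my own, and NumPy is assumed to be installed.

```python
import numpy as np


def sharpening_kernel(alpha, lightness):
    """Blend an identity kernel with a Laplacian-style sharpening kernel."""
    identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
    effect = np.array(
        [[-1, -1, -1], [-1, 8 + lightness, -1], [-1, -1, -1]],
        dtype=np.float32,
    )
    # alpha=0 -> identity (original image), alpha=1 -> pure sharpening kernel
    return (1 - alpha) * identity + alpha * effect


kernel = sharpening_kernel(alpha=0.5, lightness=1.0)
print(kernel)
print("sum of weights:", kernel.sum())
```

The weights sum to (1 - alpha) + alpha * lightness, so lightness=1.0 preserves overall brightness, while smaller values darken the sharpened component.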
# source code
class Sharpen(ImageOnlyTransform):
"""Sharpen the input image and overlays the result with the original image.
Args:
alpha ((float, float)): range to choose the visibility of the sharpened image. At 0, only the original image is
visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).
lightness ((float, float)): range to choose the lightness of the sharpened image. Default: (0.5, 1.0).
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
"""
def __init__(self,
alpha=(0.2, 0.5),
lightness=(0.5, 1.0),
always_apply=False,
p=0.5):
super(Sharpen, self).__init__(always_apply, p)
self.alpha = self.__check_values(to_tuple(alpha, 0.0),
name="alpha",
bounds=(0.0, 1.0))
self.lightness = self.__check_values(to_tuple(lightness, 0.0),
name="lightness")
@staticmethod
def __check_values(value, name, bounds=(0, float("inf"))):
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError("{} values should be between {}".format(
name, bounds))
return value
@staticmethod
def __generate_sharpening_matrix(alpha_sample, lightness_sample):
matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]],
dtype=np.float32)
matrix_effect = np.array(
[[-1, -1, -1], [-1, 8 + lightness_sample, -1], [-1, -1, -1]],
dtype=np.float32,
)
matrix = (
1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect
return matrix
def get_params(self):
alpha = random.uniform(*self.alpha)
lightness = random.uniform(*self.lightness)
sharpening_matrix = self.__generate_sharpening_matrix(
alpha_sample=alpha, lightness_sample=lightness)
return {
"sharpening_matrix": sharpening_matrix}
def apply(self, img, sharpening_matrix=None, **params):
return F.convolve(img, sharpening_matrix)
def get_transform_init_args_names(self):
return ("alpha", "lightness")
The effect of Sharpen is stronger than that of UnsharpMask, whereas UnsharpMask gives a more natural-looking result.
Function: Invert pixels at or above the threshold (for uint8 input, an inverted pixel becomes 255 - pixel_value).
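The uint8 branch of F.solarize in the source listing builds a 256-entry lookup table. The following pure-Python sketch applies the same table to a flat list of pixel values; solarize_uint8 is a hypothetical name, and cv2.LUT is replaced by plain list indexing.

```python
def solarize_uint8(pixels, threshold=128):
    """Invert every uint8 value that is >= threshold; keep the rest."""
    max_val = 255
    lut = [i if i < threshold else max_val - i for i in range(max_val + 1)]
    return [lut[p] for p in pixels]


print(solarize_uint8([0, 100, 127, 128, 200, 255]))
# [0, 100, 127, 127, 55, 0]
```

Values below the threshold pass through unchanged, and the threshold value itself is already inverted (128 becomes 127).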
# source code
class Solarize(ImageOnlyTransform):
"""Invert all pixel values above a threshold.
Args:
threshold ((int, int) or int, or (float, float) or float): range for solarizing threshold.
If threshold is a single value, the range will be [threshold, threshold]. Default: 128.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
any
"""
def __init__(self, threshold=128, always_apply=False, p=0.5):
super(Solarize, self).__init__(always_apply, p)
if isinstance(threshold, (int, float)):
self.threshold = to_tuple(threshold, low=threshold)
else:
self.threshold = to_tuple(threshold, low=0)
def apply(self, image, threshold=0, **params):
return F.solarize(image, threshold)
def get_params(self):
return {
"threshold": random.uniform(self.threshold[0], self.threshold[1])
}
def get_transform_init_args_names(self):
return ("threshold", )
# F.solarize
def solarize(img, threshold=128):
"""Invert all pixel values above a threshold.
Args:
img (numpy.ndarray): The image to solarize.
threshold (int): All pixels above this greyscale level are inverted.
Returns:
numpy.ndarray: Solarized image.
"""
dtype = img.dtype
max_val = MAX_VALUES_BY_DTYPE[dtype]
if dtype == np.dtype("uint8"):
lut = [(i if i < threshold else max_val - i) for i in range(max_val + 1)]
prev_shape = img.shape
img = cv2.LUT(img, np.array(lut, dtype=dtype))
if len(prev_shape) != len(img.shape):
img = np.expand_dims(img, -1)
return img
result_img = img.copy()
cond = img >= threshold
result_img[cond] = max_val - result_img[cond]
return result_img
Function: Spatter effect, which simulates rain or mud occluding the lens.
Parameter description:
mean (float, or tuple of floats): Mean of the normal distribution used to generate the liquid layer. A single number is used directly as the mean; a tuple means the mean is sampled at random from [mean[0], mean[1]). Default: 0.65
std (float, or tuple of floats): Standard deviation of the normal distribution used to generate the liquid layer. A single number is used directly as the standard deviation; a tuple means the value is sampled at random from [std[0], std[1]). Default: 0.3
gauss_sigma (float, or tuple of floats): Sigma of the Gaussian filter applied to the liquid layer. A single number is used directly as sigma; a tuple means sigma is sampled at random from [gauss_sigma[0], gauss_sigma[1]). Default: 2
cutout_threshold (float, or tuple of floats): Threshold for filtering the liquid layer. A single number is used directly as the threshold; a tuple means the threshold is sampled at random from [cutout_threshold[0], cutout_threshold[1]). Default: 0.68
intensity (float, or tuple of floats): Spatter intensity. A single number is used directly as the intensity; a tuple means the intensity is sampled at random from [intensity[0], intensity[1]). Default: 0.6
mode (string, or list of strings): Spatter type. Supported options are 'rain' and 'mud'. If a list such as mode=["rain", "mud"] is given, one mode is chosen at random for each image. Default: 'rain'
mean, std, and gauss_sigma all affect the size of the raindrops or mud spots.
cutout_threshold affects the coverage density and area of the raindrops or mud spots.
intensity affects how severe the rain or mud spots look.
If adjustments are needed, only fine-tune these values!
The visual comparisons can be found after the source code.
Note: mean should not deviate much from 0.65; the default value is recommended. Setting it to 0.5 triggers the message below in rain mode and no valid result is produced, most likely because every value of the smoothed liquid layer then falls below cutout_threshold and is set to 0, so the maximum used for normalization is 0. If the value is too large, the image deviates completely from the desired result.
Error message: divide by zero encountered in true_divide m *= 1 / np.max(m, axis=(0, 1))
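The message comes from normalizing the drop layer by its maximum. A plausible cause, judging from the source listing, is that the whole layer is zero after thresholding. The sketch below shows one way to guard such a normalization in your own code. normalize_by_max and its eps parameter are hypothetical and not part of albumentations; NumPy is assumed.

```python
import numpy as np


def normalize_by_max(m, eps=1e-12):
    """Scale m so that its maximum becomes 1, tolerating an all-zero layer."""
    peak = float(np.max(m))
    if peak < eps:
        # Nothing survived the cutout threshold: return an empty layer
        # instead of dividing by zero.
        return np.zeros_like(m, dtype=np.float32)
    return (m / peak).astype(np.float32)


empty = np.zeros((4, 4), dtype=np.float32)
layer = np.array([[0.0, 2.0], [4.0, 1.0]], dtype=np.float32)
print(normalize_by_max(empty).max(), normalize_by_max(layer).max())
```

In practice, keeping mean close to the default avoids the empty layer in the first place; the guard only makes the failure mode explicit.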
The following shows the results of different mean values in different modes:
rain mode:
mud mode:
# source code
class Spatter(ImageOnlyTransform):
"""
Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.
Args:
mean (float, or tuple of floats): Mean value of normal distribution for generating liquid layer.
If single float it will be used as mean.
If tuple of float mean will be sampled from range `[mean[0], mean[1])`. Default: (0.65).
std (float, or tuple of floats): Standard deviation value of normal distribution for generating liquid layer.
If single float it will be used as std.
If tuple of float std will be sampled from range `[std[0], std[1])`. Default: (0.3).
gauss_sigma (float, or tuple of floats): Sigma value for gaussian filtering of liquid layer.
If single float it will be used as gauss_sigma.
If tuple of float gauss_sigma will be sampled from range `[sigma[0], sigma[1])`. Default: (2).
cutout_threshold (float, or tuple of floats): Threshold for filtering liqued layer
(determines number of drops). If single float it will used as cutout_threshold.
If tuple of float cutout_threshold will be sampled from range `[cutout_threshold[0], cutout_threshold[1])`.
Default: (0.68).
intensity (float, or tuple of floats): Intensity of corruption.
If single float it will be used as intensity.
If tuple of float intensity will be sampled from range `[intensity[0], intensity[1])`. Default: (0.6).
mode (string, or list of strings): Type of corruption. Currently, supported options are 'rain' and 'mud'.
If list is provided type of corruption will be sampled list. Default: ("rain").
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
Reference:
| https://arxiv.org/pdf/1903.12261.pdf
| https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
"""
def __init__(
self,
mean: ScaleFloatType = 0.65,
std: ScaleFloatType = 0.3,
gauss_sigma: ScaleFloatType = 2,
cutout_threshold: ScaleFloatType = 0.68,
intensity: ScaleFloatType = 0.6,
mode: Union[str, Sequence[str]] = "rain",
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.mean = to_tuple(mean, mean)
self.std = to_tuple(std, std)
self.gauss_sigma = to_tuple(gauss_sigma, gauss_sigma)
self.intensity = to_tuple(intensity, intensity)
self.cutout_threshold = to_tuple(cutout_threshold, cutout_threshold)
self.mode = mode if isinstance(mode, (list, tuple)) else [mode]
for i in self.mode:
if i not in ["rain", "mud"]:
raise ValueError(
f"Unsupported color mode: {mode}. Transform supports only `rain` and `mud` mods."
)
def apply(self,
img: np.ndarray,
non_mud: Optional[np.ndarray] = None,
mud: Optional[np.ndarray] = None,
drops: Optional[np.ndarray] = None,
mode: str = "",
**params) -> np.ndarray:
return F.spatter(img, non_mud, mud, drops, mode)
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_params_dependent_on_targets(
self, params: Dict[str, Any]) -> Dict[str, Any]:
h, w = params["image"].shape[:2]
mean = random.uniform(self.mean[0], self.mean[1])
std = random.uniform(self.std[0], self.std[1])
cutout_threshold = random.uniform(self.cutout_threshold[0],
self.cutout_threshold[1])
sigma = random.uniform(self.gauss_sigma[0], self.gauss_sigma[1])
mode = random.choice(self.mode)
intensity = random.uniform(self.intensity[0], self.intensity[1])
liquid_layer = random_utils.normal(size=(h, w), loc=mean, scale=std)
liquid_layer = gaussian_filter(liquid_layer,
sigma=sigma,
mode="nearest")
liquid_layer[liquid_layer < cutout_threshold] = 0
if mode == "rain":
liquid_layer = (liquid_layer * 255).astype(np.uint8)
dist = 255 - cv2.Canny(liquid_layer, 50, 150)
dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
_, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
dist = blur(dist, 3).astype(np.uint8)
dist = F.equalize(dist)
ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
dist = F.convolve(dist, ker)
dist = blur(dist, 3).astype(np.float32)
m = liquid_layer * dist
m *= 1 / np.max(m, axis=(0, 1))
drops = m[:, :, None] * np.array(
[238 / 255.0, 238 / 255.0, 175 / 255.0]) * intensity
mud = None
non_mud = None
else:
m = np.where(liquid_layer > cutout_threshold, 1, 0)
m = gaussian_filter(m.astype(np.float32),
sigma=sigma,
mode="nearest")
m[m < 1.2 * cutout_threshold] = 0
m = m[..., np.newaxis]
mud = m * np.array([20 / 255.0, 42 / 255.0, 63 / 255.0])
non_mud = 1 - m
drops = None
return {
"non_mud": non_mud,
"mud": mud,
"drops": drops,
"mode": mode,
}
def get_transform_init_args_names(
self) -> Tuple[str, str, str, str, str, str]:
return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode"
The following figures show the effect of changing each parameter. Parameters not shown on a figure use their default values.
mean changes:
std changes:
gauss_sigma changes:
cutout_threshold changes:
intensity changes:
mode changes:
For the image in the lower right corner, the mode was sampled at random and rain mode was selected.
Conceptual understanding:
Superpixels are an image segmentation concept proposed and developed by Xiaofeng Ren in 2003. A superpixel is an irregular block of adjacent pixels that share similar texture, color, brightness, and other characteristics, and that carries some visual meaning. Pixels are grouped by the similarity of their features, so that a small number of superpixels can stand in for a large number of pixels when describing the image. This greatly reduces the complexity of later processing, which is why superpixels are commonly used as a preprocessing step in segmentation algorithms.
Function: Convert part or all of the image into its superpixel representation, using the SLIC (Simple Linear Iterative Clustering) algorithm.
Parameter Description:
- p_replace (float or tuple of float): Probability that each segment is filled with its average color.
  p_replace=0 keeps the original image;
  p_replace=0.5 means that about half of all segments are filled with their average color;
  p_replace=1.0 means that all segments are filled with their average color, producing a Voronoi image (Thiessen polygons).
- n_segments (int, or tuple of int): Approximate number of superpixels to generate (the algorithm may deviate from this number).
- max_size (int or None): Maximum size of the longer side of the image. A larger image is downscaled proportionally to this size before processing (to speed up the algorithm), and the final result is resized back to the original size. max_size=None disables resizing.
- interpolation (OpenCV flag): OpenCV interpolation method; the default is linear interpolation (cv2.INTER_LINEAR).
  Allowed values:
  cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4
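The role of p_replace can be illustrated without running SLIC at all: each segment gets an independent coin flip that decides whether it is filled with its average color. The sketch below mimics the expression random_utils.random(n_segments) < p from the source listing with the standard library; sample_replace_mask and its rng argument are hypothetical names.

```python
import random


def sample_replace_mask(n_segments, p_replace, rng=None):
    """Return one boolean per segment: True means 'fill with the average color'."""
    if rng is None:
        rng = random.Random()
    return [rng.random() < p_replace for _ in range(n_segments)]


rng = random.Random(0)
mask = sample_replace_mask(n_segments=100, p_replace=0.5, rng=rng)
print(sum(mask), "of", len(mask), "segments would be replaced")
```

With p_replace=0 no segment is replaced, with p_replace=1.0 every segment is, and values in between replace roughly that fraction of the segments.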
# source code
class Superpixels(ImageOnlyTransform):
"""Transform images partially/completely to their superpixel representation.
This implementation uses skimage's version of the SLIC algorithm.
Args:
p_replace (float or tuple of float): Defines for any segment the probability that the pixels within that
segment are replaced by their average color (otherwise, the pixels are not changed).
Examples:
* A probability of ``0.0`` would mean, that the pixels in no
segment are replaced by their average color (image is not
changed at all).
* A probability of ``0.5`` would mean, that around half of all
segments are replaced by their average color.
* A probability of ``1.0`` would mean, that all segments are
replaced by their average color (resulting in a voronoi
image).
Behaviour based on chosen data types for this parameter:
* If a ``float``, then that ``flat`` will always be used.
* If ``tuple`` ``(a, b)``, then a random probability will be
sampled from the interval ``[a, b]`` per image.
n_segments (int, or tuple of int): Rough target number of how many superpixels to generate (the algorithm
may deviate from this number). Lower value will lead to coarser superpixels.
Higher values are computationally more intensive and will hence lead to a slowdown
* If a single ``int``, then that value will always be used as the
number of segments.
* If a ``tuple`` ``(a, b)``, then a value from the discrete
interval ``[a..b]`` will be sampled per image.
max_size (int or None): Maximum image size at which the augmentation is performed.
If the width or height of an image exceeds this value, it will be
downscaled before the augmentation so that the longest side matches `max_size`.
This is done to speed up the process. The final output image has the same size as the input image.
Note that in case `p_replace` is below ``1.0``,
the down-/upscaling will affect the not-replaced pixels too.
Use ``None`` to apply no down-/upscaling.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
"""
def __init__(
self,
p_replace: Union[float, Sequence[float]] = 0.1,
n_segments: Union[int, Sequence[int]] = 100,
max_size: Optional[int] = 128,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply=always_apply, p=p)
self.p_replace = to_tuple(p_replace, p_replace)
self.n_segments = to_tuple(n_segments, n_segments)
self.max_size = max_size
self.interpolation = interpolation
if min(self.n_segments) < 1:
raise ValueError(f"n_segments must be >= 1. Got: {n_segments}")
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("p_replace", "n_segments", "max_size", "interpolation")
def get_params(self) -> dict:
n_segments = random.randint(*self.n_segments)
p = random.uniform(*self.p_replace)
return {
"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}
def apply(self, img: np.ndarray, replace_samples: Sequence[bool] = (False,), n_segments: int = 1, **kwargs):
return F.superpixels(img, n_segments, replace_samples, self.max_size, self.interpolation)
Below are the visualization results.
The larger n_segments is, the more image segmentation blocks there are.
The larger p_replace is, the higher the probability of being filled with uniform color, that is, more segmented blocks are filled.
Extended reading:
Dragon begets dragon, phoenix begets phoenix, SLIC begets super pixel
Function: Divide by the maximum value and convert to a float32 output whose pixel values lie in the range [0, 1.0]. If the maximum value is not specified, it is determined by the image dtype:
MAX_VALUES_BY_DTYPE = {
np.dtype("uint8"): 255,
np.dtype("uint16"): 65535,
np.dtype("uint32"): 4294967295,
np.dtype("float32"): 1.0,
}
Its inverse is FromFloat, namely img([0, 1.0]) * max_value.
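The ToFloat / FromFloat pair can be sketched as a round trip. to_float below follows F.to_float from the source listing; from_float is my own minimal counterpart, and the rounding before the integer cast is an assumption I add so that the round trip is exact (the library's version may differ). NumPy is assumed.

```python
import numpy as np

MAX_VALUES = {
    np.dtype("uint8"): 255,
    np.dtype("uint16"): 65535,
    np.dtype("float32"): 1.0,
}


def to_float(img, max_value=None):
    """Divide by the dtype maximum to get float32 values in [0, 1]."""
    if max_value is None:
        max_value = MAX_VALUES[img.dtype]
    return img.astype("float32") / max_value


def from_float(img, dtype, max_value=None):
    """Multiply by the dtype maximum and cast back (rounding is an assumption)."""
    dtype = np.dtype(dtype)
    if max_value is None:
        max_value = MAX_VALUES[dtype]
    return np.rint(img * max_value).astype(dtype)


original = np.arange(256, dtype=np.uint8)
restored = from_float(to_float(original), "uint8")
print(to_float(original).dtype, np.array_equal(original, restored))
```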
# source code
class ToFloat(ImageOnlyTransform):
"""Divide pixel values by `max_value` to get a float32 output array where all values lie in the range [0, 1.0].
If `max_value` is None the transform will try to infer the maximum value by inspecting the data type of the input
image.
See Also:
:class:`~albumentations.augmentations.transforms.FromFloat`
Args:
max_value (float): maximum possible input value. Default: None.
p (float): probability of applying the transform. Default: 1.0.
Targets:
image
Image types:
any type
"""
def __init__(self, max_value=None, always_apply=False, p=1.0):
super(ToFloat, self).__init__(always_apply, p)
self.max_value = max_value
def apply(self, img, **params):
return F.to_float(img, self.max_value)
def get_transform_init_args_names(self):
return ("max_value",)
# F.to_float()
def to_float(img, max_value=None):
if max_value is None:
try:
max_value = MAX_VALUES_BY_DTYPE[img.dtype]
except KeyError:
raise RuntimeError(
"Can't infer the maximum value for dtype {}. You need to specify the maximum value manually by "
"passing the max_value argument".format(img.dtype)
)
return img.astype("float32") / max_value
Function: Randomly turn the image into grayscale. Note that the transformed grayscale image is still 3 channels.
# source code
class ToGray(ImageOnlyTransform):
"""Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater
than 127, invert the resulting grayscale image.
Args:
p (float): probability of applying the transform. Default: 0.5. # probability of applying this transform; p=1 converts every image to grayscale.
Targets:
image
Image types:
uint8, float32
"""
def apply(self, img, **params):
if is_grayscale_image(img):
warnings.warn("The image is already gray.")
return img
if not is_rgb_image(img):
raise TypeError("ToGray transformation expects 3-channel images.")
return F.to_gray(img)
def get_transform_init_args_names(self):
return ()
# F.to_gray(img)
def to_gray(img):
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
return cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB) # convert the grayscale image back to three channels
The following is the visualization result. Note the "x24BPP" label (24 bits per pixel) below the grayscale image, which indicates that the image still has three channels.
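Why the output still has three channels can be shown with NumPy alone. The sketch below is not the library code (which calls cv2.cvtColor twice); it uses the standard luma weights 0.299, 0.587, 0.114 that OpenCV's RGB-to-gray conversion is based on, and the function name to_gray_3ch is hypothetical.

```python
import numpy as np


def to_gray_3ch(img_rgb):
    """Convert an (H, W, 3) uint8 RGB image to gray, replicated on 3 channels."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = img_rgb.astype(np.float32) @ weights            # shape (H, W)
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)            # shape (H, W, 3)


img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 0)       # pure red
img[1, 1] = (255, 255, 255)   # white
out = to_gray_3ch(img)
print(out.shape, out[0, 0], out[1, 1])
```

All three output channels are identical, so the image looks gray while keeping the (H, W, 3) shape that downstream code expects.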
Function: Convert a single-channel grayscale image to a three-channel RGB image (the three channels are identical).
This transform is not included in version 1.3.0.
This transform defaults to p=1, whereas ToGray defaults to p=0.5.
# source code
class ToRGB(ImageOnlyTransform):
"""Convert the input grayscale image to RGB.
Args:
p (float): probability of applying the transform. Default: 1.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, always_apply=True, p=1.0):
super(ToRGB, self).__init__(always_apply=always_apply, p=p)
def apply(self, img, **params):
if is_rgb_image(img):
warnings.warn("The image is already an RGB.")
return img
if not is_grayscale_image(img):
raise TypeError("ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1.")
return F.gray_to_rgb(img)
def get_transform_init_args_names(self):
return ()
# F.gray_to_rgb(img)
def gray_to_rgb(img):
return cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
Function: Add sepia filter to image
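The sepia filter is a fixed 3x3 linear transform of each RGB pixel, followed by clipping. The pure-Python sketch below applies the matrix from the source listing to a single uint8 pixel; sepia_pixel is a hypothetical helper, and rounding to the nearest integer is my assumption about how the result is cast back to uint8.

```python
SEPIA = [
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131],
]


def sepia_pixel(rgb):
    """Apply the sepia matrix to one (r, g, b) uint8 pixel and clip to [0, 255]."""
    out = []
    for row in SEPIA:
        value = sum(w * c for w, c in zip(row, rgb))
        out.append(min(255, max(0, round(value))))
    return tuple(out)


print(sepia_pixel((100, 100, 100)))   # (135, 120, 94)
print(sepia_pixel((255, 255, 255)))   # (255, 255, 239)
```

Because the first two rows sum to more than 1, bright pixels saturate in the red and green channels, which produces the warm brownish tint.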
# source code
class ToSepia(ImageOnlyTransform):
"""Applies sepia filter to the input RGB image
Args:
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
"""
def __init__(self, always_apply=False, p=0.5):
super(ToSepia, self).__init__(always_apply, p)
self.sepia_transformation_matrix = np.matrix(
[[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]]
)
def apply(self, image, **params):
if not is_rgb_image(image):
raise TypeError("ToSepia transformation expects 3-channel images.")
return F.linear_transformation_rgb(image, self.sepia_transformation_matrix)
def get_transform_init_args_names(self):
return ()
# F.linear_transformation_rgb
@clipped
def linear_transformation_rgb(img, transformation_matrix):
result_img = cv2.transform(img, transformation_matrix)
return result_img
Function: Sharpen images with the USM (unsharp masking) algorithm.
The input image is sharpened using unsharp masking, and the result is overlaid on the original image.
Parameter Description:
- Main parameters and default values:
  blur_limit: Union[int, Sequence[int]] = (3, 7)
  sigma_limit: Union[float, Sequence[float]] = 0.0
  alpha: Union[float, Sequence[float]] = (0.2, 0.5)
  threshold: int = 10
- Parameter requirements:
- blur_limit (int or (int, int)): Maximum Gaussian kernel size used to blur the input image. It must be 0 or an odd number; the valid range is [0, inf). If it is 0, the kernel size is computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1. According to the docstring, a single number is converted to the interval (0, blur_limit). Note, however, that the initializer in the source code contains the following line:
  self.blur_limit = to_tuple(blur_limit, 3)  # 3 is the fill value for the other bound
  For example:
  self.blur_limit = to_tuple(1, 3)  # self.blur_limit = (1, 3)
  self.blur_limit = to_tuple(5, 3)  # self.blur_limit = (3, 5)
- sigma_limit (float or (float, float)): Standard deviation of the Gaussian kernel; the valid range is [0.0, inf). If it is 0, sigma is computed as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. A single number is converted to the interval (0, sigma_limit).
- alpha (float or (float, float)): Controls the visibility of the sharpened image. The result is a blend of the sharpened image and the original image, and alpha sets the weight of the sharpened part. alpha=0 returns only the original image; alpha=1 adds the full residual.
  residual = image - blur  # blur is the image after Gaussian blurring (cv2.GaussianBlur)
  sharp = image + alpha * residual
  # Avoid color noise artefacts.
  sharp = np.clip(sharp, 0, 1)
- threshold (int): Restricts sharpening to areas with a large pixel difference between the original image and its smoothed version. The valid range is [0, 255]. The larger the threshold, the less the flat areas (where that difference is small) are sharpened: as threshold increases, the region where np.abs(image - blur) * 255 < threshold grows, and this region does not take part in the sharpening overlay. In practice, a larger value means lighter sharpening.
  residual = image - blur  # blur is the image after Gaussian blurring (cv2.GaussianBlur)
  # Do not sharpen noise
  mask = np.abs(residual) * 255 > threshold
  mask = mask.astype("float32")
- Note: The lower bounds of blur_limit and sigma_limit cannot both be 0.
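The rules for blur_limit and sigma_limit can be checked with a few lines of plain Python. The helpers below are hypothetical stand-ins: to_range only reproduces the two to_tuple examples quoted above for a single number, sigma_from_ksize evaluates the formula from the docstring, and sample_ksize mirrors the random.randrange call in get_params.

```python
import random


def to_range(value, fill):
    """Stand-in for to_tuple(value, fill) with a single number: a sorted pair."""
    return (min(value, fill), max(value, fill))


def sigma_from_ksize(ksize):
    """Sigma used when sigma is 0, per the docstring formula."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


def sample_ksize(blur_limit, rng=None):
    """Pick a kernel size from blur_limit in steps of 2."""
    if rng is None:
        rng = random.Random()
    low, high = blur_limit
    return rng.randrange(low, high + 1, 2)


print(to_range(1, 3), to_range(5, 3))
for ksize in (3, 5, 7):
    print(ksize, round(sigma_from_ksize(ksize), 2))
print(sample_ksize((3, 7), random.Random(0)))
```

Stepping by 2 from an odd lower bound is what keeps every sampled kernel size odd, as cv2.GaussianBlur requires.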
# source code
class UnsharpMask(ImageOnlyTransform):
"""
Sharpen the input image using Unsharp Masking processing and overlays the result with the original image.
Args:
blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
If set single value `blur_limit` will be in range (0, blur_limit).
Default: (3, 7).
sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
If set single value `sigma_limit` will be in range (0, sigma_limit).
If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
alpha (float, (float, float)): range to choose the visibility of the sharpened image.
At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
Default: (0.2, 0.5).
threshold (int): Value to limit sharpening only for areas with high pixel difference between original image
and it's smoothed version. Higher threshold means less sharpening on flat areas.
Must be in range [0, 255]. Default: 10.
p (float): probability of applying the transform. Default: 0.5.
Reference:
arxiv.org/pdf/2107.10833.pdf
Targets:
image
"""
def __init__(
self,
blur_limit: Union[int, Sequence[int]] = (3, 7),
sigma_limit: Union[float, Sequence[float]] = 0.0,
alpha: Union[float, Sequence[float]] = (0.2, 0.5),
threshold: int = 10,
always_apply=False,
p=0.5,
):
super(UnsharpMask, self).__init__(always_apply, p)
self.blur_limit = to_tuple(blur_limit, 3)
self.sigma_limit = self.__check_values(to_tuple(sigma_limit, 0.0), name="sigma_limit")
self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
self.threshold = threshold
if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
self.blur_limit = 3, max(3, self.blur_limit[1])
raise ValueError("blur_limit and sigma_limit minimum value can not be both equal to 0.")
if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
):
raise ValueError("UnsharpMask supports only odd blur limits.")
@staticmethod
def __check_values(value, name, bounds=(0, float("inf"))):
if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
raise ValueError(f"{name} values should be between {bounds}")
return value
def get_params(self):
return {
"ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
"sigma": random.uniform(*self.sigma_limit),
"alpha": random.uniform(*self.alpha),
}
def apply(self, img, ksize=3, sigma=0, alpha=0.2, **params):
return F.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)
def get_transform_init_args_names(self):
return ("blur_limit", "sigma_limit", "alpha", "threshold")
# F.unsharp_mask()
def unsharp_mask(image: np.ndarray, ksize: int, sigma: float = 0.0, alpha: float = 0.2, threshold: int = 10):
blur_fn = _maybe_process_in_chunks(cv2.GaussianBlur, ksize=(ksize, ksize), sigmaX=sigma)
input_dtype = image.dtype
if input_dtype == np.uint8:
image = to_float(image)
elif input_dtype not in (np.uint8, np.float32):
raise ValueError("Unexpected dtype {} for UnsharpMask augmentation".format(input_dtype))
blur = blur_fn(image)
residual = image - blur
# Do not sharpen noise
mask = np.abs(residual) * 255 > threshold
mask = mask.astype("float32")
sharp = image + alpha * residual
# Avoid color noise artefacts.
sharp = np.clip(sharp, 0, 1)
soft_mask = blur_fn(mask)
output = soft_mask * sharp + (1 - soft_mask) * image
return from_float(output, dtype=input_dtype)
The visualization results are as follows. The left side is the original image and the right side is the sharpened result. To make the effect obvious, the right image uses (ksize=5, sigma=0, alpha=1, threshold=0).
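The steps of F.unsharp_mask above (residual, threshold mask, clipped sharpening, soft blend) can be sketched on a 1-D signal with the standard library only. This is a minimal sketch, not the library implementation: `box_blur` is a hypothetical mean filter standing in for cv2.GaussianBlur, and the input is assumed to be floats already scaled to [0, 1].

```python
def box_blur(signal, radius=1):
    """Mean filter with edge replication; a stand-in for cv2.GaussianBlur."""
    n = len(signal)
    blurred = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp_mask_1d(signal, alpha=0.2, threshold=10, radius=1):
    """Sharpen a list of floats in [0, 1], following the steps of F.unsharp_mask."""
    blur = box_blur(signal, radius)
    residual = [s - b for s, b in zip(signal, blur)]
    # Do not sharpen noise: keep only residuals above the threshold (0-255 scale).
    mask = [1.0 if abs(r) * 255 > threshold else 0.0 for r in residual]
    # Add the scaled residual and clip back into the valid range.
    sharp = [min(max(s + alpha * r, 0.0), 1.0) for s, r in zip(signal, residual)]
    # Blur the mask so sharpened and original values blend smoothly.
    soft_mask = box_blur(mask, radius)
    return [m * sh + (1 - m) * s for m, sh, s in zip(soft_mask, sharp, signal)]


flat = [0.5] * 6
edge = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
sharpened_flat = unsharp_mask_1d(flat, alpha=1.0)
sharpened_edge = unsharp_mask_1d(edge, alpha=1.0)
```

A flat signal has zero residual and passes through unchanged, while the two samples next to the step are pushed apart, which is the overshoot that makes edges look sharper.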
Extended reading:
- The principle of the Unsharp Mask (USM) sharpening algorithm and its implementation
- Super-resolution paper reading: Real-ESRGAN (2021 ICCV)
Function: Zoom blur.
Parameter description:
max_factor ((float, float) or float): Range for the maximum blur (zoom) factor. All values should be greater than 1. If a single number is given, the range is (1, max_factor). Default: (1, 1.31).
step_factor ((float, float) or float): Step between successive zoom factors (used as the step of np.arange). If a tuple is given, the step is sampled from [step_factor[0], step_factor[1]). Default: (0.01, 0.03).
# source code
class ZoomBlur(ImageOnlyTransform):
"""
Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.
Args:
max_factor ((float, float) or float): range for max factor for blurring.
If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
All max_factor values should be larger than 1.
step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
All step_factor values should be positive.
p (float): probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
Any
"""
def __init__(
self,
max_factor: ScaleFloatType = 1.31,
step_factor: ScaleFloatType = (0.01, 0.03),
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.max_factor = to_tuple(max_factor, low=1.0)
self.step_factor = to_tuple(step_factor, step_factor)
if self.max_factor[0] < 1:
raise ValueError("Max factor must be larger or equal 1")
if self.step_factor[0] <= 0:
raise ValueError("Step factor must be positive")
def apply(self, img: np.ndarray, zoom_factors: np.ndarray = None, **params) -> np.ndarray:
assert zoom_factors is not None
return F.zoom_blur(img, zoom_factors)
def get_params(self) -> Dict[str, Any]:
max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}
def get_transform_init_args_names(self) -> Tuple[str, str]:
return ("max_factor", "step_factor")
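To see what get_params produces, the sampling of zoom_factors can be sketched without NumPy. This is a minimal sketch under the assumption that a plain loop is an acceptable stand-in for np.arange (the two can differ by one element because of floating-point rounding); `sample_zoom_factors` is a hypothetical helper, not part of the library.

```python
import random


def sample_zoom_factors(max_factor=(1.0, 1.31), step_factor=(0.01, 0.03), rng=random):
    """Pure-Python stand-in for np.arange(1.0, max_factor, step_factor)."""
    upper = rng.uniform(*max_factor)   # sampled maximum zoom
    step = rng.uniform(*step_factor)   # sampled step between zoom levels
    factors = []
    k = 0
    while 1.0 + k * step < upper:
        factors.append(1.0 + k * step)
        k += 1
    return factors


factors = sample_zoom_factors(rng=random.Random(0))
fixed = sample_zoom_factors(max_factor=(1.2, 1.2), step_factor=(0.05, 0.05))
```

The list always starts at 1.0 (no zoom) and stops before the sampled maximum, so a larger max_factor or a smaller step_factor gives more zoom levels for the blur to work with.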
Spatial-level transforms
Spatial-level transforms change the input image together with additional targets such as masks, bounding boxes, and keypoints.
The following table shows which targets each transform supports.
Transform | Image | Masks | BBoxes | Keypoints |
---|---|---|---|---|
Affine | ✓ | ✓ | ✓ | ✓ |
BBoxSafeRandomCrop | ✓ | ✓ | ✓ | |
CenterCrop | ✓ | ✓ | ✓ | ✓ |
CoarseDropout | ✓ | ✓ | ✓ | |
Crop | ✓ | ✓ | ✓ | ✓ |
CropAndPad | ✓ | ✓ | ✓ | ✓ |
CropNonEmptyMaskIfExists | ✓ | ✓ | ✓ | ✓ |
ElasticTransform | ✓ | ✓ | ✓ | |
Flip | ✓ | ✓ | ✓ | ✓ |
GridDistortion | ✓ | ✓ | ✓ | |
GridDropout | ✓ | ✓ | | |
HorizontalFlip | ✓ | ✓ | ✓ | ✓ |
Lambda | ✓ | ✓ | ✓ | ✓ |
LongestMaxSize | ✓ | ✓ | ✓ | ✓ |
MaskDropout | ✓ | ✓ | | |
NoOp | ✓ | ✓ | ✓ | ✓ |
OpticalDistortion | ✓ | ✓ | ✓ | |
PadIfNeeded | ✓ | ✓ | ✓ | ✓ |
Perspective | ✓ | ✓ | ✓ | ✓ |
PiecewiseAffine | ✓ | ✓ | ✓ | ✓ |
PixelDropout | ✓ | ✓ | ✓ | ✓ |
RandomCrop | ✓ | ✓ | ✓ | ✓ |
RandomCropFromBorders | ✓ | ✓ | ✓ | ✓ |
RandomCropNearBBox | ✓ | ✓ | ✓ | ✓ |
RandomGridShuffle | ✓ | ✓ | ✓ | |
RandomResizedCrop | ✓ | ✓ | ✓ | ✓ |
RandomRotate90 | ✓ | ✓ | ✓ | ✓ |
RandomScale | ✓ | ✓ | ✓ | ✓ |
RandomSizedBBoxSafeCrop | ✓ | ✓ | ✓ | |
RandomSizedCrop | ✓ | ✓ | ✓ | ✓ |
Resize | ✓ | ✓ | ✓ | ✓ |
Rotate | ✓ | ✓ | ✓ | ✓ |
SafeRotate | ✓ | ✓ | ✓ | ✓ |
ShiftScaleRotate | ✓ | ✓ | ✓ | ✓ |
SmallestMaxSize | ✓ | ✓ | ✓ | ✓ |
Transpose | ✓ | ✓ | ✓ | ✓ |
VerticalFlip | ✓ | ✓ | ✓ | ✓ |
Function: Random crop that keeps every bbox. Each side of the crop is sampled between the union (bounding rectangle) of all bboxes and the corresponding image edge.
Parameter description:
erosion_rate (float): erosion rate. Default: 0.0. When there are no bboxes, it sets how far the image height may shrink before the crop; otherwise it is passed on to union_of_bboxes.
# source code
class BBoxSafeRandomCrop(DualTransform):
"""Crop a random part of the input without loss of bboxes.
Args:
erosion_rate (float): erosion rate applied on input image height before crop.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes
Image types:
uint8, float32
"""
def __init__(self, erosion_rate=0.0, always_apply=False, p=1.0):
super(BBoxSafeRandomCrop, self).__init__(always_apply, p)
self.erosion_rate = erosion_rate
def apply(self, img, crop_height=0, crop_width=0, h_start=0, w_start=0, **params):
return F.random_crop(img, crop_height, crop_width, h_start, w_start)
def get_params_dependent_on_targets(self, params):
img_h, img_w = params["image"].shape[:2]
if len(params["bboxes"]) == 0: # less likely, this class is for use with bboxes.
erosive_h = int(img_h * (1.0 - self.erosion_rate))
crop_height = img_h if erosive_h >= img_h else random.randint(erosive_h, img_h)
return {
"h_start": random.random(),
"w_start": random.random(),
"crop_height": crop_height,
"crop_width": int(crop_height * img_w / img_h),
}
# get union of all bboxes
x, y, x2, y2 = union_of_bboxes(
width=img_w, height=img_h, bboxes=params["bboxes"], erosion_rate=self.erosion_rate
)
# find bigger region
bx, by = x * random.random(), y * random.random()
bx2, by2 = x2 + (1 - x2) * random.random(), y2 + (1 - y2) * random.random()
bw, bh = bx2 - bx, by2 - by
crop_height = img_h if bh >= 1.0 else int(img_h * bh)
crop_width = img_w if bw >= 1.0 else int(img_w * bw)
h_start = np.clip(0.0 if bh >= 1.0 else by / (1.0 - bh), 0.0, 1.0)
w_start = np.clip(0.0 if bw >= 1.0 else bx / (1.0 - bw), 0.0, 1.0)
return {"h_start": h_start, "w_start": w_start, "crop_height": crop_height, "crop_width": crop_width}
def apply_to_bbox(self, bbox, crop_height=0, crop_width=0, h_start=0, w_start=0, rows=0, cols=0, **params):
return F.bbox_random_crop(bbox, crop_height, crop_width, h_start, w_start, rows, cols)
@property
def targets_as_params(self):
return ["image", "bboxes"]
def get_transform_init_args_names(self):
return ("erosion_rate",)
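In the figure below, the bboxes hold the coordinates of the butterfly and the bird. Every crop result contains all bboxes, and cropping changes the image size.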
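The "find bigger region" step in get_params_dependent_on_targets can be sketched as follows. This is a simplified sketch: it takes the union box as an input in normalized coordinates (the library computes it with union_of_bboxes) and returns a pixel box directly instead of the h_start/w_start fractions used by the library. `sample_bbox_safe_crop` is a hypothetical name.

```python
import math
import random


def sample_bbox_safe_crop(union, img_h, img_w, rng=random):
    """Return a pixel crop box (x_min, y_min, x_max, y_max) that contains `union`.

    `union` is (x, y, x2, y2) in normalized [0, 1] coordinates.
    """
    x, y, x2, y2 = union
    # Push each side outward by a random fraction of the free margin.
    bx, by = x * rng.random(), y * rng.random()
    bx2, by2 = x2 + (1 - x2) * rng.random(), y2 + (1 - y2) * rng.random()
    # Round outward so the union box is never clipped.
    x_min, y_min = math.floor(bx * img_w), math.floor(by * img_h)
    x_max, y_max = math.ceil(bx2 * img_w), math.ceil(by2 * img_h)
    return x_min, y_min, min(x_max, img_w), min(y_max, img_h)


union_box = (0.3, 0.4, 0.6, 0.7)
crops = [sample_bbox_safe_crop(union_box, 100, 200, random.Random(seed)) for seed in range(100)]
```

Because each side only moves away from the union box, every sampled crop keeps all bboxes, while the crop size varies from call to call.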
Function: Crop the central region of the image.
Parameter description: height, width (int): height and width of the crop region.
# source code
class CenterCrop(DualTransform):
"""Crop the central part of the input.
Args:
height (int): height of the crop.
width (int): width of the crop.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Note:
It is recommended to use uint8 images as input.
Otherwise the operation will require internal conversion
float32 -> uint8 -> float32 that causes worse performance.
"""
def __init__(self, height, width, always_apply=False, p=1.0):
super(CenterCrop, self).__init__(always_apply, p)
self.height = height
self.width = width
def apply(self, img, **params):
return F.center_crop(img, self.height, self.width)
def apply_to_bbox(self, bbox, **params):
return F.bbox_center_crop(bbox, self.height, self.width, **params)
def apply_to_keypoint(self, keypoint, **params):
return F.keypoint_center_crop(keypoint, self.height, self.width, **params)
def get_transform_init_args_names(self):
return ("height", "width")
# F.center_crop
def get_center_crop_coords(height: int, width: int, crop_height: int, crop_width: int):
y1 = (height - crop_height) // 2
y2 = y1 + crop_height
x1 = (width - crop_width) // 2
x2 = x1 + crop_width
return x1, y1, x2, y2
def center_crop(img: np.ndarray, crop_height: int, crop_width: int):
height, width = img.shape[:2]
if height < crop_height or width < crop_width:
raise ValueError(
"Requested crop size ({crop_height}, {crop_width}) is "
"larger than the image size ({height}, {width})".format(
crop_height=crop_height, crop_width=crop_width, height=height, width=width
)
)
x1, y1, x2, y2 = get_center_crop_coords(height, width, crop_height, crop_width)
img = img[y1:y2, x1:x2]
return img
In each result, the bird's beak sits slightly above the center of the cropped image.
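A small worked example makes the coordinate arithmetic of F.center_crop concrete. This sketch assumes the image is a nested list instead of a NumPy array, so it runs with the standard library only; the function name mirrors the one above but is a re-implementation for illustration.

```python
def center_crop(img, crop_height, crop_width):
    """Center crop of a nested-list image, using the same coordinates as F.center_crop."""
    height, width = len(img), len(img[0])
    if height < crop_height or width < crop_width:
        raise ValueError("Requested crop size is larger than the image size")
    y1 = (height - crop_height) // 2
    x1 = (width - crop_width) // 2
    return [row[x1:x1 + crop_width] for row in img[y1:y1 + crop_height]]


# 4 x 6 image whose pixel value encodes its position: 10 * row + column.
img = [[10 * r + c for c in range(6)] for r in range(4)]
patch = center_crop(img, 2, 2)
```

Here y1 = (4 - 2) // 2 = 1 and x1 = (6 - 2) // 2 = 2, so the patch covers rows 1 to 2 and columns 2 to 3. When the leftover margin is odd, the floor division puts the extra pixel on the bottom or right side.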
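Function: Randomly drop rectangular regions of the image and fill them with a fixed value. (This covers the functionality of Cutout and adds mask handling.)
Parameter description:
- max_holes (int): maximum number of regions to cut out.
- max_height, max_width (int, float): maximum size of a hole. If float, it is computed from the image size (image height or width * float value).
- min_holes (int): minimum number of regions to cut out. If None, it equals max_holes. Default: None.
- min_height, min_width (int, float): minimum size of a hole. If None, it equals the corresponding max value. Default: None. If float, it is computed from the image size (image height or width * float value).
- fill_value (int, float, list of int, list of float): fill value for the pixels in the cutout regions.
- mask_fill_value (int, float, list of int, list of float): fill value for the cutout regions in the mask. If None, nothing is done and the original mask is returned. Default: None.
# Constructor only; the remaining methods are not copied here, see the library source for the full code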
class CoarseDropout(DualTransform):
"""CoarseDropout of the rectangular regions in the image.
Args:
max_holes (int): Maximum number of regions to zero out.
max_height (int, float): Maximum height of the hole.
If float, it is calculated as a fraction of the image height.
max_width (int, float): Maximum width of the hole.
If float, it is calculated as a fraction of the image width.
min_holes (int): Minimum number of regions to zero out. If `None`,
`min_holes` is set to `max_holes`. Default: `None`.
min_height (int, float): Minimum height of the hole. If `None`,
`min_height` is set to `max_height`. Default: `None`.
If float, it is calculated as a fraction of the image height.
min_width (int, float): Minimum width of the hole. If `None`, `min_width` is
set to `max_width`. Default: `None`.
If float, it is calculated as a fraction of the image width.
fill_value (int, float, list of int, list of float): value for dropped pixels.
mask_fill_value (int, float, list of int, list of float): fill value for dropped pixels
in mask. If `None` - mask is not affected. Default: `None`.
Targets:
image, mask
Image types:
uint8, float32
Reference:
| https://arxiv.org/abs/1708.04552
| https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
| https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py
"""
def __init__(
self,
max_holes=8,
max_height=8,
max_width=8,
min_holes=None,
min_height=None,
min_width=None,
fill_value=0,
mask_fill_value=None,
always_apply=False,
p=0.5,
):
super(CoarseDropout, self).__init__(always_apply, p)
self.max_holes = max_holes
self.max_height = max_height
self.max_width = max_width
self.min_holes = min_holes if min_holes is not None else max_holes
self.min_height = min_height if min_height is not None else max_height
self.min_width = min_width if min_width is not None else max_width
self.fill_value = fill_value
self.mask_fill_value = mask_fill_value
if not 0 < self.min_holes <= self.max_holes:
raise ValueError("Invalid combination of min_holes and max_holes. Got: {}".format([min_holes, max_holes]))
self.check_range(self.max_height)
self.check_range(self.min_height)
self.check_range(self.max_width)
self.check_range(self.min_width)
if not 0 < self.min_height <= self.max_height:
raise ValueError(
"Invalid combination of min_height and max_height. Got: {}".format([min_height, max_height])
)
if not 0 < self.min_width <= self.max_width:
raise ValueError("Invalid combination of min_width and max_width. Got: {}".format([min_width, max_width]))
def check_range(self, dimension):
if isinstance(dimension, float) and not 0 <= dimension < 1.0:
raise ValueError(
"Invalid value {}. If using floats, the value should be in the range [0.0, 1.0)".format(dimension)
)
...
...
...
Parameters not stated in the figure use their default values.
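Since only the constructor is shown above, the following sketch illustrates how the parameters could translate into holes. It is an assumption-based illustration, not the library code: `resolve`, `sample_holes`, and `fill_holes` are hypothetical helpers, the image is a nested list, and hole positions are sampled uniformly so that each hole lies fully inside the image.

```python
import random


def resolve(size, image_size):
    """Ints are pixel sizes; floats are fractions of the image size."""
    return int(size * image_size) if isinstance(size, float) else size


def sample_holes(img_h, img_w, max_holes=8, max_height=8, max_width=8,
                 min_holes=None, min_height=None, min_width=None, rng=random):
    """Sample (x1, y1, x2, y2) rectangles that lie fully inside the image."""
    # A missing minimum falls back to the matching maximum, as in the constructor.
    min_holes = max_holes if min_holes is None else min_holes
    min_height = max_height if min_height is None else min_height
    min_width = max_width if min_width is None else min_width
    holes = []
    for _ in range(rng.randint(min_holes, max_holes)):
        hole_h = rng.randint(resolve(min_height, img_h), resolve(max_height, img_h))
        hole_w = rng.randint(resolve(min_width, img_w), resolve(max_width, img_w))
        y1 = rng.randint(0, img_h - hole_h)
        x1 = rng.randint(0, img_w - hole_w)
        holes.append((x1, y1, x1 + hole_w, y1 + hole_h))
    return holes


def fill_holes(img, holes, fill_value=0):
    """Return a copy of a nested-list image with every hole set to fill_value."""
    out = [row[:] for row in img]
    for x1, y1, x2, y2 in holes:
        for y in range(y1, y2):
            out[y][x1:x2] = [fill_value] * (x2 - x1)
    return out


image = [[255] * 64 for _ in range(48)]
holes = sample_holes(48, 64, max_holes=5, min_holes=2,
                     max_height=0.25, min_height=0.125, rng=random.Random(0))
dropped = fill_holes(image, holes)
```

With these arguments the hole height is between 6 and 12 pixels (0.125 and 0.25 of 48), the width stays at the default 8, and between 2 and 5 holes are drawn. Applying the same holes to a mask with mask_fill_value would follow the same pattern.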
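Function: Crop the image and return the cropped region.
Parameter description:
x_min (int): x coordinate of the top-left corner of the crop region. Default: 0.
y_min (int): y coordinate of the top-left corner of the crop region. Default: 0.
x_max (int): x coordinate of the bottom-right corner of the crop region. Default: 1024.
y_max (int): y coordinate of the bottom-right corner of the crop region. Default: 1024.
Note that this transform has no randomness; it is equivalent to img[y_min:y_max, x_min:x_max].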
# source code
class Crop(DualTransform):
"""Crop region from image.
Args:
x_min (int): Minimum upper left x coordinate.
y_min (int): Minimum upper left y coordinate.
x_max (int): Maximum lower right x coordinate.
y_max (int): Maximum lower right y coordinate.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
def __init__(self, x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0):
super(Crop, self).__init__(always_apply, p)
self.x_min = x_min
self.y_min = y_min
self.x_max = x_max
self.y_max = y_max
def apply(self, img, **params):
return F.crop(img, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max)
def apply_to_bbox(self, bbox, **params):
return F.bbox_crop(bbox, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max, **params)
def apply_to_keypoint(self, keypoint, **params):
return F.crop_keypoint_by_coords(keypoint, crop_coords=(self.x_min, self.y_min, self.x_max, self.y_max))
def get_transform_init_args_names(self):
return ("x_min", "y_min", "x_max", "y_max")
(The side-by-side matplotlib plots are rescaled; look at the cropped face region.)
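Because the crop is a plain slice, targets only need their coordinates shifted by the top-left corner. The sketch below assumes a nested-list image and a keypoint given as pixel (x, y); `crop` and `crop_keypoint` are simplified stand-ins for F.crop and F.crop_keypoint_by_coords, which also carry angle and scale.

```python
def crop(img, x_min, y_min, x_max, y_max):
    """Nested-list equivalent of img[y_min:y_max, x_min:x_max]."""
    return [row[x_min:x_max] for row in img[y_min:y_max]]


def crop_keypoint(keypoint, x_min, y_min):
    """Shift a pixel (x, y) keypoint into the coordinate frame of the crop."""
    x, y = keypoint
    return x - x_min, y - y_min


# 6 x 8 image whose pixel value encodes its position: 10 * row + column.
img = [[10 * r + c for c in range(8)] for r in range(6)]
cropped = crop(img, x_min=2, y_min=1, x_max=5, y_max=4)
kx, ky = crop_keypoint((3, 2), x_min=2, y_min=1)
```

The keypoint at (x=3, y=2) moves to (1, 1) and still points at the same pixel value in the cropped image.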
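Function: Crop or pad the four edges of the image (top, right, bottom, left) by a number of pixels or by a fraction of the image size. This transform never crops an image below a height or width of 1.
Note that this transform resizes the result back to the original image size. To keep the size produced by the crop or pad, set keep_size=False.
Parameter description:
- px (int or tuple), percent (float or tuple):
  - px is a concrete number of pixels; percent is a fraction (pixel count divided by the width or height).
  - Negative px or percent values crop; positive values pad.
  - Exactly one of the two must be passed and the other must be None. They cannot both be set, and they cannot both be None.
  - If a two-element value is passed, the px/percent of each of the four sides is sampled at random from that interval. If sample_independently=False, a single value is sampled and shared by all four sides.
  - If a four-element value is passed, the elements stand for top, right, bottom, left (clockwise). Each element can be a single number (a fixed value) or a list of two numbers (a range to sample from), with the same meaning as above.
- pad_mode (int): OpenCV border mode, that is, how border pixels are filled. Possible values: cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (mirror), cv2.BORDER_WRAP (wrap around), cv2.BORDER_REFLECT_101 (mirror, without repeating the edge pixel). Default: cv2.BORDER_CONSTANT.
- pad_cval (number, Sequence[number]): fill value for border pixels (only used when pad_mode=cv2.BORDER_CONSTANT). A single number is used directly as the fill value. For a two-element list, one value is sampled from that interval and used as the border fill value for the image; for a list of any other length, one element is picked at random:
    @staticmethod
    def _get_pad_value(pad_value: Union[float, Sequence[float]]) -> Union[int, float]:
        if isinstance(pad_value, (int, float)):
            return pad_value
        if len(pad_value) == 2:
            a, b = pad_value
            if isinstance(a, int) and isinstance(b, int):
                return random.randint(a, b)
            return random.uniform(a, b)
        return random.choice(pad_value)
- pad_cval_mask (number, Sequence[number]): same meaning as pad_cval, but applied to masks.
- keep_size (bool): cropping or padding changes the image size. If True, the result is resized to the input image size. If False, the size produced by the crop or pad is kept. Default: True.
- sample_independently (bool): whether the px/percent values of the four sides are sampled independently. Default: True.
- interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
# constructor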
class CropAndPad(DualTransform):
"""Crop and pad images by pixel amounts or fractions of image sizes.
Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image).
Padding adds pixels to the sides (e.g. black pixels).
This transformation will never crop images below a height or width of ``1``.
Note:
This transformation automatically resizes images back to their original size. To deactivate this, add the
parameter ``keep_size=False``.
Args:
px (int or tuple):
The number of pixels to crop (negative values) or pad (positive values)
on each side of the image. Either this or the parameter `percent` may
be set, not both at the same time.
* If ``None``, then pixel-based cropping/padding will not be used.
* If ``int``, then that exact number of pixels will always be cropped/padded.
* If a ``tuple`` of two ``int`` s with values ``a`` and ``b``,
then each side will be cropped/padded by a random amount sampled
uniformly per image and side from the interval ``[a, b]``. If
however `sample_independently` is set to ``False``, only one
value will be sampled per image and used for all sides.
* If a ``tuple`` of four entries, then the entries represent top,
right, bottom, left. Each entry may be a single ``int`` (always
crop/pad by exactly that value), a ``tuple`` of two ``int`` s
``a`` and ``b`` (crop/pad by an amount within ``[a, b]``), a
``list`` of ``int`` s (crop/pad by a random value that is
contained in the ``list``).
percent (float or tuple):
The number of pixels to crop (negative values) or pad (positive values)
on each side of the image given as a *fraction* of the image
height/width. E.g. if this is set to ``-0.1``, the transformation will
always crop away ``10%`` of the image's height at both the top and the
bottom (both ``10%`` each), as well as ``10%`` of the width at the
right and left.
Expected value range is ``(-1.0, inf)``.
Either this or the parameter `px` may be set, not both
at the same time.
* If ``None``, then fraction-based cropping/padding will not be
used.
* If ``float``, then that fraction will always be cropped/padded.
* If a ``tuple`` of two ``float`` s with values ``a`` and ``b``,
then each side will be cropped/padded by a random fraction
sampled uniformly per image and side from the interval
``[a, b]``. If however `sample_independently` is set to
``False``, only one value will be sampled per image and used for
all sides.
* If a ``tuple`` of four entries, then the entries represent top,
right, bottom, left. Each entry may be a single ``float``
(always crop/pad by exactly that percent value), a ``tuple`` of
two ``float`` s ``a`` and ``b`` (crop/pad by a fraction from
``[a, b]``), a ``list`` of ``float`` s (crop/pad by a random
value that is contained in the list).
pad_mode (int): OpenCV border mode.
pad_cval (number, Sequence[number]):
The constant value to use if the pad mode is ``BORDER_CONSTANT``.
* If ``number``, then that value will be used.
* If a ``tuple`` of two ``number`` s and at least one of them is
a ``float``, then a random number will be uniformly sampled per
image from the continuous interval ``[a, b]`` and used as the
value. If both ``number`` s are ``int`` s, the interval is
discrete.
* If a ``list`` of ``number``, then a random value will be chosen
from the elements of the ``list`` and used as the value.
pad_cval_mask (number, Sequence[number]): Same as pad_cval but only for masks.
keep_size (bool):
After cropping and padding, the result image will usually have a
different height/width compared to the original input image. If this
parameter is set to ``True``, then the cropped/padded image will be
resized to the input image's size, i.e. the output shape is always identical to the input shape.
sample_independently (bool):
If ``False`` *and* the values for `px`/`percent` result in exactly
*one* probability distribution for all image sides, only one single
value will be sampled from that probability distribution and used for
all sides. I.e. the crop/pad amount then is the same for all sides.
If ``True``, four values will be sampled independently, one per side.
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
Targets:
image, mask, bboxes, keypoints
Image types:
any
"""
def __init__(
self,
px: Optional[Union[int, Sequence[float], Sequence[Tuple]]] = None,
percent: Optional[Union[float, Sequence[float], Sequence[Tuple]]] = None,
pad_mode: int = cv2.BORDER_CONSTANT,
pad_cval: Union[float, Sequence[float]] = 0,
pad_cval_mask: Union[float, Sequence[float]] = 0,
keep_size: bool = True,
sample_independently: bool = True,
interpolation: int = cv2.INTER_LINEAR,
always_apply: bool = False,
p: float = 1.0,
):
super().__init__(always_apply, p)
if px is None and percent is None:
raise ValueError("px and percent are empty!")
if px is not None and percent is not None:
raise ValueError("Only px or percent may be set!")
self.px = px
self.percent = percent
self.pad_mode = pad_mode
self.pad_cval = pad_cval
self.pad_cval_mask = pad_cval_mask
self.keep_size = keep_size
self.sample_independently = sample_independently
self.interpolation = interpolation
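In the bottom-right image (res3), sample_independently is set to True, so each side is padded by a different number of pixels.
In the bottom-left image (res2), percent is negative, which means cropping.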
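The interaction of the sign convention, sample_independently, and keep_size can be sketched with two small helpers. Both are hypothetical and simplified: `sample_sides` assumes an inclusive integer range as described in the docstring, and `output_shape` replaces the library's handling of the "never below 1" rule with a plain clamp.

```python
import random


def sample_sides(px_range, sample_independently=True, rng=random):
    """Sample (top, right, bottom, left) amounts from an inclusive [a, b] range."""
    a, b = px_range
    if sample_independently:
        return tuple(rng.randint(a, b) for _ in range(4))
    # One shared value: every side is cropped or padded by the same amount.
    return (rng.randint(a, b),) * 4


def output_shape(height, width, sides, keep_size=True):
    """Shape after cropping (negative) or padding (positive) each side."""
    top, right, bottom, left = sides
    if keep_size:
        return height, width  # the result is resized back to the input size
    # Simplified guard for the rule that the image never drops below 1 pixel.
    return max(height + top + bottom, 1), max(width + left + right, 1)


shared = sample_sides((-20, 20), sample_independently=False, rng=random.Random(0))
independent = sample_sides((-20, 20), sample_independently=True, rng=random.Random(0))
shape = output_shape(100, 200, (-10, 5, -10, 5), keep_size=False)
```

Cropping 10 pixels from the top and bottom and padding 5 pixels on the left and right turns a 100 x 200 image into 80 x 210; with keep_size=True the same call reports the original 100 x 200.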
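Function: If the mask is empty, this is equivalent to a random crop of the given size. If there is a mask, you can specify mask regions to ignore; a point is sampled at random outside the ignored regions, and a region of the given width and height is cropped around it. Regions where mask == 0 are ignored by default, and regions whose values are listed in ignore_values are set to 0 so that they are ignored as well.
The crop logic is as follows: pick a random point in the non-ignored mask region, then move a random distance up and to the left to get the top-left corner of the crop; the bottom-right corner is the top-left corner plus the width and height. This transform increases the probability that the target ends up inside the crop.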
if mask.any():
mask = mask.sum(axis=-1) if mask.ndim == 3 else mask
non_zero_yx = np.argwhere(mask)
y, x = random.choice(non_zero_yx)
x_min = x - random.randint(0, self.width - 1)
y_min = y - random.randint(0, self.height - 1)
x_min = np.clip(x_min, 0, mask_width - self.width)
y_min = np.clip(y_min, 0, mask_height - self.height)
else:
x_min = random.randint(0, mask_width - self.width)
y_min = random.randint(0, mask_height - self.height)
x_max = x_min + self.width
y_max = y_min + self.height
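Parameter description:
height, width (int): target height and width of the crop region.
ignore_values (list of int): mask pixel values to ignore; 0 is always ignored. Note that the input must be a list.
ignore_channels (list of int): mask channels to ignore. Note that the input must be a list.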
# source code
class CropNonEmptyMaskIfExists(DualTransform):
"""Crop area with mask if mask is non-empty, else make random crop.
Args:
height (int): vertical size of crop in pixels
width (int): horizontal size of crop in pixels
ignore_values (list of int): values to ignore in mask, `0` values are always ignored
(e.g. if background value is 5 set `ignore_values=[5]` to ignore)
ignore_channels (list of int): channels to ignore in mask
(e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
p (float): probability of applying the transform. Default: 1.0.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
def __init__(self, height, width, ignore_values=None, ignore_channels=None, always_apply=False, p=1.0):
super(CropNonEmptyMaskIfExists, self).__init__(always_apply, p)
if ignore_values is not None and not isinstance(ignore_values, list):
raise ValueError("Expected `ignore_values` of type `list`, got `{}`".format(type(ignore_values)))
if ignore_channels is not None and not isinstance(ignore_channels, list):
raise ValueError("Expected `ignore_channels` of type `list`, got `{}`".format(type(ignore_channels)))
self.height = height
self.width = width
self.ignore_values = ignore_values
self.ignore_channels = ignore_channels
def apply(self, img, x_min=0, x_max=0, y_min=0, y_max=0, **params):
return F.crop(img, x_min, y_min, x_max, y_max)
def apply_to_bbox(self, bbox, x_min=0, x_max=0, y_min=0, y_max=0, **params):
return F.bbox_crop(
bbox, x_min=x_min, x_max=x_max, y_min=y_min, y_max=y_max, rows=params["rows"], cols=params["cols"]
)
def apply_to_keypoint(self, keypoint, x_min=0, x_max=0, y_min=0, y_max=0, **params):
return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))
def _preprocess_mask(self, mask):
mask_height, mask_width = mask.shape[:2]
if self.ignore_values is not None:
ignore_values_np = np.array(self.ignore_values)
mask = np.where(np.isin(mask, ignore_values_np), 0, mask)
if mask.ndim == 3 and self.ignore_channels is not None:
target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
mask = np.take(mask, target_channels, axis=-1)
if self.height > mask_height or self.width > mask_width:
raise ValueError(
"Crop size ({},{}) is larger than image ({},{})".format(
self.height, self.width, mask_height, mask_width
)
)
return mask
def update_params(self, params, **kwargs):
super().update_params(params, **kwargs)
if "mask" in kwargs:
mask = self._preprocess_mask(kwargs["mask"])
elif "masks" in kwargs and len(kwargs["masks"]):
masks = kwargs["masks"]
mask = self._preprocess_mask(masks[0])
for m in masks[1:]:
mask |= self._preprocess_mask(m)
else:
raise RuntimeError("Can not find mask for CropNonEmptyMaskIfExists")
mask_height, mask_width = mask.shape[:2]
if mask.any():
mask = mask.sum(axis=-1) if mask.ndim == 3 else mask
non_zero_yx = np.argwhere(mask)
y, x = random.choice(non_zero_yx)
x_min = x - random.randint(0, self.width - 1)
y_min = y - random.randint(0, self.height - 1)
x_min = np.clip(x_min, 0, mask_width - self.width)
y_min = np.clip(y_min, 0, mask_height - self.height)
else:
x_min = random.randint(0, mask_width - self.width)
y_min = random.randint(0, mask_height - self.height)
x_max = x_min + self.width
y_max = y_min + self.height
params.update({"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max})
return params
def get_transform_init_args_names(self):
return ("height", "width", "ignore_values", "ignore_channels")
In the figure below, the non-ignored mask regions are the rectangles around the bird and the butterfly. res1, res2, and res3 are random crop results.
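The window sampling in update_params can be checked with a standard-library sketch. It assumes a single-channel mask stored as a nested list; `preprocess_mask` and `sample_crop_window` are simplified, hypothetical counterparts of _preprocess_mask and the sampling code above (ignore_channels is left out).

```python
import random


def preprocess_mask(mask, ignore_values=None):
    """Zero out every pixel whose value is listed in ignore_values."""
    ignore = set(ignore_values or [])
    return [[0 if v in ignore else v for v in row] for row in mask]


def sample_crop_window(mask, height, width, rng=random):
    """Return (x_min, y_min, x_max, y_max) following the logic of update_params."""
    mask_height, mask_width = len(mask), len(mask[0])
    non_zero_yx = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if non_zero_yx:
        # Pick a foreground pixel, then shift up and left by a random offset.
        y, x = rng.choice(non_zero_yx)
        x_min = x - rng.randint(0, width - 1)
        y_min = y - rng.randint(0, height - 1)
        # Keep the window inside the mask.
        x_min = min(max(x_min, 0), mask_width - width)
        y_min = min(max(y_min, 0), mask_height - height)
    else:
        x_min = rng.randint(0, mask_width - width)
        y_min = rng.randint(0, mask_height - height)
    return x_min, y_min, x_min + width, y_min + height


# Background value is 5, with a single foreground pixel at (y=7, x=8).
raw_mask = [[5] * 10 for _ in range(10)]
raw_mask[7][8] = 1
mask = preprocess_mask(raw_mask, ignore_values=[5])
windows = [sample_crop_window(mask, 4, 4, random.Random(seed)) for seed in range(50)]
```

Every sampled window contains the foreground pixel, because the random offset is always smaller than the crop size and clipping only moves the window toward the pixel.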
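Function: Elastic transform.
Official demo image:
Parameter description:
- alpha (float): distortion strength. Default: 1. The larger the value, the more visible the distortion (e.g. alpha=500, sigma=50).
- sigma (float): Gaussian filter parameter. Default: 50. The smaller the value, the more visible the distortion (e.g. alpha=100, sigma=20).
- alpha_affine (float): affine transform parameter, converted to the range (-alpha_affine, alpha_affine). Default: 50.
- interpolation (OpenCV flag): OpenCV interpolation method. Possible values: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR (masks use cv2.INTER_NEAREST, which is hard-coded and cannot be overridden).
- border_mode (OpenCV flag): how OpenCV fills border pixels. Possible values: cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (mirror), cv2.BORDER_WRAP (wrap around), cv2.BORDER_REFLECT_101 (mirror, without repeating the edge pixel). Default: cv2.BORDER_REFLECT_101.
- value (int, float, list of ints, list of float): fill value for border pixels (only used when border_mode=cv2.BORDER_CONSTANT).
- mask_value (int, float, list of ints, list of float): fill value for border pixels of the mask (only used when border_mode=cv2.BORDER_CONSTANT).
- approximate (boolean): whether to smooth with a fixed kernel size. If True, processing large images (512+) is about twice as fast, at the cost of jitter artifacts. Default: False.
- same_dxdy (boolean): whether to use the same random shifts in the x and y directions. If True, this also gives about a 2x speedup, again at the cost of jitter artifacts. Default: False.
# constructor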
class ElasticTransform(DualTransform):
"""Elastic deformation of images as described in [Simard2003]_ (with modifications).
Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
Convolutional Neural Networks applied to Visual Document Analysis", in
Proc. of the International Conference on Document Analysis and
Recognition, 2003.
Args:
alpha (float):
sigma (float): Gaussian filter parameter.
alpha_affine (float): The range will be (-alpha_affine, alpha_affine)
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
Default: cv2.BORDER_REFLECT_101
value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
mask_value (int, float,
list of ints,
list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
approximate (boolean): Whether to smooth displacement map with fixed kernel size.
Enabling this option gives ~2X speedup on large images.
same_dxdy (boolean): Whether to use same random generated shift for x and y.
Enabling this option gives ~2X speedup.
Targets:
image, mask
Image types:
uint8, float32
"""
def __init__(
self,
alpha=1,
sigma=50,
alpha_affine=50,
interpolation=cv2.INTER_LINEAR,
border_mode=cv2.BORDER_REFLECT_101,
value=None,
mask_value=None,
always_apply=False,
approximate=False,
same_dxdy=False,
p=0.5,
):
super(ElasticTransform, self).__init__(always_apply, p)
...
...
# F.elastic_transform
@preserve_shape
def elastic_transform(
img,
alpha,
sigma,
alpha_affine,
interpolation=cv2.INTER_LINEAR,
border_mode=cv2.BORDER_REFLECT_101,
value=None,
random_state=None,
approximate=False,
same_dxdy=False,
):
"""Elastic deformation of images as described in [Simard2003]_ (with modifications).
Based on https://gist.github.com/ernestum/601cdf56d2b424757de5
.. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
Convolutional Neural Networks applied to Visual Document Analysis", in
Proc. of the International Conference on Document Analysis and
Recognition, 2003.
"""
if random_state is None:
random_state = np.random.RandomState(1234)
height, width = img.shape[:2]
# Random affine
center_square = np.float32((height, width)) // 2
square_size = min((height, width)) // 3
alpha = float(alpha)
sigma = float(sigma)
alpha_affine = float(alpha_affine)
pts1 = np.float32(
[
center_square + square_size,
[center_square[0] + square_size, center_square[1] - square_size],
center_square - square_size,
]
)
pts2 = pts1 + random_state.uniform(-alpha_affine, alpha_affine, size=pts1.shape).astype(np.float32)
matrix = cv2.getAffineTransform(pts1, pts2)
warp_fn = _maybe_process_in_chunks(
cv2.warpAffine, M=matrix, dsize=(width, height), flags=interpolation, borderMode=border_mode, borderValue=value
)
img = warp_fn(img)
if approximate:
# Approximate computation smooth displacement map with a large enough kernel.
# On large images (512+) this is approximately 2X times faster
dx = random_state.rand(height, width).astype(np.float32) * 2 - 1
cv2.GaussianBlur(dx, (17, 17), sigma, dst=dx)
dx *= alpha
if same_dxdy:
# Speed up even more
dy = dx
else:
dy = random_state.rand(height, width).astype(np.float32) * 2 - 1
cv2.GaussianBlur(dy, (17, 17), sigma, dst=dy)
dy *= alpha
else:
dx = np.float32(gaussian_filter((random_state.rand(height, width) * 2 - 1), sigma) * alpha)
if same_dxdy:
# Speed up
dy = dx
else:
dy = np.float32(gaussian_filter((random_state.rand(height, width) * 2 - 1), sigma) * alpha)
x, y = np.meshgrid(np.arange(width), np.arange(height))
map_x = np.float32(x + dx)
map_y = np.float32(y + dy)
remap_fn = _maybe_process_in_chunks(
cv2.remap, map1=map_x, map2=map_y, interpolation=interpolation, borderMode=border_mode, borderValue=value
)
return remap_fn(img)
Parameters not shown in the figure below use their default values.
When sigma is small, the amount of distortion is more sensitive to alpha.
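The roles of alpha and sigma can be checked on a 1-D displacement field, which is built the same way as dx above: uniform noise in [-1, 1], Gaussian smoothing, then scaling by alpha. This is a standard-library sketch; `gaussian_smooth` is a hypothetical stand-in for scipy's gaussian_filter that uses edge replication and a kernel truncated at four sigma.

```python
import math
import random


def gaussian_smooth(values, sigma):
    """1-D Gaussian filter with edge replication (stand-in for gaussian_filter)."""
    radius = int(4 * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    n = len(values)
    return [
        sum(w * values[min(max(i + k, 0), n - 1)] for k, w in zip(offsets, kernel)) / total
        for i in range(n)
    ]


def displacement(noise, alpha, sigma):
    """Smoothed noise scaled by alpha, like dx and dy in F.elastic_transform."""
    return [alpha * v for v in gaussian_smooth(noise, sigma)]


def rms(values):
    """Root mean square, used here as a measure of displacement strength."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rng = random.Random(1234)
noise = [rng.random() * 2 - 1 for _ in range(500)]
weak = displacement(noise, alpha=1, sigma=5)
strong = displacement(noise, alpha=1, sigma=1)
doubled = displacement(noise, alpha=2, sigma=5)
```

Smoothing with a larger sigma averages more noise samples and shrinks the displacements, while alpha scales them linearly. That is why a small sigma combined with a large alpha gives the strongest distortion.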
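Supported targets: image, mask, bboxes, keypoints
Function: Horizontal flip (d=1), vertical flip (d=0), or both horizontal and vertical flip, which is equivalent to rotating the image by 180° (d=-1).
d is a parameter generated at random in the source code; it controls the flip mode.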
# source code
class Flip(DualTransform):
"""Flip the input either horizontally, vertically or both horizontally and vertically.
Args:
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
def apply(self, img, d=0, **params):
"""Args:
d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
-1 for both vertical and horizontal flipping (which could also be seen as rotating the input by
180 degrees).
"""
return F.random_flip(img, d)
def get_params(self):
# Random int in the range [-1, 1]
return {"d": random.randint(-1, 1)}
def apply_to_bbox(self, bbox, **params):
return F.bbox_flip(bbox, **params)
def apply_to_keypoint(self, keypoint, **params):
return F.keypoint_flip(keypoint, **params)
def get_transform_init_args_names(self):
return ()
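The three flip codes can be illustrated on a nested-list image. This is a sketch of the meaning of d only, assuming the same convention as the docstring above (0 vertical, 1 horizontal, -1 both); `flip` is a hypothetical helper, not F.random_flip.

```python
def flip(img, d):
    """Flip a nested-list image: d=0 vertical, d=1 horizontal, d=-1 both."""
    if d == 0:
        return [row[:] for row in img[::-1]]   # reverse the row order
    if d == 1:
        return [row[::-1] for row in img]      # reverse each row
    if d == -1:
        return [row[::-1] for row in img[::-1]]
    raise ValueError("d must be -1, 0 or 1")


img = [[1, 2, 3],
       [4, 5, 6]]
vertical = flip(img, 0)
horizontal = flip(img, 1)
both = flip(img, -1)
```

Flipping both ways gives the same result as a vertical flip followed by a horizontal flip, which is the 180° rotation mentioned above.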
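Function: Grid distortion.
Official demo image:
Parameter description:
- num_steps (int): number of grid cells on each side (the same horizontally and vertically).
- distort_limit (float, (float, float)): if a single number is given, it is converted to the range (-distort_limit, distort_limit). Default range: (-0.3, 0.3). Values are sampled from this range separately for the x and y directions (stepsx, stepsy). If a value is greater than 0, the processed cell is larger than the original cell; if it is less than 0, the cell is smaller.
- interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR. Possible values: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
- border_mode (OpenCV flag): how border pixels are filled. Default: cv2.BORDER_REFLECT_101. Possible values: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
- value (int, float, list of ints, list of float): fill value for border pixels, only used with constant filling, that is, border_mode = cv2.BORDER_CONSTANT.
- mask_value (int, float, list of ints, list of float): fill value for border pixels of the mask, only used with constant filling, that is, border_mode = cv2.BORDER_CONSTANT.
- normalized (bool): if True, the distortion does not go beyond the image boundary, so the image content matches the original and nothing is lost or added at the borders. Default: False.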
# source code
class GridDistortion(DualTransform):
    """
    Args:
        num_steps (int): count of grid cells on each side.
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.03, 0.03).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        normalized (bool): if true, distortion will be normalized to do not go outside the image. Default: False
            See for more information: https://github.com/albumentations-team/albumentations/pull/722

    Targets:
        image, mask

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        num_steps: int = 5,
        distort_limit: ScaleFloatType = 0.3,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        normalized: bool = False,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(GridDistortion, self).__init__(always_apply, p)
        self.num_steps = num_steps
        self.distort_limit = to_tuple(distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.normalized = normalized

    def apply(
        self, img: np.ndarray, stepsx: Tuple = (), stepsy: Tuple = (), interpolation: int = cv2.INTER_LINEAR, **params
    ) -> np.ndarray:
        return F.grid_distortion(img, self.num_steps, stepsx, stepsy, interpolation, self.border_mode, self.value)

    def apply_to_mask(self, img: np.ndarray, stepsx: Tuple = (), stepsy: Tuple = (), **params) -> np.ndarray:
        return F.grid_distortion(
            img, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
        )

    def apply_to_bbox(self, bbox: BoxInternalType, stepsx: Tuple = (), stepsy: Tuple = (), **params) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = F.grid_distortion(
            mask, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
        )
        bbox_returned = bbox_from_mask(mask)
        bbox_returned = F.normalize_bbox(bbox_returned, rows, cols)
        return bbox_returned

    def _normalize(self, h, w, xsteps, ysteps):
        # compensate for smaller last steps in source image.
        x_step = w // self.num_steps
        last_x_step = min(w, ((self.num_steps + 1) * x_step)) - (self.num_steps * x_step)
        xsteps[-1] *= last_x_step / x_step
        y_step = h // self.num_steps
        last_y_step = min(h, ((self.num_steps + 1) * y_step)) - (self.num_steps * y_step)
        ysteps[-1] *= last_y_step / y_step
        # now normalize such that distortion never leaves image bounds.
        tx = w / math.floor(w / self.num_steps)
        ty = h / math.floor(h / self.num_steps)
        xsteps = np.array(xsteps) * (tx / np.sum(xsteps))
        ysteps = np.array(ysteps) * (ty / np.sum(ysteps))
        return {"stepsx": xsteps, "stepsy": ysteps}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_params_dependent_on_targets(self, params):
        h, w = params["image"].shape[:2]
        stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
        stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
        if self.normalized:
            return self._normalize(h, w, stepsx, stepsy)
        return {"stepsx": stepsx, "stepsy": stepsy}

    def get_transform_init_args_names(self):
        return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"
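In the figure, the chess pieces are visibly stretched both vertically and horizontally.
The difference between normalized set to True and False is shown in the results below:
Function: fills grid squares with a fixed value (black by default).
Parameters:
- ratio, unit_size_min, unit_size_max, holes_number_x, holes_number_y all control the grid size. The grid size is determined from unit_size_min and unit_size_max first; if they are None, it is determined from holes_number_x and holes_number_y; if those are also None, the calculation falls back to holes_number = 10 (unit width = image width // 10).
- shift_x, shift_y: offset of the grid start. Default 0, which is why the black blocks in the result image sit toward the top left.
- random_offset: if True, the offsets are generated randomly and the shift_x, shift_y settings are ignored.
- fill_value: fill value for the grid holes. Default 0, i.e. black.
- mask_fill_value: fill value for the grid holes in the mask. If None, the original mask is returned. Default: None.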
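The hole layout can be reproduced with a few lines of plain Python. The sketch below follows the holes_number branch of get_params_dependent_on_targets in the source code of this section (the unit_size_min / unit_size_max branch and random_offset are left out); grid_holes is a hypothetical helper name.

```python
def grid_holes(height, width, ratio=0.5, holes_number_x=None, holes_number_y=None,
               shift_x=0, shift_y=0):
    """Return the (x1, y1, x2, y2) rectangles that would be filled."""
    # Unit (cell) size: image width // 10 when no hole count is given.
    unit_w = max(2, width // 10) if holes_number_x is None else width // holes_number_x
    unit_h = max(min(unit_w, height), 2) if holes_number_y is None else height // holes_number_y
    # Hole size is a fraction of the unit, kept between 1 pixel and unit - 1.
    hole_w = min(max(int(unit_w * ratio), 1), unit_w - 1)
    hole_h = min(max(int(unit_h * ratio), 1), unit_h - 1)
    # Clip the grid offset so that a hole never leaves its own unit.
    shift_x = min(max(0, shift_x), unit_w - hole_w)
    shift_y = min(max(0, shift_y), unit_h - hole_h)
    holes = []
    for i in range(width // unit_w + 1):
        for j in range(height // unit_h + 1):
            x1 = min(shift_x + unit_w * i, width)
            y1 = min(shift_y + unit_h * j, height)
            holes.append((x1, y1, min(x1 + hole_w, width), min(y1 + hole_h, height)))
    return holes


default_holes = grid_holes(100, 100)
coarse_holes = grid_holes(100, 100, holes_number_x=4, holes_number_y=4)
```

With the defaults on a 100x100 image, each unit is 10 pixels and each hole 5 pixels, starting at the top-left corner; the loop also emits zero-area rectangles at the right and bottom edges, which fill nothing.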
# source code
class GridDropout(DualTransform):
    """GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.

    Args:
        ratio (float): the ratio of the mask holes to the unit_size (same for horizontal and vertical directions).
            Must be between 0 and 1. Default: 0.5.
        unit_size_min (int): minimum size of the grid unit. Must be between 2 and the image shorter edge.
            If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
        unit_size_max (int): maximum size of the grid unit. Must be between 2 and the image shorter edge.
            If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
        holes_number_x (int): the number of grid units in x direction. Must be between 1 and image width//2.
            If 'None', grid unit width is set as image_width//10. Default: `None`.
        holes_number_y (int): the number of grid units in y direction. Must be between 1 and image height//2.
            If `None`, grid unit height is set equal to the grid unit width or image height, whatever is smaller.
        shift_x (int): offsets of the grid start in x direction from (0,0) coordinate.
            Clipped between 0 and grid unit_width - hole_width. Default: 0.
        shift_y (int): offsets of the grid start in y direction from (0,0) coordinate.
            Clipped between 0 and grid unit height - hole_height. Default: 0.
        random_offset (boolean): whether to offset the grid randomly between 0 and grid unit size - hole size
            If 'True', entered shift_x, shift_y are ignored and set randomly. Default: `False`.
        fill_value (int): value for the dropped pixels. Default = 0
        mask_fill_value (int): value for the dropped pixels in mask.
            If `None`, transformation is not applied to the mask. Default: `None`.

    Targets:
        image, mask

    Image types:
        uint8, float32

    References:
        https://arxiv.org/abs/2001.04086
    """

    def __init__(
        self,
        ratio: float = 0.5,
        unit_size_min: int = None,
        unit_size_max: int = None,
        holes_number_x: int = None,
        holes_number_y: int = None,
        shift_x: int = 0,
        shift_y: int = 0,
        random_offset: bool = False,
        fill_value: int = 0,
        mask_fill_value: int = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(GridDropout, self).__init__(always_apply, p)
        self.ratio = ratio
        self.unit_size_min = unit_size_min
        self.unit_size_max = unit_size_max
        self.holes_number_x = holes_number_x
        self.holes_number_y = holes_number_y
        self.shift_x = shift_x
        self.shift_y = shift_y
        self.random_offset = random_offset
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value
        if not 0 < self.ratio <= 1:
            raise ValueError("ratio must be between 0 and 1.")

    def apply(self, img: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params) -> np.ndarray:
        return F.cutout(img, holes, self.fill_value)

    def apply_to_mask(self, img: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params) -> np.ndarray:
        if self.mask_fill_value is None:
            return img
        return F.cutout(img, holes, self.mask_fill_value)

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        height, width = img.shape[:2]
        # set grid using unit size limits
        if self.unit_size_min and self.unit_size_max:
            if not 2 <= self.unit_size_min <= self.unit_size_max:
                raise ValueError("Max unit size should be >= min size, both at least 2 pixels.")
            if self.unit_size_max > min(height, width):
                raise ValueError("Grid size limits must be within the shortest image edge.")
            # note: random.randint includes both endpoints, so this can return unit_size_max + 1
            unit_width = random.randint(self.unit_size_min, self.unit_size_max + 1)
            unit_height = unit_width
        else:
            # set grid using holes numbers
            if self.holes_number_x is None:
                unit_width = max(2, width // 10)
            else:
                if not 1 <= self.holes_number_x <= width // 2:
                    raise ValueError("The hole_number_x must be between 1 and image width//2.")
                unit_width = width // self.holes_number_x
            if self.holes_number_y is None:
                unit_height = max(min(unit_width, height), 2)
            else:
                if not 1 <= self.holes_number_y <= height // 2:
                    raise ValueError("The hole_number_y must be between 1 and image height//2.")
                unit_height = height // self.holes_number_y
        hole_width = int(unit_width * self.ratio)
        hole_height = int(unit_height * self.ratio)
        # min 1 pixel and max unit length - 1
        hole_width = min(max(hole_width, 1), unit_width - 1)
        hole_height = min(max(hole_height, 1), unit_height - 1)
        # set offset of the grid
        if self.shift_x is None:
            shift_x = 0
        else:
            shift_x = min(max(0, self.shift_x), unit_width - hole_width)
        if self.shift_y is None:
            shift_y = 0
        else:
            shift_y = min(max(0, self.shift_y), unit_height - hole_height)
        if self.random_offset:
            shift_x = random.randint(0, unit_width - hole_width)
            shift_y = random.randint(0, unit_height - hole_height)
        holes = []
        for i in range(width // unit_width + 1):
            for j in range(height // unit_height + 1):
                x1 = min(shift_x + unit_width * i, width)
                y1 = min(shift_y + unit_height * j, height)
                x2 = min(x1 + hole_width, width)
                y2 = min(y1 + hole_height, height)
                holes.append((x1, y1, x2, y2))
        return {"holes": holes}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return (
            "ratio",
            "unit_size_min",
            "unit_size_max",
            "holes_number_x",
            "holes_number_y",
            "shift_x",
            "shift_y",
            "random_offset",
            "fill_value",
            "mask_fill_value",
        )
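Applicable input types: image, mask, bboxes, keypoints
Function: flips the input around the y-axis (horizontal flip).
Function: resizes the image while keeping its aspect ratio, so that the longest side equals the given size. The counterpart that adjusts the shortest side is SmallestMaxSize.
Parameters:
- max_size (int, list of int): maximum size of the longest side of the image after the transformation. If a list is given, one value is picked from it at random as max_size.
- interpolation (OpenCV flag): OpenCV interpolation method. Allowed values: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR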
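The resize arithmetic is simple enough to check by hand. The sketch below computes the output shape and the matching keypoint scaling, using the same scale = max_size / max(height, width) that apply_to_keypoint uses in the source code of this section. The helper names are hypothetical, and rounding with Python's round() is an assumption about how the library rounds the new dimensions.

```python
def longest_max_size_shape(height, width, max_size):
    """Return (new_height, new_width, scale) so that the longest side equals max_size."""
    scale = max_size / max(height, width)
    # Assumption: dimensions are rounded to the nearest integer.
    return round(height * scale), round(width * scale), scale


def scale_keypoint(x, y, scale):
    """Scale only the keypoint position; angle and size are ignored in this sketch."""
    return x * scale, y * scale


new_h, new_w, scale = longest_max_size_shape(480, 640, 1024)
kp_x, kp_y = scale_keypoint(100, 50, scale)
```

Bounding boxes need no update here because Albumentations stores them in normalized coordinates, which is why apply_to_bbox returns the box unchanged.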
class LongestMaxSize(DualTransform):
    """Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of the image after the transformation. When using a list, max size
            will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        max_size: Union[int, Sequence[int]] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1,
    ):
        super(LongestMaxSize, self).__init__(always_apply, p)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self, img: np.ndarray, max_size: int = 1024, interpolation: int = cv2.INTER_LINEAR, **params
    ) -> np.ndarray:
        return F.longest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bbox(self, bbox: Sequence[float], **params) -> Sequence[float]:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(self, keypoint: Sequence[float], max_size: int = 1024, **params) -> Sequence[float]:
        height = params["rows"]
        width = params["cols"]
        scale = max_size / max([height, width])
        return F.keypoint_scale(keypoint, scale, scale)

    def get_params(self) -> Dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("max_size", "interpolation")
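Function: randomly zeroes out object instances in the image and the mask.
Parameters:
- max_objects: maximum number of labels that can be zeroed out. It can also be a range [min, max], in which case the number actually applied is sampled at random from that range.
- image_fill_value: fill value for the zeroed-out region of the image, default 0. It can also be set to 'inpaint' to inpaint the zeroed-out region (three-channel images only).
- mask_fill_value: fill value for the zeroed-out region of the mask, default 0.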
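The selection step can be sketched without scikit-image by starting from an already labelled mask. The code below mirrors the logic of get_params_dependent_on_targets in the source code of this section, on nested lists instead of NumPy arrays. choose_dropout_mask is a hypothetical helper, and the label image is hand-written for illustration (the library obtains it from a connected-component labelling of the mask).

```python
import random


def choose_dropout_mask(label_image, num_labels, max_objects=(1, 1), rng=random):
    """Pick up to max_objects labelled instances and return a boolean drop mask."""
    if num_labels == 0:
        return None                                   # nothing to drop
    low, high = max_objects
    n_drop = min(num_labels, rng.randint(low, high))  # never more than exist
    chosen = set(rng.sample(range(1, num_labels + 1), n_drop))
    return [[value in chosen for value in row] for row in label_image]


# Hand-written label image: 0 is background, 1..3 are object instances.
label_image = [
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 0, 0, 0],
]
one_mask = choose_dropout_mask(label_image, 3, (1, 1), random.Random(0))
one_count = sum(cell for row in one_mask for cell in row)
all_mask = choose_dropout_mask(label_image, 3, (3, 5), random.Random(0))
all_count = sum(cell for row in all_mask for cell in row)
```

The resulting boolean mask is what apply and apply_to_mask use to overwrite pixels with image_fill_value and mask_fill_value respectively.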
# source code
class MaskDropout(DualTransform):
    """
    Image & mask augmentation that zero out mask and image regions corresponding
    to randomly chosen object instance from mask.

    Mask must be single-channel image, zero values treated as background.
    Image can be any number of channels.

    Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254
    """

    def __init__(
        self,
        max_objects=1,
        image_fill_value=0,
        mask_fill_value=0,
        always_apply=False,
        p=0.5,
    ):
        """
        Args:
            max_objects: Maximum number of labels that can be zeroed out. Can be tuple, in this case it's [min, max]
            image_fill_value: Fill value to use when filling image.
                Can be 'inpaint' to apply inpainting (works only for 3-channel images)
            mask_fill_value: Fill value to use when filling mask.

        Targets:
            image, mask

        Image types:
            uint8, float32
        """
        super(MaskDropout, self).__init__(always_apply, p)
        self.max_objects = to_tuple(max_objects, 1)
        self.image_fill_value = image_fill_value
        self.mask_fill_value = mask_fill_value

    @property
    def targets_as_params(self):
        return ["mask"]

    def get_params_dependent_on_targets(self, params):
        mask = params["mask"]
        label_image, num_labels = label(mask, return_num=True)
        if num_labels == 0:
            dropout_mask = None
        else:
            objects_to_drop = random.randint(self.max_objects[0], self.max_objects[1])
            objects_to_drop = min(num_labels, objects_to_drop)
            if objects_to_drop == num_labels:
                dropout_mask = mask > 0
            else:
                labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
                # the original listing used dtype=np.bool, an alias removed in NumPy 1.24
                dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=bool)
                for label_index in labels_index:
                    dropout_mask |= label_image == label_index
        params.update({"dropout_mask": dropout_mask})
        return params

    def apply(self, img, dropout_mask=None, **params):
        if dropout_mask is None:
            return img
        if self.image_fill_value == "inpaint":
            dropout_mask = dropout_mask.astype(np.uint8)
            _, _, w, h = cv2.boundingRect(dropout_mask)
            radius = min(3, max(w, h) // 2)
            img = cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)
        else:
            img = img.copy()
            img[dropout_mask] = self.image_fill_value
        return img

    def apply_to_mask(self, img, dropout_mask=None, **params):
        if dropout_mask is None:
            return img
        img = img.copy()
        img[dropout_mask] = self.mask_fill_value
        return img

    def get_transform_init_args_names(self):
        return ("max_objects", "image_fill_value", "mask_fill_value")
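In the figure below, the annotated object is the region containing the bird (a rectangular box); the results for different image_fill_value settings follow.
Applicable input types: image, mask, bboxes, keypoints
Function: leaves the input unchanged (does nothing)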
# source code
class NoOp(DualTransform):
    """Does nothing"""

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params) -> KeypointInternalType:
        return keypoint

    def apply_to_bbox(self, bbox: BoxInternalType, **params) -> BoxInternalType:
        return bbox

    def apply(self, img: np.ndarray, **params) -> np.ndarray:
        return img

    def apply_to_mask(self, img: np.ndarray, **params) -> np.ndarray:
        return img

    def get_transform_init_args_names(self) -> Tuple:
        return ()
Function: barrel / pincushion distortion
Parameters:
- distort_limit (float, (float, float)): if a single number is given, it is converted to the range (-distort_limit, distort_limit). Default: (-0.05, 0.05). A sampled value (k in the source code) above 0 gives barrel distortion; a sampled value below 0 gives pincushion distortion.
- shift_limit (float, (float, float)): if a single number is given, it is converted to the range (-shift_limit, shift_limit). Default: (-0.05, 0.05)
- interpolation (OpenCV flag): OpenCV interpolation method. Allowed values: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR (the mask always uses cv2.INTER_NEAREST; this is hard-coded and cannot be set by the caller)
- border_mode (OpenCV flag): OpenCV border pixel extrapolation method. Allowed values: cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (mirror), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (mirror). Default: cv2.BORDER_REFLECT_101
- value (int, float, list of ints, list of float): border padding value (only when border_mode=cv2.BORDER_CONSTANT)
- mask_value (int, float, list of ints, list of float): border padding value for the mask (only when border_mode=cv2.BORDER_CONSTANT)
Further reading on border_mode:
OpenCV filtering: copyMakeBorder and borderInterpolate
OpenCV image processing | 1.16 Border handling in convolution
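One detail of the parameter sampling is easy to miss. In get_params (see the source code of this section), dx and dy are passed through round(), so with the default shift_limit of ±0.05 they always come out as 0. The sketch below reproduces just that sampling step in plain Python; sample_optical_params is a hypothetical helper, not a library function.

```python
import random


def sample_optical_params(distort_limit=(-0.05, 0.05), shift_limit=(-0.05, 0.05), rng=random):
    """Sample k, dx, dy the same way get_params does in the listing."""
    return {
        "k": rng.uniform(*distort_limit),         # distortion strength, kept as a float
        "dx": round(rng.uniform(*shift_limit)),   # rounded to an integer shift
        "dy": round(rng.uniform(*shift_limit)),
    }


rng = random.Random(0)
default_samples = [sample_optical_params(rng=rng) for _ in range(1000)]
wide_samples = [sample_optical_params(shift_limit=(-3, 3), rng=rng) for _ in range(1000)]
nonzero_shifts = sum(1 for s in wide_samples if s["dx"] != 0 or s["dy"] != 0)
```

Going by this listing, shift_limit would need a magnitude above 0.5 before the optical centre actually moves.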
# source code
class OpticalDistortion(DualTransform):
    """
    Args:
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
        shift_limit (float, (float, float))): If shift_limit is a single float, the range
            will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

    Targets:
        image, mask, bbox

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        distort_limit: ScaleFloatType = 0.05,
        shift_limit: ScaleFloatType = 0.05,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(OpticalDistortion, self).__init__(always_apply, p)
        self.shift_limit = to_tuple(shift_limit)
        self.distort_limit = to_tuple(distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def apply(
        self, img: np.ndarray, k: int = 0, dx: int = 0, dy: int = 0, interpolation: int = cv2.INTER_LINEAR, **params
    ) -> np.ndarray:
        return F.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)

    def apply_to_mask(self, img: np.ndarray, k: int = 0, dx: int = 0, dy: int = 0, **params) -> np.ndarray:
        return F.optical_distortion(img, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)

    def apply_to_bbox(self, bbox: BoxInternalType, k: int = 0, dx: int = 0, dy: int = 0, **params) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = F.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
        bbox_returned = bbox_from_mask(mask)
        bbox_returned = F.normalize_bbox(bbox_returned, rows, cols)
        return bbox_returned

    def get_params(self):
        return {
            "k": random.uniform(self.distort_limit[0], self.distort_limit[1]),
            "dx": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
            "dy": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
        }

    def get_transform_init_args_names(self):
        return (
            "distort_limit",
            "shift_limit",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
        )
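The figure below shows the visualized result. The parameters were set fairly large to make the change obvious; with the default parameters the change is very small.
Function: pads the image borders up to the specified size. (If the image is already larger than the specified size, nothing is done and the original image is returned.)
Parameters:
- min_height, min_width: minimum size of the resulting image
- position (Union[str, PositionType]): where the original image is placed before padding is added around it (see the visualized results after the code). Allowed values: center, top_left, top_right, bottom_left, bottom_right, random
- border_mode (OpenCV flag): OpenCV border pixel extrapolation method. Allowed values: cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (mirror), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (mirror). Default: cv2.BORDER_REFLECT_101
- value (int, float, list of ints, list of float): border padding value (only when border_mode=cv2.BORDER_CONSTANT)
- mask_value (int, float, list of int, list of float): border padding value for the mask (only when border_mode=cv2.BORDER_CONSTANT)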
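The padding amounts follow from simple integer arithmetic. The sketch below restates, for a single axis, the two branches of update_params in the source code of this section (pad up to a minimum size, or pad up to a multiple of a divisor) and the way position redistributes the padding. compute_pads and apply_position are hypothetical helpers, and the anchor names 'start' / 'end' are illustrative stand-ins for the top/left and bottom/right cases.

```python
def compute_pads(size, min_size=None, divisor=None):
    """Return (pad_before, pad_after) for one axis, centred by default."""
    if min_size is not None:
        total = max(min_size - size, 0)       # no padding if already large enough
    else:
        remainder = size % divisor
        total = divisor - remainder if remainder > 0 else 0
    before = total // 2
    return before, total - before


def apply_position(before, after, anchor):
    """Move all padding to one side: 'start' keeps the image at the top/left."""
    total = before + after
    if anchor == "start":
        return 0, total
    if anchor == "end":
        return total, 0
    return before, after                      # 'center': leave the split as is


pads_min = compute_pads(100, min_size=128)
pads_odd = compute_pads(101, min_size=128)
pads_div = compute_pads(100, divisor=32)
pads_big = compute_pads(200, min_size=128)
top_left = apply_position(*pads_min, anchor="start")
```

The divisor variant is handy when a network needs input sides that are multiples of, say, 32; the constructor in the listing requires exactly one of min_height and pad_height_divisor (and likewise for width) to be set.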
# source code
class PadIfNeeded(DualTransform):
    """Pad side of the image / max if side is less than desired number.

    Args:
        min_height (int): minimal result image height.
        min_width (int): minimal result image width.
        pad_height_divisor (int): if not None, ensures image height is dividable by value of this argument.
        pad_width_divisor (int): if not None, ensures image width is dividable by value of this argument.
        position (Union[str, PositionType]): Position of the image. should be PositionType.CENTER or
            PositionType.TOP_LEFT or PositionType.TOP_RIGHT or PositionType.BOTTOM_LEFT or PositionType.BOTTOM_RIGHT.
            or PositionType.RANDOM. Default: PositionType.CENTER.
        border_mode (OpenCV flag): OpenCV border mode.
        value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of int,
                    list of float): padding value for mask if border_mode is cv2.BORDER_CONSTANT.
        p (float): probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bbox, keypoints

    Image types:
        uint8, float32
    """

    class PositionType(Enum):
        CENTER = "center"
        TOP_LEFT = "top_left"
        TOP_RIGHT = "top_right"
        BOTTOM_LEFT = "bottom_left"
        BOTTOM_RIGHT = "bottom_right"
        RANDOM = "random"

    def __init__(
        self,
        min_height: Optional[int] = 1024,
        min_width: Optional[int] = 1024,
        pad_height_divisor: Optional[int] = None,
        pad_width_divisor: Optional[int] = None,
        position: Union[PositionType, str] = PositionType.CENTER,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        if (min_height is None) == (pad_height_divisor is None):
            raise ValueError("Only one of 'min_height' and 'pad_height_divisor' parameters must be set")
        if (min_width is None) == (pad_width_divisor is None):
            raise ValueError("Only one of 'min_width' and 'pad_width_divisor' parameters must be set")
        super(PadIfNeeded, self).__init__(always_apply, p)
        self.min_height = min_height
        self.min_width = min_width
        self.pad_width_divisor = pad_width_divisor
        self.pad_height_divisor = pad_height_divisor
        self.position = PadIfNeeded.PositionType(position)
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def update_params(self, params, **kwargs):
        params = super(PadIfNeeded, self).update_params(params, **kwargs)
        rows = params["rows"]
        cols = params["cols"]
        if self.min_height is not None:
            if rows < self.min_height:
                h_pad_top = int((self.min_height - rows) / 2.0)
                h_pad_bottom = self.min_height - rows - h_pad_top
            else:
                h_pad_top = 0
                h_pad_bottom = 0
        else:
            pad_remained = rows % self.pad_height_divisor
            pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0
            h_pad_top = pad_rows // 2
            h_pad_bottom = pad_rows - h_pad_top
        if self.min_width is not None:
            if cols < self.min_width:
                w_pad_left = int((self.min_width - cols) / 2.0)
                w_pad_right = self.min_width - cols - w_pad_left
            else:
                w_pad_left = 0
                w_pad_right = 0
        else:
            pad_remainder = cols % self.pad_width_divisor
            pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0
            w_pad_left = pad_cols // 2
            w_pad_right = pad_cols - w_pad_left
        h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
            h_top=h_pad_top, h_bottom=h_pad_bottom, w_left=w_pad_left, w_right=w_pad_right
        )
        params.update(
            {
                "pad_top": h_pad_top,
                "pad_bottom": h_pad_bottom,
                "pad_left": w_pad_left,
                "pad_right": w_pad_right,
            }
        )
        return params

    def apply(
        self, img: np.ndarray, pad_top: int = 0, pad_bottom: int = 0, pad_left: int = 0, pad_right: int = 0, **params
    ) -> np.ndarray:
        return F.pad_with_params(
            img,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.value,
        )

    def apply_to_mask(
        self, img: np.ndarray, pad_top: int = 0, pad_bottom: int = 0, pad_left: int = 0, pad_right: int = 0, **params
    ) -> np.ndarray:
        return F.pad_with_params(
            img,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.mask_value,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        rows: int = 0,
        cols: int = 0,
        **params
    ) -> BoxInternalType:
        x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
        bbox = x_min + pad_left, y_min + pad_top, x_max + pad_left, y_max + pad_top
        return normalize_bbox(bbox, rows + pad_top + pad_bottom, cols + pad_left + pad_right)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        **params
    ) -> KeypointInternalType:
        x, y, angle, scale = keypoint[:4]
        return x + pad_left, y + pad_top, angle, scale

    def get_transform_init_args_names(self):
        return (
            "min_height",
            "min_width",
            "pad_height_divisor",
            "pad_width_divisor",
            "border_mode",
            "value",
            "mask_value",
        )

    def __update_position_params(
        self, h_top: int, h_bottom: int, w_left: int, w_right: int
    ) -> Tuple[int, int, int, int]:
        if self.position == PadIfNeeded.PositionType.TOP_LEFT:
            h_bottom += h_top
            w_right += w_left
            h_top = 0
            w_left = 0
        elif self.position == PadIfNeeded.PositionType.TOP_RIGHT:
            h_bottom += h_top
            w_left += w_right
            h_top = 0
            w_right = 0
        elif self.position == PadIfNeeded.PositionType.BOTTOM_LEFT:
            h_top += h_bottom
            w_right += w_left
            h_bottom = 0
            w_left = 0
        elif self.position == PadIfNeeded.PositionType.BOTTOM_RIGHT:
            h_top += h_bottom
            w_left += w_right
            h_bottom = 0
            w_right = 0
        elif self.position == PadIfNeeded.PositionType.RANDOM:
            h_pad = h_top + h_bottom
            w_pad = w_left + w_right
            h_top = random.randint(0, h_pad)
            h_bottom = h_pad - h_top
            w_left = random.randint(0, w_pad)
            w_right = w_pad - w_left
        return h_top, h_bottom, w_left, w_right
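Function: random four-point perspective transform
Parameters:
- scale (float or (float, float)): standard deviation of the normal distribution that controls how far the corners of the new sub-image lie from the corners of the full image. If a single number is given, it is converted to the range (0, scale). Default: (0.05, 0.1)
- keep_size (bool): whether to resize the image back to its original size after the perspective transform. The default True is recommended; if set to False, the returned images form a list rather than an array and may have different shapes.
- pad_mode (OpenCV flag): OpenCV border pixel extrapolation method. Allowed values: cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (mirror), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (mirror). Default: cv2.BORDER_CONSTANT
- pad_val (int, float, list of ints, list of float): border padding value (only when pad_mode=cv2.BORDER_CONSTANT). Default: 0
- mask_pad_val (int, float, list of int, list of float): border padding value for the mask (only when pad_mode=cv2.BORDER_CONSTANT). Default: 0
- fit_output (bool): if True, the size and position of the image plane are adjusted after the perspective transform so that the whole image is still captured (the image is then resized if keep_size is True). Otherwise, parts of the transformed image may fall outside the image plane. Do not set this to True with large scale values, as it can produce very large images. Default: False
The larger scale is, the stronger the perspective angle;
keep_size is best set to True, so that the output has the same size as the original image;
fit_output is best set to False; setting it to True produces black borders.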
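Note!!! (these remarks apply to PiecewiseAffine, described next)
- This transform is very slow; ElasticTransform can be used instead and is at least 10 times faster.
- For coordinate-based inputs (keypoints, bounding boxes, polygons, ...), this transform still has to perform an image-based augmentation first, which makes it slow and not fully correct for such inputs.
- This transform is essentially a wrapper around the skimage.transform.warp function.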
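Function: local (piecewise) affine transform. The effect is similar to the elastic transform (ElasticTransform): local warping.
(Author's understanding: draw a grid on the image, then shift each grid point locally in a random direction.)
Apply affine transformations that differ between local neighbourhoods.
This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these points around via affine transformations. This leads to local distortions.
Parameters:
- scale (float, tuple of float): deformation factor; the larger the value, the farther points move away from the regular grid points. With absolute_scale=False (default), scale is multiplied by the image height/width to obtain the displacement; with absolute_scale=True, scale is taken as an absolute value. Recommended range for scale: (0.01, 0.05). Default: (0.03, 0.05).
- nb_rows (int, tuple of int): number of rows of the regular grid, at least 2; 4 or more is recommended for large images.
- nb_cols (int, tuple of int): number of columns of the regular grid, at least 2; 4 or more is recommended for large images.
- interpolation (int): interpolation order
  - 0: Nearest-neighbor
  - 1: Bi-linear (default)
  - 2: Bi-quadratic
  - 3: Bi-cubic
  - 4: Bi-quartic
  - 5: Bi-quintic
- mask_interpolation (int): interpolation order for the mask; same values as interpolation
- cval (int): fill value for newly created pixels
- cval_mask (int): fill value for newly created pixels in the mask
- mode (str): how points outside the image boundary are filled, one of {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}
- absolute_scale (bool): flag telling whether the scale parameter is an absolute or a relative value
- keypoints_threshold (float): threshold used in the conversion from distance maps to keypoints. The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameter contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to use no min/max. Default: 0.01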
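The grid-and-jitter idea can be made concrete in plain Python. The sketch below mirrors the point generation in get_params_dependent_on_targets from the source code of this section: a regular nb_rows x nb_cols grid, a Gaussian offset per point with sigma equal to scale, multiplication by the image size unless absolute_scale is set, and clipping to the image. linspace and jittered_grid are hypothetical helpers; fitting the actual piecewise affine transform (done with scikit-image in the listing) is not reproduced.

```python
import random


def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive (n >= 2)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def jittered_grid(h, w, nb_rows, nb_cols, scale, rng, absolute_scale=False):
    """Return (source_points, destination_points) as lists of (y, x) pairs."""
    src = [(y, x) for y in linspace(0, h, nb_rows) for x in linspace(0, w, nb_cols)]
    dest = []
    for y, x in src:
        jy, jx = rng.gauss(0, scale), rng.gauss(0, scale)
        if not absolute_scale:
            jy, jx = jy * h, jx * w              # relative scale: fraction of the image size
        # Keep every destination point inside the image plane.
        dest.append((min(max(y + jy, 0), h - 1), min(max(x + jx, 0), w - 1)))
    return src, dest


rng = random.Random(0)
src, dest = jittered_grid(h=100, w=200, nb_rows=4, nb_cols=4, scale=0.04, rng=rng)
max_shift = max(abs(d[0] - s[0]) for s, d in zip(src, dest))
```

With scale around 0.04 on a 100-pixel-high image, typical vertical shifts are a few pixels, which matches the "weak to strong" range of 0.01 to 0.05 recommended in the docstring.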
# source code
class PiecewiseAffine(DualTransform):
"""Apply affine transformations that differ between local neighbourhoods.
This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these point
around via affine transformations. This leads to local distortions.
This is mostly a wrapper around scikit-image's ``PiecewiseAffine``.
See also ``Affine`` for a similar technique.
Note:
This augmenter is very slow. Try to use ``ElasticTransformation`` instead, which is at least 10x faster.
Note:
For coordinate-based inputs (keypoints, bounding boxes, polygons, ...),
this augmenter still has to perform an image-based augmentation,
which will make it significantly slower and not fully correct for such inputs than other transforms.
Args:
scale (float, tuple of float): Each point on the regular grid is moved around via a normal distribution.
This scale factor is equivalent to the normal distribution's sigma.
Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
the image if ``absolute_scale=False`` (default), so this scale can be the same for different sized images.
Recommended values are in the range ``0.01`` to ``0.05`` (weak to strong augmentations).
* If a single ``float``, then that value will always be used as the scale.
* If a tuple ``(a, b)`` of ``float`` s, then a random value will
be uniformly sampled per image from the interval ``[a, b]``.
nb_rows (int, tuple of int): Number of rows of points that the regular grid should have.
Must be at least ``2``. For large images, you might want to pick a higher value than ``4``.
You might have to then adjust scale to lower values.
* If a single ``int``, then that value will always be used as the number of rows.
* If a tuple ``(a, b)``, then a value from the discrete interval
``[a..b]`` will be uniformly sampled per image.
nb_cols (int, tuple of int): Number of columns. Analogous to `nb_rows`.
interpolation (int): The order of interpolation. The order has to be in the range 0-5:
- 0: Nearest-neighbor
- 1: Bi-linear (default)
- 2: Bi-quadratic
- 3: Bi-cubic
- 4: Bi-quartic
- 5: Bi-quintic
mask_interpolation (int): same as interpolation but for mask.
cval (number): The constant value to use when filling in newly created pixels.
cval_mask (number): Same as cval but only for masks.
mode (str): {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
Points outside the boundaries of the input are filled according
to the given mode. Modes match the behaviour of `numpy.pad`.
absolute_scale (bool): Take `scale` as an absolute value rather than a relative value.
keypoints_threshold (float): Used as threshold in conversion from distance maps to keypoints.
The search for keypoints works by searching for the
argmin (non-inverted) or argmax (inverted) in each channel. This
parameters contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
as a keypoint. Use ``None`` to use no min/max. Default: 0.01
Targets:
image, mask, keypoints, bboxes
Image types:
uint8, float32
"""
def __init__(
self,
scale: ScaleFloatType = (0.03, 0.05),
nb_rows: Union[int, Sequence[int]] = 4,
nb_cols: Union[int, Sequence[int]] = 4,
interpolation: int = 1,
mask_interpolation: int = 0,
cval: int = 0,
cval_mask: int = 0,
mode: str = "constant",
absolute_scale: bool = False,
always_apply: bool = False,
keypoints_threshold: float = 0.01,
p: float = 0.5,
):
super(PiecewiseAffine, self).__init__(always_apply, p)
self.scale = to_tuple(scale, scale)
self.nb_rows = to_tuple(nb_rows, nb_rows)
self.nb_cols = to_tuple(nb_cols, nb_cols)
self.interpolation = interpolation
self.mask_interpolation = mask_interpolation
self.cval = cval
self.cval_mask = cval_mask
self.mode = mode
self.absolute_scale = absolute_scale
self.keypoints_threshold = keypoints_threshold
def get_transform_init_args_names(self):
return (
"scale",
"nb_rows",
"nb_cols",
"interpolation",
"mask_interpolation",
"cval",
"cval_mask",
"mode",
"absolute_scale",
"keypoints_threshold",
)
@property
def targets_as_params(self):
return ["image"]
def get_params_dependent_on_targets(self, params) -> dict:
h, w = params["image"].shape[:2]
nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
nb_cells = nb_cols * nb_rows
scale = random.uniform(*self.scale)
jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
if not np.any(jitter > 0):
return {
"matrix": None}
y = np.linspace(0, h, nb_rows)
x = np.linspace(0, w, nb_cols)
# (H, W) and (H, W) for H=rows, W=cols
xx_src, yy_src = np.meshgrid(x, y)
# (1, HW, 2) => (HW, 2) for H=rows, W=cols
points_src = np.dstack([yy_src.flat, xx_src.flat])[0]
if self.absolute_scale:
jitter[:, 0] = jitter[:, 0] / h if h > 0 else 0.0
jitter[:, 1] = jitter[:, 1] / w if w > 0 else 0.0
jitter[:, 0] = jitter[:, 0] * h
jitter[:, 1] = jitter[:, 1] * w
points_dest = np.copy(points_src)
points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]
# Restrict all destination points to be inside the image plane.
# This is necessary, as otherwise keypoints could be augmented
# outside of the image plane and these would be replaced by
# (-1, -1), which would not conform with the behaviour of the other augmenters.
points_dest[:, 0] = np.clip(points_dest[:, 0], 0, h - 1)
points_dest[:, 1] = np.clip(points_dest[:, 1], 0, w - 1)
matrix = skimage.transform.PiecewiseAffineTransform()
matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])
return {
"matrix": matrix,
}
def apply(self, img: np.ndarray, matrix: skimage.transform.PiecewiseAffineTransform = None, **params) -> np.ndarray:
return F.piecewise_affine(img, matrix, self.interpolation, self.mode, self.cval)
def apply_to_mask(
self, img: np.ndarray, matrix: skimage.transform.PiecewiseAffineTransform = None, **params
) -> np.ndarray:
return F.piecewise_affine(img, matrix, self.mask_interpolation, self.mode, self.cval_mask)
def apply_to_bbox(
self,
bbox: BoxInternalType,
rows: int = 0,
cols: int = 0,
matrix: skimage.transform.PiecewiseAffineTransform = None,
**params
) -> BoxInternalType:
return F.bbox_piecewise_affine(bbox, matrix, rows, cols, self.keypoints_threshold)
def apply_to_keypoint(
self,
keypoint: KeypointInternalType,
rows: int = 0,
cols: int = 0,
matrix: skimage.transform.PiecewiseAffineTransform = None,
**params
):
return F.keypoint_piecewise_affine(keypoint, matrix, rows, cols, self.keypoints_threshold)
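The control-point construction in get_params_dependent_on_targets above can be tried out without skimage. The following is a minimal sketch under stated assumptions: `jittered_grid` is a hypothetical helper name, it uses NumPy's own generator instead of the library's `random_utils`, and it only covers the relative case (absolute_scale=False).

```python
import numpy as np


def jittered_grid(h, w, nb_rows=4, nb_cols=4, scale=0.04, seed=0):
    """Return (points_src, points_dest) as (row, col) arrays of shape (nb_rows * nb_cols, 2)."""
    rng = np.random.default_rng(seed)
    # Regular grid of control points spanning the image.
    y = np.linspace(0, h, nb_rows)
    x = np.linspace(0, w, nb_cols)
    xx, yy = np.meshgrid(x, y)
    points_src = np.stack([yy.ravel(), xx.ravel()], axis=1)
    # Relative jitter: drawn from N(0, scale) and multiplied by the image size.
    jitter = rng.normal(0.0, scale, size=points_src.shape)
    points_dest = points_src + jitter * np.array([h, w])
    # Keep destination points inside the image plane, as the source does.
    points_dest[:, 0] = np.clip(points_dest[:, 0], 0, h - 1)
    points_dest[:, 1] = np.clip(points_dest[:, 1], 0, w - 1)
    return points_src, points_dest


points_src, points_dest = jittered_grid(100, 200)
print(points_src.shape, points_dest.shape)
```

In the library, the two point sets are then passed (with columns swapped to x, y order) to skimage's PiecewiseAffineTransform.estimate, as shown in the source above.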
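Function: Drop pixels, i.e. overwrite some pixel values (with 0 by default).
Parameters: dropout_prob: probability of dropping a pixel.
per_channel: whether each channel is handled independently. If True, a separate drop mask is generated for every channel.
drop_value: value written at dropped positions, default 0. If drop_value=None, a random value is sampled within the data range:
- uint8 - [0, 255]
- uint16 - [0, 65535]
- uint32 - [0, 4294967295]
- float, double - [0, 1]
mask_drop_value: value written to the mask at dropped positions. If mask_drop_value=None, the mask is left unchanged. The docstring lists 0 as the default, while the signature in the source code below uses None.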
# source code
class PixelDropout(DualTransform):
"""Set pixels to 0 with some probability.
Args:
dropout_prob (float): pixel drop probability. Default: 0.01
per_channel (bool): if set to `True` drop mask will be sampled for each channel,
otherwise the same mask will be sampled for all channels. Default: False
drop_value (number or sequence of numbers or None): Value that will be set in dropped place.
If set to None value will be sampled randomly, default ranges will be used:
- uint8 - [0, 255]
- uint16 - [0, 65535]
- uint32 - [0, 4294967295]
- float, double - [0, 1]
Default: 0
mask_drop_value (number or sequence of numbers or None): Value that will be set in dropped place in masks.
If set to None masks will be unchanged. Default: 0
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask
Image types:
any
"""
def __init__(
self,
dropout_prob: float = 0.01,
per_channel: bool = False,
drop_value: Optional[Union[float, Sequence[float]]] = 0,
mask_drop_value: Optional[Union[float, Sequence[float]]] = None,
always_apply: bool = False,
p: float = 0.5,
):
super().__init__(always_apply, p)
self.dropout_prob = dropout_prob
self.per_channel = per_channel
self.drop_value = drop_value
self.mask_drop_value = mask_drop_value
if self.mask_drop_value is not None and self.per_channel:
raise ValueError("PixelDropout supports mask only with per_channel=False")
def apply(
self, img: np.ndarray, drop_mask: np.ndarray = None, drop_value: Union[float, Sequence[float]] = (), **params
) -> np.ndarray:
assert drop_mask is not None
return F.pixel_dropout(img, drop_mask, drop_value)
def apply_to_mask(self, img: np.ndarray, drop_mask: np.ndarray = np.array([]), **params) -> np.ndarray:
if self.mask_drop_value is None:
return img
if img.ndim == 2:
drop_mask = np.squeeze(drop_mask)
return F.pixel_dropout(img, drop_mask, self.mask_drop_value)
def apply_to_bbox(self, bbox, **params):
return bbox
def apply_to_keypoint(self, keypoint, **params):
return keypoint
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
img = params["image"]
shape = img.shape if self.per_channel else img.shape[:2]
rnd = np.random.RandomState(random.randint(0, 1 << 31))
# Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])
drop_value: Union[float, Sequence[float], np.ndarray]
if drop_mask.ndim != img.ndim:
drop_mask = np.expand_dims(drop_mask, -1)
if self.drop_value is None:
drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])
if img.dtype in (np.uint8, np.uint16, np.uint32):
drop_value = rnd.randint(0, int(F.MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
elif img.dtype in [np.float32, np.double]:
drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
else:
raise ValueError(f"Unsupported dtype: {img.dtype}")
else:
drop_value = self.drop_value
return {"drop_mask": drop_mask, "drop_value": drop_value}
@property
def targets_as_params(self) -> List[str]:
return ["image"]
def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")
# F.pixel_dropout()
@preserve_shape
def pixel_dropout(image: np.ndarray, drop_mask: np.ndarray, drop_value: Union[float, Sequence[float]]) -> np.ndarray:
if isinstance(drop_value, (int, float)) and drop_value == 0:
drop_values = np.zeros_like(image)
else:
drop_values = np.full_like(image, drop_value) # type: ignore
return np.where(drop_mask, drop_values, image)
In the figure below, the bottom-right result uses per_channel=True: pixels are dropped independently in each channel, which is why color noise appears.
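The difference between a shared and a per-channel drop mask can be reproduced with plain NumPy. This is a minimal sketch, not the library code: `pixel_dropout_sketch` is a hypothetical name, and it draws the mask with `rng.random(shape) < dropout_prob` instead of `RandomState.choice`.

```python
import numpy as np


def pixel_dropout_sketch(image, dropout_prob=0.01, per_channel=False, drop_value=0, seed=0):
    rng = np.random.default_rng(seed)
    # One mask entry per pixel, or one per pixel and channel.
    shape = image.shape if per_channel else image.shape[:2]
    drop_mask = rng.random(shape) < dropout_prob
    if drop_mask.ndim != image.ndim:
        # Add a trailing axis so the 2D mask broadcasts over all channels.
        drop_mask = drop_mask[..., None]
    return np.where(drop_mask, np.full_like(image, drop_value), image)


image = np.full((64, 64, 3), 200, dtype=np.uint8)
shared = pixel_dropout_sketch(image, dropout_prob=0.1, per_channel=False)
independent = pixel_dropout_sketch(image, dropout_prob=0.1, per_channel=True)
```

With the shared mask every dropped pixel loses all three channels at once. With per_channel=True a pixel can lose only one or two channels, which shows up as colored speckles.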
Function: Random crop.
Parameters: height, width (int): height and width of the crop region.
class RandomCrop(DualTransform):
"""Crop a random part of the input.
Args:
height (int): height of the crop.
width (int): width of the crop.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
def __init__(self, height, width, always_apply=False, p=1.0):
super().__init__(always_apply, p)
self.height = height
self.width = width
def apply(self, img, h_start=0, w_start=0, **params):
return F.random_crop(img, self.height, self.width, h_start, w_start)
def get_params(self):
return {"h_start": random.random(), "w_start": random.random()}
def apply_to_bbox(self, bbox, **params):
return F.bbox_random_crop(bbox, self.height, self.width, **params)
def apply_to_keypoint(self, keypoint, **params):
return F.keypoint_random_crop(keypoint, self.height, self.width, **params)
def get_transform_init_args_names(self):
return ("height", "width")
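Note that get_params returns h_start and w_start as fractions in [0, 1), not as pixel offsets. The sketch below shows one way such a fraction can be mapped to a crop window. The formula is an assumption modeled on the library's crop helper rather than quoted from it, and `crop_window` is a hypothetical name.

```python
import random


def crop_window(img_h, img_w, crop_h, crop_w, h_start, w_start):
    """Map fractional starts in [0, 1) to an (x1, y1, x2, y2) crop window."""
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop size exceeds image size")
    # There are (img - crop + 1) valid start positions along each axis.
    y1 = int((img_h - crop_h + 1) * h_start)
    x1 = int((img_w - crop_w + 1) * w_start)
    return x1, y1, x1 + crop_w, y1 + crop_h


x1, y1, x2, y2 = crop_window(100, 200, 64, 64, random.random(), random.random())
print((x1, y1, x2, y2))
```

Keeping the random draw independent of the image size lets the same parameters be reused for the image, the mask, the boxes, and the keypoints.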
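Function: Cut away part of the image along its four borders. The result is not resized, so the image size changes.
Parameters:
The following four parameters give the crop fraction for each side. Valid range: (0.0, 1.0); the default is 0.1 for all of them.
crop_left (float): crop fraction on the left side. The cut position is sampled randomly in [0, crop_left * width).
crop_right (float): crop fraction on the right side. The cut position is sampled randomly in [(1 - crop_right) * width, width).
crop_top (float): crop fraction on the top side. The cut position is sampled randomly in [0, crop_top * height).
crop_bottom (float): crop fraction on the bottom side. The cut position is sampled randomly in [(1 - crop_bottom) * height, height).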
# source code
class RandomCropFromBorders(DualTransform):
"""Crop bbox from image randomly cut parts from borders without resize at the end
Args:
crop_left (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from left side in range [0, crop_left * width)
crop_right (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from right side in range [(1 - crop_right) * width, width)
crop_top (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from top side in range [0, crop_top * height)
crop_bottom (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
from bottom side in range [(1 - crop_bottom) * height, height)
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
def __init__(
self,
crop_left=0.1,
crop_right=0.1,
crop_top=0.1,
crop_bottom=0.1,
always_apply=False,
p=1.0,
):
super(RandomCropFromBorders, self).__init__(always_apply, p)
self.crop_left = crop_left
self.crop_right = crop_right
self.crop_top = crop_top
self.crop_bottom = crop_bottom
def get_params_dependent_on_targets(self, params):
img = params["image"]
x_min = random.randint(0, int(self.crop_left * img.shape[1]))
x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * img.shape[1])), img.shape[1])
y_min = random.randint(0, int(self.crop_top * img.shape[0]))
y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * img.shape[0])), img.shape[0])
return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}
def apply(self, img, x_min=0, x_max=0, y_min=0, y_max=0, **params):
return F.clamping_crop(img, x_min, y_min, x_max, y_max)
def apply_to_mask(self, mask, x_min=0, x_max=0, y_min=0, y_max=0, **params):
return F.clamping_crop(mask, x_min, y_min, x_max, y_max)
def apply_to_bbox(self, bbox, x_min=0, x_max=0, y_min=0, y_max=0, **params):
rows, cols = params["rows"], params["cols"]
return F.bbox_crop(bbox, x_min, y_min, x_max, y_max, rows, cols)
def apply_to_keypoint(self, keypoint, x_min=0, x_max=0, y_min=0, y_max=0, **params):
return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))
@property
def targets_as_params(self):
return ["image"]
def get_transform_init_args_names(self):
return "crop_left", "crop_right", "crop_top", "crop_bottom"
The result images show that this operation changes the image size.
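To see how much the output size varies, the coordinate sampling from get_params_dependent_on_targets can be run on its own. This is a small sketch that copies the four sampling lines from the source above into a hypothetical standalone function, `border_crop_coords`.

```python
import random


def border_crop_coords(height, width, crop_left=0.1, crop_right=0.1, crop_top=0.1, crop_bottom=0.1):
    """Sample (x_min, y_min, x_max, y_max) the same way the source above does."""
    x_min = random.randint(0, int(crop_left * width))
    x_max = random.randint(max(x_min + 1, int((1 - crop_right) * width)), width)
    y_min = random.randint(0, int(crop_top * height))
    y_max = random.randint(max(y_min + 1, int((1 - crop_bottom) * height)), height)
    return x_min, y_min, x_max, y_max


random.seed(0)
for _ in range(3):
    x_min, y_min, x_max, y_max = border_crop_coords(300, 400)
    # The crop is not resized afterwards, so each draw yields a different output size.
    print("output (height, width):", (y_max - y_min, x_max - x_min))
```

With the default fractions of 0.1, at most about 20% of each dimension is removed. If a fixed output size is needed, a resize transform would have to follow this one.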
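Function: Crop the image near a specified box region.
Parameters:
- max_part_shift (float, (float, float)): maximum shift along the height and width, relative to the size of cropping_bbox. Default (0.3, 0.3).
- cropping_box_key (str): key of the target that holds the crop rectangle. Default cropping_bbox. The rectangle is given as four numbers: the x, y coordinates of the top-left corner followed by the x, y coordinates of the bottom-right corner. Note that cropping_bbox does not support multiple regions, as the following lines from the source show:
bbox = params[self.cropping_bbox_key]
h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])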
class RandomCropNearBBox(DualTransform):
"""Crop bbox from image with random shift by x,y coordinates
Args:
max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
to `cropping_bbox` dimension.
If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift).
Default (0.3, 0.3).
cropping_box_key (str): Additional target key for cropping box. Default `cropping_bbox`
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
Examples:
>>> aug = Compose([RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_box_key='test_box')],
>>> bbox_params=BboxParams("pascal_voc"))
>>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])
"""
def __init__(
self,
max_part_shift: Union[float, Tuple[float, float]] = (0.3, 0.3),
cropping_box_key: str = "cropping_bbox",
always_apply: bool = False,
p: float = 1.0,
):
super(RandomCropNearBBox, self).__init__(always_apply, p)
self.max_part_shift = to_tuple(max_part_shift, low=max_part_shift)
self.cropping_bbox_key = cropping_box_key
if min(self.max_part_shift) < 0 or max(self.max_part_shift) > 1:
raise ValueError("Invalid max_part_shift. Got: {}".format(max_part_shift))
def apply(
self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params
) -> np.ndarray:
return F.clamping_crop(img, x_min, y_min, x_max, y_max)
def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
bbox = params[self.cropping_bbox_key]
h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])
x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)
y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)
x_min = max(0, x_min)
y_min = max(0, y_min)
return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}
def apply_to_bbox(self, bbox: Tuple[float, float, float, float], **params) -> Tuple[float, float, float, float]:
return F.bbox_crop(bbox, **params)
def apply_to_keypoint(
self,
keypoint: Tuple[float, float, float, float],
x_min: int = 0,
x_max: int = 0,
y_min: int = 0,
y_max: int = 0,
**params
) -> Tuple[float, float, float, float]:
return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))
The specified box covers the bird, so all three resulting crops contain the bird.
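The window sampling in get_params_dependent_on_targets can be isolated to see how far the crop may drift from the box. This is a minimal sketch that repeats the arithmetic from the source above in a hypothetical standalone function, `crop_near_bbox`.

```python
import random


def crop_near_bbox(bbox, max_part_shift=(0.3, 0.3)):
    """Sample a crop window around bbox = (x1, y1, x2, y2), given in pixels."""
    x1, y1, x2, y2 = bbox
    # Largest shift per axis, as a fraction of the box height and width.
    h_max_shift = round((y2 - y1) * max_part_shift[0])
    w_max_shift = round((x2 - x1) * max_part_shift[1])
    # Each edge moves independently, either outward or inward.
    x_min = max(0, x1 - random.randint(-w_max_shift, w_max_shift))
    x_max = x2 + random.randint(-w_max_shift, w_max_shift)
    y_min = max(0, y1 - random.randint(-h_max_shift, h_max_shift))
    y_max = y2 + random.randint(-h_max_shift, h_max_shift)
    return x_min, y_min, x_max, y_max


random.seed(0)
print(crop_near_bbox((100, 50, 200, 150)))
```

Only the lower bounds are clamped here. In the source above, the upper bounds are left to F.clamping_crop. Because each edge can also move inward, a crop may trim up to max_part_shift of the box on any side.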
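Function: Split the image into grid cells and shuffle them randomly.
Parameters: grid ((int, int)): number of cells the image is split into. The first value is the count along the height, the second the count along the width.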
# source code
class RandomGridShuffle(DualTransform):
"""
Random shuffle grid's cells on image.
Args:
grid ((int, int)): size of grid for splitting image.
Targets:
image, mask, keypoints
Image types:
uint8, float32
"""
def __init__(self,
grid: Tuple[int, int] = (3, 3),
always_apply: bool = False,
p: float = 0.5):
super(RandomGridShuffle, self).__init__(always_apply, p)
self.grid = grid
def apply(self, img: np.ndarray, tiles: np.ndarray = None, **params):
if tiles is not None:
img = F.swap_tiles_on_image(img, tiles)
return img
def apply_to_mask(self,
img: np.ndarray,
tiles: np.ndarray = None,
**params):
if tiles is not None:
img = F.swap_tiles_on_image(img, tiles)
return img
def apply_to_keypoint(self,
keypoint: Tuple[float, ...],
tiles: np.ndarray = None,
rows: int = 0,
cols: int = 0,
**params):
if tiles is None:
return keypoint
for (
current_left_up_corner_row,
current_left_up_corner_col,
old_left_up_corner_row,
old_left_up_corner_col,
height_tile,
width_tile,
) in tiles:
x, y = keypoint[:2]
if (old_left_up_corner_row <= y <
(old_left_up_corner_row + height_tile)) and (
old_left_up_corner_col <= x <
(old_left_up_corner_col + width_tile)):
x = x - old_left_up_corner_col + current_left_up_corner_col
y = y - old_left_up_corner_row + current_left_up_corner_row
keypoint = (x, y) + tuple(keypoint[2:])
break
return keypoint
def get_params_dependent_on_targets(self, params):
height, width = params["image"].shape[:2]
n, m = self.grid
if n <= 0 or m <= 0:
raise ValueError(
"Grid's values must be positive. Current grid [%s, %s]" %
(n, m))
if n > height // 2 or m > width // 2:
raise ValueError(
"Incorrect size cell of grid. Just shuffle pixels of image")
height_split = np.linspace(0, height, n + 1, dtype=int)
width_split = np.linspace(0, width, m + 1, dtype=int)
height_matrix, width_matrix = np.meshgrid(height_split,
width_split,
indexing="ij")
index_height_matrix = height_matrix[:-1, :-1]
index_width_matrix = width_matrix[:-1, :-1]
shifted_index_height_matrix = height_matrix[1:, 1:]
shifted_index_width_matrix = width_matrix[1:, 1:]
height_tile_sizes = shifted_index_height_matrix - index_height_matrix
width_tile_sizes = shifted_index_width_matrix - index_width_matrix
tiles_sizes = np.stack((height_tile_sizes, width_tile_sizes), axis=2)
index_matrix = np.indices((n, m))
new_index_matrix = np.stack(index_matrix, axis=2)
for bbox_size in np.unique(tiles_sizes.reshape(-1, 2), axis=0):
eq_mat = np.all(tiles_sizes == bbox_size, axis=2)
new_index_matrix[eq_mat] = random_utils.permutation(
new_index_matrix[eq_mat])
new_index_matrix = np.split(new_index_matrix, 2, axis=2)
old_x = index_height_matrix[new_index_matrix[0],
new_index_matrix[1]].reshape(-1)
old_y = index_width_matrix[new_index_matrix[0],
new_index_matrix[1]].reshape(-1)
shift_x = height_tile_sizes.reshape(-1)
shift_y = width_tile_sizes.reshape(-1)
curr_x = index_height_matrix.reshape(-1)
curr_y = index_width_matrix.reshape(-1)
tiles = np.stack([curr_x, curr_y, old_x, old_y, shift_x, shift_y],
axis=1)
return {"tiles": tiles}
@property
def targets_as_params(self):
return ["image"]
def get_transform_init_args_names(self):
return ("grid", )
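The tile bookkeeping in the source above is easier to follow on a simplified case. The sketch below is not the library implementation: it assumes the image height and width are divisible by the grid, so all tiles have the same size and one permutation covers them all. `grid_shuffle` is a hypothetical name.

```python
import numpy as np


def grid_shuffle(image, grid=(3, 3), seed=0):
    n, m = grid
    h, w = image.shape[:2]
    if h % n or w % m:
        raise ValueError("this sketch needs height and width divisible by the grid")
    th, tw = h // n, w // m
    # Top-left corner of every tile, in row-major order.
    corners = [(r * th, c * tw) for r in range(n) for c in range(m)]
    order = np.random.default_rng(seed).permutation(len(corners))
    out = np.empty_like(image)
    # Fill each destination tile with the content of a randomly chosen source tile.
    for (dst_r, dst_c), src_idx in zip(corners, order):
        src_r, src_c = corners[src_idx]
        out[dst_r:dst_r + th, dst_c:dst_c + tw] = image[src_r:src_r + th, src_c:src_c + tw]
    return out


image = np.arange(6 * 6).reshape(6, 6)
shuffled = grid_shuffle(image, grid=(3, 3))
print(shuffled)
```

The library version also handles sizes that do not divide evenly. It groups tiles by their (height, width) and permutes only within each group, which is what the loop over np.unique(tiles_sizes...) does.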
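Function: Crop a region of the image and resize it to the specified size. Similar transform: RandomResizedCrop
Parameters:
height, width (int): target size after resizing.
scale ((float, float)): range of the crop area relative to the original image.
ratio ((float, float)): range of the crop aspect ratio.
interpolation (OpenCV flag): interpolation method. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.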
# source code
class RandomResizedCrop(_BaseRandomSizedCrop):
"""Torchvision's variant of crop a random part of the input and rescale it to some size.
Args:
height (int): height after crop and resize.
width (int): width after crop and resize.
scale ((float, float)): range of size of the origin size cropped
ratio ((float, float)): range of aspect ratio of the origin aspect ratio cropped
interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
p (float): probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes, keypoints
Image types:
uint8, float32
"""
def __init__(
self,
height,
width,
scale=(0.08, 1.0),
ratio=(0.75, 1.3333333333333333),
interpolation=cv2.INTER_LINEAR,
always_apply=False,
p=1.0,
):
super(RandomResizedCrop, self).__init__(
height=height, width=width, interpolation=interpolation, always_apply=always_apply, p=p
)
self.scale = scale
self.ratio = ratio
def get_params_dependent_on_targets(self, params):
img = params["image"]
area = img.shape[0] * img.shape[1]
for _attempt in range(10):
target_area = random.uniform(*self.scale) * area
log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
aspect_ratio = math.exp(random.uniform(*log_ratio))
# aspect_ratio = w / h
w = int(round(math.sqrt(target_area * aspect_ratio))) # skipcq: PTC-W0028
h = int(round(math.sqrt(target_area / aspect_ratio))) # skipcq: PTC-W0028
if 0 < w <= img.shape[1] and 0 < h <= img.shape[0]:
i = random.randint(0, img.shape[0] - h)
j = random.randint(0, img.shape[1] - w)
return {
"crop_height": h,
"crop_width": w,
"h_start": i * 1.0 / (img.shape[0] - h + 1e-10),
"w_start": j * 1.0 / (img.shape[1] - w + 1e-10),
}
# Fallback to central crop
in_ratio = img.shape[1] / img.shape[0]
if in_ratio < min(self.ratio):
w = img.shape[1]
h = int(round(w / min(self.ratio)))
elif in_ratio > max(self.ratio):
h = img.shape[0]
w = int(round(h * max(self.ratio)))
else: # whole image
w = img.shape[1]
h = img.shape[0]
i = (img.shape[0] - h) // 2
j = (img.shape[1] - w) // 2
return {
"crop_height": h,
"crop_width": w,
"h_start": i * 1.0 / (img.shape[0] - h + 1e-10),
"w_start": j * 1.0 / (img.shape[1] - w + 1e-10),
}
def get_params(self):
return {}
@property
def targets_as_params(self):
return ["image"]
def get_transform_init_args_names(self):
return "height", "width", "scale", "ratio", "interpolation"
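The sampling loop in get_params_dependent_on_targets draws an area fraction uniformly and an aspect ratio log-uniformly, then retries up to ten times if the resulting box does not fit. A minimal standalone sketch of that loop follows. `sample_crop_size` is a hypothetical name, and the fallback to a central crop is left to the caller.

```python
import math
import random


def sample_crop_size(img_h, img_w, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), attempts=10):
    """Return (crop_h, crop_w), or None if no attempt fits inside the image."""
    area = img_h * img_w
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = random.uniform(*scale) * area
        # Uniform in log space, so a ratio r and its inverse 1/r are equally likely.
        aspect_ratio = math.exp(random.uniform(*log_ratio))  # w / h
        w = int(round(math.sqrt(target_area * aspect_ratio)))
        h = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < w <= img_w and 0 < h <= img_h:
            return h, w
    return None  # the source above falls back to a central crop here


random.seed(0)
print(sample_crop_size(480, 640))
```

Since w * h is approximately target_area and w / h is approximately aspect_ratio, the two square roots recover the width and height from the sampled area and ratio.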
Function: Rotate the image by 90 degrees zero or more times, i.e. randomly rotate the original image by 0°, 90°, 180°, or 270°.
class RandomRotate90(DualTransform):
"""Randomly rotate the input by 90 degrees zero or more times.
Args:
p (float): probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes, keypoints
"""
def apply(self, img, factor=0, **params):
"""
Args:
factor (int): number of times the input will be rotated by 90 degrees.
"""
return np.ascontiguousarray(np.rot90(img, factor))
def get_params(self):
# Random int in the range [0, 3]
return {"factor": random.randint(0, 3)}
def apply_to_bbox(self, bbox, factor=0, **params):
return F.bbox_rot90(bbox, factor, **params)
def apply_to_keypoint(self, keypoint, factor=0, **params):
return F.keypoint_rot90(keypoint, factor, **params)
def get_transform_init_args_names(self):
return ()
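The effect of each factor value is easy to check on a tiny array. The sketch below applies the same np.rot90 call used in apply to a 2x2 array and to a non-square array. The variable names are only for illustration.

```python
import numpy as np

image = np.array([[1, 2],
                  [3, 4]])

# np.rot90 rotates counterclockwise; factor is the number of quarter turns.
rotations = {
    factor: np.ascontiguousarray(np.rot90(image, factor)).tolist()
    for factor in range(4)
}
for factor, rotated in rotations.items():
    print(factor, rotated)

# For a non-square input, odd factors swap height and width.
wide = np.zeros((2, 3))
print(np.rot90(wide, 1).shape)
```

np.rot90 returns a view with changed strides, which is presumably why the source wraps it in np.ascontiguousarray before returning the result.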