Detailed Explanation of Albumentations Image Augmentation (continuously updated)

A catalog of Albumentations augmentation methods.

Preface

This article aims to explain in detail how to use each augmentation method, to understand the meaning and valid range of its parameters from the source code, and to build an intuition, through visualized results, for what each method does and how different parameter values affect the augmented image.
According to the official website, all augmentation methods fall into two major categories: pixel-level transforms and spatial-level transforms. The difference between the two is whether the method changes the image's additional targets (such as masks, bounding boxes, and keypoints): pixel-level transforms do not, spatial-level transforms do. The spatial-level transforms have an overview table recording which additional targets each method affects. The methods in each category are sorted alphabetically for easy lookup.

Release notes

This article was originally written against Albumentations version 1.3.0. v1.3 contains major changes compared to earlier versions (new transform methods, restructured module directories, etc.). It is recommended to upgrade to version 1.3.0 or later; otherwise some calls may fail or import paths may be wrong. Some transforms covered here only exist in version 1.3.0 or later; if a function cannot be called, upgrade first.
Upgrade Albumentations: pip install -U albumentations
Code in this article assumes import albumentations as A, so A.transformXX is equivalent to albumentations.transformXX.
If you find any errors, please point them out in the comments.

Further reading

Official code repository: https://github.com/albumentations-team/albumentations
Official documentation: https://albumentations.readthedocs.io/
Partial visualizations: albumentations data augmentation methods (note that the VerticalFlip and HorizontalFlip results are swapped in that article)

Road-scene image augmentation: https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library
Albumentations already includes some of these implementations: RandomRain, RandomFog, RandomSunFlare, RandomShadow, RandomSnow.


Basic usage

Notes

  1. When calling a transform, note the default parameter p, which is mostly p = 0.5 and occasionally p = 1.
    This parameter is the probability of applying the transform and will not appear in later parameter descriptions.
    View the base initialization parameters: get_base_init_args()
    View the transform initialization parameters: get_transform_init_args()
  2. Many parameters accept either a single number or a two-number range. Range inputs are generally sampled uniformly at random within the range. A single-number input is sometimes converted to a default range (e.g. the ColorJitter parameters, explained in detail below) and sometimes used directly (e.g. the Spatter parameters); pay attention to the distinction.
  3. The apply method of each transform is the core. The __init__ method preprocesses the input parameters, e.g. converting a single number into a range parameter and checking that parameters lie within the valid range.
  4. The get_params() method cannot be called on its own to trace the parameters behind a result image, because a separate call to get_params() performs fresh random sampling.
    If you want to fix the parameters, set both bounds of the input range to the same value, so random sampling can only return that value (see the sketch after this list).
  5. Bounding boxes use normalized coordinates (x/width, y/height): float values, not absolute integers.
  6. Many methods involve padding at the image border and the border_mode parameter; see:
    OpenCV filtering: copyMakeBorder and borderInterpolate
    OpenCV image processing | 1.16 Convolution border handling
    OpenCV: extending image borders
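
As a sketch of note 4, collapsing each range to a single point makes the sampled parameters deterministic (shown here with the Sharpen transform used in the demo below):

# demo code: pin the sampled parameters by collapsing each range
import albumentations as A

transform = A.Sharpen(alpha=(0.3, 0.3), lightness=(0.8, 0.8), p=1)  # alpha is always 0.3, lightness always 0.8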

Demo code

Demo code for calling an augmentation method, using Sharpen as an example:

```python
import cv2
import albumentations as A

if __name__ == "__main__":
    filename = 'src'
    src_img = cv2.imread(f'imgs/{filename}.jpg')
    dst_path = f'imgs/{filename}_aug.jpg'
    transform = A.Sharpen(alpha=(0.2, 0.5), lightness=(0.5, 1.0), p=0.5)
    img_aug = transform(image=src_img)['image']
    cv2.imwrite(dst_path, img_aug)
```

to_tuple()

This function is often used to convert a single input parameter into a range parameter.
Note that the low parameter provides the fill value for the other bound.

Examples:
self.blur_limit = to_tuple(1, 3) # self.blur_limit = (1, 3)
self.blur_limit = to_tuple(5, 3) # self.blur_limit = (3, 5)
# source code
def to_tuple(param, low=None, bias=None):
    """Convert input argument to min-max tuple
    Args:
        param (scalar, tuple or list of 2+ elements): Input value.
            If value is scalar, return value would be (offset - value, offset + value).
            If value is tuple, return value would be value + offset (broadcasted).
        low:  Second element of tuple can be passed as optional argument
        bias: An offset factor added to each element
    """
    if low is not None and bias is not None:
        raise ValueError("Arguments low and bias are mutually exclusive")

    if param is None:
        return param

    if isinstance(param, (int, float)):
        if low is None:
            param = -param, +param
        else:
            param = (low, param) if low < param else (param, low)
    elif isinstance(param, Sequence):
        if len(param) != 2:
            raise ValueError("to_tuple expects 1 or 2 values")
        param = tuple(param)
    else:
        raise ValueError("Argument param must be either scalar (int, float) or tuple")

    if bias is not None:
        return tuple(bias + x for x in param)

    return tuple(param)

Getting the default base initialization parameters

Method: get_base_init_args()
Returns two parameters: always_apply and p.

# source code
    def get_base_init_args(self) -> Dict[str, Any]:
        return {"always_apply": self.always_apply, "p": self.p}
# demo code
transform1 = A.Emboss()
print(transform1.get_base_init_args())
# output
# {'always_apply': False, 'p': 0.5}

transform1 = A.Emboss(p=1)
print(transform1.get_base_init_args())
# output
# {'always_apply': False, 'p': 1}

Getting the default transform initialization parameters

Method: get_transform_init_args() returns the transform parameters other than the base parameters always_apply and p.
Note: before calling this function, the transform must implement the get_transform_init_args_names() method to specify which transform parameters to return, because the BasicTransform class does not implement it.

# source code from class Emboss(ImageOnlyTransform)
    def get_transform_init_args_names(self): # if the transform does not implement this method, implement it first
        return ("alpha", "strength")
    def get_transform_init_args(self) -> Dict[str, Any]:
        return {k: getattr(self, k) for k in self.get_transform_init_args_names()}
# demo code
transform1 = A.Emboss()
print(transform1.get_transform_init_args())
# output
# {'alpha': (0.2, 0.5), 'strength': (0.2, 0.7)}

transform1 = A.Emboss(alpha=(0.1, 0.5))
print(transform1.get_transform_init_args())
# output
# {'alpha': (0.1, 0.5), 'strength': (0.2, 0.7)}

Getting the random parameters

Method: get_params_dependent_on_targets()
BasicTransform does not implement this method; refer to the ChannelShuffle() implementation below to return the parameters you want to inspect.
Note: this function cannot be called separately to inspect the parameters behind a result image, because the random numbers change on a separate call.

# ChannelShuffle.get_params_dependent_on_targets
    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        ch_arr = list(range(img.shape[2]))
        random.shuffle(ch_arr)
        return {"channels_shuffled": ch_arr}
# demo code
# inspect the randomly generated channels_shuffled parameter of ChannelShuffle
param = A.ChannelShuffle().get_params_dependent_on_targets(
         dict(image=src_img))['channels_shuffled']

Pixel-level transforms

Pixel-level transforms will change just an input image and will leave any additional targets such as masks, bounding boxes, and keypoints unchanged.

The pixel-level transforms are listed below:

AdvancedBlur

Function: blur the input image using a generalized normal filter with randomly selected parameters.
Parameter description:
ScaleFloatType = Union[float, Tuple[float, float]]
ScaleIntType = Union[int, Tuple[int, int]]
Of the parameters below, only blur_limit and rotate_limit are ScaleIntType; the rest are ScaleFloatType. Each accepts a single number or a range; a single-number input is automatically converted to a range by internal logic, and the applied value is sampled uniformly at random from the range.

  • blur_limit: maximum Gaussian kernel size for blurring the image. Must be 0 or a positive odd number. Default: (3, 7).
    If 0, it is computed from the sigma parameter: round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1
  • sigmaX_limit: standard deviation of the Gaussian kernel in the X direction. Must be 0 or positive. Default: (0.2, 1.0).
    If 0, it is computed from the ksize parameter: sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8
    If a positive single number, it is converted to the range (0, sigma_limit) and sampled within that range.
  • sigmaY_limit: standard deviation of the Gaussian kernel in the Y direction; same rules as sigmaX_limit.
  • rotate_limit: range for rotating the Gaussian kernel. A single-integer input is converted to (-rotate_limit, rotate_limit). Default: (-90, 90).
  • beta_limit: shape parameter of the distribution; 1 is the normal distribution. Default: (0.5, 8.0).
  • noise_limit: multiplicative factor controlling the noise strength. Must be positive, preferably around 1.0. A single number is converted to the range (0, noise_limit). Default: (0.75, 1.25).
    Note: blur_limit and sigmaX_limit (sigmaY_limit) are computationally interdependent, and they must not both be 0 at the same time!
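
A minimal call sketch, assuming the signature shown in the source below (the image path is a placeholder):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path
transform = A.AdvancedBlur(blur_limit=(3, 7), sigmaX_limit=(0.2, 1.0),
                           sigmaY_limit=(0.2, 1.0), rotate_limit=45,
                           beta_limit=(0.5, 8.0), noise_limit=(0.9, 1.1), p=1)
img_aug = transform(image=src_img)['image']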
# source code
class AdvancedBlur(ImageOnlyTransform):
    """Blur the input image using a Generalized Normal filter with a randomly selected parameters.
        This transform also adds multiplicative noise to generated kernel before convolution.

    Args:
        blur_limit: maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigmaX_limit: Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigmaX_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        sigmaY_limit: Same as `sigmaX_limit` for another dimension.
        rotate_limit: Range from which a random angle used to rotate Gaussian kernel is picked.
            If limit is a single int an angle is picked from (-rotate_limit, rotate_limit). Default: (-90, 90).
        beta_limit: Distribution shape parameter, 1 is the normal distribution. Values below 1.0 make distribution
            tails heavier than normal, values above 1.0 make it lighter than normal. Default: (0.5, 8.0).
        noise_limit: Multiplicative factor that control strength of kernel noise. Must be positive and preferably
            centered around 1.0. If set single value `noise_limit` will be in range (0, noise_limit).
            Default: (0.75, 1.25).
        p (float): probability of applying the transform. Default: 0.5.

    Reference:
        https://arxiv.org/abs/2107.10833

    Targets:
        image
    Image types:
        uint8, float32
    """

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigmaX_limit: ScaleFloatType = (0.2, 1.0),
        sigmaY_limit: ScaleFloatType = (0.2, 1.0),
        rotate_limit: ScaleIntType = 90,
        beta_limit: ScaleFloatType = (0.5, 8.0),
        noise_limit: ScaleFloatType = (0.9, 1.1),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = to_tuple(blur_limit, 3)
        self.sigmaX_limit = self.__check_values(to_tuple(sigmaX_limit, 0.0), name="sigmaX_limit")
        self.sigmaY_limit = self.__check_values(to_tuple(sigmaY_limit, 0.0), name="sigmaY_limit")
        self.rotate_limit = to_tuple(rotate_limit)
        self.beta_limit = to_tuple(beta_limit, low=0.0)
        self.noise_limit = self.__check_values(to_tuple(noise_limit, 0.0), name="noise_limit")

        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            raise ValueError("AdvancedBlur supports only odd blur limits.")

        if self.sigmaX_limit[0] == 0 and self.sigmaY_limit[0] == 0:
            raise ValueError("sigmaX_limit and sigmaY_limit minimum value can not be both equal to 0.")

        if not (self.beta_limit[0] < 1.0 < self.beta_limit[1]):
            raise ValueError("Beta limit is expected to include 1.0")

    @staticmethod
    def __check_values(
        value: Sequence[float], name: str, bounds: Tuple[float, float] = (0, float("inf"))
    ) -> Sequence[float]:
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{
      
      name} values should be between {
      
      bounds}")
        return value

    def apply(self, img: np.ndarray, kernel: np.ndarray = None, **params) -> np.ndarray:
        return FMain.convolve(img, kernel=kernel)

    def get_params(self) -> Dict[str, np.ndarray]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2)
        sigmaX = random.uniform(*self.sigmaX_limit)
        sigmaY = random.uniform(*self.sigmaY_limit)
        angle = np.deg2rad(random.uniform(*self.rotate_limit))

        # Split into 2 cases to avoid selection of narrow kernels (beta > 1) too often.
        if random.random() < 0.5:
            beta = random.uniform(self.beta_limit[0], 1)
        else:
            beta = random.uniform(1, self.beta_limit[1])

        noise_matrix = random_utils.uniform(self.noise_limit[0], self.noise_limit[1], size=[ksize, ksize])

        # Generate mesh grid centered at zero.
        ax = np.arange(-ksize // 2 + 1.0, ksize // 2 + 1.0)
        # Shape (ksize, ksize, 2)
        grid = np.stack(np.meshgrid(ax, ax), axis=-1)

        # Calculate rotated sigma matrix
        d_matrix = np.array([[sigmaX**2, 0], [0, sigmaY**2]])
        u_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
        sigma_matrix = np.dot(u_matrix, np.dot(d_matrix, u_matrix.T))

        inverse_sigma = np.linalg.inv(sigma_matrix)
        # Described in "Parameter Estimation For Multivariate Generalized Gaussian Distributions"
        kernel = np.exp(-0.5 * np.power(np.sum(np.dot(grid, inverse_sigma) * grid, 2), beta))
        # Add noise
        kernel = kernel * noise_matrix

        # Normalize kernel
        kernel = kernel.astype(np.float32) / np.sum(kernel)
        return {"kernel": kernel}

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str, str, str]:
        return (
            "blur_limit",
            "sigmaX_limit",
            "sigmaY_limit",
            "rotate_limit",
            "beta_limit",
            "noise_limit",
        )

Three result images generated at random with the default parameters. The images are compressed when shown side by side, and the change is hard to see with the naked eye.
[result image]


Blur

Function: blur the image.
Parameter: blur_limit (int, (int, int)): maximum kernel size for blurring the image. Valid range [3, inf), default: (3, 7).
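
For example, a sketch that pins the kernel size at 7 by collapsing blur_limit (see note 4 in the Basic usage section; the image path is a placeholder):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path
transform = A.Blur(blur_limit=(7, 7), p=1)  # ksize is always sampled as 7
img_aug = transform(image=src_img)['image']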

# source code
class Blur(ImageOnlyTransform):
    """Blur the input image using a random-sized kernel.

    Args:
        blur_limit (int, (int, int)): maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
        super().__init__(always_apply, p)
        self.blur_limit = to_tuple(blur_limit, 3)

    def apply(self, img: np.ndarray, ksize: int = 3, **params) -> np.ndarray:
        return F.blur(img, ksize)

    def get_params(self) -> Dict[str, Any]:
        return {"ksize": int(random.choice(np.arange(self.blur_limit[0], self.blur_limit[1] + 1, 2)))}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("blur_limit",)

[result image]


CLAHE

Function: apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to the input image.
Further reading:
Image enhancement - CLAHE
Learning the CLAHE algorithm
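
A minimal call sketch with the defaults from the source below (the image path is a placeholder; CLAHE expects 1- or 3-channel uint8 input):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path
transform = A.CLAHE(clip_limit=4.0, tile_grid_size=(8, 8), p=1)  # clip_limit is sampled from (1, 4)
img_aug = transform(image=src_img)['image']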

# source code
class CLAHE(ImageOnlyTransform):
    """Apply Contrast Limited Adaptive Histogram Equalization to the input image.
    Args:
        clip_limit (float or (float, float)): upper threshold value for contrast limiting.
            If clip_limit is a single float value, the range will be (1, clip_limit). Default: (1, 4).
        tile_grid_size ((int, int)): size of grid for histogram equalization. Default: (8, 8).
        p (float): probability of applying the transform. Default: 0.5.
    Targets:
        image
    Image types:
        uint8
    """

    def __init__(self, clip_limit=4.0, tile_grid_size=(8, 8), always_apply=False, p=0.5):
        super(CLAHE, self).__init__(always_apply, p)
        self.clip_limit = to_tuple(clip_limit, 1)
        self.tile_grid_size = tuple(tile_grid_size)

    def apply(self, img, clip_limit=2, **params):
        if not is_rgb_image(img) and not is_grayscale_image(img):
            raise TypeError("CLAHE transformation expects 1-channel or 3-channel images.")

        return F.clahe(img, clip_limit, self.tile_grid_size)

    def get_params(self):
        return {"clip_limit": random.uniform(self.clip_limit[0], self.clip_limit[1])}

    def get_transform_init_args_names(self):
        return ("clip_limit", "tile_grid_size")

[result image]


ChannelDropout

Function: randomly drop some channels and fill them with a fixed value.
Parameter description:
channel_drop_range (int, int): [min_dropout_channel_num, max_dropout_channel_num] (a closed interval); a number within channel_drop_range is chosen at random as the number of channels to drop. The specific channel IDs to drop are then sampled randomly.
Here min_dropout_channel_num > 0 (single-channel images are not supported) and max_dropout_channel_num < image_channels (all channels cannot be dropped); min_dropout_channel_num may equal max_dropout_channel_num. Default: (1, 1), i.e. one randomly chosen channel is dropped.

fill_value (int, float): pixel value used to fill the dropped channels. Default: 0.
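
A minimal call sketch (the image path is a placeholder; the input must have more channels than max_dropout_channel_num):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path, read as 3-channel BGR
transform = A.ChannelDropout(channel_drop_range=(1, 2), fill_value=0, p=1)  # drop 1 or 2 of the 3 channels
img_aug = transform(image=src_img)['image']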

Detailed explanation of the drop mechanism:

  1. Determine the number of channels to drop

    num_drop_channels = random.randint(channel_drop_range[0], channel_drop_range[1])
    
  2. Randomly select num_drop_channels channel IDs from the image channels

    channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)
    
  3. Fill the selected channels_to_drop with fill_value

    def channel_dropout(img, channels_to_drop, fill_value=0):
        if len(img.shape) == 2 or img.shape[2] == 1:
            raise NotImplementedError("Only one channel. ChannelDropout is not defined.")
    
        img = img.copy()
    
        img[..., channels_to_drop] = fill_value
    
        return img
    

The ChannelDropout source code is as follows:

# source code
class ChannelDropout(ImageOnlyTransform):
    """Randomly Drop Channels in the input Image.

    Args:
        channel_drop_range (int, int): range from which we choose the number of channels to drop.
        fill_value (int, float): pixel value for the dropped channel.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, uint16, unit32, float32
    """

    def __init__(self, channel_drop_range=(1, 1), fill_value=0, always_apply=False, p=0.5):
        super(ChannelDropout, self).__init__(always_apply, p)

        self.channel_drop_range = channel_drop_range

        self.min_channels = channel_drop_range[0]
        self.max_channels = channel_drop_range[1]

        if not 1 <= self.min_channels <= self.max_channels:
            raise ValueError("Invalid channel_drop_range. Got: {}".format(channel_drop_range))

        self.fill_value = fill_value

    def apply(self, img, channels_to_drop=(0,), **params):
        return F.channel_dropout(img, channels_to_drop, self.fill_value)

    def get_params_dependent_on_targets(self, params):
        img = params["image"]

        num_channels = img.shape[-1]

        if len(img.shape) == 2 or num_channels == 1:
            raise NotImplementedError("Images has one channel. ChannelDropout is not defined.")

        if self.max_channels >= num_channels:
            raise ValueError("Can not drop all channels in ChannelDropout.")

        num_drop_channels = random.randint(self.min_channels, self.max_channels)

        channels_to_drop = random.sample(range(num_channels), k=num_drop_channels)

        return {"channels_to_drop": channels_to_drop}

    def get_transform_init_args_names(self):
        return ("channel_drop_range", "fill_value")

    @property
    def targets_as_params(self):
        return ["image"]

[result image]
OpenCV reads images in BGR format. When channels_to_drop = [1], the G channel is dropped and filled with 0, so the green part of the top-right image turns black.
When channels_to_drop = [0], the B channel is dropped and filled with 0, so the blue part of the bottom-left image turns black.
When channels_to_drop = [1, 2], the G and R channels are dropped and filled with 0, so the green and red parts of the bottom-right image turn black; the white background has all three BGR channels, and with G and R set to 0 only the B channel keeps its value of 255, so the background turns blue.


ChannelShuffle

Function: randomly rearrange the channels of the input image (shuffle channels).

# source code
class ChannelShuffle(ImageOnlyTransform):
    """Randomly rearrange channels of the input RGB image.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    @property
    def targets_as_params(self):
        return ["image"]

    def apply(self, img, channels_shuffled=(0, 1, 2), **params):
        return F.channel_shuffle(img, channels_shuffled)

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        ch_arr = list(range(img.shape[2]))
        random.shuffle(ch_arr) # generate a random channel order
        return {"channels_shuffled": ch_arr}

    def get_transform_init_args_names(self):
        return ()
        
####################### F.channel_shuffle
def channel_shuffle(img, channels_shuffled):
    img = img[..., channels_shuffled]
    return img

[result image]
Top right: OpenCV reads images in BGR format; channels_shuffled = [0, 2, 1] swaps the G and R channels, so green and red are exchanged in the figure.
Bottom right: channels_shuffled = [1, 0, 2] swaps the B and G channels, so blue and green are exchanged in the figure.


ColorJitter

Function: randomly change the brightness, contrast, saturation, and hue of the image (all parameters represent the jitter amplitude).

Randomly changes the brightness, contrast, and saturation of an image. Compared to torchvision's ColorJitter, this transform gives slightly different results because Pillow (used in torchvision) and OpenCV (used in Albumentations) convert an image to HSV format with different formulas. Another difference: Pillow uses uint8 overflow, while Albumentations uses value saturation.

Parameters (see the __check_values function in the source code below for details):

  • Parameter initialization:
    brightness, contrast, saturation, hue each accept a single number or a range (float, or tuple/list of float (min, max)).
    A range input (a single-number input is converted internally to a range) must lie within each parameter's valid interval (see the valid intervals and number-to-interval conversion rules below).
    The input requirements per parameter are:
    brightness, contrast, saturation: float ∈ [0, +∞), tuple(list) ⊆ [0, +∞)
    hue: float ∈ [0, 0.5], tuple(list) ⊆ [-0.5, 0.5]
    • Valid intervals:
      brightness, contrast, saturation: [0, +inf)
      hue: [-0.5, 0.5]
    • Internal number-to-interval conversion:
      brightness, contrast, saturation: [max(0, 1 - input_value), 1 + input_value]
      hue: [-input_value, +input_value]

Application (see the get_params function in the source code below for details):

  • Each transform factor is sampled as random.uniform over the processed parameter range.
  • The individual transforms are applied in random order; see the sketch below.
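
A small sketch of the conversion rules above; the resulting intervals follow from the __check_values logic in the source below:

# demo code
import albumentations as A

t = A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1)
print(t.brightness)  # [0.7, 1.3]  = [max(0, 1 - 0.3), 1 + 0.3]
print(t.hue)         # [-0.1, 0.1] = [-input_value, +input_value]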
# source code
class ColorJitter(ImageOnlyTransform):
    """Randomly changes the brightness, contrast, and saturation of an image. Compared to ColorJitter from torchvision,
    this transform gives a little bit different results because Pillow (used in torchvision) and OpenCV (used in
    Albumentations) transform an image to HSV format by different formulas. Another difference - Pillow uses uint8
    overflow, but we use value saturation.

    Args:
        brightness (float or tuple of float (min, max)): How much to jitter brightness.
            brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]
            or the given [min, max]. Should be non negative numbers.
        contrast (float or tuple of float (min, max)): How much to jitter contrast.
            contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]
            or the given [min, max]. Should be non negative numbers.
        saturation (float or tuple of float (min, max)): How much to jitter saturation.
            saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]
            or the given [min, max]. Should be non negative numbers.
        hue (float or tuple of float (min, max)): How much to jitter hue.
            hue_factor is chosen uniformly from [-hue, hue] or the given [min, max].
            Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
    """

    def __init__(
        self,
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        hue=0.2,
        always_apply=False,
        p=0.5,
    ):
        super(ColorJitter, self).__init__(always_apply=always_apply, p=p)

        self.brightness = self.__check_values(brightness, "brightness")
        self.contrast = self.__check_values(contrast, "contrast")
        self.saturation = self.__check_values(saturation, "saturation")
        # the hue parameter uses a different offset and bounds from the three above
        self.hue = self.__check_values(hue, "hue", offset=0, bounds=[-0.5, 0.5], clip=False)

    @staticmethod
    # input handling: parameters must lie within their valid intervals
    def __check_values(value, name, offset=1, bounds=(0, float("inf")), clip=True):
        if isinstance(value, numbers.Number): # internal logic for converting a number to an interval
            if value < 0: # a single-number input must not be negative
                raise ValueError("If {} is a single number, it must be non negative.".format(name))
            value = [offset - value, offset + value]
            if clip: # hue is not clipped; the other three parameters are
                value[0] = max(value[0], 0)
        elif isinstance(value, (tuple, list)) and len(value) == 2:
            if not bounds[0] <= value[0] <= value[1] <= bounds[1]: # a range input must lie within its valid interval
                raise ValueError("{} values should be between {}".format(name, bounds))
        else:
            raise TypeError("{} should be a single number or a list/tuple with length 2.".format(name))

        return value

    def get_params(self):
        brightness = random.uniform(self.brightness[0], self.brightness[1])
        contrast = random.uniform(self.contrast[0], self.contrast[1])
        saturation = random.uniform(self.saturation[0], self.saturation[1])
        hue = random.uniform(self.hue[0], self.hue[1])

        transforms = [
            lambda x: F.adjust_brightness_torchvision(x, brightness),
            lambda x: F.adjust_contrast_torchvision(x, contrast),
            lambda x: F.adjust_saturation_torchvision(x, saturation),
            lambda x: F.adjust_hue_torchvision(x, hue),
        ]
        random.shuffle(transforms) # apply the transforms in random order

        return {"transforms": transforms}

    def apply(self, img, transforms=(), **params):
        if not F.is_rgb_image(img) and not F.is_grayscale_image(img): # only 1-channel and 3-channel images are supported
            raise TypeError("ColorJitter transformation expects 1-channel or 3-channel images.")

        for transform in transforms:
            img = transform(img)
        return img

    def get_transform_init_args_names(self):
        return ("brightness", "contrast", "saturation", "hue")

Note that the factor shown in each result figure below is the parameter passed to the respective adjustment function, not the ColorJitter parameter; the mapping between the two is described in the parameter section above!
Brightness change:
Parameter influence: the larger the factor, the brighter the image, and vice versa.
Logic: clip(img_value * factor)

# contents of F.adjust_brightness_torchvision

def _adjust_brightness_torchvision_uint8(img, factor):
    lut = np.arange(0, 256) * factor
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return cv2.LUT(img, lut)
    
@preserve_shape
def adjust_brightness_torchvision(img, factor):
    if factor == 0:
        return np.zeros_like(img)
    elif factor == 1:
        return img

    if img.dtype == np.uint8:
        return _adjust_brightness_torchvision_uint8(img, factor)

    return clip(img * factor, img.dtype, MAX_VALUES_BY_DTYPE[img.dtype])

[result image]

Contrast change:
Parameter influence: the smaller the factor, the lower the contrast between light and dark in the image; the larger the factor, the higher the contrast.
Logic: clip(img_value * factor + mean * (1 - factor))

# contents of F.adjust_contrast_torchvision

def _adjust_contrast_torchvision_uint8(img, factor, mean):
    lut = np.arange(0, 256) * factor
    lut = lut + mean * (1 - factor)
    lut = clip(lut, img.dtype, 255)

    return cv2.LUT(img, lut)

@preserve_shape
def adjust_contrast_torchvision(img, factor):
    if factor == 1:
        return img

    if is_grayscale_image(img):
        mean = img.mean()
    else:
        mean = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY).mean()

    if factor == 0:
        return np.full_like(img, int(mean + 0.5), dtype=img.dtype)

    if img.dtype == np.uint8:
        return _adjust_contrast_torchvision_uint8(img, factor, mean)

    return clip(
        img.astype(np.float32) * factor + mean * (1 - factor),
        img.dtype,
        MAX_VALUES_BY_DTYPE[img.dtype],
    )

[result image]

Saturation change:
Parameter influence: the smaller the factor, the closer the image is to grayscale; the larger the factor, the more vivid the colors.
Logic: clip(img * factor + gray * (1 - factor)), a weighted blend of the original image and its grayscale version

# contents of F.adjust_saturation_torchvision

@preserve_shape
def adjust_saturation_torchvision(img, factor, gamma=0):
    if factor == 1:
        return img

    if is_grayscale_image(img):
        gray = img
        return gray
    else:
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        gray = cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB) # identical values in all three channels, for the weighted blend with the original below

    if factor == 0:
        return gray

    # cv2.addWeighted: weighted blend of two images
    # result = img * factor + gray * (1 - factor) + gamma
    result = cv2.addWeighted(img, factor, gray, 1 - factor, gamma=gamma)
    if img.dtype == np.uint8:
        return result

    # OpenCV does not clip values for float dtype
    return clip(result, img.dtype, MAX_VALUES_BY_DTYPE[img.dtype])

[result image]

Hue change:
Parameter influence: the larger the factor, the stronger the hue shift; with factor = 0 the hue is unchanged.
Logic: convert the image to HSV color space, apply np.mod(hue_value + factor * 180, 180), then convert back to RGB

# contents of F.adjust_hue_torchvision
def _adjust_hue_torchvision_uint8(img, factor):
    img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)

    lut = np.arange(0, 256, dtype=np.int16)
    lut = np.mod(lut + 180 * factor, 180).astype(np.uint8)
    img[..., 0] = cv2.LUT(img[..., 0], lut)

    return cv2.cvtColor(img, cv2.COLOR_HSV2RGB)

def adjust_hue_torchvision(img, factor):
    if is_grayscale_image(img):
        return img

    if factor == 0:
        return img

    if img.dtype == np.uint8:
        return _adjust_hue_torchvision_uint8(img, factor)

    img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    img[..., 0] = np.mod(img[..., 0] + factor * 360, 360)
    return cv2.cvtColor(img, cv2.COLOR_HSV2RGB)

[result image]

Further reading

What is the difference between contrast and saturation?
Contrast is the ratio between the highest and the lowest luminance; the higher an image's contrast, the more obvious the difference between light and dark. Saturation is the purity of a color: the purer the color, the higher the saturation. For example, pure blue, pure red, and pure green are highly saturated, while gray-blue, pink, and grass-green have low saturation; hence the higher the saturation, the more vivid the image's colors.

Contrast and saturation differ considerably in subject, characteristics, and function, as explained below:

1. Difference in subject
1. Contrast: the ratio between the highest and the lowest luminance. The higher the contrast, the more obvious the difference between light and dark in the image.
2. Saturation: the purity of a color. The higher an image's saturation, the more vivid its colors.

2. Difference in characteristics
1. Contrast: the wider the range of tonal difference in the image, the higher the contrast, and vice versa. At a contrast ratio of 120:1, vivid and rich colors can be displayed; at 300:1, colors at many levels can be supported.
2. Saturation: saturation depends on the proportion of chromatic to achromatic components in a color. The larger the chromatic component, the higher the saturation; the larger the achromatic component, the lower the saturation.

3. Difference in function
1. Contrast: the higher the contrast, the clearer and more striking the image and the more vivid the colors; otherwise the whole image looks gray. High contrast helps image clarity, detail rendition, and grayscale rendition.
2. Saturation: chroma is related to light intensity and its distribution across wavelengths. The highest chroma usually comes from intense single-wavelength light; with the wavelength distribution unchanged, the weaker the light, the lower the chroma.


Defocus

Function: defocus blur of the image.
Parameters: radius > 0, the blur radius. A single number is converted by default to [1, input_value]. Default range: (3, 10).
alias_blur >= 0, the sigma parameter of the Gaussian blur. A single number is converted by default to (0, input_value). Default range: (0.1, 0.5).
Influence: the larger the radius parameter, the stronger the blur. Changes to alias_blur are barely perceptible to the naked eye.
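
A sketch that pins both parameters so the effect is reproducible (the image path is a placeholder):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path
transform = A.Defocus(radius=(9, 9), alias_blur=(0.5, 0.5), p=1)  # fixed radius and alias_blur
img_aug = transform(image=src_img)['image']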

# source code
class Defocus(ImageOnlyTransform):
    """
    Apply defocus transform. See https://arxiv.org/abs/1903.12261.

    Args: 
        radius ((int, int) or int): range for radius of defocusing.
            If limit is a single int, the range will be [1, limit]. Default: (3, 10).
        alias_blur ((float, float) or float): range for alias_blur of defocusing (sigma of gaussian blur).
            If limit is a single float, the range will be (0, limit). Default: (0.1, 0.5).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        Any
    """

    def __init__(
        self,
        radius: ScaleIntType = (3, 10),
        alias_blur: ScaleFloatType = (0.1, 0.5),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.radius = to_tuple(radius, low=1)
        self.alias_blur = to_tuple(alias_blur, low=0)

        if self.radius[0] <= 0:
            raise ValueError("Parameter radius must be positive")

        if self.alias_blur[0] < 0:
            raise ValueError("Parameter alias_blur must be non-negative")

    def apply(self, img: np.ndarray, radius: int = 3, alias_blur: float = 0.5, **params) -> np.ndarray:
        return F.defocus(img, radius, alias_blur)

    def get_params(self) -> Dict[str, Any]:
        return {
            "radius": random_utils.randint(self.radius[0], self.radius[1] + 1),
            "alias_blur": random_utils.uniform(self.alias_blur[0], self.alias_blur[1]),
        }

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("radius", "alias_blur")

radius parameter changes:
[result image]
alias_blur parameter changes:
[result image]


Downscale

Function: reduce image quality by downscaling and then upscaling back. The image size is unchanged by the transform.
Parameters: 0 < scale_min <= scale_max < 1, the image scaling factor; equivalent to the scale parameter of a resize function.
interpolation specifies the scaling method; the default is nearest neighbor: cv2.INTER_NEAREST. It can be specified in three ways; see the argument description in the source code below.

# interpolation parameter examples:
# Option 1: use NEAREST for both downscaling and upscaling
interpolation = cv2.INTER_NEAREST 
# Option 2: nearest-neighbor for downscaling, bilinear for upscaling
interpolation = dict(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_LINEAR) 
# Option 3: AREA for downscaling, CUBIC for upscaling
interpolation = Downscale.Interpolation(downscale=cv2.INTER_AREA, upscale=cv2.INTER_CUBIC) 

Interpolation options:
INTER_NEAREST: nearest-neighbor interpolation
INTER_LINEAR: bilinear interpolation (OpenCV's default)
INTER_AREA: resampling using pixel-area relations. It is usually the preferred method for downscaling, since it gives moiré-free results; for upscaling it behaves like INTER_NEAREST.
INTER_CUBIC: bicubic interpolation over a 4x4 pixel neighborhood
INTER_LANCZOS4: Lanczos interpolation over an 8x8 pixel neighborhood

# source code
class Downscale(ImageOnlyTransform):
    """Decreases image quality by downscaling and upscaling back.

    Args:
        scale_min (float): lower bound on the image scale. Should be < 1.
        scale_max (float): upper bound on the image scale. Should be < 1.
        interpolation: cv2 interpolation method. Could be:
            - single cv2 interpolation flag - selected method will be used for downscale and upscale.
            - dict(downscale=flag, upscale=flag)
            - Downscale.Interpolation(downscale=flag, upscale=flag) -
            Default: Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)

    Targets:
        image

    Image types:
        uint8, float32
    """

    class Interpolation:
        def __init__(self, *, downscale: int = cv2.INTER_NEAREST, upscale: int = cv2.INTER_NEAREST):
            self.downscale = downscale
            self.upscale = upscale

    def __init__(
        self,
        scale_min: float = 0.25,
        scale_max: float = 0.25,
        interpolation: Optional[Union[int, Interpolation, Dict[str, int]]] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(Downscale, self).__init__(always_apply, p)
        if interpolation is None:
            self.interpolation = self.Interpolation(downscale=cv2.INTER_NEAREST, upscale=cv2.INTER_NEAREST)
            warnings.warn(
                "Using default interpolation INTER_NEAREST, which is sub-optimal."
                "Please specify interpolation mode for downscale and upscale explicitly."
                "For additional information see this PR https://github.com/albumentations-team/albumentations/pull/584"
            )
        elif isinstance(interpolation, int):
            self.interpolation = self.Interpolation(downscale=interpolation, upscale=interpolation)
        elif isinstance(interpolation, self.Interpolation):
            self.interpolation = interpolation
        elif isinstance(interpolation, dict):
            self.interpolation = self.Interpolation(**interpolation)
        else:
            raise ValueError(
                "Wrong interpolation data type. Supported types: `Optional[Union[int, Interpolation, Dict[str, int]]]`."
                f" Got: {type(interpolation)}"
            )

        if scale_min > scale_max:
            raise ValueError("Expected scale_min be less or equal scale_max, got {} {}".format(scale_min, scale_max))
        if scale_max >= 1:
            raise ValueError("Expected scale_max to be less than 1, got {}".format(scale_max))
        self.scale_min = scale_min
        self.scale_max = scale_max

    def apply(self, img: np.ndarray, scale: Optional[float] = None, **params) -> np.ndarray:
        return F.downscale(
            img,
            scale=scale,
            down_interpolation=self.interpolation.downscale,
            up_interpolation=self.interpolation.upscale,
        )

    def get_params(self) -> Dict[str, Any]:
        return {"scale": random.uniform(self.scale_min, self.scale_max)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return "scale_min", "scale_max"

    def _to_dict(self) -> Dict[str, Any]:
        result = super()._to_dict()
        result["interpolation"] = {
    
    "upscale": self.interpolation.upscale, "downscale": self.interpolation.downscale}
        return result

For easier visualization, scale is set to 0.1. Below are the results of specifying the interpolation methods in the three different ways:
[result image]

# demo code
import cv2
import matplotlib.pyplot as plt
import albumentations as A

if __name__ == "__main__":
    filename = '0'
    title_key = 'scale_method'
    src_img = cv2.imread(f'imgs/{filename}.jpg')
    dst_path = f'imgs/{filename}_aug.jpg'
    transform1 = A.Downscale(scale_min=0.1,
                             scale_max=0.1,
                             interpolation=cv2.INTER_NEAREST,
                             p=1)
    transform2 = A.Downscale(scale_min=0.1,
                             scale_max=0.1,
                             interpolation=dict(downscale=cv2.INTER_LINEAR,
                                                upscale=cv2.INTER_LINEAR),
                             p=1)
    transform3 = A.Downscale(scale_min=0.1,
                             scale_max=0.1,
                             interpolation=A.Downscale.Interpolation(
                                 downscale=cv2.INTER_AREA,
                                 upscale=cv2.INTER_AREA),
                             p=1)
    img_aug1 = transform1(image=src_img)['image']
    img_aug2 = transform2(image=src_img)['image']
    img_aug3 = transform3(image=src_img)['image']

    param1 = 'INTER_NEAREST'
    param2 = 'INTER_LINEAR'
    param3 = 'INTER_AREA'

    fontsize = 10
    plt.subplot(221)
    plt.axis('off')
    plt.title('src', fontdict={'fontsize': fontsize})
    plt.imshow(src_img[:, :, ::-1])
    plt.subplot(222)
    plt.axis('off')
    plt.title(f'{title_key}={param1}', fontdict={'fontsize': fontsize})
    plt.imshow(img_aug1[:, :, ::-1])
    plt.subplot(223)
    plt.axis('off')
    plt.title(f'{title_key}={param2}', fontdict={'fontsize': fontsize})
    plt.imshow(img_aug2[:, :, ::-1])
    plt.subplot(224)
    plt.axis('off')
    plt.title(f'{title_key}={param3}', fontdict={'fontsize': fontsize})
    plt.imshow(img_aug3[:, :, ::-1])
    plt.savefig(dst_path)

Emboss

Function: emboss the image and overlay the result on the original.
Parameter description:
alpha ((float, float)): controls the visibility of the embossed image. At 0 only the original image is kept; at 1.0 only the embossed image is kept.
result = (1 - alpha) * src_image + alpha * emboss_image
strength ((float, float)): strength range of the embossing.
The alpha parameter has a stronger impact than the strength parameter.
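
A sketch that pins alpha high and strength low to make the alpha effect visible (the image path is a placeholder):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path
transform = A.Emboss(alpha=(0.5, 0.5), strength=(0.2, 0.2), p=1)  # result = 0.5*src + 0.5*embossed
img_aug = transform(image=src_img)['image']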

# source code
class Emboss(ImageOnlyTransform):
    """Emboss the input image and overlays the result with the original image.

    Args:
        alpha ((float, float)): range to choose the visibility of the embossed image. At 0, only the original image is
            visible,at 1.0 only its embossed version is visible. Default: (0.2, 0.5).
        strength ((float, float)): strength range of the embossing. Default: (0.2, 0.7).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image
    """

    def __init__(self, alpha=(0.2, 0.5), strength=(0.2, 0.7), always_apply=False, p=0.5):
        super(Emboss, self).__init__(always_apply, p)
        self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
        self.strength = self.__check_values(to_tuple(strength, 0.0), name="strength")

    @staticmethod
    def __check_values(value, name, bounds=(0, float("inf"))):
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError("{} values should be between {}".format(name, bounds))
        return value

    @staticmethod
    def __generate_emboss_matrix(alpha_sample, strength_sample):
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float32)
        matrix_effect = np.array(
            [
                [-1 - strength_sample, 0 - strength_sample, 0],
                [0 - strength_sample, 1, 0 + strength_sample],
                [0, 0 + strength_sample, 1 + strength_sample],
            ],
            dtype=np.float32,
        )
        matrix = (1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect
        return matrix

    def get_params(self):
        alpha = random.uniform(*self.alpha)
        strength = random.uniform(*self.strength)
        emboss_matrix = self.__generate_emboss_matrix(alpha_sample=alpha, strength_sample=strength)
        return {"emboss_matrix": emboss_matrix}

    def apply(self, img, emboss_matrix=None, **params):
        return F.convolve(img, emboss_matrix) # convolution

    def get_transform_init_args_names(self):
        return ("alpha", "strength")

Below is a comparison of visualization results; the effect of the alpha parameter is more obvious than that of strength.
[result image]


Equalize

Function: histogram equalization.
Parameter description: mode (str): {'cv', 'pil'}; choose the OpenCV or the Pillow equalization method.
by_channels (bool): if True, perform histogram equalization on each channel separately; if False, convert the image to YCbCr format and equalize only the Y channel. Default: True.
mask (np.ndarray, callable): if given, only the pixels covered by the mask are included in the analysis.
mask_params (list of str): parameters for the mask function.

Note: with by_channels set to False the result is more natural and the hue shift is smaller.
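
A minimal call sketch of the more natural Y-channel variant (the image path is a placeholder; uint8 input):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path
transform = A.Equalize(mode='cv', by_channels=False, p=1)  # equalize only the Y channel of YCbCr
img_aug = transform(image=src_img)['image']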

# source code
class Equalize(ImageOnlyTransform):
    """Equalize the image histogram.

    Args:
        mode (str): {'cv', 'pil'}. Use OpenCV or Pillow equalization method.
        by_channels (bool): If True, use equalization by channels separately,
            else convert image to YCbCr representation and use equalization by `Y` channel.
        mask (np.ndarray, callable): If given, only the pixels selected by
            the mask are included in the analysis. Maybe 1 channel or 3 channel array or callable.
            Function signature must include `image` argument.
        mask_params (list of str): Params for mask function.

    Targets:
        image

    Image types:
        uint8
    """

    def __init__(
            self,
            mode="cv",
            by_channels=True,
            mask=None,
            mask_params=(),
            always_apply=False,
            p=0.5,
    ):
        modes = ["cv", "pil"]
        if mode not in modes:
            raise ValueError("Unsupported equalization mode. Supports: {}. "
                             "Got: {}".format(modes, mode))

        super(Equalize, self).__init__(always_apply, p)
        self.mode = mode
        self.by_channels = by_channels
        self.mask = mask
        self.mask_params = mask_params

    def apply(self, image, mask=None, **params):
        return F.equalize(image,
                          mode=self.mode,
                          by_channels=self.by_channels,
                          mask=mask)

    def get_params_dependent_on_targets(self, params):
        if not callable(self.mask):
            return {"mask": self.mask}

        return {"mask": self.mask(**params)}

    @property
    def targets_as_params(self):
        return ["image"] + list(self.mask_params)

    def get_transform_init_args_names(self):
        return ("mode", "by_channels")

[result images]


FDA

Function: Fourier Domain Adaptation from https://github.com/YanchaoYang/FDA, a simple style transfer.
Parameter description:
reference_images (List[str] or List[np.ndarray]): a list of reference images or of image file paths. If several reference images are provided (list length greater than 1), one image's style is chosen at random for the transform.
beta_limit (float or tuple of float): the beta coefficient from the paper; recommended below 0.3. Default: 0.1.
read_fn (Callable): callable that reads an image and returns a numpy array. Default: read_rgb_image.

# default image-reading function; the corresponding reference_images parameter should be a list of paths:
def read_rgb_image(path):
    image = cv2.imread(path, cv2.IMREAD_COLOR)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# if the reference images are already numpy arrays, read_fn can read them through unchanged (lambda x: x):
target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
aug = A.FDA([target_image], read_fn=lambda x: x)

[result image]

class FDA(ImageOnlyTransform):
    """
    Fourier Domain Adaptation from https://github.com/YanchaoYang/FDA
    Simple "style transfer".

    Args:
        reference_images (List[str] or List(np.ndarray)): List of file paths for reference images
            or list of reference images.
        beta_limit (float or tuple of float): coefficient beta from paper. Recommended less 0.3.
        read_fn (Callable): Used-defined function to read image. Function should get image path and return numpy
            array of image pixels.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
        https://github.com/YanchaoYang/FDA
        https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_FDA_Fourier_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2020_paper.pdf

    Example:
        >>> import numpy as np
        >>> import albumentations as A
        >>> image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
        >>> aug = A.Compose([A.FDA([target_image], p=1, read_fn=lambda x: x)])
        >>> result = aug(image=image)

    """

    def __init__(
        self,
        reference_images: List[Union[str, np.ndarray]],
        beta_limit=0.1,
        read_fn=read_rgb_image,
        always_apply=False,
        p=0.5,
    ):
        super(FDA, self).__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.beta_limit = to_tuple(beta_limit, low=0)

    def apply(self, img, target_image=None, beta=0.1, **params):
        return fourier_domain_adaptation(img=img, target_img=target_image, beta=beta)

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        target_img = self.read_fn(random.choice(self.reference_images))
        target_img = cv2.resize(target_img, dsize=(img.shape[1], img.shape[0]))

        return {"target_image": target_img}

    def get_params(self):
        return {"beta": random.uniform(self.beta_limit[0], self.beta_limit[1])}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return ("reference_images", "beta_limit", "read_fn")

    def _to_dict(self):
        raise NotImplementedError("FDA can not be serialized.")

Results of running with real images (beta_limit=0.1):
[result image]
Results from the official project:
[result image]


FancyPCA

Function: color augmentation of RGB images via FancyPCA, which introduces little color distortion.
Parameter description:
alpha (float): the degree of perturbation of the eigenvalues and eigenvectors.
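
A minimal call sketch (the image path is a placeholder; FancyPCA only supports 3-channel uint8 images):

# demo code
import cv2
import albumentations as A

src_img = cv2.imread('imgs/src.jpg')  # placeholder path
transform = A.FancyPCA(alpha=0.1, p=1)  # perturbation scale is sampled from N(0, 0.1)
img_aug = transform(image=src_img)['image']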

class FancyPCA(ImageOnlyTransform):
    """Augment RGB image using FancyPCA from Krizhevsky's paper
    "ImageNet Classification with Deep Convolutional Neural Networks"

    Args:
        alpha (float):  how much to perturb/scale the eigen vecs and vals.
            scale is samples from gaussian distribution (mu=0, sigma=alpha)

    Targets:
        image

    Image types:
        3-channel uint8 images only

    Credit:
        http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
        https://deshanadesai.github.io/notes/Fancy-PCA-with-Scikit-Image
        https://pixelatedbrian.github.io/2018-04-29-fancy_pca/
    """

    def __init__(self, alpha=0.1, always_apply=False, p=0.5):
        super(FancyPCA, self).__init__(always_apply=always_apply, p=p)
        self.alpha = alpha

    def apply(self, img, alpha=0.1, **params):
        img = F.fancy_pca(img, alpha)
        return img

    def get_params(self):
        return {"alpha": random.gauss(0, self.alpha)}

    def get_transform_init_args_names(self):
        return ("alpha", )

Attached is the visualization result from the official site: https://pixelatedbrian.github.io/2018-04-29-fancy_pca/
Below are the transform results for three scenes: the middle column is the FancyPCA result, with very little color distortion.


Function: multiply pixel values by the maximum value, converting a floating-point image to an integer one.

The opposite transform is ToFloat, which divides by the maximum value and converts integers to floats in [0, 1.0].

# source code
class FromFloat(ImageOnlyTransform):
    """Take an input array where all values should lie in the range [0, 1.0], multiply them by `max_value` and then
    cast the resulted value to a type specified by `dtype`. If `max_value` is None the transform will try to infer
    the maximum value for the data type from the `dtype` argument.

    This is the inverse transform for :class:`~albumentations.augmentations.transforms.ToFloat`.

    Args:
        max_value (float): maximum possible input value. Default: None.
        dtype (string or numpy data type): data type of the output. See the `'Data types' page from the NumPy docs`_.
            Default: 'uint16'.
        p (float): probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        float32

    .. _'Data types' page from the NumPy docs:
       https://docs.scipy.org/doc/numpy/user/basics.types.html
    """

    def __init__(self, dtype="uint16", max_value=None, always_apply=False, p=1.0):
        super(FromFloat, self).__init__(always_apply, p)
        self.dtype = np.dtype(dtype)
        self.max_value = max_value

    def apply(self, img, **params):
        return F.from_float(img, self.dtype, self.max_value)

    def get_transform_init_args(self):
        return {"dtype": self.dtype.name, "max_value": self.max_value}
# F.from_float()
def from_float(img, dtype, max_value=None):
    if max_value is None:
        try:
            max_value = MAX_VALUES_BY_DTYPE[dtype]
        except KeyError:
            raise RuntimeError(
                "Can't infer the maximum value for dtype {}. You need to specify the maximum value manually by "
                "passing the max_value argument".format(dtype)
            )
    return (img * max_value).astype(dtype)
# MAX_VALUES_BY_DTYPE = {
#     np.dtype("uint8"): 255,
#     np.dtype("uint16"): 65535,
#     np.dtype("uint32"): 4294967295,
#     np.dtype("float32"): 1.0,
# }
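A round-trip sketch pairing ToFloat with FromFloat (dummy image; the dtype values are illustrative):

# usage sketch: ToFloat <-> FromFloat round trip
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
as_float = A.ToFloat(max_value=None, p=1)(image=image)["image"]      # uint8 -> float32 in [0, 1]
back = A.FromFloat(dtype="uint8", p=1)(image=as_float)["image"]      # float32 -> uint8
print(as_float.dtype, back.dtype)  # float32 uint8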


Function: add Gaussian noise.
Parameter description:
var_limit ((float, float) or float): variance range of the noise. If a single float, it is converted to the range (0, var_limit). Default: (10.0, 50.0).
mean (float): mean of the noise. Default: 0.
per_channel (bool): whether noise is sampled independently for each channel. Default: True.

# source code
class GaussNoise(ImageOnlyTransform):
    """Apply gaussian noise to the input image.

    Args:
        var_limit ((float, float) or float): variance range for noise. If var_limit is a single float, the range
            will be (0, var_limit). Default: (10.0, 50.0).
        mean (float): mean of the noise. Default: 0
        per_channel (bool): if set to True, noise will be sampled for each channel independently.
            Otherwise, the noise will be sampled once for all channels. Default: True
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(self, var_limit=(10.0, 50.0), mean=0, per_channel=True, always_apply=False, p=0.5):
        super(GaussNoise, self).__init__(always_apply, p)
        if isinstance(var_limit, (tuple, list)):
            if var_limit[0] < 0:
                raise ValueError("Lower var_limit should be non negative.")
            if var_limit[1] < 0:
                raise ValueError("Upper var_limit should be non negative.")
            self.var_limit = var_limit
        elif isinstance(var_limit, (int, float)):
            if var_limit < 0:
                raise ValueError("var_limit should be non negative.")

            self.var_limit = (0, var_limit)
        else:
            raise TypeError(
                "Expected var_limit type to be one of (int, float, tuple, list), got {}".format(type(var_limit))
            )

        self.mean = mean
        self.per_channel = per_channel

    def apply(self, img, gauss=None, **params):
        return F.gauss_noise(img, gauss=gauss)

    def get_params_dependent_on_targets(self, params):
        image = params["image"]
        var = random.uniform(self.var_limit[0], self.var_limit[1])
        sigma = var ** 0.5
        random_state = np.random.RandomState(random.randint(0, 2 ** 32 - 1))

        if self.per_channel:
            gauss = random_state.normal(self.mean, sigma, image.shape)
        else:
            gauss = random_state.normal(self.mean, sigma, image.shape[:2])
            if len(image.shape) == 3:
                gauss = np.expand_dims(gauss, -1)

        return {"gauss": gauss}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return ("var_limit", "per_channel", "mean")

The larger var_limit is, the more visible the noise.


Function: blur the image with a Gaussian filter.
Parameter description:

  • blur_limit (int, (int, int)): maximum size of the Gaussian kernel used for blurring. Must be 0 or an odd number; valid range: [0, inf).
    If set to 0, the kernel size is computed from sigma as round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1.
    If a single value is given, it is converted to the range (0, blur_limit) and the size is sampled at random within it.
    Default: (3, 7).
  • sigma_limit (float, (float, float)): standard deviation of the Gaussian kernel; valid range: [0, inf).
    If a single value is given, it is converted to the range (0, sigma_limit) and sampled at random within it.
    If set to 0, sigma is computed from ksize as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8.
    Default: 0.
    If the minimum values of blur_limit and sigma_limit are both 0, the blur_limit minimum is changed to 3.
# source code
class GaussianBlur(ImageOnlyTransform):
    """Blur the input image using a Gaussian filter with a random kernel size.

    Args:
        blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        blur_limit: ScaleIntType = (3, 7),
        sigma_limit: ScaleFloatType = 0,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.blur_limit = to_tuple(blur_limit, 0)
        self.sigma_limit = to_tuple(sigma_limit if sigma_limit is not None else 0, 0)

        if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
            self.blur_limit = 3, max(3, self.blur_limit[1])
            warnings.warn(
                "blur_limit and sigma_limit minimum value can not be both equal to 0. "
                "blur_limit minimum value changed to 3."
            )

        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            raise ValueError("GaussianBlur supports only odd blur limits.")

    def apply(self, img: np.ndarray, ksize: int = 3, sigma: float = 0, **params) -> np.ndarray:
        return F.gaussian_blur(img, ksize, sigma=sigma)

    def get_params(self) -> Dict[str, float]:
        ksize = random.randrange(self.blur_limit[0], self.blur_limit[1] + 1)
        if ksize != 0 and ksize % 2 != 1:
            ksize = (ksize + 1) % (self.blur_limit[1] + 1)

        return {"ksize": ksize, "sigma": random.uniform(*self.sigma_limit)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("blur_limit", "sigma_limit")

Blur effects for different Gaussian kernel sizes (sigma takes the default 0 and is computed from ksize):


Function: add glass noise (frosted-glass effect).
Parameter description:

  • sigma (float): standard deviation of the Gaussian kernel. Default: 0.7.
  • max_delta (int): maximum distance between swapped pixels. Default: 4.
  • iterations (int): number of repetitions; valid range: [1, inf). Default: 2.
  • mode (str): computation mode ("fast" or "exact"); default: "fast". It affects the running efficiency.
# source code
class GlassBlur(Blur):
    """Apply glass noise to the input image.

    Args:
        sigma (float): standard deviation for Gaussian kernel.
        max_delta (int): max distance between pixels which are swapped.
        iterations (int): number of repeats.
            Should be in range [1, inf). Default: (2).
        mode (str): mode of computation: fast or exact. Default: "fast".
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
    |  https://arxiv.org/abs/1903.12261
    |  https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
    """

    def __init__(
        self,
        sigma: float = 0.7,
        max_delta: int = 4,
        iterations: int = 2,
        always_apply: bool = False,
        mode: str = "fast",
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        if iterations < 1:
            raise ValueError(f"Iterations should be more or equal to 1, but we got {
      
      iterations}")

        if mode not in ["fast", "exact"]:
            raise ValueError(f"Mode should be 'fast' or 'exact', but we got {
      
      mode}")

        self.sigma = sigma
        self.max_delta = max_delta
        self.iterations = iterations
        self.mode = mode

    def apply(self, img: np.ndarray, dxy: np.ndarray = None, **params) -> np.ndarray:  # type: ignore
        assert dxy is not None
        return F.glass_blur(img, self.sigma, self.max_delta, self.iterations, dxy, self.mode)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, np.ndarray]:
        img = params["image"]

        # generate array containing all necessary values for transformations
        width_pixels = img.shape[0] - self.max_delta * 2
        height_pixels = img.shape[1] - self.max_delta * 2
        total_pixels = width_pixels * height_pixels
        dxy = random_utils.randint(-self.max_delta, self.max_delta, size=(total_pixels, self.iterations, 2))

        return {"dxy": dxy}

    def get_transform_init_args_names(self) -> Tuple[str, str, str]:
        return ("sigma", "max_delta", "iterations")

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

The larger max_delta and iterations are, the stronger the frosted-glass effect.


Function: histogram matching. Adjusts the pixel values of the input image so that its histogram matches that of a reference image. Each channel is matched independently, and the input and reference images must have the same number of channels.
Histogram matching can serve as lightweight normalization for image processing (e.g., feature matching), especially when the images come from different sources or were taken under different conditions (e.g., lighting).

Parameter description (the parameters are similar to those of the FDA transform; FDA defaults to p = 0.5, while the HistogramMatching docstring states p = 1.0 even though the __init__ shown below uses p = 0.5):

  • reference_images (List[str] or List(np.ndarray)): list of reference images or of image file paths. If several reference images are given (list length greater than 1), the style of one randomly chosen image is used for the transform.

  • blend_ratio (float, float): weighting factors for blending the original and the transformed image. blend_ratio_sample is the weight of the histogram-matched image, and the weight of the original image is 1 - blend_ratio_sample.

    	img = cv2.addWeighted(
    	    matched,
    	    blend_ratio,
    	    img,
    	    1 - blend_ratio,
    	    0,
    	    dtype=get_opencv_dtype_from_numpy(img.dtype),
    	)
    
  • read_fn (Callable): callable used to read images, returning a numpy array. Default: read_rgb_image.

# Default image-reading function; the corresponding reference_images argument should then be a list of paths:
def read_rgb_image(path):
    image = cv2.imread(path, cv2.IMREAD_COLOR)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# If the reference image is already a numpy array, read_fn can simply read it through unchanged (lambda x: x):
target_image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
aug = A.HistogramMatching([target_image], read_fn=lambda x: x)
# source code
class HistogramMatching(ImageOnlyTransform):
    """
    Apply histogram matching. It manipulates the pixels of an input image so that its histogram matches
    the histogram of the reference image. If the images have multiple channels, the matching is done independently
    for each channel, as long as the number of channels is equal in the input image and the reference.

    Histogram matching can be used as a lightweight normalisation for image processing,
    such as feature matching, especially in circumstances where the images have been taken from different
    sources or in different conditions (i.e. lighting).

    See:
        https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html

    Args:
        reference_images (List[str] or List(np.ndarray)): List of file paths for reference images
            or list of reference images.
        blend_ratio (float, float): Tuple of min and max blend ratio. Matched image will be blended with original
            with random blend factor for increased diversity of generated images.
        read_fn (Callable): Used-defined function to read image. Function should get image path and return numpy
            array of image pixels.
        p (float): probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        uint8, uint16, float32
    """

    def __init__(
        self,
        reference_images: List[Union[str, np.ndarray]],
        blend_ratio=(0.5, 1.0),
        read_fn=read_rgb_image,
        always_apply=False,
        p=0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.reference_images = reference_images
        self.read_fn = read_fn
        self.blend_ratio = blend_ratio

    def apply(self, img, reference_image=None, blend_ratio=0.5, **params):
        return apply_histogram(img, reference_image, blend_ratio)

    def get_params(self):
        return {
            "reference_image": self.read_fn(random.choice(self.reference_images)),
            "blend_ratio": random.uniform(self.blend_ratio[0], self.blend_ratio[1]),
        }

    def get_transform_init_args_names(self):
        return ("reference_images", "blend_ratio", "read_fn")

    def _to_dict(self):
        raise NotImplementedError("HistogramMatching can not be serialized.")

You can see that after using the middle image as the target, the transformed image also takes on a greenish tint.

Source of the image below: https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_histogram_matching.html


Function: randomly change the hue, saturation and value (brightness) of the image.
Parameter description: hue_shift_limit, sat_shift_limit and val_shift_limit are the change ranges for hue, saturation and value respectively. If a single number is given, it is converted to the interval (-input_val, input_val) and the applied value is sampled at random within it.

If the task is color-sensitive, the hue_shift_limit range should be kept small.

# source code
class HueSaturationValue(ImageOnlyTransform):
    """Randomly change hue, saturation and value of the input image.

    Args:
        hue_shift_limit ((int, int) or int): range for changing hue. If hue_shift_limit is a single int, the range
            will be (-hue_shift_limit, hue_shift_limit). Default: (-20, 20).
        sat_shift_limit ((int, int) or int): range for changing saturation. If sat_shift_limit is a single int,
            the range will be (-sat_shift_limit, sat_shift_limit). Default: (-30, 30).
        val_shift_limit ((int, int) or int): range for changing value. If val_shift_limit is a single int, the range
            will be (-val_shift_limit, val_shift_limit). Default: (-20, 20).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        hue_shift_limit=20,
        sat_shift_limit=30,
        val_shift_limit=20,
        always_apply=False,
        p=0.5,
    ):
        super(HueSaturationValue, self).__init__(always_apply, p)
        self.hue_shift_limit = to_tuple(hue_shift_limit)
        self.sat_shift_limit = to_tuple(sat_shift_limit)
        self.val_shift_limit = to_tuple(val_shift_limit)

    def apply(self, image, hue_shift=0, sat_shift=0, val_shift=0, **params):
        if not is_rgb_image(image) and not is_grayscale_image(image):
            raise TypeError(
                "HueSaturationValue transformation expects 1-channel or 3-channel images."
            )
        return F.shift_hsv(image, hue_shift, sat_shift, val_shift)

    def get_params(self):
        return {
            "hue_shift": random.uniform(self.hue_shift_limit[0], self.hue_shift_limit[1]),
            "sat_shift": random.uniform(self.sat_shift_limit[0], self.sat_shift_limit[1]),
            "val_shift": random.uniform(self.val_shift_limit[0], self.val_shift_limit[1]),
        }

    def get_transform_init_args_names(self):
        return ("hue_shift_limit", "sat_shift_limit", "val_shift_limit")



Function: add camera sensor noise.
Parameter description:
color_shift (float, float): range of hue change, measured as a fraction of the 360-degree hue angle.
intensity ((float, float)): multiplicative factor controlling the strength of the color and luminance noise.

# source code
class ISONoise(ImageOnlyTransform):
    """
    Apply camera sensor noise.

    Args:
        color_shift (float, float): variance range for color hue change.
            Measured as a fraction of 360 degree Hue angle in HLS colorspace.
        intensity ((float, float): Multiplicative factor that control strength
            of color and luminace noise.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8
    """

    def __init__(self,
                 color_shift=(0.01, 0.05),
                 intensity=(0.1, 0.5),
                 always_apply=False,
                 p=0.5):
        super(ISONoise, self).__init__(always_apply, p)
        self.intensity = intensity
        self.color_shift = color_shift

    def apply(self,
              img,
              color_shift=0.05,
              intensity=1.0,
              random_state=None,
              **params):
        return F.iso_noise(img, color_shift, intensity,
                           np.random.RandomState(random_state))

    def get_params(self):
        return {
            "color_shift": random.uniform(self.color_shift[0],
                                          self.color_shift[1]),
            "intensity": random.uniform(self.intensity[0], self.intensity[1]),
            "random_state": random.randint(0, 65536),
        }

    def get_transform_init_args_names(self):
        return ("intensity", "color_shift")

For a clearer visualization, the parameters were set larger than usual.
The input parameter is an interval, so color_shift=0.02 in the figure means color_shift=(0.02, 0.02) in the call.


JpegCompression is deprecated; it has the same function as ImageCompression.
Function: compress the image in jpg or webp format.
Parameter description:
quality_lower (float): lowest image quality; in [0, 100] for jpeg and [1, 100] for webp.
quality_upper (float): highest image quality; in [0, 100] for jpeg and [1, 100] for webp.
compression_type (ImageCompressionType): compression type, one of the two built-in options ImageCompressionType.JPEG or ImageCompressionType.WEBP. Default: ImageCompressionType.JPEG.
The resolution does not change before and after compression.

# source code
class ImageCompression(ImageOnlyTransform):
    """Decrease Jpeg, WebP compression of an image.

    Args:
        quality_lower (float): lower bound on the image quality.
                               Should be in [0, 100] range for jpeg and [1, 100] for webp.
        quality_upper (float): upper bound on the image quality.
                               Should be in [0, 100] range for jpeg and [1, 100] for webp.
        compression_type (ImageCompressionType): should be ImageCompressionType.JPEG or ImageCompressionType.WEBP.
            Default: ImageCompressionType.JPEG

    Targets:
        image

    Image types:
        uint8, float32
    """

    class ImageCompressionType(IntEnum):
        JPEG = 0
        WEBP = 1

    def __init__(
        self,
        quality_lower=99,
        quality_upper=100,
        compression_type=ImageCompressionType.JPEG,
        always_apply=False,
        p=0.5,
    ):
        super(ImageCompression, self).__init__(always_apply, p)

        self.compression_type = ImageCompression.ImageCompressionType(
            compression_type)
        low_thresh_quality_assert = 0

        if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
            low_thresh_quality_assert = 1

        if not low_thresh_quality_assert <= quality_lower <= 100:
            raise ValueError(
                "Invalid quality_lower. Got: {}".format(quality_lower))
        if not low_thresh_quality_assert <= quality_upper <= 100:
            raise ValueError(
                "Invalid quality_upper. Got: {}".format(quality_upper))

        self.quality_lower = quality_lower
        self.quality_upper = quality_upper

    def apply(self, image, quality=100, image_type=".jpg", **params):
        if not image.ndim == 2 and image.shape[-1] not in (1, 3, 4):
            raise TypeError(
                "ImageCompression transformation expects 1, 3 or 4 channel images."
            )
        return F.image_compression(image, quality, image_type)

    def get_params(self):
        image_type = ".jpg"

        if self.compression_type == ImageCompression.ImageCompressionType.WEBP:
            image_type = ".webp"

        return {
            "quality": random.randint(self.quality_lower, self.quality_upper),
            "image_type": image_type,
        }

    def get_transform_init_args(self):
        return {
            "quality_lower": self.quality_lower,
            "quality_upper": self.quality_upper,
            "compression_type": self.compression_type.value,
        }
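A minimal call sketch with a deliberately low JPEG quality range so the artifacts are visible:

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
out = A.ImageCompression(quality_lower=10, quality_upper=20, p=1)(image=image)["image"]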

Function: invert pixel values (255 - pixel value).

# F.invert(img)
def invert(img):
    return 255 - img
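The transform class wrapping this function is InvertImg; a minimal sketch:

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
inverted = A.InvertImg(p=1)(image=image)["image"]  # equals 255 - image for uint8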



Function: blur the image using median filtering.
Parameter description:
blur_limit (int or Tuple[int, int]): blur kernel size; the start and end values of the range must be odd. Valid range: [3, inf); default: (3, 7).

# source code
class MedianBlur(Blur):
    """Blur the input image using a median filter with a random aperture linear size.

    Args:
        blur_limit (int): maximum aperture linear size for blurring the input image.
            Must be odd and in range [3, inf). Default: (3, 7).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(self, blur_limit: ScaleIntType = 7, always_apply: bool = False, p: float = 0.5):
        super().__init__(blur_limit, always_apply, p)

        if self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1:
            raise ValueError("MedianBlur supports only odd blur limits.")

    def apply(self, img: np.ndarray, ksize: int = 3, **params) -> np.ndarray:
        return F.median_blur(img, ksize)
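A minimal call sketch (kernel size fixed to 5; both limits must be odd):

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
out = A.MedianBlur(blur_limit=(5, 5), p=1)(image=image)["image"]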




Function: apply motion blur to the image.
Parameter description:
blur_limit (int or Tuple[int, int]): blur kernel size; the start and end values of the range must be odd. Valid range: [3, inf); default: (3, 7).
allow_shifted (bool): whether the kernel may be shifted off-center. If False, the kernel is centered (and, as the code below shows, the blur limits must then be odd). Default: True.

# source code
class MotionBlur(Blur):
    """Apply motion blur to the input image using a random-sized kernel.

    Args:
        blur_limit (int): maximum kernel size for blurring the input image.
            Should be in range [3, inf). Default: (3, 7).
        allow_shifted (bool): if set to true creates non shifted kernels only,
            otherwise creates randomly shifted kernels. Default: True.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        blur_limit: ScaleIntType = 7,
        allow_shifted: bool = True,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(blur_limit=blur_limit, always_apply=always_apply, p=p)
        self.allow_shifted = allow_shifted

        if not allow_shifted and self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1:
            raise ValueError(f"Blur limit must be odd when centered=True. Got: {
      
      self.blur_limit}")

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return super().get_transform_init_args_names() + ("allow_shifted",)

    def apply(self, img: np.ndarray, kernel: np.ndarray = None, **params) -> np.ndarray:  # type: ignore
        return FMain.convolve(img, kernel=kernel)

    def get_params(self) -> Dict[str, Any]:
        ksize = random.choice(np.arange(self.blur_limit[0], self.blur_limit[1] + 1, 2))
        if ksize <= 2:
            raise ValueError("ksize must be > 2. Got: {}".format(ksize))
        kernel = np.zeros((ksize, ksize), dtype=np.uint8)
        x1, x2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)
        if x1 == x2:
            y1, y2 = random.sample(range(ksize), 2)
        else:
            y1, y2 = random.randint(0, ksize - 1), random.randint(0, ksize - 1)

        def make_odd_val(v1, v2):
            len_v = abs(v1 - v2) + 1
            if len_v % 2 != 1:
                if v2 > v1:
                    v2 -= 1
                else:
                    v1 -= 1
            return v1, v2

        if not self.allow_shifted:
            x1, x2 = make_odd_val(x1, x2)
            y1, y2 = make_odd_val(y1, y2)

            xc = (x1 + x2) / 2
            yc = (y1 + y2) / 2

            center = ksize / 2 - 0.5
            dx = xc - center
            dy = yc - center
            x1, x2 = [int(i - dx) for i in [x1, x2]]
            y1, y2 = [int(i - dy) for i in [y1, y2]]

        cv2.line(kernel, (x1, y1), (x2, y2), 1, thickness=1)

        # Normalize kernel
        return {"kernel": kernel.astype(np.float32) / np.sum(kernel)}

Note: a larger blur_limit does not necessarily produce a blurrier image. blur_limit only defines the range from which ksize is sampled, so the value actually drawn may be large or small; blur_limit is just an upper bound on the degree of blur.
Even with identical blur_limit settings, running the code several times will yield different degrees of blur, but the blurriest result should come from the call with the largest blur_limit.
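A minimal call sketch with centered kernels (fixed odd kernel size, as required when allow_shifted=False):

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
out = A.MotionBlur(blur_limit=(7, 7), allow_shifted=False, p=1)(image=image)["image"]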


Function: multiply the image by a random number or array of numbers.
Parameter description:
multiplier (float or tuple of floats): the number the image is multiplied by. If an interval is given, the factor is sampled at random from [multiplier[0], multiplier[1]). Default: (0.9, 1.1).
per_channel (bool): whether each channel is handled separately. If True, a different factor is used for each channel. Default: False.
elementwise (bool): whether to operate per pixel. If True, the multiplicative factor of every pixel is generated at random. Default: False.

# source code
class MultiplicativeNoise(ImageOnlyTransform):
    """Multiply image to random number or array of numbers.

    Args:
        multiplier (float or tuple of floats): If single float image will be multiplied to this number.
            If tuple of float multiplier will be in range `[multiplier[0], multiplier[1])`. Default: (0.9, 1.1).
        per_channel (bool): If `False`, same values for all channels will be used.
            If `True` use sample values for each channels. Default False.
        elementwise (bool): If `False` multiply multiply all pixels in an image with a random value sampled once.
            If `True` Multiply image pixels with values that are pixelwise randomly sampled. Defaule: False.

    Targets:
        image

    Image types:
        Any
    """

    def __init__(
        self,
        multiplier=(0.9, 1.1),
        per_channel=False,
        elementwise=False,
        always_apply=False,
        p=0.5,
    ):
        super(MultiplicativeNoise, self).__init__(always_apply, p)
        self.multiplier = to_tuple(multiplier, multiplier)
        self.per_channel = per_channel
        self.elementwise = elementwise

    def apply(self, img, multiplier=np.array([1]), **kwargs):
        return F.multiply(img, multiplier)

    def get_params_dependent_on_targets(self, params):
        if self.multiplier[0] == self.multiplier[1]:
            return {"multiplier": np.array([self.multiplier[0]])}

        img = params["image"]

        h, w = img.shape[:2]

        if self.per_channel:
            c = 1 if F.is_grayscale_image(img) else img.shape[-1]
        else:
            c = 1

        if self.elementwise:
            shape = [h, w, c]
        else:
            shape = [c]

        multiplier = np.random.uniform(self.multiplier[0], self.multiplier[1], shape)
        if F.is_grayscale_image(img) and img.ndim == 2:
            multiplier = np.squeeze(multiplier)

        return {"multiplier": multiplier}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return "multiplier", "per_channel", "elementwise"

There is more noise when elementwise=True, because each pixel is independent.
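A minimal call sketch with per-pixel, per-channel multiplicative noise:

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
out = A.MultiplicativeNoise(multiplier=(0.7, 1.3), per_channel=True, elementwise=True, p=1)(image=image)["image"]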


Function: image normalization.
Normalization formula: img = (img - mean * max_pixel_value) / (std * max_pixel_value),
which is equivalent to: img = (img / max_pixel_value - mean) / std.
Default parameters:

mean=(0.485, 0.456, 0.406),
std=(0.229, 0.224, 0.225),
max_pixel_value=255.0
class Normalize(ImageOnlyTransform):
    """Normalization is applied by the formula: `img = (img - mean * max_pixel_value) / (std * max_pixel_value)`

    Args:
        mean (float, list of float): mean values
        std  (float, list of float): std values
        max_pixel_value (float): maximum possible pixel value

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
        max_pixel_value=255.0,
        always_apply=False,
        p=1.0,
    ):
        super(Normalize, self).__init__(always_apply, p)
        self.mean = mean
        self.std = std
        self.max_pixel_value = max_pixel_value

    def apply(self, image, **params):
        return F.normalize(image, self.mean, self.std, self.max_pixel_value)

    def get_transform_init_args_names(self):
        return ("mean", "std", "max_pixel_value")

Function:

# source code


Function: reduce the number of bits of each color channel to create tonal layers (posterization); hence the valid range of num_bits is [0, 8].
Parameters: num_bits ((int, int) or int, or list of ints [r, g, b], or list of ints [[r1, r1], [g1, g2], [b1, b2]]): number of high bits to keep.
The smaller num_bits, the more obvious the tonal layering. Valid range: [0, 8]; default: 4.

# source code
class Posterize(ImageOnlyTransform):
    """Reduce the number of bits for each color channel.

    Args:
        num_bits ((int, int) or int,
                  or list of ints [r, g, b],
                  or list of ints [[r1, r1], [g1, g2], [b1, b2]]): number of high bits.
            If num_bits is a single value, the range will be [num_bits, num_bits].
            Must be in range [0, 8]. Default: 4.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
    image

    Image types:
        uint8
    """

    def __init__(self, num_bits=4, always_apply=False, p=0.5):
        super(Posterize, self).__init__(always_apply, p)

        if isinstance(num_bits, (list, tuple)):
            if len(num_bits) == 3:
                self.num_bits = [to_tuple(i, 0) for i in num_bits]
            else:
                self.num_bits = to_tuple(num_bits, 0)
        else:
            self.num_bits = to_tuple(num_bits, num_bits)

    def apply(self, image, num_bits=1, **params):
        return F.posterize(image, num_bits)

    def get_params(self):
        if len(self.num_bits) == 3:
            return {"num_bits": [random.randint(i[0], i[1]) for i in self.num_bits]}
        return {"num_bits": random.randint(self.num_bits[0], self.num_bits[1])}

    def get_transform_init_args_names(self):
        return ("num_bits", )



Function: shift the values of each channel of the input RGB image.
Parameter description: r_shift_limit, g_shift_limit and b_shift_limit ((int, int) or int) are the offset ranges for the R, G and B channels respectively. If a single number is given, it is converted to the interval (-shift_limit, shift_limit), and the applied value is sampled at random within it.

# source code
class RGBShift(ImageOnlyTransform):
    """Randomly shift values for each channel of the input RGB image.

    Args:
        r_shift_limit ((int, int) or int): range for changing values for the red channel. If r_shift_limit is a single
            int, the range will be (-r_shift_limit, r_shift_limit). Default: (-20, 20).
        g_shift_limit ((int, int) or int): range for changing values for the green channel. If g_shift_limit is a
            single int, the range  will be (-g_shift_limit, g_shift_limit). Default: (-20, 20).
        b_shift_limit ((int, int) or int): range for changing values for the blue channel. If b_shift_limit is a single
            int, the range will be (-b_shift_limit, b_shift_limit). Default: (-20, 20).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        r_shift_limit=20,
        g_shift_limit=20,
        b_shift_limit=20,
        always_apply=False,
        p=0.5,
    ):
        super(RGBShift, self).__init__(always_apply, p)
        self.r_shift_limit = to_tuple(r_shift_limit)
        self.g_shift_limit = to_tuple(g_shift_limit)
        self.b_shift_limit = to_tuple(b_shift_limit)

    def apply(self, image, r_shift=0, g_shift=0, b_shift=0, **params):
        if not F.is_rgb_image(image):
            raise TypeError("RGBShift transformation expects 3-channel images.")
        return F.shift_rgb(image, r_shift, g_shift, b_shift)

    def get_params(self):
        return {
            "r_shift": random.uniform(self.r_shift_limit[0], self.r_shift_limit[1]),
            "g_shift": random.uniform(self.g_shift_limit[0], self.g_shift_limit[1]),
            "b_shift": random.uniform(self.b_shift_limit[0], self.b_shift_limit[1]),
        }

    def get_transform_init_args_names(self):
        return ("r_shift_limit", "g_shift_limit", "b_shift_limit")
# F.shift_rgb: when the same formula is applied to every pixel, a lookup table (cv2.LUT) can be used
def _shift_image_uint8(img, value):
    max_value = MAX_VALUES_BY_DTYPE[img.dtype]

    lut = np.arange(0, max_value + 1).astype("float32")
    lut += value

    lut = np.clip(lut, 0, max_value).astype(img.dtype)
    return cv2.LUT(img, lut)


@preserve_shape
def _shift_rgb_uint8(img, r_shift, g_shift, b_shift):
    if r_shift == g_shift == b_shift:
        h, w, c = img.shape
        img = img.reshape([h, w * c])

        return _shift_image_uint8(img, r_shift)

    result_img = np.empty_like(img)
    shifts = [r_shift, g_shift, b_shift]
    for i, shift in enumerate(shifts):
        result_img[..., i] = _shift_image_uint8(img[..., i], shift)

    return result_img


def shift_rgb(img, r_shift, g_shift, b_shift):
    if img.dtype == np.uint8:
        return _shift_rgb_uint8(img, r_shift, g_shift, b_shift)

    return _shift_rgb_non_uint8(img, r_shift, g_shift, b_shift)
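A minimal call sketch shifting only the red channel by a fixed +40 (the 0 limits collapse to (0, 0)):

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
out = A.RGBShift(r_shift_limit=(40, 40), g_shift_limit=0, b_shift_limit=0, p=1)(image=image)["image"]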



Function: randomly change the brightness and contrast of the input image. A similar transform: ColorJitter.
Parameter description:

  • brightness_limit ((float, float) or float): brightness change factor. If a single number is given, it is converted to the interval (-limit, limit). Default: (-0.2, 0.2).
  • contrast_limit ((float, float) or float): contrast change factor. If a single number is given, it is converted to the interval (-limit, limit). Default: (-0.2, 0.2).
  • brightness_by_max (Boolean): if True, brightness is adjusted relative to the maximum value of the image dtype; if False, relative to the image mean. Default: True.
# source code
class RandomBrightnessContrast(ImageOnlyTransform):
    """Randomly change brightness and contrast of the input image.

    Args:
        brightness_limit ((float, float) or float): factor range for changing brightness.
            If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
        contrast_limit ((float, float) or float): factor range for changing contrast.
            If limit is a single float, the range will be (-limit, limit). Default: (-0.2, 0.2).
        brightness_by_max (Boolean): If True adjust contrast by image dtype maximum,
            else adjust contrast by image mean.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        brightness_limit=0.2,
        contrast_limit=0.2,
        brightness_by_max=True,
        always_apply=False,
        p=0.5,
    ):
        super(RandomBrightnessContrast, self).__init__(always_apply, p)
        self.brightness_limit = to_tuple(brightness_limit)
        self.contrast_limit = to_tuple(contrast_limit)
        self.brightness_by_max = brightness_by_max

    def apply(self, img, alpha=1.0, beta=0.0, **params):
        return F.brightness_contrast_adjust(img, alpha, beta,
                                            self.brightness_by_max)

    def get_params(self):
        return {
            "alpha": 1.0 + random.uniform(self.contrast_limit[0], self.contrast_limit[1]),
            "beta": 0.0 + random.uniform(self.brightness_limit[0], self.brightness_limit[1]),
        }

    def get_transform_init_args_names(self):
        return ("brightness_limit", "contrast_limit", "brightness_by_max")

Brightness change (contrast_limit=(0.1, 0.1), brightness_by_max=True):

Contrast change (brightness_limit=(0.01, 0.01), brightness_by_max=True):

brightness_by_max variation:
brightness_limit=(0.1, 0.1), contrast_limit=(0.1, 0.1)
brightness_limit=(-0.1, -0.1), contrast_limit=(-0.1, -0.1)


Function: add a fog effect to the input image.
Parameter description: all parameters are floats with valid interval [0, 1].
fog_coef_lower, fog_coef_upper: minimum and maximum fog intensity coefficient; the applied intensity is sampled within this range. Default range: [0.3, 1].
alpha_coef: transparency of the fog circles. Default: 0.08.

# source code 
class RandomFog(ImageOnlyTransform):
    """Simulates fog for the image

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        fog_coef_lower (float): lower limit for fog intensity coefficient. Should be in [0, 1] range.
        fog_coef_upper (float): upper limit for fog intensity coefficient. Should be in [0, 1] range.
        alpha_coef (float): transparency of the fog circles. Should be in [0, 1] range.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        fog_coef_lower=0.3,
        fog_coef_upper=1,
        alpha_coef=0.08,
        always_apply=False,
        p=0.5,
    ):
        super(RandomFog, self).__init__(always_apply, p)

        if not 0 <= fog_coef_lower <= fog_coef_upper <= 1:
            raise ValueError(
                "Invalid combination if fog_coef_lower and fog_coef_upper. Got: {}"
                .format((fog_coef_lower, fog_coef_upper)))
        if not 0 <= alpha_coef <= 1:
            raise ValueError(
                "alpha_coef must be in range [0, 1]. Got: {}".format(
                    alpha_coef))

        self.fog_coef_lower = fog_coef_lower
        self.fog_coef_upper = fog_coef_upper
        self.alpha_coef = alpha_coef

    def apply(self, image, fog_coef=0.1, haze_list=(), **params):
        return F.add_fog(image, fog_coef, self.alpha_coef, haze_list)

    @property
    def targets_as_params(self):
        return ["image"]

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        fog_coef = random.uniform(self.fog_coef_lower, self.fog_coef_upper)

        height, width = imshape = img.shape[:2]

        hw = max(1, int(width // 3 * fog_coef))

        haze_list = []
        midx = width // 2 - 2 * hw
        midy = height // 2 - hw
        index = 1

        while midx > -hw or midy > -hw:
            for _i in range(hw // 10 * index):
                x = random.randint(midx, width - midx - hw)
                y = random.randint(midy, height - midy - hw)
                haze_list.append((x, y))

            midx -= 3 * hw * width // sum(imshape)
            midy -= 3 * hw * height // sum(imshape)
            index += 1

        return {"haze_list": haze_list, "fog_coef": fog_coef}

    def get_transform_init_args_names(self):
        return ("fog_coef_lower", "fog_coef_upper", "alpha_coef")



Image enhancement - gamma transform:
when gamma < 1, the image becomes brighter overall;
when gamma > 1, it becomes darker overall.

# source code
class RandomGamma(ImageOnlyTransform):
    """
    Args:
        gamma_limit (float or (float, float)): If gamma_limit is a single float value,
            the range will be (-gamma_limit, gamma_limit). Default: (80, 120).
        eps: Deprecated.
    Targets:
        image
    Image types:
        uint8, float32
    """

    def __init__(self, gamma_limit=(80, 120), eps=None, always_apply=False, p=0.5):
        super(RandomGamma, self).__init__(always_apply, p)
        self.gamma_limit = to_tuple(gamma_limit)
        self.eps = eps

    def apply(self, img, gamma=1, **params):
        return F.gamma_transform(img, gamma=gamma)

    def get_params(self):
        return {"gamma": random.uniform(self.gamma_limit[0], self.gamma_limit[1]) / 100.0}

    def get_transform_init_args_names(self):
        return ("gamma_limit", "eps")

Main parameter: gamma_limit, default (80, 120); if a single value is given, it is converted to (-gamma_limit, gamma_limit).
As the get_params() function shows, gamma_limit is 100 times the gamma parameter, so values above 100 in the gamma_limit range darken the image, and values below 100 brighten it.
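A minimal sketch of the 100x relation between gamma_limit and gamma:

# usage sketch: gamma_limit is 100x gamma
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [100, 100, 3], dtype=np.uint8)
darker = A.RandomGamma(gamma_limit=(120, 120), p=1)(image=image)["image"]    # gamma = 1.2
brighter = A.RandomGamma(gamma_limit=(80, 80), p=1)(image=image)["image"]    # gamma = 0.8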

Function: add a rain effect to the input image.
Parameter description:

# default parameters
slant_lower=-10,
slant_upper=10,
drop_length=20,
drop_width=1,
drop_color=(200, 200, 200),
blur_value=7,
brightness_coefficient=0.7,
rain_type=None
  • slant_lower, slant_upper: control the slope of the rain lines; valid range [-20, 20]. If slant_sample < 0 the rain lines tilt to the left, otherwise to the right.

  • drop_length: length of the rain lines; valid range [0, 100]. When rain_type is specified, the drop_length passed in is ignored and a built-in value is used; see the rain_type code below.

  • drop_width: width of the rain lines; valid range [1, 5].

  • drop_color (list of (r, g, b)): color of the rain lines.

    # drop_length, drop_width and drop_color are the parameters used to draw the rain lines (cv2.line)
    for (rain_drop_x0, rain_drop_y0) in rain_drops:
        rain_drop_x1 = rain_drop_x0 + slant
        rain_drop_y1 = rain_drop_y0 + drop_length
    
        cv2.line(
            image,
            (rain_drop_x0, rain_drop_y0),
            (rain_drop_x1, rain_drop_y1),
            drop_color,
            drop_width,
        )
    
  • blur_value (int): kernel_size for cv2.blur(); the rainy scene needs blurring, because most rainy days are hazy.

  • brightness_coefficient (float): brightness factor; valid range [0, 1]. Rainy days are usually overcast and short of light.

  • rain_type: rain intensity, one of [None, "drizzle", "heavy", "torrential"], increasing from left to right.

    if self.rain_type == "drizzle":
        num_drops = area // 770
        drop_length = 10
    elif self.rain_type == "heavy":
        num_drops = width * height // 600
        drop_length = 30
    elif self.rain_type == "torrential":
        num_drops = area // 500
        drop_length = 60
    else:
        drop_length = self.drop_length
        num_drops = area // 600
    
# source code
class RandomRain(ImageOnlyTransform):
    """Adds rain effects.

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        slant_lower: should be in range [-20, 20].
        slant_upper: should be in range [-20, 20].
        drop_length: should be in range [0, 100].
        drop_width: should be in range [1, 5].
        drop_color (list of (r, g, b)): rain lines color.
        blur_value (int): rainy view are blurry
        brightness_coefficient (float): rainy days are usually shady. Should be in range [0, 1].
        rain_type: One of [None, "drizzle", "heavy", "torrential"]

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        slant_lower=-10,
        slant_upper=10,
        drop_length=20,
        drop_width=1,
        drop_color=(200, 200, 200),
        blur_value=7,
        brightness_coefficient=0.7,
        rain_type=None,
        always_apply=False,
        p=0.5,
    ):
        super(RandomRain, self).__init__(always_apply, p)

        if rain_type not in ["drizzle", "heavy", "torrential", None]:
            raise ValueError("raint_type must be one of ({}). Got: {}".format(
                ["drizzle", "heavy", "torrential", None], rain_type))
        if not -20 <= slant_lower <= slant_upper <= 20:
            raise ValueError(
                "Invalid combination of slant_lower and slant_upper. Got: {}".
                format((slant_lower, slant_upper)))
        if not 1 <= drop_width <= 5:
            raise ValueError(
                "drop_width must be in range [1, 5]. Got: {}".format(
                    drop_width))
        if not 0 <= drop_length <= 100:
            raise ValueError(
                "drop_length must be in range [0, 100]. Got: {}".format(
                    drop_length))
        if not 0 <= brightness_coefficient <= 1:
            raise ValueError(
                "brightness_coefficient must be in range [0, 1]. Got: {}".
                format(brightness_coefficient))

        self.slant_lower = slant_lower
        self.slant_upper = slant_upper

        self.drop_length = drop_length
        self.drop_width = drop_width
        self.drop_color = drop_color
        self.blur_value = blur_value
        self.brightness_coefficient = brightness_coefficient
        self.rain_type = rain_type

    def apply(self, image, slant=10, drop_length=20, rain_drops=(), **params):
        return F.add_rain(
            image,
            slant,
            drop_length,
            self.drop_width,
            self.drop_color,
            self.blur_value,
            self.brightness_coefficient,
            rain_drops,
        )

    @property
    def targets_as_params(self):
        return ["image"]

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        slant = int(random.uniform(self.slant_lower, self.slant_upper))

        height, width = img.shape[:2]
        area = height * width

        if self.rain_type == "drizzle":
            num_drops = area // 770
            drop_length = 10
        elif self.rain_type == "heavy":
            num_drops = width * height // 600
            drop_length = 30
        elif self.rain_type == "torrential":
            num_drops = area // 500
            drop_length = 60
        else:
            drop_length = self.drop_length
            num_drops = area // 600

        rain_drops = []

        for _i in range(
                num_drops):  # If You want heavy rain, try increasing this
            if slant < 0:
                x = random.randint(slant, width)
            else:
                x = random.randint(0, width - slant)

            y = random.randint(0, height - drop_length)

            rain_drops.append((x, y))

        return {
            "drop_length": drop_length,
            "slant": slant,
            "rain_drops": rain_drops
        }

    def get_transform_init_args_names(self):
        return (
            "slant_lower",
            "slant_upper",
            "drop_length",
            "drop_width",
            "drop_color",
            "blur_value",
            "brightness_coefficient",
            "rain_type",
        )

Visual analysis:
Parameters not indicated in the figure use their default values.
When rain_type = None, drop_length takes effect: the rain lines of length 30 at the bottom left are longer than the default length 20 at the top right.
When rain_type is one of ["drizzle", "heavy", "torrential"], drop_length is ignored and the built-in length is used; torrential mode corresponds to a length of 60. So although drop_length is the same in the top-right and bottom-right images, the rain line lengths differ.
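A minimal call sketch showing rain_type overriding drop_length (torrential forces the built-in length 60):

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [256, 256, 3], dtype=np.uint8)
out = A.RandomRain(rain_type="torrential", drop_length=20, p=1)(image=image)["image"]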


Function:

# source code


Function:

# source code


Function: simulate a sun flare effect.
Parameter description:

  • flare_roi (float, float, float, float): flare position (x_min, y_min, x_max, y_max). All values must be in [0, 1]. Default: (0, 0, 1, 0.5).

  • angle_lower, angle_upper (float): must satisfy 0 <= angle_lower < angle_upper <= 1.

  • num_flare_circles_lower, num_flare_circles_upper (int): number of flare circles. Must satisfy 0 <= num_flare_circles_lower < num_flare_circles_upper.

  • src_radius (int): flare radius (src_radius is the largest radius; the inner radii are sampled at equal intervals). Default: 400. Since the image resolution is fixed, a slightly larger value does not matter much, because the halo weight of the outer rings is very small.

    num_times = src_radius // 10
    rad = np.linspace(1, src_radius, num=num_times) # evenly spaced sampling
    for i in range(num_times):
        cv2.circle(overlay, point, int(rad[i]), src_color, -1)
        ...
    
  • src_color ((int, int, int)): flare color.
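A minimal call sketch before the source code (dummy 256x256 image; src_radius=150 is scaled down to suit it):

# usage sketch
import numpy as np
import albumentations as A

image = np.random.randint(0, 256, [256, 256, 3], dtype=np.uint8)
out = A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), src_radius=150, p=1)(image=image)["image"]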

# source code
class RandomSunFlare(ImageOnlyTransform):
    """Simulates Sun Flare for the image

    From https://github.com/UjjwalSaxena/Automold--Road-Augmentation-Library

    Args:
        flare_roi (float, float, float, float): region of the image where flare will
            appear (x_min, y_min, x_max, y_max). All values should be in range [0, 1].
        angle_lower (float): should be in range [0, `angle_upper`].
        angle_upper (float): should be in range [`angle_lower`, 1].
        num_flare_circles_lower (int): lower limit for the number of flare circles.
            Should be in range [0, `num_flare_circles_upper`].
        num_flare_circles_upper (int): upper limit for the number of flare circles.
            Should be in range [`num_flare_circles_lower`, inf].
        src_radius (int):
        src_color ((int, int, int)): color of the flare

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        flare_roi=(0, 0, 1, 0.5),
        angle_lower=0,
        angle_upper=1,
        num_flare_circles_lower=6,
        num_flare_circles_upper=10,
        src_radius=400,
        src_color=(255, 255, 255),
        always_apply=False,
        p=0.5,
    ):
        super(RandomSunFlare, self).__init__(always_apply, p)

        (
            flare_center_lower_x,
            flare_center_lower_y,
            flare_center_upper_x,
            flare_center_upper_y,
        ) = flare_roi

        if (
            not 0 <= flare_center_lower_x < flare_center_upper_x <= 1
            or not 0 <= flare_center_lower_y < flare_center_upper_y <= 1
        ):
            raise ValueError("Invalid flare_roi. Got: {}".format(flare_roi))
        if not 0 <= angle_lower < angle_upper <= 1:
            raise ValueError(
                "Invalid combination of angle_lower nad angle_upper. Got: {}".format((angle_lower, angle_upper))
            )
        if not 0 <= num_flare_circles_lower < num_flare_circles_upper:
            raise ValueError(
                "Invalid combination of num_flare_circles_lower nad num_flare_circles_upper. Got: {}".format(
                    (num_flare_circles_lower, num_flare_circles_upper)
                )
            )

        self.flare_center_lower_x = flare_center_lower_x
        self.flare_center_upper_x = flare_center_upper_x

        self.flare_center_lower_y = flare_center_lower_y
        self.flare_center_upper_y = flare_center_upper_y

        self.angle_lower = angle_lower
        self.angle_upper = angle_upper
        self.num_flare_circles_lower = num_flare_circles_lower
        self.num_flare_circles_upper = num_flare_circles_upper

        self.src_radius = src_radius
        self.src_color = src_color

    def apply(self, image, flare_center_x=0.5, flare_center_y=0.5, circles=(), **params):
        return F.add_sun_flare(
            image,
            flare_center_x,
            flare_center_y,
            self.src_radius,
            self.src_color,
            circles,
        )

    @property
    def targets_as_params(self):
        return ["image"]

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        height, width = img.shape[:2]

        angle = 2 * math.pi * random.uniform(self.angle_lower, self.angle_upper)

        flare_center_x = random.uniform(self.flare_center_lower_x, self.flare_center_upper_x)
        flare_center_y = random.uniform(self.flare_center_lower_y, self.flare_center_upper_y)

        flare_center_x = int(width * flare_center_x)
        flare_center_y = int(height * flare_center_y)

        num_circles = random.randint(self.num_flare_circles_lower, self.num_flare_circles_upper)

        circles = []

        x = []
        y = []

        for rand_x in range(0, width, 10):
            rand_y = math.tan(angle) * (rand_x - flare_center_x) + flare_center_y
            x.append(rand_x)
            y.append(2 * flare_center_y - rand_y)

        for _i in range(num_circles):
            alpha = random.uniform(0.05, 0.2)
            r = random.randint(0, len(x) - 1)
            rad = random.randint(1, max(height // 100 - 2, 2))

            r_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
            g_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])
            b_color = random.randint(max(self.src_color[0] - 50, 0), self.src_color[0])

            circles += [
                (
                    alpha,
                    (int(x[r]), int(y[r])),
                    pow(rad, 3),
                    (r_color, g_color, b_color),
                )
            ]

        return {
            "circles": circles,
            "flare_center_x": flare_center_x,
            "flare_center_y": flare_center_y,
        }

    def get_transform_init_args(self):
        return {
            "flare_roi": (
                self.flare_center_lower_x,
                self.flare_center_lower_y,
                self.flare_center_upper_x,
                self.flare_center_upper_y,
            ),
            "angle_lower": self.angle_lower,
            "angle_upper": self.angle_upper,
            "num_flare_circles_lower": self.num_flare_circles_lower,
            "num_flare_circles_upper": self.num_flare_circles_upper,
            "src_radius": self.src_radius,
            "src_color": self.src_color,
        }


Sharpen

Function: sharpening. (UnsharpMask is a similar method.)
Parameter description: alpha ((float, float)): controls how visible the sharpened image is. alpha=0 keeps only the original image; alpha=1.0 keeps only the sharpened version.
lightness ((float, float)): controls the lightness of the sharpened image. A usage sketch follows.
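A minimal usage sketch (illustrative values; fixing alpha at 1.0 keeps only the sharpened image so the effect is easy to see):

# illustrative usage sketch
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# alpha=(1.0, 1.0) always keeps only the sharpened version;
# lightness controls its brightness.
sharpen = A.Sharpen(alpha=(1.0, 1.0), lightness=(0.5, 1.0), p=1.0)
out = sharpen(image=img)["image"]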

# source code
class Sharpen(ImageOnlyTransform):
    """Sharpen the input image and overlays the result with the original image.

    Args:
        alpha ((float, float)): range to choose the visibility of the sharpened image. At 0, only the original image is
            visible, at 1.0 only its sharpened version is visible. Default: (0.2, 0.5).
        lightness ((float, float)): range to choose the lightness of the sharpened image. Default: (0.5, 1.0).
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image
    """

    def __init__(self,
                 alpha=(0.2, 0.5),
                 lightness=(0.5, 1.0),
                 always_apply=False,
                 p=0.5):
        super(Sharpen, self).__init__(always_apply, p)
        self.alpha = self.__check_values(to_tuple(alpha, 0.0),
                                         name="alpha",
                                         bounds=(0.0, 1.0))
        self.lightness = self.__check_values(to_tuple(lightness, 0.0),
                                             name="lightness")

    @staticmethod
    def __check_values(value, name, bounds=(0, float("inf"))):
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError("{} values should be between {}".format(
                name, bounds))
        return value

    @staticmethod
    def __generate_sharpening_matrix(alpha_sample, lightness_sample):
        matrix_nochange = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]],
                                   dtype=np.float32)
        matrix_effect = np.array(
            [[-1, -1, -1], [-1, 8 + lightness_sample, -1], [-1, -1, -1]],
            dtype=np.float32,
        )

        matrix = (
            1 - alpha_sample) * matrix_nochange + alpha_sample * matrix_effect
        return matrix

    def get_params(self):
        alpha = random.uniform(*self.alpha)
        lightness = random.uniform(*self.lightness)
        sharpening_matrix = self.__generate_sharpening_matrix(
            alpha_sample=alpha, lightness_sample=lightness)
        return {"sharpening_matrix": sharpening_matrix}

    def apply(self, img, sharpening_matrix=None, **params):
        return F.convolve(img, sharpening_matrix)

    def get_transform_init_args_names(self):
        return ("alpha", "lightness")

Its effect is stronger than that of UnsharpMask; the sharpening produced by UnsharpMask looks more natural.


Solarize

Function: inverts pixels at or above the threshold (for uint8 input the inversion is 255 - pixel_value).
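A minimal sketch of the uint8 behavior (illustrative; it mirrors the F.solarize source quoted below):

# illustrative usage sketch
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

out = A.Solarize(threshold=128, p=1.0)(image=img)["image"]

# Equivalent manual computation for uint8: pixels >= threshold are inverted
# (cf. F.solarize below).
manual = img.copy()
manual[img >= 128] = 255 - img[img >= 128]
assert np.array_equal(out, manual)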

# source code
class Solarize(ImageOnlyTransform):
    """Invert all pixel values above a threshold.

    Args:
        threshold ((int, int) or int, or (float, float) or float): range for solarizing threshold.
            If threshold is a single value, the range will be [threshold, threshold]. Default: 128.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        any
    """

    def __init__(self, threshold=128, always_apply=False, p=0.5):
        super(Solarize, self).__init__(always_apply, p)

        if isinstance(threshold, (int, float)):
            self.threshold = to_tuple(threshold, low=threshold)
        else:
            self.threshold = to_tuple(threshold, low=0)

    def apply(self, image, threshold=0, **params):
        return F.solarize(image, threshold)

    def get_params(self):
        return {
            "threshold": random.uniform(self.threshold[0], self.threshold[1])
        }

    def get_transform_init_args_names(self):
        return ("threshold", )
# F.solarize
def solarize(img, threshold=128):
    """Invert all pixel values above a threshold.

    Args:
        img (numpy.ndarray): The image to solarize.
        threshold (int): All pixels above this greyscale level are inverted.

    Returns:
        numpy.ndarray: Solarized image.

    """
    dtype = img.dtype
    max_val = MAX_VALUES_BY_DTYPE[dtype]

    if dtype == np.dtype("uint8"):
        lut = [(i if i < threshold else max_val - i) for i in range(max_val + 1)]

        prev_shape = img.shape
        img = cv2.LUT(img, np.array(lut, dtype=dtype))

        if len(prev_shape) != len(img.shape):
            img = np.expand_dims(img, -1)
        return img

    result_img = img.copy()
    cond = img >= threshold
    result_img[cond] = max_val - result_img[cond]
    return result_img



Spatter

Function: spatter effect; it can simulate rain or mud occluding the lens.
Parameter description: mean (float, or tuple of floats): mean of the normal distribution that generates the liquid layer. A single number is used directly as the mean; an interval means a value is sampled from [mean[0], mean[1]) as the mean. Default: 0.65
std (float, or tuple of floats): standard deviation of the normal distribution that generates the liquid layer. A single number is used directly as the std; an interval means a value is sampled from [std[0], std[1]) as the std. Default: 0.3
gauss_sigma (float, or tuple of floats): sigma of the Gaussian filter applied to the liquid layer. A single number is used directly as sigma; an interval means a value is sampled from [sigma[0], sigma[1]) as sigma. Default: 2
cutout_threshold (float, or tuple of floats): threshold for filtering the liquid layer. A single number is used directly as the threshold; an interval means a value is sampled from [cutout_threshold[0], cutout_threshold[1]) as the threshold. Default: 0.68
intensity (float, or tuple of floats): intensity of the spatter. A single number is used directly; an interval means a value is sampled from [intensity[0], intensity[1]). Default: 0.6
mode (string, or list of strings): spatter type. The supported options are "rain" and "mud". Passing mode=["rain", "mud"] means one mode is picked at random for the current image. Default: "rain"

mean, std and gauss_sigma affect the size of the raindrops or mud stains.
cutout_threshold affects the coverage density and area of the raindrops or mud stains.
intensity affects the severity of the rain or mud stains.
If any of these values need adjusting, only fine-tune them!
The detailed visual comparisons can be found after the source code.

Note: the mean parameter should not deviate far from 0.65; the default is recommended. Setting it to 0.5 raises an error in rain mode and no valid result is produced; setting it too high makes the image deviate completely from the desired result. A usage sketch follows.
Error message: divide by zero encountered in true_divide, at m *= 1 / np.max(m, axis=(0, 1))
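A minimal usage sketch (illustrative values; passing a list for mode lets the transform pick rain or mud per image):

# illustrative usage sketch; Spatter requires albumentations >= 1.3.0
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# Keep mean close to the 0.65 default (see the note above).
spatter = A.Spatter(mean=0.65, std=0.3, gauss_sigma=2,
                    cutout_threshold=0.68, intensity=0.6,
                    mode=["rain", "mud"], p=1.0)
out = spatter(image=img)["image"]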

The figures below show the results of different mean values in rain mode and in mud mode.

# source code
class Spatter(ImageOnlyTransform):
    """
    Apply spatter transform. It simulates corruption which can occlude a lens in the form of rain or mud.

    Args:
        mean (float, or tuple of floats): Mean value of normal distribution for generating liquid layer.
            If single float it will be used as mean.
            If tuple of float mean will be sampled from range `[mean[0], mean[1])`. Default: (0.65).
        std (float, or tuple of floats): Standard deviation value of normal distribution for generating liquid layer.
            If single float it will be used as std.
            If tuple of float std will be sampled from range `[std[0], std[1])`. Default: (0.3).
        gauss_sigma (float, or tuple of floats): Sigma value for gaussian filtering of liquid layer.
            If single float it will be used as gauss_sigma.
            If tuple of float gauss_sigma will be sampled from range `[sigma[0], sigma[1])`. Default: (2).
        cutout_threshold (float, or tuple of floats): Threshold for filtering liqued layer
            (determines number of drops). If single float it will used as cutout_threshold.
            If tuple of float cutout_threshold will be sampled from range `[cutout_threshold[0], cutout_threshold[1])`.
            Default: (0.68).
        intensity (float, or tuple of floats): Intensity of corruption.
            If single float it will be used as intensity.
            If tuple of float intensity will be sampled from range `[intensity[0], intensity[1])`. Default: (0.6).
        mode (string, or list of strings): Type of corruption. Currently, supported options are 'rain' and 'mud'.
             If list is provided type of corruption will be sampled list. Default: ("rain").
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32

    Reference:
    |  https://arxiv.org/pdf/1903.12261.pdf
    |  https://github.com/hendrycks/robustness/blob/master/ImageNet-C/create_c/make_imagenet_c.py
    """

    def __init__(
        self,
        mean: ScaleFloatType = 0.65,
        std: ScaleFloatType = 0.3,
        gauss_sigma: ScaleFloatType = 2,
        cutout_threshold: ScaleFloatType = 0.68,
        intensity: ScaleFloatType = 0.6,
        mode: Union[str, Sequence[str]] = "rain",
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)

        self.mean = to_tuple(mean, mean)
        self.std = to_tuple(std, std)
        self.gauss_sigma = to_tuple(gauss_sigma, gauss_sigma)
        self.intensity = to_tuple(intensity, intensity)
        self.cutout_threshold = to_tuple(cutout_threshold, cutout_threshold)
        self.mode = mode if isinstance(mode, (list, tuple)) else [mode]
        for i in self.mode:
            if i not in ["rain", "mud"]:
                raise ValueError(
                    f"Unsupported color mode: {
      
      mode}. Transform supports only `rain` and `mud` mods."
                )

    def apply(self,
              img: np.ndarray,
              non_mud: Optional[np.ndarray] = None,
              mud: Optional[np.ndarray] = None,
              drops: Optional[np.ndarray] = None,
              mode: str = "",
              **params) -> np.ndarray:
        return F.spatter(img, non_mud, mud, drops, mode)

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_params_dependent_on_targets(
            self, params: Dict[str, Any]) -> Dict[str, Any]:
        h, w = params["image"].shape[:2]

        mean = random.uniform(self.mean[0], self.mean[1])
        std = random.uniform(self.std[0], self.std[1])
        cutout_threshold = random.uniform(self.cutout_threshold[0],
                                          self.cutout_threshold[1])
        sigma = random.uniform(self.gauss_sigma[0], self.gauss_sigma[1])
        mode = random.choice(self.mode)
        intensity = random.uniform(self.intensity[0], self.intensity[1])

        liquid_layer = random_utils.normal(size=(h, w), loc=mean, scale=std)
        liquid_layer = gaussian_filter(liquid_layer,
                                       sigma=sigma,
                                       mode="nearest")
        liquid_layer[liquid_layer < cutout_threshold] = 0

        if mode == "rain":
            liquid_layer = (liquid_layer * 255).astype(np.uint8)
            dist = 255 - cv2.Canny(liquid_layer, 50, 150)
            dist = cv2.distanceTransform(dist, cv2.DIST_L2, 5)
            _, dist = cv2.threshold(dist, 20, 20, cv2.THRESH_TRUNC)
            dist = blur(dist, 3).astype(np.uint8)
            dist = F.equalize(dist)

            ker = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]])
            dist = F.convolve(dist, ker)
            dist = blur(dist, 3).astype(np.float32)

            m = liquid_layer * dist
            m *= 1 / np.max(m, axis=(0, 1))

            drops = m[:, :, None] * np.array(
                [238 / 255.0, 238 / 255.0, 175 / 255.0]) * intensity
            mud = None
            non_mud = None
        else:
            m = np.where(liquid_layer > cutout_threshold, 1, 0)
            m = gaussian_filter(m.astype(np.float32),
                                sigma=sigma,
                                mode="nearest")
            m[m < 1.2 * cutout_threshold] = 0
            m = m[..., np.newaxis]
            mud = m * np.array([20 / 255.0, 42 / 255.0, 63 / 255.0])
            non_mud = 1 - m
            drops = None

        return {
            "non_mud": non_mud,
            "mud": mud,
            "drops": drops,
            "mode": mode,
        }

    def get_transform_init_args_names(
            self) -> Tuple[str, str, str, str, str, str]:
        return "mean", "std", "gauss_sigma", "intensity", "cutout_threshold", "mode"

Below, the results of varying each parameter are visualized; parameters not shown in a figure use the defaults.
(Figures: mean changes, std changes, gauss_sigma changes, cutout_threshold changes, intensity changes, and mode changes; in the mode-change figure, the bottom-right image picks rain mode at random.)


Superpixels

Conceptual background: the superpixel is an image-segmentation technique proposed and developed by Xiaofeng Ren in 2003. A superpixel is an irregular block of adjacent pixels with similar texture, color, brightness or other features that carries some visual meaning. Grouping pixels by feature similarity and expressing image features with a small number of superpixels instead of a large number of pixels greatly reduces the complexity of subsequent image processing, so superpixels are usually used as a preprocessing step in segmentation algorithms.

Function: transforms part or all of the image to its superpixel representation, using the SLIC (simple linear iterative clustering) algorithm.

Parameter description:

  • p_replace (float or tuple of float): the probability that a given segment is filled with its average color.

    p_replace=0 keeps the original image;
    p_replace=0.5 means about half of all segments are filled with their average color;
    p_replace=1.0 means every segment is filled with its average color, producing a Voronoi (Thiessen polygon) image;

  • n_segments (int or tuple of int): rough target number of superpixels to generate (the algorithm may deviate from this number)

  • max_size (int or None): maximum size of the image's longer side; if exceeded, the image is resized proportionally to this size (to speed up the algorithm) and the final result is resized back to the original size. max_size = None means no resizing.

  • interpolation (OpenCV flag): OpenCV interpolation method; default linear interpolation (cv2.INTER_LINEAR). A usage sketch follows this list.

    Possible values:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4
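A minimal usage sketch (illustrative values; the transform relies on scikit-image's SLIC implementation):

# illustrative usage sketch; requires scikit-image to be installed
import cv2
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# About 150 segments, roughly 70% of which are replaced by their average color.
sp = A.Superpixels(p_replace=0.7, n_segments=150, max_size=128,
                   interpolation=cv2.INTER_LINEAR, p=1.0)
out = sp(image=img)["image"]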

# source code
class Superpixels(ImageOnlyTransform):
    """Transform images partially/completely to their superpixel representation.
    This implementation uses skimage's version of the SLIC algorithm.

    Args:
        p_replace (float or tuple of float): Defines for any segment the probability that the pixels within that
            segment are replaced by their average color (otherwise, the pixels are not changed).
            Examples:
                * A probability of ``0.0`` would mean, that the pixels in no
                  segment are replaced by their average color (image is not
                  changed at all).
                * A probability of ``0.5`` would mean, that around half of all
                  segments are replaced by their average color.
                * A probability of ``1.0`` would mean, that all segments are
                  replaced by their average color (resulting in a voronoi
                  image).
            Behaviour based on chosen data types for this parameter:
                * If a ``float``, then that ``flat`` will always be used.
                * If ``tuple`` ``(a, b)``, then a random probability will be
                  sampled from the interval ``[a, b]`` per image.
        n_segments (int, or tuple of int): Rough target number of how many superpixels to generate (the algorithm
            may deviate from this number). Lower value will lead to coarser superpixels.
            Higher values are computationally more intensive and will hence lead to a slowdown
            * If a single ``int``, then that value will always be used as the
              number of segments.
            * If a ``tuple`` ``(a, b)``, then a value from the discrete
              interval ``[a..b]`` will be sampled per image.
        max_size (int or None): Maximum image size at which the augmentation is performed.
            If the width or height of an image exceeds this value, it will be
            downscaled before the augmentation so that the longest side matches `max_size`.
            This is done to speed up the process. The final output image has the same size as the input image.
            Note that in case `p_replace` is below ``1.0``,
            the down-/upscaling will affect the not-replaced pixels too.
            Use ``None`` to apply no down-/upscaling.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image
    """

    def __init__(
        self,
        p_replace: Union[float, Sequence[float]] = 0.1,
        n_segments: Union[int, Sequence[int]] = 100,
        max_size: Optional[int] = 128,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.p_replace = to_tuple(p_replace, p_replace)
        self.n_segments = to_tuple(n_segments, n_segments)
        self.max_size = max_size
        self.interpolation = interpolation

        if min(self.n_segments) < 1:
            raise ValueError(f"n_segments must be >= 1. Got: {
      
      n_segments}")

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("p_replace", "n_segments", "max_size", "interpolation")

    def get_params(self) -> dict:
        n_segments = random.randint(*self.n_segments)
        p = random.uniform(*self.p_replace)
        return {"replace_samples": random_utils.random(n_segments) < p, "n_segments": n_segments}

    def apply(self, img: np.ndarray, replace_samples: Sequence[bool] = (False,), n_segments: int = 1, **kwargs):
        return F.superpixels(img, n_segments, replace_samples, self.max_size, self.interpolation)

Below are the visualization results.
The larger n_segments is, the more segments the image is divided into.
The larger p_replace is, the higher the probability that a segment is filled with a uniform color, i.e. more segments get filled.


Extended reading:
Dragon begets dragon, phoenix begets phoenix, SLIC begets superpixels



ToFloat

Function: divides by the maximum value, converting the input to float32 so that pixel values fall in [0, 1.0].
If no maximum value is specified, it is inferred from the image dtype:

MAX_VALUES_BY_DTYPE = {
    np.dtype("uint8"): 255,
    np.dtype("uint16"): 65535,
    np.dtype("uint32"): 4294967295,
    np.dtype("float32"): 1.0,
}

Its inverse is FromFloat, i.e. img([0, 1.0]) * max_value, as sketched below.
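A minimal round-trip sketch (illustrative; note that FromFloat truncates when casting back, so the round trip may be off by one for some values):

# illustrative usage sketch
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)

f = A.ToFloat(p=1.0)(image=img)["image"]  # uint8 -> float32 in [0, 1]
assert f.dtype == np.float32 and f.max() <= 1.0

restored = A.FromFloat(dtype="uint8", p=1.0)(image=f)["image"]
assert restored.dtype == np.uint8  # values may differ by 1 due to truncation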

# source code
class ToFloat(ImageOnlyTransform):
    """Divide pixel values by `max_value` to get a float32 output array where all values lie in the range [0, 1.0].
    If `max_value` is None the transform will try to infer the maximum value by inspecting the data type of the input
    image.

    See Also:
        :class:`~albumentations.augmentations.transforms.FromFloat`

    Args:
        max_value (float): maximum possible input value. Default: None.
        p (float): probability of applying the transform. Default: 1.0.

    Targets:
        image

    Image types:
        any type

    """

    def __init__(self, max_value=None, always_apply=False, p=1.0):
        super(ToFloat, self).__init__(always_apply, p)
        self.max_value = max_value

    def apply(self, img, **params):
        return F.to_float(img, self.max_value)

    def get_transform_init_args_names(self):
        return ("max_value",)
# F.to_float()
def to_float(img, max_value=None):
    if max_value is None:
        try:
            max_value = MAX_VALUES_BY_DTYPE[img.dtype]
        except KeyError:
            raise RuntimeError(
                "Can't infer the maximum value for dtype {}. You need to specify the maximum value manually by "
                "passing the max_value argument".format(img.dtype)
            )
    return img.astype("float32") / max_value

ToGray

Function: converts the image to grayscale at random (according to p). Note that the transformed grayscale image still has 3 channels.

# source code
class ToGray(ImageOnlyTransform):
    """Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater
    than 127, invert the resulting grayscale image.

    Args:
        p (float): probability of applying the transform. Default: 0.5. # p=1 converts every image to grayscale.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def apply(self, img, **params):
        if is_grayscale_image(img):
            warnings.warn("The image is already gray.")
            return img
        if not is_rgb_image(img):
            raise TypeError("ToGray transformation expects 3-channel images.")

        return F.to_gray(img)

    def get_transform_init_args_names(self):
        return ()
# F.to_gray(img)
def to_gray(img):
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)  # convert the grayscale image back to three channels

The visualization result is below: note the "x24BPP" under the grayscale image, which indicates a three-channel image.


ToRGB

Function: converts a grayscale image into a three-channel grayscale (RGB) image.

This transform is not included in version 1.3.0; upgrade to a newer release to use it.
This transform defaults to p=1 (ToGray defaults to p=0.5). A usage sketch follows.
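A minimal sketch (illustrative; requires a release newer than 1.3.0, as noted above):

# illustrative usage sketch
import albumentations as A
import numpy as np

gray = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # 2-dim grayscale input

out = A.ToRGB(p=1.0)(image=gray)["image"]
assert out.shape == (128, 128, 3)  # three identical channels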

# source code
class ToRGB(ImageOnlyTransform):
    """Convert the input grayscale image to RGB.
    Args:
        p (float): probability of applying the transform. Default: 1.
    Targets:
        image
    Image types:
        uint8, float32
    """

    def __init__(self, always_apply=True, p=1.0):
        super(ToRGB, self).__init__(always_apply=always_apply, p=p)

    def apply(self, img, **params):
        if is_rgb_image(img):
            warnings.warn("The image is already an RGB.")
            return img
        if not is_grayscale_image(img):
            raise TypeError("ToRGB transformation expects 2-dim images or 3-dim with the last dimension equal to 1.")

        return F.gray_to_rgb(img)

    def get_transform_init_args_names(self):
        return ()
# F.gray_to_rgb(img)
def gray_to_rgb(img):
    return cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)

ToSepia

Function: applies a sepia filter to the image.

# source code
class ToSepia(ImageOnlyTransform):
    """Applies sepia filter to the input RGB image

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        uint8, float32
    """

    def __init__(self, always_apply=False, p=0.5):
        super(ToSepia, self).__init__(always_apply, p)
        self.sepia_transformation_matrix = np.matrix(
            [[0.393, 0.769, 0.189], [0.349, 0.686, 0.168], [0.272, 0.534, 0.131]]
        )

    def apply(self, image, **params):
        if not is_rgb_image(image):
            raise TypeError("ToSepia transformation expects 3-channel images.")
        return F.linear_transformation_rgb(image, self.sepia_transformation_matrix)

    def get_transform_init_args_names(self):
        return ()
# F.linear_transformation_rgb
@clipped
def linear_transformation_rgb(img, transformation_matrix):
    result_img = cv2.transform(img, transformation_matrix)

    return result_img



UnsharpMask

Function: sharpens images using the USM (unsharp mask) algorithm.

It sharpens the input image via unsharp masking and overlays the result with the original image.

Parameter description:

  1. Main parameters and default values:

    blur_limit: Union[int, Sequence[int]] = (3, 7),
    sigma_limit: Union[float, Sequence[float]] = 0.0,
    alpha: Union[float, Sequence[float]] = (0.2, 0.5),
    threshold: int = 10
    
  2. Parameter requirements:

    • blur_limit (int or (int, int)): maximum Gaussian kernel size for blurring the input image. Must be 0 or odd; the valid range is [0, inf).
      If 0, it is replaced by the computed value round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1.
      If the input is a single number, it is converted to the interval (0, blur_limit).

      The initialization in the source contains the following line:
      self.blur_limit = to_tuple(blur_limit, 3)  # 3 is the fill value for the other bound
      Examples:
      self.blur_limit = to_tuple(1, 3) # self.blur_limit = (1, 3)
      self.blur_limit = to_tuple(5, 3) # self.blur_limit = (3, 5)
      
    • sigma_limit (float or (float, float)): standard deviation of the Gaussian kernel; valid range [0.0, inf).
      If 0, it is replaced by the computed value sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8.
      If the input is a single number, it is converted to the interval (0, sigma_limit).

    • alpha (float or (float, float)): controls the visibility of the sharpened image. The result is an overlay of the sharpened image on the original, and alpha controls the sharpened image's share. alpha = 0 returns only the original image; alpha = 1 overlays the full sharpened result.

      residual = image - blur # blur is the image after Gaussian blur (cv2.GaussianBlur)
      sharp = image + alpha * residual
      # Avoid color noise artefacts.
      sharp = np.clip(sharp, 0, 1)
      
    • threshold (int): limits sharpening to areas with a large pixel difference between the original image and its smoothed version. Valid range [0, 255]. The larger the threshold, the less flat areas (areas where the original and the smoothed image differ little) are sharpened; as the region where (image - blur)*255 < threshold grows, those pixels are excluded from the sharpening overlay.
      In other words, the larger the value, the weaker the overall sharpening.

      residual = image - blur # blur is the image after Gaussian blur (cv2.GaussianBlur)
      # Do not sharpen noise
      mask = np.abs(residual) * 255 > threshold
      mask = mask.astype("float32")
      
    • Note: the lower bounds of blur_limit and sigma_limit cannot both be 0. A usage sketch follows this list.
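A minimal usage sketch (illustrative values matching the visualization note below: fixed 5x5 kernel, full overlay, sharpening everywhere):

# illustrative usage sketch
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# blur_limit=(5, 5) fixes ksize at 5; alpha=1 overlays the full sharpened
# result; threshold=0 sharpens flat areas too.
usm = A.UnsharpMask(blur_limit=(5, 5), sigma_limit=0.0,
                    alpha=(1.0, 1.0), threshold=0, p=1.0)
out = usm(image=img)["image"]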

# source code
class UnsharpMask(ImageOnlyTransform):
    """
    Sharpen the input image using Unsharp Masking processing and overlays the result with the original image.

    Args:
        blur_limit (int, (int, int)): maximum Gaussian kernel size for blurring the input image.
            Must be zero or odd and in range [0, inf). If set to 0 it will be computed from sigma
            as `round(sigma * (3 if img.dtype == np.uint8 else 4) * 2 + 1) + 1`.
            If set single value `blur_limit` will be in range (0, blur_limit).
            Default: (3, 7).
        sigma_limit (float, (float, float)): Gaussian kernel standard deviation. Must be in range [0, inf).
            If set single value `sigma_limit` will be in range (0, sigma_limit).
            If set to 0 sigma will be computed as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`. Default: 0.
        alpha (float, (float, float)): range to choose the visibility of the sharpened image.
            At 0, only the original image is visible, at 1.0 only its sharpened version is visible.
            Default: (0.2, 0.5).
        threshold (int): Value to limit sharpening only for areas with high pixel difference between original image
            and it's smoothed version. Higher threshold means less sharpening on flat areas.
            Must be in range [0, 255]. Default: 10.
        p (float): probability of applying the transform. Default: 0.5.

    Reference:
        arxiv.org/pdf/2107.10833.pdf

    Targets:
        image
    """

    def __init__(
        self,
        blur_limit: Union[int, Sequence[int]] = (3, 7),
        sigma_limit: Union[float, Sequence[float]] = 0.0,
        alpha: Union[float, Sequence[float]] = (0.2, 0.5),
        threshold: int = 10,
        always_apply=False,
        p=0.5,
    ):
        super(UnsharpMask, self).__init__(always_apply, p)
        self.blur_limit = to_tuple(blur_limit, 3)
        self.sigma_limit = self.__check_values(to_tuple(sigma_limit, 0.0), name="sigma_limit")
        self.alpha = self.__check_values(to_tuple(alpha, 0.0), name="alpha", bounds=(0.0, 1.0))
        self.threshold = threshold

        if self.blur_limit[0] == 0 and self.sigma_limit[0] == 0:
            self.blur_limit = 3, max(3, self.blur_limit[1])
            raise ValueError("blur_limit and sigma_limit minimum value can not be both equal to 0.")

        if (self.blur_limit[0] != 0 and self.blur_limit[0] % 2 != 1) or (
            self.blur_limit[1] != 0 and self.blur_limit[1] % 2 != 1
        ):
            raise ValueError("UnsharpMask supports only odd blur limits.")

    @staticmethod
    def __check_values(value, name, bounds=(0, float("inf"))):
        if not bounds[0] <= value[0] <= value[1] <= bounds[1]:
            raise ValueError(f"{
      
      name} values should be between {
      
      bounds}")
        return value

    def get_params(self):
        return {
            "ksize": random.randrange(self.blur_limit[0], self.blur_limit[1] + 1, 2),
            "sigma": random.uniform(*self.sigma_limit),
            "alpha": random.uniform(*self.alpha),
        }

    def apply(self, img, ksize=3, sigma=0, alpha=0.2, **params):
        return F.unsharp_mask(img, ksize, sigma=sigma, alpha=alpha, threshold=self.threshold)

    def get_transform_init_args_names(self):
        return ("blur_limit", "sigma_limit", "alpha", "threshold")
# F.unsharp_mask()
def unsharp_mask(image: np.ndarray, ksize: int, sigma: float = 0.0, alpha: float = 0.2, threshold: int = 10):
    blur_fn = _maybe_process_in_chunks(cv2.GaussianBlur, ksize=(ksize, ksize), sigmaX=sigma)

    input_dtype = image.dtype
    if input_dtype == np.uint8:
        image = to_float(image)
    elif input_dtype not in (np.uint8, np.float32):
        raise ValueError("Unexpected dtype {} for UnsharpMask augmentation".format(input_dtype))

    blur = blur_fn(image)
    residual = image - blur

    # Do not sharpen noise
    mask = np.abs(residual) * 255 > threshold
    mask = mask.astype("float32")

    sharp = image + alpha * residual
    # Avoid color noise artefacts.
    sharp = np.clip(sharp, 0, 1)

    soft_mask = blur_fn(mask)
    output = soft_mask * sharp + (1 - soft_mask) * image
    return from_float(output, dtype=input_dtype)

The visualization results are below: the left side is the original image, the right side the sharpened result. To make the effect obvious, the right image uses (ksize=5, sigma=0, alpha=1, threshold=0).

Extended reading:
The principle of the Unsharp Mask (USM) sharpening algorithm and its implementation
Super-resolution paper reading: Real-ESRGAN (ICCV 2021)


ZoomBlur

Function: zoom blur.
Parameter description:
max_factor ((float, float) or float): range for the maximum blur factor; all values must be greater than 1. A single number means a value is sampled from (1, max_factor). Default: (1, 1.31).
step_factor ((float, float) or float): step of the zoom factor (used as the step of np.arange). Default: (0.01, 0.03). A usage sketch follows.
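A minimal usage sketch (illustrative values; the zoom factors are swept over np.arange(1.0, max_factor, step_factor), as the source below shows):

# illustrative usage sketch; ZoomBlur requires albumentations >= 1.3.0
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

zb = A.ZoomBlur(max_factor=(1.05, 1.2), step_factor=(0.01, 0.03), p=1.0)
out = zb(image=img)["image"]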

# source code
class ZoomBlur(ImageOnlyTransform):
    """
    Apply zoom blur transform. See https://arxiv.org/abs/1903.12261.

    Args:
        max_factor ((float, float) or float): range for max factor for blurring.
            If max_factor is a single float, the range will be (1, limit). Default: (1, 1.31).
            All max_factor values should be larger than 1.
        step_factor ((float, float) or float): If single float will be used as step parameter for np.arange.
            If tuple of float step_factor will be in range `[step_factor[0], step_factor[1])`. Default: (0.01, 0.03).
            All step_factor values should be positive.
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image

    Image types:
        Any
    """

    def __init__(
        self,
        max_factor: ScaleFloatType = 1.31,
        step_factor: ScaleFloatType = (0.01, 0.03),
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.max_factor = to_tuple(max_factor, low=1.0)
        self.step_factor = to_tuple(step_factor, step_factor)

        if self.max_factor[0] < 1:
            raise ValueError("Max factor must be larger or equal 1")
        if self.step_factor[0] <= 0:
            raise ValueError("Step factor must be positive")

    def apply(self, img: np.ndarray, zoom_factors: np.ndarray = None, **params) -> np.ndarray:
        assert zoom_factors is not None
        return F.zoom_blur(img, zoom_factors)

    def get_params(self) -> Dict[str, Any]:
        max_factor = random.uniform(self.max_factor[0], self.max_factor[1])
        step_factor = random.uniform(self.step_factor[0], self.step_factor[1])
        return {"zoom_factors": np.arange(1.0, max_factor, step_factor)}

    def get_transform_init_args_names(self) -> Tuple[str, str]:
        return ("max_factor", "step_factor")



Spatial-level transforms

Spatial-level transforms change the input image together with additional targets such as masks, bounding boxes and keypoints.

The official documentation provides a table of which targets (image, masks, bboxes, keypoints) each transform supports; the spatial-level transforms are:

Affine
BBoxSafeRandomCrop
CenterCrop
CoarseDropout
Crop
CropAndPad
CropNonEmptyMaskIfExists
ElasticTransform
Flip
GridDistortion
GridDropout
HorizontalFlip
Lambda
LongestMaxSize
MaskDropout
NoOp
OpticalDistortion
PadIfNeeded
Perspective
PiecewiseAffine
PixelDropout
RandomCrop
RandomCropFromBorders
RandomCropNearBBox
RandomGridShuffle
RandomResizedCrop
RandomRotate90
RandomScale
RandomSizedBBoxSafeCrop
RandomSizedCrop
Resize
Rotate
SafeRotate
ShiftScaleRotate
SmallestMaxSize
Transpose
VerticalFlip

BBoxSafeRandomCrop

Function: random crop whose area contains all the bboxes, i.e. the crop boundary lies between the rectangle circumscribing all bboxes and the image border.
Parameter description:
erosion_rate (float): erosion rate, default 0.0. The proportion by which the region may be shrunk before cropping. A usage sketch follows.
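A minimal usage sketch (the boxes and labels are illustrative; bboxes require bbox_params in A.Compose):

# illustrative usage sketch
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
bboxes = [(50, 60, 200, 220), (400, 300, 560, 430)]  # pascal_voc: x_min, y_min, x_max, y_max
labels = ["bird", "butterfly"]

transform = A.Compose(
    [A.BBoxSafeRandomCrop(erosion_rate=0.0, p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
out = transform(image=img, bboxes=bboxes, labels=labels)
assert len(out["bboxes"]) == len(bboxes)  # all boxes survive the crop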

# source code
class BBoxSafeRandomCrop(DualTransform):
    """Crop a random part of the input without loss of bboxes.
    Args:
        erosion_rate (float): erosion rate applied on input image height before crop.
        p (float): probability of applying the transform. Default: 1.
    Targets:
        image, mask, bboxes
    Image types:
        uint8, float32
    """

    def __init__(self, erosion_rate=0.0, always_apply=False, p=1.0):
        super(BBoxSafeRandomCrop, self).__init__(always_apply, p)
        self.erosion_rate = erosion_rate

    def apply(self, img, crop_height=0, crop_width=0, h_start=0, w_start=0, **params):
        return F.random_crop(img, crop_height, crop_width, h_start, w_start)

    def get_params_dependent_on_targets(self, params):
        img_h, img_w = params["image"].shape[:2]
        if len(params["bboxes"]) == 0:  # less likely, this class is for use with bboxes.
            erosive_h = int(img_h * (1.0 - self.erosion_rate))
            crop_height = img_h if erosive_h >= img_h else random.randint(erosive_h, img_h)
            return {
                "h_start": random.random(),
                "w_start": random.random(),
                "crop_height": crop_height,
                "crop_width": int(crop_height * img_w / img_h),
            }
        # get union of all bboxes
        x, y, x2, y2 = union_of_bboxes(
            width=img_w, height=img_h, bboxes=params["bboxes"], erosion_rate=self.erosion_rate
        )
        # find bigger region
        bx, by = x * random.random(), y * random.random()
        bx2, by2 = x2 + (1 - x2) * random.random(), y2 + (1 - y2) * random.random()
        bw, bh = bx2 - bx, by2 - by
        crop_height = img_h if bh >= 1.0 else int(img_h * bh)
        crop_width = img_w if bw >= 1.0 else int(img_w * bw)
        h_start = np.clip(0.0 if bh >= 1.0 else by / (1.0 - bh), 0.0, 1.0)
        w_start = np.clip(0.0 if bw >= 1.0 else bx / (1.0 - bw), 0.0, 1.0)
        return {"h_start": h_start, "w_start": w_start, "crop_height": crop_height, "crop_width": crop_width}

    def apply_to_bbox(self, bbox, crop_height=0, crop_width=0, h_start=0, w_start=0, rows=0, cols=0, **params):
        return F.bbox_random_crop(bbox, crop_height, crop_width, h_start, w_start, rows, cols)

    @property
    def targets_as_params(self):
        return ["image", "bboxes"]

    def get_transform_init_args_names(self):
        return ("erosion_rate",)

In the figure below, the bboxes contain the butterfly and bird coordinates; every crop result contains the bboxes, and the image size changes after cropping.

CenterCrop

Function: crops the central region of the image.
Parameter description: height, width (int): height and width of the crop region.

# source code
class CenterCrop(DualTransform):
    """Crop the central part of the input.

    Args:
        height (int): height of the crop.
        width (int): width of the crop.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Note:
        It is recommended to use uint8 images as input.
        Otherwise the operation will require internal conversion
        float32 -> uint8 -> float32 that causes worse performance.
    """

    def __init__(self, height, width, always_apply=False, p=1.0):
        super(CenterCrop, self).__init__(always_apply, p)
        self.height = height
        self.width = width

    def apply(self, img, **params):
        return F.center_crop(img, self.height, self.width)

    def apply_to_bbox(self, bbox, **params):
        return F.bbox_center_crop(bbox, self.height, self.width, **params)

    def apply_to_keypoint(self, keypoint, **params):
        return F.keypoint_center_crop(keypoint, self.height, self.width, **params)

    def get_transform_init_args_names(self):
        return ("height", "width")
# F.center_crop
def get_center_crop_coords(height: int, width: int, crop_height: int, crop_width: int):
    y1 = (height - crop_height) // 2
    y2 = y1 + crop_height
    x1 = (width - crop_width) // 2
    x2 = x1 + crop_width
    return x1, y1, x2, y2


def center_crop(img: np.ndarray, crop_height: int, crop_width: int):
    height, width = img.shape[:2]
    if height < crop_height or width < crop_width:
        raise ValueError(
            "Requested crop size ({crop_height}, {crop_width}) is "
            "larger than the image size ({height}, {width})".format(
                crop_height=crop_height, crop_width=crop_width, height=height, width=width
            )
        )
    x1, y1, x2, y2 = get_center_crop_coords(height, width, crop_height, crop_width)
    img = img[y1:y2, x1:x2]
    return img

You can see that the bird's beak sits slightly above the center in each cropped image.


CoarseDropout

Function: randomly drops rectangular regions of the image and fills them with a fixed value. (This covers Cutout's functionality and additionally handles masks.)
Parameter description:

  • max_holes (int): maximum number of regions to cut out.

  • max_height, max_width (int, float): maximum hole size. If float, it is computed from the image size (image height/width * float value).

  • min_holes (int): minimum number of regions to cut out. If None, it equals max_holes. Default: None.

  • min_height, min_width (int, float): minimum hole size. If None, they equal the corresponding max values. Default: None.
    If float, they are computed from the image size (image height/width * float value).

  • fill_value (int, float, list of int, list of float): pixel fill value for the cutout regions.

  • mask_fill_value (int, float, list of int, list of float): pixel fill value for the cutout regions of the mask. If None, the mask is returned unchanged. Default: None. (A usage sketch follows this list.)
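A minimal usage sketch (illustrative values; the same holes are cut out of the mask when mask_fill_value is set):

# illustrative usage sketch
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (256, 256), dtype=np.uint8)

# 4 to 8 holes, each between 16x16 and 32x32 pixels; dropped pixels become
# 0 in the image and 255 in the mask.
drop = A.CoarseDropout(max_holes=8, max_height=32, max_width=32,
                       min_holes=4, min_height=16, min_width=16,
                       fill_value=0, mask_fill_value=255, p=1.0)
out = drop(image=img, mask=mask)
img_out, mask_out = out["image"], out["mask"]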

# constructor only; the remaining methods are not copied here (see the full source via the title link)
class CoarseDropout(DualTransform):
    """CoarseDropout of the rectangular regions in the image.

    Args:
        max_holes (int): Maximum number of regions to zero out.
        max_height (int, float): Maximum height of the hole.
        If float, it is calculated as a fraction of the image height.
        max_width (int, float): Maximum width of the hole.
        If float, it is calculated as a fraction of the image width.
        min_holes (int): Minimum number of regions to zero out. If `None`,
            `min_holes` is be set to `max_holes`. Default: `None`.
        min_height (int, float): Minimum height of the hole. Default: None. If `None`,
            `min_height` is set to `max_height`. Default: `None`.
            If float, it is calculated as a fraction of the image height.
        min_width (int, float): Minimum width of the hole. If `None`, `min_height` is
            set to `max_width`. Default: `None`.
            If float, it is calculated as a fraction of the image width.

        fill_value (int, float, list of int, list of float): value for dropped pixels.
        mask_fill_value (int, float, list of int, list of float): fill value for dropped pixels
            in mask. If `None` - mask is not affected. Default: `None`.

    Targets:
        image, mask

    Image types:
        uint8, float32

    Reference:
    |  https://arxiv.org/abs/1708.04552
    |  https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
    |  https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py
    """

    def __init__(
        self,
        max_holes=8,
        max_height=8,
        max_width=8,
        min_holes=None,
        min_height=None,
        min_width=None,
        fill_value=0,
        mask_fill_value=None,
        always_apply=False,
        p=0.5,
    ):
        super(CoarseDropout, self).__init__(always_apply, p)
        self.max_holes = max_holes
        self.max_height = max_height
        self.max_width = max_width
        self.min_holes = min_holes if min_holes is not None else max_holes
        self.min_height = min_height if min_height is not None else max_height
        self.min_width = min_width if min_width is not None else max_width
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value
        if not 0 < self.min_holes <= self.max_holes:
            raise ValueError("Invalid combination of min_holes and max_holes. Got: {}".format([min_holes, max_holes]))

        self.check_range(self.max_height)
        self.check_range(self.min_height)
        self.check_range(self.max_width)
        self.check_range(self.min_width)

        if not 0 < self.min_height <= self.max_height:
            raise ValueError(
                "Invalid combination of min_height and max_height. Got: {}".format([min_height, max_height])
            )
        if not 0 < self.min_width <= self.max_width:
            raise ValueError("Invalid combination of min_width and max_width. Got: {}".format([min_width, max_width]))

    def check_range(self, dimension):
        if isinstance(dimension, float) and not 0 <= dimension < 1.0:
            raise ValueError(
                "Invalid value {}. If using floats, the value should be in the range [0.0, 1.0)".format(dimension)
            )
    ...
    ...
    ...

Parameters not stated in the figure use the default values.


Crop

Function: crops the image and returns the cropped region.
Parameter description:

x_min (int): top-left x coordinate of the crop region. Default: 0
y_min (int): top-left y coordinate of the crop region. Default: 0
x_max (int): bottom-right x coordinate of the crop region. Default: 1024
y_max (int): bottom-right y coordinate of the crop region. Default: 1024

Note that this transform has no randomness; it is equivalent to img[y_min:y_max, x_min:x_max].

# source code
class Crop(DualTransform):
    """Crop region from image.

    Args:
        x_min (int): Minimum upper left x coordinate.
        y_min (int): Minimum upper left y coordinate.
        x_max (int): Maximum lower right x coordinate.
        y_max (int): Maximum lower right y coordinate.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def __init__(self, x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0):
        super(Crop, self).__init__(always_apply, p)
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max

    def apply(self, img, **params):
        return F.crop(img, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max)

    def apply_to_bbox(self, bbox, **params):
        return F.bbox_crop(bbox, x_min=self.x_min, y_min=self.y_min, x_max=self.x_max, y_max=self.y_max, **params)

    def apply_to_keypoint(self, keypoint, **params):
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(self.x_min, self.y_min, self.x_max, self.y_max))

    def get_transform_init_args_names(self):
        return ("x_min", "y_min", "x_max", "y_max")

(The side-by-side plt display rescales the images; note the cropped facial region.)

CropAndPad

Function: crops or pads the four edges of the image by a pixel amount or a fraction of the image size. This transform never crops an image below a height or width of 1.
Note that it resizes the transformed image back to the original size; to keep the transformed size, set keep_size=False.
Parameter description:

  • px (int or tuple), percent (float or tuple):
    • px is an absolute pixel count; percent is a fraction (pixel count divided by width or height).
    • px or percent values below 0 mean a crop; values above 0 mean a pad.
    • Exactly one of the two may be given; the other must be None. They cannot both be set, nor both be None.
    • If a two-element value is passed, the px/percent value for each of the four sides is sampled from that interval. With sample_independently=False, only one value is sampled and shared by all four sides.
    • If a four-element value is passed, the entries represent top, right, bottom, left (clockwise); each entry may be a single number (a fixed value) or a two-number list (sampled from that range), with the same meaning as above.
  • pad_mode (int): OpenCV border mode for filling border pixels. Possible values: cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (reflect), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (reflect). Default: cv2.BORDER_CONSTANT
  • pad_cval (number, Sequence[number]): border fill value (only for pad_mode=cv2.BORDER_CONSTANT).
    A single number is used directly as the fill value; a two-element list means a value is sampled from that interval as the image's border fill value.
    @staticmethod
    def _get_pad_value(pad_value: Union[float, Sequence[float]]) -> Union[int, float]:
        if isinstance(pad_value, (int, float)):
            return pad_value
    
        if len(pad_value) == 2:
            a, b = pad_value
            if isinstance(a, int) and isinstance(b, int):
                return random.randint(a, b)
    
            return random.uniform(a, b)
    
        return random.choice(pad_value)
    
  • pad_cval_mask (number, Sequence[number]): same as pad_cval, but applied to masks.
  • keep_size (bool): cropping or padding changes the image size. If True, the result is resized back to the input image size; if False, the changed size is kept. Default: True.
  • sample_independently (bool): whether the px/percent values of the four sides are sampled independently. Default: True. (A usage sketch follows this list.)
  • interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
    cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
    Default: cv2.INTER_LINEAR.
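A minimal usage sketch (illustrative; a single positive percent pads every side by that fraction, a negative one crops, and keep_size=False keeps the new size):

# illustrative usage sketch
import cv2
import albumentations as A
import numpy as np

img = np.random.randint(0, 256, (200, 300, 3), dtype=np.uint8)

# Pad each side by 10% of the corresponding image dimension.
pad = A.CropAndPad(percent=0.1, pad_mode=cv2.BORDER_CONSTANT, pad_cval=0,
                   keep_size=False, p=1.0)
assert pad(image=img)["image"].shape[:2] == (240, 360)

# Negative percent crops 10% from each side instead.
crop = A.CropAndPad(percent=-0.1, keep_size=False, p=1.0)
assert crop(image=img)["image"].shape[:2] == (160, 240)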
# 构造函数
class CropAndPad(DualTransform):
    """Crop and pad images by pixel amounts or fractions of image sizes.
    Cropping removes pixels at the sides (i.e. extracts a subimage from a given full image).
    Padding adds pixels to the sides (e.g. black pixels).
    This transformation will never crop images below a height or width of ``1``.

    Note:
        This transformation automatically resizes images back to their original size. To deactivate this, add the
        parameter ``keep_size=False``.

    Args:
        px (int or tuple):
            The number of pixels to crop (negative values) or pad (positive values)
            on each side of the image. Either this or the parameter `percent` may
            be set, not both at the same time.
                * If ``None``, then pixel-based cropping/padding will not be used.
                * If ``int``, then that exact number of pixels will always be cropped/padded.
                * If a ``tuple`` of two ``int`` s with values ``a`` and ``b``,
                  then each side will be cropped/padded by a random amount sampled
                  uniformly per image and side from the interval ``[a, b]``. If
                  however `sample_independently` is set to ``False``, only one
                  value will be sampled per image and used for all sides.
                * If a ``tuple`` of four entries, then the entries represent top,
                  right, bottom, left. Each entry may be a single ``int`` (always
                  crop/pad by exactly that value), a ``tuple`` of two ``int`` s
                  ``a`` and ``b`` (crop/pad by an amount within ``[a, b]``), a
                  ``list`` of ``int`` s (crop/pad by a random value that is
                  contained in the ``list``).
        percent (float or tuple):
            The number of pixels to crop (negative values) or pad (positive values)
            on each side of the image given as a *fraction* of the image
            height/width. E.g. if this is set to ``-0.1``, the transformation will
            always crop away ``10%`` of the image's height at both the top and the
            bottom (both ``10%`` each), as well as ``10%`` of the width at the
            right and left.
            Expected value range is ``(-1.0, inf)``.
            Either this or the parameter `px` may be set, not both
            at the same time.
                * If ``None``, then fraction-based cropping/padding will not be
                  used.
                * If ``float``, then that fraction will always be cropped/padded.
                * If a ``tuple`` of two ``float`` s with values ``a`` and ``b``,
                  then each side will be cropped/padded by a random fraction
                  sampled uniformly per image and side from the interval
                  ``[a, b]``. If however `sample_independently` is set to
                  ``False``, only one value will be sampled per image and used for
                  all sides.
                * If a ``tuple`` of four entries, then the entries represent top,
                  right, bottom, left. Each entry may be a single ``float``
                  (always crop/pad by exactly that percent value), a ``tuple`` of
                  two ``float`` s ``a`` and ``b`` (crop/pad by a fraction from
                  ``[a, b]``), a ``list`` of ``float`` s (crop/pad by a random
                  value that is contained in the list).
        pad_mode (int): OpenCV border mode.
        pad_cval (number, Sequence[number]):
            The constant value to use if the pad mode is ``BORDER_CONSTANT``.
                * If ``number``, then that value will be used.
                * If a ``tuple`` of two ``number`` s and at least one of them is
                  a ``float``, then a random number will be uniformly sampled per
                  image from the continuous interval ``[a, b]`` and used as the
                  value. If both ``number`` s are ``int`` s, the interval is
                  discrete.
                * If a ``list`` of ``number``, then a random value will be chosen
                  from the elements of the ``list`` and used as the value.
        pad_cval_mask (number, Sequence[number]): Same as pad_cval but only for masks.
        keep_size (bool):
            After cropping and padding, the result image will usually have a
            different height/width compared to the original input image. If this
            parameter is set to ``True``, then the cropped/padded image will be
            resized to the input image's size, i.e. the output shape is always identical to the input shape.
        sample_independently (bool):
            If ``False`` *and* the values for `px`/`percent` result in exactly
            *one* probability distribution for all image sides, only one single
            value will be sampled from that probability distribution and used for
            all sides. I.e. the crop/pad amount then is the same for all sides.
            If ``True``, four values will be sampled independently, one per side.
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        any
    """

    def __init__(
        self,
        px: Optional[Union[int, Sequence[float], Sequence[Tuple]]] = None,
        percent: Optional[Union[float, Sequence[float], Sequence[Tuple]]] = None,
        pad_mode: int = cv2.BORDER_CONSTANT,
        pad_cval: Union[float, Sequence[float]] = 0,
        pad_cval_mask: Union[float, Sequence[float]] = 0,
        keep_size: bool = True,
        sample_independently: bool = True,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super().__init__(always_apply, p)

        if px is None and percent is None:
            raise ValueError("px and percent are empty!")
        if px is not None and percent is not None:
            raise ValueError("Only px or percent may be set!")

        self.px = px
        self.percent = percent

        self.pad_mode = pad_mode
        self.pad_cval = pad_cval
        self.pad_cval_mask = pad_cval_mask

        self.keep_size = keep_size
        self.sample_independently = sample_independently

        self.interpolation = interpolation

In res3 (bottom right), sample_independently is set to True, so each side is padded with a different number of pixels.
In res2 (bottom left), percent is negative, i.e. the image is cropped.

[image]
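A minimal call sketch, assuming Albumentations 1.3.0+ and a hypothetical input file example.jpg (negative values crop, positive values pad):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
# crop 10% away from every side, then resize back to the input size
crop = A.CropAndPad(percent=-0.1, keep_size=True, p=1.0)
# pad each side by a per-side random amount in [10, 50] pixels with black
pad = A.CropAndPad(px=(10, 50), pad_mode=cv2.BORDER_CONSTANT, pad_cval=0,
                   sample_independently=True, p=1.0)
res_crop = crop(image=img)["image"]
res_pad = pad(image=img)["image"]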


功能: if the mask is empty, this is equivalent to a random crop; if a mask is provided, you can specify mask regions to ignore, sample a random point outside the ignored regions, and crop a region of the specified width and height. Regions where mask == 0 are ignored by default, and regions whose values are listed in ignore_values are additionally set to 0 and ignored.
The crop logic is as follows: pick a random point in the non-ignored mask region, move a random distance towards the upper left to obtain the top-left corner of the crop; the bottom-right corner is the top-left corner plus the width and height. This transform increases the probability that the target ends up inside the crop.

if mask.any():
    mask = mask.sum(axis=-1) if mask.ndim == 3 else mask
    non_zero_yx = np.argwhere(mask)
    y, x = random.choice(non_zero_yx)
    x_min = x - random.randint(0, self.width - 1)
    y_min = y - random.randint(0, self.height - 1)
    x_min = np.clip(x_min, 0, mask_width - self.width)
    y_min = np.clip(y_min, 0, mask_height - self.height)
else:
    x_min = random.randint(0, mask_width - self.width)
    y_min = random.randint(0, mask_height - self.height)

x_max = x_min + self.width
y_max = y_min + self.height

参数说明:
height, width (int): target height and width of the crop region.
ignore_values (list of int): mask pixel values to ignore; 0 is always treated as an ignored region. Note that the input must be a list.
ignore_channels (list of int): mask channels to ignore. Note that the input must be a list.

# source code
class CropNonEmptyMaskIfExists(DualTransform):
    """Crop area with mask if mask is non-empty, else make random crop.

    Args:
        height (int): vertical size of crop in pixels
        width (int): horizontal size of crop in pixels
        ignore_values (list of int): values to ignore in mask, `0` values are always ignored
            (e.g. if background value is 5 set `ignore_values=[5]` to ignore)
        ignore_channels (list of int): channels to ignore in mask
            (e.g. if background is a first channel set `ignore_channels=[0]` to ignore)
        p (float): probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def __init__(self, height, width, ignore_values=None, ignore_channels=None, always_apply=False, p=1.0):
        super(CropNonEmptyMaskIfExists, self).__init__(always_apply, p)

        if ignore_values is not None and not isinstance(ignore_values, list):
            raise ValueError("Expected `ignore_values` of type `list`, got `{}`".format(type(ignore_values)))
        if ignore_channels is not None and not isinstance(ignore_channels, list):
            raise ValueError("Expected `ignore_channels` of type `list`, got `{}`".format(type(ignore_channels)))

        self.height = height
        self.width = width
        self.ignore_values = ignore_values
        self.ignore_channels = ignore_channels

    def apply(self, img, x_min=0, x_max=0, y_min=0, y_max=0, **params):
        return F.crop(img, x_min, y_min, x_max, y_max)

    def apply_to_bbox(self, bbox, x_min=0, x_max=0, y_min=0, y_max=0, **params):
        return F.bbox_crop(
            bbox, x_min=x_min, x_max=x_max, y_min=y_min, y_max=y_max, rows=params["rows"], cols=params["cols"]
        )

    def apply_to_keypoint(self, keypoint, x_min=0, x_max=0, y_min=0, y_max=0, **params):
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    def _preprocess_mask(self, mask):
        mask_height, mask_width = mask.shape[:2]

        if self.ignore_values is not None:
            ignore_values_np = np.array(self.ignore_values)
            mask = np.where(np.isin(mask, ignore_values_np), 0, mask)

        if mask.ndim == 3 and self.ignore_channels is not None:
            target_channels = np.array([ch for ch in range(mask.shape[-1]) if ch not in self.ignore_channels])
            mask = np.take(mask, target_channels, axis=-1)

        if self.height > mask_height or self.width > mask_width:
            raise ValueError(
                "Crop size ({},{}) is larger than image ({},{})".format(
                    self.height, self.width, mask_height, mask_width
                )
            )

        return mask

    def update_params(self, params, **kwargs):
        super().update_params(params, **kwargs)
        if "mask" in kwargs:
            mask = self._preprocess_mask(kwargs["mask"])
        elif "masks" in kwargs and len(kwargs["masks"]):
            masks = kwargs["masks"]
            mask = self._preprocess_mask(masks[0])
            for m in masks[1:]:
                mask |= self._preprocess_mask(m)
        else:
            raise RuntimeError("Can not find mask for CropNonEmptyMaskIfExists")

        mask_height, mask_width = mask.shape[:2]

        if mask.any():
            mask = mask.sum(axis=-1) if mask.ndim == 3 else mask
            non_zero_yx = np.argwhere(mask)
            y, x = random.choice(non_zero_yx)
            x_min = x - random.randint(0, self.width - 1)
            y_min = y - random.randint(0, self.height - 1)
            x_min = np.clip(x_min, 0, mask_width - self.width)
            y_min = np.clip(y_min, 0, mask_height - self.height)
        else:
            x_min = random.randint(0, mask_width - self.width)
            y_min = random.randint(0, mask_height - self.height)

        x_max = x_min + self.width
        y_max = y_min + self.height

        params.update({"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max})
        return params

    def get_transform_init_args_names(self):
        return ("height", "width", "ignore_values", "ignore_channels")

In the figure below, the non-ignored mask region is the rectangle containing the bird and the butterfly; res1, res2, and res3 are random crop results.
[image]
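A minimal sketch of the call with a toy mask (the file name and the rectangle marking the target are hypothetical; the image is assumed to be at least 128x128):

# usage sketch
import albumentations as A
import cv2
import numpy as np

img = cv2.imread("example.jpg")
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[100:200, 150:300] = 1  # non-ignored target region

aug = A.CropNonEmptyMaskIfExists(height=128, width=128, p=1.0)
out = aug(image=img, mask=mask)
res_img, res_mask = out["image"], out["mask"]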

功能: elastic transform.
Official demo image:
[image]

参数说明:

  • alpha (float): distortion parameter. Default 1; larger values produce a stronger distortion (e.g. alpha=500, sigma=50).
  • sigma (float): Gaussian filter parameter. Default 50; smaller values produce a stronger distortion (e.g. alpha=100, sigma=20).
  • alpha_affine (float): affine transform parameter; a single value is converted to the interval (-alpha_affine, alpha_affine). Default 50.
  • interpolation (OpenCV flag): OpenCV interpolation method, one of cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR (masks always use cv2.INTER_NEAREST; this is hard-coded and cannot be overridden).
  • border_mode (OpenCV flag): OpenCV pixel extrapolation method, one of cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (reflect), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (reflect). Default: cv2.BORDER_REFLECT_101.
  • value (int, float, list of ints, list of floats): border fill value (only used when border_mode=cv2.BORDER_CONSTANT).
  • mask_value (int, float, list of ints, list of floats): border fill value for masks (only used when border_mode=cv2.BORDER_CONSTANT).
  • approximate (boolean): whether to smooth the displacement map with a fixed kernel size. If True, this gives roughly a 2x speedup on large (512+) images, at the cost of some jitter. Default: False.
  • same_dxdy (boolean): whether to use the same random shift for the x and y directions. If True, this gives roughly another 2x speedup, with a similar jitter side effect. Default: False.
# constructor
class ElasticTransform(DualTransform):
    """Elastic deformation of images as described in [Simard2003]_ (with modifications).
    Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
         Convolutional Neural Networks applied to Visual Document Analysis", in
         Proc. of the International Conference on Document Analysis and
         Recognition, 2003.

    Args:
        alpha (float):
        sigma (float): Gaussian filter parameter.
        alpha_affine (float): The range will be (-alpha_affine, alpha_affine)
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        approximate (boolean): Whether to smooth displacement map with fixed kernel size.
                               Enabling this option gives ~2X speedup on large images.
        same_dxdy (boolean): Whether to use same random generated shift for x and y.
                             Enabling this option gives ~2X speedup.

    Targets:
        image, mask

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        alpha=1,
        sigma=50,
        alpha_affine=50,
        interpolation=cv2.INTER_LINEAR,
        border_mode=cv2.BORDER_REFLECT_101,
        value=None,
        mask_value=None,
        always_apply=False,
        approximate=False,
        same_dxdy=False,
        p=0.5,
    ):
        super(ElasticTransform, self).__init__(always_apply, p)
        ...
        ...
    
# F.elastic_transform
@preserve_shape
def elastic_transform(
    img,
    alpha,
    sigma,
    alpha_affine,
    interpolation=cv2.INTER_LINEAR,
    border_mode=cv2.BORDER_REFLECT_101,
    value=None,
    random_state=None,
    approximate=False,
    same_dxdy=False,
):
    """Elastic deformation of images as described in [Simard2003]_ (with modifications).
    Based on https://gist.github.com/ernestum/601cdf56d2b424757de5

    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
         Convolutional Neural Networks applied to Visual Document Analysis", in
         Proc. of the International Conference on Document Analysis and
         Recognition, 2003.
    """
    if random_state is None:
        random_state = np.random.RandomState(1234)

    height, width = img.shape[:2]

    # Random affine
    center_square = np.float32((height, width)) // 2
    square_size = min((height, width)) // 3
    alpha = float(alpha)
    sigma = float(sigma)
    alpha_affine = float(alpha_affine)

    pts1 = np.float32(
        [
            center_square + square_size,
            [center_square[0] + square_size, center_square[1] - square_size],
            center_square - square_size,
        ]
    )
    pts2 = pts1 + random_state.uniform(-alpha_affine, alpha_affine, size=pts1.shape).astype(np.float32)
    matrix = cv2.getAffineTransform(pts1, pts2)

    warp_fn = _maybe_process_in_chunks(
        cv2.warpAffine, M=matrix, dsize=(width, height), flags=interpolation, borderMode=border_mode, borderValue=value
    )
    img = warp_fn(img)

    if approximate:
        # Approximate computation smooth displacement map with a large enough kernel.
        # On large images (512+) this is approximately 2X times faster
        dx = random_state.rand(height, width).astype(np.float32) * 2 - 1
        cv2.GaussianBlur(dx, (17, 17), sigma, dst=dx)
        dx *= alpha
        if same_dxdy:
            # Speed up even more
            dy = dx
        else:
            dy = random_state.rand(height, width).astype(np.float32) * 2 - 1
            cv2.GaussianBlur(dy, (17, 17), sigma, dst=dy)
            dy *= alpha
    else:
        dx = np.float32(gaussian_filter((random_state.rand(height, width) * 2 - 1), sigma) * alpha)
        if same_dxdy:
            # Speed up
            dy = dx
        else:
            dy = np.float32(gaussian_filter((random_state.rand(height, width) * 2 - 1), sigma) * alpha)

    x, y = np.meshgrid(np.arange(width), np.arange(height))

    map_x = np.float32(x + dx)
    map_y = np.float32(y + dy)

    remap_fn = _maybe_process_in_chunks(
        cv2.remap, map1=map_x, map2=map_y, interpolation=interpolation, borderMode=border_mode, borderValue=value
    )
    return remap_fn(img)

Parameters not shown in the figures below use their default values.
When sigma is small, the distortion strength is very sensitive to alpha.
[image]

[image]
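A sketch reproducing a strong distortion, following the alpha/sigma guidance above (the file name is hypothetical):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
# small sigma plus large alpha -> clearly visible warping
aug = A.ElasticTransform(alpha=120, sigma=20, alpha_affine=30, p=1.0)
res = aug(image=img)["image"]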


适用输入类型:image, mask, bboxes, keypoints
功能: horizontal flip (d=1), vertical flip (d=0), or both horizontal and vertical flip, which is equivalent to rotating the image by 180° (d=-1).
d is a parameter generated randomly in the source code that controls the flip mode.

# source code
class Flip(DualTransform):
    """Flip the input either horizontally, vertically or both horizontally and vertically.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def apply(self, img, d=0, **params):
        """Args:
        d (int): code that specifies how to flip the input. 0 for vertical flipping, 1 for horizontal flipping,
                -1 for both vertical and horizontal flipping (which is also could be seen as rotating the input by
                180 degrees).
        """
        return F.random_flip(img, d)

    def get_params(self):
        # Random int in the range [-1, 1]
        return {"d": random.randint(-1, 1)}

    def apply_to_bbox(self, bbox, **params):
        return F.bbox_flip(bbox, **params)

    def apply_to_keypoint(self, keypoint, **params):
        return F.keypoint_flip(keypoint, **params)

    def get_transform_init_args_names(self):
        return ()

[image]
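A minimal sketch; note that d is sampled inside the transform, so repeated calls can flip in different directions (file name hypothetical):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
res = A.Flip(p=1.0)(image=img)["image"]  # horizontal, vertical, or both, chosen at random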

功能: grid distortion.
Official demo image:
[image]
参数说明:

  • num_steps (int): number of grid cells per side (the same horizontally and vertically).

  • distort_limit (float, (float, float)): if a single number is given, it is converted to the interval (-distort_limit, distort_limit). Default range: (-0.3, 0.3).
    The per-cell factors stepsx and stepsy for the x and y directions are sampled from this interval; a value greater than 0 makes the processed cell larger than its original size, a value below 0 the opposite.

  • interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
    One of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.

  • border_mode (OpenCV flag): pixel extrapolation method. Default: cv2.BORDER_REFLECT_101.
    One of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.

  • value (int, float, list of ints, list of floats): border fill value, only used with constant padding, i.e. border_mode = cv2.BORDER_CONSTANT.

  • mask_value (int, float, list of ints, list of floats): border fill value for the mask, only used with constant padding, i.e. border_mode = cv2.BORDER_CONSTANT.

  • normalized (bool): if True, the distortion is kept within the image boundary, i.e. no image content is lost and no border padding is added. Default: False.

# source code
class GridDistortion(DualTransform):
    """
    Args:
        num_steps (int): count of grid cells on each side.
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.3, 0.3).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.
        normalized (bool): if true, distortion will be normalized to do not go outside the image. Default: False
            See for more information: https://github.com/albumentations-team/albumentations/pull/722

    Targets:
        image, mask

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        num_steps: int = 5,
        distort_limit: ScaleFloatType = 0.3,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        normalized: bool = False,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(GridDistortion, self).__init__(always_apply, p)
        self.num_steps = num_steps
        self.distort_limit = to_tuple(distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value
        self.normalized = normalized

    def apply(
        self, img: np.ndarray, stepsx: Tuple = (), stepsy: Tuple = (), interpolation: int = cv2.INTER_LINEAR, **params
    ) -> np.ndarray:
        return F.grid_distortion(img, self.num_steps, stepsx, stepsy, interpolation, self.border_mode, self.value)

    def apply_to_mask(self, img: np.ndarray, stepsx: Tuple = (), stepsy: Tuple = (), **params) -> np.ndarray:
        return F.grid_distortion(
            img, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
        )

    def apply_to_bbox(self, bbox: BoxInternalType, stepsx: Tuple = (), stepsy: Tuple = (), **params) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = F.grid_distortion(
            mask, self.num_steps, stepsx, stepsy, cv2.INTER_NEAREST, self.border_mode, self.mask_value
        )
        bbox_returned = bbox_from_mask(mask)
        bbox_returned = F.normalize_bbox(bbox_returned, rows, cols)
        return bbox_returned

    def _normalize(self, h, w, xsteps, ysteps):

        # compensate for smaller last steps in source image.
        x_step = w // self.num_steps
        last_x_step = min(w, ((self.num_steps + 1) * x_step)) - (self.num_steps * x_step)
        xsteps[-1] *= last_x_step / x_step

        y_step = h // self.num_steps
        last_y_step = min(h, ((self.num_steps + 1) * y_step)) - (self.num_steps * y_step)
        ysteps[-1] *= last_y_step / y_step

        # now normalize such that distortion never leaves image bounds.
        tx = w / math.floor(w / self.num_steps)
        ty = h / math.floor(h / self.num_steps)
        xsteps = np.array(xsteps) * (tx / np.sum(xsteps))
        ysteps = np.array(ysteps) * (ty / np.sum(ysteps))

        return {"stepsx": xsteps, "stepsy": ysteps}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_params_dependent_on_targets(self, params):
        h, w = params["image"].shape[:2]

        stepsx = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]
        stepsy = [1 + random.uniform(self.distort_limit[0], self.distort_limit[1]) for _ in range(self.num_steps + 1)]

        if self.normalized:
            return self._normalize(h, w, stepsx, stepsy)

        return {"stepsx": stepsx, "stepsy": stepsy}

    def get_transform_init_args_names(self):
        return "num_steps", "distort_limit", "interpolation", "border_mode", "value", "mask_value", "normalized"

You can see vertical and horizontal stretching of the chess pieces in the figure.
[image]
The difference between normalized=True and normalized=False is shown in the following results:
[image]
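A short sketch comparing the two normalized settings (file name hypothetical):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
res_plain = A.GridDistortion(num_steps=5, distort_limit=0.3, p=1.0)(image=img)["image"]
res_norm = A.GridDistortion(num_steps=5, distort_limit=0.3, normalized=True, p=1.0)(image=img)["image"]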


功能: fill grid cells with a fixed value (black by default).
参数说明:

  • ratio, unit_size_min, unit_size_max, holes_number_x, holes_number_y all control the grid size. The grid size is determined by unit_size_min/unit_size_max first; if these are None, holes_number_x/holes_number_y are used; if these are also None, the computation falls back to holes_number=10.
  • shift_x, shift_y control the offset of the grid origin; the default is 0, which is why the black blocks in the result images lean towards the top left.
  • random_offset: if True, the offset is generated randomly and the shift_x, shift_y settings are ignored.
  • fill_value: fill value for the grid cells. Default 0, i.e. black.
  • mask_fill_value: fill value for the dropped grid cells in the mask. If None, the original mask is returned unchanged. Default: None.
# source code
class GridDropout(DualTransform):
    """GridDropout, drops out rectangular regions of an image and the corresponding mask in a grid fashion.

    Args:
        ratio (float): the ratio of the mask holes to the unit_size (same for horizontal and vertical directions).
            Must be between 0 and 1. Default: 0.5.
        unit_size_min (int): minimum size of the grid unit. Must be between 2 and the image shorter edge.
            If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
        unit_size_max (int): maximum size of the grid unit. Must be between 2 and the image shorter edge.
            If 'None', holes_number_x and holes_number_y are used to setup the grid. Default: `None`.
        holes_number_x (int): the number of grid units in x direction. Must be between 1 and image width//2.
            If 'None', grid unit width is set as image_width//10. Default: `None`.
        holes_number_y (int): the number of grid units in y direction. Must be between 1 and image height//2.
            If `None`, grid unit height is set equal to the grid unit width or image height, whatever is smaller.
        shift_x (int): offsets of the grid start in x direction from (0,0) coordinate.
            Clipped between 0 and grid unit_width - hole_width. Default: 0.
        shift_y (int): offsets of the grid start in y direction from (0,0) coordinate.
            Clipped between 0 and grid unit height - hole_height. Default: 0.
        random_offset (boolean): whether to offset the grid randomly between 0 and grid unit size - hole size
            If 'True', entered shift_x, shift_y are ignored and set randomly. Default: `False`.
        fill_value (int): value for the dropped pixels. Default = 0
        mask_fill_value (int): value for the dropped pixels in mask.
            If `None`, transformation is not applied to the mask. Default: `None`.

    Targets:
        image, mask

    Image types:
        uint8, float32

    References:
        https://arxiv.org/abs/2001.04086

    """

    def __init__(
        self,
        ratio: float = 0.5,
        unit_size_min: int = None,
        unit_size_max: int = None,
        holes_number_x: int = None,
        holes_number_y: int = None,
        shift_x: int = 0,
        shift_y: int = 0,
        random_offset: bool = False,
        fill_value: int = 0,
        mask_fill_value: int = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(GridDropout, self).__init__(always_apply, p)
        self.ratio = ratio
        self.unit_size_min = unit_size_min
        self.unit_size_max = unit_size_max
        self.holes_number_x = holes_number_x
        self.holes_number_y = holes_number_y
        self.shift_x = shift_x
        self.shift_y = shift_y
        self.random_offset = random_offset
        self.fill_value = fill_value
        self.mask_fill_value = mask_fill_value
        if not 0 < self.ratio <= 1:
            raise ValueError("ratio must be between 0 and 1.")

    def apply(self, img: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params) -> np.ndarray:
        return F.cutout(img, holes, self.fill_value)

    def apply_to_mask(self, img: np.ndarray, holes: Iterable[Tuple[int, int, int, int]] = (), **params) -> np.ndarray:
        if self.mask_fill_value is None:
            return img

        return F.cutout(img, holes, self.mask_fill_value)

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        height, width = img.shape[:2]
        # set grid using unit size limits
        if self.unit_size_min and self.unit_size_max:
            if not 2 <= self.unit_size_min <= self.unit_size_max:
                raise ValueError("Max unit size should be >= min size, both at least 2 pixels.")
            if self.unit_size_max > min(height, width):
                raise ValueError("Grid size limits must be within the shortest image edge.")
            unit_width = random.randint(self.unit_size_min, self.unit_size_max + 1)
            unit_height = unit_width
        else:
            # set grid using holes numbers
            if self.holes_number_x is None:
                unit_width = max(2, width // 10)
            else:
                if not 1 <= self.holes_number_x <= width // 2:
                    raise ValueError("The hole_number_x must be between 1 and image width//2.")
                unit_width = width // self.holes_number_x
            if self.holes_number_y is None:
                unit_height = max(min(unit_width, height), 2)
            else:
                if not 1 <= self.holes_number_y <= height // 2:
                    raise ValueError("The hole_number_y must be between 1 and image height//2.")
                unit_height = height // self.holes_number_y

        hole_width = int(unit_width * self.ratio)
        hole_height = int(unit_height * self.ratio)
        # min 1 pixel and max unit length - 1
        hole_width = min(max(hole_width, 1), unit_width - 1)
        hole_height = min(max(hole_height, 1), unit_height - 1)
        # set offset of the grid
        if self.shift_x is None:
            shift_x = 0
        else:
            shift_x = min(max(0, self.shift_x), unit_width - hole_width)
        if self.shift_y is None:
            shift_y = 0
        else:
            shift_y = min(max(0, self.shift_y), unit_height - hole_height)
        if self.random_offset:
            shift_x = random.randint(0, unit_width - hole_width)
            shift_y = random.randint(0, unit_height - hole_height)
        holes = []
        for i in range(width // unit_width + 1):
            for j in range(height // unit_height + 1):
                x1 = min(shift_x + unit_width * i, width)
                y1 = min(shift_y + unit_height * j, height)
                x2 = min(x1 + hole_width, width)
                y2 = min(y1 + hole_height, height)
                holes.append((x1, y1, x2, y2))

        return {"holes": holes}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return (
            "ratio",
            "unit_size_min",
            "unit_size_max",
            "holes_number_x",
            "holes_number_y",
            "shift_x",
            "shift_y",
            "random_offset",
            "fill_value",
            "mask_fill_value",
        )

[image]
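A minimal sketch; the hole counts below are arbitrary illustration values:

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
aug = A.GridDropout(ratio=0.5, holes_number_x=8, holes_number_y=8,
                    random_offset=True, p=1.0)
res = aug(image=img)["image"]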


适用输入类型:image, mask, bboxes, keypoints
功能: flip the input around the y-axis (horizontal flip)
[image]
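A one-line sketch (file name hypothetical):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
res = A.HorizontalFlip(p=1.0)(image=img)["image"]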

功能: rescale the image while keeping the aspect ratio, resizing the longest side to the specified size. The counterpart that adjusts the shortest side is SmallestMaxSize.
参数说明: max_size (int, list of int): target size of the longest side of the image after the transformation. If a list is given, one value is randomly chosen from it as max_size.
interpolation (OpenCV flag): OpenCV interpolation method, one of cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.

class LongestMaxSize(DualTransform):
    """Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

    Args:
        max_size (int, list of int): maximum size of the image after the transformation. When using a list, max size
            will be randomly selected from the values in the list.
        interpolation (OpenCV flag): interpolation method. Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        max_size: Union[int, Sequence[int]] = 1024,
        interpolation: int = cv2.INTER_LINEAR,
        always_apply: bool = False,
        p: float = 1,
    ):
        super(LongestMaxSize, self).__init__(always_apply, p)
        self.interpolation = interpolation
        self.max_size = max_size

    def apply(
        self, img: np.ndarray, max_size: int = 1024, interpolation: int = cv2.INTER_LINEAR, **params
    ) -> np.ndarray:
        return F.longest_max_size(img, max_size=max_size, interpolation=interpolation)

    def apply_to_bbox(self, bbox: Sequence[float], **params) -> Sequence[float]:
        # Bounding box coordinates are scale invariant
        return bbox

    def apply_to_keypoint(self, keypoint: Sequence[float], max_size: int = 1024, **params) -> Sequence[float]:
        height = params["rows"]
        width = params["cols"]

        scale = max_size / max([height, width])
        return F.keypoint_scale(keypoint, scale, scale)

    def get_params(self) -> Dict[str, int]:
        return {"max_size": self.max_size if isinstance(self.max_size, int) else random.choice(self.max_size)}

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("max_size", "interpolation")

[image]
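A sketch of the common pairing with PadIfNeeded to reach a fixed square size (the 512 target and file name are arbitrary examples):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
res = A.LongestMaxSize(max_size=512, p=1.0)(image=img)["image"]
# letterbox-style: resize the longest side to 512, then pad the shorter side up to 512
pipeline = A.Compose([
    A.LongestMaxSize(max_size=512),
    A.PadIfNeeded(min_height=512, min_width=512, border_mode=cv2.BORDER_CONSTANT, value=0),
])
res_square = pipeline(image=img)["image"]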


功能: randomly zero out object instances in the image and the mask.
参数说明: max_objects: the maximum number of labels that can be zeroed out; it can also be an interval [min, max], in which case the applied value is sampled uniformly from that interval.
image_fill_value: fill value for the zeroed image regions, default 0. Can also be set to 'inpaint', which inpaints the zeroed regions (3-channel images only).
mask_fill_value: fill value for the zeroed mask regions, default 0.

# source code
class MaskDropout(DualTransform):
    """
    Image & mask augmentation that zero out mask and image regions corresponding
    to randomly chosen object instance from mask.

    Mask must be single-channel image, zero values treated as background.
    Image can be any number of channels.

    Inspired by https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/114254
    """

    def __init__(
        self,
        max_objects=1,
        image_fill_value=0,
        mask_fill_value=0,
        always_apply=False,
        p=0.5,
    ):
        """
        Args:
            max_objects: Maximum number of labels that can be zeroed out. Can be tuple, in this case it's [min, max]
            image_fill_value: Fill value to use when filling image.
                Can be 'inpaint' to apply inpainting (works only for 3-channel images)
            mask_fill_value: Fill value to use when filling mask.

        Targets:
            image, mask

        Image types:
            uint8, float32
        """
        super(MaskDropout, self).__init__(always_apply, p)
        self.max_objects = to_tuple(max_objects, 1)
        self.image_fill_value = image_fill_value
        self.mask_fill_value = mask_fill_value

    @property
    def targets_as_params(self):
        return ["mask"]

    def get_params_dependent_on_targets(self, params):
        mask = params["mask"]

        label_image, num_labels = label(mask, return_num=True)

        if num_labels == 0:
            dropout_mask = None
        else:
            objects_to_drop = random.randint(self.max_objects[0], self.max_objects[1])
            objects_to_drop = min(num_labels, objects_to_drop)

            if objects_to_drop == num_labels:
                dropout_mask = mask > 0
            else:
                labels_index = random.sample(range(1, num_labels + 1), objects_to_drop)
                dropout_mask = np.zeros((mask.shape[0], mask.shape[1]), dtype=np.bool)
                for label_index in labels_index:
                    dropout_mask |= label_image == label_index

        params.update({"dropout_mask": dropout_mask})
        return params

    def apply(self, img, dropout_mask=None, **params):
        if dropout_mask is None:
            return img

        if self.image_fill_value == "inpaint":
            dropout_mask = dropout_mask.astype(np.uint8)
            _, _, w, h = cv2.boundingRect(dropout_mask)
            radius = min(3, max(w, h) // 2)
            img = cv2.inpaint(img, dropout_mask, radius, cv2.INPAINT_NS)
        else:
            img = img.copy()
            img[dropout_mask] = self.image_fill_value

        return img

    def apply_to_mask(self, img, dropout_mask=None, **params):
        if dropout_mask is None:
            return img

        img = img.copy()
        img[dropout_mask] = self.mask_fill_value
        return img

    def get_transform_init_args_names(self):
        return ("max_objects", "image_fill_value", "mask_fill_value")

In the figure below, the annotated target is the rectangular region containing the bird; the results for different image_fill_value settings follow.
[image]
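A minimal sketch with a toy single-instance mask (file name and rectangle coordinates are hypothetical):

# usage sketch
import albumentations as A
import cv2
import numpy as np

img = cv2.imread("example.jpg")
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[100:200, 150:300] = 1  # one labeled instance

aug = A.MaskDropout(max_objects=1, image_fill_value="inpaint", mask_fill_value=0, p=1.0)
out = aug(image=img, mask=mask)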


适用输入类型:image, mask, bboxes, keypoints
功能: keep the input unchanged (does nothing)

# source code
class NoOp(DualTransform):
    """Does nothing"""

    def apply_to_keypoint(self, keypoint: KeypointInternalType, **params) -> KeypointInternalType:
        return keypoint

    def apply_to_bbox(self, bbox: BoxInternalType, **params) -> BoxInternalType:
        return bbox

    def apply(self, img: np.ndarray, **params) -> np.ndarray:
        return img

    def apply_to_mask(self, img: np.ndarray, **params) -> np.ndarray:
        return img

    def get_transform_init_args_names(self) -> Tuple:
        return ()
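NoOp is mainly useful as an explicit pass-through branch inside composed pipelines; a minimal sketch:

# usage sketch
import albumentations as A

# with equal probability either flip the input or leave it untouched
aug = A.OneOf([A.HorizontalFlip(p=1.0), A.NoOp()], p=1.0)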

功能: barrel / pincushion distortion.
参数说明:

  • distort_limit (float, (float, float)): if a single number is given, it is converted to the interval (-distort_limit, distort_limit). Default: (-0.05, 0.05).
    A sampled value greater than 0 gives barrel distortion; a value below 0 gives pincushion distortion.
  • shift_limit (float, (float, float)): if a single number is given, it is converted to the interval (-shift_limit, shift_limit). Default: (-0.05, 0.05).
  • interpolation (OpenCV flag): OpenCV interpolation method, one of cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR (masks always use cv2.INTER_NEAREST; this is hard-coded and cannot be overridden).
  • border_mode (OpenCV flag): OpenCV pixel extrapolation method, one of cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (reflect), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (reflect). Default: cv2.BORDER_REFLECT_101.
  • value (int, float, list of ints, list of floats): border fill value (only used when border_mode=cv2.BORDER_CONSTANT).
  • mask_value (int, float, list of ints, list of floats): border fill value for masks (only used when border_mode=cv2.BORDER_CONSTANT).

Further reading on border_mode:
OpenCV滤波之copyMakeBorder和borderInterpolate
OpenCV图像处理|1.16 卷积边界处理
[image]

# source code
class OpticalDistortion(DualTransform):
    """
    Args:
        distort_limit (float, (float, float)): If distort_limit is a single float, the range
            will be (-distort_limit, distort_limit). Default: (-0.05, 0.05).
        shift_limit (float, (float, float))): If shift_limit is a single float, the range
            will be (-shift_limit, shift_limit). Default: (-0.05, 0.05).
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        border_mode (OpenCV flag): flag that is used to specify the pixel extrapolation method. Should be one of:
            cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101.
            Default: cv2.BORDER_REFLECT_101
        value (int, float, list of ints, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of ints,
                    list of float): padding value if border_mode is cv2.BORDER_CONSTANT applied for masks.

    Targets:
        image, mask, bbox

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        distort_limit: ScaleFloatType = 0.05,
        shift_limit: ScaleFloatType = 0.05,
        interpolation: int = cv2.INTER_LINEAR,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super(OpticalDistortion, self).__init__(always_apply, p)
        self.shift_limit = to_tuple(shift_limit)
        self.distort_limit = to_tuple(distort_limit)
        self.interpolation = interpolation
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def apply(
        self, img: np.ndarray, k: int = 0, dx: int = 0, dy: int = 0, interpolation: int = cv2.INTER_LINEAR, **params
    ) -> np.ndarray:
        return F.optical_distortion(img, k, dx, dy, interpolation, self.border_mode, self.value)

    def apply_to_mask(self, img: np.ndarray, k: int = 0, dx: int = 0, dy: int = 0, **params) -> np.ndarray:
        return F.optical_distortion(img, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)

    def apply_to_bbox(self, bbox: BoxInternalType, k: int = 0, dx: int = 0, dy: int = 0, **params) -> BoxInternalType:
        rows, cols = params["rows"], params["cols"]
        mask = np.zeros((rows, cols), dtype=np.uint8)
        bbox_denorm = F.denormalize_bbox(bbox, rows, cols)
        x_min, y_min, x_max, y_max = bbox_denorm[:4]
        x_min, y_min, x_max, y_max = int(x_min), int(y_min), int(x_max), int(y_max)
        mask[y_min:y_max, x_min:x_max] = 1
        mask = F.optical_distortion(mask, k, dx, dy, cv2.INTER_NEAREST, self.border_mode, self.mask_value)
        bbox_returned = bbox_from_mask(mask)
        bbox_returned = F.normalize_bbox(bbox_returned, rows, cols)
        return bbox_returned

    def get_params(self):
        return {
            "k": random.uniform(self.distort_limit[0], self.distort_limit[1]),
            "dx": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
            "dy": round(random.uniform(self.shift_limit[0], self.shift_limit[1])),
        }

    def get_transform_init_args_names(self):
        return (
            "distort_limit",
            "shift_limit",
            "interpolation",
            "border_mode",
            "value",
            "mask_value",
        )

The figure below shows the visualization; the parameters were set large to make the change visible. With the default parameters the change is very subtle.
[image]
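A sketch using deliberately exaggerated limits, as in the figure above (a fixed-value interval forces the sign of the sampled k):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
barrel = A.OpticalDistortion(distort_limit=(0.5, 0.5), p=1.0)        # k > 0: barrel
pincushion = A.OpticalDistortion(distort_limit=(-0.5, -0.5), p=1.0)  # k < 0: pincushion
res = barrel(image=img)["image"]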


功能: pad the image borders up to a specified size. (If the image is already larger than the specified size, nothing is done and the original image is returned.)
参数说明:

  • min_height, min_width: minimum size of the result image.

  • position (Union[str, PositionType]): where the original image is placed before padding around it (see the visualization after the code).
    One of: center, top_left, top_right, bottom_left, bottom_right, random.

  • border_mode (OpenCV flag): OpenCV pixel extrapolation method, one of cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (reflect), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (reflect). Default: cv2.BORDER_REFLECT_101.

  • value (int, float, list of ints, list of floats): border fill value (only used when border_mode=cv2.BORDER_CONSTANT).

  • mask_value (int, float, list of ints, list of floats): border fill value for masks (only used when border_mode=cv2.BORDER_CONSTANT).

# source code 
class PadIfNeeded(DualTransform):
    """Pad side of the image / max if side is less than desired number.

    Args:
        min_height (int): minimal result image height.
        min_width (int): minimal result image width.
        pad_height_divisor (int): if not None, ensures image height is dividable by value of this argument.
        pad_width_divisor (int): if not None, ensures image width is dividable by value of this argument.
        position (Union[str, PositionType]): Position of the image. should be PositionType.CENTER or
            PositionType.TOP_LEFT or PositionType.TOP_RIGHT or PositionType.BOTTOM_LEFT or PositionType.BOTTOM_RIGHT.
            or PositionType.RANDOM. Default: PositionType.CENTER.
        border_mode (OpenCV flag): OpenCV border mode.
        value (int, float, list of int, list of float): padding value if border_mode is cv2.BORDER_CONSTANT.
        mask_value (int, float,
                    list of int,
                    list of float): padding value for mask if border_mode is cv2.BORDER_CONSTANT.
        p (float): probability of applying the transform. Default: 1.0.

    Targets:
        image, mask, bbox, keypoints

    Image types:
        uint8, float32
    """

    class PositionType(Enum):
        CENTER = "center"
        TOP_LEFT = "top_left"
        TOP_RIGHT = "top_right"
        BOTTOM_LEFT = "bottom_left"
        BOTTOM_RIGHT = "bottom_right"
        RANDOM = "random"

    def __init__(
        self,
        min_height: Optional[int] = 1024,
        min_width: Optional[int] = 1024,
        pad_height_divisor: Optional[int] = None,
        pad_width_divisor: Optional[int] = None,
        position: Union[PositionType, str] = PositionType.CENTER,
        border_mode: int = cv2.BORDER_REFLECT_101,
        value: Optional[ImageColorType] = None,
        mask_value: Optional[ImageColorType] = None,
        always_apply: bool = False,
        p: float = 1.0,
    ):
        if (min_height is None) == (pad_height_divisor is None):
            raise ValueError("Only one of 'min_height' and 'pad_height_divisor' parameters must be set")

        if (min_width is None) == (pad_width_divisor is None):
            raise ValueError("Only one of 'min_width' and 'pad_width_divisor' parameters must be set")

        super(PadIfNeeded, self).__init__(always_apply, p)
        self.min_height = min_height
        self.min_width = min_width
        self.pad_width_divisor = pad_width_divisor
        self.pad_height_divisor = pad_height_divisor
        self.position = PadIfNeeded.PositionType(position)
        self.border_mode = border_mode
        self.value = value
        self.mask_value = mask_value

    def update_params(self, params, **kwargs):
        params = super(PadIfNeeded, self).update_params(params, **kwargs)
        rows = params["rows"]
        cols = params["cols"]

        if self.min_height is not None:
            if rows < self.min_height:
                h_pad_top = int((self.min_height - rows) / 2.0)
                h_pad_bottom = self.min_height - rows - h_pad_top
            else:
                h_pad_top = 0
                h_pad_bottom = 0
        else:
            pad_remained = rows % self.pad_height_divisor
            pad_rows = self.pad_height_divisor - pad_remained if pad_remained > 0 else 0

            h_pad_top = pad_rows // 2
            h_pad_bottom = pad_rows - h_pad_top

        if self.min_width is not None:
            if cols < self.min_width:
                w_pad_left = int((self.min_width - cols) / 2.0)
                w_pad_right = self.min_width - cols - w_pad_left
            else:
                w_pad_left = 0
                w_pad_right = 0
        else:
            pad_remainder = cols % self.pad_width_divisor
            pad_cols = self.pad_width_divisor - pad_remainder if pad_remainder > 0 else 0

            w_pad_left = pad_cols // 2
            w_pad_right = pad_cols - w_pad_left

        h_pad_top, h_pad_bottom, w_pad_left, w_pad_right = self.__update_position_params(
            h_top=h_pad_top, h_bottom=h_pad_bottom, w_left=w_pad_left, w_right=w_pad_right
        )

        params.update(
            {
                "pad_top": h_pad_top,
                "pad_bottom": h_pad_bottom,
                "pad_left": w_pad_left,
                "pad_right": w_pad_right,
            }
        )
        return params

    def apply(
        self, img: np.ndarray, pad_top: int = 0, pad_bottom: int = 0, pad_left: int = 0, pad_right: int = 0, **params
    ) -> np.ndarray:
        return F.pad_with_params(
            img,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.value,
        )

    def apply_to_mask(
        self, img: np.ndarray, pad_top: int = 0, pad_bottom: int = 0, pad_left: int = 0, pad_right: int = 0, **params
    ) -> np.ndarray:
        return F.pad_with_params(
            img,
            pad_top,
            pad_bottom,
            pad_left,
            pad_right,
            border_mode=self.border_mode,
            value=self.mask_value,
        )

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        rows: int = 0,
        cols: int = 0,
        **params
    ) -> BoxInternalType:
        x_min, y_min, x_max, y_max = denormalize_bbox(bbox, rows, cols)[:4]
        bbox = x_min + pad_left, y_min + pad_top, x_max + pad_left, y_max + pad_top
        return normalize_bbox(bbox, rows + pad_top + pad_bottom, cols + pad_left + pad_right)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        pad_top: int = 0,
        pad_bottom: int = 0,
        pad_left: int = 0,
        pad_right: int = 0,
        **params
    ) -> KeypointInternalType:
        x, y, angle, scale = keypoint[:4]
        return x + pad_left, y + pad_top, angle, scale

    def get_transform_init_args_names(self):
        return (
            "min_height",
            "min_width",
            "pad_height_divisor",
            "pad_width_divisor",
            "border_mode",
            "value",
            "mask_value",
        )

    def __update_position_params(
        self, h_top: int, h_bottom: int, w_left: int, w_right: int
    ) -> Tuple[int, int, int, int]:
        if self.position == PadIfNeeded.PositionType.TOP_LEFT:
            h_bottom += h_top
            w_right += w_left
            h_top = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.TOP_RIGHT:
            h_bottom += h_top
            w_left += w_right
            h_top = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_LEFT:
            h_top += h_bottom
            w_right += w_left
            h_bottom = 0
            w_left = 0

        elif self.position == PadIfNeeded.PositionType.BOTTOM_RIGHT:
            h_top += h_bottom
            w_left += w_right
            h_bottom = 0
            w_right = 0

        elif self.position == PadIfNeeded.PositionType.RANDOM:
            h_pad = h_top + h_bottom
            w_pad = w_left + w_right
            h_top = random.randint(0, h_pad)
            h_bottom = h_pad - h_top
            w_left = random.randint(0, w_pad)
            w_right = w_pad - w_left

        return h_top, h_bottom, w_left, w_right

[image]
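A minimal sketch placing the original image in the top-left corner and padding the rest with black (the target size and file name are arbitrary examples):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
aug = A.PadIfNeeded(min_height=1024, min_width=1024, position="top_left",
                    border_mode=cv2.BORDER_CONSTANT, value=0, p=1.0)
res = aug(image=img)["image"]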


功能: random four-point perspective transform.
参数说明:

  • scale (float or (float, float)): standard deviation of the normal distribution used to control how far the corners of the new sub-image lie from the corners of the full image.
    If a single number is given, it is converted to the interval (0, scale). Default: (0.05, 0.1).
  • keep_size (bool): whether to resize the image back to its original size after the perspective transform. The default True is recommended; if set to False, the returned images form a list rather than an array and may have different shapes.
  • pad_mode (OpenCV flag): OpenCV pixel extrapolation method, one of cv2.BORDER_CONSTANT (constant), cv2.BORDER_REPLICATE (replicate), cv2.BORDER_REFLECT (reflect), cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101 (reflect). Default: cv2.BORDER_CONSTANT.
  • pad_val (int, float, list of ints, list of floats): border fill value (only used when pad_mode=cv2.BORDER_CONSTANT). Default: 0.
  • mask_pad_val (int, float, list of ints, list of floats): border fill value for masks (only used when pad_mode=cv2.BORDER_CONSTANT). Default: 0.
  • fit_output (bool): if True, the image plane size and position are adjusted after the transform so that the whole transformed image is captured (if keep_size is True, the image is subsequently resized back). Otherwise, parts of the transformed image may fall outside the image plane. This should not be combined with large scale values, since it can produce very large images. Default: False.

The larger scale is, the stronger the perspective angle;
keep_size should normally be left True, so the output matches the original image size;
fit_output should normally be left False; setting it to True introduces black borders.

[image]
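A minimal sketch following the recommendations above (file name hypothetical):

# usage sketch
import albumentations as A
import cv2

img = cv2.imread("example.jpg")
aug = A.Perspective(scale=(0.05, 0.1), keep_size=True, fit_output=False, p=1.0)
res = aug(image=img)["image"]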

Note!!!

  1. This transform is very slow; ElasticTransform can be used instead and is at least 10x faster.
  2. For coordinate-based inputs (keypoints, bounding boxes, polygons, …), this transform still has to perform an image-based augmentation first, which makes it slow and not fully correct for such inputs.
  3. This transform is in fact a wrapper around the skimage.transform.warp function.

功能: piecewise (local) affine transform. The effect is similar to ElasticTransform: local warping.
(The author's intuition: draw a regular grid over the image and shift the neighbourhood of each grid point locally in a random direction.)

Apply affine transformations that differ between local neighbourhoods.
This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these point around via affine transformations. This leads to local distortions.

参数说明:

  1. scale (float, tuple of float): deformation factor; the larger the value, the farther points move away from their regular grid positions.
    With absolute_scale=False (the default), scale is multiplied by the image width/height to obtain the shift distance; with absolute_scale=True, scale is used as a fixed absolute value. Recommended range for scale: (0.01, 0.05); default (0.03, 0.05).
  2. nb_rows (int, tuple of int): number of rows of the regular grid, at least 2; 4 or more is recommended for large images.
  3. nb_cols (int, tuple of int): number of columns of the regular grid, at least 2; 4 or more is recommended for large images.
  4. interpolation (int): interpolation order
    • 0: Nearest-neighbor
    • 1: Bi-linear (default)
    • 2: Bi-quadratic
    • 3: Bi-cubic
    • 4: Bi-quartic
    • 5: Bi-quintic
  5. mask_interpolation (int): interpolation for the mask; same values as interpolation.
  6. cval (int): fill value for newly created pixels.
  7. cval_mask (int): fill value for newly created mask pixels.
  8. mode (str): image border padding mode, one of {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}.
  9. absolute_scale (bool): flag controlling whether the scale parameter is an absolute or a relative value.
  10. keypoints_threshold (float): threshold used when converting distance maps back to keypoints.

    Used as threshold in conversion from distance maps to keypoints.
    The search for keypoints works by searching for the argmin (non-inverted) or argmax (inverted) in each channel. This parameters contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit as a keypoint. Use None to use no min/max. Default: 0.01

# source code
class PiecewiseAffine(DualTransform):
    """Apply affine transformations that differ between local neighbourhoods.
    This augmentation places a regular grid of points on an image and randomly moves the neighbourhood of these point
    around via affine transformations. This leads to local distortions.

    This is mostly a wrapper around scikit-image's ``PiecewiseAffine``.
    See also ``Affine`` for a similar technique.

    Note:
        This augmenter is very slow. Try to use ``ElasticTransformation`` instead, which is at least 10x faster.

    Note:
        For coordinate-based inputs (keypoints, bounding boxes, polygons, ...),
        this augmenter still has to perform an image-based augmentation,
        which will make it significantly slower and not fully correct for such inputs than other transforms.

    Args:
        scale (float, tuple of float): Each point on the regular grid is moved around via a normal distribution.
            This scale factor is equivalent to the normal distribution's sigma.
            Note that the jitter (how far each point is moved in which direction) is multiplied by the height/width of
            the image if ``absolute_scale=False`` (default), so this scale can be the same for different sized images.
            Recommended values are in the range ``0.01`` to ``0.05`` (weak to strong augmentations).
                * If a single ``float``, then that value will always be used as the scale.
                * If a tuple ``(a, b)`` of ``float`` s, then a random value will
                  be uniformly sampled per image from the interval ``[a, b]``.
        nb_rows (int, tuple of int): Number of rows of points that the regular grid should have.
            Must be at least ``2``. For large images, you might want to pick a higher value than ``4``.
            You might have to then adjust scale to lower values.
                * If a single ``int``, then that value will always be used as the number of rows.
                * If a tuple ``(a, b)``, then a value from the discrete interval
                  ``[a..b]`` will be uniformly sampled per image.
        nb_cols (int, tuple of int): Number of columns. Analogous to `nb_rows`.
        interpolation (int): The order of interpolation. The order has to be in the range 0-5:
             - 0: Nearest-neighbor
             - 1: Bi-linear (default)
             - 2: Bi-quadratic
             - 3: Bi-cubic
             - 4: Bi-quartic
             - 5: Bi-quintic
        mask_interpolation (int): same as interpolation but for mask.
        cval (number): The constant value to use when filling in newly created pixels.
        cval_mask (number): Same as cval but only for masks.
        mode (str): {'constant', 'edge', 'symmetric', 'reflect', 'wrap'}, optional
            Points outside the boundaries of the input are filled according
            to the given mode.  Modes match the behaviour of `numpy.pad`.
        absolute_scale (bool): Take `scale` as an absolute value rather than a relative value.
        keypoints_threshold (float): Used as threshold in conversion from distance maps to keypoints.
            The search for keypoints works by searching for the
            argmin (non-inverted) or argmax (inverted) in each channel. This
            parameters contains the maximum (non-inverted) or minimum (inverted) value to accept in order to view a hit
            as a keypoint. Use ``None`` to use no min/max. Default: 0.01

    Targets:
        image, mask, keypoints, bboxes

    Image types:
        uint8, float32

    """

    def __init__(
        self,
        scale: ScaleFloatType = (0.03, 0.05),
        nb_rows: Union[int, Sequence[int]] = 4,
        nb_cols: Union[int, Sequence[int]] = 4,
        interpolation: int = 1,
        mask_interpolation: int = 0,
        cval: int = 0,
        cval_mask: int = 0,
        mode: str = "constant",
        absolute_scale: bool = False,
        always_apply: bool = False,
        keypoints_threshold: float = 0.01,
        p: float = 0.5,
    ):
        super(PiecewiseAffine, self).__init__(always_apply, p)

        self.scale = to_tuple(scale, scale)
        self.nb_rows = to_tuple(nb_rows, nb_rows)
        self.nb_cols = to_tuple(nb_cols, nb_cols)
        self.interpolation = interpolation
        self.mask_interpolation = mask_interpolation
        self.cval = cval
        self.cval_mask = cval_mask
        self.mode = mode
        self.absolute_scale = absolute_scale
        self.keypoints_threshold = keypoints_threshold

    def get_transform_init_args_names(self):
        return (
            "scale",
            "nb_rows",
            "nb_cols",
            "interpolation",
            "mask_interpolation",
            "cval",
            "cval_mask",
            "mode",
            "absolute_scale",
            "keypoints_threshold",
        )

    @property
    def targets_as_params(self):
        return ["image"]

    def get_params_dependent_on_targets(self, params) -> dict:
        h, w = params["image"].shape[:2]

        nb_rows = np.clip(random.randint(*self.nb_rows), 2, None)
        nb_cols = np.clip(random.randint(*self.nb_cols), 2, None)
        nb_cells = nb_cols * nb_rows
        scale = random.uniform(*self.scale)

        jitter: np.ndarray = random_utils.normal(0, scale, (nb_cells, 2))
        if not np.any(jitter > 0):
            return {"matrix": None}

        y = np.linspace(0, h, nb_rows)
        x = np.linspace(0, w, nb_cols)

        # (H, W) and (H, W) for H=rows, W=cols
        xx_src, yy_src = np.meshgrid(x, y)

        # (1, HW, 2) => (HW, 2) for H=rows, W=cols
        points_src = np.dstack([yy_src.flat, xx_src.flat])[0]

        if self.absolute_scale:
            jitter[:, 0] = jitter[:, 0] / h if h > 0 else 0.0
            jitter[:, 1] = jitter[:, 1] / w if w > 0 else 0.0

        jitter[:, 0] = jitter[:, 0] * h
        jitter[:, 1] = jitter[:, 1] * w

        points_dest = np.copy(points_src)
        points_dest[:, 0] = points_dest[:, 0] + jitter[:, 0]
        points_dest[:, 1] = points_dest[:, 1] + jitter[:, 1]

        # Restrict all destination points to be inside the image plane.
        # This is necessary, as otherwise keypoints could be augmented
        # outside of the image plane and these would be replaced by
        # (-1, -1), which would not conform with the behaviour of the other augmenters.
        points_dest[:, 0] = np.clip(points_dest[:, 0], 0, h - 1)
        points_dest[:, 1] = np.clip(points_dest[:, 1], 0, w - 1)

        matrix = skimage.transform.PiecewiseAffineTransform()
        matrix.estimate(points_src[:, ::-1], points_dest[:, ::-1])

        return {"matrix": matrix}

    def apply(self, img: np.ndarray, matrix: skimage.transform.PiecewiseAffineTransform = None, **params) -> np.ndarray:
        return F.piecewise_affine(img, matrix, self.interpolation, self.mode, self.cval)

    def apply_to_mask(
        self, img: np.ndarray, matrix: skimage.transform.PiecewiseAffineTransform = None, **params
    ) -> np.ndarray:
        return F.piecewise_affine(img, matrix, self.mask_interpolation, self.mode, self.cval_mask)

    def apply_to_bbox(
        self,
        bbox: BoxInternalType,
        rows: int = 0,
        cols: int = 0,
        matrix: skimage.transform.PiecewiseAffineTransform = None,
        **params
    ) -> BoxInternalType:
        return F.bbox_piecewise_affine(bbox, matrix, rows, cols, self.keypoints_threshold)

    def apply_to_keypoint(
        self,
        keypoint: KeypointInternalType,
        rows: int = 0,
        cols: int = 0,
        matrix: skimage.transform.PiecewiseAffineTransform = None,
        **params
    ):
        return F.keypoint_piecewise_affine(keypoint, matrix, rows, cols, self.keypoints_threshold)
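
A minimal usage sketch (the input arrays and parameter values below are illustrative, not from the original article):

# Usage sketch: apply PiecewiseAffine to an image and its mask.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)

transform = A.Compose([
    A.PiecewiseAffine(scale=(0.03, 0.05), nb_rows=4, nb_cols=4, p=1.0),
])
out = transform(image=image, mask=mask)
aug_image, aug_mask = out["image"], out["mask"]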

[Figure: PiecewiseAffine augmentation results]

Function: drop pixels, i.e. set some pixel values to 0.
Parameters:
dropout_prob: probability of dropping each pixel.
per_channel: whether channels are handled independently; if True, a separate drop mask is sampled for each channel.
drop_value: pixel value written at dropped positions; default 0. If drop_value=None, values are sampled randomly within the dtype's range:

  • uint8 - [0, 255]
  • uint16 - [0, 65535]
  • uint32 - [0, 4294967295]
  • float, double - [0, 1]

mask_drop_value: value written at dropped positions of the mask. Default: None, in which case the mask is left unchanged.
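
A minimal usage sketch (hypothetical data; the dropout_prob value is illustrative):

# Usage sketch: PixelDropout with per-channel masks and random fill values.
# drop_value=None samples fill values within the dtype's range (colored noise);
# note that mask_drop_value is only supported with per_channel=False.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
transform = A.PixelDropout(dropout_prob=0.05, per_channel=True, drop_value=None, p=1.0)
dropped = transform(image=image)["image"]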

# source code
class PixelDropout(DualTransform):
    """Set pixels to 0 with some probability.

    Args:
        dropout_prob (float): pixel drop probability. Default: 0.01
        per_channel (bool): if set to `True` drop mask will be sampled for each channel,
            otherwise the same mask will be sampled for all channels. Default: False
        drop_value (number or sequence of numbers or None): Value that will be set in dropped place.
            If set to None value will be sampled randomly, default ranges will be used:
                - uint8 - [0, 255]
                - uint16 - [0, 65535]
                - uint32 - [0, 4294967295]
                - float, double - [0, 1]
            Default: 0
        mask_drop_value (number or sequence of numbers or None): Value that will be set in dropped place in masks.
            If set to None masks will be unchanged. Default: None
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask
    Image types:
        any
    """

    def __init__(
        self,
        dropout_prob: float = 0.01,
        per_channel: bool = False,
        drop_value: Optional[Union[float, Sequence[float]]] = 0,
        mask_drop_value: Optional[Union[float, Sequence[float]]] = None,
        always_apply: bool = False,
        p: float = 0.5,
    ):
        super().__init__(always_apply, p)
        self.dropout_prob = dropout_prob
        self.per_channel = per_channel
        self.drop_value = drop_value
        self.mask_drop_value = mask_drop_value

        if self.mask_drop_value is not None and self.per_channel:
            raise ValueError("PixelDropout supports mask only with per_channel=False")

    def apply(
        self, img: np.ndarray, drop_mask: np.ndarray = None, drop_value: Union[float, Sequence[float]] = (), **params
    ) -> np.ndarray:
        assert drop_mask is not None
        return F.pixel_dropout(img, drop_mask, drop_value)

    def apply_to_mask(self, img: np.ndarray, drop_mask: np.ndarray = np.array([]), **params) -> np.ndarray:
        if self.mask_drop_value is None:
            return img

        if img.ndim == 2:
            drop_mask = np.squeeze(drop_mask)

        return F.pixel_dropout(img, drop_mask, self.mask_drop_value)

    def apply_to_bbox(self, bbox, **params):
        return bbox

    def apply_to_keypoint(self, keypoint, **params):
        return keypoint

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        img = params["image"]
        shape = img.shape if self.per_channel else img.shape[:2]

        rnd = np.random.RandomState(random.randint(0, 1 << 31))
        # Use choice to create boolean matrix, if we will use binomial after that we will need type conversion
        drop_mask = rnd.choice([True, False], shape, p=[self.dropout_prob, 1 - self.dropout_prob])

        drop_value: Union[float, Sequence[float], np.ndarray]
        if drop_mask.ndim != img.ndim:
            drop_mask = np.expand_dims(drop_mask, -1)
        if self.drop_value is None:
            drop_shape = 1 if is_grayscale_image(img) else int(img.shape[-1])

            if img.dtype in (np.uint8, np.uint16, np.uint32):
                drop_value = rnd.randint(0, int(F.MAX_VALUES_BY_DTYPE[img.dtype]), drop_shape, img.dtype)
            elif img.dtype in [np.float32, np.double]:
                drop_value = rnd.uniform(0, 1, drop_shape).astype(img.dtype)
            else:
                raise ValueError(f"Unsupported dtype: {img.dtype}")
        else:
            drop_value = self.drop_value

        return {"drop_mask": drop_mask, "drop_value": drop_value}

    @property
    def targets_as_params(self) -> List[str]:
        return ["image"]

    def get_transform_init_args_names(self) -> Tuple[str, str, str, str]:
        return ("dropout_prob", "per_channel", "drop_value", "mask_drop_value")

# F.pixel_dropout()
@preserve_shape
def pixel_dropout(image: np.ndarray, drop_mask: np.ndarray, drop_value: Union[float, Sequence[float]]) -> np.ndarray:
    if isinstance(drop_value, (int, float)) and drop_value == 0:
        drop_values = np.zeros_like(image)
    else:
        drop_values = np.full_like(image, drop_value)  # type: ignore
    return np.where(drop_mask, drop_values, image)

In the bottom-right result of the figure below, per_channel=True performs the pixel drop independently for each channel, which is why colored noise appears.
[Figure: PixelDropout augmentation results]

Function: random crop.
Parameters: height, width (int): height and width of the crop region.
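
A minimal usage sketch (hypothetical image and box; the pascal_voc box format is an assumption):

# Usage sketch: RandomCrop together with bounding boxes.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
bboxes = [[50, 60, 120, 140]]  # [x_min, y_min, x_max, y_max]

transform = A.Compose(
    [A.RandomCrop(height=256, width=256, p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
out = transform(image=image, bboxes=bboxes, labels=["bird"])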

class RandomCrop(DualTransform):
    """Crop a random part of the input.

    Args:
        height (int): height of the crop.
        width (int): width of the crop.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def __init__(self, height, width, always_apply=False, p=1.0):
        super().__init__(always_apply, p)
        self.height = height
        self.width = width

    def apply(self, img, h_start=0, w_start=0, **params):
        return F.random_crop(img, self.height, self.width, h_start, w_start)

    def get_params(self):
        return {"h_start": random.random(), "w_start": random.random()}

    def apply_to_bbox(self, bbox, **params):
        return F.bbox_random_crop(bbox, self.height, self.width, **params)

    def apply_to_keypoint(self, keypoint, **params):
        return F.keypoint_random_crop(keypoint, self.height, self.width, **params)

    def get_transform_init_args_names(self):
        return ("height", "width")

[Figure: RandomCrop augmentation results]


Function: crop away parts of the image from its four borders; the result is not resized, so the original image size changes.
Parameters:
The following four parameters give the crop fraction for each side; valid range (0.0, 1.0), default 0.1 for all.
crop_left (float): crop fraction on the left side. The cut position is sampled randomly in [0, crop_left * width).
crop_right (float): crop fraction on the right side. The cut position is sampled randomly in [(1 - crop_right) * width, width).
crop_top (float): crop fraction on the top side. The cut position is sampled randomly in [0, crop_top * height).
crop_bottom (float): crop fraction on the bottom side. The cut position is sampled randomly in [(1 - crop_bottom) * height, height).
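
A minimal usage sketch showing that the output size varies from call to call (hypothetical image; the default 0.1 fractions are used):

# Usage sketch: RandomCropFromBorders changes the image size.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
transform = A.RandomCropFromBorders(p=1.0)  # crop_left/right/top/bottom default to 0.1
print(transform(image=image)["image"].shape)  # e.g. (278, 365, 3); varies per call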

# source code
class RandomCropFromBorders(DualTransform):
    """Crop bbox from image randomly cut parts from borders without resize at the end

    Args:
        crop_left (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from left side in range [0, crop_left * width)
        crop_right (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from right side in range [(1 - crop_right) * width, width)
        crop_top (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from top side in range [0, crop_top * height)
        crop_bottom (float): single float value in (0.0, 1.0) range. Default 0.1. Image will be randomly cut
        from bottom side in range [(1 - crop_bottom) * height, height)
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        crop_left=0.1,
        crop_right=0.1,
        crop_top=0.1,
        crop_bottom=0.1,
        always_apply=False,
        p=1.0,
    ):
        super(RandomCropFromBorders, self).__init__(always_apply, p)
        self.crop_left = crop_left
        self.crop_right = crop_right
        self.crop_top = crop_top
        self.crop_bottom = crop_bottom

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        x_min = random.randint(0, int(self.crop_left * img.shape[1]))
        x_max = random.randint(max(x_min + 1, int((1 - self.crop_right) * img.shape[1])), img.shape[1])
        y_min = random.randint(0, int(self.crop_top * img.shape[0]))
        y_max = random.randint(max(y_min + 1, int((1 - self.crop_bottom) * img.shape[0])), img.shape[0])
        return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}

    def apply(self, img, x_min=0, x_max=0, y_min=0, y_max=0, **params):
        return F.clamping_crop(img, x_min, y_min, x_max, y_max)

    def apply_to_mask(self, mask, x_min=0, x_max=0, y_min=0, y_max=0, **params):
        return F.clamping_crop(mask, x_min, y_min, x_max, y_max)

    def apply_to_bbox(self, bbox, x_min=0, x_max=0, y_min=0, y_max=0, **params):
        rows, cols = params["rows"], params["cols"]
        return F.bbox_crop(bbox, x_min, y_min, x_max, y_max, rows, cols)

    def apply_to_keypoint(self, keypoint, x_min=0, x_max=0, y_min=0, y_max=0, **params):
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return "crop_left", "crop_right", "crop_top", "crop_bottom"

As the result figures show, this operation changes the image size.
[Figure: RandomCropFromBorders augmentation results]

Function: crop the image near a specified box region.
Parameters:

  • max_part_shift (float, (float, float)): maximum shift in the height and width directions relative to the cropping_bbox dimensions. Default (0.3, 0.3).

  • cropping_box_key (str): key of the specified rect region. Default cropping_bbox. The rect is given by four numbers: the x, y coordinates of the top-left corner and the x, y coordinates of the bottom-right corner. Note that cropping_bbox does not support specifying multiple regions, as the following code shows:

    bbox = params[self.cropping_bbox_key]
    h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
    w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])
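
A minimal usage sketch (hypothetical image and box; the box is passed as an extra target under the key given by cropping_box_key):

# Usage sketch: RandomCropNearBBox with a single reference box.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
transform = A.Compose([A.RandomCropNearBBox(max_part_shift=(0.3, 0.3), p=1.0)])
out = transform(image=image, cropping_bbox=[100, 80, 220, 200])  # [x_min, y_min, x_max, y_max]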
    
class RandomCropNearBBox(DualTransform):
    """Crop bbox from image with random shift by x,y coordinates

    Args:
        max_part_shift (float, (float, float)): Max shift in `height` and `width` dimensions relative
            to `cropping_bbox` dimension.
            If max_part_shift is a single float, the range will be (max_part_shift, max_part_shift).
            Default (0.3, 0.3).
        cropping_box_key (str): Additional target key for cropping box. Default `cropping_bbox`
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32

    Examples:
        >>> aug = Compose(RandomCropNearBBox(max_part_shift=(0.1, 0.5), cropping_box_key='test_box'),
        >>>               bbox_params=BboxParams("pascal_voc"))
        >>> result = aug(image=image, bboxes=bboxes, test_box=[0, 5, 10, 20])

    """

    def __init__(
        self,
        max_part_shift: Union[float, Tuple[float, float]] = (0.3, 0.3),
        cropping_box_key: str = "cropping_bbox",
        always_apply: bool = False,
        p: float = 1.0,
    ):
        super(RandomCropNearBBox, self).__init__(always_apply, p)
        self.max_part_shift = to_tuple(max_part_shift, low=max_part_shift)
        self.cropping_bbox_key = cropping_box_key

        if min(self.max_part_shift) < 0 or max(self.max_part_shift) > 1:
            raise ValueError("Invalid max_part_shift. Got: {}".format(max_part_shift))

    def apply(
        self, img: np.ndarray, x_min: int = 0, x_max: int = 0, y_min: int = 0, y_max: int = 0, **params
    ) -> np.ndarray:
        return F.clamping_crop(img, x_min, y_min, x_max, y_max)

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
        bbox = params[self.cropping_bbox_key]
        h_max_shift = round((bbox[3] - bbox[1]) * self.max_part_shift[0])
        w_max_shift = round((bbox[2] - bbox[0]) * self.max_part_shift[1])

        x_min = bbox[0] - random.randint(-w_max_shift, w_max_shift)
        x_max = bbox[2] + random.randint(-w_max_shift, w_max_shift)

        y_min = bbox[1] - random.randint(-h_max_shift, h_max_shift)
        y_max = bbox[3] + random.randint(-h_max_shift, h_max_shift)

        x_min = max(0, x_min)
        y_min = max(0, y_min)

        return {"x_min": x_min, "x_max": x_max, "y_min": y_min, "y_max": y_max}

    def apply_to_bbox(self, bbox: Tuple[float, float, float, float], **params) -> Tuple[float, float, float, float]:
        return F.bbox_crop(bbox, **params)

    def apply_to_keypoint(
        self,
        keypoint: Tuple[float, float, float, float],
        x_min: int = 0,
        x_max: int = 0,
        y_min: int = 0,
        y_max: int = 0,
        **params
    ) -> Tuple[float, float, float, float]:
        return F.crop_keypoint_by_coords(keypoint, crop_coords=(x_min, y_min, x_max, y_max))

The specified box region contains the bird's coordinates, so all three cropped images include the bird.
[Figure: RandomCropNearBBox augmentation results]


Function: split the image into grid cells and shuffle them randomly.
Parameters: grid ((int, int)): number of cells the image is split into; the first number is along the height, the second along the width.
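
A minimal usage sketch (hypothetical image; note from the code below that each grid cell must be no larger than half the image in each dimension):

# Usage sketch: RandomGridShuffle with a 3x3 grid.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)
shuffled = A.RandomGridShuffle(grid=(3, 3), p=1.0)(image=image)["image"]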

# source code
class RandomGridShuffle(DualTransform):
    """
    Random shuffle grid's cells on image.

    Args:
        grid ((int, int)): size of grid for splitting image.

    Targets:
        image, mask, keypoints

    Image types:
        uint8, float32
    """

    def __init__(self,
                 grid: Tuple[int, int] = (3, 3),
                 always_apply: bool = False,
                 p: float = 0.5):
        super(RandomGridShuffle, self).__init__(always_apply, p)
        self.grid = grid

    def apply(self, img: np.ndarray, tiles: np.ndarray = None, **params):
        if tiles is not None:
            img = F.swap_tiles_on_image(img, tiles)
        return img

    def apply_to_mask(self,
                      img: np.ndarray,
                      tiles: np.ndarray = None,
                      **params):
        if tiles is not None:
            img = F.swap_tiles_on_image(img, tiles)
        return img

    def apply_to_keypoint(self,
                          keypoint: Tuple[float, ...],
                          tiles: np.ndarray = None,
                          rows: int = 0,
                          cols: int = 0,
                          **params):
        if tiles is None:
            return keypoint

        for (
                current_left_up_corner_row,
                current_left_up_corner_col,
                old_left_up_corner_row,
                old_left_up_corner_col,
                height_tile,
                width_tile,
        ) in tiles:
            x, y = keypoint[:2]

            if (old_left_up_corner_row <= y <
                (old_left_up_corner_row + height_tile)) and (
                    old_left_up_corner_col <= x <
                    (old_left_up_corner_col + width_tile)):
                x = x - old_left_up_corner_col + current_left_up_corner_col
                y = y - old_left_up_corner_row + current_left_up_corner_row
                keypoint = (x, y) + tuple(keypoint[2:])
                break

        return keypoint

    def get_params_dependent_on_targets(self, params):
        height, width = params["image"].shape[:2]
        n, m = self.grid

        if n <= 0 or m <= 0:
            raise ValueError(
                "Grid's values must be positive. Current grid [%s, %s]" %
                (n, m))

        if n > height // 2 or m > width // 2:
            raise ValueError(
                "Incorrect size cell of grid. Just shuffle pixels of image")

        height_split = np.linspace(0, height, n + 1, dtype=int)
        width_split = np.linspace(0, width, m + 1, dtype=int)

        height_matrix, width_matrix = np.meshgrid(height_split,
                                                  width_split,
                                                  indexing="ij")

        index_height_matrix = height_matrix[:-1, :-1]
        index_width_matrix = width_matrix[:-1, :-1]

        shifted_index_height_matrix = height_matrix[1:, 1:]
        shifted_index_width_matrix = width_matrix[1:, 1:]

        height_tile_sizes = shifted_index_height_matrix - index_height_matrix
        width_tile_sizes = shifted_index_width_matrix - index_width_matrix

        tiles_sizes = np.stack((height_tile_sizes, width_tile_sizes), axis=2)

        index_matrix = np.indices((n, m))
        new_index_matrix = np.stack(index_matrix, axis=2)

        for bbox_size in np.unique(tiles_sizes.reshape(-1, 2), axis=0):
            eq_mat = np.all(tiles_sizes == bbox_size, axis=2)
            new_index_matrix[eq_mat] = random_utils.permutation(
                new_index_matrix[eq_mat])

        new_index_matrix = np.split(new_index_matrix, 2, axis=2)

        old_x = index_height_matrix[new_index_matrix[0],
                                    new_index_matrix[1]].reshape(-1)
        old_y = index_width_matrix[new_index_matrix[0],
                                   new_index_matrix[1]].reshape(-1)

        shift_x = height_tile_sizes.reshape(-1)
        shift_y = width_tile_sizes.reshape(-1)

        curr_x = index_height_matrix.reshape(-1)
        curr_y = index_width_matrix.reshape(-1)

        tiles = np.stack([curr_x, curr_y, old_x, old_y, shift_x, shift_y],
                         axis=1)

        return {"tiles": tiles}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return ("grid", )

[Figure: RandomGridShuffle augmentation results]


Function: crop a region of the image and rescale it to a target size. A similar transform is RandomSizedCrop.
Parameters:
height, width (int): target size after crop and resize.
scale ((float, float)): crop area range relative to the original image.
ratio ((float, float)): range of aspect-ratio variation.
interpolation (OpenCV flag): interpolation method. Should be one of:
cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
Default: cv2.INTER_LINEAR.
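
A minimal usage sketch (hypothetical image; the values mirror torchvision's defaults):

# Usage sketch: RandomResizedCrop to a fixed 224x224 output.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
transform = A.RandomResizedCrop(height=224, width=224, scale=(0.08, 1.0), ratio=(0.75, 4 / 3), p=1.0)
resized = transform(image=image)["image"]  # always (224, 224, 3)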

# source code
class RandomResizedCrop(_BaseRandomSizedCrop):
    """Torchvision's variant of crop a random part of the input and rescale it to some size.

    Args:
        height (int): height after crop and resize.
        width (int): width after crop and resize.
        scale ((float, float)): range of size of the origin size cropped
        ratio ((float, float)): range of aspect ratio of the origin aspect ratio cropped
        interpolation (OpenCV flag): flag that is used to specify the interpolation algorithm. Should be one of:
            cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
            Default: cv2.INTER_LINEAR.
        p (float): probability of applying the transform. Default: 1.

    Targets:
        image, mask, bboxes, keypoints

    Image types:
        uint8, float32
    """

    def __init__(
        self,
        height,
        width,
        scale=(0.08, 1.0),
        ratio=(0.75, 1.3333333333333333),
        interpolation=cv2.INTER_LINEAR,
        always_apply=False,
        p=1.0,
    ):

        super(RandomResizedCrop, self).__init__(
            height=height, width=width, interpolation=interpolation, always_apply=always_apply, p=p
        )
        self.scale = scale
        self.ratio = ratio

    def get_params_dependent_on_targets(self, params):
        img = params["image"]
        area = img.shape[0] * img.shape[1]

        for _attempt in range(10):
            target_area = random.uniform(*self.scale) * area
            log_ratio = (math.log(self.ratio[0]), math.log(self.ratio[1]))
            aspect_ratio = math.exp(random.uniform(*log_ratio))
            # aspect_ratio = w / h
            w = int(round(math.sqrt(target_area * aspect_ratio)))  # skipcq: PTC-W0028
            h = int(round(math.sqrt(target_area / aspect_ratio)))  # skipcq: PTC-W0028

            if 0 < w <= img.shape[1] and 0 < h <= img.shape[0]:
                i = random.randint(0, img.shape[0] - h)
                j = random.randint(0, img.shape[1] - w)
                return {
                    "crop_height": h,
                    "crop_width": w,
                    "h_start": i * 1.0 / (img.shape[0] - h + 1e-10),
                    "w_start": j * 1.0 / (img.shape[1] - w + 1e-10),
                }

        # Fallback to central crop
        in_ratio = img.shape[1] / img.shape[0]
        if in_ratio < min(self.ratio):
            w = img.shape[1]
            h = int(round(w / min(self.ratio)))
        elif in_ratio > max(self.ratio):
            h = img.shape[0]
            w = int(round(h * max(self.ratio)))
        else:  # whole image
            w = img.shape[1]
            h = img.shape[0]
        i = (img.shape[0] - h) // 2
        j = (img.shape[1] - w) // 2
        return {
            "crop_height": h,
            "crop_width": w,
            "h_start": i * 1.0 / (img.shape[0] - h + 1e-10),
            "w_start": j * 1.0 / (img.shape[1] - w + 1e-10),
        }

    def get_params(self):
        return {}

    @property
    def targets_as_params(self):
        return ["image"]

    def get_transform_init_args_names(self):
        return "height", "width", "scale", "ratio", "interpolation"

[Figure: RandomResizedCrop augmentation results]


Function: rotate the input by 90 degrees zero or more times, i.e. randomly rotate the original image by 0°, 90°, 180°, or 270°.
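
A minimal usage sketch (hypothetical image; the rotation factor is sampled from {0, 1, 2, 3} on each call, even with p=1.0):

# Usage sketch: RandomRotate90 rotates by a random multiple of 90 degrees.
import albumentations as A
import numpy as np

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
rotated = A.RandomRotate90(p=1.0)(image=image)["image"]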

class RandomRotate90(DualTransform):
    """Randomly rotate the input by 90 degrees zero or more times.

    Args:
        p (float): probability of applying the transform. Default: 0.5.

    Targets:
        image, mask, bboxes, keypoints
    """

    def apply(self, img, factor=0, **params):
        """
        Args:
            factor (int): number of times the input will be rotated by 90 degrees.
        """
        return np.ascontiguousarray(np.rot90(img, factor))

    def get_params(self):
        # Random int in the range [0, 3]
        return {"factor": random.randint(0, 3)}

    def apply_to_bbox(self, bbox, factor=0, **params):
        return F.bbox_rot90(bbox, factor, **params)

    def apply_to_keypoint(self, keypoint, factor=0, **params):
        return F.keypoint_rot90(keypoint, factor, **params)

    def get_transform_init_args_names(self):
        return ()

Insertar descripción de la imagen aquí

Original article: blog.csdn.net/qq_39735236/article/details/127655509