# memory

## MemoryUsageEstimator

### convert_bytes_to_unit (classmethod)
Convenience method to convert bytes to a different unit.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`value` | `float` | The number of bytes. | required |
`unit` | `Literal['b', 'mb', 'gb']` | The unit to convert to. | required |

Returns:

Type | Description |
---|---|
`float` | The number of bytes in the new unit. |
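The signature block is missing from this page; judging from the parameter table, the call takes a byte count and a target unit. A minimal usage sketch (the import path is an assumption, and whether units are 1024-based is not specified here):

```python
# Hypothetical import path -- adjust to wherever MemoryUsageEstimator
# lives in your installation.
from tabpfn.model.memory import MemoryUsageEstimator

n_bytes = 3 * 1024**2
mb = MemoryUsageEstimator.convert_bytes_to_unit(float(n_bytes), "mb")
print(mb)  # 3.0 if the implementation uses 1024-based units
```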
### convert_units (classmethod)
```python
convert_units(
    value: float,
    from_unit: Literal["b", "mb", "gb"],
    to_unit: Literal["b", "mb", "gb"],
) -> float
```
Convert a value from one unit to another.
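For instance (same assumed import path as above):

```python
from tabpfn.model.memory import MemoryUsageEstimator  # assumed path

gb = MemoryUsageEstimator.convert_units(512.0, "mb", "gb")
print(gb)  # 0.5 if the implementation uses 1024-based units
```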
### estimate_memory_of_one_batch (classmethod)
```python
estimate_memory_of_one_batch(
    X: Tensor,
    model: Module,
    *,
    cache_kv: bool,
    dtype_byte_size: int,
    unit: Literal["b", "mb", "gb"] = "gb",
    n_train_samples: int | None = None
) -> float
```
Estimate the memory usage of a single batch.
The calculation assumes that `save_peak_mem_factor` is not used, since this estimate is what determines whether to enable it.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`X` | `Tensor` | The input tensor. | required |
`model` | `Module` | The model to estimate the memory usage of. | required |
`cache_kv` | `bool` | Whether key and value tensors are cached. | required |
`dtype_byte_size` | `int` | The size of the data type in bytes. | required |
`unit` | `Literal['b', 'mb', 'gb']` | The unit to convert the memory usage to. | `'gb'` |
`n_train_samples` | `int \| None` | The number of training samples (only for cache_kv mode). | `None` |

Returns:

Type | Description |
---|---|
`float` | The estimated memory usage of a single batch. |
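A rough usage sketch; the import path and the stand-in model are assumptions for illustration, and in practice `model` would be the actual transformer whose memory is being estimated:

```python
# Hypothetical import path and stand-in model, for illustration only.
import torch
from tabpfn.model.memory import MemoryUsageEstimator

model = torch.nn.Linear(128, 128)  # stand-in for the real model
X = torch.randn(1024, 128)

est_gb = MemoryUsageEstimator.estimate_memory_of_one_batch(
    X,
    model,
    cache_kv=False,
    dtype_byte_size=4,  # float32
    unit="gb",
)
print(f"estimated batch memory: {est_gb:.4f} GB")
```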
### estimate_memory_remainder_after_batch (classmethod)
```python
estimate_memory_remainder_after_batch(
    X: Tensor,
    model: Module,
    *,
    cache_kv: bool,
    device: device,
    dtype_byte_size: int,
    safety_factor: float,
    n_train_samples: int | None = None,
    max_free_mem: float | int | None = None
) -> float
```
Estimate the amount of free memory remaining after a single batch has been computed.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`X` | `Tensor` | The input tensor. | required |
`model` | `Module` | The model to estimate the memory usage of. | required |
`cache_kv` | `bool` | Whether key and value tensors are cached. | required |
`device` | `device` | The device to use. | required |
`dtype_byte_size` | `int` | The size of the data type in bytes. | required |
`safety_factor` | `float` | The safety factor to apply. | required |
`n_train_samples` | `int \| None` | The number of training samples (only for cache_kv mode). | `None` |
`max_free_mem` | `float \| int \| None` | The amount of free memory available. | `None` |

Returns:

Type | Description |
---|---|
`float` | The amount of free memory available after a batch is computed. |
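This can be used to decide whether memory-saving measures are needed; a sketch under the same assumptions as above:

```python
# Hypothetical import path and stand-in model, for illustration only.
import torch
from tabpfn.model.memory import MemoryUsageEstimator

model = torch.nn.Linear(128, 128)
X = torch.randn(1024, 128)

remainder = MemoryUsageEstimator.estimate_memory_remainder_after_batch(
    X,
    model,
    cache_kv=False,
    device=torch.device("cpu"),
    dtype_byte_size=4,
    safety_factor=5.0,
)
if remainder < 0:
    print("batch likely exceeds free memory; consider saving peak memory")
```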
### get_max_free_memory (classmethod)
```python
get_max_free_memory(
    device: device,
    *,
    unit: Literal["b", "mb", "gb"] = "gb",
    default_gb_cpu_if_failed_to_calculate: float
) -> float
```
Estimate the maximum amount of memory to use, based on the system's free memory.

For CUDA devices, the free memory of the GPU is used. For CPU, if the free memory cannot be calculated, it falls back to `default_gb_cpu_if_failed_to_calculate` (e.g. 32 GB).

Returns:

The maximum memory usage, in the requested `unit` (GB by default).
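For example (assumed import path as above):

```python
import torch
from tabpfn.model.memory import MemoryUsageEstimator  # assumed path

free_gb = MemoryUsageEstimator.get_max_free_memory(
    torch.device("cpu"),
    unit="gb",
    default_gb_cpu_if_failed_to_calculate=32.0,
)
print(f"usable memory: {free_gb:.1f} GB")
```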
### reset_peak_memory_if_required (classmethod)
```python
reset_peak_memory_if_required(
    save_peak_mem: bool | Literal["auto"] | float | int,
    model: Module,
    X: Tensor,
    *,
    cache_kv: bool,
    device: device,
    dtype_byte_size: int,
    safety_factor: float = 5.0,
    n_train_samples: int | None = None
) -> None
```
Reset the peak memory if required.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`save_peak_mem` | `bool \| 'auto' \| float \| int` | If bool, specifies whether to save peak memory or not. If "auto", the amount of free memory is estimated and the option is enabled or disabled based on the estimated usage. If float or int, it is treated as the amount of memory available (in GB) explicitly specified by the user; in this case, the value is used to estimate whether or not to save peak memory. | required |
`model` | `Module` | The model to reset the peak memory of. | required |
`X` | `Tensor` | The input tensor. | required |
`cache_kv` | `bool` | Whether key and value tensors are cached. | required |
`device` | `device` | The device to use. | required |
`dtype_byte_size` | `int` | The size of the data type in bytes. | required |
`safety_factor` | `float` | The safety factor to apply. | `5.0` |
`n_train_samples` | `int \| None` | The number of training samples (used only in cache_kv mode). | `None` |
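A sketch of letting the estimator decide automatically (same assumptions as the earlier sketches):

```python
# Hypothetical import path and stand-in model, for illustration only.
import torch
from tabpfn.model.memory import MemoryUsageEstimator

model = torch.nn.Linear(128, 128)
X = torch.randn(1024, 128)

# "auto": estimate free memory and enable peak-memory saving only if needed.
MemoryUsageEstimator.reset_peak_memory_if_required(
    "auto",
    model,
    X,
    cache_kv=False,
    device=torch.device("cpu"),
    dtype_byte_size=4,
)
```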
## support_save_peak_mem_factor
This decorator can be applied to a method acting on a tensor 'x' whose first dimension is a flat batch dimension (i.e. the operation is trivially parallel over the first dimension).

For additional tensor arguments, the first dimension is again assumed to be the batch dimension, while non-tensor arguments are passed as-is to each split when parallelizing over the batch dimension.

The decorator adds two options: 'add_input', which adds the principal input 'x' to the result of the method, and 'allow_inplace'. By setting 'allow_inplace', the caller indicates that 'x' is not used after the call, so its buffer can be reused for the output. Setting 'allow_inplace' does not guarantee that the operation is actually performed in place, so the return value should always be used.

Moreover, the decorator adds an optional int parameter 'save_peak_mem_factor' that is only supported in combination with 'allow_inplace' during inference; it subdivides the operation into the specified number of chunks to reduce peak memory consumption.
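To make the chunking idea concrete, here is a minimal re-implementation sketch of the core mechanism, not the library's actual decorator: it omits 'add_input' and the method/inference checks, and only shows how splitting the flat batch dimension and reusing 'x''s buffer bounds peak memory by one chunk's worth of intermediates:

```python
import torch

def chunked_over_batch(fn):
    """Sketch of the idea behind 'save_peak_mem_factor': apply `fn`
    to slices of the flat batch dimension and write the results back
    into x's buffer (permitted by `allow_inplace`)."""
    def wrapper(x, *args, allow_inplace=False, save_peak_mem_factor=None, **kw):
        if save_peak_mem_factor is None:
            return fn(x, *args, **kw)
        assert allow_inplace, "chunking requires allow_inplace"
        chunk_size = -(-x.shape[0] // save_peak_mem_factor)  # ceil division
        for start in range(0, x.shape[0], chunk_size):
            # Only one chunk's intermediate results are alive at a time.
            piece = x[start:start + chunk_size]
            x[start:start + chunk_size] = fn(piece, *args, **kw)
        return x
    return wrapper

@chunked_over_batch
def double(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0

x = torch.randn(1024, 16)
y = double(x, allow_inplace=True, save_peak_mem_factor=4)  # y aliases x
```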