memory

MemoryUsageEstimator

convert_bytes_to_unit classmethod

```python
convert_bytes_to_unit(
    value: float, unit: Literal["b", "mb", "gb"]
) -> float
```

Convenience method to convert bytes to a different unit.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `value` | `float` | The number of bytes. | *required* |
| `unit` | `Literal["b", "mb", "gb"]` | The unit to convert to. | *required* |

Returns:

| Type | Description |
| ---- | ----------- |
| `float` | The value expressed in the new unit. |
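As a minimal sketch of what such a conversion amounts to (not the library's implementation, and assuming decimal divisors, i.e. 1 MB = 10⁶ bytes, which may differ from what the library uses):

```python
# Sketch of bytes-to-unit conversion; the divisor values are an
# assumption (decimal units), not necessarily the library's choice.
_BYTES_PER_UNIT = {"b": 1, "mb": 1e6, "gb": 1e9}

def convert_bytes_to_unit(value: float, unit: str) -> float:
    """Convert a number of bytes to the requested unit."""
    return value / _BYTES_PER_UNIT[unit]
```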

convert_units classmethod

```python
convert_units(
    value: float,
    from_unit: Literal["b", "mb", "gb"],
    to_unit: Literal["b", "mb", "gb"],
) -> float
```

Convert a value from one unit to another.
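Such a general conversion can be sketched by going through bytes as a common base; again the decimal divisors are an assumption, not necessarily the library's:

```python
# Sketch of unit-to-unit conversion via bytes as the common base.
_BYTES_PER_UNIT = {"b": 1, "mb": 1e6, "gb": 1e9}

def convert_units(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a value from one unit to another through bytes."""
    return value * _BYTES_PER_UNIT[from_unit] / _BYTES_PER_UNIT[to_unit]
```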

estimate_memory_of_one_batch classmethod

```python
estimate_memory_of_one_batch(
    X: Tensor,
    model: Module,
    *,
    cache_kv: bool,
    dtype_byte_size: int,
    unit: Literal["b", "mb", "gb"] = "gb",
    n_train_samples: int | None = None
) -> float
```

Estimate the memory usage of a single batch.

The calculation assumes that save_peak_mem_factor is not used, since this estimate is what determines whether to enable it.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `X` | `Tensor` | The input tensor. | *required* |
| `model` | `Module` | The model to estimate the memory usage of. | *required* |
| `cache_kv` | `bool` | Whether key and value tensors are cached. | *required* |
| `dtype_byte_size` | `int` | The size of the data type in bytes. | *required* |
| `unit` | `Literal["b", "mb", "gb"]` | The unit to convert the memory usage to. | `'gb'` |
| `n_train_samples` | `int \| None` | The number of training samples (only for cache_kv mode). | `None` |

Returns:

| Type | Description |
| ---- | ----------- |
| `float` | The estimated memory usage of a single batch. |

estimate_memory_remainder_after_batch classmethod

```python
estimate_memory_remainder_after_batch(
    X: Tensor,
    model: Module,
    *,
    cache_kv: bool,
    device: device,
    dtype_byte_size: int,
    safety_factor: float,
    n_train_samples: int | None = None,
    max_free_mem: float | int | None = None
) -> float
```

Estimate the amount of free memory remaining after a batch is computed.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `X` | `Tensor` | The input tensor. | *required* |
| `model` | `Module` | The model to estimate the memory usage of. | *required* |
| `cache_kv` | `bool` | Whether key and value tensors are cached. | *required* |
| `device` | `device` | The device to use. | *required* |
| `dtype_byte_size` | `int` | The size of the data type in bytes. | *required* |
| `safety_factor` | `float` | The safety factor to apply. | *required* |
| `n_train_samples` | `int \| None` | The number of training samples (only for cache_kv mode). | `None` |
| `max_free_mem` | `float \| int \| None` | The amount of free memory available. | `None` |

Returns:

| Type | Description |
| ---- | ----------- |
| `float` | The amount of free memory available after a batch is computed. |
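Conceptually, the remainder is the available budget minus the batch cost inflated by the safety factor. A hypothetical one-line sketch of that relationship (not the library's exact computation):

```python
def estimate_memory_remainder_gb(
    max_free_mem_gb: float,
    batch_mem_gb: float,
    safety_factor: float,
) -> float:
    """Free memory left after one batch, with the batch cost
    inflated by a safety factor (illustrative formula only)."""
    return max_free_mem_gb - safety_factor * batch_mem_gb
```

A negative result would indicate that the batch, with its safety margin, does not fit in the available memory.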

get_max_free_memory classmethod

```python
get_max_free_memory(
    device: device,
    *,
    unit: Literal["b", "mb", "gb"] = "gb",
    default_gb_cpu_if_failed_to_calculate: float
) -> float
```

How much memory to use at most, in the requested unit (GB by default), calculated from an estimate of the system's free memory.

For CUDA devices, the free memory of the GPU is used. For CPU, the estimate defaults to 32 GB.

Returns:

The maximum memory usage in GB.
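The CPU path of such a lookup can be sketched as below. This is a loose, hypothetical analogue only: `psutil.virtual_memory().available` is a real API, but the real method also handles CUDA devices and unit conversion.

```python
def get_max_free_memory_gb(default_gb_if_failed: float) -> float:
    """Best-effort estimate of free system memory in GB, falling
    back to a caller-supplied default when the lookup fails
    (mirrors only the CPU branch of the documented behaviour)."""
    try:
        import psutil  # optional dependency in this sketch
        return psutil.virtual_memory().available / 1e9
    except Exception:
        return default_gb_if_failed
```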

reset_peak_memory_if_required classmethod

```python
reset_peak_memory_if_required(
    save_peak_mem: bool | Literal["auto"] | float | int,
    model: Module,
    X: Tensor,
    *,
    cache_kv: bool,
    device: device,
    dtype_byte_size: int,
    safety_factor: float = 5.0,
    n_train_samples: int | None = None
) -> None
```

Reset the peak memory if required.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `save_peak_mem` | `bool \| Literal["auto"] \| float \| int` | If a bool, specifies whether to save peak memory or not. If `"auto"`, the amount of free memory is estimated and the option is enabled or disabled based on the estimated usage. If a float or int, it is treated as the amount of memory available (in GB) explicitly specified by the user; in this case, that value is used to decide whether to save peak memory. | *required* |
| `model` | `Module` | The model to reset the peak memory of. | *required* |
| `X` | `Tensor` | The input tensor. | *required* |
| `cache_kv` | `bool` | Whether key and value tensors are cached. | *required* |
| `device` | `device` | The device to use. | *required* |
| `dtype_byte_size` | `int` | The size of the data type in bytes. | *required* |
| `safety_factor` | `float` | The safety factor to apply. | `5.0` |
| `n_train_samples` | `int \| None` | The number of training samples (only for cache_kv mode). | `None` |
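The three-way semantics of `save_peak_mem` described above can be sketched as a small decision helper. This is hypothetical illustration code, not the library's implementation; `should_save_peak_mem` and its parameters are invented names:

```python
def should_save_peak_mem(
    save_peak_mem,  # bool | "auto" | float | int
    estimated_batch_gb: float,
    free_gb: float,
    safety_factor: float = 5.0,
) -> bool:
    """Hypothetical decision logic mirroring the documented semantics."""
    if isinstance(save_peak_mem, bool):
        # Explicit user choice wins.
        return save_peak_mem
    if save_peak_mem == "auto":
        # Use the estimated free memory as the budget.
        budget = free_gb
    else:
        # A float/int is a user-specified memory budget in GB.
        budget = float(save_peak_mem)
    # Enable peak-memory saving when a batch, with its safety
    # margin, would not fit into the budget.
    return budget - safety_factor * estimated_batch_gb < 0
```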

support_save_peak_mem_factor

```python
support_save_peak_mem_factor(
    method: MethodType,
) -> Callable
```

This decorator can be applied to a method acting on a tensor 'x' whose first dimension is a flat batch dimension, i.e. the operation is trivially parallel over the first dimension.

For additional tensor arguments, it is assumed that the first dimension is again the batch dimension, and that non-tensor arguments can be passed as-is to the splits when parallelizing over the batch dimension.

The decorator adds two options: 'add_input', which adds the principal input 'x' to the result of the method, and 'allow_inplace', by which the caller indicates that 'x' is not used after the call and that its buffer may be reused for the output.

Setting 'allow_inplace' does not guarantee that the operation is performed in place; the return value should therefore always be used.

Moreover, the decorator adds an optional int parameter 'save_peak_mem_factor', supported only in combination with 'allow_inplace' during inference, which subdivides the operation into the given number of chunks to reduce peak memory consumption.
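The chunking idea can be sketched with a minimal decorator. Plain Python lists stand in for tensors here, and the wrapper name and in-place handling are simplifications; the real decorator operates on torch tensors and supports buffer reuse via 'allow_inplace':

```python
from functools import wraps

def support_chunked_batches(method):
    """Sketch of a peak-memory-saving wrapper: splits the flat batch
    dimension of ``x`` into chunks and concatenates the results, so
    only one chunk's intermediate buffers are alive at a time."""
    @wraps(method)
    def wrapper(self, x, *, save_peak_mem_factor=None, add_input=False):
        if save_peak_mem_factor is None:
            out = method(self, x)
        else:
            # Process the batch in `save_peak_mem_factor` chunks.
            chunk = -(-len(x) // save_peak_mem_factor)  # ceil division
            out = []
            for i in range(0, len(x), chunk):
                out.extend(method(self, x[i:i + chunk]))
        if add_input:
            # Add the principal input to the method's result.
            out = [a + b for a, b in zip(x, out)]
        return out
    return wrapper

class Doubler:
    @support_chunked_batches
    def forward(self, x):
        return [2 * v for v in x]
```

Chunked and unchunked calls produce the same result; only the peak size of the intermediate buffers differs.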