# memory

## MemoryUsageEstimator

### convert_bytes_to_unit (classmethod)
Convenience method to convert bytes to a different unit.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`value` | `float` | The number of bytes. | required |
`unit` | `Literal['b', 'mb', 'gb']` | The unit to convert to. | required |

Returns:

Type | Description |
---|---|
`float` | The number of bytes in the new unit. |
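The signature block is missing from this page; judging from the parameter table, the call takes a byte count and a target unit. A minimal usage sketch (the import path is an assumption, and whether units are 1024-based is not specified here):

```python
# Hypothetical import path -- adjust to wherever MemoryUsageEstimator
# lives in your installation.
from tabpfn.model.memory import MemoryUsageEstimator

n_bytes = 3 * 1024**2
mb = MemoryUsageEstimator.convert_bytes_to_unit(float(n_bytes), "mb")
print(mb)  # 3.0 if the implementation uses 1024-based units
```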
### convert_units (classmethod)
```python
convert_units(
    value: float,
    from_unit: Literal["b", "mb", "gb"],
    to_unit: Literal["b", "mb", "gb"],
) -> float
```
Convert a value from one unit to another.
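For instance (same assumed import path as above):

```python
from tabpfn.model.memory import MemoryUsageEstimator  # assumed path

gb = MemoryUsageEstimator.convert_units(512.0, "mb", "gb")
print(gb)  # 0.5 if the implementation uses 1024-based units
```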
### estimate_memory_of_one_batch (classmethod)
```python
estimate_memory_of_one_batch(
    X: Tensor,
    model: Module,
    *,
    cache_kv: bool,
    dtype_byte_size: int,
    unit: Literal["b", "mb", "gb"] = "gb",
    n_train_samples: int | None = None
) -> float
```
Estimate the memory usage of a single batch.
The calculation assumes that `save_peak_mem_factor` is not used, since this estimate is what determines whether to enable it.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`X` | `Tensor` | The input tensor. | required |
`model` | `Module` | The model to estimate the memory usage of. | required |
`cache_kv` | `bool` | Whether key and value tensors are cached. | required |
`dtype_byte_size` | `int` | The size of the data type in bytes. | required |
`unit` | `Literal['b', 'mb', 'gb']` | The unit to convert the memory usage to. | `'gb'` |
`n_train_samples` | `int \| None` | The number of training samples (only for cache_kv mode). | `None` |

Returns:

Type | Description |
---|---|
`float` | The estimated memory usage of a single batch. |
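A rough usage sketch; the import path and the stand-in model are assumptions for illustration, and in practice `model` would be the actual transformer whose memory is being estimated:

```python
# Hypothetical import path and stand-in model, for illustration only.
import torch
from tabpfn.model.memory import MemoryUsageEstimator

model = torch.nn.Linear(128, 128)  # stand-in for the real model
X = torch.randn(1024, 128)

est_gb = MemoryUsageEstimator.estimate_memory_of_one_batch(
    X,
    model,
    cache_kv=False,
    dtype_byte_size=4,  # float32
    unit="gb",
)
print(f"estimated batch memory: {est_gb:.4f} GB")
```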
### estimate_memory_remainder_after_batch (classmethod)
```python
estimate_memory_remainder_after_batch(
    X: Tensor,
    model: Module,
    *,
    cache_kv: bool,
    device: device,
    dtype_byte_size: int,
    safety_factor: float,
    n_train_samples: int | None = None,
    max_free_mem: float | int | None = None
) -> float
```
Estimate the amount of free memory remaining after a single batch has been computed.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`X` | `Tensor` | The input tensor. | required |
`model` | `Module` | The model to estimate the memory usage of. | required |
`cache_kv` | `bool` | Whether key and value tensors are cached. | required |
`device` | `device` | The device to use. | required |
`dtype_byte_size` | `int` | The size of the data type in bytes. | required |
`safety_factor` | `float` | The safety factor to apply. | required |
`n_train_samples` | `int \| None` | The number of training samples (only for cache_kv mode). | `None` |
`max_free_mem` | `float \| int \| None` | The amount of free memory available. | `None` |

Returns:

Type | Description |
---|---|
`float` | The amount of free memory available after a batch is computed. |
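This can be used to decide whether memory-saving measures are needed; a sketch under the same assumptions as above:

```python
# Hypothetical import path and stand-in model, for illustration only.
import torch
from tabpfn.model.memory import MemoryUsageEstimator

model = torch.nn.Linear(128, 128)
X = torch.randn(1024, 128)

remainder = MemoryUsageEstimator.estimate_memory_remainder_after_batch(
    X,
    model,
    cache_kv=False,
    device=torch.device("cpu"),
    dtype_byte_size=4,
    safety_factor=5.0,
)
if remainder < 0:
    print("batch likely exceeds free memory; consider saving peak memory")
```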
### get_max_free_memory (classmethod)
```python
get_max_free_memory(
    device: device,
    *,
    unit: Literal["b", "mb", "gb"] = "gb",
    default_gb_cpu_if_failed_to_calculate: float
) -> float
```
Estimate the maximum amount of memory to use, based on the system's free memory.

For CUDA devices, the free memory of the GPU is used. For CPU, if the free memory cannot be calculated, it falls back to `default_gb_cpu_if_failed_to_calculate` (e.g. 32 GB).

Returns:

The maximum memory usage, in the requested `unit` (GB by default).
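For example (assumed import path as above):

```python
import torch
from tabpfn.model.memory import MemoryUsageEstimator  # assumed path

free_gb = MemoryUsageEstimator.get_max_free_memory(
    torch.device("cpu"),
    unit="gb",
    default_gb_cpu_if_failed_to_calculate=32.0,
)
print(f"usable memory: {free_gb:.1f} GB")
```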
### reset_peak_memory_if_required (classmethod)
```python
reset_peak_memory_if_required(
    save_peak_mem: bool | Literal["auto"] | float | int,
    model: Module,
    X: Tensor,
    *,
    cache_kv: bool,
    device: device,
    dtype_byte_size: int,
    safety_factor: float = 5.0,
    n_train_samples: int | None = None
) -> None
```
Reset the peak memory if required.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`save_peak_mem` | `bool \| 'auto' \| float \| int` | If bool, specifies whether to save peak memory or not. If "auto", the amount of free memory is estimated and the option is enabled or disabled based on the estimated usage. If float or int, it is treated as the amount of memory available (in GB) explicitly specified by the user; in this case, the value is used to estimate whether or not to save peak memory. | required |
`model` | `Module` | The model to reset the peak memory of. | required |
`X` | `Tensor` | The input tensor. | required |
`cache_kv` | `bool` | Whether key and value tensors are cached. | required |
`device` | `device` | The device to use. | required |
`dtype_byte_size` | `int` | The size of the data type in bytes. | required |
`safety_factor` | `float` | The safety factor to apply. | `5.0` |
`n_train_samples` | `int \| None` | The number of training samples (used only in cache_kv mode). | `None` |
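A sketch of letting the estimator decide automatically (same assumptions as the earlier sketches):

```python
# Hypothetical import path and stand-in model, for illustration only.
import torch
from tabpfn.model.memory import MemoryUsageEstimator

model = torch.nn.Linear(128, 128)
X = torch.randn(1024, 128)

# "auto": estimate free memory and enable peak-memory saving only if needed.
MemoryUsageEstimator.reset_peak_memory_if_required(
    "auto",
    model,
    X,
    cache_kv=False,
    device=torch.device("cpu"),
    dtype_byte_size=4,
)
```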
## support_save_peak_mem_factor
This decorator can be applied to a method acting on a tensor 'x' whose first dimension is a flat batch dimension (i.e. the operation is trivially parallel over the first dimension).

For additional tensor arguments, the first dimension is again assumed to be the batch dimension, while non-tensor arguments are passed as-is to each split when parallelizing over the batch dimension.

The decorator adds two options: 'add_input', which adds the principal input 'x' to the result of the method, and 'allow_inplace'. By setting 'allow_inplace', the caller indicates that 'x' is not used after the call, so its buffer can be reused for the output. Setting 'allow_inplace' does not guarantee that the operation is actually performed in place, so the return value should always be used.

Moreover, the decorator adds an optional int parameter 'save_peak_mem_factor' that is only supported in combination with 'allow_inplace' during inference; it subdivides the operation into the specified number of chunks to reduce peak memory consumption.
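To make the chunking idea concrete, here is a minimal re-implementation sketch of the core mechanism, not the library's actual decorator: it omits 'add_input' and the method/inference checks, and only shows how splitting the flat batch dimension and reusing 'x''s buffer bounds peak memory by one chunk's worth of intermediates:

```python
import torch

def chunked_over_batch(fn):
    """Sketch of the idea behind 'save_peak_mem_factor': apply `fn`
    to slices of the flat batch dimension and write the results back
    into x's buffer (permitted by `allow_inplace`)."""
    def wrapper(x, *args, allow_inplace=False, save_peak_mem_factor=None, **kw):
        if save_peak_mem_factor is None:
            return fn(x, *args, **kw)
        assert allow_inplace, "chunking requires allow_inplace"
        chunk_size = -(-x.shape[0] // save_peak_mem_factor)  # ceil division
        for start in range(0, x.shape[0], chunk_size):
            # Only one chunk's intermediate results are alive at a time.
            piece = x[start:start + chunk_size]
            x[start:start + chunk_size] = fn(piece, *args, **kw)
        return x
    return wrapper

@chunked_over_batch
def double(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0

x = torch.randn(1024, 16)
y = double(x, allow_inplace=True, save_peak_mem_factor=4)  # y aliases x
```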