Gradient Accumulation Wrappers

GradientAccumulateModel(*args, **kwargs)

Model wrapper for gradient accumulation.

GradientAccumulateOptimizer([optimizer, ...])

Optimizer wrapper for gradient accumulation.

GradientAccumulateModel

class gradient_accumulator.GradientAccumulateModel(*args, **kwargs)[source]

Model wrapper for gradient accumulation.

apply_accu_gradients()[source]

Performs gradient update and resets slots afterwards.

reinit_grad_accum()[source]

Reinitializes gradient accumulator slots.

train_step(data)[source]

Performs a single train step.
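
A typical usage pattern is to build an ordinary Keras model and wrap it, so that gradients are accumulated over accum_steps batches before every weight update. The sketch below is illustrative only; the architecture and the accum_steps value are placeholders, not part of the API reference above:

    import tensorflow as tf
    from gradient_accumulator import GradientAccumulateModel

    # Build an ordinary Keras model first.
    base = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(1),
    ])

    # Wrap it so gradients are accumulated over 4 batches per update,
    # emulating a 4x larger effective batch size.
    model = GradientAccumulateModel(accum_steps=4, inputs=base.input, outputs=base.output)
    model.compile(optimizer="adam", loss="mse")

Training then proceeds as usual with model.fit(); train_step() accumulates gradients and apply_accu_gradients() applies them once accum_steps batches have been processed.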

GradientAccumulateOptimizer

class gradient_accumulator.GradientAccumulateOptimizer(optimizer='SGD', accum_steps=1, reduction: str = 'MEAN', name: str = 'GradientAccumulateOptimizer', **kwargs)[source]

Optimizer wrapper for gradient accumulation.

apply_gradients(grads_and_vars, name=None, **kwargs)[source]

Updates weights using gradients.

Parameters:
  • grads_and_vars – list of (gradient, variable) pairs.

  • name – name to set when applying gradients.

  • **kwargs – keyword arguments.

Returns:

Updated weights.

classmethod from_config(config, custom_objects=None)[source]

Gets the config of the original optimizer and deserializes it.

get_config()[source]

Returns the configuration as a dict.

property gradients[source]

The accumulated gradients on the current replica.

Returns:

Current gradients in optimizer.

property iterations[source]

Returns the current iteration value of the optimizer.

Returns:

Iterations of the optimizer.

property learning_rate[source]

Returns the learning rate of the optimizer.

Returns:

Learning rate of the optimizer.

property optimizer[source]

The optimizer that this GradientAccumulateOptimizer is wrapping.

reset()[source]

Resets the accumulated gradients on the current replica.

property step[source]

The number of training steps this Optimizer has run. Initializes step variable if None.

Returns:

Current number of optimizer steps.
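
As an alternative to wrapping the model, an existing Keras optimizer can be wrapped instead and passed to compile(). A minimal sketch, where the wrapped optimizer, learning rate, accum_steps value, and architecture are illustrative:

    import tensorflow as tf
    from gradient_accumulator import GradientAccumulateOptimizer

    # Wrap a regular Keras optimizer; gradients are accumulated over
    # 8 batches before apply_gradients() updates the weights.
    opt = GradientAccumulateOptimizer(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2),
        accum_steps=8,
    )

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=opt, loss="mse")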

Custom Layers

AccumBatchNormalization

class gradient_accumulator.AccumBatchNormalization(*args, **kwargs)[source]

Custom Batch Normalization layer with gradient accumulation support.

build(input_shape)[source]

Builds layer and variables.

Parameters:

input_shape – input feature map size.

call(inputs, training=None, mask=None)[source]

Performs the batch normalization step.

Parameters:
  • inputs – input feature map to apply batch normalization across.

  • training – whether the layer should be in training mode.

  • mask – whether to calculate statistics within the masked region of the feature map.

Returns:

Normalized feature map.

get_config()[source]

Returns the configuration as a dict.

Returns:

Configuration dictionary.

get_moving_average(statistic, new_value)[source]

Returns the moving average given a statistic and current estimate.

Parameters:
  • statistic – summary statistic, e.g. the average of a single feature over multiple samples.

  • new_value – statistic of a single feature for a single forward step.

Returns:

Updated statistic.
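
For reference, a common exponential-moving-average update of this form is sketched below. It assumes a momentum hyperparameter as in standard Keras batch normalization; the layer's internal implementation may differ:

    def moving_average_sketch(statistic, new_value, momentum=0.99):
        # Blend the running statistic with the estimate from the current step.
        return statistic * momentum + new_value * (1.0 - momentum)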

reset_accum()[source]

Resets accumulator slots.

property trainable[source]

Returns whether layer is trainable.

Returns:

trainable boolean state.

update_variables(mean, var)[source]

Updates the batch normalization variables.

Parameters:
  • mean – average for a single feature.

  • var – variance for a single feature.
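
Assuming the layer accepts an accum_steps argument like the wrappers above, it can be used as a drop-in replacement for tf.keras.layers.BatchNormalization when training with gradient accumulation. A minimal sketch, where the surrounding architecture and accum_steps value are illustrative:

    import tensorflow as tf
    from gradient_accumulator import AccumBatchNormalization

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        # Accumulation-aware replacement for tf.keras.layers.BatchNormalization;
        # its statistics are updated in sync with the accumulated gradient updates.
        AccumBatchNormalization(accum_steps=4),
        tf.keras.layers.Dense(1),
    ])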

General Utilities

Adaptive Gradient Clipping

gradient_accumulator.compute_norm(x, axis, keepdims)[source]

Computes the Euclidean norm of a tensor \(x\).

Parameters:
  • x – input tensor.

  • axis – which axis to compute norm across.

  • keepdims – whether to keep the dimension after reducing along the axis.

Returns:

Euclidean norm.
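
As a rough illustration, the Euclidean norm computed here is equivalent to the following sketch; the exact reduction used internally may differ:

    import tensorflow as tf

    def compute_norm_sketch(x, axis, keepdims):
        # Square, sum along the requested axis, then take the square root.
        return tf.math.reduce_sum(x ** 2, axis=axis, keepdims=keepdims) ** 0.5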

gradient_accumulator.unitwise_norm(x)[source]

Wrapper that dynamically sets axis and keepdims for a given input x when computing the Euclidean norm.

Parameters:

x – input tensor.

Returns:

Euclidean norm.

gradient_accumulator.adaptive_clip_grad(parameters, gradients, clip_factor: float = 0.01, eps: float = 0.001)[source]

Performs adaptive gradient clipping on a given set of parameters and gradients.

Parameters:
  • parameters – Which parameters to apply the method on.

  • gradients – Which gradients to apply clipping on.

  • clip_factor – Sets upper limit for gradient clipping.

  • eps – Epsilon, a small number used inside \(\max()\) to avoid a zero norm and preserve numerical stability.

Returns:

Updated gradients after gradient clipping.
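
Conceptually, adaptive gradient clipping rescales a gradient \(G\) whenever its unit-wise norm is large relative to the norm of the corresponding parameter \(W\): when \(\|G\| / \max(\|W\|, \epsilon)\) exceeds clip_factor, the gradient is scaled down so the ratio equals clip_factor (as described in the adaptive gradient clipping literature; the implementation here may differ in minor details). A minimal sketch of calling the function inside a custom training step, where the model, data, and loss are illustrative:

    import tensorflow as tf
    from gradient_accumulator import adaptive_clip_grad

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)
    loss_fn = tf.keras.losses.MeanSquaredError()

    x = tf.random.normal((8, 16))
    y = tf.random.normal((8, 1))

    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))

    grads = tape.gradient(loss, model.trainable_variables)
    # Clip each gradient relative to the norm of its parameter before applying it.
    grads = adaptive_clip_grad(model.trainable_variables, grads, clip_factor=0.01, eps=1e-3)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))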