Hugging Face

Hugging Face provides a variety of pretrained models. However, when these models are loaded into TensorFlow, the computational graph may not be set up correctly, so that model.input and model.output do not exist.

To fix this, wrap the model in a new tf.keras.Model and define the inputs and outputs yourself:

import tensorflow as tf
from gradient_accumulator import GradientAccumulateModel
from tensorflow.keras.models import Model
# TFx stands in for the TensorFlow model class you want to use
from transformers import TFx

# load your model checkpoint (checkpoint is the name or path of your pretrained model)
HF_model = TFx.from_pretrained(checkpoint)

# define the model inputs and outputs; different models need different inputs/outputs
input_ids = tf.keras.Input(shape=(None,), dtype='int32', name="input_ids")
attention_mask = tf.keras.Input(shape=(None,), dtype='int32', name="attention_mask")
model_input = {'input_ids': input_ids, 'attention_mask': attention_mask}

# create a new Model that has the model.input and model.output properties
new_model = Model(inputs=model_input, outputs=HF_model(model_input))

# create the gradient accumulation (GA) model
model = GradientAccumulateModel(accum_steps=4, inputs=new_model.input, outputs=new_model.output)

For more details, see this Jupyter notebook.
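As background for the accum_steps=4 argument used above, here is a minimal, library-free sketch of the idea behind gradient accumulation. It assumes the gradients of the accumulated mini-batches are averaged before a single weight update, which this section does not state explicitly. The toy model y = w * x and the helper names grad_mse and accumulated_grad are hypothetical and are not part of gradient_accumulator.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, batch_size, accum_steps):
    """Average the gradients of `accum_steps` consecutive mini-batches.

    Assumption: accumulation averages the per-batch gradients.
    """
    total = 0.0
    for step in range(accum_steps):
        start = step * batch_size
        stop = start + batch_size
        total += grad_mse(w, xs[start:stop], ys[start:stop])
    return total / accum_steps


# Toy data: eight samples drawn from y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

# One batch of 8 samples versus 4 accumulated mini-batches of 2 samples.
full_batch = grad_mse(w, xs, ys)
accumulated = accumulated_grad(w, xs, ys, batch_size=2, accum_steps=4)

print(full_batch, accumulated)
```

With equally sized mini-batches, the averaged gradient matches the gradient of the single large batch, which is why accumulation can emulate a larger batch size when memory is limited.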