How To Add A Layer To A Checkpoint in TensorFlow
The phrase "Saving a TensorFlow model" typically means one of two things:
- Checkpoints, OR
- SavedModel.
Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.
The SavedModel format, on the other hand, includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C#, etc. TensorFlow APIs).
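As a rough illustration of the difference, a minimal sketch (the model and paths here are made up, not from the original guide):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(5, input_shape=(1,))])
model.save_weights('ckpt/my_ckpt')          # checkpoint: parameter values only
tf.saved_model.save(model, 'saved_model/')  # SavedModel: computation plus parameters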
This guide covers APIs for writing and reading checkpoints.
Setup
import tensorflow as tf

class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def call(self, x):
    return self.l1(x)
net = Net()
Saving from tf.keras training APIs
See the tf.keras guide on saving and restoring. tf.keras.Model.save_weights saves a TensorFlow checkpoint.
net.save_weights('easy_checkpoint')
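The matching read side is Model.load_weights. A minimal round trip might look like this (a sketch reusing the Net class and checkpoint prefix from above):

# Create a fresh instance and restore the saved weights into it.
new_net = Net()
new_net.load_weights('easy_checkpoint')
# For a subclassed model, variables are created lazily, so the
# restore completes once the model is first called:
new_net(tf.zeros([1, 1]))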
Writing checkpoints
The persistent state of a TensorFlow model is stored in tf.Variable objects. These can be constructed directly, but are often created through high-level APIs like tf.keras.layers or tf.keras.Model.
The easiest way to manage variables is by attaching them to Python objects, then referencing those objects.
Subclasses of tf.train.Checkpoint, tf.keras.layers.Layer, and tf.keras.Model automatically track variables assigned to their attributes. The following example constructs a simple linear model, then writes checkpoints which contain values for all of the model's variables.
You can easily save a model checkpoint with Model.save_weights.
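For instance, attribute assignment alone is enough to make variables reachable from a checkpoint (a small sketch with made-up names):

root = tf.train.Checkpoint()
root.v = tf.Variable(3.)               # tracked: a directly assigned variable
root.layer = tf.keras.layers.Dense(2)  # tracked: a layer (and its variables, once built)
root.save('./attr_tracking_example')   # saves every variable reachable from `root`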
Manual checkpointing
Setup
To help demonstrate all the features of tf.train.Checkpoint, define a toy dataset and optimization step:
def toy_dataset():
  inputs = tf.range(10.)[:, None]
  labels = inputs * 5. + tf.range(5.)[None, :]
  return tf.data.Dataset.from_tensor_slices(
    dict(x=inputs, y=labels)).repeat().batch(2)
def train_step(net, example, optimizer):
  """Trains `net` on `example` using `optimizer`."""
  with tf.GradientTape() as tape:
    output = net(example['x'])
    loss = tf.reduce_mean(tf.abs(output - example['y']))
  variables = net.trainable_variables
  gradients = tape.gradient(loss, variables)
  optimizer.apply_gradients(zip(gradients, variables))
  return loss
Create the checkpoint objects
Use a tf.train.Checkpoint object to manually create a checkpoint, where the objects you want to checkpoint are set as attributes on the object. A tf.train.CheckpointManager can also be helpful for managing multiple checkpoints.
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator)
manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)
Train and checkpoint the model
The following training loop creates an instance of the model and of an optimizer, then gathers them into a tf.train.Checkpoint object. It calls the training step in a loop on each batch of data, and periodically writes checkpoints to disk.
def train_and_checkpoint(net, manager):
  ckpt.restore(manager.latest_checkpoint)
  if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
  else:
    print("Initializing from scratch.")

  for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
      save_path = manager.save()
      print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
      print("loss {:1.2f}".format(loss.numpy()))
train_and_checkpoint(net, manager)
Initializing from scratch.
Saved checkpoint for step 10: ./tf_ckpts/ckpt-1
loss 32.44
Saved checkpoint for step 20: ./tf_ckpts/ckpt-2
loss 25.86
Saved checkpoint for step 30: ./tf_ckpts/ckpt-3
loss 19.30
Saved checkpoint for step 40: ./tf_ckpts/ckpt-4
loss 12.79
Saved checkpoint for step 50: ./tf_ckpts/ckpt-5
loss 6.51
Restore and continue training
After the first training cycle you can pass a new model and manager, but pick up training exactly where you left off:
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator)
manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)
train_and_checkpoint(net, manager)
Restored from ./tf_ckpts/ckpt-5
Saved checkpoint for step 60: ./tf_ckpts/ckpt-6
loss 1.58
Saved checkpoint for step 70: ./tf_ckpts/ckpt-7
loss 0.88
Saved checkpoint for step 80: ./tf_ckpts/ckpt-8
loss 0.54
Saved checkpoint for step 90: ./tf_ckpts/ckpt-9
loss 0.43
Saved checkpoint for step 100: ./tf_ckpts/ckpt-10
loss 0.23
The tf.train.CheckpointManager object deletes old checkpoints. Above it's configured to keep only the three most recent checkpoints.
print(manager.checkpoints) # List the three remaining checkpoints
['./tf_ckpts/ckpt-8', './tf_ckpts/ckpt-9', './tf_ckpts/ckpt-10']
These paths, e.g. './tf_ckpts/ckpt-10', are not files on disk. Instead they are prefixes for an index file and one or more data files which contain the variable values. These prefixes are grouped together in a single checkpoint file ('./tf_ckpts/checkpoint') where the CheckpointManager saves its state.
ls ./tf_ckpts
checkpoint  ckpt-8.data-00000-of-00001  ckpt-9.index
ckpt-10.data-00000-of-00001  ckpt-8.index
ckpt-10.index  ckpt-9.data-00000-of-00001
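The most recent prefix can also be recovered programmatically; a small sketch:

latest = tf.train.latest_checkpoint('./tf_ckpts')
print(latest)  # e.g. './tf_ckpts/ckpt-10'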
Loading mechanics
TensorFlow matches variables to checkpointed values by traversing a directed graph with named edges, starting from the object being loaded. Edge names typically come from attribute names in objects, for example the "l1" in self.l1 = tf.keras.layers.Dense(5). tf.train.Checkpoint uses its keyword argument names, as in the "step" in tf.train.Checkpoint(step=...).
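Both naming mechanisms appear side by side in a small sketch (the names here are illustrative):

root = tf.train.Checkpoint(step=tf.Variable(1))  # edge named "step", from the kwarg
root.extra = tf.Variable(2.)                     # edge named "extra", from the attribute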
The dependency graph from the example above looks like this:
The optimizer is in red, regular variables are in blue, and the optimizer slot variables are in orange. The other nodes (for example, the node representing the tf.train.Checkpoint) are in black.
Slot variables are part of the optimizer's state, but are created for a specific variable. For example, the 'm' edges above correspond to momentum, which the Adam optimizer tracks for each variable. Slot variables are only saved in a checkpoint if the variable and the optimizer would both be saved, thus the dashed edges.
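If you want to inspect a slot variable directly, classic Keras optimizers expose get_slot; a sketch assuming the opt and net from the training example above, after at least one optimization step:

m = opt.get_slot(net.l1.kernel, 'm')  # Adam's first-moment ("momentum") slot for the kernel
print(m.shape)                        # same shape as the kernel itself: (1, 5)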
Calling restore on a tf.train.Checkpoint object queues the requested restorations, restoring variable values as soon as there's a matching path from the Checkpoint object. For example, you can load just the bias from the model you defined above by reconstructing one path to it through the network and the layer.
to_restore = tf.Variable(tf.zeros([5]))
print(to_restore.numpy())  # All zeros
fake_layer = tf.train.Checkpoint(bias=to_restore)
fake_net = tf.train.Checkpoint(l1=fake_layer)
new_root = tf.train.Checkpoint(net=fake_net)
status = new_root.restore(tf.train.latest_checkpoint('./tf_ckpts/'))
print(to_restore.numpy())  # This gets the restored value.
[0. 0. 0. 0. 0.]
[3.2460225 3.2595956 3.360168  4.5620303 4.827786 ]
The dependency graph for these new objects is a much smaller subgraph of the larger checkpoint you wrote above. It includes only the bias and a save counter that tf.train.Checkpoint uses to number checkpoints.
restore returns a status object, which has optional assertions. All of the objects created in the new Checkpoint have been restored, so status.assert_existing_objects_matched passes.
status.assert_existing_objects_matched()
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7ff53842c150>
There are many objects in the checkpoint which haven't matched, including the layer's kernel and the optimizer's variables. status.assert_consumed only passes if the checkpoint and the program match exactly, and would throw an exception here.
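You can see this without letting the exception propagate (a sketch; assert_consumed raises AssertionError when unmatched objects remain):

try:
  status.assert_consumed()
except AssertionError:
  print("Checkpoint contains values that this program never matched.")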
Deferred restorations
Layer objects in TensorFlow may defer the creation of variables to their first call, when input shapes are available. For example, the shape of a Dense layer's kernel depends on both the layer's input and output shapes, and so the output shape required as a constructor argument is not enough information to create the variable on its own. Since calling a Layer also reads the variable's value, a restore must happen between the variable's creation and its first use.
To support this idiom, tf.train.Checkpoint defers restores which don't yet have a matching variable.
deferred_restore = tf.Variable(tf.zeros([1, 5]))
print(deferred_restore.numpy())  # Not restored; still zeros
fake_layer.kernel = deferred_restore
print(deferred_restore.numpy())  # Restored
[[0. 0. 0. 0. 0.]]
[[4.4692097 4.6525683 4.7541327 4.786906  4.8251786]]
Manually inspecting checkpoints
tf.train.load_checkpoint returns a CheckpointReader that gives lower-level access to the checkpoint contents. It contains mappings from each variable's key to the shape and dtype for each variable in the checkpoint. A variable's key is its object path, like in the graphs displayed above.
reader = tf.train.load_checkpoint('./tf_ckpts/')
shape_from_key = reader.get_variable_to_shape_map()
dtype_from_key = reader.get_variable_to_dtype_map()

sorted(shape_from_key.keys())
['_CHECKPOINTABLE_OBJECT_GRAPH',
 'iterator/.ATTRIBUTES/ITERATOR_STATE',
 'net/l1/bias/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/bias/.OPTIMIZER_SLOT/optimizer/m/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/bias/.OPTIMIZER_SLOT/optimizer/v/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/kernel/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/kernel/.OPTIMIZER_SLOT/optimizer/m/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/kernel/.OPTIMIZER_SLOT/optimizer/v/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/beta_1/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/beta_2/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/decay/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/learning_rate/.ATTRIBUTES/VARIABLE_VALUE',
 'save_counter/.ATTRIBUTES/VARIABLE_VALUE',
 'step/.ATTRIBUTES/VARIABLE_VALUE']
So if you're interested in the value of net.l1.kernel you can get the value with the following code:
key = 'net/l1/kernel/.ATTRIBUTES/VARIABLE_VALUE'
print("Shape:", shape_from_key[key])
print("Dtype:", dtype_from_key[key].name)
Shape: [1, 5]
Dtype: float32
It also provides a get_tensor method allowing you to inspect the value of a variable:
reader.get_tensor(key)
array([[4.4692097, 4.6525683, 4.7541327, 4.786906 , 4.8251786]], dtype=float32)
Object tracking
Checkpoints save and restore the values of tf.Variable objects by "tracking" any variable or trackable object set in one of its attributes. When executing a save, variables are gathered recursively from all of the reachable tracked objects.
As with direct attribute assignments like self.l1 = tf.keras.layers.Dense(5), assigning lists and dictionaries to attributes will track their contents.
save = tf.train.Checkpoint()
save.listed = [tf.Variable(1.)]
save.listed.append(tf.Variable(2.))
save.mapped = {'one': save.listed[0]}
save.mapped['two'] = save.listed[1]
save_path = save.save('./tf_list_example')

restore = tf.train.Checkpoint()
v2 = tf.Variable(0.)
assert 0. == v2.numpy()  # Not restored yet
restore.mapped = {'two': v2}
restore.restore(save_path)
assert 2. == v2.numpy()
You may notice wrapper objects for lists and dictionaries. These wrappers are checkpointable versions of the underlying data structures. Just like the attribute-based loading, these wrappers restore a variable's value as soon as it's added to the container.
restore.listed = []
print(restore.listed)  # ListWrapper([])
v1 = tf.Variable(0.)
restore.listed.append(v1)  # Restores v1, from restore() in the previous cell
assert 1. == v1.numpy()
ListWrapper([])
Trackable objects include tf.train.Checkpoint, tf.Module and its subclasses (e.g. keras.layers.Layer and keras.Model), and recognized Python containers:
- dict (and collections.OrderedDict)
- list
- tuple (and collections.namedtuple, typing.NamedTuple)
Other container types are not supported (see the sketch after this list), including:
- collections.defaultdict
- set
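For example, a set handed to tf.train.Checkpoint is rejected rather than tracked; in recent TensorFlow versions this raises a ValueError (a sketch; treat the exact error type as an assumption):

try:
  tf.train.Checkpoint(tracked={tf.Variable(1.)})  # a set is not a trackable container
except ValueError as e:
  print("sets cannot be tracked:", e)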
All other Python objects are ignored, including:
- int
- string
- float
Summary
TensorFlow objects provide an easy automatic mechanism for saving and restoring the values of variables they use.
Source: https://www.tensorflow.org/guide/checkpoint