Can I delete events.out.tfevents.XXXXXXXXXX.computer_name files from the training folder?
tfevents files are not essential for training and can be safely removed.
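If you want to clean them up from a script, a minimal sketch could look like the following. The function names and the `dry_run` flag are my own (hypothetical) choices, and the pattern assumes the default `events.out.tfevents.*` naming from the question. It only touches files matching that pattern, so checkpoints and configs stay in place.

```python
from pathlib import Path


def find_event_files(log_dir):
    """Return all files under log_dir whose name starts with events.out.tfevents."""
    matches = Path(log_dir).rglob("events.out.tfevents.*")
    return sorted(p for p in matches if p.is_file())


def delete_event_files(log_dir, dry_run=True):
    """Delete event files under log_dir and return the list of affected paths.

    With dry_run=True (the default) nothing is deleted; the paths are only reported.
    """
    affected = []
    for path in find_event_files(log_dir):
        if not dry_run:
            path.unlink()
        affected.append(path)
    return affected
```

Calling `delete_event_files("path/to/train_dir")` first lets you inspect what would be removed; pass `dry_run=False` to actually delete. It is probably safest to do this while no training or evaluation job is writing to the folder.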
In TensorFlow, tfevents files are created by FileWriters and are generally used to store summary output. Here are some common examples of how summaries (tf.summary) are used:
- storing a description of the TensorFlow graph before training starts
- writing the value of the loss function for every training step
- storing a histogram of activations or weights for a layer once per epoch
- storing an example output image of the network on every validation run
- storing average precision (or any other metric) for the whole validation set
This information is not essential for training and can therefore be deleted. However, it can come in handy for debugging or for studying the behavior of the model. TensorBoard is the most common tool for reading and visualizing the data stored in tfevents files. You can also read and interpret the files manually using Protocol Buffers (protobuf) and its implementations for Python, C++ and other languages.
tfevents files are written in the TFRecord format, a simple format for storing a sequence of binary records. TensorFlow always appends new events/summaries to the end of the file if the file already exists. This explains why the file grows.
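To illustrate how simple the framing is, here is a rough sketch of a reader that uses only the Python standard library. It relies on the TFRecord layout of a little-endian uint64 length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload. The function names are hypothetical, and the checksums are skipped rather than verified, so treat it as an inspection aid and not a replacement for TensorFlow's own readers.

```python
import struct


def iter_tfrecord_payloads(path):
    """Yield the raw payload bytes of each complete record in a TFRecord-framed file.

    The checksum fields are read past but not verified (assumption: trusted file).
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # uint64 length + uint32 checksum of the length
            if len(header) < 12:
                return  # end of file, or a partially written header
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # uint32 checksum of the payload
            if len(payload) < length or len(footer) < 4:
                return  # truncated record, e.g. the file is still being written
            yield payload


def count_records(path):
    """Count the complete records in the file."""
    return sum(1 for _ in iter_tfrecord_payloads(path))
```

In a tfevents file each payload is a serialized `Event` protobuf message, so decoding it further requires the TensorFlow proto definitions. Counting records at two points in time is a quick way to confirm that new events are being appended.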
Due to implementation details of the optimization routine provided with tensorflow/models/research/object_detection, training and evaluation event files behave differently. Namely, the evaluation event file is created using a FileWriter directly, which reuses the latest existing event file in the log_dir whenever one exists. The implementation also collects a large number of summaries regularly, which makes the event file grow during training.
For the training routine, on the other hand, the developers explicitly specify an empty list of summaries when training is done on TPU, which means that the event file is created once and is never used afterwards. This behaviour can differ when training is performed on non-TPU hardware or when the summarize_gradients option is enabled for training.