Keras Text Preprocessing - Saving Tokenizer object to file for scoring
The most common way is to use either pickle or joblib. Here is an example of how to use pickle to save a Tokenizer:
import pickle

# saving
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)
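To check that a pickle round trip preserves the vocabulary, something like the following minimal sketch could be used. It assumes a stand-in object (a SimpleNamespace with a made-up word_index) in place of a fitted Tokenizer so that it runs without Keras installed; with a real tokenizer you would compare tokenizer.word_index the same way.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for a fitted Tokenizer (the vocabulary is made up).
tokenizer = SimpleNamespace(word_index={"the": 1, "cat": 2, "sat": 3})

# Write into a temporary directory so the sketch leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "tokenizer.pickle")

# saving
with open(path, "wb") as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading
with open(path, "rb") as handle:
    restored = pickle.load(handle)

# The restored object should carry the same vocabulary as the original.
same_vocab = restored.word_index == tokenizer.word_index
print(same_vocab)
```

Keep in mind that a pickle file should only be loaded from a source you trust, since unpickling can execute arbitrary code.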
The Tokenizer class also has a method, to_json, that saves its data in JSON format:
import io
import json

tokenizer_json = tokenizer.to_json()
with io.open('tokenizer.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(tokenizer_json, ensure_ascii=False))
The data can be loaded using the tokenizer_from_json function from keras_preprocessing.text:
import json
from keras_preprocessing.text import tokenizer_from_json

with open('tokenizer.json', encoding='utf-8') as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)
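One detail that can be confusing here: to_json() already returns a JSON string, so json.dumps wraps that string in a second layer of encoding, and json.load then hands back a str rather than a dict. That is fine, because tokenizer_from_json expects a JSON string. The sketch below illustrates the round trip using only the standard library; the tokenizer_json value is a made-up stand-in for the output of to_json(), not real Tokenizer output.

```python
import json
import os
import tempfile

# Stand-in for the string returned by tokenizer.to_json() (contents are made up).
tokenizer_json = json.dumps({"class_name": "Tokenizer", "config": {"num_words": None}})

path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")

# Saving as in the answer: json.dumps wraps the string in a JSON string literal.
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(tokenizer_json, ensure_ascii=False))

# Loading: json.load undoes one layer of encoding and returns a str, not a dict.
with open(path, encoding="utf-8") as f:
    data = json.load(f)

round_trip_ok = isinstance(data, str) and data == tokenizer_json

# A second decode yields the actual dictionary, if you want to inspect it.
config = json.loads(data)
print(round_trip_ok, type(config).__name__)
```

If you would rather have a file that holds the JSON object directly, writing tokenizer_json with f.write(tokenizer_json) and reading it back with f.read() should work equally well, as long as saving and loading stay consistent with each other.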