How to create an image dataset just like the MNIST dataset?
You can either write a function that loads all your images and stacks them into a NumPy array (if everything fits in RAM), or use the Keras ImageDataGenerator (https://keras.io/preprocessing/image/), which includes a flow_from_directory method. You can find an example here: https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d.
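You can check up front whether "everything fits in RAM". The stacked array needs roughly number of images × height × width × channels × bytes per value. Here is a minimal sketch. The helper `dataset_nbytes` is hypothetical, and the 60,000 × 28 × 28 figures simply mirror the MNIST training set:

```python
import numpy as np

def dataset_nbytes(num_images, height, width, channels=1, dtype=np.uint8):
    """Bytes needed to hold num_images same-sized images in one NumPy array."""
    return num_images * height * width * channels * np.dtype(dtype).itemsize

# MNIST-sized training set: 60,000 grayscale 28x28 images
print(dataset_nbytes(60_000, 28, 28))                    # 47040000 (~47 MB as uint8)
print(dataset_nbytes(60_000, 28, 28, dtype=np.float32))  # 188160000 (~188 MB)
print(dataset_nbytes(60_000, 28, 28, dtype=np.float64))  # 376320000 (~376 MB)
```

Storing pixels as uint8 and converting to float only when feeding the model keeps the array 4 to 8 times smaller. Note that `dtype="float"` in NumPy means float64.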
You can write your own function to load all the images, or do it like this:
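```python
import os
import cv2
import numpy as np
from imutils import paths
from tensorflow.keras.preprocessing.image import img_to_array
args = {"testset": "./testset"}  # normally filled in by argparse
IMAGE_DIMS = (28, 28, 3)  # (height, width, channels); adjust to your images
data, labels = [], []
imagePaths = sorted(list(paths.list_images(args["testset"])))
# loop over the input images
for imagePath in imagePaths:
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)
    # extract the class label from the image path (assumes one folder per class)
    labels.append(imagePath.split(os.path.sep)[-2])
# scale raw pixel intensities to [0, 1]
data = np.array(data, dtype="float") / 255.0
```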
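Labels collected from folder names are strings, but MNIST labels are integers from 0 to 9. One way to get MNIST-style integer labels is `np.unique` with `return_inverse=True`. The class names below are made up for illustration:

```python
import numpy as np

# string labels as collected from the folder names (example values)
labels = ["cat", "dog", "cat", "bird", "dog"]

# np.unique returns the sorted class names and, with return_inverse=True,
# the position of each label in that sorted list
class_names, y = np.unique(np.array(labels), return_inverse=True)

print(class_names)  # ['bird' 'cat' 'dog']
print(y)            # [1 2 1 0 2]
```

Keep `class_names` (for example, save it in the same .npz file) so you can map predicted integers back to class names later.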
I might be late, but I'm posting this answer to help others who come to this question looking for an answer. I'll explain the file format these datasets use, how to generate such a dataset, and how to load it.
What is the file format
The Keras built-in datasets are already vectorized and in NumPy format (see the Keras Datasets documentation). MNIST, for example, is stored in the .npz file format (see the MNIST digits classification dataset page). Here is the function from the documentation for reference:
```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data(path="mnist.npz")
```
Once you have generated your own .npz file, you can load it with np.load and use its arrays the same way you would use the default MNIST dataset.
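Keras's own mnist.npz holds four arrays under the keys `x_train`, `y_train`, `x_test` and `y_test`. If you save your file with the same keys, a small helper can return the same `((x_train, y_train), (x_test, y_test))` structure that `load_data` gives you. This is a minimal sketch. The helper name `load_mnist_like` is hypothetical, and the file it reads is a tiny dummy created only for the demo:

```python
import os
import tempfile

import numpy as np

def load_mnist_like(path):
    """Return ((x_train, y_train), (x_test, y_test)) from an .npz file
    that uses the same key names as Keras's mnist.npz."""
    with np.load(path) as f:
        x_train, y_train = f["x_train"], f["y_train"]
        x_test, y_test = f["x_test"], f["y_test"]
    return (x_train, y_train), (x_test, y_test)

# demo with a tiny dummy file (all-zero images, labels 0..n-1)
tmp_path = os.path.join(tempfile.mkdtemp(), "mnistlikedataset.npz")
np.savez(tmp_path,
         x_train=np.zeros((6, 28, 28), dtype=np.uint8), y_train=np.arange(6),
         x_test=np.zeros((2, 28, 28), dtype=np.uint8), y_test=np.arange(2))

(x_train, y_train), (x_test, y_test) = load_mnist_like(tmp_path)
print(x_train.shape, x_test.shape)  # (6, 28, 28) (2, 28, 28)
```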
How to generate a .npz file
Here is how to generate such a dataset from all the images in a folder. The images must all have the same size and color mode so they can be stacked into a single array:
```python
# generate and save file
import os

import numpy as np
from PIL import Image

path_to_files = "./images/"
vectorized_images = []

for file in sorted(os.listdir(path_to_files)):
    # open each image and convert it to a NumPy array
    image = Image.open(os.path.join(path_to_files, file))
    image_array = np.array(image)
    vectorized_images.append(image_array)

# save as DataX or any other name, but use the same name when loading it back
np.savez("./mnistlikedataset.npz", DataX=np.array(vectorized_images))
```
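For larger datasets, `np.savez_compressed` takes the same arguments as `np.savez` but writes a zip-compressed file, and `np.load` reads it back the same way. Here is a quick comparison on dummy data. All-zero images compress extremely well, so real images will shrink less:

```python
import os

import numpy as np

# dummy stand-in for the stacked images: 1,000 all-zero 28x28 uint8 images
images = np.zeros((1000, 28, 28), dtype=np.uint8)

np.savez("./uncompressed.npz", DataX=images)
np.savez_compressed("./compressed.npz", DataX=images)  # same arguments, zip-compressed

# loading works the same way for both files
with np.load("./compressed.npz") as data:
    restored = data["DataX"]

print(os.path.getsize("./uncompressed.npz"), os.path.getsize("./compressed.npz"))
```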
If you want to save more than one array, pass each one as a separate keyword argument (with the appropriate changes to the code above to build those arrays):
```python
np.savez("./mnistlikedataset.npz", DataX=vectorized_images_x, DataY=vectorized_images_y)
```
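Here is one example of those "other changes". Assuming DataX holds the images and DataY the matching labels, you could shuffle them together, split off a test set, and save the four arrays under MNIST's key names. The helper `split_train_test`, its 20% test fraction and its seed are illustrative choices, not part of any library, and the data is dummy:

```python
import numpy as np

def split_train_test(x, y, test_fraction=0.2, seed=0):
    """Shuffle x and y with the same permutation and split off a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # one shared shuffle keeps images and labels aligned
    n_test = int(len(x) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])

# dummy data: 10 "images" of 28x28 with labels 0..9
images = np.zeros((10, 28, 28), dtype=np.uint8)
labels = np.arange(10)

(x_train, y_train), (x_test, y_test) = split_train_test(images, labels)
np.savez("./mnistlikedataset.npz",
         x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test)
print(x_train.shape, x_test.shape)  # (8, 28, 28) (2, 28, 28)
```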
How to load the data file
```python
# load and use file
import numpy as np

path = "./mnistlikedataset.npz"
with np.load(path) as data:
    # load DataX as train_data
    train_data = data['DataX']
    print(train_data)
```
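If you are not sure which names were used when the file was saved, the object returned by `np.load` lists them in its `files` attribute. Read the arrays inside the `with` block, because the file is closed when the block exits. Here is a small sketch on a dummy file:

```python
import numpy as np

# create a small example file with two named arrays (dummy data)
np.savez("./mnistlikedataset.npz", DataX=np.zeros((3, 28, 28)), DataY=np.arange(3))

with np.load("./mnistlikedataset.npz") as data:
    names = list(data.files)
    print(names)  # ['DataX', 'DataY']
    shapes = {name: data[name].shape for name in data.files}

print(shapes)  # {'DataX': (3, 28, 28), 'DataY': (3,)}
```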
Similarly, if you want to load multiple arrays from one file, read each one by its name (with the appropriate changes to the rest of your code):
```python
with np.load(path) as data:
    train_data = data['DataX']
    print(train_data)
    test_data = data['DataY']
    print(test_data)
```
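Before you feed the loaded images to a Keras model, you will usually scale them and add a channel axis, as is commonly done with MNIST. The sketch below uses random dummy pixels in place of `data['DataX']` and assumes grayscale 28×28 images:

```python
import numpy as np

# dummy stand-in for data['DataX']: 5 grayscale 28x28 images with values 0..255
train_data = np.random.default_rng(0).integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# scale to [0, 1] as float32, then add a trailing channel axis -> (N, 28, 28, 1),
# the channels_last layout Keras convolution layers use by default
x = train_data.astype("float32") / 255.0
x = x[..., np.newaxis]
print(x.shape, x.dtype)  # (5, 28, 28, 1) float32
```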