How do I create a sklearn.datasets.base.Bunch object in scikit-learn from my own data?
You can do it like this:
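import numpy as np
# In current scikit-learn releases Bunch is imported from sklearn.utils;
# the old sklearn.datasets.base module path is no longer available.
from sklearn.utils import Bunch

# Raw text documents, one string per sample
examples = ['some text', 'another example text', 'example 3']

# One integer class label per document, in the same order as examples
target = np.array([0, 1, 0], dtype=np.int64)

dataset = Bunch(data=examples, target=target)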
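To show what the resulting object gives you, here is a small sketch. It assumes a scikit-learn version where Bunch can be imported from sklearn.utils; the variable names first_doc, same_doc and keys are only for illustration. A Bunch is a dict subclass whose keys can also be read as attributes:

```python
import numpy as np
from sklearn.utils import Bunch  # assumed import path for recent versions

dataset = Bunch(
    data=['some text', 'another example text', 'example 3'],
    target=np.array([0, 1, 0], dtype=np.int64),
)

# The same field is reachable as an attribute or as a dictionary item
first_doc = dataset.data[0]
same_doc = dataset['data'][0]

# Keys are exactly the keyword arguments passed to the constructor
keys = sorted(dataset.keys())
print(first_doc, same_doc, keys)
```

Because it behaves like a plain dictionary, you can add further fields (for example target names) as extra keyword arguments if your code expects them.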
You don't have to create Bunch objects at all. They are only a convenience container that scikit-learn uses when loading its built-in sample datasets.
You can directly feed a list of Python strings to your vectorizer object.
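A minimal sketch of that approach, assuming CountVectorizer as the vectorizer (the answer does not say which one is meant; TfidfVectorizer accepts the same kind of input):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Plain Python list of strings, no Bunch involved
examples = ['some text', 'another example text', 'example 3']
target = np.array([0, 1, 0], dtype=np.int64)

# CountVectorizer is an assumed choice of vectorizer for this illustration
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(examples)  # sparse document-term matrix

# One row per input string, one column per vocabulary term
print(X.shape[0], len(vectorizer.vocabulary_) == X.shape[1])
```

X and target can then be passed straight to an estimator's fit method.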