How can I split a Dataset from a .csv file for Training and Testing?
Better practice and maybe more random is to use df.sample
:
from numpy.random import RandomState
import pandas as pd
df = pd.read_csv('C:/Dataset.csv')
rng = RandomState()
train = df.sample(frac=0.7, random_state=rng)
test = df.loc[~df.index.isin(train.index)]
You should use the read_csv ()
function from the pandas module. It reads all your data straight into the dataframe which you can use further to break your data into train and test. Equally, you can use the train_test_split()
function from the scikit-learn module.
You can use pandas
:
import pandas as pd
import numpy as np
df = pd.read_csv('C:/Dataset.csv')
df['split'] = np.random.randn(df.shape[0], 1)
msk = np.random.rand(len(df)) <= 0.7
train = df[msk]
test = df[~msk]