Split data into train and test sets: Python code examples
Example 1: sklearn split train test
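import numpy as np
from sklearn.model_selection import train_test_split
# 5 samples with 2 features each, plus 5 labels
X, y = np.arange(10).reshape((5, 2)), range(5)
# Hold out 33% of the samples for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
print(X_train)
print(y_train)
print(X_test)
print(y_test)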
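Because `random_state` seeds the shuffle, calling `train_test_split` again with the same value should reproduce exactly the same split. The sketch below checks that on the same toy data as Example 1; `split_once` is a hypothetical helper written only for this illustration. It also shows how the fraction turns into row counts: with 5 samples and `test_size=0.33`, the test set gets ceil(0.33 * 5) = 2 rows and the training set the remaining 3.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

def split_once(seed):
    # Hypothetical helper: one call to train_test_split with a given seed
    return train_test_split(X, y, test_size=0.33, random_state=seed)

first = split_once(42)
second = split_once(42)

# Same seed -> identical X_train, X_test, y_train, y_test
same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same)

# 5 samples, test_size=0.33 -> 2 test rows, 3 train rows
X_train, X_test, y_train, y_test = first
print(X_train.shape, X_test.shape)
```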
Example 2: train test split python
from sklearn.model_selection import train_test_split
# X (features) and y (labels) are assumed to be defined already
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
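Example 2 expects `X` and `y` to exist already. A self-contained sketch, assuming made-up placeholder data (100 samples with 3 features and a binary label, not from the original), shows what `test_size=0.25` does to the row counts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 3 features each, and a binary label
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# test_size=0.25 -> 25 rows held out for testing, 75 left for training
print(len(X_train), len(X_test))
```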
Example 3: split data into train and test by group id (GroupShuffleSplit)
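from sklearn.model_selection import GroupShuffleSplit
# df: DataFrame with a 'Group_Id' column; all rows of a group stay on the same side. Only the first split is used.
train_inds, test_inds = next(GroupShuffleSplit(test_size=.20, n_splits=1, random_state=7).split(df, groups=df['Group_Id']))
train = df.iloc[train_inds]
test = df.iloc[test_inds]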
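To see what splitting "by id" guarantees, here is a sketch on a hypothetical toy DataFrame with 5 groups of 2 rows each. For `GroupShuffleSplit`, a float `test_size` is a proportion of groups rather than rows, so 0.20 of 5 groups sends 1 whole group (2 rows) to the test set. Only one split is consumed by `next()`, so `n_splits=1` is used here.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy data (hypothetical): 10 rows belonging to 5 groups, two rows per group
df = pd.DataFrame({
    "Group_Id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "value": range(10),
})

splitter = GroupShuffleSplit(test_size=0.20, n_splits=1, random_state=7)
train_inds, test_inds = next(splitter.split(df, groups=df["Group_Id"]))

train = df.iloc[train_inds]
test = df.iloc[test_inds]

# No Group_Id appears in both train and test
overlap = set(train["Group_Id"]) & set(test["Group_Id"])
print(overlap)  # set()
print(len(train), len(test))
```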
Example 4: the test_size parameter
test_size sets how much of the data is held out as the test set. As a float between 0.0 and 1.0 it is a fraction of the samples: passing 0.5, for example, puts 50% of the data in the test set. As an int it is the absolute number of test samples. If you specify test_size, you can omit train_size, which then defaults to the complement of the test size.
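A small sketch of the float and int forms of `test_size` on 10 placeholder samples. When a single array is passed, `train_test_split` returns just a train/test pair; leaving `train_size` out makes it the complement of `test_size`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))  # 10 placeholder samples

# Float: proportion of the samples that go to the test set (0.5 -> 5 rows)
_, X_test_frac = train_test_split(X, test_size=0.5, random_state=0)

# Int: absolute number of test samples (3 rows)
_, X_test_abs = train_test_split(X, test_size=3, random_state=0)

# train_size omitted: it becomes the complement of test_size (10 - 2 = 8 rows)
X_train_only, _ = train_test_split(X, test_size=0.2, random_state=0)

print(len(X_test_frac), len(X_test_abs), len(X_train_only))
```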
Example 5: train-test split code in pandas
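# df is assumed to be an existing pandas DataFrame
df_permutated = df.sample(frac=1, random_state=42)  # shuffle all rows; random_state makes it reproducible
train_size = 0.8
train_end = int(len(df_permutated) * train_size)  # number of training rows
df_train = df_permutated.iloc[:train_end]  # first 80% of the shuffled rows
df_test = df_permutated.iloc[train_end:]  # remaining 20%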
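Another common pandas idiom samples the training rows directly and takes everything else as the test set with `drop`. This is a different technique from the shuffle-and-slice approach, and it assumes the DataFrame index is unique (otherwise `drop` can remove more rows than intended). The data below is a hypothetical placeholder.

```python
import pandas as pd

# Placeholder DataFrame with 10 rows and a unique default index
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Sample 80% of the rows for training, then drop those rows to get the test set
df_train = df.sample(frac=0.8, random_state=42)
df_test = df.drop(df_train.index)

print(len(df_train), len(df_test))
# No row index appears in both sets
print(df_train.index.intersection(df_test.index).empty)
```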