split data to train and test python code example

Example 1: sklearn split train test

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

X_train
# array([[4, 5],
#        [0, 1],
#        [6, 7]])

y_train
# [2, 0, 3]

X_test
# array([[2, 3],
#        [8, 9]])

y_test
# [1, 4]

Example 2: train test split python

from sklearn.model_selection import train_test_split
				
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

Example 3: split data train, test by id python

train_inds, test_inds = next(GroupShuffleSplit(test_size=.20, n_splits=2, random_state = 7).split(df, groups=df['Group_Id']))

train = df.iloc[train_inds]
test = df.iloc[test_inds]

Example 4: test_size

This parameter decides the size of the data that has to be split as the test dataset. This is given as a fraction. For example, if you pass 0.5 as the value, the dataset will be split 50% as the test dataset. If you’re specifying this parameter, you can ignore the next parameter.

Example 5: train-test split code in pandas

df_permutated = df.sample(frac=1)

train_size = 0.8
train_end = int(len(df_permutated)*train_size)

df_train = df_permutated[:train_end]
df_test = df_permutated[train_end:]