random state in train_test_split code example

Example 1: sklearn split train test

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

X_train
# array([[4, 5],
#        [0, 1],
#        [6, 7]])

y_train
# [2, 0, 3]

X_test
# array([[2, 3],
#        [8, 9]])

y_test
# [1, 4]

Example 2: random_state

random_state controls the shuffling applied to the data before the split. Pass an integer to seed the random number generator, so that the same split is produced on every run. You can also pass an instance of the RandomState class, which is then used as the generator itself. If you don’t pass anything (the default, None), the global RandomState instance used by np.random is used, so the split can differ from run to run.
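As a minimal sketch of the behaviour described above, the snippet below compares splits made with the same integer seed and with a freshly seeded RandomState instance. The toy arrays, the seed 0, and the variable names are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 10 samples, 2 features each.
X, y = np.arange(20).reshape((10, 2)), list(range(10))

# Same integer seed twice: both calls should return the same split.
split_a = train_test_split(X, y, test_size=0.3, random_state=0)
split_b = train_test_split(X, y, test_size=0.3, random_state=0)
same_seed_matches = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

# A freshly created RandomState seeded with 0 is used directly as the generator.
rng = np.random.RandomState(0)
split_c = train_test_split(X, y, test_size=0.3, random_state=rng)
instance_matches_int = all(np.array_equal(a, c) for a, c in zip(split_a, split_c))

print(same_seed_matches, instance_matches_int)
```

Note that a RandomState instance advances each time it is used, so reusing the same rng object for a second call would generally give a different split, whereas an integer seed restarts the sequence on every call.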

Example 3: sklearn train test split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)  # X, y as defined in Example 1

Example 4: test_size

test_size sets how much of the data is held out as the test dataset. It is usually given as a fraction between 0 and 1: for example, if you pass 0.5, 50% of the samples go to the test dataset. It can also be an integer, in which case it is the absolute number of test samples. If you’re specifying this parameter, you can omit train_size, which is then set to the complement of the test size.
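The following sketch shows test_size used both ways, as a fraction and as an absolute count. The 10-sample toy dataset and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 10 samples, 2 features each.
X, y = np.arange(20).reshape((10, 2)), list(range(10))

# Fraction: half of the 10 samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)
frac_sizes = (len(X_train), len(X_test))

# Integer: exactly 3 samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, random_state=42)
int_sizes = (len(X_train), len(X_test))

print(frac_sizes)  # (5, 5)
print(int_sizes)   # (7, 3)
```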