PLS-DA algorithm in python

PLS-DA is really a "trick" to use PLS for categorical outcomes instead of the usual continuous vector/matrix. The trick consists of creating a dummy indicator matrix of zeros and ones that encodes membership in each category. So if you have a binary outcome to predict (e.g. male/female, yes/no), your dummy matrix will have two columns, one per category.

For example, consider the outcome gender for four people: two males and two females. The dummy matrix should be coded as:

import numpy as np
dummy=np.array([[1,1,0,0],[0,0,1,1]]).T

where each column represents membership in one of the two categories (male, female).
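If the outcome is stored as a vector of labels rather than typed in by hand, the same indicator matrix can be built from it. The following is only a sketch: the `labels` and `classes` arrays are hypothetical names introduced for illustration, and the column order (male first, female second) is fixed explicitly to match the matrix above.

```python
import numpy as np

# Hypothetical label vector for the four people in the example.
labels = np.array(["male", "male", "female", "female"])

# Fix the column order explicitly: column 0 = male, column 1 = female.
classes = np.array(["male", "female"])

# Compare every label against every class name; True/False becomes 1/0.
dummy = (labels[:, None] == classes[None, :]).astype(int)
print(dummy)
```

Stating the class order yourself avoids surprises from helpers that sort the labels alphabetically, which would put female in the first column.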

Then your model for data in the variable Xdata (4 rows, any number of columns) would be:

from sklearn.cross_decomposition import PLSRegression
myplsda=PLSRegression().fit(Xdata,dummy)

The predicted categories can be extracted by comparing the two indicator columns in mypred:

mypred= myplsda.predict(Xdata)

For each row (case), the predicted gender is the one with the highest predicted membership value.

You can also use Linear Discriminant Analysis in scikit-learn, which accepts integer class labels for the y value:

LDA-SKLearn

Here is a short tutorial on how to use the LDA: sklearn LDA tutorial
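As a rough illustration of the LDA route, the sketch below fits `LinearDiscriminantAnalysis` with an integer label vector. The data values and the 0/1 coding (0 = male, 1 = female) are assumptions made for the example, not something taken from the linked tutorial.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up data: 6 cases, 2 features (illustrative values only).
X = np.array([
    [1.0, 2.0],
    [1.2, 1.8],
    [0.9, 2.1],
    [3.0, 0.5],
    [3.1, 0.4],
    [2.9, 0.7],
])

# Integer class labels; assumed coding: 0 = male, 1 = female.
y = np.array([0, 0, 0, 1, 1, 1])

# Fit LDA directly on the integer labels; no dummy matrix is needed.
lda = LinearDiscriminantAnalysis().fit(X, y)

# predict returns one integer label per row.
lda_pred = lda.predict(X)
print(lda_pred)
```

Unlike the PLS approach, no indicator matrix or argmax step is required, because the classifier returns class labels directly.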
