Why won't the Perceptron Learning Algorithm converge?

In their 1969 book Perceptrons, Minsky and Papert (in)famously demonstrated that the perceptron learning algorithm is not guaranteed to converge on datasets that are not linearly separable: as long as at least one training example is misclassified, the weights keep being updated, so on non-separable data the algorithm never terminates.

If you're sure that your dataset is linearly separable, you might try adding a bias to each of your data vectors, as described in the question "Perceptron learning algorithm not converging to 0": adding a bias lets the model learn decision boundaries that do not pass through the origin.
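A common way to do this is to append a constant 1 to every input vector, so that the weight learned for that extra component acts as the bias. A minimal sketch in Python (the helper name `add_bias` is mine):

```python
import numpy as np

def add_bias(X):
    """Append a constant 1 to each row so a perceptron trained on the
    result can represent decision boundaries that miss the origin."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])

X = np.array([[2.0, 3.0], [-1.0, 0.5]])
print(add_bias(X))
# [[ 2.   3.   1. ]
#  [-1.   0.5  1. ]]
```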

Alternatively, if you'd like a variant of the perceptron learning algorithm that still comes with guarantees, stated in terms of a margin of specified width, even for datasets that are not linearly separable, have a look at the Averaged Perceptron (PDF). The averaged perceptron is an approximation to the voted perceptron, which was introduced (as far as I know) in a nice paper by Freund and Schapire, "Large Margin Classification Using the Perceptron Algorithm" (PDF).

With an averaged perceptron, you run the usual perceptron updates but record a copy of the parameter vector after each presentation of a training example. The final classifier predicts with the mean of all of those parameter vectors rather than with the last one.
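Here is a minimal sketch of that idea in Python (the class name and parameters are my own; it keeps a running sum of the weights instead of storing every copy, which yields the same mean):

```python
import numpy as np

class AveragedPerceptron:
    """Perceptron that classifies with the mean of all intermediate
    weight vectors, rather than the final one."""

    def __init__(self, n_features, n_epochs=10):
        self.w = np.zeros(n_features)      # current weight vector
        self.w_sum = np.zeros(n_features)  # running sum of weight vectors
        self.n_steps = 0
        self.n_epochs = n_epochs

    def fit(self, X, y):
        """X: (n_samples, n_features) array; y: labels in {-1, +1}."""
        for _ in range(self.n_epochs):
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(self.w, x_i) <= 0:  # misclassified
                    self.w = self.w + y_i * x_i     # standard perceptron update
                # "Copy" the parameters after every presentation by
                # accumulating them into a running sum.
                self.w_sum += self.w
                self.n_steps += 1
        return self

    def predict(self, X):
        w_avg = self.w_sum / self.n_steps  # mean of all parameter vectors
        return np.sign(X @ w_avg)
```

Note that, unlike the plain perceptron, training here stops after a fixed number of epochs rather than running until there are no mistakes, so it terminates even when the data are not separable.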