Support Vector Machine or Artificial Neural Network for text processing?

This question is very old, and a lot of developments have happened in the NLP area over the last 7 years.

Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) evolved during this time.

Word Embeddings: Words appearing in similar contexts tend to have similar meanings. Word embeddings are pre-trained on a task whose objective is to predict a word based on its context.

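As a toy illustration of that idea, cosine similarity between embedding vectors stands in for semantic similarity. The vectors below are hand-made for illustration, not real pre-trained embeddings:

```python
import math

# Hypothetical 3-d "embeddings" -- real ones are learned and much larger.
embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v))
    return dot / norm

# Words used in similar contexts end up with similar vectors:
print(cosine(embeddings["king"], embeddings["queen"]))  # high
print(cosine(embeddings["king"], embeddings["apple"]))  # low
```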

CNN for NLP:

  1. Sentences are first tokenized into words, which are then transformed into a word-embedding matrix (i.e., the input embedding layer) of dimension d.

  2. Convolutional filters are applied on this input embedding layer to produce a feature map.

  3. A max-pooling operation on each feature map obtains a fixed-length output and reduces the dimensionality of the output.

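The three steps above can be sketched in a few lines. This is a minimal NumPy illustration with made-up random numbers standing in for an embedding matrix and a learned filter, not a trainable model:

```python
import numpy as np

# Step 1 (assumed done): a sentence of 6 tokens, each a d = 4 embedding.
rng = np.random.default_rng(0)
sentence = rng.normal(size=(6, 4))

def conv_feature_map(x, filt):
    """Step 2: slide a (width, d) filter along the token axis (valid conv)."""
    width = filt.shape[0]
    return np.array([np.sum(x[i:i + width] * filt)
                     for i in range(x.shape[0] - width + 1)])

filt = rng.normal(size=(3, 4))           # one filter spanning 3 words
fmap = conv_feature_map(sentence, filt)  # feature map of length 6 - 3 + 1 = 4
pooled = fmap.max()                      # Step 3: max-over-time pooling
```

In practice one uses many filters of several widths, and the pooled scalars are concatenated into the fixed-length sentence representation.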

Since CNNs have the shortcoming of not preserving long-distance contextual information, RNNs were introduced.

RNNs are specialized neural-based approaches that are effective at processing sequential information.

An RNN memorizes the results of previous computations and uses them in the current computation.

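That recurrence can be sketched in its simplest scalar form, h_t = tanh(w_x · x_t + w_h · h_{t-1}). The weights here are arbitrary fixed numbers for illustration; a real RNN learns vector-valued weights:

```python
import math

def rnn_forward(xs, w_x=0.5, w_h=0.8):
    """Run a toy scalar RNN over a sequence and return the final state."""
    h = 0.0
    for x in xs:
        # The previous hidden state h feeds into the current computation.
        h = math.tanh(w_x * x + w_h * h)
    return h

# The final state depends on the whole sequence, not just the last input:
print(rnn_forward([1.0]))
print(rnn_forward([0.5, 1.0]))
```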

There are a few variants of the RNN: the long short-term memory unit (LSTM) and the gated recurrent unit (GRU).

Have a look at the resources below:

deep-learning-for-nlp

Recent trends in deep learning paper


You might want to also take a look at maxent classifiers (/log linear models).

They're really popular for NLP problems. Modern implementations, which use quasi-Newton methods for optimization rather than the slower iterative scaling algorithms, train more quickly than SVMs. They also seem to be less sensitive to the exact value of the regularization hyperparameter. You should probably only prefer SVMs over maxent if you'd like to use a kernel to get feature conjunctions for free.
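A maxent (log-linear) classifier is just multinomial logistic regression, and scikit-learn's default "lbfgs" solver is one of those quasi-Newton methods. A minimal sketch on a made-up toy data set:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny invented sentiment corpus, purely for illustration.
texts = ["good great film", "wonderful movie", "bad awful film", "terrible movie"]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)

# "lbfgs" is a quasi-Newton optimizer; C is the regularization hyperparameter.
clf = LogisticRegression(solver="lbfgs", C=1.0).fit(X, labels)

pred = clf.predict(vec.transform(["great wonderful film"]))
```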

As for SVMs vs. neural networks, using SVMs would probably be better than using ANNs. Like maxent models, training SVMs is a convex optimization problem. This means that, given a data set and a particular classifier configuration, SVMs will consistently find the same solution. When training multilayer neural networks, the system can converge to various local minima, so you'll get better or worse solutions depending on the weights you use to initialize the model. With ANNs, you'll need to perform multiple training runs in order to evaluate how good or bad a given model configuration is.
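The multiple-runs point can be sketched with scikit-learn: train the same MLP configuration under several random initializations and compare. The dataset and hyperparameters here are arbitrary illustrative choices:

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# Non-convex objective: different initializations (random_state) may land in
# different minima, so a configuration is judged over several restarts.
losses = [
    MLPClassifier(hidden_layer_sizes=(5,), random_state=seed, max_iter=2000)
    .fit(X, y)
    .loss_
    for seed in range(3)
]
print(losses)  # final training losses, one per restart
```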


I think you'll get competitive results from both algorithms, so you should aggregate the results... think about ensemble learning.

Update:
I don't know if this is specific enough: use a Bayes optimal classifier to combine the predictions from each algorithm. You have to train both of your algorithms, then train the Bayes optimal classifier to make optimal predictions based on the algorithms' outputs.

Separate your training data into 3 sets:

  • 1st data set will be used to train the (Artificial) Neural Network and the Support Vector Machines.
  • 2nd data set will be used to train the Bayes Optimal Classifier by taking the raw predictions from the ANN and SVM.
  • 3rd data set will be your evaluation set, where you will test your trained Bayes optimal classifier.
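This scheme is essentially stacking. scikit-learn's StackingClassifier automates the data-splitting via internal cross-validation; note the level-1 combiner below is ordinary logistic regression, a practical stand-in for a true Bayes optimal classifier, and the data set is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Level 0: the SVM and the ANN. Level 1: a combiner trained on their
# out-of-fold predictions (cv=3 replaces the manual 1st/2nd data-set split).
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("ann", MLPClassifier(random_state=0, max_iter=1000))],
    final_estimator=LogisticRegression(),
    cv=3,
)
score = stack.fit(X, y).score(X, y)
```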

Update 2.0:
Another way to create an ensemble of the algorithms is to use 10-fold (or more generally, k-fold) cross-validation:

  • Break the data into 10 sets of size n/10.
  • Train on 9 of the sets and test on the remaining 1.
  • Repeat 10 times and take the mean accuracy.
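The three steps above are exactly what scikit-learn's cross_val_score does; a short sketch, using the iris data set and an SVM as arbitrary examples:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 10 folds: each iteration trains on 9 folds and tests on the held-out one.
scores = cross_val_score(SVC(), X, y,
                         cv=KFold(n_splits=10, shuffle=True, random_state=0))
mean_accuracy = scores.mean()
```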

Remember that you can generally combine many classifiers and validation methods to produce better results. It's just a matter of finding what works best for your domain.