Java's Mahout equivalent in Python
scikit-learn (formerly scikits.learn) is highly recommended: http://scikit-learn.sourceforge.net/
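As a rough illustration of the kind of task people often reach for Mahout to do, here is a minimal clustering sketch with scikit-learn. It assumes a recent scikit-learn and NumPy are installed. The toy data and the choice of `KMeans` are illustrative only and are not taken from the answer above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two well-separated groups of three points each (illustrative only)
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.2, 1.1],
    [8.0, 8.0], [8.5, 9.0], [9.0, 8.2],
])

# n_init is set explicitly so behaviour is stable across scikit-learn versions;
# random_state makes the run reproducible
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)  # one cluster index per row of X

print(labels)
print(model.cluster_centers_)
```

The cluster ids themselves (0 or 1) are arbitrary. What matters is that the first three points share one label and the last three share the other.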
Spark MLlib is also recommended. It is a scalable machine learning library that can read data from HDFS and, of course, runs on top of Spark.
You can access it via PySpark (see the Programming Guide's Python examples).