State of Map-Reduce on Appengine?

Which is the library to call from my project.

They serve different purposes, and you've provided no details about what you're attempting to do.

The most fundamental layer here is the task queue, which lets you schedule background work that can be highly parallelized. This is fan-out. Let's say you had a list of 1000 websites, and you wanted to check the response time for each one and send an email for any site that takes more than 5 seconds to load. By running these as concurrent tasks, you can complete the work much faster than if you checked all 1000 sites in sequence.

Now let's say you don't want to send an email for every slow site, you just want to check all 1000 sites and send one summary email that says how many took more than 5 seconds and how many took fewer. This is fan-in. It's trickier with the task queue, because you need to know when all tasks have completed, and you need to collect and summarize their results.

Enter the Pipeline API. The Pipeline API abstracts the task queue to make fan-in easier. You write what looks like synchronous, procedural code, but uses Python futures and is executed (as much as possible) in parallel. The Pipeline API keeps track of task dependencies and collects results to facilitate building distributed workflows.

The MapReduce API wraps the Pipeline API to facilitate a specific type of distributed workflow: mapping the results of a piece of work into a set of key/value pairs, and reducing multiple sets of results to one by combining their values.

So they provide increasing layers of abstraction and convenience around a common system of distributed task execution. The right solution depends on what you're trying to accomplish.

State of Map-Reduce on Appengine?

Tags:

Google App Engine

Mapreduce

Related

Recent Posts