Difference between ARM A and M series processors?
The M series ARM CPUs have a small instruction set (Thumb only), often no floating point unit, no memory management unit (at most an optional memory protection unit), and usually no cache. They are optimized for low cost rather than high performance. They are generally combined with flash, RAM and peripherals into a micro-controller chip. They are mostly used to control hardware, and are programmed either bare metal (without an operating system) or linked with libraries, such as an RTOS, that provide OS-like features. ARM positions these CPUs as killers of 8-bit and 16-bit micro-controllers.
The A series ARM CPUs have a larger instruction set (of which the M instruction set is largely a subset), and they have a memory management unit, usually a floating point unit, and cache(s). They are optimized for high performance rather than low cost (though still for high performance per unit of power). They are generally sold as micro-processors (often combined with high-end peripherals like Ethernet, video, or an MPEG decoder), intended to be combined with off-chip RAM and flash. They usually run an OS, often Linux, with a separation between OS space and the space for application programs. ARM positions these CPUs as THE choice for mobile phones and tablets (competing with Intel CPUs).
Very short summary: M is for (high-end) micro-controllers, A is for running Linux on battery-powered gadgets.
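To make the "separation between OS space and space for application programs" a bit more concrete, here is a toy model in Python of what a memory management unit does. It is only a sketch: a real ARM MMU uses multi-level page tables with permission bits, and the names `translate`, `PageFault`, `process_a`, `process_b` and the 4 KiB page size are assumptions made for illustration, not anything taken from ARM documentation.

```python
PAGE_SIZE = 4096  # 4 KiB pages, assumed here purely for illustration


class PageFault(Exception):
    """Raised when a virtual address has no mapping in the page table."""


def translate(page_table, vaddr):
    """Translate a virtual address to a physical one, roughly as an MMU would.

    page_table maps virtual page numbers to physical frame numbers.
    """
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(f"no mapping for virtual address {vaddr:#x}")
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical page tables that an OS might set up for two applications.
process_a = {0x10: 0x200}
process_b = {0x10: 0x300}

vaddr = 0x10 * PAGE_SIZE + 0x24  # the same virtual address in both processes
print(hex(translate(process_a, vaddr)))  # 0x200024
print(hex(translate(process_b, vaddr)))  # 0x300024
```

Both programs use the same address but end up in different physical memory, and an address the OS never mapped raises a fault instead of touching someone else's data. On an M series part there is no such translation step: the address the program uses is the physical address.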
ARM architecture profiles
The ARM architecture profiles are:
Application profile (Cortex-A)
Application profiles implement a traditional ARM architecture with multiple modes and support a virtual memory system architecture based on an MMU. These profiles support both ARM and Thumb instruction sets.
Real-time profile (Cortex-R)
Real-time profiles implement a traditional ARM architecture with multiple modes and support a protected memory system architecture based on an MPU.
Microcontroller profile (Cortex-M)
Microcontroller profiles implement a programmers' model designed for fast interrupt processing, with hardware stacking of registers and support for writing interrupt handlers in high-level languages. The processor is designed for integration into an FPGA and is ideal for use in very low power applications.
source
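The "hardware stacking of registers" in the Microcontroller profile description above is what lets an interrupt handler be an ordinary compiled function. The following is a toy model in Python, not code for the target: on exception entry the core pushes the eight caller-saved registers of the basic frame and loads LR with a special return value, and on return it pops them again. The function names, the register values and the dictionary-based register file are assumptions for illustration; the floating point frame and the callee-saved registers r4-r11 (preserved by the compiler's normal function prologue) are left out.

```python
EXC_RETURN_THREAD_MSP = 0xFFFFFFF9  # "return to Thread mode, main stack"

# Registers in the basic exception frame, from lowest stack address to highest.
STACKED = ["r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr"]


def exception_entry(regs, stack):
    """Model the hardware pushing the basic frame and setting LR."""
    for name in reversed(STACKED):  # xPSR is pushed first, r0 ends up on top
        stack.append(regs[name])
    regs["lr"] = EXC_RETURN_THREAD_MSP


def exception_return(regs, stack):
    """Model the hardware popping the frame when the handler returns."""
    for name in STACKED:
        regs[name] = stack.pop()


def handler(regs):
    """Stands in for a compiled C function, free to clobber r0-r3 and r12."""
    for name in ("r0", "r1", "r2", "r3", "r12"):
        regs[name] = 0xDEADBEEF


def take_interrupt(regs, stack, handler):
    exception_entry(regs, stack)
    handler(regs)  # no assembly wrapper needed around the handler
    exception_return(regs, stack)


regs = {"r0": 1, "r1": 2, "r2": 3, "r3": 4, "r12": 5,
        "lr": 0x08000101, "pc": 0x08000200, "xpsr": 0x01000000}
stack = []
before = dict(regs)
take_interrupt(regs, stack, handler)
print(regs == before)  # True: the interrupted code sees its registers unchanged
```

The registers that the hardware saves are the same ones a normal function call is allowed to overwrite, so the handler needs no special prologue or epilogue.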