Cycle counter on ARM Cortex M4 (or M3)?
Take a look at the DWT_CYCCNT register defined here. Note that this register is implementation-dependent. Who is the chip vendor? I know the STM32 implementation offers this set of registers.
This post provides instructions for using the DWT Cycle Counter Register for timing. (See the post from 11 December 2009 - 06:29 PM)
This Stack Overflow post is an example of how to use DWT_CYCCNT as well.
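Since the cycle counter is an optional part of the DWT, it may be worth checking for it before relying on it. A minimal sketch, assuming the ARMv7-M layout in which bit 25 of DWT_CTRL (NOCYCCNT) reads as 1 when no cycle counter is implemented. The helper name `cyccnt_supported` and the macro `CYCCNT_NOT_IMPLEMENTED` are made up for illustration; the function takes the register value as a parameter so it can also be tried off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position: DWT_CTRL.NOCYCCNT, bit 25 (1 = no cycle counter). */
#define CYCCNT_NOT_IMPLEMENTED (1u << 25)

/* Hypothetical helper: pass in a value read from DWT_CTRL (0xE0001000).
 * Returns true when the part reports that CYCCNT is implemented. */
static inline bool cyccnt_supported(uint32_t dwt_ctrl_value)
{
    return (dwt_ctrl_value & CYCCNT_NOT_IMPLEMENTED) == 0u;
}
```

On the target this would be called as `cyccnt_supported(*(volatile uint32_t *)0xE0001000)`, after trace has been enabled so that the DWT registers are readable.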
If your part incorporates the CoreSight Embedded Trace Macrocell and you have appropriate trace-capable debugger hardware and software, then you can profile the code directly. Trace-capable debug hardware is of course more expensive, and your board needs to be designed to make the trace port pins available on the debug header. Since these pins are often multiplexed with other functions, that may not always be possible or practical.
Otherwise, if your tool-chain includes a cycle-accurate simulator (such as the one available in Keil uVision), you can use that to analyse the code timing. The simulator provides debug, trace and profiling features that are generally more powerful and flexible than those available on chip, so even if you do have trace hardware, the simulator may still be the easier solution.
This is just easier:
[code]
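#include <stdint.h>

#define DEMCR      (*(volatile uint32_t *)0xE000EDFC) // Debug Exception and Monitor Control Register
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000) // DWT control register
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004) // DWT cycle count register

// TRCENA (DEMCR bit 24) must be set before the DWT responds; a debugger often sets it for you.
#define start_timer() do { DEMCR |= (1u << 24); DWT_CTRL |= 1u; } while (0) // Set CYCCNTENA (bit 0)
#define stop_timer()  do { DWT_CTRL &= ~1u; } while (0)                     // Clear CYCCNTENA
#define get_timer()   (DWT_CYCCNT)                                          // Read the current cycle count
/***********
* How to use:
*   uint32_t it1, it2;        // start value and elapsed cycles
*   start_timer();            // start the counter
*   it1 = get_timer();        // store the current cycle count in a local
*   // do something
*   it2 = get_timer() - it1;  // cycle-count difference (unsigned, so one wraparound is handled)
*   stop_timer();             // stop the counter if it is no longer needed
*   print_int(it2);           // display the difference
****/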
[/code]
Tested on a Cortex-M4 (an STM32F407VGT on a CJMCU board): it simply counts the cycles the measured code takes.
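A raw cycle count is usually more useful once converted to time, and the 32-bit counter wraps fairly quickly (roughly every 25 seconds at 168 MHz). A minimal sketch of both steps, using hypothetical helper names `cycles_elapsed` and `cycles_to_us`; the core clock is passed in by the caller, and the 168 MHz figure in the usage note is only an illustrative value, not something read from the hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two CYCCNT readings. Unsigned subtraction is
 * modulo 2^32, so the result stays correct across a single counter wrap. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to whole microseconds (truncating).
 * core_hz is the CPU clock in Hz, supplied by the caller. */
static inline uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0u);
    /* Widen to 64 bits so cycles * 1000000 cannot overflow. */
    return (uint32_t)(((uint64_t)cycles * 1000000u) / core_hz);
}
```

For example, `cycles_to_us(cycles_elapsed(t0, t1), 168000000u)` gives the elapsed time in microseconds for a core assumed to run at 168 MHz. Intervals longer than one full counter period cannot be recovered this way and would need a coarser timer.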