Writing DSP algorithms directly in C or assembly?
If the compiler writers put some effort into optimizing for that target, the compiler will at least make some use of the DSP's special instructions and architecture. But for ultimate performance, compiled code will never be as good as hand-tuned assembly. It might be plenty good enough, though; that depends on your application.
Other alternatives include:
- Write the majority of your program in C, and just the most critical numerical portion in assembly.
- Write the program in C and use libraries supplied by the manufacturer or third parties. If you're doing common DSP tasks such as FFTs or FIR / IIR filters, somebody has probably already written hand-tuned machine code for them, so you can use that (you may have to pay for it) and link it into your application.
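As a rough illustration of the first alternative, the sketch below keeps one critical routine behind a single name so the rest of the program stays in C. The build flag `USE_ASM_DOT` and the routine `dot_q15_asm` are hypothetical names invented for this example; a real project would use whatever its toolchain and assembly sources define.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef USE_ASM_DOT /* hypothetical build flag, not a real toolchain option */
/* Hand-tuned version, assumed to live in a separate assembly file. */
int64_t dot_q15_asm(const int16_t *a, const int16_t *b, size_t n);
#define dot_q15 dot_q15_asm
#else
/* Portable C fallback with the same signature as the assembly routine. */
int64_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i]; /* 16x16 product fits in 32 bits */
    return acc;
}
#endif
```

Callers only ever use `dot_q15`, so switching between the C and assembly versions is a build setting and does not touch the calling code.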
Premature optimization is the root of all evil. - Donald Knuth
When you find that you don't get enough performance from your code, profile your program first, find the bottlenecks, analyze your performance requirements, and only then start optimizing. Writing assembly code is a last resort.
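A minimal sketch of the "measure first" step, assuming a hosted C environment where the standard `clock()` function is available. On a real DSP you would more likely use the vendor's profiler or a hardware cycle counter. `candidate_kernel` is a hypothetical stand-in for the routine you suspect is the bottleneck.

```c
#include <time.h>

/* Hypothetical workload standing in for a suspected bottleneck. */
void candidate_kernel(void)
{
    volatile long acc = 0; /* volatile keeps the loop from being optimized away */
    for (long i = 0; i < 100000; i++)
        acc += i;
}

/* Average CPU time, in seconds, of one call to fn over `reps` calls. */
double seconds_per_call(void (*fn)(void), unsigned reps)
{
    if (reps == 0)
        return 0.0;
    clock_t start = clock();
    for (unsigned i = 0; i < reps; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling `seconds_per_call(candidate_kernel, 1000)` gives a per-call figure you can compare against your real-time budget before deciding whether any optimization is needed.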
My question is: if I just program in C, wouldn't the compiler (which also comes from the DSP chip company) optimize it for that DSP and use its capabilities?
Yes, a C compiler can do a fair amount of optimization, but how much depends on the quality of the compiler. Frequently, a human can write assembly code that is faster than the compiled C code. At great expense of human pain and suffering, that is.
Or do I really need to write DSP routines directly in assembly?
First write in C, then profile, then decide whether you need to write anything in assembly. Hopefully, you will not need assembly at all.
It's always better to have your algorithm implemented in a higher-level language (which C is compared to assembly), even if you plan to implement everything in assembly in the end.
- Chances are, you won't even need assembly. If the code generated by your compiler meets your design goals, your job is done.
- If not, you won't be starting your assembly coding from scratch. Let the compiler generate the initial code for you, and use that as a base for your optimized assembly version.
- Later, when you need to test your optimized assembly code, you'll be glad to have the C version. Instead of manually calculating the correct output for your test input data, you can feed that input to your unoptimized C implementation, then check that the assembly produces exactly the same output after the optimizations you have made.
If, after a few years, a new developer needs to modify your algorithm and all they have at hand is highly optimized assembly code, there's a high chance they'll have to start from scratch.
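The testing idea above, using the plain C version as the reference for the optimized one, might look like the sketch below. Everything here is illustrative: `fir_ref` is a straightforward FIR filter, and `fir_opt` is an unrolled C loop standing in for the hand-written assembly routine you would actually be testing. The accumulators are 64-bit integers, so a bit-exact comparison is meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference FIR: simple and easy to review. Returns raw 64-bit
 * accumulators; scaling and saturation are left to the caller. */
void fir_ref(const int16_t *x, size_t n, const int16_t *h, size_t taps,
             int64_t *y)
{
    for (size_t i = 0; i < n; i++) {
        int64_t acc = 0;
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += (int32_t)h[k] * x[i - k];
        y[i] = acc;
    }
}

/* Stand-in for the optimized routine: same math, unrolled by two. */
void fir_opt(const int16_t *x, size_t n, const int16_t *h, size_t taps,
             int64_t *y)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = (i + 1 < taps) ? i + 1 : taps; /* taps overlapping input */
        int64_t acc0 = 0, acc1 = 0;
        size_t k = 0;
        for (; k + 1 < len; k += 2) {
            acc0 += (int32_t)h[k] * x[i - k];
            acc1 += (int32_t)h[k + 1] * x[i - k - 1];
        }
        if (k < len) /* leftover tap when len is odd */
            acc0 += (int32_t)h[k] * x[i - k];
        y[i] = acc0 + acc1;
    }
}

/* Index of the first differing output, or -1 if the outputs are identical. */
long first_mismatch(const int64_t *a, const int64_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}
```

Run both filters on the same input buffer and pass the two output buffers to `first_mismatch`; any result other than -1 points at the first sample where the optimized version diverges from the reference.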